Posts by Matt Miller

Busy day on the national mall with the annual colorectal cancer awareness toilet seat data visualizations and new epstein titanic sculpture

1 month ago 2 0 0 0

A lot of great posts recently on this topic of "vibe coding as enabler," including the two linked here.

For me it's manifested as a way to scratch a digital project itch I've had for a decade, a data explorer for a set of Brewery directories from 1899 - 1918:
hadro.github.io/brewery-guid...

1 month ago 5 3 1 0

oh shit is "mario day"? let me fire up the emulator...
"Super Mario Sister (Asia) (En) (Pirate).nes"

1 month ago 1 0 0 0
Our Tools – Post45 Data Collective

We presented on our tool for enriching and clustering book data at Code4Lib today. Check it out, and let us know what you think!

data.post45.org/our-tools.html

Huge thanks to @thisismattmiller.com for leading development on this project.

#code4lib #c4l26

1 month ago 9 8 0 0

Roy Lichtenstein Catalogue Raisonné site got a serious terms and conditions, complete with auto scroll button before you can use it. Though at least it's online + free

1 month ago 1 0 0 0
WeppySnap - Chrome Web Store Capture a region of any browser tab as an animated WebP

I wrote a little Chrome extension to make animated WebP (“weppy”) files from a region of a webpage: chromewebstore.google.com/detail/weppy...
I use it when writing documentation and I want to show a short animation (in a GitHub README, for example). Simpler than a WebM video and more modern than GIF.

1 month ago 0 0 0 0

...state of the union? 👎 , look at this bluesky quote post network explorer I just made. I added 6 networks so far:
thisismattmiller.github.io/bsky-quote-m...

1 month ago 0 0 0 0
Title page of book:
FREAK TREES OF THE STATE OF NEW YORK The New York State College of Forestry Syracuse University NELSON C. BROWN Acting Dean 1930 Second Edition New York (State) College of Forestry, Syracuse.

Shows a strange tree:
PRIZE WINNERS First Prize. G. W. Gotham, 89 River Street, Cortland, N. Y. Two elms, the larger tree appears to have absorbed the growth of the smaller tree. Trunk of large tree is bigger above the graft. 3

Shows two strange trees:
PRIZE WINNERS Second Prize. C. B. Cox, Adams Center, N. Y. Elm, trunk runs along surface of earth in half circle 45 feet near Adams Center on North Harbor Road. Third Prize. A. Wilson Insley, 30 Eagle Street, Mt. Morris, N. Y. Elm, one mile south of Conesus Lake. 4

Shows two strange trees:
PRIZE WINNERS Fourth Prize. George J. Wiedmaier, 222 King Street, Dunkirk, N. Y. Maple, 14 inches in diameter arched 7 feet and anchored in birch tree near Arkwright, N. Y. Fifth Prize. H. L. Tayntor, McGraw, N. Y. Double beech, near Homer, N. Y. Graft 18 feet in length and 8 inches in diameter. 5

Building my HathiTrust 1930 public domain survey and coming across interesting volumes... like the tree shaming
"Freak trees of the State of New York."
babel.hathitrust.org/cgi/pt?id=co...

3 months ago 4 3 0 1

Also a lot of photocopies of physical media

4 months ago 1 0 0 0
Image from Epstein files: yellow Post-it note on green background with black redaction bars

Grid paper with black redaction bars

Pink postit note with green background black redaction bars

Green postit note on white background black redaction bars

Some of these Epstein file redactions are very aesthetic. Reminds me of updates.timsherratt.org/2021/04/21/s...

4 months ago 0 0 1 0

Theoretically yes, the HathiTrust builds a local database. But the tool would need to be updated to know how to work with it, a new service would need to be added, it wouldn't work out of the box.

4 months ago 1 0 0 0
Diagram illustrating the BookReconciler workflow. On the left, a book cover of The Book of Salt by Monique Truong appears alongside “Minimal Metadata,” listing Author: Truong, Monique and Title: The Book of Salt. An arrow points to a box labeled “BookReconciler” with book and diamond icons. A downward arrow leads to “Enriched + Clustered Metadata,” showing multiple editions of the book cover and expanded metadata, including several ISBNs, subject headings (e.g., Vietnamese–France fiction, women authors, household employees, gay men, cooking), and an author VIAF identifier.

Very happy to introduce a new tool, BookReconciler!

You can take spreadsheets with book data and add subject headings, descriptions, ISBNs, HathiTrust IDs, & more. You can also cluster editions & variations of the same "Work."

Led by @thisismattmiller.com and supported by @post45data.bsky.social.
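
To make the "cluster editions of the same Work" idea concrete, here is a minimal sketch in Python. It is not BookReconciler's actual method, which the post does not describe; it assumes a much simpler approach of grouping rows by a normalized (author surname, title) key. The function names `work_key` and `cluster_by_work`, the row fields, and the sample rows are all hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def _norm(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def work_key(title, author):
    """Build a crude work-level key (an assumption, not the tool's logic)."""
    main_title = _norm(title.split(":")[0])          # drop any subtitle
    main_title = re.sub(r"^(the|a|an) ", "", main_title)  # drop leading article
    surname = _norm(author.split(",")[0])            # "Surname, Given" form
    return (surname, main_title)


def cluster_by_work(rows):
    """Group spreadsheet-like rows that share the same work key."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[work_key(row["title"], row["author"])].append(row)
    return dict(clusters)


# Illustrative rows only; the "edition" labels are made up.
rows = [
    {"title": "The Book of Salt", "author": "Truong, Monique", "edition": "hardcover"},
    {"title": "The Book of Salt: A Novel", "author": "Truong, Monique T.D.", "edition": "paperback"},
    {"title": "Bitter in the Mouth", "author": "Truong, Monique", "edition": "hardcover"},
]
clusters = cluster_by_work(rows)
```

Exact-key grouping like this misses translations, retitled editions, and spelling variants, which is presumably why a real tool would lean on external identifiers rather than string matching alone.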

4 months ago 123 56 7 1
BookReconciler: An Open-Source Tool for Metadata Enrichment and Work-Level Clustering We present BookReconciler, an open-source tool for enhancing and clustering book data. BookReconciler allows users to take spreadsheets with minimal metadata, such as book title and author, and automa...

A hard problem with literary data is navigating between editions of books and the "work," the theoretical text that unites all editions. I've been lucky to work with @thisismattmiller.com and @mellymeldubs.bsky.social, who built a tool to address this + do much more

arxiv.org/abs/2512.10165

4 months ago 64 22 4 1

www.google.com/maps/@47.232...

4 months ago 2 0 0 0

Example and analysis of how AI web scrapers are breaking small and medium cultural heritage sites.

4 months ago 1 0 0 0
A screen shot of the viz showing clustered email graphed across time by contact

Blog: Visualizing 14,000 Released Epstein Emails.

I built a viz of the emails released as part of the 20K House Oversight Committee docs.

thisismattmiller.com/post/email-v...

- A clustered high level view of the emails by contact across time
- Zoom into individual emails and open the sources

4 months ago 2 0 0 0

Thanks for checking it out!

5 months ago 0 0 0 0
LCNAF & Trie Storing +11M unique LCNAF names in 50MB Trie data structure

LCNAF & Trie – Storing +11M unique names in 50MB data structure in the browser

thisismattmiller.com/post/lcnaf-t...

- Optimizing LCNAF authorized headings into a trie data structure
- In browser MARC file name reconciliation + search tool
- OpenRefine / Command line tools for reconciliation
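
For readers unfamiliar with tries, here is a minimal sketch in Python of why they suit a large list of name headings: names that share a prefix share nodes. This is an illustration only, not the optimized structure from the blog post, and it makes no attempt at the 50MB footprint. The class names and the sample names are hypothetical.

```python
class TrieNode:
    __slots__ = ("children", "terminal")

    def __init__(self):
        self.children = {}     # maps one character to the next node
        self.terminal = False  # True if a stored name ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()
        self.node_count = 1    # counts the root

    def insert(self, name):
        node = self.root
        for ch in name:
            nxt = node.children.get(ch)
            if nxt is None:
                nxt = TrieNode()
                node.children[ch] = nxt
                self.node_count += 1
            node = nxt
        node.terminal = True

    def _walk(self, text):
        """Follow text from the root; return the last node or None."""
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, name):
        node = self._walk(name)
        return node is not None and node.terminal

    def starts_with(self, prefix):
        """Return every stored name that begins with prefix, sorted."""
        start = self._walk(prefix)
        if start is None:
            return []
        results, stack = [], [(start, prefix)]
        while stack:
            node, text = stack.pop()
            if node.terminal:
                results.append(text)
            for ch, child in node.children.items():
                stack.append((child, text + ch))
        return sorted(results)


# Made-up sample names in "Surname, Given" form.
trie = Trie()
for name in ["Miller, Matt", "Miller, Mary", "Milton, John"]:
    trie.insert(name)
```

The three sample names total 36 characters but need only 23 nodes beyond the root, because "Miller, Ma" and "Mil" are stored once. That sharing is the basic effect a production structure would push much further with compact encoding.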

5 months ago 6 4 1 0
Giallo Using a vision language model to analyze Italian Giallo films

Halloween blog post: Italian Giallo Horror Films

thisismattmiller.com/post/giallo/

- Using vision language model to analyze a 70 film corpus (🧟) / 80,000 frames
- Build and plot “trope clusters” across movies

Probably the longest eye acting supercut you've seen: youtu.be/cGrmkOwut6k

5 months ago 4 3 0 1
Shows a county map of the United States; counties with school districts with banned books are highlighted red.

A screenshot of the banned book browser interface showing rows of book covers.

New Post: PEN America Banned Books 2025 dataset
thisismattmiller.com/post/book-ba...

Looking at school district book bans

- Interactive Map interface to the books banned in 2024-2025
- A faceted browse interface to the 3700 books
- Subject heading analysis

6 months ago 1 0 0 0
LC & Flickr Commons Library of Congress & Flickr Commons: Analysis of user interactions on 40,000 images.

New Blog Post.
Library of Congress & Flickr Commons: Analysis of user interactions on 40,000 images
thisismattmiller.com/post/lc-flic...
- Organizing 95K photo comments.
- Viewer to explore user georectified images
- Folksonomy tagging vs LCSH Vocabulary
- Placing into the Wiki* knowledge graph

6 months ago 9 5 2 0

One output, 1 hour 40mins of Siskel and Ebert summaries:
www.youtube.com/watch?v=hFLM...

8 months ago 1 0 0 0
Building datasets from video collections using local & cloud LLMs Using Qwen2.5-VL, Gemini 2.5 and Whisper to build a Siskel and Ebert dataset

Trying out workflows that use multimodal LLMs for validating and QA.

In this blog I walk through a test using 1000 Siskel and Ebert videos to extract key video frames and other data.

thisismattmiller.com/post/buildin...

8 months ago 4 0 0 1
A woodcut mashup image titled: maintenance

maintenance

8 months ago 1 1 1 0

New dataset on bestsellers from 40+ countries, with consistent coverage for France, Germany, Spain, Italy, and the U.S.

Congrats to the authors @sdileonardi.bsky.social, @beccacohen.bsky.social, and @dan-sinnamon.bsky.social on this major contribution! 🎉

🔗: doi.org/10.18737/386...

8 months ago 40 23 1 9
A screen capture from the Siskel and Ebert Show reviewing the movie Gremlins 2: The New Batch. Ebert gave it a thumbs down. Siskel gave it a thumbs down.

Gremlins 2: The New Batch (1990)
Director: Joe Dante
Cast: Phoebe Cates-Kline, Sylvester Stallone, Hulk Hogan, Zach Galligan, Christopher Lee
Watch Review
wp / wd

8 months ago 0 1 0 1

thisismattmiller.com/post/glitch/

New blog post about @glitch.com shutdown, how I migrated my apps, and how I used glitch for teaching and creative projects.

8 months ago 1 0 0 0

The Library of Congress BIBFRAME Update is online today at 1PM EDT.
Talks about:
- Hubs (BF ontology)
- BF Cataloging at Penn Libraries
- BF Validation Tooling
listserv.loc.gov/cgi-bin/wa?A...

9 months ago 1 0 0 0

Need a robots.txt directive indicating bulk download is available, not that they would abide by robots.txt

10 months ago 1 0 1 0

Yeah we have bots endlessly flooding id.loc.gov stressing servers to the limit trying to scrape millions of html pages even though we offer pretty much all of it as bulk downloads: id.loc.gov/download/

10 months ago 11 5 1 0