Posts by j soma

Notes from my #NICAR26 session on browser automation and scraping with Playwright!

jsoma.github.io/workshop-bro...

1 month ago 4 0 0 0

Notes and code from my #NICAR26 talk on my FAVORITE pdf processing library (very not biased), Natural PDF

jsoma.github.io/natural-pdf-...

1 month ago 2 0 1 0

Notes and code from my #NICAR26 talk: Analyzing Images and Video with AI!

jsoma.github.io/workshop-ai-...

1 month ago 5 1 1 0

Slides and notes for #NICAR26 No-Code AI pipelines with n8n!

github.com/jsoma/worksh...

1 month ago 0 0 1 0
[Images: no-code pipelines and AI agents; analyzing images and video with AI; PDF processing with Natural PDF; Browser automation with Playwright]

Here we go #NICAR26, a zillion and one sessions on the docket!

Thurs: No-code AI pipelines with n8n
Fri: Analyzing images/video with AI + Wrangling PDFs with Natural PDF
Sat: Browser automation with Playwright + Ethical AI in Investigations
Sun: Build your own AI Benchmark

1 month ago 14 4 2 7

With the awful WaPo layoffs and the state of journalism more broadly, if it's useful for any writers and reporters considering going indie, @molly.wiki, @xoxogossipgita.bsky.social of Aftermath, @jasonkoebler.bsky.social of 404 Media, @edzitron.com and I will do a little workshop next week.

2 months ago 547 175 13 9

Amid the hundreds of colleagues we’ve lost today, I wanted to highlight the BRILLIANT data/graphics folks who any newsroom should be fighting to hire right now—threading here:

2 months ago 212 87 3 5
[Link preview: Donate to Washington Post 2026 layoff fund, organized by Rachel Siegel. On Wednesday, Feb. 4, 2026, The Washington Post laid off hundreds of journalists. We ar…]

Whatever you think of the Washington Post at this moment, here's a chance to support the dedicated, hard-working journalists who were just laid off. If you have the means, your donation is most welcome. If you don't, a kind thought and maybe spreading the word to others is support enough 💙

2 months ago 617 451 14 11
[Link preview: The Automated Newsroom: Build AI Workflows That Work. A six-week, hands-on course teaching journalists how to design, test, and improve AI workflows. Learn evaluation, testing, and product thinking for newsroom automation.]

Find out more about the AI newsroom workflow course at its awful sales-y site, and feel free to shoot me any questions you might have!

littlecolumns.com/courses/ai-n...

5 months ago 0 0 0 0

The course itself is six weeks long, and while it does cost money (which is crazy strange for me!), there are steep geographic pricing discounts and coupon codes for close readers of the course site.

5 months ago 0 0 1 0

It's maybe like 35% a tech course, and a lot of the theory is stuff that seems simple once you've heard it: see what goes wrong, fix it, track it. That's it!

Yes, we'll learn automation tools like n8n/ActivePieces and eval suites like Opik/Arize Phoenix, buuut they're just one part

5 months ago 0 0 1 0

This course is going to solve every step of those crises. How do you...

- set up an AI pipeline?
- measure if it's working?
- iterate and improve it?
- make sure you're solving a reader/reporter problem instead of just playing tech games?

It isn't magic! It's easy!!!!

5 months ago 1 0 1 0

I'm running a six-week course in November on building and evaluating AI newsroom workflows!

It's targeted at people who don't know where to start, or who build little prototypes and end up stumped about making them production-ready.

littlecolumns.com/courses/ai-n...

5 months ago 2 1 1 0
[Images: a three-column table with the middle column highlighted; three columns being restructured into a vertical flow; tables being selected irrespective of their columns; the eventual pandas df]

Natural PDF v0.1.13 out – a handful of useful changes but my favorite is 🗼 page restructuring support!

Grab sections and "flow" them together vertically or horizontally, making multi-column extraction infinitely easier than 24 hours ago.

Details at jsoma.github.io/natural-pdf/...

10 months ago 1 0 0 0
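
A rough sketch of what "flowing" side-by-side columns into one vertical reading order means. This is plain Python, not Natural PDF's own API: the `Word` class, the `flow_columns` function, and the column boundary value are all hypothetical, made up only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word on the page
    y: float  # top edge of the word, increasing downward


def flow_columns(words, boundaries):
    """Reorder words from side-by-side columns into one vertical flow.

    `boundaries` is a sorted list of x positions that split the page
    into columns (an assumption: real pages need these detected first).
    """
    columns = [[] for _ in range(len(boundaries) + 1)]
    for word in words:
        # Column index = number of boundaries at or to the left of the word.
        index = sum(word.x >= b for b in boundaries)
        columns[index].append(word)

    flowed = []
    for column in columns:
        # Read each column top to bottom, then move on to the next column.
        flowed.extend(sorted(column, key=lambda w: (w.y, w.x)))
    return [w.text for w in flowed]


# Two columns laid out side by side, listed in row-by-row order.
words = [
    Word("A1", 10, 10), Word("B1", 210, 10),
    Word("A2", 10, 30), Word("B2", 210, 30),
]
print(flow_columns(words, boundaries=[200]))  # ['A1', 'A2', 'B1', 'B2']
```

Reading row by row would interleave the two columns; stacking them first is what makes the later table or text extraction straightforward.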

it looks like someone has been going very hard on scans

ONE MORE DAY OF ACCEPTING BAD PDF SUBMISSIONS

11 months ago 0 0 0 0

you could have won EVERY CATEGORY

11 months ago 0 0 1 0

Woke up to a ton of new non-English BAD PDF CONTEST submissions: 💥 Serbian! Romanian! Chinese! 💥

Mostly not scans, though, so I predict they'll be easy-peasy to extract the info from. I want to have to train a custom OCR model!!! Someone submit a big scanned non-English PDF!!!

11 months ago 0 0 1 0

i know you all are hiding worse scans from me

11 months ago 0 0 1 0
[Image: screenshot of a spreadsheet with very tiny text]

i love this giant-pdf-with-tiny-text submission, we need a smallest font size category

11 months ago 1 0 1 0
[Link preview: Bad PDF Contest. I'm looking for the most frustrating, painful, real-world PDFs.]

I am running a contest. It is about bad pdfs.

It can make you independently wealthy (for immeasurably small measures of independent wealth)

badpdfs.com

11 months ago 4 2 3 2

Live colab demo/walkthrough here: colab.research.google.com/github/jsoma...

1 year ago 0 0 0 0
[Image: a screenshot of Natural PDF documentation]

New release of 📝 Natural PDF 📝

A million and one table extraction/document layout/Q&A/quality of life improvements for all your PDF-processing needs

jsoma.github.io/natural-pdf/

1 year ago 4 0 1 0
[Link preview: Columbia Student Hunted by ICE Sues to Prevent Deportation. Yunseo Chung, a legal permanent resident who has lived in the U.S. since she was 7, participated in pro-Palestinian demonstrations. Immigration agents visited residences looking for her.]

the law clinic repping this student, CLEAR, is based out of CUNY... once again the public city university absolutely trounces the ivy league when it comes to having a backbone and standing on actual principles

1 year ago 1565 358 15 21

Thank you – if only we could get a fix for the bug that prevents it from working 100%!

1 year ago 1 0 1 0