🏔️🐴Oh great Tengri, show your favor to the illustrious Brad Underwood and his mighty band of giant Balkan Janissaries as they face down the dogs of puny Connecticut
Posts by Nomadic Warriors for Pritzker
lol
yeah this is a funny phenomenon you come across; counter-intelligence people really dislike the Israeli government! bring up Pollard to them and watch them spit glass
I got LLM psychosis from Mr. Chatterbox
The Mr. Chatterbox creator has a good explanation of what they were trying to achieve with it here www.estragon.news/mr-chatterbo...
Interesting to read about building this small-scale LLM project and the fine-tuning and sycophancy that comes in even when trained entirely on Victorian-era novels
Thanks! As the adage goes, you can just do things
Thank you! Demystification was definitely one of my goals in writing that.
Thank you! I'm glad you liked it!
Thank you! Glad you like it!
Mr. Chatterbox is a really fun project. It's not great to talk to but it's a fun demo of what you can build using entirely out-of-copyright training data
It's a 2GB nanochat model - I released a new llm-mrchatterbox plugin that can run it on my Mac simonwillison.net/2026/Mar/30/...
I didn't try that - might be interesting to try!
Latin, Greek, scripture and mathematics! What more do you need
Yup! I wrote it up in detail here: bsky.app/profile/noma...
This is a great account that really teaches you a lot about how modern LLMs get made
This is in the best traditions of DH. Its success didn't depend on big compute or extensive tech background, but on good design + 19c vibes. And yet, in fact, figuring out how to fine-tune something like this is not a solved problem, and people w/ more compute hours could learn from the experiment.
Hell yeah
I’ve been overwhelmed and thrilled by the response to this! I’ve also gotten a lot of questions asking how I did it. So, this weekend I sat down and wrote a detailed narrative documentation outlining how, exactly, I built Mr. Chatterbox: www.estragon.news/mr-chatterbo...
You never know what data will be used for!
I uploaded a @britishlibrary.bsky.social dataset to Hugging Face in 2022. IIRC one of my first PRs to a HF repo!
4 years later, someone trains a Victorian chatbot on it
More libraries should be sharing their public domain collections for AI to build on!
Oh wow, thank you so much! Wouldn't have happened without you.
Thank you!
Hahah, I have pretty thick skin. I normally post about politics.
Hey, I built this! It’s both pre-trained and instruction tuned on corpus data. Documentation coming - I wasn’t expecting this to blow up lol. A little more info here: github.com/karpathy/nan...
I built this! It’s both pre-trained and instruction tuned on corpus data. Documentation coming - I wasn’t expecting this to blow up lol. A little more info here: github.com/karpathy/nan...
Want to talk to the past? Here's an LLM "trained entirely from scratch on a corpus of over 28,000 Victorian-era British texts published between 1837 & 1899, drawn from a dataset made available by the British Library"
Quite different from an LLM roleplaying a Victorian. huggingface.co/spaces/tvent...
Guess I found a good use for AI for once. Not enough to make me cease my appeals for Butlerian Jihad, but at least this one's likely to make me laugh a bit.
I used the BL Books dataset, specifically set for material published between 1837 and 1899 with some filtering for messy data. It ended up being something like 28,000 books. Dickens is all in there - no surprise that he sounds quite Dickensian!
huggingface.co/datasets/The...
Currently attempting to explain the BTS comeback to Mr Chatterbox
Amazing