Inspired by my experience consulting for companies and the rise of inference-time compute for LLMs (as well as users' growing acceptance of waiting while the LLM *thinks*), I suggest many teams working on RAG rethink whether they can improve accuracy by relaxing latency requirements.
Posts by Vivek Kalyan
Most guides on RAG systems are over-optimizing for latency. If you are using RAG to automate any type of knowledge work, accuracy is much more important.
I explore the idea of spending more compute for RAG systems to significantly improve performance.
www.vivekkalyan.com/writing/scal...
This really resonates with me. I spent 6 years building the AI products at my previous company and felt firsthand the consequences of decisions (both good and bad). I have much stronger opinions on how to do certain things because of it.
Try using a local UI (I'm using msty.app), and call Claude via API. Fits my bursty usage, and the monthly cost is generally much cheaper. You do lose their system prompt + Artifacts, but I don't miss them much.
This is a super high impact project. There are tons of production models in the real world still running BERT/RoBERTa models from the 2018-2019 era. I'm sincerely hoping these models are easy to finetune; the 8k context length alone is a good enough reason to upgrade.
They are the victims of their own media success: they raised at a ridiculous valuation and need to show that they are disrupting the software engineering market beyond something like Copilot/Cursor.
Do you have the link to the public repo? Java is highly requested; we've just been focusing on validating the usefulness of the current docs before spending time adding new languages.
Cartograph solves this by automatically creating and updating both written documentation and visual representations that stay synchronized with code changes.
We are very early, but there's much more to ship. We would love to get some feedback on our product.
In complex systems, we spend most of our time understanding code - but most AI tools are focused on helping you write more code. Engineering teams struggle with outdated documentation and inconsistent system diagrams. This slows down development and makes it difficult to understand complex systems.
We're launching early access for Cartograph (cartograph.app). It takes your codebase and automatically generates architecture diagrams and documentation for it.
See some demos of open-source repos here (no sign-up required):
cartograph.app/demo
(Reply here if you'd like to see others added)
Spent more than 2 hours last night trying to figure out why my solution for day 6 part 2 was working on the examples but not on the test input. Slept on it, and figured it out in 5 mins when I woke up this morning. 🤦‍♂️
Congrats on the move! And your first 🦋 post.
Yeah, hot mess is the right description of it 😅. Would be interested to see if there is a nicer solution.
Yeah, I've been writing rust for a few months now for parsing codebase into a graph (cartograph.app). But, it's also a conscious decision to force myself to write "idiomatic Rust", i.e. more functional.
Oh look, it's that time of the year. I will be doing Advent of Code 2024 in Rust, hoping to get more experience using Rust to solve a wide range of problems.
github.com/vivekkalyan/...
Do you have sources for the budget claims? I have not seen any numbers comparing models so would be interested to know.
@eugeneyan.bsky.social's blog is a gold mine if you are doing ML/AI in the industry. Writing design docs before ML projects start is an important process that I introduced at my prev org, and the post on design docs was one of the references I used to create our template.
Methodology paper detailing how to train small, efficient off-topic classifiers using synthetic data from LLMs.
Yeah, your feed is the training data for your brain. Curate it for what you want to see more of
How does using an assert statement work in practice? Are you using something like pytest as your benchmark runner?
Very disappointed your bio doesn't say learning machine.
The Gemini team really delivered with gemini-exp-1121. It slaps for writing tasks: its outputs avoid the typical AI feel that other models (like ChatGPT) have, something previously only Claude achieved.
Good guidelines. One thing I want to add: if you are benchmarking your system's real-world performance, you probably want to keep a version of the benchmark that still includes the easy examples. Especially if you need to report to non-technical management how your system is performing.
The most on-the-pulse demo would be on the bluesky firehose data.
Models will follow the order of the fields in the schema you specify! So just putting the chain-of-thought (CoT) field first works well in practice. Structured outputs also ensure that the model response can be parsed as JSON. Use Pydantic/Zod here.
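A minimal sketch of the idea with Pydantic (field and class names here are illustrative, not from the original post): the CoT field is declared first, so a model constrained to this schema generates its reasoning before committing to an answer, and the structured response is guaranteed to parse.

```python
from pydantic import BaseModel

class Answer(BaseModel):
    # Declared first, so the model emits its reasoning before the answer.
    chain_of_thought: str
    final_answer: str

# Structured outputs guarantee the raw response is valid JSON for this schema,
# so parsing it into a typed object cannot fail on malformed output.
raw = '{"chain_of_thought": "2 + 2 is basic arithmetic.", "final_answer": "4"}'
parsed = Answer.model_validate_json(raw)
print(parsed.final_answer)  # -> 4
```

The same trick works with Zod in TypeScript: the order of keys in the schema object controls the order the model fills them in.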
I'm building cartograph.app, an AI powered platform that helps software teams save time with documentation and architecture diagrams on codebases. Just recently added the ability to extract and link HTTP routes to find dependencies across services. Supports Python/JS/TS/Rust now.
I've never been this excited about social media
I'm really tempted to update my blog...
"The bar after a conference" is a great description for what many of us are craving for.
Unless you absolutely need Claude Artifacts, you can try an open source UI and call the API directly. It comes out way cheaper than $20/month and doesn't have the rate limits that even the Claude Pro tier has (had?).