
Posts by Jeremy Morgan

Preview
Open Models, Pricier Tokens, and the Return of Real Infrastructure | Jeremy Morgan

A 3B-active open model is suddenly useful locally.
Opus 4.7 may cost more per session.
A big breach you should know about.
Codex wants to control your desktop.
A cloud migration that saved about $14k/year.

All that and more in this week’s AI newsletter:

www.linkedin.com/posts/jeremy...

7 hours ago 0 0 0 0
Post image

This is an awesome tool. You can even run it locally on your machine in just a few steps. Drop in a GitHub URL and it will visualize the application for you.

Check it out here --> github.com/braedonsaund...

This is a visualization of my ham radio test practice app.

18 hours ago 1 0 0 0
Post image

I don't know when "Tell me a funny joke about Python" became my first test of a new model, but it's been working for years. Trying out Qwen 3.6. Subsequent tests will be more thorough.

1 day ago 1 0 0 0
Preview
GitHub - safishamsi/graphify: AI coding assistant skill (Claude Code, Codex, OpenCode, Cursor, Gemini CLI, GitHub Copilot CLI, OpenClaw, Factory Droid, Trae, Google Antigravity). Turn any folder of code, docs, papers, images, or videos into a queryable knowledge graph

Karpathy sketched the gap. Graphify turned it into a CLI. A persistent knowledge-graph layer for coding agents that claims up to 71.5x fewer query tokens on mixed corpora.

github.com/safishamsi/g...

5 days ago 0 0 0 0
Preview
Blink: CPU-Free LLM Inference by Delegating the Serving Stack to GPU and SmartNIC Large Language Model (LLM) inference is rapidly becoming a core datacenter service, yet current serving stacks keep the host CPU on the critical path for orchestration and token-level control. This…

A new paper removes the CPU from LLM inference entirely. Blink uses a SmartNIC to deliver inputs into GPU memory via RDMA while a persistent GPU kernel handles batching and scheduling.

arxiv.org/abs/2604.07609

6 days ago 0 0 0 0
Video

Coders in 2026

6 days ago 0 0 0 0
Preview
Surfacing a 60% performance bug in cuBLAS While benchmarking an FP32 SGEMM kernel on the RTX 5090, I found cuBLAS dispatching a tiny kernel for huge batched workloads — stuck at ~40% FMA utilization across the entire size range. The same…

A cuBLAS bug is causing RTX 5090 GPUs to use only about 40% of available compute on batched FP32 matrix multiply workloads. If you bought one for local AI work, you are getting less than half the performance you paid for. NVIDIA has not acknowledged it publicly yet.

www.cloudrift.ai/blog/beating...

6 days ago 0 0 0 0
Preview
GitHub Copilot CLI combines model families for a second opinion Discover how Rubber Duck provides a different perspective to GitHub Copilot CLI.

GitHub Copilot CLI now lets you run two different foundation models on the same task. Sonnet generates the plan, GPT-5.4 reviews it before execution. The combo closed 75% of the performance gap on complex multi-file changes.

github.blog/ai-and-ml/gi...

6 days ago 0 1 0 0
Preview
US summons bank bosses over cyber risks from Anthropic’s latest AI model Reports say Fed chair Jerome Powell among attendees at meeting in Washington

Anthropic chose not to release its latest model after it found thousands of vulnerabilities in popular software, some dating back 27 years. The US Treasury gathered banking leaders in Washington to discuss the cybersecurity fallout.

www.theguardian.com/technology/2...

1 week ago 0 0 0 0
Preview
AI New Hotness Newsletter The newsletter for developers and SREs who want to learn more about what's going on in AI

Anthropic shelved a model after it found thousands of legacy zero-days. The US Treasury called bank execs to Washington. Also this week: GitHub Copilot CLI now cross-checks with a second model before executing. New issue is live.

www.linkedin.com/pulse/anthro...

1 week ago 0 0 0 0
Preview
The Schema Is the Product: An Architectural Reading of Karpathy’s LLM Wiki Karpathy published the compiler, and v2 optimizes it with memory


medium.com/@han.heloir/...

1 week ago 0 0 0 0

This is the best lightning video from the Portland area storm this weekend. My wife filmed this from the couch! This was going on in our front yard.

www.facebook.com/share/r/18TZ...

1 week ago 0 0 0 0
Post image Post image Post image

I am blessed. I have a fast machine with an RTX 4090 and a bunch of RAM. During the day it runs Linux and I use it to work mostly through SSH. I've built so much with it.

I reboot it into Windows and it becomes an incredible gaming machine for competitive sim racing!
Poor thing never gets a break.

1 week ago 0 0 0 0
Post image

Super happy with my latest Linux Foundation @linuxfoundation.org certification. I'd definitely recommend it; it covers some important topics. AMA

1 week ago 2 0 0 0
Preview
9 RAG Architectures Every AI Developer Must Know: A Complete Guide with Examples Architectures beyond Naive Rag to build reliable production AI Systems


pub.towardsai.net/rag-architec...

1 week ago 0 0 0 0
Post image

If you're building stuff with Langchain, you need Langsmith tools. Even if you use it for nothing else, monitoring costs in near realtime is worth the few minutes to set it up.
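A minimal sketch of that setup. LangSmith tracing is switched on through environment variables that LangChain reads at runtime; the API key placeholder and project name below are illustrative, not real values.

```python
import os

# Enable LangSmith tracing before constructing any chains; LangChain picks
# these variables up at runtime and starts reporting runs (and their token
# costs) to the named project.
os.environ["LANGCHAIN_TRACING_V2"] = "true"            # turn tracing on
os.environ["LANGCHAIN_API_KEY"] = "<your-api-key>"     # from smith.langchain.com
os.environ["LANGCHAIN_PROJECT"] = "cost-monitoring"    # illustrative project name
```

From there every chain invocation shows up in the LangSmith dashboard with its latency and token usage, which is where the near-realtime cost view comes from.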

1 week ago 1 0 0 0
Post image

this is me for sure

1 week ago 1 0 0 0
Preview
Everything I Learned About Harness Engineering and AI Factories in San Francisco (April 2026) I spent the last week of March 2026 in San Francisco talking to CTOs, CPOs, and engineering leaders from companies of every size about how they actually build with AI agents today. I've met solo…

Engineering teams in SF are moving from conversational AI coding to "harness engineering": bounded, observable loops where agents pick up test writing and refactoring overnight, presenting PR candidates by morning.

escape.tech/blog/everyth...

1 week ago 1 0 1 0

An open-source library maps agent actions directly to EU AI Act Articles 9 through 15 using HMAC-SHA256-signed audit chains. The August 2026 enforcement deadline is approaching, and your existing JSON logs will not satisfy regulators.

github.com/airblackbox/...
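The signed-chain idea itself fits in a few lines: each entry embeds the previous entry's signature, so editing any earlier record breaks verification of everything after it. This is a generic sketch of the technique, not the library's actual API; the function names, field names, and key are all illustrative.

```python
import hashlib
import hmac
import json

def append_event(chain, event, key):
    """Append an event whose HMAC covers both the event and the previous signature."""
    prev_sig = chain[-1]["sig"] if chain else "0" * 64  # genesis sentinel
    payload = json.dumps({"event": event, "prev": prev_sig}, sort_keys=True)
    sig = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    chain.append({"event": event, "prev": prev_sig, "sig": sig})
    return chain

def verify_chain(chain, key):
    """Recompute every signature; any edited or reordered entry fails."""
    prev_sig = "0" * 64
    for entry in chain:
        payload = json.dumps({"event": entry["event"], "prev": prev_sig}, sort_keys=True)
        expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
        if entry["prev"] != prev_sig or not hmac.compare_digest(expected, entry["sig"]):
            return False
        prev_sig = entry["sig"]
    return True

key = b"audit-signing-key"  # illustrative; a real deployment manages this secret
chain = []
append_event(chain, {"agent": "planner", "action": "tool_call"}, key)
append_event(chain, {"agent": "executor", "action": "file_write"}, key)
assert verify_chain(chain, key)            # intact chain verifies
chain[0]["event"]["action"] = "edited"     # tamper with an earlier entry
assert not verify_chain(chain, key)        # verification now fails
```

Plain append-only JSON logs lack exactly this property: any entry can be rewritten after the fact with no way to detect it, which is presumably what the regulator objection is about.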

1 week ago 2 0 2 0