Posts by Daron Yondem

🚀 Apparently TSA missed the “3-oz limit on AI models” in my carry-on. #dadjoke #staywithme

Thirty days after joining AWS, OpenAI’s brand-new open-weight models, gpt-oss-120b and gpt-oss-20b, just landed in Amazon Bedrock and SageMaker. Total coincidence… but I’m happy to take (mostly fictional) credit for “bringing OpenAI with me.” 😅

Why this drop is a big deal:
- Builder freedom: open weights = fine-tune, inspect, self-host as needed.
- Enterprise muscle: Bedrock + SageMaker guardrails, security, and scale.
- 128K context, advanced reasoning, and top-tier price-performance (3× cheaper than comparable Gemini, 5× cheaper than DeepSeek-R1).

👉 Official announcement: www.aboutamazon.com/news/aws/ope...

Here’s to many more launches (accidental or otherwise). Views are my own.

#AWS #OpenAI #GenerativeAI #Bedrock
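For anyone who wants to kick the tires, here’s a minimal sketch of calling one of the new models through the `bedrock-runtime` Converse API with boto3. The gpt-oss model IDs below are placeholders I’m assuming for illustration; check the Bedrock model catalog for the exact identifiers in your region.

```python
# Sketch: choosing between the two gpt-oss sizes on Amazon Bedrock.
# MODEL_IDS are assumed placeholders -- verify them in the Bedrock console.
MODEL_IDS = {
    "large": "openai.gpt-oss-120b-1:0",  # assumed ID
    "small": "openai.gpt-oss-20b-1:0",   # assumed ID
}

def build_converse_request(prompt: str, size: str = "small") -> dict:
    """Build keyword arguments for bedrock-runtime's Converse API."""
    return {
        "modelId": MODEL_IDS[size],
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 512, "temperature": 0.2},
    }

request = build_converse_request("Summarize our launch notes.", size="large")
print(request["modelId"])

# With AWS credentials configured, the actual call would look like:
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   response = client.converse(**request)
#   print(response["output"]["message"]["content"][0]["text"])
```

The request shape (a `messages` list with content blocks, plus `inferenceConfig`) is the standard Converse format, so swapping between the 20b and 120b variants is a one-line change.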
Finished the San Francisco Marathon and set a new PR: 4h 20m! 🎉

Two years ago I stood on these same streets with a race bib and a busted leg, watching everyone else start while I sat on the sidelines. It hurt, literally and figuratively. 😞 I’d trained, I’d planned, and life still said “not today.”

Fast-forward to this morning: same city, same Golden Gate views, but a very different ending. Crossing that finish line wasn’t just about a medal or a stopwatch; it was a reminder.

Setbacks aren’t stop signs. Missing the 2023 start line didn’t mean the story was over; it just meant there was another chapter to write.

Re-evaluate, don’t quit. After the injury I asked myself if the goal still mattered. It did, so the plan changed, not the purpose.

Progress isn’t linear. The road back was full of frustrating rehab days, slow runs, and doubts, but every small step stacked up.

Celebrate the comeback. Today’s PR isn’t just a faster time; it’s proof that persistence + patience pay off.

If you’re staring at a goal that got derailed (injury, rejection, a life curveball), remember: one “no” doesn’t erase the “yes” that’s still possible.

Attaching my race map and official time for anyone who loves the nerdy details 🙂

Keep chasing what matters, friends. See you on the course, or wherever your “next try” takes you. 💪

#SFMarathon #PRDay #ComebackStory #KeepGoing #RunningCommunity
💡 Bigger isn’t always better: why I’m reaching for smaller models in my everyday work

When I first got access to O3, its raw power blew me away; I honestly thought I’d never settle for anything less.

But after a few months, I noticed something curious: for 80% of my daily tasks, the largest model was actually slowing me down.

Here’s what I’ve learned:
- Right tool, right job. Reserve the juggernauts (O3 Pro et al.) for truly complex reasoning. For routine writing, data cleanup, or quick ideation, a lighter model is the productivity hack no one talks about.
- Iterative > “perfect.” Fast back-and-forth lets me steer, refine, and co-create. Smaller models aren’t worse; they just leave more room for human intuition to tie everything together.
- Cost & carbon matter. Bigger models draw more compute, more electricity, and more dollars, often just to draft a quick email or brainstorm titles. That resource mismatch adds up.
- Speed fuels flow. Waiting 15–30 seconds for a response breaks the creative feedback loop. Snappier, lightweight models keep the conversation, and my momentum, alive.

AI isn’t a single “best” model; it’s a toolbox. Pick the wrench that fits the bolt, and watch your efficiency (and your API bill) improve.

Are you experimenting with model size in your workflow? I’d love to hear what’s working for you. 👇

#AI #LLM #Productivity #Sustainability #TechInsights
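The “right tool, right job” habit can even live in code: route each task type to a model tier instead of defaulting to the biggest model. A tiny sketch, where the task categories and tier names are purely illustrative (not any vendor’s API):

```python
# Toy task-to-model router: default to a small, fast model and
# escalate only for tasks that genuinely need heavy reasoning.
ROUTES = {
    "email_draft": "small",
    "brainstorm_titles": "small",
    "data_cleanup": "small",
    "multi_step_reasoning": "large",
    "architecture_review": "large",
}

def pick_model(task: str) -> str:
    # Unknown tasks fall back to the small tier; you can always retry
    # a single request on the large tier if the output falls short.
    return ROUTES.get(task, "small")

print(pick_model("email_draft"), pick_model("multi_step_reasoning"))
```

The point isn’t the lookup table; it’s that the escalation decision becomes explicit and auditable, and your API bill reflects it.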
On Saturday, August 16 I’m trading weekend plans for something way more fun: the DeepSeek Demystified Summit, a one-day, all-access dive into the open-source LLM that’s making waves across cost, performance, and reasoning.

Why I’m excited:
• Expert sessions that get past the buzzwords—think data security, model selection (DeepSeek-R1 vs V3), and agentic AI workflows.
• A hands-on fine-tuning lab (LoRA + Unsloth) you can run on a consumer GPU.
• Real-world deployment playbooks: caching, quantization, rate-limit tricks—the stuff we all Google at 2 AM.

Line-up: Paul Iusztin, Duarte O. Carmo, Karl Zhao, PhD, Miguel Otero Pedrido, Alex S., and more.

When: Aug 16, 6 AM – 1:30 PM PDT / 9 AM – 4:30 PM ET
Where: 100% online (live + replay)

🔥 Deal alert: the first 10 people to snag a standard ticket can take 50% off with code DARON50.

If you’re building with open-source LLMs, or want a smarter, cheaper, faster stack, this is the summit to bookmark. Direct link to the site: drn.fyi/pyk 😉

#DeepSeek #LLM #OpenSourceAI #GenerativeAI
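The quantization item in those playbooks is easy to picture with a toy round trip. Here’s a symmetric int8 sketch; real deployments use per-channel scales and formats like GPTQ or AWQ, but the core idea is just this rescale-and-round:

```python
# Toy symmetric int8 quantization: shrink float weights to 8-bit
# integers plus one scale factor, then reconstruct approximately.
def quantize(weights, bits=8):
    qmax = 2 ** (bits - 1) - 1            # 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.42, -1.27, 0.05, 0.9]
q, s = quantize(w)
restored = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(w, restored))
print(q, max_err)
```

Four 32-bit floats become four bytes plus a scale, at the cost of a rounding error bounded by about half the scale. That trade is why quantized checkpoints fit on consumer GPUs.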
Speed ≠ sloppy. On MMLU-Pro, Mercury matches GPT-4.1 Nano & Claude 3.5 Haiku—while running >7× faster. The “diffusion is quick but messy” era is over. ✅

Academic teasers like LLaDA (Feb ’25) hinted at this; Mercury is the first public chat model to deliver the goods.

Big Tech’s on board too: Google DeepMind’s Gemini Diffusion demo at I/O clocked 1,479 tok/sec. 2025 is starting to look like diffusion’s “Transformers 2017” moment. 🌌

Why you should care:
• Voice agents & RAG get sub-100 ms replies
• Fewer GPU-seconds → lower cloud bills
• Bidirectional context → native infill & doc-wide edits
• AR-centric tricks (KV cache, speculative decoding) need a rethink

Fine print:
• Training is still costlier per token than AR cousins
• The streaming UX isn’t char-by-char: Mercury shows a draft, then refines it (UI tweaks needed)
• Denoising blurs “first-token” vs “final-answer” latency

Want to see it? I’m dropping the playground link + a fun speed-up video (video effect exaggerated for Twitter attention 😉).

Playground: chat.inceptionlabs.ai

Diffusion isn’t just for images anymore. With Mercury shipping and giants circling, the next generation of language models may be noisy under the hood—but the output is crystal clear to users. 🧑‍🚀✨

#LLM #DiffusionModels #AI #NLP #DataScience