🆕 OpenAI just released their AI coding competitor to claude code: Codex CLI
You can find it here: github.com/openai/codex
It's open source and has a YOLO mode with a small safety net build in.🪂
Posts by Martin Koch
Lama 4 Models have been released!
ai.meta.com/blog/llama-4...
Get it here: www.llama.com/llama-downlo...
Still coming: Lama 4 2T Parameter "Behemoth"
- Employs an LLM judge to assess compliance, focusing on rule adherence rather than correctness alone.
Results
PromptPex achieved 5.5% higher non-compliance rates compared to baseline test generators, indicating its effectiveness in identifying prompt weaknesses.
Paper: arxiv.org/abs/2503.05070v1
How It Works:
- Extracts Input Specifications (IS) and Output Rules (OR) directly from prompts using LLMs.
- Generates targeted tests based on IS and OR to validate prompt compliance.
- Creates challenging "inverse" tests from OR rules to evaluate model limits.
🧵3/n
Key Features:
✅ Specification Extraction: Provides insights into prompt behavior, beyond basic black-box testing
✅ Inverse Rule-Based Testing: Uncovers edge cases to enhance prompt robustness
✅ Automated Compliance Checks: Facilitates prompt portability and informed model selection
🧵2/n
Following the "Prompts are Programs" paradigm, Microsoft Research explores unit testing prompts with PromptPex:
github.com/microsoft/pr...
It's a tool for automatic test generation for LLM prompts. It sIimplifies QA by automatically generating clear input/output spec & targeted test cases.
🧵1/n
🆕 OpenAI just released new tools for building agentic systems in their new Responses API:
- Web Search
- File Search (for local filesystem)
- Computer Use (incl. Browser Use)
A new Agents SDK
openai.github.io/agents-sdk-p...
and new Tracing Plattform
More info:
platform.openai.com/docs/guides/...
Also new: A terminal-based coding agent, similar to aider.
Looks pretty useful after playing around with it on my existing project.
👉It's in limited research preview, first come first served! 👈
"npm install -g @anthropic-ai/claude-code"
"Claude" and log in
docs.anthropic.com/en/docs/agen...
Anthropic launches an “extended thinking mode” for the new Claude 3.7 Sonnet. It lets users and devs - through a “thinking budget” - ask the model to spend more time reasoning.
www.anthropic.com/news/visible...
👑 The King is Back!
Claude 3.7 released today and reclaims the crown for AI dev tasks!
It’s surprising how long it took the competition to reach Claude 3.5-level coding skills. For a long long time Claude was the favorite among AI dev communities using tools like Cursor, Windsurf, Aider, etc.
An uncensored version of R1 is released 🔥
“R1 1776 is a DeepSeek-R1 reasoning model that has been post-trained by Perplexity AI to remove CCP censorship. The model provides unbiased, accurate, and factual information while maintaining high reasoning capabilities.”
huggingface.co/perplexity-a...
Zuck folds 🧵6/6
Follows Elons footsteps.
Zuck folds 🧵5/6
Zuck folds 🧵5/6
Will push back against European Censorship Regulations with help of US Gov
Zuck folds 🧵3/6
Will move Content review team from California to Texas.
Zuck folds 🧵2/6
People wanted less political content in their feed as they felt stressed by it, so they toned it down.
But "it feels like we are in a new era now" so they will fill your feed with political content again.
Zuckerberg folds.
- Fires content moderation team, replaced by X-style community notes.
- Less censorship at cost of safety
🧵1/6
New research from Anthropic:
Alignment faking in large language models.
Claude often pretends to have different views during training while actually maintaining its original preferences 💀
www.anthropic.com/research/ali...
The lawsuit between Elon Musk and OpenAI is really heating up. OpenAI just dropped a new blog post defending itself against Musk that outlines some new text messages between cofounders Ilya Sutskever, Greg Brockman, Sam Altman, Elon Musk, and former board member Shivon Zilis. “You can’t sue your way to AGI,” the OpenAI blog post reads, referring to artificial general intelligence, which Altman has promised soon. “We have great respect for Elon’s accomplishments and gratitude for his early contributions to OpenAl, but he should be competing in the marketplace rather than the courtroom. It is critical for the U.S. to remain the global leader in Al. Our mission is to ensure AGI benefits all of humanity, and we have been and will remain a mission-driven organization. We hope Elon shares that goal, and will uphold the values of innovation and free market competition that have driven his own success.”
NEW: OpenAI just dropped new Elon Musk receipts: ‘You can’t sue your way to AGI’ www.theverge.com/2024/12/13/2...
OpenAI’s Sora video generator includes a powerful feature that allows users to seamlessly add, remove, or edit objects within the video, offering new possibilities for customization and creativity.