Martin Koch (@martinkoch) Bsky

🆕 OpenAI just released their AI coding competitor to claude code: Codex CLI

You can find it here: github.com/openai/codex
It's open source and has a YOLO mode with a small safety net build in.🪂

1 year ago 2 0 0 0

Microsoft has created an AI-generated version of Quake You can now try out Microsoft’s new Muse AI model

Microsoft has created an AI-generated version of Quake

1 year ago 83 11 143 230

The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation We’re introducing Llama 4 Scout and Llama 4 Maverick, the first open-weight natively multimodal models with unprecedented context support and our first built using a mixture-of-experts (MoE) architect...

Lama 4 Models have been released!
ai.meta.com/blog/llama-4...
Get it here: www.llama.com/llama-downlo...

Still coming: Lama 4 2T Parameter "Behemoth"

1 year ago 1 0 0 0

Apple is reportedly bringing live translation to AirPods It could arrive with iOS 19.

Apple is reportedly bringing live translation to AirPods

1 year ago 117 11 11 7

PromptPex: Automatic Test Generation for Language Model Prompts Large language models (LLMs) are being used in many applications and prompts for these models are integrated into software applications as code-like artifacts. These prompts behave much like tradition...

- Employs an LLM judge to assess compliance, focusing on rule adherence rather than correctness alone.

Results
PromptPex achieved 5.5% higher non-compliance rates compared to baseline test generators, indicating its effectiveness in identifying prompt weaknesses.

Paper: arxiv.org/abs/2503.05070v1

1 year ago 0 0 0 0

How It Works:

- Extracts Input Specifications (IS) and Output Rules (OR) directly from prompts using LLMs.

- Generates targeted tests based on IS and OR to validate prompt compliance.

- Creates challenging "inverse" tests from OR rules to evaluate model limits.

🧵3/n

1 year ago 0 0 1 0

Key Features:

✅ Specification Extraction: Provides insights into prompt behavior, beyond basic black-box testing

✅ Inverse Rule-Based Testing: Uncovers edge cases to enhance prompt robustness

✅ Automated Compliance Checks: Facilitates prompt portability and informed model selection

🧵2/n

1 year ago 0 0 1 0

GitHub - microsoft/promptpex: Prompt Exploration Prompt Exploration. Contribute to microsoft/promptpex development by creating an account on GitHub.

Following the "Prompts are Programs" paradigm, Microsoft Research explores unit testing prompts with PromptPex:

github.com/microsoft/pr...

It's a tool for automatic test generation for LLM prompts. It sIimplifies QA by automatically generating clear input/output spec & targeted test cases.

🧵1/n

1 year ago 3 0 1 0

OpenAI Platform Explore developer resources, tutorials, API docs, and dynamic examples to get the most out of OpenAI's platform.

🆕 OpenAI just released new tools for building agentic systems in their new Responses API:
- Web Search
- File Search (for local filesystem)
- Computer Use (incl. Browser Use)

A new Agents SDK
openai.github.io/agents-sdk-p...

and new Tracing Plattform
More info:
platform.openai.com/docs/guides/...

1 year ago 0 0 0 0

Claude Code overview - Anthropic Learn about Claude Code, an agentic coding tool made by Anthropic. Currently in beta as a research preview.

Also new: A terminal-based coding agent, similar to aider.
Looks pretty useful after playing around with it on my existing project.

👉It's in limited research preview, first come first served! 👈

"npm install -g @anthropic-ai/claude-code"
"Claude" and log in

docs.anthropic.com/en/docs/agen...

1 year ago 4 0 0 0

Claude's extended thinking Discussing Claude's new thought process

Anthropic launches an “extended thinking mode” for the new Claude 3.7 Sonnet. It lets users and devs - through a “thinking budget” - ask the model to spend more time reasoning.

www.anthropic.com/news/visible...

1 year ago 2 0 0 0

👑 The King is Back!

Claude 3.7 released today and reclaims the crown for AI dev tasks!

It’s surprising how long it took the competition to reach Claude 3.5-level coding skills. For a long long time Claude was the favorite among AI dev communities using tools like Cursor, Windsurf, Aider, etc.

1 year ago 2 0 0 0

perplexity-ai/r1-1776 · Hugging Face We’re on a journey to advance and democratize artificial intelligence through open source and open science.

An uncensored version of R1 is released 🔥

“R1 1776 is a DeepSeek-R1 reasoning model that has been post-trained by Perplexity AI to remove CCP censorship. The model provides unbiased, accurate, and factual information while maintaining high reasoning capabilities.”

huggingface.co/perplexity-a...

1 year ago 58 11 2 7

Zuck folds 🧵6/6
Follows Elons footsteps.

1 year ago 0 0 0 0

Zuck folds 🧵5/6

1 year ago 0 0 1 0

Zuck folds 🧵5/6
Will push back against European Censorship Regulations with help of US Gov

1 year ago 0 0 1 0

Zuck folds 🧵3/6
Will move Content review team from California to Texas.

1 year ago 0 0 1 0

Zuck folds 🧵2/6

People wanted less political content in their feed as they felt stressed by it, so they toned it down.
But "it feels like we are in a new era now" so they will fill your feed with political content again.

1 year ago 0 0 1 0

Zuckerberg folds.
- Fires content moderation team, replaced by X-style community notes.
- Less censorship at cost of safety

🧵1/6

1 year ago 1 0 1 0

Alignment faking in large language models A paper from Anthropic's Alignment Science team on Alignment Faking in AI large language models

New research from Anthropic:
Alignment faking in large language models.

Claude often pretends to have different views during training while actually maintaining its original preferences 💀

www.anthropic.com/research/ali...

1 year ago 2 0 0 0

The lawsuit between Elon Musk and OpenAI is really heating up. OpenAI just dropped a new blog post defending itself against Musk that outlines some new text messages between cofounders Ilya Sutskever, Greg Brockman, Sam Altman, Elon Musk, and former board member Shivon Zilis. “You can’t sue your way to AGI,” the OpenAI blog post reads, referring to artificial general intelligence, which Altman has promised soon. “We have great respect for Elon’s accomplishments and gratitude for his early contributions to OpenAl, but he should be competing in the marketplace rather than the courtroom. It is critical for the U.S. to remain the global leader in Al. Our mission is to ensure AGI benefits all of humanity, and we have been and will remain a mission-driven organization. We hope Elon shares that goal, and will uphold the values of innovation and free market competition that have driven his own success.”

NEW: OpenAI just dropped new Elon Musk receipts: ‘You can’t sue your way to AGI’ www.theverge.com/2024/12/13/2...

1 year ago 127 18 7 4

OpenAI’s Sora video generator includes a powerful feature that allows users to seamlessly add, remove, or edit objects within the video, offering new possibilities for customization and creativity.

1 year ago 29 5 4 3

GitHub - souzatharsis/podcastfy: An Open Source Python alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI An Open Source Python alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI - souzatharsis/podcastfy

AI Podcasts generation similar to NotebookLM but Open Source!
Best one so far in the OSS space.

Podcastfy.ai

1 year ago 1 0 0 0

https://huggingface.co/chat/models/Qwen/QwQ-32B-Preview

QwQ-32B-Preview, the new reasoning model from Alibabas Qwen team is now available unquantized on HuggingChat - for free!

huggingface.co/chat/models/...

1 year ago 0 0 0 0

Posts by Martin Koch