Advertisement · 728 × 90

Posts by Martin Koch

Post image

🆕 OpenAI just released their AI coding competitor to claude code: Codex CLI

You can find it here: github.com/openai/codex
It's open source and has a YOLO mode with a small safety net build in.🪂

1 year ago 2 0 0 0
Preview
Microsoft has created an AI-generated version of Quake You can now try out Microsoft’s new Muse AI model

Microsoft has created an AI-generated version of Quake

1 year ago 83 11 143 230
Preview
The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation We’re introducing Llama 4 Scout and Llama 4 Maverick, the first open-weight natively multimodal models with unprecedented context support and our first built using a mixture-of-experts (MoE) architect...

Lama 4 Models have been released!
ai.meta.com/blog/llama-4...
Get it here: www.llama.com/llama-downlo...

Still coming: Lama 4 2T Parameter "Behemoth"

1 year ago 1 0 0 0
Preview
Apple is reportedly bringing live translation to AirPods It could arrive with iOS 19.

Apple is reportedly bringing live translation to AirPods

1 year ago 117 11 11 7
Preview
PromptPex: Automatic Test Generation for Language Model Prompts Large language models (LLMs) are being used in many applications and prompts for these models are integrated into software applications as code-like artifacts. These prompts behave much like tradition...

- Employs an LLM judge to assess compliance, focusing on rule adherence rather than correctness alone.

Results
PromptPex achieved 5.5% higher non-compliance rates compared to baseline test generators, indicating its effectiveness in identifying prompt weaknesses.

Paper: arxiv.org/abs/2503.05070v1

1 year ago 0 0 0 0

How It Works:

- Extracts Input Specifications (IS) and Output Rules (OR) directly from prompts using LLMs.

- Generates targeted tests based on IS and OR to validate prompt compliance.

- Creates challenging "inverse" tests from OR rules to evaluate model limits.

🧵3/n

1 year ago 0 0 1 0

Key Features:

✅ Specification Extraction: Provides insights into prompt behavior, beyond basic black-box testing

✅ Inverse Rule-Based Testing: Uncovers edge cases to enhance prompt robustness

✅ Automated Compliance Checks: Facilitates prompt portability and informed model selection

🧵2/n

1 year ago 0 0 1 0
Preview
GitHub - microsoft/promptpex: Prompt Exploration Prompt Exploration. Contribute to microsoft/promptpex development by creating an account on GitHub.

Following the "Prompts are Programs" paradigm, Microsoft Research explores unit testing prompts with PromptPex:

github.com/microsoft/pr...

It's a tool for automatic test generation for LLM prompts. It sIimplifies QA by automatically generating clear input/output spec & targeted test cases.

🧵1/n

1 year ago 3 0 1 0
Preview
OpenAI Platform Explore developer resources, tutorials, API docs, and dynamic examples to get the most out of OpenAI's platform.

🆕 OpenAI just released new tools for building agentic systems in their new Responses API:
- Web Search
- File Search (for local filesystem)
- Computer Use (incl. Browser Use)

A new Agents SDK
openai.github.io/agents-sdk-p...

and new Tracing Plattform
More info:
platform.openai.com/docs/guides/...

1 year ago 0 0 0 0
Preview
Claude Code overview - Anthropic Learn about Claude Code, an agentic coding tool made by Anthropic. Currently in beta as a research preview.

Also new: A terminal-based coding agent, similar to aider.
Looks pretty useful after playing around with it on my existing project.

👉It's in limited research preview, first come first served! 👈

"npm install -g @anthropic-ai/claude-code"
"Claude" and log in

docs.anthropic.com/en/docs/agen...

1 year ago 4 0 0 0
Advertisement
Preview
Claude's extended thinking Discussing Claude's new thought process

Anthropic launches an “extended thinking mode” for the new Claude 3.7 Sonnet. It lets users and devs - through a “thinking budget” - ask the model to spend more time reasoning.

www.anthropic.com/news/visible...

1 year ago 2 0 0 0
Post image

👑 The King is Back!

Claude 3.7 released today and reclaims the crown for AI dev tasks!

It’s surprising how long it took the competition to reach Claude 3.5-level coding skills. For a long long time Claude was the favorite among AI dev communities using tools like Cursor, Windsurf, Aider, etc.

1 year ago 2 0 0 0
Preview
perplexity-ai/r1-1776 · Hugging Face We’re on a journey to advance and democratize artificial intelligence through open source and open science.

An uncensored version of R1 is released 🔥

“R1 1776 is a DeepSeek-R1 reasoning model that has been post-trained by Perplexity AI to remove CCP censorship. The model provides unbiased, accurate, and factual information while maintaining high reasoning capabilities.”

huggingface.co/perplexity-a...

1 year ago 58 11 2 7
Video

Zuck folds 🧵6/6
Follows Elons footsteps.

1 year ago 0 0 0 0
Video


Zuck folds 🧵5/6

1 year ago 0 0 1 0
Video

Zuck folds 🧵5/6
Will push back against European Censorship Regulations with help of US Gov

1 year ago 0 0 1 0
Video

Zuck folds 🧵3/6
Will move Content review team from California to Texas.

1 year ago 0 0 1 0
Video

Zuck folds 🧵2/6

People wanted less political content in their feed as they felt stressed by it, so they toned it down.
But "it feels like we are in a new era now" so they will fill your feed with political content again.

1 year ago 0 0 1 0
Advertisement
Video

Zuckerberg folds.
- Fires content moderation team, replaced by X-style community notes.
- Less censorship at cost of safety

🧵1/6

1 year ago 1 0 1 0
Preview
Alignment faking in large language models A paper from Anthropic's Alignment Science team on Alignment Faking in AI large language models

New research from Anthropic:
Alignment faking in large language models.

Claude often pretends to have different views during training while actually maintaining its original preferences 💀

www.anthropic.com/research/ali...

1 year ago 2 0 0 0
The lawsuit between Elon Musk and OpenAI is really heating up.

OpenAI just dropped a new blog post defending itself against Musk that outlines some new text messages between cofounders Ilya Sutskever, Greg Brockman, Sam Altman, Elon Musk, and former board member Shivon Zilis.

“You can’t sue your way to AGI,” the OpenAI blog post reads, referring to artificial general intelligence, which Altman has promised soon. “We have great respect for Elon’s accomplishments and gratitude for his early contributions to OpenAl, but he should be competing in the marketplace rather than the courtroom. It is critical for the U.S. to remain the global leader in Al. Our mission is to ensure AGI benefits all of humanity, and we have been and will remain a mission-driven organization. We hope Elon shares that goal, and will uphold the values of innovation and free market competition that have driven his own success.”

The lawsuit between Elon Musk and OpenAI is really heating up. OpenAI just dropped a new blog post defending itself against Musk that outlines some new text messages between cofounders Ilya Sutskever, Greg Brockman, Sam Altman, Elon Musk, and former board member Shivon Zilis. “You can’t sue your way to AGI,” the OpenAI blog post reads, referring to artificial general intelligence, which Altman has promised soon. “We have great respect for Elon’s accomplishments and gratitude for his early contributions to OpenAl, but he should be competing in the marketplace rather than the courtroom. It is critical for the U.S. to remain the global leader in Al. Our mission is to ensure AGI benefits all of humanity, and we have been and will remain a mission-driven organization. We hope Elon shares that goal, and will uphold the values of innovation and free market competition that have driven his own success.”

NEW: OpenAI just dropped new Elon Musk receipts: ‘You can’t sue your way to AGI’ www.theverge.com/2024/12/13/2...

1 year ago 127 18 7 4
Video

OpenAI’s Sora video generator includes a powerful feature that allows users to seamlessly add, remove, or edit objects within the video, offering new possibilities for customization and creativity.

1 year ago 29 5 4 3
Preview
GitHub - souzatharsis/podcastfy: An Open Source Python alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI An Open Source Python alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI - souzatharsis/podcastfy

AI Podcasts generation similar to NotebookLM but Open Source!
Best one so far in the OSS space.

Podcastfy.ai

1 year ago 1 0 0 0
https://huggingface.co/chat/models/Qwen/QwQ-32B-Preview

QwQ-32B-Preview, the new reasoning model from Alibabas Qwen team is now available unquantized on HuggingChat - for free!

huggingface.co/chat/models/...

1 year ago 0 0 0 0