Hashtag #LLM

github.com/sarahmaeve/the-oldest-sins-in-the-newest-of-ways/blob/main/releng-skill.md

A Claude skill for adversarial, but not antagonistic, release engineering reviews of pull requests and deployments, based on my notes and experience.

#LLM #SRE #programming

0 0 0 0

MIRAGE: The illusion of visual understanding (by AI models) https://arxiv.org/abs/2603.21687 #LLM #AI

0 2 0 0
Preview
MIRAGE: The Illusion of Visual Understanding Multimodal AI systems have achieved remarkable performance across a broad range of real-world tasks, yet the mechanisms underlying visual-language reasoning remain surprisingly poorly understood. We r...

Article image

🤖 Anthropic launches Cowork, a Claude Desktop agent that works in your files

Anthropic released Cowork on Monday, a new AI agent capab…
#AI #MachineLearning #LLM
AI | VentureBeat · venturebeat.com/technology/anthropic-lau...

1 0 0 0
Post image

🏛️ LAST AMERICAN TROOPS LEAVE VIETNAM
March 29, 1973 — After 8 years of war, the final U.S. military units board Chinook helicopters at Saigon Air Base. The Vietnam War ends as American soldiers depart, leaving behind a nation torn apart by decades of […]

[Original post on social.coabai.com]

0 0 0 0
Preview
Why self‑hosting an OpenAI‑compatible gateway now outperforms SaaS for multi‑model teams

_The trade‑off has shifted from inference latency to identity design, budget enforcement, and secure Postgres ops._

Self‑hosting a multi‑backend LLM gateway is no longer a fringe hobby; it is a practical, cost‑effective replacement for commercial AI gateways. Modern open‑source proxies such as **LiteLLM** now ship with hardened authentication, logging, rate‑limiting, and MCP‑style access controls, letting teams route requests to OpenAI, Anthropic, Ollama, or any private model behind a single OpenAI‑compatible endpoint. The upside is clear: unified policy enforcement, predictable spend, and the ability to swap providers without rewriting business logic. The cost moves from raw compute to the plumbing of identity, budget enforcement, and database reliability. In short, the gateway itself becomes the new "shadow admin" surface that must be engineered, monitored, and secured.

## Can a self‑hosted gateway truly replace hosted AI services for multi‑model teams?

The tipping point for self‑hosted AI has always been a mix of privacy, cost, context handling, reliability, and model quality. When those factors line up, a **simple Docker gateway** often emerges as the sweet spot for low‑volume internal alerts, chat‑ops, or "notify‑me‑when‑a‑PR‑merges" use cases; see the practical example in the recent **Kindalame piece on self‑hosted AI inside messaging apps**. Modern gateways expose an OpenAI‑compatible REST API, so existing SDKs and tooling (e.g., LangChain, LlamaIndex) continue to work unchanged while the backend can be swapped on the fly.

Because the gateway abstracts the provider, teams can adopt the latest Anthropic model, test an internal Ollama instance, or fall back to a cheaper OpenAI "gpt‑3.5‑turbo" tier without rewriting code. The **Dapr Conversation API** exemplifies this decoupling, letting agents switch providers without touching business logic.
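The provider abstraction just described can be sketched as a small routing table that maps a stable, client-facing model alias to whichever backend is currently configured. This is a minimal illustration of the idea, not LiteLLM's or Dapr's actual configuration schema; all aliases and model names below are hypothetical.

```python
# Sketch: one client-facing alias per use case; the backend behind it
# can change without the calling code ever noticing.

BACKENDS = {
    # alias          -> (provider, provider-side model name)
    "chat-default":    ("openai",    "gpt-3.5-turbo"),
    "chat-private":    ("ollama",    "llama3"),
    "chat-frontier":   ("anthropic", "claude-3-5-sonnet"),
}

def resolve(alias: str) -> tuple[str, str]:
    """Map a stable client-facing alias to a concrete backend."""
    try:
        return BACKENDS[alias]
    except KeyError:
        raise ValueError(f"unknown model alias: {alias}")

# Swapping providers is a config change, not a code change:
BACKENDS["chat-default"] = ("anthropic", "claude-3-5-sonnet")
```

Client code only ever asks for `chat-default`; the routing table, not the application, decides which provider answers.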
For multi‑model teams that need to experiment rapidly, the gateway eliminates vendor lock‑in and reduces the operational friction of maintaining multiple client libraries.

## What concrete benefits does a self‑hosted gateway deliver over SaaS?

1. **Unified budgeting and spend visibility** – All calls flow through a single point, making it trivial to tag requests, enforce per‑project caps, and generate cost reports from the gateway's logs.
2. **Policy‑driven routing** – Teams can route high‑risk queries (e.g., PII‑containing prompts) to a private, on‑prem model while sending generic requests to cheaper public APIs.
3. **Consistent authentication and audit** – LiteLLM's March 2026 release introduced **MCP‑style access control** and hardened token verification, turning the gateway into a single source of truth for who can call which model and at what rate.
4. **Reduced data exposure** – By keeping prompt data behind your firewall, you avoid the "privacy myth" of local AI that still leaks through internet‑exposed endpoints, as demonstrated in the **Ollama privacy analysis**.
5. **Rapid model swapping** – With the Dapr framework, a new Claude Mythos model from Anthropic can be dropped in without code changes, letting early‑access customers test the "step change" in performance (**CoinDesk on Anthropic's model leak**).

These advantages translate into measurable cost savings and compliance gains, especially for organizations that already run internal observability stacks like Langfuse. **Self‑hosting Langfuse cuts SaaS spend while protecting prompt data.**

| Feature Cluster | Traditional SaaS Gateway | Self-Hosted (LiteLLM/Dapr) |
|---|---|---|
| Data Privacy | Prompts traverse 3rd-party infra; subject to provider logging policies. | **Full Sovereignty.** PII stays behind your firewall; local routing for high-risk queries. |
| Cost Control | Opaque "credits" or tiered SaaS fees plus underlying model costs. | **Granular Enforcement.** Per-project USD caps with automated Postgres-triggered cutoffs. |
| Model Swapping | Limited to supported providers; manual SDK updates often required. | **Instant Hot-Swap.** Deploy new models (like Claude Mythos) via config change, zero code updates. |
| Auth & Audit | Proprietary API key management; fragmented logs across services. | **Unified Compliance.** Hardened MCP-style access control and centralized audit trails in your own DB. |
| Observability | Basic dashboards; additional costs for deep tracing integrations. | **Native Tracing.** Direct integration with self-hosted Langfuse for full prompt-to-response visibility. |

## Where does the hidden cost surface in identity and budget enforcement?

The gateway's power comes with a new responsibility: **identity orchestration**. Every request now carries a user or service token that the gateway must validate against your corporate IdP, map to budget quotas, and log for audit. Implementing this correctly requires:

* **A reliable Postgres (or equivalent) store** for quota tables, usage logs, and policy definitions. Misconfigurations can create "shadow‑admin" privileges where a compromised service silently consumes unlimited credits.
* **Robust rate‑limiting** that survives restarts and scales across replicas. LiteLLM's built‑in rate‑limit middleware helps, but you still need to monitor Redis or database latency to avoid bottlenecks.
* **Clear ownership of budget alerts.** Without a dedicated alerting pipeline, teams may overspend before they notice, defeating the primary cost‑saving argument.

These operational layers sit behind the "gateway" abstraction. Teams that treat the gateway as a black box often end up with a **new attack surface**: the very place where internal tooling can unintentionally become a privileged admin interface.

## Why does LiteLLM's recent malware incident matter for gateway design?
Security is not a static checkbox. In March 2026, a severe malware infection was discovered in the open‑source LiteLLM project, reminding us that **certifications alone do not guarantee safety** (**TechCrunch on the LiteLLM malware incident**). The breach showed how supply‑chain risks can propagate into a self‑hosted gateway that depends on third‑party code. For teams building their own gateway, the lesson is twofold:

1. **Vet dependencies aggressively** – Pin versions, run reproducible builds, and scan containers for known vulnerabilities before deployment.
2. **Design for compromise** – Assume a component could be hijacked and enforce least‑privilege network policies, immutable infrastructure, and immutable audit logs.

Treating the gateway as a critical security boundary rather than a convenience layer helps mitigate the failure modes highlighted by the LiteLLM incident.

## How can teams avoid new failure modes while reaping the benefits?

A pragmatic playbook looks like this:

* **Start with a minimal policy set** – Define only the essential scopes (e.g., "read‑only" for internal bots, "full‑access" for dev teams) and expand gradually.
* **Automate quota enforcement** – Use a Postgres trigger or a lightweight sidecar that rejects requests once a project exceeds its daily budget. Store quota snapshots in a time‑series DB for quick rollback.
* **Integrate observability early** – Deploy Langfuse or an equivalent tracing stack alongside the gateway to capture prompt‑to‑response latency, error rates, and cost per model. This mirrors the self‑hosting Langfuse benefits discussed above.
* **Run regular security drills** – Simulate a compromised LiteLLM component and verify that the gateway's rate‑limit and audit trails still block malicious payloads.
* **Leverage Dapr for provider abstraction** – By routing through the Dapr Conversation API, you can replace a leaking Anthropic model (as seen in the recent Claude Mythos leak) without touching application code, reducing the blast radius of any single provider's outage.

When these safeguards are in place, the hidden costs become manageable, and the gateway delivers its promised ROI: unified control, lower spend, and the flexibility to stay ahead of the fast‑moving model landscape.

* * *

### The Self-Hosted Gateway Checklist (2026 Edition)

Transitioning from a fringe hobby to a "shadow admin" surface requires moving beyond basic connectivity. Ensure your stack covers these three operational pillars:

**1. Identity & Auth** – Hardened MCP-style token verification integrated with your corporate IdP. No more static "admin" keys shared across teams.

**2. Budget Enforcement** – Postgres-backed quota tables with real-time triggers to kill requests the moment a project hits its daily USD cap.

**3. Provider Abstraction** – Dapr or OpenAI-compatible routing that allows swapping Anthropic for Ollama without a single line of code change.

**Final Take:** Self-hosting isn't just about saving on SaaS fees; it's about owning the logic that dictates which model sees which data. Build it as a security boundary, not just a proxy.
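The budget-enforcement pillar above can be sketched as pure logic. This is an in-memory stand-in for the Postgres-backed quota table (project names and the `BudgetEnforcer` API are hypothetical), showing only the cap check a trigger or sidecar would perform.

```python
from dataclasses import dataclass

@dataclass
class Budget:
    daily_cap_usd: float
    spent_usd: float = 0.0

class BudgetEnforcer:
    """Reject requests once a project exceeds its daily cap.

    In-memory sketch: a real deployment would persist spend in Postgres
    and enforce the cap in a trigger or sidecar, so the cutoff survives
    gateway restarts and applies across replicas.
    """

    def __init__(self) -> None:
        self.projects: dict[str, Budget] = {}

    def set_cap(self, project: str, cap_usd: float) -> None:
        self.projects[project] = Budget(daily_cap_usd=cap_usd)

    def charge(self, project: str, cost_usd: float) -> bool:
        """Record spend and return True only if the request fits the cap."""
        budget = self.projects[project]
        if budget.spent_usd + cost_usd > budget.daily_cap_usd:
            return False  # over budget: the gateway should refuse the request
        budget.spent_usd += cost_usd
        return True
```

The key design point is that the check and the spend update happen at one choke point, which is exactly why the gateway, not the client SDK, must own it.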
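The "rate-limiting that survives restarts" requirement is commonly met with a token bucket whose state lives in Redis or Postgres so that all replicas share it. A minimal single-process sketch of the algorithm itself (not LiteLLM's actual middleware) looks like this:

```python
import time

class TokenBucket:
    """Classic token bucket: refills `rate` tokens/second, bursts up to `capacity`.

    Sketch only: a production gateway would keep the (tokens, last) pair in
    Redis or Postgres so the bucket is shared across replicas and outlives
    a restart.
    """

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, now=None) -> bool:
        """Spend one token if available; `now` is injectable for testing."""
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Monitoring the latency of whatever store holds the bucket state is the operational cost the article warns about: a slow Redis turns the limiter itself into the bottleneck.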

more people are self-hosting #LLMs

kindalame.com/2026/03/29/why-self-host...

1 1 0 0
Original post on fosstodon.org

Another story of experimenting with LLMs and their guardrails. This time removing a large copyright watermark from an image.

Will I be able to do it? Can you call me a "master jailbreaker"?

ambience.sk/llm-stories-another-succ...

BTW, you […]

1 2 0 0
Original post on witter.cz

RE: https://mastodon.social/@glynmoody/116290974413888533

«This is what the “AI is just another Big Tech power grab” critics are missing: the technology is moving toward decentralization, not away from it. That’s unusual. Social media started decentralized and got captured. AI is starting […]

0 0 0 0

New on the devstyle channel!
Łukasz Szydło - How AI Will Change the Work of Senior Devs:

www.youtube.com/watch?v=tV9...

#LLM #AICoding

0 0 0 0
Preview
The Age of Artificiality What is "real"?

The Age of Artificiality open.substack.com/pub/julianma... A draft of part of my rewrite of the long-awaited special article for coffee buyers, which will run to another 50 pages. This is based on extensive research, which I will publish separately on the coffee site. #LLM #AI #fakehumans #PhilipKDick

0 0 0 0
Preview
The Last Molecule Standing How One Reservoir, One Strait, and Five Manufacturers Became the Hidden Operating System of Seven Global Industries

shanakaanslemperera.substack.com/p/the-last-m...

If LNG infra in Qatar/Iran is damaged, then a whole lot of good for the world would result? Fewer: chips for #LLM #AI #GenAI wastage, #weapons, #methane burning for #electricity, #fertilizer polluting #land #water, #plastics?

That is GOOD - right?

2 2 0 0

[Artificial text - on occasion it's also funny. Meow]

Reinvited: A service that sends polite reminders to people you haven't heard from in 3+ years, like a digital butler with crippling anxiety about social etiquette. We guarantee zero replies.
#BusinessIdea #Business #Ai #LLM

1 0 0 0
Awakari App

Where LLM Systems Actually Break (& why we miss it) I was exploring how an agent system runs under the hood, and something felt off. Continue reading on Medium »

#machine-learning #software-engineering #artificial-intelligence #llm #design-systems

Origin | Interest | Match

0 0 0 0
Agentic AI and the next intelligence explosion

Your Sunday evening read with a cup of coffee or tea is here.

Intelligence is inherently social, and we are seeing an explosion in intelligence as we move toward centaur workflows between AI(s) and human(s) in various configurations.

#AI #AGI #LLM #GenAI #ChatGPT #Claude #Gemini

2 1 0 0
Original post on social.chatty.monster

I've been using a digital camera for many years and as a result have a **lot** of photographs.

How many is a lot?


$ ls -1R Pictures/ | wc -l
53190


Yeah, lots.

Despite having spent lots of time trying to create meaningful directory names it's still not easy to always find a photo […]

1 2 1 0
Post image

"Interim Findings from an Investigation Into #LLM Responses about #Preprints: A 2025 ASAPbio Fellows Project" asapbio.org/interim-find... #scholcomm #libraries #GenAI #AI

0 0 0 0
FlashAttention-4: Accelerating LLMs to the Limit on Blackwell B200 GPUs
YouTube video by En la mente de la máquina, Inteligencia Artificial

Meet FlashAttention-4! Discover how it accelerates LLM performance by up to 2.7x on the new NVIDIA Blackwell B200 GPUs. 🔥 All implemented in Python, with 30x faster compilation. Goodbye, bottlenecks? youtu.be/vsxpbzPNFTE #IA #FlashAttention4 #NVIDIA #MachineLearning #LLM

1 0 0 0
Preview
Cassie Kozyrkov on AI Adoption and Decision Intelligence | Soul of the CIO posted on the topic | LinkedIn AI has automated programming languages. Now anyone can try to “program” a machine using their own words. But as Cassie Kozyrkov explains: Just because you can say something… doesn’t mean it’s wise. Or...

"Just because you can say something… doesn’t mean it’s wise. Or useful."
www.linkedin.com/posts/ai-has... #AI #LLM

0 0 0 0
Preview
native ollama-go-engine: TurboQuant+RotorQuant implementation · Issue #15051 · ollama/ollama @rick-github @jessegross jfyi https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/ + https://arxiv.org/pdf/2504.19874

TurboQuant is coming soon to Ollama?
github.com/ollama/ollam...

A new area for local AI !

#ai #localai #llm

1 0 0 0
Preview
Converting PDFs to Markdown with Docling

LLM RAG accuracy ultimately comes down to how well you convert your PDFs to Markdown.

I recently tried IBM's Docling, and it's really good. It converts while preserving the layout structure, which makes RAG preprocessing much easier.

Its table-structure recognition is especially strong. If document parsing is wearing you down, give it a try.

#AI #LLM #Python #RAG

https://zenn.dev/fukurou_labo/articles/f523b8c34fcf43

0 0 0 0
Post image

A reminder note on how I leverage AI in my blog.

#genai #ai #artificialintelligence #technology #LLM

2 0 0 0

transformers weren't adopted because they're better at capturing syntax. they're better at scaling. #LLM

0 0 0 0
Post image

🩸 BLOODIEST BATTLE ON ENGLISH SOIL

March 29, 1461 — On Palm Sunday, 28,000 warriors clashed in a snow-covered Yorkshire field. The Wars of the Roses reached its deadliest peak as Yorkist white roses fought Lancastrian red roses in brutal hand-to-hand […]

[Original post on social.coabai.com]

0 0 0 0
Preview
Mistral vibe dreaming How to use Mistral Vibe as a development companion that remembers the context of the different projects you work on

Let your LLM coding partner dream!

https://blog.troed.se/posts/mistral_vibe_dreaming/

#AI #LLM #VibeCoding

0 1 0 0
Preview
Major conference catches illicit AI use — and rejects hundreds of papers The papers’ watermarks allowed organizers to detect use of large language models in peer review.

ICML embedded hidden watermarks in review papers to catch AI-assisted reviewers. The trap worked. ~2% of authors were caught using AI for peer review and had their papers rejected.

#MachineLearning #PeerReview #LLM

3 0 0 0
How to Set Up Your Own Local AI with Unraid and an Nvidia GPU If you’ve ever longed for the independence to run AI models locally, without the constraints of cloud services or monthly fees, this guide is for you. With full control over your data and fas…

How to Set Up Your Own Local AI with Unraid and an Nvidia GPU If you’ve ever longed for the independence to run AI models locally, without the constraints of cloud services or monthly fees, this ...

#AI #Homelab #LocalAI #Server #LocalLLM #Setup

Origin | Interest | Match

0 0 1 0

running LLMs at scale reveals hidden costs in data preprocessing and pipeline maintenance. #LLM

0 0 0 0
Preview
GitHub - peva3/turboquant-h2o-streamingllm Contribute to peva3/turboquant-h2o-streamingllm development by creating an account on GitHub.

Holy shit! I just got Turboquant working. They weren't fucking kidding. 4x context space with the same memory usage!
I used: github.com/peva3/turboq...

#llm #ai #turboquant

1 0 0 0
Preview
Can Japan build a domestic LLM? Reflections on the RakutenAI 3.0 controversy

Engineers ought to know where the line falls between "fine-tuning" an LLM and "training from scratch".

The "domestic LLM" at the center of the Rakuten controversy turned out to rest on a DeepSeek V3 base. But given the compute cost and the quality of available Japanese-language data, isn't it more realistic to make the most of a strong base model than to force a from-scratch pre-training run?

In the end, as long as model transparency and license compliance are handled properly, fine-tuning is simply one efficient development approach.

What do you weigh most when picking a base model? OSS license terms, or benchmark scores?

#AI #LLM #DeepSeek ...

0 0 0 0
Post image

One odd observation:

The book was released in 2021, before the widespread popularity of Large Language Models (LLMs).

The way the Hail Mary's on-board computer interacts with Ryland Grace feels very outdated, almost archaic.

#ProjectHailMary #AI #LLM #LLMs #SciFi

1 0 0 0