Got my Google Cloud Professional Cloud DevOps Engineer cert last week (Jan 4).
What I’m taking into production LLM/RAG work: safer deployments, better monitoring/alerting, tighter access/tool controls, and spend limits.
www.credly.com/badges/2ceb1...
Designing with smaller models isn’t just cost-cutting:
• Faster feedback loops
• Easier load planning
• Less-painful mistakes
Use the big models for the 10% of flows where they materially change the outcome.
Don’t ask “how do we make this LLM smarter?”
First ask:
• What are we willing to be wrong about?
• How much are we willing to pay per success?
• Where must a human always stay in the loop?
Good constraints turn AI from a toy into a system.
An AI feature is “MVP” until:
• It has clear SLOs
• It has owners
• It has dashboards
• It has a kill switch
After that, it’s production.
Everything else is a live demo with unsuspecting users.
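The kill switch is the easiest of these to sketch. A minimal version, assuming a shared flag store (a dict here; in production this would be a feature-flag service or config system — all names below are illustrative):

```python
# Hypothetical feature name; in production this lives in a flag service.
FLAGS = {"ai_summarizer": True}

def kill(feature: str) -> None:
    """Flip the switch: all traffic for this feature stops immediately."""
    FLAGS[feature] = False

def call_model(prompt: str) -> str:
    # Placeholder for the real model call.
    return f"answer to: {prompt}"

def handle_request(feature: str, prompt: str) -> str:
    if not FLAGS.get(feature, False):
        # Graceful fallback instead of a silent failure.
        return "This feature is temporarily unavailable."
    return call_model(prompt)

kill("ai_summarizer")  # one call, feature is off for everyone
```

The point isn't the dict — it's that "turn this OFF now" is one function call, not a redeploy.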
Your AI platform should answer 3 questions instantly:
• What’s our spend today and who drove it?
• What broke in prod in the last hour?
• Which prompts/tools caused the most failures?
If you need a meeting to answer these, you’re not ready to scale usage.
Before bragging about “AI agents in production”, show:
• Your rate limits
• Your circuit breakers
• Your rollback plan
• Your max monthly spend per tenant
Otherwise it’s not a system, it’s a stunt.
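The spend cap is the one most teams skip. A sketch of a per-tenant monthly cap (the number and field names are assumptions; wire this to your billing events):

```python
from collections import defaultdict

MAX_MONTHLY_SPEND_USD = 500.0          # illustrative cap
spend = defaultdict(float)             # tenant_id -> spend this month

def record_cost(tenant: str, usd: float) -> None:
    spend[tenant] += usd

def allow_request(tenant: str) -> bool:
    # Hard stop once a tenant hits the cap. Alert a human; don't
    # silently keep burning money.
    return spend[tenant] < MAX_MONTHLY_SPEND_USD
```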
You don’t secure an AI system by “red teaming it once”.
You secure it by:
• Defining what it must never do
• Making those rules enforceable in code
• Monitoring for violations in production
• Having a way to shut it down fast
Policy → controls → telemetry → kill switch.
AI agents shouldn’t be trusted by default.
Give them:
• Narrow scope
• Limited tools
• Explicit budgets
• Clear owners
If you can’t answer “who’s on call for this agent?” it has too much power.
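All four bullets fit in one explicit contract. A sketch, where every field name is an assumption about what "narrow scope" looks like in your stack:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentContract:
    name: str
    allowed_tools: frozenset   # limited tools
    max_usd_per_day: float     # explicit budget
    oncall_owner: str          # who gets paged

    def can_use(self, tool: str) -> bool:
        return tool in self.allowed_tools

# Hypothetical agent: scope, budget, and owner are all visible in one place.
billing_agent = AgentContract(
    name="billing-helper",
    allowed_tools=frozenset({"lookup_invoice", "send_summary"}),
    max_usd_per_day=20.0,
    oncall_owner="payments-oncall@example.com",
)
```

If a tool isn't in the frozenset, the agent can't call it. If `oncall_owner` is empty, don't ship it.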
“The model is cheap” is not a cost strategy.
Real levers:
• Fewer round trips
• Less useless context
• Smarter routing between models
• Caching stable answers
Every avoided call is 100% cheaper and 100% safer.
Before tuning prompts, ask:
• What’s the acceptable error rate?
• What’s the max we’re willing to pay per request?
• What does “graceful failure” look like?
LLM systems without these constraints are vibes, not engineering.
An AI agent calling tools is cool.
An AI agent calling tools with:
• Timeouts
• Retry limits
• Circuit breakers
• Spend guards
…is something you can show to your SRE and finance teams without apologizing.
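A sketch of what that wrapper can look like — retry limit, circuit breaker, and spend guard in one place (timeouts would normally come from the HTTP client; thresholds are illustrative):

```python
import time

class CircuitBreaker:
    def __init__(self, max_failures: int = 3, cooldown_s: float = 30.0):
        self.failures = 0
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.opened_at = 0.0

    @property
    def open(self) -> bool:
        return (self.failures >= self.max_failures
                and time.monotonic() - self.opened_at < self.cooldown_s)

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()

def guarded_call(tool, breaker: CircuitBreaker, budget: dict,
                 cost_usd: float, max_retries: int = 2):
    if breaker.open:
        raise RuntimeError("circuit open: tool disabled")
    if budget["spent"] + cost_usd > budget["cap"]:
        raise RuntimeError("spend guard: budget exhausted")
    for attempt in range(max_retries + 1):
        try:
            result = tool()
            breaker.record(ok=True)
            budget["spent"] += cost_usd
            return result
        except Exception:
            breaker.record(ok=False)
            if attempt == max_retries or breaker.open:
                raise
```

Every tool call goes through `guarded_call`; no agent gets a raw client.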
LLM stacks have 3 pillars:
• Quality → does it help?
• Reliability → does it work today and tomorrow?
• Cost → can we afford success?
Most teams romanticize #1 and discover #2 and #3 when finance and ops show up.
AI cost isn’t “our OpenAI bill is high”.
It’s:
• Engineers debugging flaky agents
• Support fixing silent failures
• RevOps dealing with bad insights
Reliability is a cost-optimization strategy.
“We have an AI agent that can do everything.”
Translation:
• Unbounded scope
• Unpredictable latency
• Unknown worst-case cost
• Impossible to test
Narrow agents with clear contracts > one omnipotent chaos agent.
A lot of “AI observability” talk is dashboards.
What you actually need:
• Can we say “turn this feature OFF now”?
• Can we cap spend per tenant?
• Can we see which prompts keep failing?
Control first, charts later.
LLM reliability trick: design like this 👇
1. Small, cheap model for routing & quick wins
2. Medium model for most requests
3. Big model only for high-value, audited paths
You’ll save cost and reduce how often users see “smart but wrong” answers.
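The three tiers above can be sketched as a router. Tier selection here is rule-based for illustration; real systems often use a cheap classifier model for this step, and all the names are assumptions:

```python
TIERS = {
    "small":  "small-model",   # routing & quick wins
    "medium": "medium-model",  # most requests
    "large":  "large-model",   # high-value, audited paths only
}

def route(request: dict) -> str:
    # Cheap, deterministic wins go to the small model.
    if request.get("intent") in {"greeting", "faq"}:
        return TIERS["small"]
    # The big model is gated behind BOTH value and auditing.
    if request.get("high_value") and request.get("audited"):
        return TIERS["large"]
    # Default: the medium model carries most of the load.
    return TIERS["medium"]
```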
Optimize LLM cost like an engineer, not a gambler:
• Measure cost per successful outcome, not per token
• Cache aggressively where correctness is stable
• Use smaller models for validation and guardrails
“We shaved 40% of tokens” means nothing if quality tanked.
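Cost per successful outcome is one division, but it's the division that keeps token-shaving honest (numbers below are made up to show the trap):

```python
def cost_per_success(total_usd: float, successes: int) -> float:
    """A 40% token cut that halves your success rate makes this WORSE."""
    if successes == 0:
        return float("inf")
    return total_usd / successes

# Cheaper per attempt, but fewer successes -> more expensive per outcome.
before = cost_per_success(total_usd=100.0, successes=800)  # 0.125
after  = cost_per_success(total_usd=60.0,  successes=400)  # 0.15
```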
Your AI system is “secure” and “reliable”?
Cool. Now show me:
• How you test changes to prompts & tools
• How you roll back a bad deployment
• How you cap spend in a runaway loop
If the answer is manual heroics, you’re not there yet.
AI agents are just microservices that hallucinate.
You still need:
• Timeouts & retries
• Rate limits
• Idempotency
• Cost ceilings
Treat them like unreliable juniors with prod access, not like magic.
If your AI app has:
• No p95 latency target
• No per-query cost budget
• No clear failure modes
…you don’t have a product.
You have an expensive, occasionally helpful surprise.
The most expensive tokens in your RAG system aren’t the ones you send.
They’re the ones that:
• Hit sensitive docs
• Bypass weak filters
• End up screenshotted into Slack forever
Data minimization is a cost control.
Before you optimize RAG latency from 1.2s → 0.8s, ask:
• Do we know our top 10 expensive users?
• Do we know which indexes drive 80% of cost?
• Do we know our riskiest collections?
Performance tuning without cost & risk data is vibes-based engineering.
Your vector DB is now:
• A data warehouse
• A search engine
• An attack surface
• A cost center
Still treating it like a sidecar for “chat with your docs” is how you get surprise invoices and surprise incidents.
Hot take:
“Guardrails” are often a guilt-offload for not doing:
• Proper access control
• Per-tenant isolation
• Input/output logging
LLM wrappers won’t fix a broken security model. They just make it more expensive.
Hidden RAG cost center: abuse.
• No per-user rate limits
• Unlimited queries on expensive models
• Tool calls that hit paid APIs
Congrats, you just built a token-minter for attackers.
Security is also about protecting your wallet.
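Per-user rate limiting is the cheapest fix on that list. A token-bucket sketch (capacity and refill rate are illustrative; a real deployment would back this with Redis or similar):

```python
import time

class TokenBucket:
    def __init__(self, capacity: int = 10, refill_per_s: float = 0.5):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_s = refill_per_s
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, up to capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_s)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets: dict[str, TokenBucket] = {}

def allow_query(user_id: str) -> bool:
    return buckets.setdefault(user_id, TokenBucket()).allow()
```

Reject before you retrieve, and the attacker pays in 429s instead of your tokens.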
Observability for RAG isn’t just about quality:
• Track token spend per user/tenant
• Track which collections are most queried
• Track which prompts hit sensitive docs
Same logs help with cost optimization AND security forensics. Double win.
Every “just in case” token you send has a cost:
• Direct $$
• Latency
• Attack surface
Prune your retrieval:
• Fewer, higher-quality chunks
• Explicit collections
• Permission-aware filters
Spend less, answer faster, leak less.
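A sketch of that pruning step — permission filter first, then keep only the top-k chunks by score (the `acl`/`score` field names are assumptions about your chunk metadata):

```python
def prune(chunks: list[dict], user_groups: set[str], k: int = 5) -> list[dict]:
    # Permission-aware filter BEFORE ranking: chunks the user can't see
    # never reach the prompt, so they can't leak or cost tokens.
    allowed = [c for c in chunks if c["acl"] & user_groups]
    # Fewer, higher-quality chunks: top-k by retrieval score.
    allowed.sort(key=lambda c: c["score"], reverse=True)
    return allowed[:k]
```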
Your RAG threat model should include finance:
• Prompt injection that triggers many tool calls
• Queries crafted to hit max tokens every time
• Abuse of “unlimited internal use” policies
Attackers don’t need your data if they can just drain your budget.
RAG tradeoff triangle:
• More context → more tokens
• Less context → more hallucinations
• No security → more incidents
Most teams only tune the first two.
Mature teams treat security as a cost dimension too.
“Low token cost” demos lie.
In real life RAG:
• 20–50 retrieved chunks
• Tool calls
• Follow-up questions
Now add:
• No rate limits
• No abuse detection
• No guardrails on tools
Congrats, you’ve built a DoS and data-exfil API with pretty UX.