Also happy to arrange this for people who can't afford it. It's in person, so you'll either need to be SF based, or be there for the dates.
Posts by MartinDotNet
Hey, all! Super early-bird pricing for #KCDC, Sept 9-11, ends tomorrow. I'll be there speaking about OpenTelemetry and the importance of great logging in your apps.
This is a large part of the issue with how people use agents. The promise is that you need to do nothing other than ask a question. The reality is that there is still work to do to help the agent.
Decent tracing, and giving the agent access to query the telemetry is a game changer.
This is the conference I've been organising for the last few months. We managed to secure speakers from
Stack Overflow
Salesforce
Slack
Google
Akamai
MongoDB
Stripe
GitHub
Superhuman
Thoughtworks
AWS
Cribl
RWX
... and we're still booking people. Early bird is only $100 too!
Which language? Please do let us know what was wrong, happy to fix it!
Yup, moving to OTel because you think you can have the same as you have now, just cheaper and faster, or because in the future someone will make the same thing cheaper and faster, are the wrong reasons.
It will be easier to move, as app teams won't need to change, and at scale that's a real issue.
Unknown is the only thing left.
That's why I'm bullish on the QL, storage etc. They're catering to the old world, the legacy of staring at dashboards.
Dynamically created UIs, where the correlated graphs are built on demand. Skills are the query language now, making "migration" easy.
Agents will rule the world here. It's what they're great at: known failure modes, spotting trends.
That's why the QL doesn’t matter, and why moving to a new vendor by adding an OTLP endpoint is viable (finally).
Those pages of correlated graphs are replaced with agent skills and knowledge.
For the monitoring side, those are mostly solved issues now, at least that's what I'm seeing. The known failure modes are covered through self-healing methods.
The migrations you're referring to are those where people are still clinging to that old world of wallboards.
I don't know about that, we've successfully migrated from most of the well-known vendors, and at scale. We have skills to do that now.
The wider issue is people equating Observability to "dashboards and alerts", rather than systems you use to understand (the original definition).
How you choose to understand is tightly coupled to the backend you choose: are they a partner in production debugging, or just a DB and a fancy UI?
Dashboards aren't for trends either. Maybe some reports that you look at occasionally that show some queries, but dashboards, not so much. I get that a lot of people use "dashboards" as quasi-reports with correlated graphs, and again, moving from monitoring to observability is a process.
SLOs are a fundamentally different proposition to threshold alerts, mainly because you should be aiming for 10x fewer than threshold alerts against infra/runtime concerns. There's a reason they're supposed to be based on events and not metrics.
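To make the "events, not metrics" point concrete, here's a rough sketch of an event-based SLO calculation. The `Event` shape, the latency target, and the function names are all made up for illustration, not taken from any vendor's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One unit of user-facing work, e.g. a request (hypothetical shape)."""
    duration_ms: float
    error: bool


def is_good(event: Event, latency_target_ms: float) -> bool:
    # An event is "good" when it succeeded and met the latency target.
    return (not event.error) and event.duration_ms <= latency_target_ms


def error_budget_remaining(events, slo_target: float, latency_target_ms: float) -> float:
    """Fraction of the error budget left (1.0 = untouched, <= 0 = exhausted)."""
    total = len(events)
    if total == 0:
        return 1.0  # no traffic, so nothing has been spent
    bad = sum(1 for e in events if not is_good(e, latency_target_ms))
    allowed_bad = (1.0 - slo_target) * total
    if allowed_bad == 0:
        return 1.0 if bad == 0 else float("-inf")
    return 1.0 - bad / allowed_bad
```

Because each event is judged on its own, one SLO like this can stand in for several separate threshold alerts on error rate, latency, and so on.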
That said though, we migrated a system like that a month ago, Claude did it all in an hour. MCPs fix most of the "old school to old school" migrations, paving the way for the teams to work on the modernisation.
If you want to migrate with a like-for-like approach ("I have these 1000 triggers and 2000 dashboards, can you make it cheaper and faster?"), then sure, storage formats, consistent queries etc. are all great things, but it's not really what observability is about.
For a Monitoring system, sure.
I do find it interesting when people mention "storage formats" and "proprietary datastores", those are the value vendors bring, by design.
Migration on small scales is always easy, especially when you have access to the code. When you have 1000 VMs and a live running site though...
Structured well, that migration isn't big
If you're still in the dashboards and triggers world, you will likely have a bad time, and that migration is a good inflection point to start adopting SLOs (more succinct than triggers) and move away from dashboards into agent-based investigations and views.
That's not lock-in, those are the vendor features at that stage.
If we resort to vendors being databases, you've lost anyway. It's a race to the bottom, so just stick it in ClickHouse and be done.
Vendors innovate on the data, without that, what you have is a database and a fancy UI.
Yeah, it's been on the list for the last 3(?) releases of .NET, but not made it yet.
Here's my cursed implementation from a long time ago.
github.com/martinjt/ote...
It uses Simple Exporters (sync) to push to JS and uses a beacon.
There is one in the Blazor repo somewhere.
The issue is that you need to send the span data (when the span ends) somewhere. Normally an in-memory list/queue for some background action to pick up.
Sending to a web worker (without shared memory) means serialisation on span end, with perf hits.
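For anyone who hasn't seen the "normal" pattern, here's a minimal sketch of it in Python, where background threads exist. The `BatchingExporter` class and its `send` callback are hypothetical names, not the OpenTelemetry SDK's API. The point is that span end only enqueues, and a worker does the expensive part later.

```python
import queue
import threading


class BatchingExporter:
    """Hypothetical exporter: on_end() only enqueues; a worker thread sends."""

    def __init__(self, send, max_batch=64):
        self._send = send              # callable taking a list of spans
        self._max_batch = max_batch
        self._queue = queue.Queue()    # the in-memory queue of ended spans
        self._stop = object()          # sentinel that tells the worker to exit
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def on_end(self, span):
        # Hot path when a span ends: no serialisation and no I/O here.
        self._queue.put(span)

    def _run(self):
        while True:
            item = self._queue.get()   # block until something arrives
            if item is self._stop:
                return
            batch = [item]
            stopping = False
            # Drain whatever else is already waiting, up to max_batch.
            while len(batch) < self._max_batch:
                try:
                    nxt = self._queue.get_nowait()
                except queue.Empty:
                    break
                if nxt is self._stop:
                    stopping = True
                    break
                batch.append(nxt)
            self._send(batch)          # serialise and export off the hot path
            if stopping:
                return

    def shutdown(self):
        # Everything queued before the sentinel is flushed first.
        self._queue.put(self._stop)
        self._worker.join()
```

Without a background worker there is nothing to run `_run`, which is why the choice ends up being between exporting synchronously on span end or paying for serialisation to hand the span off.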
Telemetry is important, but vendor lock-in is real. OpenTelemetry is what allows your apps and infra to produce consistent telemetry and send that to any vendor without changing your code or infrastructure when you want to switch vendor.
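A small sketch of why that switch is configuration rather than code. `OTEL_EXPORTER_OTLP_ENDPOINT` and `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` are the standard OTLP exporter environment variables; `resolve_traces_url` is a hypothetical helper that mimics how an OTLP/HTTP exporter is expected to resolve them, and the vendor URLs in the usage are placeholders.

```python
import os

# Default base endpoint for OTLP over HTTP.
DEFAULT_OTLP_HTTP_ENDPOINT = "http://localhost:4318"


def resolve_traces_url(env=None) -> str:
    """Work out where traces would be sent, from environment config only."""
    env = os.environ if env is None else env
    specific = env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
    if specific:
        return specific  # a signal-specific endpoint is used as-is
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", DEFAULT_OTLP_HTTP_ENDPOINT)
    # For the base endpoint, the per-signal path is appended.
    return base.rstrip("/") + "/v1/traces"
```

Pointing the same app at a different backend is then just `resolve_traces_url({"OTEL_EXPORTER_OTLP_ENDPOINT": "https://vendor-b.example.com"})` instead of `https://vendor-a.example.com`, with no change to the instrumentation.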
Probably worth noting that there are still issues that stop Blazor WASM working. Even with task support, there are no background workers, so you still have issues with when the exports happen. Not sure about Uno.
Dashboards are a view of assumptions, we call them known unknowns, the idea that there is something we know has an unknown outcome so we monitor it.
Trace Analytics (querying raw data on demand) is what you use to discover unknown unknowns, and debug complex stuff.
Just wait until you find structured logs with correlation ids, durations of parts of your code, and explicit ordering... we call them spans :)
Then do ad hoc queries against them to generate graphs on the fly over multiple days. Then do that without having to choose what queries you want upfront.
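If it helps to see it, here's a toy sketch of a span as exactly that kind of structured log line, plus a query where the grouping key is only chosen at query time. `span`, `RECORDS`, and `group_by` are illustrative names, with an in-memory list standing in for a real backend. This is not the OpenTelemetry API.

```python
import time
import uuid
from collections import defaultdict
from contextlib import contextmanager

RECORDS = []  # stand-in for a telemetry backend


@contextmanager
def span(name, trace_id, parent_id=None, **attributes):
    """Emit one structured record per unit of work when it finishes."""
    span_id = uuid.uuid4().hex[:16]
    start_unix_ns = time.time_ns()
    start = time.perf_counter()
    try:
        yield span_id
    finally:
        RECORDS.append({
            "name": name,
            "trace_id": trace_id,            # the correlation id
            "span_id": span_id,
            "parent_id": parent_id,          # parent/child gives explicit ordering
            "start_unix_ns": start_unix_ns,
            "duration_ms": (time.perf_counter() - start) * 1000,
            **attributes,
        })


def group_by(records, key, value="duration_ms"):
    """Ad hoc query: average `value` per distinct `key`, chosen at query time."""
    buckets = defaultdict(list)
    for r in records:
        if key in r:
            buckets[r[key]].append(r[value])
    return {k: sum(v) / len(v) for k, v in buckets.items()}
```

Wrap work in `with span("checkout", trace_id, tenant="acme") as root:` and nest children with `parent_id=root`. Later, `group_by(RECORDS, "tenant")` or `group_by(RECORDS, "name")` works without either question having been decided upfront.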
Shifting Left isn't always the right answer, and with metrics it's better to shift as far right as possible.
This is a post I created about using the OpenTelemetry Collector to generate RED Metrics in a cost-effective way.
There's also a prompt for Claude to do it :)
www.honeycomb.io/blog/shiftin...
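This isn't the Collector config from the post, just a rough sketch of the idea behind it: RED metrics (Rate, Errors, Duration) can be derived downstream from spans that already exist, so the app never has to emit them. The span fields (`service`, `duration_ms`, `error`) and the `red_metrics` function are assumptions for the example.

```python
from collections import defaultdict


def red_metrics(spans, window_seconds):
    """Aggregate span dicts into per-service Rate, Errors, Duration."""
    by_service = defaultdict(list)
    for s in spans:
        by_service[s["service"]].append(s)

    out = {}
    for service, items in by_service.items():
        durations = sorted(s["duration_ms"] for s in items)
        # Nearest-rank p95: ceil(0.95 * n), using integer arithmetic.
        rank = max(1, -(-95 * len(durations) // 100))
        out[service] = {
            "rate_per_s": len(items) / window_seconds,
            "error_ratio": sum(1 for s in items if s["error"]) / len(items),
            "p95_ms": durations[rank - 1],
        }
    return out
```

The app code only produces spans. Where and how they get rolled up into metrics is a pipeline decision, which is the "shift right" part.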
I get that disruptive passengers shouldn't be allowed to board the plane, but once they've caused a 40m delay for the rest of us, maybe just let them on and we'll deal with them?
It was an in-person only conference, we may post some of them on our YouTube at some point.
Alternatively we'll be doing a similar thing in San Francisco in May and London in October!
You mean there's another place people do this stuff?
Want a sneak peek at what’s coming to #O11yDayNYC? 👀
Martin Thwaites and Jamie Danielson are previewing the live demos they’ll be running, showcasing some of the new things Honeycomb has been building.
See it all in action in NYC. 👉 buff.ly/FQn2yBn
I find it really interesting that buying a game to play with friends that I'll play for an hour or 2 for £15 is a tough decision, but heading out for beers and spending more than that on a taxi is not a decision at all?