#CloudNativeLondon hashtag - Bluesky - nopzon.com

#CloudNativeLondon

9 months ago

Cloud Native London, July 2025, Wed, Jul 2, 2025, 6:00 PM | Meetup Hi folks! Welcome to our July Cloud Native London meetup! Join us to hear from our two great speakers and network with your fellow techies over pizza and drinks, or altern

✈️ kaspernissen.xyz is bringing clarity to London!

Tomorrow at #CloudNativeLondon: breaking down #observability with #OpenTelemetry & #Perses - open, scalable, and devs in control.

Tired of rigid dashboards and expensive tools? This talk is about breaking free.

🔗 www.meetup.com/cloud-native...

🇯🇵Carla Gaggini 🌸🏳️‍🌈

@carlagg.bsky.social

6 years ago

Can’t believe another #CloudNativeLondon is over! Today @cpurdy and @gene_gleyzer announced Ecstasy - a brand new language and @sarahjwells closed with some hot tips on complex and distributed systems 🔥Are you joining us next year? http://CloudNativelondon.com

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

What worked before doesn't work in cloud native. Rigorous change management -> agile testing in production. Zero downtime deployment, not deploy windows.

"Tell your business that you need to have a risk/error budget to move faster." --@sarahjwells [fin] #CloudNativeLondon

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

Understand your steady state, minimize the blast radius of potential failures, and check your assumptions using chaos engineering or disaster recovery testing. #CloudNativeLondon

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

Both @sarahjwells and @Yuryu have reinforced today that a backup is not a restore. You need to test that you can restore, otherwise "it's just some files on a disk". #CloudNativeLondon

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

"When it hurts, do it more often and bring the pain forward," quoting @jezhumble.

Better to discover that your failovers only work when both datacenters are up before you need them during a hard failure. #CloudNativeLondon

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

[ed: ow, my head hurts looking at that set of 100+ red/green tiles]. But they're hoping to drop down to 6 tiles next year!

"Your dashboards are scar tissue of your previous incidents," says @sarahjwells quoting yours truly, and also plugging @honeycombio <3. #CloudNativeLondon

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

They went overboard on metrics and overalerting on metrics, but then cut back to RED metrics instead.

Monitoring can tell you when things are wrong, but not *where* they are wrong.

So in the next year they'd like to monitor against business capabilities. #CloudNativeLondon
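RED is commonly expanded as Rate, Errors, Duration. As a rough illustration only (the talk did not show code, and the `Request` record, `red_metrics` function, and field names here are assumptions), the three numbers for one service over one time window could be computed like this:

```python
import statistics
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one handled request."""
    duration_ms: float
    ok: bool


def red_metrics(requests, window_seconds):
    """Summarise a window of requests as Rate, Errors, Duration."""
    if not requests:
        # Zero traffic is a valid state, so report it rather than failing.
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "median_ms": None}
    errors = sum(1 for r in requests if not r.ok)
    return {
        "rate_per_s": len(requests) / window_seconds,
        "error_ratio": errors / len(requests),
        "median_ms": statistics.median(r.duration_ms for r in requests),
    }
```

Three numbers per service are far fewer things to alert on than every available metric, which is the trade the post describes.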

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

They have done a primitive version of distributed tracing -- passing request ids and structured logging the request id when they write log lines. [ed: although, hey, @sarahjwells we'd love it if you used @honeycombio ;)] #CloudNativeLondon
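A minimal sketch of that pattern, assuming JSON log lines and two hypothetical services (`publish-api` and `store`); none of these names come from the talk. The id is minted once at the edge, passed to every downstream call, and written into each log line so that aggregated logs can be filtered per request:

```python
import io
import json
import logging
import uuid


def make_logger(stream):
    """Build a logger that writes one JSON object per line to `stream`."""
    logger = logging.getLogger(f"sketch-{uuid.uuid4()}")  # unique name, so no handler reuse
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(stream))
    return logger


def log_event(logger, request_id, service, message):
    """Write a structured log line that always carries the request id."""
    logger.info(json.dumps({"request_id": request_id, "service": service, "message": message}))


def handle_downstream(logger, request_id):
    # The downstream service receives the id (for example in a header) and logs with it.
    log_event(logger, request_id, "store", "story saved")


def handle_frontend(logger, incoming_request_id=None):
    # Reuse the caller's id if there is one, otherwise mint a new one at the edge.
    request_id = incoming_request_id or str(uuid.uuid4())
    log_event(logger, request_id, "publish-api", "publish requested")
    handle_downstream(logger, request_id)
    return request_id


def lines_for_request(log_text, request_id):
    """Filter aggregated log output down to the lines of one request."""
    records = [json.loads(line) for line in log_text.splitlines() if line.strip()]
    return [r for r in records if r["request_id"] == request_id]
```

This gives the ordering of events per request but no timing or parent/child spans, which is roughly where a real tracing system takes over.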

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

There's now a way to set measurable goals around percentage of service with runbook coverage, etc.

But runbooks don't solve everything. You also need to build with observability in mind. Log aggregation has been useful for @sarahjwells's teams. #CloudNativeLondon

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

Use nudges with checklists & scorecards to enforce paying down operational debt -- ensuring your runbooks are kept up to date and useful.

Once something meets a minimum score, _then_ do the labor intensive human review. #CloudNativeLondon
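One way to picture the scorecard gate, as a sketch only: the check names, the 180-day threshold, and the minimum score of 3 are all made up for illustration and are not from the talk.

```python
# Hypothetical automated checks; each takes a runbook dict and returns True or False.
CHECKS = {
    "has_owner": lambda rb: bool(rb.get("owner")),
    "has_escalation_contact": lambda rb: bool(rb.get("escalation_contact")),
    "has_failover_steps": lambda rb: bool(rb.get("failover_steps")),
    "reviewed_recently": lambda rb: rb.get("days_since_review", 10**6) <= 180,
}


def failing_checks(runbook):
    """Names of the checks this runbook fails; these become the nudges."""
    return [name for name, check in CHECKS.items() if not check(runbook)]


def score(runbook):
    """Number of automated checks the runbook passes."""
    return len(CHECKS) - len(failing_checks(runbook))


def ready_for_human_review(runbook, minimum_score=3):
    """Only spend reviewer time once the cheap automated bar is met."""
    return score(runbook) >= minimum_score
```

The cheap checks run on every runbook all the time; the expensive human review is reserved for the ones that already pass them.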

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

Every system and service needs to have an owner (that's a team), rather than letting things go stale and contacting people on vacation or who have left the company.

Have a service graph encoding which teams, systems, and products exist. Great for GDPR too! #CloudNativeLondon
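As a toy illustration of such a graph (the system names, team names, and the `personal_data` field are hypothetical, not from the talk), even a flat mapping is enough to answer "who owns this?" and "which systems hold personal data?":

```python
# Hypothetical service graph: system -> owning team, product, and a data flag.
SERVICE_GRAPH = {
    "publish-api": {"owner": "content-platform", "product": "publishing", "personal_data": False},
    "subscriptions-db": {"owner": "membership", "product": "subscriptions", "personal_data": True},
    "legacy-mailer": {"owner": None, "product": "newsletters", "personal_data": True},
}


def unowned_systems(graph):
    """Systems with no owning team: the ones most likely to go stale."""
    return sorted(name for name, info in graph.items() if not info.get("owner"))


def systems_with_personal_data(graph):
    """Starting point for a data-protection inventory."""
    return sorted(name for name, info in graph.items() if info.get("personal_data"))


def owner_of(graph, system):
    """The team (not an individual) to contact about a system."""
    return graph[system]["owner"]
```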

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

Quoting @copyconstruct again, @sarahjwells says that there's a taxonomy of testing in production and it's a spectrum.

There's no point in finding things quickly if you can't fix them quickly. So prioritizing time to restore service is the most important. #CloudNativeLondon

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

Canary releases of code for A/B testing and evaluation can also be useful. #CloudNativeLondon

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

Setting up flowcharts for expected behavior can help you achieve common understanding (or correct the code!) of what process should look like.

Also, use feature flags to separate code release from functionality being enabled. #CloudNativeLondon
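A minimal sketch of the feature-flag idea, assuming an in-memory flag store and a made-up flag name (`new-headline-style`); a real setup would read flags from configuration or a flag service:

```python
class FeatureFlags:
    """In-memory flag store, standing in for config or a flag service."""

    def __init__(self, enabled=None):
        self._enabled = set(enabled or [])

    def enable(self, name):
        self._enabled.add(name)

    def disable(self, name):
        self._enabled.discard(name)

    def is_enabled(self, name):
        return name in self._enabled


def render_headline(title, flags):
    # The new code path ships with the release but stays dark until the flag is on.
    if flags.is_enabled("new-headline-style"):
        return title.upper()
    return title
```

Deploying this code changes nothing for users. Turning the flag on later is a separate, easily reversible step, and turning it off again needs no rollback of the release.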

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

Do contract testing for the key interfaces. [ed: it's... almost like agreeing upon an SLO with your customers, but about the API as well as availability!] #CloudNativeLondon
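A hand-rolled sketch of what a contract check can look like, without any contract-testing library. The `CONTRACT` fields are hypothetical: the consumer states which fields and types it relies on, and the provider's response is verified against that statement.

```python
# Hypothetical contract: the fields a consumer depends on, with expected types.
CONTRACT = {"id": str, "headline": str, "published": bool}


def contract_violations(response, contract):
    """Return a list of problems; an empty list means the contract holds."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            problems.append(f"wrong type for {field}: expected {expected_type.__name__}")
    return problems
```

Extra fields in the response are deliberately allowed, so the provider can add to the API without breaking consumers; only removals and type changes count as violations.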

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

Defining success: what does "publish succeeded" mean? For the FT, it has to be in all regions they operate in. So they changed their synthetic prober to check their synthetic stories appeared in all of them. #CloudNativeLondon
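The "all regions" definition of success can be sketched as follows. The region names and the fake fetcher are assumptions for illustration; a real prober would make a read request against each region instead.

```python
def publish_succeeded(story_id, regions, fetch_story):
    """Return (ok, missing_regions); ok only if the story is visible in every region."""
    missing = [region for region in regions if not fetch_story(region, story_id)]
    return (not missing, missing)


def make_fake_fetcher(stories_by_region):
    """Stand-in for a real read against each region's content API."""
    def fetch_story(region, story_id):
        return story_id in stories_by_region.get(region, set())
    return fetch_story
```

Returning the list of missing regions, not just a boolean, tells whoever is paged where to start looking.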

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

And then, of course, check your synthetic monitoring prober to make sure it is up, but that's a much smaller problem than trying to monitor "up" for your whole system.

You really need synthetic traffic for bursty/low real traffic. 0 QPS sometimes is normal. #CloudNativeLondon

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

Instead of shifting your tests left, perhaps shift your ability to test rightward towards production.

Have synthetics [ed: or have SLOs!] that expose whether the system is working in prod and let you start debugging if it's not working. #TestInProduction #CloudNativeLondon

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

You also get brittleness out of your fixtures if you have rigid acceptance tests. It's not a good ROI to spend weeks fixing tests that get out of date. #CloudNativeLondon

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

"Full stack on your laptop only works to a point; you eventually get a distributed monolith or have to reproduce your cloud provider's services to do that." --@sarahjwells #CloudNativeLondon

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

So let's talk #TestInProduction. That doesn't mean no pre-release testing. You still need automated testing.

Citing @copyconstruct, have fake versions of your services rather than spinning up the entire stack to test one component. #CloudNativeLondon

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

.@sarahjwells on error budgets and SLOs: "We aren't a nuclear power plant or hospital. Nobody will die if we're broken for a little while. Things working most of the time, and eventually getting fixed, is good enough." #CloudNativeLondon
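To make "good enough" concrete, an error budget is the unreliability an SLO permits over a window. The numbers below (99.9% over 30 days) are illustrative assumptions, not figures from the talk:

```python
def error_budget_minutes(slo_target, window_days):
    """Minutes of allowed unavailability for an availability SLO over a window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo_target)


def budget_remaining_minutes(slo_target, window_days, downtime_minutes):
    """Budget left after the downtime already spent; negative means the SLO is blown."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes


# Example: a 99.9% target over 30 days allows roughly 43.2 minutes of downtime.
example_budget = error_budget_minutes(0.999, 30)
```

While budget remains, the team can keep shipping; once it is spent, the usual response is to slow down and prioritise reliability work.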

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

We can't do full regression testing on everything, nor should we assume that we only need to test services in isolation; instead, we need to have a risk-driven approach. #CloudNativeLondon

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

Decrease your change fail rate _and_ increase your release rate. You can have both. 15% failure -> 1% failure rate, and 250x the release rate.

One is a consequence of the other -- smaller changes are easier to understand. #CloudNativeLondon

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

The lines are blurring: people no longer spend a majority of their time "just writing code" and spend far more time doing ancillary full-lifecycle operations.

But this has a payoff for letting you move faster. #CloudNativeLondon

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

We must test the resilience of our services and ability to take things down.

Use containers, orchestration, and SaaS, but... then it makes it harder to run it locally and see what happens. Test your interactions with third parties. #CloudNativeLondon

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

and actually _test_ your automation, or else it'll subtly break and won't work during a real emergency. #CloudNativeLondon

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

12-factor applications are able to cope with what happens in cloud production environments.

Testing your code is no longer enough; you need to test behaviors you'll see in prod such as being restarted, etc.

and you'll need to automate much more. #CloudNativeLondon

Liz Fong-Jones (方禮真)

@lizthegrey.com

6 years ago

If you're lifting and shifting, you're not getting anything good out of it. Go cloud native all the way to get the benefits of it. #CloudNativeLondon
