Alex Lovell-Troy (@lovelltroy.org) Bsky

There’s still room for a few more at the #ISC25 OpenCHAMI tutorial tomorrow. Come join us and learn why the community is growing so fast! #HPC

app.swapcard.com/event/isc-hi...

10 months ago 0 0 0 0

Want to move beyond xCAT with a provisioner that’s ready for Confidential Computing?

I’ll be at #ISC25 this week talking about OpenCHAMI.

Free and Open Source with a growing community.

openchami.org

10 months ago 1 0 0 0

I’ve been working on OpenCHAMI for a couple of years now. This is an exciting step for the community!

1 year ago 5 2 1 0

This is the worst thing I can tell you about Japan.

1 year ago 2 0 0 0

I spent a week in Japan with wet hands before anyone told me that I needed to carry my own hand towel. Apparently they’ve been pulling this shit on foreigners for centuries.

1 year ago 2 0 0 1

Inside of you there are two wolves. Inside each wolf there are zero, one, or two wolves. Write a function to rebalance an arbitrary wolftree B such that it has minimal depth. The function should execute in O(log n) time. Show your work.

1 year ago 323 59 11 3

That’s really good to know. Thanks!

1 year ago 1 0 0 0

🌟 Hello BlueSky! 🌟

We’re Honeycomb, the observability platform for teams who manage software that matters. Send any data to our one-of-a-kind data store, solve problems with all the relevant context, and fix issues before your customers find them.

1 year ago 60 4 3 3

*no lies detected*

1 year ago 1 0 0 0

Twizzlers are made of braces wax and taste amazing when used as straws for Cherry Coke.

1 year ago 1 0 1 0

Moving from cloud #SRE to #HPC often means recalibrating what metrics matter.

Time to job launch?
Time to completion?
Mean time to job failure?
Time to snapshot recovery?

Cloud makes node loss a non-event. HPC typically doesn’t work that way.

1 year ago 10 2 2 0
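
As a rough illustration of the job-centric metrics listed above, here is a minimal Python sketch. It assumes a hypothetical job record with submit, start, and end timestamps plus a failed flag; the field names and the definition of "mean time to job failure" (accumulated runtime divided by the number of failed jobs) are assumptions for illustration, not any scheduler's actual accounting schema.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Job:
    # Hypothetical job record; timestamps are seconds since an arbitrary epoch.
    submit: float
    start: float
    end: float
    failed: bool


def job_metrics(jobs: list[Job]) -> dict[str, float]:
    """Summarize job-centric metrics for a batch of finished jobs."""
    launch_waits = [j.start - j.submit for j in jobs]  # time to job launch
    completions = [j.end - j.submit for j in jobs]     # time to completion
    runtime = sum(j.end - j.start for j in jobs)       # total time spent running
    failures = sum(1 for j in jobs if j.failed)
    return {
        "median_time_to_launch": median(launch_waits),
        "median_time_to_completion": median(completions),
        # Assumed definition: accumulated runtime per failed job.
        # Reported as infinity when nothing failed in the window.
        "mean_time_to_job_failure": runtime / failures if failures else float("inf"),
    }


jobs = [Job(0, 10, 110, False), Job(0, 30, 80, True), Job(5, 25, 225, False)]
m = job_metrics(jobs)
```

Medians are used for launch and completion times because queue waits tend to be skewed by a few long-pending jobs; a percentile breakdown would be a natural next step.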

Isambard 3 Supercomputer: Image Credit: Christy Nunns/University of Bristol

GW4 Isambard 3 #Supercomputer is officially online🎉🧠 !

Part of a collaboration between the universities of Bath, Bristol, Cardiff and Exeter, alongside partners HPE, NVIDIA and Arm, Isambard 3 will push the boundaries of science.

🔗 https://buff.ly/4g7HtMK

1 year ago 14 10 1 0

I once had to email someone with an important corporate email address.

Last Name: Fuchs
First Initial: E
Inexplicable Extra Letter: X

That’s right: fuchsex was his official email address.

I often wonder why the X.

1 year ago 1 0 2 0

“It’s like watching someone unlock a padlock on a wrench so they can use it to drive a nail”

Why?

“The padlock doesn’t fit the hammer.”

1 year ago 0 0 0 0

Every single time I’ve been to Bristol, the weather has been fantastic. Highly recommend.

1 year ago 3 0 1 0

SC'24 recap The premiere annual conference of the high-performance computing community, SC24, was held in Atlanta last week, and it attracted a reco...

I spent the Thanksgiving break typing up my notes from #SC24 which I've posted online. 30% more words than my notes from SC23 (sorry!). Feedback is welcome!

https://buff.ly/41fBhho

#HPC

1 year ago 53 9 7 2

Picture of two slices of bread, one stacked on top of the other crust side, with ham and cheese in between

This is, technically, a sandwich.

1 year ago 8563 1407 406 357

Astronomy Picture of the Day A different astronomy and space science related image is featured each day, along with a brief explanation.

Lotsa fake Astronomy photos on Bluesky these days. Just remember, if they’re not credited, it’s not credible.

NASA has the original feed and does a good job of curation.

apod.nasa.gov/apod/

1 year ago 3 0 0 0

Cool. Do you know of any large systems that use the feature? Does it help improve boot timing or scalability?

1 year ago 0 0 1 0

As it turns out, when the Thanksgiving pies don’t last all weekend, you’re allowed to make more pie. Who’s going to stop you?

🥧 Maple Pumpkin
🥧 Bourbon Apple

1 year ago 1 0 0 0

I should write a BitTorrent client

1 year ago 1073 20 60 8

Let me guess, high stress but only for a few minutes each day.

1 year ago 0 0 1 0

High-cardinality exploration is a superpower for SRE. Honeycomb changed so much.

1 year ago 1 0 0 0

Tail latency has entered the chat!

1 year ago 1 0 1 0

I'd also recommend looking at these metrics broken down by user/project, and try to make sure your 1% least reliable subset is still doing ok, or at least getting support, since failures are often not evenly distributed.

I really like this post on the topic: rachelbythebay.com/w/2019/07/15...

1 year ago 6 1 1 1
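
The per-user breakdown suggested above can be sketched roughly as follows. This is a minimal Python illustration assuming job outcomes are available as hypothetical (user, failed) pairs; the function names are made up for the example. It shows how a healthy-looking aggregate failure rate can hide one user who is failing much more often.

```python
import math
from collections import defaultdict


def failure_rates_by_user(jobs):
    """jobs: iterable of hypothetical (user, failed) pairs -> {user: failure rate}."""
    totals = defaultdict(int)
    fails = defaultdict(int)
    for user, failed in jobs:
        totals[user] += 1
        fails[user] += int(failed)
    return {u: fails[u] / totals[u] for u in totals}


def least_reliable(rates, fraction=0.01):
    """Return the worst `fraction` of users (always at least one), worst first."""
    k = max(1, math.ceil(len(rates) * fraction))
    return sorted(rates, key=rates.get, reverse=True)[:k]


jobs = [("alice", False), ("alice", False), ("bob", True), ("bob", False), ("carol", False)]
rates = failure_rates_by_user(jobs)
overall = sum(int(f) for _, f in jobs) / len(jobs)  # 0.2 across the whole cluster
worst = least_reliable(rates)                       # bob fails half his jobs
```

The same grouping works for projects or partitions; only the key in each pair changes.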

I have a half-written blog post about this that I should finish sometime.

I haven’t seen an SLO framework broadly adopted in HPC, but some sites adopt metrics like:

- % nodes up
- Scheduler RPC latency
- FS latency and BW
- Performance on standard benchmarks, either after maintenance or weekly

1 year ago 5 1 2 0
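
To make the "% nodes up" metric above SLO-shaped, one option is to compare it against a target and track how much error budget has been spent. The Python sketch below assumes hypothetical periodic (nodes_up, nodes_total) snapshots and an illustrative 95% target; neither comes from any specific site.

```python
def percent_nodes_up(samples):
    """samples: hypothetical (nodes_up, nodes_total) snapshots -> average % of node-time up."""
    up = sum(u for u, _ in samples)
    total = sum(t for _, t in samples)
    return 100.0 * up / total


def slo_report(samples, target_pct=95.0):
    """Compare % nodes up against an assumed target and report error-budget use."""
    achieved = percent_nodes_up(samples)
    budget = 100.0 - target_pct  # share of node-time allowed to be down
    used = max(0.0, 100.0 - achieved)
    if budget > 0:
        budget_used = used / budget
    else:
        budget_used = 0.0 if used == 0 else float("inf")
    return {"achieved_pct": achieved, "met": achieved >= target_pct,
            "budget_used_fraction": budget_used}


report = slo_report([(98, 100), (90, 100), (100, 100)])
```

A budget fraction approaching 1.0 before the end of the reporting window is an early signal to slow down risky maintenance, which is the usual reason to frame uptime this way rather than as a raw percentage.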

I totally agree. Feels like a good provocative talk for SRECon, especially as cloud SRE folks are being asked to support large training systems for AI.

1 year ago 2 0 0 0

I don’t see a lot of talk about #SLO (Service Level Objectives) for administering #HPC clusters. Does anyone have good examples beyond “the cluster is not down”?

1 year ago 5 2 2 0