Ryan Rosario (@datajunkie) Bsky

GitHub - aphyr/distsys-class: Class materials for a distributed systems lecture series

An Introduction to Distributed Systems by Kyle Kingsbury

github.com/aphyr/distsy...

1 week ago 0 0 0 0

Academics/Researchers: What are your suggestions for someone who got their PhD a while ago and wants to re-enter the research space? I think my teaching career is coming to an end (too strict of a schedule), but I want to remain affiliated with academia and need to plan my next move in it.

2 weeks ago 0 0 0 0

No Web Without Women An educational website featuring a collection of innovations by women in the fields of computer science and technology.

I try to always make time to share this website with my students: nowebwithoutwomen.com

2 weeks ago 0 0 0 0

Most of them are Trumpers. They support this.

2 months ago 0 0 0 0

That moment when I was so busy and completely forgot that Monday was a holiday and I had an extra day to prepare. Courtesy of Sora.

2 months ago 0 0 0 0

Inside StarRocks: Why Joins Are Faster Than You’d Expect The engineering choices that turn joins into a strength. A deep dive with real-world case studies.

Challenges in join optimization.
www.starrocks.io/blog/inside-...

2 months ago 1 0 0 0

Anyway, it was humorous seeing these concepts being treated as if they were scientific breakthroughs. It reminds me of the Dynamo paper where the authors believed that they had discovered, or at least revolutionized, the concept of "Tail Latency."

4 months ago 0 0 0 0

(1) and (2) show the disconnect between statistics and/or data science and computer science. It's very inefficient.

4 months ago 0 0 1 0

(3) In the "replacement" discussion, I felt a bit of elitism. Some of the researchers who are enthusiastic about everyone being replaced with AI seem to think that they are immune. It comes across differently to those who are not 100% in academia.

4 months ago 0 0 1 0

(2) There was a lot of focus on time series. The talks suggested that time series had just been re-discovered by computer scientists. It's been around for at least a hundred years.

4 months ago 0 0 1 0

A snarkier take from my time at #NeurIPS2025

(1) Large companies have been doing evaluation, on everything, for decades (it's all I did as a DS in Google Search). It was interesting seeing academia catch up, beyond accuracy/precision/recall/AUC/F1 etc., though they acted like this was a new concept.

4 months ago 0 0 1 0

With that said, statisticians and data scientists (or the companies that don't understand how to use them) tend to miss a big opportunity: helping improve systems, algorithms and AI through evaluation and experimental design. I don't get it.

4 months ago 0 0 0 0

(2) Most papers in AI overfit the data; this is why evaluation is important.
(3) System architects may be safe from automation by AI in the near future.
(4) Junior level roles will disappear (My opinion: this is a shift, not a deprecation)

4 months ago 0 0 1 0

These are my takeaways from #NeurIPS2025:

(1) Evaluation of algorithms and solutions developed from LLM prompts and responses in systems is important (attention statisticians and data scientists). Log the results ("observability"). Iterate based on the results. 1/4

4 months ago 0 0 1 0

Most, if not all, of us who teach and/or do research feel a certain way about what’s going on right now. It was surreal to see UCOP explicitly call it out in a recent (public) document. It made my heart skip a beat.

11 months ago 0 0 0 0

If any of you are thinking of upgrading to Claude Max. Don't. Save your money. Same ridiculous limit on input and conversation length.

11 months ago 0 0 0 0

Whenever I introduce TCP or other network connections, I introduce the concept with two bros, Connor and Logan. Why? Because not much data is exchanged, yet the handshake is important.

1 year ago 0 0 0 0

It's getting to the point that I need to consider canceling my subscription to Claude. Has anyone else noticed a drastic decrease in quality with coding prompts in addition to system reliability issues?

1 year ago 0 0 1 0

Another earthquake! This is getting to be a bit much. All of them have been near Conejo Valley.

1 year ago 0 0 0 0

I believe so. We are about to head for a cliff in the next couple of years when StackExchange shuts down and training data becomes old or limited. Sure, there's GitHub, but there is less human annotation in GitHub.

1 year ago 1 0 0 0

The Case for Learned Index Structures Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsor...

What baffles me about statistics education is the lack of discussion of its non-ML importance to computer science:

(1) use for indexing where the keys follow a distribution: arxiv.org/abs/1712.01208
(2) use in evaluating cost of query plans
(3) probabilistic data structures
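
To make point (1) concrete, here is a minimal sketch of the "index as a model" idea: fit a line from key to position in a sorted array, record the worst prediction error, and search only inside that error window. This is an illustration under stated assumptions, not the recursive model index from the linked paper; the names `LearnedIndex`, `fit_linear`, and `max_err` are hypothetical, and keys are assumed to be a non-empty collection of numbers.

```python
import bisect


def fit_linear(keys):
    """Least-squares fit of position ~ slope * key + intercept over sorted keys."""
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    if var == 0:  # all keys identical: predict the middle position
        return 0.0, mean_p
    cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
    slope = cov / var
    return slope, mean_p - slope * mean_k


class LearnedIndex:
    """Hypothetical single-model index: one line plus a bounded local search."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = sorted(keys)
        self.slope, self.intercept = fit_linear(self.keys)
        # Worst-case gap between predicted and true position over stored keys.
        self.max_err = max(
            abs(self._predict(k) - i) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        pos = int(round(self.slope * key + self.intercept))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if absent."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        # Binary search only inside the error window around the prediction.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None
```

The closer the keys follow the fitted distribution, the smaller `max_err` gets and the less searching each lookup needs, which is where the statistics come in.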

1 year ago 0 0 0 0

Tonight I took all of my slides and passed them to NotebookLM. The podcast adds more context, some analogies and other examples. With the exception of some minor hallucination, and the host making strange noises, this is mind-blowing. I'm using this for my classes moving forward.

1 year ago 1 0 0 0

THAT was a big earthquake. Damn.

1 year ago 0 0 0 0

Why do I even pay for Claude? It is horribly rate limited, expensive, and is offline more than it is online.

1 year ago 0 0 0 0

Hot take? Tableau is hot garbage.

Believe it or not, today was my first time ever using Tableau as a data scientist. And after today, it is also my last time.

1 year ago 0 0 0 0

MongoDB has the most bizarre authentication model.

1 year ago 0 0 0 0

I am going to have to switch away from Neo4j to another graph database as my choice to teach the graph model. It's too much of a money grab for simple things like read-only access on a user, and it's a pain to set up HTTPS and a reverse proxy. Any suggestions for worthy alternatives?

1 year ago 1 0 0 0

This quarter my data management students are constructing an ETL pipeline as their final project. We are hamstrung by AWS' free tier and so we are using #DuckDB as our serving layer, rather than Snowflake or Redshift, to power a Tableau dashboard. I enjoy it more and more each time.

1 year ago 2 0 0 0

Ultimate nirvana when teaching. This happened for the first time since maybe week 1. Average response time has gone up over the years, but still pretty good if there were an SLA...

1 year ago 0 0 0 0

Brought to you by the parametric equations,
x = 16 (sin t)^3
y = 13 cos t - 5 cos 2t - 2 cos 3t - cos 4t
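
For anyone who wants to trace the curve, here is a small sketch that evaluates the two equations above, writing the parameter (the angle inside the sin and cos terms) as `t` and sweeping it from 0 to 2π. The function names `heart_point` and `heart_curve` and the default of 200 samples are assumptions made for illustration.

```python
import math


def heart_point(t):
    """Evaluate the parametric equations at angle t (radians)."""
    x = 16 * math.sin(t) ** 3
    y = (13 * math.cos(t) - 5 * math.cos(2 * t)
         - 2 * math.cos(3 * t) - math.cos(4 * t))
    return x, y


def heart_curve(n=200):
    """Sample n points for t evenly spaced over [0, 2*pi)."""
    return [heart_point(2 * math.pi * i / n) for i in range(n)]
```

Passing the points from `heart_curve()` to any plotting tool draws a heart; at t = 0 the curve sits at the notch at the top, and at t = π it reaches the tip at the bottom.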

1 year ago 0 0 0 0
