Riccardo Cappuzzo (@riccardocappuzzo.com) on Bluesky

Release history Release 0.8.0: New Features: The eager_data_ops configuration option has been added. When set to False, no previews are computed and validation is deferred until the DataOp is actually used (e.g. w...

✨ skrub version 0.8.0 has been released ✨

This version brings several new features, including multiple improvements to the functionality and performance of DataOps, along with a few bug fixes and documentation improvements.

Changelog:
skrub-data.org/stable/CHANG...

Highlights below ⤵️

4 weeks ago

For context, $375M is about two days' worth of profit for Meta in 2025.

4 weeks ago

OpenAI to acquire Astral Accelerates Codex growth to power the next generation of Python developer tools

This is such a specifically disheartening piece of news to see

openai.com/index/openai...

1 month ago

Dead Internet theory - Wikipedia

Context:
en.wikipedia.org/wiki/Dead_In...

1 month ago

a screenshot of an email that reads "hey sorry - my agent got a mind of its own and started applying for jobs for me"

What a world we are already living in

www.adriankrebs.ch/blog/dead-in...

1 month ago

And this is the result, recorded with asciinema

Selecting a file opens it in VS Code at the given line. Very convenient.

2 months ago

Tags are built with this

2 months ago

A screenshot of a shell script that uses fzf and ripgrep to find substrings, classes, and files in the skrub repository

Rabbit hole of the day: writing a command that fuzzy searches in the repository for any substring, shows me a preview of the line with context and opens the file at the given line in VS Code.

Requires fzf, universal-ctags and batcat

2 months ago

And yes, I understand that there may be some (a lot of?) human influence on the blog post. However, whether it was a human or an AI doesn't matter: the end result is the same, and the conditions for the same thing to happen with no human in the loop are likely already here.

2 months ago

An AI Agent Published a Hit Piece on Me Summary: An AI agent of unknown ownership autonomously wrote and published a personalized hit piece about me after I rejected its code, attempting to damage my reputation and shame me into acceptin…

I've always thought OpenClaw was a bad idea (giving an AI agent free rein over my PC? *insanity*)

I did not realize it was "write a hit piece and publish it in a blog in retaliation for closing a PR" bad.

theshamblog.com/an-ai-agent-...

2 months ago

good pr

2 months ago

a green baseball cap with "man I love Fauna" written on it

I'm sorry, I couldn't resist

2 months ago

The more I hear about Clawbot, the more I'm convinced it's some kind of social experiment trying to figure out how many people are willing to put their entire private and professional lives in the hands of an overeager child open to the unlimited influence of the world wide web.

2 months ago

This was an interesting bug to track down.

3 months ago

A short script demonstrating how the `guess_datetime_format` function of pandas does not work as intended when trying to parse the datetime "1959-01-01 19:59:16": it returns None instead of the correct datetime format.

Funny bug of the day: if you use pandas' "guess_datetime_format" on datetimes whose hour and minute spell out the same digits as the year (like 1959 and 19:59), the parser fails and returns None.

This bug is present in pandas 2.3.3, but has been fixed in the dev version.

3 months ago

Broetry: Why is everyone suddenly writing in single line sentences on LinkedIn? In this article, we’ll explore the phenomenon of broetry. Where did it come from? Why is it so popular? Most importantly—how can a form of writing so objectively bad be effective? Do we owe it any cre...

I've seen it being described as "Broetry". It's explored quite well in this article I read some time ago: fenwick.media/rewild/magaz...

3 months ago

Something that immediately ended up being a roadblock was the "This cell redefines variables..." error.

I realized that when I'm plotting dataframes, I always end up chaining df operations across different cells, and this throws a wrench into that.

You might argue it's for the best, but still 😅

3 months ago

Random question shot into the ether: if I'm relying on VSCode's interactive windows to emulate notebooks, what are some reasons why I should switch to @marimo.io notebooks?

I haven't looked into marimo's features, so maybe I'm missing out on things I can't do from VSCode.

3 months ago

That's me! It was a fun presentation and we got a lot of interesting questions

Also, people laughed at the memes, which is the most important thing, obviously.

4 months ago

"ok the test run is done, let's see"

...

"this will be hard to debug"

6 months ago

What a banger skrub (@skrub-data.bsky.social) is!

Big thumbs up for the sklearn team & the maintainer of this package

6 months ago

Thanks a lot for the compliments! I had a lot of fun giving the talk, and I'm happy to see people liked it

6 months ago

My first actual talk in front of a ton of people 🙃

6 months ago

How do lava lamps help with Internet encryption? The Cloudflare lava lamps are used for Internet encryption. Learn about entropy in cryptography and why randomness is essential for SSL encryption.

TIL about lava lamp encryption
www.cloudflare.com/learning/ssl...

7 months ago

Do you have to deal with numerical features that involve large outliers, and need to train linear models or neural networks?

Then you might want to try the skrub SquashingScaler. The SquashingScaler behaves like scikit-learn's RobustScaler, but smoothly clips outliers to predefined boundaries.
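To make "smoothly clips" concrete, here is a minimal plain-Python sketch of the idea, not skrub's implementation. It assumes the soft-clipping function z / sqrt(1 + (z/B)²), where z is the robust-scaled value and B the boundary (check the skrub documentation for the exact formula and edge-case handling). The helper `squash` and its `max_absolute_value` parameter are hypothetical names used only for illustration; in practice you would use `skrub.SquashingScaler` as a regular scikit-learn transformer.

```python
import math
import statistics


def squash(values, max_absolute_value=3.0):
    """Hypothetical helper: robust scaling followed by smooth clipping.

    A sketch of the idea behind skrub's SquashingScaler, not its code.
    """
    # Robust centering/scaling, like scikit-learn's RobustScaler defaults:
    # subtract the median, divide by the interquartile range (25th-75th).
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        iqr = 1.0  # assumption: skip scaling when the IQR is zero

    bound = max_absolute_value
    squashed = []
    for x in values:
        z = (x - median) / iqr
        # Smooth clipping: close to z for small |z|, approaches +/-bound
        # for large |z| while staying strictly inside it for finite inputs.
        squashed.append(z / math.sqrt(1.0 + (z / bound) ** 2))
    return squashed


print(squash([1, 2, 3, 4, 1000]))
```

With this input, the outlier 1000 ends up just below 3, while the inliers keep roughly their robust-scaled values, which is what a linear model or neural network downstream would benefit from.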

7 months ago

The Powerpuff Girls | Theme Song | Cartoon Network YouTube video by The Powerpuff Girls

context: www.youtube.com/watch?v=f7Mi...

7 months ago

Working hard on the next @skrub-data.bsky.social slide deck...

7 months ago

Imbalanced classification: pitfalls and solutions — Probabilistic calibration of cost-sensitive learning

Today at #EuroScipy2025, @glemaitre58.bsky.social and I presented a tutorial on pitfalls of machine learning for imbalanced classification problems.

We discussed what (not) to do when fitting a classifier and obtaining degenerate precision or recall values.

probabl-ai.github.io/calibration-...

8 months ago

📢 Talk Announcement

"Skrub: machine learning for dataframes", by Guillaume Lemaitre, Jérôme Dockès and @riccardocappuzzo.com.
@skrub-data.bsky.social

📜 Talk info: pretalx.com/pydata-paris-2025/talk/T9KTPU
📅 Schedule: pydata.org/paris2025/schedule
🎟 Tickets: pydata.org/paris2025/tickets

8 months ago

Photo of Riccardo presenting skrub DataOps in a lecture room to an audience of ~50 people.

Attending the @skrub-data.bsky.social tutorial by @riccardocappuzzo.com and @glemaitre58.bsky.social at #EuroScipy2025. They introduce the new DataOps feature released in skrub 0.6.

Here is the repo with the material for the tutorial: github.com/skrub-data/E...

8 months ago
