
Posts by Olivier Grisel

The scope of Compute! Paris is a bit less centered on Python, but we still expect many Python-related presentations given the popularity of the language.

Note that we will organize neither a JupyterCon nor a PyData conference in Paris in 2026, so join us at Compute! Paris.

5 hours ago
Call for Proposals — Compute! Paris 2026
Submit your talk proposal for Compute! Paris 2026. The Call for Proposals is open from April 15th to May 24th, 2026.

The team behind JupyterCon 2023 and PyData Paris 2024 & 2025 is organizing a new conference, Compute! Paris 2026, on open source computation and data. The event will take place on November 25–26, 2026 at Sorbonne Université in Paris.

CfP deadline: May 24, 2026: compute.events/paris2026/cf...

5 hours ago

And here is the link to the colab notebook: colab.research.google.com/drive/1-FiOQ...

2 weeks ago

Here is the recording of the webinar I gave last week on GPU support in @scikit-learn.org and a comparison of a scikit-learn pipeline vs the TabICLv2 foundation model on a non-linear heteroscedastic quantile regression task.

app.livestorm.co/probabl/webi...

2 weeks ago

We will contrast pros and cons of both approaches.

Spoiler alert:

Manual pipelines are more scalable (faster to train and predict) on larger datasets but require more work (e.g. hyperparameter tuning), while TabICL works better on smaller datasets and yields good predictive performance out of the box.
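For context on how the two approaches get compared on such a task: both would be scored with the same quantile (pinball) loss at each quantile level. A minimal pure-Python sketch of that metric (scikit-learn exposes an equivalent as `sklearn.metrics.mean_pinball_loss`):

```python
def pinball_loss(y_true, y_pred, quantile):
    """Average pinball (quantile) loss, the metric a quantile
    regression model is scored on at a given quantile level."""
    total = 0.0
    for yt, yp in zip(y_true, y_pred):
        diff = yt - yp
        total += quantile * diff if diff >= 0 else (quantile - 1.0) * diff
    return total / len(y_true)

# At the 0.9 quantile, under-prediction costs 0.9 per unit of error
# while over-prediction costs only 0.1 per unit:
assert abs(pinball_loss([10.0], [8.0], 0.9) - 1.8) < 1e-9
assert abs(pinball_loss([10.0], [12.0], 0.9) - 0.2) < 1e-9
```

The asymmetry of this loss is what makes the fitted predictor target a specific conditional quantile rather than the conditional mean.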

3 weeks ago

Tomorrow I will give an online demo of the use of the Python array API to develop a non-linear regression pipeline with GPU acceleration and uncertainty quantification.

We will also introduce TabICLv2 and demo it on the same predictive tasks.

Register here:

www.linkedin.com/events/webin...

3 weeks ago
Chan Zuckerberg Initiative considers scikit-learn an Essential Open Source Software (authors: Guillaume Lemaitre, Lucy Liu)

Thanks to Dea María Léon for the PR and to the Chan Zuckerberg Initiative for the support.

blog.scikit-learn.org/funding/czi-...

1 month ago

The next scikit-learn release will allow inspecting the types and values of the attributes of fitted estimators, both in Jupyter notebooks and in example code rendered as HTML on sphinx-gallery-powered project websites.

scikit-learn.org/dev/auto_exa...

1 month ago

Super hyped that it's finally out!

2 months ago

It's perfect now. Thanks!

3 months ago

Thanks for sharing the blog post. However, it's a bit hard to read the text on a mobile device: one has to zoom and pan around. It would be nice to adopt a reflowing layout that adapts to small screen sizes instead.

3 months ago
Release Highlights for scikit-learn 1.8
We are pleased to announce the release of scikit-learn 1.8! Many bug fixes and improvements were added, as well as some key new features. Below we detail the highlights of this release. For an exha...

A new version of scikit-learn has been released 🥳 check out the highlights: scikit-learn.org/stable/auto_...

Thanks everyone who contributed to this release!

Let me know what you think of the experimental GPU support!

4 months ago
JupyterLab 4.5 and Notebook 7.5 are available!
JupyterLab 4.5 has been released! This new minor release of JupyterLab includes 51 new features and enhancements, 81 bug fixes, 44…

JupyterLab 4.5 and Jupyter Notebook 7.5 are here! 🎉

Highlights 🎁

- Enhanced notebook scrolling behavior
- Native audio and video support
- New Terminal search
- Debugger, Notebook and File Browser improvements

Check out the blog post to learn more!

blog.jupyter.org/jupyterlab-4...

4 months ago

Thanks for sharing. I would be very curious to see if LeJEPA can successfully pretrain good encoders for other input modalities with different kinds of spatial structures and signal smoothness assumptions (audio, time series, signals from robotic sensors, natural language...).

5 months ago

LeJEPA: a novel pretraining paradigm free of the (many) heuristics we relied on (stop-grad, teacher, ...)

- 60+ arch., up to 2B params
- 10+ datasets
- in-domain training (>DINOv3)
- corr(train loss, test perf)=95%

5 months ago

The Python Software Foundation was recommended for a $1.5M grant from the National Science Foundation. The terms of the award said PSF could not work on DEI, whether or not the grant funding was used for it.

PSF therefore declined the funding.

Science suffers, but commitment to core values remains

5 months ago
Release 0.6.2 · skrub-data/skrub
New features: The DataOp.skb.full_report() now displays the time each node took to evaluate. #1596 by Jérôme Dockès. The User guide has been reworked and expanded. Changes and deprecations: Ken em...

⚡ Release 0.6.2 is out ⚡

github.com/skrub-data/s...

6 months ago

I will speak about probabilistic regressions, @skrub-data.bsky.social and skore contributors will also present their libraries. Come join us!

6 months ago
Python Free-Threading Guide
The free-threading guide is a centralized collection of documentation and trackers around compatibility with free-threaded CPython for the Python open source ecosystem.

More info about free-threading here: py-free-threading.github.io
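For anyone wanting to check what they are running on: a small sketch that detects a free-threaded interpreter. It assumes CPython 3.13+ for `sys._is_gil_enabled()`; the `getattr` fallback keeps it runnable on regular GIL builds.

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 on free-threaded CPython builds (3.13+,
# configured with --disable-gil); sys._is_gil_enabled() then reports
# whether the GIL is actually active at runtime.
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()
print(f"free-threaded build: {free_threaded_build}; GIL enabled: {gil_enabled}")
```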

7 months ago

We set up some dedicated automated tests and discovered a bunch of thread-safety bugs, but they are now tracked by dedicated issues, and we have plans to fix them all, hopefully in time for 1.8.
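The general shape of such dedicated tests (not scikit-learn's actual test suite, just a stdlib sketch of the pattern) is to run the same computation serially and from many threads and compare the results:

```python
import threading

def fit_and_predict(seed):
    # Stand-in for the real work (e.g. cloning and fitting an
    # estimator); a thread-safety bug in shared state would make
    # results differ between serial and concurrent runs.
    acc = 0
    for i in range(1000):
        acc = (acc + seed * 31 + i) % 10007
    return acc

n_threads = 8
expected = [fit_and_predict(s) for s in range(n_threads)]  # serial reference

results = [None] * n_threads

def worker(s):
    results[s] = fit_and_predict(s)

threads = [threading.Thread(target=worker, args=(s,)) for s in range(n_threads)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# a thread-safety bug shows up as a mismatch with the serial run
assert results == expected
```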

7 months ago
MNT Mark cython extensions as free-threaded compatible by lesteve · Pull Request #31342 · scikit-learn/scikit-learn
Part of #30007. Cython 3.1 has been released on May 8 2025. Following scipy PR scipy/scipy#22658 to use the -Xfreethreading_compatible=True cython argument if cython >= 3.1. This cleans up the lock-fi...

scikit-learn 1.8 will be the first scikit-learn release with native extensions that are officially marked as free-threading compatible.

github.com/scikit-learn...

7 months ago

We’re happy to announce our Social Event, taking place on Tuesday 30th September at 6pm at the Cité des sciences. A perfect opportunity to unwind and connect with fellow attendees after a day of interesting talks!

pydata.org/paris2025/so...
pydata.org/paris2025/ti...

7 months ago

Looking forward to attending PyData Paris 2025! I will give a talk about probabilistic predictions for regression problems (I need to start working on my slides ;)

7 months ago

👋 JupyterLab and Jupyter Notebook users:

What's one thing you'd love to see improved in JupyterLab, Jupyter Notebook, or JupyterLite?

The team is prepping the upcoming 4.5/7.5 releases and wants to tackle some usability issues.

Drop your feedback below, this will help prioritize what gets fixed!👇

8 months ago
19.08.2025 — Predictive modeling for imbalanced classification using scikit-learn (YouTube video by EuroSciPy)

The video recording is already live!

www.youtube.com/live/jvyWTa1...

8 months ago

However, the Elkan 2001 post-hoc prevalence correction can be used for any (well-specified) probabilistic classifier, including gradient boosting classifiers, assuming the training set is a uniform sample of the population conditional on the class.

8 months ago

Interestingly, for logistic regression, this is equivalent to shifting the intercept by the difference of the logits of the prevalence of the positive class in the population and in the training set distributions, respectively.
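The intercept shift is just arithmetic; here is a minimal sketch with hypothetical prevalences (1% positives in the population, 50% in a rebalanced training set):

```python
from math import log

def logit(p):
    return log(p / (1.0 - p))

# hypothetical prevalences: 1% positives in the deployment population,
# 50% positives in a rebalanced training set
pi_pop, pi_train = 0.01, 0.5

# adding this amount to the intercept of a logistic regression trained
# on the rebalanced set recalibrates it for the population prevalence
intercept_shift = logit(pi_pop) - logit(pi_train)
assert intercept_shift < 0  # rare positives pull the intercept down
```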

8 months ago

Equivalently, we can append a monotonic post-hoc transformation to a naively trained classifier to obtain a prevalence-corrected classifier, as shown in Theorem 2 of cseweb.ucsd.edu/~elkan/resca...
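A sketch of that monotonic transformation, written here as a shift on the logit scale (which should be equivalent to the Theorem 2 formula); the prevalence values are hypothetical:

```python
from math import exp, log

def logit(p):
    return log(p / (1.0 - p))

def prevalence_correct(p, pi_train, pi_pop):
    """Monotonic post-hoc map from probabilities calibrated for the
    training prevalence to probabilities calibrated for the deployment
    prevalence (a logit shift by the difference of prevalence logits)."""
    shifted = logit(p) + logit(pi_pop) - logit(pi_train)
    return 1.0 / (1.0 + exp(-shifted))

# monotonic: the ranking of the predictions is preserved
probs = [0.2, 0.5, 0.8]
corrected = [prevalence_correct(p, pi_train=0.5, pi_pop=0.01) for p in probs]
assert corrected == sorted(corrected)
# a balanced-set prediction of 0.5 maps to the population prevalence
assert abs(prevalence_correct(0.5, pi_train=0.5, pi_pop=0.01) - 0.01) < 1e-9
```

Because the map is monotonic, ranking metrics such as ROC AUC are unchanged; only the calibration of the probabilities is corrected.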

8 months ago

In this case, we can use weight-based training to correct the model's probabilistic predictions to stay well calibrated with respect to the target deployment setting.
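One common way to build such weights (a sketch with hypothetical prevalence values, e.g. to pass as `sample_weight` at fit time for estimators that accept it):

```python
def prevalence_weights(y, pi_train, pi_target):
    """Per-sample weights that make a training set with positive-class
    prevalence pi_train behave like one with the target deployment
    prevalence pi_target."""
    w_pos = pi_target / pi_train
    w_neg = (1.0 - pi_target) / (1.0 - pi_train)
    return [w_pos if yi == 1 else w_neg for yi in y]

# toy labels: 50% positives in training, 10% positives at deployment
y = [1, 0, 1, 0]
w = prevalence_weights(y, pi_train=0.5, pi_target=0.1)
weighted_prev = sum(wi for wi, yi in zip(w, y) if yi == 1) / sum(w)
assert abs(weighted_prev - 0.1) < 1e-12  # weighted prevalence hits the target
```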

8 months ago

This problem typically happens when the class of interest (positive class) is so rare (medical screening, predictive maintenance, fraud detection...) that collecting training features for the negative cases in the correct proportion would be too costly (or even illegal/unethical).

8 months ago