Pete Bachant (@petebachant.me) Bsky

Connect a Jupyter Notebook to a kernel in a SLURM environment in VS Code - Calkit

This may be extremely niche, but if you need to run a Jupyter kernel in a SLURM job, e.g., to reserve a GPU, and connect it to a notebook in VS Code, here's a solution: docs.calkit.org/tutorials/vs...

#opensource #jupyter #hpc #cuda

1 week ago

The best time to automate your project is 10 months ago when you started. The second best time is now.

2 weeks ago

Version control - Calkit

One of the limitations of DVC is that it's relatively slow/inefficient working with large folders consisting of many small files. Here's a solution: docs.calkit.org/version-cont...

#dataversioncontrol #datascience #python

2 weeks ago

Add AI tools (adding virtual cognitive capacity), and things get way out of hand?

4 weeks ago

If so, giving a relatively simple problem to a highly capable team will result in over-engineering?

4 weeks ago

Is there a parallel to Conway's Law for cognitive load, i.e., that software complexity will increase to hit the cognitive load limit of the team that owns it regardless of the inherent complexity of the problem? #softwareengineering

4 weeks ago

Formalizing our commitment to code sharing. In support of open science, PLOS Biology routinely asks authors to openly share their research code before publication. We are now formalizing this practice with a mandatory code sharing policy and…

In support of #OpenScience, we routinely ask authors to openly share their #research #code before publication.

We are now formalizing this practice with a mandatory #CodeSharing policy and clarifying what we mean by code sharing.

1 month ago

Quick demo of something I've been working on: Execute some Python, R, Julia, MATLAB, and LaTeX scripts/notebooks/docs and they automatically become a DVC pipeline (DAG).

#opensource #reproducibility #datascience

1 month ago

GitHub - petebachant/calkit-xr-demo: A demo project showing how to automatically automate scientific workflows.

A demo of the new calkit xr ("execute-and-record") feature that will automatically turn a sequence of Python, R, Julia, or MATLAB scripts, notebooks, or LaTeX docs into a pipeline by simply running them. No need to learn Make, Snakemake, DVC, et al. Environment management included!
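To make the execute-and-record idea concrete, here is a minimal Python sketch of how a recorded sequence of script runs could be turned into a DAG. It is not Calkit's implementation: the `runs` records, the file names, and `build_dag` are hypothetical, and it assumes each run's file inputs and outputs have already been captured. It links each stage to the stages that produced its inputs, then uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to get a valid execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical records of script runs: which files each run read and wrote.
# In a real tool these would be captured while executing each script;
# here they are hard-coded for illustration.
runs = {
    "collect.py": {"inputs": [], "outputs": ["data/raw.csv"]},
    "process.R": {"inputs": ["data/raw.csv"], "outputs": ["data/clean.csv"]},
    "plot.jl": {"inputs": ["data/clean.csv"], "outputs": ["figures/fig1.png"]},
    "paper.tex": {"inputs": ["figures/fig1.png"], "outputs": ["paper.pdf"]},
}


def build_dag(runs):
    """Map each stage to the set of stages whose outputs it consumes."""
    # Remember which stage wrote each file.
    producer = {}
    for stage, io in runs.items():
        for path in io["outputs"]:
            producer[path] = stage
    # A stage depends on whichever stages produced its inputs.
    # Inputs with no recorded producer are treated as raw source files.
    return {
        stage: {producer[p] for p in io["inputs"] if p in producer}
        for stage, io in runs.items()
    }


dag = build_dag(runs)
# static_order() yields each stage only after all of its dependencies.
order = list(TopologicalSorter(dag).static_order())
print(order)  # ['collect.py', 'process.R', 'plot.jl', 'paper.tex']
```

Because the dependencies come from observed file I/O rather than the order the scripts happened to be run in, the same records would also tell a pipeline tool which downstream stages to rerun when an upstream file changes.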

1 month ago

The problem is making shipping a reproducible project pay off for the author. I also think it's pretty hard unless you're already SWE-inclined. Even so, many SWEs and data scientists don't ship reproducible projects either.

1 month ago

Why #openscience? Whether or not the input data and processes are described accurately in prose/mathematics doesn't matter if the code and data are included--those become the actual source of truth. A paper is almost never sufficient to describe what actually happened in the research.

1 month ago

Release v0.33.5 · calkit/calkit. Example use case: If you have a project that, e.g., uses a Conda environment defined in environment.yml and you can't switch to uv or pixi, you can achieve a similar experience (no need to create/ch...

If you're stuck with Conda (can't switch to uv or Pixi for whatever reason), here's a way to get a similar experience (no need to create/activate/update environments): github.com/calkit/calki...

2 months ago

I see you're still an open science hater, eh :) FWIW, re-collecting PIV images would be necessary for replication, not reproduction.

2 months ago

What do you do if you're reviewing a paper and the journal policy clearly states all code and data must be supplied, but the authors only share the source code--no input data, no configs, no run scripts, no plotting scripts?

#openscience #opendata #reproducibility

2 months ago

What goal would you orient it around?

2 months ago

Unlocking the benefits of transparent and reusable science for climate risk management | PNAS. People around the world seek climate risk information to guide their decisions. For instance, projections about future flood risk inform where hous...

This article focuses more on the positive potential of doing open science, but I find it very frustrating how common it is to see openness done in a half-baked, almost performative way.

doi.org/10.1073/pnas...

#openscience #opensource

2 months ago

Being able to reproduce a computational environment is not the same thing as being able to reproduce the science, but it is an important step.

2 months ago

Automate and orchestrate groups of Jupyter Notebooks with the Calkit JupyterLab extension. YouTube video by Calkit

Stop numbering your Jupyter Notebooks and running them one-by-one to deliver "final" results. Turn them into a reproducible pipeline instead: youtu.be/8q-nFxqfP-k

#openscience #reproducibility #datascience #jupyter

2 months ago

JupyterLab - Calkit

New JupyterLab extension just dropped! Manage environments and assemble your notebooks into a pipeline all in the UI.

Give it a try with:

uv tool install --upgrade calkit-python

or

pip install --upgrade calkit-python

docs.calkit.org/jupyterlab/

#openscience #reproducibility #jupyter

2 months ago

And they could collaborate without fragmenting their files all over the place, emailing them to each other, etc. This is almost possible today but it takes multiple general purpose tools/platforms.

2 months ago

What if we had an open-source, decentralized "GitHub for science"?

Instead of all the code needed to build the binary of an app or library, researchers would share all of the code, data, etc., necessary to build their papers.

#openscience #opensource

2 months ago

This principle can be applied to research as well. In short, start with one repo for grad school, for which the product is your thesis. If you create useful products along the way, you can split those off if necessary, but doing so preemptively will most likely slow you down.

2 months ago

Should you put your front end and back end in separate repos? Probably not. You should have at least one product per repo, and if the front end can't provide value on its own, it's not a product.

2 months ago

Before moving code away from other code, e.g., into a different module, package, repo, etc., make sure it's loosely coupled. If you can't loosely couple it, keep it close together.

#swe #softwareengineering #softwaredesign

2 months ago

A fake computational fluid dynamics visualization created with AI.

Check out this cool CFD simulation I did with my "in house code."

Just kidding. I faked this in 5 seconds using AI. Are journals requiring source code and reproducibility checks for all submissions yet?

#openscience #cfd #aislop

3 months ago

True or false: If your paper can't be reproduced by everyone on your team, it's an indication that it could have been done much more quickly.

#openscience #reproducibility

3 months ago

Open Science Trainings - NASA Science. Click on a tab below to learn about available open science training courses.

NASA's free course "Open Science 101" recently reopened for registration: science.nasa.gov/open-science...

#openscience

3 months ago

A comparison of research project management approaches showing the manual, closed approach versus the open single-button reproducible approach.

When I was in grad school and started learning software engineering, I ended up midway between these. I had multiple pipelines to run to build each paper instead of one, but there was no environment management or "staleness detection" for intermediate outputs. Working on making those easy/obvious.
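As a rough illustration of what "staleness detection" means, here is a minimal Python sketch, assuming a hypothetical JSON lock file per stage that maps each dependency path to a content hash (similar in spirit to the hashes DVC records in `dvc.lock`). A stage's outputs are considered stale when the lock file is missing or any dependency's current hash differs from the recorded one. The function names `is_stale` and `record` and the file names are made up for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_hash(path):
    """Return the MD5 hex digest of a file's contents."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def is_stale(deps, lock_path):
    """True if there is no lock file or any dependency's hash has changed."""
    lock = Path(lock_path)
    if not lock.exists():
        return True  # Never recorded, so the stage has to run.
    recorded = json.loads(lock.read_text())
    return any(recorded.get(str(d)) != file_hash(d) for d in deps)


def record(deps, lock_path):
    """Save current dependency hashes; call this after the stage runs."""
    hashes = {str(d): file_hash(d) for d in deps}  # JSON keys must be str
    Path(lock_path).write_text(json.dumps(hashes))


# Demo in a throwaway directory with a hypothetical stage script.
with tempfile.TemporaryDirectory() as tmp:
    dep = Path(tmp) / "process.py"
    lock = Path(tmp) / "stage.lock.json"
    dep.write_text("print('v1')\n")
    checks = [is_stale([dep], lock)]  # No lock yet: stale
    record([dep], lock)
    checks.append(is_stale([dep], lock))  # Unchanged: up to date
    dep.write_text("print('v2')\n")
    checks.append(is_stale([dep], lock))  # Script edited: stale again
print(checks)  # [True, False, True]
```

Hashing contents rather than comparing modification times means a file that is touched but not actually changed does not trigger a rerun.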

3 months ago

I'm still angry they renamed NREL. Stupid political nonsense.

3 months ago

Is ‘open science’ delivering benefits? Major study finds proof is sparse. It’s hard to measure social and economic impacts of making papers and data free, researchers say

Has "open science" even been attempted yet? About 10% of articles share code, and of those, maybe 10% share code and data in a way that actually runs. Virtually none are single-button reproducible. We have a long way to go.
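Taking the rough figures above at face value, the fraction of all articles that share code and data in a way that actually runs works out to about 1%:

```latex
\underbrace{0.10}_{\text{articles sharing code}} \times \underbrace{0.10}_{\text{of those, code and data that run}} = 0.01 \approx 1\%
```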

#openscience #reproducibility

3 months ago
