This may be extremely niche, but if you need to run a Jupyter kernel in a SLURM job, e.g., to reserve a GPU, and connect it to a notebook in VS Code, here's a solution: docs.calkit.org/tutorials/vs...
#opensource #jupyter #hpc #cuda
Posts by Pete Bachant
The best time to automate your project is 10 months ago when you started. The second best time is now.
One of the limitations of DVC is that it's relatively slow/inefficient working with large folders consisting of many small files. Here's a solution: docs.calkit.org/version-cont...
#dataversioncontrol #datascience #python
Add AI tools (adding virtual cognitive capacity), and things get way out of hand?
If so, giving a relatively simple problem to a highly capable team will result in over-engineering?
Is there a parallel to Conway's Law for cognitive load, i.e., that software complexity will increase to hit the cognitive load limit of the team that owns it regardless of the inherent complexity of the problem? #softwareengineering
In support of #OpenScience, we routinely ask authors to openly share their #research #code before publication.
We are now formalizing this practice with a mandatory #CodeSharing policy and clarifying what we mean by code sharing.
Quick demo of something I've been working on: Execute some Python, R, Julia, MATLAB, and LaTeX scripts/notebooks/docs and they automatically become a DVC pipeline (DAG).
#opensource #reproducibility #datascience
A demo of the new calkit xr ("execute-and-record") feature that will automatically turn a sequence of Python, R, Julia, or MATLAB scripts, notebooks, or LaTeX docs into a pipeline by simply running them. No need to learn Make, Snakemake, DVC, et al. Environment management included!
The problem is making shipping a reproducible project pay off for the author. I also think it's pretty hard unless you're already SWE-inclined. Even still, many SWEs and data scientists don't ship reproducible projects either.
Why #openscience? Whether or not the input data and processes are described accurately in prose/mathematics doesn't matter if the code and data are included--those become the actual source of truth. A paper is almost never sufficient to describe what actually happened in the research.
If you're stuck with Conda (can't switch to uv or Pixi for whatever reason), here's a way to get a similar experience (no need to create/activate/update environments): github.com/calkit/calki...
I see you're still an open science hater eh :) FWIW, re-collecting PIV images would be necessary for replication, not reproduction.
What do you do if you're reviewing a paper and the journal policy clearly states all code and data must be supplied, but the authors only share the source code--no input data, no configs, no run scripts, no plotting scripts?
#openscience #opendata #reproducibility
What goal would you orient it around?
This article focuses more on the positive potential of doing open science, but I find it very frustrating how common it is to see openness done in a half-baked, almost performative way.
doi.org/10.1073/pnas...
#openscience #opensource
Being able to reproduce a computational environment is not the same thing as being able to reproduce the science, but it is an important step
Stop numbering your Jupyter Notebooks and running them one-by-one to deliver "final" results. Turn them into a reproducible pipeline instead: youtu.be/8q-nFxqfP-k
#openscience #reproducibility #datascience #jupyter
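A rough sketch of the idea, not how Calkit or DVC actually implements it: instead of encoding run order in filename numbers, declare which stage depends on which and let a topological sort work out the order. The stage names below are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical stages: each maps to the set of stages whose outputs it reads.
stages = {
    "collect-data": set(),
    "clean-data": {"collect-data"},
    "fit-model": {"clean-data"},
    "plot-figures": {"clean-data", "fit-model"},
}


def run_order(deps):
    """Return the stages in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(stages)
print(order)
```

A real pipeline tool would also track each stage's input and output files so it can skip stages that haven't changed.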
New JupyterLab extension just dropped! Manage environments and assemble your notebooks into a pipeline all in the UI.
Give it a try with:
uv tool install --upgrade calkit-python
or
pip install --upgrade calkit-python
docs.calkit.org/jupyterlab/
#openscience #reproducibility #jupyter
And they could collaborate without fragmenting their files all over the place, emailing them to each other, etc. This is almost possible today but it takes multiple general purpose tools/platforms.
What if we had an open-source, decentralized "GitHub for science"?
Instead of all the code needed to build the binary of an app or library, researchers would share all of the code, data, etc., necessary to build their papers.
#openscience #opensource
This principle can be applied to research as well. In short, start with one repo for grad school, for which the product is your thesis. If you create useful products along the way, you can split those off if necessary, but doing so preemptively will most likely slow you down.
Should you put your front end and back end in separate repos? Probably not. You should have at least one product per repo, and if the front end can't provide value on its own, it's not a product.
Before moving code away from other code, e.g., into a different module, package, repo, etc., make sure it's loosely coupled. If you can't loosely couple it, keep it close together.
#swe #softwareengineering #softwaredesign
A fake computational fluid dynamics visualization created with AI.
Check out this cool CFD simulation I did with my "in house code."
Just kidding. I faked this in 5 seconds using AI. Are journals requiring source code and reproducibility checks for all submissions yet?
#openscience #cfd #aislop
True or false: If your paper can't be reproduced by everyone on your team, it's an indication that it could have been done much more quickly.
#openscience #reproducibility
NASA's free course "Open Science 101" recently reopened for registration: science.nasa.gov/open-science...
#openscience
A comparison of research project management approaches showing the manual, closed approach versus the open single-button reproducible approach.
When I was in grad school and started learning software engineering, I got midway between these. I had multiple pipelines instead of one to run to build each paper, and there was no environment management or "staleness detection" for intermediate outputs. Working on making those easy/obvious.
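For anyone wondering what "staleness detection" means in practice, here's a minimal sketch under my own assumptions (hypothetical function and file names, not any particular tool's implementation): hash a stage's inputs, save the hashes after a successful run, and treat the output as stale if it's missing or the hashes no longer match.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def hash_inputs(inputs):
    """Map each input path to the SHA-256 digest of its contents."""
    return {
        str(p): hashlib.sha256(Path(p).read_bytes()).hexdigest() for p in inputs
    }


def is_stale(inputs, output, lock_path):
    """True if the output is missing or any input changed since the last record."""
    lock_path = Path(lock_path)
    if not Path(output).exists() or not lock_path.exists():
        return True
    recorded = json.loads(lock_path.read_text())
    return recorded != hash_inputs(inputs)


def record_run(inputs, lock_path):
    """Save the current input hashes after a successful run."""
    Path(lock_path).write_text(json.dumps(hash_inputs(inputs)))


def demo():
    """Walk through never run -> up to date -> input edited."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw.csv"
        fig = Path(tmp) / "figure.txt"
        lock = Path(tmp) / "stage.lock.json"
        raw.write_text("x,y\n1,2\n")
        results.append(is_stale([raw], fig, lock))  # never run
        fig.write_text("pretend this is a figure")
        record_run([raw], lock)
        results.append(is_stale([raw], fig, lock))  # up to date
        raw.write_text("x,y\n1,3\n")
        results.append(is_stale([raw], fig, lock))  # input edited
    return results


print(demo())
```

Hashing file contents instead of comparing timestamps means that touching a file without changing it doesn't trigger a rerun.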
I'm still angry they renamed NREL. Stupid political nonsense.
Has "open science" even been attempted yet? About 10% of articles share code, and of those, maybe 10% share code and data in a way that actually runs. Virtually none are single-button reproducible. We have a long way to go.
#openscience #reproducibility