If anyone knows any excellent and experienced data engineers who would be excited by a Richmond-based job in the cultural heritage sector, have them get in touch.
My department at the Barcelona Supercomputing Centre is organising this event on AI/ML for Computational Humanities and Social Sciences in September! Do submit an abstract if this is relevant to your work.
Deadline: 15th May
www.bsc.es/news/events/...
#DigitalArchaeology #DigitalHumanities #AI
As a user, I love it when #RStats functions feel like they know what I meant, even when I'm a bit sloppy. If it's looking for TRUE and I send in "true" from a variable somewhere upstream, often that's close enough... But not always. That's why I made {stbl}: stbl.wrangle.zone. v0.3.0 is on CRAN today.
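A minimal sketch of the idea in the post above: accept input that is clearly "close enough" to a logical value, and fail loudly when it is not. This is Python, not R, and it does not reproduce the {stbl} API; the name `to_bool` and the accepted spellings are assumptions made for illustration.

```python
# Hypothetical helper, NOT part of {stbl}: lenient coercion to a boolean
# that still refuses anything ambiguous.
_TRUE = {"true", "t", "yes", "1"}
_FALSE = {"false", "f", "no", "0"}


def to_bool(value):
    """Return True/False for unambiguous inputs, raise ValueError otherwise."""
    if isinstance(value, bool):
        return value
    if isinstance(value, int) and value in (0, 1):
        return bool(value)
    if isinstance(value, str):
        key = value.strip().lower()
        if key in _TRUE:
            return True
        if key in _FALSE:
            return False
    # Anything else is "not close enough": report it instead of guessing.
    raise ValueError(f"cannot safely interpret {value!r} as a logical value")
```

With this sketch, `to_bool("true")` and `to_bool(" FALSE ")` are accepted, while `to_bool("maybe")` raises an error rather than silently picking a value.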
Also: Lovely example of a web version of a paper: visual abstract, tabs for experiments, ...
New (mostly vibe coded) package to generate slide images using Gemini in a (mostly) reproducible way: hadley.github.io/bananarama/ I made this to scratch a personal itch; please let me know if you also find it useful #rstats
1/5‼️Gigantic new #rayverse update + blog post! Introducing {skymodelr}: render 3D scenes in #RStats with realistically-lit skies 🌇 at a given location, using just a latitude, longitude, and a time.
Blog:
www.tylermw.com/posts/rayver...
Github:
github.com/tylermorganw...
Site:
www.skymodelr.com
brief shout out to FOSS and Kaggle in this Thomas Dinsmore memoir
#rstats #databs
open.substack.com/pub/thomaswd...
survivalist: Probabilistic model-agnostic survival analysis using scikit-learn, glmnet, xgboost, lightgbm, pytorch, keras, nnetsauce and mlsauce
thierrymoudiki.github.io/blog/2024/12/15/python/a...
#Techtonique #DataScience #Python #rstats #MachineLearning
People really don't understand technology or how technology evolves.
Data Organization in Spreadsheets Karl W. Broman & Kara H. Woo Pages 2-10 | Received 01 Jun 2017, Accepted author version posted online: 29 Sep 2017, Published online: 24 Apr 2018 1. Introduction 2. Be Consistent 3. Choose Good Names for Things 4. Write Dates as YYYY-MM-DD 5. No Empty Cells 6. Put Just One Thing in a Cell 7. Make it a Rectangle 8. Create a Data Dictionary 9. No Calculations in the Raw Data Files 10. Do Not Use Font Color or Highlighting as Data 11. Make Backups 12. Use Data Validation to Avoid Errors 13. Save the Data in Plain Text Files ABSTRACT Spreadsheets are widely used software tools for data entry, storage, analysis, and visualization. Focusing on the data entry and storage aspects, this article offers practical recommendations for organizing spreadsheet data to reduce errors and ease later analyses. The basic principles are: be consistent, write dates like YYYY-MM-DD, do not leave any cells empty, put just one thing in a cell, organize the data as a single rectangle (with subjects as rows and variables as columns, and with a single header row), create a data dictionary, do not include calculations in the raw data files, do not use font color or highlighting as data, choose good names for things, make backups, use data validation to avoid data entry errors, and save the data in plain text files.
Every day is a good day for sharing one of the most useful papers about research data ever written. PLEASE get your people to understand and follow this advice.
www.tandfonline.com/doi/full/10....
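Two of the paper's recommendations, "write dates as YYYY-MM-DD" and "no empty cells", are easy to check automatically once the data is saved as plain text. The sketch below is one possible check, not something taken from the paper; the function names, the column name `visit_date`, and the sample data are all made up for illustration. It assumes no cell contains an embedded newline, so that row position matches line number.

```python
import csv
import io
import re
from datetime import date

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(text):
    """True only for a real calendar date written exactly as YYYY-MM-DD."""
    if not ISO_DATE.fullmatch(text):
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


def check_rows(csv_text, date_column):
    """Return (line_number, column, problem) tuples for CSV data given as text."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for name, cell in row.items():
            if name is None:
                # DictReader stores surplus cells under the key None.
                problems.append((line_no, None, "more cells than headers"))
            elif cell is None or cell.strip() == "":
                problems.append((line_no, name, "empty cell"))
        raw = (row.get(date_column) or "").strip()
        if raw and not is_iso_date(raw):
            problems.append((line_no, date_column, "date is not YYYY-MM-DD"))
    return problems


# Made-up sample data: one badly formatted date and one empty cell.
sample = (
    "id,visit_date,weight\n"
    "1,2017-06-01,12.5\n"
    "2,6/1/2017,13.1\n"
    "3,2017-09-29,\n"
)
problems = check_rows(sample, "visit_date")
```

Here `problems` flags the date on line 3 and the empty `weight` cell on line 4.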
Amazing! Congrats! My preorder is IN! Looking forward to reading it :)
PDF of book proofs: SEEING LIKE A SUPPLY CHAIN The Hidden Life of Logistics MIRIAM POSNER Yale University Press New Haven and London
*breathing into a paper bag*
I’ve been using Claude Code to take care of scrappy data cleaning tasks for a while. These days though, I’m using Codex as my coding agent. Similar to what I did with Claude, I’ve been “fine-tuning” Codex CLI to work on a few different vaguely defined tasks like classification, voting, filtering, or ranking. The pattern in this post works surprisingly well when you have the following conditions: Loosely defined open-ended tasks. e.g., tagging tweets with a set of predefined labels, extracting structured information from a GitHub issue, … Powerful agentic capabilities. Doing the task requires something more than a simple llm call or PydanticAI script. e.g., using gh api CLI to get the number of stars of a repository. Structured outputs. You need a response in a certain shape! This is something codex exec can do that claude couldn’t and is really powerful. e.g., return exactly True or False and nothing else. Save money! Unlike llm or other tools/libraries that require an OPENAI_API_KEY, Codex can use your ChatGPT subscription, making things “free”.
I've been using this pattern to "specialize" Codex for vaguely defined tasks like classification, filtering, soft sorting, ...
davidgasquez.com/specializing...
Made more than 10,000 invocations so far (reusing my ChatGPT subscription) and am really happy with the pattern!
I tracked every keyword in 22 years of Cosyne abstracts to map how computational neuroscience evolved — from Bayesian brains to neural manifolds to LLMs — and where it's heading next.
so hard to *really* comprehend
In one week I'll be talking about tips for reproducible R code and why science would love you to try these tips on your own code too 🧪😍🌏
It's an online talk, so feel free to watch comfortably from your couch. Hope to see you there!
@sortee.bsky.social #rstats
events.humanitix.com/sortee-webin...
I knew it. This confirms what I knew all my life. I may have Aphantasia (I do ...) but I see colors exceptionally well.
www.keithcirkel.co.uk/whats-my-jnd...
I've never built anything for a decade professionally, but here we are! blog.marcua.net/2026/03/12/b...
But is it far enough away to start running in the opposite direction? (or at least try to get behind some heavy-duty stuff?)
98 million views for a grapefruit video. (watched it twice).
Takeaway - the secret to life is caring more about something than anybody else.
In this month's blog post, we’ll explore how to add vector layers and a legend to a map with QGIS. Step by step here: www.miriam-lerma.com/blog/2026-03...
A new blog post! In which I discover that even Claude Code has its limits, certainly when it comes to replacing data engineers
👉🏻 rmoff.net/2026/03/11/c...
(There's also a companion post if you like poking around Claude session logs to see what it's up to: rmoff.net/2026/03/11/c...)
The figure shows the proportion of successful putts by distance (with the missing distances integrated out) and the putting probability by distance from geometrical model 2 (as presented in https://users.aalto.fi/~ave/casestudies/disc_putting/disc_putting.html), based on data for the top 33 PDGA MPO players.
I've made a geometrical model for disc golf putting with uncertainty in 2D angle and distance control.
Based on the model, the putting angle accuracies of top PDGA MPO and FPO players are about 1° and 1.4°, respectively. See more at users.aalto.fi/~ave/casestu...
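To get a feel for what an angle accuracy of about 1° versus 1.4° means, here is a deliberately simplified, angle-only sketch. It is not the model from the case study, which also accounts for distance control and works with a 2D angle. It assumes the angular error is normal with standard deviation `sigma_deg`, and that a putt succeeds when the throw line passes within a target half-width of the basket centre; the default of 0.3 m is a hypothetical value chosen only for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def putt_probability(distance_m, sigma_deg, target_half_width_m=0.3):
    """Angle-only success probability (simplified, hypothetical parameters).

    A putt counts as made when the absolute angular error is smaller than
    asin(target_half_width_m / distance_m).
    """
    if distance_m <= target_half_width_m:
        return 1.0
    threshold = math.asin(target_half_width_m / distance_m)  # radians
    sigma = math.radians(sigma_deg)
    return 2.0 * normal_cdf(threshold / sigma) - 1.0


# Compare the two angle accuracies quoted in the post at the same distance.
p_one_degree = putt_probability(10.0, 1.0)
p_one_point_four = putt_probability(10.0, 1.4)
```

Under these assumptions the smaller angular error always gives the higher success probability at a given distance, and the gap widens as the distance grows.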