Posts by

Data Science Weekly - Issue 647 Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Data Science Weekly - Issue 647, by @DataSciNews open.substack.com/pub/datascie...

4 days ago 0 0 0 0

If anyone knows any excellent and experienced data engineers who would be excited by a Richmond-based job in the cultural heritage sector, have them get in touch.

1 week ago 32 27 4 1
Workshop on AI/ML Applied Methods in Computational Social Sciences and Humanities Research

My department at the Barcelona Supercomputing Centre is organising this event on AI/ML for Computational Humanities and Social Sciences in September! Do submit an abstract if this is relevant to your work.
Deadline: 15th May
www.bsc.es/news/events/...

#DigitalArchaeology #DigitalHumanities #AI

1 week ago 3 1 0 0
Data Science Weekly - Issue 646 Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Data Science Weekly - Issue 646, by @DataSciNews open.substack.com/pub/datascie...

1 week ago 0 0 0 0
Stabilize Function Arguments A set of consistent, opinionated functions to quickly check function arguments, coerce them to the desired configuration, or deliver informative error messages when that is not possible.

As a user, I love it when #RStats functions feel like they know what I meant, even when I'm a bit sloppy. If it's looking for TRUE and I send in "true" from a variable somewhere upstream, often that's close enough... But not always. That's why I made {stbl} stbl.wrangle.zone. v0.3.0 is on CRAN today

2 weeks ago 24 2 2 0
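
The idea in the {stbl} post above (accept "true" where TRUE is expected, but fail with an informative message when the intent is unclear) can be sketched in a few lines. This is a Python illustration of the concept only, not the {stbl} API; the function name `to_bool_scalar` and the set of accepted spellings are assumptions made for the example.

```python
def to_bool_scalar(value, arg_name="value"):
    """Coerce `value` to a bool when the intent is unambiguous, else raise.

    Hypothetical helper for illustration; the accepted spellings are assumed.
    """
    # Real booleans pass straight through.
    if isinstance(value, bool):
        return value
    # Strings are accepted case-insensitively, ignoring surrounding spaces.
    if isinstance(value, str):
        normalized = value.strip().lower()
        if normalized in {"true", "t"}:
            return True
        if normalized in {"false", "f"}:
            return False
    # Only the numbers 0 and 1 are treated as close enough.
    if isinstance(value, (int, float)) and value in (0, 1):
        return bool(value)
    # Everything else gets an error that names the argument and the bad value.
    raise TypeError(
        f"`{arg_name}` must be coercible to True or False, not {value!r}."
    )


print(to_bool_scalar("true"))
print(to_bool_scalar(0))
```

Calling `to_bool_scalar("maybe", arg_name="verbose")` raises a `TypeError` that names the offending argument, which is the "informative error message" half of the idea.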

Also: lovely example of a web version of a paper, with a visual abstract, tabs for experiments, ...

1 week ago 5 1 0 0
Data Science Weekly - Issue 645 Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Data Science Weekly - Issue 645, by @DataSciNews open.substack.com/pub/datascie...

2 weeks ago 0 0 0 0
Generate Presentation Images Using Google Gemini Reproducibly generate slide images using Google Gemini. Define your images in a YAML configuration file with support for reference images and style defaults, eliminating the need to copy and paste pro...

New (mostly vibe coded) package to generate slide images using Gemini in a (mostly) reproducible way: hadley.github.io/bananarama/ I made this to scratch a personal itch; please let me know if you also find it useful #rstats

3 weeks ago 53 7 4 0
ggpup: place two random dog images next to any ggplot object Using web scraping, raster objects and gridExtra to add images to your plots.

As does ggpup: luisdva.github.io/rstats/ggpup/

3 weeks ago 10 3 0 1
Data Science Weekly - Issue 644 Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Data Science Weekly - Issue 644, by @DataSciNews open.substack.com/pub/datascie...

3 weeks ago 0 0 0 0
Data Science Weekly - Issue 643 Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Data Science Weekly - Issue 643, by @DataSciNews open.substack.com/pub/datascie...

1 month ago 0 0 0 0
1/5‼️Gigantic new #rayverse update + blog post! Introducing {skymodelr}: render 3D scenes in #RStats with realistically-lit skies 🌇 at a given location, using just a latitude, longitude, and a time.

Blog:
www.tylermw.com/posts/rayver...

Github:
github.com/tylermorganw...

Site:
www.skymodelr.com

1 month ago 49 12 3 2
Is AutoML Dead? Or is it just resting?

brief shout out to FOSS and Kaggle in this Thomas Dinsmore memoir
#rstats #databs
open.substack.com/pub/thomaswd...

1 month ago 1 1 0 0
survivalist: Probabilistic model-agnostic survival analysis using scikit-learn, glmnet, xgboost, lightgbm, pytorch, keras, nnetsauce and mlsauce

thierrymoudiki.github.io/blog/2024/12/15/python/a...

#Techtonique #DataScience #Python #rstats #MachineLearning

1 month ago 1 1 0 0

People really don't understand technology or how technology evolves.

1 month ago 1 0 0 0
Data Organization in Spreadsheets
Karl W. Broman
& Kara H. Woo
Pages 2-10 | Received 01 Jun 2017, Accepted author version posted online: 29 Sep 2017, Published online: 24 Apr 2018

    1. Introduction
    2. Be Consistent
    3. Choose Good Names for Things
    4. Write Dates as YYYY-MM-DD
    5. No Empty Cells
    6. Put Just One Thing in a Cell
    7. Make it a Rectangle
    8. Create a Data Dictionary
    9. No Calculations in the Raw Data Files
    10. Do Not Use Font Color or Highlighting as Data
    11. Make Backups
    12. Use Data Validation to Avoid Errors
    13. Save the Data in Plain Text Files

ABSTRACT

Spreadsheets are widely used software tools for data entry, storage, analysis, and visualization. Focusing on the data entry and storage aspects, this article offers practical recommendations for organizing spreadsheet data to reduce errors and ease later analyses. The basic principles are: be consistent, write dates like YYYY-MM-DD, do not leave any cells empty, put just one thing in a cell, organize the data as a single rectangle (with subjects as rows and variables as columns, and with a single header row), create a data dictionary, do not include calculations in the raw data files, do not use font color or highlighting as data, choose good names for things, make backups, use data validation to avoid data entry errors, and save the data in plain text files.

Every day is a good day for sharing one of the most useful papers about research data ever written. PLEASE get your people to understand and follow this advice.

www.tandfonline.com/doi/full/10....

1 month ago 1050 402 31 47
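
Several of the paper's recommendations (dates written as YYYY-MM-DD, no empty cells, data kept as a single rectangle) are mechanical enough to check automatically once the data is saved as plain text. The following is a minimal sketch using only the Python standard library; the function name `find_problems`, the sample data, and the column names are made up for illustration and do not come from the paper.

```python
import csv
import io
import re
from datetime import datetime

DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(text):
    """True only for zero-padded, valid YYYY-MM-DD dates."""
    if not DATE_SHAPE.fullmatch(text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def find_problems(csv_text, date_columns=()):
    """Return (row_number, column, message) tuples for a CSV given as text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    problems = []
    for row_number, row in enumerate(reader, start=2):  # the header is row 1
        # "Make it a rectangle": every row needs one cell per header column.
        if len(row) != len(header):
            problems.append(
                (row_number, None, "row is not the same width as the header")
            )
            continue
        for column, cell in zip(header, row):
            if cell.strip() == "":
                problems.append((row_number, column, "empty cell"))
            elif column in date_columns and not is_iso_date(cell):
                problems.append((row_number, column, "date is not YYYY-MM-DD"))
    return problems


# Hypothetical sample: row 3 has a US-style date and an empty cell,
# and row 4 is missing a column.
sample = "id,visit_date,weight\n1,2017-06-01,71.5\n2,6/1/2017,\n3,2017-06-03\n"
for problem in find_problems(sample, date_columns={"visit_date"}):
    print(problem)
```

A check like this complements, rather than replaces, the data validation features the paper recommends using at entry time.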

Amazing! Congrats! My preorder is IN! Looking forward to reading it :)

1 month ago 1 0 1 0
PDF of book proofs:
SEEING LIKE A SUPPLY CHAIN
The Hidden Life of Logistics
MIRIAM POSNER
Yale University Press
New Haven and London

*breathing into a paper bag*

1 month ago 1966 180 110 41
Data Science Weekly - Issue 642 Curated news, articles and jobs related to Data Science, AI, & Machine Learning

Data Science Weekly - Issue 642, by @DataSciNews open.substack.com/pub/datascie...

1 month ago 0 0 0 0
I’ve been using Claude Code to take care of scrappy data cleaning tasks for a while. These days though, I’m using Codex as my coding agent. Similar to what I did with Claude, I’ve been “fine-tuning” Codex CLI to work on a few different vaguely defined tasks like classification, voting, filtering, or ranking.

The pattern in this post works surprisingly well when you have the following conditions:

Loosely defined open-ended tasks. e.g., tagging tweets with a set of predefined labels, extracting structured information from a GitHub issue, …
Powerful agentic capabilities. Doing the task requires something more than a simple llm call or PydanticAI script. e.g., using gh api CLI to get the number of stars of a repository.
Structured outputs. You need a response in a certain shape! This is something codex exec can do that claude couldn’t and is really powerful. e.g., return exactly True or False and nothing else.
Save money! Unlike llm or other tools/libraries that require an OPENAI_API_KEY, Codex can use your ChatGPT subscription, making things “free”.

I've been using this pattern to "specialize" Codex for vaguely defined tasks like classification, filtering, soft sorting, ...

davidgasquez.com/specializing...

Made more than 10,000 invocations so far (reusing my ChatGPT subscription) and am really happy with the pattern!

5 months ago 10 1 0 2
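
The "return exactly True or False and nothing else" requirement in the post above is easiest to rely on when the caller also rejects anything that does not match. Below is a minimal sketch of that calling side. The actual Codex invocation is not shown in the post and is not reproduced here: `fake_agent` is a stand-in command written for this example, and `classify` and `parse_strict_bool` are hypothetical names.

```python
import subprocess
import sys


def parse_strict_bool(text):
    """Accept exactly 'True' or 'False' (ignoring surrounding whitespace)."""
    answer = text.strip()
    if answer == "True":
        return True
    if answer == "False":
        return False
    raise ValueError(f"expected exactly 'True' or 'False', got {answer!r}")


def classify(command, prompt):
    """Run `command` with `prompt` as its last argument and parse its stdout."""
    completed = subprocess.run(
        [*command, prompt],
        capture_output=True,
        text=True,
        check=True,  # raise if the command exits with a non-zero status
    )
    return parse_strict_bool(completed.stdout)


# Stand-in for a real agent CLI (an assumption for this example): a tiny
# Python one-liner that prints True when the prompt mentions "cat".
fake_agent = [sys.executable, "-c", "import sys; print('cat' in sys.argv[1])"]

print(classify(fake_agent, "a tweet about my cat"))
```

Swapping `fake_agent` for a real agent command keeps the same contract: one prompt in, one strictly validated boolean out, so thousands of invocations can be aggregated without manual cleanup.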
22 years of Brain Science: what CoSyNe tells us about the evolution of Neuroscience Tracking the intellectual DNA of Computational and Systems Neuroscience through its flagship meeting

I tracked every keyword in 22 years of Cosyne abstracts to map how computational neuroscience evolved — from Bayesian brains to neural manifolds to LLMs — and where it's heading next.

1 month ago 159 70 7 18

so hard to *really* comprehend

1 month ago 0 0 0 0
SORTEE Webinar: On what makes good sharable and reproducible R code, how to do it, and why it’s good for science For this month's SORTEE webinar, Dr. Dax Kellie from the Atlas of Living Australia will present on good, sharable, and reproducible R code in science

In one week I'll be talking about tips for reproducible R code and why science would love you to try these tips on your own code too 🧪😍🌏

It's an online talk, so feel free to watch comfortably from your couch. Hope to see you there!
@sortee.bsky.social #rstats

events.humanitix.com/sortee-webin...

1 month ago 58 29 2 1
I knew it. This confirms what I knew all my life. I may have Aphantasia (I do ...) but I see colors exceptionally well.

www.keithcirkel.co.uk/whats-my-jnd...

1 month ago 20 1 9 2
B12 3.0 A decade of helping customers build their home online

I've never built anything for a decade professionally, but here we are! blog.marcua.net/2026/03/12/b...

1 month ago 2 2 0 0

But is it far enough away to start running in the opposite direction? (or at least try to get behind some heavy-duty stuff?)

1 month ago 0 0 1 0

98 million views for a grapefruit video. (watched it twice).

Takeaway - the secret to life is caring more about something than anybody else.

1 month ago 2 0 0 0
In this month's blog post, we’ll explore how to add vector layers and a legend to a map with QGIS. Step by step here: www.miriam-lerma.com/blog/2026-03...

1 month ago 4 2 0 0
Claude Code isn’t going to replace data engineers (yet)

A new blog post! In which I discover that even Claude Code has its limits, certainly when it comes to replacing data engineers

👉🏻 rmoff.net/2026/03/11/c...

(There's also a companion post if you like poking around Claude session logs to see what it's up to: rmoff.net/2026/03/11/c...)

1 month ago 11 1 0 0
The figure shows the proportion of successful putts by distance (with the missing distances integrated out) and the putting probability by distance from geometrical model 2 (as presented in https://users.aalto.fi/~ave/casestudies/disc_putting/disc_putting.html), based on data for the top 33 PDGA MPO players.


I've made a geometrical model for disc golf putting with uncertainty in 2D angle and distance control.
Based on the model, the putting angle accuracies of top PDGA MPO and FPO players are about 1° and 1.4°, respectively. See more at users.aalto.fi/~ave/casestu...

1 month ago 23 3 3 0
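
To give a feel for what angle accuracies of about 1° and 1.4° mean at different distances, here is a deliberately simplified sketch. It is not the geometrical model 2 from the case study: it ignores distance control and treats the angular error as one-dimensional and normally distributed. The target half-width of 0.3 m is a hypothetical value chosen only for illustration.

```python
import math


def putt_probability(distance_m, angle_sd_deg, target_half_width_m=0.3):
    """P(|angular error| < angle subtended by half the target).

    Simplified angle-only sketch: the error is assumed normal with standard
    deviation `angle_sd_deg`; `target_half_width_m` is a hypothetical value.
    """
    if distance_m <= target_half_width_m:
        return 1.0
    # Largest angular miss that still hits the target at this distance.
    threshold = math.asin(target_half_width_m / distance_m)
    sigma = math.radians(angle_sd_deg)
    # For Z ~ N(0, 1): P(|Z| < a) = erf(a / sqrt(2)).
    return math.erf(threshold / (sigma * math.sqrt(2)))


for distance in (5, 10, 15, 20):
    p_1_0 = putt_probability(distance, angle_sd_deg=1.0)
    p_1_4 = putt_probability(distance, angle_sd_deg=1.4)
    print(f"{distance:>2} m  sd 1.0 deg: {p_1_0:.2f}  sd 1.4 deg: {p_1_4:.2f}")
```

Even in this stripped-down form, the probability falls with distance and the gap between the two accuracies widens as the target subtends a smaller angle; the full model in the case study adds uncertainty in distance control on top of this.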