
Posts by Crystal Lewis

Information manipulation is going to get cheaper much faster than interacting with the world will. I expect that this will lead to the bottleneck in research shifting from production to consumption. We should be preparing for this shift now by building infrastructure that allows us to adjudicate claims and filter work in a world where production is cheap. The current journal system is not well positioned to do this, and so we should be experimenting with new institutions that can.


I'll be at a workshop organized by @rohanalexander.bsky.social next week on how AI will change quantitative social science. This is my short paper, and this is the key argument. ryancbriggs.net/blog/as-ai-l...

4 hours ago 23 7 4 1

Omg the #KuchiKopi meme for @datarescueproject.org was picked up by @cghlewis.bsky.social for her AMAZING weekly newsletter. Crystal is the Data Meme Icon, so this is an honor.

Check out this week's newsletter and subscribe. rdmweekly.substack.com/p/rdm-weekly...

8 hours ago 11 4 1 0

Such a great meme!

8 hours ago 1 0 0 0
RDM Weekly - Issue 041: A weekly roundup of Research Data Management resources.

Issue 41 of #rdmweekly is out! 📬

➡️ Scrapers, Pipelines, and AI, Oh My! Tools to Accelerate Data Rescue - Workshop @datarescueproject.org
➡️ How to…Write a Lab Handbook @mehr.nz
➡️ SQL Murder Mystery
➡️ It Depends - A Comical Data Steward Replacement
and more!

rdmweekly.substack.com/p/rdm-weekly...

9 hours ago 13 7 0 0

I love the idea that LLMs may wind up forcing people to structure data in more considered, more usable ways because they're more motivated to do that work so they can *feed it to a machine* rather than caring enough about the humans who were the consumers of the same datasets in the past.

1 day ago 19 6 4 1

Cockroaches
Logistic regression
SQL

When the world is plunged into nuclear winter, these will be the survivors.

1 day ago 27 4 1 0
🤔 Week Recap re: "Building AI-Ready Open Data" 👉 Last week I had the pleasure of moderating a conversation between Luke Keller (Unanet and former U.S. Census Bureau) and Jose M. Plehn, Ph.D. (BrightQu...

Somehow I knew I messed up the link: www.linkedin.com/posts/stefaa...

1 day ago 0 0 0 0

Key takeaways from @sverhulst.bsky.social from last week's webinar on "Building AI-Ready Open Data".

www.linkedin.com/posts/stefaa...

1 day ago 11 1 1 0

You had me at data cleaning and data dictionaries. I'll be there. 🌟

1 day ago 11 0 0 0
Event graphic: "Introducing Posit AI for Positron and RStudio" — Demo + Live Q&A on April 29th at 11AM ET. Features logos for Posit AI, Positron, and RStudio on a dark navy background.


Posit AI is here — live inside Positron and RStudio. Come see it in action in a hands-on workflow demo. Real R & Python. No fluff.
🗓 April 29 · 11am EDT · Register → events.zoom.us/ev/Ajss5j9Ve...
#rstats #python

4 days ago 12 3 1 0

😅 For real, you can find some thoughts here: datamgmtinedresearch.com/style#style-...

5 days ago 0 0 0 0
Image of Alan Moore with long hair and a bushy beard, with the words "Writer/Wizard/Mall Santa/Rasputin Impersonator"


Data Manager

Seeker of Truth/Organizer of Chaos/Finder of Errors/Occasional Hair-Puller/Frequent Screen-Curser/Resolver of Data Disasters/Destroyer of "Data Final_Final For Real.csv" Naming Conventions

5 days ago 30 9 2 0

Data quality is another one that is mentioned a lot. I sure hope we were doing data quality checks before AI. 😳

5 days ago 4 0 0 0

I want to see this rant post 😂

5 days ago 2 0 0 0

Maybe along the lines of what Laurence said, AI is bringing these things to light finally, so that people are starting to see their importance now?

5 days ago 2 0 1 0

I've read a lot of stuff from Stefaan Verhulst which is good food for thought.

But for CSV files being shared in public repositories by individual researchers, which is what I am interested in, I am finding a lot of the usual best practices.

6 days ago 1 0 0 0

The shared images come from the RC_SFA_AI_Ready_Approach_and_Guidance.pdf document provided here:

data.ess-dive.lbl.gov/view/doi:10....

6 days ago 1 0 0 0

I've been reading a lot of papers on making data AI-ready and what I'm finding is that most of them report practices that were best practices before AI.

Good documentation, standardization, interoperable formats, clear naming conventions, etc.

Is there something new we should be doing though?

6 days ago 44 6 10 2
R Environments, Scoping, and Evaluation With Some Applications

Be less confused, but still ignorant (like me!)

onlinelibrary.wiley.com/doi/epdf/10....

1 week ago 6 3 2 0

More tips for working with labelled data: github.com/Cghlewis/dat...

6 days ago 1 0 0 1

For those who still work with SPSS files, but want to import them into #rstats

If you want to retain user-defined missing values in your data (e.g., -98 = refused), don't forget to add the "user_na = TRUE" argument if you are using haven::read_sav(). Otherwise, user-defined missing values import as NA.
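
To make the difference concrete, here is a minimal sketch of the two import behaviors. It is plain Python and does not call haven; the function `import_column`, the `USER_MISSING` codebook, and the -99 code are hypothetical names and values chosen only for illustration (the post itself mentions just -98 = refused).

```python
# Illustrative only: plain Python, not haven. All names below are hypothetical.
# Assumed codebook: -98 comes from the post; -99 is an invented second code.
USER_MISSING = {-98: "refused", -99: "don't know"}


def import_column(values, user_na=False):
    """Mimic the two import behaviors described in the post.

    user_na=False: user-defined missing codes collapse to None (like NA).
    user_na=True:  the codes are kept, so the reason for missingness survives.
    """
    if user_na:
        return list(values)
    return [None if v in USER_MISSING else v for v in values]


raw = [3, -98, 5, -99, 1]

collapsed = import_column(raw)                # codes are lost
retained = import_column(raw, user_na=True)   # codes are kept

# With the codes retained, missingness can still be broken down by reason.
reasons = [USER_MISSING[v] for v in retained if v in USER_MISSING]
```

Once the codes have collapsed to a single missing marker, "refused" and "don't know" can no longer be told apart, which is why the argument matters at import time rather than later.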

6 days ago 21 3 1 0

Data Management Plan Mad Lib 😄
cghlewis.github.io/data_mgmt_ma...

6 days ago 1 1 0 0
RDM Weekly - Issue 040: A weekly roundup of Research Data Management resources.

Issue 40 of #rdmweekly is out! 📬

➡️ Reproducible R Code @daxkellie.bsky.social @sortee.bsky.social
➡️ Generating Universes Within Universes with a Single Seed @andrew.heiss.phd
➡️ AEA Replication Tracker @paulgp.com
➡️ Informed Consent Template
and more!

rdmweekly.substack.com/p/rdm-weekly...

1 week ago 15 3 0 1

I’m running the NYC Marathon 🍎! This is my first charity run, and I’m proud to support the Marfan Foundation in honor of James (in the sweet Nirvana shirt).

Marfan affects the body’s "glue" (connective tissue). Help me hit $4k for life-saving research: give.marfan.org/ryanj

1 week ago 5 1 2 2
Post image
1 week ago 118 21 2 2
Data Quality Engineer (Data Platform QA / CDK / CI-CD)

Data quality engineer focused on pipeline reliability and automated validation, with strong experience integrating data testing into CI/CD workflows and IaC environments.

Nice-to-Have Certifications

    AWS Certified Data Analytics -- Specialty
    AWS Certified Developer -- Associate

Must-Have Signals

    Testing data pipelines (ETL/ELT), not UI applications
    Strong SQL for data validation (reconciliation, joins, anomalies, freshness)
    Automated tests in Python (pytest or similar)
    Test integration into CI/CD pipelines
    Familiarity with dbt tests, schema validation, or data quality frameworks
    Exposure to CDK / IaC-driven environments where infrastructure and pipelines are deployed via code


Second: Data Quality Engineer/MDM

1 week ago 1 2 0 0

One of my favorites. So sad though.

1 week ago 0 0 0 0
Clean and Simple Argument Checking Checks function arguments, ideally for use in R packages. Uses a simple interface and produces clean, informative error messages using cli.

arg! I wish R packages had better error messages!

Now they can, thanks to my newest #Rstats package, {arg}! 😉

{arg} produces clean, simple error messages for checking function arguments, similar to {checkmate}, {dreamerr}, and {chk}, using {cli} formatting.

1 week ago 66 13 5 0

Today! Free and online.

1 week ago 7 2 0 0

Thanks, Randy!

1 week ago 0 0 0 0