New edudata blog post: embracing a Unix-like philosophy toward data
www.edudata.blog/em/
#edusky #databs
Posts by Eric Ekholm
Counselor: These feelings of "everything sucks," how long have you had them? Butt-head: Uhh, since everything started to suck, I guess.
Here it is, your moment of zen.
No matter how good I think my code is at the time of writing, I’m almost always repulsed by it 6 months later.
The more I code in languages other than #rstats, the less I like R’s preference for implicit returns. Sure, you can save yourself a line of code, but the explicit return just feels better.
New edudata blog post: being “data driven” vs being “data informed”
#edusky #databs
www.edudata.blog/being-data-i...
Most data science work is just asking SMEs to help me understand their data/help me make sure I’m filtering the data correctly.
I just pushed v0.1.0 of my {blueycolors} #rstats package to GitHub.
Check it out if you’re interested in using Bluey-themed colors in ggplot:
ekholme.github.io/blueycolors/
After a few-week hiatus, edudata is back with another post. This one reviews Virginia’s fall-to-spring VALLSS growth report and offers some critiques about measuring student growth using ordinal categories:
www.edudata.blog/virginias-va...
#dataBS #edusky
I just learned about “doubleML models.” Is this just data science/ML people trying to reinvent causal inference with the worst name possible?
#rstats #datascience
This week’s Edudata blog post is on the narrative fallacy, school improvement, and root cause analysis
www.edudata.blog/the-narrativ...
#education
I’ve been doing stats/data stuff “seriously” for about 10 years, and I still think bootstrapping is voodoo.
It’s the “ONE WEIRD TRICK” of statistics.
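The “one weird trick” is less mysterious in code: resample the observed data with replacement, recompute the statistic each time, and read an interval off the resulting distribution. Below is a minimal percentile-bootstrap sketch in Python (standard library only). The function name `bootstrap_ci`, the 5,000 resamples, the seed, and the scores are all hypothetical, chosen just for illustration.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=5000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for stat(sample)."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    n = len(sample)
    # The whole trick: resample the observed data WITH replacement (same size n)
    # and recompute the statistic, many times over.
    boot_stats = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_resamples))
    # Use the middle (1 - alpha) share of that bootstrap distribution as the interval.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return boot_stats[lo_idx], boot_stats[hi_idx]


scores = [72, 85, 90, 64, 78, 88, 95, 70, 81, 77]  # hypothetical test scores
low, high = bootstrap_ci(scores)
print(f"mean = {statistics.mean(scores):.1f}, 95% CI ~ ({low:.1f}, {high:.1f})")
```

The percentile interval is the simplest variant; it can undercover with small samples or skewed statistics, which is where refinements such as BCa intervals are usually recommended.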
Casio
Tweet: Stats twitter: “All statistical methods fail horribly without telling you and you shouldn’t trust any research ever. fml.” Data science twitter: “Here are NINE SIMPLE WAYS to convert your data into ACTIONABLE INSIGHTS using POWERFUL AI. Number 3 will astound you!! I’m on a boat”
Among my favorites, from @cameronpat.bsky.social
New edudata blog post: alternatives to stacked bar charts, and some thoughts on tradeoffs when visualizing data
#dataBS #rstats #edusky
www.edudata.blog/stacked-bar-...
Yeah, true. I put a literal bet on Pope Pizzaballa, so that’s just my copium.
Here’s my copium: if he had been elected pope, he’d have adopted a papal name that wasn’t Pizzaballa. At least now, we still have Cardinal Pizzaballa.
New edudata blog post: In which I self-consciously offer a framework for data-driven decision making
#datascience #edusky
www.edudata.blog/some-hopeful...
Maybe plotnine is more suspect. As a primarily #rstats user, I find seaborn nicer than matplotlib, and it’s still mainstream enough in the Python community. But I also only use it for EDA stuff, and when I need to make something polished, I use ggplot.
I don’t think so. Polars seems to be gaining “share” in code, and sklearn supports Polars DataFrames for all (most?) operations. And if the concern is learning from others’ code, LLMs offer an alternative: in my experience, they’re great at producing Polars code.
TFW the Google Gemini VS Code extension just writes your #rstats function documentation for you
New edudata blog post: what can educators learn from luxury brands like Hermes?
#edusky #dataBS
www.edudata.blog/luxury-goods...
I keep coming across new lists of education research (and adjacent) job opportunities so if you are looking for work, check out these links. 👇
docs.google.com/spreadsheets...
docs.google.com/spreadsheets...
docs.google.com/spreadsheets...
www.purposephilcareer.com
Most importantly, you’re using the correct (ISO 8601) date format 🙃
New edudata post: do we really need “big data” to tell us obvious things?
www.edudata.blog/i-feel-like-...
Yeah, good point. Having to dig in a level to pull out the dataframe could be tedious for end users.
Thanks!
I’m writing an #rstats package to work with data extracts (CSVs) from a specific source. I mostly want a set of functions that will do data-frame-like operations on these extracts. My inclination is to create a new class that’s a thin wrapper around the data (formatted as a dataframe)…
So I can ensure that the objects passed to my functions are appropriate. But it also feels like I’m adding an extra layer that’s maybe not necessary? Thoughts on this vs just passing in a dataframe and checking against, say, column names?
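One way to see the tradeoff: both options rely on the same column check, and the wrapper simply runs it once, at construction, so downstream functions only need a type check. Here is a minimal Python sketch that uses the standard library’s `csv` module in place of a real dataframe; `Extract`, `check_columns`, `mean_score_by_school`, and the column names are all hypothetical.

```python
import csv
import io

# Hypothetical schema for the extract; a real package would list the source's columns.
REQUIRED_COLUMNS = {"student_id", "school", "score"}


def check_columns(columns):
    """Option B: call this at the top of every function that takes a plain table."""
    missing = REQUIRED_COLUMNS - set(columns)
    if missing:
        raise ValueError(f"extract is missing columns: {sorted(missing)}")


class Extract:
    """Option A: a thin wrapper that validates once, when the object is built."""

    def __init__(self, rows, columns):
        check_columns(columns)
        self.rows = rows
        self.columns = list(columns)

    @classmethod
    def from_csv(cls, text):
        reader = csv.DictReader(io.StringIO(text))
        rows = list(reader)  # reading the rows also populates reader.fieldnames
        return cls(rows, reader.fieldnames or [])


def mean_score_by_school(extract):
    """Downstream functions only check the type; columns were checked at construction."""
    if not isinstance(extract, Extract):
        raise TypeError("expected an Extract; build one with Extract.from_csv()")
    sums = {}
    for row in extract.rows:
        total, count = sums.get(row["school"], (0.0, 0))
        sums[row["school"]] = (total + float(row["score"]), count + 1)
    return {school: total / count for school, (total, count) in sums.items()}


extract = Extract.from_csv("student_id,school,score\n1,A,80\n2,A,90\n3,B,70\n")
print(mean_score_by_school(extract))  # {'A': 85.0, 'B': 70.0}
```

In R, a common way to keep such a wrapper thin is an S3 class that leaves the data frame classes underneath, e.g. `structure(df, class = c("my_extract", class(df)))` (with `my_extract` a hypothetical name), so the object still works with data-frame functions and users don’t have to dig in a level to get the data out.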
For my educator friends in the RVA area, VCU is hosting (and I’m helping run) a free event on AI in education on 4/16.