Sarah Krasnik Bedell (@sarahkb) Bsky

my takeaway today: I have a masculine tone 😂

1 year ago 1 0 0 0

after a few failed attempts at tone editing, this was unfortunately effective

1 year ago 4 0 2 0

Yeah that's what I'm thinking - pile definitely bad. Single feels so convenient, but wondering if that should be done or not. To be clear, I do it all the time too for convenience

1 year ago 1 0 0 0

Real talk: when is try/except to suppress an exception lazy (compared to checking the upstream data) vs the most efficient implementation?

I've done this plenty of times to only call out an API once instead of twice, but now looking back it feels like bad practice

#dataBS
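
To make the tradeoff concrete, here's a minimal sketch of both styles. `FakeApi`, `fetch_lbyl`, and `fetch_eafp` are hypothetical names made up for illustration (not any real client library), and the sketch assumes a missing record surfaces as a `KeyError`.

```python
class FakeApi:
    """Hypothetical stand-in for a remote API; counts round trips."""

    def __init__(self, records):
        self.records = records
        self.calls = 0

    def exists(self, key):
        self.calls += 1
        return key in self.records

    def get(self, key):
        self.calls += 1
        return self.records[key]  # raises KeyError when the record is missing


def fetch_lbyl(api, key):
    """Check upstream first, then fetch: two calls when the record exists."""
    if not api.exists(key):
        return None
    return api.get(key)


def fetch_eafp(api, key):
    """Fetch directly and catch only the expected error: one call either way."""
    try:
        return api.get(key)
    except KeyError:
        return None
```

The single, narrow `except KeyError` is the convenient case. It starts to feel lazy when the `except` is broad enough to also hide failures you didn't expect, like auth errors or timeouts.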

1 year ago 4 0 2 0

Yeah I used (or tried to use) Airflow but it kept eating so much RAM I could not do anything else on my laptop
Switched to prefect and never looked back
I wish it was more widespread in the industry

1 year ago 4 2 0 0

we live in the land where bon jovi and pitbull can do a collab for a new rendition of now or never. the american empire not only will last 1000 years, but it deserves to.

1 year ago 7 1 1 0

Alright #dataBS: for anyone using Databricks -

What do you mostly use it for? What made you choose the tool? Where do you find it solving your problems most?

1 year ago 6 3 4 1

If you're trying to grow as an IC: data engineering (requirement for every other data function)

If you're trying to run a data team: analytics (learning to work with stakeholders)

Or, use data as a gateway to learning and evolve your career once again

1 year ago 2 0 0 0

I guess what I mean is, OL is a framework, but relies on other tools to be useful. I'm thinking about where we will go for the one stop shop of answering - "this thing failed, why tho"

1 year ago 0 0 1 0

I'm curious why OL?

I recently watched the airflow summit 2023 video on it - isn't it just an Airflow plugin for dags that relies on manual hooks and lacks deep integration with data or infra assets? I'd also expect some UI around lineage.

If I'm making naive assumptions correct me

1 year ago 1 0 1 0

Yup 100%. But then tie that audit log to actual assets.

1 year ago 0 0 1 0

I think we need to step back and define lineage. Before we defined it just in terms of data assets.

But what if you're running a python ETL process pre-warehouse and your infra dies? The output of that job would be out of date.

That's also lineage, and not in SQL. So we need to solve for that too.

1 year ago 1 0 1 0

Not quite sure if it fits your exact use case, but check out @prefect.io: retries, logging, and caching right out of the box

1 year ago 1 0 0 0

Everyone Should Care About Data Storage

From data warehouses, lakes, to realtime applications: they’re all part of the journey to making data useful.

A little bit of a different flavor, but I wrote this back in 2022 and feel like it still mostly applies today

sarahsnewsletter.substack.com/p/everyone-s...

1 year ago 3 0 1 0

Query languages for the SQL-esque ones, and data python packages for the others (to be exact)

1 year ago 0 0 0 0

I feel like 2022 was the year we tried to solve lineage with observability tools, got decently far but not far enough to fully understand failures, so we settled for alerting on failures instead.

Is 2025 going to be the year we solve for true lineage outside of the data warehouse?

#dataBS

1 year ago 4 0 2 0

So: make sure to run only your ML work on expensive GPUs, and run your lightweight ETL on small compute and utilize the warehouse credits you need to use before 2025 instead.

I'm hearing this is a problem when data eng / data platform become different teams.

Who's encountered this?

#dataBS

1 year ago 0 0 1 0

Going from dev to prod in literally anything should be easy.

This is still an unsolved problem.

Devops is a blocker, and data teams are still trying to figure out IaC.

On my mind today: something as simple as dynamic work pools in Prefect could solve this.

#dataBS

1 year ago 13 1 1 0

Auto spin up / spin down is not flashy these days, it's table stakes. Infra is so expensive - anything that can help save infra cost pays dividends.

1 year ago 2 0 0 0

How to measure PLS (product-led sales) motions

We cover the differences between measuring product-led sales and product-led growth motions, and what metrics to define for each

Enjoyed this piece by @sarahkb.bsky.social on measuring PLS vs PLG, where the latter mostly doesn't work for enterprise sales. We had to figure out how to measure PLS early because Coginiti targeted customers in highly secure industries like gov, finance, & insurance.

1 year ago 4 2 1 0

Reddit is quickly growing its user base and content - there's definitely more mess than before, but I've found posting genuine, detailed comments gets engagement.

PS ignore the trolls, only way forward

1 year ago 1 0 0 0

Sure that's one use case

But what if an event happens but it's throttled to only run a thing every 5 min? Then it's not realtime

I think realtime is about the SLA of the output the event is triggering

So there's a venn diagram with an overlapping middle
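
A rough sketch of the throttled case, assuming a 5 minute (300 second) minimum interval. `ThrottledTrigger` is a hypothetical class written for illustration, and timestamps are passed in as plain seconds so the behavior is easy to follow.

```python
class ThrottledTrigger:
    """Event-triggered job that runs at most once per min_interval_s seconds."""

    def __init__(self, min_interval_s):
        self.min_interval_s = min_interval_s
        self.last_run = None
        self.pending = 0  # events received since the last run
        self.runs = []    # (timestamp, number of events processed) per run

    def on_event(self, now):
        """Record an event; run the job only if the throttle window has passed."""
        self.pending += 1
        if self.last_run is None or now - self.last_run >= self.min_interval_s:
            self.runs.append((now, self.pending))
            self.last_run = now
            self.pending = 0
            return True
        # Throttled: the event waits for a later run. A real system would
        # also flush on a timer; this sketch leaves that out.
        return False


trigger = ThrottledTrigger(min_interval_s=300)
results = [trigger.on_event(t) for t in [0, 10, 200, 300, 310]]
```

The event at t=10 is only processed at t=300, as part of a batch of three. So the job is event based, but the output lags by up to the throttle window, which is why the SLA on the output is what decides whether it counts as realtime.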

1 year ago 1 0 0 0

Totally fair. I do think oftentimes realtime and event based get confused as one, which they're not

1 year ago 0 0 1 0

Event based != realtime

Event based jobs are often batch, with event triggers used to optimize running only when needed (most common).

True realtime reqs are concentrated more in certain industries - finance, logistics, user-facing analytics (and I'm sure others I missed).

#dataBS change my mind

1 year ago 12 0 3 1

Claude can adjust to tone so much better (even with equal context). I'm a convert

1 year ago 0 0 0 0

Modularity - so you can refactor one piece at a time

1 year ago 4 0 0 0

But what you're saying is that OSS is used in a POC, not a prod deployment?

1 year ago 0 0 1 0

A convert, we love to see it

1 year ago 1 0 0 0

100000% agree. OSS is the best way to POC

But using it in prod as the end all be all is a different story

1 year ago 1 0 1 0

I'm disappointed no one yet has said the on prem server room

1 year ago 2 0 1 0
