Ilya Schurov (@ilyaschurov) Bsky

Hi Jan-Willem, thanks, could you add me?

1 year ago 1 0 0 0

Similarly, I can dump the results of an intermediate step to a file, with a helper function like this:

def dump(filename):
    # Returns a function for .pipe: saves df to a parquet file, then passes it on unchanged
    def wrapper(df):
        df.to_parquet(filename)
        return df
    return wrapper

And so on! If you haven't used .pipe before, give it a try, it's nice!
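One more helper in the same spirit, as a minimal sketch: instead of writing to a file, it keeps a copy of the intermediate frame in a dict so it can be inspected after the chain has run. The name `tap`, the `steps` dict and the toy columns `x` and `y` are made up for illustration, they are not from pandas.

```python
import pandas as pd

def tap(store, key):
    # Hypothetical helper: remember a copy of the intermediate frame under store[key]
    def wrapper(df):
        store[key] = df.copy()
        return df  # pass the frame on unchanged so the chain continues
    return wrapper

steps = {}
df = pd.DataFrame({"x": [3, 1, 2]})

result = (
    df.assign(y=lambda d: d["x"] * 10)
    .pipe(tap(steps, "after_assign"))
    .query("x > 1")
    .pipe(tap(steps, "after_query"))["y"]
    .sum()
)
```

After the chain runs, `steps["after_assign"]` and `steps["after_query"]` hold the frames as they were at those two points, while `result` is the final value.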

1 year ago 0 0 0 0

Now I wrote a small helper function

def assrt(condition):
    # Returns a function for .pipe: checks condition(df), then passes df on unchanged
    def wrapper(df):
        assert condition(df)
        return df
    return wrapper

and use this function with a .pipe method:

df.assign(…).query(…).pipe(assrt(lambda _: not _['column'].isna().any())).groupby(…)…
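To make the line above concrete, here is a small runnable sketch on toy data. The frame, the column names `group`, `column` and `doubled`, and the specific assign/query/groupby steps are assumptions for illustration only; the `assrt` helper is the one from this thread.

```python
import pandas as pd

def assrt(condition):
    # Check condition(df) in the middle of a chain, then pass df on unchanged
    def wrapper(df):
        assert condition(df)
        return df
    return wrapper

# Toy data, made up for this example
df = pd.DataFrame({"group": ["a", "a", "b"], "column": [1.0, 2.0, 3.0]})

result = (
    df.assign(doubled=lambda d: d["column"] * 2)
    .query("doubled > 0")
    .pipe(assrt(lambda d: not d["column"].isna().any()))
    .groupby("group")["doubled"]
    .sum()
)

# The same check on a frame with a NaN in 'column' raises AssertionError
bad = pd.DataFrame({"group": ["a"], "column": [float("nan")]})
try:
    bad.pipe(assrt(lambda d: not d["column"].isna().any()))
    nan_caught = False
except AssertionError:
    nan_caught = True
```

Keep in mind that plain `assert` statements are skipped when Python runs with the -O flag, so for checks that must always run it may be safer to raise an exception explicitly inside the wrapper.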

1 year ago 0 0 1 0

Assume I want to make sure that at some intermediate step I do not have NaNs in a column, and raise an error otherwise. Previously, I would break the method chain, assign that intermediate result to a variable, add an assert on that variable, and then continue the chain. Not nice.

1 year ago 0 0 1 0

I like pandas method chaining and my code usually looks like this:

df.assign(…).query(…).groupby(…)[['some', 'var']].sum().sort_values(…).iloc[:10].mean()

The problem with this approach is that it is not easy to get access to the results of intermediate steps. Recently I stumbled upon a solution!

1 year ago 1 0 1 0

Hi there! I am a mathematician, ML researcher and educator, currently working at Radboud University, Nijmegen, The Netherlands. Applying ML and some mathematical stuff to condensed matter physics (Neural Quantum States and friends). I also teach Scientific Computing at Constructor University, Bremen.

1 year ago 4 0 0 0

Posts by Ilya Schurov