Hi Jan-Willem, thanks, could you add me?
Posts by Ilya Schurov
Similarly, I can dump the results of an intermediate step to a file, with a helper function like this:
def dump(filename):
    def wrapper(df):
        df.to_parquet(filename)
        return df
    return wrapper
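To see the helper inside a chain, here is a minimal sketch. It assumes a CSV variant, `dump_csv` (a hypothetical name, not from the post), so that it runs without a parquet engine installed; the toy DataFrame and the file name are made up for illustration too.

```python
import tempfile
from pathlib import Path

import pandas as pd


def dump_csv(filename):
    # Hypothetical CSV variant of dump(); the post itself uses df.to_parquet.
    def wrapper(df):
        df.to_csv(filename, index=False)
        return df  # hand the frame back unchanged so the chain continues
    return wrapper


# Toy data, only for illustration.
df = pd.DataFrame({"group": ["a", "a", "b"], "value": [1, 2, 3]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "intermediate.csv"
    total = (
        df.query("value > 1")
        .pipe(dump_csv(path))  # side effect: write the filtered rows to disk
        .groupby("group")["value"]
        .sum()
    )
    saved = pd.read_csv(path)  # what the chain looked like at that step
```

The wrapper returns the same DataFrame it received, so adding or removing the `.pipe(...)` line does not change the result of the chain.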
And so on! If you haven't used .pipe before, give it a try, it's nice!
Now I wrote a small helper function
def assrt(condition):
    def wrapper(df):
        assert condition(df)
        return df
    return wrapper
and use this function with a .pipe method:
df.assign(…).query(…).pipe(assrt(lambda _: not _['column'].isna().any())).groupby(…)…
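Here is a small runnable sketch of that pattern. The column names, the toy data and the steps around the check are assumptions made up for illustration; only `assrt` itself comes from the post.

```python
import pandas as pd


def assrt(condition):
    # Same helper as in the post: check the frame, then pass it along unchanged.
    def wrapper(df):
        assert condition(df)
        return df
    return wrapper


# Toy data, only for illustration.
df = pd.DataFrame({"group": ["a", "a", "b"], "column": [1.0, 2.0, 3.0]})

result = (
    df.assign(double=lambda d: d["column"] * 2)
    .query("double > 1")
    .pipe(assrt(lambda d: not d["column"].isna().any()))  # no NaNs allowed here
    .groupby("group")["double"]
    .sum()
)

# With a NaN in the column, the same check stops the chain with an AssertionError.
bad = df.assign(column=[1.0, float("nan"), 3.0])
try:
    bad.pipe(assrt(lambda d: not d["column"].isna().any()))
    failed = False
except AssertionError:
    failed = True
```

One thing to keep in mind: `assert` statements are skipped when Python runs with the `-O` flag, so this kind of check is best suited to notebooks and exploratory scripts.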
Assume I want to make sure that at some intermediate step I do not have NaNs in a column, and raise an error otherwise. Previously, I would break the method chain, assign that intermediate result to a variable, add an assert on that variable, and then continue the chain. Not nice.
I like pandas method chaining and my code usually looks like this:
df.assign(…).query(…).groupby(…)[['some', 'var']].sum().sort_values(…).iloc[:10].mean()
The problem with this approach is that it is not easy to get access to the results of intermediate steps. Recently I stumbled upon a solution!
Hi there! I am a mathematician, ML researcher and educator, currently working at Radboud University, Nijmegen, The Netherlands. Applying ML and some mathematical stuff to condensed matter physics (Neural Quantum States and friends). I also teach Scientific Computing at Constructor University, Bremen.