Came across this article on Medium covering 20 pandas functions that help when doing EDA on a DataFrame:
https://towardsdatascience.com/20-pandas-functions-that-will-boost-your-data-analysis-process-f5dfdb2f9e05
Some of the more useful ones in there I thought were:
df.infer_objects().dtypes
-> by far the most useful one I found. The function infers better dtypes for columns whose dtype is object. I've run into errors before because one series contained multiple datatypes. Note that it's a soft conversion: a column that genuinely mixes types stays object.

df.sample(n)
-> returns a random subset of the dataframe's rows. You can either pass an int for n to get that many rows back, or pass frac=0.5 instead to get a fraction of the rows (not both at once).

df[some_column].pct_change()
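-> shows the percentage change from each value to the previous one in a series, returned as a fraction (0.5 means +50%). The first value is NaN since it has nothing to compare against. Useful for time series data.

df.explode(column)
-> if a column contains lists, it expands the dataframe so each value in a list gets its own row. The index and the other columns' values are repeated for each new row.

https://github.com/prompt-toolkit/ptpython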
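Here's a minimal sketch trying the four functions above on a small made-up DataFrame. The column names and values are hypothetical (not from the article), and it assumes a reasonably recent pandas.

```python
import pandas as pd

# Hypothetical toy data: an object-dtype column that really holds ints,
# a price series, and a column of lists.
df = pd.DataFrame({
    "id": pd.Series([1, 2, 3, 4], dtype="object"),
    "price": [10.0, 15.0, 7.5, 7.5],
    "tags": [["a", "b"], ["c"], [], ["d", "e"]],
})

# infer_objects(): soft conversion of object columns. "id" becomes int64,
# while "tags" (lists) can't be converted and stays object.
print(df.dtypes["id"])                   # object
print(df.infer_objects().dtypes["id"])   # int64

# sample(): pass either n (row count) or frac (fraction of rows), not both.
# random_state makes the draw reproducible.
print(len(df.sample(n=2, random_state=0)))       # 2
print(len(df.sample(frac=0.5, random_state=0)))  # 2

# pct_change(): change relative to the previous value, as a fraction.
# The first value has no predecessor, so it is NaN.
print(df["price"].pct_change().tolist())  # [nan, 0.5, -0.5, 0.0]

# explode(): one row per list element; the index and other columns repeat.
# An empty list becomes a single NaN row.
exploded = df.explode("tags")
print(exploded[["id", "tags"]])
```

The empty list in row 2 is there on purpose: explode doesn't drop that row, it keeps it with NaN in the exploded column, which is easy to miss when counting rows afterwards.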