Save yourself hours of Data Analysis with this single line of Python code

EDA made easy with pandas-profiling
I am sure you have heard of "work smart, not hard." But do you know how to apply it in real life when you are doing Exploratory Data Analysis (EDA) on a dataset?

EDA using vanilla pandas can take hours, depending on factors such as the size of the dataset, its complexity, and the number of features. You need to write many lines of code to extract meaning from the dataset. But what if you could do all of that with just a single line of code?

What! How?
By using pandas-profiling.

pandas-profiling generates profile reports from a pandas DataFrame with a single line of code (which I will show you in just a minute). The pandas df.describe() function can be very useful, but it is a little primitive when it comes to serious EDA. Importing pandas_profiling extends the pandas DataFrame with df.profile_report() for quick data analysis.
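To make the comparison concrete, here is a small sketch of what df.describe() returns on its own. The DataFrame and its column names are made up for illustration. Note that, by default, the text column is left out of the summary entirely.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [23, 35, 35, None],
    "income": [41000.0, 52000.0, 58000.0, 61000.0],
    "city": ["Oslo", "Lima", "Lima", "Pune"],
})

# By default, describe() summarizes only the numeric columns.
summary = df.describe()
print(summary)

# The rows it reports: count, mean, std, min, 25%, 50%, 75%, max
print(list(summary.index))
```

Missing values, correlations, histograms and anything about the "city" column would all need extra code, which is the gap a profile report is meant to fill.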

Here are the details that are presented in the generated report (depending on how relevant they are for that respective data type):

  • Type inference: detect the types of columns in a dataframe.
  • Essentials: type, unique values, missing values
  • Quantile statistics like minimum value, Q1, median, Q3, maximum, range, interquartile range
  • Descriptive statistics like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
  • Most frequent values
  • Histogram
  • Correlations: highlighting of highly correlated variables, Spearman, Pearson and Kendall matrices
  • Missing values: matrix, count, heatmap and dendrogram of missing values
  • Text analysis: learn about categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data
  • File and Image analysis: extract file sizes, creation dates and dimensions, and scan for truncated images or those containing EXIF information
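For a sense of how much hand-written code the list above replaces, here is a rough sketch that computes just a handful of those items in vanilla pandas. The data is invented for illustration, and this is not how pandas-profiling computes them internally. It only shows the manual equivalent.

```python
import pandas as pd

# Hypothetical toy data with one missing value in "x".
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, None, 10.0],
    "y": [2.0, 4.0, 6.0, 8.0, 10.0, 20.0],
    "city": ["Oslo", "Lima", "Lima", "Pune", "Lima", "Oslo"],
})

# Essentials: missing values per column
missing_per_column = df.isna().sum()

# Quantile statistics for one column (NaN values are skipped)
q1 = df["x"].quantile(0.25)
q3 = df["x"].quantile(0.75)
iqr = q3 - q1

# Descriptive statistics
skewness = df["x"].skew()
kurt = df["x"].kurt()

# Most frequent value of a categorical column
most_frequent_city = df["city"].value_counts().idxmax()

# Pearson correlation matrix over the numeric columns
pearson = df[["x", "y"]].corr(method="pearson")

print(missing_per_column, q1, q3, iqr, skewness, kurt, most_frequent_city, pearson, sep="\n")
```

Every extra column or statistic means more lines like these, which is exactly the repetitive work the report takes over.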

    Show me the code already!
    Yes, coming back to it. Let's start with the basic step: installation. Installing pandas-profiling is pretty simple:
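The original install command is not reproduced here. Assuming the package is installed from PyPI under the name pandas-profiling, the usual route would be `pip install pandas-profiling` in a terminal. The sketch below only checks whether the module is importable and prints that hint if it is not. The helper name is_installed is made up for this example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


# The import name uses an underscore; the PyPI name (assumed) uses a hyphen.
if is_installed("pandas_profiling"):
    print("pandas-profiling is already installed")
else:
    print("Not installed yet. Try: pip install pandas-profiling")
```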

    Alternatively, you can install from source. Download the source code by cloning the repository or by pressing 'Download ZIP' on the project's GitHub page. Then install by navigating to the proper directory and executing the following code:

    Once installed, using pandas-profiling is simple:
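The original snippet is missing here, so what follows is a minimal sketch rather than the author's exact code. It assumes the ProfileReport class and its to_file method as found in the library's documentation. The toy DataFrame, the make_report helper and the report.html filename are made up for illustration. The package has since been republished as ydata-profiling, so the helper tries that import first and falls back to the name used in this article.

```python
import pandas as pd

# Hypothetical toy data standing in for a real dataset.
df = pd.DataFrame({
    "age": [23, 35, 35, None],
    "city": ["Oslo", "Lima", "Lima", "Pune"],
})


def make_report(frame, title="Profiling Report"):
    """Return a profile report for `frame`.

    The import is deferred so this file still loads where the package is missing.
    """
    try:
        from ydata_profiling import ProfileReport   # newer package name (assumption about your install)
    except ImportError:
        from pandas_profiling import ProfileReport  # package name used in this article
    return ProfileReport(frame, title=title)


# The "single line", once the package is installed:
# make_report(df).to_file("report.html")
```

In a notebook, running `import pandas_profiling` and then `df.profile_report()` should produce the same report inline, which is the one-liner the title promises.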

    You can check out the project's GitHub page, and there is also a detailed documentation page. Isn't this something! Have fun with it.