Microsoft Excel is a very useful tool and is widely used to create tracker’s and data analysis in our daily jobs.
Though Excel has many features but if there is a need to go beyond the built in functionality of excel, we need to use a programming language.
Python is the best choice to be used as a programming language for Excel files specially due to its cross platform interfaces and ease of use.
This article aims to serve as a quick reference guide for most commonly used functions in python pandas.
Pandas Functions | Type | Comments |
---|---|---|
pd.read_csv('filename.csv') | Import | read csv file. |
pd.read_excel('filename.xlsx') | Import | read excel file. |
pd.read_table('filename') | Import | read a delimited text file like tsv. |
pd.read_html(url) | Import | Parses an HTML URL and extracts tables to a list of dataframe. |
df.to_csv('filename.csv') | Export | write to a csv file. |
df.to_excel('filename.xlsx') | Export | write to a excel file. |
df.to_html('filename') | Export | save as a HTML table. |
df.head(n) | View | view first n rows of the dataframe. |
df.tail(n) | View | view last n rows of the dataframe. |
df.shape | View | view total rows and columns of a dataframe. |
df.info() | View | Index, datatype and memory information. |
df.describe() | View | Summary statisitics for Numerical columns. |
df.index | View | describe index. |
df.columns | View | describe dataframe columns. |
df.count() | Statistics | Number of non-NAN values. |
df.sum() | Statistics | Sum of values. |
df.cumsum() | Statistics | Cumulative sum of values. |
df.min() / df.max() | Statistics | minimum / maximum values. |
df.idxmin() / df.idxmax() | Statistics | minimum / maximum index values. |
df.mean() | Statistics | mean of values. |
df.median() | Statistics | median of values. |
df[col] | Selection | Returns Column with label col as series. |
df[[col1,col2]] | Selection | Returns Columns as new dataframe. |
df.loc[] | Selection | Label based Column |
df.iloc[0, :] | Selection | select first row. |
df.iloc[0,0] | Selection | select first element of first column. |
df.coumns = ['a','b','c'] | Data Cleaning | rename columns. |
pd.isnull() | Data Cleaning | checks for null values, returns boolean array. |
pd.notnull() | Data Cleaning | opposite of pd.isnull() |
df.dropna() | Data Cleaning | drops all rows that contain null values. |
df.dropna(axis=1) | Data Cleaning | drops all columns that contain null values. |
df.fillna(x) | Data Cleaning | replaces null values with x. |
df[df[col] > 0.5 | Filter | filter rows where column col is greater than 0.5. |
df[(df[col] > 0.5 & (df[dfcol] < 0.7)] | Filter | filter rows where 0.5 > col < 0.7. |
df.sort_values(col1) | Sort | sorts values by col1 in ascending order. |
df.sort_values(col2,ascending=False) | Sort | sorts values by col2 in descending order. |
df.groupby(col) | Groupby | returns a groupby object for values from one column. |
df.groupby([col1,col2]) | Groupby | returns groupby object values from multiple columns. |
df1.append(df2) | Join | adds the rows in df1 to the end of df2( columns should be identical) |
pd.concat([df1,df2],axis=1) | Join | adds the columns in df1 to the end of df2(rows should be identical) |
pd.merge(left,right,how='inner',on=None,left_on=None,Right_on=None, left_index=False,right_index=False,sort=True) | Merge | Join operations between dataframe objects. |