Pandas Module : Easy way to program excel using Python Pandas

  • Post author:
  • Reading time:3 mins read

Microsoft Excel is a very useful tool and is widely used to create tracker’s and data analysis in our daily jobs.

Though Excel has many features but if there is a need to go beyond the built in functionality of excel, we need to use a programming language.

Python is the best choice to be used as a programming language for Excel files specially due to its cross platform interfaces and ease of use.

This article aims to serve as a quick reference guide for most commonly used functions in python pandas.

Pandas Functions

Type

Comments

pd.read_csv('filename.csv')

Import

read csv file.

pd.read_excel('filename.xlsx')

Import

read excel file.

pd.read_table('filename')

Import

read a delimited text file like tsv.

pd.read_html(url)

Import

Parses an HTML URL and extracts tables to a list of dataframe.

df.to_csv('filename.csv')

Export

write to a csv file.

df.to_excel('filename.xlsx')

Export

write to a excel file.

df.to_html('filename')

Export

save as a HTML table.

df.head(n)

View

view first n rows of the dataframe.

df.tail(n)

View

view last n rows of the dataframe.

df.shape

View

view total rows and columns of a dataframe.

df.info()

View

Index, datatype and memory information.

df.describe()

View

Summary statisitics for Numerical columns.

df.index

View

describe index.

df.columns

View

describe dataframe columns.

df.count()

Statistics

Number of non-NAN values.

df.sum()

Statistics

Sum of values.

df.cumsum()

Statistics

Cumulative sum of values.

df.min() / df.max()

Statistics

minimum / maximum values.

df.idxmin() / df.idxmax()

Statistics

minimum / maximum index values.

df.mean()

Statistics

mean of values.

df.median()

Statistics

median of values.

df[col]

Selection

Returns Column with label col as series.

df[[col1,col2]]

Selection

Returns Columns as new dataframe.

df.loc[]

Selection

Label based Column

df.iloc[0, :]

Selection

select first row.

df.iloc[0,0]

Selection

select first element of first column.

df.coumns = ['a','b','c']

Data Cleaning

rename columns.

pd.isnull()

Data Cleaning

checks for null values, returns boolean array.

pd.notnull()

Data Cleaning

opposite of pd.isnull()

df.dropna()

Data Cleaning

drops all rows that contain null values.

df.dropna(axis=1)

Data Cleaning

drops all columns that contain null values.

df.fillna(x)

Data Cleaning

replaces null values with x.

df[df[col] > 0.5

Filter

filter rows where column col is greater than 0.5.

df[(df[col] > 0.5 & (df[dfcol] < 0.7)]

Filter

filter rows where 0.5 > col < 0.7.

df.sort_values(col1)

Sort

sorts values by col1 in ascending order.

df.sort_values(col2,ascending=False)

Sort

sorts values by col2 in descending order.

df.groupby(col)

Groupby

returns a groupby object for values from one column.

df.groupby([col1,col2])

Groupby

returns groupby object values from multiple columns.

df1.append(df2)

Join

adds the rows in df1 to the end of df2( columns should be identical)

pd.concat([df1,df2],axis=1)

Join

adds the columns in df1 to the end of df2(rows should be identical)

pd.merge(left,right,how='inner',on=None,left_on=None,Right_on=None,

left_index=False,right_index=False,sort=True)

Merge

Join operations between dataframe objects.

Farees Ahmed

The aim of this blog is to serve as a quick reference guide for the Curious. Appreciate your feedback and comments !