Python | Import Excel File using Pandas

code python pandas import excel

Learn how to import an Excel file (having .xlsx extension) using python pandas.

Pandas is the most popular data manipulation package in Python, and DataFrames are the Pandas data type for storing tabular 2D data. Reading data from excel files or CSV files, and writing data to Excel files or CSV files using Python Pandas is a necessary skill for any analyst or data scientist.

Table of Contents

  1. Python Pandas read_excel() Syntax
  2. Import Excel file using Python Pandas (Example)
  3. read_excel Important Parameters Examples
    1. Import Specific Excel Sheet using sheet name
    2. Import only n Rows of Excel Sheet
    3. Import specific columns of Excel Sheet
  4. Common Errors and Troubleshooting

1. Pandas read_excel() Syntax

The syntax of DataFrame to_excel() function and some of the important parameters are:

pandas.read_excel(io='filepath', sheet_name=0, header=0, usecols=None, nrows=None)
Sr.NoParameters Description
1io
the file path from where you want to read the data. This could be a URL path or, could be a local system file path. Valid URL schemes include http, ftp, s3, and file.
2sheet_name: str, int, list, or None, default 0
Available cases:
~Default is 0: 1st sheet as a DataFrame
~Use 1: To read 2nd sheet as a DataFrame
~Use Specific Sheet Name: "Sheet1" to load sheet with name “Sheet1”
~Load Multiple Sheets using dict:[0, 2, "MySheet"] will load first, third and sheet named “MySheet” as a dictionary of DataFrame
~None: Load All sheets
3header
default is 0. Pass Header = 1 to consider the second line of the dataset as a header. Use None if there is no header.
4usecols
~Default is None, then parse all columns.
~If str, then provide a comma-separated list of Excel columns (“A, B, D, E”) or range of Excel columns (e.g. “A:F” or “A, B,E:F”). Ranges are inclusive of both sides.
~If list of int, indicates list of column numbers to be parsed e.g. [1,2,5].
~If list of string, provide list of column names to be parsed e.g. [“A, B, D, E”].
5nrows:
Default is None
Number of rows to parse (provide int).
Python Pandas read_excel() Syntax

For complete list of read_excel parameters refer to official documentation.

2. Import Excel file using Python Pandas

Let’s review a full example:

  • Create a DataFrame from scratch and save it as Excel
  • Import (or load) the DataFrame from above saved Excel file
import pandas as pd

# Create a dataframe
raw_data = {'first_name': ['Sam','Ziva','Kia','Robin'], 
        'degree': ['PhD','MBA','','MS'],
        'age': [25, 29, 19, 21]}
df = pd.DataFrame(raw_data)

df

#Save the dataframe to the current directory
df.to_excel(r'Example1.xlsx')

We have the following data about students:

first_namedegreeage
0SamPhD25
1ZivaMBA29
2Kia19
3RobinMS21
Excel file snapshot in to import in Pandas Dataframe Jupyter

Read Excel file into Pandas DataFrame (Explained)

Now, let’s see the steps to import the Excel file into a DataFrame.

Step 1: Enter the path and filename where the Excel file is stored. The could be a local system file path or URL path.

For example,

 pd.read_excel(r‘D:\Python\Tutorial\Example1.csv‘)

Notice that path is highlighted with 3 different colors:

  • The blue part represents the path where the Excel file is saved.
  • The green part is the name of the file you want to import.
  • The purple part represents the file type or Excel file extension. Use ‘.xlsx’ in case of an Excel file.

Modify the Python above code to reflect the path where the Excel file is stored on your computer.

Note: You can save or read an Excel file without explicitly providing a file path (blue part) by placing the file in the current working directory. To find current directory path use below code:

# Current working directory
import os
print(os.getcwd())

# Display all files present in the current working directory
print(os.listdir(os.getcwd()))
D:\Python\Tutorial\
Example1.xlsx

Find out how to read multiple files in a folder(directory) here.

Step 2: Enter the following code and make the necessary changes to your path to read the Excel file.

import pandas as pd

# Read the excel file
df = pd.read_excel(r'D:\Python\Tutorial\Example1.xlsx')

df

Snapshot of Data Representation in Excel files

On the left side of the image Excel file is opened in Microsoft Excel. On the right side same Excel file is opened in Juptyter Notebook using pandas read_excel.

Import Excel file in Python Pandas Dataframe code

3. Pandas read_excel Important Parameters Examples

3.1 Import Specific Excel Sheet using Python Pandas

Import excel file multiple sheets to pandas
Example1.xlsx Sheet “Personal Info”
Import excel file specific sheet to pandas python
Example1.xlsx Sheet “Salary Info”

There may be Multiple Sheets in an Excel file. Pandas provide various methods to import one or multiple excel sheets in sheet_name parameter.

  • Default is 0: Read the 1st sheet in Excel as a DataFrame
  • Use 1: To read 2nd sheet as a DataFrame
  • Use Specific Sheet Name: "Sheet1" to load sheet with name “Sheet1”
  • Load Multiple Sheets using dict:[0, 2, "MySheet"] will load first, third and sheet named “MySheet” as a dictionary of DataFrame
  • None: Load All sheets

1. Import Excel Sheet using Integer

By default sheet_name = 0 imports the 1st sheet in Excel as a DataFrame. To import Second Excel Sheet i.e. “Salary Info” in our case as a Pandas DataFrame use sheet_name = 1

import pandas as pd

# Read "Salary Info" Sheet from Excel file (2nd Sheet)
df = pd.read_excel(r'D:\Python\Tutorial\Example1.xlsx',sheet_name=1)

df
first_name salary
0Sam120000
1Ziva80000
2Kia110000
3Robin150000

2. Import Specific Excel Sheet using Sheet Name

To import Specific Excel Sheet i.e. “Personal Info” as a Pandas DataFrame using sheet_name = "Personal Info"

import pandas as pd

# Read excel file sheet "Personal Info" using sheetname
df = pd.read_excel(r'D:\Python\Tutorial\Example1.xlsx',sheet_name="Personal Info")

df
first_namedegreeage
0SamPhD25
1ZivaMBA29
2KiaNaN19
3RobinMS21

3. Import Multiple Excel Sheet into Pandas DataFrame

Multiple Excel Sheets can be read into Pandas DataFrame by passing list in the sheet_name parameter e.g. [0, “Salary Info”] will load the first sheet and sheet named “Salary Info” as a dictionary of DataFrame.

import pandas as pd

# Read multiple excel file sheets as dictionary of DataFrame
df = pd.read_excel(r'D:\Python\Tutorial\Example1.xlsx',sheet_name=[0, "Salary Info"])

df
Import excel multiple sheets pandas python dataframe

Now to store different sheets into different DataFrames use Dictionary Key Value.

import pandas as pd

# Read multiple excel file sheets as dictionary of DataFrame
df = pd.read_excel(r'D:\Python\Tutorial\Example1.xlsx',sheet_name=[0, "Salary Info"])

# As seen in the output above Keys are 0 and "Salary_Info"
Personal_Info = df[0]
Salary_Info = df["Salary Info"]

print(Personal_Info)
print(Salary_Info)

3.2 Import only n Rows of Excel Sheet using Pandas

Sometimes Excel file is quite big or our system has memory constraints. In this case, we can import only the top n rows of Excel Sheet using Pandas read_excel nrows parameter. For example, to import only top 2 rows use nrows=2

import pandas as pd

# Load top 2 rows of Excel sheets as Pandas DataFrame
df = pd.read_excel(r'D:\Python\Tutorial\Example1.xlsx',nrows=2)

df
first_namedegreeage
0SamPhD25
1ZivaMBA29

3.3 Import specific columns of Excel Sheet

There may be hundreds of columns in excel sheet, but while importing we need only few columns. In this case, we can pass usecols parameter. Different ways to use usecols parameter are below:

  • Default is None, parse all columns.
  • If str, then provide a comma-separated list of Excel columns (“A, B, D, E”) or range of Excel columns (e.g. “A:F” or “A, B,E:F”). Ranges are inclusive of both sides.
  • If list of int, indicates list of column numbers to be parsed e.g. [0,2,5].
  • If list of string, provide list of column names to be parsed e.g. [“A, B, D, E”].
import pandas as pd

# Import 1st and 3rd columns of Execl sheet as Pandas DataFrame
df = pd.read_excel(r'D:\Python\Tutorial\Example1.xlsx',usecols=[0,2])

df
first_nameage
0Sam25
1Ziva29
2Kia19
3Robin21

4. Common Errors and Troubleshooting

Listing down the common error you can face while loading data from CSV files into Pandas dataframe will be:

  1. FileNotFoundError: File b'filename.csv' does not exist
    • Reason: File Not Found error typically occurs when there is an issue with the file path (or directory) or file name.
    • Fix: Check file path, file name, and file extension.
  2. SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
    • Reason: In \U starts an eight-character Unicode escape, such as \U00014321. In the code, the escape is followed by the character ‘s’, which is invalid.
    • Fix:
      • Use the prefix string with r (to produce a raw string) pd.read_excel(r'D:\Python\Tutorial\filename.xlsx') or,
      • You either need to duplicate all backslashes pd.read_excel(r'D:\\Python\\Tutorial\\filename.xlsx')
  3. ImportError: Install xlrd >= 1.0.0 for Excel support.
    • Reason:  xlrd package is not available in the python environment
    • Fix:  Install xlrd package if you get the above error pip install xlrd

Conclusion

We have covered the steps needed to read an Excel file in python using pandas read_excel function.

Go to read data from csv files, and write data to CSV files using Python.

Leave a Comment

Keytodatascience Logo

Connect

Subscribe

Join our email list to receive the latest updates.

© 2022 KeyToDataScience