Learn how to import an Excel file (having .xlsx extension) using python pandas.
Pandas is the most popular data manipulation package in Python, and DataFrames are the Pandas data type for storing tabular 2D data. Reading data from excel files or CSV files, and writing data to Excel files or CSV files using Python Pandas is a necessary skill for any analyst or data scientist.
Table of Contents
Table of Contents
- Python Pandas read_excel() Syntax
- Import Excel file using Python Pandas (Example)
- read_excel Important Parameters Examples
- Common Errors and Troubleshooting
1. Pandas read_excel() Syntax
The syntax of DataFrame to_excel() function and some of the important parameters are:
pandas.read_excel(io='filepath', sheet_name=0, header=0, usecols=None, nrows=None)
Sr.No | Parameters Description |
---|---|
1 | io the file path from where you want to read the data. This could be a URL path or, could be a local system file path. Valid URL schemes include http, ftp, s3, and file. |
2 | sheet_name: str, int, list, or None, default 0 Available cases: ~Default is 0 : 1st sheet as a DataFrame~Use 1 : To read 2nd sheet as a DataFrame~Use Specific Sheet Name: "Sheet1" to load sheet with name “Sheet1”~Load Multiple Sheets using dict: [0, 2, "MySheet"] will load first, third and sheet named “MySheet” as a dictionary of DataFrame~None: Load All sheets |
3 | header default is 0. Pass Header = 1 to consider the second line of the dataset as a header. Use None if there is no header. |
4 | usecols ~Default is None , then parse all columns.~If str , then provide a comma-separated list of Excel columns (“A, B, D, E”) or range of Excel columns (e.g. “A:F” or “A, B,E:F”). Ranges are inclusive of both sides.~If list of int , indicates list of column numbers to be parsed e.g. [1,2,5].~If list of string , provide list of column names to be parsed e.g. [“A, B, D, E”]. |
5 | nrows: Default is None Number of rows to parse (provide int). |
For complete list of read_excel parameters refer to official documentation.
2. Import Excel file using Python Pandas
Let’s review a full example:
- Create a DataFrame from scratch and save it as Excel
- Import (or load) the DataFrame from above saved Excel file
import pandas as pd
# Create a dataframe
raw_data = {'first_name': ['Sam','Ziva','Kia','Robin'],
'degree': ['PhD','MBA','','MS'],
'age': [25, 29, 19, 21]}
df = pd.DataFrame(raw_data)
df
#Save the dataframe to the current directory
df.to_excel(r'Example1.xlsx')
We have the following data about students:
first_name | degree | age | |
---|---|---|---|
0 | Sam | PhD | 25 |
1 | Ziva | MBA | 29 |
2 | Kia | 19 | |
3 | Robin | MS | 21 |
Read Excel file into Pandas DataFrame (Explained)
Now, let’s see the steps to import the Excel file into a DataFrame.
Step 1: Enter the path and filename where the Excel file is stored. The could be a local system file path or URL path.
For example,
pd.read_excel(r‘D:\Python\Tutorial\Example1.csv‘)
Notice that path is highlighted with 3 different colors:
- The blue part represents the path where the Excel file is saved.
- The green part is the name of the file you want to import.
- The purple part represents the file type or Excel file extension. Use ‘.xlsx’ in case of an Excel file.
Modify the Python above code to reflect the path where the Excel file is stored on your computer.
Note: You can save or read an Excel file without explicitly providing a file path (blue part) by placing the file in the current working directory. To find current directory path use below code:
# Current working directory
import os
print(os.getcwd())
# Display all files present in the current working directory
print(os.listdir(os.getcwd()))
D:\Python\Tutorial\ Example1.xlsx
Find out how to read multiple files in a folder(directory) here.
Step 2: Enter the following code and make the necessary changes to your path to read the Excel file.
import pandas as pd
# Read the excel file
df = pd.read_excel(r'D:\Python\Tutorial\Example1.xlsx')
df
Snapshot of Data Representation in Excel files
On the left side of the image Excel file is opened in Microsoft Excel. On the right side same Excel file is opened in Juptyter Notebook using pandas read_excel.
3. Pandas read_excel Important Parameters Examples
3.1 Import Specific Excel Sheet using Python Pandas
There may be Multiple Sheets in an Excel file. Pandas provide various methods to import one or multiple excel sheets in sheet_name
parameter.
- Default is
0
: Read the 1st sheet in Excel as a DataFrame - Use
1
: To read 2nd sheet as a DataFrame - Use Specific Sheet Name:
"Sheet1"
to load sheet with name “Sheet1” - Load Multiple Sheets using dict:
[0, 2, "MySheet"]
will load first, third and sheet named “MySheet” as a dictionary of DataFrame - None: Load All sheets
1. Import Excel Sheet using Integer
By default sheet_name = 0
imports the 1st sheet in Excel as a DataFrame. To import Second Excel Sheet i.e. “Salary Info” in our case as a Pandas DataFrame use sheet_name = 1
import pandas as pd
# Read "Salary Info" Sheet from Excel file (2nd Sheet)
df = pd.read_excel(r'D:\Python\Tutorial\Example1.xlsx',sheet_name=1)
df
first_name | salary | |
0 | Sam | 120000 |
1 | Ziva | 80000 |
2 | Kia | 110000 |
3 | Robin | 150000 |
2. Import Specific Excel Sheet using Sheet Name
To import Specific Excel Sheet i.e. “Personal Info” as a Pandas DataFrame using sheet_name = "Personal Info"
import pandas as pd
# Read excel file sheet "Personal Info" using sheetname
df = pd.read_excel(r'D:\Python\Tutorial\Example1.xlsx',sheet_name="Personal Info")
df
first_name | degree | age | |
0 | Sam | PhD | 25 |
1 | Ziva | MBA | 29 |
2 | Kia | NaN | 19 |
3 | Robin | MS | 21 |
3. Import Multiple Excel Sheet into Pandas DataFrame
Multiple Excel Sheets can be read into Pandas DataFrame by passing list in the sheet_name
parameter e.g. [0, “Salary Info”] will load the first sheet and sheet named “Salary Info” as a dictionary of DataFrame.
import pandas as pd
# Read multiple excel file sheets as dictionary of DataFrame
df = pd.read_excel(r'D:\Python\Tutorial\Example1.xlsx',sheet_name=[0, "Salary Info"])
df
Now to store different sheets into different DataFrames use Dictionary Key Value.
import pandas as pd
# Read multiple excel file sheets as dictionary of DataFrame
df = pd.read_excel(r'D:\Python\Tutorial\Example1.xlsx',sheet_name=[0, "Salary Info"])
# As seen in the output above Keys are 0 and "Salary_Info"
Personal_Info = df[0]
Salary_Info = df["Salary Info"]
print(Personal_Info)
print(Salary_Info)
3.2 Import only n Rows of Excel Sheet using Pandas
Sometimes Excel file is quite big or our system has memory constraints. In this case, we can import only the top n rows of Excel Sheet using Pandas read_excel nrows
parameter. For example, to import only top 2 rows use nrows=2
import pandas as pd
# Load top 2 rows of Excel sheets as Pandas DataFrame
df = pd.read_excel(r'D:\Python\Tutorial\Example1.xlsx',nrows=2)
df
first_name | degree | age | |
0 | Sam | PhD | 25 |
1 | Ziva | MBA | 29 |
3.3 Import specific columns of Excel Sheet
There may be hundreds of columns in excel sheet, but while importing we need only few columns. In this case, we can pass usecols
parameter. Different ways to use usecols
parameter are below:
- Default is
None
, parse all columns. - If
str
, then provide a comma-separated list of Excel columns (“A, B, D, E”) or range of Excel columns (e.g. “A:F” or “A, B,E:F”). Ranges are inclusive of both sides. - If
list of int
, indicates list of column numbers to be parsed e.g. [0,2,5]. - If
list of string
, provide list of column names to be parsed e.g. [“A, B, D, E”].
import pandas as pd
# Import 1st and 3rd columns of Execl sheet as Pandas DataFrame
df = pd.read_excel(r'D:\Python\Tutorial\Example1.xlsx',usecols=[0,2])
df
first_name | age | |
0 | Sam | 25 |
1 | Ziva | 29 |
2 | Kia | 19 |
3 | Robin | 21 |
4. Common Errors and Troubleshooting
Listing down the common error you can face while loading data from CSV files into Pandas dataframe will be:
FileNotFoundError: File b'filename.csv' does not exist
- Reason: File Not Found error typically occurs when there is an issue with the file path (or directory) or file name.
- Fix: Check file path, file name, and file extension.
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
- Reason: In
\U
starts an eight-character Unicode escape, such as\U00014321
. In the code, the escape is followed by the character ‘s’, which is invalid. - Fix:
- Use the prefix string with
r
(to produce a raw string)pd.read_excel(r'D:\Python\Tutorial\filename.xlsx')
or, - You either need to duplicate all backslashes
pd.read_excel(r'D:\\Python\\Tutorial\\filename.xlsx')
- Use the prefix string with
- Reason: In
ImportError: Install xlrd >= 1.0.0 for Excel support.
- Reason: xlrd package is not available in the python environment
- Fix: Install xlrd package if you get the above error
pip install xlrd
Conclusion
We have covered the steps needed to read an Excel file in python using pandas read_excel function.
Go to read data from csv files, and write data to CSV files using Python.