Create Pandas DataFrame in Python

Let’s create a pandas DataFrame in Python. A DataFrame is a two-dimensional data structure with labeled axes (rows and columns), i.e., data is arranged in a tabular fashion in rows and columns.

pandas.DataFrame

A pandas DataFrame can be created by passing the following parameters:

pandas.DataFrame(data, index, columns, dtype, copy)
Sr. No.  Parameter  Description
1        data       Input data: a dictionary, Series, array, constant, list, or another DataFrame.
2        index      Index (row labels) of the resulting DataFrame.
3        columns    Column labels of the resulting DataFrame. Defaults to RangeIndex (0, 1, ..., n) if no column labels are provided.
4        dtype      Data type to force on each column. If None, the type is inferred.
5        copy       Whether to copy data from the inputs. Older pandas releases default to False; since pandas 1.3 the default is None, which copies dict input but not DataFrame or 2D ndarray input.
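
As a minimal sketch of how these parameters combine, the example below passes index, columns, and dtype in one call. The row labels 'r1' and 'r2', the column names, and the values are illustrative assumptions, not data from this tutorial.

```python
import pandas as pd

# Two rows of raw values (illustrative only)
raw = [[1, 2], [3, 4]]

df = pd.DataFrame(
    raw,
    index=["r1", "r2"],   # row labels
    columns=["a", "b"],   # column labels
    dtype="float64",      # force every column to float
)

print(df)
print(df.dtypes)
```

Without the dtype argument, pandas would infer an integer type for both columns here.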

Let’s explore various ways to create DataFrame using inputs like:

  1. Dictionary
  2. Lists
  3. Series
  4. Numpy arrays
  5. Empty DataFrame

1. Create a DataFrame from Dictionary (Preferred)

import pandas as pd 

# initialize a dict containing lists of data
data = {'name': ['Sam', 'Zen', 'Robin', 'John'],
        'weight': [70, 85, 55, 90]}

#create DataFrame
df = pd.DataFrame(data) 

#print the output
print(df) 
    name  weight
0    Sam      70
1    Zen      85
2  Robin      55
3   John      90

2. Create a DataFrame from Lists

A DataFrame can also be created from a list. A flat list produces a single column.

import pandas as pd
data = [1,2,3,4,5]

#column name can be passed in columns parameter
df = pd.DataFrame(data, columns=["count"])
df
   count
0      1
1      2
2      3
3      4
4      5
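
A list of lists works the same way, with each inner list becoming one row. The sketch below reuses the names and weights from the dictionary example; the variable name rows is an arbitrary choice.

```python
import pandas as pd

# Each inner list is one row: [name, weight]
rows = [["Sam", 70], ["Zen", 85], ["Robin", 55]]

# One column label per position in the inner lists
df = pd.DataFrame(rows, columns=["name", "weight"])
print(df)
```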

3. Create a DataFrame from Pandas Series

Let’s create a DataFrame from a dict of Series. The resulting index is the union of the indexes of all the Series passed.

import pandas as pd

data = {'a' : pd.Series([1, 2, 3]),
    'b' : pd.Series([1, 2, 3, 4])
 }

df = pd.DataFrame(data)
df
     a  b
0  1.0  1
1  2.0  2
2  3.0  3
3  NaN  4

Note: Series ‘a’ has only 3 values, so its missing entry at index 3 is filled with NaN. Because NaN is a floating-point value, column ‘a’ is stored as float.
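
The union behaviour is easier to see when the Series carry explicit labels. The following is a small sketch with made-up labels 'x', 'y', and 'z'; the Series names s1 and s2 are hypothetical.

```python
import pandas as pd

# Two Series whose indexes only partly overlap (labels are illustrative)
s1 = pd.Series([1, 2], index=["x", "y"])
s2 = pd.Series([10, 20], index=["y", "z"])

# The DataFrame index is the union of both indexes: x, y, z
df = pd.DataFrame({"a": s1, "b": s2})
print(df)
```

Only the label 'y' is present in both Series, so it is the only row without a NaN.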

4. Create a DataFrame from ndarrays 

A NumPy array (ndarray) can also be used to create a pandas DataFrame.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.array([[1, 2], [3, 4], [5, 6]]),
                    columns=['a', 'b'])
df
   a  b
0  1  2
1  3  4
2  5  6

5. Create an Empty DataFrame

An empty pandas DataFrame, with no rows and no columns, can be created by calling the DataFrame constructor with no arguments.

5.1 Empty DataFrame

import pandas as pd

#calling DataFrame constructor
df = pd.DataFrame() 
print(df)
Empty DataFrame
Columns: []
Index: []

5.2 Empty DataFrame with Column Names and Row Indices

Similarly, we can create an empty DataFrame with only columns, only rows, or both. In the example below, we create a DataFrame with the columns name, age, and weight and 3 rows. Every cell is NaN because no data is passed.

# create an empty DataFrame with column names and row indices
df = pd.DataFrame(columns=['name', 'age', 'weight'], index=range(3))
print(df)
  name  age weight
0  NaN  NaN    NaN
1  NaN  NaN    NaN
2  NaN  NaN    NaN
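
To check which of these frames pandas actually treats as empty, the shape and empty attributes can be compared. This is a minimal sketch; the variable names no_axes, only_columns, and with_rows are hypothetical.

```python
import pandas as pd

no_axes = pd.DataFrame()
only_columns = pd.DataFrame(columns=["name", "age", "weight"])
with_rows = pd.DataFrame(columns=["name", "age", "weight"], index=range(3))

# .empty is True when either axis has length 0
print(no_axes.shape, no_axes.empty)            # (0, 0) True
print(only_columns.shape, only_columns.empty)  # (0, 3) True
print(with_rows.shape, with_rows.empty)        # (3, 3) False
```

A frame that contains only NaN values still has rows and columns, so it does not count as empty.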

For more details, see the pandas DataFrame documentation.
