Let’s create a pandas DataFrame in Python. A DataFrame is a two-dimensional data structure with labeled axes (rows and columns); that is, the data is arranged in tabular form.
Table of Contents
pandas.DataFrame
A pandas DataFrame can be created by passing the following parameters:
pandas.DataFrame(data, index, columns, dtype, copy)
Sr.No | Parameter | Description |
---|---|---|
1 | data | Input data: a dictionary, Series, array, constant, list, or another DataFrame. |
2 | index | Row labels of the resulting DataFrame. |
3 | columns | Column labels of the resulting DataFrame; defaults to RangeIndex (0, 1, …, n) if no column labels are provided. |
4 | dtype | Data type to force on every column (only a single dtype is allowed). If None, the types are inferred. |
5 | copy | Whether to copy data from the inputs. The default was False in older pandas releases; recent releases default to None, which copies dict input but not DataFrame or 2D ndarray input. |
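To see how these parameters fit together, here is a minimal sketch. The sample values and the `height` column are made up purely for illustration; only the parameter names come from the table above.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the constructor arguments.
rows = [[70, 175], [85, 180]]

df = pd.DataFrame(
    data=rows,
    index=["Sam", "Zen"],           # row labels
    columns=["weight", "height"],   # column labels
    dtype="float64",                # force every column to float
)

print(df)
print(df.dtypes)
```

Both columns come out as `float64` because a single `dtype` is applied to the whole frame.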
Let’s explore various ways to create a DataFrame from different kinds of input:
1. Create a DataFrame from Dictionary (Preferred)
import pandas as pd
# initialize a dict containing lists of data
data = {'name': ['Sam', 'Zen', 'Robin', 'John'],
        'weight': [70, 85, 55, 90]
        }
# create the DataFrame
df = pd.DataFrame(data)
#print the output
print(df)
| | name | weight |
---|---|---|
0 | Sam | 70 |
1 | Zen | 85 |
2 | Robin | 55 |
3 | John | 90 |
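Two details of the dictionary approach are worth trying out: row labels can be supplied through `index`, and the lists must all have the same length. The sketch below reuses the data from above; the labels `'a'` to `'d'` are arbitrary choices for illustration.

```python
import pandas as pd

data = {'name': ['Sam', 'Zen', 'Robin', 'John'],
        'weight': [70, 85, 55, 90]}

# Dict keys become column labels; an explicit index replaces the default 0..3.
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])
print(df.loc['c', 'name'])  # Robin

# Lists of different lengths are rejected with a ValueError.
try:
    pd.DataFrame({'name': ['Sam', 'Zen'], 'weight': [70]})
except ValueError as err:
    print("ValueError:", err)
```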
2. Create a DataFrame from Lists
A DataFrame can also be created from a list; each element of the list becomes one row.
import pandas as pd
data = [1,2,3,4,5]
#column name can be passed in columns parameter
df = pd.DataFrame(data, columns=["count"])
df
| | count |
---|---|
0 | 1 |
1 | 2 |
2 | 3 |
3 | 4 |
4 | 5 |
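A list of lists works the same way, with each inner list treated as one row. This is a small sketch with made-up records, not part of the original example:

```python
import pandas as pd

# Hypothetical records: each inner list is one row.
rows = [['Sam', 70], ['Zen', 85], ['Robin', 55]]

# One column name per position in the inner lists.
df = pd.DataFrame(rows, columns=['name', 'weight'])

print(df)
print(df.shape)  # (3, 2)
```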
3. Create a DataFrame from Pandas Series
Let’s create a DataFrame from a dict of Series. The resulting index is the union of the indexes of all the passed Series.
import pandas as pd
data = {'a': pd.Series([1, 2, 3]),
        'b': pd.Series([1, 2, 3, 4])
        }
df = pd.DataFrame(data)
df
| | a | b |
---|---|---|
0 | 1.0 | 1 |
1 | 2.0 | 2 |
2 | 3.0 | 3 |
3 | NaN | 4 |
Note: Series ‘a’ contains only 3 values, so its missing entry at index 3 is filled with NaN. Because NaN is a float, column ‘a’ is displayed as floats (1.0, 2.0, 3.0).
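The union behaviour is easier to see when the Series carry their own labels. The following sketch uses invented labels (`'w'` to `'z'`) and values; any label missing from one Series shows up as NaN in that column.

```python
import pandas as pd

# Two Series with partly overlapping, hand-picked index labels.
s1 = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
s2 = pd.Series([10, 20], index=['y', 'w'])

df = pd.DataFrame({'a': s1, 'b': s2})
print(df)

# Only 'y' appears in both Series, so it is the only row without NaN.
print(df.dropna())
```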
4. Create a DataFrame from ndarrays
A NumPy array (ndarray) can also be used to create a pandas DataFrame.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[1, 2], [3, 4], [5, 6]]),
                  columns=['a', 'b'])
df
| | a | b |
---|---|---|
0 | 1 | 2 |
1 | 3 | 4 |
2 | 5 | 6 |
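If `columns` is left out, pandas falls back to default integer labels. Here is a quick sketch of that default; the array contents are arbitrary.

```python
import numpy as np
import pandas as pd

arr = np.arange(6).reshape(3, 2)  # 3 rows, 2 columns

# No column labels are passed, so pandas assigns a RangeIndex.
df = pd.DataFrame(arr)

print(df.columns)  # RangeIndex(start=0, stop=2, step=1)
print(df.shape)    # (3, 2)
```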
5. Create an Empty DataFrame
An empty pandas DataFrame, with no rows and no columns, can be created by calling the DataFrame constructor without any arguments.
5.1 Empty DataFrame
import pandas as pd
#calling DataFrame constructor
df = pd.DataFrame()
print(df)
Empty DataFrame
Columns: []
Index: []
5.2 Empty DataFrame with Column Names and Row Indices
Similarly, we can create an empty DataFrame with only columns, only rows, or both. In the example below, we create a DataFrame with the columns name, age, and weight and 3 rows; since no data is passed, every cell is NaN.
# create an empty DataFrame with column names and 3 row indices (0, 1, 2)
df = pd.DataFrame(columns=['name', 'age', 'weight'], index=range(3))
print(df)
| | name | age | weight |
---|---|---|---|
0 | NaN | NaN | NaN |
1 | NaN | NaN | NaN |
2 | NaN | NaN | NaN |
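One thing to keep in mind is that pandas itself only treats a DataFrame as empty when it has no rows or no columns. The sketch below checks this with the `empty` attribute; the variable names are chosen just for this illustration.

```python
import pandas as pd

no_rows_no_cols = pd.DataFrame()
cols_only = pd.DataFrame(columns=['name', 'age', 'weight'])
nan_rows = pd.DataFrame(columns=['name', 'age', 'weight'], index=range(3))

print(no_rows_no_cols.empty)  # True
print(cols_only.empty)        # True: columns exist, but there are no rows
print(nan_rows.empty)         # False: 3 rows exist, even though all cells are NaN
print(nan_rows.shape)         # (3, 3)
```

So a frame that is pre-filled with NaN rows is "empty" only in the sense of holding no data yet.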
For further reference, see the pandas DataFrame documentation.