Selecting Rows and Columns Based on Conditions in Python Pandas DataFrame

There are multiple instances where we have to select the rows and columns from a Pandas DataFrame by multiple conditions. Let’s see a few commonly used approaches to filter rows or columns of a dataframe using the indexing and selection in multiple ways. For example, one can use label based indexing with loc function.

 

Table of Contents:

  • Select data by multiple conditions (Boolean Variables)
  • Select data by conditional statement (.loc)
    • Set values for selected subset data in DataFrame
  • Select data by row numbers (.iloc)
  • Select all rows containing a substring

Data Preparation

 first_namedegreenationalityagegrade
0SamMBAUSA25A+
1ZivaMSIndia29A
2KiaGraduateUK19C
3RobinArtsFrance21NaN
4KimMSCanada33B-

 

Select data using Boolean Variables

Select rows or columns based on conditions in Pandas DataFrame using different operators. First, let’s check operators to select rows based on particular column value using '>', '=', '=', '<=', '!=' operators.

 first_namedegreenationalityagegrade
1ZivaMSIndia29A
4KimMSCanada33B-
 first_namedegreenationalityagegrade
1ZivaMSIndia29A
1    MS
Name: degree, dtype: object

Select data using “loc”

The syntax of the “loc” indexer is: data.loc[<row selection>, <column selection>].

Let’s repeat all the previous examples using loc indexer. In the next section we will compare the differences between the two.

 first_namedegreenationalityagegrade
1ZivaMSIndia29A
4KimMSCanada33B-
 first_namedegreenationalityagegrade
1ZivaMSIndia29A
1    MS
Name: degree, dtype: object

Set values for selected subset data in DataFrame

Using “.loc”, DataFrame update can be done in the same statement of selection and filter with a slight change in syntax. You can update values in columns applying different conditions.

For example, we will update the degree of persons whose age is greater than 28 to “PhD”.

first_namedegreenationalityagegrade
1ZivaMSIndia29A
4KimMSCanada33B-
first_namedegreenationalityagegrade
1ZivaPhDIndia29A
4KimPhDCanada33B-

However, boolean operations do not work in case of updating DataFrame values.

Select data using “iloc”

The iloc syntax is data.iloc[<row selection>, <column selection>]. “iloc” in pandas is used to select rows and columns by number, in the order that they appear in the DataFrame. Both row and column numbers start from 0 in python.

i. Single Selection

ii. Multiple Selection

Select all rows containing a sub string

Select rows in DataFrame which contain the substring. We will use str.contains() function

first_namedegreenationalityagegrade
2KiaGraduateUK19NaN

Conclusion

We have covered the basics of indexing and selecting with Pandas. These Pandas functions are an essential part of any data munging task and will not throw an error if any of the values are empty or null or NaN. There are other useful functions that you can check in the official documentation.

Start with Data Science here

Leave a Comment