Filter rows from a Pandas Dataframe

Now in this exercise, we are using the datasource employees.csv. You can download the datasource and use for the transformation.

Let’s extract a series from a dataframe and compare it. In the output we will see it returns a boolean series.

Example: Load the CSV file and use the dataframe for the further transformation.

Python

import pandas as pd
# Read the CSV file
mydata=pd.read_csv("employees.csv")

# We are comparing Series to a value, it returns a boolean series
mydata["Gender"] == "Male"
# Basically, the condition is working over all the rows of the series

The output of the above code is shown below:

To select rows from a dataframe based on a conditional expression, use a condition inside the square brackets []. As boolean series has true or false for each row, if it is true, it will return that row and if false, it will not return that row.

Example: Filter dataframe with the Male employees.

Python

# Extract only Male Employees
mydata[mydata["Gender"] == "Male"]

Or we can use the variable to specify the condition.

Python

# Here we are specifying the condition output to a variable
# Using the variable, we can filter the dataframe
male_employees=mydata["Gender"] == "Male"
mydata[male_employees]

The output of the above code is shown below:

Similarly, we can specify the condition on other columns also. For example, now we want to filter the employees whose salary is above 40000.

Example: Filter the dataframe based on the Salary column.

Python

# Here we are specifying the condition output to a variable
# Using the variable, we can filter the dataframe
high_salary_employees=mydata["Salary"] > 40000
mydata[high_salary_employees]

The output of the above code is shown below:

Multiple Conditions When combining multiple conditional statements, each condition must be surrounded by parentheses (). Moreover, we cannot use or/and but need to use the or operator | and the and operator &.

Example: Filter the dataframe on the basis of Salary and Country column.

Python

# Here we are specifying the condition output to a variable
# Using the variable, we can filter the dataframe
high_salary_employees=mydata["Salary"] > 30000
indian_employees=mydata["Country"] == "India"
mydata[high_salary_employees & indian_employees]

Or the above code can be rewritten as the following:

Python

mydata[(mydata["Salary"] > 30000) & (mydata["Country"] == "India")]

The output of the above code is shown below:

Only rows for which the value is True will be selected.

The isin Method The isin Series method accepts a collection object like a list, tuple, or Series. The method returns True for a row if its value is found in the collection.

Syntax pandas.Series.isin(values)

Example: Filter the dataframe on the basis of multiple column values.

Python

# Here we are specifying the condition output to a variable
# Using the variable, we can filter the dataframe
google_employees=mydata["Company"] == "Google"
tcs_employees=mydata["Company"] == "TCS"
microsoft_employees=mydata["Company"] == "Microsoft"

mydata[google_employees | tcs_employees | microsoft_employees]

Or the above code can be rewritten as the following:

Python

filtered_values = mydata["Company"].isin(["TCS", "Google", "Microsoft"])
mydata[filtered_values]

Or the above code can be rewritten as the following:

Python

mydata[mydata["Company"].isin(["TCS", "Google", "Microsoft"])]

The output of the above code is shown below:

pandas.Series.between The pandas.Series.between function is used to specify the condition range.

Syntax Series.between(left, right)

Example: If we are interested in the employees having salary in between 20000 and 40000. By default, both values are inclusive.

Python

mydata[mydata["Salary"].between(20000, 40000)]

The output of the above code is shown below:

Previous Next