The pandas.DataFrame.groupby method in Pandas

To demonstrate the groupby() method, we use the data source employees.csv. You can download the file and follow along with the transformations.

Example: Load the employees.csv file.

Python

import pandas as pd
mydata=pd.read_csv("employees.csv")
mydata   

The output of the above code is shown below:

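If employees.csv is not at hand, you can still follow along with a small stand-in. The sketch below is hypothetical: only the Country and Salary column names come from this tutorial, while the Name column and every value are made up, so the numbers will not match the real file. It also shows that read_csv() accepts any file-like object, here an io.StringIO buffer, in place of a file path.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for employees.csv (made-up rows)
csv_text = """Name,Country,Salary
Asha,India,50000
Ben,USA,70000
Chen,India,65000
Dev,UK,60000
Emma,USA,72000
Farah,Germany,58000
"""

# read_csv accepts a file-like object, so a StringIO buffer works like a path
mydata = pd.read_csv(io.StringIO(csv_text))
print(mydata.shape)   # (6, 3)
print(mydata.dtypes)  # column types inferred by pandas
```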

Now let’s group the dataframe by the column named “Country”.

Python

grouped_data=mydata.groupby("Country")
grouped_data 

The output of the above code is shown below:


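The DataFrame is now grouped by the values of the Country column, and the result is a DataFrameGroupBy object. Displaying this object shows only its type and memory address, not the grouped rows.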
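To see what the groups actually contain, you can iterate over the DataFrameGroupBy object, which yields (group key, sub-DataFrame) pairs, or inspect its groups attribute, which maps each key to the index labels of its rows. The data here is a hypothetical stand-in for employees.csv with made-up names and salaries.

```python
import pandas as pd

# Hypothetical sample data (not the real employees.csv)
mydata = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen", "Dev", "Emma", "Farah"],
    "Country": ["India", "USA", "India", "UK", "USA", "Germany"],
    "Salary": [50000, 70000, 65000, 60000, 72000, 58000],
})
grouped_data = mydata.groupby("Country")

# Iterating yields (key, sub-DataFrame) pairs; keys are sorted by default
for country, frame in grouped_data:
    print(country, frame["Name"].tolist())

# .groups maps each key to the index labels of the rows in that group
group_rows = {key: labels.tolist() for key, labels in grouped_data.groups.items()}
print(group_rows)
```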

Let’s get the length of the DataFrameGroupBy object.

Python

# Returns the number of groups
len(grouped_data) 

The output of the above code is shown below:


It returns 4, which means the DataFrameGroupBy object holds four groups (one sub-DataFrame per group). This is because the Country column of the original DataFrame contains four distinct non-missing values.

Retrieve a Group with the get_group Method

The get_group() method of a DataFrameGroupBy object returns a DataFrame containing the rows of one specific group (category).

Syntax: pandas.core.groupby.DataFrameGroupBy.get_group(name)

The name parameter is the key of the group to return as a DataFrame.

Example: Get all the rows of the group named “India”. Because the original DataFrame is grouped by the Country column, this returns every row where the country is India.

Python

grouped_data.get_group("India") 

The output of the above code is shown below:

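get_group() selects the same rows as a boolean filter on the grouping column, keeping their original index labels, and it raises a KeyError for a key that is not one of the groups. A minimal check, using hypothetical sample data (the country "Brazil" is simply an absent key chosen for illustration):

```python
import pandas as pd

# Hypothetical sample data (not the real employees.csv)
mydata = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen", "Dev", "Emma", "Farah"],
    "Country": ["India", "USA", "India", "UK", "USA", "Germany"],
    "Salary": [50000, 70000, 65000, 60000, 72000, 58000],
})
grouped_data = mydata.groupby("Country")

india = grouped_data.get_group("India")
print(india)

# Same row labels that a boolean filter on Country would select
mask_labels = mydata.index[mydata["Country"] == "India"]
print(india.index.equals(mask_labels))  # True

# A key that is not a group raises KeyError
missing_key_raised = False
try:
    grouped_data.get_group("Brazil")
except KeyError as err:
    missing_key_raised = True
    print("No such group:", err)
```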

Methods on the GroupBy Object

Example: Select a single column from the grouped data to get a SeriesGroupBy object.

Python

# Select the column in square brackets []
grouped_data["Salary"] 

The output of the above code is shown below:

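Single square brackets select one column and give a SeriesGroupBy object, while double brackets give a DataFrameGroupBy restricted to the listed columns. The difference shows up in what an aggregation returns: a Series in the first case, a DataFrame in the second. A quick sketch with hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data (not the real employees.csv)
mydata = pd.DataFrame({
    "Country": ["India", "USA", "India", "UK", "USA", "Germany"],
    "Salary": [50000, 70000, 65000, 60000, 72000, 58000],
})
grouped_data = mydata.groupby("Country")

salary_groups = grouped_data["Salary"]          # single brackets
salary_frame_groups = grouped_data[["Salary"]]  # double brackets

print(type(salary_groups).__name__)        # SeriesGroupBy
print(type(salary_frame_groups).__name__)  # DataFrameGroupBy

# The aggregation result type follows the selection
print(type(salary_groups.sum()).__name__)        # Series
print(type(salary_frame_groups.sum()).__name__)  # DataFrame
```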

Let’s get the sum of the “Salary” column for each group.

Python

# Select the column to aggregate with square brackets []
grouped_data["Salary"].sum() 

The output of the above code is shown below:

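Conceptually, grouped_data["Salary"].sum() follows a split-apply-combine pattern: split the rows by Country, sum Salary within each piece, and combine the totals into a Series indexed by the group keys. The sketch below rebuilds that by hand with a loop over the groups, purely to make the steps visible; the built-in aggregation is the one to use in practice. The data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data (not the real employees.csv)
mydata = pd.DataFrame({
    "Country": ["India", "USA", "India", "UK", "USA", "Germany"],
    "Salary": [50000, 70000, 65000, 60000, 72000, 58000],
})
grouped_data = mydata.groupby("Country")

# Built-in aggregation: a Series indexed by Country
totals = grouped_data["Salary"].sum()
print(totals)

# The same result by hand: split (iterate), apply (sum), combine (dict)
manual = {country: int(frame["Salary"].sum()) for country, frame in grouped_data}
print(totals.to_dict() == manual)  # True
```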

Let’s get the minimum value in the Salary column of each group.

Python

# Select the column to aggregate with square brackets []
grouped_data["Salary"].min() 

The output of the above code is shown below:

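A natural follow-up is to ask which row holds each group's minimum. SeriesGroupBy.idxmin() returns the index label of that row for each group, and those labels can be passed to .loc to pull out the full rows. Sketch with hypothetical sample data (the Name column is made up):

```python
import pandas as pd

# Hypothetical sample data (not the real employees.csv)
mydata = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen", "Dev", "Emma", "Farah"],
    "Country": ["India", "USA", "India", "UK", "USA", "Germany"],
    "Salary": [50000, 70000, 65000, 60000, 72000, 58000],
})
grouped_data = mydata.groupby("Country")

# Smallest salary per country
lowest = grouped_data["Salary"].min()
print(lowest)

# idxmin gives the index label of the row holding each minimum
lowest_rows = mydata.loc[grouped_data["Salary"].idxmin()]
print(lowest_rows[["Country", "Name", "Salary"]])
```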

Let’s get the maximum value in the Salary column of each group.

Python

# Select the column to aggregate with square brackets []
grouped_data["Salary"].max() 

The output of the above code is shown below:

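Instead of calling min() and max() separately, agg() accepts a list of function names and returns one column per function, which makes derived columns such as a salary range easy to add. The range column name and the data below are hypothetical.

```python
import pandas as pd

# Hypothetical sample data (not the real employees.csv)
mydata = pd.DataFrame({
    "Country": ["India", "USA", "India", "UK", "USA", "Germany"],
    "Salary": [50000, 70000, 65000, 60000, 72000, 58000],
})
grouped_data = mydata.groupby("Country")

# One column per aggregation function, one row per group
salary_stats = grouped_data["Salary"].agg(["min", "max"])

# Derived column: spread of salaries within each country
salary_stats["range"] = salary_stats["max"] - salary_stats["min"]
print(salary_stats)
```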

Let’s get the mean of the Salary column of each group.

Python

grouped_data["Salary"].mean()  

The output of the above code is shown below:

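Named aggregation lets you compute several summaries in one call and choose the output column names yourself, each given as a (column, function) pair. The names avg_salary and headcount below are illustrative choices, and the data is hypothetical. Note that mean and count both skip missing values.

```python
import pandas as pd

# Hypothetical sample data (not the real employees.csv)
mydata = pd.DataFrame({
    "Country": ["India", "USA", "India", "UK", "USA", "Germany"],
    "Salary": [50000, 70000, 65000, 60000, 72000, 58000],
})
grouped_data = mydata.groupby("Country")

# output_name=(input_column, aggregation)
summary = grouped_data.agg(
    avg_salary=("Salary", "mean"),
    headcount=("Salary", "count"),
).round(2)
print(summary)
```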

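pandas.core.groupby.DataFrameGroupBy.first(): This method returns the first non-null value of each column within each group. The result matches the first row of a group only when that row has no missing values; to get the literal first row, use head(1).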

Python

# First non-null value of each column, for each group
grouped_data.first() 
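The difference between first() and the literal first row only shows up when data is missing. In the hypothetical data below, the first India row has no Salary, so first() takes the Salary from the next India row, while head(1) returns the first row unchanged:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first India row has a missing Salary
mydata = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen", "Dev"],
    "Country": ["India", "USA", "India", "USA"],
    "Salary": [np.nan, 70000, 65000, 72000],
})
grouped_data = mydata.groupby("Country")

# first(): per column, the first non-null value (India's Salary comes from Chen's row)
firsts = grouped_data.first()
print(firsts)

# head(1): the actual first row of each group, NaN included, original index kept
first_rows = grouped_data.head(1)
print(first_rows)
```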

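pandas.core.groupby.DataFrameGroupBy.last(): This method returns the last non-null value of each column within each group. As with first(), this is not necessarily the last row of the group; use tail(1) for that.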

Python

# Last non-null value of each column, for each group
grouped_data.last()  

The output of the above code is shown below:

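Because first() and last() follow the current row order, sorting before grouping gives "last" a specific meaning. In this sketch (hypothetical data), sorting by Salary makes the last row of each group its highest earner; tail(1) returns those rows with their original index, and last() returns the last non-null value of each column indexed by Country.

```python
import pandas as pd

# Hypothetical sample data (not the real employees.csv)
mydata = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen", "Dev", "Emma", "Farah"],
    "Country": ["India", "USA", "India", "UK", "USA", "Germany"],
    "Salary": [50000, 70000, 65000, 60000, 72000, 58000],
})

# Ascending sort by Salary, so the last row of each group has the highest salary
sorted_groups = mydata.sort_values("Salary").groupby("Country")

top_earners = sorted_groups.tail(1)  # full rows, original index labels
print(top_earners[["Country", "Name", "Salary"]])

last_values = sorted_groups.last()   # indexed by Country
print(last_values)
```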