The mean method in Pandas
To calculate the mean of a numeric column (or columns) in a Pandas DataFrame, we can use the mean() method.
Syntax
a) pandas.Series.mean()
b) pandas.DataFrame.mean()
c) pandas.DataFrame.mean(axis=0)
The parameter axis specifies the axis along which the mean is computed. By default (axis=0), the mean is computed column-wise, giving one value per column. To compute the mean across each row instead, set axis to 1.
d) pandas.DataFrame.mean(skipna=True)
The parameter skipna controls whether NA/null values are excluded when computing the result. By default, the value for this parameter is True, so NA/null values are skipped when computing the mean. With skipna=False, any NA/null value in a column (or row) makes its mean NaN.
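e) pandas.DataFrame.mean(numeric_only=False)
The parameter numeric_only restricts the computation to numeric columns (float, int, and boolean). By default, the value for this parameter is False. In pandas 2.0 and later, calling mean() with the default on a DataFrame that contains non-numeric columns (such as text) typically raises a TypeError, so pass numeric_only=True for such DataFrames.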
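To see how the three parameters above change the result, here is a minimal sketch on a small made-up DataFrame. The column names (Name, Food, Clothes) and all values are assumptions chosen for illustration; they do not come from expenditure.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical data: names and values are invented for illustration only.
df = pd.DataFrame(
    {
        "Name": ["Ann", "Bob", "Cid"],
        "Food": [10.0, 20.0, np.nan],
        "Clothes": [30.0, 40.0, 50.0],
    }
)
numeric = df[["Food", "Clothes"]]

# axis=0 (the default): one mean per column
col_means = numeric.mean()

# axis=1: one mean per row
row_means = numeric.mean(axis=1)

# skipna=False: the missing value in "Food" makes the result NaN
food_strict = df["Food"].mean(skipna=False)

# numeric_only=True: the text column "Name" is left out of the result
all_numeric = df.mean(numeric_only=True)

print(col_means)
print(row_means)
print(food_strict)
print(all_numeric)
```

With the default skipna=True, the missing Food value in the last row is ignored, so that row's mean is computed from Clothes alone.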
Example: Load the CSV file into a DataFrame stored in the variable df.
Python
# import the pandas package
import pandas as pd

# Read CSV file and create DataFrame
df = pd.read_csv("expenditure.csv")

# show dataframe
df.head(100)
The output of the above code is shown below:
Mean of a Single Column
First, we extract the column and then call the mean method on it.
Python
# Get the mean of "Expenditure on Clothes" column
clothes_exp_mean = df['Expenditure on Clothes'].mean()
print("Mean of Expenditure on Clothes :", clothes_exp_mean)
The output of the above code is shown below:
Mean of Expenditure on Clothes : 55.625
Mean of All Numeric Columns
Let’s calculate the mean of all the numeric columns:
Python
# Get the mean of every numeric column in the DataFrame
numeric_means = df.mean(numeric_only=True)
print(numeric_means)
The output of the above code is shown below:
Mean of Selected Columns
To calculate the mean for specific columns:
Python
# Get the mean of all the selected columns in the DataFrame
selected_means = df[['Expenditure on Clothes', 'Expenditure on Food']].mean()
print(selected_means)
The output of the above code is shown below:
Get the Mean by Iterating over the Rows
To get one mean per row across the selected columns, pass axis=1 to the mean method.
Python
# Get the mean of the selected columns for each row (axis=1)
row_means = df[['Expenditure on Clothes', 'Expenditure on Food']].mean(axis=1)
print(row_means)
The output of the above code is shown below:
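Since the contents of expenditure.csv are not reproduced here, the following sketch rebuilds the row-wise example on a small hypothetical stand-in DataFrame. The two column names match the ones used above, but the values are invented. It also checks the result against the arithmetic done by hand, (clothes + food) / 2, to show what axis=1 computes for each row.

```python
import pandas as pd

# Hypothetical stand-in for expenditure.csv; the values are invented.
df = pd.DataFrame(
    {
        "Expenditure on Clothes": [40, 60, 80],
        "Expenditure on Food": [100, 120, 140],
    }
)
cols = ["Expenditure on Clothes", "Expenditure on Food"]

# One mean per row across the two selected columns
row_means = df[cols].mean(axis=1)

# The same values computed by hand: (clothes + food) / 2
manual = (df["Expenditure on Clothes"] + df["Expenditure on Food"]) / 2

print(row_means.tolist())                     # [70.0, 90.0, 110.0]
print(row_means.tolist() == manual.tolist())  # True
```

The result is a Series with one entry per row of the DataFrame, indexed like the original rows.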