diff() method in Pandas

The diff() method in Pandas calculates the difference between each element and the element a given number of rows (or columns) away from it. By default, it subtracts the value in the previous row from the value in the current row.

It is essentially shorthand for df - df.shift(periods); with the default periods=1, that is df - df.shift(1).
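Example: Checking the equivalence with shift(). The following is a minimal sketch that compares the two expressions on a small DataFrame. The variable names via_diff and via_shift are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Revenue': [1000, 1200, 1100, 1500]})

# Difference from the previous row, computed two ways
via_diff = df.diff()
via_shift = df - df.shift(1)

# equals() treats NaN values in the same positions as equal
print(via_diff.equals(via_shift))
```

In both results the first row is NaN, because there is no previous row to subtract.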

Real-world use case: Calculating daily temperature changes, month-over-month changes in revenue, or differences between consecutive observations in time series data.

Official Documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.diff.html

Syntax

df.diff(periods=1, axis=0)    
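Here, periods is the number of positions to shift before subtracting (it can be negative), and axis selects whether the difference is taken down the rows (0, the default) or across the columns (1).

Example: Calculating differences across columns with axis=1. This is a small sketch; the quarterly figures, the column names Q1 to Q3, and the row labels North and South are made-up sample data.

```python
import pandas as pd

# Hypothetical quarterly revenue per region (illustrative data only)
df = pd.DataFrame(
    {'Q1': [100, 80], 'Q2': [120, 75], 'Q3': [150, 90]},
    index=['North', 'South']
)

# Difference from the previous column within each row
across = df.diff(axis=1)

print(across)
```

The first column of the result is NaN because there is no earlier column to subtract from it, just as the first row is NaN when axis=0.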

Example: Calculating differences between rows.

Python

import pandas as pd

df = pd.DataFrame({'Revenue': [1000, 1200, 1100, 1500]})

# Calculate difference from the previous row
df['Change'] = df['Revenue'].diff()

# Calculate difference from two rows ago
df['Change_2P'] = df['Revenue'].diff(periods=2)

print(df)    

The output of the above code is shown below:

Output

   Revenue  Change  Change_2P
0     1000     NaN        NaN
1     1200   200.0        NaN
2     1100  -100.0      100.0
3     1500   400.0      300.0  

Example: Calculating the difference from the following rows using negative periods.

Python

import pandas as pd

df = pd.DataFrame({'Revenue': [1000, 1200, 1100, 1500]})

# Calculate difference from the next row
df['Change'] = df['Revenue'].diff(periods=-1)

# Calculate difference from two rows ahead
df['Change_2P'] = df['Revenue'].diff(periods=-2)

print(df)    

The output of the above code is shown below:

Output

   Revenue  Change  Change_2P
0     1000  -200.0     -100.0
1     1200   100.0     -300.0
2     1100  -400.0        NaN
3     1500     NaN        NaN 

Example: Calculating the absolute difference between rows using the abs() method.

Python

import pandas as pd

df = pd.DataFrame({'Revenue': [1000, 1200, 1100, 1500]})

# Calculate absolute difference with the next row
df['Change'] = df['Revenue'].diff(periods=-1).abs()

# Calculate absolute difference with two rows ahead
df['Change_2P'] = df['Revenue'].diff(periods=-2).abs()

print(df)    

The output of the above code is shown below:

Output

   Revenue  Change  Change_2P
0     1000   200.0      100.0
1     1200   100.0      300.0
2     1100   400.0        NaN
3     1500     NaN        NaN 

Best Practices