diff() method in Pandas
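The diff() method in Pandas calculates the difference between each element and another element along a chosen axis of a DataFrame or Series. By default, it subtracts the value in the previous row from the value in the current row.
With the default periods=1, it is essentially a shorthand for the operation df - df.shift(1); more generally, df.diff(periods=n) corresponds to df - df.shift(n).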
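The relationship with shift() can be checked directly. The following is a minimal sketch for numeric data, reusing the small Revenue column from the examples in this article; the variable names via_diff and via_shift are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Revenue': [1000, 1200, 1100, 1500]})

# Difference from the previous row, computed two ways
via_diff = df.diff()
via_shift = df - df.shift(1)

# equals() treats NaN values in the same position as equal
print(via_diff.equals(via_shift))
```

Both results start with NaN, because the first row has no previous row to subtract.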
Real-world use case: Calculating daily temperature changes, month-over-month revenue growth, or differences between consecutive observations in time series data.
Official Documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.diff.html
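As an illustration of the time series use case, here is a short sketch of day-to-day temperature changes. The dates, the Temp column name, and the readings are made up for this example.

```python
import pandas as pd

# Hypothetical daily temperature readings indexed by date
dates = pd.date_range('2024-01-01', periods=4, freq='D')
temps = pd.DataFrame({'Temp': [20.5, 22.0, 21.0, 23.5]}, index=dates)

# Change compared with the previous day
temps['Daily_Change'] = temps['Temp'].diff()
print(temps)
```

The first day has no earlier reading, so its Daily_Change is NaN.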
Syntax
df.diff(periods=1, axis=0)
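- periods: Number of positions (rows by default) to shift before calculating the difference. Default value is 1. Negative values compare each element with a later one instead.
- axis: Defines the direction of calculation.
  - axis=0 → Difference between consecutive rows, computed down each column (default).
  - axis=1 → Difference between consecutive columns, computed across each row.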
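The effect of the axis parameter is easier to see on a small table. This is a sketch with hypothetical quarterly figures; the Q1 to Q3 column names and the North and South row labels are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical quarterly figures for two regions
df = pd.DataFrame(
    {'Q1': [100.0, 150.0], 'Q2': [120.0, 140.0], 'Q3': [160.0, 170.0]},
    index=['North', 'South'],
)

# axis=0: each row minus the row above it, within each column
down_rows = df.diff(axis=0)

# axis=1: each column minus the column to its left, within each row
across_columns = df.diff(axis=1)

print(down_rows)
print(across_columns)
```

With axis=0 the whole first row is NaN, while with axis=1 the whole first column is NaN.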
Example: Calculating differences between rows.
Python
import pandas as pd
df = pd.DataFrame({'Revenue': [1000, 1200, 1100, 1500]})
# Calculate difference from the previous row
df['Change'] = df['Revenue'].diff()
# Calculate difference from two rows ago
df['Change_2P'] = df['Revenue'].diff(periods=2)
print(df)
The output of the above code is shown below:
Output
   Revenue  Change  Change_2P
0     1000     NaN        NaN
1     1200   200.0        NaN
2     1100  -100.0      100.0
3     1500   400.0      300.0
Example: Calculating the difference with the following rows using negative periods.
Python
import pandas as pd
df = pd.DataFrame({'Revenue': [1000, 1200, 1100, 1500]})
# Calculate difference from the next row
df['Change'] = df['Revenue'].diff(periods=-1)
# Calculate difference from two rows ahead
df['Change_2P'] = df['Revenue'].diff(periods=-2)
print(df)
The output of the above code is shown below:
Output
   Revenue  Change  Change_2P
0     1000  -200.0     -100.0
1     1200   100.0     -300.0
2     1100  -400.0        NaN
3     1500     NaN        NaN
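The sign convention with negative periods is easy to misread: each row holds the current value minus a later value, not the other way around. The sketch below reuses the Revenue values from the example above and checks periods=-1 against the ordinary backward difference, shifted up one row and negated. The variable names are only illustrative.

```python
import pandas as pd

revenue = pd.Series([1000, 1200, 1100, 1500], name='Revenue')

# Current row minus the next row
to_next = revenue.diff(periods=-1)

# Same values from the default diff: move each result up one row, flip the sign
from_prev_flipped = -revenue.diff().shift(-1)

print(to_next.equals(from_prev_flipped))
```

To get "next minus current" instead, negate the result of diff(periods=-1).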
Example: Calculating the absolute difference between rows using the abs() method.
Python
import pandas as pd
df = pd.DataFrame({'Revenue': [1000, 1200, 1100, 1500]})
# Calculate absolute difference with the next row
df['Change'] = df['Revenue'].diff(periods=-1).abs()
# Calculate absolute difference with two rows ahead
df['Change_2P'] = df['Revenue'].diff(periods=-2).abs()
print(df)
The output of the above code is shown below:
Output
   Revenue  Change  Change_2P
0     1000   200.0      100.0
1     1200   100.0      300.0
2     1100   400.0        NaN
3     1500     NaN        NaN
Best Practices
- Use periods=-1 to calculate the difference with the next row instead of the previous row.
- Combine diff() with abs() to calculate the magnitude of change regardless of direction.
- With a positive periods value, the first periods rows return NaN because there is no earlier value to compare; with a negative value, the last rows return NaN instead.
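Where the NaN values land, and one possible way to handle them, can be sketched as follows. Whether to drop or fill them depends on the analysis; dropna() and fillna(0) are shown only as common options, not as a recommendation.

```python
import pandas as pd

revenue = pd.Series([1000, 1200, 1100, 1500], name='Revenue')

# Positive periods: NaN at the start; negative periods: NaN at the end
backward = revenue.diff(periods=2)
forward = revenue.diff(periods=-2)
print(backward.isna().tolist())  # [True, True, False, False]
print(forward.isna().tolist())   # [False, False, True, True]

# Two common ways to deal with the missing values
dropped = backward.dropna()   # keep only rows with a valid difference
filled = backward.fillna(0)   # treat "no earlier value" as zero change
print(dropped.tolist())       # [100.0, 300.0]
print(filled.tolist())        # [0.0, 0.0, 100.0, 300.0]
```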