tail() method in PySpark

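The tail() method in PySpark returns the last num rows of a DataFrame as a Python list of Row objects, not as a new DataFrame. It is available from Spark 3.0 onward. The returned rows are collected into the driver process, so calling it with a very large num can exhaust the driver's memory.

Syntax: pyspark.sql.DataFrame.tail(num)
• num: The number of rows to return from the end of the DataFrame.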

In this exercise, we use the data source data.csv. You can download it and use it to follow along with the examples.

Example: First create the SparkSession and read the data from the CSV file.

Python

# Import the SparkSession class
from pyspark.sql import SparkSession

# Initialize a Spark session (or reuse an existing one)
spark = SparkSession.builder.appName("App Name").getOrCreate()

# Read data.csv, treating the first line as the header and inferring column types
df = spark.read.csv("data.csv", header=True, inferSchema=True)

# Show the data in the DataFrame (df.show() prints up to 20 rows by default)
df.show()

The output of the above code is shown below:

[Output of df.show(): the contents of data.csv printed as a table]

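Let’s get the last three rows of the DataFrame.

Python

# tail() returns a list of Row objects; print() makes the result visible when run as a script too
print(df.tail(3))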

The output of the above code is shown below:

[Output of df.tail(3): a list of the last three Row objects]