Sort a DataFrame by Column in PySpark

The DataFrame.sort() function is used to sort the rows of a DataFrame based on the values of one or more columns.

Syntax: DataFrame.sort(*cols, ascending=True)

Here, cols specifies the column(s) to sort by; it can be a column name given as a string, a Column object, or a list of either. The ascending parameter specifies the sort order and takes a boolean value (or a list of booleans, one per column): True sorts in ascending order and False sorts in descending order. By default, its value is True.
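For example, assuming a DataFrame named df with a column called "Salary" (both names are hypothetical here and used only for illustration), the two sort orders look like this:

Python

# Sort the rows in ascending order of Salary (the default)
df.sort("Salary").show()

# Sort the rows in descending order of Salary
df.sort("Salary", ascending=False).show()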

Example: First, create a SparkSession and read the data from a CSV file.

Python

# Import the SparkSession module
from pyspark.sql import SparkSession

# Initialize a Spark session
spark = SparkSession.builder.appName("App Name").getOrCreate() 

# Read the CSV file into a DataFrame
df = spark.read.csv("data.csv", header=True, inferSchema=True)

# Show the data in the DataFrame
df.show()  

The code above prints the contents of the DataFrame. For the examples that follow, data.csv is assumed to contain a Company column and a Salary column.
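If you do not have a data.csv file handy, a small DataFrame with the same two columns can be built in memory instead; the sketch below uses made-up company names and salaries purely for illustration:

Python

# A minimal in-memory substitute for data.csv (sample values are hypothetical)
data = [("TCS", 50000), ("Infosys", 40000), ("TCS", 60000), ("Wipro", 45000)]
df = spark.createDataFrame(data, ["Company", "Salary"])
df.show()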

Let’s group the data by Company, compute the total salary per company, and sort the result in ascending order.

Python

from pyspark.sql.functions import sum as _sum  # Spark's sum aggregate, not Python's built-in

groupdf = df.groupBy(df["Company"]).agg(_sum("Salary").alias("Total Salary"))
groupdf.sort("Total Salary").show()

The output shows one row per company with its Total Salary, sorted in ascending order.
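Equivalently, the sort column can be passed as a Column expression rather than a string; the sketch below uses pyspark.sql.functions.col together with .asc() and produces the same ascending order:

Python

from pyspark.sql.functions import col

# Same ascending sort, expressed with a Column object
groupdf.sort(col("Total Salary").asc()).show()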

Now, let’s sort the DataFrame in descending order by passing ascending=False.

Python

groupdf.sort("Total Salary", ascending=False).show()

The output shows the same rows sorted in descending order of Total Salary.
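As with the ascending case, the same descending result can be written with a Column expression, or with orderBy(), which is an alias of sort() in PySpark:

Python

from pyspark.sql.functions import col

# Same descending sort using a Column object
groupdf.sort(col("Total Salary").desc()).show()

# orderBy() is an alias of sort()
groupdf.orderBy("Total Salary", ascending=False).show()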