The round() method in PySpark
In PySpark, the round() function rounds the values of a numeric column to a specified number of decimal places.
Syntax: pyspark.sql.functions.round(col, scale)
The function has two parameters:
- col: the column (or column name) whose values should be rounded.
- scale: the number of decimal places to round to; it defaults to 0 (round to the nearest integer).
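To build intuition for the scale parameter, here is a minimal sketch using Python's built-in round(), which takes an analogous number-of-digits argument; PySpark's round() interprets scale the same way, including negative values. The sample value is hypothetical.

```python
# Illustrating how the scale argument selects the rounding position,
# using Python's built-in round() (sample value is hypothetical).
value = 123.456

print(round(value, 2))   # scale=2  -> 123.46 (two decimal places)
print(round(value, 0))   # scale=0  -> 123.0  (nearest integer, the default)
print(round(value, -1))  # scale=-1 -> 120.0  (a negative scale rounds to the left of the decimal point)
```

Note that a negative scale also works in PySpark, rounding to tens, hundreds, and so on.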
In this exercise, we use the data source data.csv. You can download it and use it to follow along with the transformation.
Example: First, create a SparkSession and read the data from the CSV file.
Python
# Import the SparkSession module
from pyspark.sql import SparkSession

# Initialize a Spark session
spark = SparkSession.builder.appName("App Name").getOrCreate()

# Read the data from the CSV file
df = spark.read.csv("data.csv", header=True, inferSchema=True)

# Show the data in the DataFrame
df.show()
Running this code prints the contents of data.csv as a formatted table.
Next, import the pyspark.sql.functions module. Note that a wildcard import shadows Python's built-in round() with the PySpark version for the rest of the script.
Python
# Import the functions module
from pyspark.sql.functions import *
Let's round the Height column. We can round the values in this column to 2 decimal places using the following code:
Python
df = df.withColumn('rounded_height', round(df['Height'], 2))
df.show()
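One detail worth knowing: pyspark.sql.functions.round() breaks ties by rounding half away from zero (HALF_UP), while its sibling bround() rounds ties to the nearest even digit (HALF_EVEN). Here is a minimal pure-Python sketch of the difference using the standard decimal module; the sample value is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

value = Decimal("2.665")  # an exact tie at the second decimal place

# HALF_UP: ties round away from zero, matching pyspark.sql.functions.round()
print(value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))    # -> 2.67

# HALF_EVEN: ties round to the nearest even digit, matching bround()
print(value.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN))  # -> 2.66
```

If your data must follow "banker's rounding" (common in financial reporting), use bround() instead of round().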
The resulting DataFrame contains an additional rounded_height column holding the Height values rounded to 2 decimal places.