The round method in PySpark

In PySpark, the round() function is used to round the values of a numeric column to a specified number of decimal places.

Syntax: pyspark.sql.functions.round(col, scale)

The function has two parameters:

- col: the column (or column name) whose values should be rounded.
- scale: the number of decimal places to round to; it defaults to 0, which rounds to the nearest whole number.

In this exercise, we are using the data source data.csv. You can download the data source and use it for the transformation.
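Before diving into the DataFrame example, it is worth noting that PySpark's round() uses HALF_UP rounding (ties round away from zero), which differs from Python's built-in round(), which rounds ties to the nearest even digit. The sketch below uses Python's decimal module purely to illustrate that rounding mode; the half_up helper is our own illustrative function, not part of PySpark:

```python
from decimal import Decimal, ROUND_HALF_UP

def half_up(value, scale):
    # Build the quantum for the requested scale, e.g. scale=2 -> Decimal('0.01')
    quantum = Decimal(1).scaleb(-scale)
    # Quantize with HALF_UP, the rounding mode PySpark's round() applies
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP))

print(half_up(2.675, 2))   # 2.68 (HALF_UP: ties go away from zero)
print(round(2.675, 2))     # 2.67 (built-in: binary float plus round-half-to-even)
```

This is why the same value can round differently in PySpark than in plain Python.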

Example: First create the SparkSession and read the data from the CSV file.

Python

# Import the SparkSession module
from pyspark.sql import SparkSession

# Initialize a Spark session
spark = SparkSession.builder.appName("App Name").getOrCreate()

# Import the Data
df = spark.read.csv("data.csv", header=True, inferSchema=True)

# Show the data in the DataFrame
df.show()  

The output of the above code is shown below:

[Output: the contents of data.csv displayed as a DataFrame]

Next, import the round() function from the pyspark.sql.functions module.

Python

# Import the round function (note: this shadows Python's built-in round)
from pyspark.sql.functions import round

Let's round the Height column. We can round the values in this column to 2 decimal places using the following code:

Python

df = df.withColumn('rounded_height', round(df['Height'], 2))
df.show()  

The output of the above code is shown below:

[Output: the DataFrame with the new rounded_height column]