cast() method in PySpark

The cast() method is used to change the data type of a DataFrame column.

Syntax: pyspark.sql.Column.cast(dataType)

Here dataType can be a pyspark.sql.types.DataType instance or a type-name string such as 'double' or 'int'.

In this exercise, we use the data source data.csv. You can download the file and use it for the transformations below.

Example: First, create the SparkSession and read the data from the CSV file.

Python

# Import the SparkSession module
from pyspark.sql import SparkSession

# Initialize a Spark session
spark = SparkSession.builder.appName("App Name").getOrCreate()

# Import the Data
df = spark.read.csv("data.csv", header=True, inferSchema=True)

# Show the data in the DataFrame
df.show() 

The output of the above code shows the full contents of the DataFrame.

Next, cast the Salary column to the double data type, storing the result in a new column named 'New Salary'.

Python

# Cast Salary to double and store the result in a new column
df = df.withColumn('New Salary', df['Salary'].cast('double'))
df.show()

The output shows the DataFrame with the added 'New Salary' column of type double.

Similarly, cast the Height column to the int data type, storing the result in a new column named 'New Height'.

Python

# Cast Height to int and store the result in a new column
df = df.withColumn('New Height', df['Height'].cast('int'))
df.show()

The output shows the DataFrame with the added 'New Height' column of type int.