Replace Values in a DataFrame in PySpark
The dataframe.replace() method in PySpark returns a new DataFrame in which every occurrence of a given value is replaced with another value.
Syntax: dataframe.replace(old_value, new_value, ["column_name1", "column_name2"])
In the above syntax, the list of column names (the subset parameter) is optional. If it is not specified, the value is replaced in every column whose data type matches the value.
In this exercise, we use the data source data.csv. You can download the data source and use it for the transformation.
Example: First create the SparkSession.
Python
# Import the SparkSession class
from pyspark.sql import SparkSession

# Initialize a Spark session
spark = SparkSession.builder.appName("App Name").getOrCreate()
Read the data from the CSV file and show the data after reading.
Python
# Import the data
df = spark.read.csv("data.csv", header=True, inferSchema=True)

# Show the data in the DataFrame
df.show()
The output of the above code is shown below:

Let’s replace the value “TCS” with the value “Tata” in all the columns of the DataFrame.
Python
# Replace "TCS" with "Tata" in every column
df = df.replace("TCS", "Tata")

# Show the data after the replacement
df.show()
The output of the above code is shown below:
