fillna() Method in PySpark
The fillna() method in PySpark is used to replace null or NaN values in a DataFrame with specified values.
Syntax: pyspark.sql.DataFrame.fillna(value, subset=None)
• value: The value used to replace null or NaN entries. It can be an int, float, string, or bool, or a dictionary mapping column names to replacement values.
• subset: An optional list of column names to consider for replacement. If not specified, all columns are considered.
In this exercise, we use the employees.csv data source. You can download it and use it for the transformations below.
Example: First, create the SparkSession and read the data from the CSV file.
Python
# Import the SparkSession module
from pyspark.sql import SparkSession

# Initialize a Spark session
spark = SparkSession.builder.appName("App Name").getOrCreate()

# Import the data
df = spark.read.csv("employees.csv", header=True, inferSchema=True)

# Show the data in the DataFrame
df.show()
Let’s replace all the null values in the DataFrame with 0. Since 0 is a numeric value, fillna() replaces null values only in the numeric columns; string columns are left unchanged.
Python
newdf = df.fillna(0)
newdf.show()
Similarly, if we specify a string value, fillna() replaces null values only in the string columns.
Python
newdf = newdf.fillna("No value Present")
newdf.show()
We can specify different replacement values for different columns by passing a dictionary that maps column names to replacement values.
Python
# Replace null values with different values for different columns
df = df.fillna({'Salary': 500, 'Name': 'Not available', 'Company': 'Not found'})
df.show()