Drop a Column from the dataframe in PySpark

The DataFrame.drop() method removes one or more columns from a DataFrame and returns a new DataFrame. The original DataFrame is left unchanged.

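Syntax:
a) df = dataframe.drop("column_name")
b) df = dataframe.drop("column_name1", "column_name2")

Multiple column names are passed as separate arguments, not as a list. To drop the names held in a Python list, unpack it: dataframe.drop(*column_list).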

Example: First create the SparkSession.

Python

# Import the SparkSession class
from pyspark.sql import SparkSession

# Initialize a Spark session
spark = SparkSession.builder.appName("App Name").getOrCreate()

Read the data from the CSV file and show the data after reading.

Python

# Read the CSV file, treating the first row as the header and inferring column types
df = spark.read.csv("data.csv", header=True, inferSchema=True)

# Show the data in the DataFrame
df.show()

The output of the above code is shown below:


Let’s drop the “Company” column from the DataFrame.

Python

newdf = df.drop("Company")
newdf.show()

The output of the above code is shown below:
