Select Specific Columns from a DataFrame in PySpark
In this exercise, we will learn how to select specific columns from a DataFrame in PySpark.
The DataFrame.select() function selects the specified columns from a DataFrame and returns a new, transformed DataFrame.
Syntax: df.select("column1", "column2")
where column1 and column2 are the names of the columns we want to select from the DataFrame.
Example: First, create the SparkSession.
Python
# Import the SparkSession module
from pyspark.sql import SparkSession

# Initialize a Spark session
spark = SparkSession.builder.appName("App Name").getOrCreate()
Read the data from the CSV file and show the data after reading.
Python
# Import the data
df = spark.read.csv("data.csv", header=True, inferSchema=True)

# Show the data in the DataFrame
df.show()
df.show() prints the rows of the DataFrame in a tabular format.
Let’s select the columns “Name” and “Company” from the DataFrame.
Python
df.select("Name", "Company").show()
The resulting DataFrame contains only the "Name" and "Company" columns.