Return DataFrame Columns in PySpark

The columns attribute of a PySpark DataFrame (dataframe.columns) returns the column names as a Python list of strings, in the order in which they appear in the DataFrame.

Example: First, create a SparkSession.

Python

# Import the SparkSession class
from pyspark.sql import SparkSession

# Create a Spark session, or reuse one that already exists
spark = SparkSession.builder.appName("App Name").getOrCreate()

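Next, read the data from a CSV file and display it. The header=True option takes the column names from the first row of the file, and inferSchema=True lets Spark infer each column's data type from the data.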

Python

# Read the CSV file into a DataFrame
df = spark.read.csv("data.csv", header=True, inferSchema=True)

# Show the data in the DataFrame
df.show()

The output of the above code is shown below:

[Image: output of df.show()]

Now get all the column names from the DataFrame as a list.

Python

# Get the column names as a Python list of strings
df.columns

The output of the above code is shown below:

[Image: output of df.columns]