Read a DataFrame from CSV in PySpark
In this exercise, we use the data source data.csv. You can download it and use it for the transformations.
To read a DataFrame from a CSV file, use the following command:
Python
df = spark.read.csv("path/to/csvfile.csv", header=True, inferSchema=True)
Example: First create the SparkSession.
Python
# Import the SparkSession class
from pyspark.sql import SparkSession

# Initialize a Spark session
spark = SparkSession.builder.appName("App Name").getOrCreate()
To read the file data.csv, we can use the following PySpark code.
Python
# Import the data
df = spark.read.csv("data.csv", header=True, inferSchema=True)

# Show the data in the DataFrame
df.show()
The output of the above code is shown below:

To see the DataFrame schema, just type the DataFrame name in an interactive session such as a notebook or the PySpark shell.
