Read a DataFrame from CSV in PySpark

In this exercise, we use the data source data.csv. You can download it and use it for the transformations.

To read a DataFrame from a CSV file, use the following command:

Python

df = spark.read.csv("path/to/csvfile.csv", header=True, inferSchema=True)  

Example: First create the SparkSession.

Python

# Import the SparkSession class
from pyspark.sql import SparkSession

# Initialize a Spark session
spark = SparkSession.builder.appName("App Name").getOrCreate() 

To read the file data.csv, we can use the following PySpark code.

Python

# Read the CSV file into a DataFrame
df = spark.read.csv("data.csv", header=True, inferSchema=True)

# Show the data in the DataFrame
df.show() 

The output of the above code is shown below:

[Image: output of df.show()]

To see the DataFrame schema, just type the DataFrame name (df) in an interactive session such as a notebook or the PySpark shell.

[Image: DataFrame schema output]