Create a DataFrame in PySpark

A DataFrame is a tabular data structure: data is organized into rows and named columns, and in PySpark each column has a data type.

Features of dataframe

To create a DataFrame, use the following PySpark syntax:


spark.createDataFrame(data, schema)  

Example: first, create a SparkSession.


# Import the SparkSession class
from pyspark.sql import SparkSession

# Initialize a Spark session
spark = SparkSession.builder.appName("App Name").getOrCreate()   

The following example creates a DataFrame from a list of tuples:


data = [
    (15779, "small_business", 1.204, "high"),
    (87675, "large_business", 0.167, "low"),
]

# Column names, in the same order as the values in each tuple
columns = ["Salary", "Business Type", "Score", "Standard"]

# Create DataFrame
df = spark.createDataFrame(data, columns)

# Show DataFrame
df.show()

The output of the above command is shown below:

[Image: tabular output of the DataFrame created above]