The pandas.get_dummies function in Pandas

The pandas.get_dummies function creates a new column for each unique value in categorical data. In each new column, it assigns 1 or 0 (True or False with the default dtype) to show whether that category is present in the corresponding row of the original data. By default, each new column is named after a category value, prefixed with the original column name and an underscore. This process is also known as one-hot encoding.
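Here is a minimal sketch of the naming rule. It uses a small hypothetical DataFrame; the column names Company and Price and their values are made up for illustration. Text columns are encoded and numeric columns are left unchanged. A Series input has no column name to use as a prefix.

```python
import pandas as pd

# Hypothetical data: one categorical (text) column and one numeric column
df = pd.DataFrame({
    "Company": ["Acme", "Globex", "Acme"],
    "Price": [10.0, 12.5, 9.0],
})

# Only the text column is encoded; Price is kept as-is.
# Each new column is named "<column name>_<value>".
encoded = pd.get_dummies(df)
print(encoded.columns.tolist())
# ['Price', 'Company_Acme', 'Company_Globex']

# For a Series, the new columns are named after the values alone
# (categories are sorted, so 'green' comes before 'red').
colors = pd.get_dummies(pd.Series(["red", "green", "red"]))
print(colors.columns.tolist())
# ['green', 'red']
```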

Note: One-hot encoded data is commonly used as input to machine learning algorithms, many of which cannot work with text categories directly.

Syntax:
a) pandas.get_dummies(data)
b) pandas.get_dummies(data, columns=["columnName"])
c) pandas.get_dummies(data, columns=["columnName"], dtype=int)

The columns parameter is not the second parameter of the function (the second one is prefix). Because of this, the argument must be passed by keyword, as columns=[...]. If the list is passed positionally, it is treated as prefix and the function does not work as expected.
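The sketch below shows this difference on a hypothetical DataFrame with two text columns, Company and City; the data is invented for illustration. The keyword form encodes only Company. The positional form treats the list as prefix, encodes both text columns, and raises a ValueError because one prefix cannot cover two encoded columns.

```python
import pandas as pd

# Hypothetical data with two text columns; only Company should be encoded
df = pd.DataFrame({
    "Company": ["Acme", "Globex", "Acme"],
    "City": ["Oslo", "Lima", "Lima"],
})

# Correct: pass the list by keyword, so only Company is one-hot encoded
ok = pd.get_dummies(df, columns=["Company"])
print(ok.columns.tolist())
# ['City', 'Company_Acme', 'Company_Globex']

# Pitfall: passed positionally, the list becomes `prefix`, not `columns`.
# Both text columns are then encoded, and a single prefix for two columns fails.
raised = False
try:
    pd.get_dummies(df, ["Company"])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```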
The dtype parameter sets the data type of the new columns. Only a single dtype is allowed. By default it is bool (in pandas 2.0 and later; earlier versions used uint8).
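This minimal sketch compares the default dtype with dtype=int on a small hypothetical Series. The default is bool only in pandas 2.0 and later. Passing dtype=int gives 0/1 integer columns regardless of the pandas version.

```python
import pandas as pd

s = pd.Series(["cat", "dog", "cat"])  # hypothetical categorical data

default_enc = pd.get_dummies(s)         # bool columns in pandas >= 2.0 (uint8 before)
int_enc = pd.get_dummies(s, dtype=int)  # 0/1 integer columns

print(default_enc.dtypes.tolist())
print(int_enc.dtypes.tolist())
print(int_enc)
```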

Note: get_dummies is called on the pandas library itself, as pd.get_dummies(...), not as a method of a DataFrame or Series.

Example: First, we read the data from a CSV file into a DataFrame.

Python

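import pandas as pd

mydata = pd.read_csv("data.csv")  # read the CSV file into a DataFrame
mydata  # in a notebook, this displays the DataFrame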
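If data.csv is not available, the example can be reproduced with a small stand-in. The file contents below are hypothetical and invented for illustration; the real file's columns may differ, but it must have a Company column. io.StringIO lets pd.read_csv read a string as if it were a file.

```python
import io
import pandas as pd

# Hypothetical stand-in for data.csv; the real file's contents may differ
csv_text = """Company,Price
Acme,10.0
Globex,12.5
Acme,9.0
"""

mydata = pd.read_csv(io.StringIO(csv_text))
print(mydata.dtypes)  # Company is read as text, Price as float64
```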

Let’s convert the Company categorical column.

Python

pd.get_dummies(mydata, columns=["Company"])

The output of the above code is shown in the image below:

Let’s also specify the dtype parameter, so that the new columns contain 0 and 1 instead of True and False.

Python

pd.get_dummies(mydata, columns=["Company"], dtype=int)

The output of the above code is shown in the image below:

This transformation is useful in machine learning because many algorithms require numeric input, and get_dummies converts categorical text data into a numeric format.
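Here is a minimal sketch of that last step, using a small hypothetical DataFrame. After encoding with dtype=int, every column is numeric. The whole frame can then be converted with DataFrame.to_numpy() into a plain NumPy array, the kind of input many machine learning libraries accept.

```python
import pandas as pd

df = pd.DataFrame({
    "Company": ["Acme", "Globex", "Acme"],  # hypothetical categorical column
    "Price": [10.0, 12.5, 9.0],             # hypothetical numeric column
})

encoded = pd.get_dummies(df, columns=["Company"], dtype=int)

# Every column is now numeric, so the frame converts to a single numeric array
X = encoded.to_numpy()
print(encoded.columns.tolist())  # ['Price', 'Company_Acme', 'Company_Globex']
print(X.dtype)                   # float64: the int columns are upcast to match Price
print(X)
```

Because Price is a float column, NumPy picks float64 as the common type for the whole array.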
