LogisticRegression Algorithm in ML

Let’s train and test a LogisticRegression model on the data.

In this exercise, we will use the data.csv file as the data source for training the machine learning algorithm. You can also download this data and use it for the ML task.

Example: Import the necessary libraries and read the data from the CSV file.

Python

import numpy as np
import pandas as pd

# Load Data into Pandas DataFrame
df = pd.read_csv("data.csv")
df 

[Figure: the DataFrame loaded from data.csv]
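The data.csv file is not reproduced in this text. To follow along without it, here is a minimal stand-in DataFrame. Only the column names Name, Country, Company and Gender and the 6 Female / 3 Male split come from this tutorial. The Age column and every value are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for data.csv. Only the column names Name, Country,
# Company and Gender appear in the tutorial; Age and all values are made up.
sample_df = pd.DataFrame({
    "Name": [f"Person{i}" for i in range(1, 10)],
    "Country": ["India", "USA", "UK"] * 3,
    "Company": ["Acme", "Globex", "Initech"] * 3,
    "Gender": ["Female"] * 6 + ["Male"] * 3,
    "Age": [25, 31, 28, 40, 35, 29, 33, 45, 27],
})

# Quick sanity checks before any modeling
print(sample_df.shape)               # (9, 5)
print(sample_df.dtypes)              # which columns are text and which are numeric
print(sample_df.isna().sum().sum())  # total number of missing values
```

Checking the shape, column types and missing values first makes later errors in encoding or model fitting easier to trace.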

Let’s count the Males and Females in the DataFrame.

Python

# Get the unique values and their corresponding count
df["Gender"].value_counts()  

[Figure: value counts of the Gender column]

The source data contains 6 Females and 3 Males. We store the Male rows in a variable named minority and the Female rows in a variable named majority.

Python

# Filter the Dataframe
minority = df[df['Gender'] == "Male"]
majority = df[df['Gender'] == "Female"] 

Next, we resample the data so that the number of Males and Females is equal. With balanced classes, the trained model is less likely to be biased toward the majority class.

Python

# Import the resample function
from sklearn.utils import resample

# Sample the minority rows with replacement until they match the majority count
minority_upsampled = resample(minority, replace=True, n_samples=len(majority), random_state=42)
minority_upsampled 

[Figure: the upsampled minority DataFrame]
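To see what upsampling with replacement does, here is a small sketch on a toy DataFrame. The Score column and all values are hypothetical. The point it shows is that resample creates no new data: every upsampled row is a copy of an existing minority row, so some rows must repeat.

```python
import pandas as pd
from sklearn.utils import resample

# Toy data (hypothetical): 6 Female rows and 3 Male rows
toy = pd.DataFrame({
    "Gender": ["Female"] * 6 + ["Male"] * 3,
    "Score": list(range(9)),
})
toy_minority = toy[toy["Gender"] == "Male"]
toy_majority = toy[toy["Gender"] == "Female"]

# Draw 6 rows from the 3 Male rows, with replacement
toy_upsampled = resample(
    toy_minority,
    replace=True,
    n_samples=len(toy_majority),
    random_state=42,
)

print(len(toy_upsampled))                                   # 6
print(toy_upsampled.index.isin(toy_minority.index).all())   # True: only copies of original rows
print(toy_upsampled.index.duplicated().any())               # True: 6 draws from 3 rows must repeat
```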

Now we concatenate the majority and minority_upsampled data.

Python

df_balanced = pd.concat([majority, minority_upsampled])
df_balanced

[Figure: the balanced DataFrame df_balanced]
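Because the upsampled rows keep their original index labels, the concatenated DataFrame can contain repeated labels. This does not affect the steps in this tutorial. If unique labels are wanted, one option is the ignore_index argument of pd.concat, sketched here on hypothetical toy frames.

```python
import pandas as pd

# Toy frames (hypothetical); part_b repeats an index label, as upsampled rows do
part_a = pd.DataFrame({"Gender": ["Female", "Female"]}, index=[0, 1])
part_b = pd.DataFrame({"Gender": ["Male", "Male"]}, index=[5, 5])

# Plain concatenation keeps the original labels, including the repeat
stacked = pd.concat([part_a, part_b])
print(stacked.index.tolist())   # [0, 1, 5, 5]

# ignore_index=True assigns a fresh 0..n-1 index instead
clean = pd.concat([part_a, part_b], ignore_index=True)
print(clean.index.tolist())     # [0, 1, 2, 3]
```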

Let’s again check the number of Males and Females in the transformed data.

Python

df_balanced['Gender'].value_counts()

[Figure: value counts of the Gender column after balancing]

The data now contains 6 Females and 6 Males.

The Company column will be used as a feature, so its text values must first be converted to numerical values. The pandas get_dummies function does this by creating one 0/1 column per company.

Python

df_encoded = pd.get_dummies(df_balanced, columns=['Company'], dtype=int)
df_encoded

[Figure: the DataFrame after one-hot encoding the Company column]
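As a minimal illustration of what get_dummies produces, consider a toy DataFrame. The company names and the Age column are hypothetical. Each distinct value of Company becomes its own column, named with the Company_ prefix, and the original column is dropped.

```python
import pandas as pd

# Toy data (hypothetical company names and ages)
toy = pd.DataFrame({
    "Company": ["Acme", "Globex", "Acme"],
    "Age": [30, 41, 25],
})

# One new 0/1 column per distinct Company value
encoded = pd.get_dummies(toy, columns=["Company"], dtype=int)

print(encoded.columns.tolist())            # ['Age', 'Company_Acme', 'Company_Globex']
print(encoded["Company_Acme"].tolist())    # [1, 0, 1]
print(encoded["Company_Globex"].tolist())  # [0, 1, 0]
```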

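The Gender column is the target, and it also contains text. We convert it to integer labels with LabelEncoder.

Python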

# Import the module
from sklearn.preprocessing import LabelEncoder

# Initialize the object
label_encoder = LabelEncoder()

df_encoded['Gender'] = label_encoder.fit_transform(df_encoded['Gender'])
df_encoded 

[Figure: the DataFrame after label-encoding the Gender column]
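LabelEncoder assigns integers in sorted order of the class names, so Female becomes 0 and Male becomes 1. The short sketch below, on a hypothetical list of labels, shows how to inspect that mapping and how to turn predictions back into the original names.

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
codes = encoder.fit_transform(["Male", "Female", "Female", "Male"])

# classes_ holds the original labels; a label's position is its integer code
print(encoder.classes_.tolist())                    # ['Female', 'Male']
print(codes.tolist())                               # [1, 0, 0, 1]

# inverse_transform maps integer codes back to the original labels
print(encoder.inverse_transform([0, 1]).tolist())   # ['Female', 'Male']
```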

Now we select the feature columns and the target column from the transformed data and store them in the X and y variables.

Python

# X holds the features and y holds the target used to train the model.
# X must contain only numerical columns; otherwise, fitting the model
# will raise an error.

X = df_encoded.drop(columns=['Name', 'Country', 'Gender'])
y = df_encoded['Gender']
X 

[Figure: the feature DataFrame X]

The y variable holds the encoded Gender values:

[Figure: contents of the y variable]
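Since the features must be purely numerical, it can help to check for leftover text columns before training. The following is one possible check using select_dtypes, shown on a hypothetical toy DataFrame rather than the tutorial's data.

```python
import pandas as pd

# Toy features (hypothetical): Name is text and should not reach the model
features = pd.DataFrame({
    "Age": [30, 41],
    "Company_Acme": [1, 0],
    "Name": ["A", "B"],
})

# List every column whose dtype is not numeric
non_numeric = features.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)   # ['Name']

# Drop them so that only numerical columns remain
features_numeric = features.drop(columns=non_numeric)
print(features_numeric.columns.tolist())   # ['Age', 'Company_Acme']
```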

Now it’s time to split the data into training and testing sets. Here we use 60% of the data for training and 40% for testing.

Python

# Import the module
from sklearn.model_selection import train_test_split

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42) 

The train_test_split function is from the sklearn.model_selection module and is used to split a dataset into training and testing sets.

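The following are the parameters in the function:
a) X: the feature data.
b) y: the target values.
c) test_size=0.4: the fraction of the data reserved for testing (40% here).
d) random_state=42: the seed for the shuffling, so the split is reproducible.

In the output, the function returns:
a) X_train, X_test: the training and testing portions of X.
b) y_train, y_test: the training and testing portions of y.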
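The effect of test_size is easiest to see from the shapes of the returned arrays. The sketch below uses hypothetical stand-in arrays with 12 rows, the same number of rows as the balanced data. It also shows the optional stratify argument, which the tutorial does not use and which keeps the class proportions similar in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data (hypothetical): 12 rows, 2 features, 6 samples per class
X_demo = np.arange(24).reshape(12, 2)
y_demo = np.array([0] * 6 + [1] * 6)

# Same arguments as in the tutorial: 40% of the rows go to the test set
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=42
)
print(X_tr.shape, X_te.shape)   # (7, 2) (5, 2)

# Optional: stratify=y_demo keeps both classes represented in each part
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=42, stratify=y_demo
)
print(np.bincount(y_tr_s), np.bincount(y_te_s))  # class counts per part
```

With 12 rows, a 40% test split leaves only 5 rows for testing, so any score computed on it is coarse. Also, because upsampled rows are exact copies, a copy of a training row can end up in the test set. One alternative order, not used in this tutorial, is to split first and upsample only the training portion.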

Now train the model on the training data using the LogisticRegression algorithm.

LogisticRegression Algorithm

We can initialize the algorithm using either of the forms below.
a) LogisticRegression()
b) LogisticRegression(max_iter=1000)

The parameter max_iter specifies the maximum number of iterations the solver may take to converge. The default value of this parameter is 100.

Python

# Import the module
from sklearn.linear_model import LogisticRegression

# Initialize the model
# model = LogisticRegression()
model = LogisticRegression(max_iter=1000)

# Training the model
model.fit(X_train, y_train) 

[Figure: the fitted LogisticRegression model]
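The max_iter value is only an upper limit. After fitting, the n_iter_ attribute reports how many iterations the solver actually used. The sketch below demonstrates this on synthetic data; the data and the variable names are assumptions for illustration and are not part of the tutorial's dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic two-feature data (hypothetical), labelled by the sign of x0 + x1
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 2))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_demo, y_demo)

# n_iter_ is an array with the number of iterations the solver actually ran
print(clf.n_iter_)
print(clf.n_iter_[0] <= 1000)   # True: the limit is an upper bound, not a target
```

If the solver reaches max_iter without converging, scikit-learn emits a ConvergenceWarning. Raising max_iter or scaling the features are common ways to address it.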

Now it is time to make the predictions on the testing data.

Python

# Predict class labels for the samples in X_test
y_pred = model.predict(X_test)  
y_pred

[Figure: the predicted labels y_pred]
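The predict method returns only the class labels. LogisticRegression also provides predict_proba, which returns the estimated probability of each class; the predicted label is the class with the higher probability. Here is a short sketch on synthetic data (hypothetical, not the tutorial's dataset).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic two-feature data (hypothetical), labelled by the sign of x0 + x1
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 2))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

clf = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

# One row per sample, one column per class (ordered as in clf.classes_)
proba = clf.predict_proba(X_demo[:5])
labels = clf.predict(X_demo[:5])

# The label is the class whose probability is largest
from_proba = clf.classes_[proba.argmax(axis=1)]

print(proba.round(3))
print(labels, from_proba)
```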

Let’s create the classification_report of the LogisticRegression model.

Python

# Import the module
from sklearn.metrics import classification_report

# Generate the classification report
report = classification_report(y_test, y_pred)

# Print the classification report
print("Classification Report:\n", report) 

[Figure: the classification report]
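To make the numbers in a classification report concrete, the sketch below computes precision and recall by hand for a small set of hypothetical labels. It then compares them with the values from classification_report, using output_dict=True to get a dictionary instead of text.

```python
from sklearn.metrics import classification_report

# Hypothetical true and predicted labels (not the tutorial's results)
y_true = [0, 0, 1, 1, 1]
y_hat = [0, 1, 1, 1, 0]

# Counts for class 1
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_hat))  # true positives: 2
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_hat))  # false positives: 1
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_hat))  # false negatives: 1

precision_1 = tp / (tp + fp)   # 2 / 3
recall_1 = tp / (tp + fn)      # 2 / 3

report_dict = classification_report(y_true, y_hat, output_dict=True)
print(precision_1, report_dict["1"]["precision"])
print(recall_1, report_dict["1"]["recall"])
```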

Let’s check the accuracy_score of the LogisticRegression model.

Python

# Import the module
from sklearn.metrics import accuracy_score

# Print the Accuracy Score
print("Accuracy:", accuracy_score(y_test, y_pred)) 

[Figure: the accuracy score]
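Accuracy is the fraction of predictions that match the true labels. The sketch below, on hypothetical labels, shows that accuracy_score equals the diagonal of the confusion matrix divided by the total number of samples.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels (not the tutorial's results)
y_true = [0, 0, 1, 1, 1]
y_hat = [0, 1, 1, 1, 0]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_hat)
print(cm.tolist())   # [[1, 1], [1, 2]]

# Correct predictions lie on the diagonal
manual_accuracy = cm.trace() / cm.sum()
print(manual_accuracy)                  # 0.6
print(accuracy_score(y_true, y_hat))    # 0.6
```

With a test set as small as the one in this exercise, accuracy changes in large steps. With 5 test rows, each wrong prediction lowers it by 0.2, so the score should be read as a rough indication only.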