The pandas.DataFrame.duplicated() method in Pandas

The pandas.DataFrame.duplicated() method returns a boolean Series that marks duplicate rows. By default, it compares all columns to decide whether a row is a duplicate, and the first occurrence of a repeated row is not marked. Optionally, we can pass the subset parameter to limit the comparison to specific columns.

Syntax:
a) pandas.DataFrame.duplicated()
b) pandas.DataFrame.duplicated(subset=["column_name1", "column_name2"])

Example: Create a dataframe.

Python

import pandas as pd

# Sample data: rows 0 and 3 are identical in every column
mydata = {
    'Name': ['Ashish', 'Katrina', 'Alia', 'Ashish', 'Alia'],
    'Age': [25, 30, 35, 25, 40],
    'City': ['New York', 'Los Angeles', 'Mumbai', 'New York', 'Mumbai']
}
df = pd.DataFrame(mydata)
print(df)

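The output of the above code is shown below:

      Name  Age         City
0   Ashish   25     New York
1  Katrina   30  Los Angeles
2     Alia   35       Mumbai
3   Ashish   25     New York
4     Alia   40       Mumbai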

Use the code below to check whether each row is a duplicate. It returns a boolean Series in which True marks a duplicated row and False marks a non-duplicate row.

Python

# duplicated() returns a boolean Series, not a new DataFrame
is_duplicate = df.duplicated()
print(is_duplicate)

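The output of the above code is shown below:

0    False
1    False
2    False
3     True
4    False
dtype: bool

Only row 3 is marked True, because it repeats row 0 in every column. Row 4 shares its Name and City with row 2 but has a different Age, so it is not a duplicate.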
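The subset form from the syntax section can be tried on the same data. The following is a minimal sketch rather than part of the original example: the variable names by_name_city, all_copies, and dupes are illustrative, and the keep parameter is not covered in the text above (it is an option of duplicated() that controls which occurrences are marked).

```python
import pandas as pd

# Same sample data as in the example above
mydata = {
    'Name': ['Ashish', 'Katrina', 'Alia', 'Ashish', 'Alia'],
    'Age': [25, 30, 35, 25, 40],
    'City': ['New York', 'Los Angeles', 'Mumbai', 'New York', 'Mumbai']
}
df = pd.DataFrame(mydata)

# Compare only Name and City; Age is ignored
by_name_city = df.duplicated(subset=['Name', 'City'])
print(by_name_city.tolist())  # [False, False, False, True, True]

# keep=False marks every row of a duplicate group, including the first occurrence
all_copies = df.duplicated(keep=False)
print(all_copies.tolist())  # [True, False, False, True, False]

# The boolean Series can also be used as a mask to select the duplicate rows
dupes = df[df.duplicated()]
print(dupes)
```

With subset=['Name', 'City'], row 4 is now reported as a duplicate of row 2, because the differing Age value is no longer part of the comparison.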
