The pandas.DataFrame.duplicated() method in Pandas
The pandas.DataFrame.duplicated() method returns a boolean Series that marks duplicate rows. By default, it compares all columns, and the first occurrence of a repeated row is marked False while each later repeat is marked True. Optionally, the subset parameter restricts the comparison to specific columns.
Syntax
a) pandas.DataFrame.duplicated()
b) pandas.DataFrame.duplicated(subset=["column_name1", "column_name2"])
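To make the difference between the two forms concrete, here is a minimal sketch. The `people` frame and its values are hypothetical and used only for illustration; `subset` is the parameter named in form (b).

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
people = pd.DataFrame({
    'Name': ['Ashish', 'Alia', 'Alia'],
    'City': ['New York', 'Mumbai', 'Delhi'],
})

# Form (a): compare whole rows; no two rows match in every column
all_columns = people.duplicated()

# Form (b): compare only the 'Name' column; the second 'Alia' repeats the first
name_only = people.duplicated(subset=['Name'])

print(all_columns.tolist())  # [False, False, False]
print(name_only.tolist())    # [False, False, True]
```

Restricting the comparison to `Name` flags the third row even though its `City` differs, which is useful when only some columns identify a record.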
Example: Create a dataframe.
Python
import pandas as pd

# Sample data: rows 0 and 3 are identical in every column
mydata = {
    'Name': ['Ashish', 'Katrina', 'Alia', 'Ashish', 'Alia'],
    'Age': [25, 30, 35, 25, 40],
    'City': ['New York', 'Los Angeles', 'Mumbai', 'New York', 'Mumbai'],
}
df = pd.DataFrame(mydata)
print(df)
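The output of the above code is shown below:
      Name  Age         City
0   Ashish   25     New York
1  Katrina   30  Los Angeles
2     Alia   35       Mumbai
3   Ashish   25     New York
4     Alia   40       Mumbai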
Use the code below to check whether each row is a duplicate. It returns a boolean Series in which True marks a duplicated row and False marks a non-duplicate row. In this data, row 3 repeats row 0 in every column, so it is the only row marked True.
Python
# duplicated() returns a boolean Series with one entry per row
newdf = df.duplicated()
print(newdf)
The output of the above code is shown below:
0    False
1    False
2    False
3     True
4    False
dtype: bool
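The boolean Series can also be used as a mask to select rows, and the `keep` parameter of duplicated() controls which occurrence counts as the duplicate. The following is a minimal sketch that rebuilds the same sample data so it runs on its own; the variable names `first`, `last`, `every`, `repeated_rows`, and `unique_rows` are illustrative choices, not part of the original example.

```python
import pandas as pd

# Same sample data as in the example above
mydata = {
    'Name': ['Ashish', 'Katrina', 'Alia', 'Ashish', 'Alia'],
    'Age': [25, 30, 35, 25, 40],
    'City': ['New York', 'Los Angeles', 'Mumbai', 'New York', 'Mumbai'],
}
df = pd.DataFrame(mydata)

# keep='first' (the default): later repeats are marked True
first = df.duplicated(keep='first')
# keep='last': earlier repeats are marked True, the last occurrence is False
last = df.duplicated(keep='last')
# keep=False: every row that has a twin is marked True
every = df.duplicated(keep=False)

# Use the boolean Series as a mask to select rows
repeated_rows = df[every]   # all rows that appear more than once
unique_rows = df[~first]    # the frame with later repeats removed

print(repeated_rows)
print(unique_rows)
```

With this data, `repeated_rows` holds rows 0 and 3, and `unique_rows` holds rows 0, 1, 2, and 4. Selecting with `df[~first]` gives the same rows as `df.drop_duplicates()`.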