The drop_duplicates method in Pandas
The pandas.DataFrame.drop_duplicates() method returns a new DataFrame with the duplicate rows removed; the original DataFrame is left unchanged. By default, it compares all columns to decide whether a row is a duplicate and keeps the first occurrence of each. Optionally, we can pass a list of columns so that only those columns are considered in the comparison.
Syntax:
a) pandas.DataFrame.drop_duplicates()
b) pandas.DataFrame.drop_duplicates(subset=["column_name1", "column_name2"])
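As a minimal sketch of how the two forms differ, the snippet below runs both on a small DataFrame. The data and the variable names (sample, all_columns, by_subset) are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only to illustrate the two call forms.
sample = pd.DataFrame({
    "Name": ["Ashish", "Alia", "Alia"],
    "City": ["New York", "Mumbai", "Mumbai"],
    "Age": [25, 35, 40],
})

# Form (a): every column is compared. The two "Alia" rows differ in Age,
# so neither counts as a duplicate and all three rows are kept.
all_columns = sample.drop_duplicates()

# Form (b): only Name and City are compared. The second "Alia" row now
# matches the first one on both columns, so it is dropped.
by_subset = sample.drop_duplicates(subset=["Name", "City"])

print(len(all_columns))          # 3
print(by_subset.index.tolist())  # [0, 1]
```

With subset, the columns that are not listed (here Age) play no part in the comparison, and the surviving row keeps whatever values the first occurrence had.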
Example: Create a dataframe.
Python
import pandas as pd

# Sample data: the row at index 3 repeats the row at index 0 exactly.
mydata = {
    'Name': ['Ashish', 'Katrina', 'Alia', 'Ashish', 'Alia'],
    'Age': [25, 30, 35, 25, 40],
    'City': ['New York', 'Los Angeles', 'Mumbai', 'New York', 'Mumbai']
}
df = pd.DataFrame(mydata)
print(df)
Use the following command to drop the duplicate rows.
Python
newdf = df.drop_duplicates()
print(newdf)
The output of the above code is shown below:
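As a quick way to confirm the result, the following self-contained sketch rebuilds the same DataFrame and inspects which rows remain. The reset_index step and the variable name renumbered are additions for illustration and are not part of the original example.

```python
import pandas as pd

mydata = {
    'Name': ['Ashish', 'Katrina', 'Alia', 'Ashish', 'Alia'],
    'Age': [25, 30, 35, 25, 40],
    'City': ['New York', 'Los Angeles', 'Mumbai', 'New York', 'Mumbai']
}
df = pd.DataFrame(mydata)
newdf = df.drop_duplicates()

# Row 3 repeats row 0 in every column, so it is removed.
# Row 4 stays: it shares Name and City with row 2, but its Age differs.
print(newdf.index.tolist())  # [0, 1, 2, 4]
print(len(df), len(newdf))   # 5 4

# The surviving rows keep their original index labels.
# Optional step (an addition here): renumber them from 0.
renumbered = newdf.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Note that df itself still has all five rows, because drop_duplicates() returns a new DataFrame by default.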