The sort_values method in Pandas
The sort_values method sorts values in ascending or descending order and can be used with both Series and DataFrame objects.
Points to note:
- If the data type is string, the values are sorted alphabetically.
- If the data type is numeric, the values are sorted in ascending order by default, smallest to largest.
- Setting the ascending parameter to False sorts the values in descending order.
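The points above can be checked on a small Series without loading any file. This is a minimal sketch; the names and salary values are made up for illustration and are not taken from employees.csv.

```python
import pandas as pd

# Hypothetical data for illustration only (not from employees.csv).
names = pd.Series(["Maria", "Alex", "Zoe", "Ben"])
salaries = pd.Series([52000, 48000, 61000, 55000])

# Strings are sorted alphabetically by default.
names_sorted = names.sort_values()

# Numbers are sorted from smallest to largest by default.
salaries_asc = salaries.sort_values()

# ascending=False reverses the order: largest to smallest.
salaries_desc = salaries.sort_values(ascending=False)

print(names_sorted.tolist())    # ['Alex', 'Ben', 'Maria', 'Zoe']
print(salaries_asc.tolist())    # [48000, 52000, 55000, 61000]
print(salaries_desc.tolist())   # [61000, 55000, 52000, 48000]
```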
In this exercise, we use the data source employees.csv. You can download the data source and use it to follow the transformations.
Sort a Series
The pandas.Series.sort_values() function is used to sort a Series in ascending or descending order.
Syntax
- pandas.Series.sort_values()
- pandas.Series.sort_values(ascending=True)
- pandas.Series.sort_values(ascending=False)
The ascending parameter controls whether the Series is sorted in ascending or descending order. Its default value is True, so the Series is sorted in ascending order unless we specify otherwise.
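As a quick check of the default, the sketch below compares a call without arguments to one with ascending=True. The Series and its index labels are hypothetical. It also shows two behaviors that are easy to overlook: each value keeps its original index label, and sort_values returns a new Series rather than reordering the original.

```python
import pandas as pd

# Hypothetical salaries for illustration only.
salary = pd.Series([52000, 48000, 61000], index=["a", "b", "c"])

default_sorted = salary.sort_values()
explicit_sorted = salary.sort_values(ascending=True)

# Both calls give the same result because ascending defaults to True.
print(default_sorted.equals(explicit_sorted))   # True

# The original index labels travel with their values.
print(default_sorted.index.tolist())            # ['b', 'a', 'c']

# sort_values returns a new Series; the original keeps its order.
print(salary.tolist())                          # [52000, 48000, 61000]
```

To keep the sorted result, assign it to a variable, as done with default_sorted above.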
Example: Load the data into a DataFrame.
Python
import pandas as pd
mydata = pd.read_csv("employees.csv")
mydata
The output of the above code is shown below:
Now extract a column from the DataFrame, which gives a Series object, and call the sort_values method on it. Note that we do not pass a column name inside the parentheses here, because a Series holds only a single column of values.
Python
mydata["Salary"].sort_values()
Or we can write the code as:
Python
mydata["Salary"].sort_values(ascending=True)
The output of the above code is shown below:
Let’s sort the Series in descending order by setting the ascending parameter to False.
Python
mydata["Salary"].sort_values(ascending=False)
The output of the above code is shown below:
Sort a DataFrame
The pandas.DataFrame.sort_values() function is used to sort the rows of a DataFrame by the values in one or more columns.
Syntax
a) pandas.DataFrame.sort_values(by="Column_Name")
b) pandas.DataFrame.sort_values("Column_Name")
c) pandas.DataFrame.sort_values(by="Column_Name", ascending=True)
The by parameter specifies the column name as a string, or several column names as a list of strings. Because by is the first parameter of this function, we can omit the parameter name and pass the column name directly.
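The equivalence of the keyword and positional forms can be verified on a small DataFrame. This is a minimal sketch with made-up rows; the column names mirror those used in this exercise, but the values are not from employees.csv.

```python
import pandas as pd

# Hypothetical rows for illustration only (not from employees.csv).
df = pd.DataFrame({
    "Name": ["Maria", "Alex", "Zoe"],
    "Salary": [52000, 48000, 61000],
})

# Syntax a): column name passed with the by keyword.
by_keyword = df.sort_values(by="Salary")

# Syntax b): column name passed positionally.
by_position = df.sort_values("Salary")

# Both forms produce the same sorted DataFrame.
print(by_keyword.equals(by_position))   # True

# Whole rows are reordered, so Name follows its Salary.
print(by_keyword["Name"].tolist())      # ['Alex', 'Maria', 'Zoe']
```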
Sort on Multiple Columns
To sort the DataFrame on multiple columns, pass a list of column names to the by parameter.
Syntax
pandas.DataFrame.sort_values(by=["Column_Name1", "Column_Name2"])
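With a list of columns, the first column decides the order and later columns only break ties. The sketch below illustrates this on made-up rows containing a repeated name. It also goes slightly beyond the syntax above by passing a list to ascending, which pandas accepts as one flag per column in by.

```python
import pandas as pd

# Hypothetical rows for illustration only; "Alex" appears twice on purpose.
df = pd.DataFrame({
    "Name": ["Alex", "Zoe", "Alex", "Ben"],
    "Salary": [61000, 48000, 52000, 55000],
})

# Sort by Name first; rows with the same Name are then ordered by Salary.
both_asc = df.sort_values(by=["Name", "Salary"])
print(both_asc["Name"].tolist())     # ['Alex', 'Alex', 'Ben', 'Zoe']
print(both_asc["Salary"].tolist())   # [52000, 61000, 55000, 48000]

# One ascending flag per column: Name ascending, Salary descending.
mixed = df.sort_values(by=["Name", "Salary"], ascending=[True, False])
print(mixed["Salary"].tolist())      # [61000, 52000, 55000, 48000]
```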
Example: Let’s sort the dataframe “mydata” by the “Name” column.
Python
mydata.sort_values("Name")
Or we can rewrite the code as follows:
Python
mydata.sort_values(by="Name")
Because by is the first parameter of the function and specifies the column names, we can omit the parameter name and pass the argument directly.
The output of the above code is shown below:
Example: We can sort the DataFrame by the “Salary” column.
Python
mydata.sort_values("Salary")
Or we can rewrite the code as follows:
Python
mydata.sort_values(by="Salary")
The output of the above code is shown below:
Example: Let’s sort the dataframe by multiple columns.
Python
mydata.sort_values(["Name", "Salary"])
The output of the above code is shown below: