Pandas Sort by Multiple Columns


Sorting a DataFrame by multiple columns becomes important when there is a need to establish a hierarchical order based on different attributes.

For example, let a dataset contain student information, where you want to sort first by their grades and then by their age to get a more nuanced understanding of performance within age groups.

    Table of Contents

  1. Sort by Multiple Columns
  2. Sort by Multiple Columns in Descending Order
  3. Sort by Multiple Columns with Different Order
  4. Customise Sorting Priority
  5. Conclusion

Sort by Multiple Columns

The sort_values() method in Pandas allows us to sort a DataFrame by one or more columns.

Pass a list of column names to the sort_values() method to sort by multiple columns in the order of priority. For example, df.sort_values(['col1', 'col2']) will sort the DataFrame by col1 first and then by col2.

Here is a working example of sorting a DataFrame by multiple columns.

import pandas as pd

# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Age': [45, 40, 35, 40, 38],
        'Score': [95, 80, 92, 70, 60]}

df = pd.DataFrame(data)

# 👇 Sort DataFrame first by 'Age' and then by 'Score'
df_sorted_multiple = df.sort_values(by=['Age', 'Score'])
print(df_sorted_multiple)

Output:

      Name  Age  Score
2  Charlie   35     92
4      Eve   38     60
3    David   40     70
1      Bob   40     80
0    Alice   45     95

Sort by Multiple Columns in Descending Order

By default the sort_values method sorts the DataFrame in ascending order. To sort in descending order, pass ascending=False as an argument.

Let's sort the DataFrame below in descending order.

import pandas as pd

# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Age': [45, 40, 35, 40, 38],
        'Score': [95, 80, 92, 70, 60]}

df = pd.DataFrame(data)

# 👇 Sort DataFrame in descending order by 'Age' and then by 'Score'
df_sorted_multiple = df.sort_values(by=['Age', 'Score'], ascending=False)
print(df_sorted_multiple)

Output:

      Name  Age  Score
0    Alice   45     95
1      Bob   40     80
3    David   40     70
4      Eve   38     60
2  Charlie   35     92

Sort by Multiple Columns with Different Order

It is also possible to sort by multiple columns with different order. For example, you can sort by Age in descending order and then by Score in ascending order.

For this, pass a list of True (ascending) and False (descending) representing the order of sorting for each column.

import pandas as pd

# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Age': [45, 40, 35, 40, 38],
        'Score': [95, 80, 92, 70, 60]}
df = pd.DataFrame(data)

# 👇 Sort DataFrame first by 'Age' in descending order
# and then by 'Score' in ascending order
df_sorted_multiple = df.sort_values(by=['Age', 'Score'], ascending=[False, True])
print(df_sorted_multiple)

Output:

      Name  Age  Score
0    Alice   45     95
1      Bob   40     80
3    David   40     70
4      Eve   38     60

Customise Sorting Priority

Custom sorting priority is useful when you want to sort by multiple columns but want to give priority to one column over the other.

import pandas as pd

# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Age': [45, 40, 35, 40, 38],
        'Score': [95, 70, 92, 80, 60]}

df = pd.DataFrame(data)

# 👇 custom sorting order
custom_order = {'Alice': 3, 'Bob': 2, 'Charlie': 1, 'David': 2, 'Eve': 4}
df_sorted = df.sort_values(by=['Name', 'Age'], key=lambda x: x.map(custom_order))
print(df_sorted)

Output:

      Name  Age  Score
2  Charlie   35     92
1      Bob   40     70
3    David   40     80
0    Alice   45     95
4      Eve   38     60

You can see that the DataFrame is sorted by given custom order of names and then by age.


Conclusion

Mastering multi-column sorting in Pandas opens up a realm of possibilities for organizing your data.

Whether you need simple ascending or descending sorts or more complex custom orders, Pandas provides the tools you need to efficiently arrange your DataFrame.