Pandas DataFrame Sort by Column


Sorting a Pandas DataFrame by column enables you to explore patterns, identify trends, and gain insights into your dataset.

In this article, we will learn how to sort a DataFrame by the values of specific columns, in ascending or descending order.

    Table of Contents

  1. Sort using sort_values() Method
  2. Sort by Multiple Columns
  3. Sort in Descending Order
  4. Sort with Missing Values
  5. Conclusion

1. Sort using sort_values() Method

The Pandas DataFrame sort_values() method sorts the DataFrame by the values of a column.

It takes a column name as its argument and returns a new DataFrame sorted by values in the given column.

Syntax: sort_values() method

df.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')

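Here,

  by: name, or list of names, of the column(s) to sort by.
  axis: axis to sort along; 0 (the default) sorts the rows.
  ascending: True (the default) for ascending order, False for descending order; a list of booleans can be passed when sorting by multiple columns.
  inplace: if True, the DataFrame is sorted in place and the method returns None instead of a new DataFrame.
  kind: sorting algorithm to use ('quicksort', 'mergesort', 'heapsort', or 'stable').
  na_position: 'last' (the default) places NaN values at the end, 'first' places them at the beginning.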

Here is an example of sorting a DataFrame by a column.

Example 1:

import pandas as pd

# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [45, 40, 35, 42],
        'Score': [95, 80, 92, 70]}

df = pd.DataFrame(data)
print("Before sorting:")
print(df)

# 👇 Sort DataFrame by 'Age' in ascending order
df_sorted_age = df.sort_values(by='Age')
print("\nAfter sorting by 'Age':")
print(df_sorted_age)

Output:

Before sorting:
      Name  Age  Score
0    Alice   45     95
1      Bob   40     80
2  Charlie   35     92
3    David   42     70

After sorting by 'Age':
      Name  Age  Score
2  Charlie   35     92
1      Bob   40     80
3    David   42     70
0    Alice   45     95
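Because sort_values() returns a new DataFrame by default, the original df keeps its row order. The following is a minimal sketch, reusing the sample data from Example 1, that illustrates this and shows the inplace=True alternative listed in the syntax above. The variable names (df_sorted, df_inplace, and so on) are only illustrative.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [45, 40, 35, 42],
        'Score': [95, 80, 92, 70]}
df = pd.DataFrame(data)

# sort_values() returns a new DataFrame; df itself is not modified
df_sorted = df.sort_values(by='Age')
original_order = df['Name'].tolist()
sorted_order = df_sorted['Name'].tolist()
print(original_order)
print(sorted_order)

# With inplace=True the DataFrame is sorted in place and the call returns None
df_inplace = df.copy()
result = df_inplace.sort_values(by='Age', inplace=True)
inplace_order = df_inplace['Name'].tolist()
print(result)
print(inplace_order)
```

Assigning the returned DataFrame to a new variable, as in Example 1, is usually the easier pattern to follow, since the original data stays available for comparison.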

2. Sort by Multiple Columns

Sorting by multiple columns is useful when the first column contains duplicate values and a second column is needed to decide the order of the tied rows.

For example, if we sort the DataFrame by 'Age' and two or more people have the same age, we can break the tie by sorting those rows by 'Score'.

To sort by multiple columns, pass a list of column names to the sort_values() method, in the order of priority.

Example 2:

import pandas as pd

# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [45, 40, 35, 40], # 👈 'David' and 'Bob' have same age
        'Score': [95, 80, 92, 70]}

df = pd.DataFrame(data)
print("Before sorting:")
print(df)

# 👇 Sort DataFrame by 'Age' and 'Score' in ascending order
df_sorted_age_score = df.sort_values(by=['Age', 'Score'])
print("\nAfter sorting by 'Age' and 'Score':")
print(df_sorted_age_score)

Output:

Before sorting:
      Name  Age  Score
0    Alice   45     95
1      Bob   40     80
2  Charlie   35     92
3    David   40     70

After sorting by 'Age' and 'Score':
      Name  Age  Score
2  Charlie   35     92
3    David   40     70
1      Bob   40     80
0    Alice   45     95

As you can see, the DataFrame is first sorted by 'Age' and then by 'Score'. Bob and David are both 40, so David (score 70) comes before Bob (score 80).
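If the tie should instead be broken with the highest score first, the ascending argument also accepts a list of booleans, one per column in by. A minimal sketch, reusing the sample data from Example 2 (the name df_mixed is only illustrative):

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [45, 40, 35, 40],
        'Score': [95, 80, 92, 70]}
df = pd.DataFrame(data)

# 'Age' in ascending order, then 'Score' in descending order within each age
df_mixed = df.sort_values(by=['Age', 'Score'], ascending=[True, False])
mixed_order = df_mixed['Name'].tolist()
print(df_mixed)
```

With this setting, Bob (score 80) is listed before David (score 70), while the overall order by 'Age' stays ascending.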


3. Sort in Descending Order

By default, the sort_values() method sorts the DataFrame in ascending order.

To sort the DataFrame in descending order, set the ascending argument to False in the sort_values() method.

Example 3:

import pandas as pd

# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [45, 40, 35, 42],
        'Score': [95, 80, 92, 70]}

df = pd.DataFrame(data)
print("Before sorting:")
print(df)

# 👇 Sort DataFrame by 'Age' in descending order
df_sorted_age = df.sort_values(by='Age', ascending=False)
print("\nAfter sorting by 'Age':")
print(df_sorted_age)

Output:

Before sorting:
      Name  Age  Score
0    Alice   45     95
1      Bob   40     80
2  Charlie   35     92
3    David   42     70

After sorting by 'Age':
      Name  Age  Score
0    Alice   45     95
3    David   42     70
1      Bob   40     80
2  Charlie   35     92
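A common use of descending order is to pick out the rows with the largest values. The sketch below, which assumes the same sample data, sorts by 'Score' from highest to lowest and keeps the first two rows with head(); the name top_two is only illustrative.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [45, 40, 35, 42],
        'Score': [95, 80, 92, 70]}
df = pd.DataFrame(data)

# Sort by 'Score' from highest to lowest, then keep the first two rows
top_two = df.sort_values(by='Score', ascending=False).head(2)
top_names = top_two['Name'].tolist()
print(top_two)
```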

4. Sort with Missing Values

A DataFrame can contain missing (NaN) values. By default, the sort_values() method places rows with a missing value in the sort column at the end of the result.

To change the position of missing values, set the na_position argument to 'first' or 'last' as required.

Example 4:

import pandas as pd

# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [45, 40, 35, None], # 👈 'David' has missing value
        'Score': [95, 80, 92, 70]}
df = pd.DataFrame(data)
print("Before sorting:")
print(df)

# 👇 Sort DataFrame by 'Age' in ascending order, missing values first
df_sorted_age = df.sort_values(by='Age', na_position='first')
print("\nAfter sorting by 'Age', missing values first:")
print(df_sorted_age)

Output:

Before sorting:
      Name   Age  Score
0    Alice  45.0     95
1      Bob  40.0     80
2  Charlie  35.0     92
3    David   NaN     70

After sorting by 'Age', missing values first:
      Name   Age  Score
3    David   NaN     70
2  Charlie  35.0     92
1      Bob  40.0     80
0    Alice  45.0     95
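For comparison, the sketch below uses the same sample data with the default na_position='last'. It suggests that the missing value stays at the end whether the sort is ascending or descending; the variable names are only illustrative.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [45, 40, 35, None],
        'Score': [95, 80, 92, 70]}
df = pd.DataFrame(data)

# Default na_position='last': David's missing age is placed at the end
default_order = df.sort_values(by='Age')['Name'].tolist()
print(default_order)

# Descending order reverses the non-missing ages; the NaN row is still last
desc_order = df.sort_values(by='Age', ascending=False)['Name'].tolist()
print(desc_order)
```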

5. Conclusion

Knowing how to sort a DataFrame by column helps you explore your dataset from different perspectives.

Now you can sort DataFrames by column in ascending or descending order, sort by multiple columns, and sort with missing values.

Happy Learning! 😃