Pandas Drop Rows with Condition


Data cleaning often involves removing rows that don't meet specific conditions. In Pandas, this task is streamlined through various methods.

This guide will explore how to drop rows based on conditions, allowing you to tailor your DataFrame to your analysis requirements.

    Table of Contents

  1. Dropping Rows Based on a Single Condition
  2. Dropping Rows Based on Multiple Conditions
  3. Dropping Rows with String Conditions
  4. Dropping Rows with a Custom Function
  5. Other Methods for Dropping Rows
    1. Using the drop() Method
    2. Using the query() Method
  6. Conclusion

1. Dropping Rows Based on a Single Condition

Best method to drop rows based on condition is to use loc[] method and pass the condition as a boolean expression. This method is also known as boolean indexing.

Suppose you want to drop all the rows where the value of 'Age' column is less than 30, then you can do it like this df.loc[df['Age'] < 30].

Here is an example for this.

import pandas as pd

# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 20],
        'Score': [80, 85, 88, 92]}

df = pd.DataFrame(data)

# πŸ‘‡ Dropping rows where Age is less than 30
df_filtered = df[df['Age'] < 30]
print(df_filtered)

Output:

    Name  Age  Score
0  Alice   25     80
3  David   20     92

2. Dropping Rows Based on Multiple Conditions

Multiple conditions filters data more precisely. You can use multiple conditions with loc[] method by separating them with & operator.

The following example drops all the rows where 'Age' is less than 30 and 'Score' is greater than 90.

import pandas as pd

# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 20],
        'Score': [80, 85, 88, 92]}

df = pd.DataFrame(data)

# πŸ‘‡ Dropping rows where Age is less than 30 and Score is less than 90
df_filtered = df[(df['Age'] < 30) & (df['Score'] > 90)]

print(df_filtered)

Output:

    Name  Age  Score
3  David   20     92

You can also use other operators like | (or), == (equal to), != (not equal to), > (greater than), < (less than), >= (greater than or equal to), <= (less than or equal to) etc. with loc[] method to filter data based on multiple conditions.


3. Dropping Rows with String Conditions

The string conditions are applied on the string columns of the DataFrame. In the following example we are filtering the rows where 'Name' column starts with 'A'.

import pandas as pd

# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 20],
        'Score': [80, 85, 88, 92]}

df = pd.DataFrame(data)

# πŸ‘‡ Dropping rows where Name starts with 'A'
df_filtered = df[df['Name'].str.startswith('A')]

print(df_filtered)

Output:

    Name  Age  Score
0  Alice   25     80

4. Dropping Rows with a Custom Function

You can also use a custom function to filter rows based on your own conditions. The function should return True or False for each row.

In the following example we are filtering the rows where 'Age' is less than 30 and 'Score' is greater than 90 but with a custom function.

import pandas as pd

# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 20],
        'Score': [80, 85, 88, 92]}

df = pd.DataFrame(data)

# πŸ‘‡ Dropping rows where Age is less than 30 and Score is less than 90
def custom_filter(row):
    if row['Age'] < 30 and row['Score'] > 90:
        return True
    else:
        return False

df_filtered = df[df.apply(custom_filter, axis=1)]

print(df_filtered)

Output:

    Name  Age  Score
3  David   20     92

5. Other Methods for Dropping Rows

Other than using loc[] method, we have other ways to drop rows based on conditions.

We are discussing two of them here to improve your diversity of knowledge.

5.1 Using the drop() Method

The drop() method is used to drop rows or columns from a DataFrame. It takes the index or column name as a parameter and drops the corresponding rows or columns.

import pandas as pd

# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 20],
        'Score': [80, 85, 88, 92]}

df = pd.DataFrame(data)

# πŸ‘‡ getting rows where Age is less than 30
indexes = df[df['Age'] < 30].index
# πŸ‘‡ dropping rows where Age is less than 30
df.drop(indexes, inplace=True)

print(df)

Output:

      Name  Age  Score
1      Bob   30     85
2  Charlie   35     88

Here, we are first getting the indexs of the rows where 'Age' is less than 30 and then passing it to the drop() method to drop those rows.

Similarly, you can use drop() method to drop rows based on multiple conditions.

5.2 Using the query() Method

The query() method is used to filter rows based on conditions. It takes the condition as a string and returns the filtered DataFrame.

import pandas as pd

# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 20],
        'Score': [80, 85, 88, 92]}

df = pd.DataFrame(data)

# πŸ‘‡ getting rows where Age is less than 30
df_filtered = df.query('Age < 30')

print(df_filtered)

Output:

    Name  Age  Score
0  Alice   25     80
3  David   20     92

Similarly, you can use query() method to filter rows based on multiple conditions.


Conclusion

In this guide, we learned how to drop rows based on conditions in Pandas using loc[] method, drop() method and query() method.

By mastering these techniques, you enhance the precision and reliability of your data analysis workflows. πŸ§ΉπŸ”