Drop Rows with NaN Pandas


NaN (Not a Number) values represent missing data in a pandas DataFrame. In some analyses they can cause errors or inconsistent results, so it is often useful to drop the rows that contain them.

In this article, you will learn how to drop rows with NaN values, how to drop rows with NaN values in specific columns, and how to drop rows based on a threshold of non-NaN values.

    Table of Contents

  1. Dropping Rows with NaN 🔍
  2. Dropping Rows Based on Specific Columns 🎯
  3. Dropping Rows with a Threshold 📊
  4. Conclusion

1. Dropping Rows with NaN

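To drop rows with NaN values, use the dropna() method. With its default arguments, it drops every row that contains at least one NaN value.

For example, df.dropna() returns a new DataFrame without those rows. Passing inplace=True, as in the example below, modifies df directly instead.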

import pandas as pd

# Creating a sample DataFrame with NaN values
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, None, 35, 30],
        'Score': [90, 85, None, 92]}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# 👇 Dropping rows with NaN
df.dropna(inplace=True)

print("\nDataFrame after dropping rows with NaN:")
print(df)

Output:

Original DataFrame:
      Name   Age  Score
0    Alice  25.0   90.0
1      Bob   NaN   85.0
2  Charlie  35.0    NaN
3    David  30.0   92.0

DataFrame after dropping rows with NaN:
    Name   Age  Score
0  Alice  25.0   90.0
3  David  30.0   92.0

Rows 1 and 2, each of which contained a NaN value, are dropped. Note that the remaining rows keep their original index labels (0 and 3).
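The example above uses inplace=True, which modifies df directly. A minimal sketch of the alternative, using the same sample data: by default dropna() returns a new DataFrame and leaves the original unchanged, and df.isna().any(axis=1) shows which rows it would remove. The reset_index(drop=True) call is an optional extra step, assuming you want the index renumbered from 0 after dropping.

```python
import pandas as pd

# Same sample data as in the example above
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, None, 35, 30],
        'Score': [90, 85, None, 92]}
df = pd.DataFrame(data)

# Boolean Series: True for every row that contains at least one NaN
has_nan = df.isna().any(axis=1)
print(df[has_nan])  # the rows that dropna() would remove

# Without inplace=True, dropna() returns a new DataFrame
cleaned = df.dropna()

# Optional: renumber the index 0..n-1 instead of keeping labels 0 and 3
cleaned = cleaned.reset_index(drop=True)

print(cleaned)
print(len(df))  # the original df still has all 4 rows
```

In pandas 2.0 and later, dropna(ignore_index=True) should renumber the index in the same call.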


2. Dropping Rows Based on Specific Columns

Sometimes you only want to drop rows that have NaN values in a specific column. For example, if a DataFrame has an "id" column that must always have a value, rows with a missing "id" should be dropped, while NaN values in other columns may be acceptable.

To do this, pass a list of column names to the subset parameter of dropna(). A row is then dropped only if it has a NaN value in one of those columns.

import pandas as pd

# Creating a sample DataFrame with NaN values
data = {'id': [1, 2, None, 4],
        'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, None],
        'Score': [None, 85, 80, 75]}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# 👇 Dropping rows with NaN in "id" column
df.dropna(subset=['id'], inplace=True)

print("\nDataFrame after dropping rows with NaN in 'id' column:")
print(df)

Output:

Original DataFrame:
    id     Name   Age  Score
0  1.0    Alice  25.0    NaN
1  2.0      Bob  30.0   85.0
2  NaN  Charlie  35.0   80.0
3  4.0    David   NaN   75.0

DataFrame after dropping rows with NaN in 'id' column:
    id   Name   Age  Score
0  1.0  Alice  25.0    NaN
1  2.0    Bob  30.0   85.0
3  4.0  David   NaN   75.0

As you can see, only row 2, which has NaN in the "id" column, is dropped. Rows 0 and 3 are kept even though they contain NaN values in other columns.
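The subset parameter also accepts several columns, and the how parameter controls how they are combined: with the default how='any' a row is dropped if any of the listed columns is NaN, while how='all' drops it only if all of them are NaN. A short sketch; the data below is made up for illustration.

```python
import pandas as pd

# Illustrative data: every row has an id, but Age/Score are sometimes missing
data = {'id': [1, 2, 3, 4],
        'Age': [25, None, None, 40],
        'Score': [None, 85, None, 75]}
df = pd.DataFrame(data)

# how='any' (default): drop a row if Age OR Score is NaN
any_nan = df.dropna(subset=['Age', 'Score'], how='any')

# how='all': drop a row only if Age AND Score are both NaN
all_nan = df.dropna(subset=['Age', 'Score'], how='all')

print(any_nan)  # only id 4 survives
print(all_nan)  # only id 3 (both values missing) is dropped
```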


3. Dropping Rows with a Threshold

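Sometimes you only want to drop rows that are missing too much data. The thresh parameter of dropna() sets the minimum number of non-NaN values a row must have to be kept. For example, thresh=2 keeps only rows with at least 2 non-NaN values; in a DataFrame with 3 columns, this drops every row that has 2 or more NaN values.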

import pandas as pd

# Creating a sample DataFrame with NaN values
data = {'Name': ['Alice', 'Bob', None, 'David'],
        'Age': [25, None, 35, 30],
        'Score': [90, 85, None, 92]}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# 👇 Keep only rows with at least 2 non-NaN values
# (with 3 columns, this drops rows that have 2 or more NaN values)
df.dropna(thresh=2, inplace=True)

print("\nDataFrame after dropna(thresh=2):")
print(df)

Output:

Original DataFrame:
    Name   Age  Score
0  Alice  25.0   90.0
1    Bob   NaN   85.0
2   None  35.0    NaN
3  David  30.0   92.0

DataFrame after dropna(thresh=2):
    Name   Age  Score
0  Alice  25.0   90.0
1    Bob   NaN   85.0
3  David  30.0   92.0

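In the output above, row 2 is dropped because it has only one non-NaN value (Age), which is below the threshold of 2. Row 1 (Bob) is kept: it has one NaN value but still has two non-NaN values.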
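Because thresh counts non-NaN values, dropping rows that have more than n NaN values can be expressed by subtracting n from the number of columns. A minimal sketch, assuming every column should be counted; drop_rows_with_too_many_nan is a hypothetical helper name, not part of pandas. The result is cross-checked against an explicit per-row NaN count.

```python
import pandas as pd

def drop_rows_with_too_many_nan(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Hypothetical helper: drop rows that have more than n NaN values."""
    # A row with at most n NaN values has at least (columns - n) non-NaN values;
    # max(..., 0) keeps thresh non-negative if n exceeds the column count
    return df.dropna(thresh=max(df.shape[1] - n, 0))

data = {'Name': ['Alice', 'Bob', None, 'David'],
        'Age': [25, None, 35, 30],
        'Score': [90, 85, None, 92]}
df = pd.DataFrame(data)

# Keep rows with at most 1 NaN value (drops row 2, which has 2)
result = drop_rows_with_too_many_nan(df, 1)

# Cross-check: count NaN values per row and keep rows with at most 1
mask = df.isna().sum(axis=1) <= 1
print(result)
print(result.equals(df[mask]))
```

With 3 columns and n=1 the helper passes thresh=2, so it reproduces the example above.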


4. Conclusion

You now know how to drop rows that contain any NaN value with dropna(), how to check only specific columns with the subset parameter, and how to keep only rows with a minimum number of non-NaN values with the thresh parameter.

These techniques help you clean your data before performing any analysis.