Pandas Replace Values in Column


Data cleaning is an integral part of data preprocessing, and one common task is replacing values in a Pandas DataFrame column.

In this tutorial, we'll explore different techniques to replace values in a Pandas DataFrame column, and cover scenarios from simple replacements to more complex transformations.

    Table of Contents

  1. Replacing Specific Values
    1. Using replace() Method
    2. Using Boolean Indexing
  2. Handling Missing Values
    1. Replacing NaN with fillna()
    2. Replacing NaN with a specific value
  3. Conditional Value Replacements
  4. Replacing Multiple Different Values
  5. Conclusion

1. Replacing Specific Values

Let's start with a simple example of replacing specific values in a Pandas DataFrame column.

There are two common ways to replace values in a Pandas DataFrame column:

  1. Using replace() method
  2. Using loc[] and Boolean Indexing

1.1 Using replace() Method

The replace() method is the most common way to replace values in a Pandas Series or DataFrame.

To replace one value with another in a DataFrame column, pass the value to be replaced as the first argument and the new value as the second argument to the replace() method.

For example, df['column'].replace(1, 100) returns a copy of the column in which every occurrence of 1 is replaced with 100. Assign the result back to df['column'] to keep the change.

import pandas as pd

# Sample DataFrame
data = {'A': ['A1', 'A2', 'A3', 'A1', 'A2']}
df = pd.DataFrame(data)

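# 👇 Replace 'A1' with 'New_A1' and assign the result back to the column
# (calling replace(..., inplace=True) on df['A'] is deprecated chained assignment)
df['A'] = df['A'].replace('A1', 'New_A1')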
print(df)

Output:

        A
0  New_A1
1      A2
2      A3
3  New_A1
4      A2
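
One detail worth seeing in code: by default replace() matches whole cell values, not parts of a string. The sketch below uses a hypothetical Series named codes (not part of the example above) to contrast the default behaviour with regex=True, which treats the first argument as a pattern.

```python
import pandas as pd

# Hypothetical sample data for illustration
codes = pd.Series(['A1', 'A10', 'A2'])

# Default: only cells that are exactly 'A1' are replaced
exact = codes.replace('A1', 'X')

# regex=True: 'A1' is treated as a pattern and matched inside each string
pattern = codes.replace('A1', 'X', regex=True)

print(exact.tolist())    # ['X', 'A10', 'A2']
print(pattern.tolist())  # ['X', 'X0', 'A2']
```

In both calls the original Series is left unchanged, because replace() returns a new object.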

1.2 Using Boolean Indexing

Boolean Indexing is a technique to filter a DataFrame based on a condition. For example, df.loc[df['column'] == 1] will return a DataFrame with only those rows where the value of the column is 1.

To replace values, pass the condition as the row selector and the column name as the column selector to loc[], then assign the new value to that selection.

import pandas as pd

# Sample DataFrame
data = {'A': ['A1', 'A2', 'A3', 'A1', 'A2']}
df = pd.DataFrame(data)

# 👇 Replace 'A1' with 'New_A1' using boolean indexing
df.loc[df['A'] == 'A1', 'A'] = 'New_A1'
print(df)

Output:

        A
0  New_A1
1      A2
2      A3
3  New_A1
4      A2
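
To make the mechanism more concrete, the sketch below prints the boolean mask itself and then builds a mask for several values at once with isin(). The replacement label 'Other' and the variable names are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['A1', 'A2', 'A3', 'A1', 'A2']})

# The condition evaluates to a boolean Series with one flag per row
mask = df['A'] == 'A1'
print(mask.tolist())  # [True, False, False, True, False]

# isin() builds a mask that is True for any value in the given list
multi_mask = df['A'].isin(['A1', 'A3'])
df.loc[multi_mask, 'A'] = 'Other'
print(df['A'].tolist())  # ['Other', 'A2', 'Other', 'Other', 'A2']
```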

2. Handling Missing Values

Missing values are a common occurrence in real-world datasets. Let's see how to replace missing values in a Pandas DataFrame column.

2.1 Replacing NaN with fillna()

The fillna() method replaces all the NaN values in a DataFrame column with the value passed as an argument.

import pandas as pd
import numpy as np

# Sample DataFrame with NaN values
data = {'A': [1, 2, np.nan, 4, 5, np.nan]}
df = pd.DataFrame(data)

# 👇 Replace NaN with 0 and assign the result back to the column
df['A'] = df['A'].fillna(0)
print(df)

Output:

     A
0  1.0
1  2.0
2  0.0
3  4.0
4  5.0
5  0.0
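
The fill value does not have to be a constant. As a minimal sketch, assuming the column mean is a sensible substitute for your data, the same call can fill NaN with a computed statistic:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, np.nan, 4, 5, np.nan]})

# mean() skips NaN by default: (1 + 2 + 4 + 5) / 4 = 3.0
col_mean = df['A'].mean()

# Fill the missing entries with the computed mean
df['A'] = df['A'].fillna(col_mean)
print(df['A'].tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 3.0]
```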

2.2 Replacing NaN with a specific value

The replace() method can do the same job: pass np.nan as the value to be replaced and the new value as the second argument.

import pandas as pd
import numpy as np

# Sample DataFrame with NaN values
data = {'A': [1, 2, np.nan, 4, 5, np.nan]}
df = pd.DataFrame(data)

# 👇 Replace NaN with 100 and assign the result back to the column
df['A'] = df['A'].replace(np.nan, 100)
print(df)

Output:

       A
0    1.0
1    2.0
2  100.0
3    4.0
4    5.0
5  100.0
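
replace() also works in the other direction, which helps when a dataset marks missing entries with a placeholder number instead of NaN. The sketch below assumes a hypothetical sentinel of -999; it first converts the sentinel to NaN and then fills every missing value in one pass.

```python
import pandas as pd
import numpy as np

# Hypothetical data where -999 stands for "missing"
readings = pd.Series([1.0, -999.0, np.nan, 4.0])

# Turn the sentinel into NaN, then fill all missing values with 0
cleaned = readings.replace(-999.0, np.nan).fillna(0)
print(cleaned.tolist())  # [1.0, 0.0, 0.0, 4.0]
```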

3. Conditional Value Replacements

Data cleaning often requires replacing values only when they meet a condition, such as exceeding a threshold.

The following example replaces every value in the column that is greater than 30 with 100.

import pandas as pd

# Sample DataFrame
data = {'A': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data)

# 👇 Replace values greater than 30 with 100
df.loc[df['A'] > 30, 'A'] = 100
print(df)

# or another way to do it
# df['A'] = df['A'].mask(df['A'] > 30, 100)

Output:

     A
0   10
1   20
2   30
3  100
4  100
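
When both outcomes of the condition need a value, NumPy's np.where is a common choice. The sketch below is one possible approach; the column name level and the labels 'high' and 'low' are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30, 40, 50]})

# np.where(condition, value_if_true, value_if_false) picks a value per row
capped = np.where(df['A'] > 30, 100, df['A'])

# The same call can label each row instead of overwriting it
df['level'] = np.where(df['A'] > 30, 'high', 'low')

print(capped.tolist())       # [10, 20, 30, 100, 100]
print(df['level'].tolist())  # ['low', 'low', 'low', 'high', 'high']
```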

4. Replacing Multiple Different Values

When a column contains many different values to replace, calling replace() once per value quickly becomes tedious.

To solve this problem, we can use a dictionary that maps each value to be replaced to its new value.

In the following example, we'll replace 'A' with 1, 'B' with 2, and 'C' with 3.

import pandas as pd

# Sample DataFrame
data = {'A': ['A', 'B', 'C', 'A', 'B', 'C']}
df = pd.DataFrame(data)

# 👇 Replace multiple values
replace_with = {'A': 1, 'B': 2, 'C': 3}
df['A'] = df['A'].replace(replace_with)
print(df)

# or another way to do it (values missing from the dictionary become NaN)
# df['A'] = df['A'].map(replace_with)

Output:

   A
0  1
1  2
2  3
3  1
4  2
5  3
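
The two approaches differ when a value is missing from the dictionary. The sketch below uses a hypothetical Series and mapping (not the ones from the example above) to show that replace() leaves unmatched values alone while map() turns them into NaN.

```python
import pandas as pd

# Hypothetical data: 'D' has no entry in the mapping
letters = pd.Series(['A', 'B', 'D'])
mapping = {'A': 'alpha', 'B': 'beta'}

# replace() keeps values that are not keys in the dictionary
replaced = letters.replace(mapping)

# map() returns NaN for values that are not keys in the dictionary
mapped = letters.map(mapping)

print(replaced.tolist())  # ['alpha', 'beta', 'D']
print(mapped.tolist())    # ['alpha', 'beta', nan]
```

So replace() is usually the safer choice for partial mappings, while map() fits cases where every value is expected to have an entry.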

5. Conclusion

We have learned various ways to replace values in a Pandas DataFrame column, from simple replacements to complex transformations.

When faced with data cleaning tasks, choose the method that aligns with your specific requirements.

Happy Learning! 😃