Add Column to DataFrame Pandas


As data enthusiasts, we understand the pivotal role Pandas plays in data manipulation. One fundamental skill is adding columns to DataFrames. It is a task that can significantly impact your data analysis.

In this article, we will explore 5 different ways to add columns to Pandas DataFrames and will look at diverse scenarios and common mistakes.

    Table of Contents

  1. Direct Assignment
  2. Using Existing Columns
  3. Applying a Function
  4. Concatenating DataFrames
  5. Using the assign() Method
  6. Common Mistakes and How to Avoid Them
  7. Conclusion

1. Direct Assignment

The most straightforward method is to assign values directly to a new column. Assigning a single scalar value, as in the example here, fills every row of the column with that value.

For example, to create a new column labelled 'City', write df['City'] = 'New York'. This creates the 'City' column with 'New York' in every row.

import pandas as pd

# Creating a simple DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35]}

df = pd.DataFrame(data)
print("Original Dataframe:")
print(df)

# 👉 Adding a new 'City' column with the same value for all rows
df['City'] = 'New York'

print("\nDataframe after adding new column 'City'")
print(df)

Output:

Original Dataframe:
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35

Dataframe after adding new column 'City'
      Name  Age      City
0    Alice   25  New York
1      Bob   30  New York
2  Charlie   35  New York
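Direct assignment is not limited to a single repeated value. The sketch below, which uses made-up city and country values, shows the two other common cases: a list with one value per row, and a Series, which pandas aligns on index labels rather than on position.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

# A list must contain exactly one value per row (len(df) values)
df['City'] = ['New York', 'Paris', 'London']  # illustrative values

# A Series is matched on index labels; labels missing from it become NaN
df['Country'] = pd.Series(['USA', 'UK'], index=[0, 2])  # row 1 has no value

print(df)
```

A list of the wrong length raises a ValueError, while a Series with missing labels silently leaves NaN in those rows, so it is worth checking the result after assigning a Series.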



2. Using Existing Columns

You can create a new column from existing columns using arithmetic and logical operations. Pandas applies these operations to the whole column at once, so no explicit loop is needed.

The following example derives a new 'Birth Year' column from the 'Age' column.

import pandas as pd

# Creating a simple DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35]}

df = pd.DataFrame(data)
print("Original Dataframe:")
print(df)

# 👉 Adding a new 'Birth Year' column based on 'Age'
df['Birth Year'] = 2024 - df['Age']

print("\nDataframe after adding new column 'Birth Year'")
print(df)

Output:

Original Dataframe:
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35

Dataframe after adding new column 'Birth Year'
      Name  Age  Birth Year
0    Alice   25        1999
1      Bob   30        1994
2  Charlie   35        1989
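Logical operations work the same way as arithmetic ones. As a small sketch (the column names 'Is 30 or Older' and 'In Range' are chosen here only for illustration), a comparison on an existing column produces a boolean column, and several conditions can be combined with & and |.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

# A comparison returns a boolean Series, one True/False per row
df['Is 30 or Older'] = df['Age'] >= 30

# Combine conditions with & (and) or | (or); each condition needs parentheses
df['In Range'] = (df['Age'] > 25) & (df['Age'] < 35)

print(df)
```

Python's own `and` and `or` keywords do not work element-wise on a Series, which is why the `&` and `|` operators are used here.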

3. Applying a Function

For more complex transformations, you can use the apply() method to run a custom function on each element of a column (or, with axis=1 on a DataFrame, on each row).

The following example creates a new column by applying a Python function to each element of the 'Age' column.

import pandas as pd

# Creating a simple DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [15, 30, 35]}

df = pd.DataFrame(data)
print("Original Dataframe:")
print(df)

# 👉 Adding a new 'Status' column based on a custom function
def determine_status(age):
    return 'Adult' if age >= 18 else 'Minor'

df['Status'] = df['Age'].apply(determine_status)

print("\nDataframe after adding new column 'Status'")
print(df)

Output:

Original Dataframe:
      Name  Age
0    Alice   15
1      Bob   30
2  Charlie   35

Dataframe after adding new column 'Status'
      Name  Age Status
0    Alice   15  Minor
1      Bob   30  Adult
2  Charlie   35  Adult
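When the new value depends on more than one column, apply() can also be called on the DataFrame itself with axis=1, so the function receives one row at a time. This is a minimal sketch; the helper describe and the 'Label' column are hypothetical names used only for this example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [15, 30, 35]})

def describe(row):
    # 'row' is a Series holding a single row; values are read by column label
    return f"{row['Name']} ({row['Age']})"

# axis=1 passes each row to the function instead of each column
df['Label'] = df.apply(describe, axis=1)

print(df)
```

Row-wise apply() calls a Python function once per row, so it is convenient but usually slower than a vectorized expression on large DataFrames.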

4. Concatenating DataFrames

When working with multiple DataFrames, concatenation is a convenient way to combine them.

To concatenate two or more DataFrames, pass a list of them to pd.concat(), which returns a new DataFrame. By default (axis=0) the DataFrames are stacked as rows; passing axis=1 places them side by side, which adds the second DataFrame's columns. The example below uses the default.

import pandas as pd

# Creating a simple DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35]}

df = pd.DataFrame(data)
print("Original Dataframe:")
print(df)

# Creating a second DataFrame
data2 = {'Name': ['David', 'Eva'],
        'Age': [28, 22]}
df2 = pd.DataFrame(data2)

# 👉 Concatenating DataFrames along rows (the default, axis=0)
df_concatenated = pd.concat([df, df2])

print("\nAfter concatenating")
print(df_concatenated)

Output:

Original Dataframe:
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35

After concatenating
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
0    David   28
1      Eva   22

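Note that the original index labels are kept, so 0 and 1 appear twice. You can renumber the rows afterwards with reset_index(drop=True), or pass ignore_index=True to concat().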
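To add columns rather than rows with concat(), pass axis=1. The following sketch assumes a second DataFrame, called extra here purely for illustration, that has the same index as df; it also shows ignore_index=True for the row-wise case.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

# Hypothetical second DataFrame holding the columns to add
extra = pd.DataFrame({'City': ['New York', 'Paris', 'London'],
                      'Salary': [60000, 70000, 80000]})

# axis=1 places the DataFrames side by side, aligned on their index labels
wide = pd.concat([df, extra], axis=1)
print(wide)

# Row-wise concatenation with a fresh 0..n-1 index
more = pd.DataFrame({'Name': ['David', 'Eva'], 'Age': [28, 22]})
tall = pd.concat([df, more], ignore_index=True)
print(tall)
```

Because axis=1 aligns on index labels, DataFrames with different indexes produce NaN wherever a label is missing from one of them.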


5. Using the assign() Method

The assign() method allows you to add one or more columns in a single line, creating a new DataFrame with the added columns.

import pandas as pd

# Creating a simple DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35]}

df = pd.DataFrame(data)
print("Original Dataframe:")
print(df)

# 👉 Adding new 'Salary' and 'Experience' columns using assign()
df = df.assign(Salary=[60000, 70000, 80000], Experience=[2, 5, 8])

print("\nAfter adding 2 new columns")
print(df)

Output:

Original Dataframe:
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35

After adding 2 new columns
      Name  Age  Salary  Experience
0    Alice   25   60000           2
1      Bob   30   70000           5
2  Charlie   35   80000           8
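assign() also accepts callables, which makes it easy to derive columns from existing ones. In this sketch (the column names are illustrative), each lambda receives the DataFrame being built, and the original df is left unchanged because assign() returns a new DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

new_df = df.assign(
    # Each callable receives the DataFrame and returns the new column
    Age_in_Months=lambda d: d['Age'] * 12,
    # A later keyword can use a column created by an earlier one
    Age_in_Days_Approx=lambda d: d['Age_in_Months'] * 30,
)

print(new_df)
print(df.columns.tolist())  # the original DataFrame has no new columns
```

Since column names are passed as keyword arguments, they must be valid Python identifiers; a name with a space, such as 'Birth Year', needs direct assignment instead.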

6. Common Mistakes and How to Avoid Them

1. Iterative Appends

Mistake: Building a column value by value in a Python loop, for example by iterating with iterrows() and appending each result, is inefficient and leads to performance issues, especially with large datasets.

Best Practice: Prefer direct assignment or vectorized operations to avoid iterative appends. They are more efficient and lead to cleaner code.
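The difference can be sketched on a small, made-up DataFrame with 'Price' and 'Quantity' columns. Both approaches produce the same values, but the vectorized version is a single expression that pandas evaluates for the whole column at once.

```python
import pandas as pd

# Illustrative data, not taken from the examples above
df = pd.DataFrame({'Price': [10.0, 20.0, 30.0],
                   'Quantity': [1, 2, 3]})

# Pattern to avoid: looping over rows and appending one value at a time
totals = []
for _, row in df.iterrows():
    totals.append(row['Price'] * row['Quantity'])
df['Total (loop)'] = totals

# Preferred: one vectorized expression over whole columns
df['Total'] = df['Price'] * df['Quantity']

print(df)
```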

2. Inefficient Use of apply()

Mistake: Using apply() without considering vectorized alternatives can lead to slower execution, especially on large datasets.

Best Practice: Leverage Pandas' vectorized operations whenever possible. They are optimized for efficiency and can significantly improve performance.
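As one possible vectorized alternative, the 'Status' column from section 3 can be built with NumPy's where() function instead of apply(). This is a sketch of one option, not the only way to do it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [15, 30, 35]})

# np.where(condition, value_if_true, value_if_false) works on the whole column
df['Status'] = np.where(df['Age'] >= 18, 'Adult', 'Minor')

print(df)
```

apply() remains a reasonable choice when the logic cannot be expressed with column-wise operations.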

3. Ignoring Memory Efficiency

Mistake: Adding columns without considering memory usage can lead to increased overhead, affecting performance.

Best Practice: Be mindful of memory usage, especially with substantial datasets. Choose methods that minimize memory overhead, such as vectorized operations.
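One way to check the cost of a new column is memory_usage(deep=True). The sketch below, using a made-up DataFrame of 10,000 rows, compares a repeated string column with the same data stored as the category dtype, which is often smaller when a column has few distinct values.

```python
import pandas as pd

# Illustrative DataFrame with many rows and one repeated string value
df = pd.DataFrame({'Age': range(10_000)})
df['City'] = 'New York'

# The same values stored as a categorical column
df['City (category)'] = df['City'].astype('category')

# deep=True includes the memory used by the string values themselves
as_string = df['City'].memory_usage(deep=True)
as_category = df['City (category)'].memory_usage(deep=True)

print(as_string, as_category)
```

The exact numbers depend on the pandas version and platform, so it is best to measure on your own data.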


7. Conclusion

With the methods discussed above, you are now equipped to add columns to Pandas DataFrames. Try them out on your own data and see which one works best for each situation.

Keep the common mistakes and best practices in mind as well. They will help you write more efficient code and improve your data analysis.

Happy coding! 🚀✨