How to Drop a Column in Pandas
In Pandas, dropping columns from a DataFrame is a common operation during data manipulation and preprocessing.
Understanding different methods to drop columns is crucial for data analysis workflows.
Table of Contents
- Column Drop Methods
- Drop Multiple Columns
- Conclusion
1. Column Drop Methods
There are multiple ways to drop columns from a DataFrame in Pandas. Here are some of the most common methods:
1.1 Using drop() method
The drop() method removes rows or columns from a DataFrame. By default it removes rows (axis=0), so to drop a column, pass the column name as the first argument and set axis=1. By default drop() returns a new DataFrame and leaves the original unchanged.
For example, to drop a column named 'Age', use df.drop('Age', axis=1), where df is the DataFrame.
import pandas as pd
# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['NY', 'LA', 'SF']}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
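# drop a column; drop() returns a new DataFrame, so reassign the result
df = df.drop('Age', axis=1)
# or modify df directly (this form returns None, so do not reassign)
# df.drop('Age', axis=1, inplace=True)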
print("\nAfter dropping 'Age' column:")
print(df)
Output:
Original DataFrame:
      Name  Age City
0    Alice   25   NY
1      Bob   30   LA
2  Charlie   35   SF

After dropping 'Age' column:
      Name City
0    Alice   NY
1      Bob   LA
2  Charlie   SF
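drop() can also take the label through its columns keyword, which states the intent without relying on axis=1; axis='columns' is another equivalent spelling. The minimal sketch below rebuilds the same sample data and checks that the three forms give the same result while the original DataFrame stays unchanged.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['NY', 'LA', 'SF']})

# Three equivalent ways to drop 'Age'; each returns a new DataFrame
a = df.drop('Age', axis=1)
b = df.drop('Age', axis='columns')
c = df.drop(columns='Age')

# df itself is untouched because inplace defaults to False
print(list(df.columns))           # ['Name', 'Age', 'City']
print(list(c.columns))            # ['Name', 'City']
print(a.equals(b) and b.equals(c))  # True
```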
1.2 Using del keyword
The del keyword is best known for deleting variables in Python, but it can also delete a column from a DataFrame.
To delete a column named 'Age', write del df['Age']; the column is removed from df in place and nothing is returned.
import pandas as pd
# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['NY', 'LA', 'SF']}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# delete a column in place
del df['Age']
print("\nAfter deleting Age column:")
print(df)
Output:
Original DataFrame:
      Name  Age City
0    Alice   25   NY
1      Bob   30   LA
2  Charlie   35   SF

After deleting Age column:
      Name City
0    Alice   NY
1      Bob   LA
2  Charlie   SF
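Unlike drop(), del has no errors option: deleting a column that does not exist raises a KeyError. One way to guard against that is to check membership in df.columns first. The sketch below assumes a helper named delete_if_present, which is hypothetical and not part of pandas; 'Salary' is likewise a made-up column name used only to show the missing-column case.

```python
import pandas as pd


def delete_if_present(df, column):
    """Hypothetical helper: delete `column` in place if it exists."""
    if column in df.columns:
        del df[column]
        return True
    return False


df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['NY', 'LA', 'SF']})

print(delete_if_present(df, 'Age'))     # True
print(delete_if_present(df, 'Salary'))  # False, no KeyError raised
print(list(df.columns))                 # ['Name', 'City']

# A plain del on a missing column raises KeyError
try:
    del df['Salary']
except KeyError:
    print('Salary is not a column')
```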
1.3 Using pop() method
The pop() method removes a column from a DataFrame in place and returns it as a Series. It takes the column name as its argument.
For example, df.pop('Age') removes the 'Age' column from df and returns its values as a Series.
import pandas as pd
# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['NY', 'LA', 'SF']}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# remove 'Age' in place; pop() returns the removed column as a Series
age = df.pop('Age')
print("\nAfter dropping 'Age' column:")
print(df)
Output:
Original DataFrame:
      Name  Age City
0    Alice   25   NY
1      Bob   30   LA
2  Charlie   35   SF

After dropping 'Age' column:
      Name City
0    Alice   NY
1      Bob   LA
2  Charlie   SF
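Because pop() hands back the removed column, it is handy when the column is still needed elsewhere. The minimal sketch below, using the same sample data, pops 'Age' and re-attaches it at the front with DataFrame.insert(), which effectively reorders the columns.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['NY', 'LA', 'SF']})

# pop() removes 'Age' from df and returns it as a Series
age = df.pop('Age')
print(type(age).__name__)  # Series
print(age.tolist())        # [25, 30, 35]

# Re-attach the popped column at position 0
df.insert(0, 'Age', age)
print(list(df.columns))    # ['Age', 'Name', 'City']
```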
2. Drop Multiple Columns
Real-world DataFrames can have hundreds of columns, so knowing how to drop several columns at once is important.
Method 1
To drop multiple columns, pass a list of column names to the drop() method.
For example, df.drop(['Age', 'City'], axis=1) drops both the 'Age' and 'City' columns from the DataFrame.
import pandas as pd
# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['NY', 'LA', 'SF']}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# dropping multiple columns in place
df.drop(['Age', 'City'], axis=1, inplace=True)
print("\nAfter dropping 'Age' and 'City' columns:")
print(df)
Output:
Original DataFrame:
      Name  Age City
0    Alice   25   NY
1      Bob   30   LA
2  Charlie   35   SF

After dropping 'Age' and 'City' columns:
      Name
0    Alice
1      Bob
2  Charlie
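By default, drop() raises a KeyError if any name in the list is not a column. Passing errors='ignore' drops the names that exist and skips the rest. In the sketch below, 'Salary' is a hypothetical column that is deliberately absent from the sample data.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['NY', 'LA', 'SF']})

# 'Salary' is a hypothetical name that is not in df
to_drop = ['Age', 'City', 'Salary']

# With the default errors='raise', the missing label causes a KeyError
try:
    df.drop(columns=to_drop)
except KeyError:
    print('At least one label was not found')

# errors='ignore' drops the labels that exist and skips the rest
result = df.drop(columns=to_drop, errors='ignore')
print(list(result.columns))  # ['Name']
```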
Method 2
The df.columns attribute is an Index that holds the column labels. Indexing it with a list of positions, such as df.columns[[0, 2]], returns the labels at those positions (here 'Name' and 'City').
You can pass those labels to the drop() method with axis=1 to drop the columns by position.
import pandas as pd
# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['NY', 'LA', 'SF']}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# drop the columns at positions 0 and 2
df.drop(df.columns[[0, 2]], axis=1, inplace=True)
print("\nAfter dropping columns at index 0 and 2:")
print(df)
Output:
Original DataFrame:
      Name  Age City
0    Alice   25   NY
1      Bob   30   LA
2  Charlie   35   SF

After dropping columns at index 0 and 2:
   Age
0   25
1   30
2   35
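To see what drop() actually receives in this method, it helps to inspect df.columns[[0, 2]] on its own. The short sketch below, on the same sample data, shows that it is an Index of labels and that dropping by these positions matches dropping the same columns by name.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['NY', 'LA', 'SF']})

# Positional indexing on the columns Index returns the labels at those positions
labels = df.columns[[0, 2]]
print(isinstance(labels, pd.Index))  # True
print(list(labels))                  # ['Name', 'City']

# Dropping by position is equivalent to dropping the same labels by name
by_position = df.drop(labels, axis=1)
by_name = df.drop(columns=['Name', 'City'])
print(by_position.equals(by_name))   # True
```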
Method 3
Another way to drop a range of columns is to select them by position with iloc[] and pass the selected labels to drop().
The following example removes the columns at positions 1 to 4 (B through E) using iloc[] and the drop() method.
import pandas as pd
# Creating a sample DataFrame
data = {'A': [1, 2, 3, 4, 5],
'B': [10, 20, 30, 40, 50],
'C': [11, 22, 33, 44, 55],
'D': [12, 24, 36, 48, 60],
'E': [13, 26, 39, 52, 65],
'F': [14, 28, 42, 56, 70]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# dropping multiple columns by position
# to drop the columns at positions 1 to 4, slice with 1:5 (the end position is excluded)
df.drop(df.iloc[:, 1:5].columns, axis=1, inplace=True)
print("\nAfter dropping columns from index 1 to 4:")
print(df)
Output:
Original DataFrame:
   A   B   C   D   E   F
0  1  10  11  12  13  14
1  2  20  22  24  26  28
2  3  30  33  36  39  42
3  4  40  44  48  52  56
4  5  50  55  60  65  70

After dropping columns from index 1 to 4:
   A   F
0  1  14
1  2  28
2  3  42
3  4  56
4  5  70
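The same range can also be expressed by slicing df.columns directly, or by label with loc[]. The sketch below compares the two; note that a positional slice excludes its end position, while a label slice such as 'B':'E' includes both endpoints.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [10, 20, 30, 40, 50],
                   'C': [11, 22, 33, 44, 55],
                   'D': [12, 24, 36, 48, 60],
                   'E': [13, 26, 39, 52, 65],
                   'F': [14, 28, 42, 56, 70]})

# Positional slice: positions 1 to 4, end position 5 excluded
by_position = df.drop(columns=df.columns[1:5])

# Label slice with loc: both 'B' and 'E' are included
by_label = df.drop(columns=df.loc[:, 'B':'E'].columns)

print(list(by_position.columns))     # ['A', 'F']
print(by_position.equals(by_label))  # True
```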
Conclusion
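Dropping columns is a routine step in data manipulation with Pandas. drop() returns a new DataFrame (or modifies it in place with inplace=True) and accepts a single label or a list of labels, del removes a single column in place, and pop() removes a column in place and returns it as a Series.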