Pandas Select Columns by Name


Pandas provides a wide range of tools for cleaning and manipulating data. One of the most common tasks in data analysis is selecting a subset of columns from a DataFrame, which lets you focus your exploration and analysis on just the data you need.

In this tutorial, you will learn how to select columns from a DataFrame by their name.

    Table of Contents

  1. Select Single Column
  2. Select Multiple Columns
  3. Select Columns by Booleans
  4. Select Columns in Range
  5. Select Columns by Regex
  6. Select Columns by Condition
  7. Conclusion

1. Select Single Column

To select a single column from a DataFrame, you can use the df['column_name'] syntax. This will return a Series object.

The following example selects the 'Name' column from the DataFrame.

import pandas as pd

# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['NY', 'LA', 'SF', 'NY', 'LA']}

df = pd.DataFrame(data)

print(df['Name'])

Output:

0      Alice
1        Bob
2    Charlie
3      David
4       Emma
Name: Name, dtype: object

Another way to select a single column by name is to use the df.loc[] indexer, which selects rows and columns by label.

To select the single column labeled 'Name', you can use the df.loc[:, 'Name'] syntax. The : before the comma selects all the rows.

import pandas as pd

# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['NY', 'LA', 'SF', 'NY', 'LA']}

df = pd.DataFrame(data)

# Selecting a single column by name
# using df.loc[]
print(df.loc[:, 'Name'])

Output:

0      Alice
1        Bob
2    Charlie
3      David
4       Emma
Name: Name, dtype: object
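
Both approaches above return a Series. If you need a one-column DataFrame instead, wrapping the name in a list is a common way to get it. The following is a minimal sketch on a smaller sample DataFrame; the variable names as_series and as_frame are only illustrative.

```python
import pandas as pd

# Creating a small sample DataFrame
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

# A single label returns a Series
as_series = df['Name']

# A one-element list of labels returns a DataFrame with one column
as_frame = df[['Name']]

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (3, 1)
```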

2. Select Multiple Columns

To select multiple columns from a DataFrame, you can pass a list of column names to the df[] operator or to the df.loc[] indexer.

Let's select the 'Name' and 'City' columns from the DataFrame.

import pandas as pd

# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['NY', 'LA', 'SF', 'NY', 'LA']}

df = pd.DataFrame(data)

# Selecting multiple columns by name
# using df[]
print(df[['Name', 'City']])

# using df.loc[]
print(df.loc[:, ['Name', 'City']])

Output:

      Name City
0    Alice   NY
1      Bob   LA
2  Charlie   SF
3    David   NY
4     Emma   LA

      Name City
0    Alice   NY
1      Bob   LA
2  Charlie   SF
3    David   NY
4     Emma   LA
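
Two details of list-based selection are worth checking for yourself: the result follows the order of the names in the list, and a name that is not a column raises a KeyError. The sketch below assumes a smaller sample DataFrame, and 'Country' is a deliberately missing, hypothetical column name.

```python
import pandas as pd

# Creating a small sample DataFrame
df = pd.DataFrame({'Name': ['Alice', 'Bob'],
                   'Age': [25, 30],
                   'City': ['NY', 'LA']})

# The result follows the order of the list, not the order in df
reordered = df[['City', 'Name']]
print(list(reordered.columns))  # ['City', 'Name']

# 'Country' does not exist in df, so the selection raises a KeyError
try:
    df[['Name', 'Country']]
    missing_raised = False
except KeyError:
    missing_raised = True

print(missing_raised)  # True
```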

3. Select Columns by Booleans

Instead of passing a list of column names, you can also pass a list of booleans to df.loc[], with one boolean for each column of the DataFrame.

Columns whose position holds True are selected, and columns whose position holds False are skipped.

import pandas as pd

# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['NY', 'LA', 'SF', 'NY', 'LA']}

df = pd.DataFrame(data)

# Selecting columns by booleans
# using df.loc[]
print(df.loc[:, [True, False, True]])

Output:

      Name City
0    Alice   NY
1      Bob   LA
2  Charlie   SF
3    David   NY
4     Emma   LA

Here, [True, False, True] tells df.loc[] to select the first and third columns ('Name' and 'City') and skip the second column ('Age').
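
In practice, the booleans are often computed from the column names rather than typed by hand. The following is a minimal sketch of that idea, assuming you want every column except 'Age'; the variable names mask and subset are only illustrative.

```python
import pandas as pd

# Creating a small sample DataFrame
df = pd.DataFrame({'Name': ['Alice', 'Bob'],
                   'Age': [25, 30],
                   'City': ['NY', 'LA']})

# Comparing the column index to a label gives one boolean per column
mask = df.columns != 'Age'
print(mask.tolist())  # [True, False, True]

# Passing the mask to df.loc[] keeps only the True columns
subset = df.loc[:, mask]
print(list(subset.columns))  # ['Name', 'City']
```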


4. Select Columns in Range

Suppose a DataFrame has 26 columns named from 'A' to 'Z'. To select all the columns from 'A' to 'F', you can use the df.loc[:, 'A':'F'] syntax.

Unlike regular Python slicing, a label slice includes both endpoints, so columns 'A' and 'F' are both part of the result.

Let's see an example.

import pandas as pd

# Creating a sample DataFrame
data = {'A': [1, 2, 3, 4, 5],
        'B': [10, 20, 30, 40, 50],
        'C': [11, 22, 33, 44, 55],
        'D': [12, 24, 36, 48, 60],
        'E': [13, 26, 39, 52, 65],
        'F': [14, 28, 42, 56, 70]}

df = pd.DataFrame(data)

# Selecting columns in range
# select all the columns from 'B' to 'E'
print(df.loc[:, 'B':'E'])

Output:

    B   C   D   E
0  10  11  12  13
1  20  22  24  26
2  30  33  36  39
3  40  44  48  52
4  50  55  60  65
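
A label slice can also be left open on either side. The sketch below, on a smaller four-column DataFrame, shows both forms. Note that the range follows the position of the columns in the DataFrame, not alphabetical order; the variable names are only illustrative.

```python
import pandas as pd

# Creating a small sample DataFrame
df = pd.DataFrame({'A': [1, 2],
                   'B': [10, 20],
                   'C': [11, 22],
                   'D': [12, 24]})

# From 'C' to the last column
from_c = df.loc[:, 'C':]
print(list(from_c.columns))  # ['C', 'D']

# From the first column up to and including 'B'
up_to_b = df.loc[:, :'B']
print(list(up_to_b.columns))  # ['A', 'B']
```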

5. Select Columns by Regex

To select columns by regex, you can use the df.filter() method. Pass the pattern through the regex parameter, and the method returns the columns whose names match it.

The pattern is searched for anywhere in each column name, so it does not have to match the whole name.

Let's see an example.

import pandas as pd

# Creating a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma'],
        'Age': [25, 30, 35, 40, 45],
        'City': ['NY', 'LA', 'SF', 'NY', 'LA']}

df = pd.DataFrame(data)

# Selecting columns by regex
# select all the columns having 'A' or 'a' in their names
print(df.filter(regex='[Aa]'))

Output:

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
3    David   40
4     Emma   45
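
Besides regex, df.filter() also accepts the like and items parameters, and the regex itself can be anchored with ^ to match only the start of a name. The following is a minimal sketch of these variants; 'Zip' is a hypothetical name that is not a column of the DataFrame.

```python
import pandas as pd

# Creating a small sample DataFrame
df = pd.DataFrame({'Name': ['Alice', 'Bob'],
                   'Age': [25, 30],
                   'City': ['NY', 'LA']})

# Anchored regex: only names that start with 'A'
starts_with_a = df.filter(regex='^A')
print(list(starts_with_a.columns))  # ['Age']

# like: names that contain the substring 'it'
contains_it = df.filter(like='it')
print(list(contains_it.columns))  # ['City']

# items: exact names; names that are not columns are ignored
exact = df.filter(items=['Name', 'Zip'])
print(list(exact.columns))  # ['Name']
```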

6. Select Columns by Condition

Suppose you want to select all the columns having a mean greater than 35. To do so, you can use the df.mean() method to calculate the mean of each column and then pass the resulting boolean condition to df.loc[].

Let's see an example.

import pandas as pd

# Creating a sample DataFrame
data = {'A': [1, 2, 3, 4, 5],
        'B': [10, 20, 30, 40, 50],
        'C': [11, 22, 33, 44, 55],
        'D': [12, 24, 36, 48, 60],
        'E': [13, 26, 39, 52, 65],
        'F': [14, 28, 42, 56, 70]}

df = pd.DataFrame(data)

# Selecting columns by condition
# select all the columns having a mean greater than 35
print(df.loc[:, df.mean() > 35])

Output:

    D   E   F
0  12  13  14
1  24  26  28
2  36  39  42
3  48  52  56
4  60  65  70
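
The example above works because every column is numeric. When a DataFrame also contains text columns, one possible approach is to compute the means of the numeric columns only and then select by the names that satisfy the condition. This is a sketch under that assumption; the 'Score' column and the variable names are only illustrative.

```python
import pandas as pd

# Creating a sample DataFrame with a text column
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'Score': [80, 90, 100]})

# Mean of the numeric columns only ('Name' is skipped)
means = df.mean(numeric_only=True)

# Names of the columns whose mean is greater than 35
selected = means[means > 35].index

result = df[selected]
print(list(result.columns))  # ['Score']
```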

7. Conclusion

You can now select columns from a DataFrame by name in several ways: by a single label, a list of labels, a list of booleans, a label range, a regex, or a condition.
