Pandas Unique Values in Column


Finding the unique values within a column of a Pandas DataFrame is a fundamental step in data analysis. Whether you want to see the distinct elements or count their occurrences, Pandas offers several ways to do it.

In this tutorial, we'll go through various methods to extract unique values from a column.

    Table of Contents

  1. Getting Unique Values
    1. Using unique()
    2. Using drop_duplicates()
    3. Using list() and set()
  2. Getting Unique Values with Counts
    1. Using value_counts()
    2. Using nunique()
  3. Handling NaN Values
  4. Counting Unique Values with Conditions
  5. Conclusion

1. Getting Unique Values

We can get the unique values from a column in a Pandas DataFrame in several ways. Let's see them one by one.

1.1 Using unique()

Pandas provides the unique() method, which can be called directly on a column of a DataFrame to get its unique values.

To apply this method, first access the column of the DataFrame as a Series object and then call the unique() method on it. For example, to get the unique values of column 'A', we'll write df['A'].unique().

import pandas as pd

# Creating a sample DataFrame
data = {'A': ['A1', 'A2', 'A1', 'A1', 'A2'],
        'B': ['B1', 'B2', 'B3', 'B3', 'B4'],
        'C': ['C1', 'C2', 'C3', 'C4', 'C5']}

df = pd.DataFrame(data)

# 👇 get unique values of column 'A'
print(df['A'].unique())

# 👇 get unique values of column 'B'
print(df['B'].unique())

Output:

['A1' 'A2']
['B1' 'B2' 'B3' 'B4']
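
Two details of unique() are easy to miss: it keeps values in order of first appearance rather than sorting them, and its result is an array, not a list. Here is a minimal sketch of both points; the sample values are chosen only for illustration.

```python
import pandas as pd

# Illustrative data: values deliberately not in sorted order
df = pd.DataFrame({'A': ['A2', 'A1', 'A2', 'A3', 'A1']})

# unique() keeps values in order of first appearance (it does not sort)
unique_values = df['A'].unique()

# list() turns the array into a plain Python list
unique_list = list(unique_values)
print(unique_list)    # ['A2', 'A1', 'A3']

# sorted() gives an alphabetically ordered list when that is needed
unique_sorted = sorted(unique_list)
print(unique_sorted)  # ['A1', 'A2', 'A3']
```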

1.2 Using drop_duplicates()

Another way to get unique values from a column is to use the drop_duplicates() method. Called on a column, it removes the duplicate values and returns a new Series that keeps the first occurrence of each value.

Access the column you want unique values for and call drop_duplicates() on it. Pass the result to list() to get the unique values as a list.

import pandas as pd

# Creating a sample DataFrame
data = {'A': ['A1', 'A2', 'A1', 'A1', 'A2'],
        'B': ['B1', 'B2', 'B3', 'B3', 'B4'],
        'C': ['C1', 'C2', 'C3', 'C4', 'C5']}

df = pd.DataFrame(data)

# 👇 get unique values of column 'A'
unique_A = list(df['A'].drop_duplicates())
print(unique_A)

# 👇 get unique values of column 'B'
unique_B = list(df['B'].drop_duplicates())
print(unique_B)

Output:

['A1', 'A2']
['B1', 'B2', 'B3', 'B4']
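
Unlike unique(), drop_duplicates() returns a Series, so the original index labels of the kept rows survive. The sketch below shows this along with the keep parameter, which controls which occurrence is retained; the variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['A1', 'A2', 'A1', 'A1', 'A2']})

# keep='first' (the default) keeps the first occurrence of each value
first = df['A'].drop_duplicates()
print(first.index.tolist())  # [0, 1]

# keep='last' keeps the last occurrence instead
last = df['A'].drop_duplicates(keep='last')
print(last.index.tolist())   # [3, 4]

# reset_index(drop=True) renumbers the result from 0
renumbered = last.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
```

Keeping the index is useful when you need to know which rows the unique values came from.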

1.3 Using list() and set()

A third way to get unique values from a column is to convert the column into a set and then into a list. Since a set can only contain unique values, the duplicates are dropped. Keep in mind that sets are unordered, so the values may not come back in the order they appear in the column.

import pandas as pd

# Creating a sample DataFrame
data = {'A': ['A1', 'A2', 'A1', 'A1', 'A2'],
        'B': ['B1', 'B2', 'B3', 'B3', 'B4'],
        'C': ['C1', 'C2', 'C3', 'C4', 'C5']}

df = pd.DataFrame(data)

# 👇 get unique values of column 'A'
unique_A = list(set(df['A']))
print(unique_A)

# 👇 get unique values of column 'B'
unique_B = list(set(df['B']))
print(unique_B)

Output (the order may vary, because sets are unordered):

['A1', 'A2']
['B1', 'B2', 'B3', 'B4']
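
If you need a predictable order from this approach, two plain-Python options are worth knowing. This is a small sketch rather than a Pandas feature: sorted() orders the set, while dict.fromkeys() drops duplicates and keeps the order of first appearance.

```python
import pandas as pd

# Illustrative data: values deliberately not in sorted order
df = pd.DataFrame({'B': ['B3', 'B1', 'B3', 'B2', 'B1']})

# sorted() returns a list in a deterministic, sorted order
unique_sorted = sorted(set(df['B']))
print(unique_sorted)   # ['B1', 'B2', 'B3']

# dict.fromkeys() drops duplicates while keeping order of first appearance
unique_ordered = list(dict.fromkeys(df['B']))
print(unique_ordered)  # ['B3', 'B1', 'B2']
```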

2. Getting Unique Values with Counts

Maybe you want to know how many times each unique value occurs in a column, or just how many unique values there are. Let's look at a method for each.

2.1 Using value_counts()

The value_counts() method returns a Series whose index holds the unique values of a column and whose values hold how many times each one occurs. By default, the result is sorted by count in descending order.

You can call this method directly on a column of a DataFrame.

import pandas as pd

# Creating a sample DataFrame
data = {'A': ['A1', 'A2', 'A1', 'A1', 'A2'],
        'B': ['B1', 'B2', 'B2', 'B3', 'B3'],
        'C': ['C1', 'C2', 'C3', 'C4', 'C5']}

df = pd.DataFrame(data)

# 👇 get unique values of column 'A' along with their counts
print(df['A'].value_counts())

# 👇 get unique values of column 'B' along with their counts
print(df['B'].value_counts())

Output:

A
A1    3
A2    2
Name: count, dtype: int64
B
B2    2
B3    2
B1    1
Name: count, dtype: int64
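
Since the result is a regular Series, you can pull the unique values and the counts apart, or ask for proportions instead of raw counts. A minimal sketch, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({'A': ['A1', 'A2', 'A1', 'A1', 'A2']})

counts = df['A'].value_counts()

# The index holds the unique values, the values hold the counts
unique_values = counts.index.tolist()
print(unique_values)     # ['A1', 'A2']
print(counts.to_dict())  # {'A1': 3, 'A2': 2}

# normalize=True returns relative frequencies instead of counts
shares = df['A'].value_counts(normalize=True)
print(shares.to_dict())
```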

2.2 Using nunique()

As we saw in the previous example, the value_counts() method returns a Series object containing the unique values along with their counts. But what if we only want to know how many unique values there are, not how often each one occurs? 🤔

For that, we can use the nunique() method. It returns the number of unique values in a given column.

import pandas as pd

# Creating a sample DataFrame
data = {'A': ['A1', 'A2', 'A1', 'A1', 'A2'],
        'B': ['B1', 'B2', 'B3', 'B3', 'B4'],
        'C': ['C1', 'C2', 'C3', 'C4', 'C5']}

df = pd.DataFrame(data)

# 👇 get number of unique values of column 'A'
unique_A = df['A'].nunique()
print("Number of unique values in column 'A':", unique_A)

# 👇 get number of unique values of column 'B'
unique_B = df['B'].nunique()
print("Number of unique values in column 'B':", unique_B)

Output:

Number of unique values in column 'A': 2
Number of unique values in column 'B': 4
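
nunique() is also available on the DataFrame itself, which saves a loop when you want the count for every column. A short sketch using the same sample data:

```python
import pandas as pd

data = {'A': ['A1', 'A2', 'A1', 'A1', 'A2'],
        'B': ['B1', 'B2', 'B3', 'B3', 'B4'],
        'C': ['C1', 'C2', 'C3', 'C4', 'C5']}

df = pd.DataFrame(data)

# DataFrame.nunique() returns one count per column as a Series
per_column = df.nunique()
print(per_column.to_dict())  # {'A': 2, 'B': 4, 'C': 5}
```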

3. Handling NaN Values

NaN values are not treated the same way by every method. nunique() and value_counts() skip NaN by default, but unique() and drop_duplicates() keep NaN as one of the values. For consistent results, it is recommended to handle NaN explicitly.

To remove NaN values, call the dropna() method on the column before applying any of the above methods.

import pandas as pd
import numpy as np

# Creating a sample DataFrame
data = {'A': ['A1', 'A2', np.nan, 'A1', 'A2', np.nan],
        'B': ['B1', 'B2', 'B3', np.nan, 'B4', np.nan],
        'C': [np.nan, 'C1', 'C2', 'C3', 'C4', np.nan]}

df = pd.DataFrame(data)

# 👇 remove NaN and get unique values of column 'A'
unique_A = df['A'].dropna().unique()
print(unique_A)

# 👇 remove NaN and get unique values of column 'B'
unique_B = df['B'].dropna().unique()
print(unique_B)

Output:

['A1' 'A2']
['B1' 'B2' 'B3' 'B4']
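
To see how the methods differ when NaN is left in place, here is a small sketch on an illustrative Series. The dropna parameter of nunique() and value_counts() lets you include NaN when you do want it counted.

```python
import pandas as pd
import numpy as np

s = pd.Series(['A1', 'A2', np.nan, 'A1', np.nan])

# unique() keeps NaN as one of the values
n_with_unique = len(s.unique())
print(n_with_unique)      # 3

# nunique() skips NaN by default
n_default = s.nunique()
print(n_default)          # 2

# dropna=False counts NaN as a value too
n_with_nan = s.nunique(dropna=False)
print(n_with_nan)         # 3

# value_counts() also accepts dropna=False to include a row for NaN
counts_with_nan = s.value_counts(dropna=False)
print(counts_with_nan)
```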

4. Counting Unique Values with Conditions

What if you want to count the unique values of a column based on some condition? For example, you want to count the unique values of column 'A' where column 'B' is equal to 'B1'.

We can use the loc[] indexer to select the rows that match the condition and the column we are interested in, and then apply any of the above methods to the result.

import pandas as pd

# Creating a sample DataFrame
data = {'A': ['A1', 'A2', 'A1', 'A1', 'A2'],
        'B': ['B1', 'B2', 'B1', 'B3', 'B1'],
        'C': ['C1', 'C2', 'C3', 'C4', 'C5']}

df = pd.DataFrame(data)

# 👇 get unique values of column 'A' where column 'B' is equal to 'B1'
unique_A = df.loc[df['B'] == 'B1', 'A'].unique()
print(unique_A)

Output:

['A1' 'A2']
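
The example above returns the unique values themselves. To get their count instead, the same filter can be combined with nunique(). The sketch below also shows groupby(), one possible way to get the count for every value of 'B' in a single call; the variable names are only for illustration.

```python
import pandas as pd

data = {'A': ['A1', 'A2', 'A1', 'A1', 'A2'],
        'B': ['B1', 'B2', 'B1', 'B3', 'B1']}

df = pd.DataFrame(data)

# Number of distinct 'A' values in rows where 'B' equals 'B1'
count_A = df.loc[df['B'] == 'B1', 'A'].nunique()
print(count_A)              # 2

# Distinct 'A' count for every value of 'B' at once
per_group = df.groupby('B')['A'].nunique()
print(per_group.to_dict())  # {'B1': 2, 'B2': 1, 'B3': 1}
```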

5. Conclusion

Finding the unique values in a Pandas DataFrame column is one of the most important steps in data exploration. In this article, we covered unique(), drop_duplicates(), and set() for extracting the unique values, value_counts() for counting how often each one occurs, and nunique() for counting how many there are.