How to Reset Index in Pandas


In data analysis, we often filter rows or columns out of a dataset to make it more meaningful. Filtering does not renumber the rows that remain: each one keeps its original index label, so the index of the resulting DataFrame can have gaps. To avoid this, we can reset the index of the dataframe.

In this tutorial, we will explore various methods to reset the index of a dataframe in Pandas.

    Table of Contents

  1. Understanding Index in Pandas
  2. Reset Index in Pandas
    1. Reset Index to Start at 0
    2. Reset Index to Start at 1
    3. Reset Index to Own Custom Index
    4. Reset Index and Remove Old Index
  3. Conclusion

Understanding Index in Pandas

The index in Pandas holds a label for each row of the dataframe, which is used to identify and look up rows. By default, the index of a dataframe starts at 0 and increments by 1 for each row.

Let's create a dataframe and see how the index works.

import pandas as pd

data = {'Name': ['John', 'Smith', 'Dave', 'James', 'Robert', 'Maria'],
        'Age': [34, 29, 30, 25, 32, 27],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Philadelphia', 'Phoenix']}

df = pd.DataFrame(data)
print(df)

Output:

     Name  Age          City
0    John   34      New York
1   Smith   29   Los Angeles
2    Dave   30       Chicago
3   James   25       Houston
4  Robert   32  Philadelphia
5   Maria   27       Phoenix

As you can see, the index of the dataframe starts at 0 and increments by 1 for each row.
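
To look at the index object itself rather than the printed table, a short sketch like the following may help. It rebuilds the same dataframe and inspects df.index; the variable name labels is introduced here only for illustration.

```python
import pandas as pd

data = {'Name': ['John', 'Smith', 'Dave', 'James', 'Robert', 'Maria'],
        'Age': [34, 29, 30, 25, 32, 27],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Philadelphia', 'Phoenix']}

df = pd.DataFrame(data)

# the default index is a RangeIndex: it starts at 0 and has one label per row
print(df.index)

# 'labels' is an illustrative name for the index converted to a plain list
labels = df.index.tolist()
print(labels)  # [0, 1, 2, 3, 4, 5]
```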

Now, let's filter out some rows from the dataframe and see how the index changes.

# filter out rows with age less than 30
df = df[df['Age'] >= 30]
print(df)

Output:

     Name  Age          City
0    John   34      New York
2    Dave   30       Chicago
4  Robert   32  Philadelphia

As you can see, the index of the dataframe is no longer consecutive. The remaining rows keep their old index values 0, 2, and 4, which leaves gaps in the index.

To avoid this, we can reset the index of the dataframe.
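
The gaps matter mostly for label-based access. The following sketch keeps the filtered result in a separate variable called filtered (a name chosen here for illustration) and shows that position-based access with iloc still works, while looking up a label that was filtered out with loc fails.

```python
import pandas as pd

data = {'Name': ['John', 'Smith', 'Dave', 'James', 'Robert', 'Maria'],
        'Age': [34, 29, 30, 25, 32, 27],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Philadelphia', 'Phoenix']}

df = pd.DataFrame(data)
filtered = df[df['Age'] >= 30]

# the surviving rows keep their original labels
print(filtered.index.tolist())   # [0, 2, 4]

# position-based access: the second remaining row
print(filtered.iloc[1]['Name'])  # Dave

# label-based access with a label that was filtered out raises KeyError
try:
    filtered.loc[1]
except KeyError:
    print('label 1 no longer exists')
```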


Reset Index in Pandas

To reset the index of a dataframe, we can use the reset_index() method.

Syntax:

df.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')

Let's see how it works.
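
Before the individual cases, here is a rough sketch of how the inplace parameter changes the way the method is used. The small dataframe and the name new_df are made up for this illustration.

```python
import pandas as pd

# small illustrative dataframe with a non-consecutive index
df = pd.DataFrame({'Age': [34, 30, 32]}, index=[0, 2, 4])

# default (inplace=False): a new dataframe is returned and df is left unchanged
new_df = df.reset_index(drop=True)
print(df.index.tolist())      # [0, 2, 4]
print(new_df.index.tolist())  # [0, 1, 2]

# inplace=True: df itself is modified, so no reassignment is needed
df.reset_index(drop=True, inplace=True)
print(df.index.tolist())      # [0, 1, 2]
```

Reassigning the result, as in df = df.reset_index(drop=True), and passing inplace=True lead to the same index here; the examples in this tutorial use both styles.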

Reset Index to Start at 0

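To reset the index of a dataframe so that it starts at 0 again, we can call the reset_index() method with the drop parameter set to True. With drop=True, the old index values are discarded instead of being added to the dataframe as a new column.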

import pandas as pd

data = {'Name': ['John', 'Smith', 'Dave', 'James', 'Robert', 'Maria'],
        'Age': [34, 29, 30, 25, 32, 27],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Philadelphia', 'Phoenix']}

df = pd.DataFrame(data)

# filter out rows with age less than 30
df = df[df['Age'] >= 30]

# reset index to start at 0
df = df.reset_index(drop=True)

print(df)

Output:

     Name  Age          City
0    John   34      New York
1    Dave   30       Chicago
2  Robert   32  Philadelphia

As you can see, the index of the dataframe is reset to start at 0.
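
For comparison, the sketch below calls reset_index() on the same filtered dataframe without drop=True. The variable name kept is an illustrative choice; the point is that the old labels are kept in a new column named index.

```python
import pandas as pd

data = {'Name': ['John', 'Smith', 'Dave', 'James', 'Robert', 'Maria'],
        'Age': [34, 29, 30, 25, 32, 27],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Philadelphia', 'Phoenix']}

df = pd.DataFrame(data)
df = df[df['Age'] >= 30]

# without drop=True the old labels are stored in a new column named 'index'
kept = df.reset_index()

print(kept.columns.tolist())   # ['index', 'Name', 'Age', 'City']
print(kept['index'].tolist())  # [0, 2, 4]
print(kept.index.tolist())     # [0, 1, 2]
```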


Reset Index to Start at 1

The reset_index() method has no parameter for choosing the starting value; the new index always starts at 0. To make the index start at 1, we can call reset_index() with the drop parameter set to True and then add 1 to the new index.

import pandas as pd

data = {'Name': ['John', 'Smith', 'Dave', 'James', 'Robert', 'Maria'],
        'Age': [34, 29, 30, 25, 32, 27],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Philadelphia', 'Phoenix']}

df = pd.DataFrame(data)

# filter out rows with age less than 30
df = df[df['Age'] >= 30]

# reset index to start at 0, then shift it to start at 1
df = df.reset_index(drop=True)
df.index = df.index + 1

print(df)

Output:

     Name  Age          City
1    John   34      New York
2    Dave   30       Chicago
3  Robert   32  Philadelphia

Now the index of the dataframe starts at 1.
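
A starting value other than 0 can also be produced by assigning a new RangeIndex after the reset. The sketch below wraps this in a small helper; reset_index_from and its start argument are hypothetical names used for this example, not part of pandas.

```python
import pandas as pd

def reset_index_from(frame, start=1):
    """Hypothetical helper: return a copy of frame whose index counts up from start."""
    out = frame.reset_index(drop=True)
    # replace the 0-based index with one that begins at the requested value
    out.index = pd.RangeIndex(start=start, stop=start + len(out))
    return out

# illustrative dataframe with gaps in its index
df = pd.DataFrame({'Name': ['John', 'Dave', 'Robert'],
                   'Age': [34, 30, 32]},
                  index=[0, 2, 4])

renumbered = reset_index_from(df, start=1)
print(renumbered.index.tolist())  # [1, 2, 3]
```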


Reset Index to Own Custom Index

A dataframe can also have a custom index. For our example, we will use letters of the alphabet as the index.

To set the letters as the index, we pass them to the index parameter when creating the dataframe, and then call reset_index(). Let's see how it works.

# Import pandas package
import pandas as pd

data = {'Name': ['John', 'Smith', 'Dave', 'James', 'Robert', 'Maria'],
        'Age': [34, 29, 30, 25, 32, 27],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Philadelphia', 'Phoenix']}

# custom index
alpha = ['a', 'b', 'c', 'd', 'e', 'f']

# create dataframe with custom index
df = pd.DataFrame(data, index=alpha)

# move the custom index into a column and restore the default integer index
df.reset_index(inplace=True)

print(df)

Output:

  index    Name  Age          City
0     a    John   34      New York
1     b   Smith   29   Los Angeles
2     c    Dave   30       Chicago
3     d   James   25       Houston
4     e  Robert   32  Philadelphia
5     f   Maria   27       Phoenix

The custom index is moved into a new column named index, and the dataframe gets a default integer index starting at 0.
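
If the column name index is not descriptive enough, one option is to name the index before resetting it, since reset_index() uses the index name for the new column. The following is a minimal sketch with a shortened dataframe; the name Letter is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Smith'], 'Age': [34, 29]},
                  index=['a', 'b'])

# give the index a name, then reset it; the new column takes that name
named = df.rename_axis('Letter').reset_index()

print(named.columns.tolist())   # ['Letter', 'Name', 'Age']
print(named['Letter'].tolist()) # ['a', 'b']
```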


Reset Index and Remove Old Index

In this example, we again use letters of the alphabet as the index. The snippet below creates the dataframe without calling reset_index(), so the custom index is still in place.

# Import pandas package
import pandas as pd

data = {'Name': ['John', 'Smith', 'Dave', 'James', 'Robert', 'Maria'],
        'Age': [34, 29, 30, 25, 32, 27],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Philadelphia', 'Phoenix']}

# custom index
alpha = ['a', 'b', 'c', 'd', 'e', 'f']

# create dataframe with custom index
df = pd.DataFrame(data, index=alpha)

print(df)

Output:

     Name  Age          City
a    John   34      New York
b   Smith   29   Los Angeles
c    Dave   30       Chicago
d   James   25       Houston
e  Robert   32  Philadelphia
f   Maria   27       Phoenix
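
To actually remove the old index, the letters can be discarded by passing drop=True to reset_index(), in the same way as in the "Reset Index to Start at 0" example. A minimal sketch, assuming the same data and custom index as above:

```python
import pandas as pd

data = {'Name': ['John', 'Smith', 'Dave', 'James', 'Robert', 'Maria'],
        'Age': [34, 29, 30, 25, 32, 27],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Philadelphia', 'Phoenix']}

# custom index
alpha = ['a', 'b', 'c', 'd', 'e', 'f']
df = pd.DataFrame(data, index=alpha)

# drop=True discards the letters instead of moving them into a column
df = df.reset_index(drop=True)

print(df.index.tolist())    # [0, 1, 2, 3, 4, 5]
print(df.columns.tolist())  # ['Name', 'Age', 'City']
```

Unlike the previous section, no index column is added, so the letters are gone from the dataframe entirely.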

Conclusion

Resetting the index is a common step after filtering or otherwise manipulating a DataFrame, because it replaces leftover index values with a clean, consecutive index. In this article, we explored the reset_index() method, its drop and inplace parameters, and how it behaves with filtered dataframes and custom indexes.

By understanding these techniques and using the provided code examples, you can confidently reset the index in pandas and effectively manage your data analysis tasks.

Happy Learning!