10 Ways to Create Pandas DataFrame


One of the most amazing tools for data manipulation in Python is the Pandas library. But what makes Pandas so powerful? 🤔

It is the DataFrame, a highly versatile and robust data structure that serves as the backbone of managing and arranging data.

This article takes a close look at the Pandas DataFrame: how it compares with other data structures, what one looks like in practice, and the various ways you can construct one.

    Table of Contents

  1. Understanding DataFrame
  2. DataFrame vs. Other Data Objects
  3. Example of a Pandas DataFrame
  4. Creating Pandas DataFrame
    1. Creating Empty DataFrame
    2. Creating DataFrame from Dictionary
    3. Creating DataFrame from List of Lists
    4. Creating DataFrame from List of Dictionaries
    5. Creating DataFrame using zip()
    6. Creating DataFrame from Dictionary of Series
    7. Creating DataFrame with Custom Index
    8. Creating DataFrame from CSV File
    9. Creating DataFrame from Excel File
    10. Creating DataFrame from JSON File
  5. Conclusion

Understanding DataFrame

A Pandas DataFrame is a two-dimensional, table-like data structure with labeled rows and columns.

You can compare it to a spreadsheet or SQL table, where data is organized in rows and columns, and each column can have a specific data type. This structure allows for easy manipulation, analysis, and cleaning of data.
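To make the idea of per-column data types concrete, here is a minimal sketch. The column names and values are made up for illustration; shape and dtypes are standard DataFrame attributes.

```python
import pandas as pd

# Illustrative data: one text column and one integer column
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

# (number of rows, number of columns)
print(df.shape)

# Each column carries its own data type
print(df.dtypes)
```

The exact dtype reported for the text column can differ between Pandas versions, so treat the printed names as indicative.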


DataFrame vs. Other Data Objects


Example of a Pandas DataFrame

The pandas.DataFrame() constructor is used to create a DataFrame in Pandas.

Syntax:

pandas.DataFrame(data=None, index=None, columns=None)

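Here, data is the input (for example a dictionary, a list, or a NumPy array), index sets the row labels, and columns sets the column labels. For example: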

import pandas as pd

# sample data
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}

# Creating a DataFrame from a dictionary
df = pd.DataFrame(data)

print(df)

Output:

      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles

Creating Pandas DataFrame

You can create a DataFrame from almost any data structure you have, be it a dictionary, a list, a NumPy array, or another DataFrame.

Whatever form your data takes, one of the methods below should let you create a DataFrame from it.
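The numbered sections below do not cover NumPy arrays or existing DataFrames, so here is a small sketch of both. The array values and the column names a, b, c are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative 2x3 array: each inner row becomes a DataFrame row
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

df_from_array = pd.DataFrame(arr, columns=['a', 'b', 'c'])

# Passing an existing DataFrame to the constructor builds a new DataFrame from it
df_from_df = pd.DataFrame(df_from_array)

print(df_from_array)
```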

1. Creating Empty DataFrame

To create a DataFrame, call the pd.DataFrame() function and pass the data to it.

If you pass only a list of column names to the columns parameter, it creates an empty DataFrame with those columns. If you pass nothing at all, it creates an empty DataFrame with no columns.

import pandas as pd

# 👇 Creating an empty DataFrame
df = pd.DataFrame(columns=['Name', 'Age', 'City'])

print(df)

Output:

Empty DataFrame
Columns: [Name, Age, City]
Index: []

Also, learn how to check if a dataframe is empty.
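As a quick sketch of both variants, the snippet below builds a DataFrame with no arguments and one with column names only, then checks them with the empty attribute. The variable names are illustrative.

```python
import pandas as pd

# No arguments at all: no rows and no columns
df_nothing = pd.DataFrame()

# Column names only: the columns exist, but there are no rows
df_cols = pd.DataFrame(columns=['Name', 'Age', 'City'])

print(df_nothing.empty)   # True
print(df_cols.empty)      # True
print(df_cols.shape)      # (0, 3)
```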


2. Creating DataFrame from Dictionary

One of the most common ways to create a DataFrame is from a dictionary.

Each key of the dictionary represents a column name and the corresponding value is a list of column values.

import pandas as pd

# sample data
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}

# 👇 Creating a DataFrame from a dictionary
df = pd.DataFrame(data)

print(df)

Output:

      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
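One thing to watch with this approach is that the lists in the dictionary need to be the same length. The sketch below uses deliberately broken sample data to show what happens otherwise; the variable names are illustrative.

```python
import pandas as pd

# 'Age' has one value fewer than 'Name' (deliberately broken sample data)
bad_data = {'Name': ['Alice', 'Bob', 'Charlie'],
            'Age': [25, 30]}

try:
    pd.DataFrame(bad_data)
    error_raised = False
except ValueError as exc:
    # Pandas refuses to build a DataFrame from columns of unequal length
    error_raised = True
    print('Could not build DataFrame:', exc)
```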

3. Creating DataFrame from List of Lists

Another way to create a DataFrame is from a list of lists. Here each inner list represents a row and the outer list represents the whole DataFrame.

Keep all the inner lists the same length so that every row supplies a value for every column.

Pass the column names explicitly through the columns parameter; otherwise Pandas labels the columns 0, 1, 2, and so on.

import pandas as pd

# sample data
data = [['Alice', 25, 'New York'],
        ['Bob', 30, 'San Francisco'],
        ['Charlie', 35, 'Los Angeles']]

# 👇 Creating a DataFrame from a list of lists
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])

print(df)

Output:

      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
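If you leave out the columns argument, the DataFrame is still created, just with numeric column labels. A minimal sketch, using illustrative data and variable names:

```python
import pandas as pd

data = [['Alice', 25, 'New York'],
        ['Bob', 30, 'San Francisco']]

# No columns argument: the columns get default integer labels
df_default = pd.DataFrame(data)
default_labels = list(df_default.columns)
print(default_labels)

# The labels can be replaced afterwards
df_default.columns = ['Name', 'Age', 'City']
print(df_default)
```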

4. Creating DataFrame from List of Dictionaries

The data you have may be in the form of a list of dictionaries. Here each dictionary represents a row and the keys of the dictionary represent the column names.

import pandas as pd

# sample data
data = [{'Name': 'Alice', 'Age': 25, 'City': 'New York'},
        {'Name': 'Bob', 'Age': 30, 'City': 'San Francisco'},
        {'Name': 'Charlie', 'Age': 35, 'City': 'Los Angeles'}]

# 👇 Creating a DataFrame from a list of dictionaries
df = pd.DataFrame(data)

print(df)

Output:

      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
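The dictionaries do not all need the same keys. In the sketch below, which uses made-up records, the second dictionary has no City key and Pandas fills the gap with a missing value.

```python
import pandas as pd

# Illustrative records: the second dictionary has no 'City' key
data = [{'Name': 'Alice', 'Age': 25, 'City': 'New York'},
        {'Name': 'Bob', 'Age': 30}]

df = pd.DataFrame(data)

# The missing entry is marked as a missing value
print(df)
print(df['City'].isna().tolist())
```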

5. Creating DataFrame using zip()

Suppose your data is spread across two separate lists and you need to create a DataFrame from them.

Combine the lists with the zip() function, convert the result to a list of tuples, and pass it to the pd.DataFrame() function.

import pandas as pd

# sample data
names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 35]

# list of tuples
data = list(zip(names, ages))

# 👇 Creating a DataFrame using zip()
df = pd.DataFrame(data, columns=['Name', 'Age'])

print(df)

Output:

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
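A caveat worth knowing: zip() stops at the shortest input, so lists of unequal length lose data without any error. The sketch below uses a deliberately short ages list to illustrate this.

```python
import pandas as pd

names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30]          # one value short, for illustration

# zip() stops at the shortest list, so 'Charlie' is silently dropped
df = pd.DataFrame(list(zip(names, ages)), columns=['Name', 'Age'])

print(len(df))
print(df)
```

Checking that the lists have equal length before zipping is one simple way to guard against this.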

6. Creating DataFrame from Dictionary of Series

A Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.).

Let's see how to create a DataFrame from a dictionary of Series.

import pandas as pd

name_series = pd.Series(['Alice', 'Bob', 'Charlie'])
age_series = pd.Series([25, 30, 35])
city_series = pd.Series(['New York', 'San Francisco', 'Los Angeles'])

# sample data
data = {
    'name': name_series,
    'age': age_series,
    'city': city_series
}

# 👇 Creating a DataFrame from a dictionary of Series
df = pd.DataFrame(data)

print(df)

Output:

      name  age           city
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
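Because each Series carries its own labels, Pandas lines the values up by label rather than by position. The sketch below, with made-up index labels, shows what happens when the Series only partly overlap.

```python
import pandas as pd

# Illustrative Series with partly different index labels
age_series = pd.Series([25, 30], index=['alice', 'bob'])
city_series = pd.Series(['San Francisco', 'Los Angeles'],
                        index=['bob', 'charlie'])

df = pd.DataFrame({'age': age_series, 'city': city_series})

# Rows are matched by label; labels missing from a Series get a missing value
print(df)
```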

7. Creating DataFrame with Custom Index

By default, a Pandas DataFrame has a numeric index starting from 0. But you can also create a DataFrame with a custom index, like "a", "b", "c", etc.

Let's create a DataFrame with a custom index.

import pandas as pd

# sample data
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}

# custom index
my_index = ['a', 'b', 'c']

# 👇 Creating a DataFrame with custom index
df = pd.DataFrame(data, index=my_index)

print(df)

Output:

      Name  Age           City
a    Alice   25       New York
b      Bob   30  San Francisco
c  Charlie   35    Los Angeles
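One practical reason to set a custom index is that rows can then be looked up by label. A minimal sketch using the loc accessor, with a shortened version of the sample data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]},
                  index=['a', 'b', 'c'])

# Look up a whole row by its label
print(df.loc['b'])

# Look up a single value by row label and column name
print(df.loc['b', 'Name'])
```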

8. Creating DataFrame from CSV File

Larger datasets are very often distributed as CSV files, so it is important to know how to create a DataFrame from one.

For this, use the read_csv() function of Pandas and pass it the path of the CSV file.

It returns a DataFrame built from the contents of the file.

import pandas as pd

# 👇 Creating a DataFrame from a CSV file
df = pd.read_csv('./data.csv')

print(df)
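If you do not have a CSV file at hand, the sketch below can be run as-is. It assumes made-up CSV content held in a string and wraps it in io.StringIO, since read_csv() accepts file-like objects as well as paths.

```python
import io
import pandas as pd

# Made-up CSV content standing in for a real file such as data.csv
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,San Francisco
Charlie,35,Los Angeles
"""

# read_csv() accepts a path or a file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df)
```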

9. Creating DataFrame from Excel File

Like CSV files, you can create a DataFrame from an Excel file.

For this, use the read_excel() function of Pandas and pass it the path of the Excel file. Reading .xlsx files also requires an Excel engine such as openpyxl to be installed.

For testing, you can convert your CSV file to an Excel file using an online tool.

import pandas as pd

# 👇 Creating a DataFrame from an Excel file
df = pd.read_excel('./data.xlsx')

print(df)

10. Creating DataFrame from JSON File

JSON files are also a popular way to store data. To convert JSON data into a Pandas DataFrame, use the read_json() function.

import pandas as pd

# 👇 Creating a DataFrame from a JSON file
df = pd.read_json('./data.json')

print(df)
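To try this without a JSON file on disk, here is a sketch that assumes a small made-up JSON string (a list of records) and passes it to read_json() through io.StringIO.

```python
import io
import pandas as pd

# Made-up JSON content: a list of records, one per row
json_text = '[{"Name": "Alice", "Age": 25}, {"Name": "Bob", "Age": 30}]'

# read_json() accepts a path or a file-like object
df = pd.read_json(io.StringIO(json_text))

print(df)
```

Real JSON files can be laid out in other ways; the orient parameter of read_json() tells Pandas which layout to expect.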

Conclusion

So, by now you have learned what a DataFrame is and how to create one from the most common kinds of data: dictionaries, lists, Series, and CSV, Excel, and JSON files.

Whichever form your data arrives in, you can now load it into a DataFrame and start working with it in your data-centric projects.

Happy Pythoning! 😇