Pandas Create DataFrame from List of Dicts


A list of dictionaries is a structured way to represent tabular data, where each dictionary corresponds to a row in the eventual DataFrame. Each key-value pair in a dictionary represents a column and its value.

Pandas provides seamless ways to create a DataFrame from different data structures. In this tutorial, we will learn how to create a DataFrame from a list of dictionaries.

    Table of Contents

  1. Creating DataFrame from List of Dicts 📜
  2. Customizing Column Order 🏷️
  3. Handling Missing Values 🕵️‍♂️
  4. Conclusion 🌟

1. Creating DataFrame from List of Dicts 📜

Creating a DataFrame from a list of dictionaries is as simple as passing the list to the DataFrame() constructor.

Let's create a list of dictionaries representing the sales data of a company.

import pandas as pd

sales_data = [
    {'name': 'John', 'product': 'apple', 'units': 10, 'price': 0.5},
    {'name': 'Mary', 'product': 'banana', 'units': 5, 'price': 0.2},
    {'name': 'Peter', 'product': 'apple', 'units': 2, 'price': 0.5},
    {'name': 'John', 'product': 'banana', 'units': 3, 'price': 0.2}
]

# 👇 Create DataFrame from list of dicts
df = pd.DataFrame(sales_data)
print(df)

Output:

    name product  units  price
0   John   apple     10    0.5
1   Mary  banana      5    0.2
2  Peter   apple      2    0.5
3   John  banana      3    0.2

As you can see, each dictionary in the list becomes a row in the DataFrame. The keys of the dictionaries become the column names, and each value fills the cell for its column in that row.
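To confirm how the keys and values were mapped, you can inspect the resulting DataFrame. The snippet below is a minimal sketch that uses a shortened copy of the sales data (two rows only, for brevity) and looks at the column names, the shape, and the inferred column types.

```python
import pandas as pd

# Shortened copy of the sales data, used only for illustration
sales_data = [
    {'name': 'John', 'product': 'apple', 'units': 10, 'price': 0.5},
    {'name': 'Mary', 'product': 'banana', 'units': 5, 'price': 0.2},
]

df = pd.DataFrame(sales_data)

# Column names come from the dictionary keys, in the order the keys appear
print(list(df.columns))

# One row per dictionary, one column per distinct key
print(df.shape)

# Column types are inferred from the values in each column
print(df.dtypes)
```

Checking `df.dtypes` right after construction is a quick way to spot columns that were not inferred the way you expected.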


2. Customizing Column Order 🏷️

The arrangement of columns in a DataFrame matters. We often analyze data by looking at a few columns and ignoring the rest, so it helps to arrange the columns in a way that makes the data easy to read.

Let's set the order of columns in the DataFrame.

sales_data = [
    {'name': 'John', 'product': 'apple', 'units': 10, 'price': 0.5},
    {'name': 'Mary', 'product': 'banana', 'units': 5, 'price': 0.2},
    {'name': 'Peter', 'product': 'apple', 'units': 2, 'price': 0.5},
    {'name': 'John', 'product': 'banana', 'units': 3, 'price': 0.2}
]

# 👇 Create DataFrame from list of dicts
# and customize column order
df = pd.DataFrame(sales_data, columns=['product', 'price', 'units', 'name'])
print(df)

Output:

  product  price  units   name
0   apple    0.5     10   John
1  banana    0.2      5   Mary
2   apple    0.5      2  Peter
3  banana    0.2      3   John

As you can see, the columns are arranged in the order we specified.
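The columns argument can do more than reorder. The sketch below, again on a shortened copy of the sales data, shows three related cases: keeping only some of the keys, naming a column that no dictionary contains (the 'discount' name here is hypothetical and chosen only for illustration), and reordering a DataFrame that already exists.

```python
import pandas as pd

# Shortened copy of the sales data, used only for illustration
sales_data = [
    {'name': 'John', 'product': 'apple', 'units': 10, 'price': 0.5},
    {'name': 'Mary', 'product': 'banana', 'units': 5, 'price': 0.2},
]

# Keep only a subset of the keys; the other keys are dropped
df_subset = pd.DataFrame(sales_data, columns=['product', 'units'])
print(df_subset)

# A name that is not a key in any dictionary yields a column of NaN
# ('discount' is a hypothetical column name)
df_extra = pd.DataFrame(sales_data, columns=['product', 'discount'])
print(df_extra)

# Reorder the columns of an existing DataFrame by selecting them in a new order
df = pd.DataFrame(sales_data)
df_reordered = df[['product', 'price', 'units', 'name']]
print(df_reordered)
```

Passing columns at construction time is convenient when you know the layout up front, while selecting with a list of names works on any DataFrame you already have.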


3. Handling Missing Values 🕵️‍♂️

In real-world scenarios, some dictionaries might be missing keys that others have. Pandas handles this automatically by filling the gaps with NaN (Not a Number).

import pandas as pd

# Introduce missing values in one dictionary
data_list_of_dicts_missing = [
    {'Name': 'Alice', 'Age': 25, 'City': 'NY'},
    {'Name': 'Bob', 'Age': 30, 'City': 'LA'},
    {'Name': 'Charlie', 'City': 'SF'}
]

# 👇 Create DataFrame with missing values
df_missing_values = pd.DataFrame(data_list_of_dicts_missing)

print("DataFrame with Missing Values:")
print(df_missing_values)

Output:

      Name   Age City
0    Alice  25.0   NY
1      Bob  30.0   LA
2  Charlie   NaN   SF

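As you can see, the DataFrame has NaN in the Age column for Charlie, whose dictionary has no 'Age' key. The other ages are shown as 25.0 and 30.0 because NaN is a floating-point value, so Pandas stores the whole column as floats.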
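Once the DataFrame is built, you will usually want to decide what to do with the gaps. The following is a minimal sketch of three common options, not the only way to do it: detecting the missing entries, filling them with a placeholder (the value 0 is an arbitrary choice for illustration), and converting the column to the nullable 'Int64' type so the ages stay integers.

```python
import pandas as pd

data_list_of_dicts_missing = [
    {'Name': 'Alice', 'Age': 25, 'City': 'NY'},
    {'Name': 'Bob', 'Age': 30, 'City': 'LA'},
    {'Name': 'Charlie', 'City': 'SF'}
]

df = pd.DataFrame(data_list_of_dicts_missing)

# Boolean mask marking the rows where Age is missing
print(df['Age'].isna())

# Fill the missing Age with a placeholder (0 is an arbitrary choice);
# the column remains a float column
df_filled = df.fillna({'Age': 0})
print(df_filled)

# Convert Age to the nullable integer type; the gap is kept as <NA>
df_nullable = df.astype({'Age': 'Int64'})
print(df_nullable)
```

Which option fits depends on the analysis: a placeholder can distort statistics such as the mean, whereas the nullable type keeps the value explicitly missing.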


4. Conclusion 🌟

Creating a Pandas DataFrame from a list of dictionaries is a fundamental operation in data manipulation.

Whether your data is well-structured or contains missing values, Pandas provides the tools to convert your list of dictionaries into a structured DataFrame effortlessly.