Python Pivot Table Cheat Sheet: Simplify Data Analysis Like a Pro

Master pivot tables in Python using Pandas with this simple and intuitive cheat sheet. By using easy-to-understand metaphors, you’ll learn to summarize and analyze data effortlessly, making complex data insights feel as simple as organizing items in your favorite store.
sachin

By sachin on November 03, 2024

Python Pivot Table Cheat Sheet: Simplify Data Analysis Like a Pro

1. Creating a Basic Pivot Table

pivot_table = df.pivot_table(values='Revenue', 
                               index='Department', 
                               aggfunc='sum')

Metaphor: Summarizing the revenue earned by each department and organizing the totals into bins labeled by department names.

2. Using Multiple Indexes

pivot_table = df.pivot_table(values='Revenue', 
                               index=['Department', 'Items Sold'], 
                               aggfunc='sum')

Metaphor: Nesting smaller bins of items sold within larger bins of departments for detailed categorization.

3. Multiple Aggregation Functions

pivot_table = df.pivot_table(values=['Items Sold', 'Revenue'], 
                               index='Department', 
                               aggfunc={'Items Sold': 'sum', 'Revenue': 'mean'})

Metaphor: Calculating both the total items sold and the average revenue per department, like keeping dual statistics for each shelf.

4. Adding Columns

pivot_table = df.pivot_table(values='Revenue', 
                               index='Department', 
                               columns='Items Sold', 
                               aggfunc='sum', 
                               fill_value=0)

Metaphor: Creating labeled sections on a shelf for each number of items sold, making the data easier to interpret.

5. Handling Missing Data

pivot_table = df.pivot_table(values='Revenue', 
                               index='Department', 
                               columns='Items Sold', 
                               aggfunc='sum', 
                               fill_value=0)

Metaphor: Ensuring every bin has a placeholder, even when no data is available, to maintain a tidy organization.

6. Using Margins for Totals

pivot_table = df.pivot_table(values='Revenue', 
                               index='Department', 
                               aggfunc='sum', 
                               margins=True, 
                               margins_name='Total')

Metaphor: Adding a "Total" bin at the end of each shelf to display the overall summary for easy comparison.

7. Sorting Pivot Tables

pivot_table.sort_values(by='Revenue', ascending=False)

Metaphor: Reorganizing bins so the department with the highest revenue is displayed first, like prioritizing the best-sellers.

8. Stacking and Unstacking

stacked = pivot_table.stack()
unstacked = stacked.unstack()

Metaphor: Rearranging data between vertical stacks and horizontal rows for better clarity and organization.

9. Reindexing

reindexed = pivot_table.reindex(['Produce', 'Bakery'])

Metaphor: Rearranging bins on a shelf to follow a specific order, like customizing the layout of a store.

10. Swapping Levels

swapped = pivot_table.swaplevel()

Metaphor: Switching the labels of nested bins to alter the hierarchy for a new perspective.

11. Using pd.Grouper for Date Grouping

grouped_pivot = df.pivot_table(values='Revenue', 
                                  index=pd.Grouper(key='Date', freq='W'), 
                                  aggfunc='sum')

Metaphor: Grouping data by weeks to analyze trends over time, like summarizing sales by time periods.

12. Applying Custom Functions

def custom_agg(series):
    return (series.max() - series.min()) / 2

pivot_table = df.pivot_table(values='Revenue', 
                             index='Department', 
                             aggfunc=custom_agg)

Metaphor: Using your own recipe to create a custom summary for each department, like calculating half the range of sales.

13. Filtering Data in Pivot Tables

filtered_pivot = pivot_table[pivot_table['Revenue'] > 1000]

Metaphor: Removing bins with low revenue to focus on the high-performing departments.

14. Using Multiple Columns in Values

pivot_table = df.pivot_table(values=['Revenue', 'Profit'], 
                               index='Department', 
                               aggfunc='sum')

Metaphor: Adding more layers of information to your shelves, such as both earnings and profit for each department.

15. Renaming Aggregation Results

pivot_table.rename(columns={'sum': 'Total Revenue'}, inplace=True)

Metaphor: Relabeling bins to make the data clearer, like changing "sum" to "Total Revenue".

16. Normalizing Data

pivot_table = df.pivot_table(values='Revenue', 
                               index='Department', 
                               aggfunc='sum')
pivot_table['Normalized Revenue'] = pivot_table['Revenue'] / pivot_table['Revenue'].sum()

Metaphor: Calculating each department's share of the total revenue, like finding percentages of the whole pie.

17. Using Lambda Functions

pivot_table = df.pivot_table(values='Revenue', 
                               index='Department', 
                               aggfunc=lambda x: x.max() - x.min())

Metaphor: Adding a quick calculation without writing a separate function, like measuring the revenue range directly.

18. Pivot Table with Percentage Change

pivot_table['% Change'] = pivot_table['Revenue'].pct_change()

Metaphor: Tracking how revenue increases or decreases over time, like measuring weekly growth or decline.

19. Adding Subtotals

pivot_table = df.pivot_table(values='Revenue', 
                               index=['Region', 'Department'], 
                               aggfunc='sum', 
                               margins=True, 
                               margins_name='Subtotal')

Metaphor: Calculating the subtotal revenue for each region in addition to departmental revenue, like adding subtotals to an invoice.

20. Combining Multiple Pivot Tables

pivot_1 = df.pivot_table(values='Revenue', index='Department', aggfunc='sum')
pivot_2 = df.pivot_table(values='Profit', index='Department', aggfunc='sum')
combined = pivot_1.join(pivot_2, lsuffix='_Revenue', rsuffix='_Profit')

Metaphor: Combining separate shelves into a single mega-shelf to see all data side by side.

21. Formatting Pivot Table Output

pivot_table.style.format({'Revenue': '${:,.2f}', 
                            'Profit': '${:,.2f}'})

Metaphor: Decorating bins with labels and styles, like making a data display visually appealing.

22. Filtering Based on Custom Conditions

filtered = pivot_table.query('Revenue > 5000 and Profit > 1000')

Metaphor: Displaying only high-revenue and high-profit bins, like prioritizing top performers.

23. Creating Multi-Level Columns

pivot_table = df.pivot_table(values=['Revenue', 'Profit'], 
                               index='Department', 
                               columns='Region', 
                               aggfunc='sum')

Metaphor: Organizing shelves by both region and department to create a two-dimensional data hierarchy.

24. Using `melt` to Reshape Data

# Example DataFrame
import pandas as pd

data = {
    'Department': ['Produce', 'Bakery', 'Deli'],
    'Items Sold': [200, 150, 100],
    'Revenue': [500, 400, 300]
}
df = pd.DataFrame(data)

# Melting the DataFrame
melted_df = df.melt(id_vars=['Department'], 
                    value_vars=['Items Sold', 'Revenue'], 
                    var_name='Metric', 
                    value_name='Value')

print(melted_df)

Output:


  Department        Metric  Value
0   Produce   Items Sold    200
1    Bakery   Items Sold    150
2      Deli   Items Sold    100
3   Produce      Revenue    500
4    Bakery      Revenue    400
5      Deli      Revenue    300

Metaphor: Imagine unpacking multiple shelves into a single conveyor belt for streamlined processing and inspection.

Quick Tips

  • Use aggfunc for advanced summarization, including custom and lambda functions.
  • Utilize style.format() to format data for better readability.
  • Experiment with query() for dynamic filtering based on conditions.
  • Use pct_change() to track trends and changes over time.
  • Use fill_value=0 to replace missing data.
  • Leverage pd.Grouper for advanced date-based grouping.
  • Use melt() to reshape wide DataFrames into long formats.
  • Apply reset_index() to flatten multi-level indexes.
  • Experiment with aggfunc for different summarizations (e.g., 'mean', 'median', 'count').
Tags: Aggregation Cheat Sheet Data Analysis Data Engineering Data Science Data Summarization Machine Learning Pandas Pivot Table Python Simple Explanations