Python Pivot Table Cheat Sheet: Simplify Data Analysis Like a Pro
Master pivot tables in Python using Pandas with this simple and intuitive cheat sheet. By using easy-to-understand metaphors, you’ll learn to summarize and analyze data effortlessly, making complex data insights feel as simple as organizing items in your favorite store.
![sachin](/media/avatars/prof_pic.png)
By sachin on November 03, 2024
![Python Pivot Table Cheat Sheet: Simplify Data Analysis Like a Pro](/media/cheatsheets/pivot_table_cheatsheet.png)
1. Creating a Basic Pivot Table
pivot_table = df.pivot_table(values='Revenue',
index='Department',
aggfunc='sum')
Metaphor: Summarizing the revenue earned by each department and organizing the totals into bins labeled by department names.
2. Using Multiple Indexes
pivot_table = df.pivot_table(values='Revenue',
index=['Department', 'Items Sold'],
aggfunc='sum')
Metaphor: Nesting smaller bins of items sold within larger bins of departments for detailed categorization.
3. Multiple Aggregation Functions
pivot_table = df.pivot_table(values=['Items Sold', 'Revenue'],
index='Department',
aggfunc={'Items Sold': 'sum', 'Revenue': 'mean'})
Metaphor: Calculating both the total items sold and the average revenue per department, like keeping dual statistics for each shelf.
4. Adding Columns
pivot_table = df.pivot_table(values='Revenue',
index='Department',
columns='Items Sold',
aggfunc='sum',
fill_value=0)
Metaphor: Creating labeled sections on a shelf for each number of items sold, making the data easier to interpret.
5. Handling Missing Data
pivot_table = df.pivot_table(values='Revenue',
index='Department',
columns='Items Sold',
aggfunc='sum',
fill_value=0)
Metaphor: Ensuring every bin has a placeholder, even when no data is available, to maintain a tidy organization.
6. Using Margins for Totals
pivot_table = df.pivot_table(values='Revenue',
index='Department',
aggfunc='sum',
margins=True,
margins_name='Total')
Metaphor: Adding a "Total" bin at the end of each shelf to display the overall summary for easy comparison.
7. Sorting Pivot Tables
pivot_table.sort_values(by='Revenue', ascending=False)
Metaphor: Reorganizing bins so the department with the highest revenue is displayed first, like prioritizing the best-sellers.
8. Stacking and Unstacking
stacked = pivot_table.stack()
unstacked = stacked.unstack()
Metaphor: Rearranging data between vertical stacks and horizontal rows for better clarity and organization.
9. Reindexing
reindexed = pivot_table.reindex(['Produce', 'Bakery'])
Metaphor: Rearranging bins on a shelf to follow a specific order, like customizing the layout of a store.
10. Swapping Levels
swapped = pivot_table.swaplevel()
Metaphor: Switching the labels of nested bins to alter the hierarchy for a new perspective.
11. Using pd.Grouper for Date Grouping
grouped_pivot = df.pivot_table(values='Revenue',
index=pd.Grouper(key='Date', freq='W'),
aggfunc='sum')
Metaphor: Grouping data by weeks to analyze trends over time, like summarizing sales by time periods.
12. Applying Custom Functions
def custom_agg(series):
return (series.max() - series.min()) / 2
pivot_table = df.pivot_table(values='Revenue',
index='Department',
aggfunc=custom_agg)
Metaphor: Using your own recipe to create a custom summary for each department, like calculating half the range of sales.
13. Filtering Data in Pivot Tables
filtered_pivot = pivot_table[pivot_table['Revenue'] > 1000]
Metaphor: Removing bins with low revenue to focus on the high-performing departments.
14. Using Multiple Columns in Values
pivot_table = df.pivot_table(values=['Revenue', 'Profit'],
index='Department',
aggfunc='sum')
Metaphor: Adding more layers of information to your shelves, such as both earnings and profit for each department.
15. Renaming Aggregation Results
pivot_table.rename(columns={'sum': 'Total Revenue'}, inplace=True)
Metaphor: Relabeling bins to make the data clearer, like changing "sum" to "Total Revenue".
16. Normalizing Data
pivot_table = df.pivot_table(values='Revenue',
index='Department',
aggfunc='sum')
pivot_table['Normalized Revenue'] = pivot_table['Revenue'] / pivot_table['Revenue'].sum()
Metaphor: Calculating each department's share of the total revenue, like finding percentages of the whole pie.
17. Using Lambda Functions
pivot_table = df.pivot_table(values='Revenue',
index='Department',
aggfunc=lambda x: x.max() - x.min())
Metaphor: Adding a quick calculation without writing a separate function, like measuring the revenue range directly.
18. Pivot Table with Percentage Change
pivot_table['% Change'] = pivot_table['Revenue'].pct_change()
Metaphor: Tracking how revenue increases or decreases over time, like measuring weekly growth or decline.
19. Adding Subtotals
pivot_table = df.pivot_table(values='Revenue',
index=['Region', 'Department'],
aggfunc='sum',
margins=True,
margins_name='Subtotal')
Metaphor: Calculating the subtotal revenue for each region in addition to departmental revenue, like adding subtotals to an invoice.
20. Combining Multiple Pivot Tables
pivot_1 = df.pivot_table(values='Revenue', index='Department', aggfunc='sum')
pivot_2 = df.pivot_table(values='Profit', index='Department', aggfunc='sum')
combined = pivot_1.join(pivot_2, lsuffix='_Revenue', rsuffix='_Profit')
Metaphor: Combining separate shelves into a single mega-shelf to see all data side by side.
21. Formatting Pivot Table Output
pivot_table.style.format({'Revenue': '${:,.2f}',
'Profit': '${:,.2f}'})
Metaphor: Decorating bins with labels and styles, like making a data display visually appealing.
22. Filtering Based on Custom Conditions
filtered = pivot_table.query('Revenue > 5000 and Profit > 1000')
Metaphor: Displaying only high-revenue and high-profit bins, like prioritizing top performers.
23. Creating Multi-Level Columns
pivot_table = df.pivot_table(values=['Revenue', 'Profit'],
index='Department',
columns='Region',
aggfunc='sum')
Metaphor: Organizing shelves by both region and department to create a two-dimensional data hierarchy.
24. Using `melt` to Reshape Data
# Example DataFrame
import pandas as pd
data = {
'Department': ['Produce', 'Bakery', 'Deli'],
'Items Sold': [200, 150, 100],
'Revenue': [500, 400, 300]
}
df = pd.DataFrame(data)
# Melting the DataFrame
melted_df = df.melt(id_vars=['Department'],
value_vars=['Items Sold', 'Revenue'],
var_name='Metric',
value_name='Value')
print(melted_df)
Output:
Department Metric Value
0 Produce Items Sold 200
1 Bakery Items Sold 150
2 Deli Items Sold 100
3 Produce Revenue 500
4 Bakery Revenue 400
5 Deli Revenue 300
Metaphor: Imagine unpacking multiple shelves into a single conveyor belt for streamlined processing and inspection.
Quick Tips
- Use
aggfunc
for advanced summarization, including custom and lambda functions. - Utilize
style.format()
to format data for better readability. - Experiment with
query()
for dynamic filtering based on conditions. - Use
pct_change()
to track trends and changes over time. - Use
fill_value=0
to replace missing data. - Leverage
pd.Grouper
for advanced date-based grouping. - Use
melt()
to reshape wide DataFrames into long formats. - Apply
reset_index()
to flatten multi-level indexes. - Experiment with
aggfunc
for different summarizations (e.g.,'mean'
,'median'
,'count'
).