Calculating at the Right Level

BUS220 - Business Intelligence and Analytics | Week 5

Oleh Omelchenko

2026-04-06

Assumptions vs. evidence

Two observations, one story?


Time A Time B What it looks like
22°C at 9am 22°C at 9pm Stable temperature
1,200 rides in Oct 820 rides in Nov Ridership crisis
62% life expectancy in 1960 75% in 2015 Steady improvement


Each row hides something. Let’s look at what’s between the endpoints.

Endpoint illusions

Your apartment. 22°C at 9am. 22°C at 9pm. Perfectly stable, right?


Someone left the oven on for four hours. Two data points hid the entire event.

When the overall trend lies

Two exercise programs tested on 200 people. Overall results:


Program People Lost weight Success rate
Yoga 100 62 62%
CrossFit 100 55 55%


Yoga wins. Recommend it to everyone?

Break it down by starting fitness


Starting fitness Program People Lost weight Success rate
Already active Yoga 80 56 70%
CrossFit 20 16 80%
Sedentary Yoga 20 6 30%
CrossFit 80 39 49%


CrossFit wins in both groups. Yoga looked better overall only because it attracted the easier-to-help population.

This is Simpson’s paradox — the aggregate reverses when you control for a hidden variable.

Simpson’s paradox, visually


The best way to build intuition for this is to see it move. Open this interactive and drag the grouping variable:


setosa.io/simpsons


Watch the overall trend reverse when you separate the groups — nothing changes except the level of analysis.


Detecting this requires looking at the data at two different levels — the aggregate and the segment. You need tools that let you control the grain of your calculation.

The common thread


Mishap The assumption The check
Seasonal panic “Month-to-month drop = crisis” Compare year-over-year
Endpoint illusion “Start and end match = stable” Show the full timeline
Simpson’s paradox “Overall winner = winner everywhere” Break down by subgroup


Each assumption sounds reasonable. Each one is wrong. The only way to tell is to check at a different level.

And sometimes the honest answer is: “We don’t have the data to check.”

Not all calculations are equal

Four types of calculations


Layer What it computes Example Depends on view?
Row-level One value per row, before grouping DATEDIFF(‘day’, [Order Date], [Ship Date]) No
Aggregate Groups rows by dimensions in the view SUM([Sales]), AVG([Profit]) Yes — changes with dimensions
Table calculation Runs on the aggregated results Running total, % of total, rank, MoM % Yes — changes with layout
LOD expression You specify the grain {FIXED [Customer ID] : MIN([Order Date])} No — ignores the view

Layer 1: row-level — each row gets its own value

The Profit and Margin columns are row-level calculations: [Revenue] - [Cost] and ([Revenue] - [Cost]) / [Revenue]. Each row computed independently.

Layer 2: aggregate — what changes with the grouping

Group by category

Electronics leads revenue. Books leads margin.

Group by customer

Alex is 37% of total revenue.


One formula — SUM([Revenue]) — but the grouping determines what you see.

Layer 3: table calculations — derived from the layout

Running total — monthly revenue accumulates across months:

March dipped from February, but the cumulative trajectory shows the business is growing overall. This is how you answer the CFO who asks “are we going to beat last year?” — overlay two running totals and see where they cross.

Layer 3: where is the mix shifting?

The same data, broken down by category per month — first as raw revenue, then as percent of total:

Books collapsed from 29% to single digits. Home grew from 23% to 39%. You wouldn’t see these shifts in the raw revenue totals on the left.

Layer 4: LOD — what the view can’t see

{FIXED [Customer] : MIN([Order Date])} — each customer’s first purchase date, attached to every row.

LOD in action: new vs. returning revenue

January is all new customers. By March, most revenue comes from returning customers — with only Fay as a new arrival.

The execution order


Data Source
LOD FIXED
Dimension Filters
Aggregation
Table Calculations
Table Filter


LOD runs early — before most filters. Aggregation happens next. Table calculations run last — on the results you see on screen.

Which layer do you need?

Quiz time


For each question: which type of calculation do you need?

Row-level
Aggregate
Table calculation
LOD expression

One question to decide


“If I add another dimension to this chart, should this number change?”


Yes → table calculation. The result depends on the view layout.

No → LOD expression. The result is fixed at the grain you specify.

Practice: diagnosing the Olist stagnation

You’re continuing with your Olist workbook from last week.


Last week you found the stagnation. This week you diagnose it — from two angles:


Products: group the 73 categories into macro-categories, then use table calculations to find where the decline is concentrated.

Customers: use LOD expressions to identify each customer’s first purchase date, split new vs. returning, and build a cohort retention view.