BUS220 - Business Intelligence and Analytics | Week 4
Oleh Omelchenko
2026-04-08
From clean data to decisions
Your VP sends you a Slack message
VP of Sales · 10:23 AM
How are we doing?

What do you show them?

Option A: $2,300,000
Is this good?

Option B: $2,300,000 (+15% YoY, 102% of target)
Now you can act on it. A metric needs a comparison — a trend, a benchmark, or a target.
What does this dashboard tell you?
| Metric       | Value |
|--------------|-------|
| ARPDAU       | $0.42 |
| D7 Retention | 18%   |
| LTV/CAC      | 1.3   |
| MAU          | 824K  |
| K-factor     | 0.6   |
| DAU/MAU      | 0.31  |
If your audience needs a glossary to read the dashboard, the dashboard has failed. A metric that people can’t discuss in a meeting won’t change anyone’s behavior.
What makes a good metric?
The Lean Analytics framework gives four properties:
Comparative
Can be measured against time periods, segments, or benchmarks. “Conversion increased from 2% to 3%” is more useful than “conversion is 3%.”
Understandable
If people can’t remember it or discuss it, they won’t act on it. The best metrics fit in one sentence.
A ratio or rate
Ratios normalize for scale. Profit margin lets you compare a 10-person store with a 200-person store. Absolute profit doesn’t.
Behavior-changing
If the number moves, someone should do something different. If nobody’s behavior changes, why are you tracking it?
Vanity metrics feel good but don’t help
A vanity metric makes you feel good but doesn’t change what you do.
Classic examples:
Total registered users (includes everyone who signed up and left)
Page views (one person hitting refresh looks like engagement)
Number of followers (how many actually do something when you post?)
The actionable alternative is usually a ratio: not “total users” but “percent of users who are active this week.”
The car salesman test
A car dealer tells his customer: “You’ll get a satisfaction survey call. Can you give us a 5?”
The dealer spends more time asking for a good rating than providing a good experience. The satisfaction score becomes a vanity metric — it measures gaming, not satisfaction.
— Lean Analytics Ch 2, via Seth Godin
KPIs: the metrics that run the business
A Key Performance Indicator is a metric an organization monitors regularly to evaluate performance against a goal.
Not every metric is a KPI. A dashboard with 30 KPIs has zero KPIs — if everything is key, nothing is.
| Role            | They probably watch                         |
|-----------------|---------------------------------------------|
| CEO             | Revenue, margin, customer count, churn      |
| VP Sales        | Pipeline value, conversion rate, deal size  |
| VP Operations   | Delivery time, error rate, cost per unit    |
| Product Manager | Active users, feature adoption, NPS         |
The VP of Sales and the VP of Operations need different dashboards because they make different decisions.
Metrics come in pairs
Leading vs. lagging
A leading metric predicts the future. Number of qualified leads in the sales pipeline predicts next quarter’s revenue.
A lagging metric reports the past. Customer churn tells you people left — but by the time you measure it, they’re already gone.
Both are useful. Leading metrics give you time to act.
Correlated vs. causal
Ice cream sales and drowning deaths are correlated. Both go up in summer. Banning ice cream won’t save anyone.
Finding a correlation is useful — it helps you predict. Finding a cause is better — it lets you intervene. Causation requires controlled experiments, not just dashboards.
BI Detective
Can you figure out what the numbers actually mean?
BI Detective: Dashboard Investigation
Olena Koval, VP Operations — Divvy Bikeshare, Chicago
Hey! November numbers look bad — rides down 32%, average duration down 12%. Can you figure out what’s going on before our meeting today?
Open the game: Course website → Activities → Week 4 BI Detective. You have 15 minutes.
Work through the tabs left to right. Answer the questions to build your case.
Debrief: what did the investigation reveal?
The ride drop is seasonal, not a crisis
Nov 2023 had 7.6% more rides than Nov 2022. The Oct→Nov drop happens every year when it gets cold. The VP compared adjacent months in a seasonal business — the wrong baseline.
Casual riders drove the entire drop
Casual ridership fell 44%; member ridership fell only 27%. The overall average duration also dropped — not because anyone rides shorter, but because the long-ride group (casuals ride ~2x longer) shrank in the mix.
Dead ends taught us where NOT to look
Hourly patterns, bike types, stations — all dropped uniformly. When every value in a dimension moves the same way, that dimension doesn’t explain the change.
The Thursday trap
Thursday barely dropped (-1%). Resilient day?
No — October had 4 Thursdays, November had 5. More days = more rides. On a per-Thursday basis, rides still dropped sharply, in line with the rest of the week.
Always check whether your comparison periods have the same calendar structure.
From what to how
You now know what makes a good metric. The rest of the lecture is about how to compute them.
Column-level calculations
A new value for every row — like adding a formula column in a spreadsheet.
| Student | Course     | Region | Modules Done | Total Modules | Completed? |
|---------|------------|--------|--------------|---------------|------------|
| Anna    | Data 101   | EU     | 8            | 8             | 1          |
| Ben     | Data 101   | US     | 5            | 8             | 0          |
| Clara   | SQL Basics | EU     | 6            | 6             | 1          |
| Dan     | SQL Basics | US     | 3            | 6             | 0          |
| Eva     | SQL Basics | EU     | 6            | 6             | 1          |
IF [Modules Done] = [Total Modules] THEN 1 ELSE 0 END — each row gets its own result. The new column can then be summed, averaged, or counted in any grouping.
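The slide shows the Tableau formula; the same row-level idea can be sketched in plain Python (data copied from the table above — the dictionary field names are our own, not Tableau's):

```python
# Row-level calculation: each record gets its own result,
# like IF [Modules Done] = [Total Modules] THEN 1 ELSE 0 END above.
students = [
    {"student": "Anna",  "course": "Data 101",   "region": "EU", "done": 8, "total": 8},
    {"student": "Ben",   "course": "Data 101",   "region": "US", "done": 5, "total": 8},
    {"student": "Clara", "course": "SQL Basics", "region": "EU", "done": 6, "total": 6},
    {"student": "Dan",   "course": "SQL Basics", "region": "US", "done": 3, "total": 6},
    {"student": "Eva",   "course": "SQL Basics", "region": "EU", "done": 6, "total": 6},
]

for row in students:
    # One result per row -- the new column can later be summed, averaged, or counted.
    row["completed"] = 1 if row["done"] == row["total"] else 0

print([r["completed"] for r in students])  # [1, 0, 1, 0, 1]
```

Note that the loop never looks at other rows: a column-level calculation depends only on the row it sits in.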
Dynamic calculations
The same formula, different results — depending on how the view is grouped.
Overall:

| Students | Completed | Rate |
|----------|-----------|------|
| 5        | 3         | 60%  |

Grouped by Course:

| Course     | Students | Completed | Rate |
|------------|----------|-----------|------|
| Data 101   | 2        | 1         | 50%  |
| SQL Basics | 3        | 2         | 67%  |

Grouped by Region:

| Region | Students | Completed | Rate |
|--------|----------|-----------|------|
| EU     | 3        | 3         | 100% |
| US     | 2        | 0         | 0%   |
SUM([Completed]) / COUNT([Student]) — one formula, three different answers. The grouping determines what gets aggregated. This is how pivot tables, Tableau, and SQL all work.
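A minimal sketch of the same mechanism in Python, using the slide's data (the `completion_rate` helper and its `key` parameter are our own illustration, not a Tableau or SQL API):

```python
# One formula -- SUM(completed) / COUNT(student) -- three answers,
# depending on how the rows are grouped before aggregating.
from collections import defaultdict

students = [
    # (student, course, region, completed)
    ("Anna",  "Data 101",   "EU", 1),
    ("Ben",   "Data 101",   "US", 0),
    ("Clara", "SQL Basics", "EU", 1),
    ("Dan",   "SQL Basics", "US", 0),
    ("Eva",   "SQL Basics", "EU", 1),
]

def completion_rate(rows, key=None):
    """Apply the same aggregate formula within each group."""
    groups = defaultdict(list)
    for name, course, region, completed in rows:
        group = "Overall" if key is None else (course if key == "course" else region)
        groups[group].append(completed)
    # The formula is identical for every call; only the grouping changes.
    return {g: sum(v) / len(v) for g, v in groups.items()}

print(completion_rate(students))                # {'Overall': 0.6}
print(completion_rate(students, key="course"))  # Data 101: 0.5, SQL Basics: ~0.67
print(completion_rate(students, key="region"))  # EU: 1.0, US: 0.0
```

The formula never changes; the partition of the rows does. That is the whole trick behind pivot tables and GROUP BY.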
Additive vs. non-additive metrics
Not every metric can be summed across groups. Compare COUNT vs. COUNT DISTINCT on the same data:
Enrollments:

| Student | Course     |
|---------|------------|
| Anna    | Data 101   |
| Anna    | SQL Basics |
| Ben     | Data 101   |
| Clara   | SQL Basics |

COUNT (enrollments):

| Course     | Count |
|------------|-------|
| Data 101   | 2     |
| SQL Basics | 2     |
| Total      | 4 ✓   |

2 + 2 = 4. Correct.

COUNT DISTINCT (students):

| Course     | Unique |
|------------|--------|
| Data 101   | 2      |
| SQL Basics | 2      |
| Total      | 3 ≠ 4  |

2 + 2 ≠ 3. Anna is in both.
COUNT is additive — the parts sum to the whole. COUNT DISTINCT is not — Anna appears in both courses but is one unique student overall. When you slice a dashboard by a dimension, additive metrics stay correct. Non-additive metrics need to be recalculated, not summed.
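The same contrast, sketched in Python with the enrollment rows from above (sets play the role of COUNT DISTINCT):

```python
# COUNT is additive: per-course counts sum to the overall total.
# COUNT DISTINCT is not: Anna enrolls in both courses but is one person.
enrollments = [
    ("Anna", "Data 101"), ("Anna", "SQL Basics"),
    ("Ben", "Data 101"), ("Clara", "SQL Basics"),
]

count_per_course = {}
students_per_course = {}
for student, course in enrollments:
    count_per_course[course] = count_per_course.get(course, 0) + 1
    students_per_course.setdefault(course, set()).add(student)

# Additive: summing the parts reproduces the whole.
assert sum(count_per_course.values()) == len(enrollments)  # 2 + 2 == 4

# Non-additive: summing per-course distinct counts overcounts Anna.
summed_distinct = sum(len(s) for s in students_per_course.values())  # 4
true_distinct = len({student for student, _ in enrollments})         # 3
assert summed_distinct != true_distinct
```

The fix for non-additive metrics is the last line: recompute the distinct count over the combined rows instead of summing the per-group results.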
Calculated fields in Tableau
Tableau’s implementation of the same principle: derive new values from existing columns.
Four types of calculated fields
| Type        | Example                                       |
|-------------|-----------------------------------------------|
| Arithmetic  | SUM([Profit]) / SUM([Sales])                  |
| String      | LEFT([Product Name], 3)                       |
| Date        | DATEDIFF('day', [Order Date], [Ship Date])    |
| Conditional | IF [Profit] > 0 THEN "Profit" ELSE "Loss" END |
| Product | Order Date | Ship Date | Ship Days |
|---------|------------|-----------|-----------|
| Chair   | Mar 1      | Mar 4     | 3         |
| Binder  | Mar 1      | Mar 2     | 1         |
| Desk    | Mar 3      | Mar 10    | 7         |
Date calc: DATEDIFF('day', [Order Date], [Ship Date]). Each row gets its own result — this is column-level.
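The same date arithmetic in Python, using the orders above (the slide gives no year, so 2024 is an arbitrary choice; any year yields the same differences for these dates):

```python
# DATEDIFF('day', order, ship) sketched with the standard library.
from datetime import date

orders = [
    ("Chair",  date(2024, 3, 1), date(2024, 3, 4)),
    ("Binder", date(2024, 3, 1), date(2024, 3, 2)),
    ("Desk",   date(2024, 3, 3), date(2024, 3, 10)),
]

# Subtracting two dates yields a timedelta; .days is the day difference.
ship_days = {product: (ship - ordered).days for product, ordered, ship in orders}
print(ship_days)  # {'Chair': 3, 'Binder': 1, 'Desk': 7}
```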
Aggregation order matters
| Order | Sales   | Profit | Row Margin |
|-------|---------|--------|------------|
| A-001 | $100    | $10    | 10%        |
| A-002 | $10,000 | $500   | 5%         |
| Total | $10,100 | $510   | ?          |
AVG([Profit] / [Sales]) → (10% + 5%) / 2 = 7.5%
Averages the row-level ratios. Both orders count equally.

SUM([Profit]) / SUM([Sales]) → $510 / $10,100 ≈ 5.05%
Aggregates first, then divides. The $10K order dominates — as it should.
For business metrics, you almost always want the second version.
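A two-line demonstration of the gap, using the two orders above:

```python
# (sales, profit) pairs from the slide's table.
orders = [(100, 10), (10_000, 500)]

# Average of row-level ratios: every order gets equal weight.
avg_of_ratios = sum(p / s for s, p in orders) / len(orders)          # 0.075 -> 7.5%

# Aggregate first, then divide: orders are weighted by their size.
ratio_of_sums = sum(p for _, p in orders) / sum(s for s, _ in orders)  # 510/10100 -> ~5.05%

print(avg_of_ratios, ratio_of_sums)
```

The discrepancy grows with the spread in order sizes, which is exactly why the weighted version is the honest business number.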
NULLs in calculations
After a left join, unmatched rows get NULL — and NULLs behave differently than you might expect:
| Order | Sales | Returned | [Returned] = "Yes" |
|-------|-------|----------|--------------------|
| A-001 | $500  | Yes      | TRUE               |
| A-002 | $200  | NULL     | NULL — not FALSE   |
| A-003 | $80   | NULL     | NULL — not FALSE   |
A formula using a NULL input produces NULL. [Returned] = "Yes" does not evaluate to FALSE when [Returned] is missing — it evaluates to NULL. Use IFNULL or ISNULL to handle this explicitly.
Aggregations skip NULLs silently. AVG([Returned Value]) ignores NULL rows — it doesn't treat them as zero. Your average is computed over fewer rows than you think.
Filters exclude NULLs by default. If you filter on a field, NULL values disappear without warning.
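Python's `None` is not a perfect stand-in for SQL/Tableau NULL (in Python, `None == "Yes"` is simply `False`, not NULL), but the aggregation pitfall carries over directly. A sketch with hypothetical returned-value amounts:

```python
# Skipping NULLs changes the denominator of an average.
returned_values = [500, None, None]  # hypothetical per-order amounts

non_null = [v for v in returned_values if v is not None]

# NULL-skipping average: computed over 1 row, not 3.
avg_skipping_nulls = sum(non_null) / len(non_null)               # 500.0

# Treating NULL as zero gives a very different answer.
avg_null_as_zero = sum(non_null) / len(returned_values)          # ~166.7
```

Neither answer is automatically "right" — the point is to decide explicitly, because your BI tool will silently pick the first one.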
Joins
Real business data lives in multiple tables. Joins bring them together.
Why multiple tables?
“What’s our return rate by product category?” — the answer requires data from two tables:
Orders:

| Order ID | Category   | Sales |
|----------|------------|-------|
| A-001    | Furniture  | $500  |
| A-002    | Technology | $200  |
| A-003    | Furniture  | $80   |

+

Returns:

| Order ID | Returned |
|----------|----------|
| A-001    | Yes      |

Only returned orders have rows here.
Category lives in Orders, return status lives in Returns. A join on Order ID combines them — like XLOOKUP from week 1, but for the entire table at once.
Four join types
| Join       | What you get                                                         |
|------------|----------------------------------------------------------------------|
| Inner      | Only rows that match in both tables                                  |
| Left       | All rows from the left, matches from the right (NULL where no match) |
| Right      | Mirror of left: all rows from the right, matches from the left       |
| Full outer | All rows from both tables                                            |
In analytics, you’ll use left join most of the time.
Left join: Orders + Returns
| Order ID | Sales | Returned |
|----------|-------|----------|
| A-001    | $500  | Yes      |
| A-002    | $200  | NULL     |
| A-003    | $80   | NULL     |
All orders kept. Return info attached where it exists. Non-returned orders get NULL — not dropped.
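A minimal left-join sketch in Python, with the tables above (a dict lookup plays the role of the join key; `None` stands in for NULL):

```python
# Left join on Order ID: keep every order, attach return status where it exists.
orders = [("A-001", 500), ("A-002", 200), ("A-003", 80)]
returns = {"A-001": "Yes"}  # only returned orders appear here

# dict.get returns None when the key is missing -- the NULL of this sketch.
joined = [(oid, sales, returns.get(oid)) for oid, sales in orders]

print(joined)
# [('A-001', 500, 'Yes'), ('A-002', 200, None), ('A-003', 80, None)]
```

Non-returned orders are kept with `None`, not dropped — exactly what a left join promises and an inner join would not.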
The most common join mistake: fan-out
If the right table has multiple rows per key, your join duplicates rows on the left:
Orders:

| Order ID | Sales |
|----------|-------|
| A-001    | $500  |
| A-002    | $200  |

2 rows, $700 total.

Returns:

| Order ID | Reason     |
|----------|------------|
| A-001    | Damaged    |
| A-001    | Wrong item |

Two rows for one order!

After left join:

| Order ID | Sales | Reason     |
|----------|-------|------------|
| A-001    | $500  | Damaged    |
| A-001    | $500  | Wrong item |
| A-002    | $200  | NULL       |

3 rows, SUM(Sales) = $1,200 — wrong!
Rule: always check row count and a key total (like SUM of Sales) before and after a join. If either changed, your join created duplicates.
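The fan-out and the before/after check, sketched in Python with the tables above:

```python
# Two return reasons for A-001 duplicate its $500 sales row after the join.
orders = [("A-001", 500), ("A-002", 200)]
returns = [("A-001", "Damaged"), ("A-001", "Wrong item")]

joined = []
for oid, sales in orders:
    # A left join emits one row per match -- or one NULL row if no match.
    reasons = [r for rid, r in returns if rid == oid] or [None]
    for reason in reasons:
        joined.append((oid, sales, reason))

# Before the join: 2 rows, $700 in sales.
assert len(orders) == 2 and sum(s for _, s in orders) == 700

# After the join: 3 rows and an inflated total -- the check catches the fan-out.
assert len(joined) == 3
assert sum(s for _, s, _ in joined) == 1200  # wrong as a sales total!
```

If the row count or the key total changed, stop and fix the join before building anything on top of it.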
Joins vs. unions
| Operation | What it does                                              | Analogy                          |
|-----------|-----------------------------------------------------------|----------------------------------|
| Join      | Adds columns — attaches new fields from a second table    | Stapling two pages side by side  |
| Union     | Adds rows — stacks data from tables with the same columns | Stapling two pages top to bottom |
You saw unions conceptually in week 3 (combining January + February + March files). Joins are new this week.
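In the Python sketches above, the contrast is one line each (monthly rows are hypothetical):

```python
# Union = stacking rows with identical columns; no new fields appear.
january  = [("J-001", 120), ("J-002", 80)]  # (order_id, sales)
february = [("F-001", 95)]

combined = january + february  # 3 rows, same two columns

# A join would instead widen each row with fields from a second table.
print(len(combined))  # 3
```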
Next week: the power tools
Week 5: Advanced Calculations
Table calculations — running totals, rankings, percent of total
LOD expressions — calculating at a different level than the view shows
Cohort analysis — grouping customers by when they first appeared