Assignment 3: Analyzing Web Traffic and E-commerce Performance

BUS220 — Business Intelligence and Analytics

Published

April 12, 2026

Overview

You have 5 months of Google Analytics data (August–December 2016) from the Google Merchandise Store — Google’s own e-commerce shop selling branded merchandise (t-shirts, bags, accessories, electronics). The data covers the lead-up to and through the holiday shopping season.

The data comes in three CSV files at different granularities: sessions (visits), hits (interactions within visits), and products (items involved in interactions). Your job: connect the tables, build business metrics, and deliver a dashboard that answers the store manager’s questions.

Total: 8 points + 2 bonus points

Deadline: See Moodle. Late policy: 1–7 days = 50% credit, >7 days = 0.


Deliverables

Submit in Moodle (file upload + submission text box):

  1. Tableau workbook (.twbx — File > Export As Packaged Workbook) containing your worksheets and dashboard
  2. Moodle submission text with:
    • Profiling numbers and data quality issues (Part 1)
    • Investigation narrative (Part 2)
    • Funnel step findings, category cleanup approach, and product-level insights, if attempting (Part 4)
WarningSubmission rules
  • File uploads only. Do not submit links to Google Docs or any external service. Submissions containing external links instead of actual files will receive a −1 point deduction.
  • Worksheet naming. Your workbook should have descriptive tab names (not “Sheet 1,” “Sheet 2”). Name them after what they show.

File naming: LastName_FirstName_assignment_3.twbx


The Data

Download three CSV files from Moodle: sessions (visits), hits (e-commerce interactions within visits), and products (items involved in interactions). Open them and explore the columns yourself. Most are self-explanatory; a few things that aren’t obvious:

  • uniqueVisitId is a compound ID: browserID:sessionTimestamp. The part before the colon is a persistent browser identifier, so the same person visiting twice gets two different uniqueVisitIds but the same browser ID.

  • source / medium are finer-grained attribution within a channelGrouping. One channel groups many source/medium pairs (e.g., Paid Search includes google/cpc and bing/cpc).

  • ecommerce_action_type uses Google Analytics numeric codes:

    Code Meaning
    1 Product list click
    2 Product detail view
    3 Add to cart
    4 Remove from cart
    5 Checkout
    6 Purchase
  • Revenue and price values are stored in micro-currency — divide by 1,000,000 to get USD.

How the tables connect

Sessions ──1:Many──▶ Hits ──1:Many──▶ Products

Here is one session from the data, a returning visitor who browses office products, adds an item to the cart, and completes a purchase:

Session: 4694...8258, 2016-08-28, visit #3, Referral, desktop, United States

Hits for this session:

hitId (suffix) pagePath action transactionid has product rows?
:5 /google+redesign/office 1 (product list) yes
:6 /google+redesign/office/quickview 2 (detail view) no
:7 /google+redesign/office 3 (add to cart) yes
:9 /yourinfo.html 5 (checkout) no
:10 /payment.html 5 (checkout) no
:11 /revieworder.html 5 (checkout) no
:12 /ordercompleted.html 6 (purchase) ORD20160828782 yes

Products for hit :12 (the purchase):

productName productRevenue productQuantity
Leatherette Journal 266,700,000 30
Rubber Grip Ballpoint Pen 4 Pack 34,920,000 8

Hits :5 and :7 also have product rows (the pen appears in both), but :6, :9, :10, and :11 do not. Not every action type records which products were involved. You’ll explore this in Part 4.

Notice: one session has many hits, one hit can have multiple products, and many hits have no products at all. Not every session appears in the Hits table either — most visitors browse without interacting with products. This means you must be careful about how you count and aggregate.


Part 1: Data Profiling & Connection (2 points)

1.1 Profile the data

Before connecting anything, profile each CSV file using whatever tool you’re comfortable with (Syto, Python, R, or another tool, same workflow as Assignment 2). Understand what each table contains, what the columns look like, and where the problems are.

Report these numbers in your Moodle submission text:

  • How many rows in each table?
  • How many unique sessions? How many unique visitors? (Hint: look at how uniqueVisitId is structured.)
  • How many transactions (unique transactionid values, excluding nulls)?
  • What is the total revenue in USD?

1.2 Document data quality issues

As you profile, you will find data quality issues: values that look wrong, inconsistent, or that would cause problems in Tableau. Document what you find in your Moodle submission text. A few sentences per issue: what’s wrong, which column(s) it affects, and whether it matters for analysis.

Tip

By “type” we mean the category of problem, not individual occurrences. For example, if several columns use placeholder strings instead of proper nulls, that’s one type even though it appears in multiple places. You should be able to find at least 3 distinct types.

1.3 Connect the tables

Add all three CSV files to Tableau. Set up relationships:

  • Sessions → Hits on uniqueVisitId
  • Hits → Products on hitId
Warning

Sanity check: After connecting all three tables, verify that your total revenue and session count still match the numbers you found in step 1.1. If they changed, investigate before proceeding.

1.4 Create calculated fields

1. [Revenue USD]: convert productRevenue from micro-currency to dollars. Handle NULLs so that products without revenue show 0, not NULL (this matters for totals).

2. Funnel step metrics: create calculated fields that count how many sessions reached each stage of the e-commerce funnel. Use the ecommerce_action_type codes from the Hits table:

  • [Detail View Sessions]: sessions where a product detail page was viewed (action type 2)
  • [Add to Cart Sessions]: sessions where a product was added to the cart (action type 3)
  • [Purchase Sessions]: sessions where a purchase was completed (action type 6)

Why count distinct sessions rather than counting raw hits? A single session can generate dozens of product detail views but only one purchase. Counting hits would make early funnel steps look enormous relative to later ones, drowning out the actual conversion signal. Counting sessions tells you how many visits reached each stage, which is what matters for understanding visitor behavior.

Tip

Hint: The pattern COUNTD(IF <condition> THEN <value to count> END) counts how many distinct sessions reached step X. Think about which field identifies a session. You will use this pattern constantly.

3. Funnel conversion rates — using your funnel step metrics, create calculated fields for:

  • [View-to-Cart Rate]: what share of sessions that viewed a product went on to add one to the cart?
  • [Cart-to-Purchase Rate]: what share of sessions that added to cart completed a purchase?

We skip Checkout (action type 5) because a single checkout can generate several hits as the visitor moves through pages, making it noisy as a funnel step. The three-step funnel is cleaner for this analysis.

These step-to-step rates tell you where in the funnel visitors drop off, which is far more actionable than an overall conversion rate.


Part 2: The Investigation (3 points)

The store manager wants to know: “How are we doing heading into the holiday season?”

Your job is to find out, and if something looks off, explain why.

Step 1: The Big Picture

Build an overview showing sessions, revenue, and your funnel metrics over time. Choose a time granularity that makes trends readable. Not every metric works well at the same granularity: sessions might show clear patterns at a daily or weekly level, while conversion rates based on rare events (like purchases) can be noisy unless you aggregate more broadly.

Look at what the data shows. Are sessions and revenue moving together? Are your funnel conversion rates stable, improving, or deteriorating?

Step 2: Find the Source

If something looked off (or surprisingly good) in Step 1, the overall number is hiding something. Break it down to find what’s driving it.

Pick a dimension from the Sessions table to decompose the trend. Build a view that shows whether the pattern holds equally across all segments, or whether one segment is responsible. Once you identify what’s driving the pattern, verify your hypothesis: exclude it (using a filter) and check whether the remaining traffic looks different.

Tip

When an overall metric moves in a surprising direction, the cause is usually one of two things: a single segment is dragging the average, or the mix of segments is shifting. Your breakdown should help you tell which.

Step 3: Go Deeper

You’ve identified which segment behaves differently. Now ask why.

Look at other dimensions in the Sessions table. What else is different about this traffic compared to the rest? Build a view that tests your hypothesis.

Written analysis

In your Moodle submission, write a short narrative (5–8 sentences) covering:

  1. What you noticed in the big picture: what looked good, what looked off
  2. Which segment you identified as the driver, and how you verified it
  3. What makes that segment different — what other session-level data helped you explain why it behaves differently
  4. What you would recommend to the store manager

Lead with the insight, not the chart description. “I made a line chart of sessions by month and a bar chart of channels” tells the reader nothing. Say what you found, not what you built.


Part 3: Dashboard (3 points)

Assemble your investigation into a single interactive dashboard.

Required elements

  • KPI row at the top: total sessions, total revenue, and overall funnel conversion rates (view-to-cart, cart-to-purchase). Use a clear, scannable layout with large numbers, labels, no chart chrome.
  • Your worksheets from Part 2, arranged so the big picture leads into the breakdown and deeper analysis.
  • Filters that affect all views and KPIs on the dashboard. At minimum, include a date filter.
  • Cross-filtering: clicking a mark in one chart should filter the others, including the KPIs. For example, clicking a segment in the breakdown should update everything else to show that segment’s numbers.

Layout

Here is a suggested layout. The charts inside are up to you; arrange your investigation worksheets however makes the story clearest. The KPI row at the top and filters on the right are strongly recommended as placement conventions.

Dashboard Title
KPI
KPI
KPI
KPI
Filters
Big picture:
trends over time
Breakdown
by segment
Deeper
analysis

Design

  • Give the dashboard a clear title.
  • Remove worksheet titles that duplicate the dashboard layout (e.g., if the dashboard section heading already says “Channel Performance,” the worksheet title inside it is redundant).
  • Clean up tooltips: remove default “SUM of…” labels and make tooltips informative.

Part 4: Product-Level Analysis (2 bonus points)

Your Part 2 investigation works at the session level. The store manager now wants to go deeper: “Which product categories drive revenue, and where do we lose shoppers?”

Two problems stand in the way.

4.1 Which funnel steps have product data?

Not every ecommerce_action_type has product-level information. Some funnel steps record which products were involved; others are page-level events with no product rows.

Explore the data to figure out which action types appear in the Products table and which don’t. In your Moodle submission, report which funnel steps can be analyzed at the product level and which cannot (1–2 sentences).

4.2 Clean up product categories

Look at the productCategory column. You’ll find a mix of hierarchical paths (Home/Apparel/Men's/Men's-T-Shirts/), flat labels (Apparel), broken template strings, and junk values. This is real Google Analytics data. It’s messy.

Use an LLM (ChatGPT, Claude, or similar) to create a clean category mapping, the same technique you used in the Week 5 practice. Extract the unique combinations of productName and productCategory, and prompt the LLM to map each one to a clean top-level category (e.g., Apparel, Bags, Drinkware, Electronics, Office, Lifestyle, Brands). Using both fields gives the LLM more context. A product named “Yoga Mat” with category ${escCatTitle} is obviously Lifestyle/Accessories, even though the category string is broken. Save the result as a CSV with columns productName, productCategory, and cleanCategory, then add it to your Tableau data model as a relationship to the Products table on both productName and productCategory.

Build a worksheet that uses your clean categories to show product-level funnel performance (using the steps you identified in 4.1). Add it to your dashboard.

Tip

If the number of unique categories is still too large to visualize clearly, focus on the ones that matter most. Sort by revenue and show the top categories.

In your Moodle submission, write 3–4 sentences: what the product-level view reveals that the session-level view doesn’t, and what you would recommend to the store manager.


Grading

Part Points
1. Data Profiling & Connection 2
2. The Investigation 3
3. Dashboard (KPIs + filters working, cross-filtering, design polish) 3
4. Product-Level Analysis (bonus) +2

What loses points:

  • Numbers you can’t explain. If your profiling numbers change after adding a relationship and you don’t notice, or your conversion rate is 40%, something is wrong. Submitting it without investigating is worse than getting it wrong.
  • Stopping at “what” without asking “why.” The big picture shows a pattern. If your submission describes the pattern but doesn’t try to explain it, the investigation is incomplete.
  • Findings that describe the chart instead of interpreting it. “Organic Search has the most sessions” is reading a bar chart back to me. What should the store manager do about it?
  • Dashboard clutter. Default tooltips, redundant titles, unsorted bars, unlabeled KPIs. If it takes effort to figure out what a number means, the dashboard isn’t doing its job.

AI Usage

You may use AI tools (beyond what’s mentioned in this task explicitly) to help with Tableau syntax (calculated field formulas, formatting), understanding Google Analytics terminology, or debugging unexpected results. If using AI, mention it in your submission. The metric choices, chart design, and written interpretation must be your own.