Assignment 5: Airbnb Market Profiling
BUS220 — Business Intelligence and Analytics
Overview
Your group project asks you to build a multi-page analytical dashboard on a city’s Airbnb market. This assignment is a dry run of the project workflow on a shared practice city: union two quarterly scrapes, clean the price field, profile the data, compare snapshots with a parameter, and build a geographic view.
You will work with Edinburgh, which has a large rental market and noticeable seasonal patterns. For the group project, your team will get a different city. The workflow is the same, but the data will have its own quirks — different fields may be empty, different patterns will emerge, and some approaches that work for Edinburgh may not apply. Use your own judgement when you get there.
Total: 8 points + 2 bonus points
Deadline: See Moodle. Late policy: 1–7 days = 50% credit, >7 days = 0.
Deliverables
Submit in Moodle (file upload + submission text box):
- Tableau workbook (.twbx) with your Edinburgh worksheets (Parts 1–3)
- Moodle submission text with:
  - Profiling numbers (Part 1)
  - Cross-quarter observations (Part 2)
  - Geographic observations and design notes (Part 3)
- File uploads only. No external links. -1 point deduction for links instead of files.
- Worksheet naming. Descriptive tab names, not “Sheet 1.”
File naming: LastName_FirstName_assignment_5.twbx
Data
Download from Moodle:
- listings-edinburgh-2025-06-15.csv.zip (June 2025 scrape)
- listings-edinburgh-2025-09-21.csv.zip (September 2025 scrape)
Each zip contains a CSV with ~80 columns — one row per listing. Columns cover host profile, property details, pricing, availability, review scores, and regulatory info.
Part 1: Data Prep & Profiling (3 points)
1.1 Union the two scrapes
Combine both Edinburgh listing files into a single dataset, with a column that identifies which scrape each row came from. Use whichever tool you’re comfortable with (Python/pandas, R, or Syto) and export as a single CSV.
Verify: Your combined file should have 10,872 rows (5,936 from June + 4,936 from September).
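The union step can be sketched in pandas. The two filenames below are the real Moodle downloads; the tiny stand-in frames and the column subset are illustrative only — the real CSVs have ~80 columns.

```python
import pandas as pd

# With the real files you would read the two Moodle downloads directly
# (pandas can read a single-file .zip without unpacking it):
#   june = pd.read_csv("listings-edinburgh-2025-06-15.csv.zip")
#   sept = pd.read_csv("listings-edinburgh-2025-09-21.csv.zip")
# Tiny stand-in frames keep this sketch self-contained:
june = pd.DataFrame({"id": [1, 2, 3], "price": ["$100.00", "$250.00", None]})
sept = pd.DataFrame({"id": [2, 3, 4], "price": ["$110.00", "$240.00", "$90.00"]})

# Tag each row with its scrape BEFORE stacking, so every row stays
# traceable to its snapshot after the union.
june["scrape"] = "2025-06"
sept["scrape"] = "2025-09"
combined = pd.concat([june, sept], ignore_index=True)

# Export for Tableau:
# combined.to_csv("listings-edinburgh-combined.csv", index=False)
print(len(combined))  # should equal the two input row counts added together
```

With the real files, the printed count is your 10,872-row check.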
1.2 Clean the price field
The price column is stored as formatted text, not as a number. Clean it so it’s usable for calculations.
Verify: After cleaning, the median price across both scrapes should be approximately $163. If you get a median near $1 or near $160,000, recheck your parsing.
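One way to do the cleaning in pandas — a sketch assuming the price text looks like "$1,250.00" (the sample values here are made up):

```python
import pandas as pd

# price arrives as formatted text like "$1,250.00"; strip the "$" and the
# thousands commas before converting, or the column is useless for math.
prices = pd.Series(["$1,250.00", "$89.00", None, "$163.00"])

price_clean = pd.to_numeric(
    prices.str.replace(r"[$,]", "", regex=True),  # drop "$" and commas
    errors="coerce",  # genuine blanks become NaN instead of raising
)
print(price_clean.median())  # → 163.0 for this sample
```

A median near $1 or $160,000 usually means the commas or the "$" survived parsing (or a locale treated commas as decimal points) — exactly the failure modes the verify step is there to catch.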
1.3 Report your profiling numbers
Connect the cleaned file to Tableau. In your Moodle submission text, report:
| Metric | June 2025 | September 2025 |
|---|---|---|
| Listing count | | |
| Unique hosts | | |
| Distinct neighbourhoods | | |
| Price NULLs (count and %) | | |
| Median price (after cleaning) | | |
| review_scores_rating NULLs (%) | | |
| Superhost rate (%) | | |
Also report: total rows after union and how many listings appear in both scrapes vs. only one (match on id).
Use the reference values below to check your work. Small rounding differences are fine; large discrepancies mean something went wrong in data prep.
| Check | Expected |
|---|---|
| Listings per scrape | June: 5,936 / September: 4,936 |
| Total rows (union) | 10,872 |
| Distinct neighbourhoods | 111 in both scrapes |
| Price NULLs | June: ~866 (14.6%) / September: ~11 (0.2%) |
| Median price | June: ~$167 / September: ~$160 |
| review_scores_rating NULLs | June: ~8% / September: ~6% |
| Superhost rate | ~44% both scrapes |
| Listings in both scrapes | ~4,200 |
| June only | ~1,700 |
| September only | ~700 |
If your June price NULLs are much higher than September’s, that is correct — the reason will become clear in Part 4 (bonus).
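The per-scrape profiling and the id-overlap check can both be sketched in pandas. The stand-in frame below is illustrative; with your combined file the same groupby and set logic should reproduce the reference values above.

```python
import pandas as pd

# Stand-in for the combined file from 1.1 (the real one has ~10,872 rows).
combined = pd.DataFrame({
    "id":          [1, 2, 3, 2, 3, 4],
    "scrape":      ["2025-06"] * 3 + ["2025-09"] * 3,
    "price_clean": [100.0, None, 250.0, 110.0, 240.0, 90.0],
})

# Per-scrape profiling: listing count, price NULLs, median price.
profile = combined.groupby("scrape").agg(
    listings=("id", "size"),
    price_nulls=("price_clean", lambda s: s.isna().sum()),
    median_price=("price_clean", "median"),
)
print(profile)

# Overlap: which ids appear in both scrapes vs. only one.
june_ids = set(combined.loc[combined["scrape"] == "2025-06", "id"])
sept_ids = set(combined.loc[combined["scrape"] == "2025-09", "id"])
print(len(june_ids & sept_ids))  # in both scrapes
print(len(june_ids - sept_ids))  # June only
print(len(sept_ids - june_ids))  # September only
```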
1.4 Room type breakdown
Build one worksheet showing the distribution of listings across room types, broken down by scrape. This is your first visual sanity check — it should immediately tell you the scale of the market and whether the two snapshots are comparable.
Part 2: Cross-Quarter Comparison (3 points)
The two scrapes give you a before-and-after snapshot of Edinburgh’s rental market. What changed between June and September?
2.1 Parameter for metric switching
Create a parameter that lets the viewer choose which metric to display. Include at minimum:
- Median price
- Listing count
- Average review score
Create a calculated field that returns the selected metric. Build a bar chart or dot plot showing the selected metric by neighbourhood, with both scrapes visible (side-by-side bars, or color/shape encoding for scrape).
With 111 neighbourhoods, showing all of them will be cluttered. Consider filtering to the top 15–20 by listing count, or letting the user pick a neighbourhood subset. The goal is a readable comparison, not a wall of bars.
Why a parameter? The same neighbourhood view can answer different questions depending on what the stakeholder cares about. Three separate charts would work, but a parameter lets one view serve multiple questions — a pattern you will use in your project dashboard.
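The switching calculation can be sketched as a Tableau calculated field. Here [Metric Choice] is an assumed name for your string parameter, and [Id], [price_clean], and [Review Scores Rating] stand in for whatever your fields are actually called:

```
// Calculated field "Selected Metric" — switches on the parameter.
// [Metric Choice] is a string parameter listing the three values below.
CASE [Metric Choice]
    WHEN "Median price"         THEN MEDIAN([price_clean])
    WHEN "Listing count"        THEN COUNT([Id])
    WHEN "Average review score" THEN AVG([Review Scores Rating])
END
```

Drop "Selected Metric" on Rows, show the parameter control, and the same neighbourhood view answers all three questions.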
2.2 Interpret what changed
In your Moodle submission text, write 3–4 sentences:
- Which metric shows the most change between the June and September scrapes?
- Are the changes uniform across neighbourhoods, or concentrated in specific areas?
- What might explain the pattern? (Edinburgh hosts the world’s largest arts festival in August — does that show up in the data?)
Part 3: Map, Sets & Design Polish (2 points)
3.1 Listing map
Build a symbol map placing one mark per listing using latitude and longitude. Color-encode the marks by price, review score, room type, or superhost status.
3.2 Compare two areas with sets
Use the lasso tool on your map to select a cluster of listings in one area of Edinburgh (e.g., the Old Town centre). Right-click the selection and choose Create Set. Then lasso a second area (e.g., Leith along the waterfront) and create a second set.
Build a worksheet that compares the two areas side by side:
- Median price
- Listing count
- Average review score
You can place both sets on the same view (one on Columns, one as Color, or combine them into a calculated field like IF [Old Town Set] THEN "Old Town" ELSEIF [Leith Set] THEN "Leith" ELSE "Other" END) to create a grouped comparison.
3.3 Design review
Before submitting, review all your worksheets (Parts 1–3) against the course design checklist — for example: clear titles, labeled axes and units, formatted tooltips (no raw SUM(price_clean) readouts), legible legends, and descriptive tab names.
Fix at least 3 issues you find. In your Moodle submission text, briefly note what you changed and why (one sentence each is enough).
3.4 Interpret the geography
In your Moodle submission text, write 2–3 sentences on each:
- What does the map reveal about Edinburgh’s rental market? Where are listings concentrated, and how does your chosen metric vary across the city?
- What did the set comparison show? How do the two areas you selected differ, and what might explain the difference?
Part 4: Stale Listing Investigation (2 bonus points)
Some scrapes include listings carried over from prior data collections. These “stale” listings often have NULL prices, missing review data, and other gaps that quietly distort your numbers.
4.1 Find the stale listings
In Part 1, you may have noticed that one scrape has dramatically more price NULLs than the other. Investigate why. One of the metadata columns in the dataset explains the pattern — find it, and report in your Moodle submission text what you found: which listings are stale, how many there are per scrape, and how their data quality compares to the rest.
4.2 Measure the impact
Calculate the median price and listing count with and without the stale listings you identified.
Write 2–3 sentences: how do stale listings affect the profiling numbers you reported in Part 1? Should you filter them out, flag them, or keep them? Justify your choice.
This is a judgment call, not a trick question. There are valid arguments for filtering, flagging, or keeping stale entries depending on the analysis. The point is to make a deliberate decision and document it — not to silently carry bad data forward. You will face the same question on your project city.
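Once you have derived a stale flag (however you identify it in 4.1), the with/without comparison is a one-liner pair. The is_stale column and the values below are hypothetical stand-ins:

```python
import pandas as pd

# Hypothetical: a boolean "is_stale" flag derived from whatever metadata
# column explains the NULL pattern (Part 4.1). Values are made up.
listings = pd.DataFrame({
    "price_clean": [100.0, 250.0, 999.0, None, 90.0],
    "is_stale":    [False, False, True, True, False],
})

fresh = listings[~listings["is_stale"]]
print(len(listings), listings["price_clean"].median())  # with stale rows
print(len(fresh),    fresh["price_clean"].median())     # without
```

Whatever you decide — filter, flag, or keep — the point is that this comparison exists somewhere in your workbook or submission text.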
Grading
| Part | Points |
|---|---|
| 1. Data Prep & Profiling (union, clean, profiling numbers, room type worksheet) | 3 |
| 2. Cross-Quarter Comparison (parameter, neighbourhood view, interpretation) | 3 |
| 3. Map, Sets & Design Polish (map, lasso sets comparison, checklist fixes, interpretation) | 2 |
| 4. Stale Listing Investigation (bonus) | +2 |
What loses points:
- Profiling numbers that don’t match your data. If you report numbers without checking them against the reference values, errors cascade into later parts. A median price that’s off by $50 means something went wrong upstream.
- Neighbourhood view that’s unreadable. Showing all 111 neighbourhoods in a single bar chart produces a wall of text. Filter or sort to make the comparison useful.
- Set comparison with no context. Showing two numbers side by side without saying where the areas are or why they differ is not a comparison — it’s two numbers. Name the areas and interpret the gap.
- Interpretation is absent or generic. “Prices vary by neighbourhood” is not a finding. Name the neighbourhoods, state the difference, and suggest why.
- Design checklist claimed but not visible. If your submission text says “fixed tooltips” but the workbook still shows “SUM(price_clean): 167.0”, the fix didn’t happen.
Connection to the group project
This assignment is practice for the project’s data foundation. When you start your project city:
- Repeat the same workflow — union, clean, profile — but expect differences. Your city may have different NULL patterns, different room type distributions, or different seasonal dynamics than Edinburgh.
- Your profiling numbers become the “Data Overview” section of the project’s written analysis.
- The parameter view and map are starting points, but the project expects much more depth — multiple analytical angles, cross-quarter comparisons, interactivity, written recommendations, and a presentation defense.
- The stale-listing investigation (Part 4) teaches you to check the source column. Your project city may or may not have the same issue — profile it and decide.
The project is not “do A5 again on a different city.” You will go much deeper, pursuing specific business questions with your data. A5 gives you the data prep practice so you can spend project time on analysis, not plumbing.
AI Usage
You may use AI tools to help with data cleaning code, Tableau syntax (parameter setup, calculated fields), or interpreting unfamiliar data fields. If using AI, mention it in your submission. The profiling numbers, analytical observations, and design choices must be your own.