Assignment 5: Airbnb Market Profiling
BUS220 — Business Intelligence and Analytics
Overview
Your group project asks you to build a multi-page analytical dashboard on a city’s Airbnb market. This assignment is a dry run of the project workflow on a shared practice city: union two quarterly scrapes, clean the price field, profile the data, compare snapshots with a parameter, and build a geographic view.
You will work with Edinburgh, which has a large rental market and noticeable seasonal patterns. For the group project, your team will get a different city. The workflow is the same, but the data will have its own quirks — different fields may be empty, different patterns will emerge, and some approaches that work for Edinburgh may not apply. Use your own judgement when you get there.
Total: 8 points + 2 bonus points
Deadline: See Moodle. Late policy: 1–7 days = 50% credit, >7 days = 0.
Deliverables
Submit in Moodle (file upload + submission text box):
- Tableau workbook (.twbx) with your Edinburgh worksheets (Parts 1–3)
- Moodle submission text with:
  - Profiling numbers (Part 1)
  - Cross-quarter observations (Part 2)
  - Geographic observations and design notes (Part 3)
- File uploads only. No external links. -1 point deduction for links instead of files.
- Worksheet naming. Descriptive tab names, not “Sheet 1.”
File naming: LastName_FirstName_assignment_5.twbx
Data
Download from Moodle:
- listings-edinburgh-2025-06-15.csv.zip (June 2025 scrape)
- listings-edinburgh-2025-09-21.csv.zip (September 2025 scrape)
Each zip contains a CSV with ~80 columns — one row per listing. Columns cover host profile, property details, pricing, availability, review scores, and regulatory info.
Part 1: Data Prep & Profiling (3 points)
1.1 Union the two scrapes
Combine both Edinburgh listing files into a single dataset, with a column that identifies which scrape each row came from. Use whichever tool you’re comfortable with (Python/pandas, R, or Syto) and export as a single CSV.
Verify: Your combined file should have 10,872 rows (5,936 from June + 4,936 from September).
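The union step can be sketched in pandas. The two filenames below are the real Moodle downloads; the tiny stand-in frames and the column subset are illustrative only — the real CSVs have ~80 columns.

```python
import pandas as pd

# With the real files you would read the two Moodle downloads directly
# (pandas can read a single-file .zip without unpacking it):
#   june = pd.read_csv("listings-edinburgh-2025-06-15.csv.zip")
#   sept = pd.read_csv("listings-edinburgh-2025-09-21.csv.zip")
# Tiny stand-in frames keep this sketch self-contained:
june = pd.DataFrame({"id": [1, 2, 3], "price": ["$100.00", "$250.00", None]})
sept = pd.DataFrame({"id": [2, 3, 4], "price": ["$110.00", "$240.00", "$90.00"]})

# Tag each row with its scrape BEFORE stacking, so every row stays
# traceable to its snapshot after the union.
june["scrape"] = "2025-06"
sept["scrape"] = "2025-09"
combined = pd.concat([june, sept], ignore_index=True)

# Export for Tableau:
# combined.to_csv("listings-edinburgh-combined.csv", index=False)
print(len(combined))  # should equal the two input row counts added together
```

With the real files, the printed count is your 10,872-row check.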
1.2 Clean the price field
The price column is stored as formatted text, not as a number. Clean it so it’s usable for calculations.
Verify: After cleaning, the median price across both scrapes should be approximately $163. If you get a median near $1 or near $160,000, recheck your parsing.
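One way to do the cleaning in pandas — a sketch assuming the price text looks like "$1,250.00" (the sample values here are made up):

```python
import pandas as pd

# price arrives as formatted text like "$1,250.00"; strip the "$" and the
# thousands commas before converting, or the column is useless for math.
prices = pd.Series(["$1,250.00", "$89.00", None, "$163.00"])

price_clean = pd.to_numeric(
    prices.str.replace(r"[$,]", "", regex=True),  # drop "$" and commas
    errors="coerce",  # genuine blanks become NaN instead of raising
)
print(price_clean.median())  # → 163.0 for this sample
```

A median near $1 or $160,000 usually means the commas or the "$" survived parsing (or a locale treated commas as decimal points) — exactly the failure modes the verify step is there to catch.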
1.3 Report your profiling numbers
Connect the cleaned file to Tableau. In your Moodle submission text, report:
| Metric | June 2025 | September 2025 |
|---|---|---|
| Listing count | | |
| Unique hosts | | |
| Distinct neighbourhoods | | |
| Price NULLs (count and %) | | |
| Median price (after cleaning) | | |
| review_scores_rating NULLs (%) | | |
| Superhost rate (%) | | |
Also report: total rows after union and how many listings appear in both scrapes vs. only one (match on id).
Use the reference values below to check your work. Small rounding differences are fine; large discrepancies mean something went wrong in data prep.
| Check | Expected |
|---|---|
| Listings per scrape | June: 5,936 / September: 4,936 |
| Total rows (union) | 10,872 |
| Distinct neighbourhoods | 111 in both scrapes |
| Price NULLs | June: ~866 (14.6%) / September: ~11 (0.2%) |
| Median price | June: ~$167 / September: ~$160 |
| review_scores_rating NULLs | June: ~8% / September: ~6% |
| Superhost rate | ~44% both scrapes |
| Listings in both scrapes | ~4,200 |
| June only | ~1,700 |
| September only | ~700 |
If your June price NULLs are much higher than September’s, that is correct — the reason will become clear in Part 4 (bonus).
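The per-scrape profiling and the id-overlap check can both be sketched in pandas. The stand-in frame below is illustrative; with your combined file the same groupby and set logic should reproduce the reference values above.

```python
import pandas as pd

# Stand-in for the combined file from 1.1 (the real one has ~10,872 rows).
combined = pd.DataFrame({
    "id":          [1, 2, 3, 2, 3, 4],
    "scrape":      ["2025-06"] * 3 + ["2025-09"] * 3,
    "price_clean": [100.0, None, 250.0, 110.0, 240.0, 90.0],
})

# Per-scrape profiling: listing count, price NULLs, median price.
profile = combined.groupby("scrape").agg(
    listings=("id", "size"),
    price_nulls=("price_clean", lambda s: s.isna().sum()),
    median_price=("price_clean", "median"),
)
print(profile)

# Overlap: which ids appear in both scrapes vs. only one.
june_ids = set(combined.loc[combined["scrape"] == "2025-06", "id"])
sept_ids = set(combined.loc[combined["scrape"] == "2025-09", "id"])
print(len(june_ids & sept_ids))  # in both scrapes
print(len(june_ids - sept_ids))  # June only
print(len(sept_ids - june_ids))  # September only
```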
1.4 Room type breakdown
Build one worksheet showing the distribution of listings across room types, broken down by scrape. This is your first visual sanity check — it should immediately tell you the scale of the market and whether the two snapshots are comparable.
Part 2: Cross-Quarter Comparison (3 points)
The two scrapes give you a before-and-after snapshot of Edinburgh’s rental market. What changed between June and September?
2.1 Parameter for metric switching
Create a parameter that lets the viewer choose which metric to display. Include at minimum:
- Median price
- Listing count
- Average review score
Create a calculated field that returns the selected metric. Build a bar chart or dot plot showing the selected metric by neighbourhood, with both scrapes visible (side-by-side bars, or color/shape encoding for scrape).
With 111 neighbourhoods, showing all of them will be cluttered. Consider filtering to the top 15–20 by listing count, or letting the user pick a neighbourhood subset. The goal is a readable comparison, not a wall of bars.
Why a parameter? The same neighbourhood view can answer different questions depending on what the stakeholder cares about. Three separate charts would work, but a parameter lets one view serve multiple questions — a pattern you will use in your project dashboard.
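The switching calculation can be sketched as a Tableau calculated field. Here [Metric Choice] is an assumed name for your string parameter, and [Id], [price_clean], and [Review Scores Rating] stand in for whatever your fields are actually called:

```
// Calculated field "Selected Metric" — switches on the parameter.
// [Metric Choice] is a string parameter listing the three values below.
CASE [Metric Choice]
    WHEN "Median price"         THEN MEDIAN([price_clean])
    WHEN "Listing count"        THEN COUNT([Id])
    WHEN "Average review score" THEN AVG([Review Scores Rating])
END
```

Drop "Selected Metric" on Rows, show the parameter control, and the same neighbourhood view answers all three questions.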
2.2 Interpret what changed
In your Moodle submission text, write 3–4 sentences:
- Which metric shows the most change between the June and September scrapes?
- Are the changes uniform across neighbourhoods, or concentrated in specific areas?
- What might explain the pattern? (Edinburgh hosts the world’s largest arts festival in August — does that show up in the data?)
Part 3: Map, Sets & Design Polish (2 points)
3.1 Listing map
Build a symbol map placing one mark per listing using latitude and longitude. Color-encode the marks by price, review score, room type, or superhost status.
3.2 Compare two areas with sets
Use the lasso tool on your map to select a cluster of listings in one area of Edinburgh (e.g., the Old Town centre). Right-click the selection and choose Create Set. Then lasso a second area (e.g., Leith along the waterfront) and create a second set.
Build a worksheet that compares the two areas side by side:
- Median price
- Listing count
- Average review score
You can place both sets on the same view (one on Columns, one as Color, or combine them into a calculated field like IF [Old Town Set] THEN "Old Town" ELSEIF [Leith Set] THEN "Leith" ELSE "Other" END) to create a grouped comparison.
3.3 Design review
Before submitting, review all your worksheets (Parts 1–3) against the course design checklist — for example: clear titles, labeled axes and units, formatted tooltips (no raw SUM(price_clean) readouts), legible legends, and descriptive tab names.
Fix at least 3 issues you find. In your Moodle submission text, briefly note what you changed and why (one sentence each is enough).
3.4 Interpret the geography
In your Moodle submission text, write 2–3 sentences on each:
- What does the map reveal about Edinburgh’s rental market? Where are listings concentrated, and how does your chosen metric vary across the city?
- What did the set comparison show? How do the two areas you selected differ, and what might explain the difference?
Part 4: Stale Listing Investigation (2 bonus points)
Some scrapes include listings carried over from prior data collections. These “stale” listings often have NULL prices, missing review data, and other gaps that quietly distort your numbers.
4.1 Find the stale listings
In Part 1, you may have noticed that one scrape has dramatically more price NULLs than the other. Investigate why. One of the metadata columns in the dataset explains the pattern — find it, and report in your Moodle submission text what you found: which listings are stale, how many there are per scrape, and how their data quality compares to the rest.
4.2 Measure the impact
Calculate the median price and listing count with and without the stale listings you identified.
Write 2–3 sentences: how do stale listings affect the profiling numbers you reported in Part 1? Should you filter them out, flag them, or keep them? Justify your choice.
This is a judgment call, not a trick question. There are valid arguments for filtering, flagging, or keeping stale entries depending on the analysis. The point is to make a deliberate decision and document it — not to silently carry bad data forward. You will face the same question on your project city.
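Once you have derived a stale flag (however you identify it in 4.1), the with/without comparison is a one-liner pair. The is_stale column and the values below are hypothetical stand-ins:

```python
import pandas as pd

# Hypothetical: a boolean "is_stale" flag derived from whatever metadata
# column explains the NULL pattern (Part 4.1). Values are made up.
listings = pd.DataFrame({
    "price_clean": [100.0, 250.0, 999.0, None, 90.0],
    "is_stale":    [False, False, True, True, False],
})

fresh = listings[~listings["is_stale"]]
print(len(listings), listings["price_clean"].median())  # with stale rows
print(len(fresh),    fresh["price_clean"].median())     # without
```

Whatever you decide — filter, flag, or keep — the point is that this comparison exists somewhere in your workbook or submission text.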
Grading
| Part | Points |
|---|---|
| 1. Data Prep & Profiling (union, clean, profiling numbers, room type worksheet) | 3 |
| 2. Cross-Quarter Comparison (parameter, neighbourhood view, interpretation) | 3 |
| 3. Map, Sets & Design Polish (map, lasso sets comparison, checklist fixes, interpretation) | 2 |
| 4. Stale Listing Investigation (bonus) | +2 |
What loses points:
- Profiling numbers that don’t match your data. If you report numbers without checking them against the reference values, errors cascade into later parts. A median price that’s off by $50 means something went wrong upstream.
- Neighbourhood view that’s unreadable. Showing all 111 neighbourhoods in a single bar chart produces a wall of text. Filter or sort to make the comparison useful.
- Set comparison with no context. Showing two numbers side by side without saying where the areas are or why they differ is not a comparison — it’s two numbers. Name the areas and interpret the gap.
- Interpretation is absent or generic. “Prices vary by neighbourhood” is not a finding. Name the neighbourhoods, state the difference, and suggest why.
- Design checklist claimed but not visible. If your submission text says “fixed tooltips” but the workbook still shows “SUM(price_clean): 167.0”, the fix didn’t happen.
Connection to the group project
This assignment is practice for the project’s data foundation. When you start your project city:
- Repeat the same workflow — union, clean, profile — but expect differences. Your city may have different NULL patterns, different room type distributions, or different seasonal dynamics than Edinburgh.
- Your profiling numbers become the “Data Overview” section of the project’s written analysis.
- The parameter view and map are starting points, but the project expects much more depth — multiple analytical angles, cross-quarter comparisons, interactivity, written recommendations, and a presentation defense.
- The stale-listing investigation (Part 4) teaches you to check the source column. Your project city may or may not have the same issue — profile it and decide.
The project is not “do A5 again on a different city.” You will go much deeper, pursuing specific business questions with your data. A5 gives you the data prep practice so you can spend project time on analysis, not plumbing.
AI Usage
You may use AI tools to help with data cleaning code, Tableau syntax (parameter setup, calculated fields), or interpreting unfamiliar data fields. If using AI, mention it in your submission. The profiling numbers, analytical observations, and design choices must be your own.