Group Project: Airbnb Market Analysis

BUS220: Business Intelligence and Analytics

Published

April 24, 2026

Points	35
Groups	4 students (smaller groups require instructor approval)
Final submission + presentations	See Moodle for date

Form your group and register via the Google Spreadsheet (pinned on Moodle). All deadlines are posted on Moodle.

How it works

Form a group of 4. Register on the Google Spreadsheet pinned on Moodle. Groups smaller than 4 need instructor approval.
Get your city. Each group gets a different city’s Airbnb data — two quarterly scrapes of listings and reviews from Inside Airbnb.
Prepare the data. Union the two scrapes, clean price and other fields, profile data quality. This is shared group work — Assignment 5 walks you through the same workflow on a practice city.
Each member picks an analytical area (Pricing & Value, Host Profiles, Guest Experience, or Geography & Regulation) and builds one dashboard page investigating it.
Deliver three things:
- Tableau workbook (.twbx) with a KPI overview page + one page per member
- Written analysis (PDF) with profiling, per-area findings, and recommendations
- In-class presentation with live demo and individual defense
Grading is split: data prep (4 pts) and dashboard design (6 pts) are group grades; analytical depth (14 pts) and written analysis (6 pts) are individual; presentation (5 pts) is group.
Defense is mandatory. Every member presents and defends their own area. If you can’t attend the scheduled session, notify the instructor in advance to arrange an online defense.
Deadlines: Workbook and PDF are due the day before presentations (see Moodle for exact date). Late submissions lose 5 points but are still accepted.

Objective

Your team will build an analytical Tableau dashboard for a city’s short-term rental market using real data scraped from Airbnb.

The goal is not to build as many charts as possible. The goal is to answer business questions with data and present your findings in a way that helps a specific audience make decisions.

Each group member picks an analytical area, investigates it in depth, and builds one dashboard page for it. Together, the group delivers a cohesive multi-page dashboard with shared filters, consistent design, and a written analysis document.

The Data

Each group works with a different city from Inside Airbnb — an independent project that publishes public Airbnb data under a CC BY 4.0 license. You can browse cities and pick one that interests your group, or request a dataset from the instructor. Each city will be assigned to only one group.

You receive two quarterly scrapes of your city. Each scrape is a snapshot of every listing at that point in time.

Each scrape contains two files you will work with:

Detailed listings (listings.csv.gz) — one row per listing, ~80 columns covering host profile, property details, pricing, availability, review scores, and regulatory info
Reviews (reviews.csv.gz) — one row per guest review: listing_id, review date, reviewer id/name, and a free-text comments column. No numeric columns — review scores (overall and the six sub-categories) are pre-aggregated per listing in the Listings file as review_scores_rating, review_scores_cleanliness, etc. Use Reviews for volume and timing (when reviews happened), use Listings for score analysis.

There are also summary files with fewer columns. The detailed files have the analytical depth you need.

An earlier version of this document also emphasized the Calendar file. Calendar has been removed — it is mostly empty for many of the assigned cities, so it is not part of the project scope.

An earlier version also implied that review scores lived on the Reviews file. They do not — scores are per-listing aggregates on Listings, and Reviews holds only dates and free-text comments. The Reviews bullet above and the Guest Experience area have been clarified.

Browse the data dictionary and available cities at insideairbnb.com/get-the-data.

Combining the two scrapes

Your first data preparation task is to union the two listing snapshots into a single dataset. Use Python/pandas, R, or Syto — whichever tool you’re comfortable with. Add a column (e.g., scrape_quarter) that identifies which snapshot each row came from.

Fields that need attention

Most columns in the detailed listings file are self-explanatory. A few are not:

Field	What to know
`price`	Dollar-formatted string (`"$169.00"`, `"$1,500.00"`). Strip `$` and `,` before use.
`host_response_rate`, `host_acceptance_rate`	Strings with `%` signs and `"N/A"` values. Need cleaning.
`bathrooms_text`	Text like `"1.5 shared baths"`. The numeric `bathrooms` column is mostly NULL — use this one instead.
`amenities`	JSON array as string (`["Wifi", "Kitchen", ...]`). Needs parsing if you want to analyze amenities.
`neighbourhood_cleansed`	Standardized neighbourhood name. The free-text `neighbourhood` column is often NULL.
`neighbourhood_group_cleansed`, `calendar_updated`	Often 100% NULL. Check yours.
`estimated_occupancy_l365d`, `estimated_revenue_l365d`	Inside Airbnb’s estimates of annual occupancy and revenue. Pre-computed, already numeric. Check whether your city has them.
`license`	Regulatory license info. Coverage and format vary by city.
`source`	How the listing was collected. Some values indicate listings carried over from prior scrapes — these may have NULL prices and other missing fields. Profile this column and decide whether to include or exclude stale entries.
`calculated_host_listings_count`	Number of listings this host has within the city. Compare to `host_listings_count`.

Data relationships

Listings ──1:Many──▶ Reviews (on id = listing_id)

Work with Listings as your base table. Join Reviews only when your analysis specifically needs it — and check your row counts after joining.

Verify after joining

Always check a key aggregate (like listing count or total estimated revenue) before and after adding a table. If your numbers change unexpectedly, something went wrong.

Analytical Areas

Each group member chooses one area and covers all 4. If your group has fewer than 4 members (with instructor approval), divide the remaining area among yourselves. The descriptions below are starting points — you are free to explore directions not listed, and you do not need to cover everything suggested.

Pricing & Value

Business question: What drives pricing in this city’s rental market, and where are the opportunities?

Analyze how nightly prices vary across the market. Which factors explain price differences — location, property type, capacity, review scores? Are there neighbourhoods where listings are priced high relative to their review scores, or vice versa?

Compare estimated revenue against listed price to understand yield: a cheap listing booked 300 nights earns more than an expensive one booked 30 nights. Which pricing strategies seem to work?

Techniques that fit naturally: calculated fields for cleaned pricing metrics, scatter plots for price vs. quality comparisons, parameters for metric switching.

Host Profiles & Market Structure

Business question: Who runs this city’s Airbnb market — casual hosts or commercial operators?

Analyze the host landscape. What share of listings belongs to hosts with just one property vs those managing many? How do superhosts differ from regular hosts in pricing, reviews, and occupancy? Which hosts dominate the market, and in which neighbourhoods?

Look at host responsiveness (host_response_time, host_response_rate) and whether it correlates with review scores or occupancy. Use host_since to see when different types of hosts joined the platform.

Techniques that fit naturally: LOD expressions for host-level aggregates, calculated fields for host tier classification (e.g., single-listing vs. multi-property operator).

Guest Experience & Reviews

Business question: What does the review data reveal about guest satisfaction and seasonal demand?

Many cities have strong seasonal patterns — summer tourism, winter holidays, local festivals. Analyze review volume over time to find these patterns. Is overall review activity growing year over year? This analysis uses the Reviews file (one row per review, each with a date); join to Listings only if you want to segment by neighbourhood or room type.

Explore review scores across the six sub-categories (accuracy, cleanliness, checkin, communication, location, value). Which dimension scores lowest? Does the pattern differ by room type or neighbourhood? Are there listings with high overall ratings but a notably weak sub-score? These review_scores_* fields live on the Listings file as per-listing averages — no join needed.

Techniques that fit naturally: joining Reviews to Listings for segmented volume-over-time analysis (watch row counts after joining), table calculations for year-over-year trends and seasonality, aggregating the Listings review-score columns by segment.

Geography & Regulation

Business question: How is the rental market distributed across the city, and what does the regulatory picture look like?

Build a geographic view of the market. Where are listings concentrated? How do prices, occupancy, and review scores vary by neighbourhood? Are there underserved areas with few listings but high occupancy rates?

If your city has licensing or regulatory data in the license field, analyze compliance: which neighbourhoods have higher rates? Do licensed listings differ from unlicensed ones in pricing, reviews, or occupancy? (The license field varies by city — some cities have detailed license numbers, others have nothing. Work with what your data has.)

Techniques that fit naturally: geographic maps (latitude/longitude are in the data), calculated fields for license categorization, LOD expressions for neighbourhood-level aggregates.

These are starting points, not checklists

The best projects follow what the data reveals rather than mechanically answering every suggested angle. If you find something interesting that isn’t listed above, pursue it. If a suggested direction turns out to be a dead end, say so — “no meaningful difference” is a real finding.

Cross-area connections

If you find something in your area that connects to a teammate’s (e.g., “commercial hosts in this neighbourhood price 20% higher and get worse reviews”), mention it. Cross-area insights in the written analysis or dashboard are welcome but don’t substitute for depth within your own area.

Relevant course readings

As you plan your analysis, revisit:

LA Ch 1–2 — What makes a good metric, vanity vs. actionable
SWD Ch 3–4 — Decluttering and directing attention in complex views
SWD Ch 7 — Building a narrative arc for your presentation
BBoD Part II — Real-world dashboard scenarios for layout ideas
PT Ch 12–16 — Calculated fields, table calculations, LOD expressions, parameters, sets

Deliverables

What	Where	Due
Tableau workbook (.twbx)	Moodle	Day before presentations
Written analysis (PDF)	Moodle	Day before presentations
Data prep scripts (if any)	Moodle	Day before presentations
Presentation + defense	In class	See Moodle for date

Your submission should be self-contained. If you cleaned, transformed, or combined data outside of Tableau (Python script, Jupyter notebook, R script), include that file in your submission. Someone reviewing your work should be able to trace the path from the raw CSVs to the final workbook without guessing what happened in between.

Late submissions

Workbook and PDF are due by the end of the day before your presentation. Late submissions lose 5 points (out of 35) but are still accepted — submit late rather than not at all.

Tableau Workbook

Your .twbx file should include:

Data source with both listing snapshots unioned and a scrape_quarter identifier. Add Reviews only if your area uses it — document your data model in a text sheet or tooltip.
One dashboard page per analytical area, each answering its business question. Where the data supports it, include cross-quarter comparisons — these are expected, not optional.
One overview page with headline KPIs (total listings, median price, average review score, superhost share, and at least one cross-quarter comparison) that responds to a shared filter. This is a group responsibility.
Interactivity: at minimum, a filter affecting all pages (neighbourhood or room type) and one filter action (clicking a mark in one view filters related views). More interactivity where it serves the analysis — parameters for metric switching, highlight actions, navigation between pages.
Consistent visual design across all pages: same color palette, font sizes, title conventions, tooltip format.

Profile before building

Before you start building worksheets, profile the data. Connect your unioned Listings and check:

How many listings per scrape? How many unique hosts?
What is the price distribution after cleaning? (Min, median, max, and check for extreme outliers.)
What are the NULL rates for key fields — review scores, estimated revenue, license?
How many listings appear in both scrapes vs only one?
How are listings distributed across neighbourhoods and room types?

Document these numbers in your written analysis.

Written Analysis (PDF)

Up to 10 pages of text and dashboard screenshots combined — include annotated screenshots of key views to support your narrative. This is a guideline, not a hard limit; use as much space as you need to present your findings clearly.

Include:

Data overview — your city, listing counts per scrape, price distribution after cleaning, key NULL rates, data quality issues and how you handled them.
Analysis sections — one per area. State your business question, present key findings with specific numbers, and explain what they mean. Do not just describe what the chart shows — interpret it.
Recommendations — 3–5 actionable recommendations, each with: what you recommend, why (the evidence), and who should care. The audience could be a prospective host, a city regulator, or Airbnb itself.

Writing that works

A finding is not “The city center has the most listings.” A finding is “The central district holds 12% of all listings but its median nightly price is only 14% above the city median. Meanwhile, a quieter residential neighbourhood with just 2% of listings commands a 24% price premium, suggesting price is driven by scarcity and neighbourhood character, not density.”

Lead with the insight, back it with numbers, connect it to a decision.

Presentation + Defense

Presentation timing will be communicated separately based on the number of groups. Each member presents their analytical area. As a group, cover:

Your most significant finding (with supporting dashboard view)
One surprising result from the data
Your top recommendations
Live demo of your dashboard’s interactivity

During Q&A, each member may be asked to explain any part of their area — how a metric was calculated, why a visualization was chosen, what a specific number means.

Defense is mandatory

Every group member must personally defend their work. If you cannot attend the scheduled presentation:

Notify the instructor in advance to arrange an alternative time
An individual online defense may be arranged for documented scheduling conflicts
If a group member is absent without prior arrangement, the group presents without them and the absent member defends individually at a later date

Inability to explain your own work is not a presentation issue — it is an academic integrity concern handled per KSE policy.

Assignment 5 and the project

The final individual assignment walks you through the exact workflow needed for project data preparation — union, clean, profile, build a cross-quarter comparison, create a map — on a shared test city (Edinburgh). Replicate that workflow on your own project city’s data to build the foundation for your project workbook.

Roles and Ownership

Each member owns one analytical area and builds one dashboard page. The group shares responsibility for:

Data preparation (union, price cleaning, shared calculated fields)
Overview KPI page
Visual consistency across pages
Written analysis cohesion
Presentation preparation

List your area assignments in the written analysis.

On using AI for this project

AI tools are allowed and encouraged — but how you use them matters.

Good uses:

Ask AI to explain a Tableau technique you haven’t used before
Paste an error or unexpected result and ask for debugging help
Ask AI to review your calculated field logic
Brainstorm which angles to explore in your area
Help parse or categorize messy fields (like license or amenities)

Bad uses:

Ask AI to “analyze Airbnb data” and paste whatever it generates
Have AI write your analysis text without rewriting it in your own voice
Use AI-generated interpretations without verifying them against your actual data

Disclosure requirement

Per course policy, document all AI usage in your written analysis: which tools, what you asked, and how you used the output. Be specific — “used Claude to debug a calculated field that was returning NULL for superhosts” is useful; “used AI for some parts” is not. A brief list or short paragraph at the end of the document is enough. During presentations, you may be asked to explain any part of your work in detail.

Grading Rubric (35 Points)

Deductions within a category may stack if multiple independent problems apply.

Data Preparation & Profiling (4 pts, group)

Full marks: Both scrapes unioned and usable. Data quality issues discovered, documented, and handled. Profiling in the written analysis includes specific numbers.

Common mistakes:

–1 to –2	Profiling numbers in the writeup don’t match the workbook. If you cleaned or filtered the data after profiling and didn’t update your numbers, the reader sees contradictions.
–1 to –2	Stale or problematic entries are kept without acknowledgment. You don’t have to filter them, but you do have to show you noticed and made a deliberate choice.

Analytical Depth (14 pts, individual per area)

Full marks: The area is explored from at least two distinct angles (e.g., two different segmentations, a comparison and a trend, or an aggregate and a drill-down). Findings are specific, supported by numbers, and interpreted in business context. Cross-quarter comparisons reveal how the market changed between snapshots. The analysis answers “so what?” for every major finding.

Engineering effort (complex data cleaning, custom calculations) is appreciated but does not substitute for analytical depth. A simple chart with a strong interpretation earns more than a complex visualization with no follow-up.

Common mistakes:

–2 to –3	A metric is wrong and the error goes unnoticed — averaging a ratio, double-counting after a join, comparing absolute numbers between groups of very different sizes. These are easy to miss and hard to catch without sanity-checking your numbers.
–2 to –3	Cross-quarter comparison is present but superficial. Showing June and September side by side without commenting on what changed or why is not a comparison — it’s two charts.
–3 to –5	The area has breadth but no depth. Five charts that each say one thing are weaker than two charts where the second one follows up on what the first one found.

Dashboard Design & Interactivity (6 pts, group)

Full marks: Consistent design across pages. Views are connected through filters and actions. The KPI overview gives quick context. A viewer can orient themselves without help.

Common mistakes:

–1 to –2	Interactivity that misleads — a filter that only affects one view while appearing to be global, or a cross-filter that produces wrong numbers because of a join issue. Worse than no interactivity, because it creates false confidence. Test your filters by clicking through and checking that the numbers still make sense.
–1 to –2	Each page looks like a different person made it. Different color palettes, title styles, tooltip formats. The pages work individually but feel like four separate assignments stapled together.
–1	The dashboard works at one screen size but breaks at another. Scrolling worksheets, overlapping labels, or containers that collapse. Check your layout at the resolution you’ll present on.

Written Analysis & Communication (6 pts, individual per area)

Full marks: Findings are interpreted, not described. Numbers are specific and contextualized (“compared to what?”). Recommendations follow from evidence.

Common mistakes:

–1 to –2	Numbers without context. “The median price is $165” means nothing on its own. Compared to what — another neighbourhood, the other scrape, the city median? Every number needs a reference point.
–1 to –2	Recommendations that could apply to any city. “Hosts should price competitively” is not a finding from your data. Name the neighbourhood, the price gap, the specific opportunity.
–1 to –2	Screenshots don’t match the text. If the writeup says “as shown in Figure 3” but the screenshot shows a different view or different filters, the reader loses trust.

Presentation + Defense (5 pts, group)

Full marks: Each member presents their area clearly. The live demo works. Defense answers show understanding of the work.

Common mistakes:

–1 to –2	Reading from the slides or the writeup instead of presenting. The audience has the PDF — tell them what isn’t in it.
–1 to –2	The demo is rehearsed as a click-through tour (“and here I click this filter”) rather than answering a question with the dashboard (“if we want to know which neighbourhoods lost listings, we can…”).
–2 to –3	A member can’t explain how their own metric was calculated or why a number looks the way it does during Q&A.

Team Collaboration

Tableau workbooks are single-user files. Two strategies for parallel work:

Option A: Divide by area (recommended)

One person handles data preparation: unions the two scrapes, cleans the price field, creates shared calculated fields (host tier, price per guest, etc.).
Share the .twbx with the team.
Each person builds their worksheets and dashboard page in their own copy.
One person integrates all pages into the final file (copy worksheets between workbooks via clipboard).

Option B: Sequential handoffs

Person 1 completes data setup, saves Project_v1_data.twbx.
Person 2 adds their dashboard page, saves Project_v2_pricing.twbx.
Continue with clear version naming.

Whichever approach you choose: agree on naming conventions early (worksheet names, calculated field names, color palette) so integration is smooth.