NYC AirBnB Data Analytics Case Study

Python Data Visualizations Marketing Analysis

Identify a marketplace that connects people renting houses with people looking for a place to stay.

Agus Santoso
2023-09-29

Introductions

This personal project is a case study given by RevoU in a mini course held for 2 weeks to see the participants’ understanding of the Data Analytics material that has been presented, therefore I tried to make several analyzes of this Airbnb which is a marketplace that connects people who renting out houses to people looking for a place to stay.

What Tools To Use

Summary

Fig. 1. Te Kahu, Wanaka, New Zealand

Airbnb is an online marketplace that connects people who want to rent out their homes with people looking for accommodations in specific locales. The company has come a long way since 2007, when its co-founders first came up with the idea to invite paying guests to sleep on an air mattress in their living room. According to Airbnb’s latest data, it now has more than 7 million listings, covering some 100,000 cities and towns in 220-plus countries and regions worldwide.

Exploratory Data Analytics

Preparation

Connect with the data

Connect the data

The first step is that we have to make sure our data is connected.

# Connect with dataset in drive
from google.colab import drive
drive.mount("/content/drive")

Import and Reading Data

Import libraries

Import libraries we need to use, like:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import random as random
import random
from datetime import datetime, timedelta
import plotly.express as px
import warnings
Read the data

Read the data

# Read the data
df = pd.read_csv("/content/drive/MyDrive/dataset nyc airbnb/AB_NYC_2019.csv")

Data Understandings

Basic data understandings

This is basic data understandings like:

Data Processing

Informations of Dataset

In the first step of data processing we can use several database understandings to see an overview of the dataset to data type information and check for the duplicates values.

df.head() # View the data
df.describe() # View the descriptive statistics or overview the dataset
df.info() # View the info from data like data type and others
print(df.shape) # Provide the number of rows and columns in the dataset
list(df) # View list of column names
df.drop_duplicates() # Check for the duplicates values
df.shape
print(df.isnull().sum()) # Print the number of null values in each column
Fixing Error

Second step is fixing error form the data

# Change the data type from object to date
df['last_review'] = pd.to_datetime(df['last_review'])
df['review_year'] = df['last_review'].apply(lambda last_review:last_review.year)
df['review_year'] = df['review_year'].fillna(0)
df['review_year'] = df.review_year.astype(int)
df = pd.concat([df[(df['availability_365']==0) & (df['review_year']==2019)],df[df['availability_365']>0]])
#Replace the missing value with 0
df['reviews_per_month'] = df['reviews_per_month'].fillna(0)
df.info()
# Drop the 'last_review' column
df.drop('last_review', axis=1, inplace=True)

# Replace the missing 'name' values by random name
df = df.fillna({'name': 'Upper East Side Oasis!'}) # here i fill with 'Upper East Side Oasis!'
df = df.fillna({'host_name': 'john'}) # here i fill with 'john'
df.head()
df.info()
print(df.isnull().sum())
Design Visualizations

Correlation heatmaps are here to show us how closely related variables are.

plt.figure(figsize=(12,12))
sns.heatmap(df.corr(), cmap='Blues', annot=True,)
plt.show()

To see room type by neighbourhood group

# Create the count plot
plt.figure(figsize=(10, 10))
sns.set(style="whitegrid")
# Create a count plot with 'room_type' on the x-axis and hue='neighbourhood_group'
sns.countplot(data=df, x='room_type', hue='neighbourhood_group', palette='viridis')
plt.title('Count of Room Types by Neighbourhood Group')
# Show the plot
plt.show()

To shows Proportion of Airbnb listings across boroughs and room type.

# Create a bar plot borough
(df['neighbourhood_group'].value_counts() / df.shape[0]).plot.bar(cmap='tab10', title='Proportion of Airbnb listings across boroughs')
plt.xlabel('Borough')
plt.ylabel('Proportion')
plt.xticks(rotation=45)
plt.show()

# Create a bar plot room type
(df['room_type'].value_counts() / df.shape[0]).plot.bar(cmap='tab10', title='Proportion of Airbnb listings across room_type')
plt.xlabel('Room Type')
plt.ylabel('Proportion')
plt.xticks(rotation=45)
plt.show()

To see where is the most expensive area?

# Create the bar plot
fig = sns.catplot(x='neighbourhood_group', y='price', data=df, kind='bar', hue='room_type', palette='viridis')
# Add a title and adjust its position
fig.fig.suptitle('Where is the most expensive area?', fontsize=15, y=1.05)
# Save the plot as an image with tight layout
fig.savefig('most_expensive_area.png', bbox_inches='tight')

To see where is the most popular area based on availability?

# Create the bar plot
plt.figure(figsize=(10, 6))
sns.set(style="whitegrid")
# Sort the DataFrame by 'availability_365' in descending order
df_sorted = df.sort_values(by='availability_365', ascending=False)
# Create the bar plot
sns.boxplot(data=df_sorted, x='neighbourhood_group', y='availability_365', palette='viridis')
plt.xlabel('Neighbourhood Group')
plt.ylabel('Availability (in days)')
plt.title('Most Popular Neighbourhood Group Based on Availability')
# Show the plot
plt.show()

Next, we can visualize it using a box map to see the distribution.

# Visualize Map Box with scatterplot
plt.figure(figsize=(10,10))
sns.scatterplot(x='longitude', y='latitude', hue='neighbourhood_group',s=20, data=df, palette="viridis")

To see who the top 5 by calculated_host_listings_count

# Calculate total reviews per host
total_host_listings_count = df.groupby('host_name')['calculated_host_listings_count'].sum()
# Rank hosts by total listings count
ranked_hosts = total_host_listings_count.sort_values(ascending=False)
# Top 5 host_names based on total listings count
top_5_host_names = ranked_hosts.head(5)
# Display the top 5 host_names
print(top_5_host_names)

# Create a bar plot to visualize the top 5 host_names by total listings
plt.figure(figsize=(10, 6))
plt.bar(top_5_host_names.index, top_5_host_names.values, color='skyblue')
plt.xlabel('Host Name')
plt.ylabel('Total Listings')
plt.title('Top 5 Hosts by Total Listings')
plt.xticks(rotation=45)
plt.tight_layout()

# Show the plot
plt.show()

Visualizations and Analysis

Heatmaps Correlations

A correlation heatmap is a graphical representation of a correlation matrix representing the correlation between different variables. The value of correlation can take any value from -1 to 1. Correlation between two random variables or bivariate data does not necessarily imply a causal relationship. We can see it as in Fig. 2 below 👇

Fig. 2. Heatmaps Correlations.

Count of Room Types by Neighbourhood Group

To create an analysis of this, we first collected data from Airbnb listings and created Airbnb areas based on neighborhood groups such as: Manhattan, Bronx, Brooklyn, Queens, and Staten Island. We then calculated three main room types: Whole house/apartment, Shared room, and Private room in each neighborhood group. We can see the details in Fig. 3.

Fig. 3. Count of Room Types by Neighbourhood Group

The results indicate that Manhattan has the highest number of Entire home/apt listings, reflecting its popularity as a tourist destination. Brooklyn follows closely behind, offering a mix of room types. In contrast, the Bronx, Queens, and Staten Island have a higher proportion of Entire home/apt and Private rooms, likely due to their residential nature.

Proportion of Airbnb listings across boroughs and room_type

This is Breakdown from Count of Room Types by Neighbourhood Group

Proportion of Airbnb listings across boroughs

In this analysis, we explore the distribution of Airbnb listings across the five major boroughs of New York City: Manhattan, Bronx, Brooklyn, Queens, and Staten Island. We can see details in Fig. 4.

Fig. 4. Proportion of Airbnb Listings Across Boroughs.

Regarding boroughs in this graph Fig. 4., Manhattan and Brooklyn have the highest proportions of Airbnb listings, reflecting their status as top tourist destinations. Queens also boasts a significant number of listings, while the Bronx offers a more affordable alternative. Staten Island, with its suburban appeal, has the smallest share of Airbnb listings among the five boroughs.

Proportion of Airbnb listings across room_type

In addition to region, We explored the Airbnb listing distribution of three room types in New York City: Entire home/apt, Private room, and Shared room. We can see details in Fig. 5.

Fig. 5. Proportions of Airbnb Listings Across Room Type.

In terms of room types, Airbnb listings in New York City predominantly comprise “Entire home/apt” options, catering to those seeking privacy and convenience. “Private room” listings come next, offering a balance between affordability and comfort. “Shared room” listings are the least common, serving budget-conscious solo travelers or those comfortable sharing living spaces.

Where The Most Expensive Area?

To analyze the data, we first gathered information on rental prices for these room types in each borough. Afterward, we calculated the average rental price for each combination of borough and room type. Here are the findings:

Manhattan: Unsurprisingly, Manhattan emerges as the most expensive borough overall, with entire homes/apartments being the costliest, followed by private rooms and shared rooms.

Bronx: In the Bronx, shared rooms are the most affordable option, followed by private rooms and entire homes/apartments.

Brooklyn: Brooklyn exhibits a similar pattern to Manhattan, with entire homes/apartments being the most expensive, followed by private rooms and shared rooms.

Queens: Queens generally offers more affordable accommodations compared to Manhattan and Brooklyn. Entire homes/apartments are the priciest, followed by private rooms and shared rooms.

Staten Island: Staten Island, being the least expensive of the five boroughs, sees private rooms as the most economical choice, followed by shared rooms and entire homes/apartments.

To visualize these findings, we’ve created bar graphs below that represent the average rental prices for each borough and room type. These graphs will help you better understand the price distribution across the different areas and accommodation types. We can see details in Fig. 6.

Fig. 6. The Most Expensive Area.

Please note that the specific rental prices can vary greatly within each borough, and this analysis provides a general overview. If you have access to the relevant data, you can create more detailed analyses and visuals to dive deeper into the specific neighborhoods and factors influencing rental prices in each area.

The box plot analysis depicts the availability of Airbnb listings in the five major boroughs of New York City (Manhattan, Bronx, Brooklyn, Queens, and Staten Island). The ‘availability_365’ metric is used to measure the availability of listings throughout the year. We can see the details in Fig. 7.

Fig. 7. The Most Popular Area Based On Availability.

Looking at the above categorical box plot we can infer that the listings in State Island seems to be more available throughout the year to more than 300 days. On an average, these listings are available to around more 250 days every year followed by Bronx where every listings are available for around more 150 on an average every year.

Let’s also see Map Box Distributions

Fig. 8. Map Box Distributions.

Mapbox distribution in New York City shows varying levels of usage across the five boroughs: Manhattan, Bronx, Brooklyn, Queens, and Staten Island. In Manhattan, where the bustling heart of the city resides, Mapbox is likely to see high utilization, with its mapping and location services catering to the demands of businesses, tourists, and residents alike.

Top 5 Host Listing

To find Top 5 hosts by total listings count.

Fig. 9. Top 5 Host Listings.

As we can find that Top 5 host name are Sonder (NYC), Blueground, Kara, Kazuya, and Sonder.

Conclusion

Observations

Manhattan has the highest number of Entire home/apt listings, reflecting its popularity as a tourist destination. Brooklyn follows closely behind, offering a mix of room types. Then Manhattan have more expensive places to stay in NYC. Room availability is very low in manhattan and brooklyn and you can find a room anytime in the State Island and Bronx.

Recomendations

If you are looking for the most expensive locations you can come to Manhattan and Brooklyn or Queens. However, if you want to find the places with the highest availability in New York City, you can choose a location like the Bronx with an average of more than 150 days every year or State Island with an average of more than 250 days available every year.

THANK YOU 🙌

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Citation

For attribution, please cite this work as

Santoso (2023, Sept. 29). Agus Santoso: NYC AirBnB Data Analytics Case Study. Retrieved from https://agussantoso05.github.io/index.html/posts/2023-09-29-nycairbnb-data-analytics-case-study/

BibTeX citation

@misc{santoso2023nyc,
  author = {Santoso, Agus},
  title = {Agus Santoso: NYC AirBnB Data Analytics Case Study},
  url = {https://agussantoso05.github.io/index.html/posts/2023-09-29-nycairbnb-data-analytics-case-study/},
  year = {2023}
}