Clustering Neighborhoods Pt. 2: The Re-Clustering


This article is an update to our initial article describing how we identified the best neighborhoods in a city. Read it here.

For those just looking for the results, feel free to jump down to the results.

One of the biggest complaints about big booking sites is that you never know what neighborhood you are staying in. Is it right in the middle of the city center, with tons of stuff within walking distance? Or is it further out in a quiet suburb? Is it great for partying and staying out late, or is it more family friendly? No booking site offers that level of neighborhood context, but it's an important thing to know when searching for a place to stay. We think we've helped tackle that problem with the second version of our accommodation booking engine! We set out to identify the best neighborhoods in a city to help you book the perfect place to stay.

Problems with our First Release

We received a lot of feedback from our users about issues they found with the initial release. These include:

Not Enough Neighborhoods

Our goal is to identify the best neighborhoods to stay in for any city in the world. Initially, our clustering algorithm only looked at highly rated hostels, hotels, and homestays. The result was that we only identified areas that are seen as 'touristy' by most standards. While this might be fine for some people (nothing wrong with staying in Times Square!), it's disingenuous to say that these are the only good neighborhoods in a city. There are lots of great neighborhoods that might not have tons of available accommodations, which our algorithm would pass right over. Similarly, any smaller cities that don't have many accommodations would be completely ignored. The result was that we didn't offer enough information beyond what current booking sites like Kayak already provide.

Our solution was to re-design the clustering from the ground up. Instead of using highly rated accommodations as our clustering dataset, we used "attractions" as a proxy for great neighborhoods. "Attractions" is vague on purpose - we gathered bars, restaurants, museums, parks, and millions of other points of interest from around the world to form our clustering dataset. The thought process was that great neighborhoods in a city will have lots of things to offer, and those things will be different between neighborhoods, cities, and countries.

Things to do in New York

Look at all the things to do!

We re-ran our clusters with this new dataset, generating three times as many clusters and capturing 12,000 neighborhoods around the world. We not only identified the less touristy neighborhoods in big cities, we actually generated some clusters for the smaller cities! Win, win.

Clusters Based off Things to Do in New York

Densely packed clusters of things to do

After running the clustering algorithm on attractions in a city, we then re-ran the clustering algorithm using only the highly rated accommodations, like we did in the first version. We still feel that dense areas of accommodations with great location ratings make a great neighborhood, and we didn't want to exclude these. In order to not double count anything & flood the map with too much information, we only kept these highly-rated accommodation clusters if they didn't intersect with any of the existing attraction clusters. We felt that the attraction cluster results were higher quality & more indicative of a city's neighborhoods.
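A rough sketch of that "keep it only if it doesn't intersect" rule is below. It assumes each cluster can be approximated by a latitude/longitude bounding box, which is a simplification made for this example (the real cluster shapes aren't described here). The names `BBox`, `overlaps`, and `keep_new_accommodation_clusters` and all the coordinates are hypothetical.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """A cluster approximated by a bounding box (an assumption of this sketch)."""
    name: str
    south: float
    west: float
    north: float
    east: float


def overlaps(a: BBox, b: BBox) -> bool:
    """True if the two boxes share any area (ignores antimeridian wrap-around)."""
    return (a.south <= b.north and b.south <= a.north
            and a.west <= b.east and b.west <= a.east)


def keep_new_accommodation_clusters(accommodation, attraction):
    """Drop accommodation clusters that intersect any attraction cluster."""
    return [c for c in accommodation
            if not any(overlaps(c, a) for a in attraction)]


# Made-up boxes, roughly around the East Village and Midtown.
attraction = [BBox("east_village_attractions", 40.722, -73.992, 40.732, -73.975)]
accommodation = [
    BBox("east_village_hostels", 40.728, -73.985, 40.735, -73.978),
    BBox("midtown_hotels", 40.752, -73.992, 40.762, -73.980),
]

kept = keep_new_accommodation_clusters(accommodation, attraction)
print([c.name for c in kept])
```

Here the hostel cluster overlaps the attraction cluster and is dropped, while the Midtown cluster has no attraction cluster on top of it and survives.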

Clusters Based off Accommodations in New York

Densely packed clusters of highly-rated accommodations

As a final touch, we went full-Inception and clustered the clusters. We grouped together any clusters that were <200m or so apart. This was enough to group together multiple clusters in the same neighborhood to form one cohesive "neighborhood" cluster while ensuring that disparate neighborhoods stayed distinct.
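To make "clustering the clusters" concrete, here is a minimal sketch. It assumes the distance between two clusters is measured between their centroids, and that merging is transitive (if A is close to B and B is close to C, all three end up together). Both are assumptions of this example, as are the function names and sample coordinates. Only the 200 m threshold comes from the text above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lng) points."""
    lat1, lng1 = p
    lat2, lng2 = q
    dlat = radians(lat2 - lat1)
    dlng = radians(lng2 - lng1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2)
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def merge_nearby(centroids, max_gap_m=200):
    """Group cluster centroids that are chained together by gaps < max_gap_m.

    Returns a sorted list of groups, each a list of indices into `centroids`.
    """
    parent = list(range(len(centroids)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if haversine_m(centroids[i], centroids[j]) < max_gap_m:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(centroids)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up centroids: three close together downtown, one far away in Midtown.
centroids = [
    (40.7265, -73.9815),
    (40.7275, -73.9820),
    (40.7288, -73.9826),
    (40.7580, -73.9855),
]
print(merge_nearby(centroids))
```

The first and third centroids are more than 200 m apart, but each is within 200 m of the second, so all three collapse into one neighborhood. The pairwise loop is quadratic, which is fine for the clusters of a single city but would need a spatial index at a larger scale.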

All Clusters in New York

The final product, shown with accommodations

Analysis Paralysis

With all these neighborhoods identified, which one should you stay in? Our previous version calculated the average location rating of all accommodations within each neighborhood, but that didn't tell you what that neighborhood is about. You might be the kind of traveler who wants to stay near nightlife, or maybe you want to stay near the sights & landmarks.

Remember those clusters from before, how we ran our first clustering algorithm with attractions & things to do in a city? We used them again to calculate the average density of attractions within a neighborhood. Now, each neighborhood has a rating for nightlife, food, culture, and sights. These represent how many options are available nearby, regardless of user reviews. Reviews are a fickle thing, and not everything is great for everyone. Our thought process was that a neighborhood with lots of bars within a 10-minute walk is great for nightlife because with that many options, everyone can find something to enjoy.
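As an illustration of a density-based rating, the sketch below counts attractions per category, divides by the neighborhood's area, and rescales each category so the densest neighborhood in the city gets a 10. The 0 to 10 scale, the area-based density, the function names, and every number in the example are assumptions made for this sketch, not the actual scoring formula.

```python
from collections import Counter

CATEGORIES = ("nightlife", "food", "culture", "sights")


def category_density(attraction_categories, area_km2):
    """Attractions per square kilometer for each category."""
    counts = Counter(attraction_categories)
    return {cat: counts[cat] / area_km2 for cat in CATEGORIES}


def scale_to_ten(densities_by_neighborhood):
    """Rescale each category so the densest neighborhood scores 10.0."""
    scores = {name: {} for name in densities_by_neighborhood}
    for cat in CATEGORIES:
        top = max(d[cat] for d in densities_by_neighborhood.values())
        for name, d in densities_by_neighborhood.items():
            scores[name][cat] = round(10 * d[cat] / top, 1) if top else 0.0
    return scores


# Made-up attraction lists for two neighborhoods of 2 km^2 each.
east_village = ["nightlife"] * 40 + ["food"] * 60 + ["culture"] * 5 + ["sights"] * 5
midtown = ["nightlife"] * 10 + ["food"] * 30 + ["culture"] * 10 + ["sights"] * 20

densities = {
    "East Village": category_density(east_village, area_km2=2.0),
    "Midtown": category_density(midtown, area_km2=2.0),
}
scores = scale_to_ten(densities)
print(scores["East Village"])
print(scores["Midtown"])
```

Note that no review data enters the calculation: a neighborhood scores well on nightlife purely because it has many bars per square kilometer.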

Remember those other clusters from before, from the previous version that were based on highly-rated accommodations? We used them too, in tandem with the attraction scores above. We calculated a combined TripHappy neighborhood score by combining the different attraction scores with accommodation reviews. In our eyes, a great neighborhood has great places to stay with tons of things to do.

East Village TripHappy Neighborhood Scores

Scores for East Village, NYC

We've also gone ahead and added in heatmaps of these attractions. This lets you see exactly where points of interest are throughout the city. We found that a visual overlay helped to see just how dense or sparse a neighborhood really is.
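The data behind a heatmap like this can be as simple as counting points per grid cell. The sketch below shows that idea under the assumption of a fixed-size latitude/longitude grid. The cell size, the function name `heatmap_bins`, and the sample points are all hypothetical, and a real map layer would typically smooth these counts before rendering.

```python
from collections import Counter
from math import floor


def heatmap_bins(points, cell_deg=0.005):
    """Count (lat, lng) points per grid cell of cell_deg x cell_deg degrees."""
    bins = Counter()
    for lat, lng in points:
        cell = (floor(lat / cell_deg), floor(lng / cell_deg))
        bins[cell] += 1
    return bins


# Made-up restaurant locations: two near each other, one further uptown.
restaurants = [
    (40.7265, -73.9815),
    (40.7268, -73.9812),
    (40.7580, -73.9855),
]
bins = heatmap_bins(restaurants)
for cell, count in bins.most_common():
    print(cell, count)
```

Cells with higher counts would be drawn "hotter", which is what makes dense and sparse neighborhoods easy to tell apart at a glance.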

New York Heatmaps

A heatmap showing all the places to eat in New York

Mistaken Identity

Under the hood, we use Google Place IDs for every destination in order to link everything together. Unfortunately, this can cause issues, as Place IDs are extremely fickle. They're not guaranteed to stay constant for a given location, and there are often multiple IDs for the same city (one for the administrative level, one for the municipality, etc.). This was causing major issues, especially with cities that share a name, such as the 35 cities named Springfield in the US. Searching for any one of these cities would only load clusters & accommodations for Springfield, MA, regardless of which one you searched for.

Our solution was to completely switch over to using lat, lng coordinates throughout our site. Everything from accommodations, to clusters, to city attractions is now pulled in by the coordinates of the city you search for. This way, there are no more cases of mistaken identity - any city you search for will load up everything we have to offer regardless of where it is. As an added bonus, we now have content for destinations that haven't yet been added to our database and assigned a Place ID. Win, win.
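The difference between looking things up by name or ID and looking them up by coordinates can be sketched as follows. This example assumes a simple square search box around the searched city's coordinates. The box size, the record layout, the function names, and the approximate city coordinates are all assumptions for illustration.

```python
from math import cos, radians

METERS_PER_DEGREE = 111_320  # approximate length of one degree of latitude


def near(lat, lng, center_lat, center_lng, half_side_m):
    """True if (lat, lng) falls inside a square box centered on the search point."""
    north_south_m = abs(lat - center_lat) * METERS_PER_DEGREE
    # A degree of longitude shrinks with latitude, so scale it by cos(lat).
    east_west_m = abs(lng - center_lng) * METERS_PER_DEGREE * cos(radians(center_lat))
    return north_south_m <= half_side_m and east_west_m <= half_side_m


def load_for_search(center, records, half_side_m=15_000):
    """Return the names of all records near the searched coordinates."""
    center_lat, center_lng = center
    return [r["name"] for r in records
            if near(r["lat"], r["lng"], center_lat, center_lng, half_side_m)]


# Three clusters that would all collide under the key "Springfield".
records = [
    {"name": "Downtown Springfield, MA", "lat": 42.1015, "lng": -72.5898},
    {"name": "Downtown Springfield, IL", "lat": 39.7817, "lng": -89.6501},
    {"name": "Downtown Springfield, MO", "lat": 37.2090, "lng": -93.2923},
]

# Searching near Springfield, IL only returns the Illinois record.
print(load_for_search((39.80, -89.65), records))
```

Because the lookup never touches a name or an ID, a destination with no database entry still returns whatever content sits near its coordinates.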

Filters, filters, filters

Previously, we showed you neighborhoods in a city but gave you no good way to actually filter the accommodation choices that we showed you. At the very least, since we're showing you great neighborhoods in a city, you should be able to then filter down into these neighborhoods, right?

Filters on TripHappy

Now, we've added lots of filters to help you find the perfect place to stay. You can now filter by room type, like shared or private room (important for hostels & homestays), price (important for everyone), and last but not least neighborhoods. Once you know where you want to stay, you can now filter into each neighborhood and only see accommodations nearby.
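Combining these filters amounts to applying each one only when the user has set it. The sketch below assumes every listing carries a neighborhood label, whereas the real neighborhood filter may well be distance-based. The `Listing` fields, the `filter_listings` function, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Listing:
    name: str
    room_type: str     # "shared" or "private"
    price: float       # nightly price
    neighborhood: str  # assumed label; could instead be derived from distance


def filter_listings(listings: List[Listing],
                    room_type: Optional[str] = None,
                    max_price: Optional[float] = None,
                    neighborhood: Optional[str] = None) -> List[Listing]:
    """Apply only the filters that were actually set (None means 'any')."""
    result = []
    for listing in listings:
        if room_type is not None and listing.room_type != room_type:
            continue
        if max_price is not None and listing.price > max_price:
            continue
        if neighborhood is not None and listing.neighborhood != neighborhood:
            continue
        result.append(listing)
    return result


# Made-up listings.
listings = [
    Listing("Bowery Bunks", "shared", 45.0, "East Village"),
    Listing("Avenue A Loft", "private", 180.0, "East Village"),
    Listing("Midtown Suites", "private", 220.0, "Midtown"),
]

matches = filter_listings(listings, room_type="private", neighborhood="East Village")
print([m.name for m in matches])
```

Leaving a filter unset keeps it out of the way, so the same function covers "everything in the East Village" and "private rooms under a certain price anywhere in the city".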

East Village Neighborhood Filtering

Search for accommodation near East Village, NYC

Try it out below and let us know what you think!

Future Plans

We're constantly getting feedback and making changes. Some of our planned new features for version 3 include:

  • Identification of tourist vs up-and-coming neighborhoods
  • Integration of public transit information
  • More accommodation providers

TL;DR

  • More neighborhoods - We've identified more than just the super touristy areas - now there's the artsy and up-and-coming neighborhoods too. We've also combined similar neighborhoods and split out separate ones to better identify individual pockets of activity in a city.
  • Neighborhood scores! We've calculated scores for each neighborhood and ranked them based on user-generated reviews, popularity, and the number of things to do nearby.
  • Better UI - Now you'll be able to easily search/filter for accommodations on the left, with a full-size map on the right. Easily explore different neighborhoods to find which one suits you best.
  • Heatmaps - Now you can see where the action is with heatmaps for Food, Nightlife, Shopping, and Sights.
