Skip to main content
Skip to main content

Foursquare places

Dataset

This dataset by Foursquare is available to download and to use for free under the Apache 2.0 license.

It contains over 100 million records of commercial points-of-interest (POI), such as shops, restaurants, parks, playgrounds, and monuments. It also includes additional metadata about those places, such as categories and social media information.

Data exploration

For exploring the data we'll use clickhouse-local, a small command-line tool that provides the full ClickHouse engine, although you could also use ClickHouse Cloud, clickhouse-client or even chDB.

Run the following query to select the data from the s3 bucket where the data is stored:

SELECT * FROM s3('s3://fsq-os-places-us-east-1/release/dt=2025-04-08/places/parquet/*') LIMIT 1
Row 1:
──────
fsq_place_id:        4e1ef76cae60cd553dec233f
name:                @VirginAmerica In-flight Via @Gogo
latitude:            37.62120111687914
longitude:           -122.39003793803701
address:             ᴺᵁᴸᴸ
locality:            ᴺᵁᴸᴸ
region:              ᴺᵁᴸᴸ
postcode:            ᴺᵁᴸᴸ
admin_region:        ᴺᵁᴸᴸ
post_town:           ᴺᵁᴸᴸ
po_box:              ᴺᵁᴸᴸ
country:             US
date_created:        2011-07-14
date_refreshed:      2018-07-05
date_closed:         2018-07-05
tel:                 ᴺᵁᴸᴸ
website:             ᴺᵁᴸᴸ
email:               ᴺᵁᴸᴸ
facebook_id:         ᴺᵁᴸᴸ
instagram:           ᴺᵁᴸᴸ
twitter:             ᴺᵁᴸᴸ
fsq_category_ids:    ['4bf58dd8d48988d1f7931735']
fsq_category_labels: ['Travel and Transportation > Transport Hub > Airport > Plane']
placemaker_url:      https://foursquare.com/placemakers/review-place/4e1ef76cae60cd553dec233f
geom:                �^��a�^@B�
bbox:                (-122.39003793803701,37.62120111687914,-122.39003793803701,37.62120111687914)

We see that quite a few fields have ᴺᵁᴸᴸ, so we can add some additional conditions to our query to get back more usable data:

SELECT * FROM s3('s3://fsq-os-places-us-east-1/release/dt=2025-04-08/places/parquet/*')
   WHERE address IS NOT NULL AND postcode IS NOT NULL AND instagram IS NOT NULL LIMIT 1
Row 1:
──────
fsq_place_id:        59b2c754b54618784f259654
name:                Villa 722
latitude:            ᴺᵁᴸᴸ
longitude:           ᴺᵁᴸᴸ
address:             Gijzenveldstraat 75
locality:            Zutendaal
region:              Limburg
postcode:            3690
admin_region:        ᴺᵁᴸᴸ
post_town:           ᴺᵁᴸᴸ
po_box:              ᴺᵁᴸᴸ
country:             ᴺᵁᴸᴸ
date_created:        2017-09-08
date_refreshed:      2020-01-25
date_closed:         ᴺᵁᴸᴸ
tel:                 ᴺᵁᴸᴸ
website:             https://www.landal.be
email:               ᴺᵁᴸᴸ
facebook_id:         522698844570949 -- 522.70 trillion
instagram:           landalmooizutendaal
twitter:             landalzdl
fsq_category_ids:    ['56aa371be4b08b9a8d5734e1']
fsq_category_labels: ['Travel and Transportation > Lodging > Vacation Rental']
placemaker_url:      https://foursquare.com/placemakers/review-place/59b2c754b54618784f259654
geom:                ᴺᵁᴸᴸ
bbox:                (NULL,NULL,NULL,NULL)

Run the following query to view the automatically inferred schema of the data using the DESCRIBE:

DESCRIBE s3('s3://fsq-os-places-us-east-1/release/dt=2025-04-08/places/parquet/*')
    ┌─name────────────────┬─type────────────────────────┬
 1. │ fsq_place_id        │ Nullable(String)            │
 2. │ name                │ Nullable(String)            │
 3. │ latitude            │ Nullable(Float64)           │
 4. │ longitude           │ Nullable(Float64)           │
 5. │ address             │ Nullable(String)            │
 6. │ locality            │ Nullable(String)            │
 7. │ region              │ Nullable(String)            │
 8. │ postcode            │ Nullable(String)            │
 9. │ admin_region        │ Nullable(String)            │
10. │ post_town           │ Nullable(String)            │
11. │ po_box              │ Nullable(String)            │
12. │ country             │ Nullable(String)            │
13. │ date_created        │ Nullable(String)            │
14. │ date_refreshed      │ Nullable(String)            │
15. │ date_closed         │ Nullable(String)            │
16. │ tel                 │ Nullable(String)            │
17. │ website             │ Nullable(String)            │
18. │ email               │ Nullable(String)            │
19. │ facebook_id         │ Nullable(Int64)             │
20. │ instagram           │ Nullable(String)            │
21. │ twitter             │ Nullable(String)            │
22. │ fsq_category_ids    │ Array(Nullable(String))     │
23. │ fsq_category_labels │ Array(Nullable(String))     │
24. │ placemaker_url      │ Nullable(String)            │
25. │ geom                │ Nullable(String)            │
26. │ bbox                │ Tuple(                     ↴│
    │                     │↳    xmin Nullable(Float64),↴│
    │                     │↳    ymin Nullable(Float64),↴│
    │                     │↳    xmax Nullable(Float64),↴│
    │                     │↳    ymax Nullable(Float64)) │
    └─────────────────────┴─────────────────────────────┘

Loading the data into ClickHouse

If you'd like to persist the data on disk, you can use clickhouse-server or ClickHouse Cloud.

To create the table, run the following command:

CREATE TABLE foursquare_mercator
(
    fsq_place_id String,
    name String,
    latitude Float64,
    longitude Float64,
    address String,
    locality String,
    region LowCardinality(String),
    postcode LowCardinality(String),
    admin_region LowCardinality(String),
    post_town LowCardinality(String),
    po_box LowCardinality(String),
    country LowCardinality(String),
    date_created Nullable(Date),
    date_refreshed Nullable(Date),
    date_closed Nullable(Date),
    tel String,
    website String,
    email String,
    facebook_id String,
    instagram String,
    twitter String,
    fsq_category_ids Array(String),
    fsq_category_labels Array(String),
    placemaker_url String,
    geom String,
    bbox Tuple(
        xmin Nullable(Float64),
        ymin Nullable(Float64),
        xmax Nullable(Float64),
        ymax Nullable(Float64)
    ),
    category LowCardinality(String) ALIAS fsq_category_labels[1],
    mercator_x UInt32 MATERIALIZED 0xFFFFFFFF * ((longitude + 180) / 360),
    mercator_y UInt32 MATERIALIZED 0xFFFFFFFF * ((1 / 2) - ((log(tan(((latitude + 90) / 360) * pi())) / 2) / pi())),
    INDEX idx_x mercator_x TYPE minmax,
    INDEX idx_y mercator_y TYPE minmax
)
ORDER BY mortonEncode(mercator_x, mercator_y)

Take note of the use of the LowCardinality data type for several columns which changes the internal representation of the data types to be dictionary-encoded. Operating with dictionary encoded data significantly increases the performance of SELECT queries for many applications.

Additionally, two UInt32 MATERIALIZED columns, mercator_x and mercator_y are created that map the lat/lon coordinates to the Web Mercator projection for easier segmentation of the map into tiles:

mercator_x UInt32 MATERIALIZED 0xFFFFFFFF * ((longitude + 180) / 360),
mercator_y UInt32 MATERIALIZED 0xFFFFFFFF * ((1 / 2) - ((log(tan(((latitude + 90) / 360) * pi())) / 2) / pi())),

Let's break down what is happening above for each column.

mercator_x

This column converts a longitude value into an X coordinate in the Mercator projection:

  • longitude + 180 shifts the longitude range from [-180, 180] to [0, 360]
  • Dividing by 360 normalizes this to a value between 0 and 1
  • Multiplying by 0xFFFFFFFF (hex for maximum 32-bit unsigned integer) scales this normalized value to the full range of a 32-bit integer

mercator_y

This column converts a latitude value into a Y coordinate in the Mercator projection:

  • latitude + 90 shifts latitude from [-90, 90] to [0, 180]
  • Dividing by 360 and multiplying by pi() converts to radians for the trigonometric functions
  • The log(tan(...)) part is the core of the Mercator projection formula
  • multiplying by 0xFFFFFFFF scales to the full 32-bit integer range

Specifying MATERIALIZED makes sure that ClickHouse calculates the values for these columns when we INSERT the data, without having to specify these columns (which are not part of the original data schema) in the `INSERT statement.

The table is ordered by mortonEncode(mercator_x, mercator_y) which produces a Z-order space-filling curve of mercator_x, mercator_y in order to significantly improve geospatial query performance. This Z-order curve ordering ensures data is physically organized by spatial proximity:

ORDER BY mortonEncode(mercator_x, mercator_y)

Two minmax indices are also created for faster search:

INDEX idx_x mercator_x TYPE minmax,
INDEX idx_y mercator_y TYPE minmax

As you can see, ClickHouse has absolutely everything you need for real-time mapping applications!

Run the following query to load the data:

INSERT INTO foursquare_mercator 
SELECT * FROM s3('s3://fsq-os-places-us-east-1/release/dt=2025-04-08/places/parquet/*')

Visualizing the data

To see what's possible with this dataset, check out adsb.exposed. adsb.exposed was originally built by co-founder and CTO Alexey Milovidov to visualize ADS-B (Automatic Dependent Surveillance-Broadcast) flight data, which is 1000x times larger. During a company hackathon Alexey added the Foursquare data to the tool.

Some of our favourite visualizations are produced here below for you to enjoy.