An application programming interface (API) is a tool that defines an interface through which a program interacts with a software component: for example, what sorts of requests or calls can be made to it, and how they can be made. Here, we use the term ‘API’ to denote tools created by an open data provider to give access to different subsets of their content. Such APIs facilitate scripted and programmatic extraction of content, as permitted by the API provider 1.
APIs can take many different forms and be of varying quality and usefulness 2. For the purposes of accessing open data from the web, we are specifically talking about RESTful APIs. ‘REST’ stands for Representational State Transfer. These APIs work directly over the web. The computer asking for the data is called the client, and the computer sending the data back is known as the server. This dance is called the request-response cycle 3. These web-based APIs are nice because we can play with them with relative ease in order to understand how they work 2.
Here are some examples of data sets that can be requested through APIs, and some cool projects people have done with them.
Making a direct call with a request, usually in the format of a URL
In the documentation there is an ‘API area’ (which offers some guidance on the types of data available). Scroll down and find the example calls for Stops. In the description you will see that this call returns information on stops for buses or London Underground lines, and there are two examples (bus route 24 and the Bakerloo line).
Let’s take the demonstration URL given for the Bakerloo line:
https://api.tfl.gov.uk/line/bakerloo/stoppoints and paste it into a web browser.
Let’s say instead we want data for the Northern line. What do you think that URL will look like?
Hint/reminder: the URL for the Bakerloo line was: https://api.tfl.gov.uk/line/bakerloo/stoppoints
In R, we can use the fromJSON() function from the jsonlite package and the readLines() function from base R to parse all this information into a data frame (with rows and columns).
So we now need the URL from the exercise above, "https://api.tfl.gov.uk/line/northern/stoppoints", and we use readLines() to make the request and fromJSON() to parse the response.
To keep things focused, let’s request data about stop points on the Northern line, one of the largest and busiest on the network. Note how this is simply an amended version of the example provided by TfL in the API documentation.
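The amendment can be sketched in base R. This is a minimal illustration, assuming the line name is the only part of the URL that changes; the variable names are made up for this example.

```r
# Demonstration URL from the TfL documentation (Bakerloo line)
bakerloo_url <- "https://api.tfl.gov.uk/line/bakerloo/stoppoints"

# Swap the line name to build the request URL for the Northern line
northern_url <- sub("bakerloo", "northern", bakerloo_url, fixed = TRUE)

print(northern_url)
```

The resulting URL is the one used in the request that follows.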
    library(jsonlite)
    api_call <- fromJSON(readLines("https://api.tfl.gov.uk/line/northern/stoppoints"))
    ## Warning in readLines("https://api.tfl.gov.uk/line/northern/stoppoints"):
    ## incomplete final line found on 'https://api.tfl.gov.uk/line/northern/stoppoints'
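This warning is harmless: readLines() raises it when the text it reads does not end with a newline character, as can happen with API responses. The sketch below reproduces the situation offline, assuming a temporary file as a stand-in for the response (the file contents are invented for illustration), and shows the warn argument of readLines() that silences it.

```r
# A temporary file stands in for an API response (an assumption for this
# sketch, so that it runs without an internet connection)
tmp <- tempfile(fileext = ".json")
cat('[{"commonName": "Example Stop"}]', file = tmp)  # cat() adds no final newline

# warn = FALSE suppresses the "incomplete final line" warning
raw_lines <- readLines(tmp, warn = FALSE)

# The text is read intact; the warning only concerns the missing newline
print(raw_lines)

unlink(tmp)
```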
This gives us an object (api_call) which contains all the information returned by the TfL API.
JSON is slightly different to the traditional data frames with rows and columns that we are probably more familiar with, because the data are nested. For instance, api_call is classed as a data frame, but now try viewing the object, for example with View(api_call).
Some of the columns are actually lists, rather than character or factor vectors, which is what we might usually expect. Another way of exploring the api_call object is by applying the class() function to every column in the data frame, for example with sapply() or lapply().
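To see what this looks like without calling the API, here is a small sketch on a toy data frame. The object toy_stops and all of its values are made up for illustration; it only mimics the general shape of api_call, with one ordinary column of each kind and one list column.

```r
# Toy data frame standing in for api_call (names and values are made up)
toy_stops <- data.frame(
  commonName = c("Stop A", "Stop B"),
  lat = c(51.50, 51.60),
  stringsAsFactors = FALSE
)

# A list column: each row holds a vector of a different length
toy_stops$lines <- list("northern", c("northern", "bakerloo"))

# A data frame is a list of columns, so sapply() applies class() to each one
col_classes <- sapply(toy_stops, class)
print(col_classes)
```

Any column reported as "list" is nested and needs extra wrangling before it behaves like an ordinary column.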
This demonstrates an important challenge faced by researchers when using open data, because dealing with data in this format can be messy and complicated. It is not always a neatly formatted data frame like a CSV file. However, with some data wrangling it is possible to extract the elements you need and move on to do some cool things. For example:
    # load packages
    library(dplyr)
    library(sf)
    library(ggplot2)

    # transform api_call object by selecting name and coordinates and projecting to British National Grid
    tfl_north_sf <- api_call %>%
      select(commonName, lat, lon) %>%
      st_as_sf(coords = c(x = "lon", y = "lat"), crs = 4326) %>%
      st_transform(27700)

    # plot the object with ggplot2 + sf
    ggplot(tfl_north_sf) +
      geom_sf()
There we have it: with just a few lines of code in R, we have queried the TfL API, a public sector source of open data, and plotted stations on the Northern underground line.
In some cases you don’t have to request directly by altering a string like the URL above, but can specify the parameters of your request using a wrapper. This is called a wrapper because it ‘wraps’ around the API to make it a neater, more usable way to acquire data. In this way, wrappers may remove (or at least lower) many obstacles to accessing open data.
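The idea can be illustrated with a very small wrapper around the TfL call used earlier. This is only a sketch: get_line_stops() and its fetch argument are hypothetical names invented here, not part of any official package, and the function assumes the jsonlite package is installed when data are actually requested.

```r
# Hypothetical wrapper around the TfL stop points call (not an official package)
get_line_stops <- function(line_id, fetch = TRUE) {
  # Build the request URL from the line name, so the user never edits a string
  url <- paste0("https://api.tfl.gov.uk/line/", tolower(line_id), "/stoppoints")

  # fetch = FALSE returns the URL only, which is handy for checking the request
  if (!fetch) {
    return(url)
  }

  # Needs jsonlite and an internet connection; fromJSON() accepts a URL
  jsonlite::fromJSON(url)
}

# Inspect the request that would be sent, without contacting the API
print(get_line_stops("Northern", fetch = FALSE))
```

With an internet connection, get_line_stops("northern") would return the parsed response directly. The user specifies a parameter (the line) rather than writing the URL by hand, which is what wrappers do on a larger scale.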
In this case we will look first at a wrapper that provides a web-based graphical user interface (GUI) for accessing the API in question. Specifically, we will explore the API for Open Street Map.
To demonstrate wrappers, we will access data from Open Street Map, a database of geospatial information built by a community of mappers, enthusiasts and members of the public, who contribute and maintain data about all sorts of environmental features, such as roads, green spaces, restaurants and railway stations, amongst many other things, all over the world. As such, it is a prime example of ‘crowdsourced’ open data. You can view the information contributed to Open Street Map using their online mapping platform (https://www.openstreetmap.org/). The result of people’s contributions is a database of spatial information rich in local knowledge which provides invaluable information about places and their features, without being subject to strict terms on usage.
Open Street Map (OSM) is currently on API version 0.6, originally deployed 17-21 April 2009. The API is currently accessible using the following URL: https://api.openstreetmap.org/. Much like the TfL API, which we could query without having to create any sort of login, we can query OSM data without authentication. However, all calls to the API which update, create or delete data have to be made by an authenticated and authorized user.
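As a rough sketch of what a read-only request looks like, version 0.6 of the API offers a map call that takes a bounding box in the order left, bottom, right, top. The coordinates below are arbitrary values chosen for illustration, and the variable names are made up; check the OSM documentation before relying on the exact form of the call.

```r
# Bounding box in the order the map call expects: left, bottom, right, top
# (the coordinates are arbitrary values chosen for illustration)
bbox <- c(left = -0.13, bottom = 51.50, right = -0.12, top = 51.51)

# Build the request URL; no authentication is needed for reading data
osm_url <- paste0(
  "https://api.openstreetmap.org/api/0.6/map?bbox=",
  paste(bbox, collapse = ",")
)

print(osm_url)
```

Writing such URLs by hand quickly becomes tedious, which is one reason wrappers are popular.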
To read more about the details of the OSM API see the documentation.
Open Street Map has two types of wrappers available for its API: a web-based GUI called Overpass Turbo (https://overpass-turbo.eu/), and an R package called osmdata. We start with Overpass Turbo.
When you open the link it will give you an example: