The next recipe in our series on the socioeconomic impacts of COVID-19 is for a tried and true visualization: the line chart. There’s a reason it’s been filling newspapers and screens throughout the pandemic. It’s familiar, intuitive, and immediately meaningful for almost any audience. That’s especially useful when presenting data in a format that people may not be familiar with. For this visualization, we will be using Google’s Community Mobility data to create a line chart that shows how much time people are spending at different types of locations.
The Data
Google captures movement trends across six different place categories: retail & recreation areas, groceries & pharmacies, parks, transit stations, workplaces, and residential areas. The data has been collected anonymously from Google users who have chosen to turn on Location History and is only presented as an aggregate. Unlike other data, these mobility data are indexed, meaning that the raw data has been normalized around a common starting point -- in this case, the median value for the corresponding day of the week during the five-week period from January 3 to February 6, 2020. That period serves as the zero point for seeing how mobility has changed over the past year relative to that baseline. This allows us to compare different types of mobility data meaningfully, according to the percent change instead of the absolute numbers.
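To make the idea of indexed data concrete, here is a minimal sketch of how a percent change from a baseline can be calculated. The visit counts are made up for illustration, and taking the baseline as the median of a handful of reference days is an assumption about how such an index could be built, not Google's actual pipeline.

```python
from statistics import median

def percent_change_from_baseline(value, baseline):
    """Return how far value sits above or below baseline, in percent."""
    return (value - baseline) * 100.0 / baseline

# Hypothetical visit counts for the same weekday during the baseline period.
baseline_visits = [980, 1000, 1020, 1010, 990]
baseline = median(baseline_visits)

# Hypothetical later observations for that weekday.
print(percent_change_from_baseline(1050, baseline))  # above the baseline
print(percent_change_from_baseline(260, baseline))   # far below the baseline
```

A result of 0 means "the same as the baseline period," a positive number means more activity, and a negative number means less. That is exactly how the values in the mobility CSV should be read.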
We need to understand the data before proceeding to the extraction and cleaning steps. Generally, data sources provide codebooks and metadata that describe the contents and layout of the data file. It is crucial to consult these codebooks to know clearly what the data represent, the units and context behind them, and how to make sense of the datasets to create a meaningful visualization. Fortunately, Google provides guidance about these mobility datasets. Go to the links below to get yourself acquainted with this dataset:
In the line chart we are going to create, the X-axis is specific dates and the Y-axis is the percent change from the baseline. On March 1st, for example, the number of visitors to retail and recreation locations (more on this later) was actually 5% higher than the baseline period, so the blue line is above the black baseline horizontal line. The lowest point is April 15th, when the number of visitors was 74% less than the baseline period. An upward spike means that more people are spending time in retail and recreation locations, and a downward spike or valley means fewer people are spending time in retail and recreation locations. These data are a perfect example of how important it is, as data users, to take the time to understand what the data actually mean instead of making assumptions based on what types of data we’re used to working with.
It’s important to note what the data does and doesn’t measure. Because of the way it’s captured, it can’t be interpreted solely as changes in number of visitors or as changes in duration of visits -- both influence the data. It also doesn’t account for other contexts that might influence how people spend their time, such as changing seasons, holidays, et cetera. Again, the sample is smartphone users who have turned on Google’s Location History, so it is difficult to know how accurate a representation it is of the entire population. Like true chefs, we need to take our ingredients with a few grains of salt.
A meal is only as good as its ingredients, and for this recipe we have to start by finding the exact ones we need. Fortunately, the Google Community Mobility data is easily and publicly available as a raw CSV file.
For this visualization we’re going to use Myanmar as our case study, but the recipe can very easily be used for any other country in the world.
When you open it, your Excel window should look like this. If you scroll down, you’ll start to see rows with “Mandalay Metropolitan Area” or “Yangon Metropolitan Area” in the metro_area column. These are the only regions in Myanmar with sufficient data that they can be disaggregated without compromising anonymity. Other countries may have a much higher number of regions available.
For this recipe, we want to look at Myanmar as a whole, not single out or compare specific regions. The rows that are blank under the metro_area column have the data we want, so we need to delete the rows that say Mandalay or Yangon to avoid counting them twice. I’ll be talking through how to do this manually, but if you know how to use filters in Excel you can use them to show every row that isn’t [blank] under metro_area and then delete those rows.
Now, all you have to do is click delete on your keyboard and those rows will be cleared, leaving only the rows we wanted. Save your Excel sheet and you’re all done!
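If you prefer a script to Excel, the same cleanup can be sketched in a few lines of Python using only the standard library. This is an optional alternative, not part of the original recipe. The sample rows below are made up and stand in for the downloaded CSV file, and the function name is our own.

```python
import csv
import io

def keep_country_level_rows(rows):
    """Keep only rows whose metro_area cell is blank (country-wide figures)."""
    return [row for row in rows if not (row.get("metro_area") or "").strip()]

# Tiny made-up sample standing in for the downloaded CSV file.
sample = io.StringIO(
    "country_region,metro_area,date,retail_and_recreation_percent_change_from_baseline\n"
    "Myanmar,,2020-03-01,5\n"
    "Myanmar,Yangon Metropolitan Area,2020-03-01,3\n"
    "Myanmar,Mandalay Metropolitan Area,2020-03-01,4\n"
)

rows = list(csv.DictReader(sample))
national = keep_country_level_rows(rows)
print(len(rows), "rows before,", len(national), "rows after")
```

To run this on the real file, you would replace the sample with the file opened via `open(...)` under whatever name you saved it, then write the remaining rows back out with `csv.DictWriter`.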
Now that we have our CSV file, we can import it into Datawrapper and start putting together our visualization.
Datawrapper will automatically send you to the next page, with the heading “Check & Describe.” This page will give us the opportunity to edit how Datawrapper interprets and uses the data.
Now you’ll need to repeat this step for every column we don’t want.
country_region_code,
country_region,
sub_region_1,
sub_region_2,
metro_area,
iso_3166_2_code, and
census_fips_code.
The remaining columns are date, which we definitely want, and then all six of our location types. Each of these location types will appear as its own line on our line chart.
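Hiding columns in Datawrapper is all this recipe requires, but if you would rather remove them before uploading, the step can be sketched in Python as well. The column names in the set below are the seven listed above. The sample row and the function name are made up for illustration.

```python
# The seven columns named above that we do not want in the chart.
UNWANTED = {
    "country_region_code",
    "country_region",
    "sub_region_1",
    "sub_region_2",
    "metro_area",
    "iso_3166_2_code",
    "census_fips_code",
}

def drop_columns(rows, unwanted=UNWANTED):
    """Return copies of the rows without the unwanted columns."""
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]

# Made-up row shaped like one line of the CSV (only one location type shown).
sample_row = {
    "country_region_code": "MM",
    "country_region": "Myanmar",
    "sub_region_1": "",
    "sub_region_2": "",
    "metro_area": "",
    "iso_3166_2_code": "",
    "census_fips_code": "",
    "date": "2020-03-01",
    "retail_and_recreation_percent_change_from_baseline": "5",
}

cleaned = drop_columns([sample_row])
print(list(cleaned[0]))
```

Unlike "Hide column" in Datawrapper, this removes the columns from the data itself, so keep a copy of the original file if you think you may want them back.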
For most audiences, however, this is too cluttered and confusing. For simplicity, we could try drilling down to those areas that we feel would best reflect changes in behavior caused by policy. Out of these location types, I would say that retail, workplaces, and grocery would be best suited.
This chart is definitely easier to understand, but still has a lot going on. More importantly, these data don’t seem to have any kind of meaningful relationship to each other. There’s no clear reason to present them together, other than to have as much information crammed in as possible. Furthermore, we can see that each line has dips and spikes independent of the other lines, suggesting the possibility of confounding variables.
For example, the amount of time spent in workplaces cycles through clear peaks and valleys that aren’t reflected in the other location types. On closer examination, you can see that this lines up with workdays/weekends and working holidays. The spikes upwards represent weekdays when people are spending more time at work, and the deepest spikes down are public holidays when very few people are at work. This is interesting but doesn’t really tell a story relevant to COVID, and could distract readers from what’s really important.
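If you want to check a weekly rhythm like this for yourself rather than eyeballing the chart, one rough approach is to average the values by day of the week. The sketch below uses made-up workplace numbers chosen to mimic the pattern described above. The function name is ours and this check is not part of the original recipe.

```python
from collections import defaultdict
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def mean_by_weekday(observations):
    """Average the values per weekday from (ISO date string, value) pairs."""
    buckets = defaultdict(list)
    for iso_date, value in observations:
        day = DAY_NAMES[date.fromisoformat(iso_date).weekday()]
        buckets[day].append(value)
    return {day: sum(vals) / len(vals) for day, vals in buckets.items()}

# Made-up workplace percent changes, for illustration only.
observations = [
    ("2020-06-01", -10),  # Monday
    ("2020-06-02", -12),  # Tuesday
    ("2020-06-06", -30),  # Saturday
    ("2020-06-07", -34),  # Sunday
    ("2020-06-08", -14),  # Monday
    ("2020-06-13", -32),  # Saturday
]

means = mean_by_weekday(observations)
for day in DAY_NAMES:
    if day in means:
        print(day, means[day])
```

If the weekday averages differ sharply from the weekend averages, the peaks and valleys in the line are mostly a calendar effect rather than a response to any particular event.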
The Google Community Mobility data is very detailed. Even just one line is rich with information and insight. For this recipe, we’ve decided to focus on only one line in order to tell as clear and persuasive a story as possible. Of all the options, Retail & Recreation is best able to capture movement that is purely discretionary and less likely to be influenced by outside factors irrelevant to COVID. For that reason, we’ll be including Retail & Recreation as the sole location type for this visualization. However, changing the number of location types you include is as easy as clicking and unclicking “Hide column” in the “Check & Describe” tab on Datawrapper. Feel free to play around and figure out what works best for your purposes and audience.
The column name will appear in our visualization as the variable name, so we also need to go through and rename our columns of interest.
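Renaming can also be done before upload. Here is a small sketch that maps the long column names used in Google's CSV to reader-friendly labels. The labels on the right are our own suggestions based on the six place categories, so swap in whatever wording suits your audience.

```python
# Long CSV column names mapped to suggested display labels (labels are our choice).
NEW_NAMES = {
    "retail_and_recreation_percent_change_from_baseline": "Retail & Recreation",
    "grocery_and_pharmacy_percent_change_from_baseline": "Grocery & Pharmacy",
    "parks_percent_change_from_baseline": "Parks",
    "transit_stations_percent_change_from_baseline": "Transit Stations",
    "workplaces_percent_change_from_baseline": "Workplaces",
    "residential_percent_change_from_baseline": "Residential",
}

def rename_columns(row, new_names=NEW_NAMES):
    """Return a copy of the row with any known column names replaced."""
    return {new_names.get(key, key): value for key, value in row.items()}

# Made-up row for illustration.
row = {
    "date": "2020-03-01",
    "retail_and_recreation_percent_change_from_baseline": "5",
}
print(rename_columns(row))
```

Columns that are not in the mapping, such as date, pass through unchanged.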
Once you’ve done this, your dataset is completely cleaned and formatted and you’re ready to start visualizing!
Fortunately, Datawrapper does all the hard work of creating our visualization and all we’ll have to do is make some slight formatting edits.
The first thing we want to edit is the format of our dates, which currently only show up as a year on our horizontal axis.
Now, the lines and axes are formatted exactly how we want them. We could stop here -- but for even more information and context, we’re now going to add annotations to our chart to help explain the trends they reveal. Click “proceed” at the bottom of the page to go to the annotation page.
The line itself is interesting, but adding further context makes it even more insightful. The best way to add that context is with annotations.
We can start by adding the basics: a title, data source citation, and byline.
These are helpful, of course, but still not what I meant when I talked about the importance of annotations for this recipe. To tell the full story, we need to embed contextual information into the visualization itself to guide the reader’s interpretation of the data.
Datawrapper has two kinds of annotations: text annotations and range highlights. Let’s figure out when and how to use each one.
Again, this works for some of the context we want to provide, but not for everything. Singular events can’t really be represented. Now that we’ve seen and understand the limitations of each annotation type, we’re ready to create a visualization that uses both in a way that’s balanced and intentional.
We’re going to start by adding our text annotations. For reference, all of the information we use to annotate is cited at the bottom of this recipe. Our methodology was to choose significant moments without limiting ourselves to a particular genre, like landmark case numbers or policy interventions. Instead, we noticed changes in the lines themselves and then found and included the events that probably caused those spikes or dips. If we wanted to tell a specific story, then we’d choose events or context that corresponded to that story.
Once it’s been added to the chart, you’ll be able to customize it.
Once you’ve added these and adjusted them to your liking, it should look something like this:
Next we’re going to add a few range highlights. Scroll down and click “Add range highlight.”
Once you’ve added your first highlight, it should look like this.
Afterwards, your chart should look like this:
Datawrapper doesn’t have a built-in way to add data labels to range highlights, so we’re going to go back to the Text Annotation section to add labels.
Once you’ve added those, you’re all done with the chart -- well done!
The hard work is done, and now it’s time to publish your chart in a format that makes sense for your purposes.
This will generate a link to a webpage of your visualization, as well as an embed code you can use to integrate it into the webpage of an article or other medium.
If you want to print your visualization or just save it as a file on your desktop, you won’t use these options.
“Duplicate” is handy if you like this version of the chart, but would like to play around with it without losing the current version.
This recipe is a little lengthy, but it’s one that can be adapted to all kinds of different cuisines. Perhaps most importantly, you should now also be confident with a new kind of ingredient (indexed data) and able to evaluate the best way to present that ingredient for different palates. If you would like to explore how the same ingredients are used to create different meals, look at the articles listed below. These news stories use mobility data, which may come from Google or from other reliable local firms. You will find it insightful to see how mobility data are presented in different ways to craft these stories.
Now that you’ve cooked along with us, you can riff on this recipe in new ways -- maybe using multiple lines to tell more complex stories, or highlighting values as opposed to timeframes with the range highlights.