Visualizing the Impact of COVID-19 and Policymaking on Mobility

Stats

Ingredients

Google’s Community Mobility data

Tools

Datawrapper, Excel

Introduction

The Data

Google captures movement trends across six place categories: retail & recreation areas, groceries & pharmacies, parks, transit stations, workplaces, and residential areas. The data is collected anonymously from Google users who have turned on the Location History setting and is only presented in aggregate. Unlike most other data, these mobility data are indexed, meaning that the raw data has been normalized around a common starting point -- in this case, the median value for the same day of the week during the five-week period from January 3 to February 6, 2020. That baseline serves as the zero point against which we can see how mobility has changed over the past year. This allows us to compare different types of mobility data meaningfully, using the percent change instead of the absolute numbers.
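To make the idea of indexed data concrete, here is a minimal sketch of how a percent change from baseline can be computed. The visit counts are made up (Google does not publish raw counts), and the sketch assumes the baseline is the median of several baseline-period values for the same weekday.

```python
from statistics import median

def percent_change_from_baseline(value, baseline_values):
    """Return the percent change of `value` relative to the median baseline."""
    baseline = median(baseline_values)
    return (value - baseline) * 100 / baseline

# Hypothetical visit counts for five baseline Mondays (illustration only).
baseline_mondays = [980, 1000, 1010, 1000, 1020]

# A later Monday with 260 visits sits 74% below the baseline of 1000.
print(percent_change_from_baseline(260, baseline_mondays))   # -74.0
print(percent_change_from_baseline(1050, baseline_mondays))  # 5.0
```

Because every category is expressed this way, a value of -74 for parks and -74 for workplaces mean the same relative drop, even if the underlying counts are very different.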

We need to understand the data before proceeding to the extraction and cleaning part. Generally, data sources provide codebooks and metadata that describe the contents and layout of the data file. It is crucial to read these codebooks so that you know what the data represent, the units and context behind them, and how to make sense of the datasets in order to create a meaningful visualization. Fortunately, Google provides guidance about its mobility datasets. Go to the links below to get yourself acquainted with this dataset:

Overview

Understand the data

In the line chart we are going to create, the X-axis shows dates and the Y-axis shows the percent change from the baseline. On March 1st, for example, the value for retail and recreation locations (more on this later) was 5% higher than the baseline, so the blue line is above the black horizontal baseline. The lowest point is April 15th, when the value was 74% lower than the baseline. An upward spike means that more people are spending time in retail and recreation locations, and a downward spike or valley means fewer people are spending time there. These data are a perfect example of how important it is, as data users, to take the time to understand what the data actually mean instead of making assumptions based on what types of data we’re used to working with.
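If it helps to see the same reading done in code, here is a small sketch that finds the lowest point in a series and translates it back into a share of the baseline. Two of the values are the ones quoted above; the middle one is hypothetical and only there to give the series some shape.

```python
from datetime import date

# (date, percent change from baseline) pairs.
series = [
    (date(2020, 3, 1), 5),     # from the text: 5% above baseline
    (date(2020, 4, 1), -40),   # hypothetical, for illustration only
    (date(2020, 4, 15), -74),  # from the text: the lowest point
]

# Pick the pair with the smallest percent change.
lowest_day, lowest_value = min(series, key=lambda point: point[1])

# A value of -74 means activity was at 100 - 74 = 26% of the baseline level.
share_of_baseline = 100 + lowest_value
print(lowest_day.isoformat(), lowest_value, share_of_baseline)  # 2020-04-15 -74 26
```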

It’s important to note what the data does and doesn’t measure. Because of the way it’s captured, it can’t be interpreted solely as changes in number of visitors or as changes in duration of visits -- both influence the data. It also doesn’t account for other contexts that might influence how people spend their time, such as changing seasons, holidays, et cetera. Again, the sample is smartphone users who have and use Google location sharing, so it is difficult to know how accurate a representation it is of the entire population. Like true chefs, we need to take our ingredients with a few grains of salt.

Importing the Data

A meal is only as good as its ingredients, and for this recipe we have to start by finding the exact ones we need. Fortunately, the Google Community Mobility data is easily and publicly available as a raw CSV file.

1.1) Sourcing Data

Go to the Google Community Mobility Report website: https://www.google.com/covid19/mobility/.

Click “Region CSVs” to start downloading all of the country data as a zipped file. Once it has been downloaded, you can find your country of interest using its two-letter country code. For example, Myanmar is under “2020_MM_Region_Mobility_Report.csv”, Cambodia is under “2020_KH_Region_Mobility_Report.csv”, and so forth.
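If you would rather not scroll through the zipped folder, the lookup can be sketched in a few lines of Python. This assumes the files follow the naming pattern shown above; the zip file name in the usage comment is a guess, so use whatever your download is actually called.

```python
import zipfile

def find_country_csv(zip_path, country_code, year=2020):
    """Return the name of the country's CSV inside the downloaded zip, or None."""
    wanted = f"{year}_{country_code.upper()}_Region_Mobility_Report.csv"
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Compare only the last path component in case files sit in a folder.
            if name.split("/")[-1] == wanted:
                return name
    return None

# Example usage (hypothetical zip file name):
# find_country_csv("Region_Mobility_Report_CSVs.zip", "MM")
```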

For this visualization we’re going to use Myanmar as our case study, but the recipe can very easily be used for any other country in the world.

Double-click the file name and it will open in Excel.

When you open it, your Excel window should look like this. If you scroll down, you’ll start to see rows with “Mandalay Metropolitan Area” or “Yangon Metropolitan Area” in the metro_area column. These are the only regions in Myanmar with sufficient data that they can be disaggregated without compromising anonymity. Other countries may have a much higher number of regions available.

For this recipe, we want to look at Myanmar as a whole, not single out or compare specific regions. The rows that are blank under the metro_area column have the data we want, so we need to delete the rows that say Mandalay or Yangon to avoid counting them twice. I’ll walk through how to do this manually, but if you know how to use filters in Excel, you can filter for every row that isn’t [blank] under metro_area and then delete those rows.

Scroll down to where the rows with “Mandalay Metropolitan Area” start to appear, and click the row number on the far left of the first row for Mandalay.

Now, scroll all the way down until the very last populated row in the sheet (which should say “Yangon Metropolitan Area” under metro_area). Hold down the shift key on your keyboard and click the row number on the far left of that row. Now, every single row that isn’t blank under metro_area should be highlighted.

Now, all you have to do is press Delete on your keyboard and those rows will be cleared, leaving only the rows we wanted. Save your Excel sheet (keep it in CSV format) and you’re all done!
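For readers who prefer a script to scrolling, the same clean-up can be sketched with Python's built-in csv module. The sample rows below are made up and use a shortened "retail" column name. For Myanmar only metro_area needs checking; for other countries you may also need to pass columns such as sub_region_1, which is an assumption to verify against your own file.

```python
import csv
import io

def keep_country_level_rows(rows, region_columns=("metro_area",)):
    """Keep rows whose region columns are all blank (the national figures)."""
    return [
        row for row in rows
        if all(not (row.get(col) or "").strip() for col in region_columns)
    ]

# Tiny made-up sample in the same shape as the real file.
sample = io.StringIO(
    "country_region_code,metro_area,date,retail\n"
    "MM,,2020-03-01,5\n"
    "MM,Mandalay Metropolitan Area,2020-03-01,3\n"
    "MM,Yangon Metropolitan Area,2020-03-01,4\n"
)
national = keep_country_level_rows(csv.DictReader(sample))
print(len(national), national[0]["date"])  # 1 2020-03-01
```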

1.2) Importing Data

Now that we have our CSV file, we can import it into Datawrapper and start putting together our visualization.

Go to this page to get started: https://www.datawrapper.de/.
Click “Start Creating” and you’ll be able to start a new chart without creating an account, or you can log in if you’ve worked with Datawrapper before.

This should bring you to the first page of chart creation, where you can upload your dataset. Click “XLS/CSV upload.”

This will bring up your files so you can find and select the one with our data, called 2020_MM_Region_Mobility_Report.csv (unless you renamed it after making edits). My finder automatically pulled up my most recent folder, which was the Google Mobility data folder, but you may have to search for the file if that doesn’t happen automatically for you.
Once you’ve selected your file, click “Open.”

Datawrapper will automatically send you to the next page, with the heading “Check & Describe.” This page will give us the opportunity to edit how Datawrapper interprets and uses the data.

Make sure that “First row as label” has a check mark next to it, which should happen automatically.

Datawrapper will try to use any data we give it in the visualization. We don’t want that for the blank or redundant columns, like country_region or sub_region_1. Thankfully, Datawrapper makes it easy to tell it which columns to ignore. Start by clicking “A” above country_region_code to select that column.

This will bring up options to edit the column. Click “Hide column from visualization,” and that column will appear greyed out.

Now you’ll need to repeat this step for every column we don’t want.

Hide the following columns:

country_region_code,

country_region,

sub_region_1,

sub_region_2,

metro_area,

iso_3166_2_code, and

census_fips_code.

The remaining columns are date, which we definitely want, and then all 6 of our location types. Each one of these location types will appear as its own line on our line chart.
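Hiding columns in Datawrapper is the point-and-click version of dropping them from the data. As a rough sketch, assuming the rows have been read into dictionaries keyed by column name, the same selection could look like this (the sample row is made up and uses a shortened "retail" column name):

```python
# The columns listed above that we hide from the visualization.
HIDDEN_COLUMNS = {
    "country_region_code", "country_region", "sub_region_1", "sub_region_2",
    "metro_area", "iso_3166_2_code", "census_fips_code",
}

def drop_hidden_columns(rows, hidden=HIDDEN_COLUMNS):
    """Return copies of the row dicts without the hidden columns."""
    return [{k: v for k, v in row.items() if k not in hidden} for row in rows]

# Made-up row for illustration.
row = {"country_region_code": "MM", "country_region": "Myanmar",
       "metro_area": "", "date": "2020-03-01", "retail": "5"}
print(drop_hidden_columns([row]))  # [{'date': '2020-03-01', 'retail': '5'}]
```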

For most audiences, however, this is too cluttered and confusing. For simplicity, we could try drilling down to those areas that we feel would best reflect changes in behavior caused by policy. Out of these six, I would say that retail, workplaces, and grocery would be best suited.

This chart is definitely easier to understand, but still has a lot going on. More importantly, these data don’t seem to have any kind of meaningful relationship to each other. There’s no clear reason to present them together, other than to have as much information crammed in as possible. Furthermore, we can see that each line has dips and spikes independent of the other lines, suggesting the possibility of confounding variables.

For example, the amount of time spent in workplaces cycles through clear peaks and valleys that aren’t reflected in the other location types. On closer examination, you can see that this lines up with workdays/weekends and working holidays. The spikes upwards represent weekdays when people are spending more time at work, and the deepest spikes down are public holidays when very few people are at work. This is interesting but doesn’t really tell a story relevant to COVID, and could distract readers from what’s really important.
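One quick way to check whether a line has this kind of weekly rhythm is to average its values by day of the week. The sketch below uses invented workplace values (weekdays at -10, weekends at -30) purely to show the mechanics; it is not taken from the real file.

```python
from collections import defaultdict
from datetime import date, timedelta

def mean_by_weekday(points):
    """Average values per weekday index (0 = Monday ... 6 = Sunday)."""
    buckets = defaultdict(list)
    for day, value in points:
        buckets[day.weekday()].append(value)
    return {index: sum(vals) / len(vals) for index, vals in buckets.items()}

# Hypothetical two weeks of workplace values, starting on a Monday.
start = date(2020, 6, 1)
points = [
    (start + timedelta(days=i), -30 if (i % 7) >= 5 else -10)
    for i in range(14)
]
averages = mean_by_weekday(points)
print(averages[0], averages[6])  # -10.0 -30.0
```

If the weekday averages differ sharply from the weekend averages, the peaks and valleys are mostly a calendar effect rather than a response to COVID.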

The Google Community Mobility data is very detailed. Even just one line is rich with information and insight. For this recipe, we’ve decided to focus on only one line in order to tell as clear and persuasive a story as possible. Of all the options, Retail & Recreation is best able to capture movement that is purely discretionary and less likely to be influenced by outside factors irrelevant to COVID. For that reason, we’ll be including Retail & Recreation as the sole location type for this visualization. However, changing the number of location types you include is as easy as clicking and unclicking “Hide column” in the “Check & Describe” tab on Datawrapper. Feel free to play around and figure out what works best for your purposes and audience.

To replicate our chart, go through and hide every location column other than Retail & Recreation (grocery & pharmacy, parks, transit stations, workplaces, and residential).

The column name will appear in our visualization as the variable name, so we also need to go through and rename our columns of interest.

Double-click the first row/header to bring up the text edit box and type the name you want.
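If you prepare the file in a script instead, renaming is a simple lookup on the header row. The long column name below is what I believe the Google CSV uses, but treat it as an assumption and check it against the first row of your own file.

```python
RENAMES = {
    # Assumed original header; verify against your file before relying on it.
    "retail_and_recreation_percent_change_from_baseline": "Retail & Recreation",
}

def rename_headers(header, renames=RENAMES):
    """Return the header list with any known column names replaced."""
    return [renames.get(name, name) for name in header]

print(rename_headers(["date", "retail_and_recreation_percent_change_from_baseline"]))
# ['date', 'Retail & Recreation']
```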

Once you’ve done this, your dataset is completely cleaned and formatted and you’re ready to start visualizing!

Click “Proceed” to go to the next page.

Creating Line Chart

Fortunately, Datawrapper does all the hard work of creating our visualization and all we’ll have to do is make some slight formatting edits.

It should automatically select Lines as the chart type, but if not, all you need to do is click “Lines.” Click the “Refine” tab to go to the next page.

The first thing we want to edit is the format of our dates, which currently only show up as a year on our horizontal axis.

Click the dropdown menu next to “tick format” and select “(custom)”.

A text box will open up for you to enter your format. For the dates to show as “Jan 01,” “Feb 01,” etc., you can enter “MMM DD” into that text box. This will change the date formatting both on our horizontal axis, and on our tooltips (the informational popup that appears when you hover over a certain point on a line).
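If you are more used to date formats in code, Python's "%b %d" is a rough counterpart of the “MMM DD” pattern. The sketch below is only a way to preview what the labels will look like, not how Datawrapper formats them internally, and `format_tick` is a helper name made up for this example.

```python
from datetime import date

def format_tick(day):
    """Format a date as abbreviated month plus zero-padded day."""
    return day.strftime("%b %d")

for day in (date(2020, 1, 1), date(2020, 2, 1), date(2020, 3, 24)):
    print(format_tick(day))
```

With Python's default (English) locale settings this prints Jan 01, Feb 01, and Mar 24.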

Now, scroll down until you see the header “Labelling” on the left side of your screen. Because we only have one line, there isn’t really a need to label it. Click “none” to hide it.

Now, the lines and axes are formatted exactly how we want them. We could stop here -- but for even more information and context, we’re now going to add annotations to our chart to help explain the trends they reveal. Click “proceed” at the bottom of the page to go to the annotation page.

Annotating Line Chart

The line itself is interesting, but adding further context makes it even more insightful. The best way to add that context is with annotations.

3.1) Basic Annotations

We can start by adding the basics: a title, data source citation, and byline.

Type the title you want in the textbox under “Title.” According to Google, they want to be cited as “Google LLC” with a link to the data (https://www.google.com/covid19/mobility/), so we can fill those in under “Data source” and “Link to data source” respectively.
Finally, put your own name or the name of your organization under “Byline.”

These are helpful, of course, but still not what I meant when I talked about the importance of annotations for this recipe. To tell the full story, we need to embed contextual information into the visualization itself to guide the reader’s interpretation of the data.

Datawrapper has two kinds of annotations: text annotations and range highlights. Let’s figure out when and how to use each one.

Range highlights work for some of the context we want to provide, but not for everything: singular events can’t really be represented. Now that we’ve seen and understand the limitations of each annotation type, we’re ready to create a visualization that uses both in a way that’s balanced and intentional.

3.2) Add Text Annotations

We’re going to start by adding our text annotations. For reference, all of the information we use to annotate is cited at the bottom of this recipe. Our methodology was to choose significant moments without limiting ourselves to a particular genre, like landmark case numbers or policy interventions. Instead, we noticed changes in the lines themselves and then found and included the events that probably caused those spikes or dips. If we wanted to tell a specific story, then we’d choose events or context that corresponded to that story.

To add a text annotation, click “Add text annotation” and then click where on the chart you’d like it to appear. Feel free to estimate here -- once you’ve added it you’ll have the option to enter the exact date or other value where you’d like it to appear.

Our first annotation will be March 24th, when the first case was confirmed in Myanmar. Click “Add text annotation” to get started.

Then, click wherever on the chart you want the annotation to appear. Don’t worry too much about this -- you’ll be able to manually adjust it later with the exact date value.

Once it’s been added to the chart, you’ll be able to customize it.

Start by typing the text you want to appear in the “Text” box.
To maximize the amount of text that would fit in the chart, we reduced the text size to 10 px.
Where it says X next to “Position,” you can input the exact date to use as the X-axis position marker. In this case, input “2020/03/24.” Now that it’s positioned horizontally exactly how we want it, you can move it up and down according to what you think looks best. You can also copy the exact values from the screenshot below to get the same look as our visualization.
To make sure the text fits perfectly, type in 8.22% for Width.

Finally, click the straight line option for “Line end” to reduce visual clutter.

This process can be repeated for all the text annotations we have included, and any that you would like to add. The table below includes the text and position information for each data point we chose. I recommend clicking “Duplicate annotation” on your first annotation to preserve the formatting edits you’ve already made, and then inputting the information from the table into the Datawrapper interface.

Once you’ve added these and adjusted them to your liking, it should look something like this:

3.3) Add Range Highlights

Next we’re going to add a few range highlights. Scroll down and click “Add range highlight.”

Again, you’ll have to add the range highlight to the chart by clicking directly where you want it, but once you’ve done that you’ll be able to edit it manually. Click somewhere around early April.

Initially, it will appear as a horizontal line. Click the box with a vertical line next to “Orientation,” and then click “Range.”
Once you’ve created your vertical range highlight, edit the textbox next to “Position Start:” to read 2020/04/10, and “End:” to 2020/04/18.
We also increased the Opacity to 25% using the slider, which can also be done manually by typing in the value.
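Before typing positions into Datawrapper, it can be handy to sanity-check them. The sketch below parses dates written as YYYY/MM/DD, the format used in this recipe, and reports how many days a range highlight covers. The helper names are invented for this example.

```python
from datetime import datetime

def parse_position(text):
    """Parse a 'YYYY/MM/DD' position string into a date."""
    return datetime.strptime(text, "%Y/%m/%d").date()

def range_length_days(start_text, end_text):
    """Return the inclusive number of days covered by a range highlight."""
    start, end = parse_position(start_text), parse_position(end_text)
    if end < start:
        raise ValueError("range end is before range start")
    return (end - start).days + 1

print(range_length_days("2020/04/10", "2020/04/18"))  # 9
```

The same `parse_position` check works for text annotation positions such as 2020/03/24, and a mistyped date will raise an error instead of silently landing in the wrong place.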

Once you’ve added your first highlight, it should look like this.

Next, repeat the process using the values in the table below for the remaining two highlights. Remember to fix the orientation, type, and opacity.

Afterwards, your chart should look like this:

Datawrapper doesn’t have a built-in way to add data labels to range highlights, so we’re going to go back to the Text Annotation section to add labels.

Click “Add text annotation” and use the following values for position, or click and drag them over their corresponding range highlights.

Once you’ve added those, you’re all done with the chart -- well done!

Publishing

The hard work is done, and now it’s time to publish your chart in a format that makes sense for your purposes.

Click “Proceed” at the bottom of the page and you’ll end up in the “Design” tab. These options likely aren’t relevant to you, unless you want to integrate social media “Share” buttons into the visualization. You can do this easily by toggling on “Enable social media share buttons.” Then, click “Proceed” again.

Now, you’re in the “Publish & Embed” page. If you want to link to your visualization or embed it directly into a website (recommended for online distribution), click the big blue button:

This will generate a link to a webpage of your visualization, as well as an embed code you can use to integrate it into the webpage of an article or other medium.

If you want to print your visualization or just save it as a file on your desktop, you won’t use these options.

Instead, click PNG under “Export or duplicate chart.”

“Duplicate” is handy if you like this version of the chart, but would like to play around with it without losing the current version.

Conclusion

This recipe is a little lengthy, but it’s one that can be adapted to all kinds of different cuisines. Perhaps most importantly, you should now also be confident with a new kind of ingredient (indexed data) and able to evaluate the best way to present that ingredient for different palates. If you would like to explore how the same ingredients are used to create different meals, look at the articles listed below. These news stories use mobility data, which may come from Google or from other reliable local firms. You will find it insightful to see how mobility data are presented in different ways to craft these stories.

Why Europe’s second, less severe lockdowns are working (The Economist)

Turkey's mobility sharply decreases as COVID-19 measures yield results (Daily Sabah)

New mobile data shows influx of shoppers ahead of Ontario lockdown (Global News CA)

Now that you’ve cooked along with us, you can riff on this recipe in new ways -- maybe using multiple lines to tell more complex stories, or highlighting values as opposed to timeframes with the range highlights.