In previous recipes, we worked with COVID-19 case and death data and created some interactive visualizations. This time, we are turning our attention to the socioeconomic impacts of the pandemic. The travel restrictions and social distancing measures introduced in response to COVID-19 have disrupted employment and affected livelihoods worldwide. While all economic sectors have been affected by the pandemic, tourism is one of the hardest hit: the United Nations estimated that over 100 million direct tourism jobs are at risk. This post analyzes the socioeconomic impacts of COVID-19 on Cambodia's tourism sector and builds a visualization to illustrate them.
This recipe was produced with the generous support of the Institute for War and Peace Reporting.
According to the World Health Organization, Cambodia had reported 363 cases and no deaths as of December 22, 2020. Despite its apparent success in containing the spread of COVID-19, Cambodia has experienced significant negative effects in its key economic sectors: garments, construction, hotels and restaurants, transportation and communication, and agriculture. The impact on these sectors has substantial socioeconomic implications for this developing country.
In this recipe, we will examine the socioeconomic impacts triggered by the collapse of the tourism sector. Among Cambodia's economic sectors, tourism is particularly worth exploring because:
Travel restrictions have severely hurt the tourism sector, and Cambodia became one of the top twenty countries by decline in tourist arrivals. The following chart shows the number of international tourist arrivals in Cambodia, which fell sharply in 2020 due to the global pandemic. Between January and June 2020, Cambodia received just 1.24 million foreign visitors, 74 percent fewer than during the same months in 2019.
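The 74 percent figure is a simple percentage change: (earlier - later) / earlier × 100. A minimal Python sketch of that calculation follows; the function name `percent_decline` is our own, and the inputs are round placeholder numbers rather than the report's figures.

```python
def percent_decline(before: float, after: float) -> float:
    """Return the percentage decline from `before` to `after` (hypothetical helper)."""
    if before <= 0:
        raise ValueError("the baseline period must have a positive count")
    # Multiply before dividing so round inputs give an exact result.
    return (before - after) * 100 / before


# Placeholder numbers only: 1,000,000 visitors in one period, 260,000 in the next.
print(percent_decline(1_000_000, 260_000))  # 74.0
```

The same function works for any pair of periods, as long as both counts cover the same months.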
In this recipe, we will work with the data on international tourist arrivals to Cambodia from 2015 to 2020 and create an animated visualization as shown below:
First, we need to obtain data on international tourist arrivals to Cambodia from a reliable source. It is best practice to get data from a primary source, meaning one that provides direct or first-hand testimony or evidence. In our context, this would be Cambodia’s Ministry of Tourism.
Unfortunately, the Ministry’s website does not have the most recent report. However, a Google search will lead you to the Cambodian Tourism Statistics Report as of September 2020 on the website of NagaCorp, a company that operates an integrated resort in Phnom Penh, Cambodia’s capital; the resort includes one of the country’s largest luxury hotels and a popular casino. Since NagaCorp is not the official tourism authority, it is wise to check whether its report is reliable. If you compare the September 2020 report with the reports posted on the Ministry’s website, you will see that the formats are nearly identical, down to the Ministry’s official logo on the cover page. Thus, we can reasonably assume that the reports posted by NagaCorp were authored and verified by Cambodia’s Ministry of Tourism.
The table below is from the aforementioned report. It shows the monthly international tourist arrivals to Cambodia between 2015 and 2020. We will use this data table to create an animated chart.
One trick we can use here to look for additional data is to slightly modify the URL to access different files on the website. If you examine the URL of the PDF report, you’ll see that it ends with “/tourism_statistics_202009.pdf”. The “202009” at the end hints that this particular file is for September (month 09) of 2020. To find reports for other months, we may be able to access them by changing “202009” to another “YYYYMM” value. For instance, the report for January 2020 can be accessed at this URL: https://www.nagacorp.com/eng/ir/tourism/tourism_statistics_202001.pdf
This is a common technique for finding datasets within a website: if the URL pattern suggests that other files with similarly formatted names exist, you can try guessing their addresses. For this recipe, we will only use data from the September 2020 report.
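If you want to apply this technique programmatically, the Python sketch below builds candidate URLs from the “tourism_statistics_YYYYMM.pdf” pattern. It assumes the pattern holds for every month, which is only a guess based on the two URLs above; some of these files may not exist. The function name `candidate_report_urls` is hypothetical.

```python
# Base URL taken from the NagaCorp report URL above. Whether a file exists
# for any given month is NOT guaranteed; this only generates guesses.
BASE_URL = "https://www.nagacorp.com/eng/ir/tourism/"


def candidate_report_urls(year: int, months=range(1, 13)) -> list[str]:
    """Build tourism_statistics_YYYYMM.pdf URLs for the given year and months."""
    # :02d pads single-digit months with a leading zero (1 -> "01").
    return [f"{BASE_URL}tourism_statistics_{year}{month:02d}.pdf" for month in months]


for url in candidate_report_urls(2020, months=range(1, 4)):
    print(url)
```

Before relying on any of these files, open each URL in a browser (or request it with Python's `urllib.request`) and confirm that it returns a genuine Ministry report; network requests are left out here because the site's layout may change.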
You can find this table on page 2, along with a graph showing the tourist arrival trends for each year. This graph is informative, but we will make an animated version that conveys the impact of COVID-19 on Cambodia’s tourism sector in 2020 more effectively.
A challenging aspect of this step is that the data table is in PDF format. To analyze the data and create the visualization, we need a few extra steps to export the table into a structured, machine-readable format.
We will use a free and open source tool called Tabula to extract this dataset from the PDF into a CSV file, which can then be opened in Excel.
You will often come across data tables in PDF files that cannot be easily copied or imported into Excel or Word. This section shows how to extract such tables using a free, time-saving tool called Tabula. It works well on most PDF files with simple black-and-white data tables, provided the PDF is text-based rather than a scanned image, and it does not require an internet connection.
Before installing Tabula, make sure that Java is installed on your computer. If you already have Java, you can skip ahead to the installation steps for Tabula.
Once Java is installed, you are ready to start the installation process for Tabula.
Now that Tabula has been installed, we can start converting the PDF table into a CSV file.
After uploading the file, you can see the file name under the Imported PDFs category.
The PDF file will open, and you can now select the tables in the document.
Here, we will select the area of the table we want to convert. Select carefully so that only the necessary rows and columns are included. Do not extend the selection to text that is not part of the table, such as the table title or notes.
A window appears that displays the preview of the extracted data in a structured, machine-readable format. Make sure that the data looks correct as shown below.
If the table is not in the right format, click Revise selection(s) to redo the selection. If it still looks a bit off, you may have to edit the data manually.
There are two extraction methods to choose from: Stream and Lattice. Stream is used for tables whose rows and columns are separated by blank space, while Lattice works better for tables whose rows and columns are separated by ruling lines. You can see that Tabula automatically selects the Stream method for this table.
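Tabula's real extraction logic is considerably more involved. As a rough, simplified illustration of the idea behind Stream (inferring column boundaries from gaps in the text), the Python sketch below splits one line of text wherever two or more spaces appear. The function name and the sample line are hypothetical, and this is not how Tabula itself is implemented.

```python
import re


def split_stream_line(line: str) -> list[str]:
    """Split a line of PDF text into cells at runs of two or more spaces."""
    # A single space (e.g. inside a label like "Grand Total") is kept intact;
    # wider gaps are treated as column boundaries.
    return [cell for cell in re.split(r"\s{2,}", line.strip()) if cell]


sample = "Jan    100,000    120,000"  # hypothetical text line, not real data
print(split_stream_line(sample))  # ['Jan', '100,000', '120,000']
```

The weakness of this approach is visible in the sketch: a cell that happens to contain a wide gap would be split in two. When a table has ruling lines, Lattice avoids that guesswork by using the lines themselves as cell boundaries.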
Now, we are ready to export the dataset.
A CSV file named “tabula-tourism_statistics_202009” will be exported.
The data table is now in a processable format, so we can start cleaning the data for the visualization.
In the table, quarterly tourist arrival data (Q1, Q2, Q3, Q4) and total numbers are not required. We will delete these rows.
The two screenshots below use Excel, but you can follow the same steps in Google Sheets to get the same results.
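If you prefer a script to a spreadsheet, the same cleaning step can be sketched in Python with the standard `csv` module. This sketch assumes that the month or summary label sits in the first column and that the summary rows are labeled “Q1” to “Q4” and “Total”; check your own Tabula export, since the exact labels may differ. The function name and the sample data are hypothetical.

```python
import csv
import io

# Assumed labels for the rows to drop (quarterly subtotals and the total).
# Adjust this set to match the spelling in your own export.
DROP_LABELS = {"q1", "q2", "q3", "q4", "total"}


def drop_summary_rows(rows: list[list[str]]) -> list[list[str]]:
    """Keep the header row and monthly rows; drop rows labeled as summaries."""
    header, *body = rows
    kept = [row for row in body if row and row[0].strip().lower() not in DROP_LABELS]
    return [header, *kept]


# Tiny hypothetical export with placeholder numbers (not the report's data).
raw = """Month,2019,2020
January,100,90
February,110,80
March,120,40
Q1,330,210
Total,330,210
"""

rows = list(csv.reader(io.StringIO(raw)))
for row in drop_summary_rows(rows):
    print(row)
```

To use it on the real file, read the exported CSV with `csv.reader(open("tabula-tourism_statistics_202009.csv", newline=""))` instead of the sample string (assuming Tabula added the usual `.csv` extension).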
Your final cleaned data table should look like this:
We have completed the data cleaning process. Let’s proceed to the most exciting part: creating the visualization! We will use Flourish to make an animated chart.
After you have signed up, a Project page will appear.
You will be directed to the Template page. Here, you can make a selection from an array of charts, visualization techniques, and functions.
We want to create a line chart race in which monthly tourist arrival data for different years are compared. Let’s select the chart type based on what we want.
A visualization page with an animated sample line chart race will appear:
This is an initial preview of the chart. To customize it, we need to upload the “International Tourist Arrivals to Cambodia” data.
Flourish will notify you of the number of rows uploaded.
The data table has now been uploaded.
In this graph, each line represents the number of international tourist arrivals to Cambodia in one year (2015, 2016, 2017, 2018, 2019, 2020), from January to December. For a line chart race in Flourish, each row of data has to represent one year, so we need to transpose the rows and columns.
Next, we need to select the Name column and the Score columns.
The Name column holds the labels of the racing lines (the years). The Score columns hold the values of these competing lines (the tourist arrivals in each month).
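If you would rather transpose the cleaned table before uploading it, a minimal Python sketch looks like this. It assumes the cleaned table has months as rows and years as columns, as in the report; the placeholder numbers and the function name `transpose` are our own.

```python
def transpose(rows: list[list[str]]) -> list[list[str]]:
    """Swap rows and columns. zip() stops at the shortest row, so rows must be equal length."""
    return [list(column) for column in zip(*rows)]


# Cleaned table: months as rows, years as columns (placeholder numbers).
cleaned = [
    ["Month", "2019", "2020"],
    ["January", "100", "90"],
    ["February", "110", "80"],
]

table = transpose(cleaned)
# The first column now lists the years (Flourish's Name column), so rename its header.
table[0][0] = "Year"
for row in table:
    print(row)
```

After this step, each row holds one year followed by its monthly values, which is the shape described above: the first column becomes the Name column and the month columns become the Score columns.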
Now, let’s go back to Preview, where you can see a racing line for each year.
The graph may look odd because it is currently based on ranks.
On the right of the page, you can see a column listing different chart elements. We will work on each of these elements to make the chart more intuitive and attractive. Although we recommend certain selections, you can play around with these options to explore different possibilities.
First, we want the graph to look like the one below, which requires adjustments in the View and Scoring type sections.
View
We can choose how to animate this chart. Since we want to show the whole picture,
Scoring type
In the chart, we want to show the actual number of tourist arrivals each year instead of ranking them.
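To see why a rank-based chart can look odd for this data, consider how ranking works within a single month: each year is reduced to a position, so a collapse in arrivals moves a line down by at most a few places instead of showing the size of the drop. The sketch below is our own illustration of rank scoring with placeholder numbers, not Flourish's internal code, and the function name is hypothetical.

```python
def ranks_for_month(values: dict[str, float]) -> dict[str, int]:
    """Rank years within one month by arrivals (1 = most); ties keep input order."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {year: position for position, year in enumerate(ordered, start=1)}


march = {"2018": 500.0, "2019": 520.0, "2020": 150.0}  # placeholder arrivals
print(ranks_for_month(march))  # {'2019': 1, '2018': 2, '2020': 3}
```

Here 2020 comes last in both views, but only the values view shows that it sits at less than a third of the 2019 level, which is why we switch the scoring type to actual values.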
Next, we want to tidy up the chart as shown below. Can you spot the changes? We adjusted the size, the controls, and the colors.
Chart sizing
We need to give this chart a portable size so that it can be conveniently embedded anywhere. Let’s change its size.
We can add some margins to the graph by typing these values into the respective boxes.
Controls
Animated graphs usually include a Replay button which allows users to repeat the animation if they desire.
We can make this Replay button look even better with the following settings:
Leave the options for Button Group Styles as default, as shown below.
Colors
Color is an essential element of this visualization. Flourish provides default palettes, but we will customize the colors of all the lines because we want the 2020 line to stand out.
2020: #ff9966
2019: #ffbf59
2018: #7a8099
2017: #c2b0af
2016: #99cccc
2015: #339999
We have now made some significant changes to the chart elements. However, we still need to make the lines and labels cleaner and easier to read.
Line styles
It is better to have thinner lines, especially when they overlap.
Line width: 0.1
Opacity: 1
Curve: Straight
Shading behind line: Off
Circle styles
The circles, which mark the beginning and end of the trend lines, are important visual elements of the graph.
Start radius: 0.4
End radius: 0.4
End stroke: 0.4
Space between: 4
Stroke color: Background
Image inside circle: Off
Label styles
We will resize the label by selecting
Next, we will adjust the X-axis and Y-axis labels to make them more organized and easier to read.
Y axis
We will make some subtle changes to the default values of the Y-axis.
For the Y-axis values, we would like to start it from zero.
In the Number Styling,
X axis
In the X-axis,
We can adjust the speed of the racing lines. Let’s change the animation and mode durations:
Animation duration: 1200
Mode duration: 300
Number formatting
The number formatting can be left at its default settings, as below:
Decimal separator in data sheet: . (decimal point)
Number format to display: 12,235.67
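The decimal separator setting tells Flourish how to read numbers in the data sheet, while the display format controls how they appear on the chart. The same distinction applies if you clean the Tabula export in code: cells such as “1,234,567” use commas as thousands separators and must be converted before they can be treated as numbers. A minimal Python sketch, with hypothetical function names:

```python
def parse_count(text: str) -> int:
    """Turn a cell such as '1,234,567' into an integer (commas are thousands separators)."""
    return int(text.replace(",", "").strip())


def display(value: float) -> str:
    """Format with comma thousands separators and a point for decimals, as in '12,235.67'."""
    return f"{value:,.2f}"


print(parse_count("1,234,567"))  # 1234567
print(display(12235.67))  # 12,235.67
```

If a source used a comma as the decimal separator instead, removing every comma as `parse_count` does would corrupt the values, which is why the separator has to be set correctly before numbers are read.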
Layout
In this section, you can change the fonts used, background color, size and structure of this graph.
First we will work on the font:
We are close to the finish line. As final touches, we will add some information to the header and footer.
Header
For the Header,
Make sure that Change title style and Change Subtitle styles are turned off.
We will also add some text which briefly explains what this chart conveys.
We will also change the Border settings:
Footer
For the Footer,
Now we can export and publish our interactive chart.
In the prompted window, click Publish.
You can now embed this animated chart in your website using the link provided.
4.1) Socioeconomic Implications
The decline in tourist arrivals has had devastating socioeconomic impacts on a country like Cambodia, which relies substantially on tourism for its national revenue. The decline could cost approximately 3 billion US dollars in tourism revenue and put 110,000 workers in the sector at risk. It could also worsen social inequality in the country: because the tourism sector largely employs women, they are likely to bear much of these job losses. Since it provides opportunities for women at different occupational levels, tourism has been considered an important sector for promoting gender equality in Cambodia, and the pandemic may set back the country’s progress on that front.
The links below show how recent news articles have crafted stories on tourism:
4.2) Navigating Newsworthy Issues
At a global scale, the COVID-19 pandemic is creating hardship for livelihoods in every sector and leaving negative consequences across the economic, environmental, human, social, political, and security dimensions. Assessing and reporting on the pandemic's impacts on societies, economies, and vulnerable communities is becoming increasingly important. The United Nations Development Programme (UNDP) has prepared such assessment reports, and these resources may help you come up with interesting data stories on the socioeconomic impacts of COVID-19.
As we move into a new normal marked by heightened uncertainty, understanding socioeconomic impacts has become critical to formulating effective policies and targeting limited resources. Extensive, in-depth studies on different social and economic issues will emerge over time. With COVID-19 data stories increasingly needed for public awareness and policy dialogue, Thibi Recipes will continue to guide data journalists toward efficient approaches to crafting compelling data stories, with comprehensive guidelines on data preparation, analysis, and visualization.