The data sources are the Disease Outbreak News (DONs) and the Coronavirus Dashboard, both produced by the World Health Organization (WHO). This information is issued under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Intergovernmental Organization (CC BY-NC-SA 3.0 IGO) license, which allows users to freely copy, reproduce, reprint, distribute, translate, and adapt WHO materials for non-commercial purposes.
The information from the DONs includes all reports on confirmed acute public health events or potential events of concern that have occurred since 1996. Specifically, DONs include events of:
- Unknown cause but with significant or potential health concern that may affect international travel or trade.
- Known cause with demonstrated ability to produce a serious public health impact and spread internationally.
- High public concern that could potentially disrupt required public health interventions or international travel and trade.
The Coronavirus Dashboard presents information reported by official public health authorities from countries and territories worldwide.
The data collection and integration processes to produce this dataset consist of the following steps:
First, DONs are collected from the WHO website; this step was automated with an R script that extracts the information from each DON. The earliest DON records a cholera outbreak reported on 22 January 1996 in Cabo Verde, Côte d'Ivoire, the Islamic Republic of Iran, Iraq, and Senegal.
To ensure standardized concepts and definitions, country names follow the official English short names in ISO 3166, and disease names follow the International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10).
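As an illustration of the country-name standardization, the sketch below maps raw name variants to ISO 3166 English short names through a lookup table. The table `ISO_SHORT_NAMES` and the function `standardize_country` are hypothetical, not the dataset's actual R code; a real table would cover every name variant found in the DONs.

```python
# Hypothetical alias table: keys are lower-cased name variants that might appear
# in report text; values are ISO 3166 English short names. Illustrative only.
ISO_SHORT_NAMES = {
    "iran": "Iran (Islamic Republic of)",
    "islamic republic of iran": "Iran (Islamic Republic of)",
    "cape verde": "Cabo Verde",
    "cabo verde": "Cabo Verde",
    "ivory coast": "Côte d'Ivoire",
    "côte d'ivoire": "Côte d'Ivoire",
    "tanzania": "Tanzania, United Republic of",
}


def standardize_country(raw: str) -> str:
    """Map a raw country string to its ISO 3166 English short name.

    Unknown names raise KeyError so unmapped variants surface during
    processing instead of passing through unstandardized.
    """
    # Normalize case and whitespace, then drop a leading article ("the").
    key = " ".join(raw.lower().split())
    if key.startswith("the "):
        key = key[4:]
    return ISO_SHORT_NAMES[key]
```

For example, `standardize_country("the Islamic Republic of Iran")` returns `"Iran (Islamic Republic of)"`. Failing loudly on unknown names is a deliberate choice here: a silent pass-through would let nonstandard names slip into later joins.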
Three recording issues need to be tackled at this stage:
- Some DONs report multiple diseases.
- Some DONs report disease outbreaks occurring in more than one country.
- Some DONs register the same outbreak multiple times due to situation updates.
To resolve these issues:
- For DONs reporting more than one disease (for instance, DON0065 on influenza and malaria in Ghana, or DON1094 on chikungunya and dengue in the southwest Indian Ocean) and/or more than one country (e.g., DON1540 on a polio outbreak in Angola and the Democratic Republic of the Congo, or DON0617 on a meningococcal disease outbreak in the Great Lakes area), the DON is replicated for each disease (or country). For instance, DON0617 reports an outbreak that occurred in Burundi, Rwanda, and Tanzania (Great Lakes area), so this DON was registered three times, once for each country.
- To avoid multiplicity issues, we deleted the repeated DONs whenever the same disease was reported in the same country more than once in a calendar year. Variants or mutations of viruses, such as avian influenza A(H1N1), A(H1N2), A(H5N1), A(H3N2), etc., were considered the same disease, i.e., influenza due to identified zoonotic or pandemic influenza virus. This ensured only one observation per disease, country, and year.
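The two resolution rules above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: each DON is a dict with hypothetical keys `don_id`, `year`, `diseases`, and `countries`; a DON listing several diseases *and* several countries is replicated for every (disease, country) pair, which is an assumption about how the combined case is handled; and `DISEASE_ALIASES` is an illustrative, incomplete mapping of influenza subtypes onto the single ICD-10 category.

```python
from typing import Iterable

# Hypothetical disease aliases: influenza subtypes collapse into one ICD-10
# category, per the rule above. Unlisted names pass through unchanged.
ZOONOTIC_FLU = "Influenza due to identified zoonotic or pandemic influenza virus"
DISEASE_ALIASES = {
    "avian influenza A(H5N1)": ZOONOTIC_FLU,
    "influenza A(H1N1)": ZOONOTIC_FLU,
    "influenza A(H1N2)": ZOONOTIC_FLU,
    "influenza A(H3N2)": ZOONOTIC_FLU,
}


def explode(don: dict) -> list[dict]:
    """Replicate one DON into one record per (disease, country) pair.

    Assumes `don` has hypothetical keys: don_id, year, diseases, countries.
    """
    return [
        {
            "don_id": don["don_id"],
            "year": don["year"],
            "disease": DISEASE_ALIASES.get(disease, disease),
            "country": country,
        }
        for disease in don["diseases"]
        for country in don["countries"]
    ]


def deduplicate(records: Iterable[dict]) -> list[dict]:
    """Keep only the first record seen for each (disease, country, year)."""
    seen: set[tuple] = set()
    kept = []
    for record in records:
        key = (record["disease"], record["country"], record["year"])
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept
```

Under these assumptions, a DON like DON0617 would yield three records (Burundi, Rwanda, Tanzania). Because aliases are applied before deduplication, two influenza subtypes reported in the same country and year collapse to one observation. Passing records in chronological order keeps the earliest report; which report the authors actually retained is not stated, so that ordering is also an assumption.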
Then, because outbreaks related to COVID-19 are not included in the DONs, this information is extracted from the Coronavirus Dashboard. Specifically, we dichotomized the data on cases per country per year, assigning a value of one if a country had at least one reported COVID-19 case in that year, and zero otherwise. For standardization, we followed the same approach as before, using the official English short country names according to ISO 3166 and ICD-10.
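The dichotomization step can be sketched as below. The input layout is an assumption, not the dashboard's actual schema: an iterable of hypothetical `(country, report_date, new_cases)` tuples. A country-year is coded 1 when its summed reported cases are positive, which is one reasonable reading of "at least one reported case"; the function name `covid_indicator` is also illustrative.

```python
from collections import defaultdict
from datetime import date
from typing import Iterable


def covid_indicator(
    daily_cases: Iterable[tuple[str, date, int]],
    countries: Iterable[str],
    years: Iterable[int],
) -> dict[tuple[str, int], int]:
    """Dichotomize daily case counts into a country-year 0/1 indicator.

    daily_cases holds hypothetical (country, report_date, new_cases) tuples.
    A country-year gets 1 if its summed reported cases are positive, else 0.
    """
    totals: defaultdict[tuple[str, int], int] = defaultdict(int)
    for country, report_date, new_cases in daily_cases:
        totals[(country, report_date.year)] += new_cases
    years = list(years)  # iterated once per country below
    return {
        (country, year): int(totals.get((country, year), 0) > 0)
        for country in countries
        for year in years
    }
```

Enumerating every country-year explicitly, rather than only those with cases, keeps the zeros in the output so that absence of reported cases is recorded as 0 instead of being missing.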
Lastly, geographic information on administrative boundaries is merged with the data resulting from the previous steps.
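Since country names are standardized beforehand, the final merge can be a left join on the country name. The sketch below assumes boundary attributes are available as a dict keyed by ISO 3166 short name; the function `merge_geography` and the attribute fields are hypothetical. It also returns the names that found no match, a cheap check that the earlier standardization was complete.

```python
def merge_geography(
    records: list[dict], boundaries: dict[str, dict]
) -> tuple[list[dict], set[str]]:
    """Left-join outbreak records with geographic attributes by country name.

    boundaries maps an ISO 3166 short name to a dict of hypothetical fields
    (for example an ISO alpha-3 code or a geometry identifier). Record values
    win on key collisions. Returns the merged records plus any country names
    that found no boundary entry.
    """
    merged, unmatched = [], set()
    for record in records:
        geo = boundaries.get(record["country"])
        if geo is None:
            unmatched.add(record["country"])
            geo = {}
        merged.append({**geo, **record})
    return merged, unmatched
```

A left join keeps every outbreak record even when no boundary entry exists, so nothing is silently dropped; a non-empty `unmatched` set points to names that still need standardizing.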
More details about the methodology can be found in Torres Munguía, Badarau, Díaz Pavez, Martínez-Zarzoso & Wacker, "A global dataset of pandemic- and epidemic-prone disease outbreaks," Sci Data 9, 683 (2022). https://doi.org/10.1038/s41597-022-01797-2