Metabiota is the pioneer in comprehensive risk analytics that help organizations and countries build resilience to epidemics and protect global public health. Built on a strong foundation of scientific expertise, including a worldwide network of on-the-ground experts, Metabiota delivers actionable, data-driven analytics to help countries and corporations mitigate complex health issues. With a strategic global presence and sustained partnerships, Metabiota’s agile approach helps identify, analyze, and transfer the risk associated with biological threats.
Dataset date: December 01, 2019 to August 13, 2022
This repository contains spatiotemporal data from many official sources for the 2019 Novel Coronavirus outbreak, which began in 2019 in Hubei, China ("nCoV_2019").
You may not use this data for commercial purposes. If there is a need for commercial use of the data, please contact Metabiota at email@example.com to obtain a commercial use license.
The incidence data are in a CSV file format. One row in an incidence file contains a piece of epidemiological data extracted from the specified source.
The file contains data from multiple sources at multiple spatial resolutions in cumulative and non-cumulative formats by confirmation status. To select a single time series of case or death data, filter the incidence dataset by source, spatial resolution, location, confirmation status, and cumulative flag.
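As a minimal sketch of that filtering step, the Python example below reads a few made-up rows and keeps one time series. The SOURCE, SPATIAL_RESOLUTION, CONFIRMATION_STATUS, and OUTCOME column names come from this document. The CUMULATIVE_FLAG spelling, the LOCATION, DATE, and VALUE columns, the "AL1" level, the "Example Ministry Report" source, and every number are assumptions for illustration only; check the header of the actual CSV file before reusing this.

```python
import csv
import io

# Made-up rows for illustration only. LOCATION, DATE, VALUE, the
# CUMULATIVE_FLAG spelling, and the "AL1" level are assumed names.
SAMPLE_CSV = """SOURCE,SPATIAL_RESOLUTION,LOCATION,OUTCOME,CONFIRMATION_STATUS,CUMULATIVE_FLAG,DATE,VALUE
Metabiota Composite Source,AL0,China,CASE,TOTAL,TRUE,2020-01-20,10
Metabiota Composite Source,AL0,China,CASE,TOTAL,FALSE,2020-01-20,4
Metabiota Composite Source,AL0,China,CASE,TOTAL,TRUE,2020-01-21,15
Metabiota Composite Source,AL1,Hubei,CASE,TOTAL,TRUE,2020-01-21,12
Metabiota Composite Source,AL0,China,DEATH,TOTAL,TRUE,2020-01-21,1
Example Ministry Report,AL0,China,CASE,CONFIRMED,TRUE,2020-01-21,14
"""


def select_series(rows, **criteria):
    """Return sorted (date, value) pairs for rows matching every column=value pair."""
    matched = [
        row for row in rows
        if all(row[column] == wanted for column, wanted in criteria.items())
    ]
    return sorted((row["DATE"], int(row["VALUE"])) for row in matched)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Fix every dimension so that exactly one time series remains.
series = select_series(
    rows,
    SOURCE="Metabiota Composite Source",
    SPATIAL_RESOLUTION="AL0",
    LOCATION="China",
    OUTCOME="CASE",
    CONFIRMATION_STATUS="TOTAL",
    CUMULATIVE_FLAG="TRUE",
)
print(series)
```

Leaving any one of these criteria out would mix rows from different series, for example cumulative and non-cumulative values for the same date.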
Data are collected, structured, and validated by Metabiota’s digital surveillance experts. The data structuring process is designed to produce the most reliable estimates of reported cases and deaths over space and time. The data are cleaned and provided in a uniform format such that information can be compared across multiple sources. Data are collected at the time of publication in the highest geographic and temporal resolutions available in the original report.
This repository is intended to provide a single access point for data from a wide range of data sources. Data will be updated periodically with the latest epidemiological data. Metabiota maintains a database of epidemiological information for over two thousand high-priority infectious disease events. Please contact us (firstname.lastname@example.org) if you are interested in licensing the complete dataset.
Cumulative vs. Non-Cumulative Incidence
Reporting sources provide either cumulative incidence, non-cumulative incidence, or both. If the source only provides a non-cumulative incidence value, the cumulative values are inferred using prior reports from the same source. Use the CUMULATIVE FLAG variable to subset the data to cumulative (TRUE) or non-cumulative (FALSE) values.
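The relationship between the two formats can be illustrated with a running sum. The sketch below is not Metabiota's actual procedure; it only shows the arithmetic, using hypothetical counts and assuming that the first report is the start of the series.

```python
from itertools import accumulate

# Hypothetical non-cumulative (newly reported) counts from one source.
dates = ["2020-01-20", "2020-01-21", "2020-01-22"]
new_counts = [4, 5, 0]

# Cumulative value on each date = sum of this and all prior reports.
cumulative = list(accumulate(new_counts))

# Going the other way: difference consecutive cumulative values.
recovered = [cumulative[0]] + [
    later - earlier for earlier, later in zip(cumulative, cumulative[1:])
]

for date, new, total in zip(dates, new_counts, cumulative):
    print(date, new, total)
```

Because the two formats carry the same information, summing cumulative and non-cumulative rows together would count the same cases more than once.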
Case Confirmation Status
The incidence datasets include the confirmation status of cases and deaths when this information is provided by the reporting source. Subset the data by the CONFIRMATION_STATUS variable to either TOTAL, CONFIRMED, SUSPECTED, or PROBABLE to obtain the data of your choice.
Total incidence values include confirmed, suspected, and probable incidence values. If a source only provides suspected, probable, or confirmed incidence, the total incidence is inferred to be the sum of the provided values. If the report does not specify confirmation status, the value is included in the "total" confirmation status value.
Data provided under the "Metabiota Composite Source" often do not include suspected incidence because sources report cases and deaths with this confirmation status inconsistently.
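The rule for inferring total incidence can be sketched as a small function. This is an illustration of the rule as described in this section, not Metabiota's implementation; the function name `infer_total` and the input dictionary layout are hypothetical.

```python
def infer_total(reported):
    """Return the TOTAL value for one location and date.

    `reported` maps a confirmation status to the value a source provided,
    e.g. {"CONFIRMED": 7, "PROBABLE": 2}. This layout is an assumption.
    """
    # A value reported without a confirmation status is already a total.
    if "TOTAL" in reported:
        return reported["TOTAL"]
    # Otherwise, sum whichever of the three statuses the source provided.
    parts = ("CONFIRMED", "SUSPECTED", "PROBABLE")
    return sum(reported.get(status, 0) for status in parts)


print(infer_total({"CONFIRMED": 7, "PROBABLE": 2}))
print(infer_total({"TOTAL": 12}))
```

A practical consequence is that TOTAL rows should not be added to CONFIRMED, SUSPECTED, or PROBABLE rows for the same location and date.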
Outcome - Cases vs. Deaths
The incidence datasets include cases and deaths. Subset the data to either CASE or DEATH using the OUTCOME variable. Note that deaths are included in case counts.
Data are provided at multiple spatial resolutions. Data should be subset to a single spatial resolution of interest using the SPATIAL_RESOLUTION variable.
Information is included at the finest spatial resolution provided in the original epidemic report. We also aggregate incidence to coarser geographic resolutions. For example, if a source only provides data at the province level, then province-level data are included in the dataset along with country-level totals. Because the same cases appear at more than one resolution, users should not sum all cases or deaths in a given country for a given date without first specifying the SPATIAL_RESOLUTION value. For example, subset the data to SPATIAL_RESOLUTION equal to “AL0” to view only the aggregated country-level data.
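The double-counting risk can be seen in a short sketch. The rows and values below are hypothetical, and "AL1" is an assumed label for the province level; only "AL0" is named in this document.

```python
from collections import defaultdict

# Hypothetical rows: (SPATIAL_RESOLUTION, country, province, value).
rows = [
    ("AL1", "China", "Hubei", 12),
    ("AL1", "China", "Guangdong", 3),
    ("AL0", "China", "", 15),  # country-level total aggregated from provinces
]

# Summing without a resolution filter counts every case twice.
naive_sum = sum(value for _, _, _, value in rows)

# Filtering to the country level gives the intended total.
country_total = sum(
    value for resolution, country, _, value in rows
    if resolution == "AL0" and country == "China"
)

# The same total can be rebuilt from the province-level rows alone.
by_country = defaultdict(int)
for resolution, country, _, value in rows:
    if resolution == "AL1":
        by_country[country] += value

print(naive_sum, country_total, by_country["China"])
```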
Administrative division naming practices differ by country. Administrative levels in this dataset are defined using the Google Geolocation API (https://developers.google.com/maps/documentation/geolocation/). For example, one source for the 2019-nCoV data provides information for the city of Beijing, which the Google Geolocation API classifies as a “locality.” Beijing is also the name of the municipality in which the city of Beijing is located. Thus, the 2019-nCoV dataset includes rows of data for both the city of Beijing and the municipality of the same name. If additional cities in the Beijing municipality reported data, those data would be aggregated with the city-level Beijing data to form the municipality-level Beijing data.
Data sources in this repository were selected to provide comprehensive spatiotemporal data for each outbreak. Data from a specific source can be selected using the SOURCE variable.
In addition to the original reporting sources, Metabiota compiles multiple sources to generate the most comprehensive view of an outbreak. This compilation is stored in the database under the source name “Metabiota Composite Source.” The purpose of generating this new view of the outbreak is to provide the most accurate and precise spatiotemporal data for the outbreak. At this time, Metabiota does not incorporate unofficial sources, including media, into the “Metabiota Composite Source” dataset.
Data are collected by a team of digital surveillance experts and undergo many quality assurance tests. After data are collected, they are independently verified by at least one additional analyst. The data also pass an automated validation program to ensure data consistency and integrity.
NonCommercial Use License
Creative Commons License Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
This is a human-readable summary of the Legal Code.
You are free:
to Share — to copy, distribute and transmit the work
to Remix — to adapt the work
Under the following conditions:
Attribution — You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
Noncommercial — You may not use this work for commercial purposes.
Share Alike — If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.
With the understanding that:
Waiver — Any of the above conditions can be waived if you get permission from the copyright holder.
Public Domain — Where the work or any of its elements is in the public domain under applicable law, that status is in no way affected by the license.
Other Rights — In no way are any of the following rights affected by the license:
Your fair dealing or fair use rights, or other applicable copyright exceptions and limitations;
The author's moral rights;
Rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights.
Notice — For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to this web page.
For details and the full license text, see http://creativecommons.org/licenses/by-nc-sa/3.0/
Metabiota shall in no event be liable for any decision taken by the user based on the data made available. Under no circumstances shall Metabiota be liable for any damages whatsoever arising out of the use of, or inability to use, the database. The entire risk arising out of the use of the database remains with the user.