The Humanitarian Data Exchange (HDX) is an open platform for sharing data across crises and organisations. It was launched in July 2014 with the goal of making humanitarian data easy to find and use for analysis. Our growing collection of datasets has been accessed by users in over 200 countries and territories. Watch this video to learn more.
HDX is based on CKAN, an open-source data management system for powering data hubs and data portals. The source code for HDX can be found here.
This section contains information for developers who want to write code that interacts with the Humanitarian Data Exchange (HDX) and the datasets it contains. Anything that you can do through the HDX user interface, you can also do programmatically by making calls to the API, and the API lets you do a lot more besides. Typical uses of the API are scripting the creation and update of datasets in HDX and reading data for analysis and visualisation.
HDX has a RESTful API, largely unchanged from the underlying CKAN API, which can be used from any programming language that supports HTTP GET and POST requests. However, the terminology that CKAN uses differs somewhat from that of the HDX user interface. We have therefore developed wrappers for specific languages that harmonise the nomenclature and simplify interaction with HDX.
These APIs support operations such as searching, reading and writing dataset metadata. They do not support direct querying of the data inside resources. A dataset can have more than one resource, and each resource points to a file or a URL.
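To illustrate the difference between dataset metadata and the data inside resources, here is a minimal sketch using only the Python standard library. The metadata dictionary is a made-up example, loosely shaped like a CKAN dataset record (a `resources` list whose entries carry `name`, `format` and `url`). The dataset name, resource names and URLs are hypothetical, as is the helper `resource_urls`.

```python
# Hypothetical metadata for one dataset with two resources.
# The field names follow the usual CKAN layout; the values are invented.
dataset_metadata = {
    "name": "example-dataset",
    "title": "Example Dataset",
    "resources": [
        {"name": "cases.csv", "format": "CSV", "url": "https://example.org/cases.csv"},
        {"name": "cases.xlsx", "format": "XLSX", "url": "https://example.org/cases.xlsx"},
    ],
}


def resource_urls(metadata, file_format=None):
    """Return the URLs of a dataset's resources, optionally filtered by format."""
    urls = []
    for resource in metadata.get("resources", []):
        matches = (
            file_format is None
            or resource.get("format", "").upper() == file_format.upper()
        )
        if matches:
            urls.append(resource["url"])
    return urls
```

The API hands back the URLs; downloading and parsing the file behind each URL is a separate step that you carry out with whatever tool suits the file format.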
The recommended way of developing against HDX is to use the HDX Python API. This is a mature library that supports Python 2.7 and 3 with tests that have a high level of code coverage. The major goal of the library is to make pushing and pulling data from HDX as simple as possible for the end user. There are several ways this is achieved. It provides a simple interface that communicates with HDX using the CKAN Python API, a thin wrapper around the CKAN REST API. The HDX objects, such as datasets and resources, are represented by Python classes. This should make the learning curve gentle and enable users to quickly get started with using HDX programmatically. For example, to read a dataset and get its resources, you would simply do:
from hdx.hdx_configuration import Configuration
from hdx.data.dataset import Dataset
# Connect to the production site in read-only mode
Configuration.create(hdx_site='prod', user_agent='A_Quick_Example', hdx_read_only=True)
# Read the dataset by name, then list its resources
dataset = Dataset.read_from_hdx('novel-coronavirus-2019-ncov-cases')
resources = dataset.get_resources()
There is library API-level documentation available online.
If you intend to push data to HDX, then it may be helpful to start with this scraper template which shows what needs to be done to create datasets on HDX. It should be straightforward to adapt the template for your needs.
If you wish to read data from HDX for analysis in R, then you can use the rhdx package. The goal of this package is to provide a simple interface to interact with HDX. Like the Python API, it is a wrapper around the CKAN REST API. rhdx is not yet fully mature and some breaking changes are expected.
If you need to use another language or simply want to examine dataset metadata in detail in your web browser, then you can use CKAN’s RESTful API, a powerful, RPC-style interface that exposes all of CKAN’s core features to clients.
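As a rough sketch of what a raw call looks like, the snippet below builds the URL for CKAN's `package_show` action and unwraps the JSON envelope that CKAN returns. It assumes that HDX is served at `https://data.humdata.org` and that the action API lives under `/api/3/action/`. The helper names `action_url` and `unwrap` are made up for this example, and the sample response is a trimmed, invented stand-in for a real one.

```python
import json
from urllib.parse import urlencode

HDX_SITE = "https://data.humdata.org"  # assumption: production HDX site


def action_url(action, **params):
    """Build the URL for a CKAN action API call (hypothetical helper)."""
    url = f"{HDX_SITE}/api/3/action/{action}"
    if params:
        url += "?" + urlencode(params)
    return url


def unwrap(response_text):
    """Return the 'result' part of a CKAN response, or raise on failure."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise RuntimeError(f"CKAN call failed: {payload.get('error')}")
    return payload["result"]


url = action_url("package_show", id="novel-coronavirus-2019-ncov-cases")

# Invented sample of the envelope CKAN wraps around a result.
sample = (
    '{"success": true, "result": '
    '{"name": "novel-coronavirus-2019-ncov-cases", "resources": []}}'
)
dataset = unwrap(sample)
```

Pasting the value of `url` into a web browser, or fetching it with `urllib.request.urlopen`, should return the real metadata in the same envelope.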
This section contains information for developers who want to write code to process datasets that use the Humanitarian Exchange Language (HXL). HXL is a different kind of data standard: it adds hashtags to existing datasets to improve information sharing during a humanitarian crisis without adding extra reporting burdens. HXL has its own website, whose documentation section will be of particular interest.
import hxl
source = hxl.data('http://example.org/dataset.xlsx')
As in jQuery, you process the dataset by adding additional steps to the chain. The following example selects every row with the organisation “UNICEF” and removes the column with email addresses:
The library also includes a set of command-line tools for processing HXL data in shell scripts. For example, the following will perform the same operation shown above, without the need to write Python code:
$ cat dataset.xlsx | hxlselect -q "#org=UNICEF" | hxlcut -x '#contact+email'
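To make the select-and-cut operation above concrete without installing anything, here is a plain-Python sketch of what it does to a small HXL-tagged table. It is not the libhxl-python API: the helper functions and the sample rows are invented for illustration, and the matching is deliberately simplified to exact hashtag and exact value comparisons.

```python
import csv
import io

# Invented sample: a header row, then the HXL hashtag row, then data rows.
SAMPLE = """Organisation,Sector,Contact email
#org,#sector,#contact+email
UNICEF,Education,a@example.org
WFP,Food security,b@example.org
UNICEF,Health,c@example.org
"""


def read_hxl(text):
    """Split CSV text into (headers, hashtags, data rows)."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0], rows[1], rows[2:]


def select_rows(hashtags, rows, tag, value):
    """Keep only the rows whose column tagged `tag` equals `value`."""
    index = hashtags.index(tag)
    return [row for row in rows if row[index] == value]


def cut_column(headers, hashtags, rows, tag):
    """Drop the column tagged `tag` from the headers, hashtags and rows."""
    index = hashtags.index(tag)

    def drop(row):
        return row[:index] + row[index + 1:]

    return drop(headers), drop(hashtags), [drop(row) for row in rows]


headers, hashtags, rows = read_hxl(SAMPLE)
rows = select_rows(hashtags, rows, "#org", "UNICEF")
headers, hashtags, rows = cut_column(headers, hashtags, rows, "#contact+email")
```

The real library and command-line tools handle more than this sketch, so treat it only as a picture of the row filter and column removal.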
There is library API-level documentation available online.
HDX provides a suite of tools that leverage HXLated datasets:
- QuickCharts automatically generates embeddable, live data charts, graphs and key figures from your data. It uses the HXL hashtags to guess the best charts to display, but you can then go in and override with your own preferences.
- HXL Tag Assist allows you to find hashtag examples and definitions, and see how data managers are using the hashtags in their data.
- Data Check provides help with data cleaning for humanitarian data, automatically detecting and highlighting common errors. It includes validation against CODs and other vocabularies.
The HXL Proxy is a tool for validating, cleaning, transforming, and visualising HXL-tagged data. You supply an input URL pointing to a tabular or JSON dataset and then create a recipe that contains a series of steps for transforming the data. The result is a download link that you can share and use in HDX, and the output will update automatically whenever the source dataset changes. Full user documentation is available in the HXL Proxy wiki.
The HXL Proxy is primarily a web wrapper around the libhxl-python library (see above), and makes the same functionality available via RESTful web calls.
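Because the proxy is driven by web calls, a download link can be assembled in code. The sketch below is a guess at the simplest case and rests on assumptions: that the public proxy is hosted at `https://proxy.hxlstandard.org`, that it serves output at `data.csv` and `data.json`, and that the source dataset is passed in a `url` query parameter. The helper `proxy_download_url` is hypothetical. Recipe steps are left out; their parameters should be taken from the HXL Proxy wiki.

```python
from urllib.parse import urlencode

PROXY_BASE = "https://proxy.hxlstandard.org"  # assumption: public HXL Proxy host


def proxy_download_url(source_url, output="csv"):
    """Build a proxy link that serves `source_url` as CSV or JSON.

    Hypothetical helper; the endpoint and parameter names are assumptions.
    """
    if output not in ("csv", "json"):
        raise ValueError("output must be 'csv' or 'json'")
    return f"{PROXY_BASE}/data.{output}?" + urlencode({"url": source_url})


link = proxy_download_url("http://example.org/dataset.xlsx")
```

The source URL is percent-encoded so that it survives as a single query parameter.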
Humanitarian projects frequently need to handle countries, locations and regions. In particular, they must deal with inconsistent country naming between data sources and with different coding standards such as ISO3 and M49. The HDX Python Country library was created to fulfil these requirements and is a dependency of the HDX Python API. It is also very useful as a standalone library and has library API-level documentation available online.
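The kind of normalisation involved can be pictured with a small standard-library sketch. This is not the HDX Python Country API: the lookup table, the alias list and the function names are invented for illustration and cover only three countries.

```python
# Tiny hand-made table: ISO3 code -> (official name, M49 code).
COUNTRIES = {
    "SYR": ("Syrian Arab Republic", 760),
    "COD": ("Democratic Republic of the Congo", 180),
    "CIV": ("Côte d'Ivoire", 384),
}

# Alternative spellings of the kind found in different data sources
# (illustrative only, far from complete).
ALIASES = {
    "syria": "SYR",
    "dr congo": "COD",
    "congo, dem. rep.": "COD",
    "ivory coast": "CIV",
    "cote d'ivoire": "CIV",
}


def to_iso3(name):
    """Map a country name or ISO3 code to ISO3, or return None if unknown."""
    key = name.strip()
    if key.upper() in COUNTRIES:
        return key.upper()
    lowered = key.lower()
    for iso3, (official, _m49) in COUNTRIES.items():
        if official.lower() == lowered:
            return iso3
    return ALIASES.get(lowered)


def iso3_to_m49(iso3):
    """Return the M49 numeric code for a known ISO3 code."""
    return COUNTRIES[iso3][1]
```

Once every source has been mapped to the same code, records that spell a country differently can be joined on that code.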
Over time, many utility functions have been written for internal use. Because we think they also have value externally, they have been packaged into the HDX Python Utilities library, which has library API-level documentation available online.