Resources for Developers
About the Humanitarian Data Exchange

The Humanitarian Data Exchange (HDX) is an open platform for sharing data across crises and organisations. Launched in July 2014, HDX aims to make humanitarian data easy to find and use for analysis. Our growing collection of datasets has been accessed by users in over 200 countries and territories. Watch this video to learn more.

HDX is based on CKAN, an open-source data management system for powering data hubs and data portals. The source code for HDX can be found here.

Accessing HDX by API

This section contains information for developers who want to write code that interacts with the Humanitarian Data Exchange (HDX) and the datasets it contains. Anything that you can do through the HDX user interface, you can also do programmatically by making calls to the API, and the API lets you do a lot more besides. Typical uses of the API are to script the creation and update of datasets in HDX or to read data for analysis and visualisation.

HDX has a RESTful API, largely unchanged from the underlying CKAN API, which can be used from any programming language that supports HTTP GET and POST requests. However, the terminology that CKAN uses is a little different from that of the HDX user interface. Hence, we have developed wrappers for specific languages that harmonise the nomenclature and simplify the interaction with HDX.
These APIs allow various operations such as searching, reading and writing dataset metadata, but not the direct querying of data within resources. A dataset can have more than one resource, and each resource can point to a file or a URL.

The recommended way of developing against HDX is to use the HDX Python API. This is a mature library that supports Python 3 with tests that have a high level of code coverage. The major goal of the library is to make pushing and pulling data from HDX as simple as possible for the end user. There are several ways this is achieved. It provides a simple interface that communicates with HDX using the CKAN Python API, a thin wrapper around the CKAN REST API. The HDX objects, such as datasets and resources, are represented by Python classes. This should make the learning curve gentle and enable users to quickly get started with using HDX programmatically. For example, to read a dataset and get its resources, you would simply do:

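from hdx.api.configuration import Configuration
from hdx.data.dataset import Dataset

# Connect to the production HDX site in read-only mode
Configuration.create(hdx_site="prod", user_agent="A_Quick_Example", hdx_read_only=True)
# Read the dataset's metadata by its name
dataset = Dataset.read_from_hdx("novel-coronavirus-2019-ncov-cases")
# Get the resources (files or links) attached to the dataset
resources = dataset.get_resources()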

There is library API-level documentation available online.
If you intend to push data to HDX, then it may be helpful to start with this scraper template which shows what needs to be done to create datasets on HDX. It should be straightforward to adapt the template for your needs.

If you wish to read data from HDX for analysis in R, then you can use the rhdx package. The goal of this package is to provide a simple interface to interact with HDX. Like the Python API, it is a wrapper around the CKAN REST API. rhdx is not yet fully mature and some breaking changes are expected.

If you need to use another language or simply want to examine dataset metadata in detail in your web browser, then you can use CKAN’s RESTful API, a powerful, RPC-style interface that exposes all of CKAN’s core features to clients.
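As a rough illustration of calling the raw API, the sketch below builds a CKAN action URL and extracts resource names and URLs from a `package_show` response. It assumes that HDX is served from https://data.humdata.org and exposes the standard CKAN `/api/3/action/` path; the helper names `action_url`, `fetch_dataset` and `list_resources` are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the public HDX site and the standard CKAN action API path.
HDX_BASE_URL = "https://data.humdata.org"


def action_url(action, **params):
    """Build the URL for a CKAN action such as package_show."""
    url = f"{HDX_BASE_URL}/api/3/action/{action}"
    if params:
        url += "?" + urlencode(params)
    return url


def fetch_dataset(dataset_id, user_agent="A_Quick_Example"):
    """GET the metadata of one dataset (performs a network request)."""
    request = Request(
        action_url("package_show", id=dataset_id),
        headers={"User-Agent": user_agent},
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)


def list_resources(payload):
    """Return (name, url) pairs from a package_show response body."""
    if not payload.get("success"):
        raise ValueError(f"CKAN reported a failure: {payload.get('error')!r}")
    resources = payload["result"].get("resources", [])
    return [(res.get("name"), res.get("url")) for res in resources]


# Usage (needs network access):
#   payload = fetch_dataset("novel-coronavirus-2019-ncov-cases")
#   for name, url in list_resources(payload):
#       print(name, url)
```

The same URL can be pasted into a web browser to inspect the JSON metadata directly.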

Coding with the Humanitarian Exchange Language

This section contains information for developers who want to write code to process datasets that use the Humanitarian Exchange Language (HXL). HXL is a different kind of data standard, adding hashtags to existing datasets to improve information sharing during a humanitarian crisis without adding extra reporting burdens. HXL has its own website, whose documentation section will be of particular interest.

The most fully developed HXL library, libhxl-python, is written in Python. The most recent versions support Python 3 only, but there are earlier versions with Python 2.7 support. Features of the library include filtering, validation and the ingestion and generation of various formats. libhxl-python uses an idiom that is familiar from jQuery and other JavaScript libraries; for example, to load a dataset, you would simply use:

import hxl 
source = hxl.data('http://example.org/dataset.xlsx')

As in jQuery, you process the dataset by adding steps to the chain. The following example selects every row with the organisation “UNICEF” and removes the column with email addresses:

source.with_rows('#org=UNICEF').without_columns('#contact+email')

The library also includes a set of command-line tools for processing HXL data in shell scripts. For example, the following will perform the same operation shown above, without the need to write Python code:

$ cat dataset.xlsx | hxlselect -q "#org=UNICEF" | hxlcut -x '#contact+email'

There is library API-level documentation available online.
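To make the effect of the row selection and column removal concrete, here is a minimal plain-Python sketch that performs a similar operation on a small, made-up HXL-tagged CSV without using libhxl. It is not how libhxl-python is implemented: it matches hashtags and values exactly, which is a simplification. The sample data and the function names are assumptions for illustration only.

```python
import csv
import io

# Made-up sample: a header row, then the HXL hashtag row, then data rows.
SAMPLE = """Organisation,Sector,Email
#org,#sector,#contact+email
UNICEF,Education,a@example.org
WFP,Food security,b@example.org
UNICEF,WASH,c@example.org
"""


def read_hxl(text):
    """Return (hashtags, data_rows) from HXL-tagged CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    for index, row in enumerate(rows):
        cells = [cell.strip() for cell in row if cell.strip()]
        # The first row whose non-empty cells all start with '#' is
        # treated as the hashtag row; everything after it is data.
        if cells and all(cell.startswith("#") for cell in cells):
            return [cell.strip() for cell in row], rows[index + 1:]
    raise ValueError("no HXL hashtag row found")


def with_rows(tags, rows, tag, value):
    """Keep only rows whose `tag` column equals `value` (exact match)."""
    column = tags.index(tag)
    return [row for row in rows if row[column] == value]


def without_columns(tags, rows, tag):
    """Drop the column labelled `tag` from the hashtags and every row."""
    column = tags.index(tag)
    new_tags = tags[:column] + tags[column + 1:]
    new_rows = [row[:column] + row[column + 1:] for row in rows]
    return new_tags, new_rows


tags, rows = read_hxl(SAMPLE)
rows = with_rows(tags, rows, "#org", "UNICEF")
tags, rows = without_columns(tags, rows, "#contact+email")
print(tags)
for row in rows:
    print(row)
```

The hashtag row is what lets the code refer to columns by meaning (`#org`, `#contact+email`) rather than by position or by whatever header text a particular organisation happened to use.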

libhxl-js is a library for HXL written in JavaScript. It supports high-level filtering and aggregation operations on HXL datasets. Its programming idiom is similar to that of libhxl-python, but it is smaller, contains fewer filters and has no data-validation support.

Third-party support for R is available via the package rhxl. It has basic support for reading HXLated files to make them available for advanced data processing and analytics inside R.

Tools

HDX provides a suite of tools that leverage HXLated datasets:

  1. Quick Charts automatically generates embeddable, live data charts, graphs and key figures from your data. It uses the HXL hashtags to guess the best charts to display, but you can then override these with your own preferences. Here is a list of Quick Charts-enabled datasets.
  2. HXL Tag Assist allows you to find hashtag examples and definitions, and see how data managers are using the hashtags in their data.
  3. Data Check provides help with data cleaning for humanitarian data, automatically detecting and highlighting common errors. It includes validation against CODs and other vocabularies.

The HXL Proxy is a tool for validating, cleaning, transforming, and visualising HXL-tagged data. You supply an input URL pointing to a tabular or JSON dataset and then create a recipe that contains a series of steps for transforming the data. The result is a download link that you can share and use in HDX, and the output will update automatically whenever the source dataset changes. Full user documentation is available in the HXL Proxy wiki.
The HXL Proxy is primarily a web wrapper around the libhxl-python library (see above), and makes the same functionality available via RESTful web calls.
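Because the result of a recipe is just a link, it can be assembled in code. The sketch below shows the general idea under stated assumptions: that the public proxy lives at https://proxy.hxlstandard.org and offers a `data.csv` endpoint that takes the source dataset in a `url` query parameter. The `proxy_url` helper and its `recipe_params` pass-through are hypothetical; the actual parameter names for recipe steps should be taken from the HXL Proxy wiki.

```python
from urllib.parse import urlencode

# Assumption: the public proxy address and a data.<format> endpoint that
# accepts the source dataset in a `url` query parameter.
PROXY_BASE_URL = "https://proxy.hxlstandard.org"


def proxy_url(source_url, output="csv", **recipe_params):
    """Build a shareable link to a dataset passed through the HXL Proxy.

    `recipe_params` is a hypothetical pass-through for recipe steps; it is
    appended to the query string unchanged.
    """
    params = {"url": source_url, **recipe_params}
    return f"{PROXY_BASE_URL}/data.{output}?{urlencode(params)}"


link = proxy_url("http://example.org/dataset.xlsx")
print(link)
```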

Other HDX Libraries

Humanitarian projects frequently need to handle countries, locations and regions. In particular, they must deal with inconsistent country naming between data sources and with different coding standards such as ISO3 and M49. The HDX Python Country library was created to meet these requirements. It provides utilities to map between country and region codes and names, and to match administrative level names from different sources. It also provides foreign exchange utilities for obtaining current and historic FX rates for different currencies. It has library API-level documentation available online.
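The kind of name matching described above can be sketched with the Python standard library alone. The example below is not the HDX Python Country API: the three-entry lookup table, the alias list and the `to_iso3` and `iso3_to_m49` helpers are assumptions chosen to illustrate the problem of mapping inconsistent country names to ISO3 and M49 codes.

```python
import difflib

# Tiny illustrative table: ISO3 code -> (official name, M49 code).
COUNTRIES = {
    "AFG": ("Afghanistan", "004"),
    "COD": ("Democratic Republic of the Congo", "180"),
    "SYR": ("Syrian Arab Republic", "760"),
}

# Alternative spellings seen in source data (hypothetical list).
ALIASES = {"dr congo": "COD", "drc": "COD", "syria": "SYR"}

# Lower-cased official names for exact and fuzzy lookup.
NAMES = {name.lower(): iso3 for iso3, (name, _) in COUNTRIES.items()}


def to_iso3(text, cutoff=0.8):
    """Map a country name, alias or ISO3 code to an ISO3 code, or None."""
    key = " ".join(text.lower().split())  # normalise case and whitespace
    if key.upper() in COUNTRIES:
        return key.upper()
    if key in ALIASES:
        return ALIASES[key]
    if key in NAMES:
        return NAMES[key]
    # Fall back to fuzzy matching to absorb small spelling differences.
    close = difflib.get_close_matches(key, list(NAMES), n=1, cutoff=cutoff)
    return NAMES[close[0]] if close else None


def iso3_to_m49(iso3):
    """Look up the M49 numeric code for an ISO3 code."""
    return COUNTRIES[iso3][1]


for raw in ["  Syria ", "Afganistan", "cod", "Atlantis"]:
    print(repr(raw), "->", to_iso3(raw))
```

A real implementation needs a complete, maintained country table and more careful handling of abbreviations and ambiguous names, which is the work the library takes on.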

Over time, we have written many utility functions for internal use. Since we think these also have value externally, we have packaged them into the HDX Python Utilities library. It provides a range of helpful utilities for Python developers, including streaming tabular data, date parsing, JSON and YAML handling, and dictionary and list utilities. It has library API-level documentation available online.

Contact Us

If you have any questions about these resources, we will do our best to answer them. We would also love to hear about how you are using them for your work.

Please contact us at: hdx@un.org. Sign up to receive our newsletter here.