Data Zetu - Accessibility to Health Care Services in Temeke and Mbeya District of Tanzania

  • This dataset updates: Every year



  • DataZetu_Temeke_Consolidated_Healthcare_Form_PI...XLSX (6.5M)
    Updated: October 8, 2018

    Why Obfuscate Locations?

    Individual residence locations with certain attached data can constitute Personally Identifiable Information, and as such cannot be shared publicly without informed consent.

    As a mapping agency, HOT shares locations of buildings when they are already public information (building footprints in OpenStreetMap, for example, which are already visible from public-facing aerial imagery).

    HOT also sometimes shares aggregated information about indicators that would be unethical to share at the individual level. For example, while it is unacceptable to share the HIV status of an individual, it is standard public health practice to publish aggregated statistics such as the HIV prevalence in a city or district population.

    With mildly sensitive data such as perceived access to health care, HOT sometimes takes a middle approach; we do not aggregate the information into polygon statistics, but we obfuscate the individual home locations by randomly moving the individual points within a specific radius, chosen to create an appropriate level of ambiguity while retaining some of the spatial patterns inherent in the data.

    Therefore, in some cases, HOT uses a randomization algorithm to obfuscate individual location points, as described below.

    Randomization of Location in Tabular Data

    HOT survey data typically comes in the form of Comma Separated Values (CSV) with columns for Latitude and Longitude. While there are sophisticated GIS algorithms available to randomize locations, it is often simpler to use a spreadsheet program to add a random component to the location columns.

    The spreadsheet formula used by HOT is as follows:

    [lat] = latitude or y-position in decimal degrees
    [long] = longitude or x-position in decimal degrees
    [jit] = "jitter" or radius of the circle within which the point will be displaced (for example, 15 would create a random displacement somewhere within a 15 meter radius circle from the original point)

    In the new Lat column: =[lat] + ((RAND() * [jit] * 2) - [jit]) / 111111

    In the new Long column: =[long] + ((RAND() * [jit] * 2) - [jit]) / (111111 * COS([lat] / 180 * PI()))

    This formula takes the original point and moves it somewhere within a circle of radius [jit]. In Temeke, for the health care data, we used a value of +/- 15 meters, based on our observation that this, combined with the inherent error of the mobile device GPS (+/- > 4 m), gave us sufficient ambiguity to ensure that no house could be uniquely identified; on average each point would resolve to 10 structures, sufficient to safeguard the privacy of individuals for the types of questions being asked.

    The 15 meter radius was carefully chosen: if the data had concerned, for example, HIV status or personal income, we would have used a larger jitter radius. However, the larger the jitter radius, the less the data reflects useful patterns such as differences in perception of access in different neighbourhoods, variation in access by population density, and so forth.
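
For readers working outside a spreadsheet, the jitter formula described above can be sketched in Python roughly as follows. This is an illustrative translation, not HOT's own code: the function name `jitter_point`, the constant name `METERS_PER_DEGREE`, and the optional `rng` parameter are assumptions introduced here. One detail worth noting is that the two offsets are drawn independently, so strictly speaking the displaced point falls inside a square of side 2 × jit centred on the original point, and the displacement along a diagonal can reach about 1.41 × jit.

```python
import math
import random

# Approximate meters per degree of latitude, the constant used in the spreadsheet formula.
METERS_PER_DEGREE = 111111


def jitter_point(lat, lon, jit, rng=random):
    """Return (new_lat, new_lon), each axis displaced by up to `jit` meters.

    `rng` is a hypothetical hook so a seeded random.Random can be passed in
    for reproducible results; by default the module-level generator is used.
    """
    # Equivalent of (RAND() * [jit] * 2) - [jit]: a uniform offset in [-jit, +jit) meters.
    north_m = rng.random() * jit * 2 - jit
    east_m = rng.random() * jit * 2 - jit

    # Convert meters to degrees. A degree of longitude shrinks with the
    # cosine of the latitude, hence the COS([lat] / 180 * PI()) term.
    new_lat = lat + north_m / METERS_PER_DEGREE
    new_lon = lon + east_m / (METERS_PER_DEGREE * math.cos(math.radians(lat)))
    return new_lat, new_lon


if __name__ == "__main__":
    # Arbitrary illustrative coordinates, not taken from the dataset.
    print(jitter_point(-6.85, 39.25, 15))
```

With a jitter of 15, the latitude offset is at most 15 / 111111, or roughly 0.000135 degrees. Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which helps when checking results but also means the seed itself should be kept private.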

  • DataZetu_Mbeya_HealthCare_Form_PII_Obfuscated_F...XLSX (6.1M)
    Updated: October 16, 2018

    See the dataset above (Temeke Health Care Data) for a full description of why locations are obfuscated.

Source: Humanitarian OpenStreetMap Team (HOT)
Date of Dataset: November 01, 2017 - May 01, 2018
Updated: October 16, 2018
Expected Update Frequency: Every year
Methodology: Sample Survey
Caveats / Comments
File Format