Global Edge-Matched Admin Data

The Global Admin project is a data processing pipeline that takes the best available administrative boundary information from multiple sources (OCHA, government, OSM, GADM, etc.) and aggregates it into a single edge-matched dataset with a common schema, using an automated methodology. The most detailed admin level available is used, ranging from 0 to 4 depending on the source. The dataset uses ISO-3 codes as the primary unit of edge-matching, with 249 currently in use (not including disputed areas). Of these, 227 ISO-3 codes have subdivisions available: 114 sourced from OCHA, 106 from GADM, and 7 from public government data repositories. All of these sources are matched to an admin 0 dataset produced by the UN Geospatial Information Section and to coastlines produced by OpenStreetMap. Data can be downloaded from the following links (updated weekly), last updated Monday, 15 February 2021:

Sources

The following links point to the data catalogues used for this project. All data was downloaded manually, and some transformations were applied. These are some of the modifications made to each source as a pre-processing step:

  • OCHA: Boundary modifications are mostly limited to areas involving disputed boundaries, such as the Abyei Administrative Region, which is claimed by both Sudan and South Sudan. In this case, the area is removed from both countries when performing edge-matching and is represented as a disputed area in the admin 0 layer. In rare cases, topology cleaning is also required. Where attributes have not been formatted to the ITOS geodatabase standard, as defined by documents on GitHub, tables are manually edited to match it so they can be read automatically.
  • GADM: Lakes and water bodies are classified as administrative regions with no names in this dataset, and are therefore removed before performing edge-matching.
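The GADM step above can be sketched as a simple attribute filter. This is a minimal illustration, not the project's actual code: it assumes features are GeoJSON-like dictionaries and that the name lives in a field called NAME_1, both of which are assumptions made for the example.

```python
def drop_unnamed_regions(features, name_field="NAME_1"):
    """Keep only features whose name field holds a non-blank string.

    `features` is assumed to be a list of GeoJSON-like dicts, and
    `name_field` is a hypothetical column name chosen for illustration.
    """
    kept = []
    for feature in features:
        name = feature.get("properties", {}).get(name_field)
        # Water bodies are described as regions with no name, so skip them.
        if isinstance(name, str) and name.strip():
            kept.append(feature)
    return kept


# Made-up sample records: one named region and two unnamed ones.
sample = [
    {"properties": {"NAME_1": "Region A"}},
    {"properties": {"NAME_1": ""}},
    {"properties": {"NAME_1": None}},
]
named_only = drop_unnamed_regions(sample)
```

Filtering on the attribute rather than on geometry keeps the step cheap, but it relies on unnamed features being water bodies and nothing else.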

Layer                 Source
Coastlines            OSM
Admin 0               UN GIS
Admin 1-4 Primary     OCHA
Admin 1-4 Secondary   GADM
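
One way to read the primary and secondary rows above is as a fallback order for subdivisions. The sketch below assumes that interpretation; the function name is hypothetical, and government sources are left out because this table does not rank them.

```python
# Priority order taken from the table: OCHA first, GADM as the fallback.
SOURCE_PRIORITY = ("OCHA", "GADM")


def pick_subdivision_source(available):
    """Return the highest-priority source present in `available`, or None."""
    for source in SOURCE_PRIORITY:
        if source in available:
            return source
    return None
```

For a country covered by both catalogues, `pick_subdivision_source({"GADM", "OCHA"})` returns "OCHA"; with neither, it returns None and only the admin 0 layer would apply.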

Boundary Normalization

A common problem when merging spatial data from different sources is the existence of gaps and overlaps between them. There are many ways to address this problem; the approach taken here is to generate a borderless digital boundary file for each input. In this context, a digital boundary file is one that does not follow shorelines and international boundaries, but instead extends beyond them with simplified edges, so that users can clip it with their own shorelines and international boundaries. For example, Statistics Canada uses a digital boundary file when creating census blocks, which is later clipped with lakes and shorelines to derive a layer suitable for reference maps.

Figure: cartographic vs digital boundary file
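
The effect of clipping borderless regions with a shared outline can be shown with a toy example. The sketch below stands in grid cells for polygons, which is an assumption made purely so the example runs without a geometry library; the real pipeline works with vector boundaries, and all names here are hypothetical.

```python
def clip_to_land(regions, land):
    """Intersect each borderless region with the shared land mask."""
    return {name: cells & land for name, cells in regions.items()}


# Land occupies columns 1..4 of a small grid; cells are (column, row) pairs.
land = {(x, y) for x in range(1, 5) for y in range(3)}

# Borderless regions deliberately extend past the coast on both sides.
regions = {
    "A": {(x, y) for x in range(0, 3) for y in range(3)},  # columns 0..2
    "B": {(x, y) for x in range(3, 6) for y in range(3)},  # columns 3..5
}

clipped = clip_to_land(regions, land)
```

Because every region is cut by the same mask, the clipped pieces tile the land exactly: they share no cells and leave none uncovered, which is the property edge-matching is after.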

Attribute Normalization

Just as with boundaries, attribute columns from sources with different schemas need to be conditioned so that they align with each other. The following schema is used.

Repeating in layer

The following columns repeat for a layer's own level and for each level above it. An admin 2 layer will include attributes for adm2, adm1, and adm0. Replace the "X" with the indicated level.

Name          Description
admX_id       Automatically generated ID used for internal pipeline management.
admX_ocha     P-Code taken from OCHA sources. If not an OCHA source, a P-Code like ID is generated using the ISO-2 code.
admX_name1    Primary administrative region name. Uses the language defined by the "lang_name1" column.
admX_name2    Secondary administrative region name. Uses the language defined by the "lang_name2" column.
admX_name3    Tertiary administrative region name. Uses the language defined by the "lang_name3" column.
admX_namea    All other names listed for a region are combined together using the pipe ( | ) symbol.
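
As a rough illustration of how these columns expand, the sketch below builds the repeating column names for a given admin level and joins alternate names for the "namea" column. The function names, the column ordering, and the choice of a bare "|" with no surrounding spaces are assumptions made for the example, not details confirmed by the project.

```python
# Suffixes from the table above, in the order listed.
REPEATING_FIELDS = ("id", "ocha", "name1", "name2", "name3", "namea")


def repeating_columns(level):
    """List repeating column names for a layer, most detailed level first."""
    if not 0 <= level <= 4:
        raise ValueError("admin level must be between 0 and 4")
    return [
        f"adm{x}_{field}"
        for x in range(level, -1, -1)  # e.g. 2, 1, 0 for an admin 2 layer
        for field in REPEATING_FIELDS
    ]


def join_alternate_names(names):
    """Combine extra names with the pipe symbol, skipping blank entries."""
    return "|".join(name.strip() for name in names if name.strip())


adm2_columns = repeating_columns(2)
```

An admin 2 layer therefore carries 18 repeating columns, running from adm2_id through adm0_namea.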

Once per layer

These columns only appear a single time per layer, providing layer metadata.

Name          Description
lang_name1    Primary language used for "admX_name1".
lang_name2    Secondary language used for "admX_name2".
lang_name3    Tertiary language used for "admX_name3".
src_name      One of: OCHA, GOVT, GADM.
src_url       Link where the original data source can be downloaded.
src_date      Date original dataset was produced.
src_valid     Last date original dataset was reviewed.
adm_max       Most detailed administrative level available for a particular ISO-3 region.
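
The per-layer metadata above could be modelled as a small record with basic validation. This is a sketch under stated assumptions: the class name is hypothetical, dates are assumed to be stored as Python date objects, the secondary and tertiary languages are assumed to be optional, and the 0 to 4 range for adm_max is taken from the admin levels the dataset covers.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

ALLOWED_SOURCES = {"OCHA", "GOVT", "GADM"}


@dataclass(frozen=True)
class LayerMetadata:
    """Hypothetical container for the once-per-layer columns."""

    lang_name1: str
    src_name: str
    src_url: str
    src_date: date
    src_valid: date
    adm_max: int
    lang_name2: Optional[str] = None
    lang_name3: Optional[str] = None

    def __post_init__(self):
        # src_name is restricted to the three values listed in the schema.
        if self.src_name not in ALLOWED_SOURCES:
            raise ValueError(f"unknown src_name: {self.src_name!r}")
        # Admin levels in this dataset range from 0 to 4.
        if not 0 <= self.adm_max <= 4:
            raise ValueError(f"adm_max out of range: {self.adm_max}")


# Placeholder values for illustration only, not real dataset metadata.
example = LayerMetadata(
    lang_name1="en",
    src_name="GADM",
    src_url="https://example.org/source",
    src_date=date(2020, 1, 1),
    src_valid=date(2021, 1, 1),
    adm_max=2,
)
```

Validating at construction time means a layer with an unexpected source label or admin level fails early rather than propagating into the merged output.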

Only in admin 0

These columns only appear in the admin 0 layer.

Name          Description
adm0_fid      Feature ID code used to differentiate states and self-governing territories sharing the same ISO-3 code.
adm0_name     Romanized name associated to the region defined by adm0_fid.
adm0_label    Map label to be used for a region defined by adm0_fid.
adm0_cont     Code of the continent a given ISO-3 belongs to.
adm0_color    When creating thematic maps, features sharing the same value for this column should be coloured together.
adm0_stsc     Indicates the sovereignty status code of the region given as an integer.
adm0_stsn     Indicates the sovereignty status of the region (State, Territory, Special Region, etc).