Global Edge-Matched Admin Data
The Global Admin project is a data processing pipeline that takes the best available administrative boundary information from multiple sources (OCHA, WFP, government, and others) and aggregates it into a single edge-matched dataset with a common schema, using an automated methodology. The most detailed admin level available is used, ranging from 0 to 4 depending on the source. This dataset uses ISO-3 codes as the primary unit of edge-matching, with 249 currently used (not including disputed areas). Of these, 227 ISO-3 codes have subdivisions available: 114 sourced from OCHA, 106 from GADM, and 7 from public government data repositories. All of these sources are matched to an admin 0 dataset produced by WFP for the public domain. Data can be downloaded from the following links:
Layer (ADM 0-4) | Size | URL |
---|---|---|
Boundary Polygons | 2.45 GB | data.fieldmaps.io/global-admin/wld_polygons.gpkg.zip |
Cartographic Lines | 869 MB | data.fieldmaps.io/global-admin/wld_lines.gpkg.zip |
Label Points | 19.9 MB | data.fieldmaps.io/global-admin/wld_points.gpkg.zip |
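A GeoPackage is an SQLite database, so the layers inside a downloaded file can be inspected with nothing but the Python standard library. The sketch below reads the `gpkg_contents` table defined by the GeoPackage standard. The helper name `list_gpkg_layers` and the unzipped file name in the usage comment are assumptions, and the layer names inside the published files are not stated here, so the function simply reports whatever it finds.

```python
import sqlite3


def list_gpkg_layers(path):
    """Return (table_name, data_type) pairs from a GeoPackage's gpkg_contents table."""
    con = sqlite3.connect(path)
    try:
        rows = con.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        ).fetchall()
    finally:
        con.close()
    return rows


# Usage (assumes the archive was unzipped to this hypothetical local path):
# for name, kind in list_gpkg_layers("wld_polygons.gpkg"):
#     print(name, kind)
```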
Sources
The following links point to the data catalogues used for this project. All data was downloaded manually, and some transformations were applied. These are some of the modifications made to each source as a pre-processing step:
- OCHA: Boundary modifications are mostly limited to areas involving disputed boundaries, such as both Sudan and South Sudan claiming the Abyei Administrative Region. In this case, the area is removed from both countries when performing edge-matching, and represented as a disputed area in the admin 0 layer. In rare cases, topology cleaning is also required. Where attributes have not been formatted to the ITOS geodatabase standard, as defined by documents on GitHub, tables are manually edited to match it so that they can be read automatically.
- GADM: Lakes and water bodies are classified as administrative regions with no names in this dataset, and are therefore removed before performing edge-matching.
- WFP: Only minor changes are made to the boundary itself, such as clipping out the Caspian Sea, as is commonly done in other global reference maps. Attribute modifications are minor as well, removing deleted (WAK → UMI) and transitional reservation (ANT → BES) ISO-3 codes.
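The attribute side of these pre-processing steps can be sketched in a few lines of Python. This is an illustration only, not the pipeline's actual code: it reads the arrows above as "replaced by", and the record layout and the `name` field are hypothetical.

```python
# Retired ISO-3 codes and their replacements, as listed for the WFP source.
ISO3_REPLACEMENTS = {"WAK": "UMI", "ANT": "BES"}


def normalize_iso3(code):
    """Map a deleted or transitionally reserved ISO-3 code to its replacement."""
    code = code.upper()
    return ISO3_REPLACEMENTS.get(code, code)


def drop_unnamed(records, name_field="name"):
    """Drop records with no name, as is done for GADM lakes and water bodies.

    `name_field` is a hypothetical attribute name used for illustration.
    """
    return [r for r in records if (r.get(name_field) or "").strip()]
```

For example, `normalize_iso3("WAK")` returns `"UMI"`, while a current code such as `"CAN"` passes through unchanged.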
Boundary Normalization
A common problem when merging spatial data from different sources is the existence of gaps and overlaps between sources. There are many ways to address this problem; the approach taken here is to generate borderless digital boundary files for each input. In this context, a digital boundary file is one that does not follow shorelines and international boundaries, but rather stretches out with simplified edges, intended for users to clip with their own shorelines and international boundaries. For example, Statistics Canada uses a digital boundary file when creating census blocks, later clipped with lakes and shorelines to derive a layer suitable for reference maps.
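The clipping idea can be illustrated with a deliberately simplified toy in which every shape is an axis-aligned box. Real boundaries are polygons handled with GIS tooling, so this is not the pipeline's implementation; the box coordinates and names below are invented for illustration. Two "digital boundary" boxes overshoot the country outline, and clipping them with the admin 0 box yields units that meet the international boundary exactly.

```python
def clip_box(box, mask):
    """Intersect two boxes given as (xmin, ymin, xmax, ymax); None if they do not overlap."""
    xmin, ymin = max(box[0], mask[0]), max(box[1], mask[1])
    xmax, ymax = min(box[2], mask[2]), min(box[3], mask[3])
    if xmin >= xmax or ymin >= ymax:
        return None
    return (xmin, ymin, xmax, ymax)


# Hypothetical admin 0 outline and two oversized admin 1 "digital boundaries".
country = (0, 0, 10, 10)
digital = {"west": (-5, -5, 5, 15), "east": (5, -5, 15, 15)}

# After clipping, the units share the country's outer edge with no gap or overlap.
clipped = {name: clip_box(box, country) for name, box in digital.items()}
# clipped == {"west": (0, 0, 5, 10), "east": (5, 0, 10, 10)}
```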
Attribute Normalization
Just as with boundaries, attribute columns from sources with different schemas need to be conditioned so that they align with each other when merged. The following schema is used.
Repeating in layer
The following columns repeat for each higher level in an admin layer. An admin 2 layer will include attributes for adm2, adm1, and adm0. Replace the "X" with the indicated level.
Name | Description |
---|---|
admX_id | Automatically generated ID used for internal pipeline management. |
admX_ocha | P-Code taken from OCHA sources. If not an OCHA source, a P-Code-like ID is generated using the ISO-2 code. |
admX_name1 | Primary administrative region name. Uses the language defined by the "lang_name1" column. |
admX_name2 | Secondary administrative region name. Uses the language defined by the "lang_name2" column. |
admX_name3 | Tertiary administrative region name. Uses the language defined by the "lang_name3" column. |
admX_namea | All other names listed for a region, combined using the pipe (`\|`) symbol. |
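As a quick way to see how the repeating columns expand, the following sketch generates the expected column names for a layer of a given level. The function names are hypothetical, and joining alternative names without spaces around the pipe is an assumption about the exact separator.

```python
# Column suffixes from the table above, in the order listed.
SUFFIXES = ("id", "ocha", "name1", "name2", "name3", "namea")


def repeating_columns(level):
    """List admX_* columns for a layer, from its own level up to adm0."""
    if not 0 <= level <= 4:
        raise ValueError("admin level must be between 0 and 4")
    return [f"adm{lvl}_{s}" for lvl in range(level, -1, -1) for s in SUFFIXES]


def join_alt_names(names):
    """Combine alternative names into a single pipe-separated admX_namea value."""
    return "|".join(n.strip() for n in names if n and n.strip())
```

An admin 2 layer therefore carries 18 repeating columns: six each for adm2, adm1, and adm0.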
Once per layer
These columns only appear a single time per layer, providing layer metadata.
Name | Description |
---|---|
lang_name1 | Primary language used for "admX_name1". |
lang_name2 | Secondary language used for "admX_name2". |
lang_name3 | Tertiary language used for "admX_name3". |
src_name | One of: OCHA, GOVT, GADM. |
src_url | Link where the original data source can be downloaded. |
src_date | Date original dataset was produced. |
src_valid | Last date original dataset was reviewed. |
adm_max | Most detailed administrative level available for a particular ISO-3 region. |
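A consumer of the dataset might sanity-check these metadata columns before relying on them. The sketch below checks only what the table states: the allowed `src_name` values and the 0 to 4 range of admin levels. The function name is hypothetical, and it assumes `adm_max` is stored as an integer.

```python
VALID_SOURCES = {"OCHA", "GOVT", "GADM"}


def check_layer_metadata(row):
    """Return a list of problems found in one row's layer metadata (empty if none)."""
    problems = []
    if row.get("src_name") not in VALID_SOURCES:
        problems.append(f"unexpected src_name: {row.get('src_name')!r}")
    adm_max = row.get("adm_max")
    # Assumes adm_max is an integer; booleans are rejected explicitly.
    if isinstance(adm_max, bool) or not isinstance(adm_max, int) or not 0 <= adm_max <= 4:
        problems.append(f"adm_max out of range 0-4: {adm_max!r}")
    return problems
```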
Only in admin 0
These columns only appear in the admin 0 layer.
Name | Description |
---|---|
wfp_id | ID code used by WFP. There may be multiple IDs per ISO-3 to differentiate islands and other distinct features. |
wfp_name | Preferred name associated with the region defined by a WFP ID. |
wfp_namea | Alternative name for a region defined by a WFP ID. |
wfp_disput | One of: YES or NO. Indicates whether the area is a disputed region. |
wfp_source | Source used for creating the admin 0 boundary, usually listed as SALB. |
wfp_status | Indicates the sovereignty status of the region (State, Territory, Special Region, etc). |
wfp_label | Label used for creating cartographic maps, indicating non-self-governing territory, etc. |
wfp_mapclr | Used for data management to indicate the ISO-3 source of non-self-governing territory, etc. |
wfp_update | Last date boundary data for a region was modified. |
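Since disputed areas are kept as separate features in the admin 0 layer, a common first step is to separate them from the rest using `wfp_disput`. The following is a minimal sketch over rows represented as dictionaries; the function name is hypothetical, and rows with any value other than YES or NO are left out of both groups.

```python
def split_disputed(rows):
    """Split admin 0 rows into (disputed, undisputed) using the wfp_disput column."""
    disputed = [r for r in rows if r.get("wfp_disput") == "YES"]
    undisputed = [r for r in rows if r.get("wfp_disput") == "NO"]
    return disputed, undisputed
```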