You are here: Home / Services / Data Analysis and Visualization / Visualization / Vis Blog / Masking with Shapefiles

Masking with Shapefiles

Apr 17, 2020

Gridded model data usually lack information such as to which country grid boxes belong. Such social or political information is available as vector data, i.e. line segments that enclose the respective areas. One common format for this is the "ESRI shapefile". How to use such shape information in data analysis and visualization with NCL and outside of Geographic Information Systems (GIS) is described in this article, more specifically how to convert ESRI shapefiles into NetCDF and how to properly work with them.

 

Panel Shapefile 1


Welcome to the first Vis Blog article @DKRZ!

Many of you may have encountered the problem of visualizing only a regional part of a larger data set to illustrate the situation for a specific country or region. This is straightforward for a regional visualization of a rectangular subset of the model grid, but if only data values within an irregular shape (such as political borders) should be displayed, it gets more challenging. Here, shapefiles can help you analyze and/or visualize data inside of irregularly shaped boundaries.

In this blog article we give an example of creating a gridded mask based on a closed polyline defined in a shapefile in order to analyze and/or visualize gridded data only inside of it.

Shapefiles can contain geometrical data such as country or state borderlines, cities, traffic routes, rivers, water areas and related data such as population, vegetation, etc. The geospatial information is stored in the form of geometries like 'points', 'polylines' or 'polygons'.

A shapefile is actually not only one single file; it is rather a set of files containing different parts of the information. The important ones are:

  • a .shp file, which contains the geometry
  • a .shx file, which contains the positional index of the feature geometry
  • a .dbf file, which contains the attributes of each shape


Notice that most applications need the complete set of files to work properly although you are only loading the '.shp' file.

You can search and download shapefiles from different web sites like
    https://www.eea.europa.eu/data-and-maps/data/eea-reference-grids-2/gis-files/
    https://www.diva-gis.org/gdata
    https://gadm.org/index.html
 
The shapefile set used for this article can be downloaded from
    https://www.diva-gis.org/gdata
     (-> Germany -> administrative areas -> DEU_adm.zip)

Here we are using DEU_adm0.shp and DEU_adm1.shp.

1. What does the shapefile contain?

NCL provides the command line tool ncl_filedump which gives a summary of of the shapefile's content.

ncl_filedump DEU_adm/DEU_adm0.shp

The shapefile contains some variables, but at the moment we don't need to know more about its content besides the fact that the variables x and y are the lon/lat coordinates of Germany's boundary.

2. Create a NetCDF mask file from the shapefile

To create a mask file based on the shapefile, we need a second file with the grid description for the output file.
    
Let us assume that we want to create a gridded mask for Germany that contains a "0"  for grid cells outside of Germany  and a "1" for those inside. The output mask file should have at a grid resolution of 0.1x0.1 degrees and enclose the region of Germany.

If you have a 0.1x0.1 deg grid file, everything is fine. If not, you can easily generate a (global) one with CDO (Climate Data Operators). Here we use the built-in topography grid and remap it to the desired output resolution:

cdo -f nc -remapbil,r3600x1800 -setmissval,1e20 -topo topo_0.1x0.1deg_global.nc

Since NCL is able to read shapefiles and provides some shapefile functions, the following script is written in the NCL command language. You will have to download the library from NCAR first in order to use these shapefile functions:
    
        https://www.ncl.ucar.edu/Applications/Scripts/shapefile_utils.ncl
        
For later use, we save the file in $HOME/NCL/shapefiles/.
    
This is an NCL script that creates a mask for Germany on a 0.1x0.1deg grid and saves it to a netCDF file:

load "$HOME/NCL/shapefiles/shapefile_utils.ncl"  ;-- load additional lib
        
shp_name  = "$HOME/data/Shapefiles/DEU_adm/DEU_adm0.shp" ;-- shapefile name
mask_name = "DEU_adm0_mask_array_0.1x0.1deg_global.nc"   ;-- output mask file name
        
;-- use CDO to generate a 0.1x0.1 degree grid (variable topo)
system("cdo -f nc -remapbil,r3600x1800 -setmissval,1e20 -topo topo_0.1x0.1deg_global.nc")
        
;-- Open grid file and read the variable
f = addfile("topo_0.1x0.1deg_global.nc","r")
v     = f->topo
v@lat = f->lat
v@lon = f->lon
        
;-- resources for the shapefile_mask_data function
opt = True
opt@return_mask = True    ;-- this forces the return of a 0s and 1s mask array
opt@delta_kilometers = 300
          
;-- create the mask based on the given shapefile; add coordinates
mask_array         =  shapefile_mask_data(v, shp_name, opt)
mask_array!0       = "lat"
mask_array!1       = "lon"
mask_array&lat     =  f->lat
mask_array&lon     =  f->lon
mask_array@comment = "Mask shapefile: DEU_adm/DEU_adm0.shp"
        
;-- set value 0 to missing value
mask_array = where(mask_array .eq. 1, mask_array, mask_array@_FillValue)
        
;-- delete, create, and write mask array to new file
system("rm -f " + mask_name)
fout = addfile(mask_name,"c")
fout->mask_array = mask_array

To check the result, you can visualize the content of the mask file with ncview or Panoply.

Screenshot shapefile 1


3. Mask your data using the mask file

We only want to use the data inside of Germany; the values outside of Germany have be set to missing value. As we don't have actual data to show we take the topography data that CDO already offers.
    
a) Masking on the same grid
    
We take the file topo_0.1x0.1deg_global.nc from above and simply use CDO with the operator 'mul' to multiply the data file with the mask file (both have the same grid).

cdo -mul topo_0.1x0.1deg_global.nc DEU_adm0_mask_array_0.1x0.1deg_global.nc \
                                           topo_0.1x0.1deg_global_mask.nc

                               
b) Masking to another grid

If, e.g., the grid of your input data is on a 1x1 degree grid, you will have to remap the mask file also to a 1x1 degree grid first. Again, CDO can be used to remap the mask file to the grid of your data file.
        
        Data file:  topo_1x1deg_global.nc
        Mask file: DEU_adm0_mask_array_0.1x0.1deg_global.nc
        
Remap the mask to the grid of the input data file:

cdo -remapbil,topo_1x1deg_global.nc DEU_adm0_mask_array_0.1x0.1deg_global.nc \
                                            DEU_adm0_mask_array_1x1deg_global.nc    

Mask the data like above:

cdo -mul topo_1x1deg_global.nc DEU_adm0_mask_array_1x1deg_global.nc topo_1x1deg_global_mask.nc 

Select smaller subregion for Germany:

cdo -sellonlatbox,5.,16.,47.,55.5 topo_1x1deg_global_mask.nc topo_1x1deg_Germany_mask.nc


4. Visualize the masked data with NCL

The processed data can now be plotted with NCL e.g.:

shp_file = "$HOME/data/Shapefiles/DEU_adm/DEU_adm0.shp"
        
infile1  = "topo_0.1x0.1deg_Germany_mask.nc"
f1 = addfile(infile1,"r")
topo1 = f1->topo
        
infile2  = "topo_1x1deg_Germany_mask.nc"
f2 = addfile(infile2,"r")
topo2 = f2->topo
        
;-- open workstation
wks = gsn_open_wks("png","plot_topo_Germany")
        
;-- set map resources
map_res                       = True
map_res@gsnDraw               = False
map_res@gsnFrame              = False
        
map_res@mpProjection          ="Mercator"
map_res@mpLimitMode           ="Corners"
map_res@mpLeftCornerLatF      = 47
map_res@mpRightCornerLatF     = 55.5
map_res@mpLeftCornerLonF      = 5.
map_res@mpRightCornerLonF     = 16.
map_res@mpOutlineOn           = True
map_res@mpGridAndLimbOn       = True
map_res@mpGridLineColor       = "gray30"
map_res@mpGridLonSpacingF     = 1.
map_res@mpGridLatSpacingF     = 1.
map_res@mpGeophysicalLineColor = "black"
map_res@mpGeophysicalLineThicknessF = 1.0
map_res@mpOutlineBoundarySets = "national"
map_res@mpDataSetName         = "Earth..4"
map_res@pmTickMarkDisplayMode = "Always"
map_res@mpOutlineDrawOrder    = "Draw"
map_res@tiMainString          = "DEU_adm0.shp polyline"
        
;-- create the base map plot
map_shp = gsn_csm_map(wks, map_res)
        
;-- add the shapefile polylines
shp_res                    =  True
shp_res@gsLineThicknessF   =  2.0
shp_res@gsLineColor        = "red"
        
plid = unique_string("poly")
map_shp@$plid$ = gsn_add_shapefile_polylines(wks, map_shp, shp_file, shp_res)
        
;-- set data resources
res = map_res
res@gsnAddCyclic          = False
        
res@cnFillOn              = True
res@cnLinesOn             = False
res@cnFillPalette         = "OceanLakeLandSnow"
res@cnFillMode            = "RasterFill"
res@cnLevelSelectionMode  = "ManualLevels"
res@cnMinLevelValF        =  -30
res@cnMaxLevelValF        = 1400
res@cnLevelSpacingF       =   20
        
res@lbBoxLinesOn          = False
        
res@tiMainString = "0.1x0.1 degree grid"
plot1 = gsn_csm_contour_map(wks,topo1,res)
     
res@tiMainString = "1.0x1.0 degree grid"
plot2 = gsn_csm_contour_map(wks,topo2,res)
        
;-- draw the panel
pres             = True
pres@gsnMaximize = True
gsn_panel(wks,(/map_shp,plot1,plot2/),(/1,3/),pres)

 

Panel Shapefile 2

5. Select part of shapefile with NCL

You can also use just NCL to read a shapefile and select data for a region. The next example shows how to select and plot the data for the federal state Schleswig-Holstein of Germany using the shapefile DEU_adm1.shp.

Now, we have to take a closer look at the shapefile content with ncl_filedump

prompt>  ncl_filedump DEU_adm1.shp

Variable: f
Type: file
filename:    DEU_adm1
path:    DEU_adm1.shp
   file global attributes:
      layer_name : DEU_adm1
      geometry_type : polygon
      geom_segIndex : 0
      geom_numSegs : 1
      segs_xyzIndex : 0
      segs_numPnts : 1
   dimensions:
      geometry = 2
      segments = 2
      num_features = 16  // unlimited
      num_segments = 125
      num_points = 207860
   variables:
      integer geometry ( num_features, geometry )

      integer segments ( num_segments, segments )

      double x ( num_points )

      double y ( num_points )

      int64 ID_0 ( num_features )

      string ISO ( num_features )

      string NAME_0 ( num_features )

      int64 ID_1 ( num_features )

      string NAME_1 ( num_features )

      string HASC_1 ( num_features )

      int64 CCN_1 ( num_features )

      string CCA_1 ( num_features )

      string TYPE_1 ( num_features )

      string ENGTYPE_1 ( num_features )

      string NL_NAME_1 ( num_features )

      string VARNAME_1 ( num_features )

What you can see immediately in the dimensions part is that there are 16 features, 125 segments and a total of 207860 points. The 16 features are the federal states whose names are stored in the variable NAME_1.

prompt>  ncl_filedump DEU_adm1.shp -v NAME_1 

...
Variables:
 
Variable No. 0
     NAME_1:
 
Baden-Württemberg
Bayern
Berlin
Brandenburg
Bremen
Hamburg
Hessen
Mecklenburg-Vorpommern
Niedersachsen
Nordrhein-Westfalen
Rheinland-Pfalz
Saarland
Sachsen-Anhalt
Sachsen
Schleswig-Holstein
Thüringen

Knowing the corresponding index of a federal state you can read its polygons using standard NCL functions. With the function gc_inout NCL also offers the possibility to check if points of the data are within a polygon or not. If only these points are saved in a new array, the data can be drawn very easily afterwards.
The following example script creates a base map plot onto which the contour plot and the shapefile polylines for the federal state Schleswig-Holstein (index 14) are overlayed.

;-- open data file and read variable topo
f = addfile("topo_0.1x0.1deg_Germany_mask.nc", "r")
topo = f->topo

;-- open workstation
wks = gsn_open_wks("png", "plot_topo_compute_averages_Germany")

;-- set map resources
res = True
res@gsnDraw               = False
res@gsnFrame              = False
res@gsnLeftString         = ""
res@gsnRightString        = ""

res@mpProjection          = "Mercator"
res@mpLimitMode           = "Corners"
res@mpLeftCornerLatF      = 53.0
res@mpRightCornerLatF     = 55.5
res@mpLeftCornerLonF      =  8.0
res@mpRightCornerLonF     = 12.0
res@mpOutlineOn           = True
res@mpGridAndLimbOn       = True
res@mpGridLineColor       = "gray30"
res@mpGridLonSpacingF     = 1.
res@mpGridLatSpacingF     = 1.
res@mpGeophysicalLineColor = "black" 
res@mpGeophysicalLineThicknessF = 1.0
res@mpOutlineBoundarySets = "national"
res@mpDataSetName         = "Earth..4"
res@pmTickMarkDisplayMode = "Always"
res@mpOutlineDrawOrder    = "Draw"

res@tiMainString          = "Schleswig-Holstein (Germany)"
res@tiMainOffsetYF        = -0.005

;-- create the base plot
map = gsn_csm_map(wks, res)

;-- read shapefile content for selected state
shp_filename = "$HOME/data/Shapefiles/DEU_adm/DEU_adm1.shp"
shp_varname  = "NAME_1"

;-- read data from shapefile
shpf    = addfile(shp_filename,"r")
shp_var = shpf->$shp_varname$

lon          = shpf->x
lat          = shpf->y
segments     = shpf->segments
geometry     = shpf->geometry
dim_segments = dimsizes(segments)
num_segments = dim_segments(0)

;-- get global attributes  
geom_segIndex = shpf@geom_segIndex
geom_numSegs  = shpf@geom_numSegs
segs_xyzIndex = shpf@segs_xyzIndex
segs_numPnts  = shpf@segs_numPnts

;-- select state to be displayed
shp_state = "Schleswig-Holstein"
ind_state = ind(shp_state .eq. shp_var)     ;-- retrieve the index of shp_state

;-- topo lat/lon
lat1d = f->lat
lon1d = f->lon
nlat  = dimsizes(lat1d)
nlon  = dimsizes(lon1d)

;-- create 1D array for values inside shapefile polygon
topo_inside = topo
topo_inside = topo@_FillValue           ;-- set complete array to missing value

;-- set resources for the shapefile polygon
plres =  True
plres@gsLineColor      = "black"
plres@gsLineThicknessF = 2.0

;-- assign array for the polylines
plid = new(dim_segments(0),graphic)

;-- set start and end segment values for selected ind_state
startSegment = geometry(ind_state, geom_segIndex)
numSegments  = geometry(ind_state, geom_numSegs)

segNum = 0
k = 0
do seg=startSegment, startSegment+numSegments-1
   startPT = segments(seg, segs_xyzIndex)
   endPT   = startPT + segments(seg, segs_numPnts) - 1

   plid(segNum) = gsn_add_polyline(wks, map, lon(startPT:endPT),\
                                                    lat(startPT:endPT), plres)
   ;-- find the values inside the shapefile polygon
   do n=0,nlat-1
      do m=0,nlon-1
         if(lat1d(n) .lt. min(lat(startPT:endPT)) .or. lat1d(n) .gt. max(lat(startPT:endPT)).or.\
            lon1d(m) .lt. min(lon(startPT:endPT)) .or. lon1d(m) .gt. max(lon(startPT:endPT)))
            continue
         end if
         if(gc_inout(lat1d(n), lon1d(m), lat(startPT:endPT), lon(startPT:endPT))) then
            topo_inside(n,m) = topo(n,m)    ;-- values inside the shapefile polygon
         end if
      end do
   end do
   k = k + 1
   segNum = segNum + 1
end do

;-- get number of points and average value inside shapefile polygon
count   = dimsizes(ind(.not. ismissing(ndtooned(topo_inside))))
average = dim_avg_n_Wrap(dim_avg_n_Wrap(topo_inside,0),0)

print("")
print("--> values inside shapefile polygon = " + count)
print("")
print("--> average value = " + sprintf("%3.2f",average) + topo@units)
print("")

;-- create the contour plot
cnres = True
cnres@gsnDraw               = False
cnres@gsnFrame              = False
cnres@gsnLeftString         = ""
cnres@gsnRightString        = ""
cnres@gsnAddCyclic          = False

cnres@cnFillOn              = True
cnres@cnLinesOn             = False
cnres@cnFillPalette         = "OceanLakeLandSnow"
cnres@cnFillMode            = "RasterFill"
cnres@cnLevelSelectionMode  = "ManualLevels"
cnres@cnMinLevelValF        =  -30          
cnres@cnMaxLevelValF        = 1400          
cnres@cnLevelSpacingF       =   20          

cnres@lbTitleString         = "Topography [m]"
cnres@lbTitleFontHeightF    = 0.012
cnres@lbTitlePosition       = "Bottom"
cnres@lbTitleOffsetF        = -0.46
cnres@lbBoxSeparatorLinesOn = False
cnres@lbBoxMinorExtentF     = 0.15
cnres@pmLabelBarOrthogonalPosF = -0.1

cnplot = gsn_csm_contour(wks, topo_inside, cnres)

;-- overlay contour plot over base map plot
overlay(map,cnplot)

;-- draw the map plot and advance the frame
draw(map)
frame(wks)

 

 

 

Document Actions