Geospatial data analysis in R

Raster data I

Josh Merfeld

KDI School

November 20, 2024

Rasters

  • We’ve discussed shapefiles
    • Now let’s talk about rasters!

  • Rasters are a different type of geospatial data
    • They are made up of a grid of cells
    • Each cell has a value

A raster with terra

Let’s zoom in!

Let’s zoom in!

Example raster grid - how much info do we need?

  • Here’s a grid.
    • How many points do we need?

Example raster grid - how much info do we need?

  • Need to know location of one grid cell…
    • And the size of each grid!

How much info do we need?

  • In other words, we do not need a point for every raster cell

  • We just need to know:

    • The location of one cell
    • The size of each cell
      • This is called the resolution of the raster

  • Example:

    • I know the first grid cell in bottom left is at (0, 0)
    • I know each grid cell is 1 meter by 1 meter (the resolution)
    • Then I know the exact location of every single grid cell

Population in Cotonou, Benin

  • What are the white values?

Population in Cotonou, Benin

  • Here’s the information for this raster
    • What’s the resolution? What are the units?
class       : SpatRaster 
dimensions  : 34, 46, 1  (nrow, ncol, nlyr)
resolution  : 0.0008333333, 0.0008333333  (x, y)
extent      : 2.39375, 2.432083, 6.342917, 6.37125  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : beninpop.tif 
name        : beninpop 

Rasters

  • Rasters are defined by the grid layout and the resolution
    • Grid cells are sometimes called pixels (just like images, which are often rasters!)

  • There are many different file types for rasters
    • .tif or .tiff (one of the most common)
    • .nc (NetCDF, common for very large raster data)
    • Image files, e.g. png, jpg, etc.

Reading rasters in R

  • Reading rasters is also quite easy!
    • Going to use the terra package for it
      • Note: can use terra for shapefiles, too
    • week6data/beninpop.tif is a raster of population counts in Benin
Code
library(terra)

# this is the raster for Cotonou, Benin
cotonou <- rast("week6data/beninpop.tif")
cotonou
class       : SpatRaster 
dimensions  : 34, 46, 1  (nrow, ncol, nlyr)
resolution  : 0.0008333333, 0.0008333333  (x, y)
extent      : 2.39375, 2.432083, 6.342917, 6.37125  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : beninpop.tif 
name        : beninpop 

Plotting rasters

  • Plotting rasters only with terra is a bit of a pain
    • Can’t use ggplot
    • So, I load another package that lets me use ggplot with rasters
      • tidyterra
Code
library(tidyterra)

ggplot() +
  geom_spatraster(data = cotonou) + 
  theme_bw()

Making it nicer

Code
library(tidyterra)

ggplot() +
  geom_spatraster(data = cotonou) + 
  # distiller is for continuous values
  # but we can use palettes!
  # I like spectral a lot
  scale_fill_distiller("Population\ncount", 
    palette = "Spectral", na.value = "white") +
  theme_bw() +
  labs(subtitle = "Population in Cotonou, Benin")

Extracting raster data to shapefiles

  • Let’s go back to our use case:
    • We want to estimate a sub-area model at the EA level in Malawi
    • This means we need to extract raster data to the EA level
    • We can do this with terra, sf, and exactextractr
      • terra has its own method, but i find exactextractr to be MUCH faster

  • Let’s start by looking at the raster I’ve uploaded to the week6data: mwpop.tif

Give it a try

  • Try to load it into R using terra, then plot it with tidyterra and ggplot
Code
tif <- rast("week6data/mwpop.tif")

ggplot() +
  geom_spatraster(data = tif) + 
  scale_fill_distiller("Population\ncount", 
    palette = "Spectral", na.value = "white") +
  theme_bw() +
  labs(subtitle = "Population in Northern Malawi")

Give it a try

  • I actually don’t like that map! It’s too hard to see because of all the low values.
  • So let’s take logs, instead!
    • Note that all the zeros become missing (can’t log zero)
Code
tif <- rast("week6data/mwpop.tif")

ggplot() +
  geom_spatraster(data = log(tif)) + 
  scale_fill_distiller("Population\ncount (log)", 
    palette = "Spectral", na.value = "white") +
  theme_bw() +
  labs(subtitle = "Population in Northern Malawi")

We want to extract the .tif values to the .shp

Let’s do it with exactextractr

Code
library(exactextractr)

tif <- rast("week6data/mwpop.tif")
adm4 <- read_sf("week6data/mw4.shp")
# make sure they are in the same CRS! (they already are, but just in case)
# st_transform is for the sf object
adm4 <- st_transform(adm4, crs = crs(tif))

# extract the raster values to the shapefile
# we are going to SUM, and add the EA_CODE from the shapefile to the result
extracted <- exact_extract(tif, adm4, fun = "sum", append_cols = "EA_CODE")
Code
head(extracted)
   EA_CODE       sum
1 10507801 1068.7787
2 10507072  695.8013
3 10507010  938.1139
4 10507001  749.6960
5 10507009  597.5432
6 10507033  474.3934

Now we can join the extracted data to the shapefile

Code
# join
adm4 <- adm4 |>
  left_join(extracted, by = "EA_CODE")

# plot it!
ggplot() +
  geom_sf(data = adm4, aes(fill = sum), 
    color = "black", lwd = 0.01) +
  scale_fill_distiller("Population\ncount", 
    palette = "Spectral", na.value = "white") +
  theme_bw() +
  labs(subtitle = "Population in EAs") + 
  theme(plot.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb")) + 
  theme(legend.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb"))

We can also do it with terra

Code
tif <- rast("week6data/mwpop.tif")
adm4 <- vect("week6data/mw4.shp")
# make sure they are in the same CRS! (they already are, but just in case)
# st_transform is for the sf object
adm4 <- terra::project(adm4, crs(tif))

# we are going to SUM
# just include EA_CODE
extracted <- terra::extract(tif, adm4, fun = "sum")
Code
head(extracted)
  ID     mwpop
1  1 1112.1717
2  2  694.0924
3  3  938.3791
4  4  760.5574
5  5  611.8594
6  6  481.1209

It’s in the same order!

Code
adm4$pop <- extracted$mwpop
Code
adm4
 class       : SpatVector 
 geometry    : polygons 
 dimensions  : 3212, 4  (geometries, attributes)
 extent      : 493678.3, 691462.6, 8591461, 8964535  (xmin, xmax, ymin, ymax)
 coord. ref. : WGS 84 / UTM zone 36S (EPSG:32736) 
 names       : DIST_CODE  EA_CODE TA_CODE   pop
 type        :     <chr>    <chr>   <chr> <num>
 values      :       105 10507801   10507  1112
                     105 10507072   10507 694.1
                     105 10507010   10507 938.4

Now it’s your turn

  • Here’s your task:

Now it’s your turn

  • Here’s your task:
    • Search for “worldpop population counts”
    • Scroll down the page, click on “unconstrained individual countries 2000-2020 UN adjusted (1km resolution)
    • Then, search for a country (maybe yours?)

Now it’s your turn

  • Here’s your task:
    • Search for “worldpop population counts”
    • Scroll down the page, click on “unconstrained individual countries 2000-2020 UN adjusted (1km resolution)
    • Then, search for a country (maybe yours?)
    • Click on “Data & Resources” for 2020
    • Scroll down to the bottom of the page and download the .tif

Now it’s your turn

  • Load the .tif into R using terra
  • Plot the raster using tidyterra and ggplot
    • Make it look nice!

Let’s keep going!

  • Now you need to find a shapefile for the same country
  • This will be a bit less straightforward
    • Search for “shapefile COUNTRY humdata”
    • You should find a link to the Humanitarian Data Exchange
    • Click on it and see if it has shapefiles for your country of choice
    • If so, download a shapefile (it can be at a higher admin level)
      • If not, raise your hand and I’ll come help you find a shapefile
    • Load it into R and plot it!

One last thing

  • You have the population tif and the shapefile
  • Extract the population data (using sum, don’t forget!) to the shapefile
    • Use append_cols and make sure you choose the correct identifier!
  • Join the data to the shapefile
  • Plot the shapefile with the population data
    • Make it look nice!

What can you do with that data?

  • Now you have a shapefile with population data
  • You can save it as a .csv and use it in your analysis!
    • write_csv(NAME, PATH)

Creating a grid

  • Yesterday, we used a grid in Korea
    • kgrid.shp
  • By now, you can probably see that a grid is very similar to a raster!

Load the shapefile

  • Let’s load kshape.shp
Code
kshape <- vect("week6data/kshape.shp")
kgrid <- rast(kshape, res = 5000)
kgrid <- as.polygons(kgrid)
kgrid$id <- 1:nrow(kgrid)

The grid

Not quite done

  • We aren’t quite done. What do we want to do now?
Code
intersection <- intersect(kshape, kgrid)
kgrid <- kgrid |>
  filter(id %in% intersection$id)

Not quite done

Your turn!

  • Find the population raster for Korea on Worldpop
  • Extract the TOTAL population in each grid cell
  • Plot it!

Let’s have a little fun with maps!

How do we do this?

Code
# load raster
mwrast <- rast("week6data/mwndvimonths.tif")
# vector
adm4 <- vect("week6data/mw4.shp")
# project
mwrast <- terra::project(mwrast, crs(adm4))

# extract
extract <- terra::extract(mwrast, adm4, fun = "mean")

head(extract)
  ID      Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
1  1 4756.460 5388.849 6359.126 6365.519 5548.002 4297.354 3669.479 3176.294
2  2 5786.706 6156.773 7432.599 7154.398 6260.265 5410.169 4888.113 4132.957
3  3 3281.118 5142.824 8524.425 8591.957 7722.501 7324.118 6962.072 6143.839
4  4 1947.943 6303.996 6614.029 6974.710 7034.415 5950.442 5389.052 4654.426
5  5 6109.966 6999.138 8066.552 7959.506 7331.672 6694.690 6084.893 5264.631
6  6 6412.467 7286.674 8147.131 7993.387 7208.807 6053.188 5305.224 4285.144
       Sep      Oct      Nov      Dec
1 2671.925 2363.872 2286.108 4435.463
2 3508.677 2977.613 3154.451 4953.721
3 5522.871 4650.212 5266.973 6615.277
4 4113.541 3870.270 4052.974 4412.086
5 4627.720 4210.714 4709.623 6253.621
6 3458.877 3382.628 3792.173 4327.365

How do we do this?

Code
# it's in the same order, so we can just do this
extract$EA_CODE <- adm4$EA_CODE
# remove the "ID" column
extract <- extract[,-1]
  • To do this, we need to “reshape” the data
    • Each row needs to be an EA/month observation
Code
# pivot longer
extract <- extract |> 
  pivot_longer(cols = -EA_CODE, names_to = "month", values_to = "ndvi")
head(extract)
# A tibble: 6 × 3
  EA_CODE  month  ndvi
  <chr>    <chr> <dbl>
1 10507801 Jan   4756.
2 10507801 Feb   5389.
3 10507801 Mar   6359.
4 10507801 Apr   6366.
5 10507801 May   5548.
6 10507801 Jun   4297.

How do we do this?

  • To keep things manageable (in terms of memory), let’s just keep months one through six
Code
# filter
extract <- extract |>
  filter(month %in% month.abb[1:6])
# match! We need an integer value below
extract$monthint <- match(extract$month, month.abb)
head(extract)
# A tibble: 6 × 4
  EA_CODE  month  ndvi monthint
  <chr>    <chr> <dbl>    <int>
1 10507801 Jan   4756.        1
2 10507801 Feb   5389.        2
3 10507801 Mar   6359.        3
4 10507801 Apr   6366.        4
5 10507801 May   5548.        5
6 10507801 Jun   4297.        6
Code
# then join to adm4
adm4 <- adm4 |>
  left_join(extract, by = "EA_CODE")
nrow(adm4)
[1] 19272

And the code

  • Note this can take some time!
Code
library(gganimate)
ggplot(adm4) +
  geom_spatvector(aes(fill = ndvi), color = NA) +
  scale_fill_distiller("NDVI", palette = "RdYlGn", 
    na.value = "white", direction = 1) +
  theme_bw() +
  theme(plot.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb")) + 
  theme(legend.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb")) +
  # Here comes the gganimate code
  transition_manual(
                    frames = monthint,
                    cumulative = FALSE
                    ) +
  labs(title = "Month: {(adm4 |> filter(monthint==current_frame))$month[1]}")
  • This will create a gif that shows the NDVI values for each EA in Malawi over the first six months of the year!

Group activity

  • Here’s your task:
    • Create a grid for Seoul (only!)
    • Pull five years of raster data from WorldPop
    • Instead of creating a gif, I want you to use facet_wrap (or facet_grid) to show the pop values for each year in a separate panel

Imagery

Satellite imagery

  • Satellite imagery is also saved as rasters
    • But they are usually much larger
    • And they have multiple bands
      • Each band is a different “color” (or wavelength)
      • For example, red, green, blue, near-infrared, etc.
    • This is how we get “true color” images
      • Red, green, blue bands are combined to make an image

Can load using terra

Code
image <- rast("week6data/imageryexample.tiff")

image
class       : SpatRaster 
dimensions  : 652, 775, 26  (nrow, ncol, nlyr)
resolution  : 8.983153e-05, 8.983153e-05  (x, y)
extent      : 34.25294, 34.32256, -11.69463, -11.63606  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : imageryexample.tiff 
names       :      B1,  B2,  B3,  B4,      B5,  B6, ... 
min values  :   4.505,   0,   0,   0,   0.085,   0, ... 
max values  : 255.000, 255, 255, 255, 255.000, 255, ... 
  • A bunch of bands! Let’s figure out what they are
  • Search for Harmonized Sentinel-2 MSI: MultiSpectral Instrument, Level-2A
    • Google Earth Engine

Easiest way to plot: plotRGB

Code
plotRGB(image)

Let’s set the names!

  • Oh no!

  • What’s going on?

  • Let’s look at the help documentation for plotRGB:
Code
plotRGB(x, r=1, g=2, b=3, a=NULL, scale=NULL, mar=0, 
        stretch=NULL, smooth=TRUE, colNA="white", alpha=NULL, bgalpha=NULL, 
        zlim=NULL, zcol=FALSE, axes=FALSE ,...)

Easiest way to plot: plotRGB

Code
plotRGB(image, r = 4, g = 3, b = 2)

With tidyterra

Code
ggplot() +
  geom_spatraster_rgb(data = image, r = 4, g = 3, b = 2) +
  theme_void()

Satellite imagery

  • Those options plot the red, green, and blue
    • RGB is the default order in the plotting functions
    • Sometimes a raster might be saved different, however
  • Satellite imagery is also used to create vegetation indices!
    • We will use the NIR for this

Vegetation indices

  • NDVI (Normalized Difference Vegetation Index):

\[ NDVI = \frac{NIR - RED}{NIR + RED} \]

  • EVI (Enhanced Vegetation Index):

\[ NDVI = G\times\frac{NIR - RED}{NIR + C_1\times RED - C_2\times BLUE+L} \] In MODIS: \(L=1\), \(C_1=6\), \(C_2=7.5\), \(G=2.5\)

Calculating NDVI

Calculating NDVI

Code
names(image)
 [1] "B1"                  "B2"                  "B3"                 
 [4] "B4"                  "B5"                  "B6"                 
 [7] "B7"                  "B8"                  "B8A"                
[10] "B9"                  "B11"                 "B12"                
[13] "AOT"                 "WVP"                 "SCL"                
[16] "TCI_R"               "TCI_G"               "TCI_B"              
[19] "MSK_CLDPRB"          "MSK_SNWPRB"          "QA10"               
[22] "QA20"                "QA60"                "MSK_CLASSI_OPAQUE"  
[25] "MSK_CLASSI_CIRRUS"   "MSK_CLASSI_SNOW_ICE" "ndvi"               
Code
image$ndvi <- (image$B8 - image$B4) / (image$B8 + image$B4)

g1 <- ggplot() +
  geom_spatraster_rgb(data = image, r = 4, g = 3, b = 2) +
  theme_void() + labs(subtitle = "A. RGB Image")
g2 <- ggplot() +
  geom_spatraster(data = image, aes(fill = ndvi), show.legend = FALSE) +
  scale_fill_distiller("NDVI", palette = "RdYlGn", na.value = "white", direction = 1) +
  theme_void() + labs(subtitle = "B. NDVI")

Calculating NDVI

Code
plot_grid(g1, g2) +
  theme(plot.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb")) + 
  theme(legend.background = element_rect(fill = "#f0f1eb", color = "#f0f1eb"))

To Google Earth Engine!

Code

Code
var dataset = ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
                  .filterDate('2023-04-01', '2023-04-30')
                  // Pre-filter to get less cloudy granules.
                  .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE',10));

// Get an image from the dataset
var image = dataset.mean().clip(XXXXX); // SPECIFY!

// Export the image as a GeoTIFF
Export.image.toDrive({
  image: image,
  description: 'NAME',
  scale: 10, // Define the scale in meters
  region: XXXXX,
  fileFormat: 'GeoTIFF',
  maxPixels: 1e13 // Adjust if needed depending on the size of data
});