Feature selection for SAE

Nairobi Workshop: Day 3 (afternoon)

Ann-Kristin Kreutzmann
Josh Merfeld

August 26, 2024

Feature selection

  • Let’s start with some example data I have
    • This comes from Malawi
      • Northern Malawi only (due to the size of the data)
    • And we’ll use it all day tomorrow!

Code
library(tidyverse)
surveycollapsed <- read_csv("day3data/ihs5ea.csv")
predictors <- read_csv("day3data/mosaikvars.csv")

A short explanation

  • The survey data are collapsed to the EA level; EAs sit within admin3 units (TAs)
    • The TA is the area, in SAE terminology, and the EA is the subarea
    • I have poverty rates for areas (TAs) and subareas (EAs)
    • I have some variables that predict poverty at the subarea level

  • So it’s a perfect setup for SAE!
    • We want to estimate poverty at the TA
    • We don’t have any observations in some TAs and we have too few in others
    • We could estimate a subarea model

Observations?

Predictive features

  • I also have a bunch of predictive features!
    • The data come from something called MOSAIKS, which we’ll discuss briefly tomorrow
    • In short, they are variables derived from satellite imagery
    • Take a look at this
Code
predictors
# A tibble: 2,911 × 501
    EA_CODE  mosaik1  mosaik2 mosaik3 mosaik4 mosaik5 mosaik6 mosaik7 mosaik8 mosaik9 mosaik10 mosaik11 mosaik12 mosaik13 mosaik14 mosaik15 mosaik16 mosaik17 mosaik18 mosaik19 mosaik20 mosaik21 mosaik22 mosaik23 mosaik24 mosaik25 mosaik26 mosaik27 mosaik28 mosaik29 mosaik30 mosaik31 mosaik32 mosaik33 mosaik34 mosaik35 mosaik36 mosaik37 mosaik38 mosaik39 mosaik40 mosaik41 mosaik42 mosaik43 mosaik44 mosaik45 mosaik46 mosaik47 mosaik48 mosaik49 mosaik50 mosaik51 mosaik52 mosaik53 mosaik54 mosaik55 mosaik56 mosaik57 mosaik58 mosaik59 mosaik60 mosaik61 mosaik62 mosaik63 mosaik64 mosaik65 mosaik66 mosaik67 mosaik68  mosaik69 mosaik70 mosaik71 mosaik72 mosaik73 mosaik74 mosaik75 mosaik76 mosaik77   mosaik78  mosaik79 mosaik80 mosaik81 mosaik82 mosaik83 mosaik84 mosaik85 mosaik86 mosaik87 mosaik88 mosaik89 mosaik90  mosaik91 mosaik92 mosaik93 mosaik94 mosaik95 mosaik96 mosaik97 mosaik98 mosaik99 mosaik100 mosaik101 mosaik102 mosaik103 mosaik104 mosaik105 mosaik106 mosaik107 mosaik108 mosaik109 mosaik110 mosaik111 mosaik112 mosaik113 mosaik114 mosaik115 mosaik116 mosaik117 mosaik118 mosaik119 mosaik120 mosaik121 mosaik122 mosaik123 mosaik124 mosaik125 mosaik126 mosaik127 mosaik128 mosaik129 mosaik130 mosaik131 mosaik132 mosaik133 mosaik134 mosaik135 mosaik136 mosaik137 mosaik138 mosaik139 mosaik140 mosaik141 mosaik142 mosaik143 mosaik144 mosaik145 mosaik146 mosaik147 mosaik148 mosaik149 mosaik150 mosaik151 mosaik152 mosaik153 mosaik154 mosaik155 mosaik156 mosaik157 mosaik158 mosaik159
      <dbl>    <dbl>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>     <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>      <dbl>     <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>     <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
 1 10101001 0.00143  0.00242    0.632  0.0334  0.0684   0.223 0.00641  0.0753   0.172    0.200    0.385  0.00697     1.51 0.00345    0.0488    0.528    0.422    0.387    0.463  0.00535  0.00298   0.0144  0.00735    0.303   0.0665 0.00144     0.454    0.574    0.473    0.303    0.626 0.000357    0.860   0.0220    0.466    0.986    0.107   0.0699  0.0123   0.00504    0.345  0.00537    0.864   0.0139    0.116    0.174    0.687   0.0660  0.00980  0.00810     1.04     1.12   0.0163    0.711  0.00678    0.307  0.00555   0.104    0.0250    0.336    0.195    0.176   0.0397    0.263    0.628    0.134    0.253    0.207 0.00197     0.0232 0.00396     0.282    0.394    0.267     1.17     1.16    0.440 0.00151    0.00101     0.0243   0.0950    0.234    0.509    0.480    0.430   0.0171    0.453   0.0880   0.0495  0.00819 0.000781     0.514 0.00300     0.104    0.123    0.153    0.247   0.0293   0.0297    0.0284   2.37e-3     0.138     0.229      1.26   5.25e-4  0.00173      0.152     0.555   0.00836     0.537     0.590  0.000996     0.687    0.0197    0.0785     0.159     0.287    0.0881    0.0126   0.00432    0.0578     0.332     0.225    0.0464     0.457     0.134     0.103    0.0278      1.02     0.755     0.126   0.00411     0.120      1.19   0.00353   0.00527   0.00409     0.370     0.476   0.00553    0.107      0.122    0.0222     0.211    0.0671     0.301     0.217  0.00144    0.00740 0.0000114     0.302    0.0437     0.517   0.00582      1.44   0.00179     0.391    0.0913    0.0400
 2 10101002 0.000659 0.000250   0.911  0.0468  0.0978   0.357 0.00359  0.127    0.249    0.291    0.623  0.00565     1.83 0.000853   0.0899    0.694    0.557    0.460    0.701  0.00399  0.00220   0.0246  0.0149     0.427   0.0798 0.00130     0.624    0.762    0.812    0.445    0.829 0.000695    1.11    0.0229    0.539    1.32     0.181   0.134   0.00454  0.00256    0.427  0.00688    1.31    0.0172    0.175    0.260    0.822   0.0900  0.00998  0.00416     1.44     1.54   0.0318    0.897  0.00881    0.485  0.00396   0.142    0.0275    0.521    0.264    0.191   0.0732    0.402    1.00     0.181    0.377    0.319 0.000135    0.0181 0.000658    0.401    0.544    0.336     1.46     1.54    0.624 0.00000453 0.0000542   0.0320   0.0984    0.336    0.683    0.741    0.477   0.0171    0.534   0.114    0.0355  0.00422 0.0000841    0.647 0.000803    0.161    0.211    0.193    0.335   0.0284   0.0251    0.0296   1.87e-5     0.206     0.304      1.66   0        0.000200     0.293     0.944   0.00968     0.721     0.767  0.000885     0.984    0.0205    0.137      0.194     0.423    0.110     0.0225   0.00195    0.0702     0.531     0.337    0.0701     0.700     0.184     0.154    0.0338      1.47     0.962     0.170   0.00457     0.192      1.68   0.00229   0.00582   0.00192     0.583     0.771   0.00253    0.0746     0.206    0.0303     0.328    0.0834     0.505     0.286  0.000418   0.00297 0.0000256     0.322    0.0525     0.736   0.00592      2.23   0.00219     0.489    0.105     0.0626
 3 10101003 0.000657 0.000403   0.811  0.0373  0.0794   0.326 0.00285  0.100    0.196    0.249    0.532  0.00426     1.77 0.000646   0.0672    0.619    0.487    0.404    0.647  0.00370  0.00153   0.0180  0.0105     0.383   0.0653 0.00119     0.565    0.640    0.791    0.381    0.710 0.000422    0.983   0.0172    0.471    1.19     0.165   0.132   0.00575  0.00251    0.375  0.00513    1.22    0.0138    0.138    0.215    0.709   0.0698  0.00829  0.00392     1.33     1.41   0.0240    0.804  0.00612    0.446  0.00380   0.113    0.0217    0.427    0.226    0.168   0.0564    0.337    0.952    0.154    0.317    0.286 0.000149    0.0171 0.000714    0.331    0.468    0.286     1.27     1.44    0.570 0.00000290 0.0000282   0.0277   0.0844    0.276    0.584    0.683    0.422   0.0139    0.470   0.0949   0.0349  0.00319 0.0000650    0.564 0.000685    0.133    0.185    0.146    0.288   0.0240   0.0230    0.0253   1.16e-5     0.176     0.252      1.53   8.55e-7  0.000195     0.281     0.907   0.00834     0.633     0.688  0.000778     0.869    0.0182    0.103      0.167     0.372    0.0879    0.0237   0.00149    0.0570     0.489     0.274    0.0510     0.605     0.154     0.140    0.0279      1.36     0.845     0.140   0.00252     0.152      1.57   0.00182   0.00442   0.00142     0.492     0.718   0.00298    0.0781     0.185    0.0229     0.266    0.0658     0.448     0.240  0.000293   0.00225 0.0000808     0.289    0.0425     0.676   0.00375      2.11   0.00141     0.416    0.0868    0.0457
 4 10101004 0.00102  0.000769   0.975  0.0578  0.111    0.369 0.00584  0.140    0.264    0.316    0.666  0.00852     1.92 0.00116    0.104     0.717    0.632    0.511    0.715  0.00637  0.00298   0.0292  0.0180     0.433   0.0999 0.00250     0.636    0.796    0.842    0.467    0.921 0.000642    1.16    0.0264    0.572    1.37     0.207   0.156   0.00505  0.00286    0.452  0.00910    1.34    0.0227    0.174    0.268    0.864   0.0989  0.0148   0.00463     1.47     1.55   0.0408    0.952  0.0111     0.494  0.00487   0.151    0.0325    0.540    0.290    0.201   0.0788    0.421    1.03     0.199    0.422    0.351 0.000135    0.0227 0.000991    0.448    0.605    0.351     1.55     1.58    0.627 0.00000306 0.0000670   0.0388   0.127     0.370    0.720    0.753    0.535   0.0191    0.550   0.137    0.0466  0.00517 0.0000883    0.678 0.00105     0.168    0.238    0.203    0.399   0.0357   0.0293    0.0388   1.91e-5     0.229     0.326      1.70   0        0.000447     0.324     0.969   0.0120      0.749     0.786  0.000880     1.03     0.0244    0.137      0.229     0.448    0.119     0.0302   0.00189    0.0827     0.540     0.359    0.0812     0.755     0.204     0.172    0.0356      1.47     1.02      0.201   0.00369     0.206      1.69   0.00288   0.00794   0.00246     0.643     0.782   0.00491    0.0963     0.214    0.0342     0.350    0.105      0.530     0.312  0.000583   0.00402 0.0000494     0.350    0.0642     0.744   0.00592      2.26   0.00252     0.521    0.118     0.0636
 5 10101005 0.000472 0.000351   0.815  0.0344  0.0668   0.381 0.00287  0.0947   0.172    0.257    0.526  0.00484     1.75 0.000780   0.0639    0.646    0.452    0.356    0.748  0.00305  0.00151   0.0152  0.0102     0.439   0.0592 0.000853    0.623    0.611    0.955    0.402    0.676 0.000364    0.963   0.0151    0.432    1.24     0.174   0.151   0.00327  0.00212    0.342  0.00464    1.37    0.0142    0.124    0.223    0.643   0.0621  0.00651  0.00231     1.46     1.55   0.0244    0.838  0.00629    0.518  0.00285   0.0992   0.0184    0.436    0.221    0.137   0.0528    0.336    1.12     0.147    0.285    0.302 0.000139    0.0134 0.000582    0.310    0.413    0.290     1.20     1.59    0.654 0.00000658 0.0000207   0.0214   0.0769    0.268    0.554    0.787    0.401   0.0119    0.460   0.0840   0.0276  0.00328 0.0000430    0.529 0.000734    0.142    0.193    0.145    0.247   0.0214   0.0187    0.0221   2.70e-6     0.176     0.233      1.59   0        0.000217     0.319     1.10    0.00704     0.657     0.728  0.000512     0.901    0.0147    0.0935     0.150     0.404    0.0828    0.0254   0.00135    0.0493     0.582     0.276    0.0458     0.607     0.148     0.141    0.0227      1.58     0.817     0.130   0.00249     0.135      1.81   0.00191   0.00442   0.00149     0.464     0.864   0.00200    0.0591     0.219    0.0231     0.236    0.0602     0.493     0.224  0.000257   0.00219 0.0000663     0.264    0.0396     0.789   0.00377      2.45   0.00145     0.407    0.0753    0.0399
 6 10101006 0.00107  0.000835   0.861  0.0496  0.122    0.315 0.00536  0.137    0.255    0.281    0.644  0.00834     1.71 0.00190    0.0977    0.653    0.562    0.488    0.632  0.00734  0.00322   0.0286  0.0170     0.378   0.0945 0.00252     0.568    0.773    0.730    0.412    0.835 0.000971    1.11    0.0294    0.517    1.28     0.164   0.121   0.00779  0.00421    0.432  0.00764    1.23    0.0195    0.177    0.244    0.818   0.101   0.0150   0.00591     1.35     1.43   0.0336    0.835  0.00704    0.432  0.00510   0.139    0.0316    0.486    0.259    0.193   0.0804    0.386    0.929    0.189    0.399    0.290 0.000297    0.0247 0.00120     0.442    0.601    0.311     1.47     1.43    0.547 0.0000111  0.000193    0.0443   0.105     0.338    0.665    0.679    0.454   0.0181    0.489   0.134    0.0394  0.00707 0.000121     0.673 0.00161     0.149    0.194    0.188    0.365   0.0339   0.0396    0.0380   5.12e-5     0.188     0.302      1.56   0        0.000476     0.277     0.851   0.0119      0.693     0.719  0.000813     0.965    0.0236    0.138      0.211     0.395    0.109     0.0189   0.00308    0.0779     0.473     0.316    0.0731     0.694     0.175     0.145    0.0475      1.31     0.952     0.175   0.00526     0.206      1.52   0.00258   0.00839   0.00303     0.597     0.682   0.00553    0.0951     0.178    0.0340     0.341    0.0883     0.473     0.281  0.000498   0.00429 0.000127      0.299    0.0536     0.657   0.00800      2.03   0.00184     0.474    0.107     0.0697
 7 10101007 0.00132  0.000842   1.13   0.0549  0.0999   0.594 0.00649  0.154    0.235    0.389    0.789  0.00820     2.25 0.00165    0.120     0.841    0.631    0.461    1.09   0.00632  0.00334   0.0263  0.0229     0.627   0.0974 0.00178     0.846    0.789    1.50     0.546    0.908 0.00159     1.20    0.0244    0.535    1.63     0.331   0.334   0.00198  0.00216    0.451  0.0105     1.97    0.0265    0.181    0.319    0.781   0.0835  0.00995  0.00292     1.95     1.99   0.0533    1.03   0.0127     0.778  0.00415   0.139    0.0282    0.573    0.312    0.167   0.103     0.515    1.64     0.205    0.457    0.482 0.000345    0.0161 0.000928    0.416    0.583    0.385     1.56     2.09    0.913 0.0000619  0.000161    0.0308   0.0926    0.370    0.780    1.12     0.442   0.0146    0.570   0.117    0.0263  0.00645 0.000196     0.660 0.00232     0.223    0.338    0.147    0.371   0.0351   0.0259    0.0367   7.97e-5     0.254     0.318      2.09   1.66e-5  0.000782     0.627     1.68    0.0133      0.886     0.953  0.000818     1.22     0.0172    0.167      0.205     0.583    0.113     0.0707   0.00294    0.0587     0.875     0.367    0.0867     0.932     0.225     0.241    0.0338      2.15     1.06      0.189   0.00679     0.218      2.44   0.00551   0.00907   0.00440     0.740     1.27    0.00385    0.0526     0.361    0.0393     0.376    0.0826     0.780     0.300  0.000915   0.00529 0.000292      0.301    0.0697     1.09    0.00917      3.46   0.00270     0.527    0.0958    0.0674
 8 10101008 0.00202  0.00182    1.05   0.0796  0.166    0.415 0.00953  0.179    0.309    0.347    0.794  0.0116      1.94 0.00308    0.128     0.764    0.670    0.581    0.791  0.0124   0.00480   0.0408  0.0239     0.469   0.129  0.00414     0.683    0.860    0.953    0.505    0.978 0.00147     1.27    0.0390    0.591    1.47     0.231   0.191   0.00889  0.00398    0.510  0.0112     1.46    0.0301    0.211    0.295    0.934   0.117   0.0188   0.00782     1.58     1.67   0.0529    0.985  0.0114     0.550  0.00734   0.171    0.0407    0.585    0.319    0.221   0.114     0.465    1.15     0.231    0.510    0.382 0.000520    0.0320 0.00168     0.512    0.721    0.369     1.70     1.70    0.673 0.0000219  0.000229    0.0603   0.130     0.399    0.776    0.836    0.541   0.0222    0.559   0.175    0.0471  0.0104  0.000181     0.777 0.00268     0.187    0.274    0.234    0.458   0.0464   0.0510    0.0515   8.96e-5     0.240     0.358      1.81   1.03e-6  0.000819     0.394     1.09    0.0190      0.810     0.850  0.000938     1.14     0.0298    0.169      0.256     0.488    0.135     0.0331   0.00370    0.102      0.610     0.386    0.105      0.840     0.217     0.190    0.0633      1.56     1.10      0.219   0.00741     0.249      1.81   0.00407   0.0147    0.00479     0.753     0.859   0.00968    0.113      0.242    0.0474     0.405    0.111      0.600     0.333  0.00101    0.00673 0.000203      0.350    0.0759     0.796   0.0110       2.46   0.00272     0.558    0.127     0.0903
 9 10101009 0.000445 0.000417   0.834  0.0332  0.0663   0.375 0.00278  0.0950   0.168    0.263    0.522  0.00452     1.82 0.000686   0.0617    0.644    0.455    0.352    0.744  0.00365  0.00149   0.0161  0.00933    0.438   0.0572 0.000850    0.620    0.628    0.946    0.398    0.659 0.000380    0.967   0.0140    0.445    1.26     0.180   0.149   0.00243  0.00192    0.343  0.00432    1.38    0.0128    0.130    0.224    0.655   0.0584  0.00596  0.00224     1.47     1.54   0.0240    0.816  0.00682    0.515  0.00272   0.102    0.0191    0.423    0.228    0.135   0.0553    0.351    1.10     0.147    0.304    0.303 0.0000882   0.0121 0.000360    0.300    0.422    0.295     1.24     1.57    0.652 0.00000180 0.0000226   0.0206   0.0744    0.264    0.582    0.779    0.394   0.0120    0.470   0.0766   0.0264  0.00278 0.0000423    0.527 0.000545    0.143    0.193    0.125    0.252   0.0207   0.0180    0.0224   4.67e-6     0.170     0.241      1.61   0        0.000208     0.319     1.09    0.00798     0.674     0.729  0.000622     0.914    0.0139    0.0990     0.153     0.405    0.0845    0.0245   0.00123    0.0481     0.573     0.268    0.0489     0.624     0.149     0.141    0.0210      1.58     0.836     0.133   0.00220     0.135      1.80   0.00150   0.00396   0.00126     0.485     0.857   0.00282    0.0549     0.213    0.0206     0.243    0.0599     0.489     0.219  0.000313   0.00219 0.0000233     0.266    0.0414     0.786   0.00336      2.45   0.00152     0.411    0.0752    0.0413
10 10101010 0.000720 0.000438   0.794  0.0367  0.0849   0.328 0.00377  0.109    0.195    0.255    0.566  0.00556     1.63 0.00113    0.0703    0.631    0.476    0.408    0.655  0.00416  0.00176   0.0210  0.0116     0.394   0.0679 0.00144     0.575    0.667    0.787    0.382    0.695 0.000532    1.01    0.0194    0.466    1.22     0.153   0.120   0.00472  0.00274    0.383  0.00525    1.24    0.0141    0.152    0.222    0.703   0.0678  0.00868  0.00430     1.37     1.44   0.0259    0.786  0.00565    0.455  0.00398   0.113    0.0230    0.417    0.225    0.161   0.0658    0.343    0.960    0.155    0.340    0.267 0.000215    0.0168 0.000740    0.348    0.499    0.289     1.33     1.45    0.582 0.0000181  0.0000968   0.0305   0.0783    0.275    0.599    0.696    0.392   0.0141    0.456   0.100    0.0291  0.00394 0.000120     0.588 0.000878    0.140    0.175    0.134    0.305   0.0257   0.0247    0.0271   2.72e-5     0.160     0.259      1.55   0        0.000282     0.279     0.913   0.00985     0.658     0.687  0.000585     0.895    0.0177    0.113      0.174     0.379    0.0929    0.0164   0.00187    0.0581     0.495     0.269    0.0553     0.628     0.152     0.130    0.0318      1.40     0.873     0.143   0.00322     0.167      1.61   0.00156   0.00522   0.00185     0.511     0.731   0.00314    0.0716     0.186    0.0226     0.282    0.0656     0.451     0.240  0.000238   0.00304 0.0000841     0.258    0.0425     0.694   0.00438      2.17   0.00131     0.411    0.0832    0.0519
# ℹ 2,901 more rows
# ℹ 341 more variables: mosaik160 <dbl>, mosaik161 <dbl>, mosaik162 <dbl>, mosaik163 <dbl>, mosaik164 <dbl>, mosaik165 <dbl>, mosaik166 <dbl>, mosaik167 <dbl>, mosaik168 <dbl>, mosaik169 <dbl>, mosaik170 <dbl>, mosaik171 <dbl>, mosaik172 <dbl>, mosaik173 <dbl>, mosaik174 <dbl>, mosaik175 <dbl>, mosaik176 <dbl>, mosaik177 <dbl>, mosaik178 <dbl>, mosaik179 <dbl>, mosaik180 <dbl>, mosaik181 <dbl>, mosaik182 <dbl>, mosaik183 <dbl>, mosaik184 <dbl>, mosaik185 <dbl>, mosaik186 <dbl>, mosaik187 <dbl>, mosaik188 <dbl>, mosaik189 <dbl>, mosaik190 <dbl>, mosaik191 <dbl>, mosaik192 <dbl>, mosaik193 <dbl>, mosaik194 <dbl>, mosaik195 <dbl>, mosaik196 <dbl>, mosaik197 <dbl>, mosaik198 <dbl>, mosaik199 <dbl>, mosaik200 <dbl>, mosaik201 <dbl>, mosaik202 <dbl>, mosaik203 <dbl>, mosaik204 <dbl>, mosaik205 <dbl>, mosaik206 <dbl>, mosaik207 <dbl>, mosaik208 <dbl>, mosaik209 <dbl>, mosaik210 <dbl>, mosaik211 <dbl>, mosaik212 <dbl>, mosaik213 <dbl>, mosaik214 <dbl>, mosaik215 <dbl>, mosaik216 <dbl>, mosaik217 <dbl>, mosaik218 <dbl>, mosaik219 <dbl>, mosaik220 <dbl>, mosaik221 <dbl>, mosaik222 <dbl>, mosaik223 <dbl>, mosaik224 <dbl>, mosaik225 <dbl>, mosaik226 <dbl>, mosaik227 <dbl>, mosaik228 <dbl>, mosaik229 <dbl>, mosaik230 <dbl>, mosaik231 <dbl>, mosaik232 <dbl>, mosaik233 <dbl>, mosaik234 <dbl>, mosaik235 <dbl>, mosaik236 <dbl>, mosaik237 <dbl>, mosaik238 <dbl>, mosaik239 <dbl>, mosaik240 <dbl>, mosaik241 <dbl>, mosaik242 <dbl>, mosaik243 <dbl>, mosaik244 <dbl>, mosaik245 <dbl>,
#   mosaik246 <dbl>, mosaik247 <dbl>, mosaik248 <dbl>, mosaik249 <dbl>, mosaik250 <dbl>, mosaik251 <dbl>, mosaik252 <dbl>, mosaik253 <dbl>, mosaik254 <dbl>, mosaik255 <dbl>, mosaik256 <dbl>, mosaik257 <dbl>, mosaik258 <dbl>, mosaik259 <dbl>, …

We have a problem

Code
# this is how many subarea observations we have
nrow(surveycollapsed)
[1] 107
Code
# this is how many columns the predictor data have: EA_CODE plus 500 features
ncol(predictors)
[1] 501
  • What’s the problem?
  • With ordinary least squares, we cannot estimate a model with more predictors (500) than observations (107): the coefficients have no unique solution!
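
To see the problem in miniature, here is a small sketch on simulated data (not the Malawi data; the sizes `n` and `p` are arbitrary choices for illustration). With more predictors than observations, `lm()` cannot estimate every coefficient and reports the leftover ones as `NA`.

```r
set.seed(1)
n <- 20   # observations (made-up size)
p <- 30   # predictors (made-up size, deliberately larger than n)

x <- matrix(rnorm(n * p), nrow = n)
y <- rnorm(n)

fit <- lm(y ~ x)

# lm() can estimate at most n coefficients; the rest come back as NA
sum(is.na(coef(fit)))

# the coefficients it does estimate fit the sample essentially perfectly,
# which is a warning sign rather than good news
max(abs(resid(fit)))
```

The same thing would happen with 107 observations and 500 features, just on a larger scale.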

Another problem: overfitting

  • There’s another problem, too

  • If we have too many predictors, we can “overfit” the model

    • This means the model is too complex
    • It fits the data we have too well
    • This means it doesn’t generalize well to new data

  • So we need to select the best predictors

    • What does “best” mean here?
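
A rough illustration of overfitting, again on simulated data (the data-generating process, the sizes, and the helper names `make_data` and `mse` are all assumptions made for this sketch). Only one predictor truly matters; the other 39 are noise. The large model always fits the training data at least as well as the small one, but that advantage typically disappears on new data.

```r
set.seed(2024)
n <- 60   # observations per data set (made-up)
p <- 40   # predictors, of which only x1 is relevant

# hypothetical helper: simulate an outcome that depends on x1 only
make_data <- function(n, p) {
  x <- matrix(rnorm(n * p), nrow = n,
              dimnames = list(NULL, paste0("x", 1:p)))
  data.frame(y = 1 + 2 * x[, 1] + rnorm(n), x)
}

train <- make_data(n, p)
test  <- make_data(n, p)   # "new" data the models have never seen

small <- lm(y ~ x1, data = train)  # only the relevant predictor
big   <- lm(y ~ ., data = train)   # all 40 predictors

# hypothetical helper: mean-squared prediction error on a data set
mse <- function(fit, d) mean((d$y - predict(fit, newdata = d))^2)

round(c(small_train = mse(small, train), small_test = mse(small, test),
        big_train   = mse(big, train),   big_test   = mse(big, test)), 2)
```

Comparing the train and test columns for each model is one simple way to think about what "best" might mean: the model that predicts well on data it was not fit to.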

Generalizing out-of-sample

  • We want to know what best predicts OUT of sample

  • So we are going to set up our data to allow this:

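    • We will split the data into X parts (called “folds”)
    • We fit the model on X − 1 folds and check its predictions on the fold we left out
    • A common number for X is 10, but let’s do 5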

Cross validation

Cross validation - random folds

Code
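# assign each row to one of 5 folds at random
# (fold sizes will be unequal; call set.seed() first to make this reproducible)
surveycollapsed$fold <- sample(1:5, nrow(surveycollapsed), replace = TRUE)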
head(surveycollapsed)
# A tibble: 6 × 5
   EA_CODE   poor total_weights total_obs  fold
     <dbl>  <dbl>         <dbl>     <dbl> <int>
1 10101006 0.230          5690.        16     2
2 10101011 0.444          7614.        16     1
3 10101027 0.0947         9441.        16     5
4 10101033 0.376          7486.        16     4
5 10101039 0.600          9147.        16     4
6 10101054 0.497          5351.        16     3
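
For readers who want to see the mechanics, here is a minimal sketch of 5-fold cross validation done by hand on simulated data (the variables `x` and `y` and the simple `lm()` model are placeholders, not the workshop data or the final model). It also shows one alternative way to assign folds, shuffling a repeated 1 to 5 sequence, which keeps the fold sizes nearly equal.

```r
set.seed(10)
n <- 107                        # same number of rows as the survey data
d <- data.frame(x = rnorm(n))
d$y <- 0.5 * d$x + rnorm(n)     # simulated outcome, not the Malawi data

# shuffle a repeated 1..5 sequence so fold sizes differ by at most one
d$fold <- sample(rep(1:5, length.out = n))
table(d$fold)

# for each fold: fit on the other four folds, predict the held-out fold
fold_mse <- sapply(1:5, function(k) {
  fit      <- lm(y ~ x, data = d[d$fold != k, ])
  held_out <- d[d$fold == k, ]
  mean((held_out$y - predict(fit, newdata = held_out))^2)
})

mean(fold_mse)  # the cross-validated mean-squared error
```

Every observation is used for prediction exactly once, and never by a model that saw it during fitting.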

But what “models” are we going to fit?

  • What are the models we are going to fit?
    • We want a way to select the best predictors
    • This will reduce the number of predictors and prevent overfitting (we hope)

  • We are going to use a method called LASSO (or lasso)
    • It’s an acronym: Least Absolute Shrinkage and Selection Operator
    • No details, but it’s a way to select the best predictors
      • It “penalizes” the size of the coefficients, shrinking some of them to exactly zero
    • R package glmnet does this for us
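
A small sketch of why the penalty ends up selecting predictors. In the special case of standardized, uncorrelated predictors, the lasso estimate is the least-squares coefficient "soft-thresholded" by lambda: moved toward zero by lambda, and set to exactly zero if it is smaller than lambda. The function name `soft_threshold` and the coefficient values are made up for illustration; glmnet handles the general case for us.

```r
# hypothetical helper: shrink each coefficient toward zero by lambda,
# and set it to exactly zero if its absolute value is below lambda
soft_threshold <- function(b, lambda) sign(b) * pmax(abs(b) - lambda, 0)

# made-up least-squares coefficients
ols <- c(big = 0.80, medium = -0.30, small = 0.04, tiny = -0.01)

soft_threshold(ols, lambda = 0.05)
```

The two largest coefficients survive (shrunk by 0.05), while the two smallest drop out entirely. A larger lambda removes more predictors.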

The setup - with a transformed outcome

Code
library(glmnet)
set.seed(398465) # this is a random process, so we want to set the seed!

# we need to set up the data (combining the predictors and the outcome)
data <- surveycollapsed |>
  left_join(predictors, by = "EA_CODE")

# cv.glmnet will set up everything for us
lasso <- cv.glmnet(
  y = asin(sqrt(data$poor)), # the outcome
  x = data |> dplyr::select(starts_with("mosaik")) |> as.matrix(), # the predictors (as.matrix() is required)
  weights = data$total_weights, # the weights (sample weights)
  nfolds = 5) # number of folds (10 is the default)
lasso

Call:  cv.glmnet(x = as.matrix(dplyr::select(data, starts_with("mosaik"))),      y = asin(sqrt(data$poor)), weights = data$total_weights,      nfolds = 5) 

Measure: Mean-Squared Error 

     Lambda Index Measure       SE Nonzero
min 0.02030    26 0.04227 0.006409       6
1se 0.06493     1 0.04418 0.005811       0
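
Two details of this setup may be worth a sketch. First, the call above uses `nfolds = 5`, so `cv.glmnet()` draws its own folds rather than using the `fold` column created earlier; passing fold labels through the `foldid` argument is one way to control the splits. Second, because the outcome is `asin(sqrt(poor))`, predictions come out on that transformed scale and need to be converted back with `sin()^2`. The example below uses simulated data (the features, weights, and outcome are all made up), and clamping predictions to [0, pi/2] before back-transforming is a choice made here, not something taken from the slides.

```r
library(glmnet)

set.seed(123)
n <- 100   # made-up number of subareas
p <- 50    # made-up number of features

x <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("feat", 1:p)))
# simulated "poverty rate" in (0, 1), driven by the first two features only
poor <- plogis(-0.5 + 0.8 * x[, 1] - 0.6 * x[, 2] + rnorm(n, sd = 0.3))
w <- runif(n, 5000, 10000)   # stand-in for the sample weights

# fold labels created once, so the same splits can be reused across models
foldid <- sample(rep(1:5, length.out = n))

cv_fit <- cv.glmnet(x = x, y = asin(sqrt(poor)), weights = w, foldid = foldid)

# predictions are on the arcsine square-root scale ...
z_hat <- predict(cv_fit, newx = x, s = "lambda.min")

# ... so clamp to [0, pi/2] and square the sine to get back to a rate
poor_hat <- sin(pmin(pmax(z_hat, 0), pi / 2))^2
range(poor_hat)
```

When `foldid` is supplied, `nfolds` is not needed. The back-transformed values always lie between 0 and 1, which is one reason this transformation is used for rates.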

What have we done?

  • What are the different “models”?
    • Different values of lambda
    • In this case, the “best” lambda (the one with the lowest cross-validated MSE) is 0.02030
    • Note that some people prefer to use the 1se value (it is more conservative). No details today.
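
For anyone curious about the 1se value: it is the largest lambda whose cross-validated error is within one standard error of the minimum. In the output above, the threshold is 0.04227 + 0.006409 = 0.04868, and the very first lambda (index 1, no predictors) has an error of 0.04418, which is below that threshold. That is why the 1se row shows zero non-zero coefficients. The sketch below reproduces the rule on a made-up cross-validation curve (the five lambdas and their errors are invented for illustration).

```r
# made-up cross-validation results for five candidate lambdas
lambda <- c(0.08, 0.04, 0.02, 0.01, 0.005)
cvm    <- c(0.050, 0.044, 0.042, 0.043, 0.046)  # mean CV error per lambda
cvsd   <- c(0.006, 0.006, 0.005, 0.005, 0.006)  # standard error of that mean

# lambda with the smallest cross-validated error
i_min      <- which.min(cvm)
lambda_min <- lambda[i_min]

# largest lambda whose error is within one standard error of the minimum
threshold  <- cvm[i_min] + cvsd[i_min]
lambda_1se <- max(lambda[cvm <= threshold])

c(lambda_min = lambda_min, lambda_1se = lambda_1se)
```

A larger lambda means a heavier penalty and fewer predictors, which is the sense in which the 1se choice is more conservative.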

Different values of lambda: different predictors!

  • At the “optimal” lambda, we have 6 predictors (non-zero coefficients)
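
With hundreds of candidate features, printing every coefficient gives a very long list that is mostly dots. One possible way to pull out only the selected predictors is sketched below on simulated data (the feature names, sizes, and true coefficients are assumptions for the example, not the Malawi results).

```r
library(glmnet)

set.seed(42)
n <- 100   # made-up number of observations
p <- 50    # made-up number of features

x <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("feat", 1:p)))
# simulated outcome: only feat1 and feat2 truly matter
y <- 1 + 2 * x[, 1] - 1.5 * x[, 2] + rnorm(n)

cv_fit <- cv.glmnet(x, y, nfolds = 5)

# coef() returns a sparse one-column matrix; convert it to a dense one
b <- as.matrix(coef(cv_fit, s = "lambda.min"))

# names of the predictors with non-zero coefficients (dropping the intercept)
selected <- setdiff(rownames(b)[b[, 1] != 0], "(Intercept)")
selected

# their coefficients
b[selected, , drop = FALSE]
```

The vector `selected` could then be used to build the formula for a later model.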

Choosing based on mean-squared error (MSE)

Non-zero coefficients

Code
coef(lasso, s = "lambda.min")
501 x 1 sparse Matrix of class "dgCMatrix"
                       s1
(Intercept)   0.568658061
mosaik1       .          
mosaik2       .          
mosaik3       .          
mosaik4       .          
mosaik5       .          
mosaik6       .          
mosaik7       .          
mosaik8       .          
mosaik9       .          
mosaik10      .          
mosaik11      .          
mosaik12      .          
mosaik13      .          
mosaik14      .          
mosaik15      .          
mosaik16      .          
mosaik17      .          
mosaik18      .          
mosaik19      .          
mosaik20      .          
mosaik21      .          
mosaik22      .          
mosaik23      .          
mosaik24      .          
mosaik25      .          
mosaik26      .          
mosaik27      .          
mosaik28      .          
mosaik29      .          
mosaik30      .          
mosaik31      .          
mosaik32      .          
mosaik33      .          
mosaik34      .          
mosaik35      .          
mosaik36      .          
mosaik37      .          
mosaik38      .          
mosaik39     -0.003056014
mosaik40      .          
mosaik41      .          
mosaik42      .          
mosaik43      .          
mosaik44      .          
mosaik45      .          
mosaik46      .          
mosaik47      .          
mosaik48      .          
mosaik49      .          
mosaik50      .          
mosaik51      .          
mosaik52      .          
mosaik53      .          
mosaik54      .          
mosaik55      .          
mosaik56      .          
mosaik57      .          
mosaik58      .          
mosaik59      .          
mosaik60      .          
mosaik61      .          
mosaik62      .          
mosaik63      .          
mosaik64      .          
mosaik65      .          
mosaik66      .          
mosaik67      .          
mosaik68      .          
mosaik69      .          
mosaik70      .          
mosaik71      .          
mosaik72      .          
mosaik73      .          
mosaik74      .          
mosaik75      .          
mosaik76      .          
mosaik77      .          
mosaik78      .          
mosaik79      .          
mosaik80      .          
mosaik81      .          
mosaik82      .          
mosaik83      .          
mosaik84      .          
mosaik85      .          
mosaik86      .          
mosaik87      .          
mosaik88      .          
mosaik89      .          
mosaik90      .          
mosaik91      .          
mosaik92      .          
mosaik93      .          
mosaik94      .          
mosaik95      .          
mosaik96      .          
mosaik97      .          
mosaik98      .          
mosaik99      .          
mosaik100     .          
mosaik101     .          
mosaik102     .          
mosaik103     .          
mosaik104     .          
mosaik105     .          
mosaik106     .          
mosaik107     .          
mosaik108     .          
mosaik109     .          
mosaik110     .          
mosaik111     .          
mosaik112     .          
mosaik113     .          
mosaik114     .          
mosaik115     .          
mosaik116     .          
mosaik117     .          
mosaik118     .          
mosaik119     .          
mosaik120     .          
mosaik121     .          
mosaik122     .          
mosaik123     .          
mosaik124     .          
mosaik125     .          
mosaik126     .          
mosaik127     .          
mosaik128     .          
mosaik129     .          
mosaik130     .          
mosaik131     .          
mosaik132     .          
mosaik133     .          
mosaik134     .          
mosaik135     .          
mosaik136     .          
mosaik137     .          
mosaik138     .          
mosaik139     .          
mosaik140     .          
mosaik141     .          
mosaik142     .          
mosaik143     .          
mosaik144     .          
mosaik145     .          
mosaik146     .          
mosaik147     .          
mosaik148     .          
mosaik149     .          
mosaik150     .          
mosaik151     .          
mosaik152     .          
mosaik153     .          
mosaik154     .          
mosaik155     .          
mosaik156     .          
mosaik157     .          
mosaik158     .          
mosaik159     .          
mosaik160     .          
mosaik161     .          
mosaik162     .          
mosaik163     .          
mosaik164     .          
mosaik165     .          
mosaik166     .          
mosaik167     .          
mosaik168     .          
mosaik169     .          
mosaik170     .          
mosaik171     .          
mosaik172     .          
mosaik173     .          
mosaik174     .          
mosaik175     .          
mosaik176     .          
mosaik177     .          
mosaik178     .          
mosaik179     .          
mosaik180     .          
mosaik181     .          
mosaik182     .          
mosaik183     .          
mosaik184     .          
mosaik185     .          
mosaik186     .          
mosaik187     .          
mosaik188     .          
mosaik189     .          
mosaik190     .          
mosaik191     .          
mosaik192     .          
mosaik193     .          
mosaik194     .          
mosaik195     .          
mosaik196     .          
mosaik197     .          
mosaik198     .          
mosaik199     .          
mosaik200     .          
mosaik201     .          
mosaik202     .          
mosaik203     .          
mosaik204     .          
mosaik205     .          
mosaik206     .          
mosaik207     .          
mosaik208     .          
mosaik209     .          
mosaik210     .          
mosaik211     .          
mosaik212     .          
mosaik213     .          
mosaik214     .          
mosaik215     .          
mosaik216     .          
mosaik217     .          
mosaik218     .          
mosaik219     .          
mosaik220     .          
mosaik221     .          
mosaik222     .          
mosaik223     .          
mosaik224     .          
mosaik225     .          
mosaik226     .          
mosaik227     .          
mosaik228     .          
mosaik229     .          
mosaik230     .          
mosaik231     .          
mosaik232     .          
mosaik233     .          
mosaik234     0.015558794
mosaik235     .          
mosaik236     .          
mosaik237     .          
mosaik238     .          
mosaik239     .          
mosaik240     .          
mosaik241     .          
mosaik242     .          
mosaik243     .          
mosaik244     .          
mosaik245     .          
mosaik246     .          
mosaik247     .          
mosaik248     .          
mosaik249     .          
mosaik250     .          
mosaik251     .          
mosaik252     .          
mosaik253     .          
mosaik254     .          
mosaik255     .          
mosaik256     .          
mosaik257     .          
mosaik258     .          
mosaik259     .          
mosaik260     .          
mosaik261     .          
mosaik262     .          
mosaik263     .          
mosaik264     .          
mosaik265     .          
mosaik266     .          
mosaik267     .          
mosaik268     .          
mosaik269     .          
mosaik270     .          
mosaik271     .          
mosaik272     .          
mosaik273     .          
mosaik274     .          
mosaik275     .          
mosaik276     .          
mosaik277     0.044130256
mosaik278     .          
mosaik279     .          
mosaik280    -8.148026851
mosaik281     .          
mosaik282     .          
mosaik283     .          
mosaik284     .          
mosaik285     .          
mosaik286     .          
mosaik287     .          
mosaik288     .          
mosaik289     .          
mosaik290     .          
mosaik291     .          
mosaik292     .          
mosaik293     .          
mosaik294     .          
mosaik295     .          
mosaik296     .          
mosaik297     .          
mosaik298     .          
mosaik299     .          
mosaik300     .          
mosaik301     .          
mosaik302     .          
mosaik303     .          
mosaik304     .          
mosaik305     .          
mosaik306     .          
mosaik307     .          
mosaik308     .          
mosaik309     .          
mosaik310     .          
mosaik311     .          
mosaik312     .          
mosaik313     .          
mosaik314     .          
mosaik315     .          
mosaik316     .          
mosaik317     .          
mosaik318     .          
mosaik319     .          
mosaik320     .          
mosaik321     .          
mosaik322     .          
mosaik323     .          
mosaik324     .          
mosaik325     .          
mosaik326     .          
mosaik327     .          
mosaik328     .          
mosaik329     .          
mosaik330     .          
mosaik331     .          
mosaik332     .          
mosaik333     .          
mosaik334     .          
mosaik335     .          
mosaik336     .          
mosaik337     .          
mosaik338     .          
mosaik339     .          
mosaik340     .          
mosaik341     .          
mosaik342     .          
mosaik343     .          
mosaik344     .          
mosaik345     .          
mosaik346     .          
mosaik347     .          
mosaik348     .          
mosaik349     .          
mosaik350     .          
mosaik351     .          
mosaik352     .          
mosaik353     .          
mosaik354     .          
mosaik355     .          
mosaik356     .          
mosaik357     .          
mosaik358     .          
mosaik359     .          
mosaik360     .          
mosaik361     .          
mosaik362     .          
mosaik363     .          
mosaik364     .          
mosaik365     .          
mosaik366     .          
mosaik367     .          
mosaik368     .          
mosaik369     .          
mosaik370     .          
mosaik371     .          
mosaik372     .          
mosaik373     .          
mosaik374     .          
mosaik375     .          
mosaik376     .          
mosaik377     .          
mosaik378     .          
mosaik379     .          
mosaik380     .          
mosaik381     .          
mosaik382     .          
mosaik383     .          
mosaik384     .          
mosaik385     .          
mosaik386     .          
mosaik387     .          
mosaik388     .          
mosaik389     .          
mosaik390     .          
mosaik391     .          
mosaik392     .          
mosaik393     .          
mosaik394     .          
mosaik395     .          
mosaik396   -35.261808063
mosaik397     .          
mosaik398     .          
mosaik399     .          
mosaik400     .          
mosaik401     .          
mosaik402     .          
mosaik403     .          
mosaik404     .          
mosaik405     .          
mosaik406     .          
mosaik407     .          
mosaik408     .          
mosaik409     .          
mosaik410     .          
mosaik411     .          
mosaik412     .          
mosaik413     .          
mosaik414     .          
mosaik415     .          
mosaik416     .          
mosaik417     .          
mosaik418     .          
mosaik419     .          
mosaik420     .          
mosaik421     .          
mosaik422     .          
mosaik423     .          
mosaik424     .          
mosaik425     .          
mosaik426     .          
mosaik427     .          
mosaik428     .          
mosaik429     .          
mosaik430     .          
mosaik431     .          
mosaik432     .          
mosaik433     .          
mosaik434     .          
mosaik435     .          
mosaik436     .          
mosaik437     .          
mosaik438     .          
mosaik439     .          
mosaik440     .          
mosaik441     .          
mosaik442     .          
mosaik443     .          
mosaik444     .          
mosaik445     .          
mosaik446     .          
mosaik447     .          
mosaik448     .          
mosaik449     .          
mosaik450     .          
mosaik451     .          
mosaik452     .          
mosaik453     .          
mosaik454     .          
mosaik455     .          
mosaik456     .          
mosaik457     .          
mosaik458     .          
mosaik459    55.635488670
mosaik460     .          
mosaik461     .          
mosaik462     .          
mosaik463     .          
mosaik464     .          
mosaik465     .          
mosaik466     .          
mosaik467     .          
mosaik468     .          
mosaik469     .          
mosaik470     .          
mosaik471     .          
mosaik472     .          
mosaik473     .          
mosaik474     .          
mosaik475     .          
mosaik476     .          
mosaik477     .          
mosaik478     .          
mosaik479     .          
mosaik480     .          
mosaik481     .          
mosaik482     .          
mosaik483     .          
mosaik484     .          
mosaik485     .          
mosaik486     .          
mosaik487     .          
mosaik488     .          
mosaik489     .          
mosaik490     .          
mosaik491     .          
mosaik492     .          
mosaik493     .          
mosaik494     .          
mosaik495     .          
mosaik496     .          
mosaik497     .          
mosaik498     .          
mosaik499     .          
mosaik500     .          

What we want: the non-zero variable names!

  • Getting the names of the variables is more complicated than it should be
Code
# first, turn the coefs into a data.frame
coefs <- coef(lasso, s = "lambda.min") |>
  as.matrix() |>
  as.data.frame()
coefs
                       s1
(Intercept)   0.568658061
mosaik1       0.000000000
mosaik2       0.000000000
mosaik3       0.000000000
mosaik4       0.000000000
mosaik5       0.000000000
mosaik6       0.000000000
mosaik7       0.000000000
mosaik8       0.000000000
mosaik9       0.000000000
mosaik10      0.000000000
mosaik11      0.000000000
mosaik12      0.000000000
mosaik13      0.000000000
mosaik14      0.000000000
mosaik15      0.000000000
mosaik16      0.000000000
mosaik17      0.000000000
mosaik18      0.000000000
mosaik19      0.000000000
mosaik20      0.000000000
mosaik21      0.000000000
mosaik22      0.000000000
mosaik23      0.000000000
mosaik24      0.000000000
mosaik25      0.000000000
mosaik26      0.000000000
mosaik27      0.000000000
mosaik28      0.000000000
mosaik29      0.000000000
mosaik30      0.000000000
mosaik31      0.000000000
mosaik32      0.000000000
mosaik33      0.000000000
mosaik34      0.000000000
mosaik35      0.000000000
mosaik36      0.000000000
mosaik37      0.000000000
mosaik38      0.000000000
mosaik39     -0.003056014
mosaik40      0.000000000
mosaik41      0.000000000
mosaik42      0.000000000
mosaik43      0.000000000
mosaik44      0.000000000
mosaik45      0.000000000
mosaik46      0.000000000
mosaik47      0.000000000
mosaik48      0.000000000
mosaik49      0.000000000
mosaik50      0.000000000
mosaik51      0.000000000
mosaik52      0.000000000
mosaik53      0.000000000
mosaik54      0.000000000
mosaik55      0.000000000
mosaik56      0.000000000
mosaik57      0.000000000
mosaik58      0.000000000
mosaik59      0.000000000
mosaik60      0.000000000
mosaik61      0.000000000
mosaik62      0.000000000
mosaik63      0.000000000
mosaik64      0.000000000
mosaik65      0.000000000
mosaik66      0.000000000
mosaik67      0.000000000
mosaik68      0.000000000
mosaik69      0.000000000
mosaik70      0.000000000
mosaik71      0.000000000
mosaik72      0.000000000
mosaik73      0.000000000
mosaik74      0.000000000
mosaik75      0.000000000
mosaik76      0.000000000
mosaik77      0.000000000
mosaik78      0.000000000
mosaik79      0.000000000
mosaik80      0.000000000
mosaik81      0.000000000
mosaik82      0.000000000
mosaik83      0.000000000
mosaik84      0.000000000
mosaik85      0.000000000
mosaik86      0.000000000
mosaik87      0.000000000
mosaik88      0.000000000
mosaik89      0.000000000
mosaik90      0.000000000
mosaik91      0.000000000
mosaik92      0.000000000
mosaik93      0.000000000
mosaik94      0.000000000
mosaik95      0.000000000
mosaik96      0.000000000
mosaik97      0.000000000
mosaik98      0.000000000
mosaik99      0.000000000
mosaik100     0.000000000
mosaik101     0.000000000
mosaik102     0.000000000
mosaik103     0.000000000
mosaik104     0.000000000
mosaik105     0.000000000
mosaik106     0.000000000
mosaik107     0.000000000
mosaik108     0.000000000
mosaik109     0.000000000
mosaik110     0.000000000
mosaik111     0.000000000
mosaik112     0.000000000
mosaik113     0.000000000
mosaik114     0.000000000
mosaik115     0.000000000
mosaik116     0.000000000
mosaik117     0.000000000
mosaik118     0.000000000
mosaik119     0.000000000
mosaik120     0.000000000
mosaik121     0.000000000
mosaik122     0.000000000
mosaik123     0.000000000
mosaik124     0.000000000
mosaik125     0.000000000
mosaik126     0.000000000
mosaik127     0.000000000
mosaik128     0.000000000
mosaik129     0.000000000
mosaik130     0.000000000
mosaik131     0.000000000
mosaik132     0.000000000
mosaik133     0.000000000
mosaik134     0.000000000
mosaik135     0.000000000
mosaik136     0.000000000
mosaik137     0.000000000
mosaik138     0.000000000
mosaik139     0.000000000
mosaik140     0.000000000
mosaik141     0.000000000
mosaik142     0.000000000
mosaik143     0.000000000
mosaik144     0.000000000
mosaik145     0.000000000
mosaik146     0.000000000
mosaik147     0.000000000
mosaik148     0.000000000
mosaik149     0.000000000
mosaik150     0.000000000
mosaik151     0.000000000
mosaik152     0.000000000
mosaik153     0.000000000
mosaik154     0.000000000
mosaik155     0.000000000
mosaik156     0.000000000
mosaik157     0.000000000
mosaik158     0.000000000
mosaik159     0.000000000
mosaik160     0.000000000
mosaik161     0.000000000
mosaik162     0.000000000
mosaik163     0.000000000
mosaik164     0.000000000
mosaik165     0.000000000
mosaik166     0.000000000
mosaik167     0.000000000
mosaik168     0.000000000
mosaik169     0.000000000
mosaik170     0.000000000
mosaik171     0.000000000
mosaik172     0.000000000
mosaik173     0.000000000
mosaik174     0.000000000
mosaik175     0.000000000
mosaik176     0.000000000
mosaik177     0.000000000
mosaik178     0.000000000
mosaik179     0.000000000
mosaik180     0.000000000
mosaik181     0.000000000
mosaik182     0.000000000
mosaik183     0.000000000
mosaik184     0.000000000
mosaik185     0.000000000
mosaik186     0.000000000
mosaik187     0.000000000
mosaik188     0.000000000
mosaik189     0.000000000
mosaik190     0.000000000
mosaik191     0.000000000
mosaik192     0.000000000
mosaik193     0.000000000
mosaik194     0.000000000
mosaik195     0.000000000
mosaik196     0.000000000
mosaik197     0.000000000
mosaik198     0.000000000
mosaik199     0.000000000
mosaik200     0.000000000
mosaik201     0.000000000
mosaik202     0.000000000
mosaik203     0.000000000
mosaik204     0.000000000
mosaik205     0.000000000
mosaik206     0.000000000
mosaik207     0.000000000
mosaik208     0.000000000
mosaik209     0.000000000
mosaik210     0.000000000
mosaik211     0.000000000
mosaik212     0.000000000
mosaik213     0.000000000
mosaik214     0.000000000
mosaik215     0.000000000
mosaik216     0.000000000
mosaik217     0.000000000
mosaik218     0.000000000
mosaik219     0.000000000
mosaik220     0.000000000
mosaik221     0.000000000
mosaik222     0.000000000
mosaik223     0.000000000
mosaik224     0.000000000
mosaik225     0.000000000
mosaik226     0.000000000
mosaik227     0.000000000
mosaik228     0.000000000
mosaik229     0.000000000
mosaik230     0.000000000
mosaik231     0.000000000
mosaik232     0.000000000
mosaik233     0.000000000
mosaik234     0.015558794
mosaik235     0.000000000
mosaik236     0.000000000
mosaik237     0.000000000
mosaik238     0.000000000
mosaik239     0.000000000
mosaik240     0.000000000
mosaik241     0.000000000
mosaik242     0.000000000
mosaik243     0.000000000
mosaik244     0.000000000
mosaik245     0.000000000
mosaik246     0.000000000
mosaik247     0.000000000
mosaik248     0.000000000
mosaik249     0.000000000
mosaik250     0.000000000
mosaik251     0.000000000
mosaik252     0.000000000
mosaik253     0.000000000
mosaik254     0.000000000
mosaik255     0.000000000
mosaik256     0.000000000
mosaik257     0.000000000
mosaik258     0.000000000
mosaik259     0.000000000
mosaik260     0.000000000
mosaik261     0.000000000
mosaik262     0.000000000
mosaik263     0.000000000
mosaik264     0.000000000
mosaik265     0.000000000
mosaik266     0.000000000
mosaik267     0.000000000
mosaik268     0.000000000
mosaik269     0.000000000
mosaik270     0.000000000
mosaik271     0.000000000
mosaik272     0.000000000
mosaik273     0.000000000
mosaik274     0.000000000
mosaik275     0.000000000
mosaik276     0.000000000
mosaik277     0.044130256
mosaik278     0.000000000
mosaik279     0.000000000
mosaik280    -8.148026851
mosaik281     0.000000000
mosaik282     0.000000000
mosaik283     0.000000000
mosaik284     0.000000000
mosaik285     0.000000000
mosaik286     0.000000000
mosaik287     0.000000000
mosaik288     0.000000000
mosaik289     0.000000000
mosaik290     0.000000000
mosaik291     0.000000000
mosaik292     0.000000000
mosaik293     0.000000000
mosaik294     0.000000000
mosaik295     0.000000000
mosaik296     0.000000000
mosaik297     0.000000000
mosaik298     0.000000000
mosaik299     0.000000000
mosaik300     0.000000000
mosaik301     0.000000000
mosaik302     0.000000000
mosaik303     0.000000000
mosaik304     0.000000000
mosaik305     0.000000000
mosaik306     0.000000000
mosaik307     0.000000000
mosaik308     0.000000000
mosaik309     0.000000000
mosaik310     0.000000000
mosaik311     0.000000000
mosaik312     0.000000000
mosaik313     0.000000000
mosaik314     0.000000000
mosaik315     0.000000000
mosaik316     0.000000000
mosaik317     0.000000000
mosaik318     0.000000000
mosaik319     0.000000000
mosaik320     0.000000000
mosaik321     0.000000000
mosaik322     0.000000000
mosaik323     0.000000000
mosaik324     0.000000000
mosaik325     0.000000000
mosaik326     0.000000000
mosaik327     0.000000000
mosaik328     0.000000000
mosaik329     0.000000000
mosaik330     0.000000000
mosaik331     0.000000000
mosaik332     0.000000000
mosaik333     0.000000000
mosaik334     0.000000000
mosaik335     0.000000000
mosaik336     0.000000000
mosaik337     0.000000000
mosaik338     0.000000000
mosaik339     0.000000000
mosaik340     0.000000000
mosaik341     0.000000000
mosaik342     0.000000000
mosaik343     0.000000000
mosaik344     0.000000000
mosaik345     0.000000000
mosaik346     0.000000000
mosaik347     0.000000000
mosaik348     0.000000000
mosaik349     0.000000000
mosaik350     0.000000000
mosaik351     0.000000000
mosaik352     0.000000000
mosaik353     0.000000000
mosaik354     0.000000000
mosaik355     0.000000000
mosaik356     0.000000000
mosaik357     0.000000000
mosaik358     0.000000000
mosaik359     0.000000000
mosaik360     0.000000000
mosaik361     0.000000000
mosaik362     0.000000000
mosaik363     0.000000000
mosaik364     0.000000000
mosaik365     0.000000000
mosaik366     0.000000000
mosaik367     0.000000000
mosaik368     0.000000000
mosaik369     0.000000000
mosaik370     0.000000000
mosaik371     0.000000000
mosaik372     0.000000000
mosaik373     0.000000000
mosaik374     0.000000000
mosaik375     0.000000000
mosaik376     0.000000000
mosaik377     0.000000000
mosaik378     0.000000000
mosaik379     0.000000000
mosaik380     0.000000000
mosaik381     0.000000000
mosaik382     0.000000000
mosaik383     0.000000000
mosaik384     0.000000000
mosaik385     0.000000000
mosaik386     0.000000000
mosaik387     0.000000000
mosaik388     0.000000000
mosaik389     0.000000000
mosaik390     0.000000000
mosaik391     0.000000000
mosaik392     0.000000000
mosaik393     0.000000000
mosaik394     0.000000000
mosaik395     0.000000000
mosaik396   -35.261808063
mosaik397     0.000000000
mosaik398     0.000000000
mosaik399     0.000000000
mosaik400     0.000000000
mosaik401     0.000000000
mosaik402     0.000000000
mosaik403     0.000000000
mosaik404     0.000000000
mosaik405     0.000000000
mosaik406     0.000000000
mosaik407     0.000000000
mosaik408     0.000000000
mosaik409     0.000000000
mosaik410     0.000000000
mosaik411     0.000000000
mosaik412     0.000000000
mosaik413     0.000000000
mosaik414     0.000000000
mosaik415     0.000000000
mosaik416     0.000000000
mosaik417     0.000000000
mosaik418     0.000000000
mosaik419     0.000000000
mosaik420     0.000000000
mosaik421     0.000000000
mosaik422     0.000000000
mosaik423     0.000000000
mosaik424     0.000000000
mosaik425     0.000000000
mosaik426     0.000000000
mosaik427     0.000000000
mosaik428     0.000000000
mosaik429     0.000000000
mosaik430     0.000000000
mosaik431     0.000000000
mosaik432     0.000000000
mosaik433     0.000000000
mosaik434     0.000000000
mosaik435     0.000000000
mosaik436     0.000000000
mosaik437     0.000000000
mosaik438     0.000000000
mosaik439     0.000000000
mosaik440     0.000000000
mosaik441     0.000000000
mosaik442     0.000000000
mosaik443     0.000000000
mosaik444     0.000000000
mosaik445     0.000000000
mosaik446     0.000000000
mosaik447     0.000000000
mosaik448     0.000000000
mosaik449     0.000000000
mosaik450     0.000000000
mosaik451     0.000000000
mosaik452     0.000000000
mosaik453     0.000000000
mosaik454     0.000000000
mosaik455     0.000000000
mosaik456     0.000000000
mosaik457     0.000000000
mosaik458     0.000000000
mosaik459    55.635488670
mosaik460     0.000000000
mosaik461     0.000000000
mosaik462     0.000000000
mosaik463     0.000000000
mosaik464     0.000000000
mosaik465     0.000000000
mosaik466     0.000000000
mosaik467     0.000000000
mosaik468     0.000000000
mosaik469     0.000000000
mosaik470     0.000000000
mosaik471     0.000000000
mosaik472     0.000000000
mosaik473     0.000000000
mosaik474     0.000000000
mosaik475     0.000000000
mosaik476     0.000000000
mosaik477     0.000000000
mosaik478     0.000000000
mosaik479     0.000000000
mosaik480     0.000000000
mosaik481     0.000000000
mosaik482     0.000000000
mosaik483     0.000000000
mosaik484     0.000000000
mosaik485     0.000000000
mosaik486     0.000000000
mosaik487     0.000000000
mosaik488     0.000000000
mosaik489     0.000000000
mosaik490     0.000000000
mosaik491     0.000000000
mosaik492     0.000000000
mosaik493     0.000000000
mosaik494     0.000000000
mosaik495     0.000000000
mosaik496     0.000000000
mosaik497     0.000000000
mosaik498     0.000000000
mosaik499     0.000000000
mosaik500     0.000000000

Code
# now, create a variable that holds the row names
coefs$variable <- rownames(coefs)
head(coefs)
                   s1    variable
(Intercept) 0.5686581 (Intercept)
mosaik1     0.0000000     mosaik1
mosaik2     0.0000000     mosaik2
mosaik3     0.0000000     mosaik3
mosaik4     0.0000000     mosaik4
mosaik5     0.0000000     mosaik5
Code
# non-zero rows
coefs <- coefs[coefs$s1 != 0, ]
# finally, the names of the variables
coefs$variable
[1] "(Intercept)" "mosaik39"    "mosaik234"   "mosaik277"   "mosaik280"   "mosaik396"   "mosaik459"  
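
  • A more compact route to the same names, sketched on a toy matrix
    • The values and row names below are made up; the matrix stands in for as.matrix(coef(lasso, s = "lambda.min"))
    • The row names of a one-column matrix can be subset directly with a logical test

```r
# toy stand-in for the coefficient matrix; these numbers are illustrative only
toy_coefs <- matrix(
  c(0.57, 0, -0.003, 0, 55.6),
  ncol = 1,
  dimnames = list(c("(Intercept)", "mosaik1", "mosaik2", "mosaik3", "mosaik4"), "s1")
)
# keep the row names whose coefficient is not exactly zero
nonzero_names <- rownames(toy_coefs)[toy_coefs[, 1] != 0]
nonzero_names
```

  • This skips the data.frame step, at the cost of no longer having the coefficients and names side by side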

One more step: remove the Intercept!

  • We don’t want the intercept in our list of variable names
    • All of the packages we use add an intercept automatically
Code
allvariables <- coefs$variable[-1]
allvariables
[1] "mosaik39"  "mosaik234" "mosaik277" "mosaik280" "mosaik396" "mosaik459"
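
  • Dropping by position with [-1] assumes the intercept is always the first element
    • A small sketch that drops it by name instead, using a made-up vector (toy_names is hypothetical)

```r
# made-up vector of selected names, including the intercept
toy_names <- c("(Intercept)", "mosaik39", "mosaik234")
# setdiff() removes the intercept wherever it sits and keeps the remaining order
predictors_only <- setdiff(toy_names, "(Intercept)")
predictors_only
```

  • Note that setdiff() also removes duplicates, which is harmless here because variable names are unique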

How do we use this with ebp?

  • ebp() needs a formula
  • How do we turn this vector of names into a formula?
    • We need to add the outcome variable (poor) and combine the predictors with +
Code
ebpformula <- as.formula(paste("poor ~", paste(allvariables, collapse = " + ")))
ebpformula
poor ~ mosaik39 + mosaik234 + mosaik277 + mosaik280 + mosaik396 + 
    mosaik459
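
  • Base R also has reformulate(), which builds the same kind of formula without pasting strings
    • A minimal sketch with a shortened, made-up variable list (toy_vars is hypothetical)

```r
# shortened list of predictors, for illustration only
toy_vars <- c("mosaik39", "mosaik234")
# reformulate() joins the terms with + and puts the response on the left-hand side
toy_formula <- reformulate(toy_vars, response = "poor")
toy_formula
# all.vars() lists every variable the formula refers to, outcome first
all.vars(toy_formula)
```

  • Either approach works; paste() is easier to read, reformulate() avoids quoting mistakes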

Finally: estimating the model

Code
library(povmap) # I like to use povmap instead of emdi (personal preference)
# create the "area" (TA) identifier: the first five characters of the EA code
predictors$TA_CODE <- substr(predictors$EA_CODE, 1, 5)
data$TA_CODE <- substr(data$EA_CODE, 1, 5)
ebp <- ebp(fixed = ebpformula, # the formula
  pop_data = predictors, # the population data
  pop_domains = "TA_CODE", # the domain (area) name in the population data
  smp_data = data, # the sample data
  smp_domains = "TA_CODE", # the domain (area) name in the sample data
  transformation = "arcsin", # I'm going to use the arcsin transformation
  weights = "total_weights", # sample weights
  weights_type = "nlme", # weights type
  MSE = TRUE, # variance? yes please
  L = 0) # this is a new thing in povmap: "analytical" variance estimates. much faster!
Time difference of 0.54 secs
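
  • Some intuition for transformation = "arcsin"
    • The sketch below uses the textbook arcsine square-root transform for proportions, asin(sqrt(p))
    • That form is an assumption here; check the povmap documentation for the exact definition used internally
    • The rates in p are made up

```r
p <- c(0.05, 0.25, 0.50, 0.95)   # made-up poverty rates between 0 and 1
z <- asin(sqrt(p))               # transformed scale, where the model would be fit
p_back <- sin(z)^2               # back-transform to the original scale
all.equal(p, p_back)             # the round trip recovers the original rates
```

  • The transform stretches values near 0 and 1, which tends to stabilise the variance of proportions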
Code
head(ebp$ind)
  Domain      Mean Head_Count
1  10101 0.3669687  0.1446667
2  10102 0.3526612  0.1677491
3  10103 0.2829257  0.3078810
4  10104 0.3255265  0.2415296
5  10105 0.3564514  0.1773102
6  10106 0.3747787  0.1587885
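
  • How the TA codes split areas into in-sample and out-of-sample domains
    • A toy sketch: the EA codes below are made up and stored as character
    • It assumes, as in the estimation code, that the first five characters of an EA code identify the TA

```r
# made-up EA codes for the population and for the (smaller) sample
pop_ea <- c("10101001", "10101002", "10102001", "10103001")
smp_ea <- c("10101001", "10103001")
# area (TA) codes: first five characters of each EA code
pop_ta <- unique(substr(pop_ea, 1, 5))
smp_ta <- unique(substr(smp_ea, 1, 5))
# areas with at least one sampled EA, and areas with none
in_sample  <- intersect(pop_ta, smp_ta)
out_sample <- setdiff(pop_ta, smp_ta)
length(in_sample)
length(out_sample)
```

  • Out-of-sample areas get predictions from the model alone, since there is no survey data to draw on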

Some results

Code
plot(ebp)


Some results

Code
summary(ebp)
Empirical Best Prediction

Call:
 ebp(fixed = poor ~ mosaik39 + mosaik234 + mosaik277 + mosaik280 + 
    mosaik396 + mosaik459, pop_data = predictors, pop_domains = "TA_CODE", 
    smp_data = data, smp_domains = "TA_CODE", L = 0, transformation = "arcsin", 
    MSE = TRUE, weights = "total_weights", weights_type = "nlme")

Out-of-sample domains:  27 
In-sample domains:  49 
Out-of-sample subdomains:  0 
In-sample subdomains:  0 

Sample sizes:
Units in sample:  107 
Units in population:  2911 
                   Min. 1st Qu. Median      Mean 3rd Qu. Max.
Sample_domains        1     1.0      2  2.183673       3    7
Population_domains    1     5.5     18 38.302632      44  300

Explanatory measures for the mixed model:
 Marginal_R2 Conditional_R2 Marginal_Area_R2 Conditional_Area_R2
   0.2552368      0.4115571        0.4196524           0.6605383

Residual diagnostics for the mixed model:
                Skewness Kurtosis Shapiro_W    Shapiro_p
Error         -0.5081831 3.993441 0.9761079 5.063839e-02
Random_effect  0.1855229 6.703526 0.8687609 1.234483e-06

Estimated variance of random effects:
                 Variance
Error         0.028541022
Random_effect 0.007581942

ICC:  0.2098926 

Transformation:
 Transformation Shift_parameter
         arcsin               0
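
  • Where does the ICC come from?
    • A quick check, assuming the ICC is the share of total variance due to the area random effect
    • The two variances are copied from the summary output above

```r
var_error  <- 0.028541022   # "Error" variance from the summary
var_random <- 0.007581942   # "Random_effect" variance from the summary
# share of total variance attributable to the area-level random effect
icc <- var_random / (var_random + var_error)
round(icc, 7)
```

  • The result agrees with the ICC line in the summary up to rounding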

Some results

Code
estimators(ebp, "Mean", MSE = TRUE, CV = TRUE)
Indicator/s: Mean
   Domain      Mean    Mean_MSE   Mean_CV
1   10101 0.3669687 0.002629563 0.1397375
2   10102 0.3526612 0.004691803 0.1942282
3   10103 0.2829257 0.005993879 0.2736412
4   10104 0.3255265 0.013896106 0.3621264
5   10105 0.3564514 0.005792373 0.2135147
6   10106 0.3747787 0.005655064 0.2006522
7   10107 0.3518191 0.006464604 0.2285344
8   10108 0.3545512 0.010294990 0.2861765
9   10109 0.3971006 0.005946677 0.1941944
10  10110 0.2786403 0.029650480 0.6179764
11  10120 0.3480978 0.014268712 0.3431556
12  10201 0.3514189 0.003464771 0.1674990
13  10202 0.3344865 0.004468171 0.1998418
14  10203 0.4354745 0.002630998 0.1177870
15  10204 0.3489391 0.003343137 0.1657019
16  10205 0.3882951 0.005510909 0.1911832
17  10206 0.2853691 0.025078233 0.5549342
18  10220 0.1969776 0.003156887 0.2852415
19  10301 0.2713425 0.005196212 0.2656597
20  10302 0.2897256 0.003922275 0.2161634
21  10303 0.3130857 0.007064307 0.2684550
22  10304 0.2778572 0.005705761 0.2718537
23  10305 0.2522589 0.006000382 0.3070739
24  10306 0.2242030 0.002644896 0.2293839
25  10307 0.2762336 0.008079368 0.3253959
26  10308 0.2357340 0.004659298 0.2895596
27  10309 0.3058134 0.004927044 0.2295285
28  10310 0.2791359 0.008044734 0.3213218
29  10311 0.3226810 0.023161630 0.4716405
30  10312 0.1930645 0.003625567 0.3118785
31  10314 0.3159857 0.016804961 0.4102526
32  10320 0.2135503 0.005137835 0.3356525
33  10401 0.2920620 0.002988713 0.1871833
34  10402 0.2809861 0.009975523 0.3554537
35  10403 0.2862341 0.004755373 0.2409189
36  10404 0.3170657 0.005211576 0.2276854
37  10405 0.4033321 0.004833684 0.1723758
38  10406 0.2999634 0.018108395 0.4486128
39  10407 0.3129457 0.005513247 0.2372656
40  10408 0.3671935 0.005647509 0.2046602
41  10409 0.3303997 0.011236376 0.3208289
42  10410 0.2798613 0.006148608 0.2801854
43  10411 0.3233065 0.011905717 0.3374919
44  10412 0.3241912 0.014079233 0.3660061
45  10413 0.2911344 0.038900244 0.6774583
46  10420 0.1852033 0.003764123 0.3312709
47  10501 0.4279057 0.004548114 0.1576041
48  10502 0.3145523 0.005071991 0.2264104
49  10503 0.3572890 0.006396011 0.2238386
50  10504 0.2799923 0.008608295 0.3313695
51  10505 0.4006467 0.004651181 0.1702237
52  10506 0.3835653 0.006266404 0.2063811
53  10507 0.3457535 0.011422360 0.3091087
54  10508 0.4131363 0.004964878 0.1705536
55  10509 0.4611727 0.012175319 0.2392636
56  10510 0.4246205 0.008066957 0.2115212
57  10511 0.3551883 0.005861361 0.2155463
58  10512 0.3095397 0.038539002 0.6342111
59  10520 0.2261618 0.017305163 0.5816593
60  10601 0.2232001 0.008633771 0.4162995
61  10620 0.2148553 0.040063699 0.9316001
62  10701 0.1332353 0.015674280 0.9396679
63  10702 0.1960373 0.007782660 0.4500134
64  10703 0.1714235 0.018327112 0.7897264
65  10704 0.1271540 0.017895297 1.0520570
66  10705 0.2342148 0.009814370 0.4229771
67  10706 0.1794717 0.007639432 0.4870062
68  10707 0.1332409 0.004362838 0.4957319
69  10708 0.1786822 0.005321580 0.4082621
70  10709 0.1986310 0.015732667 0.6314722
71  10710 0.2180149 0.005152287 0.3292410
72  10711 0.1896444 0.010116085 0.5303545
73  10712 0.1853390 0.029183701 0.9217294
74  10713 0.2527483 0.008197760 0.3582279
75  10714 0.1688222 0.010264168 0.6001121
76  10715 0.1517842 0.023150823 1.0024362
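
  • How do Mean_MSE and Mean_CV relate?
    • The printed values are consistent with CV = sqrt(MSE) / Mean
    • A sketch on three rows copied from the table above; the 0.5 cut-off is an arbitrary choice for illustration, not a rule

```r
# three rows copied from the estimators() output
est <- data.frame(
  Domain   = c("10101", "10102", "10110"),
  Mean     = c(0.3669687, 0.3526612, 0.2786403),
  Mean_MSE = c(0.002629563, 0.004691803, 0.029650480)
)
# coefficient of variation: root MSE relative to the point estimate
est$cv_check <- sqrt(est$Mean_MSE) / est$Mean
# flag areas whose estimate looks too noisy (illustrative threshold)
est$high_cv <- est$cv_check > 0.5
est
```

  • Flagging high-CV areas like this is a simple way to see which estimates to treat with caution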