Error estimation

This document mainly presents the functionality of surveysd::calc.stError(), which generates point estimates and standard errors for user-supplied estimation functions.

Prerequisites

In order to use a dataset with calc.stError(), several weight columns have to be present. Each weight column corresponds to a bootstrap sample. In the following examples, we will use the data from demo.eusilc() and attach the bootstrap weights using draw.bootstrap() and recalib(). Please refer to the documentation of those functions for more detail.

library(surveysd)

set.seed(1234)
eusilc <- demo.eusilc(prettyNames = TRUE)
dat_boot <- draw.bootstrap(eusilc, REP = 10, hid = "hid", weights = "pWeight",
                           strata = "region", period = "year")
dat_boot_calib <- recalib(dat_boot, conP.var = "gender", conH.var = "region",
                          epsP = 1e-2, epsH = 2.5e-2, verbose = FALSE)
dat_boot_calib[, onePerson := nrow(.SD) == 1, by = .(year, hid)]

## print part of the dataset
dat_boot_calib[1:5, .(year, povertyRisk, eqIncome, onePerson, pWeight, w1, w2, w3, w4, w5)]

year	povertyRisk	eqIncome	onePerson	pWeight	w1	w2	w3	w4	w5
2010	FALSE	16090.69	FALSE	504.5696	1015.0604336	0.4526272	1009.3031	0.4480523	1002.5438149
2010	FALSE	16090.69	FALSE	504.5696	1015.0604336	0.4526272	1009.3031	0.4480523	1002.5438149
2010	FALSE	16090.69	FALSE	504.5696	1015.0604336	0.4526272	1009.3031	0.4480523	1002.5438149
2010	FALSE	27076.24	FALSE	493.3824	0.4410974	0.4432779	986.6543	0.4370670	0.4361168
2010	FALSE	27076.24	FALSE	493.3824	0.4410974	0.4432779	986.6543	0.4370670	0.4361168

Estimator functions

The parameters fun and var in calc.stError() define the estimator to be used in the error analysis. There are two built-in estimator functions weightedSum() and weightedRatio() which can be used as follows.

povertyRate <- calc.stError(dat_boot_calib, var = "povertyRisk", fun = weightedRatio)
totalIncome <- calc.stError(dat_boot_calib, var = "eqIncome", fun = weightedSum)

These two calls estimate the share of persons at risk of poverty (in percent) and the total income. By default, the results are calculated separately for each reference period.

povertyRate$Estimates

year	n	N	estimate_type	val_povertyRisk	stE_povertyRisk
2010	14827	8182222	direct	14.44422	0.4477234
2011	14827	8182222	direct	14.77393	0.6018256
2012	14827	8182222	direct	15.04515	0.7126190
2013	14827	8182222	direct	14.89013	0.6015532
2014	14827	8182222	direct	15.14556	0.6637220
2015	14827	8182222	direct	15.53640	0.6062735
2016	14827	8182222	direct	15.08315	0.3041252
2017	14827	8182222	direct	15.42019	0.3551701

totalIncome$Estimates

year	n	N	estimate_type	val_eqIncome	stE_eqIncome
2010	14827	8182222	direct	162750998071	1167749615
2011	14827	8182222	direct	161926931417	958914287
2012	14827	8182222	direct	162576509628	1426809520
2013	14827	8182222	direct	163199507862	1281970889
2014	14827	8182222	direct	163986275009	1103382292
2015	14827	8182222	direct	163416275447	1316614887
2016	14827	8182222	direct	162706205137	973560276
2017	14827	8182222	direct	164314959107	1045829176

Columns that use the val_ prefix denote the point estimate belonging to the “main weight” of the dataset, which is pWeight in case of the dataset used here.

Columns with the stE_ prefix denote standard errors calculated from the bootstrap replicates. For each replicate, the estimator is applied with one of the weights w1, w2, …, w10 instead of pWeight.

n denotes the number of observations for the year and N denotes the total weight of those persons.
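To make the roles of val_, stE_, n and N more concrete, the following is a minimal sketch on made-up toy data. The helper toyWeightedRatio() is a hypothetical stand-in for a weighted ratio, and we assume here that the standard error is taken as the standard deviation of the replicate estimates; it is not meant to reproduce the internals of calc.stError().

```r
# Toy data: 6 persons, one main weight and 4 replicate weights
# (all values are made up for illustration)
toy <- data.frame(
  povertyRisk = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
  pWeight     = c(100, 120,  80,  90, 110, 100),
  w1          = c(200,   0,  85,  95, 105, 115),
  w2          = c(  0, 240,  75,  85, 115, 100),
  w3          = c(105, 115,   0, 180, 100, 100),
  w4          = c( 95, 125, 160,   0, 120,  95)
)

# Hypothetical stand-in for a weighted ratio in percent
toyWeightedRatio <- function(x, w) sum(w[x]) / sum(w) * 100

# Point estimate (val_): estimator applied with the main weight
val <- toyWeightedRatio(toy$povertyRisk, toy$pWeight)

# Replicate estimates: same estimator, one bootstrap weight at a time
repCols <- c("w1", "w2", "w3", "w4")
reps <- sapply(repCols, function(col) toyWeightedRatio(toy$povertyRisk, toy[[col]]))

# Standard error (stE_): assumed to be the standard deviation
# of the replicate estimates
stE <- sd(reps)

# n: number of observations, N: total of the main weight
n <- nrow(toy)
N <- sum(toy$pWeight)
```

With only four replicates the resulting standard error is very noisy; the sketch is only meant to show which weight column feeds into which output column.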

Custom estimators

In order to define a custom estimator function to be used in fun, the function needs to have at least two arguments like the example below.

## define custom estimator
myWeightedSum <- function(x, w) {
  sum(x*w)
}

## check that the results equal those obtained with `surveysd::weightedSum()`
totalIncome2 <- calc.stError(dat_boot_calib, var = "eqIncome", fun = myWeightedSum)
all.equal(totalIncome$Estimates, totalIncome2$Estimates)

## [1] TRUE

The parameters x and w can be assumed to be vectors of equal length, with w being a numeric weight vector and x being the column defined in the var argument. The function will be called once for each period (in this case year) and for each weight column (in this case pWeight, w1, w2, …, w10).
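The calling pattern described above can be sketched on a small made-up dataset. The loop below only illustrates "once per period and per weight column"; the object names toy, weightCols and calls are hypothetical and the code does not mirror how calc.stError() is implemented.

```r
# Toy data with two periods, a main weight and one replicate weight
toy <- data.frame(
  year     = c(2010, 2010, 2011, 2011),
  eqIncome = c(10, 20, 30, 40),
  pWeight  = c(1, 2, 1, 2),
  w1       = c(2, 1, 2, 1)
)

myWeightedSum <- function(x, w) sum(x * w)

# One call of the estimator per (period, weight column) pair
weightCols <- c("pWeight", "w1")
calls <- expand.grid(year = unique(toy$year), weight = weightCols,
                     stringsAsFactors = FALSE)
calls$value <- mapply(function(y, wc) {
  d <- toy[toy$year == y, ]
  myWeightedSum(d$eqIncome, d[[wc]])
}, calls$year, calls$weight)

calls
```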

Custom estimators can also take additional parameters. The parameter add.arg is a named list that maps each additional argument of the estimator to a column name of the dataset.

## use add.arg-argument
fun <- function(x, w, b) {
  sum(x*w*b)
}
add.arg = list(b="onePerson")

err.est <- calc.stError(dat_boot_calib, var = "povertyRisk", fun = fun,
                        period.mean = 0, add.arg=add.arg)
err.est$Estimates

year	n	N	estimate_type	val_povertyRisk	stE_povertyRisk
2010	14827	8182222	direct	273683.9	8793.773
2011	14827	8182222	direct	261883.6	11886.266
2012	14827	8182222	direct	243083.9	13888.956
2013	14827	8182222	direct	238004.4	13427.979
2014	14827	8182222	direct	218572.1	10288.531
2015	14827	8182222	direct	219984.1	12950.126
2016	14827	8182222	direct	201753.9	10499.734
2017	14827	8182222	direct	196881.2	11654.796

# compare with direct computation
compare.value <- dat_boot_calib[,fun(povertyRisk,pWeight,b=onePerson),
                                 by=c("year")]
all((compare.value$V1-err.est$Estimates$val_povertyRisk)==0)

## [1] TRUE

The above chunk computes the weighted number of persons at risk of poverty who live in single-person households.
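The mapping from argument names to column names in add.arg can be illustrated with a few lines of base R. This is a sketch on made-up data that shows the idea only; the lookup via lapply() and do.call() is our own illustration and not the package's internal code.

```r
# Toy data (made up): 4 persons
toy <- data.frame(
  povertyRisk = c(TRUE, FALSE, TRUE, FALSE),
  onePerson   = c(TRUE, TRUE, FALSE, FALSE),
  pWeight     = c(100, 200, 300, 400)
)

fun <- function(x, w, b) sum(x * w * b)

# Argument name of the estimator -> column name in the data
add.arg <- list(b = "onePerson")

# Look up each named column and pass it as an additional argument
extra <- lapply(add.arg, function(col) toy[[col]])
est <- do.call(fun, c(list(x = toy$povertyRisk, w = toy$pWeight), extra))
est
```

Only the first person is both at risk of poverty and living alone, so the result equals that person's weight.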

Adjust variable depending on bootstrap weights

In our example the variable povertyRisk is a boolean and is TRUE if the income is less than 60% of the weighted median income. Thus it directly depends on the original weight vector pWeight. To further reduce the estimated error one should calculate for each bootstrap replicate weight $w$ the weighted median income $medIncome_{w}$ and then define $povertyRisk_w$ as

\[ povertyRisk_w = \begin{cases} 1 & \text{if } Income < 0.6\cdot medIncome_{w}\\ 0 & \text{else} \end{cases} \]
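As a toy illustration of this definition, the sketch below derives the indicator once with a main weight and once with a made-up replicate weight. The helper toyWeightedMedian() is a hypothetical, simplified weighted median and may differ in details from laeken::weightedMedian().

```r
# Simplified weighted median: smallest x whose cumulative weight
# reaches half of the total weight (hypothetical helper)
toyWeightedMedian <- function(x, w) {
  ord <- order(x)
  cw <- cumsum(w[ord])
  x[ord][which(cw >= 0.5 * sum(w))[1]]
}

income  <- c(5000, 10000, 20000, 30000, 40000)
pWeight <- c(1, 1, 1, 1, 1)
w1      <- c(2, 2, 1, 0, 0)   # made-up replicate weight

# Main weight: median 20000, threshold 0.6 * 20000 = 12000
povertyRisk_p <- as.integer(income < 0.6 * toyWeightedMedian(income, pWeight))

# Replicate weight: median 10000, threshold 0.6 * 10000 = 6000
povertyRisk_w1 <- as.integer(income < 0.6 * toyWeightedMedian(income, w1))

rbind(povertyRisk_p, povertyRisk_w1)
```

The second person is at risk of poverty under the main weight but not under the replicate weight, which is exactly the dependence on the weight vector described above.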

The estimator can then be applied to the new variable $povertyRisk_w$. This can be realized using a custom estimator function.

# custom estimator to first derive poverty threshold 
# and then estimate a weighted ratio
povmd <- function(x, w) {
 md <- laeken::weightedMedian(x, w)*0.6
 pmd60 <- x < md
 # weighted ratio is directly estimated inside the function
 return(sum(w[pmd60])/sum(w)*100)
}

err.est <- calc.stError(
  dat_boot_calib, var = "povertyRisk", fun = weightedRatio,
  fun.adjust.var = povmd, adjust.var = "eqIncome")
err.est$Estimates

year	n	N	estimate_type	val_povertyRisk
2010	14827	8182222	direct	14.44422
2011	14827	8182222	direct	14.77393
2012	14827	8182222	direct	15.04515
2013	14827	8182222	direct	14.89013
2014	14827	8182222	direct	15.14556
2015	14827	8182222	direct	15.53640
2016	14827	8182222	direct	15.08315
2017	14827	8182222	direct	15.42019

Estimating the ratio directly inside a custom estimator, as povmd does, is only valid if no grouping variables are supplied (parameter group = NULL); otherwise the median income would be derived within each group rather than for the whole period. If grouping variables are supplied, one should use the parameters fun.adjust.var and adjust.var such that $povertyRisk_w$ is first calculated for each period and then used for each grouping in group.

# using fun.adjust.var and adjust.var to estimate povmd60 indicator
# for each period and bootstrap weight before applying the weightedRatio
povmd2 <- function(x, w) {
 md <- laeken::weightedMedian(x, w)*0.6
 pmd60 <- x < md
 return(as.integer(pmd60))
}

# set adjust.var="eqIncome" so the income vector is used to estimate
# the povmd60 indicator for each bootstrap weight
# and the resulting indicators are passed to function weightedRatio
group <- "gender"
err.est <- calc.stError(
  dat_boot_calib, var = "povertyRisk", fun = weightedRatio, group = group,
  fun.adjust.var = povmd2, adjust.var = "eqIncome")
err.est$Estimates

year	n	N	gender	estimate_type	val_povertyRisk	stE_povertyRisk
2010	7267	3979572	male	direct	12.02660	0.5016625
2010	7560	4202650	female	direct	16.73351	0.4531936
2010	14827	8182222	NA	direct	14.44422	0.4186797
2011	7267	3979572	male	direct	12.81921	0.6064670
2011	7560	4202650	female	direct	16.62488	0.5998059
2011	14827	8182222	NA	direct	14.77393	0.5638617
2012	7267	3979572	male	direct	13.76065	0.6341716
2012	7560	4202650	female	direct	16.26147	0.6145102
2012	14827	8182222	NA	direct	15.04515	0.5870018
2013	7267	3979572	male	direct	13.88962	0.3725628
2013	7560	4202650	female	direct	15.83754	0.5319610
2013	14827	8182222	NA	direct	14.89013	0.4148279
2014	7267	3979572	male	direct	14.50351	0.5813244
2014	7560	4202650	female	direct	15.75353	0.6411541
2014	14827	8182222	NA	direct	15.14556	0.5667295
2015	7267	3979572	male	direct	15.12289	0.5631947
2015	7560	4202650	female	direct	15.92796	0.5356809
2015	14827	8182222	NA	direct	15.53640	0.5110738
2016	7267	3979572	male	direct	14.57968	0.3264055
2016	7560	4202650	female	direct	15.55989	0.4000595
2016	14827	8182222	NA	direct	15.08315	0.2936738
2017	7267	3979572	male	direct	14.94816	0.3979681
2017	7560	4202650	female	direct	15.86717	0.5509920
2017	14827	8182222	NA	direct	15.42019	0.4315945

Multiple estimators

In case an estimator should be applied to several columns of the dataset, var can be set to a vector containing all necessary columns.

multipleRates <- calc.stError(dat_boot_calib, var = c("povertyRisk", "onePerson"), fun = weightedRatio)
multipleRates$Estimates

year	n	N	estimate_type	val_povertyRisk	stE_povertyRisk	val_onePerson	stE_onePerson
2010	14827	8182222	direct	14.44422	0.4477234	14.85737	0.3801450
2011	14827	8182222	direct	14.77393	0.6018256	14.85737	0.3699768
2012	14827	8182222	direct	15.04515	0.7126190	14.85737	0.4138585
2013	14827	8182222	direct	14.89013	0.6015532	14.85737	0.4599134
2014	14827	8182222	direct	15.14556	0.6637220	14.85737	0.2669035
2015	14827	8182222	direct	15.53640	0.6062735	14.85737	0.3215474
2016	14827	8182222	direct	15.08315	0.3041252	14.85737	0.2946678
2017	14827	8182222	direct	15.42019	0.3551701	14.85737	0.4340717

Here we see the share of persons at risk of poverty and the share of persons living in one-person households, both in percent.

Grouping

The group argument can be used to calculate estimators for different subsets of the data. This argument can take the grouping variable as a string that refers to a column name (usually a factor) in dat. If set, all estimators are split not only by the reference period but also by the grouping variable. For simplicity, only one reference period of the above data is used.

dat2 <- subset(dat_boot_calib, year == 2010)
for (att in c("period", "weights", "b.rep"))
  attr(dat2, att) <- attr(dat_boot_calib, att)

To calculate the ratio of persons at risk of poverty for each federal state of Austria, group = "region" can be used.

povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio, group = "region")
povertyRates$Estimates

year	n	N	region	estimate_type	val_povertyRisk	stE_povertyRisk
2010	549	260564	Burgenland	direct	19.53984	2.1973197
2010	733	377355	Vorarlberg	direct	16.53731	2.6155590
2010	924	535451	Salzburg	direct	13.78734	0.8944779
2010	1078	563648	Carinthia	direct	13.08627	1.4766568
2010	1317	701899	Tyrol	direct	15.30819	1.8190266
2010	2295	1167045	Styria	direct	14.37464	0.6647327
2010	2322	1598931	Vienna	direct	17.23468	1.7285320
2010	2804	1555709	Lower Austria	direct	13.84362	1.1352401
2010	2805	1421620	Upper Austria	direct	10.88977	0.6174542
2010	14827	8182222	NA	direct	14.44422	0.4477234

The last row with region = NA denotes the aggregate over all regions. Note that the columns N and n now show the weighted and unweighted number of persons in each region.
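The structure of this output (one row per group plus an aggregate row with NA) can be mimicked on made-up data with base R. In this sketch, toyWeightedRatio() is again a hypothetical stand-in for a weighted ratio and the column val plays the role of val_povertyRisk.

```r
# Toy data (made up): 5 persons in two regions
toy <- data.frame(
  region      = c("A", "A", "B", "B", "B"),
  povertyRisk = c(TRUE, FALSE, FALSE, TRUE, TRUE),
  pWeight     = c(100, 300, 200, 100, 100)
)

toyWeightedRatio <- function(x, w) sum(w[x]) / sum(w) * 100

# One row per region: unweighted count n, weighted count N, estimate
byRegion <- do.call(rbind, lapply(split(toy, toy$region), function(d) {
  data.frame(region = d$region[1], n = nrow(d), N = sum(d$pWeight),
             val = toyWeightedRatio(d$povertyRisk, d$pWeight))
}))

# Aggregate over all regions, marked with region = NA
total <- data.frame(region = NA_character_, n = nrow(toy), N = sum(toy$pWeight),
                    val = toyWeightedRatio(toy$povertyRisk, toy$pWeight))

res <- rbind(byRegion, total)
res
```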

Several grouping variables

In case more than one grouping variable is used, there are several options of calling calc.stError() depending on whether combinations of grouping levels should be regarded or not. We will consider the variables gender and region as our grouping variables and show three options on how calc.stError() can be called.

Option 1: All regions and all genders

Calculate the point estimate and standard error for each region and each gender. The number of rows in the output is therefore

\[n_\text{periods}\cdot(n_\text{regions} + n_\text{genders} + 1) = 1\cdot(9 + 2 + 1) = 12.\]

The last row is again the estimate for the whole period.

povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio, 
                             group = c("gender", "region"))
povertyRates$Estimates

year	n	N	gender	region	estimate_type	val_povertyRisk	stE_povertyRisk
2010	549	260564	NA	Burgenland	direct	19.53984	2.1973197
2010	733	377355	NA	Vorarlberg	direct	16.53731	2.6155590
2010	924	535451	NA	Salzburg	direct	13.78734	0.8944779
2010	1078	563648	NA	Carinthia	direct	13.08627	1.4766568
2010	1317	701899	NA	Tyrol	direct	15.30819	1.8190266
2010	2295	1167045	NA	Styria	direct	14.37464	0.6647327
2010	2322	1598931	NA	Vienna	direct	17.23468	1.7285320
2010	2804	1555709	NA	Lower Austria	direct	13.84362	1.1352401
2010	2805	1421620	NA	Upper Austria	direct	10.88977	0.6174542
2010	7267	3979572	male	NA	direct	12.02660	0.5083609
2010	7560	4202650	female	NA	direct	16.73351	0.4998858
2010	14827	8182222	NA	NA	direct	14.44422	0.4477234

Option 2: All combinations of region and gender

Split the data by all combinations of the two grouping variables. This results in a larger output table with

\[n_\text{periods}\cdot(n_\text{regions} \cdot n_\text{genders} + 1) = 1\cdot(9\cdot2 + 1)= 19.\]

povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio, 
                             group = list(c("gender", "region")))
povertyRates$Estimates

year	n	N	gender	region	estimate_type	val_povertyRisk	stE_povertyRisk
2010	261	122741.8	male	Burgenland	direct	17.414524	2.7650644
2010	288	137822.2	female	Burgenland	direct	21.432598	2.1955850
2010	359	182732.9	male	Vorarlberg	direct	12.973259	2.6527745
2010	374	194622.1	female	Vorarlberg	direct	19.883637	2.8905261
2010	440	253143.7	male	Salzburg	direct	9.156964	1.4356454
2010	484	282307.3	female	Salzburg	direct	17.939382	0.8890716
2010	517	268581.4	male	Carinthia	direct	10.552148	1.4619086
2010	561	295066.6	female	Carinthia	direct	15.392924	1.9737504
2010	650	339566.5	male	Tyrol	direct	12.857542	1.9520177
2010	667	362332.5	female	Tyrol	direct	17.604861	2.1657259
2010	1128	571011.7	male	Styria	direct	11.671247	0.6329168
2010	1132	774405.4	male	Vienna	direct	15.590616	2.0163890
2010	1167	596033.3	female	Styria	direct	16.964539	0.9470867
2010	1190	824525.6	female	Vienna	direct	18.778813	1.6516759
2010	1363	684272.5	male	Upper Austria	direct	9.074690	0.6585135
2010	1387	772593.2	female	Lower Austria	direct	16.372949	1.3593056
2010	1417	783115.8	male	Lower Austria	direct	11.348283	1.0510725
2010	1442	737347.5	female	Upper Austria	direct	12.574205	0.7867555
2010	14827	8182222.0	NA	NA	direct	14.444218	0.4477234

Option 3: Combination of Option 1 and Option 2

In this case, the estimates and standard errors are calculated for

every gender,
every region and
every combination of region and gender.

The number of rows in the output is therefore

\[n_\text{periods}\cdot(n_\text{regions} \cdot n_\text{genders} + n_\text{regions} + n_\text{genders} + 1) = 1\cdot(9\cdot2 + 9 + 2 + 1) = 30.\]

povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio, 
                             group = list("gender", "region", c("gender", "region")))
povertyRates$Estimates

year	n	N	gender	region	estimate_type	val_povertyRisk	stE_povertyRisk
2010	261	122741.8	male	Burgenland	direct	17.414524	2.7650644
2010	288	137822.2	female	Burgenland	direct	21.432598	2.1955850
2010	359	182732.9	male	Vorarlberg	direct	12.973259	2.6527745
2010	374	194622.1	female	Vorarlberg	direct	19.883637	2.8905261
2010	440	253143.7	male	Salzburg	direct	9.156964	1.4356454
2010	484	282307.3	female	Salzburg	direct	17.939382	0.8890716
2010	517	268581.4	male	Carinthia	direct	10.552148	1.4619086
2010	549	260564.0	NA	Burgenland	direct	19.539836	2.1973197
2010	561	295066.6	female	Carinthia	direct	15.392924	1.9737504
2010	650	339566.5	male	Tyrol	direct	12.857542	1.9520177
2010	667	362332.5	female	Tyrol	direct	17.604861	2.1657259
2010	733	377355.0	NA	Vorarlberg	direct	16.537310	2.6155590
2010	924	535451.0	NA	Salzburg	direct	13.787343	0.8944779
2010	1078	563648.0	NA	Carinthia	direct	13.086268	1.4766568
2010	1128	571011.7	male	Styria	direct	11.671247	0.6329168
2010	1132	774405.4	male	Vienna	direct	15.590616	2.0163890
2010	1167	596033.3	female	Styria	direct	16.964539	0.9470867
2010	1190	824525.6	female	Vienna	direct	18.778813	1.6516759
2010	1317	701899.0	NA	Tyrol	direct	15.308191	1.8190266
2010	1363	684272.5	male	Upper Austria	direct	9.074690	0.6585135
2010	1387	772593.2	female	Lower Austria	direct	16.372949	1.3593056
2010	1417	783115.8	male	Lower Austria	direct	11.348283	1.0510725
2010	1442	737347.5	female	Upper Austria	direct	12.574205	0.7867555
2010	2295	1167045.0	NA	Styria	direct	14.374637	0.6647327
2010	2322	1598931.0	NA	Vienna	direct	17.234683	1.7285320
2010	2804	1555709.0	NA	Lower Austria	direct	13.843623	1.1352401
2010	2805	1421620.0	NA	Upper Austria	direct	10.889773	0.6174542
2010	7267	3979571.7	male	NA	direct	12.026600	0.5083609
2010	7560	4202650.3	female	NA	direct	16.733508	0.4998858
2010	14827	8182222.0	NA	NA	direct	14.444218	0.4477234
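The row counts of the three options can be checked with a few lines of arithmetic. The variable names below are ours and simply restate the three formulas given above.

```r
nPeriods <- 1
nRegions <- 9
nGenders <- 2

# Option 1: each region, each gender, plus the overall estimate
rowsOption1 <- nPeriods * (nRegions + nGenders + 1)

# Option 2: every region-gender combination, plus the overall estimate
rowsOption2 <- nPeriods * (nRegions * nGenders + 1)

# Option 3: combinations, single variables and the overall estimate
rowsOption3 <- nPeriods * (nRegions * nGenders + nRegions + nGenders + 1)

c(option1 = rowsOption1, option2 = rowsOption2, option3 = rowsOption3)
```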

Group differences

If differences between groups need to be calculated, e.g. the difference of poverty rates between gender = "male" and gender = "female", the parameter group.diff can be used. With group.diff = TRUE, the differences and the standard errors of these differences are calculated for all variables defined in group.

povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio, 
                             group = c("gender", "region"),
                             group.diff = TRUE)
povertyRates$Estimates

year	n	N	gender	region	estimate_type	val_povertyRisk	stE_povertyRisk
2010	549.0	260564.0	NA	Burgenland	direct	19.5398365	2.1973197
2010	641.0	318959.5	NA	Burgenland - Vorarlberg	group difference	3.0025263	3.3108505
2010	733.0	377355.0	NA	Vorarlberg	direct	16.5373102	2.6155590
2010	736.5	398007.5	NA	Burgenland - Salzburg	group difference	5.7524933	2.3747847
2010	813.5	412106.0	NA	Burgenland - Carinthia	group difference	6.4535688	3.0226508
2010	828.5	456403.0	NA	Salzburg - Vorarlberg	group difference	-2.7499670	2.9592565
2010	905.5	470501.5	NA	Carinthia - Vorarlberg	group difference	-3.4510424	3.3047566
2010	924.0	535451.0	NA	Salzburg	direct	13.7873432	0.8944779
2010	933.0	481231.5	NA	Burgenland - Tyrol	group difference	4.2316460	3.2447808
2010	1001.0	549549.5	NA	Carinthia - Salzburg	group difference	-0.7010755	1.6962282
2010	1025.0	539627.0	NA	Tyrol - Vorarlberg	group difference	-1.2291197	3.5103927
2010	1078.0	563648.0	NA	Carinthia	direct	13.0862677	1.4766568
2010	1120.5	618675.0	NA	Salzburg - Tyrol	group difference	-1.5208473	1.9384020
2010	1197.5	632773.5	NA	Carinthia - Tyrol	group difference	-2.2219227	1.6202454
2010	1317.0	701899.0	NA	Tyrol	direct	15.3081905	1.8190266
2010	1422.0	713804.5	NA	Burgenland - Styria	group difference	5.1651992	2.1851077
2010	1435.5	929747.5	NA	Burgenland - Vienna	group difference	2.3051533	1.7535085
2010	1514.0	772200.0	NA	Styria - Vorarlberg	group difference	-2.1626729	2.5408692
2010	1527.5	988143.0	NA	Vienna - Vorarlberg	group difference	0.6973730	3.1756274
2010	1609.5	851248.0	NA	Salzburg - Styria	group difference	-0.5872941	1.1543400
2010	1623.0	1067191.0	NA	Salzburg - Vienna	group difference	-3.4473400	1.7183124
2010	1676.5	908136.5	NA	Burgenland - Lower Austria	group difference	5.6962137	2.8438368
2010	1677.0	841092.0	NA	Burgenland - Upper Austria	group difference	8.6500631	2.1810933
2010	1686.5	865346.5	NA	Carinthia - Styria	group difference	-1.2883695	1.9994172
2010	1700.0	1081289.5	NA	Carinthia - Vienna	group difference	-4.1484155	2.4735025
2010	1768.5	966532.0	NA	Lower Austria - Vorarlberg	group difference	-2.6936874	2.8466066
2010	1769.0	899487.5	NA	Upper Austria - Vorarlberg	group difference	-5.6475368	2.4504376
2010	1806.0	934472.0	NA	Styria - Tyrol	group difference	-0.9335532	2.2275962
2010	1819.5	1150415.0	NA	Tyrol - Vienna	group difference	-1.9264927	3.0897186
2010	1864.0	1045580.0	NA	Lower Austria - Salzburg	group difference	0.0562796	1.4975739
2010	1864.5	978535.5	NA	Salzburg - Upper Austria	group difference	2.8975698	0.9283224
2010	1941.0	1059678.5	NA	Carinthia - Lower Austria	group difference	-0.7573551	1.8571235
2010	1941.5	992634.0	NA	Carinthia - Upper Austria	group difference	2.1964944	1.8753049
2010	2060.5	1128804.0	NA	Lower Austria - Tyrol	group difference	-1.4645677	1.9502902
2010	2061.0	1061759.5	NA	Tyrol - Upper Austria	group difference	4.4184171	2.0075511
2010	2295.0	1167045.0	NA	Styria	direct	14.3746373	0.6647327
2010	2308.5	1382988.0	NA	Styria - Vienna	group difference	-2.8600459	1.5875930
2010	2322.0	1598931.0	NA	Vienna	direct	17.2346832	1.7285320
2010	2549.5	1361377.0	NA	Lower Austria - Styria	group difference	-0.5310145	0.9844988
2010	2550.0	1294332.5	NA	Styria - Upper Austria	group difference	3.4848639	0.8910735
2010	2563.0	1577320.0	NA	Lower Austria - Vienna	group difference	-3.3910604	1.9928045
2010	2563.5	1510275.5	NA	Upper Austria - Vienna	group difference	-6.3449098	1.9642454
2010	2804.0	1555709.0	NA	Lower Austria	direct	13.8436228	1.1352401
2010	2804.5	1488664.5	NA	Lower Austria - Upper Austria	group difference	2.9538494	1.5300538
2010	2805.0	1421620.0	NA	Upper Austria	direct	10.8897734	0.6174542
2010	7267.0	3979571.7	male	NA	direct	12.0266000	0.5083609
2010	7413.5	4091111.0	male - female	NA	group difference	-4.7069081	0.4856047
2010	7560.0	4202650.3	female	NA	direct	16.7335081	0.4998858
2010	14827.0	8182222.0	NA	NA	direct	14.4442182	0.4477234

The resulting output table contains 49 rows: 12 rows for the direct estimators

\[n_\text{periods}\cdot(n_\text{regions} + n_\text{genders} + 1) = 1\cdot(9 + 2 + 1) = 12,\]

and another 37 for all the differences within the variables "gender" and "region" separately. Variable "gender" has 2 unique values (unique(dat2$gender)), resulting in 1 difference, gender = "male" - gender = "female", and variable "region" has 9 unique values (unique(dat2$region)), resulting in

\[8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 = \sum\limits_{i=1}^{9-1}i = 36\]

estimates. Thus the output contains 1 + 36 = 37 estimates with respect to group differences.
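The counts above are the numbers of unordered pairs within each variable, which can be verified with combn() and choose(). This is only a check of the arithmetic; the vectors regions and genders are typed in by hand from the output table.

```r
regions <- c("Burgenland", "Carinthia", "Lower Austria", "Salzburg", "Styria",
             "Tyrol", "Upper Austria", "Vienna", "Vorarlberg")
genders <- c("male", "female")

# All unordered pairs of regions, one pair per column
regionPairs <- combn(regions, 2)
nRegionDiff <- ncol(regionPairs)             # same as choose(9, 2)
nGenderDiff <- choose(length(genders), 2)

nDiff   <- nRegionDiff + nGenderDiff
nDirect <- length(regions) + length(genders) + 1

c(differences = nDiff, direct = nDirect, total = nDiff + nDirect)
```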

If a combination of grouping variables is used in group and group.diff = TRUE, then differences between combinations will only be calculated if exactly one of the grouping variables differs. For example, the differences between the following groups would be calculated

gender = "female" & region = "Vienna" - gender = "male" & region = "Vienna"
gender = "female" & region = "Vienna" - gender = "female" & region = "Salzburg"
gender = "male" & region = "Salzburg" - gender = "female" & region = "Salzburg"

The difference between gender = "female" & region = "Vienna" and gender = "male" & region = "Salzburg" however would not be calculated.

Thus this leads to

\[2\cdot(\sum\limits_{i=1}^{9-1}i) + 9\cdot1 = 81\]

results with respect to the differences. The column estimate_type in the output distinguishes direct estimates from group differences, so the rows of each type can be counted.

povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio, 
                             group = list(c("gender", "region")),
                             group.diff = TRUE)
povertyRates$Estimates[,.N,by=.(estimate_type)]

estimate_type	N
direct	19
group difference	81
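The number 81 can also be obtained by brute force: enumerate all pairs of region-gender combinations and keep those that differ in exactly one of the two variables. The sketch below uses hand-typed region and gender labels and is only a check of the counting rule stated above.

```r
regions <- c("Burgenland", "Carinthia", "Lower Austria", "Salzburg", "Styria",
             "Tyrol", "Upper Austria", "Vienna", "Vorarlberg")
genders <- c("male", "female")

# All 18 combinations of gender and region
combos <- expand.grid(gender = genders, region = regions,
                      stringsAsFactors = FALSE)

# All unordered pairs of combinations (as row indices)
pairs <- combn(nrow(combos), 2)
sameGender <- combos$gender[pairs[1, ]] == combos$gender[pairs[2, ]]
sameRegion <- combos$region[pairs[1, ]] == combos$region[pairs[2, ]]

# Keep pairs that agree on one variable and differ on the other
nDiff <- sum(xor(sameGender, sameRegion))
nDiff
```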

Differences between survey periods

Differences of estimates between periods can be calculated using the parameter period.diff. If not NULL, period.diff expects a character vector specifying the periods for which differences should be calculated. Each element should have the form "period2 - period1", for example "2017 - 2016".

povertyRates <- calc.stError(dat_boot_calib[year>2013], var = "povertyRisk", fun = weightedRatio, 
                             period.diff = c("2017 - 2016", "2016 - 2015", "2015 - 2014"))
povertyRates$Estimates

year	n	N	estimate_type	val_povertyRisk	stE_povertyRisk
2014	14827	8182222	direct	15.1455601	0.6637220
2015	14827	8182222	direct	15.5364014	0.6062735
2015-2014	14827	8182222	period difference	0.3908413	0.2726133
2016	14827	8182222	direct	15.0831502	0.3041252
2016-2015	14827	8182222	period difference	-0.4532512	0.4251935
2017	14827	8182222	direct	15.4201916	0.3551701
2017-2016	14827	8182222	period difference	0.3370414	0.2957580
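The point estimate of a period difference is the difference of the two direct estimates, which can be checked with the values printed above. The standard error of the difference, however, is clearly smaller than what one would get by combining the two yearly standard errors as if the periods were independent. One plausible reading, which we treat as an assumption here, is that the difference is formed within each bootstrap replicate before the standard deviation is taken. The replicate values rep2014 and rep2015 below are made up to illustrate that effect.

```r
# Values copied from the output table above
val2014 <- 15.1455601; stE2014 <- 0.6637220
val2015 <- 15.5364014; stE2015 <- 0.6062735
reported <- 0.2726133                 # stE of the row 2015-2014

diffVal <- val2015 - val2014          # point estimate of the difference
naive   <- sqrt(stE2014^2 + stE2015^2)  # would assume independent periods

# Made-up replicate estimates that move together across the two years
rep2014 <- c(14.6, 15.9, 15.2, 14.4, 15.7)
rep2015 <- c(15.1, 16.2, 15.5, 14.9, 16.0)

# Assumed approach: difference per replicate, then standard deviation
stEDiff <- sd(rep2015 - rep2014)

c(diffVal = diffVal, naive = naive, reported = reported, toyStEDiff = stEDiff)
```

Because the same households appear in several years, replicate estimates of neighbouring periods tend to be positively correlated, which is why differencing within replicates can yield a much smaller standard error.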

If additional grouping variables are supplied to calc.stError(), the differences across periods are also calculated for all variables in group.

povertyRates <- calc.stError(dat_boot_calib[year>2013], var = "povertyRisk", fun = weightedRatio, 
                             group = "gender",
                             period.diff = c("2017 - 2016", "2016 - 2015", "2015 - 2014"))
povertyRates$Estimates

year	n	N	gender	estimate_type	val_povertyRisk	stE_povertyRisk
2014	7267	3979572	male	direct	14.5035068	0.7103697
2014	7560	4202650	female	direct	15.7535328	0.7021147
2014	14827	8182222	NA	direct	15.1455601	0.6637220
2015	7267	3979572	male	direct	15.1228904	0.6482215
2015	7560	4202650	female	direct	15.9279630	0.6268635
2015	14827	8182222	NA	direct	15.5364014	0.6062735
2015-2014	7267	3979572	male	period difference	0.6193836	0.3223067
2015-2014	7560	4202650	female	period difference	0.1744301	0.3053676
2015-2014	14827	8182222	NA	period difference	0.3908413	0.2726133
2016	7267	3979572	male	direct	14.5796824	0.3502334
2016	7560	4202650	female	direct	15.5598937	0.3938865
2016	14827	8182222	NA	direct	15.0831502	0.3041252
2016-2015	7267	3979572	male	period difference	-0.5432080	0.5696488
2016-2015	7560	4202650	female	period difference	-0.3680693	0.3914186
2016-2015	14827	8182222	NA	period difference	-0.4532512	0.4251935
2017	7267	3979572	male	direct	14.9481591	0.4070599
2017	7560	4202650	female	direct	15.8671684	0.3982385
2017	14827	8182222	NA	direct	15.4201916	0.3551701
2017-2016	7267	3979572	male	period difference	0.3684767	0.4401227
2017-2016	7560	4202650	female	period difference	0.3072748	0.2951695
2017-2016	14827	8182222	NA	period difference	0.3370414	0.2957580

Averages across periods

With the parameter period.mean, averages across periods are calculated additionally. The parameter accepts only odd integer values. The resulting table will contain the direct estimates as well as rolling averages of length period.mean.

povertyRates <- calc.stError(dat_boot_calib[year>2013], var = "povertyRisk", fun = weightedRatio, 
                             period.mean = 3)
povertyRates$Estimates

year	n	N	estimate_type	val_povertyRisk	stE_povertyRisk
2014	14827	8182222	direct	15.14556	0.6637220
2014_2015_2016	14827	8182222	period average	15.25504	0.4883283
2015	14827	8182222	direct	15.53640	0.6062735
2015_2016_2017	14827	8182222	period average	15.34658	0.3809402
2016	14827	8182222	direct	15.08315	0.3041252
2017	14827	8182222	direct	15.42019	0.3551701
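The point estimates of the period averages can be reproduced from the direct estimates with a simple rolling mean. The sketch below uses the rounded values from the table, so agreement is only up to rounding; the labels are built to mimic the year column of the output.

```r
years <- 2014:2017
val   <- c(15.14556, 15.53640, 15.08315, 15.42019)  # direct estimates
k     <- 3                                          # period.mean

# Rolling mean over every window of k consecutive periods
starts <- seq_len(length(val) - k + 1)
avg <- sapply(starts, function(i) mean(val[i:(i + k - 1)]))
names(avg) <- sapply(starts, function(i) {
  paste(years[i:(i + k - 1)], collapse = "_")
})

round(avg, 5)
```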

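If the parameters group and/or period.diff are specified in addition, the averages are also calculated for each group, and differences between period averages are added. In the output, rows labelled with the suffix _mean (estimate type "difference between period averages") contain the difference between the averages centred on the two periods, e.g. 2016-2015_mean is the average over 2015 to 2017 minus the average over 2014 to 2016.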

povertyRates <- calc.stError(dat_boot_calib[year>2013], var = "povertyRisk", fun = weightedRatio, 
                             period.mean = 3, period.diff = "2016 - 2015",
                             group = "gender")
povertyRates$Estimates

year	n	N	gender	estimate_type	val_povertyRisk	stE_povertyRisk
2014	7267	3979572	male	direct	14.5035068	0.7103697
2014	7560	4202650	female	direct	15.7535328	0.7021147
2014	14827	8182222	NA	direct	15.1455601	0.6637220
2014_2015_2016	7267	3979572	male	period average	14.7353599	0.4975331
2014_2015_2016	7560	4202650	female	period average	15.7471298	0.5426014
2014_2015_2016	14827	8182222	NA	period average	15.2550372	0.4883283
2015	7267	3979572	male	direct	15.1228904	0.6482215
2015	7560	4202650	female	direct	15.9279630	0.6268635
2015	14827	8182222	NA	direct	15.5364014	0.6062735
2015_2016_2017	7267	3979572	male	period average	14.8835773	0.3839240
2015_2016_2017	7560	4202650	female	period average	15.7850084	0.4310673
2015_2016_2017	14827	8182222	NA	period average	15.3465811	0.3809402
2016	7267	3979572	male	direct	14.5796824	0.3502334
2016	7560	4202650	female	direct	15.5598937	0.3938865
2016	14827	8182222	NA	direct	15.0831502	0.3041252
2016-2015	7267	3979572	male	period difference	-0.5432080	0.5696488
2016-2015	7560	4202650	female	period difference	-0.3680693	0.3914186
2016-2015	14827	8182222	NA	period difference	-0.4532512	0.4251935
2016-2015_mean	7267	3979572	male	difference between period averages	0.1482174	0.2245109
2016-2015_mean	7560	4202650	female	difference between period averages	0.0378785	0.1898100
2016-2015_mean	14827	8182222	NA	difference between period averages	0.0915438	0.1836049
2017	7267	3979572	male	direct	14.9481591	0.4070599
2017	7560	4202650	female	direct	15.8671684	0.3982385
2017	14827	8182222	NA	direct	15.4201916	0.3551701