NPDHS data pooling pre-checks

Getting started

Here we show the prerequisite code sections. Run these at the outset to avoid errors. First we load the required packages.

easypackages::libraries(
  # Data i/o
  "here",                 # relative file path
  "rio",                  # file import-export
  
  # Data manipulation
  "janitor",              # data cleaning fns
  "haven",                # stata, sas, spss data io
  "labelled",             # var labelling
  "readxl",               # excel sheets
  # "scales",               # to change formats and units
  "skimr",                # quick data summary
  "broom",                # view model results
  
  # Data analysis
  "DHS.rates",            # demographic rates for dhs-like surveys
  "GeneralOaxaca",        # BO decomposition for non-linear
  "survey",               # apply survey weights
  
  # Analysis output
  "gt",
  # "modelsummary",          # output summary tables
  "gtsummary",            # output summary tables
  "flextable",            # creating tables from objects
  "officer",              # editing in office docs
  
  # R graph related packages
  "ggstats",
  "RColorBrewer",
  # "scales",
  "patchwork",
  
  # Misc packages
  "tidyverse",            # Data manipulation iron man
  "tictoc"                # Code timing
)

Next we turn off scientific notation.

options(scipen = 999)

Next we set the default gtsummary print engine for tables.

theme_gtsummary_printer(print_engine = "flextable")

Now we set the flextable output defaults.

set_flextable_defaults(
  font.size = 11,
  text.align = "left",
  big.mark = "",
  background.color = "white",
  table.layout = "autofit",
  theme_fun = theme_vanilla
)

Document introduction

Here we document the codes and labels of the variables across all the Nepal Demographic and Health Survey (DHS) datasets. We check these before running the pooling code in “daprep-v01_npdhs.R”. We pool the following Nepal DHS surveys:

# Creating the table of surveys to be used for pooling
npbr1_tmp_intro |> 
  mutate(n_births = prettyNum(n_births, big.mark = ",")) |> 
  select(c(ctr_name, svy_year, n_births)) |> 
  # Join vars from npir_tmp_intro
  left_join(
    npir1_tmp_intro |> 
      mutate(n_women = prettyNum(n_women, big.mark = ",")) |> 
      select(c(year, n_women)),
    by = join_by(svy_year == year)
  ) |> 
  # Join vars from nphr_tmp_intro
  left_join(
    nphr1_tmp_intro |> 
      mutate(n_households = prettyNum(n_households, big.mark = ",")) |> 
      select(svy_year, n_households),
    by = join_by(svy_year)
  ) |> 
  # Join vars from nppr_tmp_intro
  left_join(
    nppr1_tmp_intro |> 
      mutate(n_persons = prettyNum(n_persons, big.mark = ",")) |> 
      select(svy_year, n_persons),
    by = join_by(svy_year)
  ) |> 
  # convert nested tibble to simple tibble
  unnest(cols = c()) |> 
  mutate(
    ccode = row_number(), 
    .before = ctr_name
  ) |> 
  # convert to flextable object
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 1: Nepal DHS datasets to be used for pooling and their sample sizes

ccode	ctr_name	svy_year	n_births	n_women	n_households	n_persons
1	Nepal	1996	29,156	8,429	8,082	46,576
2	Nepal	2001	28,955	8,726	8,602	47,523
3	Nepal	2006	26,394	10,793	8,707	44,057
4	Nepal	2011	26,615	12,674	10,826	49,791
5	Nepal	2016	26,028	12,862	11,040	49,064
6	Nepal	2022	27,613	14,845	13,786	57,278

We use the following variables for the pooled data analysis:

Dependent variable
- infantd = Index child died during infancy period (0-11 months)
Main Independent variables
- sibsurv_nmv = Survival status of preceding child (Death scarring)
- binterval_3c_nmv_opp = Birth interval preceding the index child
Independent variables [CHILD LEVEL]
- cyob10y_opp = Birth cohort of index child
- bord_c = Birth order of index child
- sex_fm = Gender of index child
- season = Season during birth
Independent variables [MOTHER/PARENT LEVEL]
- ~~myob_opp = Birth cohort of mother~~
- macb_c_opp = Mother’s age during birth of index child
- medu_opp = Mother’s Level of education
- fedu_opp = Father’s level of education
Independent variables [HOUSEHOLD LEVEL]
- religion = Religion
- nat_lang = Native language of respondent
- wi_qt_opp = Household wealth quintile
- ~~hhgen_2c_opp = Generations in household~~
- hhstruc_opp = Household structure
- head_sex_fm = Sex of HH head
Independent variables [COMMUNITY LEVEL]
- por = Place of residence of the household
- ecoreg = Ecological region

Note: (a) Crossed-out names indicate variables that are not included.

Data import

We will directly import the nested tibble here. The code for dataset preparation is in the “daprep-v01_npdhs.R” script file.

# Here we import the tibbles to be used for dataset checking
# Import the npbr nested tibble
npbr1_pre_tmp0 <- read_rds(file = here("website_data", "npbr1_nest0.rds"))
# Import the nphr nested tibble
nphr1_pre_tmp0 <- read_rds(file = here("website_data", "nphr1_nest0.rds"))
# Import the nppr nested tibble
nppr1_pre_tmp0 <- read_rds(file = here("website_data", "nppr1_nest0.rds"))

Nepal BR dataset used for variable creation

Checking the Women’s weight variable before harmonization

We check the formatting of the women’s weight variable, v005, before creating the pooled survey weight. For this we use labelled::look_for().

# First we create the data dictionary of v005 in nested tibble
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |> 
  mutate(lookfor_v005 = map(
    npbr_data,
    \(df) {
      df |> 
        select(v005) |> 
        look_for(details = "full") |> 
        # For correctly viewing the range column in data dictionary
        convert_list_columns_to_character() |> 
        select(-c(levels:n_na))
    }
  ))
npbr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary 
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(lookfor_v005)) |> 
  select(-pos) 
# Convert and view the tibble as flextable
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 2: Data dictionary of v005 variable across the npbr rounds

ctr_name	svy_year	variable	label	col_type	unique_values	range
Nepal	1996	v005	sample weight	dbl	21	412612 - 1538711
Nepal	2001	v005	sample weight	dbl	23	345841 - 1667756
Nepal	2006	v005	sample weight	dbl	260	63525 - 5297300
Nepal	2011	v005	women's individual sample weight (6 decimals)	dbl	25	103855 - 2512923
Nepal	2016	v005	women's individual sample weight (6 decimals)	dbl	381	125730 - 6581418
Nepal	2022	v005	women's individual sample weight (6 decimals)	dbl	473	168774 - 3703774

The women’s weight variable is numeric (dbl) in every round and has no missing values, so it needs no reformatting and we use it directly to prepare the pooled survey weight. NOTE that the 1996, 2001 and 2011 rounds have few unique weight values (21, 23 and 25 respectively). This could be because those rounds had fewer sampling units at the second stage.
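The "(6 decimals)" in the 2011 to 2022 labels reflects the DHS convention that v005 is stored as an integer with six implied decimal places, so it is divided by 1,000,000 before use. The following is a minimal sketch on toy rows, with values taken from the ranges in Table 2. The column name `wt` is hypothetical, and this does not show how the pooled weight is built in “daprep-v01_npdhs.R”.

```r
library(dplyr)
library(tibble)

# Toy rows; v005 values are taken from the ranges shown in Table 2
toy_br <- tibble(
  svy_year = c(1996, 2011, 2022),
  v005     = c(412612, 2512923, 3703774)
)

# Rescale the integer-coded weight (six implied decimals) to its natural scale
toy_br <- toy_br |>
  mutate(wt = v005 / 1e6)   # `wt` is a hypothetical name

toy_br
```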

Checking the ID variables before harmonization

Here we check the formatting of the constituent variables from which we will build the ID variables for the pooled Nepal birth history recode (BR) dataset. The data dictionary below lists these constituent variables:

# We check the var type of ID vars in all npbr datasets.
# First we create a data dictionary of the npbr datasets in nested tibble.
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |>
  mutate(lookfor_idvars = map(
    npbr_data,
    \(df) {
      df |> 
        select(v001, v002, v003, bord, v021, v022, v023, v024) |> 
        lookfor(details = "full") |> 
        select(-c(levels:n_na)) |> 
        # For correctly viewing the range column in data dictionary
        convert_list_columns_to_character()
    }
  ))
npbr1_pre_tmp1

# Now we unnest the tibble and output the pooled data dictionary 
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(lookfor_idvars)) |> 
  arrange(pos)

# Convert and view the tibble as flextable
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 3: Data dictionary of variables to be used for ID creation across the npbr rounds

ctr_name	svy_year	pos	variable	label	col_type	unique_values	range
Nepal	1996	1	v001	cluster number	dbl	253	101 - 7502
Nepal	2001	1	v001	cluster number	dbl	251	101 - 7502
Nepal	2006	1	v001	cluster number	dbl	260	101 - 7502
Nepal	2011	1	v001	cluster number	dbl	289	101 - 7502
Nepal	2016	1	v001	cluster number	dbl	383	1 - 383
Nepal	2022	1	v001	cluster number	dbl	476	1 - 476
Nepal	1996	2	v002	household number	dbl	465	1 - 774
Nepal	2001	2	v002	household number	dbl	548	1 - 9006
Nepal	2006	2	v002	household number	dbl	510	1 - 1319
Nepal	2011	2	v002	household number	dbl	576	1 - 1403
Nepal	2016	2	v002	household number	dbl	382	1 - 963
Nepal	2022	2	v002	household number	dbl	321	1 - 505
Nepal	1996	3	v003	respondent's line number	dbl	26	1 - 27
Nepal	2001	3	v003	respondent's line number	dbl	24	1 - 26
Nepal	2006	3	v003	respondent's line number	dbl	22	1 - 29
Nepal	2011	3	v003	respondent's line number	dbl	21	1 - 26
Nepal	2016	3	v003	respondent's line number	dbl	25	1 - 33
Nepal	2022	3	v003	respondent's line number	dbl	18	1 - 21
Nepal	1996	4	bord	birth order number	dbl	16	1 - 16
Nepal	2001	4	bord	birth order number	dbl	14	1 - 14
Nepal	2006	4	bord	birth order number	dbl	16	1 - 16
Nepal	2011	4	bord	birth order number	dbl	14	1 - 14
Nepal	2016	4	bord	birth order number	dbl	15	1 - 15
Nepal	2022	4	bord	birth order number	dbl	12	1 - 12
Nepal	1996	5	v021	primary sampling unit	dbl	253	101 - 7502
Nepal	2001	5	v021	primary sampling unit	dbl	251	101 - 7502
Nepal	2006	5	v021	primary sampling unit	dbl	260	101 - 7502
Nepal	2011	5	v021	primary sampling unit	dbl	289	101 - 7502
Nepal	2016	5	v021	primary sampling unit	dbl	383	1 - 383
Nepal	2022	5	v021	primary sampling unit	dbl	476	1 - 476
Nepal	1996	6	v022	sample stratum number	dbl	145	51 - 3751
Nepal	2001	6	v022	sample stratum number	dbl	144	51 - 3751
Nepal	2006	6	v022	sample stratum number	dbl	117	1 - 118
Nepal	2011	6	v022	sample strata for sampling errors	dbl+lbl	25	1 - 25
Nepal	2016	6	v022	sample strata for sampling errors	dbl+lbl	14	1 - 14
Nepal	2022	6	v022	sample strata for sampling errors	dbl+lbl	14	1 - 14
Nepal	1996	7	v023	sample domain	dbl+lbl	1	0 - 0
Nepal	2001	7	v023	sample domain	dbl+lbl	13	1 - 13
Nepal	2006	7	v023	sample domain	dbl+lbl	13	1 - 13
Nepal	2011	7	v023	stratification used in sample design	dbl+lbl	13	1 - 13
Nepal	2016	7	v023	stratification used in sample design	dbl+lbl	14	1 - 14
Nepal	2022	7	v023	stratification used in sample design	dbl+lbl	14	1 - 14
Nepal	1996	8	v024	region	dbl+lbl	5	1 - 5
Nepal	2001	8	v024	region	dbl+lbl	5	1 - 5
Nepal	2006	8	v024	region	dbl+lbl	5	1 - 5
Nepal	2011	8	v024	region	dbl+lbl	3	1 - 3
Nepal	2016	8	v024	province	dbl+lbl	7	1 - 7
Nepal	2022	8	v024	province	dbl+lbl	7	1 - 7

From the above we can see that v023 and v024 are of labelled class (dbl+lbl) in every round, and v022 is labelled from 2011 onward, while the rest are in numeric class. Therefore, we check the numeric and labelled variables in different ways; v022 is included in the numeric check below. Note that survey year is also a constituent ID variable, but we have not checked it because it is necessarily a 4-digit number.

Numeric ID variables check

First, let’s find out the required length of the numeric ID variables by checking the maximum values of the constituent ID variables across the Nepal DHS datasets. Here we estimate the summary stats of the numeric constituent variables using skim_without_charts().

# Check the summary stats for ID vars using skimr in each npbr dataset.
# First we estimate the summary stats using skim_without_charts().
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |> 
  mutate(
    skim_id_num = map(
      npbr_data,
      function(df) {
        df |> 
          select(v001, v002, v003, bord, v021, v022) |> 
          skim_without_charts() |> 
          as_tibble() |> 
          select(-c(skim_type, n_missing, complete_rate)) |> 
          rename(
            variable = 1,
            mean = 2,
            sd = 3,
            min = 4,
            p25 = 5,
            p50 = 6,
            p75 = 7,
            max = 8
          )
      }
    )
  )
npbr1_pre_tmp1

Next, we compare the summary stats of the numeric variables across rounds, sorted by variable name.

# Now we unnest the nested tibble so that we can compare the variable length 
# across the npbr datasets.
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(skim_id_num)) |> 
  arrange(variable, svy_year) |> 
  # change the decimal places of selected variables
  mutate(
    mean = sprintf("%.1f", mean),
    sd = sprintf("%.1f", sd),
    p75 = sprintf("%.0f", p75)
  )
# Convert the tibble to flextable for easy viewing
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 4: Summary statistics of the numeric ID variables

ctr_name	svy_year	variable	mean	sd	min	p25	p50	p75	max
Nepal	1996	bord	3.2	2.1	1	1	3	4	16
Nepal	2001	bord	3.0	2.0	1	1	3	4	14
Nepal	2006	bord	2.8	1.9	1	1	2	4	16
Nepal	2011	bord	2.5	1.7	1	1	2	3	14
Nepal	2016	bord	2.4	1.5	1	1	2	3	15
Nepal	2022	bord	2.2	1.4	1	1	2	3	12
Nepal	1996	v001	3895.8	2194.2	101	2001	3803	5702	7502
Nepal	2001	v001	3812.3	2332.9	101	1706	3601	5803	7502
Nepal	2006	v001	3914.3	2254.5	101	1804	3902	5803	7502
Nepal	2011	v001	3990.4	2319.5	101	1901	4301	5905	7502
Nepal	2016	v001	199.7	113.1	1	95	208	302	383
Nepal	2022	v001	245.8	140.6	1	116	253	373	476
Nepal	1996	v002	81.0	91.3	1	26	55	98	774
Nepal	2001	v002	214.4	1009.0	1	33	69	128	9006
Nepal	2006	v002	95.4	102.0	1	32	68	126	1319
Nepal	2011	v002	120.8	120.7	1	44	91	164	1403
Nepal	2016	v002	83.6	71.7	1	30	66	122	963
Nepal	2022	v002	79.7	66.0	1	29	63	116	505
Nepal	1996	v003	2.7	2.2	1	2	2	2	27
Nepal	2001	v003	2.5	1.9	1	2	2	2	26
Nepal	2006	v003	2.4	1.8	1	2	2	2	29
Nepal	2011	v003	2.2	1.5	1	2	2	2	26
Nepal	2016	v003	2.2	1.6	1	1	2	2	33
Nepal	2022	v003	2.2	1.4	1	1	2	2	21
Nepal	1996	v021	3895.8	2194.2	101	2001	3803	5702	7502
Nepal	2001	v021	3812.3	2332.9	101	1706	3601	5803	7502
Nepal	2006	v021	3914.3	2254.5	101	1804	3902	5803	7502
Nepal	2011	v021	3990.4	2319.5	101	1901	4301	5905	7502
Nepal	2016	v021	199.7	113.1	1	95	208	302	383
Nepal	2022	v021	245.8	140.6	1	116	253	373	476
Nepal	1996	v022	1948.2	1097.1	51	1001	1902	2851	3751
Nepal	2001	v022	1906.4	1166.5	51	853	1801	2902	3751
Nepal	2006	v022	62.2	34.3	1	31	63	92	118
Nepal	2011	v022	13.6	7.1	1	8	14	19	25
Nepal	2016	v022	7.7	4.1	1	4	8	11	14
Nepal	2022	v022	7.5	4.2	1	4	8	11	14

Now we find the length to which each numeric ID variable must be set, so that we can concatenate them correctly to create the ID variables. The required length of each variable is given in the max_digits column. Note that survey year is also a constituent ID variable, with 4 digits.

# Processing the unnested tibble from above further
npbr1_pre_tmp3 <- npbr1_pre_tmp2 |> 
  group_by(variable) |> 
  # find the minimum and maximum values across surveys 
  summarize(
    min_val = min(min),
    max_val = max(max)
  ) |> 
  mutate(
    # calculate the num of digits in the maximum values
    max_digits = nchar(as.character(max_val)),
    # convert char var to factor
    variable = fct(
      variable, 
      levels = c("v001", "v002", "v003", "bord", "v021", "v022")
    )
  ) |> 
  # sort the rows by factor levels 
  arrange(variable) |> 
  # add variable labels and relocate it after variable name.
  bind_cols(vlabel = c("cluster number", "household number", 
                       "respondent's line number", "birth order", 
                       "primary sampling unit", "sample strata for se")) |> 
  relocate(vlabel, .after = 1)

# Convert the tibble to flextable for easy viewing
npbr1_pre_tmp3 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 5: The maximum length of numeric variables to be set across the npbr rounds for concatenating the ID variables

variable	vlabel	min_val	max_val	max_digits
v001	cluster number	1	7502	4
v002	household number	1	9006	4
v003	respondent's line number	1	33	2
bord	birth order	1	16	2
v021	primary sampling unit	1	7502	4
v022	sample strata for se	1	3751	4
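Fixed widths matter because unpadded concatenation is ambiguous: cluster 1 with household 11 and cluster 11 with household 1 would both give "111". The following is a minimal sketch of zero-padding each component to its max_digits width before concatenating. The ID layout (survey year, v001, v002, v003, bord), the helper `pad()` and the column name `birth_id` are assumptions for illustration, not necessarily the layout used in the pooling script.

```r
library(dplyr)
library(stringr)
library(tibble)

# Toy birth records; values chosen to span the ranges in Table 5
toy_br <- tibble(
  svy_year = c(1996, 2001, 2022),
  v001     = c(101, 7502, 1),
  v002     = c(774, 9006, 5),
  v003     = c(2, 26, 1),
  bord     = c(16, 1, 3)
)

# Hypothetical helper: left-pad a number with zeros to a fixed width
pad <- function(x, width) str_pad(as.character(x), width = width, pad = "0")

toy_br <- toy_br |>
  mutate(
    # Assumed layout: year(4) + cluster(4) + household(4) + line(2) + birth order(2)
    birth_id = str_c(svy_year, pad(v001, 4), pad(v002, 4), pad(v003, 2), pad(bord, 2))
  )

toy_br$birth_id
```

Every ID built this way has the same length (16 characters here), which makes a quick `nchar()` check a cheap test that no component exceeded its assumed width.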

Labelled ID variables check

First we check the value labels of the sub-national region variable, v024, across the npbr datasets. Let’s create a nested tibble of v024’s value labels.

# Create the data dictionary for v024 in nested tibble
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |> 
  mutate(lookfor_v024 = map(
    npbr_data,
    \(df) {
      df |> 
        select(v024) |> 
        look_for() |> 
        lookfor_to_long_format() |> 
        select(value_labels)
    }
  ))
npbr1_pre_tmp1

Now we view the value labels of v024 in the table below.

# Now we unnest the tibble and refine the pooled data dictionary 
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  # First we select the required cols and unnest()
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(lookfor_v024)) |> 
  # Next we make the num of value labels same across each round
  mutate(label_num = parse_number(value_labels)) |> 
  complete(ctr_name, svy_year, label_num) |>
  # Next we create col of value labels for each survey round
  pivot_wider(
    names_from = svy_year, 
    values_from = value_labels,
    names_prefix = "npbr_"
  ) |> 
  # Show the variable name in a col
  mutate(var_name = "v024", .before = 2)

# Convert the tibble to flextable for easy viewing
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 6: Data dictionary of v024 across the npbr rounds

ctr_name	var_name	label_num	npbr_1996	npbr_2001	npbr_2006	npbr_2011	npbr_2016	npbr_2022
Nepal	v024	1	[1] eastern	[1] eastern	[1] eastern	[1] mountain	[1] province 1	[1] koshi
Nepal	v024	2	[2] central	[2] central	[2] central	[2] hill	[2] province 2	[2] madhesh province
Nepal	v024	3	[3] western	[3] western	[3] western	[3] terai	[3] province 3	[3] bagmati province
Nepal	v024	4	[4] midwestern	[4] mid-western	[4] mid-western		[4] province 4	[4] gandaki province
Nepal	v024	5	[5] farwestern	[5] far-western	[5] far-western		[5] province 5	[5] lumbini province
Nepal	v024	6					[6] province 6	[6] karnali province
Nepal	v024	7					[7] province 7	[7] sudurpashchim province

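NOTE: The value labels of the sub-national region var, v024, are not consistent across survey rounds. They are the same for npbr 1996, 2001 and 2006 (five regions, apart from spelling such as "midwestern" vs "mid-western"). After that they change in every round: three ecological regions in 2011, seven numbered provinces in 2016 and seven named provinces in 2022.
VERDICT: In this analysis, we do not use the region var in the ID var.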

Secondly, we check the value labels of v023, which denotes the stratification used in the sampling design. First we create a nested tibble of v023’s value labels.

# Create the data dictionary for v023 in nested tibble
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |> 
  mutate(lookfor_v023 = map(
    npbr_data,
    \(df) {
      df |> 
        select(v023) |> 
        look_for() |> 
        lookfor_to_long_format() |> 
        select(value_labels)
    }
  ))
npbr1_pre_tmp1

Now we view the value labels of v023 in the table below.

# Now we unnest the tibble and refine the pooled data dictionary
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  # First we select the required cols and unnest()
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(lookfor_v023)) |> 
  # Next we make the num of value labels same across each round
  mutate(label_num = parse_number(value_labels)) |> 
  complete(ctr_name, svy_year, label_num) |>
  # Next we create col of value labels for each survey round
  pivot_wider(
    names_from = svy_year, 
    values_from = value_labels,
    names_prefix = "npbr_"
  ) |>
  # Show the variable name in a col
  mutate(var_name = "v023", .before = 2) 

# Convert the tibble to flextable for easy viewing
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 7: Data dictionary of v023 across the npbr rounds

ctr_name	var_name	label_num	npbr_1996	npbr_2001	npbr_2006	npbr_2011	npbr_2016	npbr_2022
Nepal	v023	0	[0] national
Nepal	v023	1	[1] country specific	[1] eastern mountain	[1] eastern mountain	[1] eastern mountain	[1] province 1 - urban	[1] koshi - urban
Nepal	v023	2		[2] central mountain	[2] central mountain	[2] central mountain	[2] province 1 - rural	[2] koshi - rural
Nepal	v023	3		[3] western mountain	[3] western mountain	[3] western mountain	[3] province 2 - urban	[3] madhesh province - urban
Nepal	v023	4		[4] eastern hill	[4] eastern hill	[4] eastern hill	[4] province 2 - rural	[4] madhesh province - rural
Nepal	v023	5		[5] central hill	[5] central hill	[5] central hill	[5] province 3 - urban	[5] bagmati province - urban
Nepal	v023	6		[6] western hill	[6] western hill	[6] western hill	[6] province 3 - rural	[6] bagmati province - rural
Nepal	v023	7		[7] mid-western hill	[7] mid-western hill	[7] mid-western hill	[7] province 4 - urban	[7] gandaki province - urban
Nepal	v023	8		[8] far-western hill	[8] far-western hill	[8] far-western hill	[8] province 4 - rural	[8] gandaki province - rural
Nepal	v023	9		[9] eastern terai	[9] eastern terai	[9] eastern terai	[9] province 5 - urban	[9] lumbini province - urban
Nepal	v023	10		[10] central terai	[10] central terai	[10] central terai	[10] province 5 - rural	[10] lumbini province - rural
Nepal	v023	11		[11] western terai	[11] western terai	[11] western terai	[11] province 6 - urban	[11] karnali province - urban
Nepal	v023	12		[12] mid-western terai	[12] mid-western terai	[12] mid-western terai	[12] province 6 - rural	[12] karnali province - rural
Nepal	v023	13		[13] far-western terai	[13] far-western terai	[13] far-western terai	[13] province 7 - urban	[13] sudurpashchim province - urban
Nepal	v023	14					[14] province 7 - rural	[14] sudurpashchim province - rural

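NOTE: The labels of v023 differ across the survey rounds. The 1996 round has only generic labels (national, country specific); 2001, 2006 and 2011 share the same 13 region-by-ecological-zone strata; 2016 and 2022 use 14 province-by-urban/rural strata.
VERDICT: Therefore we cannot use v023 in the ID variable preparation.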

Alternatively, we can use the ecological region variable (secoreg) in the ID var. We will check this in future.
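The visual comparison in Tables 6 and 7 can be backed by a programmatic check. The following is a minimal sketch that extracts the value labels with labelled::val_labels() and compares each round with the first one. The toy vectors stand in for v024 in three rounds (labels copied from Table 6); the object names are hypothetical.

```r
library(labelled)
library(purrr)

# Toy stand-ins for v024 in three rounds (labels copied from Table 6)
v024_2001 <- haven::labelled(
  c(1, 2),
  c(eastern = 1, central = 2, western = 3, `mid-western` = 4, `far-western` = 5)
)
v024_2006 <- v024_2001
v024_2011 <- haven::labelled(c(1, 3), c(mountain = 1, hill = 2, terai = 3))

rounds <- list(`2001` = v024_2001, `2006` = v024_2006, `2011` = v024_2011)

# Named vector of value labels for each round
labs <- map(rounds, val_labels)

# TRUE where a round has exactly the same codes and label text as the first round
same_as_first <- map_lgl(labs, \(x) identical(x, labs[[1]]))
same_as_first
```

Note that identical() is strict: a spelling difference such as "midwestern" vs "mid-western" would be flagged as a mismatch, so flagged rounds still need a look by eye.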

Checking the Birth History variables before harmonization

The birth history variables are central to this study’s objective. Therefore, we scrutinize all of them before using them to prepare harmonized variables for the pooled dataset.

# We check the birth history vars in all npbr datasets.
# First we create a data dictionary in nested tibble.
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |>
  mutate(lookfor_bhvars = map(
    npbr_data,
    \(df) {
      df |> 
        select(bidx, matches("^b[0-9]+")) |> 
        lookfor(details = "full") |> 
        select(-c(levels:n_na)) |> 
        # For correctly viewing the range column in data dictionary
        convert_list_columns_to_character()
    }
  ))
npbr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(lookfor_bhvars)) |> 
  arrange(pos)

# Convert the tibble to flextable for easy viewing
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 8: Data dictionary of birth history variables across the npbr rounds

ctr_name	svy_year	pos	variable	label	col_type	missing	unique_values	range
Nepal	1996	1	bidx	birth column number	dbl	0	16	1 - 16
Nepal	2001	1	bidx	birth column number	dbl	0	14	1 - 14
Nepal	2006	1	bidx	birth column number	dbl	0	16	1 - 16
Nepal	2011	1	bidx	birth column number	dbl	0	14	1 - 14
Nepal	2016	1	bidx	birth column number	dbl	0	15	1 - 15
Nepal	2022	1	bidx	birth column number	dbl	0	12	1 - 12
Nepal	1996	2	b0	child is twin	dbl+lbl	0	4	0 - 3
Nepal	2001	2	b0	child is twin	dbl+lbl	0	4	0 - 3
Nepal	2006	2	b0	child is twin	dbl+lbl	0	4	0 - 3
Nepal	2011	2	b0	child is twin	dbl+lbl	0	3	0 - 2
Nepal	2016	2	b0	child is twin	dbl+lbl	0	4	0 - 3
Nepal	2022	2	b0	child is twin	dbl+lbl	0	4	0 - 3
Nepal	1996	3	b1	month of birth	dbl	0	12	1 - 12
Nepal	2001	3	b1	month of birth	dbl	0	12	1 - 12
Nepal	2006	3	b1	month of birth	dbl	0	12	1 - 12
Nepal	2011	3	b1	month of birth	dbl+lbl	0	12	1 - 12
Nepal	2016	3	b1	month of birth	dbl	0	12	1 - 12
Nepal	2022	3	b1	month of birth	dbl	0	12	1 - 12
Nepal	1996	4	b2	year of birth	dbl	0	38	16 - 53
Nepal	2001	4	b2	year of birth	dbl	0	36	2023 - 2058
Nepal	2006	4	b2	year of birth	dbl	0	38	2026 - 2063
Nepal	2011	4	b2	year of birth	dbl	0	38	2030 - 2068
Nepal	2016	4	b2	year of birth	dbl	0	37	2036 - 2073
Nepal	2022	4	b2	year of birth	dbl	0	38	2042 - 2079
Nepal	1996	5	b3	date of birth (cmc)	dbl	0	424	198 - 638
Nepal	2001	5	b3	date of birth (cmc)	dbl	0	410	1479 - 1898
Nepal	2006	5	b3	date of birth (cmc)	dbl	0	415	1523 - 1960
Nepal	2011	5	b3	date of birth (cmc)	dbl	0	413	1561 - 2018
Nepal	2016	5	b3	date of birth (cmc)	dbl	0	417	1637 - 2085
Nepal	2022	5	b3	date of birth (cmc)	dbl	0	413	1710 - 2150
Nepal	1996	6	b4	sex of child	dbl+lbl	0	2	1 - 2
Nepal	2001	6	b4	sex of child	dbl+lbl	0	2	1 - 2
Nepal	2006	6	b4	sex of child	dbl+lbl	0	2	1 - 2
Nepal	2011	6	b4	sex of child	dbl+lbl	0	2	1 - 2
Nepal	2016	6	b4	sex of child	dbl+lbl	0	2	1 - 2
Nepal	2022	6	b4	sex of child	dbl+lbl	0	2	1 - 2
Nepal	1996	7	b5	child is alive	dbl+lbl	0	2	0 - 1
Nepal	2001	7	b5	child is alive	dbl+lbl	0	2	0 - 1
Nepal	2006	7	b5	child is alive	dbl+lbl	0	2	0 - 1
Nepal	2011	7	b5	child is alive	dbl+lbl	0	2	0 - 1
Nepal	2016	7	b5	child is alive	dbl+lbl	0	2	0 - 1
Nepal	2022	7	b5	child is alive	dbl+lbl	0	2	0 - 1
Nepal	1996	8	b6	age at death	dbl+lbl	23610	85	100 - 330
Nepal	2001	8	b6	age at death	dbl+lbl	24407	81	100 - 326
Nepal	2006	8	b6	age at death	dbl+lbl	23028	108	100 - 324
Nepal	2011	8	b6	age at death	dbl+lbl	23920	86	100 - 330
Nepal	2016	8	b6	age at death	dbl+lbl	23906	78	100 - 328
Nepal	2022	8	b6	age at death	dbl+lbl	25749	80	100 - 328
Nepal	1996	9	b7	age at death (months-imputed)	dbl	23608	61	0 - 360
Nepal	2001	9	b7	age at death (months-imputed)	dbl	24407	54	0 - 312
Nepal	2006	9	b7	age at death (months-imputed)	dbl	23028	80	0 - 288
Nepal	2011	9	b7	age at death (months, imputed)	dbl	23920	54	0 - 360
Nepal	2016	9	b7	age at death (months, imputed)	dbl	23906	48	0 - 336
Nepal	2022	9	b7	age at death (months, imputed)	dbl	25749	50	0 - 336
Nepal	1996	10	b8	current age of child	dbl	5548	38	0 - 36
Nepal	2001	10	b8	current age of child	dbl	4548	36	0 - 34
Nepal	2006	10	b8	current age of child	dbl	3366	37	0 - 35
Nepal	2011	10	b8	current age of child	dbl	2695	36	0 - 34
Nepal	2016	10	b8	current age of child	dbl	2122	38	0 - 36
Nepal	2022	10	b8	current age of child	dbl	1864	37	0 - 35
Nepal	1996	11	b9	who child lives with	dbl+lbl	5548	3	0 - 4
Nepal	2001	11	b9	child lives with whom	dbl+lbl	4548	3	0 - 4
Nepal	2006	11	b9	child lives with whom	dbl+lbl	3366	3	0 - 4
Nepal	2011	11	b9	child lives with whom	dbl+lbl	2695	3	0 - 4
Nepal	2016	11	b9	child lives with whom	dbl+lbl	2122	3	0 - 4
Nepal	2022	11	b9	child lives with whom	dbl+lbl	1864	3	0 - 4
Nepal	1996	12	b10	completeness of information	dbl+lbl	0	4	1 - 5
Nepal	2001	12	b10	completeness of information	dbl+lbl	0	5	1 - 8
Nepal	2006	12	b10	completeness of information	dbl+lbl	0	6	1 - 8
Nepal	2011	12	b10	completeness of information	dbl+lbl	0	4	1 - 8
Nepal	2016	12	b10	completeness of information	dbl+lbl	0	2	0 - 3
Nepal	2022	12	b10	completeness of information	dbl+lbl	0	5	0 - 6
Nepal	1996	13	b11	preceding birth interval	dbl	7515	152	6 - 197
Nepal	2001	13	b11	preceding birth interval	dbl	7805	154	9 - 235
Nepal	2006	13	b11	preceding birth interval	dbl	7825	151	9 - 319
Nepal	2011	13	b11	preceding birth interval (months)	dbl	8849	163	9 - 293
Nepal	2016	13	b11	preceding birth interval (months)	dbl	9269	170	6 - 221
Nepal	2022	13	b11	preceding birth interval (months)	dbl	10784	185	6 - 249
Nepal	1996	14	b12	succeeding birth interval	dbl	7534	152	6 - 197
Nepal	2001	14	b12	succeeding birth interval	dbl	7835	154	9 - 235
Nepal	2006	14	b12	succeeding birth interval	dbl	7868	151	9 - 319
Nepal	2011	14	b12	succeeding birth interval (months)	dbl	8876	163	9 - 293
Nepal	2016	14	b12	succeeding birth interval (months)	dbl	9320	170	6 - 221
Nepal	2022	14	b12	succeeding birth interval (months)	dbl	10837	185	6 - 249
Nepal	1996	15	b13	flag for age at death	dbl+lbl	23608	8	0 - 8
Nepal	2001	15	b13	flag for age at death	dbl+lbl	24407	4	0 - 7
Nepal	2006	15	b13	flag for age at death	dbl+lbl	23028	5	0 - 9
Nepal	2011	15	b13	flag for age at death	dbl+lbl	23920	4	0 - 6
Nepal	2016	15	b13	flag for age at death	dbl+lbl	23906	2	0 - 0
Nepal	2022	15	b13	flag for age at death	dbl+lbl	25749	2	0 - 0
Nepal	1996	16	b14	birth interval >= 4 years	dbl+lbl	7037	3	0 - 1
Nepal	2001	16	b15	live birth between births -na	dbl	28955	1
Nepal	2006	16	b15	na-live birth between births	dbl+lbl	26394	1
Nepal	2011	16	b15	live birth between births	dbl+lbl	8168	2	0 - 0
Nepal	2016	16	b15	live birth between births	dbl+lbl	8430	2	0 - 0
Nepal	2022	16	b15	live birth between births	dbl+lbl	0	2	0 - 1
Nepal	1996	17	b15	live birth between births	dbl+lbl	25954	3	0 - 1
Nepal	2001	17	b16	child's line number in household	dbl+lbl	4548	30	0 - 28
Nepal	2006	17	b16	child's line number in household	dbl+lbl	3366	30	0 - 30
Nepal	2011	17	b16	child's line number in household	dbl+lbl	2695	31	0 - 31
Nepal	2016	17	b16	child's line number in household	dbl+lbl	2122	33	0 - 37
Nepal	2022	17	b16	child's line number in household	dbl+lbl	1864	23	0 - 23
Nepal	2001	18	b0_92	child is twin	dbl+lbl	0	4	0 - 3
Nepal	2006	18	b0_x	child is twin	dbl+lbl	0	4	0 - 3
Nepal	2016	18	b17	day of birth	dbl	0	32	1 - 32
Nepal	2022	18	b17	day of birth	dbl	0	32	1 - 32
Nepal	2001	19	b1_92	month of birth/ending of pregnancy	dbl	0	12	1 - 12
Nepal	2006	19	b1_x	month of birth/ending of pregnancy	dbl	0	12	1 - 12
Nepal	2016	19	b18	century day code of birth (cdc)	dbl	0	9472	13291 - 26925
Nepal	2022	19	b18	century day code of birth (cdc)	dbl	0	9720	15525 - 28916
Nepal	2001	20	b2_92	year of birth/end of pregnancy	dbl	0	36	2023 - 2058
Nepal	2006	20	b2_x	year of birth/end of pregnancy	dbl	0	38	2026 - 2063
Nepal	2016	20	b19	current age of child in months (months since birth for dead children)	dbl	0	415	0 - 442
Nepal	2022	20	b19	current age of child in months (months since birth for dead children)	dbl	0	410	0 - 434
Nepal	2001	21	b3_92	date of birth/end of pregnancy (cmc)	dbl	0	410	1479 - 1898
Nepal	2006	21	b3_x	date of birth/end of pregnancy (cmc)	dbl	0	415	1523 - 1960
Nepal	2016	21	b20	duration of pregnancy	dbl	20476	7	6 - 11
Nepal	2022	21	b20	duration of pregnancy in months	dbl	0	6	5 - 10
Nepal	2001	22	b4_92	sex of child	dbl+lbl	0	2	1 - 2
Nepal	2006	22	b4_x	sex of child	dbl+lbl	0	2	1 - 2
Nepal	2022	22	b21	duration of pregnancy	dbl	0	10	131 - 210
Nepal	2001	23	b5_92	child is alive	dbl+lbl	0	2	0 - 1
Nepal	2006	23	b5_x	child is alive	dbl+lbl	0	2	0 - 1
Nepal	2001	24	b6_92	age at death	dbl+lbl	24407	81	100 - 326
Nepal	2006	24	b6_x	age at death	dbl+lbl	23028	108	100 - 324
Nepal	2001	25	b7_92	age at death (months-imputed)	dbl	24407	54	0 - 312
Nepal	2006	25	b7_x	age at death (months-imputed)	dbl	23028	80	0 - 288
Nepal	2001	26	b8_92	current age of child	dbl	4548	36	0 - 34
Nepal	2006	26	b8_x	current age of child	dbl	3366	37	0 - 35
Nepal	2001	27	b9_92	child lives with whom	dbl+lbl	4548	3	0 - 4
Nepal	2006	27	b9_x	child lives with whom	dbl+lbl	3366	3	0 - 4
Nepal	2001	28	b10_92	completeness of information	dbl+lbl	0	5	1 - 8
Nepal	2006	28	b10_x	completeness of information	dbl+lbl	0	6	1 - 8
Nepal	2001	29	b11_92	preceding birth interval	dbl	7315	149	9 - 235
Nepal	2006	29	b11_x	preceding birth interval	dbl	7315	149	9 - 260
Nepal	2001	30	b12_92	succeeding birth interval	dbl	7454	157	2 - 235
Nepal	2006	30	b12_x	succeeding birth interval	dbl	7326	157	3 - 260
Nepal	2001	31	b13_92	flag for age at death	dbl+lbl	24407	4	0 - 7
Nepal	2006	31	b13_x	flag for age at death	dbl+lbl	23028	5	0 - 9
Nepal	2001	32	b16_92	child's line number in household	dbl+lbl	4548	30	0 - 28
Nepal	2006	32	b16_x	child's line number in household	dbl+lbl	3366	30	0 - 30

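The above table gives an overall snapshot of the birth history variables. We see that bidx and b0 to b13 are common to all six npbr datasets. b15 is also present in every round, but it is entirely missing in 2001 and 2006 (its missing count equals the number of births). The remaining variables are round-specific: b14 appears only in 1996, b16 from 2001 onward, b17 to b20 only in 2016 and 2022, b21 only in 2022, and npbr 2001 and 2006 carry an extra duplicate series (suffixes _92 and _x) that is not available in other rounds. Note also that b2 and b3 are not on a common scale: the 1996 round records b2 as a two-digit year (16 - 53), with correspondingly lower CMC values in b3 (198 - 638), whereas later rounds use four-digit years (2023 - 2079). Next, we look in more detail at the labelled variables that are common across the npbr datasets, to see whether their value labels are consistent.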
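Which variables are shared across rounds can also be derived directly from the column names instead of being read off the table. The following is a minimal sketch on a toy nested tibble that mimics the npbr_data list-column; the toy columns and the names `bh_names`, `common_vars` and `partial_vars` are hypothetical.

```r
library(purrr)
library(tibble)

# Toy nested tibble with the same layout as the npbr nested tibble
toy_nest <- tibble(
  svy_year  = c(1996, 2006, 2022),
  npbr_data = list(
    tibble(bidx = 1, b0 = 0, b1 = 5, b14 = 0, b15 = 1),
    tibble(bidx = 1, b0 = 0, b1 = 5, b15 = NA, b16 = 2, b0_x = 0),
    tibble(bidx = 1, b0 = 0, b1 = 5, b15 = 0, b16 = 2, b17 = 12)
  )
)

# Names of the birth history variables in each round
bh_names <- map(
  toy_nest$npbr_data,
  \(df) grep("^(bidx|b[0-9]+)", names(df), value = TRUE)
)

# Variables present in every round
common_vars <- reduce(bh_names, intersect)

# Variables present in at least one round but not in all of them
partial_vars <- setdiff(reduce(bh_names, union), common_vars)

common_vars
partial_vars
```

Presence alone is not sufficient: as with b15 in 2001 and 2006, a variable can exist in a round and still be entirely missing, so the missing counts in the data dictionary remain necessary.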

b0 - child is twin

We check the value labels of the b0 variable, which denotes whether the child is a twin. First we create a nested tibble of b0’s value labels.

# Create the data dictionary for b0 in nested tibble
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |> 
  mutate(lookfor_b0 = map(
    npbr_data,
    \(df) {
      df |> 
        select(b0) |> 
        look_for() |> 
        lookfor_to_long_format() |> 
        select(value_labels)
    }
  ))
npbr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  # First we select the required cols and unnest()
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(lookfor_b0)) |> 
  # Next we make the num of value labels same across each round
  mutate(label_num = parse_number(value_labels)) |> 
  complete(ctr_name, svy_year, label_num) |>
  # Next we create col of value labels for each survey round
  pivot_wider(
    names_from = svy_year, 
    values_from = value_labels,
    names_prefix = "npbr_"
  ) |>
  # Show the variable name in a col
  mutate(var_name = "b0", .before = 2)

# Convert the tibble to flextable for easy viewing
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 9: Data dictionary of b0 across the npbr rounds

ctr_name	var_name	label_num	npbr_1996	npbr_2001	npbr_2006	npbr_2011	npbr_2016	npbr_2022
Nepal	b0	0	[0] single birth	[0] single birth	[0] single birth	[0] single birth	[0] single birth	[0] single birth
Nepal	b0	1	[1] 1st of multiple	[1] 1st of multiple	[1] 1st of multiple	[1] 1st of multiple	[1] 1st of multiple	[1] 1st of multiple
Nepal	b0	2	[2] 2nd of multiple	[2] 2nd of multiple	[2] 2nd of multiple	[2] 2nd of multiple	[2] 2nd of multiple	[2] 2nd of multiple
Nepal	b0	3	[3] 3rd of multiple	[3] 3rd of multiple	[3] 3rd of multiple	[3] 3rd of multiple	[3] 3rd of multiple	[3] 3rd of multiple
Nepal	b0	4	[4] 4th of multiple	[4] 4th of multiple	[4] 4th of multiple	[4] 4th of multiple	[4] 4th of multiple	[4] 4th of multiple
Nepal	b0	5	[5] 5th of multiple	[5] 5th of multiple	[5] 5th of multiple	[5] 5th of multiple	[5] 5th of multiple	[5] 5th of multiple

We can see the value labels of b0 in the above table. The value labels are the same across all the npbr datasets.
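
The nest, unnest and pivot steps used for b0 are repeated for every labelled variable in this document. A minimal sketch of a reusable helper is given below. It is not part of the original script: the name `compare_value_labels()` and the toy data are hypothetical, and it reads the labels with `labelled::val_labels()` rather than the `look_for()` pipeline used in this document. It assumes the nested tibble has the columns `ctr_name`, `svy_year` and the list-column `npbr_data`, and that the variable carries value labels in every round.

```r
library(labelled)
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

# Hypothetical helper: one row per value code, one column per survey round.
# Assumes `nested` has ctr_name, svy_year and a list-column npbr_data.
compare_value_labels <- function(nested, var) {
  nested |>
    # Read the value labels of `var` in each round as a two-column tibble
    mutate(labels = map(npbr_data, \(df) {
      enframe(val_labels(df[[var]]), name = "value_label", value = "label_num")
    })) |>
    select(ctr_name, svy_year, labels) |>
    unnest(cols = labels) |>
    # One column of label texts per round; absent codes become NA
    pivot_wider(
      names_from = svy_year,
      values_from = value_label,
      names_prefix = "npbr_"
    ) |>
    arrange(label_num) |>
    # Show the variable name in a col
    mutate(var_name = var, .before = 2)
}

# Toy nested tibble with two rounds (illustrative data only)
toy <- tibble(
  ctr_name = "Nepal",
  svy_year = c(2016, 2022),
  npbr_data = list(
    tibble(v106 = labelled(c(0, 1), c("no education" = 0, "primary" = 1))),
    tibble(v106 = labelled(c(0, 1), c("no education" = 0, "basic" = 1)))
  )
)
compare_value_labels(toy, "v106")
```

The result can be passed to `qflextable()` in the same way as the tables in this document.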

b1 - month of birth

We see that the b1 variable has value labels only in npbr 2011. Therefore, we check the value labels of the variable for this round.

# Create the data dictionary for b1 in npbr 2011
npbr1_pre_tmp0$npbr_data$npbr_2011 |> 
  select(b1) |> 
  look_for(details = "full") |> 
  lookfor_to_long_format() |> 
  convert_list_columns_to_character() |> 
  select(-c(pos, levels, class:n_na)) |> 
  qflextable() |> 
  autofit()

Table 10: Data dictionary of b1 in npbr 2011

variable	label	col_type	value_labels	unique_values	range
b1	month of birth	dbl+lbl	[1] baisakh	12	1 - 12
b1	month of birth	dbl+lbl	[2] jestha	12	1 - 12
b1	month of birth	dbl+lbl	[3] ashad	12	1 - 12
b1	month of birth	dbl+lbl	[4] srawan	12	1 - 12
b1	month of birth	dbl+lbl	[5] bhadra	12	1 - 12
b1	month of birth	dbl+lbl	[6] aswin	12	1 - 12
b1	month of birth	dbl+lbl	[7] kartik	12	1 - 12
b1	month of birth	dbl+lbl	[8] mangsir	12	1 - 12
b1	month of birth	dbl+lbl	[9] poush	12	1 - 12
b1	month of birth	dbl+lbl	[10] magh	12	1 - 12
b1	month of birth	dbl+lbl	[11] falgun	12	1 - 12
b1	month of birth	dbl+lbl	[12] chaitra	12	1 - 12

Note that the birth months correspond to months in the Hindu calendar. These months do not align with the months of the English (Gregorian) calendar, which creates a problem later when we prepare the season of birth variable.

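SOL: We can re-create the birth month variable from b3 - child’s dob (in cmc). Under the standard DHS definition, cmc = 12 × (year - 1900) + month, so the birth month is the remainder of (b3 - 1) divided by 12, plus 1. A plain remainder of b3 divided by 12 would return 0 instead of 12 for the twelfth month.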
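
The month arithmetic can be sketched as follows. This is an illustration, not part of the original script: the helper name `cmc_to_month()` is hypothetical and it assumes the standard DHS definition cmc = 12 × (year - 1900) + month. Note that the recovered month is in the same calendar as b3 itself, so if b3 is coded in the Nepali calendar a separate calendar conversion would still be needed.

```r
# Hypothetical helper (not in the original script): recover the month
# from a century month code, assuming cmc = (year - 1900) * 12 + month.
cmc_to_month <- function(cmc) {
  ((cmc - 1) %% 12) + 1
}

# A plain remainder returns 0 for every twelfth month
cmc_example <- c(1296, 1297, 1307, 1308)
cmc_example %% 12          # 0 1 11 0
cmc_to_month(cmc_example)  # 12 1 11 12
```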

b4 - sex of child

We check the value labels of the b4 variable, which gives the sex of the child. First we create a nested tibble of b4’s value labels.

# Create the data dictionary for b4 in nested tibble
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |> 
  mutate(lookfor_b4 = map(
    npbr_data,
    \(df) {
      df |> 
        select(b4) |> 
        look_for() |> 
        lookfor_to_long_format() |> 
        select(value_labels)
    }
  ))
npbr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  # First we select the required cols and unnest()
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(lookfor_b4)) |> 
  # Next we make the num of value labels same across each round
  mutate(label_num = parse_number(value_labels)) |> 
  complete(ctr_name, svy_year, label_num) |>
  # Next we create col of value labels for each survey round
  pivot_wider(
    names_from = svy_year, 
    values_from = value_labels,
    names_prefix = "npbr_"
  ) |>
  # Show the variable name in a col
  mutate(var_name = "b4", .before = 2)

# Convert the tibble to flextable for easy viewing
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 11: Data dictionary of b4 across the npbr rounds

ctr_name	var_name	label_num	npbr_1996	npbr_2001	npbr_2006	npbr_2011	npbr_2016	npbr_2022
Nepal	b4	1	[1] male	[1] male	[1] male	[1] male	[1] male	[1] male
Nepal	b4	2	[2] female	[2] female	[2] female	[2] female	[2] female	[2] female

We can see the value labels of b4 in the above table. The value labels are the same across all the npbr datasets.

b5 - child is alive

We check the value labels of the b5 variable, which gives the survival status of the child. First we create a nested tibble of b5’s value labels.

# Create the data dictionary for b5 in nested tibble
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |> 
  mutate(lookfor_b5 = map(
    npbr_data,
    \(df) {
      df |> 
        select(b5) |> 
        look_for() |> 
        lookfor_to_long_format() |> 
        select(value_labels)
    }
  ))
npbr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  # First we select the required cols and unnest()
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(lookfor_b5)) |> 
  # Next we make the num of value labels same across each round
  mutate(label_num = parse_number(value_labels)) |> 
  complete(ctr_name, svy_year, label_num) |>
  # Next we create col of value labels for each survey round
  pivot_wider(
    names_from = svy_year, 
    values_from = value_labels,
    names_prefix = "npbr_"
  ) |>
  # Show the variable name in a col
  mutate(var_name = "b5", .before = 2)

# Convert the tibble to flextable for easy viewing
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 12: Data dictionary of b5 across the npbr rounds

ctr_name	var_name	label_num	npbr_1996	npbr_2001	npbr_2006	npbr_2011	npbr_2016	npbr_2022
Nepal	b5	0	[0] no	[0] no	[0] no	[0] no	[0] no	[0] no
Nepal	b5	1	[1] yes	[1] yes	[1] yes	[1] yes	[1] yes	[1] yes

The above table shows that the value labels for the survival status of the child are the same across all the npbr datasets.

b6 - age at death

We check the value labels of the b6 variable, which shows the age at death of children. Note that this variable has many missing values in all npbr rounds, since it is recorded only for children who have died. First we create a nested tibble of b6’s value labels.

# Create the data dictionary for b6 in nested tibble
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |> 
  mutate(lookfor_b6 = map(
    npbr_data,
    \(df) {
      df |> 
        select(b6) |> 
        look_for() |> 
        lookfor_to_long_format() |> 
        select(value_labels)
    }
  ))
npbr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  # First we select the required cols and unnest()
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(lookfor_b6)) |> 
  # Next we make the num of value labels same across each round
  mutate(label_num = parse_number(value_labels)) |> 
  complete(ctr_name, svy_year, label_num) |>
  # Next we create col of value labels for each survey round
  pivot_wider(
    names_from = svy_year, 
    values_from = value_labels,
    names_prefix = "npbr_"
  ) |>
  # Show the variable name in a col
  mutate(var_name = "b6", .before = 2)

# Convert the tibble to flextable for easy viewing
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 13: Data dictionary of b6 across the npbr rounds

ctr_name	var_name	label_num	npbr_1996	npbr_2001	npbr_2006	npbr_2011	npbr_2016	npbr_2022
Nepal	b6	100				[100] died on day of birth	[100] died on day of birth	[100] died on day of birth
Nepal	b6	101				[101] days: 1	[101] days: 1	[101] days: 1
Nepal	b6	199				[199] days: number missing	[199] days: number missing	[199] days: number missing
Nepal	b6	201				[201] months: 1	[201] months: 1	[201] months: 1
Nepal	b6	299				[299] months: number missing	[299] months: number missing	[299] months: number missing
Nepal	b6	301				[301] years: 1	[301] years: 1	[301] years: 1
Nepal	b6	399				[399] years: number missing	[399] years: number missing	[399] years: number missing
Nepal	b6	997	[997] inconsistent	[997] inconsistent	[997] inconsistent	[997] inconsistent	[997] inconsistent	[997] inconsistent
Nepal	b6	998	[998] don't know	[998] don't know	[998] don't know	[998] don't know	[998] don't know	[998] don't know

The above table shows that the value labels of age at death fall into two groups. The npbr 1996, 2001 and 2006 rounds label only the special codes 997 and 998, while npbr 2011, 2016 and 2022 also label the day, month and year codes.
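
The labels in Table 13 suggest that the hundreds digit of b6 gives the unit (1 = days, 2 = months, 3 = years) and the last two digits give the number of units. A minimal sketch of decoding b6 under that assumption is shown below. The helper name `decode_b6()` is hypothetical and is not part of the original script.

```r
# Hypothetical helper: split b6 into a unit and a number, assuming the
# hundreds digit is the unit (1 = days, 2 = months, 3 = years).
decode_b6 <- function(b6) {
  b6 <- as.numeric(b6)  # drop value labels, keep the underlying codes
  unit_code <- b6 %/% 100
  number <- b6 %% 100
  # 997/998 and the "number missing" codes (199, 299, 399) give no usable age
  unusable <- is.na(b6) | b6 >= 997 | number == 99
  data.frame(
    b6 = b6,
    unit = ifelse(unusable, NA_character_,
                  c("days", "months", "years")[unit_code]),
    number = ifelse(unusable, NA_real_, number)
  )
}

decode_b6(c(100, 101, 201, 301, 299, 997, NA))
```

Code 100 (died on day of birth) decodes to 0 days, and the special codes are returned as missing.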

b9 - child lives with whom

We check the value labels of the b9 variable, which gives information on who the child lives with. First we create a nested tibble of b9’s value labels.

# Create the data dictionary for b9 in nested tibble
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |> 
  mutate(lookfor_b9 = map(
    npbr_data,
    \(df) {
      df |> 
        select(b9) |> 
        look_for() |> 
        lookfor_to_long_format() |> 
        select(value_labels)
    }
  ))
npbr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  # First we select the required cols and unnest()
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(lookfor_b9)) |> 
  # Next we make the num of value labels same across each round
  mutate(label_num = parse_number(value_labels)) |> 
  complete(ctr_name, svy_year, label_num) |>
  # Next we create col of value labels for each survey round
  pivot_wider(
    names_from = svy_year, 
    values_from = value_labels,
    names_prefix = "npbr_"
  ) |>
  # Show the variable name in a col
  mutate(var_name = "b9", .before = 2)

# Convert the tibble to flextable for easy viewing
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 14: Data dictionary of b9 across the npbr rounds

ctr_name	var_name	label_num	npbr_1996	npbr_2001	npbr_2006	npbr_2011	npbr_2016	npbr_2022
Nepal	b9	0	[0] respondent	[0] respondent	[0] respondent	[0] respondent	[0] respondent	[0] respondent
Nepal	b9	1	[1] father	[1] father	[1] father	[1] father	[1] father	[1] father
Nepal	b9	2	[2] other relative	[2] other relative	[2] other relative	[2] other relative	[2] other relative	[2] other relative
Nepal	b9	3	[3] someone else	[3] someone else	[3] someone else	[3] someone else	[3] someone else	[3] someone else
Nepal	b9	4	[4] lives elsewhere	[4] lives elsewhere	[4] lives elsewhere	[4] lives elsewhere	[4] lives elsewhere	[4] lives elsewhere

We can see in the above table that the value labels of b9 are the same across all the npbr datasets.

b10 - completeness of information

We check the value labels of the b10 variable, which gives the completeness of birth history information. First we create a nested tibble of b10’s value labels.

# Create the data dictionary for b10 in nested tibble
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |> 
  mutate(lookfor_b10 = map(
    npbr_data,
    \(df) {
      df |> 
        select(b10) |> 
        look_for() |> 
        lookfor_to_long_format() |> 
        select(value_labels)
    }
  ))
npbr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  # First we select the required cols and unnest()
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(lookfor_b10)) |> 
  # Next we make the num of value labels same across each round
  mutate(label_num = parse_number(value_labels)) |> 
  complete(ctr_name, svy_year, label_num) |>
  # Next we create col of value labels for each survey round
  pivot_wider(
    names_from = svy_year, 
    values_from = value_labels,
    names_prefix = "npbr_"
  ) |>
  # Show the variable name in a col
  mutate(var_name = "b10", .before = 2) 

# Convert the tibble to flextable for easy viewing
npbr1_pre_tmp2 |> 
  qflextable() |>
  align(align = "left", part = "all") |> 
  autofit()

Table 15: Data dictionary of b10 across the npbr rounds

ctr_name	var_name	label_num	npbr_1996	npbr_2001	npbr_2006	npbr_2011	npbr_2016	npbr_2022
Nepal	b10	0					[0] month, year and day	[0] month, year and day
Nepal	b10	1	[1] month and year	[1] month and year	[1] month and year	[1] month and year - information complete	[1] month and year - information complete	[1] month and year - information complete
Nepal	b10	2	[2] month and age -y imp	[2] month and age -y imp	[2] month and age -y imp	[2] month and age - year imputed	[2] month and age - year imputed	[2] month and age - year imputed
Nepal	b10	3	[3] year and age - m imp	[3] year and age - m imp	[3] year and age - m imp	[3] year and age - month imputed	[3] year and age - month imputed	[3] year and age - month imputed
Nepal	b10	4	[4] y & age - y ignored	[4] y & age - y ignored	[4] y & age - y ignored	[4] year and age - year ignored	[4] year and age - year ignored	[4] year and age - year ignored
Nepal	b10	5	[5] year - a, m imp	[5] year - a, m imp	[5] year - a, m imp	[5] year - age/month imputed	[5] year - age/month imputed	[5] year - age/month imputed
Nepal	b10	6	[6] age - y, m imp	[6] age - y, m imp	[6] age - y, m imp	[6] age - year/month imputed	[6] age - year/month imputed	[6] age - year/month imputed
Nepal	b10	7	[7] month - a, y imp	[7] month - a, y imp	[7] month - a, y imp	[7] month - age/year imputed	[7] month - age/year imputed	[7] month - age/year imputed
Nepal	b10	8	[8] none - all imp	[8] none - all imp	[8] none - all imp	[8] none - all imputed	[8] none - all imputed	[8] none - all imputed

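We can see in the above table that codes 1 to 8 of b10 are present in all rounds, while code 0 (month, year and day) appears only in npbr 2016 and 2022. The label texts are abbreviated in npbr 1996, 2001 and 2006 and spelled out in npbr 2011, 2016 and 2022, but they describe the same categories.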
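
Comparing label texts by eye becomes error-prone for longer tables such as this one. A minimal sketch of an automated check is shown below. The helper name `flag_label_mismatch()` is hypothetical and not part of the original script. It assumes a wide table with one `npbr_*` column per round, as produced by the `pivot_wider()` step, and the toy rows are copied from Table 15.

```r
# Hypothetical helper: flag rows of a wide label table (one npbr_* column
# per round) whose label text is not identical in every round.
flag_label_mismatch <- function(wide) {
  round_cols <- grep("^npbr_", names(wide), value = TRUE)
  wide$same_across_rounds <- apply(
    as.matrix(wide[round_cols]), 1,
    function(x) length(unique(x)) == 1  # NA counts as its own value
  )
  wide
}

# Two rows and two rounds taken from Table 15
toy_b10 <- data.frame(
  label_num = c(0, 1),
  npbr_2011 = c(NA, "[1] month and year - information complete"),
  npbr_2016 = c("[0] month, year and day",
                "[1] month and year - information complete")
)
flag_label_mismatch(toy_b10)$same_across_rounds  # FALSE TRUE
```

The check compares the text exactly, so abbreviated and spelled-out versions of the same category are flagged as different and still need a manual look.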

Checking the Common independent variables before harmonization

Next we start documenting the common independent variables. First we check the data dictionary of the common independent variables. Then we check them variable by variable.

# We check the common independent vars in all npbr datasets.
# First we create the data dictionary in nested tibble.
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |>
  mutate(lookfor_comindvars = map(
    npbr_data,
    \(df) {
      df |> 
        # select the common independent variables
        select(v106, v011, v501, v701, v025, v151, v152) |> 
        lookfor(details = "full") |> 
        select(-c(levels:n_na)) |> 
        # For correctly viewing the range column in data dictionary
        convert_list_columns_to_character()
    }
  ))
npbr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary 
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(lookfor_comindvars)) |> 
  arrange(pos)

# Convert the tibble to flextable for easy viewing
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 16: Data dictionary of common independent variables across the npbr rounds

ctr_name	svy_year	pos	variable	label	col_type	missing	unique_values	range
Nepal	1996	1	v106	highest educational level	dbl+lbl	0	4	0 - 3
Nepal	2001	1	v106	highest educational level	dbl+lbl	0	4	0 - 3
Nepal	2006	1	v106	highest educational level	dbl+lbl	0	4	0 - 3
Nepal	2011	1	v106	highest educational level	dbl+lbl	0	5	0 - 8
Nepal	2016	1	v106	highest educational level	dbl+lbl	0	4	0 - 3
Nepal	2022	1	v106	highest educational level	dbl+lbl	0	4	0 - 3
Nepal	1996	2	v011	date of birth (cmc)	dbl	0	414	35 - 456
Nepal	2001	2	v011	date of birth (cmc)	dbl	0	409	1296 - 1708
Nepal	2006	2	v011	date of birth (cmc)	dbl	0	409	1356 - 1771
Nepal	2011	2	v011	date of birth (cmc)	dbl	0	407	1415 - 1830
Nepal	2016	2	v011	date of birth (cmc)	dbl	0	407	1480 - 1903
Nepal	2022	2	v011	date of birth (cmc)	dbl	0	410	1546 - 1962
Nepal	1996	3	v501	current marital status	dbl+lbl	0	4	1 - 5
Nepal	2001	3	v501	current marital status	dbl+lbl	0	4	1 - 5
Nepal	2006	3	v501	current marital status	dbl+lbl	0	4	1 - 5
Nepal	2011	3	v501	current marital status	dbl+lbl	0	5	0 - 5
Nepal	2016	3	v501	current marital status	dbl+lbl	0	6	0 - 5
Nepal	2022	3	v501	current marital status	dbl+lbl	0	6	0 - 5
Nepal	1996	4	v701	partner's education level	dbl+lbl	27	5	0 - 3
Nepal	2001	4	v701	partner's education level	dbl+lbl	0	5	0 - 8
Nepal	2006	4	v701	partner's education level	dbl+lbl	0	5	0 - 8
Nepal	2011	4	v701	husband/partner's education level	dbl+lbl	3	6	0 - 8
Nepal	2016	4	v701	husband/partner's education level	dbl+lbl	942	6	0 - 8
Nepal	2022	4	v701	husband/partner's education level	dbl+lbl	1198	6	0 - 8
Nepal	1996	5	v025	type of place of residence	dbl+lbl	0	2	1 - 2
Nepal	2001	5	v025	type of place of residence	dbl+lbl	0	2	1 - 2
Nepal	2006	5	v025	type of place of residence	dbl+lbl	0	2	1 - 2
Nepal	2011	5	v025	type of place of residence	dbl+lbl	0	2	1 - 2
Nepal	2016	5	v025	type of place of residence	dbl+lbl	0	2	1 - 2
Nepal	2022	5	v025	type of place of residence	dbl+lbl	0	2	1 - 2
Nepal	1996	6	v151	sex of household head	dbl+lbl	0	2	1 - 2
Nepal	2001	6	v151	sex of household head	dbl+lbl	0	2	1 - 2
Nepal	2006	6	v151	sex of household head	dbl+lbl	0	2	1 - 2
Nepal	2011	6	v151	sex of household head	dbl+lbl	0	2	1 - 2
Nepal	2016	6	v151	sex of household head	dbl+lbl	0	2	1 - 2
Nepal	2022	6	v151	sex of household head	dbl+lbl	0	2	1 - 2
Nepal	1996	7	v152	age of household head	dbl	0	81	12 - 95
Nepal	2001	7	v152	age of household head	dbl+lbl	0	77	14 - 95
Nepal	2006	7	v152	age of household head	dbl+lbl	0	75	16 - 96
Nepal	2011	7	v152	age of household head	dbl+lbl	0	76	16 - 95
Nepal	2016	7	v152	age of household head	dbl+lbl	0	73	17 - 95
Nepal	2022	7	v152	age of household head	dbl+lbl	0	76	17 - 95

From the above table we get an overall snapshot of the common independent variables. We see that most of the variables have a different number of unique values across the six npbr datasets. Only v025 and v151 have the same number of unique values in every npbr round. Next, we look in more detail at the labelled variables among these common variables. We would like to see whether the value labels and codes of the common independent variables are consistent across the npbr datasets.

v106 - Mother’s education level

We check the value labels of the v106 variable, which denotes the highest education level of the mother. First we create a nested tibble of v106’s value labels.

# Create the data dictionary for v106 in nested tibble
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |> 
  mutate(lookfor_v106 = map(
    npbr_data,
    \(df) {
      df |> 
        select(v106) |> 
        look_for() |> 
        lookfor_to_long_format() |> 
        select(value_labels)
    }
  ))
npbr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  # First we select the required cols and unnest()
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(lookfor_v106)) |> 
  # Next we make the num of value labels same across each round
  mutate(label_num = parse_number(value_labels)) |> 
  complete(ctr_name, svy_year, label_num) |>
  # Next we create col of value labels for each survey round
  pivot_wider(
    names_from = svy_year, 
    values_from = value_labels,
    names_prefix = "npbr_"
  ) |>
  # Show the variable name in a col
  mutate(var_name = "v106", .before = 2)

# Convert the tibble to flextable for easy viewing
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 17: Data dictionary of v106 across the npbr rounds

ctr_name	var_name	label_num	npbr_1996	npbr_2001	npbr_2006	npbr_2011	npbr_2016	npbr_2022
Nepal	v106	0	[0] no education	[0] no education	[0] no education	[0] no education	[0] no education	[0] no education
Nepal	v106	1	[1] primary or less	[1] primary	[1] primary	[1] primary	[1] primary	[1] basic
Nepal	v106	2	[2] some secondary	[2] secondary	[2] secondary	[2] secondary	[2] secondary	[2] secondary
Nepal	v106	3	[3] slc and above	[3] higher	[3] higher	[3] higher	[3] higher	[3] higher
Nepal	v106	4	[4]
Nepal	v106	8				[8] don't know

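We can see that the value labels of v106 are mostly similar across rounds. The exceptions are npbr 1996, which uses different wording and has an extra code 4 with an empty label, npbr 2011, which has an extra code 8 (don't know), and npbr 2022, which labels code 1 as basic instead of primary.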

v011 - Date of birth (in CMC)

The v011 variable, which holds the dob of mothers in cmc, is a numeric variable. Let’s check the range of these values in more detail, including a check for outliers. First let’s create a nested tibble of the summary statistics of the v011 variable.

# Create the summary statistics for v011 in nested tibble
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |> 
  mutate(skim_v011 = map(
    npbr_data,
    \(df) {
      df |> 
        select(v011) |> 
        skim_without_charts() |> 
        as_tibble() |> 
        select(-c(skim_type, complete_rate)) |> 
        rename(
          variable = 1,
          n_miss = 2,
          mean = 3,
          sd = 4,
          min = 5,
          p25 = 6,
          p50 = 7,
          p75 = 8,
          max = 9
        )
    }
  ))
npbr1_pre_tmp1

# Now we unnest the tibble and refine the pooled summary statistics
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  # First we select the required cols and unnest()
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(skim_v011)) |> 
  # Make variable values have one decimal point 
  mutate(
    mean = sprintf("%.1f", mean),
    sd = sprintf("%.1f", sd)
  )

# Convert the tibble to flextable for easy viewing
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 18: Summary statistics of v011 across the npbr rounds

ctr_name	svy_year	variable	mean	sd	min	p25	p50	p75	max
Nepal	1996	v011	206.0	97.9	35	126	203	286	456
Nepal	2001	v011	1465.4	97.1	1296	1384	1465	1541	1708
Nepal	2006	v011	1525.7	98.4	1356	1444	1520	1607	1771
Nepal	2011	v011	1583.6	95.0	1415	1505	1580	1660	1830
Nepal	2016	v011	1646.9	96.7	1480	1565	1642	1723	1903
Nepal	2022	v011	1711.0	96.7	1546	1631	1706	1787	1962
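
In Table 18 the npbr 1996 values (35 to 456) are on a visibly different scale from the later rounds (1296 and above). One way to inspect this is to decode the minimum and maximum cmc into years and compare them with the survey year. The sketch below is not part of the original script: the helper name `cmc_to_year()` is hypothetical and it assumes the standard DHS definition cmc = 12 × (year - 1900) + month. Decoded years that fall far from the survey year would suggest a different base year or calendar, which should be confirmed against the survey documentation before pooling.

```r
# Hypothetical helper, assuming cmc = (year - 1900) * 12 + month
cmc_to_year <- function(cmc) {
  1900 + (cmc - 1) %/% 12
}

# Minimum and maximum of v011 per round, copied from Table 18
v011_range <- data.frame(
  svy_year = c(1996, 2001, 2006, 2011, 2016, 2022),
  min = c(35, 1296, 1356, 1415, 1480, 1546),
  max = c(456, 1708, 1771, 1830, 1903, 1962)
)

# Decode both ends of the range into years under the assumed definition
v011_range$min_year <- cmc_to_year(v011_range$min)
v011_range$max_year <- cmc_to_year(v011_range$max)
v011_range
```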

v501 - Mother’s marital status

We check the value labels of the v501 variable, which gives the current marital status of the mother. First we create a nested tibble of v501’s value labels.

# Create the data dictionary for v501 in nested tibble
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |> 
  mutate(lookfor_v501 = map(
    npbr_data,
    \(df) {
      df |> 
        select(v501) |> 
        look_for() |> 
        lookfor_to_long_format() |> 
        select(value_labels)
    }
  ))
npbr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  # First we select the required cols and unnest()
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(lookfor_v501)) |> 
  # Next we make the num of value labels same across each round
  mutate(label_num = parse_number(value_labels)) |> 
  complete(ctr_name, svy_year, label_num) |>
  # Next we create col of value labels for each survey round
  pivot_wider(
    names_from = svy_year, 
    values_from = value_labels,
    names_prefix = "npbr_"
  ) |>
  # Show the variable name in a col
  mutate(var_name = "v501", .before = 2)

# Convert the tibble to flextable for easy viewing
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 19: Data dictionary of v501 across the npbr rounds

ctr_name	var_name	label_num	npbr_1996	npbr_2001	npbr_2006	npbr_2011	npbr_2016	npbr_2022
Nepal	v501	0	[0] never married	[0] never married	[0] never married	[0] never in union	[0] never in union	[0] never in union
Nepal	v501	1	[1] married	[1] married	[1] married	[1] married	[1] married	[1] married
Nepal	v501	2	[2] living together	[2] living together	[2] living together	[2] living with partner	[2] living with partner	[2] living with partner
Nepal	v501	3	[3] widowed	[3] widowed	[3] widowed	[3] widowed	[3] widowed	[3] widowed
Nepal	v501	4	[4] divorced	[4] divorced	[4] divorced	[4] divorced	[4] divorced	[4] divorced
Nepal	v501	5	[5] not living together	[5] not living together	[5] not living together	[5] no longer living together/separated	[5] no longer living together/separated	[5] no longer living together/separated

All the npbr rounds have 6 value labels (codes 0 to 5). The npbr 1996, 2001 and 2006 rounds share one set of label texts, and npbr 2011, 2016 and 2022 share another. The two sets differ in wording but appear to describe the same categories.

v701 - Husband/Partner’s education level

We check the value labels of the v701 variable, which gives the education level of the mother's husband/partner. First we create a nested tibble of v701’s value labels.

# Create the data dictionary for v701 in nested tibble
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |> 
  mutate(lookfor_v701 = map(
    npbr_data,
    \(df) {
      df |> 
        select(v701) |> 
        look_for() |> 
        lookfor_to_long_format() |> 
        select(value_labels)
    }
  ))
npbr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  # First we select the required cols and unnest()
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(lookfor_v701)) |> 
  # Next we make the num of value labels same across each round
  mutate(label_num = parse_number(value_labels)) |> 
  complete(ctr_name, svy_year, label_num) |>
  # Next we create col of value labels for each survey round
  pivot_wider(
    names_from = svy_year, 
    values_from = value_labels,
    names_prefix = "npbr_"
  ) |>
  # Show the variable name in a col
  mutate(var_name = "v701", .before = 2)

# Convert the tibble to flextable for easy viewing
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 20: Data dictionary of v701 across the npbr rounds

ctr_name	var_name	label_num	npbr_1996	npbr_2001	npbr_2006	npbr_2011	npbr_2016	npbr_2022
Nepal	v701	0	[0] no education	[0] no education	[0] no education	[0] no education	[0] no education	[0] no education
Nepal	v701	1	[1] primary	[1] primary	[1] primary	[1] primary	[1] primary	[1] basic
Nepal	v701	2	[2] secondary	[2] secondary	[2] secondary	[2] secondary	[2] secondary	[2] secondary
Nepal	v701	3	[3] higher	[3] higher	[3] higher	[3] higher	[3] higher	[3] higher
Nepal	v701	8	[8] don't know	[8] don't know	[8] don't know	[8] don't know	[8] don't know	[8] don't know

All the npbr rounds have 5 value labels (codes 0 to 3 and 8). The label texts are identical across rounds, except that npbr 2022 labels code 1 as basic instead of primary.

v025 - Type of place of residence

We check the value labels of the v025 variable, which shows whether a household belongs to a rural or an urban psu. First we create a nested tibble of v025’s value labels.

# Create the data dictionary for v025 in nested tibble
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |> 
  mutate(lookfor_v025 = map(
    npbr_data,
    \(df) {
      df |> 
        select(v025) |> 
        look_for() |> 
        lookfor_to_long_format() |> 
        select(value_labels)
    }
  ))
npbr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  # First we select the required cols and unnest()
  select(c(ctr_name, svy_year, lookfor_v025)) |> 
  unnest(cols = c(lookfor_v025)) |> 
  # Next we make the num of value labels same across each round
  mutate(label_num = parse_number(value_labels)) |> 
  complete(ctr_name, svy_year, label_num) |>
  # Next we create col of value labels for each survey round
  pivot_wider(
    names_from = svy_year, 
    values_from = value_labels,
    names_prefix = "npbr_"
  ) |>
  # Show the variable name in a col
  mutate(var_name = "v025", .before = 2)

# Convert the tibble to flextable for easy viewing
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 21: Data dictionary of v025 across the npbr rounds

ctr_name	var_name	label_num	npbr_1996	npbr_2001	npbr_2006	npbr_2011	npbr_2016	npbr_2022
Nepal	v025	1	[1] urban	[1] urban	[1] urban	[1] urban	[1] urban	[1] urban
Nepal	v025	2	[2] rural	[2] rural	[2] rural	[2] rural	[2] rural	[2] rural

The value labels and codes for v025 are the same across all the npbr rounds.

v151 - Sex of household head

We check the value labels of the v151 variable, which gives the sex of the household head. First we create a nested tibble of v151’s value labels.

# Create the data dictionary for v151 in nested tibble
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |> 
  mutate(lookfor_v151 = map(
    npbr_data,
    \(df) {
      df |> 
        select(v151) |> 
        look_for() |> 
        lookfor_to_long_format() |> 
        select(value_labels)
    }
  ))
npbr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  # First we select the required cols and unnest()
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(lookfor_v151)) |> 
  # Next we make the num of value labels same across each round
  mutate(label_num = parse_number(value_labels)) |> 
  complete(ctr_name, svy_year, label_num) |>
  # Next we create col of value labels for each survey round
  pivot_wider(
    names_from = svy_year, 
    values_from = value_labels,
    names_prefix = "npbr_"
  ) |>
  # Show the variable name in a col
  mutate(var_name = "v151", .before = 2)

# Convert the tibble to flextable for easy viewing
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 22: Data dictionary of v151 across the npbr rounds

ctr_name	var_name	label_num	npbr_1996	npbr_2001	npbr_2006	npbr_2011	npbr_2016	npbr_2022
Nepal	v151	1	[1] male	[1] male	[1] male	[1] male	[1] male	[1] male
Nepal	v151	2	[2] female	[2] female	[2] female	[2] female	[2] female	[2] female

The value labels and codes for v151 are the same across all the npbr rounds.

v152 - Age of household head

Interestingly, we see v152 (a continuous variable) has value labels for all rounds except npbr 1996. Therefore, we check the value labels of v152 for those rounds. First we create a nested tibble of v152’s value labels.

# Create the data dictionary for v152 in nested tibble
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |> 
  filter(svy_year != 1996) |> 
  mutate(lookfor_v152 = map(
    npbr_data,
    \(df) {
      df |> 
        select(v152) |> 
        look_for() |> 
        lookfor_to_long_format() |> 
        select(value_labels)
    }
  ))
npbr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  # First we select the required cols and unnest()
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(lookfor_v152)) |> 
  # Next we make the num of value labels same across each round
  mutate(label_num = parse_number(value_labels)) |> 
  complete(ctr_name, svy_year, label_num) |>
  # Next we create col of value labels for each survey round
  pivot_wider(
    names_from = svy_year, 
    values_from = value_labels,
    names_prefix = "npbr_"
  ) |>
  # Show the variable name in a col
  mutate(var_name = "v152", .before = 2)

# Convert the tibble to flextable for easy viewing
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 23: Data dictionary of v152 across the npbr rounds

ctr_name	var_name	label_num	npbr_2001	npbr_2006	npbr_2011	npbr_2016	npbr_2022
Nepal	v152	97	[97] 97+	[97] 97+	[97] 97+	[97] 97+	[97] 97+
Nepal	v152	98	[98] dk	[98] dk	[98] don't know	[98] don't know	[98] don't know

We can see that the value labels of v152 cover only the top-coded age (97+) and the don't know code (98). Since the observed range of v152 stays below 97 in every npbr round (Table 16), we need not be concerned about them.
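
The absence of these special codes can also be confirmed directly on the data. A minimal sketch is given below. The helper name `count_special_codes()` is hypothetical and not part of the original script, and the codes 97 and 98 are taken from Table 23.

```r
# Hypothetical check: count observations that carry a special code
count_special_codes <- function(x, codes = c(97, 98)) {
  x <- as.numeric(x)  # drop value labels, keep the underlying codes
  counts <- vapply(
    codes,
    function(code) sum(x == code, na.rm = TRUE),
    integer(1)
  )
  setNames(counts, codes)
}

# Toy ages of household heads (illustrative values only)
count_special_codes(c(17, 45, 95, 98, NA))
```

Applied to v152 in each round, for example inside the `map()` call over `npbr_data`, all counts should be zero if the special codes are never used.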

Checking the Social group variables before harmonization

Now we document the social group variables and then harmonize them. Upon manually checking the full data dictionaries of each npbr dataset, we find the following variables: religion, ethnicity, and native language. First we will check the data dictionary of these social group variables. Then we will check them variable by variable.

# We check the social group vars in all npbr datasets.
# First we create the data dictionary in nested tibble.
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |>
  mutate(lookfor_socgrp = map(
    npbr_data,
    \(df) {
      df |> 
        # select the social group variables
        select(
          v130, v131, 
          matches("slang[nr]|snlang|slnative|v045c")
        ) |> 
        lookfor(details = "full") |> 
        select(-c(levels:n_na)) |> 
        # For correctly viewing the range column in data dictionary
        convert_list_columns_to_character()
    }
  ))
npbr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary 
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(lookfor_socgrp)) |> 
  arrange(pos)

# Convert the tibble to flextable for easy viewing
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 24: Data dictionary of social group variables across the npbr rounds

ctr_name	svy_year	pos	variable	label	col_type	missing	unique_values	range
Nepal	1996	1	v130	religion	dbl+lbl	55	6	1 - 5
Nepal	2001	1	v130	religion	dbl+lbl	0	5	1 - 6
Nepal	2006	1	v130	religion	dbl+lbl	0	6	1 - 6
Nepal	2011	1	v130	religion	dbl+lbl	0	6	1 - 96
Nepal	2016	1	v130	religion	dbl+lbl	0	6	1 - 96
Nepal	2022	1	v130	religion	dbl+lbl	0	6	1 - 96
Nepal	1996	2	v131	ethnicity	dbl+lbl	0	13	0 - 12
Nepal	2001	2	v131	ethnicity	dbl+lbl	0	56	1 - 96
Nepal	2006	2	v131	ethnicity	dbl+lbl	0	75	1 - 96
Nepal	2011	2	v131	ethnicity	dbl+lbl	0	11	1 - 996
Nepal	2016	2	v131	ethnicity	dbl+lbl	0	11	1 - 96
Nepal	2022	2	v131	ethnicity	dbl+lbl	0	11	1 - 96
Nepal	1996	3	slangn	native language of respondent	dbl+lbl	5	6	1 - 5
Nepal	2001	3	slangr	home language of respondent	dbl+lbl	0	5	1 - 5
Nepal	2006	3	snlang	native language of respondent	dbl+lbl	0	5	1 - 5
Nepal	2011	3	slnative	native language of respondent	dbl+lbl	0	4	1 - 6
Nepal	2016	3	v045c	native language of respondent	dbl+lbl	0	5	1 - 5
Nepal	2022	3	v045c	native language of respondent	dbl+lbl	0	5	1 - 6

The above table gives an overall snapshot of the social group variables. All the variables are of labelled class across all the npbr datasets. We see that the number of unique values and the code ranges of the variables differ across the six npbr datasets, which suggests that their value labels differ too. Note that the religion and native language of respondent variables have some missing values in the npbr 1996 dataset. Next, we look at the variables individually to match the value labels across the npbr datasets.

v130 - Religion of hh head

We check the value labels of the first social group variable v130, which gives the religion of household head. First we create a nested tibble of v130’s value labels.

# Create the data dictionary for v130 in nested tibble
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |> 
  mutate(lookfor_v130 = map(
    npbr_data,
    \(df) {
      df |> 
        select(v130) |> 
        look_for() |> 
        lookfor_to_long_format() |> 
        select(value_labels)
    }
  ))
npbr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  # First we select the required cols and unnest()
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(lookfor_v130)) |> 
  # Next we make the num of value labels same across each round
  mutate(label_num = parse_number(value_labels)) |> 
  complete(ctr_name, svy_year, label_num) |>
  # Next we create col of value labels for each survey round
  pivot_wider(
    names_from = svy_year, 
    values_from = value_labels,
    names_prefix = "npbr_"
  ) |>
  # Show the variable name in a col
  mutate(var_name = "v130", .before = 2)

# Convert the tibble to flextable for easy viewing
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 25: Data dictionary of v130 across the npbr rounds

ctr_name	var_name	label_num	npbr_1996	npbr_2001	npbr_2006	npbr_2011	npbr_2016	npbr_2022
Nepal	v130	1	[1] hindu	[1] hindu	[1] hindu	[1] hindu	[1] hindu	[1] hindu
Nepal	v130	2	[2] buddhist	[2] buddhist	[2] buddhist	[2] buddhist	[2] buddhist	[2] buddhist
Nepal	v130	3	[3] muslim	[3] muslim	[3] mulsim	[3] muslim	[3] muslim	[3] muslim
Nepal	v130	4	[4] christian	[4] christian	[4] kirat	[4] kirat	[4] kirat	[4] kirat
Nepal	v130	5	[5] other		[5] christian	[5] christian	[5] christian	[5] christian
Nepal	v130	6		[6] other	[6] other
Nepal	v130	96				[96] other	[96] other	[96] other

Evidently, the value labels and codes for v130 differ across the npbr rounds. Only the first three value labels “hindu”, “buddhist” and “muslim” (spelled “mulsim” in npbr 2006) keep the same label codes in every round. Therefore, we will work with these labels for harmonization.
NOTE: The labels “christian” and “other” are also present in every round, but their label codes vary across the npbr rounds.
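Because the codes move between rounds, one option is to recode on the label text rather than on the numeric code. The sketch below uses two made-up vectors that mimic the npbr 1996 and 2011 label sets. The helper `harmonize_religion()` is hypothetical, and folding every remaining label into "other" is an assumption made only for illustration.

```r
library(haven)
library(dplyr)

# Toy stand-ins for two rounds with conflicting codes (values are made up)
v130_1996 <- labelled(
  c(1, 2, 3, 4, 5),
  labels = c(hindu = 1, buddhist = 2, muslim = 3, christian = 4, other = 5)
)
v130_2011 <- labelled(
  c(1, 3, 4, 5, 96),
  labels = c(hindu = 1, buddhist = 2, muslim = 3, kirat = 4,
             christian = 5, other = 96)
)

# Hypothetical helper: recode on the label text, not on the numeric code
harmonize_religion <- function(x) {
  txt <- as.character(as_factor(x, levels = "labels"))
  case_when(
    is.na(txt)                     ~ NA_character_,
    txt == "hindu"                 ~ "hindu",
    txt == "buddhist"              ~ "buddhist",
    txt %in% c("muslim", "mulsim") ~ "muslim",  # covers the typo in the 2006 labels
    TRUE                           ~ "other"    # assumption: all remaining labels pooled
  )
}

harmonize_religion(v130_1996)
harmonize_religion(v130_2011)
```

Matching on text keeps "christian" from being confused with "kirat", which share code 4 in different rounds.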

v131 - Ethnicity of hh head

Next, we check the value labels of the v131 variable, which gives the ethnicity of household head. First we create a nested tibble of v131’s value labels.

# Create the data dictionary for v131 in nested tibble
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |> 
  mutate(lookfor_v131 = map(
    npbr_data,
    \(df) {
      df |> 
        select(v131) |> 
        look_for() |> 
        lookfor_to_long_format() |> 
        select(value_labels)
    }
  ))
npbr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  # First we select the required cols and unnest()
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(lookfor_v131)) |> 
  # Next we make the num of value labels same across each round
  mutate(label_num = parse_number(value_labels)) |> 
  complete(ctr_name, svy_year, label_num) |>
  # Next we create col of value labels for each survey round
  pivot_wider(
    names_from = svy_year, 
    values_from = value_labels,
    names_prefix = "npbr_"
  ) |>
  # Show the variable name in a col
  mutate(var_name = "v131", .before = 2)

# Convert the tibble to flextable for easy viewing
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 26: Data dictionary of v131 across the npbr rounds

ctr_name	var_name	label_num	npbr_1996	npbr_2001	npbr_2006	npbr_2011	npbr_2016	npbr_2022
Nepal	v131	0	[0] brahmin
Nepal	v131	1	[1] chhetri	[1] yadav ahir	[1] chhetri	[1] hill brahmin	[1] hill brahmin	[1] hill brahmin
Nepal	v131	2	[2] newar	[2] kayastha	[2] brahmin	[2] hill chhetri	[2] hill chhetri	[2] hill chhetri
Nepal	v131	3	[3] gurung	[3] kumhar	[3] magar	[3] terai brahmin/chhetri	[3] terai brahmin/chhetri	[3] terai brahmin/chhetri
Nepal	v131	4	[4] magar	[4] baniya	[4] tharu	[4] other terai caste	[4] other terai caste	[4] other terai caste
Nepal	v131	5	[5] tamang	[5] dhobi	[5] tamang	[5] hill dalit	[5] hill dalit	[5] hill dalit
Nepal	v131	6	[6] rai, limbu	[6] sundhi kalwar	[6] newar	[6] terai dalit	[6] terai dalit	[6] terai dalit
Nepal	v131	7	[7] muslim, churaute	[7] kurmi	[7] muslim	[7] newar	[7] newar	[7] newar
Nepal	v131	8	[8] tharu, rajbanshi	[8] brahman	[8] kami	[8] hill janajati	[8] hill janajati	[8] hill janajati
Nepal	v131	9	[9] yadav, ahir	[9] rajput	[9] yadav	[9] terai janajati	[9] terai janajati	[9] terai janajati
Nepal	v131	10	[10] occupational	[10] tharu	[10] rai	[10] muslim	[10] muslim	[10] muslim
Nepal	v131	11	[11] other hill origin	[11] teli	[11] gurung
Nepal	v131	12	[12] other terai origin	[12] kushwaha	[12] damai/dholi
Nepal	v131	13		[13] musalman	[13] limbu
Nepal	v131	14		[14] haluwai	[14] thakuri
Nepal	v131	15		[15] malaha	[15] sharki
Nepal	v131	16		[16] rajbanshi	[16] teli
Nepal	v131	17		[17] dhimal	[17] chamar
Nepal	v131	18		[18] gangai	[18] koiri
Nepal	v131	19		[19] marwadi	[19] kurmi
Nepal	v131	20		[20] bangali	[20] sanyasi
Nepal	v131	21		[21] dhanuk	[21] dhanuk
Nepal	v131	22		[22] shikha	[22] mushahar
Nepal	v131	23		[23] dushad	[23] dushad
Nepal	v131	24		[24] chamar	[24] sherpa
Nepal	v131	25		[25] khatwe	[25] sonar
Nepal	v131	26		[26] bhumihar	[26] kewat
Nepal	v131	27		[27] kewat	[27] brahmin (terai)
Nepal	v131	28		[28] rajbhar	[28] baniya
Nepal	v131	29		[29] kanu	[29] gharti/bhujel
Nepal	v131	30		[30] tarai others	[30] malaha
Nepal	v131	31		[31] brahman	[31] kalwar
Nepal	v131	32		[32] chhetri	[32] kumal
Nepal	v131	33		[33] thakuri	[33] hajam
Nepal	v131	34		[34] sanyashi	[34] kanu
Nepal	v131	35		[35] newar	[35] rajbanshi
Nepal	v131	36		[36] limbu	[36] sunuwar
Nepal	v131	37		[37] rai	[37] sundi
Nepal	v131	38		[38] gurung	[38] lohar
Nepal	v131	39		[39] thakali	[39] tatma
Nepal	v131	40		[40] tamang	[40] khatwe
Nepal	v131	41		[41] magar	[41] dhobi
Nepal	v131	42		[42] danuwar	[42] majhi
Nepal	v131	43		[43] jirel	[43] nuniya
Nepal	v131	44		[44] majhi	[44] kumhar
Nepal	v131	45		[45] sunuwar	[45] dunuwar
Nepal	v131	46		[46] gaine	[46] chepang/praja
Nepal	v131	47		[47] chepang	[47] haluwai
Nepal	v131	48		[48] kumhal	[48] rajput
Nepal	v131	49		[49] churaute (pahadi musalman)	[49] kayastha
Nepal	v131	50		[50] bote	[50] badahi
Nepal	v131	51		[51] lepcha	[51] marwadi
Nepal	v131	52		[52] raute	[52] santhal/satar
Nepal	v131	53		[53] darai	[53] dangad/jhangad
Nepal	v131	54		[54] raji	[54] bantar
Nepal	v131	55		[55] thami	[55] barai
Nepal	v131	56		[56] damai	[56] kahar
Nepal	v131	57		[57] kami	[57] gangai
Nepal	v131	58		[58] sharki	[58] lodha
Nepal	v131	59		[59] badi	[59] rajbhar
Nepal	v131	60		[60] pahadi others	[60] thami
Nepal	v131	61		[61] sherpa	[61] dhimal
Nepal	v131	62		[62] mugrali/humli/kar bhote	[62] bhote
Nepal	v131	63		[63] himali others	[63] bing/binda
Nepal	v131	64			[64] bhedihar/gaderi
Nepal	v131	65			[65] nurang
Nepal	v131	66			[66] yakha
Nepal	v131	67			[67] darai
Nepal	v131	68			[68] tajpuriya
Nepal	v131	69			[69] thakali
Nepal	v131	70			[70] chidimar
Nepal	v131	71			[71] pahadi
Nepal	v131	72			[72] mali
Nepal	v131	73			[73] bangali
Nepal	v131	74			[74] chantel
Nepal	v131	75			[75] dom
Nepal	v131	76			[76] kamar
Nepal	v131	77			[77] bote
Nepal	v131	78			[78] dbrahmu/baramu
Nepal	v131	79			[79] gainai
Nepal	v131	80			[80] jirel
Nepal	v131	81			[81] aadibasi
Nepal	v131	82			[82] dura
Nepal	v131	83			[83] churaute
Nepal	v131	84			[84] badi
Nepal	v131	85			[85] meche
Nepal	v131	86			[86] lepcha
Nepal	v131	87			[87] halkhor
Nepal	v131	88			[88] panjabi/sihk
Nepal	v131	89			[89] kisan
Nepal	v131	90			[90] bhumihar
Nepal	v131	91			[91] kushawa
Nepal	v131	92			[92] hayu
Nepal	v131	93			[93] koche
Nepal	v131	94			[94] dhuniya
Nepal	v131	95			[95] walung
Nepal	v131	96		[96] others	[96] other caste*		[96] other	[96] other
Nepal	v131	98		[98] do not know
Nepal	v131	996				[996] other

Similar to v130, the value labels and codes for v131 differ across the npbr rounds. Notably, npbr 2001 and 2006 each have more than 60 ethnicity categories. Unfortunately, we do not know how to group these categories. Therefore, we might not use this variable as a social group characteristic.

Native language of hh respondent

Next, we check the value labels of the native language of hh respondent variable. The name of this variable differs across the npbr datasets. First we create a nested tibble of the value labels.

# Create the data dictionary in nested tibble
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |> 
  mutate(lookfor_lang = map(
    npbr_data,
    \(df) {
      df |> 
        select(matches("slang[nr]|snlang|slnative|v045c")) |> 
        look_for() |> 
        lookfor_to_long_format() |> 
        select(value_labels)
    }
  ))
npbr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  # First we select the required cols and unnest()
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(lookfor_lang)) |> 
  # Next we make the num of value labels same across each round
  mutate(label_num = parse_number(value_labels)) |> 
  complete(ctr_name, svy_year, label_num) |>
  # Next we create col of value labels for each survey round
  pivot_wider(
    names_from = svy_year, 
    values_from = value_labels,
    names_prefix = "npbr_"
  ) |>
  # Show the variable name in a col
  mutate(var_name = "Native language", .before = 2)

# Convert the tibble to flextable for easy viewing
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 27: Data dictionary of native language of respondent variable across the npbr rounds

ctr_name	var_name	label_num	npbr_1996	npbr_2001	npbr_2006	npbr_2011	npbr_2016	npbr_2022
Nepal	Native language	1	[1] nepali	[1] nepali	[1] nepali	[1] nepali	[1] english	[1] english
Nepal	Native language	2	[2] bhojpuri	[2] bhojpuri	[2] bhojpuri	[2] bhojpuri	[2] nepali	[2] nepali
Nepal	Native language	3	[3] maithili	[3] maithili	[3] maithili	[3] maithili	[3] maithili	[3] maithali
Nepal	Native language	4	[4] tharu	[4] tharu	[4] tharu		[4] bhojpuri	[4] bhojpuri
Nepal	Native language	5	[5] other	[5] other	[5] other	[5] english	[5] other
Nepal	Native language	6				[6] other		[6] other
Nepal	Native language	9			[9] missing

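The value labels are the same for npbr 1996 and 2001, and then they vary across the other datasets. The value labels “nepali”, “bhojpuri” and “maithili” are present in all the npbr rounds (with the spelling “maithali” in npbr 2022), but their label codes are not all the same: “nepali” and “bhojpuri” move from codes 1 and 2 to codes 2 and 4 in npbr 2016 and 2022. Therefore, we will use these labels for harmonization.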
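The comparison of labels and codes across rounds can also be done programmatically. The sketch below types in the label-to-code lookups of three rounds from Table 27 and reports which labels occur in every round and whether their codes agree. The object names `lang_labels` and `label_check` are hypothetical.

```r
library(tibble)
library(dplyr)

# Label-to-code lookups typed in from Table 27 for three of the rounds
lang_labels <- list(
  npbr_2006 = c(nepali = 1, bhojpuri = 2, maithili = 3, tharu = 4,
                other = 5, missing = 9),
  npbr_2011 = c(nepali = 1, bhojpuri = 2, maithili = 3, english = 5,
                other = 6),
  npbr_2016 = c(english = 1, nepali = 2, maithili = 3, bhojpuri = 4,
                other = 5)
)

label_check <- lang_labels |>
  # One tibble per round with columns label and code
  lapply(enframe, name = "label", value = "code") |>
  bind_rows(.id = "round") |>
  group_by(label) |>
  summarize(
    n_rounds  = n_distinct(round),      # in how many rounds the label occurs
    same_code = n_distinct(code) == 1   # TRUE if the code never changes
  ) |>
  # Keep only the labels that occur in every round
  filter(n_rounds == length(lang_labels))

label_check
```

Labels flagged with `same_code = FALSE` need to be matched on their text during harmonization, since their codes cannot be trusted across rounds.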

Correcting year-related variables

The year-related variables might have different formatting in each survey. Therefore, we need to check and harmonize them before appending the datasets.

# First we create the data dictionary of year-related vars in nested tibble
npbr1_pre_tmp1 <- npbr1_pre_tmp0 |> 
  mutate(lookfor_year = map(
    npbr_data,
    \(df) {
      df |> 
        select(c(b2, v007, v010)) |> 
        look_for(details = "full") |> 
        # For correctly viewing the range column in data dictionary
        convert_list_columns_to_character() |> 
        select(-c(levels:n_na))
    }
  ))
npbr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary 
npbr1_pre_tmp2 <- npbr1_pre_tmp1 |> 
  select(-c(unf, npbr_data, n_births)) |> 
  unnest(cols = c(lookfor_year)) |> 
  arrange(pos)
# Convert and view the tibble as flextable
npbr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 28: Data dictionary of year-related variables across the npbr rounds

ctr_name	svy_year	pos	variable	label	col_type	unique_values	range
Nepal	1996	1	b2	year of birth	dbl	38	16 - 53
Nepal	2001	1	b2	year of birth	dbl	36	2023 - 2058
Nepal	2006	1	b2	year of birth	dbl	38	2026 - 2063
Nepal	2011	1	b2	year of birth	dbl	38	2030 - 2068
Nepal	2016	1	b2	year of birth	dbl	37	2036 - 2073
Nepal	2022	1	b2	year of birth	dbl	38	2042 - 2079
Nepal	1996	2	v007	year of interview	dbl	2	52 - 53
Nepal	2001	2	v007	year of interview	dbl	2	2057 - 2058
Nepal	2006	2	v007	year of interview	dbl	2	2062 - 2063
Nepal	2011	2	v007	year of interview	dbl	2	2067 - 2068
Nepal	2016	2	v007	year of interview	dbl	1	2073 - 2073
Nepal	2022	2	v007	year of interview	dbl	2	2078 - 2079
Nepal	1996	3	v010	respondent's year of birth	dbl	36	2 - 37
Nepal	2001	3	v010	respondent's year of birth	dbl	36	2007 - 2042
Nepal	2006	3	v010	respondent's year of birth	dbl	36	2012 - 2047
Nepal	2011	3	v010	respondent's year of birth	dbl	36	2017 - 2052
Nepal	2016	3	v010	respondent's year of birth	dbl	36	2023 - 2058
Nepal	2022	3	v010	respondent's year of birth	dbl+lbl	36	2028 - 2063

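We find that none of the year variables hold Gregorian calendar years. Take the year of interview variable: its values are well above the corresponding survey years (for example, 2078 - 2079 for the 2022 round), and npbr 1996 stores two-digit years (52 - 53). The values are consistent with the Nepali Bikram Sambat calendar, which runs about 56 to 57 years ahead of the Gregorian calendar. Therefore, we need to reformat them before appending the datasets.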
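A first reformatting step could look like the sketch below. It rests on two assumptions that the data dictionary itself does not state: that the two-digit values in npbr 1996 belong to the 2000s of the survey calendar, and that the years are Bikram Sambat years. Both helper names are hypothetical, and the year offset is only approximate because an exact conversion also needs the month.

```r
# Hypothetical helper: expand two-digit years to four digits by adding 2000.
# Assumes every two-digit value belongs to the 2000s of the survey calendar.
expand_two_digit_year <- function(year) {
  ifelse(!is.na(year) & year < 100, year + 2000, year)
}

# Hypothetical helper: rough Gregorian year, ASSUMING Bikram Sambat input.
# Subtracting 57 gives the Gregorian year in which the Bikram Sambat year
# starts; dates in its last few months fall in the following Gregorian year.
bs_to_ad_year_approx <- function(bs_year) {
  bs_year - 57
}

# Toy values mimicking v007 in the 1996, 2001 and 2022 rounds
v007_toy <- c(52, 53, 2057, 2079)
v007_4digit <- expand_two_digit_year(v007_toy)
v007_4digit
bs_to_ad_year_approx(v007_4digit)
```

The expansion step alone is enough to put all rounds on the same four-digit scale before appending. The calendar conversion should be verified against the survey documentation before it is used in analysis.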

v010 - Respondent’s year of birth

We see that the v010 variable has some value labels only for npbr 2022. Therefore, we check those value labels for anything strange.

# Create the data dictionary for v010 in npbr 2022
npbr1_pre_tmp0$npbr_data$npbr_2022 |> 
  select(v010) |> 
  look_for(details = "full") |> 
  lookfor_to_long_format() |>
  convert_list_columns_to_character() |>
  select(-c(pos, levels, class:n_na)) |>
  qflextable() |> 
  autofit()

Table 29: Data dictionary of v010 in npbr 2022

variable	label	col_type	missing	value_labels	unique_values	range
v010	respondent's year of birth	dbl+lbl	0	[9997] not applicable/inconsistent	36	2028 - 2063
v010	respondent's year of birth	dbl+lbl	0	[9998] don't know year	36	2028 - 2063

Note that there are two value labels that correspond to missing values. However, as seen in Table 29, v010 has no missing values, and its range (2028 - 2063) shows that the codes 9997 and 9998 do not occur in the data. So we need not take further action.

Nepal HH dataset use for variable creation

Checking the ID variables before harmonization

Here we check the formatting of the constituent variables with which we will prepare the ID variables for the pooled Nepal household recode (hr) dataset. Along with the survey year, we will use the cluster number (hv001) and the household number (hv002) as constituent variables for creating the ID variables for the pooled dataset.

# We check the var type of ID vars in all nphr datasets.
# First we create a data dictionary of the nphr datasets in nested tibble.
nphr1_pre_tmp1 <- nphr1_pre_tmp0 |>
  mutate(lookfor_idvars = map(
    nphr_data,
    \(df) {
      df |> 
        select(hv001, hv002) |> 
        lookfor(details = "full") |> 
        select(-c(levels:n_na)) |> 
        # For correctly viewing the range column in data dictionary
        convert_list_columns_to_character()
    }
  ))
nphr1_pre_tmp1

# Now we unnest the tibble and output the pooled data dictionary 
nphr1_pre_tmp2 <- nphr1_pre_tmp1 |> 
  select(c(ctr_name, svy_year, lookfor_idvars)) |> 
  unnest(cols = c(lookfor_idvars)) |> 
  arrange(pos)

# Convert and view the tibble as flextable
nphr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 30: Data dictionary of variables to be used for ID creation across the nphr rounds

ctr_name	svy_year	pos	variable	label	col_type	unique_values	range
Nepal	1996	1	hv001	cluster number	dbl	253	101 - 7502
Nepal	2001	1	hv001	cluster number	dbl	251	101 - 7502
Nepal	2006	1	hv001	cluster number	dbl	260	101 - 7502
Nepal	2011	1	hv001	cluster number	dbl	289	101 - 7502
Nepal	2016	1	hv001	cluster number	dbl	383	1 - 383
Nepal	2022	1	hv001	cluster number	dbl	476	1 - 476
Nepal	1996	2	hv002	household number	dbl	488	1 - 774
Nepal	2001	2	hv002	household number	dbl	605	1 - 9006
Nepal	2006	2	hv002	household number	dbl	568	1 - 1319
Nepal	2011	2	hv002	household number	dbl	636	1 - 1403
Nepal	2016	2	hv002	household number	dbl	427	1 - 963
Nepal	2022	2	hv002	household number	dbl	338	1 - 506

From the above we can see that both hv001 and hv002 are of numeric class with no missing values. These variables can be used for preparing the ID variables after finding the number of digits in their largest values. Note that survey year is also a constituent ID variable of 4 digits and we need not check it.

# We thought to process the above nested tibble further by decomposing the 
# "range" col into min and max values using separate_wider_regex().
# However, we hit a roadblock as pattern did not identify the max values in 
# some nphr rounds correctly
nphr1_pre_tmp3 <- nphr1_pre_tmp0 |> 
  # Generate the summary stats for id vars
  mutate(skim_idvars = map(nphr_data, \(df) {
    df |> 
      select(hv001, hv002) |> 
      skim_without_charts()
  })) |> 
  # Pool the summary stats for all nphr rounds
  select(c(ctr_name, svy_year, skim_idvars)) |> 
  unnest(cols = c(skim_idvars)) |> 
  arrange(skim_variable, svy_year) |> 
  # Group and generate the max and min values for each variable
  group_by(variable = skim_variable) |> 
  summarize(
    min_val = min(numeric.p0),
    max_val = max(numeric.p100)
  ) |> 
  # calculate the num of digits in the maximum values
  mutate(
    max_digits = nchar(as.character(max_val))
  ) |>
  # add variable labels and relocate it after variable name
  bind_cols(vlabel = c("cluster number", "household number")) |>
  relocate(vlabel, .after = 1)

# Convert the tibble to flextable for easy viewing
nphr1_pre_tmp3 |>
  qflextable() |>
  align(align = "left", part = "all") |>
  autofit()

Table 31: The maximum length of constituent ID variables to be set across the nphr rounds

variable	vlabel	min_val	max_val	max_digits
hv001	cluster number	1	7502	4
hv002	household number	1	9006	4

The above table gives the required length of each constituent ID variable, so that we can correctly concatenate them to create the ID variables. The required lengths are given in the max_digits column. Note that survey year is also a constituent ID variable of 4 digits.
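To see why fixed lengths matter, consider the sketch below. It zero-pads each part to the widths from Table 31 before concatenating. The helper `make_hh_id()` is hypothetical and only illustrates the idea. It is not the ID code used in the pooling script.

```r
# Hypothetical helper: zero-pad each part to a fixed width, then concatenate.
# Widths follow Table 31: 4 digits each for svy_year, hv001 and hv002.
make_hh_id <- function(svy_year, hv001, hv002) {
  sprintf(
    "%04d%04d%04d",
    as.integer(svy_year), as.integer(hv001), as.integer(hv002)
  )
}

# Without padding, two different households collapse into the same ID
unpadded_a <- paste0(2016, 1, 123)   # cluster 1, household 123
unpadded_b <- paste0(2016, 11, 23)   # cluster 11, household 23
unpadded_a == unpadded_b

# With padding, the two IDs stay distinct
padded_a <- make_hh_id(2016, 1, 123)
padded_b <- make_hh_id(2016, 11, 23)
padded_a == padded_b
```

The padded IDs always have 12 characters, so each part can be recovered later by position.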

Checking HH-level variables before harmonization

Here we check the ecological region and wealth quintile variables before harmonizing them. Note in Nepal 1996 and 2001 the wealth quintile variables are provided in separate datasets. Therefore we join those variables to the hh file before proceeding with the checking.

Upon manually checking the full data dictionaries we find the variable names. Now we will check the data dictionary of these hh-level variables. Then we will check their value labels variable wise.

# We check the hh-level vars in all nphr datasets.
# First we create the data dictionary in nested tibble.
nphr1_pre_tmp1 <- nphr1_pre_tmp0 |>
  mutate(lookfor_hhvars = map(
    nphr_data,
    \(df) {
      df |> 
        # select the wealth quintile and ecological region variables
        select(
          matches("^wlthind5$|^hv270$"), 
          matches("shez|shreg1|shecoreg")
        ) |> 
        lookfor(details = "full") |> 
        select(-c(levels:n_na)) |> 
        # For correctly viewing the range column in data dictionary
        convert_list_columns_to_character()
    }
  ))
nphr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary 
nphr1_pre_tmp2 <- nphr1_pre_tmp1 |> 
  select(c(svy_year, lookfor_hhvars)) |> 
  unnest(cols = c(lookfor_hhvars)) |> 
  arrange(pos, svy_year)

# Convert the tibble to flextable for easy viewing
nphr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 32: Data dictionary of hh-level variables across the nphr rounds

svy_year	pos	variable	label	col_type	unique_values	range
1996	1	wlthind5	quintiles of wealth index	dbl+lbl	5	1 - 5
2001	1	wlthind5	quintiles of wealth index	dbl+lbl	5	1 - 5
2006	1	hv270	wealth index	dbl+lbl	5	1 - 5
2011	1	hv270	wealth index	dbl+lbl	5	1 - 5
2016	1	hv270	wealth index combined	dbl+lbl	5	1 - 5
2022	1	hv270	wealth index combined	dbl+lbl	5	1 - 5
1996	2	shez	hh ecozone	dbl+lbl	3	0 - 2
2001	2	shreg1	ecological region	dbl+lbl	3	1 - 3
2006	2	shreg1	ecological zone	dbl+lbl	3	1 - 3
2011	2	shecoreg	ecological region	dbl+lbl	3	1 - 3
2016	2	shecoreg	ecological zone	dbl+lbl	3	1 - 3
2022	2	shecoreg	ecological region	dbl+lbl	3	1 - 3

The above table gives an overall snapshot of the hh-level variables. All the variables are of labelled class and have the same number of value labels across all the nphr datasets. Note that the ecological region variable has a different code range (0 - 2) in the nphr 1996 dataset. Next, we compare the value labels of the variables across the nphr datasets.

Ecological region variable

# Create the data dictionary in nested tibble
nphr1_pre_tmp1 <- nphr1_pre_tmp0 |> 
  mutate(lookfor_ecoreg = map(
    nphr_data,
    \(df) {
      df |> 
        select(matches("shez|shreg1|shecoreg")) |> 
        look_for() |> 
        lookfor_to_long_format() |> 
        select(value_labels)
    }
  ))
nphr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary
nphr1_pre_tmp2 <- nphr1_pre_tmp1 |> 
  # First we select the required cols and unnest()
  select(c(ctr_name, svy_year, lookfor_ecoreg)) |> 
  unnest(cols = c(lookfor_ecoreg)) |> 
  # Next we make the num of value labels same across each round
  mutate(label_num = parse_number(value_labels)) |> 
  complete(ctr_name, svy_year, label_num) |>
  # Next we create col of value labels for each survey round
  pivot_wider(
    names_from = svy_year, 
    values_from = value_labels,
    names_prefix = "nphr_"
  ) |>
  # Show the variable name in a col
  mutate(var_name = "Ecological region", .before = 2)

# Convert the tibble to flextable for easy viewing
nphr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 33: Data dictionary of ecological region variable across the nphr rounds

ctr_name	var_name	label_num	nphr_1996	nphr_2001	nphr_2006	nphr_2011	nphr_2016	nphr_2022
Nepal	Ecological region	0	[0] mountain
Nepal	Ecological region	1	[1] hill	[1] mountain	[1] mountain	[1] mountain	[1] mountain	[1] mountain
Nepal	Ecological region	2	[2] terai	[2] hill	[2] hill	[2] hill	[2] hill	[2] hill
Nepal	Ecological region	3		[3] terai	[3] terai	[3] terai	[3] terai	[3] terai

Clearly, the value label texts are the same for all nphr rounds. However, the value label codes differ: nphr 1996 uses codes 0-2, whereas the rest of the nphr rounds use codes 1-3. Therefore, we need to be mindful of this during harmonization.
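One way to handle the shifted codes is sketched below on made-up rows. The codes follow Table 33, while the column names `ecoreg`, `ecoreg_code` and `ecoreg_lab` are hypothetical.

```r
library(dplyr)

# Toy pooled rows; the codes follow Table 33, the rows themselves are made up
ecoreg_toy <- tibble(
  svy_year = c(1996, 1996, 1996, 2001, 2016),
  ecoreg   = c(0, 1, 2, 1, 3)
)

ecoreg_harmonized <- ecoreg_toy |>
  mutate(
    # Shift the 1996 codes (0-2) onto the 1-3 scheme used from 2001 onwards
    ecoreg_code = if_else(svy_year == 1996, ecoreg + 1, ecoreg),
    # Attach one common set of labels to the harmonized codes
    ecoreg_lab = factor(
      ecoreg_code,
      levels = 1:3,
      labels = c("mountain", "hill", "terai")
    )
  )

ecoreg_harmonized
```

Without the shift, code 1 would mean "hill" in 1996 but "mountain" in every later round.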

Wealth index quintile variable

Next, we check the value labels of the household wealth quintile variable. The name of this variable differs across the nphr datasets. First we create a nested tibble of the value labels.

# Create the data dictionary in nested tibble
nphr1_pre_tmp1 <- nphr1_pre_tmp0 |> 
  mutate(lookfor_wiqt = map(
    nphr_data,
    \(df) {
      df |> 
        select(matches("^wlthind5$|^hv270$")) |> 
        look_for() |> 
        lookfor_to_long_format() |> 
        select(value_labels)
    }
  ))
nphr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary
nphr1_pre_tmp2 <- nphr1_pre_tmp1 |> 
  # First we select the required cols and unnest()
  select(c(ctr_name, svy_year, lookfor_wiqt)) |> 
  unnest(cols = c(lookfor_wiqt)) |> 
  # Next we make the num of value labels same across each round
  mutate(label_num = parse_number(value_labels)) |> 
  complete(ctr_name, svy_year, label_num) |>
  # Next we create col of value labels for each survey round
  pivot_wider(
    names_from = svy_year, 
    values_from = value_labels,
    names_prefix = "nphr_"
  ) |>
  # Show the variable name in a col
  mutate(var_name = "Wealth index quintiles", .before = 2)

# Convert the tibble to flextable for easy viewing
nphr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 34: Data dictionary of wealth quintiles across the nphr rounds

ctr_name	var_name	label_num	nphr_1996	nphr_2001	nphr_2006	nphr_2011	nphr_2016	nphr_2022
Nepal	Wealth index quintiles	1	[1] lowest quintile	[1] lowest quintile	[1] poorest	[1] poorest	[1] poorest	[1] poorest
Nepal	Wealth index quintiles	2	[2] second quintile	[2] second quintile	[2] poorer	[2] poorer	[2] poorer	[2] poorer
Nepal	Wealth index quintiles	3	[3] middle quintile	[3] middle quintile	[3] middle	[3] middle	[3] middle	[3] middle
Nepal	Wealth index quintiles	4	[4] fourth quintile	[4] fourth quintile	[4] richer	[4] richer	[4] richer	[4] richer
Nepal	Wealth index quintiles	5	[5] highest quintile	[5] highest quintile	[5] richest	[5] richest	[5] richest	[5] richest

Clearly, the value label codes are the same in all nphr rounds. However, the value label texts in nphr 1996 and 2001 differ from those in the nphr 2006, 2011, 2016 and 2022 rounds. Therefore, we need to be mindful of this during harmonization.
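Since only the label texts differ, the codes can be kept and one common label set attached to every round. The sketch below uses made-up vectors. The helper `relabel_quintile()` is hypothetical, and choosing the "poorest" to "richest" wording as the common set is an assumption.

```r
library(haven)

# Toy stand-ins for an early and a late round (values are made up)
wq_1996 <- labelled(
  c(1, 5, 3),
  labels = c("lowest quintile" = 1, "second quintile" = 2,
             "middle quintile" = 3, "fourth quintile" = 4,
             "highest quintile" = 5)
)
wq_2016 <- labelled(
  c(2, 4, 1),
  labels = c(poorest = 1, poorer = 2, middle = 3, richer = 4, richest = 5)
)

# Assumed common label set, taken from the later rounds
common_labels <- c(poorest = 1, poorer = 2, middle = 3, richer = 4, richest = 5)

# Hypothetical helper: keep the codes, replace the label texts
relabel_quintile <- function(x) {
  labelled(as.numeric(unclass(x)), labels = common_labels)
}

as_factor(relabel_quintile(wq_1996))
as_factor(relabel_quintile(wq_2016))
```

After relabelling, both rounds share identical label attributes, so the vectors can be appended without label conflicts.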

Nepal PR dataset use for family structure variables creation

Checking the ID variables before harmonization

Here we check the formatting of the constituent variables with which we will prepare the ID variables for the pooled Nepal person recode (pr) dataset. Along with the survey year, we will use the cluster number (hv001), the household number (hv002) and the line number (hvidx) as constituent variables for creating the ID variables for the pooled dataset.

# We check the var type of ID vars in all nppr datasets.
# First we create a data dictionary of the nppr datasets in nested tibble.
nppr1_pre_tmp1 <- nppr1_pre_tmp0 |>
  mutate(lookfor_idvars = map(nppr_data, \(df) {
    df |> 
      select(hv001, hv002, hvidx) |> 
      lookfor(details = "full") |> 
      select(-c(levels:n_na)) |> 
      # For correctly viewing the range column in data dictionary
      convert_list_columns_to_character()
  }))
nppr1_pre_tmp1

# Now we unnest the tibble and output the pooled data dictionary 
nppr1_pre_tmp2 <- nppr1_pre_tmp1 |> 
  select(c(ctr_name, svy_year, lookfor_idvars)) |> 
  unnest(cols = c(lookfor_idvars)) |> 
  arrange(pos)

# Convert and view the tibble as flextable
nppr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 35: Data dictionary of variables to be used for ID creation across the nppr rounds

ctr_name	svy_year	pos	variable	label	col_type	unique_values	range
Nepal	1996	1	hv001	cluster number	dbl	253	101 - 7502
Nepal	2001	1	hv001	cluster number	dbl	251	101 - 7502
Nepal	2006	1	hv001	cluster number	dbl	260	101 - 7502
Nepal	2011	1	hv001	cluster number	dbl	289	101 - 7502
Nepal	2016	1	hv001	cluster number	dbl	383	1 - 383
Nepal	2022	1	hv001	cluster number	dbl	476	1 - 476
Nepal	1996	2	hv002	household number	dbl	488	1 - 774
Nepal	2001	2	hv002	household number	dbl	605	1 - 9006
Nepal	2006	2	hv002	household number	dbl	568	1 - 1319
Nepal	2011	2	hv002	household number	dbl	636	1 - 1403
Nepal	2016	2	hv002	household number	dbl	427	1 - 963
Nepal	2022	2	hv002	household number	dbl	338	1 - 506
Nepal	1996	3	hvidx	line number	dbl	42	1 - 42
Nepal	2001	3	hvidx	line number	dbl	29	1 - 29
Nepal	2006	3	hvidx	line number	dbl	30	1 - 30
Nepal	2011	3	hvidx	line number	dbl	31	1 - 31
Nepal	2016	3	hvidx	line number	dbl	38	1 - 38
Nepal	2022	3	hvidx	line number	dbl	26	1 - 26

From the above table we can see that all the three constituent ID variables are of numeric class with no missing values. These variables can directly be used for preparing the ID variables after finding the maximum length of their largest value. Note that survey year is also a constituent ID variable of 4-digits and we need not check it.

# We thought to process the above nested tibble further by decomposing the 
# "range" col into min and max values using separate_wider_regex().
# However, we hit a roadblock as pattern did not identify the max values in 
# some nppr rounds correctly
nppr1_pre_tmp3 <- nppr1_pre_tmp0 |> 
  # Generate the summary stats for id vars
  mutate(skim_idvars = map(nppr_data, \(df) {
    df |> 
      select(hv001, hv002, hvidx) |> 
      skim_without_charts()
  })) |> 
  # Pool the summary stats for all nppr rounds
  select(c(ctr_name, svy_year, skim_idvars)) |> 
  unnest(cols = c(skim_idvars)) |> 
  arrange(skim_variable, svy_year) |> 
  # Group and generate the max and min values for each variable
  group_by(variable = skim_variable) |> 
  summarize(
    min_val = min(numeric.p0),
    max_val = max(numeric.p100)
  ) |> 
  # calculate the num of digits in the maximum values
  mutate(
    max_digits = nchar(as.character(max_val))
  ) |>
  # add variable labels and relocate it after variable name
  bind_cols(vlabel = c("cluster number", "household number", "Persons line number")) |>
  relocate(vlabel, .after = 1)

# Convert the tibble to flextable for easy viewing
nppr1_pre_tmp3 |>
  qflextable() |>
  align(align = "left", part = "all") |>
  autofit()

Table 36: The maximum length of constituent ID variables to be set across the nppr rounds

variable	vlabel	min_val	max_val	max_digits
hv001	cluster number	1	7502	4
hv002	household number	1	9006	4
hvidx	Persons line number	1	42	2

Checking Family structure variables before harmonization

Here we check the family structure related variables before harmonizing them. The variable names were collected by manually checking the full data dictionaries. Here we will check the data dictionary of these household member-level variables and focus on the variable types.

# We check the family structure vars in all nppr datasets.
# First we create the data dictionary in nested tibble.
nppr1_pre_tmp1 <- nppr1_pre_tmp0 |>
  mutate(lookfor_famstrvars = map(nppr_data, \(df) {
    df |> 
      # select the family structure variables
      select(c(hv101, hv102, hv103, hv104, hv105)) |> 
      lookfor(details = "full") |> 
      select(-c(levels:n_na)) |> 
      # For correctly viewing the range column in data dictionary
      convert_list_columns_to_character()
  }))
nppr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary 
nppr1_pre_tmp2 <- nppr1_pre_tmp1 |> 
  select(c(svy_year, lookfor_famstrvars)) |> 
  unnest(cols = c(lookfor_famstrvars)) |> 
  arrange(pos, svy_year)

# Convert the tibble to flextable for easy viewing
nppr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 37: Data dictionary of family structure vars across the nppr rounds

svy_year	pos	variable	label	col_type	missing	unique_values	range
1996	1	hv101	relationship to head	dbl+lbl	2	13	1 - 12
2001	1	hv101	relationship to head	dbl+lbl	0	12	1 - 12
2006	1	hv101	relationship to head	dbl+lbl	0	14	1 - 15
2011	1	hv101	relationship to head	dbl+lbl	0	12	1 - 12
2016	1	hv101	relationship to head	dbl+lbl	0	14	1 - 15
2022	1	hv101	relationship to head	dbl+lbl	0	14	1 - 15
1996	2	hv102	usual resident	dbl+lbl	0	2	0 - 1
2001	2	hv102	usual resident	dbl+lbl	0	2	0 - 1
2006	2	hv102	usual resident	dbl+lbl	0	2	0 - 1
2011	2	hv102	usual resident	dbl+lbl	0	2	0 - 1
2016	2	hv102	usual resident	dbl+lbl	0	2	0 - 1
2022	2	hv102	usual resident (househods with no de jure members)	dbl+lbl	0	2	0 - 1
1996	3	hv103	slept last night	dbl+lbl	8	3	0 - 1
2001	3	hv103	slept last night	dbl+lbl	0	2	0 - 1
2006	3	hv103	slept last night	dbl+lbl	0	2	0 - 1
2011	3	hv103	slept last night	dbl+lbl	0	2	0 - 1
2016	3	hv103	slept last night	dbl+lbl	0	2	0 - 1
2022	3	hv103	stayed last night	dbl+lbl	0	2	0 - 1
1996	4	hv104	sex of household member	dbl+lbl	0	2	1 - 2
2001	4	hv104	sex of household member	dbl+lbl	0	2	1 - 2
2006	4	hv104	sex of household member	dbl+lbl	0	2	1 - 2
2011	4	hv104	sex of household member	dbl+lbl	0	2	1 - 2
2016	4	hv104	sex of household member	dbl+lbl	0	2	1 - 2
2022	4	hv104	sex of household member	dbl+lbl	0	2	1 - 2
1996	5	hv105	age of household members	dbl+lbl	0	99	0 - 98
2001	5	hv105	age of household members	dbl+lbl	2	100	0 - 98
2006	5	hv105	age of household members	dbl+lbl	0	97	0 - 96
2011	5	hv105	age of household members	dbl+lbl	0	96	0 - 95
2016	5	hv105	age of household members	dbl+lbl	0	96	0 - 95
2022	5	hv105	age of household members	dbl+lbl	0	97	0 - 98

The above table gives an overall snapshot of the family structure variables. Interestingly, all the variables, including age of hh members (a continuous var), are of labelled class. The relationship to head and de facto resident variables have a few missing values in nppr 1996 (2 and 8 respectively), and age has 2 missing values in nppr 2001. Note that, of the three variables of interest hv101-hv103, two variables (hv101 and hv103) have different numbers of unique values across the nppr rounds. Next, we compare the value labels of the individual variables across the nppr datasets.
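
The comparison of unique-value counts across rounds can also be done programmatically rather than by eye. The sketch below is a minimal base R version that re-enters the `unique_values` column of Table 37 for hv101-hv103 by hand; the `dict` and `varying` object names are hypothetical, and in practice the pooled data dictionary tibble would presumably be used instead.

```r
# unique_values from Table 37 for the three variables of interest,
# in survey year order (1996, 2001, 2006, 2011, 2016, 2022)
dict <- data.frame(
  variable      = rep(c("hv101", "hv102", "hv103"), each = 6),
  svy_year      = rep(c(1996, 2001, 2006, 2011, 2016, 2022), times = 3),
  unique_values = c(13, 12, 14, 12, 14, 14,
                    2, 2, 2, 2, 2, 2,
                    3, 2, 2, 2, 2, 2)
)

# TRUE when a variable does not have the same count in every round
varying <- tapply(
  dict$unique_values, dict$variable,
  function(v) length(unique(v)) > 1
)

# Names of the variables that need a closer look
names(varying)[varying]
```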

hv101 - Relationship to head

Next, we check the value labels of the relationship to the household head variable. First we create a nested tibble of the value labels.

# Create the data dictionary in nested tibble
nppr1_pre_tmp1 <- nppr1_pre_tmp0 |> 
  mutate(lookfor_hv101 = map(nppr_data, \(df) {
    df |> 
      select(hv101) |> 
      look_for() |> 
      lookfor_to_long_format() |> 
      select(value_labels)
  }))
nppr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary
nppr1_pre_tmp2 <- nppr1_pre_tmp1 |> 
  # First we select the required cols and unnest()
  select(c(ctr_name, svy_year, lookfor_hv101)) |> 
  unnest(cols = c(lookfor_hv101)) |> 
  # Next we make the num of value labels same across each round
  mutate(label_num = parse_number(value_labels)) |> 
  complete(ctr_name, svy_year, label_num) |>
  # Next we create col of value labels for each survey round
  pivot_wider(
    names_from = svy_year, 
    values_from = value_labels,
    names_prefix = "nppr_"
  ) |>
  # Show the variable name in a col
  mutate(var_name = "hv101", .before = 2)

# Convert the tibble to flextable for easy viewing
nppr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 38: Data dictionary of relationship to head variable across the nppr rounds

ctr_name	var_name	label_num	nppr_1996	nppr_2001	nppr_2006	nppr_2011	nppr_2016	nppr_2022
Nepal	hv101	1	[1] head	[1] head	[1] head	[1] head	[1] head	[1] head
Nepal	hv101	2	[2] wife or husband	[2] wife or husband	[2] wife or husband	[2] wife or husband	[2] wife or husband	[2] wife or husband
Nepal	hv101	3	[3] son/daughter	[3] son/daughter	[3] son/daughter	[3] son/daughter	[3] son/daughter	[3] son/daughter
Nepal	hv101	4	[4] son/daughter-in-law	[4] son/daughter-in-law	[4] son/daughter-in-law	[4] son/daughter-in-law	[4] son/daughter-in-law	[4] son/daughter-in-law
Nepal	hv101	5	[5] grandchild	[5] grandchild	[5] grandchild	[5] grandchild	[5] grandchild	[5] grandchild
Nepal	hv101	6	[6] parent	[6] parent	[6] parent	[6] parent	[6] parent	[6] parent
Nepal	hv101	7	[7] parent-in-law	[7] parent-in-law	[7] parent-in-law	[7] parent-in-law	[7] parent-in-law	[7] parent-in-law
Nepal	hv101	8	[8] brother/sister	[8] brother/sister	[8] brother/sister	[8] brother/sister	[8] brother/sister	[8] brother/sister
Nepal	hv101	9	[9] co-spouse	[9] co-spouse	[9] co-spouse	[9] co-spouse	[9] co-spouse	[9] co-spouse
Nepal	hv101	10	[10] other relative	[10] other relative	[10] other relative	[10] other relative	[10] other relative	[10] other relative
Nepal	hv101	11	[11] adopted/foster child	[11] adopted/foster child	[11] adopted/foster child	[11] adopted/foster child	[11] adopted/foster child	[11] adopted/foster child
Nepal	hv101	12	[12] not related	[12] not related	[12] not related	[12] not related	[12] not related	[12] not related
Nepal	hv101	13			[13] niece/nephew by blood	[13] niece/nephew by blood	[13] niece/nephew	[13] niece/nephew by blood
Nepal	hv101	14			[14] niece/nephew by marriage	[14] niece/nephew by marriage	[14] niece/nephew by marriage	[14] niece/nephew by marriage
Nepal	hv101	15			[15] brother-in-law/sister-in-law		[15] brother/sister in law	[15] brother/sister in law
Nepal	hv101	98	[98] dk	[98] dk	[98] dk	[98] don't know	[98] don't know	[98] don't know

The above table shows that both the value label texts and the set of codes vary across the nppr rounds: codes 13-15 are absent in 1996 and 2001, and code 15 is absent in 2011. To harmonize the relationship to head variable we can use the following value labels:

1 head
2 spouse
3 child
4 child-in-law
5 grandchild
6 parent
7 parent-in-law
8 sibling
9 others

Here, we merge the “wife or husband” and “co-spouse” categories into the “spouse” category, and the “son/daughter” and “adopted/foster child” categories into the “child” category.
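
The harmonization scheme above could be sketched as a simple lookup in base R. The two merges follow the text; sending codes 10 and 12-15 to “others” and code 98 (don't know) to missing are assumptions made here for illustration, not decisions stated in this document. The names `hv101_map`, `hv101_labels` and `harmonize_hv101()` are hypothetical.

```r
# Lookup from original hv101 codes (names) to harmonized codes (values)
hv101_map <- c(
  `1` = 1, `2` = 2, `3` = 3, `4` = 4, `5` = 5, `6` = 6, `7` = 7, `8` = 8,
  `9`  = 2,  # co-spouse -> spouse
  `11` = 3,  # adopted/foster child -> child
  # Assumption: the remaining relationship codes go to "others"
  `10` = 9, `12` = 9, `13` = 9, `14` = 9, `15` = 9
)

hv101_labels <- c(
  "head", "spouse", "child", "child-in-law", "grandchild",
  "parent", "parent-in-law", "sibling", "others"
)

harmonize_hv101 <- function(x) {
  # Codes not in the lookup (e.g. 98 "don't know") and NA become NA
  new_code <- unname(hv101_map[as.character(x)])
  factor(new_code, levels = 1:9, labels = hv101_labels)
}

# Toy input covering both merges, an "others" code, 98 and a missing value
hv101_new <- harmonize_hv101(c(1, 2, 9, 11, 13, 98, NA))
hv101_new
```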

hv102 - de jure/usual resident

Next, we check the value labels of the de jure resident variable. This variable records whether a household member is a usual resident of the household. First we create a nested tibble of the value labels.

# Create the data dictionary in nested tibble
nppr1_pre_tmp1 <- nppr1_pre_tmp0 |> 
  mutate(lookfor_hv102 = map(nppr_data, \(df) {
    df |> 
      select(hv102) |> 
      look_for() |> 
      lookfor_to_long_format() |> 
      select(value_labels)
  }))
nppr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary
nppr1_pre_tmp2 <- nppr1_pre_tmp1 |> 
  # First we select the required cols and unnest()
  select(c(ctr_name, svy_year, lookfor_hv102)) |> 
  unnest(cols = c(lookfor_hv102)) |> 
  # Next we make the num of value labels same across each round
  mutate(label_num = parse_number(value_labels)) |> 
  complete(ctr_name, svy_year, label_num) |>
  # Next we create col of value labels for each survey round
  pivot_wider(
    names_from = svy_year, 
    values_from = value_labels,
    names_prefix = "nppr_"
  ) |>
  # Show the variable name in a col
  mutate(var_name = "hv102", .before = 2)

# Convert the tibble to flextable for easy viewing
nppr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 39: Data dictionary of the de jure resident variable across the nppr rounds

ctr_name	var_name	label_num	nppr_1996	nppr_2001	nppr_2006	nppr_2011	nppr_2016	nppr_2022
Nepal	hv102	0	[0] no	[0] no	[0] no	[0] no	[0] no	[0] no
Nepal	hv102	1	[1] yes	[1] yes	[1] yes	[1] yes	[1] yes	[1] yes

The above table shows that hv102 has the same value label texts and codes across the nppr rounds. Therefore, we can use this variable directly after converting to factor type.
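
A minimal sketch of the factor conversion, using base R on toy codes (the `hv102_raw` and `hv102_fct` names are hypothetical). On the labelled columns themselves, a helper such as `haven::as_factor()` or `labelled::to_factor()` would presumably do the same job while reading the labels from the data.

```r
# Toy hv102 codes; NA stands in for a missing response
hv102_raw <- c(0, 1, 1, NA, 0)

# Codes and labels as shown in Table 39
hv102_fct <- factor(hv102_raw, levels = c(0, 1), labels = c("no", "yes"))

# Missing values stay missing after the conversion
table(hv102_fct, useNA = "ifany")
```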

hv103 - de facto resident

Next, we check the value labels of the de facto resident variable. In DHS this variable records whether a household member slept in the household last night. First we create a nested tibble of the value labels.

# Create the data dictionary in nested tibble
nppr1_pre_tmp1 <- nppr1_pre_tmp0 |> 
  mutate(lookfor_hv103 = map(nppr_data, \(df) {
    df |> 
      select(hv103) |> 
      look_for() |> 
      lookfor_to_long_format() |> 
      select(value_labels)
  }))
nppr1_pre_tmp1

# Now we unnest the tibble and refine the pooled data dictionary
nppr1_pre_tmp2 <- nppr1_pre_tmp1 |> 
  # First we select the required cols and unnest()
  select(c(ctr_name, svy_year, lookfor_hv103)) |> 
  unnest(cols = c(lookfor_hv103)) |> 
  # Next we make the num of value labels same across each round
  mutate(label_num = parse_number(value_labels)) |> 
  complete(ctr_name, svy_year, label_num) |>
  # Next we create col of value labels for each survey round
  pivot_wider(
    names_from = svy_year, 
    values_from = value_labels,
    names_prefix = "nppr_"
  ) |>
  # Show the variable name in a col
  mutate(var_name = "hv103", .before = 2)

# Convert the tibble to flextable for easy viewing
nppr1_pre_tmp2 |> 
  qflextable() |> 
  align(align = "left", part = "all") |> 
  autofit()

Table 40: Data dictionary of the de facto resident variable across the nppr rounds

ctr_name	var_name	label_num	nppr_1996	nppr_2001	nppr_2006	nppr_2011	nppr_2016	nppr_2022
Nepal	hv103	0	[0] no	[0] no	[0] no	[0] no	[0] no	[0] no
Nepal	hv103	1	[1] yes	[1] yes	[1] yes	[1] yes	[1] yes	[1] yes

The above table shows that hv103 has the same value label texts and codes across the nppr rounds. Therefore, we can use this variable directly after converting to factor type.

START FROM HERE

TASK:

Handling multiple births in death scarring vars may not be necessary.
Preceding birth interval construction has changed with DHS-7. We could re-construct it.

TO BE CONTINUED …