Vocational Training Grant Fund Impact Evaluation 2011-2016, Baseline and Follow-up Surveys
The impact evaluation of the Vocational Training Grant Fund (VTGF) subactivity in Namibia used a random assignment design to determine the effects of VTGF-funded scholarships for vocational training on recipients' training and labor market outcomes, such as employment and earnings. Under this design, eligible applicants to each VTGF-funded training in which the number of applications exceeded the number of available slots were randomly assigned by the training provider either to a group that was offered a VTGF scholarship (treatment group) or one that was not (control group). The treatment and control groups for each training were expected to be equivalent, on average, except for the offer of VTGF funding. Therefore, differences in the outcomes of the treatment and control groups measured about one year after the end of training could be attributed to the impact of the VTGF funding. As described in the VTGF final evaluation report, the impact evaluation found that the scholarship offer substantially increased participation in and completion of vocational training, but that this did not translate into positive impacts on employment, earnings, or income. The impact evaluation was complemented by an implementation analysis, which drew on qualitative data collected close to the end of the compact; the implementation findings were provided in an interim evaluation report covering all three subactivities.
Kind of Data
Sample survey data [ssd]
Unit of Analysis
Anonymized dataset for public distribution
Vocational training providers throughout Namibia.
Applicants to VTGF-funded trainings throughout Namibia who were randomly assigned to treatment and control groups.
Producers and sponsors
Mathematica Policy Research
Millennium Challenge Corporation
The targeted sample for the VTGF evaluation consists of all applicants to VTGF-funded trainings who were randomly assigned to the treatment and control groups.
For the baseline survey, there were 1,892 unique applicants to the 28 trainings included in the evaluation, including 955 assigned to the treatment group and 937 assigned to the control group. Of these applicants, 55 (3 percent) applied to multiple trainings; these applicants were linked to the first included training for purposes of the evaluation. Of the 1,892 unique applicants, 1,406 completed a baseline survey, and constitute the analytic sample used for the VTGF baseline analysis.
For the follow-up survey, 2 of the 28 trainings initially included in the evaluation were dropped, as the scheduled follow-up fell outside the evaluation period. There were 1,801 unique applicants to the remaining 26 trainings included in the evaluation, including 889 assigned to the treatment group and 912 assigned to the control group. Of the 1,801 unique applicants, 1,250 completed a follow-up survey (642 in the treatment group and 608 in the control group), and constitute the analytic sample used for the VTGF follow-up analysis.
Deviations from the Sample Design
The follow-up sample used for the impact analysis covers 26 VTGF-funded trainings, which is not the full set of trainings funded by the subactivity (the baseline sample included an additional 2 trainings that were subsequently dropped). Specifically, the follow-up sample excludes 27 trainings for which there was no control group (typically because there were sufficient slots to accommodate all applicants), 22 trainings for which the follow-up survey date (one year after the end of training) would fall outside of the evaluation period, and 9 trainings for which there were severe violations of random assignment. These 58 excluded trainings comprise about half of the total number of VTGF-funded trainees.
The response rate to the baseline survey was 74 percent (78 percent in the treatment group and 71 percent in the control group).
The response rate to the follow-up survey was 69.4 percent (72.2 percent in the treatment group and 66.6 percent in the control group).
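As a quick arithmetic check, the response rates can be recomputed from the applicant and respondent counts reported in the sampling description. The sketch below is illustrative only; the function and dictionary names are hypothetical and are not part of the data package.

```python
def response_rate(completed, targeted):
    """Return the response rate as a percentage of the targeted sample."""
    return 100.0 * completed / targeted

# (completed surveys, targeted applicants), as reported in this document.
counts = {
    "baseline, overall": (1406, 1892),
    "follow-up, overall": (1250, 1801),
    "follow-up, treatment": (642, 889),
    "follow-up, control": (608, 912),
}

for label, (completed, targeted) in counts.items():
    print(f"{label}: {response_rate(completed, targeted):.1f}%")
```

The follow-up control-group ratio is exactly two-thirds, so it shows as 66.7 when rounded to one decimal place and as 66.6 when truncated.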
No weights were used in the main analysis either at baseline or at follow-up.
However, as a robustness check, we estimated results at follow-up using non-response weights. These weights were designed to make the weighted follow-up sample reflect the applicant sample in terms of its distribution across trainings. To create these weights, we weighted each follow-up respondent by the inverse of the response rate in the training to which they applied, separately by treatment status. We then top-coded these weights at 3 standard deviations above the mean for the full sample (separately by treatment status) to account for outliers and normalized the sum of the weights (again separately by treatment status) to equal the number of observations. The non-response weight variable is called t1_weight. The results applying these weights were very similar to the unweighted results.
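The weighting steps described above can be sketched as follows. This is a minimal illustration, not the evaluation's actual code: the function name, the record layout (dictionaries with `id`, `training`, and `treated` keys), and the use of the population standard deviation for the top-code are all assumptions.

```python
from collections import Counter
from statistics import mean, pstdev

def nonresponse_weights(applicants, respondent_ids):
    """Return {applicant id: weight} for follow-up respondents.

    applicants: list of dicts with keys "id", "training", "treated" (hypothetical layout).
    respondent_ids: set of ids that completed the follow-up survey.
    """
    respondents = [a for a in applicants if a["id"] in respondent_ids]
    applied = Counter((a["training"], a["treated"]) for a in applicants)
    responded = Counter((a["training"], a["treated"]) for a in respondents)

    # Step 1: inverse of the response rate within each training, by treatment status.
    raw = {}
    for a in respondents:
        cell = (a["training"], a["treated"])
        raw[a["id"]] = applied[cell] / responded[cell]

    weights = {}
    for arm in (True, False):
        ids = [a["id"] for a in respondents if a["treated"] == arm]
        if not ids:
            continue
        values = [raw[i] for i in ids]
        # Step 2: top-code at 3 SDs above the mean (population SD is an assumption).
        cap = mean(values) + 3 * pstdev(values)
        capped = {i: min(raw[i], cap) for i in ids}
        # Step 3: normalize so weights sum to the number of respondents in the arm.
        scale = len(ids) / sum(capped.values())
        weights.update({i: w * scale for i, w in capped.items()})
    return weights
```

Because the normalization is done within each arm, the weights shift the composition of respondents across trainings without changing the effective size of either group.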
Dates of Data Collection
Baseline (conducted by MCA-Namibia)
Baseline (conducted by NORC/Survey Warehouse)
Baseline (conducted by NORC/Survey Warehouse)
Follow-up (conducted by MPR/Survey Warehouse)
Two MCA-N staff members collected the portion of the baseline data for which MCA-N was responsible. The remainder of the baseline data, collected by NORC/SW, were collected by a handful of interviewers who worked from NORC's office in Windhoek. These interviewers were supervised by a senior on-site SW staff member. NORC staff provided support as required.
The follow-up data were collected by a handful of interviewers who worked from SW's office in Windhoek. These interviewers were supervised by a senior on-site SW staff member. NORC staff (first six months of data collection) and Mathematica staff (remaining data collection period) provided support as required.
Data Collection Notes
For the baseline survey, MCA-N and NORC/Survey Warehouse surveyed eligible applicants between December 2011 and July 2014. This period corresponds roughly to that during which random assignment was conducted for various VTGF-funded trainings. Ideally, applicants for a given training would be surveyed after random assignment (at which point their telephone contact information was made available for survey purposes) but before the start of training, so that the baseline data would provide information on VTGF applicants before training. However, in practice, the baseline survey was almost always conducted after training had started, often several months later (the mean delay was 3.4 months after the start of training, and the median was 2 months after the start of training). Less than 1 percent of the sample was surveyed on or before the training start date.
Baseline interviews were conducted in English using a computer-assisted telephone interview system. Between December 2011 and August 2012, MCA-N staff conducted the interviews. For the remainder of the data collection period, staff from Survey Warehouse conducted the interviews from their offices in Windhoek, with oversight by on-site supervisors. NORC managed the data collection effort by Survey Warehouse, providing training and ongoing oversight and support.
The follow-up survey was conducted from March 2014 to April 2016. Although the plan was for the follow-up survey to occur roughly one year after the scheduled end of each training, in practice the timing varied considerably (between 6 and 28 months). However, the median was close to one year after the end of training (13 months).
Follow-up interviews were conducted in English, Afrikaans, or Oshiwambo using a computer-assisted telephone interview system. Survey Warehouse collected these data between March and July 2014, with oversight from NORC. Mathematica took over oversight of the follow-up data collection in February 2015 (when the next cohort was due for follow-up) through the end of the follow-up survey period in April 2016.
The VTGF baseline survey was originally developed by Millennium Challenge Account-Namibia (MCA-N). It was designed as a computer-assisted survey to be conducted by telephone, in English. The survey collected data on basic demographic characteristics of the applicants, together with a range of outcome measures that focused on the applicants' vocational training history, employment status, and earnings and income. Minor changes were made to the instrument when NORC/Survey Warehouse took over the data collection from MCA-N, and again when Mathematica joined the evaluation. These involved adjusting the wording of some questions, adding or removing some questions, and making some changes in question order and skip patterns. Despite these changes, the basic survey instrument and methodology remained similar over time, enabling us to combine data from different periods for the analysis. The questionnaire, marked to show changes over time, is provided as part of the baseline data package.
The VTGF follow-up survey was developed by Mathematica, and was also a computer-assisted survey that was conducted by telephone. The survey was developed in English and was translated into Afrikaans and Oshiwambo; the translated versions were used for respondents who were not comfortable in English. The survey included the following modules: (1) education and vocational training; (2) employment and earnings; (3) income and household demographics; and (4) health behaviors (related to HIV/AIDS and pregnancy). The questionnaires (in all languages) are provided as part of the follow-up data package.
For the baseline data, MCA-N cleaned the data that they collected and provided a clean SPSS data file to NORC. NORC cleaned the data collected by SW, combined it with the MCA-N data, and provided a clean SPSS file to Mathematica. Mathematica conducted additional cleaning of this combined file in Stata, which included checking the validity of variable values and ranges; verifying skip patterns; cleaning and back-coding common "other-specify" responses; creating binaries of categorical variables; checking and correcting for duplicate observations (applicants who applied to multiple trainings and were surveyed twice); and recoding skips, missing data, and other non-response values to standardized lettered indicators. Mathematica then merged these data with a database of eligible training applicants to identify the training to which each individual applied, as well as their assigned treatment status. Applicants who applied to multiple trainings were assigned to the first training to which they applied for analytic purposes. Only trainees who applied for the 28 trainings initially included in the evaluation were retained in the final baseline dataset.
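The rule for applicants who applied to multiple trainings can be illustrated with a short sketch. It assumes that "first" is determined by an application date and that each application is a tuple of applicant ID, training ID, and date; both the record layout and the function name are hypothetical, and the actual cleaning was done in Stata.

```python
from datetime import date

def link_to_first_training(applications):
    """Map each applicant to the earliest training they applied to.

    applications: iterable of (applicant_id, training_id, applied_on) tuples,
    where applied_on is a datetime.date (an assumed ordering criterion).
    """
    first = {}
    for applicant_id, training_id, applied_on in applications:
        current = first.get(applicant_id)
        # Keep the application with the earliest date for this applicant.
        if current is None or applied_on < current[1]:
            first[applicant_id] = (training_id, applied_on)
    return {applicant: training for applicant, (training, _) in first.items()}

# Hypothetical example: applicant 101 applied to two trainings.
example = [
    (101, "T07", date(2013, 5, 2)),
    (101, "T03", date(2012, 9, 14)),
    (102, "T07", date(2013, 5, 3)),
]
linked = link_to_first_training(example)
```

Keeping one record per applicant in this way ensures that each individual contributes a single observation and a single treatment status to the analysis.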
For the follow-up data, Mathematica conducted cleaning of the raw data file provided by Survey Warehouse in Stata, similar to the cleaning conducted on the baseline data. Mathematica then merged these data with variables from the baseline survey dataset that would be used in the follow-up analysis. These variables were those related to the training to which each individual applied, their assigned treatment status, basic demographic characteristics, and pre-VTGF training experience. The remaining variables in the baseline dataset were not used in the follow-up analysis and were therefore not included in the follow-up dataset. Only trainees who applied for the 26 trainings included in the final evaluation appear in the follow-up dataset.
Baseline data collected by MCA-N were entered into CSPro during the telephonic interviews with respondents. A subset of interviews was validated by having other MCA-N staff contact respondents again with a short list of questions included in the original survey, and comparing the responses.
For the remaining baseline data and the follow-up data, SW enumerators directly entered survey responses into a web-based system, allowing for real-time logic and consistency checks. SW placed a strong emphasis on gathering high-quality data from respondents. To this end, multiple supervision and quality control measures took place:
1. The supervisors regularly reviewed cases to ensure that surveys were properly completed and consistent, and that the respondent was correctly identified.
2. During data collection, supervisors back-checked 15% of the sample, spread out evenly across all the enumerators. For the back-checks, supervisors called the survey respondents to ask them key survey questions using a validation form. This validation form covered items such as confirmation that the interview took place, the approximate time taken by the interview, and a check of critical variables for completeness. The responses obtained during validation were entered by SW staff for comparison to responses obtained during the interview itself.
3. Supervisors observed 5% of interviews by listening in, to ensure the proper execution of the survey and provide constructive feedback to enumerators.
4. While conducting interviews, enumerators used a paper-based form to track issues such as item non-response or system/computer errors during the interview, and entered the data directly into a comment section once the interview was complete.
Estimates of Sampling Error
The survey data were intended to cover the universe of applicants to the included trainings, and did not involve any sampling. The only source of error in the estimated means is survey non-response. Users can therefore rely on standard formulae to calculate the sampling error for the estimated means.
Standard errors for differences between the treatment and control groups were estimated in a linear regression framework that accounted for training fixed effects. No other adjustments to the standard errors were necessary.
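A regression of an outcome on a treatment indicator with training fixed effects can be sketched by demeaning both variables within each training and regressing the demeaned outcome on the demeaned treatment indicator. The example below is a minimal illustration under stated assumptions: it uses conventional (homoskedastic) standard errors, includes no other covariates, and the function name and input layout are hypothetical. It is not the evaluation's actual estimation code.

```python
from collections import defaultdict
from math import sqrt

def fe_treatment_effect(rows):
    """Estimate the treatment-control difference with training fixed effects.

    rows: iterable of (training_id, treated, outcome), with treated coded 0/1.
    Returns (estimate, standard_error).
    """
    by_training = defaultdict(list)
    for training, treated, outcome in rows:
        by_training[training].append((treated, outcome))

    # Demean treatment and outcome within each training (absorbs fixed effects).
    t_dm, y_dm = [], []
    for obs in by_training.values():
        t_bar = sum(t for t, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for t, y in obs:
            t_dm.append(t - t_bar)
            y_dm.append(y - y_bar)

    sxx = sum(t * t for t in t_dm)
    beta = sum(t * y for t, y in zip(t_dm, y_dm)) / sxx

    # Residual degrees of freedom: observations minus fixed effects minus one slope.
    dof = len(t_dm) - len(by_training) - 1
    ssr = sum((y - beta * t) ** 2 for t, y in zip(t_dm, y_dm))
    se = sqrt(ssr / dof / sxx)
    return beta, se
```

Trainings in which all respondents share the same treatment status contribute nothing to the estimate, because their demeaned treatment indicator is zero for every observation.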
Monitoring and Evaluation Division of the Millennium Challenge Corporation
Public use files, accessible to all.
Mamun, Arif, Evan Borkum, Malik Mubeen, and Linus Marco. "Evaluation of the Vocational Training Grant Fund (VTGF) Subactivity Baseline Survey, 2011-2014". Data submitted to the Millennium Challenge Corporation. Washington, DC: Mathematica Policy Research, September 2015.
Borkum, Evan, Arif Mamun, and Malik Mubeen. "Evaluation of the Vocational Training Grant Fund (VTGF) Subactivity Follow-up Survey, 2014-2016". Data submitted to the Millennium Challenge Corporation. Washington, DC: Mathematica Policy Research, September 2017.
The user of the data acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.
DDI Document ID
Mathematica Policy Research
Date of Metadata Production
DDI Document version
Version 01 (March 2019): Edited version based on Version 2 (DDI-MCC-NAM-MPR-VTGF-2017-v2) that was done by Millennium Challenge Corporation.