The STEP (Skills Toward Employment and Productivity) Measurement program is the first ever initiative to generate internationally comparable data on skills available in developing countries. The program implements standardized surveys to gather information on the supply and distribution of skills and the demand for skills in the labor markets of low-income countries.
The uniquely designed Household Survey includes modules that measure the cognitive skills (reading, writing and numeracy), socio-emotional skills (personality, behavior and preferences) and job-specific skills (subset of transversal skills with direct job relevance) of a representative sample of adults aged 15 to 64 living in urban areas, whether they work or not. The cognitive skills module also incorporates a direct assessment of reading literacy based on the Survey of Adult Skills instruments. Modules also gather information about family, health and language.
Kind of data
Sample survey data [ssd]
Version 02, edited anonymous datasets for public distribution.
Version 01 was published in June 2014 and has been replaced by v02.
The difference between v02 and v01 datasets:
- Changes made to all STEP countries:
1) The literacy variables had incorrect labelling, which has now been fixed
2) The 'emp' variable has been cleaned
3) The 'write_dif' variable has been corrected
4) All monetary variables (identifiable by '_usd') have been converted to PPP dollars
The survey covers the urban areas of the two largest cities of Vietnam, Ha Noi and Ho Chi Minh City (HCMCT).
Unit of analysis
The units of analysis are the individual respondents and households. A household roster is undertaken at the start of the survey and the individual respondent is randomly selected among all household members aged 15 to 64 included. The random selection process was designed by the STEP team and compliance with the procedure is carefully monitored during fieldwork.
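The random selection of one respondent from the household roster can be illustrated with a minimal sketch. It is not the STEP team's actual procedure, which is not described here; the function name, the roster layout and the seed are assumptions made only for illustration.

```python
import random

def select_respondent(roster, rng, min_age=15, max_age=64):
    """Pick one eligible household member uniformly at random.

    roster: list of (member_id, age) tuples from the household roster.
    Returns (member_id, n_eligible), or (None, 0) if nobody is eligible.
    """
    # Keep only members aged 15 to 64, both ages included.
    eligible = [member_id for member_id, age in roster
                if min_age <= age <= max_age]
    if not eligible:
        return None, 0
    return rng.choice(eligible), len(eligible)

# Hypothetical roster: three members are in the eligible age range.
roster = [("m1", 43), ("m2", 41), ("m3", 14), ("m4", 67), ("m5", 19)]
chosen, n_eligible = select_respondent(roster, random.Random(2012))
print(chosen, n_eligible)
```

Returning the number of eligible members alongside the chosen one is a deliberate choice in this sketch: that count is what a person-level selection probability (1 / n_eligible) would be based on.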
The STEP target population is the population aged 15 to 64 included, living in urban areas, as defined by each country's statistical office. In Vietnam, the target population comprised all people from 15-64 years old living in urban areas in Ha Noi and Ho Chi Minh City (HCM).
The reasons for selecting these two cities include:
(i) They are the two biggest cities of Vietnam, so they have all the urban characteristics needed for the STEP study, and
(ii) It is less costly to conduct the STEP survey in these two cities than in all urban areas of Vietnam, given the limited survey budget.
- The target population is not representative of the national urban population.
The following are excluded from the sample:
- Residents of institutions (prisons, hospitals, etc)
- Residents of senior homes and hospices
- Residents of other group dwellings such as college dormitories, halfway homes, workers' quarters, etc
- Persons living outside the country at the time of data collection
Producers and sponsors
STEP Co-Task Team Leader, Education Global Practice
Maria Laura Sanchez Puerta
STEP Co-Task Team Leader, Social Protection and Labor Global Practice
World Bank Consultant Project Coordinator
Technical assistance in project management, data collection, data processing and data analysis
World Bank Consultant Senior Labor Economist
Technical assistance in project management, questionnaire design, and data analysis
World Bank Consultant Survey Consultant
Technical assistance in questionnaire design, sampling methodology, and data collection
Sebastian Monroy Taborda
World Bank Consultant Research Analyst
Technical assistance in data processing and data analysis
Multi-Donor Trust Fund Labor Markets, Job Creation and Economic Growth
Bank Netherlands Partnership Program
Educational Testing Services
Designed the Reading Literacy Assessment Module and conducted the preliminary analysis of the reading literacy data, including generating plausible values for the Extended Assessment
- The sample of 3405 households was selected from 227 urban Enumeration Areas (EAs) in Ha Noi (107 EAs) and Ho Chi Minh City (120 EAs). From each EA 15 households were selected, so the number of households selected in Ha Noi was 1245 HHs, and in HCM, 2160 HHs.
- The 2009 Population and Housing Census was used as a sample frame.
- Regarding PSUs (EAs), the sampling frame is the list of 15% of the total EAs of the 2009 Population Census. Data items on the frame for PSUs include province code, district code, commune code, and EA code; address of the EA; and number of households.
- Regarding ultimate sampling units (households), the sampling frame is a list of (100) households in each EA. Data items on the frame for ultimate sampling units (households) include names of heads of households.
The sample frame includes the list of urban EAs and the count of households for each EA. Changes to the EA list and household lists would affect the coverage of the sample frame. In a recent review of Ha Noi, only 3 of 140 randomly selected EAs (2%) were either new or destroyed. GSO would increase the coverage of the sample frame (>95% as standard) by updating the household lists of the selected EAs before selecting households for STEP.
A detailed description of the sample design is available in section 4 of the NSDPR provided with the metadata.
On completion of the household listing operation, GSO will deliver to the World Bank a copy of the lists, and an Excel spreadsheet with the total number of households listed in each of the 227 visited PSUs.
The response rate for Vietnam (urban) was 62%. (See STEP Methodology Note Table 4).
While the Vietnam Phase 2 three-stage stratified cluster design greatly enhanced the operational feasibility of data collection, it resulted in differential probabilities of selection for the selected persons. Consequently, each selected person in the survey does not necessarily represent the same number of persons in the target population. To account for differential probabilities of selection due to the nature of the design and to ensure accurate survey estimates, STEP requires a sampling weight for each person that participated in the survey.
In general, the objectives of the STEP weighting are to construct a set of survey weights to:
1) compensate for unequal probabilities of selection;
2) compensate for household-level non-response and person-level non-response;
3) adjust the weighted sample distribution for key variables of interest (for example, age, gender, education) so that it conforms to a known population distribution for these variables.
The general weighting procedure for the Vietnam STEP survey required the following tasks.
1) Creation of a data file to input into the weighting process;
2) World Bank (WB) Weight Requirement:
Create survey weights for sampled cases of households and persons that provided sufficient data to be considered a participant in the survey. This requirement does not necessarily include the completion of an assessment General Booklet, and an Exercise Booklet, and all household and individual questionnaire modules.
a) Calculation of a PSU weight for 227 sampled PSUs;
b) Calculation of a household weight for each sampled household;
i) Calculation of a household-level non-response adjustment independently for each PSU.
c) Calculation of a person weight for each selected person (SP);
i) Calculation of a non-response adjustment independently for each sampled person.
3) The required output from the weighting process is a final Viet Nam data file with the survey design weights (i.e., for each sampled PSU, household, person) appended to each data record.
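The chain of weights in tasks 2a to 2c can be sketched as a product of inverse selection probabilities and non-response adjustments. This is a simplified illustration, not the weighting code used for STEP: the function name, its parameters and the example figures are assumptions, and the calibration to known population distributions (objective 3) is not shown.

```python
def final_person_weight(p_psu, hh_listed, hh_sampled, hh_responding,
                        n_eligible, person_nr_adj=1.0):
    """Combine design weights and non-response adjustments for one person.

    p_psu          probability that the PSU was selected (assumed known)
    hh_listed      households listed in the PSU
    hh_sampled     households sampled in the PSU
    hh_responding  sampled households in the PSU that responded
    n_eligible     eligible members (aged 15-64) in the person's household
    person_nr_adj  person-level non-response adjustment factor
    """
    psu_weight = 1.0 / p_psu                    # a) PSU weight
    hh_weight = hh_listed / hh_sampled          # b) household weight within the PSU
    hh_nr_adj = hh_sampled / hh_responding      # b-i) adjustment computed per PSU
    person_weight = float(n_eligible)           # c) one person chosen among n_eligible
    return psu_weight * hh_weight * hh_nr_adj * person_weight * person_nr_adj

# Hypothetical figures: 15 of 100 listed households sampled, 12 responded,
# and the selected person lives with two other eligible adults.
w = final_person_weight(p_psu=0.05, hh_listed=100, hh_sampled=15,
                        hh_responding=12, n_eligible=3)
print(round(w, 2))
```

Under these assumed inputs the selected person stands in for roughly 500 people in the target population, which illustrates why persons in the same survey can carry very different weights.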
Dates of collection
Mode of data collection
Data collection supervision
Number of Team leaders
The STEP main study will have 8 team leaders; each team leader will be responsible for a survey team with 4 interviewers.
Team leader Responsibilities
A team leader is directly responsible for all activities of a survey team in the field. Team leaders contact the local authorities to ensure that all necessary conditions (such as local guides) are in place for the survey team's work with households, and they oversee the interviewers' daily activities. GSO will ensure that a rigorous supervision process and mechanisms (including spot-checks) are in place to ascertain appropriate implementation of the survey (verifying adherence to the sample selected), correct implementation of tests, and adherence to established interview protocols. Team leaders participate in interviews to guarantee the proper implementation of survey procedures and the quality of questionnaires, and they deal with problems of household sample replacement or respondent collaboration. Together with the regional supervisors, team leaders will sit in on interviews or contact households/respondents by phone to check or re-interview, in order to deal with survey problems, especially non-response or the structure of the completed sample.
A field supervisor will revisit each household in the following situations:
a. A household refuses or does not begin the interview because of special circumstances (result codes 1 or 2).
b. A household stops before finishing the Household Module, Module 1.
c. A household where the selected individual is not able to begin the questionnaire - for refusal, for special circumstance, absence, other reasons.
d. A household where the individual stops without finishing the individual modules 2-7.
e. A household where the individual stops during the Reading Exercises Module and refuses to attempt all the items.
If any interviewer's work is found to be suspect, the interviewer will be dismissed and all of the interviews done by that interviewer will be redone in their entirety. Team leaders will report all problems (through a weekly report or by phone) to the Regional Supervisors or to the Central GSO team.
Number of Regional Supervisors
The STEP main study will have at least 4 Regional Supervisors who will be supervised by a Central GSO team (Project Manager, Field Manager, Survey Methodologist and Data Manager).
Regional Supervisor Responsibilities
Together with GSO, they will be responsible for supervising the survey teams in the regions (Ha Noi and HCMCT). They will sit in on interviews to help interviewers improve data quality; check the completed questionnaires to find errors and resolve them before delivery to GSO; contact households/respondents by phone to re-interview; and report weekly to the Central GSO team on all technical and fieldwork-organization issues as well as solutions to the problems. Through these activities, the regional supervisors will identify problems and solutions regarding non-response and the structure of the completed sample. In particular, the supervisors will verify each interviewer's visits by revisiting 5% of the households in each interviewer assignment and following up a further 5% of households in each interviewer assignment by telephone. The households involved in the verification process will be randomly selected within each PSU. If any interviewer's work is found to be suspect, the interviewer will be dismissed and all of the interviews done by that interviewer will be redone in their entirety.
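The 5% revisit and further 5% telephone follow-up can be sketched as two disjoint random draws from one interviewer's assignment. This is only an illustration: the function name, the rounding rule (round up, at least one household) and the household identifiers are assumptions, and the selection within each PSU is not modelled.

```python
import random

def verification_sample(households, rng, percent=5):
    """Draw disjoint revisit and telephone samples from one assignment.

    households: list of household identifiers assigned to one interviewer.
    Returns (revisit, phone); each holds about `percent` percent of the list.
    """
    # Integer ceiling of percent% of the assignment, with a minimum of one.
    n = max(1, (len(households) * percent + 99) // 100)
    shuffled = list(households)
    rng.shuffle(shuffled)
    revisit = shuffled[:n]          # households revisited in person
    phone = shuffled[n:2 * n]       # a further share followed up by phone
    return revisit, phone

# Hypothetical assignment of 60 households for one interviewer.
assignment = [f"hh{i:02d}" for i in range(60)]
revisit, phone = verification_sample(assignment, random.Random(7))
print(len(revisit), len(phone))
```

Slicing both groups from a single shuffled list guarantees that no household is drawn for both the revisit and the telephone check.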
Progress Reporting: Each week during the survey period, GSO will submit to the WB Team a data file containing all the entered survey data to date.
The STEP survey instruments include:
(i) a Background Questionnaire developed by the WB STEP team
(ii) a Reading Literacy Assessment developed by Educational Testing Services (ETS).
All countries adapted and translated both instruments following the STEP Technical Standards: 2 independent translators adapted and translated the Background Questionnaire and Reading Literacy Assessment, while reconciliation was carried out by a third translator.
The WB STEP team and ETS collaborated closely with the survey firms during the process and reviewed the adaptation and translation to Vietnamese (using a back translation).
- The survey instruments were both piloted as part of the survey pretest.
- The adapted Background Questionnaires are provided in English as external resources. The Reading Literacy Assessment is protected by copyright and will not be published.
Vietnam General Statistical Office
STEP Data Management Process
1. Raw data is sent by the survey firm.
2. The WB STEP team runs data checks on the Background Questionnaire data.
- ETS runs data checks on the Reading Literacy Assessment data.
- Comments and questions are sent back to the survey firm.
3. The survey firm reviews comments and questions. When a data entry error is identified, the survey firm corrects the data.
4. The WB STEP team and ETS check that the data files are clean. This might require additional iterations with the survey firm.
5. Once the data has been checked and cleaned, the WB STEP team computes the weights. Weights are computed by the STEP team to ensure consistency across sampling methodologies.
6. ETS scales the Reading Literacy Assessment data.
7. The WB STEP team merges the Background Questionnaire data with the Reading Literacy Assessment data and computes derived variables.
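Step 7 amounts to joining two files on a respondent identifier. The sketch below shows one way to do this and to flag respondents without assessment data; the key name resp_id and the other field names are hypothetical and do not come from the STEP datasets.

```python
def merge_on_id(background, assessment, key="resp_id"):
    """Left-join assessment records onto background records by `key`.

    Both inputs are lists of dicts. Returns (merged, unmatched_ids), where
    unmatched_ids lists background respondents with no assessment record.
    """
    by_id = {row[key]: row for row in assessment}
    merged, unmatched = [], []
    for row in background:
        match = by_id.get(row[key])
        if match is None:
            unmatched.append(row[key])
            merged.append(dict(row))          # keep the respondent, no scores
        else:
            merged.append({**row, **match})   # combine both sets of fields
    return merged, unmatched

# Hypothetical records: respondent 102 has no assessment data.
background = [{"resp_id": 101, "age": 34}, {"resp_id": 102, "age": 51}]
assessment = [{"resp_id": 101, "score": 247.5}]
merged, unmatched = merge_on_id(background, assessment)
print(len(merged), unmatched)
```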
Detailed information on data processing in STEP surveys is provided in the 'Guidelines for STEP Data Entry Programs' document provided as an external resource. The template do-file used by the STEP team to check the raw background questionnaire data is provided as an external resource.
Data entry processes, including team composition, are described in each country's NSDPR. In most countries, data entry took place at the survey firm's headquarters. Note that in the case of Colombia, as interviews were conducted using handheld devices, data entry per se was only carried out for the Reading Literacy Assessment data.
1. Background Questionnaire Data
For the Background Questionnaire data, survey firms could use the WB STEP Data Entry Program (DEP) or design their own. In the latter case, the WB STEP team checked their DEP to ensure it complied with STEP Technical Standards. The STEP DEP was developed in Excel and mirrored the Background Questionnaire. Vietnam used the STEP DEP.
(i) Countries which used the STEP DEP
- Yunnan Province of China
(ii) Countries which developed their own DEP in CSPro
- Lao PDR
- Sri Lanka
Standards for Data Entry are detailed in the 'Guidelines for STEP Data Entry Programs' and summarized in the NSDPR. Double data entry process was required. All range checks and skips were controlled by the program. Consistency checks were also included in the data entry program.
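Double data entry and range checks can be illustrated with a short sketch. It does not reproduce the STEP DEP, which was built in Excel; the function names, field names and valid ranges below are assumptions chosen only to show the idea.

```python
def compare_entries(first_pass, second_pass):
    """List every disagreement between two independent data entry passes.

    Each pass maps a record id to a dict of field values. Returns tuples of
    (record_id, field, value_in_first_pass, value_in_second_pass).
    """
    mismatches = []
    for record_id in sorted(set(first_pass) | set(second_pass)):
        rec1 = first_pass.get(record_id, {})
        rec2 = second_pass.get(record_id, {})
        for field in sorted(set(rec1) | set(rec2)):
            if rec1.get(field) != rec2.get(field):
                mismatches.append(
                    (record_id, field, rec1.get(field), rec2.get(field)))
    return mismatches

def out_of_range(record, valid_ranges):
    """Return the fields of `record` whose values fall outside their range."""
    return [field for field, (low, high) in valid_ranges.items()
            if field in record and not low <= record[field] <= high]

# Hypothetical entries: the second pass transposed the digits of one age.
first = {"001": {"age": 34, "years_educ": 12},
         "002": {"age": 27, "years_educ": 9}}
second = {"001": {"age": 34, "years_educ": 12},
          "002": {"age": 72, "years_educ": 9}}
diffs = compare_entries(first, second)
flags = out_of_range(second["002"], {"age": (15, 64)})
print(diffs, flags)
```

In this example the two checks are complementary: the comparison catches the keying disagreement, and the range check shows which of the two values cannot be correct for a respondent aged 15 to 64.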
2. Reading Literacy Assessment Data
All survey firms were required to score the Reading Literacy Assessment booklets and to enter the data using the Data Entry Program developed by ETS. Double data entry process was required. Consistency checks were also included in the data entry program.
Weighting documentation was prepared for each participating country and provides some information on sampling errors.
The weighting documentation for all countries is provided as an external resource.
Public use files, accessible to all
STEP Skills Measurement Program, Household Survey 2014, The World Bank. Ref: VNM_2012_STEP-HH_v02_M. Dataset downloaded from [URL] on [date]
Disclaimer and copyrights
The user of the data acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.
(c) STEP 2014, The World Bank
Development Economics Data Group
The World Bank
Documentation of the DDI
Version 02 (March 2016)
Changes in v02 of study documentation compared to v01 published in June 2014
- v01 datasets were replaced with v02
- Study Title, Series Information and Abstract were edited