The STEP (Skills Toward Employment and Productivity) Measurement program is the first-ever initiative to generate internationally comparable data on skills available in developing countries. The program implements standardized surveys to gather information on the supply and distribution of skills and the demand for skills in the labor markets of low-income countries.
The uniquely designed Household Survey includes modules that measure the cognitive skills (reading, writing and numeracy), socio-emotional skills (personality, behavior and preferences) and job-specific skills (subset of transversal skills with direct job relevance) of a representative sample of adults aged 15 to 64 living in urban areas, whether they work or not. The cognitive skills module also incorporates a direct assessment of reading literacy based on the Survey of Adult Skills instruments. Modules also gather information about family, health and language.
Kind of data
Sample survey data [ssd]
Version 02, edited anonymous datasets for public distribution.
Version 01 was published in June 2014, but is now replaced with v02.
The difference between v02 and v01 datasets:
- Changes made to all STEP countries:
1) The literacy variables had incorrect labelling, which has now been fixed
2) The 'emp' variable has been cleaned
3) The 'write_dif' variable has been corrected
4) All monetary variables (identifiable by '_usd') have been converted to PPP dollars
The STEP target population is the urban population aged 15 to 64 (inclusive). Sri Lanka sampled both urban and rural areas.
Areas are classified as rural or urban based on each country's official definition.
Unit of analysis
The units of analysis are the individual respondents and households. A household roster is compiled at the start of the survey, and the individual respondent is randomly selected from among all household members aged 15 to 64 (inclusive). The random selection process was designed by the STEP team, and compliance with the procedure is carefully monitored during fieldwork.
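The respondent selection described above can be illustrated with a minimal sketch. It assumes a simple equal-probability draw among eligible roster members; the exact procedure designed by the STEP team is not described in this document, and the field names (member_id, age) are hypothetical.

```python
import random


def select_respondent(roster, rng, min_age=15, max_age=64):
    """Pick one eligible household member with equal probability.

    roster: list of dicts with hypothetical keys 'member_id' and 'age'.
    Returns the chosen member, or None if nobody is eligible.
    """
    eligible = [m for m in roster if min_age <= m["age"] <= max_age]
    if not eligible:
        return None
    return rng.choice(eligible)


# Illustrative roster (made-up values, not survey data).
roster = [
    {"member_id": 1, "age": 47},
    {"member_id": 2, "age": 44},
    {"member_id": 3, "age": 12},  # below 15, not eligible
    {"member_id": 4, "age": 70},  # above 64, not eligible
]
rng = random.Random(2012)  # fixed seed so the draw can be reproduced
chosen = select_respondent(roster, rng)
print(chosen["member_id"])
```

Seeding the generator is one way to make a draw reproducible for later monitoring; it is an illustrative choice, not a documented STEP requirement.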
The target population for the Sri Lanka STEP survey comprised all non-institutionalized persons 15 to 64 years of age (inclusive) living in private dwellings in urban and rural areas of Sri Lanka at the time of data collection.
The target population excludes:
- Foreign diplomats and non-nationals working for international organizations;
- People in institutions such as hospitals or prisons;
- Collective dwellings or group quarters;
- Persons living outside the country at the time of data collection, e.g., students at foreign universities;
- Persons who are unable to complete the STEP assessment due to a physical or mental condition, e.g., visual impairment or paralysis.
The sample frame for the selection of first stage sample units was the Census 2011/12.
Producers and sponsors
STEP Co-Task Team Leader, Education Global Practice
Maria Laura Sanchez Puerta
STEP Co-Task Team Leader, Social Protection and Labor Global Practice
World Bank Consultant Project Coordinator
Technical assistance in project management, data collection, data processing and data analysis
World Bank Consultant Senior Labor Economist
Technical assistance in project management, questionnaire design, and data analysis
World Bank Consultant Survey Consultant
Technical assistance in questionnaire design, sampling methodology, and data collection
Sebastian Monroy Taborda
World Bank Consultant Research Analyst
Technical assistance in data processing and data analysis
Multi-Donor Trust Fund Labor Markets, Job Creation and Economic Growth
Bank Netherlands Partnership Program
Educational Testing Services
Designed the Reading Literacy Assessment Module and conducted the preliminary analysis of the reading literacy data, including generating plausible values for the Extended Assessment
The Sri Lanka sample size was 2,989 households.
The sample design is a five-stage stratified design. The stratification variable is the urban-rural indicator.
First Stage Sample
The primary sample unit (PSU) is a Grama Niladari (GN) division. The sampling objective was to conduct interviews in 200 GNs, consisting of 80 urban GNs and 120 rural GNs. Because there was some concern that it might not be possible to conduct any interviews in some initially selected GNs (e.g., due to war, conflict, or inaccessibility for some other reason), the sampling strategy also called for the selection of 60 extra GNs (i.e., 24 urban GNs and 36 rural GNs) to be held in reserve for such eventualities. Hence, a total of 260 GNs were selected, consisting of 200 'initial' GNs and 60 'reserve' GNs. Two GNs from the initial sample were not accessible, and reserve GNs were used instead. Thus a total of 202 GNs were activated for data collection, and interviews were conducted in 200 GNs. The sample frame for the selection of first stage sample units was the list of GNs from the Census 2011/12. Note: The sample of first stage sample units was selected by the Sri Lanka Department of Census & Statistics (DCS) and provided to the World Bank. The DCS selected the GNs with probability proportional to size (PPS), where the measure of size was the number of dwellings in a GN.
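To illustrate PPS selection with dwelling counts as the measure of size, the sketch below uses systematic PPS sampling. This is an assumption made for illustration: the document does not state which PPS algorithm the DCS applied, and the GN names and dwelling counts are made up. In practice such a draw would be run separately within the urban and rural strata.

```python
import random


def pps_systematic(sizes, n, rng):
    """Select n units with probability proportional to size (systematic PPS).

    sizes: dict mapping unit id -> measure of size (here, dwelling count).
    A unit whose size exceeds the sampling interval can be hit more than once.
    """
    total = sum(sizes.values())
    interval = total / n
    start = rng.random() * interval  # random start in [0, interval)
    targets = iter(start + k * interval for k in range(n))
    target = next(targets, None)
    selected = []
    cumulative = 0.0
    for unit, size in sizes.items():
        cumulative += size
        # Every target that falls inside this unit's size range selects it.
        while target is not None and target < cumulative:
            selected.append(unit)
            target = next(targets, None)
    return selected


# Hypothetical frame: GN identifier -> number of dwellings.
dwellings_per_gn = {"GN-A": 400, "GN-B": 250, "GN-C": 350, "GN-D": 1000}
selected = pps_systematic(dwellings_per_gn, n=2, rng=random.Random(2011))
print(selected)
```

In this toy frame GN-D holds half of all dwellings, so with n=2 its inclusion probability is 2 × 1000 / 2000 = 1 and it is always selected; the second selection falls on one of the other three GNs in proportion to their dwelling counts.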
Second Stage Sample
The second stage sample unit (SSU) is a GN segment, i.e., a GN Block. One GN Block was selected from each activated PSU (i.e., GN). According to the Sri Lanka survey firm, each sampled GN was divided into a number of segments, i.e., GN Blocks, with approximately the same number of households, and one GN Block was selected from each sampled GN.
Third Stage Sample
The third stage sample unit is a dwelling. The sampling objective was to obtain interviews at 15 dwellings within each selected SSU.
Fourth Stage Sample
The fourth stage sample unit is a household. The sampling objective was to select one household within each selected third stage dwelling.
Fifth Stage Sample
The fifth stage sample unit is an individual aged 15-64 (inclusive). The sampling objective was to select one individual with equal probability from each selected household.
Please refer to the Sri Lanka STEP Survey Weighting Procedures Summary for additional information on sampling.
The response rate for Sri Lanka (urban and rural) was 63%. (See STEP Methodology Note Table 4).
The Sri Lanka five-stage stratified cluster design resulted in differential probabilities of selection for the selected persons. Consequently, each selected person in the survey does not necessarily represent the same number of persons in the target population. To account for differential probabilities of selection due to the nature of the design and to ensure accurate survey estimates, the Sri Lanka STEP requires a sampling weight for each person that participated in the survey.
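As a rough illustration of why selection probabilities differ across persons, the sketch below computes a base (design) weight as the inverse of the product of the stage-level selection probabilities. The probabilities are hypothetical values chosen for the example and are not taken from the Sri Lanka weighting documentation.

```python
from math import prod


def design_weight(stage_probabilities):
    """Return the base weight: the inverse of the overall selection probability."""
    probs = list(stage_probabilities)
    if any(not 0 < p <= 1 for p in probs):
        raise ValueError("each selection probability must lie in (0, 1]")
    return 1.0 / prod(probs)


# Hypothetical selection probabilities for one respondent (illustrative only).
stages = {
    "gn": 0.02,            # stage 1: GN selected with PPS
    "gn_block": 1 / 4,     # stage 2: one of 4 blocks in the GN
    "dwelling": 15 / 120,  # stage 3: 15 of 120 dwellings in the block
    "household": 1.0,      # stage 4: the only household in the dwelling
    "person": 1 / 3,       # stage 5: one of 3 eligible adults
}
weight = design_weight(stages.values())
print(round(weight))  # this respondent represents about 4800 persons
```

A respondent from a household with more eligible adults, or from a larger GN Block, would receive a different weight under the same logic.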
In general, the objectives of the Sri Lanka STEP weighting are to construct a set of survey weights to:
1) Compensate for unequal probabilities of selection;
2) Compensate for household-level non-response and person-level non-response;
3) Adjust the weighted sample distribution for key variables of interest (for example, age, gender, education) so that it conforms to a known population distribution for these variables.
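Objective 3 can be illustrated with simple cell post-stratification, in which weights are rescaled so that weighted counts match known population totals in each cell. This is a sketch under assumptions: the document does not state which calibration method was used, and the cell labels, weights and population totals below are made up.

```python
from collections import defaultdict


def poststratify(records, population_totals):
    """Scale weights so weighted counts per cell match known population totals.

    records: list of dicts with hypothetical keys 'cell' and 'weight'.
    population_totals: dict mapping cell -> known population count.
    """
    weighted = defaultdict(float)
    for r in records:
        weighted[r["cell"]] += r["weight"]
    adjusted = []
    for r in records:
        factor = population_totals[r["cell"]] / weighted[r["cell"]]
        adjusted.append({**r, "weight": r["weight"] * factor})
    return adjusted


# Illustrative cells defined by gender and age group (made-up numbers).
records = [
    {"id": 1, "cell": "F_15-24", "weight": 100.0},
    {"id": 2, "cell": "F_15-24", "weight": 300.0},
    {"id": 3, "cell": "M_15-24", "weight": 200.0},
]
population_totals = {"F_15-24": 600.0, "M_15-24": 250.0}
adjusted = poststratify(records, population_totals)
print([r["weight"] for r in adjusted])
```

Here the female cell is scaled by 600 / 400 = 1.5 and the male cell by 250 / 200 = 1.25, so the adjusted weights sum to the assumed population totals.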
The general weighting procedure for the Sri Lanka STEP survey required the following tasks.
1) Creation of a data file to input to the weighting process;
2) World Bank (WB) Weight Requirement:
Create survey weights for sampled cases of households and persons that provided sufficient data to be considered a participant in the survey. This requirement does not necessarily include the completion of an assessment General Booklet, nor does it necessarily include the completion of all household and individual questionnaire modules.
a) Calculation of a PSU weight for 200 sampled PSUs (i.e., 80 urban PSUs, 120 rural PSUs);
b) Calculation of a household weight for each sampled household;
   i) Calculation of a household-level non-response adjustment independently for each PSU.
c) Calculation of a person weight for each selected person (SP);
   i) Calculation of a non-response adjustment independently for each sampled person.
3) The required output from the weighting process is a final Sri Lanka data file with the survey design weights (i.e., for each sampled PSU, household, person) appended to each data record.
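The household-level non-response adjustment in task 2(b) can be sketched as a weighting-class adjustment computed within each PSU: responding households' weights are inflated so that they also represent the non-responding households in the same PSU. The formula is a common textbook approach and is assumed here; the precise adjustment used for Sri Lanka is given in the weighting documentation, and the field names and values below are hypothetical.

```python
from collections import defaultdict


def nonresponse_adjust(households):
    """Inflate respondent weights within each PSU to cover non-respondents.

    households: list of dicts with hypothetical keys 'psu', 'weight', 'responded'.
    Returns only the responding households, with adjusted weights.
    """
    sampled_total = defaultdict(float)
    respondent_total = defaultdict(float)
    for h in households:
        sampled_total[h["psu"]] += h["weight"]
        if h["responded"]:
            respondent_total[h["psu"]] += h["weight"]
    adjusted = []
    for h in households:
        if not h["responded"]:
            continue
        factor = sampled_total[h["psu"]] / respondent_total[h["psu"]]
        adjusted.append({**h, "weight": h["weight"] * factor})
    return adjusted


# One illustrative PSU: four sampled households, three respondents.
households = [
    {"hh": 1, "psu": "GN-001", "weight": 50.0, "responded": True},
    {"hh": 2, "psu": "GN-001", "weight": 50.0, "responded": True},
    {"hh": 3, "psu": "GN-001", "weight": 50.0, "responded": False},
    {"hh": 4, "psu": "GN-001", "weight": 50.0, "responded": True},
]
adjusted_households = nonresponse_adjust(households)
print(sum(h["weight"] for h in adjusted_households))
```

The three respondents each receive a factor of 200 / 150, so their adjusted weights sum to the same total (200) as the four sampled households.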
Dates of collection
Mode of data collection
Data collection supervision
Number of Supervisors
The STEP main study will tentatively have 110 Interviewers and 23 Supervisors, who will be supervised by a Senior Field Executive reporting to the Field Manager.
As mentioned in the TOR, Nielsen will ensure that a rigorous supervision process and mechanism (including spot-checks) is in place to ascertain appropriate implementation of the survey (verifying adherence to the selected sample), correct implementation of tests, and adherence to established interview protocols. In particular, the supervisors will verify each interviewer's visits by revisiting 15% of the households in each interviewer assignment and following up with a further 15% of the households in each interviewer assignment by telephone. The households involved in the verification process will be randomly selected within each PSU. However, telephone penetration in rural areas is very low. In our experience, a significant share of respondents in Sri Lanka do not agree to share their telephone numbers, and we cannot expect respondents to share even cell numbers. Because the interview collects a large amount of information and lasts more than two hours, obtaining a telephone number will be a challenge for some respondents. We therefore suggested 25% revisits and 5% telephone follow-ups as more realistic. If any interviewer's work is found to be suspicious, the interviewer will be dismissed and all of the interviews done by that interviewer will be redone in their entirety. Nielsen suggests that, if suspicious work is confirmed, those interviews be carried out again with replacement samples, since re-interviewing the same respondent is not acceptable; these aspects will be discussed at a later stage. As commented by the WB team, if the interviewer has actually been in the household (HH), the HH can be replaced, but if the interviewer made up the answers without being in the household, or made up all the individual answers, the original HH should be used.
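The random selection of households for verification can be sketched as follows, using the suggested shares of 25% revisits and 5% telephone follow-ups. The function, the rounding rule (rounding up) and the household identifiers are assumptions made for illustration; the document does not describe how the survey firm implemented the draw.

```python
import math
import random


def select_verification(household_ids, rng, revisit_pct=25, phone_pct=5):
    """Randomly split an interviewer's households into verification groups.

    Returns (revisit_ids, phone_ids); the two groups never overlap.
    Shares are rounded up so that small assignments still get checked.
    """
    ids = list(household_ids)
    rng.shuffle(ids)
    n_revisit = math.ceil(len(ids) * revisit_pct / 100)
    n_phone = math.ceil(len(ids) * phone_pct / 100)
    return ids[:n_revisit], ids[n_revisit:n_revisit + n_phone]


# Hypothetical assignment of 20 completed households for one interviewer.
assignment = [f"HH{n:03d}" for n in range(1, 21)]
revisit, phone = select_verification(assignment, random.Random(15))
print(len(revisit), len(phone))  # 5 revisits and 1 telephone follow-up
```

Drawing both groups from a single shuffled list keeps them disjoint, so no household is contacted twice for verification.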
As presented and discussed during the Washington training, wherever possible the supervisors will have to carry out the selection of the households (second stage of the sampling design) in their respective PSUs, as newly recommended in the Technical Standards for the Design and Implementation of the STEP survey. They will submit the listings of dwellings and the corresponding sample selection to the Senior Field Executive.
The supervisor will also be in charge of following-up with the households which refused the interview, in order to try and convert these households to taking the interview. If successful, an interviewer will be dispatched to interview that household.
Each of the Supervisors will supervise about 3 - 4 interviewers. The supervisors' responsibilities will include:
• Attend and participate in the interviewer training
• Assign cases to their interviewers and monitor productivity, expenses, etc.
• Hold a weekly meeting with each interviewer to review the status of each of their cases, find out how much they have worked, review any problem situations, and motivate them to finish on time; they will also need to be available to receive calls from interviewers who have problems throughout the week
• Monitor the progress of data collection, review nonresponse reported by the interviewers, and implement reassignment and conversion procedures
• Review interviewers' reporting of time and expenses
• Validate a designated fraction (15-25%) of each interviewer's work by visiting or telephoning the respondent and asking a brief set of questions
• Edit the data collected from each interviewer if required
• Report to the Senior Field Executive/Manager on a weekly basis (or more frequently if a problem arises) on the progress of the survey in their district/province
Progress Reporting: Nielsen will submit to the WB Team, on a biweekly basis, a data file containing all survey data entered to date.
The STEP survey instruments include:
(i) A Background Questionnaire developed by the WB STEP team.
(ii) A Reading Literacy Assessment developed by Educational Testing Services (ETS).
All countries adapted and translated both instruments following the STEP Technical Standards: two independent translators adapted and translated the Background Questionnaire and Reading Literacy Assessment, while reconciliation was carried out by a third translator.
- The survey instruments were both piloted as part of the survey pretest.
- The adapted Background Questionnaires are provided in English as external resources. The Reading Literacy Assessment is protected by copyright and will not be published.
The Nielsen Company Lanka (Pvt) Ltd
STEP Data Management Process
1. Raw data is sent by the survey firm.
2. The WB STEP team runs data checks on the Background Questionnaire data.
- ETS runs data checks on the Reading Literacy Assessment data.
- Comments and questions are sent back to the survey firm.
3. The survey firm reviews comments and questions. When a data entry error is identified, the survey firm corrects the data.
4. The WB STEP team and ETS check that the data files are clean. This might require additional iterations with the survey firm.
5. Once the data has been checked and cleaned, the WB STEP team computes the weights. Weights are computed by the STEP team to ensure consistency across sampling methodologies.
6. ETS scales the Reading Literacy Assessment data.
7. The WB STEP team merges the Background Questionnaire data with the Reading Literacy Assessment data and computes derived variables.
Detailed information on data processing in STEP surveys is provided in the 'Guidelines for STEP Data Entry Programs' document provided as an external resource. The template do-file used by the STEP team to check the raw background questionnaire data is provided as an external resource.
Data entry processes, including team composition, are described in each country's NSDPR. In most countries, data entry took place at the survey firm's headquarters. Note that in the case of Colombia, as interviews were conducted using handheld devices, data entry per se was only carried out for the Reading Literacy Assessment data.
1. Background Questionnaire Data
For the Background Questionnaire data, survey firms could use the WB STEP Data Entry Program (DEP) or design their own. In the latter case, the WB STEP team checked their DEP to ensure it complied with STEP Technical Standards. The STEP DEP was developed in Excel and mirrored the Background Questionnaire. Sri Lanka developed its own DEP in CSPro.
(i) Countries which used the STEP DEP
- Yunnan Province of China
(ii) Countries which developed their own DEP in CSPro
- Lao PDR
- Sri Lanka
Standards for Data Entry are detailed in the 'Guidelines for STEP Data Entry Programs' and summarized in the NSDPR. A double data entry process was required. All range checks and skips were controlled by the program. Consistency checks were also included in the data entry program.
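The double data entry and range checks described above can be illustrated with a minimal sketch. It is written in Python for readability and is not the CSPro program used in Sri Lanka; the record identifiers, variable names and permitted ranges are hypothetical.

```python
def compare_entries(first_pass, second_pass):
    """List every disagreement between two independent keying passes.

    Each pass maps record id -> {field: value}.
    Returns tuples of (record_id, field, first_value, second_value).
    """
    mismatches = []
    for record_id in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(record_id, {})
        b = second_pass.get(record_id, {})
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                mismatches.append((record_id, field, a.get(field), b.get(field)))
    return mismatches


def out_of_range(record, ranges):
    """Return the fields in a record whose values fall outside the allowed range."""
    return [
        field
        for field, (low, high) in ranges.items()
        if field in record and not low <= record[field] <= high
    ]


# Illustrative data: the two passes disagree on one respondent's age.
first_pass = {"HH001": {"age": 34, "hh_size": 4}, "HH002": {"age": 51, "hh_size": 2}}
second_pass = {"HH001": {"age": 43, "hh_size": 4}, "HH002": {"age": 51, "hh_size": 2}}
mismatches = compare_entries(first_pass, second_pass)
flagged = out_of_range({"age": 71, "hh_size": 3}, {"age": (15, 64), "hh_size": (1, 30)})
print(mismatches, flagged)
```

Each mismatch would be resolved against the paper questionnaire, while a flagged range violation points to either a keying error or an ineligible respondent.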
2. Reading Literacy Assessment Data
All survey firms were required to score the Reading Literacy Assessment booklets and to enter the data using the Data Entry Program developed by ETS. A double data entry process was required. Consistency checks were also included in the data entry program.
Weighting documentation was prepared for each participating country and provides some information on sampling errors. It is provided as an external resource.
World Bank. Sri Lanka STEP Skills Measurement Household Survey 2012 (Wave 1). Ref. LKA_2012_STEP-HH_v02_M. Dataset downloaded from [URL] on [date].
Disclaimer and copyrights
The user of the data acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.
(c) STEP 2014, The World Bank
Development Economics Data Group
The World Bank
Documentation of the DDI
Version 02 (March 2016)
Changes in v02 of study documentation compared to v01 published in June 2014
- v01 datasets were replaced with v02
- Study Title, Series Information and Abstract were edited