The STEP (Skills Toward Employment and Productivity) Measurement program is the first-ever initiative to generate internationally comparable data on skills available in developing countries. The program implements standardized surveys to gather information on the supply and distribution of skills and the demand for skills in the labor markets of low-income countries.
The uniquely designed Household Survey includes modules that measure the cognitive skills (reading, writing and numeracy), socio-emotional skills (personality, behavior and preferences) and job-specific skills (a subset of transversal skills with direct job relevance) of a representative sample of adults aged 15 to 64 living in urban areas, whether they work or not. The cognitive skills module also incorporates a direct assessment of reading literacy based on the Survey of Adult Skills instruments. Modules also gather information about family, health and language.
Kind of data
Sample survey data [ssd]
Version 02, edited anonymous datasets for public distribution.
Version 01 was published in June 2014 and has since been replaced by v02.
Differences between the v02 and v01 datasets:
- Changes made specifically to Georgia:
1) Labelling of 'yes' and 'no' responses was standardized to labelling in other countries (0 is no and 1 is yes)
2) The labelling of 'other' was also standardized to labelling in other countries (now 7 instead of 19 or 9)
- Changes made to all STEP countries:
1) The literacy variables had incorrect labelling, which has now been fixed
2) The 'emp' variable has been cleaned
3) The 'write_dif' variable has been corrected
4) All monetary variables (identifiable by '_usd') have been converted to PPP dollars
The capital and other urban areas.
Unit of analysis
The units of analysis are the individual respondents and households. A household roster is undertaken at the start of the survey and the individual respondent is randomly selected among all household members aged 15 to 64 included. The random selection process was designed by the STEP team and compliance with the procedure is carefully monitored during fieldwork.
The STEP target population consists of individuals aged 15 to 64 living in urban areas.
In Georgia, the following groups were excluded:
- Residents of institutions (prisons, hospitals, etc.)
- Residents of senior homes and hospices
- Residents of other group dwellings (college dormitories, halfway homes, workers' quarters, etc.)
- Persons living outside the country at the time of data collection
- Persons living in the conflict regions Abkhazia and South Ossetia
Producers and sponsors
STEP Co-Task Team Leader, Education Global Practice
Maria Laura Sanchez Puerta
STEP Co-Task Team Leader, Social Protection and Labor Global Practice
World Bank Consultant, Project Coordinator
Technical assistance in project management, data collection, data processing and data analysis
World Bank Consultant, Senior Labor Economist
Technical assistance in project management, questionnaire design, and data analysis
World Bank Consultant, Survey Consultant
Technical assistance in questionnaire design, sampling methodology, and data collection
Sebastian Monroy Taborda
World Bank Consultant, Research Analyst
Technical assistance in data processing and data analysis
Multi-Donor Trust Fund Labor Markets, Job Creation and Economic Growth
Bank Netherlands Partnership Program
Educational Testing Services
Designed the Reading Literacy Assessment Module and conducted the preliminary analysis of the reading literacy data, including generating plausible values for the Extended Assessment
A stratified 3-stage sample design was implemented in Georgia. The sample was stratified by five geographic areas: Capital, Other Urban Northeast, Other Urban Northwest, Other Urban Southeast, and Other Urban Southwest.
The primary sample unit (PSU) is an electoral precinct; each PSU is uniquely defined by the sample frame variable 'PrecinctID'. The first stage units were selected by a World Bank survey methodologist. The sampling objective was to select 225 PSUs, comprising 200 initial PSUs and 25 reserve PSUs. All 225 PSUs were selected, but only the 200 initial PSUs were activated; none of the 25 reserve PSUs was activated during data collection. The PSUs were selected using a systematic probability proportional to size (PPS) sampling method, where the measure of size was the estimated number of persons 15 to 64 years old in a PSU (final data file variable 'estimatedsize').
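Systematic PPS selection of the kind described above can be illustrated with a minimal sketch. This is a generic textbook version of the technique, not the code used by the World Bank survey methodologist; the function name, the toy sizes, and the random start convention are assumptions made for illustration.

```python
import random
from itertools import accumulate

def systematic_pps(sizes, n, rng=None):
    """Select n units by systematic PPS sampling.

    sizes: measure of size per unit (e.g. estimated persons aged 15-64).
    Returns the indices of the selected units in frame order. A unit whose
    size exceeds the sampling interval can be selected more than once.
    """
    rng = rng or random.Random()
    total = sum(sizes)
    interval = total / n                # sampling interval on the size scale
    start = rng.random() * interval     # random start in [0, interval)
    cumulative = list(accumulate(sizes))

    selected = []
    idx = 0
    for k in range(n):
        point = start + k * interval
        # Advance to the unit whose cumulative size range contains the point.
        while idx < len(cumulative) - 1 and cumulative[idx] <= point:
            idx += 1
        selected.append(idx)
    return selected

def selection_probability(size, sizes, n):
    """First-stage inclusion probability of a unit under PPS (capped at 1)."""
    return min(1.0, n * size / sum(sizes))

# Hypothetical frame of 10 equally sized precincts, selecting 5 of them.
toy_sizes = [10] * 10
picked = systematic_pps(toy_sizes, 5, random.Random(1))
```

With equal sizes the method reduces to ordinary systematic sampling, so the selected indices are evenly spaced through the frame.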
The second stage sample unit (SSU) is a household. The sampling objective was to obtain interviews at 15 households within each selected PSU. The households were selected in each PSU using a systematic random method. For the 'Other Urban' strata, the number of sampled households was doubled in each selected electoral precinct to provide sufficient sample under a scenario of a 50% response rate; in other words, a reserve sample of 15 households per PSU was selected in each 'Other Urban' stratum. For the Capital stratum, the initial sample size was adjusted to allow for an expected response rate of 44%, so a reserve sample of 19 households per PSU was selected. In addition, during the survey fieldwork there were 14 PSUs in the Capital stratum where, due to higher than expected non-response, the survey firm drew an additional 34 reserve households. The Precinct IDs for the 14 PSUs are: 2015, 2025, 2033, 2060, 2076, 3003, 3012, 3022, 3029, 3071, 3083, 4014, 5070, and 6106.
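The reserve sample sizes quoted above follow from dividing the target number of interviews by the expected response rate. The short sketch below reproduces that arithmetic; the rounding rule is an assumption, since the source states only the resulting figures (15 reserve households for 'Other Urban', 19 for the Capital).

```python
def households_to_draw(target_interviews, expected_response_rate):
    """Households to sample so the expected completes reach the target.

    Rounding to the nearest whole household is an assumption made here.
    """
    return round(target_interviews / expected_response_rate)

def reserve_households(target_interviews, expected_response_rate):
    """Households drawn beyond the initial target."""
    return households_to_draw(target_interviews, expected_response_rate) - target_interviews

other_urban_reserve = reserve_households(15, 0.50)  # 30 drawn in total
capital_reserve = reserve_households(15, 0.44)      # 34 drawn in total
```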
The third stage sample unit was an individual 15-64 years old (inclusive). The sampling objective was to select one individual with equal probability from each selected household.
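Equal-probability selection of one respondent from a household roster can be sketched as follows. The actual STEP selection procedure was designed by the STEP team and is not reproduced here; the roster structure and field names are hypothetical.

```python
import random

def select_respondent(roster, rng=None):
    """Pick one eligible household member (aged 15-64 inclusive) at random.

    roster: list of dicts with an 'age' key (hypothetical structure).
    Returns the selected member, or None if nobody is eligible.
    """
    rng = rng or random.Random()
    eligible = [member for member in roster if 15 <= member["age"] <= 64]
    if not eligible:
        return None
    return rng.choice(eligible)

def within_household_probability(roster):
    """Selection probability of each eligible member: 1 / number eligible."""
    count = sum(1 for member in roster if 15 <= member["age"] <= 64)
    return 1.0 / count if count else 0.0
```

Because the probability is one over the number of eligible members, respondents from larger households have a smaller chance of selection, which is one source of the differential selection probabilities that the survey weights correct for.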
An overall response rate of 62.6% was achieved in the Georgia STEP Survey. Table 18 in "STEP Weighting Procedures Summary" provides detailed percentage distribution by final status code.
While the 3-stage stratified sample design greatly enhanced the operational feasibility of data collection, it resulted in differential probabilities of selection for the selected persons. Consequently, each selected person in the survey does not necessarily represent the same number of persons in the target population. To account for differential probabilities of selection due to the nature of the design and to ensure accurate survey estimates, STEP requires a sampling weight for each person that participated in the survey.
The objectives of the STEP weighting are to construct a set of survey weights to compensate for unequal probabilities of selection, and to compensate for household-level non-response and person-level non-response.
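The general logic of such weights can be sketched as an inverse-probability base weight followed by a non-response adjustment. This is a generic illustration under simplifying assumptions (a single adjustment cell, made-up probabilities); the procedure actually applied is documented in "STEP Weighting Procedures Summary".

```python
def base_weight(p_psu, p_household, p_person):
    """Inverse of the overall selection probability across the three stages."""
    return 1.0 / (p_psu * p_household * p_person)

def nonresponse_adjust(weights, responded):
    """Redistribute non-respondents' weight to respondents within one cell.

    weights: base weights of all sampled units in the adjustment cell.
    responded: parallel list of booleans.
    Respondents' weights are inflated so the cell total is preserved;
    non-respondents receive a weight of zero.
    """
    total = sum(weights)
    responding_total = sum(w for w, r in zip(weights, responded) if r)
    if responding_total == 0:
        raise ValueError("no respondents in the adjustment cell")
    factor = total / responding_total
    return [w * factor if r else 0.0 for w, r in zip(weights, responded)]

# Illustrative values only, not taken from the Georgia sample.
w = base_weight(0.1, 0.5, 0.25)
adjusted = nonresponse_adjust([10, 10, 20, 20], [True, False, True, False])
```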
Detailed information about weighting procedures is available in "STEP Weighting Procedures Summary", provided in external resources.
Dates of collection
Mode of data collection
Data collection supervision
Each interviewer team reports to a team supervisor. Interviewers must hand over to their supervisor properly filled questionnaires and reading exercise booklets (for Reading Literacy Assessment), and report all information about the fieldwork conducted.
Team supervisors are responsible for coordinating fieldwork, monitoring interviewers' work, documenting non-response, assigning reading exercise booklets and communicating regularly with a field manager. Also, once the household listing exercise is completed, the team supervisor randomly selects 15 households to be interviewed in the primary sampling unit (PSU), as well as reserve households that may be required to be activated (used) in the case of a non-response by one of the originally selected 15 households.
Field supervision details are outlined in "National Survey Design Planning Report" and "Interviewer's Manual and Team Supervisor's Manual", available in external resources.
The STEP survey instruments include:
- The background questionnaire developed by the World Bank (WB) STEP team
- Reading Literacy Assessment developed by Educational Testing Services (ETS).
All countries adapted and translated both instruments following the STEP technical standards: two independent translators adapted and translated the STEP background questionnaire and Reading Literacy Assessment, while reconciliation was carried out by a third translator.
The WB STEP team and ETS collaborated closely with the Georgian survey firm during the process and reviewed the adaptation and translation to Georgian using a back translation.
The survey instruments were piloted as part of the survey pre-test.
The background questionnaire covers such topics as respondents' demographic characteristics, dwelling characteristics, education and training, health, employment, job skill requirements, personality, behavior and preferences, language and family background.
The background questionnaire, the structure of the Reading Literacy Assessment and Reading Literacy Data Codebook are provided in the document "Georgia STEP Skills Measurement Survey Instruments", available in external resources.
Georgia Caucasus Research Resource Center for Georgia
STEP data management process:
1) Raw data is sent by the survey firm
2) The World Bank (WB) STEP team runs data checks on the background questionnaire data. Educational Testing Services (ETS) runs data checks on the Reading Literacy Assessment data. Comments and questions are sent back to the survey firm.
3) The survey firm reviews comments and questions. When a data entry error is identified, the survey firm corrects the data.
4) The WB STEP team and ETS check if the data files are clean. This might require additional iterations with the survey firm.
5) Once the data has been checked and cleaned, the WB STEP team computes the weights. Weights are computed by the STEP team to ensure consistency across sampling methodologies.
6) ETS scales the Reading Literacy Assessment data.
7) The WB STEP team merges the background questionnaire data with the Reading Literacy Assessment data and computes derived variables.
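The merge in step 7 can be pictured as a join of the two data files on a shared respondent identifier. The sketch below is only an illustration: the key name 'respondent_id' and the example fields are hypothetical, and the STEP team's own processing is described in the guidelines cited below.

```python
def merge_by_id(background, assessment, key="respondent_id"):
    """Attach assessment fields to background records that share the same ID.

    Background records without a matching assessment record are kept as-is.
    """
    scores = {row[key]: row for row in assessment}
    merged = []
    for row in background:
        extra = scores.get(row[key], {})
        merged.append({**row, **extra})
    return merged

# Made-up records for illustration.
background = [{"respondent_id": 1, "age": 30}, {"respondent_id": 2, "age": 45}]
assessment = [{"respondent_id": 1, "booklet": "A"}]
combined = merge_by_id(background, assessment)
```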
Detailed information on data processing in STEP surveys is provided in the "STEP Guidelines for Data Processing" document, available in external resources. The template do-file used by the STEP team to check raw background questionnaire data is also provided as an external resource.
Data entry processes are described in the National Survey Design Planning Report (NSDPR), available as an external resource. In most countries, data entry took place at the survey firm's headquarters.
For the background questionnaire data, survey firms could use the World Bank (WB) STEP Data Entry Program (DEP) or design their own. In the latter case, the WB STEP team checked the firm's DEP to ensure it complied with STEP technical standards. The STEP DEP was developed in Excel and mirrored the background questionnaire. Yunnan Province of China, Ghana and Vietnam used the WB STEP Data Entry Program. Georgia, like Armenia, Bolivia, Colombia, Lao PDR and Sri Lanka, developed its own DEP in CSPro. Standards for data entry are detailed in "Guidelines for STEP Data Entry Programs" and summarized in the NSDPR. A double data entry process was required. All range checks and skips were controlled by the program. Consistency checks were also included in the data entry program.
All survey firms were required to score the Reading Literacy Assessment booklets and to enter the data using the Data Entry Program developed by Educational Testing Services (ETS). A double data entry process was required. Consistency checks were also included in the data entry program.
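Double data entry means each questionnaire is keyed twice and the two passes are compared so that discrepancies can be resolved against the paper form. A minimal sketch of that comparison is shown below; it is not the CSPro or ETS program, and the record IDs and field names are hypothetical.

```python
def compare_entries(first_pass, second_pass):
    """List every field where two data entry passes disagree.

    first_pass, second_pass: dicts mapping record ID -> {field: value}.
    Returns tuples of (record_id, field, first_value, second_value).
    A record or field missing from one pass shows up with a value of None.
    """
    mismatches = []
    for record_id in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(record_id, {})
        b = second_pass.get(record_id, {})
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                mismatches.append((record_id, field, a.get(field), b.get(field)))
    return mismatches

# Made-up entries: the second keyer transposed the digits of the age.
first = {"H001": {"age": 34, "sex": 1}}
second = {"H001": {"age": 43, "sex": 1}}
to_resolve = compare_entries(first, second)
```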
Use of the dataset must be acknowledged using a citation that includes:
- the identification of the Primary Investigator
- the title of the survey (including country, acronym and year of implementation)
- the survey reference number
- the source and date of download
World Bank. Georgia STEP Skills Measurement Household Survey 2013 (Wave 2). Ref. GEO_2013_STEP-HH_v02_M. Dataset downloaded from [URL] on [date].
Disclaimer and copyrights
The user of the data acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.
Development Data Group
The World Bank
Documentation of the study
Version 02 (March 2016)
Changes in v02 of study documentation compared to v01 published in June 2014
- v01 datasets were replaced with v02
- Study Title, Series Information and Abstract were edited