Skills Profile Survey 2017, A Refugee and Host Community Survey
Other Household Survey
The SPS 2017 was conducted in refugee camps and host communities in four regions of Ethiopia. The survey was used to profile the skills of, and potential opportunities for, refugees and host communities, in order to help the government design a better mix of livelihood approaches for these communities. The SPS 2017 contains information on employment, barriers to labor force participation, livelihood structures of refugees before displacement, education, and economic conditions, as well as access to services and perceptions. The data combine detailed household questionnaire information with displacement-specific information, including drivers of displacement, access to resettlement mechanisms, and return intentions. They also include comprehensive information on assets and consumption, to allow estimation of poverty based on the Rapid Consumption methodology as detailed in Pape and Mistiaen (2018).
Kind of Data
Sample survey data [ssd]
Unit of Analysis
Household and individual.
The Skills Profile Survey 2017 covered refugees in camps, and surrounding host communities, in Ethiopia. Refugees of four nationalities were surveyed: Eritrean, Somali, South Sudanese, and Sudanese. Only refugees living in camps were surveyed, because tracing households outside the camps was not feasible. However, 66 percent of all refugees in Ethiopia live in camps, and those who live outside camps are largely Eritrean. Host communities, defined as Ethiopian non-displaced households living within a 5 km radius of a camp, were also surveyed.
Producers and sponsors
Utz Johann Pape (IBRD - World Bank)
The SPS 2017 is a household survey with a multi-stage stratified random sample. The sampling frame was the list of refugee camps, sites and locations as of January 2017, provided by UNHCR-Ethiopia. The sample consisted of four strata based on four regions: Tigray and Afar (Eritrean refugees), Gambella (South Sudanese refugees), Benishangul Gumuz (Sudanese refugees), and Somali (Somali refugees). Each region hosts predominantly one refugee nationality, leading to an implicit stratification by nationality. In each stratum, camps were divided into Enumeration Areas (EAs) of equal size using GIS technology, and 82 EAs were selected per stratum. Within each stratum, the number of EAs allocated to a camp was proportional to the size of the camp. Within camps, EAs were selected with equal probability. Within each selected EA, all households were listed and then 12 were randomly selected for interview.
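The allocation and household-selection steps described above can be sketched as follows. This is an illustrative sketch only, not the survey team's procedure: the camp names and sizes are hypothetical, and the largest-remainder rounding rule is an assumption, since the documentation does not say how fractional allocations were rounded.

```python
import random

def allocate_eas(camp_sizes, total_eas):
    """Allocate EAs to camps in proportion to camp size.

    Rounding uses the largest-remainder rule (an assumption; the
    survey documentation does not specify the rounding method).
    """
    total = sum(camp_sizes.values())
    quotas = {camp: total_eas * size / total for camp, size in camp_sizes.items()}
    alloc = {camp: int(q) for camp, q in quotas.items()}
    leftover = total_eas - sum(alloc.values())
    # Give the remaining EAs to the camps with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - alloc[c], reverse=True)
    for camp in by_remainder[:leftover]:
        alloc[camp] += 1
    return alloc

def select_households(listed_households, rng, n=12):
    """Draw n households at random, without replacement, from an EA listing."""
    return rng.sample(listed_households, min(n, len(listed_households)))

# Hypothetical stratum with three camps (sizes are made up).
camps = {"camp_a": 5000, "camp_b": 3000, "camp_c": 2000}
allocation = allocate_eas(camps, total_eas=82)

# Hypothetical listing of 60 households in one selected EA.
rng = random.Random(1)
sampled = select_households([f"hh_{i}" for i in range(60)], rng)
```

Seeding the random generator, as done here, keeps the household draw reproducible.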
The host community sample consisted of the same four regional strata. Within each stratum, areas within a 5 km radius of a camp were divided into EAs of equal size. Of these EAs, those classified as 'residential' by OpenStreetMap were used as the sampling frame. Per stratum, 42 EAs were selected. Within a stratum, EAs were selected using probability proportional to size, with the probability of selection of an EA corresponding to the area of the EA. Within each selected EA, all households were listed and then 12 were randomly selected for interview.
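A minimal sketch of selection with probability proportional to size, using EA area as the size measure. The documentation does not state which PPS algorithm was used; systematic PPS is assumed here purely for illustration, and the EA identifiers and areas are hypothetical.

```python
import random
from itertools import accumulate

def pps_systematic(areas, n, rng):
    """Select n units with probability proportional to area.

    Uses systematic PPS: a random start, then equally spaced
    selection points along the cumulative area. A unit larger
    than the sampling interval could be hit more than once.
    """
    names = list(areas)
    cumulative = list(accumulate(areas[name] for name in names))
    step = cumulative[-1] / n
    start = rng.uniform(0, step)
    selected, i = [], 0
    for k in range(n):
        point = start + k * step
        # Advance to the first unit whose cumulative area covers the point.
        while i < len(cumulative) - 1 and cumulative[i] < point:
            i += 1
        selected.append(names[i])
    return selected

# Hypothetical frame: 100 residential EAs of equal area in one stratum.
frame = {f"ea_{i:03d}": 1.0 for i in range(100)}
chosen = pps_systematic(frame, n=42, rng=random.Random(0))
```

When all EAs have the same area, as in this example, PPS reduces to equal-probability selection.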
Deviations from the Sample Design
Due to security concerns, revisions were made to the sample during fieldwork.
Enumerators in the Gambella stratum (hosting South Sudanese refugees) faced repeated security threats and could survey only 439 of the intended 900 refugee households in the region. As the survey team was withdrawn from the Gambella region, the host community there was not surveyed. The remaining interviews with refugees in the Gambella region were substituted by oversampling EAs in Benishangul Gumuz, as 25 percent of the refugee population in that region is South Sudanese. In September 2017, violent conflict in the Oromia and Somali regions escalated, rendering some of the camps in the Somali stratum inaccessible. The EAs of the Jijiga sub-region were replaced by EAs in non-violent areas of the Somali stratum. Further, as most refugee camps are in remote areas with sparse host populations, the final number of host households surveyed fell short of the originally intended sample of 500 host households per stratum. However, despite the changes in the sample, the survey captured a roughly similar number of refugee households for each of the four main refugee nationalities.
The sampling weight is the inverse of the probability of selection. The selection probability for a household can be decomposed into the selection probability of the EA within the stratum and the selection probability of the household within the EA. For refugees, the selection probability of an EA is calculated as the number of households within the EA divided by the number of households within the stratum, multiplied by the number of selected EAs in the stratum. For the host community, the selection probability of an EA is calculated as the number of EAs selected per stratum divided by the total number of EAs in the stratum. The selection probability for a household within an EA is constant across households, and is calculated as the number of households selected in the EA (usually 12) divided by the total number of households listed in the EA. Sampling weights for refugees were scaled to equal the number of households per stratum as per the sampling frame provided by UNHCR. Due to changes in the sample during fieldwork, the number of EAs surveyed in each stratum differed from the original sample. The weights were also scaled to correct for this change.
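The weight calculation described above can be written out as a short sketch. The function names and all numbers are hypothetical and chosen for illustration. The scaling step assumes a simple proportional rescaling to the frame total, which the documentation implies but does not spell out.

```python
def ea_prob_refugee(hh_in_ea, hh_in_stratum, n_eas_selected):
    """EA selection probability for refugee strata."""
    return n_eas_selected * hh_in_ea / hh_in_stratum

def ea_prob_host(n_eas_selected, n_eas_in_stratum):
    """EA selection probability for host community strata."""
    return n_eas_selected / n_eas_in_stratum

def household_weight(p_ea, n_hh_selected, n_hh_listed):
    """Inverse of the overall household selection probability."""
    p_hh = n_hh_selected / n_hh_listed
    return 1.0 / (p_ea * p_hh)

def scale_weights(weights, frame_total):
    """Rescale weights proportionally so they sum to the frame total."""
    factor = frame_total / sum(weights)
    return [w * factor for w in weights]

# Hypothetical refugee EA: 50 households in a stratum of 20,500,
# with 82 EAs selected and 12 of 48 listed households interviewed.
p_ea = ea_prob_refugee(hh_in_ea=50, hh_in_stratum=20500, n_eas_selected=82)
w_refugee = household_weight(p_ea, n_hh_selected=12, n_hh_listed=48)

# Hypothetical host EA: 42 of 210 EAs selected, 12 of 36 households interviewed.
w_host = household_weight(ea_prob_host(42, 210), n_hh_selected=12, n_hh_listed=36)

# Rescale three hypothetical weights to a frame total of 100 households.
scaled = scale_weights([20.0, 20.0, 10.0], frame_total=100)
```

Because the household selection probability uses the number of households listed in each EA, weights differ across EAs even when the EA selection probability is the same.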
Dates of Data Collection
Data Collection Mode
Computer Assisted Personal Interview [capi]
Data Collection Notes
Due to insecurity at the time of data collection, the sampling design had to be altered; see the sampling section.
The questionnaire contains the following modules: Household Member Roster; Household Characteristics; Food Consumption; Non-food Consumption; Livestock; Durable Goods; Wellbeing and Opinions; and Forced Displacement, Movement and Return Intentions. The questionnaire is available for download with the dataset.
See accompanying Stata do-files, available under the related materials tab.
The data in the 1-CleanInput folder are processed and anonymized but have not yet been cleaned. The provided set of do-files performs the cleaning, and the cleaned data are then saved in the 1-CleanOutput folder.
Missing values are coded as:
“.” for missing data
“.a” for “don't know”
“.b” for “Refused to respond”
“.z” for “Not asked due to questionnaire skipping patterns”
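As a rough sketch of how these codes might be tallied outside Stata, the following assumes a column has been exported with the missing-value codes preserved as the literal strings listed above. That export format is an assumption, and the sample values are made up.

```python
from collections import Counter

# Missing-value codes as documented for this dataset.
MISSING_CODES = {
    ".": "missing data",
    ".a": "don't know",
    ".b": "refused to respond",
    ".z": "not asked due to skip pattern",
}

def tally_missing(values):
    """Count how often each documented missing-value code occurs."""
    return Counter(MISSING_CODES[v] for v in values if v in MISSING_CODES)

# Hypothetical exported column mixing valid answers and missing codes.
column = ["3", ".a", ".z", "1", ".", ".z", ".b", "2"]
counts = tally_missing(column)
```

Keeping the four codes distinct matters for analysis: ".z" reflects questionnaire routing rather than non-response, so it should normally be excluded from non-response rates.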
Utz Johann Pape
Before being granted access to the dataset, all users have to formally agree: 1. To make no copies of any files or portions of files to which s/he is granted access except those authorized by the data depositor. 2. Not to use any technique in an attempt to learn the identity of any person, establishment, or sampling unit not identified on public use data files. 3. To hold in strictest confidence the identification of any establishment or individual that may be inadvertently revealed in any documents or discussion, or analysis. Such inadvertent identification revealed in her/his analysis will be immediately brought to the attention of the data depositor.
The dataset has been anonymized and is available as a Public Use Dataset. It is accessible to all for statistical and research purposes only, under the following terms and conditions: 1. The data and other materials will not be redistributed or sold to other individuals, institutions, or organizations without the written agreement of the World Bank Microdata Library. 2. The data will be used for statistical and scientific research purposes only. They will be used solely for reporting of aggregated information, and not for investigation of specific individuals or organizations. 3. No attempt will be made to re-identify respondents, and no use will be made of the identity of any person or establishment discovered inadvertently. Any such discovery would immediately be reported to the World Bank Microdata Library. 4. No attempt will be made to produce links among datasets provided by the World Bank Microdata Library, or among data from the World Bank Microdata Library and other datasets that could identify individuals or organizations. 5. Any books, articles, conference papers, theses, dissertations, reports, or other publications that employ data obtained from the World Bank Microdata Library will cite the source of data in accordance with the Citation Requirement provided with each dataset.
Use of the dataset must be acknowledged using a citation which would include: - the Identification of the Primary Investigator - the title of the survey (including country, acronym and year of implementation) - the survey reference number - the source and date of download.
Disclaimer and copyrights
The user of the data acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.
DDI Document ID
Development Economics Data Group
The World Bank
Documentation of the DDI
Date of Metadata Production
DDI Document version
Version 01 (March 2019)