The Ethiopia Socioeconomic Survey (ESS) is a collaborative project between the Central Statistical Agency of Ethiopia (CSA) and the World Bank Living Standards Measurement Study-Integrated Surveys on Agriculture (LSMS-ISA) team. The objective of the LSMS-ISA is to collect multi-topic, household-level panel data with a special focus on improving agricultural statistics and generating a clearer understanding of the link between agriculture and other sectors of the economy. The project also aims to build capacity, share knowledge across countries, and improve survey methodologies and technology.
ESS is a long-term project to collect panel data. The project responds to the data needs of the country, given that a high percentage of households depend on agricultural activities. The ESS collects information on household agricultural activities along with other information on the households, such as human capital, other economic activities, and access to services and resources. The ability to follow the same households over time makes the ESS a new and powerful tool for studying and understanding the role of agriculture in household welfare over time, as it allows analyses of how households add to their human and physical capital, how education affects earnings, and the role of government policies and programs on poverty, inter alia. The ESS is the first panel survey to be carried out by the CSA that links a multi-topic household questionnaire with detailed data on agriculture.
Kind of data
Sample survey data [ssd]
- v2.2: Edited, anonymous dataset for public distribution.
Identical to version 1, except that village name was removed from post-harvest data.
National Coverage. ESS2 and ESS3 covered all regional states including the capital, Addis Ababa. The majority of the sample comprises rural areas as it was carried over from ESS1. The ESS2 and ESS3 were implemented in 433 enumeration areas (EAs) out of which 290 were rural, 43 were small town EAs from ESS1, and 100 were EAs from major urban areas.
Unit of analysis
ESS uses a nationally representative sample of over 5,000 households living in rural and urban areas. The urban areas include both small and large towns.
Producers and sponsors
Central Statistical Agency of Ethiopia
The World Bank
National Bank of Ethiopia
The sample is a two-stage probability sample. The first stage of sampling entailed selecting primary sampling units, or CSA enumeration areas (EAs). A total of 433 EAs were selected based on probability proportional to size of the total EAs in each region. For the rural sample, 290 EAs were selected from the AgSS EAs. A total of 43 and 100 EAs were selected for small town and urban areas, respectively. In order to ensure sufficient sample size in the most populous regions (Amhara, Oromiya, SNNP, and Tigray) and Addis Ababa, quotas were set for the number of EAs in each region. The sample is not representative for each of the small regions including Afar, Benshangul Gumuz, Dire Dawa, Gambella, Harari, and Somalie regions. However, estimates can be produced for a combination of all smaller regions as one “other region” category. A more detailed description of the sample design is provided in Section 3 of the Basic Information Document provided under the Related Materials tab.
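As a rough illustration of what selection with probability proportional to size means, the sketch below draws units with a generic systematic PPS procedure. It is not CSA's actual selection routine: the function name `pps_systematic`, the unit labels, and the size measures are all hypothetical, and the regional quotas described above are not modelled.

```python
import random


def pps_systematic(sizes, n, rng):
    """Draw n units with probability proportional to size (systematic PPS).

    sizes: dict mapping a unit id to its size measure (hypothetical values).
    A unit whose size exceeds the sampling interval can be drawn more than once.
    """
    units = list(sizes.items())
    total = sum(size for _, size in units)
    interval = total / n
    start = rng.random() * interval  # random start in [0, interval)

    selected = []
    idx = 0
    cumulative = units[0][1]
    for k in range(n):
        target = start + k * interval
        # Walk the cumulative size scale until it covers the target point.
        while target >= cumulative and idx < len(units) - 1:
            idx += 1
            cumulative += units[idx][1]
        selected.append(units[idx][0])
    return selected


# Hypothetical frame: unit ids with made-up size measures.
frame = {"unit-1": 120, "unit-2": 80, "unit-3": 300, "unit-4": 50, "unit-5": 150}
print(pps_systematic(frame, 2, random.Random(42)))
```

Larger units cover a longer stretch of the cumulative scale, so they are more likely to contain one of the evenly spaced target points.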
During wave 3, 1255 households were re-interviewed yielding a response rate of 85 percent.
Attrition in urban areas is 15% due to consent refusal and inability to trace the whereabouts of sample households.
The ESS3 data needs to be weighted to represent the national-level population of rural, small and large town households. A sample weight with post-stratification adjustments was calculated for the households and this weight variable is included in all the datasets. It reflects the adjusted probability of selecting the household into the sample. The inverse of this weight can be considered an expansion factor that sums to the total population of households in the nation. When this weight is used in a household-level file, it sums to the population of households. When this weight is used in an individual-level file, it sums to the population of individuals. If the data user wishes to produce an estimate for the population of individuals in a household-level file, an approximate expansion factor is the sample weight times the household size of each household.
The ESS3 sample weights were calculated in two stages. In the first stage, weights were separately calculated or adjusted for the three different sampling frames (rural, small town, and large town). For the rural and small town sample, the wave 1 weights were adjusted to account for relisting, non-response, and attrition of households in the sample frame between the two waves (wave 1 and wave 3). In each of the waves, the rural and small town EAs were re-listed which reflects EA-specific population growth patterns. The post-stratification adjustment accounts for this change.
Similarly for the mid- and large-town sample, the wave 2 weights were adjusted to account for relisting, non-response, and attrition of households in the sample frame between the two waves (wave 2 and wave 3). In each of the waves, the mid- and large-town EAs were re-listed which reflects EA-specific population growth patterns. The post-stratification adjustment accounts for this change.
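The use of the household weight in a household-level file can be sketched as follows. This is a minimal illustration, assuming the supplied weight already acts as the expansion factor (that is, it sums to the population of households). The keys `weight`, `hh_size`, and `has_electricity` and all values are hypothetical, not actual ESS variable names or data.

```python
def expand_households(rows):
    """Return (estimated households, estimated individuals) from household-level rows.

    Each row is a dict with hypothetical keys:
      'weight'  - household sample weight, assumed to act as the expansion factor
      'hh_size' - number of household members
    """
    households = sum(r["weight"] for r in rows)
    # Approximate individual-level expansion factor: weight times household size.
    individuals = sum(r["weight"] * r["hh_size"] for r in rows)
    return households, individuals


def per_capita_mean(rows, key):
    """Mean of a household-level variable over individuals (weight x household size)."""
    total = sum(r["weight"] * r["hh_size"] for r in rows)
    return sum(r["weight"] * r["hh_size"] * r[key] for r in rows) / total


# Made-up example rows, for illustration only.
sample = [
    {"weight": 100.0, "hh_size": 4, "has_electricity": 1},
    {"weight": 250.0, "hh_size": 2, "has_electricity": 0},
]
print(expand_households(sample))
print(per_capita_mean(sample, "has_electricity"))
```

Summing the weight alone estimates the number of households, while multiplying by household size first shifts the estimate to the population of individuals.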
Dates of collection
Post-planting agriculture and Livestock questionnaires
Crop cut questionnaire
Household, Community, and Post-harvest agriculture questionnaire
All modules in large town EAs
Mode of data collection
Data collection supervision
Routine supervision by CSA’s field supervisors entailed field-level coordination by all CSA branch offices. Branch-level statisticians and supervisors who were assigned to this project conducted the routine supervision. The branch supervisors made three extended visits to the EAs between September 2015 and April 2016. As noted above, one field supervisor checked the work of three enumerators in three EAs. The last visit was combined with community interviews that were conducted by the supervisors themselves. Up to two branch statisticians were also in the field to check the work of the supervisors and enumerators.
Additional supervision was conducted by CSA head office experts and Bank staff and consultants. The teams’ first visit was held in September-December 2015, when interviews with the post-planting, crop-cut and livestock questionnaires were being conducted. The second visit was in February-April 2016, when the household, community, and post-harvest agriculture data were being collected.
The survey consisted of five questionnaires. These questionnaires are similar to those used in ESS1 and ESS2, with revisions based both on the results of the ESS2 and on identified areas of need for new data (see Section 7 of the Basic Information Document provided under the Related Materials tab). The household questionnaire was administered to all households in the sample. The community questionnaire was administered to a group of community members to collect information on the socio-economic indicators of the enumeration areas where the sample households reside. The three agriculture questionnaires, consisting of a post-planting agriculture questionnaire, a post-harvest agriculture questionnaire, and a livestock questionnaire, were administered to all household members (agriculture holders) who are engaged in agricultural activities. A holder is a person who exercises management control over the operations of the agricultural holdings and makes the major decisions regarding the utilization of the available resources. S/he has technical and economic responsibility for the holding. S/he may operate the holding directly as an owner or as a manager. Hence it is possible to have more than one holder in a single sampled household. As a result, more than one agriculture questionnaire was administered in a sampled household if the household has more than one holder.
Household questionnaire: The household questionnaire provides information on basic demographics; education; health (including anthropometric measurement for children); labor and time use; saving; food and non-food expenditure; household nonfarm income-generating activities; food security and shocks; safety nets; housing conditions; assets; credit; and other sources of household income (Table 2.1). Household location is geo-referenced in order to be able to later link the ESS data to other available geographic data sets (See Appendix 1 for discussion of the geo-data provided with the ESS).
Community questionnaire: The community questionnaire solicits information on infrastructure; community organizations; resource management; changes in the community; key events; community needs, actions and achievements; and local retail price information (Table 2.2).
Agriculture questionnaire: The post-planting and post-harvest agriculture questionnaires focus on crop farming activities and solicit information on land ownership and use; farm labor; input use; GPS land area measurement and coordinates of household fields; agriculture capital; irrigation; and crop harvest and utilization. The livestock questionnaire collects information on animal holdings and costs; and production, cost and sales of livestock by-products (Table 2.3). The livestock module implemented in ESS3 is significantly different from the module implemented in ESS1 and ESS2.
The interviews were carried out using pen-and-paper (PAPI) as well as computer-assisted personal interviewing (CAPI) methods. A concurrent data entry arrangement was implemented for PAPI. In this arrangement, the enumerators did not wait until all the interviews were completed. Rather, once the enumerators completed approximately 3-4 questionnaires, supervisors collected these interviews from enumerators and brought them to the branch offices for data entry. This process took place as enumerators continued administering interviews with other households. The questionnaires were then keyed at the branch offices as soon as they were completed, using the CSPro data entry application software. The data from the completed questionnaires were then checked for any interview or data entry errors using a Stata program. Data entry errors were flagged for the data entry clerks, and the interview errors were sent back to the field for correction and feedback to the ongoing interviews. Several rounds of this process were undertaken until the final data files were produced. Additional cleaning was carried out, as needed, by checking the hard copies. In ESS3, CAPI (with a Survey Solutions platform) was used to collect the community data in large town areas.
The electronic datasets are organized by questionnaire with the following labels on file names in parentheses: household (hh), community (com), post-planting agriculture (pp), post-harvest agriculture (ph), and livestock (ls). The data within each questionnaire do not contain any constructed variables. For example, the ESS data provide almost all variables needed to construct an estimate of total household consumption, but the data set does not contain an estimated value of total consumption. The only compiled data that are included with the ESS files are the geo-spatial variables described in Section 5.2 of the Basic Information Document.
The ESS collects confidential information on respondents. The confidential variables pertain to (i) names of the respondents to the household and community questionnaires, (ii) village and constituency names, (iii) descriptions of household dwelling and agricultural field locations, (iv) phone numbers of household members and their reference contacts, (v) GPS-based dwelling and agricultural field locations, (vi) names of the children of the head/spouse living elsewhere, (vii) names of the deceased household members, (viii) names of individuals listed in the network roster, and (ix) names of field staff. To maintain confidentiality, this information is not included in the ESS public use data.
To partially satisfy user interest in geo-referenced location, while preserving the confidentiality of sample households and communities, modified EA-level coordinates are provided as part of the household geovariable table. Modified coordinates are generated by applying a random offset within a specified range to the average EA value (following the MeasureDHS approach). For households that have moved between waves 1 and 3, and are more than 5 km from their baseline location, the offset is with respect to the new household location. More specifically, the coordinate modification strategy relies on random offset of EA center-point coordinates (or average of household GPS locations by EA in ESS) within a specified range determined by the urban and rural classification. For small towns and urban areas, an offset range of 0-2 km is used. In rural areas, where communities are more dispersed and risk of disclosure may be higher, a range of 0-5 km offset is used. Additionally, an offset range of 0-10 km is applied to 1% of EAs, effectively increasing the known range for all points to 10 km while introducing only a small amount of noise. Offset points are constrained at the zone level, so that they still fall within the correct zone for spatial joins, or point-in-polygon overlays. The result is a set of coordinates, representative at the EA level, that fall within known limits of accuracy. Users should take into account the offset range when considering different types of spatial analysis or queries with the data. Analysis of the spatial relationships between locations in close proximity would not be reliable. However, spatial queries using medium or low resolution datasets should be minimally affected by the offsets.
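The offset ranges above can be illustrated with a small sketch. It is not the procedure actually used to produce the ESS geovariables. The function names, the uniform draw of distance and bearing, the flat-earth approximation, and the choice to give any EA a 1% chance of the 0-10 km range are assumptions made for illustration. The zone-level constraint is omitted because it would require zone boundary polygons.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0


def offset_range_km(is_urban, rng):
    """Maximum offset in km: 0-2 for small town/urban EAs, 0-5 for rural EAs.

    Assumption: each EA independently has a 1% chance of the wider 0-10 km range.
    """
    if rng.random() < 0.01:
        return 10.0
    return 2.0 if is_urban else 5.0


def displace(lat, lon, max_km, rng):
    """Shift a point by a random distance (0..max_km) in a random direction.

    Uses a local flat-earth approximation, adequate for offsets of a few km.
    """
    distance = rng.uniform(0.0, max_km)
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    dlat = distance * math.cos(bearing) / EARTH_RADIUS_KM
    dlon = distance * math.sin(bearing) / (EARTH_RADIUS_KM * math.cos(math.radians(lat)))
    return lat + math.degrees(dlat), lon + math.degrees(dlon)


# Hypothetical EA center point, rural classification.
rng = random.Random(2016)
max_km = offset_range_km(is_urban=False, rng=rng)
print(displace(9.0, 38.7, max_km, rng))
```

Because the published point can lie anywhere within the offset range of the true EA average, distances between nearby EAs computed from these coordinates carry an error of up to several kilometres.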
Before being granted access to the dataset, all users have to formally agree:
1. To make no copies of any files or portions of files to which s/he is granted access except those authorized by the data depositor.
2. Not to use any technique in an attempt to learn the identity of any person, establishment, or sampling unit not identified on public use data files.
3. To hold in strictest confidence the identification of any establishment or individual that may be inadvertently revealed in any documents or discussion, or analysis. Such inadvertent identification revealed in her/his analysis will be immediately brought to the attention of the data depositor.
- Public use files, accessible to all
Use of the dataset must be acknowledged using a citation which would include:
- the Identification of the Primary Investigator
- the title of the survey (including country, acronym and year of implementation)
- the survey reference number
- the source and date of download
Central Statistical Agency of Ethiopia. Ethiopia Socioeconomic Survey, Wave 3 (ESS3) 2015-2016. Public Use Dataset. Ref: ETH_2015_ESS_v02_M. Downloaded from [URL] on [Date].
Disclaimer and copyrights
The user of the data acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.
LSMS Database Manager
Development Economics Data Group (The World Bank)
World Bank Microdata Library
Development Economics Data Group (The World Bank)
Development Economics Data Group
The World Bank
Documentation of the DDI
Version 02 (January 2018)
Identical to version 1, except village name removed from post-harvest data.