As part of its strategic goal of building a climate for investment, job creation, and sustainable growth, the World Bank has promoted improving the business environment as a key strategy for development, which has led to a systematic effort to collect enterprise data across countries. The Enterprise Surveys (ES) are an ongoing World Bank project that collects both objective data based on firms' experiences and enterprises' perceptions of the environment in which they operate.
An Enterprise Survey is a firm-level survey of a representative sample of an economy's private sector. Firm-level surveys have been conducted since 1998 by different units within the World Bank. Since 2005-2006, most data collection efforts have been centralized within the Enterprise Analysis Unit. The Enterprise Surveys are conducted across all geographic regions and cover small, medium, and large companies. The surveys are administered to a representative sample of firms in the non-agricultural formal private economy. Data are used to create indicators that benchmark the quality of the business and investment climate across countries.
The survey was conducted in Guatemala between October 2017 and May 2018, as part of the Enterprise Surveys project, an initiative of the World Bank. Data from 345 establishments were analyzed.
The objective of the Enterprise Survey is to gain an understanding of what firms experience in the private sector. Through interviews with firms in the manufacturing and services sectors, the survey assesses the constraints to private sector growth and creates statistically significant business environment indicators that are comparable across countries.
The standard Enterprise Survey topics include firm characteristics, gender participation, access to finance, annual sales, costs of inputs and labor, workforce composition, bribery, licensing, infrastructure, trade, crime, competition, capacity utilization, land and permits, taxation, informality, business-government relations, innovation and technology, and performance measures. Over 90 percent of the questions objectively ascertain characteristics of a country's business environment. The remaining questions assess the survey respondents' opinions on what are the obstacles to firm growth and performance.
Kind of Data
Sample survey data [ssd]
Unit of Analysis
The primary sampling unit of the study is the establishment. An establishment is a physical location where business is carried out and where industrial operations take place or services are provided. A firm may be composed of one or more establishments. For example, a brewery may have several bottling plants and several establishments for distribution. For the purposes of this survey an establishment must make its own financial decisions and have its own financial statements separate from those of the firm. An establishment must also have its own management and control over its payroll.
v01, edited, anonymous dataset for public distribution
The scope of the study includes:
- Infrastructure and Services
- Sales and Supplies
- Management Practices
- Degree of Competition
- Land and Permits
- Business-Government Relations
- Business Environment
Regional stratification for the Guatemala ES was done across two regions: Greater Guatemala City, and Rest of the country.
The whole population, or universe of the study, is the non-agricultural economy. It comprises: all manufacturing sectors according to the group classification of ISIC Revision 3.1: (group D), construction sector (group F), services sector (groups G and H), and transport, storage, and communications sector (group I). Note that this definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, except sub-sector 72, IT, which was added to the population under study), and all public or utilities-sectors.
The sample for 2017 Guatemala ES was selected using stratified random sampling, following the methodology explained in the Sampling Note. Stratified random sampling was preferred over simple random sampling for several reasons:
a. To obtain unbiased estimates for different subdivisions of the population with some known level of precision.
b. To obtain unbiased estimates for the whole population. The whole population, or universe of the study, is the non-agricultural economy. It comprises: all manufacturing sectors according to the group classification of ISIC Revision 3.1: (group D), construction sector (group F), services sector (groups G and H), and transport, storage, and communications sector (group I). Note that this definition excludes the following sectors: financial intermediation (group J), real estate and renting activities (group K, except sub-sector 72, IT, which was added to the population under study), and all public or utilities-sectors.
c. To make sure that the final total sample includes establishments from all different sectors and that it is not concentrated in one or two industries/sizes/regions.
d. To exploit the benefits of stratified sampling, where population estimates will, in most cases, be more precise than under simple random sampling (i.e., lower standard errors, other things being equal).
e. Stratification may produce a smaller bound on the error of estimation than would be produced by a simple random sample of the same size. This result is particularly true if measurements within strata are homogeneous.
f. The cost per observation in the survey may be reduced by stratification of the population elements into convenient groupings.
Three levels of stratification were used in this country: industry, establishment size, and region. The original sample design with specific information of the industries and regions chosen
Industry stratification was designed as follows: the universe was stratified into manufacturing, retail, and other services industries: Manufacturing (ISIC Rev. 3.1 codes 15-37), Retail (ISIC code 52), and Other Services (ISIC codes 45, 50, 51, 55, 60-64, and 72).
For the Guatemala ES, size stratification was defined as follows: small (5 to 19 employees), medium (20 to 99 employees), and large (100 or more employees).
Regional stratification for the Guatemala ES was done across two regions: Greater Guatemala City, and Rest of the country.
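The three stratification rules above can be sketched in code. This is a minimal illustration in Python, not part of the survey's own tooling: the function names are hypothetical, and it assumes the industry is supplied as a two-digit ISIC Rev. 3.1 division and the size as a head count of employees.

```python
from typing import Optional, Tuple

# Two-digit ISIC Rev. 3.1 divisions per industry stratum, as listed in the text.
MANUFACTURING = set(range(15, 38))                          # 15-37
RETAIL = {52}
OTHER_SERVICES = {45, 50, 51, 55, 72} | set(range(60, 65))  # 60-64

REGIONS = ("Greater Guatemala City", "Rest of the country")


def industry_stratum(isic_division: int) -> Optional[str]:
    """Return the industry stratum, or None if the division is out of scope."""
    if isic_division in MANUFACTURING:
        return "Manufacturing"
    if isic_division in RETAIL:
        return "Retail"
    if isic_division in OTHER_SERVICES:
        return "Other Services"
    return None


def size_stratum(employees: int) -> Optional[str]:
    """Return small (5-19), medium (20-99), or large (100+); None below 5."""
    if employees < 5:
        return None
    if employees <= 19:
        return "small"
    if employees <= 99:
        return "medium"
    return "large"


def stratum_cell(isic_division: int, employees: int,
                 region: str) -> Optional[Tuple[str, str, str]]:
    """Combine the three levels into one (industry, size, region) cell."""
    industry = industry_stratum(isic_division)
    size = size_stratum(employees)
    if industry is None or size is None or region not in REGIONS:
        return None
    return (industry, size, region)
```

With three industry groups, three size classes, and two regions, this design yields at most 3 x 3 x 2 = 18 strata cells.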
Survey non-response must be differentiated from item non-response. The former refers to refusals to participate in the survey altogether, whereas the latter refers to refusals to answer specific questions. Enterprise Surveys suffer from both problems, and different strategies were used to address them.
Item non-response was addressed by two strategies:
a- For sensitive questions that may generate negative reactions from the respondent, such as corruption or tax evasion, enumerators were instructed to collect the refusal to respond (-8) as a different option from don't know (-9).
b- Establishments with incomplete information were re-contacted in order to complete this information, whenever necessary. However, there were clear cases of low response. The following graph shows non-response rates for the sales variable, d2, by sector. Please note that for this specific question, refusals were not separately identified from "Don't know" responses.
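As a rough illustration of how item non-response rates of this kind could be tabulated, the sketch below counts the refusal (-8) and don't know (-9) codes per group. The function name, the record layout (a list of dictionaries), and the "sector" key are assumptions made for the example; only the codes -8 and -9 and the variable name d2 come from the text.

```python
from collections import defaultdict

REFUSAL = -8      # respondent refused to answer
DONT_KNOW = -9    # respondent did not know


def item_nonresponse_by_group(records, item, group):
    """Share of records per group whose `item` is coded -8 or -9.

    Both codes are pooled, which matches the note that refusals were
    not separately identified from "Don't know" for the sales variable.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [non-responses, total]
    for rec in records:
        tally = counts[rec[group]]
        tally[1] += 1
        if rec[item] in (REFUSAL, DONT_KNOW):
            tally[0] += 1
    return {g: missing / total for g, (missing, total) in counts.items()}


# Hypothetical toy records, not taken from the dataset.
toy = [
    {"sector": "Manufacturing", "d2": 1000},
    {"sector": "Manufacturing", "d2": -9},
    {"sector": "Retail", "d2": 500},
    {"sector": "Retail", "d2": -9},
    {"sector": "Retail", "d2": -8},
    {"sector": "Retail", "d2": 200},
]
rates = item_nonresponse_by_group(toy, item="d2", group="sector")
```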
Survey non-response was addressed by maximizing efforts to contact establishments that were initially selected for interview. Attempts were made to contact the establishment for interview at different times/days of the week before a replacement establishment (with similar strata characteristics) was suggested for interview. Survey non-response did occur but substitutions were made in order to potentially achieve strata-specific goals; whenever this was done, strict rules were followed to ensure replacements were randomly selected within the same stratum. Further research is needed on survey non-response in the Enterprise Surveys regarding potential introduction of bias.
Since the sampling design was stratified and employed differential sampling, individual observations should be properly weighted when making inferences about the population. Under stratified random sampling, unweighted estimates are biased unless sample sizes are proportional to the size of each stratum. With stratification, the probability of selection of each unit is, in general, not the same. Consequently, individual observations must be weighted by the inverse of their probability of selection (probability weights, or pw in Stata).
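The effect of inverse-probability weighting can be seen in a small sketch. The universe below is a hypothetical toy example (two strata of 100 and 900 firms, with 10 firms sampled from each), and the function names are illustrative only; it is not the survey's actual weighting code.

```python
def inverse_probability_weights(selection_probabilities):
    """Weight each observation by 1 / (its probability of selection)."""
    weights = []
    for p in selection_probabilities:
        if not 0 < p <= 1:
            raise ValueError("selection probability must be in (0, 1]")
        weights.append(1.0 / p)
    return weights


def weighted_mean(values, weights):
    """Weighted average of `values`; weights must sum to a positive number."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical toy universe: stratum A has 100 firms, stratum B has 900.
# Ten firms are sampled from each, so A is heavily over-sampled.
# Every sampled A firm reports 1.0 and every sampled B firm reports 0.0.
values = [1.0] * 10 + [0.0] * 10
probabilities = [10 / 100] * 10 + [10 / 900] * 10

weights = inverse_probability_weights(probabilities)
unweighted = sum(values) / len(values)    # treats both strata as equal in size
weighted = weighted_mean(values, weights)  # recovers the population share
```

In this toy case the unweighted mean is 0.5, while the weighted mean is approximately 0.1, which is the true share of stratum A firms in the universe (100 out of 1,000).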
Special care was given to the correct computation of the weights. It was imperative to accurately adjust the totals within each region/industry/size stratum to account for the presence of ineligible units (the firm had discontinued business or could not be reached, education or government establishments, no reply after calls on different days of the week and at different business hours, no tone on the phone line, answering machine, fax line, or wrong address or moved away with no new contact details available). The information required for the adjustment was collected in the first stage of the implementation: the screening process. Using this information, each stratum cell of the universe was scaled down by the observed proportion of ineligible units within the cell. Once an accurate estimate of the universe cell (projections) was available, weights were computed using the number of completed interviews.
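The adjustment described above can be sketched for a single stratum cell. The text does not give the exact formula, so this example assumes the common form in which the weight is the adjusted universe count divided by the number of completed interviews; the function and parameter names are hypothetical.

```python
def adjusted_universe(frame_count, screened, ineligible):
    """Scale a cell's frame count down by the observed ineligible proportion."""
    if screened <= 0:
        raise ValueError("at least one unit must have been screened")
    if not 0 <= ineligible <= screened:
        raise ValueError("ineligible must be between 0 and screened")
    eligible_share = (screened - ineligible) / screened
    return frame_count * eligible_share


def cell_weight(frame_count, screened, ineligible, completed):
    """Assumed form: adjusted universe count / completed interviews."""
    if completed <= 0:
        raise ValueError("a cell needs at least one completed interview")
    return adjusted_universe(frame_count, screened, ineligible) / completed


# Hypothetical cell: 200 firms in the frame, 50 screened, 10 found ineligible,
# 20 interviews completed.
projected = adjusted_universe(200, 50, 10)   # 200 scaled by 40/50
weight = cell_weight(200, 50, 10, 20)        # each interview stands for this many firms
```

In this illustration the cell is projected to contain about 160 eligible firms, so each of the 20 completed interviews represents roughly 8 firms.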
Data Collection Notes
Asies was the main contractor that implemented the Guatemala 2017 ES.
The sample frame consisted of listings of firms from two sources. For panel firms, the list of 565 firms from the Guatemala 2010 ES was used; for fresh firms (i.e., firms not covered in 2010), firm data from Directorio Nacional de Empresas y sus Locales (DINEL), Banco de Guatemala (Banguat), was used.
The structure of the database reflects the fact that 2 different versions of the survey instrument were used for all registered establishments. The questionnaires share common questions (core module) and add manufacturing- and services-specific questions, respectively. The eligible manufacturing industries have been surveyed using the Manufacturing questionnaire (includes the core module, plus manufacturing-specific questions). Retail firms have been interviewed using the Services questionnaire (includes the core module plus retail-specific questions), and the residual eligible services have been covered using the Services questionnaire (includes the core module). Each variation of the questionnaire is identified by the index variable, a0.
All variables are named using, first, the letter of each section and, second, the number of the variable within the section; e.g., a1 denotes section A, question 1 (some exceptions apply for comparability reasons). Variable names preceded by the prefix "LAC" indicate questions specific to Guatemala and other countries in Latin America 2016; they may therefore not be found in the implementation of the rollout in other countries. All other suffixed variables are global and are present in all country surveys around the world. All variables are numeric with the exception of those variables with an "x" at the end of their names. The suffix "x" denotes that the variable is alpha-numeric.
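The naming convention lends itself to a small parser. The sketch below is illustrative only: the function name and the returned fields are hypothetical, and the example name "LACe1" is invented to show the prefix rule rather than taken from the dataset. Names that do not follow the pattern (the exceptions mentioned above, such as the identifiers idstd and id) return None.

```python
import re

# Section letter, question number, then any trailing sub-item characters.
_NAME_PATTERN = re.compile(r"^(?P<section>[a-z])(?P<number>\d+)(?P<rest>[a-z0-9_]*)$")


def parse_variable_name(name):
    """Split a variable name into section, question number, and flags.

    Returns None for names that do not follow the section-letter-plus-number
    convention.
    """
    regional = name.startswith("LAC")          # region-specific question
    core = name[3:] if regional else name
    match = _NAME_PATTERN.match(core)
    if match is None:
        return None
    return {
        "section": match.group("section").upper(),
        "question": int(match.group("number")),
        "sub_item": match.group("rest"),
        "regional": regional,
        "alphanumeric": core.endswith("x"),    # "x" suffix marks text variables
    }
```

For instance, `parse_variable_name("a1")` reports section "A", question 1, with both flags false, while a name ending in "x" is flagged as alpha-numeric.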
There are 2 establishment identifiers, idstd and id. The first is a global unique identifier. The second is a country unique identifier. The variables a2 (sampling region), a6a (sampling establishment's size), and a4a (sampling sector) contain the establishment's classification into the strata chosen for each country using information from the sample frame. The strata were defined according to the guidelines described above.
There are three levels of stratification: industry, size and region. Different combinations of these variables generate the strata cells for each industry/region/size combination. A distinction should be made between the variable a4a and d1a2 (industry expressed as ISIC rev. 3.1 code). The former gives the establishment's classification into one of the chosen industry-strata based on the sample frame, whereas the latter gives the establishment's actual industry classification (four digit code) based on the main activity at the time of the survey.
All of the following variables contain information from the sampling frame. They may not coincide with the reality of individual establishments as sample frames may contain inaccurate or outdated information. The variables containing the sample frame information are included in the data set for researchers who may want to further investigate statistical features of the survey and the effect of the survey design on their results.
- a2: the variable describing sampling regions.
- a6a: coded using the same standard for small, medium, and large establishments as defined above.
- a4a: coded following the stratification by sector as defined above.
The surveys were implemented following a two-stage procedure. Typically, a screener questionnaire is first applied over the phone to determine eligibility and to make appointments. Then a face-to-face interview takes place with the Manager/Owner/Director of each establishment. However, phone numbers were sometimes unavailable in the sample frame, in which case the enumerators applied the screeners in person. The variables a4b and a6b contain the industry and size of the establishment from the screener questionnaire.
Note that there are variables for size (l1, l6 and l8) that reflect more accurately the reality of each establishment. Advanced users are advised to use these variables for analytical purposes. Variables l1 (number of permanent full-time workers at the end of the last complete fiscal year), l6 (number of full-time seasonal workers employed during last complete fiscal year) and l8 (average length of employment of full-time temporary employees during last complete fiscal year) were designed to obtain a more accurate measure of employment accounting for permanent and temporary employment. Special efforts were made to make sure that this information was not missing for most establishments.
The last complete fiscal year is January to December 2016. For questions pertaining to monetary amounts, the unit is the Guatemalan Quetzal.
Enterprise Analysis Unit
Confidentiality of the survey respondents and the sensitive information they provide is necessary to ensure the greatest degree of survey participation, integrity and confidence in the quality of the data. Surveys are usually carried out in cooperation with business organizations and government agencies promoting job creation and economic growth, but confidentiality is never compromised.
Use of the dataset must be acknowledged using a citation that includes:
- the identification of the Primary Investigator
- the title of the survey (including country, acronym and year of implementation)
- the survey reference number
- the source and date of download
The World Bank. Guatemala - Enterprise Survey (ES) 2017, Ref. GTM_2017_ES_v01_M. Dataset downloaded from [URL] on [date].
The user of the data acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.
DDI Document ID
Development Data Group
The World Bank
Documentation of the DDI
Date of Metadata Production
DDI Document version
Version 01 (November 2018)