Electricity Transmission and Distribution Systems Rehabilitation & Extension 2011
Mathematica's evaluation will assess the impacts of the T&D activity and the financing scheme initiative on communities and households using quantitative data from the community and household surveys. Mathematica will investigate whether the compact activities led to improved economic, educational, and health outcomes at the household and community levels. A case study of the relationship between the T&D activity and financial outcomes among enterprises in a subset of communities in the Tanga region will be conducted using data from the enterprise survey.
For the T&D evaluation, Mathematica is using a difference-in-differences design. For the financing scheme evaluation, a cluster random assignment design (with communities as the unit of assignment) is being used.
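As an illustration of the difference-in-differences logic named above, the following is a minimal sketch and not the evaluation's actual estimation code. The function name, the record layout, and all outcome values are hypothetical. It computes the change in the mean outcome for the intervention group between baseline and follow-up, and subtracts the corresponding change for the comparison group.

```python
from statistics import mean


def did_estimate(rows):
    """Simple difference-in-differences on group means.

    rows: iterable of (group, period, outcome), where group is
    "intervention" or "comparison" and period is "baseline" or "followup".
    """
    cells = {}
    for group, period, outcome in rows:
        cells.setdefault((group, period), []).append(outcome)
    means = {key: mean(values) for key, values in cells.items()}

    change_intervention = (
        means[("intervention", "followup")] - means[("intervention", "baseline")]
    )
    change_comparison = (
        means[("comparison", "followup")] - means[("comparison", "baseline")]
    )
    return change_intervention - change_comparison


# Hypothetical outcome values, for illustration only.
rows = [
    ("intervention", "baseline", 10.0), ("intervention", "baseline", 12.0),
    ("intervention", "followup", 15.0), ("intervention", "followup", 17.0),
    ("comparison", "baseline", 9.0), ("comparison", "baseline", 11.0),
    ("comparison", "followup", 11.0), ("comparison", "followup", 13.0),
]
estimate = did_estimate(rows)
print(estimate)
```

In this made-up example the intervention group's mean rises by 5 and the comparison group's by 2, so the estimate is 3. An actual analysis would typically use a regression with covariates, survey weights, and standard errors clustered by community.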
The National Rural Electric Cooperative Association (NRECA) International, a U.S.-based firm, conducted the baseline community, household and enterprise surveys in 2011. At baseline, NRECA surveyed 725 communities, 10,298 households, and 59 enterprises. Follow-up surveys are expected to be carried out during the summer and fall of 2015.
The data files provided here include the cleaned, anonymized data from each survey, as well as files containing the variables constructed for analysis.
Kind of data
Sample survey data [ssd]
Anonymized dataset for public distribution
Unit of analysis
Community, household, and enterprise
The community and household surveys target villages (rural) and mitaa (urban), and households, respectively, in the intervention and comparison communities in six regions of the country: Dodoma, Iringa, Mbeya, Morogoro, Mwanza, and Tanga. Households eligible for the survey included only those that were neither already connected to the national grid nor within 30 meters of existing power lines. The enterprise survey focuses on standalone enterprises (non-household-based businesses) of any size in eight communities in the Tanga region.
Producers and sponsors
Mathematica Policy Research, Inc.
Millennium Challenge Corporation
The primary sampling unit (PSU) for the community survey was a village (kijiji) in rural areas and a mtaa in urban areas. These are the smallest administrative units for which it was possible to develop a sampling frame. To select communities in which to carry out the baseline surveys, Mathematica obtained a list from MCA-T of 337 communities in which T&D activities were planned. A random sample of 182 communities was selected from this list. Then, using existing data from the Census and other sources on more than 6,100 communities, Mathematica identified 546 potential comparison communities through propensity score matching. The potential comparison communities were chosen from among all of the non-intervention communities in the same region. The community survey was then carried out in the 182 intervention and 546 potential comparison communities, and data from the community survey were used to select the 182 matched comparison communities through another round of propensity score matching. The household survey was then carried out in the 182 intervention and 182 matched comparison communities.
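The matching step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure Mathematica used: it assumes propensity scores have already been estimated, and it uses greedy nearest-neighbor matching without replacement, restricted to communities in the same region. The function name, the parameter k, the community identifiers, and the scores are all hypothetical.

```python
def match_within_region(intervention, pool, k=1):
    """Greedy nearest-neighbor matching on a precomputed propensity score.

    intervention, pool: lists of (community_id, region, score).
    Returns {intervention_id: [matched pool ids]}. Each pool community
    is used at most once (matching without replacement).
    """
    available = {cid: (region, score) for cid, region, score in pool}
    matches = {}
    for cid, region, score in intervention:
        # Rank same-region candidates by distance in propensity score.
        candidates = sorted(
            (abs(pool_score - score), pool_id)
            for pool_id, (pool_region, pool_score) in available.items()
            if pool_region == region
        )
        chosen = [pool_id for _, pool_id in candidates[:k]]
        for pool_id in chosen:
            del available[pool_id]
        matches[cid] = chosen
    return matches


# Hypothetical communities and scores, for illustration only.
intervention = [("T1", "Tanga", 0.62), ("T2", "Dodoma", 0.40)]
pool = [
    ("C1", "Tanga", 0.60), ("C2", "Tanga", 0.90), ("C5", "Tanga", 0.65),
    ("C3", "Dodoma", 0.35), ("C4", "Dodoma", 0.42),
]
matches = match_within_region(intervention, pool, k=2)
print(matches)
```

Greedy matching depends on the order in which intervention communities are processed. Other matching algorithms, calipers, or matching with replacement are equally plausible choices, and the source does not say which was applied.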
For the household survey, we used a mtaa as the primary sampling unit (PSU) in urban areas, and a village (kijiji) or a sub-village (kitongoji) as the PSU in the rural areas. To select households for the survey, all households in the selected communities were listed, excluding households already connected to power lines or within about 30 meters of existing lines. Households in the intervention group were sampled based on approximate eligibility for a subsidy pilot intervention that was supposed to target economically more disadvantaged households (later replaced by the financing scheme without targeting). Approximate pilot-eligibility was based on whether or not the household appeared to have two or fewer rooms. The survey team made this determination during the household listing process in the intervention areas. They then oversampled those households so that 40 percent of the resulting sample qualified, compared to 25 percent in the sampling frame. Overall, a total of 10,298 households were interviewed, with 4,767 households in the intervention group and 5,531 households in the comparison group.
The enterprise survey collected data from 59 enterprises from seven intervention and seven comparison communities that were randomly selected from the communities sampled for the evaluation in the Tanga region. All standalone enterprises (enterprises not located within the premises of a residence) were sampled irrespective of the number of employees, and the survey was conducted with enterprises that are currently not connected to the national grid as well as those that are connected.
Deviations from sample design
The target sample size for the enterprise survey was 32 enterprises in eight intervention communities, and another 32 enterprises in eight comparison communities. However, one intervention community was dropped because it no longer received new lines under the T&D activity, and one comparison community was dropped because no eligible enterprises were identified there. Also, we originally planned to survey only stand-alone businesses that do not already have access to the national grid and that have five or more employees. However, when all stand-alone businesses in the selected intervention communities were listed, we found that there are relatively few of them in these communities, and almost all of them already have access to the national grid. Subsequently, the evaluation team, in consultation with MCC, MCA-Tanzania, and NRECA, decided to sample businesses that are currently connected to the national grid as well as those that are not connected, and not to impose any restriction on the number of employees in the business.
The response rates for the baseline community survey were 100 percent for the intervention group, and 99.5 percent for the comparison group. For the household survey, the response rates were 91 percent overall, with 81.9 percent among the intervention households, and 95.0 percent among the comparison households. The response rates for the enterprise survey were 100 percent for the intervention group, and 84.4 percent for the comparison group.
For our intervention group, we created weights to adjust for sampling and survey nonresponse. Households in the intervention group were sampled based on approximate eligibility for a subsidy pilot intervention. The pilot-eligible households were oversampled so that 40 percent of the resulting sample would qualify for the subsidy pilot, compared to 25 percent in the sampling frame. The sampling weights for the intervention group households were calculated as the inverse of the probability of being sampled. We then adjusted these sampling weights for nonresponse using 18 nonresponse categories. These categories were based on region and total migration (in-migration plus out-migration as reported in the community survey). We also created weights for the comparison group that account for nonresponse by community but not for sampling, since all households were sampled with equal probability within a community. The household weight variable that accounts for sampling and survey nonresponse is called FWT.
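To make the weighting logic concrete, the following is a minimal sketch for a single hypothetical community of 100 listed households. It is not the code used to construct FWT. The sampling weight is the inverse of the selection probability within each stratum (pilot-eligible or not). The nonresponse step uses a common weighting-class adjustment, which is an assumption here; the source states only that 18 categories based on region and migration were used. All counts and function names are hypothetical.

```python
def sampling_weight(n_frame, n_sampled):
    """Inverse of the selection probability n_sampled / n_frame."""
    return n_frame / n_sampled


def nonresponse_factor(weights, responded):
    """Weighting-class adjustment: total sampled weight / responding weight."""
    total = sum(weights)
    responding = sum(w for w, r in zip(weights, responded) if r)
    return total / responding


# Hypothetical frame: 25 pilot-eligible and 75 other households (25 percent eligible).
# Hypothetical sample: 8 pilot-eligible and 12 other households (40 percent eligible).
w_eligible = sampling_weight(25, 8)   # 3.125
w_other = sampling_weight(75, 12)     # 6.25

weights = [w_eligible] * 8 + [w_other] * 12
# Hypothetical response pattern: 1 eligible and 2 other households did not respond.
responded = [True] * 7 + [False] + [True] * 10 + [False] * 2

factor = nonresponse_factor(weights, responded)
final_weights = [w * factor for w, r in zip(weights, responded) if r]
print(round(sum(final_weights), 6))
```

After the adjustment, the respondents' weights again sum to the 100 households in the frame, and the oversampled pilot-eligible households carry smaller weights than the others, which restores their 25 percent share in weighted estimates.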
To account for the household-level matching between the intervention and comparison groups, we created another weight variable called MATCHWT. Use of these weights would make the estimates of household outcomes representative of communities where large fractions of households are receiving the new T&D lines. Details on the weight variables are available in the T&D baseline report (Chaplin, Mamun and Schurrer, 2012).
Dates of collection
Data collection supervision
NRECA formed a 14-member core survey team consisting of a Project Director, a Team Leader, a Demographic Expert, a Field Coordinator, a Data Entry Specialist, five Field Supervisors, and four Data Entry Clerks. Forty field enumerators were hired and trained before the field survey program was initiated. Five data collection teams consisting of 8 to 11 trained interviewers and a field supervisor were deployed to the six regions for data collection. Field supervisors were responsible for coordinating with community leaders to facilitate listing and data collection activities in each community, for randomly selecting households and enterprises from the listings, and for ensuring the quality of data collected. Field supervisors reported to a field coordinator who oversaw the work of all five data collection teams and worked with the team leader to ensure that activities conformed to protocols provided by Mathematica.
Community, household, and enterprise questionnaires
Data cleaning was carried out by NRECA at their office in Dar es Salaam and entailed a pre-specified set of checks, including checks for out-of-range values, outliers, and invalid values for categorical variables (codes for education level, type of fuels used, etc.), using the data-entry program and a specialized statistical package such as SPSS. The list of invalid entries was printed and examined for correction. Data analysts then checked for logical consistency, skip patterns, missing values, and inapplicable answers in all records in the file. Finally, a preliminary tabulation of every variable in the data file (that is, frequency counts of all categorical variables and descriptive analysis [mean, median, minimum, maximum, total number of cases, number of missing cases, and number of non-applicable cases] of all continuous variables) was produced. Printouts of these results were carefully checked and questionable results were marked for correction prior to delivery of the final dataset to MCA-T.
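The kinds of checks described above can be illustrated with a short sketch. This is not NRECA's cleaning program, which relied on the data-entry software and a statistical package. The field names, valid code set, and age range below are hypothetical, chosen only to show a range check, a categorical code check, and a frequency tabulation.

```python
from collections import Counter

# Hypothetical codebook entries, for illustration only.
VALID_EDUCATION_CODES = {1, 2, 3, 4}
AGE_RANGE = (0, 110)


def find_invalid(records):
    """Return (record index, field, value) for out-of-range or invalid codes."""
    problems = []
    for index, record in enumerate(records):
        age = record.get("age")
        if age is not None and not (AGE_RANGE[0] <= age <= AGE_RANGE[1]):
            problems.append((index, "age", age))
        education = record.get("education")
        if education is not None and education not in VALID_EDUCATION_CODES:
            problems.append((index, "education", education))
    return problems


def tabulate(records, field):
    """Frequency count of one field, counting missing values under None."""
    return Counter(record.get(field) for record in records)


# Hypothetical records, for illustration only.
records = [
    {"age": 34, "education": 2},
    {"age": 140, "education": 3},   # age outside the allowed range
    {"age": 27, "education": 9},    # education code not in the codebook
    {"age": None, "education": 2},  # missing age is tabulated, not flagged
]
problems = find_invalid(records)
education_counts = tabulate(records, "education")
print(problems)
print(education_counts)
```

Flagged entries would then be compared against the paper questionnaires before any correction, in line with the review of printed lists described above.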
Hard copy questionnaires completed in the field were delivered to Dar es Salaam each week for data entry on desktop computers. The NRECA team leader and data entry supervisor were responsible for checking the accuracy and consistency of the collected data and preparing the database for analysis. Questionnaires were checked for completeness and consistency by the data entry supervisor prior to entry, and were entered by trained operators using CSPro. Double entry was employed for 100% of questionnaires to ensure quality.
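The idea behind double entry is that each questionnaire is keyed twice and the two versions are compared field by field, with any disagreement resolved against the paper form. The sketch below illustrates that comparison in general terms. It does not reproduce CSPro's own verification feature, and the function name, questionnaire identifiers, and field names are hypothetical.

```python
def compare_entries(first_pass, second_pass):
    """Compare two keyed versions of the same questionnaires.

    Each pass maps questionnaire_id -> {field: value}.
    Returns a list of (questionnaire_id, field, first_value, second_value)
    for every field where the two passes disagree.
    """
    mismatches = []
    for qid in sorted(set(first_pass) | set(second_pass)):
        first = first_pass.get(qid, {})
        second = second_pass.get(qid, {})
        for field in sorted(set(first) | set(second)):
            if first.get(field) != second.get(field):
                mismatches.append((qid, field, first.get(field), second.get(field)))
    return mismatches


# Hypothetical keyed data, for illustration only.
first_pass = {
    "HH001": {"rooms": 2, "region": "Tanga"},
    "HH002": {"rooms": 3, "region": "Mbeya"},
}
second_pass = {
    "HH001": {"rooms": 2, "region": "Tanga"},
    "HH002": {"rooms": 8, "region": "Mbeya"},
}
mismatches = compare_entries(first_pass, second_pass)
print(mismatches)
```

Here the two passes disagree on the number of rooms for one questionnaire, so that single field would be sent back for checking against the hard copy.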
Mamun, Arif, Duncan Chaplin, Kathy Buek, John Schurrer, Ebo Dawson-Andoh, and Xiaofan Sun. "Evaluation of the Millennium Challenge Corporation’s Electricity-Transmission and Distribution Line-Extension Activity in Tanzania: Baseline Community and Household Survey Data." Data submitted to the Millennium Challenge Corporation. Washington, DC: Mathematica Policy Research, January 24, 2014.