The 2014/2015 Tanzania National Panel Survey (NPS) is the fourth round in a series of nationally representative household panel surveys that collect information on a wide range of topics including agricultural production, non-farm income generating activities, consumption expenditures, and a wealth of other socioeconomic characteristics. All four rounds of the NPS have been implemented by the Tanzania National Bureau of Statistics (NBS). The first round of the survey was conducted over twelve months, from October 2008 to September 2009. The main fieldwork of the second round of the NPS started in October 2010 and finished in September 2011, with specialized tracking teams remaining in the field until November 2011. Similarly, fieldwork for the third round of the NPS ran from October 2012 to November 2013. Fieldwork for the fourth round started in October 2014 and lasted until January 2016.
The main objective of the NPS is to provide high-quality household-level data to the Tanzanian government and other stakeholders for monitoring poverty dynamics, tracking the progress of the Five Year Development Plan (FYDP) II poverty reduction strategy and its predecessor plans, and evaluating the impact of other major, national-level government policy initiatives. As an integrated survey covering a number of different socioeconomic factors, it complements other more narrowly focused survey efforts, such as the Demographic and Health Survey (DHS) on health, the Integrated Labour Force Survey (ILFS) on labour markets, the Household Budget Survey (HBS) on expenditure, and the National Sample Census of Agriculture (NSCA). In addition, as a panel household survey in which the same households are revisited over time, the NPS allows for the study of poverty and welfare transitions and the determinants of living standard changes.
Kind of Data
Sample survey data [ssd]
Unit of Analysis
v02: additional information in datasets com_sec_a1a2; com_sec_cb; com_sec_cc; com_sec_cd; com_sec_ce; com_sec_cf; com_sec_cf_id and com_sec_cg
The 2014-15 NPS covers the following topics:
- HOUSEHOLD: Household identification; Survey staff details; Household member roster; Education; Health; Labour; Food outside the household; Subjective welfare; Food security; Housing, water and sanitation; Consumption of food over the past one week; Non-food expenditures (past one week & one month); Non-food expenditures (past twelve months); Household assets; Family/household non-farm enterprises; Assistance and groups; Credit; Finance; Recent shocks to household welfare; Deaths in the household; Household recontact information; Filter questions; Anthropometry.
- AGRICULTURE: Household roster; Plot roster; Plot details; Crops by plot; Crops - Household totals (production and sales); Permanent crops by plot; Permanent crops - Household totals (production and sales); Input vouchers; Outgrower schemes and contract farming; Processed agricultural products and agricultural by-products; Farm implements and machinery extension; Extension.
- LIVESTOCK AND FISHERY: Household member roster; Livestock stock; Animal health; Feed, water, housing, breeding; Livestock - Labour; Milk; Animal power & dung; Other livestock products; Fishery - Household labour; Fishery - Hired labour; Fishing inputs; Fisheries output; Fish trading.
- COMMUNITY: Community identification; Survey staff details; Access to basic services; Investment projects; Land use; Demographics, land & livestock; Market prices; Local units.
Designed for analysis of key indicators at four primary domains of inference, namely:
Dar es Salaam, other urban areas on mainland Tanzania, rural mainland Tanzania, and Zanzibar.
The universe includes all households and individuals in Tanzania with the exception of those residing in military barracks or other institutions.
Producers and sponsors
National Bureau of Statistics
Ministry of Finance and Planning
World Bank Living Standard Measurement Study Team
Bill and Melinda Gates Foundation
NPS Technical Committee
The NPS sample was refreshed for NPS 2014/2015. Longitudinal surveys tend to suffer from bias introduced by households leaving the survey over time (i.e. attrition). Although the NPS maintains a highly successful recapture rate (roughly 96% retention at the household level), which limits the escalation of this selection bias, a refresh of longitudinal cohorts is typically done to ensure proper representativeness of estimates while retaining a sufficient primary sample to preserve cohesion within panel analysis. Additionally, refreshing a longitudinal sample realigns it with any changes in administrative boundaries, demographic shifts, or updated population information. In the case of Tanzania, the newly completed Population and Housing Census (PHC) of 2012, which provided updated population figures, together with changes in administrative boundaries, presented an opportunity to realign the NPS sample.
Similar to the sample in NPS 2008/2009, the sample design for the “Refresh Panel” allows analysis at four primary domains of inference, namely: Dar es Salaam, other urban areas on mainland Tanzania, rural mainland Tanzania, and Zanzibar.
The sample design is a stratified two-stage design. The design consists of 51 design strata (identified in the data as ‘strataid’) corresponding to a rural/urban designation for each of the 26 regions; however, Dar es Salaam is pure urban and therefore constitutes only one stratum. The allocation across the design strata was informed by the last round of the NPS and seeks to balance multiple survey objectives and maximize precision given survey parameters. The intended sample design consisted of a new selection of 3,360 households corresponding to 420 EAs from the latest PHC in 2012. This new cohort in NPS 2014/2015 will be maintained and tracked in all future rounds between national censuses.
A nationally representative sub-sample was selected to continue as part of an “Extended Panel”. This “Extended Panel” allowed general comparison of sample groups and monitoring indicator comparability. The “Extended Panel” is not included in the initial NPS 2014/2015 data release.
Deviations from the Sample Design
During the survey, one selected EA was demolished and subsequently not interviewed. The resulting sample consists of 3,352 households across 419 EAs.
The NPSY4 “Refresh Panel” sample was a stratified two-stage sample design. The sample was stratified along two dimensions: (i) 26 regions, and (ii) rural/urban designation within each region. The combination of these two dimensions yields 51 independent strata. The first stage of sampling involved the selection of survey clusters with the probability of selection proportional to cluster size within a stratum. Following a listing exercise, eight households were selected with systematic random selection. Additionally, three households were randomly selected within each EA in case of possible household non-response.
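The first-stage selection described above can be illustrated with a minimal sketch. It assumes a systematic probability-proportional-to-size procedure, which is one common way to implement PPS selection; the source does not state which PPS algorithm was actually used. The function name, the cluster sizes, and the seed are hypothetical.

```python
import random


def pps_systematic(sizes, n_select, seed=None):
    """Pick n_select cluster indices with probability proportional to size.

    Assumption: systematic PPS (random start, fixed step along the
    cumulated sizes). A cluster larger than the step could be picked
    more than once; real designs treat such units as certainty selections.
    """
    rng = random.Random(seed)
    total = sum(sizes)
    step = total / n_select           # sampling interval on the size scale
    start = rng.uniform(0, step)      # random start within the first interval
    selected = []
    cumulative = 0.0                  # combined size of clusters before idx
    idx = 0
    for k in range(n_select):
        target = start + k * step
        # Advance until the target falls inside cluster idx's size range.
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= target:
            cumulative += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected


# Hypothetical household counts for eight clusters in one stratum.
cluster_sizes = [120, 80, 200, 150, 90, 60, 300, 100]
print(pps_systematic(cluster_sizes, n_select=3, seed=42))
```

Under this scheme a cluster's chance of selection is n_select times its size divided by the stratum total, so larger clusters are selected more often.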
The expansion factors are Winsorized at the top 1 percent and post-stratified to 2015 regional household projections. The NPS 2014/2015 household cluster weight, variable “y4_weight”, has been integrated into Section A (“HH_SEC_A”) of the household data files. Additionally, unique identifiers for the first-stage sampling units, “clusterid”, and for the sampling strata, “strataid”, can also be located in Section A of the household data files. The complex sample design must be taken into account to ensure proper calculation of standard errors.
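As an illustration of what taking the design into account involves, the sketch below computes a weighted mean and its standard error from the stratum, cluster, and weight identifiers named above. It assumes a Taylor-linearized variance estimator with first-stage units treated as sampled with replacement and no finite population correction; this is a common default in survey software, not a method confirmed by the source. The function name and the toy records are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean_se(rows):
    """rows: (strataid, clusterid, y4_weight, y) tuples, one per household.

    Returns (weighted mean, linearized standard error).
    """
    rows = list(rows)
    total_w = sum(w for _, _, w, _ in rows)
    mean = sum(w * y for _, _, w, y in rows) / total_w

    # Linearized score of the ratio estimator, summed to the cluster level.
    scores = defaultdict(lambda: defaultdict(float))
    for stratum, cluster, w, y in rows:
        scores[stratum][cluster] += w * (y - mean) / total_w

    variance = 0.0
    for stratum, clusters in scores.items():
        n_h = len(clusters)
        if n_h < 2:
            raise ValueError(f"stratum {stratum} has a single cluster")
        z_bar = sum(clusters.values()) / n_h
        # Between-cluster variation within the stratum.
        variance += n_h / (n_h - 1) * sum(
            (z - z_bar) ** 2 for z in clusters.values()
        )
    return mean, sqrt(variance)


# Hypothetical records: two strata, two clusters each.
toy = [
    (1, 101, 2.0, 10.0), (1, 101, 2.0, 14.0),
    (1, 102, 2.0, 20.0), (1, 102, 2.0, 16.0),
    (2, 201, 1.0, 30.0), (2, 202, 1.0, 34.0),
]
mean, se = weighted_mean_se(toy)
print(round(mean, 3), round(se, 3))
```

Ignoring the clustering and treating the households as independent draws would generally understate the standard error, which is why the identifiers are distributed with the data.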
Dates of Data Collection
Data Collection Mode
Data Collection Notes
The field team supervisors were trained for four days prior to the main enumerator training. The field staff was trained in Morogoro in September 2014 over a period of three weeks with enumerator and data entry training done concurrently. During a standard training week, four days were spent in the classroom, and one day in field training. On each Saturday of the training month, the field staff was debriefed on the previous day’s field exercise and what they had learned over the previous week. Over the three-week training period, the field staff spent one week on the Household Questionnaire, and a week and a half on the Agricultural Questionnaire, Livestock/Fishery Questionnaire, and tracking. The last three days of the training were devoted to field practice. Select households from an MCAT survey conducted in 2010 were revisited to provide the team supervisors practice with conducting tracking during fieldwork. After the pilots, extensive discussion and revisions were conducted with the participation of all team supervisors.
Over the training period, three tests were administered to the field teams. The goal was to gain feedback from the training sessions and to select the enumerators. Overall, there were 55 enumerator candidates, with 48 being selected. Interviewer manuals were developed with detailed instructions for field staff during training and as the main reference guide for the survey over the course of the fieldwork. At the end of the training, the enumerators were each provided with an interviewer manual in Kiswahili.
The main data collection began in October 2014 and finished in October 2015, with tracking fieldwork continuing until the end of January 2016. The survey was primarily implemented by eight mobile field teams, each composed of: one supervisor, five or six enumerators, one data entry technician, and one driver. Seven mobile field teams were responsible for different regions on the mainland and one team was responsible for all of Zanzibar.
Field teams visited each cluster for three to four days. The questionnaires were administered to the selected households over the course of that time. This allowed the field team to make return visits to the household to complete the entire Household Questionnaire, Agriculture Questionnaire for farming households, and Livestock & Fisheries Questionnaire for households engaged in livestock or fisheries activities. To ensure the depth and quality of each section of the survey, the questionnaire was administered to multiple respondents, with each section answered by the household member most knowledgeable about the topic. For all of the sampled households, areas of all owned and/or cultivated agricultural plots were measured via GPS unless the household refused, the terrain was too difficult, or the plot was more than one hour from the location of the household. Anthropometric measurements were taken for all individuals who were at home, not too ill, and willing to participate.
When the field teams entered a new cluster, they listed all of the households within the boundaries of the EA. This consisted of collecting basic information on the households in the EA, including name of head of household, contact information, and size of household. After all the households in the EA had been listed, the information was then entered into a data entry program in CSPro. Total listing household counts were compared with previous census counts and, when significant variation existed, listing accuracy was confirmed. After all the information had been entered, the application would then select with systematic random selection and report eight households in the EA to be interviewed by the team. The application additionally provided three randomly selected replacement households.
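The household selection step can be sketched as follows. This is an illustrative sketch only, not the CSPro application's actual logic: it assumes a fractional-interval systematic selection with a random start for the eight main households, and assumes the three replacements are drawn by simple random sampling from the households not already selected. The function name and the listing are hypothetical.

```python
import random


def select_households(listing, n_main=8, n_replacement=3, seed=None):
    """Return (main, replacements) drawn from an EA household listing."""
    rng = random.Random(seed)
    if len(listing) < n_main + n_replacement:
        raise ValueError("listing too short for main plus replacement sample")

    # Systematic selection: random start, then every step-th listed household.
    step = len(listing) / n_main
    start = rng.uniform(0, step)
    main_idx = [min(int(start + k * step), len(listing) - 1)
                for k in range(n_main)]
    main = [listing[i] for i in main_idx]

    # Assumption: replacements are a simple random draw from the remainder.
    chosen = set(main_idx)
    remaining = [hh for i, hh in enumerate(listing) if i not in chosen]
    replacements = rng.sample(remaining, n_replacement)
    return main, replacements


# Hypothetical EA listing of 60 households, identified by listing number.
main, replacements = select_households(list(range(1, 61)), seed=1)
print(main)
print(replacements)
```

Because the step is at least one listing position, the eight main selections are always distinct and spread across the whole listing.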
Data Processing & Management:
The NPS 2014/2015 contains a robust, multi-level quality assurance and data management system. Great effort was placed on the development and utilization of this system by the NBS, with technical assistance from the World Bank, to assist in the management of the complex household panel survey and address the growing need for high quality timely data.
The NPS utilizes a concurrent field entry system known as CAFE, or Computer Assisted Field Entry. This system was selected to increase the availability of data for review by managing staff as well as to provide regular and consistent quality assessment of data directly to the field staff. As with the earlier rounds, CSPro was used for data entry and initial quality reporting while STATA was utilized to perform complex aggregated checks. Building off of the work conducted for the NPS 2010/2011 and NPS 2012/2013, the NPS 2014/2015 data entry application further develops the quantity and complexity of data quality checking routines while simplifying reporting. Furthermore, due to the panel nature of the survey, where applicable and appropriate, data was checked against data from previous rounds.
As data entry took place while in the interview area, when data issues were identified and reported, the field teams would return to households and clarify and correct inconsistent information prior to the transmission of the data to headquarters. Data files from completed clusters were transmitted to NBS headquarters via syncing to a server using 3G USB modems. Received data files were concatenated at the headquarters, and regular checks were performed to ensure the fieldwork was proceeding according to the schedule and that quality standards were met. During the course of fieldwork, data was routinely checked at the aggregate level to identify any potential issues and, where identified, additional checks were integrated into the CAFE system.
Throughout the course of fieldwork, the field teams regularly sent the paper questionnaires back to the NBS headquarters for further processing. Once the paper questionnaires and data files for completed EAs were received at NBS headquarters, a double-entry procedure was implemented. Six data entry operators were hired by NBS to perform the second data entry for the paper questionnaires into the CSPro-based data entry system for all questionnaires administered. The values entered in the field-based data entry were compared with those from the headquarters-based data entry, and any discrepancies between the two were flagged for manual inspection of the physical questionnaire and corrected. The application of this third level of data consistency validation further allowed for the assessment of the quality of the entry work performed by both the field entry staff and the headquarters-based entry staff. Regular feedback was supplied to data entry staff, resulting in improved quality where needed and in overall efficiency.
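The comparison of the two entry passes can be pictured with a minimal sketch. It assumes each pass is held as a mapping from a (household identifier, variable name) pair to the keyed value; that layout, the function name, and the sample values are hypothetical and do not reflect the structure of the CSPro application.

```python
MISSING = None  # marker for a value present in one pass but not the other


def compare_entries(field_entry, hq_entry):
    """Return a list of (key, field_value, hq_value) for every mismatch."""
    flagged = []
    for key in sorted(set(field_entry) | set(hq_entry)):
        field_value = field_entry.get(key, MISSING)
        hq_value = hq_entry.get(key, MISSING)
        if field_value != hq_value:
            # Flag for manual inspection against the paper questionnaire.
            flagged.append((key, field_value, hq_value))
    return flagged


# Hypothetical keyed values from the two passes.
field_pass = {("hh001", "hh_size"): 5, ("hh001", "head_age"): 42,
              ("hh002", "hh_size"): 3}
hq_pass = {("hh001", "hh_size"): 5, ("hh001", "head_age"): 24,
           ("hh002", "hh_size"): 3, ("hh002", "head_age"): 61}

for key, field_value, hq_value in compare_entries(field_pass, hq_pass):
    print(key, field_value, hq_value)
```

The sketch only flags discrepancies; deciding which value is correct still requires the physical questionnaire, as described above.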
Additional data cleaning was conducted as the final stage of the data processing. Further adjustment of the data post-entry was conducted under the principle of absolute certainty where adjustments must be evidence-based and correction values true beyond a reasonable doubt. As such, the resulting final data files may still contain some inconsistencies and outliers. Handling of these values is thus left entirely to the data user.
Throughout the data processing system, versions of the data are archived at all key steps and all checking and cleaning syntax documented and archived.
Estimates of Sampling Error
The sample of households selected in the NPS 2014/2015 is only one of many samples that could have been selected from the same population. Each alternative sample would yield results slightly different from those of the selected sample. Sampling errors are a measure of the variability between all possible samples and, although the degree of variability cannot be directly observed, it can be estimated from the survey results and statistically evaluated. A sampling error can be measured in terms of the standard error for a particular statistic.
Sampling errors for the NPS 2014/2015 were calculated in STATA using estat effects. In addition to the standard error, STATA computed the design effect (DEFF) for each estimate, which is defined as the ratio between the sampling variance under the given sample design and the variance that would result if a simple random sample had been used; the corresponding ratio of standard errors is its square root. A DEFF value of 1.0 indicates that the sample design is as efficient as a simple random sample, while a value greater than 1.0 indicates an increase in the sampling error due to the use of a more complex and less statistically efficient (but perhaps more logistically efficient) design. STATA also computed the relative error and confidence limits for the estimates.
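A small sketch of the design effect calculation follows. It uses the convention applied by STATA's estat effects, in which DEFF is the ratio of variances and DEFT, its square root, is the ratio of standard errors. The function name and the input values are hypothetical, and the two standard errors are assumed to have been estimated elsewhere.

```python
def design_effects(se_design, se_srs):
    """Return (DEFF, DEFT) from the design-based and SRS standard errors.

    Both are None when the simple random sample standard error is zero,
    in which case the design effect is undefined.
    """
    if se_srs == 0:
        return None, None
    deft = se_design / se_srs   # ratio of standard errors
    deff = deft ** 2            # ratio of variances
    return deff, deft


# Hypothetical standard errors for one indicator.
deff, deft = design_effects(se_design=0.03, se_srs=0.02)
print(deff, deft)
```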
Sampling errors for the NPS 2014/2015 are calculated for selected variables considered to be of primary interest at the household and individual levels. The results are presented in the BID Appendix A at the national level and for each of the four primary domains of inference, namely: Dar es Salaam, other urban areas on mainland Tanzania, rural mainland Tanzania, and Zanzibar. For each variable of interest, the value of the statistic (R), its standard error (SE), the number of cases, the design effect (DEFF), the relative standard error (SE/R), and the 95 percent confidence limits (R±2SE) are provided in Tables 1-10 in the BID. The DEFF is considered undefined when the standard error in a simple random sample is zero (when the estimate is close to 0 or 1).
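The derived columns can be reproduced from an estimate and its standard error, as in this short sketch. The function name and the example values are hypothetical and are not taken from the BID tables.

```python
def summarize_estimate(r, se):
    """Return the relative standard error and the R ± 2SE confidence limits."""
    relative_se = se / r if r != 0 else None   # undefined for a zero estimate
    return {
        "relative_se": relative_se,
        "lower": r - 2 * se,
        "upper": r + 2 * se,
    }


# Hypothetical proportion of 0.40 with a standard error of 0.02.
print(summarize_estimate(0.40, 0.02))
```

The multiplier of 2 follows the R±2SE convention stated above rather than the 1.96 normal quantile.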
The Primary Data Investigator undertakes that no attempt will be made to identify any individual person, family, business, enterprise or organization. If such a unique disclosure is made inadvertently, no use will be made of the identity of any person or establishment discovered and full details will be reported to the NBS. The identification will not be revealed to any other person not included in the Data Access Agreement.
The dataset has been anonymized and is available as a Public Use Dataset. It is accessible to all for statistical and research purposes only, under the following terms and conditions:
1. The data and other materials will not be redistributed or sold to other individuals, institutions, or organizations without the written agreement of the National Bureau of Statistics, Tanzania.
2. The data will be used for statistical and scientific research purposes only. They will be used solely for reporting of aggregated information, and not for investigation of specific individuals or organizations.
3. No attempt will be made to re-identify respondents, and no use will be made of the identity of any person or establishment discovered inadvertently. Any such discovery would immediately be reported to the National Bureau of Statistics.
4. No attempt will be made to produce links among datasets provided by the NBS, or among data from the National Bureau of Statistics and other datasets that could identify individuals or organizations.
5. Any books, articles, conference papers, theses, dissertations, reports, or other publications that employ data obtained from the National Bureau of Statistics will cite the source of data in accordance with the Citation Requirement provided with each dataset.
6. An electronic copy of all reports and publications based on the requested data will be sent to the National Bureau of Statistics.
The original collector of the data, the National Bureau of Statistics, and the relevant funding agencies bear no responsibility for use of the data or for interpretations or inferences based upon such uses.
National Bureau of Statistics (NBS) [Tanzania]. 2015. National Panel Survey (NPS) - Wave 4, 2014 - 2015. Ref.TZA_2014_NPS-R4_v03_M. Downloaded from [URL] on [Date]. Dar es Salaam, Tanzania: NBS. (www.nbs.go.tz)
Disclaimer and copyrights
Although the data have been produced and processed from sources believed to be reliable, no warranty expressed or implied is made regarding accuracy, adequacy, completeness, legality, reliability or usefulness of any information. This disclaimer applies to both isolated and aggregate uses of the information. This data is provided on an "AS IS" basis. Changes may be periodically added to the information herein; these changes may or may not be incorporated in any new version of the publication(s). It is recommended that the user pay careful attention to the contents of any metadata associated with a file. If the user finds any errors or omissions, we encourage the user to report them to the data producer(s).
DDI Document ID
Development Economics Data Group
The World Bank
Documentation of the DDI
Date of Metadata Production
DDI Document version
Version 01 (July 2017)
Version 02 (April 2019): identical to v01 with revised datasets (com_sec_a1a2; com_sec_cb; com_sec_cc; com_sec_cd; com_sec_ce; com_sec_cf; com_sec_cf_id and com_sec_cg)
Version 03 (April 2019): identical to v02 with revised dataset (consumptionNPS4) - coding error fixed. The variable “area” is irrelevant and was subsequently removed from the consumption file.