The main objective of the NPS is to provide high-quality household-level data to the Tanzanian government and other stakeholders for monitoring poverty dynamics, tracking the progress of the Five Year Development Plan (FYDP) II poverty reduction strategy and its predecessor plans, and evaluating the impact of other major, national-level government policy initiatives. As an integrated survey covering a number of different socioeconomic factors, it complements other more narrowly focused survey efforts, such as the Demographic and Health Survey (DHS) on health, the Integrated Labour Force Survey (ILFS) on labour markets, the Household Budget Survey (HBS) on expenditure, and the National Sample Census of Agriculture (NSCA). In addition, as a panel household survey in which the same households are revisited over time, the NPS allows for the study of poverty and welfare transitions and the determinants of changes in living standards.
Kind of data
Sample survey data [ssd]
v02: additional information in datasets com_sec_a1a2; com_sec_cb; com_sec_cc; com_sec_cd; com_sec_ce; com_sec_cf; com_sec_cf_id and com_sec_cg
Designed for analysis of key indicators at four primary domains of inference, namely: Dar es Salaam, other urban areas on mainland Tanzania, rural mainland Tanzania, and Zanzibar.
Unit of analysis
The universe includes all households and individuals in Tanzania with the exception of those residing in military barracks or other institutions.
Producers and sponsors
National Bureau of Statistics
Ministry of Finance and Planning
World Bank Living Standard Measurement Study Team
Bill and Melinda Gates Foundation
NPS Technical Committee
The NPS sample was refreshed for NPS 2014/2015. Longitudinal surveys tend to suffer from bias introduced by households leaving the survey over time (i.e. attrition). Although the NPS maintains a high recapture rate (roughly 96% retention at the household level), which limits the growth of this selection bias, longitudinal cohorts are typically refreshed to ensure that estimates remain representative while retaining a primary sample large enough to preserve continuity in panel analysis. Additionally, refreshing a longitudinal sample realigns it with any changes in administrative boundaries, demographic shifts, or updated population information. In the case of Tanzania, the newly completed 2012 Population and Housing Census (PHC), which provided updated population figures, together with changes in administrative boundaries, presented an opportunity to realign the NPS sample.
Similar to the sample in NPS 2008/2009, the sample design for the “Refresh Panel” allows analysis at four primary domains of inference, namely: Dar es Salaam, other urban areas on mainland Tanzania, rural mainland Tanzania, and Zanzibar.
The sample design is a stratified two-stage design. The design consists of 51 design strata (identified in the data as ‘strataid’) corresponding to a rural/urban designation for each of the 26 regions; Dar es Salaam, however, is entirely urban and therefore constitutes only one stratum (26 × 2 − 1 = 51). The allocation across the design strata was informed by the last round of the NPS and seeks to balance multiple survey objectives and maximize precision given survey parameters. The intended sample design consisted of a new selection of 3,360 households corresponding to 420 EAs from the latest PHC in 2012. This new cohort in NPS 2014/2015 will be maintained and tracked in all future rounds between national censuses.
A nationally representative sub-sample was selected to continue as part of an “Extended Panel”. This “Extended Panel” allowed general comparison of sample groups and monitoring indicator comparability. The “Extended Panel” is not included in the initial NPS 2014/2015 data release.
Deviations from sample design
During the survey, one selected EA was demolished and subsequently not interviewed. The resulting sample consists of 3,352 households across 419 EAs.
The NPSY4 “Refresh Panel” sample was a stratified two-stage sample design. The sample was stratified along two dimensions: (i) 26 regions, and (ii) rural/urban designation within each region. The combination of these two dimensions yields 51 independent strata. The first stage of sampling involved the selection of survey clusters (EAs) with the probability of selection proportional to cluster size within a stratum. In the second stage, following a listing exercise, eight households were selected in each cluster by systematic random sampling. Additionally, three households were randomly selected within each EA as replacements in case of household non-response.
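The two-stage selection described above can be illustrated with a minimal Python sketch. It is not the NBS procedure itself: the EA sizes are made-up numbers, the function names `pps_systematic` and `systematic_sample` are hypothetical, and systematic PPS is only one common way to select clusters with probability proportional to size. Only the counts (eight sampled households and three replacements per EA) come from the text.

```python
import random

def pps_systematic(sizes, n, rng):
    """Select n cluster indices with probability proportional to size,
    using systematic sampling over the cumulative sizes.
    A cluster larger than the sampling interval could be picked twice."""
    total = sum(sizes)
    step = total / n
    start = rng.random() * step
    points = [start + k * step for k in range(n)]
    chosen, cum, i = [], 0.0, 0
    for idx, size in enumerate(sizes):
        cum += size
        while i < n and points[i] < cum:
            chosen.append(idx)
            i += 1
    while i < n:  # guard against floating-point rounding at the upper end
        chosen.append(len(sizes) - 1)
        i += 1
    return chosen

def systematic_sample(items, n, rng):
    """Select n items at a fixed interval from a random start (needs len(items) >= n)."""
    step = len(items) / n
    start = rng.random() * step
    last = len(items) - 1
    return [items[min(int(start + k * step), last)] for k in range(n)]

rng = random.Random(2014)  # fixed seed so the sketch is reproducible

# Hypothetical stratum: household counts for 20 EAs (made-up numbers).
ea_sizes = [rng.randint(60, 150) for _ in range(20)]
selected_eas = pps_systematic(ea_sizes, n=4, rng=rng)  # first stage

sample = {}
for ea in selected_eas:
    listing = list(range(ea_sizes[ea]))        # household listing for this EA
    main = systematic_sample(listing, 8, rng)  # second stage: eight households
    remaining = [h for h in listing if h not in main]
    reserve = rng.sample(remaining, 3)         # three replacement households
    sample[ea] = (main, reserve)

for ea, (main, reserve) in sample.items():
    print(ea, ea_sizes[ea], main, reserve)
```

In this sketch every EA is smaller than the sampling interval, so no EA can be selected twice; a real frame with very large EAs would need those handled separately.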
The expansion factors are Winsorized at the top 1 percentile and post-stratified to 2015 regional household projections. The NPS 2014/2015 household cluster weight, variable “y4_weight”, has been integrated into Section A (“HH_SEC_A”) of the household data files. Additionally, unique identifiers for the first-stage sampling units, “clusterid”, and for the sampling strata, “strataid”, can also be located in Section A of the household data files. The complex sample design must be taken into account to ensure proper calculation of standard errors.
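As a rough illustration of the two weight adjustments named above, the following Python sketch caps weights at the 99th percentile and then rescales them to regional household totals. The weights, region labels, and target totals are invented, the function names are hypothetical, and the percentile definition (nearest rank) and the order of the two steps are assumptions; the documentation does not specify them.

```python
def winsorize_top(weights, pct=99):
    """Cap weights at the pct-th percentile (nearest-rank definition)."""
    ordered = sorted(weights)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling of pct% of n
    cap = ordered[rank - 1]
    return [min(w, cap) for w in weights]

def post_stratify(weights, regions, targets):
    """Rescale weights so that they sum to the target household count in each region."""
    totals = {}
    for w, r in zip(weights, regions):
        totals[r] = totals.get(r, 0.0) + w
    return [w * targets[r] / totals[r] for w, r in zip(weights, regions)]

# Hypothetical base weights: 198 ordinary values plus two extreme ones.
weights = [100 + i for i in range(198)] + [5000, 9000]
regions = ["R1" if i % 2 == 0 else "R2" for i in range(len(weights))]
targets = {"R1": 50000, "R2": 30000}  # made-up regional household projections

capped = winsorize_top(weights, pct=99)
final = post_stratify(capped, regions, targets)

print(max(weights), max(capped))
print(sum(w for w, r in zip(final, regions) if r == "R1"))
```

After post-stratification the weights in each region sum to that region's target, so the capping step changes how the total is shared among households rather than the total itself.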
Dates of collection
Mode of data collection
Additional data cleaning was conducted as the final stage of the data processing. Further adjustment of the data post-entry was conducted under the principle of absolute certainty where adjustments must be evidence-based and correction values true beyond a reasonable doubt. As such, the resulting final data files may still contain some inconsistencies and outliers. Handling of these values is thus left entirely to the data user.
Throughout the data processing system, versions of the data are archived at all key steps and all checking and cleaning syntax documented and archived.
The sample of households selected in the NPS 2014/2015 is only one of many samples that could have been selected from the same population. Each alternative sample would yield results slightly different from those of the selected sample. Sampling errors are a measure of the variability between all possible samples; although the degree of variability cannot be directly observed, it can be estimated from the survey results and statistically evaluated. A sampling error can be measured in terms of the standard error for a particular statistic.
Sampling errors for the NPS 2014/2015 were calculated in Stata using the estat effects command. In addition to the standard error, Stata computed the design effect (DEFF) for each estimate, which is defined as the ratio between the variance of the estimate under the given sample design and the variance that would result if a simple random sample had been used; the ratio of the corresponding standard errors is the square root of the DEFF. A DEFF value of 1.0 indicates that the sample design is as efficient as a simple random sample, while a value greater than 1.0 indicates an increase in sampling error due to the use of a more complex and less statistically efficient (but perhaps more logistically efficient) design. Stata also computed the relative error and confidence limits for the estimates.
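To make the design effect concrete, here is a simplified Python sketch for a weighted mean. It is not a reimplementation of Stata's estat effects: the design variance uses a standard linearization with a with-replacement approximation at the first stage, the simple-random-sample variance ignores the finite population correction, and the data are invented. In the NPS files the roles of the `w`, `strata`, and `clusters` arguments would be played by “y4_weight”, “strataid”, and “clusterid”. The sketch reports DEFF as a ratio of variances and DEFT as its square root, the ratio of standard errors.

```python
from collections import defaultdict
from math import sqrt

def weighted_mean(y, w):
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def design_variance(y, w, strata, clusters):
    """Linearized variance of the weighted mean under a stratified cluster design."""
    mean, total_w = weighted_mean(y, w), sum(w)
    # Sum the linearized values w_i * (y_i - mean) / W within each cluster.
    totals = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, c in zip(y, w, strata, clusters):
        totals[h][c] += wi * (yi - mean) / total_w
    var = 0.0
    for cluster_totals in totals.values():
        z = list(cluster_totals.values())
        n_h = len(z)
        if n_h < 2:
            continue  # single-cluster strata contribute nothing in this sketch
        zbar = sum(z) / n_h
        var += n_h / (n_h - 1) * sum((zi - zbar) ** 2 for zi in z)
    return var

def srs_variance(y, w):
    """Variance of the mean if the same n units were a simple random sample."""
    mean, total_w, n = weighted_mean(y, w), sum(w), len(y)
    s2 = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, y)) / total_w * n / (n - 1)
    return s2 / n

# Made-up data: 2 strata, 3 clusters each, 4 households per cluster.
# Households in the same cluster share one value, so clustering is extreme.
y, w, strata, clusters = [], [], [], []
for cluster_id in range(6):
    for _ in range(4):
        y.append(float(cluster_id))
        w.append(1.0)
        strata.append(cluster_id // 3)
        clusters.append(cluster_id)

deff = design_variance(y, w, strata, clusters) / srs_variance(y, w)
deft = sqrt(deff)
print(deff, deft)
```

Because households within a cluster are identical in this toy data, the design effect comes out above 1; with less within-cluster similarity it would move toward 1.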
Sampling errors for the NPS 2014/2015 are calculated for selected variables considered to be of primary interest at the household and individual levels. The results are presented in the BID Appendix A at the national level and for each of the four primary domains of inference, namely: Dar es Salaam, other urban areas on mainland Tanzania, rural mainland Tanzania, and Zanzibar. For each variable of interest, the value of the statistic (R), its standard error (SE), the number of cases, the design effect (DEFF), the relative standard error (SE/R), and the 95 percent confidence limits (R±2SE) are provided in Tables 1-10 in the BID. The DEFF is considered undefined when the standard error in a simple random sample is zero (when the estimate is close to 0 or 1).
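The derived columns in these tables follow directly from the estimate and its standard error. The short Python sketch below shows the arithmetic; the function names and the input values are hypothetical and are not taken from the BID tables.

```python
def summarize(estimate, se):
    """Relative standard error and approximate 95 percent limits (R ± 2SE)."""
    rel = se / estimate if estimate != 0 else float("nan")
    return {
        "R": estimate,
        "SE": se,
        "SE/R": rel,
        "lower": estimate - 2 * se,
        "upper": estimate + 2 * se,
    }

def design_effect(var_design, var_srs):
    """Return DEFF, or None when the simple-random-sample variance is zero."""
    if var_srs == 0:
        return None  # undefined, e.g. when a proportion is essentially 0 or 1
    return var_design / var_srs

row = summarize(0.25, 0.01)  # made-up proportion and standard error
print(row)
print(design_effect(0.0002, 0.0))
```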
The Primary Data Investigator undertakes that no attempt will be made to identify any individual person, family, business, enterprise or organization. If such a unique disclosure is made inadvertently, no use will be made of the identity of any person or establishment discovered and full details will be reported to the NBS. The identification will not be revealed to any other person not included in the Data Access Agreement.
The dataset has been anonymized and is available as a Public Use Dataset. It is accessible to all for statistical and research purposes only, under the following terms and conditions:
1. The data and other materials will not be redistributed or sold to other individuals, institutions, or organizations without the written agreement of the National Bureau of Statistics, Tanzania.
2. The data will be used for statistical and scientific research purposes only. They will be used solely for reporting of aggregated information, and not for investigation of specific individuals or organizations.
3. No attempt will be made to re-identify respondents, and no use will be made of the identity of any person or establishment discovered inadvertently. Any such discovery would immediately be reported to the National Bureau of Statistics.
4. No attempt will be made to produce links among datasets provided by the NBS, or among data from the National Bureau of Statistics and other datasets that could identify individuals or organizations.
5. Any books, articles, conference papers, theses, dissertations, reports, or other publications that employ data obtained from the National Bureau of Statistics will cite the source of data in accordance with the Citation Requirement provided with each dataset.
6. An electronic copy of all reports and publications based on the requested data will be sent to the National Bureau of Statistics.
The original collector of the data, the National Bureau of Statistics, and the relevant funding agencies bear no responsibility for use of the data or for interpretations or inferences based upon such uses.
National Bureau of Statistics (NBS) [Tanzania]. 2015. National Panel Survey (NPS) - Wave 4, 2014 - 2015. Ref.TZA_2014_NPS-R4_v03_M. Downloaded from [URL] on [Date]. Dar es Salaam, Tanzania: NBS. (www.nbs.go.tz)
Disclaimer and copyrights
Although the data have been produced and processed from sources believed to be reliable, no warranty expressed or implied is made regarding accuracy, adequacy, completeness, legality, reliability or usefulness of any information. This disclaimer applies to both isolated and aggregate uses of the information. This data is provided on an "AS IS" basis. Changes may be periodically added to the information herein; these changes may or may not be incorporated in any new version of the publication(s). It is recommended that the user pay careful attention to the contents of any metadata associated with a file. If the user finds any errors or omissions, we encourage the user to report them to the data producer(s).
National Bureau of Statistics
Living Standards Measurement Study
World Bank Microdata Library
Development Economics Data Group
The World Bank
Documentation of the DDI
Version 01 (July 2017)
Version 02 (April 2019): identical to v01 with revised datasets (com_sec_a1a2; com_sec_cb; com_sec_cc; com_sec_cd; com_sec_ce; com_sec_cf; com_sec_cf_id and com_sec_cg)
Version 03 (April 2019): identical to v02 with revised dataset (consumptionNPS4) - coding error fixed. The variable “area” was irrelevant and has been removed from the consumption file.