Survey ID Number
TZA_2010_NPS-R2_v03_M
Title
National Panel Survey 2010-2011, Wave 2
Notes
- HOUSEHOLD: Household Identification, Household Member Roster, Education, Health, Labour, Food Outside the Household, Subjective Welfare, Government, Food Security, Housing, Water and Sanitation, Food Consumption, Non-Food Expenditures, Household Assets, Assistance and Groups, Credit, Finance, Recent Shocks to HH Welfare, Anthropometry;
- AGRICULTURE: Household Roster, Plot Roster, Plot Details, Annual Crops by plot, Annual Crop Production and Sales, Permanent Crops by plot, Permanent crops - production and sales, Outgrower schemes and contract farming, Processed agricultural products and agricultural by-products, Livestock, Farm implements and machinery, Extension;
- FISHERY: Survey Information, Fisheries Calendar, Household Labour, Fishing Labour, Input, Output, Gear Rented Out, Trading;
- COMMUNITY: Community Identification, Survey staff Details, Access to basic services, Investments projects, Land use, Agriculture, Demography and family issues, Governance, Water and sanitation, Roster of Community leaders, Market price.
Weighting
The methodology described in this paper builds upon published documentation from established panel surveys, such as the Panel Study of Income Dynamics [PSID], conducted since 1968 by the Institute for Social Research at the University of Michigan, and the British Household Panel Survey [BHPS], whose first 13 waves were conducted between 1991 and 2003 by the Institute for Social and Economic Research at the University of Essex. The PSID and the BHPS are nationally representative panel surveys in the USA and the UK respectively.
The weights are developed following these steps:
1) begin with the “base weights”, i.e. those calculated during the first round of the survey;
The panel weight calculations are based on the 2008/2009 household weights. These weights are based on the inverse probability of selection, an EA-level non-response correction, trimming of outlier weights, and a post-stratification correction. These probability weights form the first component of the 2010/2011 calculations.
W1=W2008
2) incorporate fair-share weights for composition changes;
Under the tracking protocols, the split-off rules for the TZNPS allow for the incorporation of people who now live with original sample members. For example, a young adult living with his parents in 2008 may by 2010 have formed a new household, having married and had a child. The wife and infant will be incorporated into the survey and thus require a probability of selection. Such corrections are routinely used to distribute weight to new sample members in panel surveys. See Rendtel and Harms (2009) for a discussion of several different methods of weight correction. Because split-off individuals are tracked and interviewed in their new households, there are multiple ways that a household can become part of the survey.
o Either by being selected initially for the first round of the TZNPS
o By receiving a member that came from a household that was selected for the first round of the TZNPS.
In an ideal world, it would be possible to know the probability of selection that each new member brought into the household, and adjust the household weight accordingly. This is necessary since households receiving members have higher probabilities of selection (and therefore lower weights) because the household could have been selected in multiple ways. Since we cannot know the probabilities of every member, we must make simplifying assumptions. The first simplifying assumption is that the arriving members arrived together from one other household. This would be the case if a man and woman get married and set up a new household, or in the case of an older relative moving in with adult children. In certain cases, however, arriving members come from more than one household. Assuming only two source households underestimates slightly the probability of selection (and therefore over-estimates the weights). Incidence of these cases is believed to be relatively rare, and any resulting bias should be negligible. The second simplifying assumption we make is that the arriving members have the same probability of selection, on average, as those members that are already there. This would not be true on a case-by-case basis but would be true in the aggregate. With these simplifying assumptions, we add a factor of ½ for all households, ‘split’ or ‘parent’ that have new members arriving from other households.
This takes into account the fact that they could have been selected in two ways, and assumes the probability of selection is equal.
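The fair-share step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the boolean flag are hypothetical, and only the factor of ½ comes from the text.

```python
def fair_share_weight(base_weight: float, received_new_members: bool) -> float:
    """Apply the fair-share factor to a household weight.

    Assumption (hypothetical interface): `received_new_members` is True for
    any 'split' or 'parent' household that has members arriving from another
    household. Such households could have been selected in two ways, so the
    weight is multiplied by 1/2; all other households keep their base weight.
    """
    factor = 0.5 if received_new_members else 1.0
    return base_weight * factor


# Toy usage with made-up weights (not survey data).
print(fair_share_weight(1200.0, received_new_members=True))
print(fair_share_weight(1200.0, received_new_members=False))
```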
A limitation of the panel methodology is that the represented population is not identical to the 2010 Tanzanian household population, as it does not include immigrants in new households. Inclusion of these groups would necessitate refreshing the sample with new households. However, the represented population is close enough to the 2010 Tanzanian population to permit the desired cross-sectional estimates.
3) derive attrition adjusted weights for all individuals, including split-off households, then aggregate these weights to the household level;
All household panel surveys must tackle the problem of attrition, that is, sample members selected for a follow-up interview who cannot be located and/or interviewed. The methodology used to adjust weights for attrition in the TZNPS follows Rosenbaum & Rubin (1984). We use predicted response probabilities from a logistic regression model based on the covariates to form the weighting classes or cells. This approach has also been adopted in the PSID; see, for example, Gouskova (2008).
The second round of the NPS has a total sample size of 3,924 households. This represents 3,168 round one households, a re-interview rate of over 97 percent. In addition, of the 10,420 eligible adults (over age 15 in 2010), 9,338 were re-interviewed, a re-interview rate of approximately 90 percent. To obtain the attrition adjustment factor, the probability that a sample household was successfully re-interviewed in the second round is modeled with a linear logistic model at the level of the individual. A binary response variable is created by coding the response disposition for eligible households that do not respond in the second round as 0, and households that do respond as 1.
Then a logistic response propensity model is fitted, using household and individual characteristics measured in the first wave as covariates. In a few cases, values of unit-level variables were missing from the 2008/2009 household dataset. These values were imputed using multivariate regression and logistic regression techniques. Imputations were done using the ‘impute’ command in Stata at the level of the survey strata (urban/rural and region). Overall, less than one percent of the variables required imputation to replace missing values. The estimated logistic model is used to obtain a predicted probability of response for each household member in the 2010/2011 survey. These response probabilities were then aggregated to the household level by calculating the mean. Then, using the household-level predicted response probabilities as the ranking variable, all households were ranked into 10 equal groups (deciles). An attrition adjustment factor was then defined as the reciprocal of the empirical response rate for the household-level propensity score decile.
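The decile-based adjustment described above can be sketched as follows. This is a minimal illustration, not the survey team's actual code: the record layout (`hh_id`, `member_probs`, `responded`) is hypothetical, and the predicted probabilities are assumed to come from a logistic model fitted elsewhere.

```python
from collections import defaultdict


def attrition_factors(households, n_groups=10):
    """Return {hh_id: attrition adjustment factor}.

    Each household is a dict with hypothetical keys:
      'hh_id'        - household identifier
      'member_probs' - predicted response probabilities of its members
      'responded'    - 1 if re-interviewed in the second round, else 0
    """
    # Household-level propensity is the mean of the member-level probabilities.
    scored = [(sum(h["member_probs"]) / len(h["member_probs"]), h)
              for h in households]
    scored.sort(key=lambda pair: pair[0])

    # Rank households into n_groups equal-sized classes (deciles by default).
    n = len(scored)
    classes = defaultdict(list)
    for rank, (_, h) in enumerate(scored):
        classes[rank * n_groups // n].append(h)

    # The factor is the reciprocal of the empirical response rate in the class.
    factors = {}
    for members in classes.values():
        rate = sum(h["responded"] for h in members) / len(members)
        for h in members:
            factors[h["hh_id"]] = 1.0 / rate if rate > 0 else float("nan")
    return factors


# Toy usage with made-up values and two classes instead of ten.
toy = [
    {"hh_id": "A", "member_probs": [0.5, 0.7], "responded": 1},
    {"hh_id": "B", "member_probs": [0.4], "responded": 0},
    {"hh_id": "C", "member_probs": [0.9, 0.95], "responded": 1},
    {"hh_id": "D", "member_probs": [0.8], "responded": 1},
]
print(attrition_factors(toy, n_groups=2))
```

In practice the factor would be applied only to responding households, so that they also carry the weight of the non-respondents in their class.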
4) post-stratify the pooled weights to known population totals.
To weight the sample up to the known population figures, and to reduce overall standard errors, a post-stratification correction is applied (see Little et al, 1997). Adjustment factors are calculated based on the projected number of households in the urban and rural segments of each region.
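A post-stratification correction of this kind can be sketched as below. The function and argument names are hypothetical, and the cell labels and totals in the usage example are invented for illustration; real cells would be the urban and rural segments of each region with their projected household counts.

```python
def post_stratify(weights, cells, known_totals):
    """Scale weights so each cell's weighted total matches a known total.

    weights      - {hh_id: weight before post-stratification}
    cells        - {hh_id: cell label, e.g. a (region, urban/rural) pair}
    known_totals - {cell label: projected number of households}
    """
    # Sum the current weights within each post-stratification cell.
    weighted_sum = {}
    for hh_id, w in weights.items():
        cell = cells[hh_id]
        weighted_sum[cell] = weighted_sum.get(cell, 0.0) + w

    # The adjustment factor is the known total divided by the weighted sum.
    return {
        hh_id: w * known_totals[cells[hh_id]] / weighted_sum[cells[hh_id]]
        for hh_id, w in weights.items()
    }


# Toy usage with made-up numbers.
w = {1: 100.0, 2: 300.0, 3: 50.0}
c = {1: "X-urban", 2: "X-urban", 3: "X-rural"}
totals = {"X-urban": 800.0, "X-rural": 100.0}
print(post_stratify(w, c, totals))
```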
Data Collection Notes
The survey was implemented by eight mobile field teams, each composed of: one supervisor, four enumerators, one data entry technician, and one driver.
The teams visited each enumeration area for 4 to 5 days. The questionnaires were administered to the selected households over the course of that time. This allowed the field team to make return visits to the household to complete the entire Household questionnaire and, where applicable, the Agriculture questionnaire (for farm households) and the Fishery questionnaire. To ensure the depth and quality of each section of the survey, the questionnaire was administered across multiple respondents, with each section put to the person most knowledgeable about that topic. For all of the sampled households, the areas of all owned and/or cultivated agricultural plots were measured via GPS unless the household refused, the terrain was too difficult, or the plot was more than 1 hour from the location of the household. Anthropometric measurements were taken for all individuals who were at home, not too ill, and willing to participate.
If the field teams enter an enumeration area and find that the entire household or one or more of its members has moved, they are required to follow the tracking protocol. If the entire household has moved from the original residence, teams are required to fill out a T-1 form. The T-1 form contains information on the new location of the household, allowing the teams to locate and interview the household members. If a member or members of the household have split from the original household, a T-2 form is filled out by the teams. Like the T-1, the T-2 form contains information on the location of the member(s) who have split from the household. Once the tracking targets have been found, teams are required to interview them and any new additions to the household. Among the individuals and households to be tracked, only those over 15 years of age are included in the tracking protocol, unless an individual under 15 years of age moved with another individual over 15 years of age and both were part of the round one data collection.
Within the tracking protocol, there are local and distance cases. Local and distance tracking applies to both T-1 and T-2 forms. Local tracking occurs when the tracking target is within one hour traveling distance from the original EA and at least one tracking member from the household is over 15 years of age. If that is the case, the teams are required to interview the tracking target before leaving the original EA. Distance tracking occurs when the tracking target is not within one hour traveling distance from the original EA. In this case, the teams fill out the appropriate tracking form and send the information to NBS headquarters. Once at NBS, the distance tracking case is given to the tracking team, which is then responsible for locating that household and conducting the interview.
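The local/distance rule can be summarised as a small decision function. This is a sketch under assumptions: the function name, arguments and return labels are hypothetical, "within one hour" is read as one hour or less, and "over 15" is read as strictly older than 15.

```python
def classify_tracking_case(travel_hours: float, oldest_member_age: int) -> str:
    """Classify a tracking target as 'local', 'distance' or 'not_tracked'.

    travel_hours      - travel time from the original EA to the target
    oldest_member_age - age of the oldest person in the group that moved
    """
    # Only targets with at least one member over 15 enter the protocol;
    # younger members qualify only when they moved with such a person.
    if oldest_member_age <= 15:
        return "not_tracked"
    # Local cases are interviewed before the team leaves the original EA.
    if travel_hours <= 1.0:
        return "local"
    # Distance cases are sent to NBS headquarters for the tracking team.
    return "distance"


# Toy usage with made-up values.
print(classify_tracking_case(0.5, 34))
print(classify_tracking_case(6.0, 22))
print(classify_tracking_case(0.5, 12))
```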
The mobile tracking team consisted of one supervisor, two interviewers, one data entry technician, and one driver. In addition, there were two dedicated tracking enumerators who remained in Dar es Salaam. The tracking team began interviews three months after the beginning of fieldwork to allow enough time to accumulate a sufficient number of tracking targets. Tracking targets were grouped into geographic regions, and the team visited the regions approximately every 2-3 months. Any tracking target not located remained in the pool to be visited during the next trip, together with any new tracking cases that had accumulated in the intervening months. The regular field teams also occasionally performed tracking within their interview regions if there was a backlog of cases. Finally, following the completion of the main fieldwork activities, four supervisors led dedicated tracking teams to interview the remaining cases.
Data entry was done concurrently with data collection by the data entry technician, using a laptop; this is known as first data entry. The data entry program was a CSPro-based system, developed by NBS with support from the World Bank. This made it possible to run internal crosschecks prior to departure from the enumeration area, allowing enumerators to return to households and clarify inconsistent information on the questionnaires. Data files from completed EAs were then e-mailed to headquarters using 3G modems. These files were concatenated, and periodic checks were done to ensure the fieldwork was proceeding according to the calendar. The field teams also sent the paper questionnaires back to headquarters on a monthly basis.
Once the paper questionnaires and data files for completed EAs were received at NBS headquarters, a double entry procedure was implemented. Eight data entrants were hired by NBS to re-enter the data from the paper questionnaires into the CSPro-based data entry system for all households and questionnaires administered. A cross comparison between the entered values in the field based data entry and double entry was conducted and any differences in values between the two were flagged for manual inspection of the physical questionnaire. Corrections based on this inspection exercise were ultimately encoded in the dataset.
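The cross comparison between first and double entry can be illustrated with a short sketch. The data layout is an assumption made for the example: each entry pass is represented as a dictionary keyed by a hypothetical (household ID, variable name) pair, which is not how the CSPro system necessarily stores its data.

```python
_MISSING = object()  # sentinel for a value present in only one entry pass


def flag_discrepancies(first_entry, second_entry):
    """Return the sorted keys whose values differ between the two passes.

    first_entry / second_entry - {(hh_id, variable): value}
    Flagged keys are the ones that would be sent for manual inspection of
    the physical questionnaire.
    """
    keys = set(first_entry) | set(second_entry)
    return sorted(
        key for key in keys
        if first_entry.get(key, _MISSING) != second_entry.get(key, _MISSING)
    )


# Toy usage with made-up records.
field = {("HH001", "hh_size"): 5, ("HH001", "maize_kg"): 120}
office = {("HH001", "hh_size"): 5, ("HH001", "maize_kg"): 210}
print(flag_discrepancies(field, office))
```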
Additionally, an extensive review of data files was conducted, including checks for interviewer errors such as missing values, out-of-range values and outliers. Observations were returned for manual inspection of the physical questionnaires if continuous values fell outside five standard deviations of the mean, categorical values were not eligible responses, or there were internal inconsistencies within the dataset (for example, the age of an individual was not consistent with their educational status, there was more than one head of household listed, an individual was engaged in multiple primary activities, the quantity of crops and their by-products produced, harvested, and sold was not listed, the distance between the market and an individual’s plot was not listed, the number of weeks, days per week, and hours per day an individual engaged in fishery activity was not recorded, the species and quantity of fish caught, bought, sold, or traded was not listed, etc.).
When it was determined that these values were the result of data-entry error, the values were corrected. In addition, cases deemed to reflect obvious enumerator error were also corrected in this cleaning process. The majority of such cases involved the use of incorrect measurement units, e.g. recording grams as kilograms or vice versa.
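The five-standard-deviation screen for continuous values described above can be sketched as follows. The function name is hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def flag_outliers(values, n_sd=5.0):
    """Return the indices of values more than n_sd standard deviations
    from the mean; these would be sent for manual inspection."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    if sd == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) > n_sd * sd]


# Toy usage: 99 plausible values and one that looks like a unit error
# (for instance grams recorded where kilograms were intended).
harvest_kg = [10.0] * 99 + [1000.0]
print(flag_outliers(harvest_kg))
```

Note that a single extreme value inflates the standard deviation itself, so in small groups a five-standard-deviation rule may flag nothing; the screen is most informative on variables with many observations.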