The objectives of the survey were to provide information regarding the following:
a. Household use of, and expenditure patterns for, social services;
b. Reasons for low levels of household investment in education and health services for children;
c. The distribution of the benefits of public spending for social services and how to improve targeting;
d. Households' evaluation of the social services available to them;
e. The potential for demand-side interventions to increase human capital investment directly (especially for girls and the poor); and
f. The feasibility of repeated national monitoring surveys to assess the impact of future Bank and government projects in the social sectors, and to increase Tanzania's capacity to perform household survey work.
Kind of data
Sample survey data [ssd]
Producers and sponsors
University of Dar es Salaam
The World Bank
British Overseas Development Administration
Government of Japan
Sample size is 5,184 households
The HRDS is national in scope and uses all 222 clusters of the National Master Sample (NMS) maintained by the Bureau of Statistics as its sampling frame. Two NMS clusters were not surveyed because of weather conditions. For example, Nyamburi village in the Mara region was inaccessible: heavy rains had washed away a bridge 8 km (5 miles) from the village. All household surveys conducted by the Bureau of Statistics (e.g. the Agricultural Sample Survey since 1986/87, the Labor Force Survey in 1990/91) have used the framework of the NMS. This permits obtaining estimates at the national level and by area: rural, Dar es Salaam (DSM), and other urban towns. The current NMS covers 222 clusters: 100 rural villages representing the rural areas and 122 Enumeration Areas (EAs) representing the urban areas. Fifty-two EAs are from the capital city itself, 40 EAs are from the nine municipalities (Arusha, Dodoma, Moshi, Tanga, Morogoro, Iringa, Mbeya, Tabora, and Mwanza), and 10 EAs are from the remaining regional headquarters.
Selection of households and non-response.
Household selection was done in the field. In each cluster the team supervisor would first obtain the list of ten-cell leaders from the local authorities, and then, from each ten-cell leader, the list of households belonging to his or her cell. Each household was assigned a unique number, and households were then selected randomly using a table of random numbers. In each cluster, a list of about 30 households was thus obtained, the last households in the list serving as alternates. With the collaboration of the local authorities, the field workers achieved an almost 100 percent response rate, except in cases where no member of the household was present for interviewing and returning to the household was not feasible. Refusals to cooperate were rare. In those cases--absent households or refusals--new households were drawn from the list of alternates.
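The field procedure above can be sketched as follows. This is an illustrative reconstruction, not the survey's actual software (selection was done by hand with random number tables); the function name and counts are assumptions.

```python
import random

def select_households(household_ids, n_target=25, n_alternates=5, seed=0):
    """Randomly select target households plus alternates from a cluster list.

    Mirrors the described field procedure: each household on the ten-cell
    leaders' lists gets a unique number, then a random draw yields about
    30 households, the last of which serve as alternates for absences or
    refusals. Counts here are illustrative.
    """
    rng = random.Random(seed)
    drawn = rng.sample(household_ids, n_target + n_alternates)
    return drawn[:n_target], drawn[n_target:]

# Example: a cluster whose ten-cell lists contain 100 households.
primary, alternates = select_households(list(range(1, 101)))
# If a primary household is absent or refuses, substitute from `alternates`.
```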
The survey covered a total of 4,953 households in the 20 regions of Mainland Tanzania: 2,135 rural and 2,818 urban (see Table 1). In a second stage, the survey was extended to Zanzibar, where 230 households, in 24 clusters, were interviewed.
In each cluster, between 20 and 25 households were targeted to be interviewed. These households were randomly selected. For a household belonging to village i, the household sampling weight (factor from household level to the national level) is defined according to the following expression:
Wh = Wi * (1 / Ni)
where Ni gives the sample size in village i, and Wi is the NMS village weight (factor that expands from the cluster level to the national level).
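In code, the weight expression above is a one-line computation. The numeric values below are made up purely for illustration; only the formula Wh = Wi * (1 / Ni) comes from the documentation.

```python
def household_weight(village_weight, n_sampled):
    """Household sampling weight: W_h = W_i * (1 / N_i).

    village_weight (W_i): NMS factor expanding cluster i to the national level.
    n_sampled (N_i): number of households sampled in cluster i.
    """
    return village_weight * (1.0 / n_sampled)

# Hypothetical cluster with NMS weight 1200 and 24 sampled households:
w = household_weight(village_weight=1200.0, n_sampled=24)  # -> 50.0
```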
Dates of collection
For mainland Tanzania
Mode of data collection
Data collection supervision
The main mission of the supervisor was to check the work of the other team members at all levels, guarantee that interviews in the field proceeded satisfactorily, maintain high quality standards, review the completed questionnaires, and reinterview a sub-sample of the households. As the number of completed questionnaires was viewed as an indicator of interviewer performance, it was conceivable that interviewers would record an incorrect response to questions requiring follow-up questions in order to expedite the process. For example, if the household had no income from crop production, then the interviewer would not need to ask questions regarding the type of crop grown. If a household were revisited, the supervisor
would use a "mini-questionnaire," containing a set of such questions. Every household questionnaire was checked by the supervisor after the interview was completed. At the end of each day, the supervisor and the enumerators discussed particular cases covered, problems faced, questions that proved to be difficult, etc. The supervisors' other tasks included arranging logistics, random selection of households in each cluster, assignment of households to each enumerator, obtaining clearance from several administrative layers, and completion of the price questionnaire. To perform these tasks, each supervisor received, in addition to materials received by the enumerator, a Supervisor's Logbook. It included the following:
i. Supervisor's Manual.
ii. Price questionnaire to be administered in each cluster by the Supervisor.
iii. Sample Design and Selection:
(a) Selection of the clusters was based on the National Master Sample maintained by the Bureau of Statistics (1988 Census). Each supervisor was provided with a set of maps for each cluster.
(b) Within each cluster, households were randomly sampled. Instructions to sample households were given to each team supervisor. These instructions included a set of random numbers tables.
iv. Forms to re-interview households.
v. Forms to be completed for each cluster with the population of the cluster and its number of households. This information is essential to construct the weights to be used in the analysis.
vi. General set of instructions on how to assign a unique identifier to each household.
vii. For the first week of the field work, the supervisor's logbook also included a spreadsheet in which the results from the contingent valuation questions were to be entered.
After each interview the supervisor was expected to spend some time checking and editing each questionnaire. This is particularly important in the rural areas, and in urban clusters far from Dar es Salaam, where the costs of returning to the cluster are high. The household questionnaire even included one page (page 4B) for the supervisor's check. We noticed that, in some questionnaires, the supervisor's check was not very strict. The majority of mistakes that we handled during the data cleaning and coding phase could have been avoided had the supervisor been more careful with the checking of each questionnaire in the field. However, we will now be able to improve the manuals by specifically noting many of the problems we found. Data entry in the field, or close to the clusters, would probably reduce such problems tremendously. Also, contrary to what was planned, the supervisors did not reinterview any of the households.
Development of Survey Instrument.
The first draft of the household survey was developed in English in July 1993. Training of enumerators, based on this draft, began on August 2, 1993. The month of August was devoted to training the enumerators and pre-testing the questionnaire. The first pre-test of the questionnaire took place in mid-August. The household questionnaire was almost completely precoded to eliminate coding errors and time delays. A category labeled "other: specify" was added to several questions. For questions whose answers were not mutually exclusive, we precoded the responses with letters rather than numbers, to allow unambiguous coding of multiple answers. To minimize nonsampling errors, the questionnaire was designed to reduce to a minimum the number of decisions required of interviewers in the field. In anticipation of pages becoming detached from the questionnaire, every page contained a space for the household number and the last digit of the cluster code. Although questions were written exactly as they were supposed to be asked, interviewers were granted some flexibility to give the interview the feel of a conversation rather than an inquisition.
Pre-Test of Questionnaire.
The "pre-pre-test" of the questionnaire (August 16, 1993) was done only to discern whether the questions were understood, how long administering the survey took, whether all responses had been anticipated, which sections needed to be stressed during the training, etc. In this pre-pre-test, each questionnaire required an average of 4 hours to complete, far longer than the planned 1.5-hour maximum. The survey was consequently shortened and streamlined.
The true pre-test was conducted in two different types of clusters--Ubungo ward in DSM (urban) and Kibaha in the Coast Region (rural)--over a period of two days. We chose these clusters because they are representative of two distinct groups, so a broader spectrum of answers and problems with the instrument could be anticipated. In the pre-test each questionnaire required an average of 2.5 hours. After a couple of weeks of interviewing, the enumerators became more familiar with the instrument and came to spend an average of 1.5 to 2 hours per questionnaire.
During the pre-test, each supervisor was asked to comment on each interview. The supervisor was asked to pay special attention to questions that seemed to make the respondent uncomfortable, that the respondent had difficulty understanding, or that the respondent seemed to dislike. The supervisor also evaluated which sections seemed to go slowly, had the most difficult questions, or provided insufficient opportunity for a complete response.
Revision of questionnaire.
Given the results of the two pre-tests, several areas for improvement in the questionnaire were identified. Perhaps most importantly, the willingness-to-pay amounts were adjusted. The sample distributions of the maximum willingness-to-pay questions were analyzed, and, based on that analysis, we decided to change some of the values. For example, in the child spacing question, the "pay Tsh 1,000" responses unexpectedly accounted for a large share of the bids. Thus, we provided the option of paying more by introducing "pay Tsh 50,000" and "pay Tsh 25,000" as answer choices. For the other contingent valuation sections--health and education--the first pre-test determined that there was also a large lumping of responses at the high end of the scale. We adjusted the ranges accordingly, although there remains some lumping at the high end in the final data.
We also changed the order of the sections. Based on the pre-test and the judgment of the field workers, we decided to first ask the questions in the individual section, then the contingent valuation questions, then the household questions. Because the respondents enjoyed the contingent valuation questions so much, this decision helped increase interest in the questionnaire and re-energized the respondent before proceeding with the household questions--the last part of the questionnaire. The final survey instrument, incorporating all of the changes dictated by the pre-tests and other expert advice, was completed on September 12, 1993.
Translation of the survey instrument was a joint effort of the enumerators and supervisors. Given the specific characteristics of the Kiswahili language, this was a much better approach than asking one translator to translate from English to Kiswahili and another to translate back from Kiswahili to English. The "group" translation, involving those who would ask the questions, was intended to avoid different interpretations of the same question and to achieve uniformity. In this way the enumerators were able to better convey the message and objective of each question.
The majority of the interviews were conducted in Swahili. In a very few cases, when no one in the selected household could speak Swahili, interpreters were used.
Our initial plan called for the field work to start no later than August 29. However, unforeseen circumstances, including both financial and logistical problems, delayed the first field trip. Both the money and the materials were available by September 6, and five of the six teams left for Tanga region on that day. Initially we had planned to have the sixth team based full-time in Dar es Salaam; however, tighter time constraints imposed by the above and subsequent delays eventually made it necessary to send the sixth team into the field as well, as detailed below.
Description of questionnaires
The main objective of the survey was to obtain data on the use of, and spending on, the social sectors. The primary emphasis was on education and health--the areas in which the major gaps in availability of data were identified. The survey was divided into five major components, each of which was further subdivided, as described below:
I. Individual Questionnaire
A. Household Roster;
B. Information on parents of children between 7 and 15 years of age;
C. Information on the utilization of, and spending on, education services;
D. Information on the utilization of health services for those reported ill in the month previous to the interview;
E. Information on the utilization of, and spending on, prenatal care, delivery, and family planning.
II. Contingent Valuation Questions on:
A. Primary Health Facility (includes modules allowing respondents to assess desired characteristics of facilities, to reveal their willingness to pay for health services, and to provide information on the available health care facilities);
B. Primary Education Facility (includes modules allowing respondents to assess desired characteristics of schools and curriculum, to reveal their willingness to pay for education services, and to provide information on the available schools and curriculum);
C. Demand for Child Spacing;
D. Envisaged, Required Income Level.
III. Household Questionnaire
A. Land and livestock ownership;
B. Household income and economic activities;
C. Annual expenditures;
D. Monthly expenditures;
E. Weekly expenditures;
F. Housing characteristics and expenditures;
G. Mortality: deaths in the last 12 months.
IV. Community Price Questionnaire
V. Cognitive Test:
The design of the questionnaire took advantage of the huge volume of work done on household questionnaires over the past decade. The next paragraphs highlight areas in which this survey is different from the Social Dimensions of Adjustment (see Delaine et al. 1992) or Living Standard Measurement-type of surveys (e.g. Ainsworth et al. 1992; Grosh 1991). Where appropriate, a summary of the reasons for the difference in approach is also presented.
The Yellow Card. A yellow-colored Household Roster card was included in the questionnaire. The interviewer had to copy some of the information--age, gender, and name--from the household roster onto this yellow card. The removable Household roster card was then used throughout the rest of the questionnaire for reference as to which members of the household were eligible for particular sections and what their ID numbers were.
Income Questions. It was decided that our survey, unlike the majority of the surveys we reviewed, would not include data to be used to estimate income levels. Gathering complete and accurate income data is a very time-consuming activity in countries where few receive wages and the majority are self-employed and engaged in non-market activities. In analysis, income questions are difficult to use; monetary incomes are often calculated to be negative; and, when much of the sample works outside the formal market economy, these problems are compounded. Given our time and budget constraints it was not logical for us to try to measure income.
We did include a few questions about sources of income. In this section, our objective was not to gather the information necessary to estimate income levels, but rather to ascertain the main economic activities in which the household engaged, and how the household ranked them in order of importance. For those growing crops, we wanted to establish the relative importance of each crop and the proportion of it that was marketed.
Expenditures Section. The three principal issues which had to be resolved regarding the expenditure section were as follows:
(i) How to organize the expenditures in terms of levels of observation: individuals or households, and the recall period;
(ii) Whether to split consumption expenditure and consumption of home production, or to ask them in the same question; and
(iii) How to take into account seasonality of food consumption.
Accurate and complete measurement of expenditures is essential. To maximize the accuracy of the information gathered, different types of expenditures were organized into different levels of observation, depending on the consumption item for which we were measuring expenditures. For example, better estimates of consumption are likely to be obtained by adapting the period of recall to the frequency of purchasing the good. Accordingly, food expenditures had a week-long recall period, while education expenditures had a one-year recall period. For some items, we gathered information at the individual level--e.g. education and health--and for some, at the household level--e.g. housing and utilities. For food items, we split consumption expenditure and consumption of home production. For items such as education, for which expenditures are more likely to be cash expenditures, we did not split them. However, we explicitly included the following instructions with the expenditure questions: "Please include contributions of labor and other non-cash items, which we will convert to shillings."
To address the problem posed by seasonality of consumption, we phrased the question as "During a typical week this past year, did anyone in this household acquire or spend money on..." Also it must be kept in mind that not all consumption expenditures are recorded in Part 3, Sections C, D, and E of the Consumption Section. Expenditures on education can be obtained from Section 1, Part C: Schooling, while housing expenditures were included in Section 3, Part F: Housing.
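Because recall periods differ by category (weekly for food, monthly or annual for other items), an analyst must scale reported amounts to a common period before comparing or summing them. The sketch below shows one common way to do this; the scaling to a 365-day year is a standard analytical convention, not a rule prescribed by the survey documentation.

```python
# Days assumed per recall period; "monthly" as 30 days is an approximation.
RECALL_DAYS = {"weekly": 7, "monthly": 30, "annual": 365}

def annualize(amount, recall):
    """Scale a reported expenditure to an annual (365-day) figure.

    amount: expenditure reported for one recall period.
    recall: one of "weekly", "monthly", "annual".
    """
    return amount * 365.0 / RECALL_DAYS[recall]

# A household reporting Tsh 70 of food purchases in a "typical week"
# would be recorded as Tsh 3,650 per year:
food_annual = annualize(70, "weekly")
```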
The survey asked only a few questions on ownership of durables and none regarding their acquisition cost or present value. There is evidence from Grosh, Zhao, and Jeancard (1995) that information on durables does not change the results of welfare ranking. This is another area in which we chose to shorten the questionnaire and the length of interview.

The Contingent Valuation Questions. One of the distinguishing features of the HRDS survey is the use of contingent valuation questions to better understand households' perception and valuation of some of the services available to them, which characteristics they value the most, and how far actual levels of provision are from desired levels. A three-step process was followed. In the first step, the respondent was given 20 chips (or shillings), representing a budget constraint. The enumerator then showed him or her a card with 5 pictures, representing 5 characteristics of a health facility (or of a primary school). The respondent was asked to allocate the 20 shillings among the 5 characteristics. In the second step, the enumerator asked how much the respondent was willing to pay for a visit to a health facility--or, in the case of education, for one year of tuition in a primary school--that matched the characteristics the respondent rated most important. In the third and final step, the respondent was asked to characterize the closest health facility (and closest primary school) in terms of the five characteristics that he or she was previously asked to rank. This information should provide a picture of what households consider important in primary health and education services, and of how well the available facilities meet the household's desires. We tried to select characteristics that designers of health and education services tend to emphasize as important. For health, we mixed characteristics of public and private goods.
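The first step of the contingent valuation exercise imposes a simple consistency constraint: five allocations that sum to the 20-shilling budget. A data-cleaning pass can enforce this directly. The sketch below is an illustrative check; the characteristic labels are placeholders, not the actual attributes pictured on the cards.

```python
def validate_allocation(chips, total=20, n_characteristics=5):
    """Validate a respondent's allocation of the chip budget across
    facility characteristics (step one of the contingent valuation
    exercise) and return it as a labeled record.

    chips: list of shillings allocated to each characteristic.
    """
    if len(chips) != n_characteristics:
        raise ValueError(f"expected allocations for {n_characteristics} characteristics")
    if sum(chips) != total:
        raise ValueError(f"allocations must sum to {total}")
    # Placeholder labels; the real survey used five pictured attributes.
    return {f"characteristic_{i + 1}": c for i, c in enumerate(chips)}

# A valid response: 8 + 5 + 3 + 2 + 2 = 20 shillings.
record = validate_allocation([8, 5, 3, 2, 2])
```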
Respondent Rules. Different respondents were chosen for different sections to increase accuracy. For some topics, information from proxy or household respondents will be less accurate or less applicable (e.g. the contingent valuation questions on child spacing from a 10-year-old boy). In the header on each page of the questionnaire, there is a space to identify the respondent. This information can be useful for some types of analysis in which it is important to know characteristics of the specific respondent. In the introduction to each section, the preferred respondent is explicitly stated. Accordingly, for Section 1, Part A, the head of the household is the preferred respondent; but in Part D, the preferred respondent is "Each eligible individual, with assistance of the head of household if necessary."
The Community Questionnaire. We asked the community questions of all selected households in a cluster because of the statistical principle that multiple answers to the same question provide better information, on average, than asking only a single "principal informant." Also, for some questions--e.g. distance to the closest village health center--it is important to obtain answers from each household. In some rural clusters in Tanzania, houses can be 15 miles apart, so important intra-cluster inequalities may be neglected if not all households are questioned. The price questions, because they do not vary much across households in a cluster, were answered either by a principal respondent or through inspection of the local markets or shops.
Two possible firms in Tanzania were identified to perform the data entry. Both were asked to present cost estimates. Based on the cost estimates and the conditions proposed by each firm, it was decided to contract the services of Dr. Swai at the Data Processing Center at the Muhimbili Hospital. The team was composed of one supervisor, 12 data entry people working in shifts, and 1 liaison at the World Bank Resident Mission in Dar es Salaam.
At the start of the field work, the interviewing and data entry were nearly simultaneous. Each team was in charge of mailing the completed questionnaires to the Resident Mission, which then delivered them to the supervisor. However, because the field work slowed down after classes began (only the 10 non-student enumerators continued interviewing) and only returned to full speed in mid-December (again with a full team of 31 interviewers), the flow of questionnaires was not nearly as smooth as planned. We had great difficulty keeping track of the survey effort during November and December due to communication problems between the nodes: the field, the data entry people, the Resident Mission, and Washington. Given that the household survey was an essential building block for the Social Sector Review, and because breakdowns in the data entry system caused grave concern about the timeliness and quality of the data to be made available to us, we decided to have the questionnaires shipped to the U.S. Several possible firms were identified, asked to present cost estimates, and to propose a schedule of activities. Based on the information provided, a contract was awarded to Office Remedies. To minimize data entry mistakes, the data were entered twice, by different people. This ensures over 99% accuracy.
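The double-entry step described above works by keying every questionnaire twice, independently, and reconciling any field where the two passes disagree. A minimal sketch of such a comparison, assuming each pass is stored as a dict of field names to values (the field names below are hypothetical):

```python
def compare_entries(entry_a, entry_b):
    """Flag fields where two independent data-entry passes disagree.

    Returns a dict mapping each mismatched field to the pair of values
    keyed by the two operators, so a reviewer can consult the paper
    questionnaire and resolve the discrepancy.
    """
    fields = set(entry_a) | set(entry_b)
    return {
        f: (entry_a.get(f), entry_b.get(f))
        for f in fields
        if entry_a.get(f) != entry_b.get(f)
    }

# One operator keyed age as 34, the other as 43 -- flagged for review:
mismatches = compare_entries({"age": 34, "sex": "F"}, {"age": 43, "sex": "F"})
```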
As previously mentioned, the questionnaire that was used during the Tanga trip included an additional question on "TIME USE DURING SCHOOL YEAR." This information was entered at the World Bank headquarters using the SURVEY Program.
Other forms of data appraisal
Major problems with the Survey:
i. Delay in receiving funds and materials from Washington. We operated in the nearly impossible situation of raising money to pay for the survey while it was underway. In addition, the time schedule placed incredible stress on World Bank procurement and payment mechanisms. If we had had more time, all of these processes could have been arranged well in advance. As this was not the case, it was all done on an emergency basis. We caused incredible stress to the system, everyone around us, and ourselves. The individual delays were usually short, but they caused a few important and avoidable setbacks to the survey. The goodwill and flexibility of our colleagues, especially of Professor Amani, made it possible to succeed nearly on schedule. However, none of us would advocate repeating the experience in this manner.
iii. Price Questionnaires. We obtained price questionnaires for only 70 clusters, out of the 222 planned. For the remaining clusters, the price questionnaires were either lost or not completed. The data that remain still seem useful for examining price variation within and across regions.
iv. Two Forms. During the first week of the field work we used a questionnaire that included a section on time usage for some members of the household, and that had a different ordering of questions in other sections. The existence of two forms caused additional problems during the data entry phase.
World Bank LSMS
In receiving these data it is recognized that the data are supplied for use within my organization, and I agree to the following stipulations as conditions for the use of the data:
1. The data are supplied solely for the use described in this form and will not be made available to other organizations or individuals. Other organizations or individuals may request the data directly.
2. Three copies of all publications, conference papers, or other research reports based entirely or in part upon the requested data will be supplied to:
The World Bank
Development Economics Research Group
LSMS Database Administrator
1818 H Street, NW
Washington, DC 20433, USA
tel: (202) 473-9041
fax: (202) 522-1153
3. The researcher will refer to the 1993 Tanzania Human Resources Development Survey as the source of the information in all publications, conference papers, and manuscripts with the following statement: "The data used in this paper come from a nationally representative survey of 5,000 households in Tanzania. The survey was a joint effort undertaken by the Department of Economics of the University of Dar es Salaam, the Government of Tanzania, and the World Bank, and was funded by the World Bank, the Government of Japan, and the British Overseas Development Agency." At the same time, the World Bank is not responsible for the estimations reported by the analyst(s).
4. Users who download the data may not pass the data to third parties.
5. The database cannot be used for commercial ends, nor can it be sold.
Disclaimer and copyrights
The user of the data acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.