Kagera Health and Development Survey 2004 (Wave 5 Panel)
The Kagera Health and Development Survey 2004 (KHDS 2004) took place in 2004 as a fifth round following on the four rounds of the baseline Kagera Health and Development Survey 1991-1994 (KHDS 91-94). The KHDS 2004 was designed to provide data to understand economic mobility and changes in living standards of the sample of individuals interviewed 10-13 years ago. The KHDS 2004 attempted to reinterview all respondents ever interviewed in the KHDS 91-94. This entailed attempting to track these individuals, even if they had moved out of the village, region or country.
Kind of data
Sample survey data [ssd]
Kagera region of Tanzania
Domains: Agronomic zone (Tree Crop, Riverine, Annual Crop, Urban)
Unit of analysis
Producers and sponsors
Economic Development Initiatives
Danish Agency for Development Assistance
Knowledge for Change Trust Fund at the World Bank
Joachim De Weerdt
Management of data entry
University of Dar es Salaam
Sample size is 900 households
KHDS 91-94 Household Sample: First Stage
The KHDS 91-94 household sample was drawn in two stages, with stratification based on geography in the first stage and mortality risk in both stages. A more detailed overview of the sampling procedures is given in "User’s Guide to the Kagera Health and Development Survey Datasets" (World Bank, 2004).
In the first stage of selecting the sample, the 550 primary sampling units (PSUs) in Kagera region were classified according to eight strata defined over four agronomic zones and, within each zone, the level of adult mortality (high and low). A PSU is a geographical area delineated by the 1988 Tanzanian Census that usually corresponds to a community or, in the case of a town, to a neighborhood. Enumeration areas of households were drawn randomly from the PSUs in each stratum, with a probability of selection proportional to the size of the PSU.
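Selection with probability proportional to size can be illustrated with a short sketch. The example below uses systematic PPS sampling, a common textbook technique; the source does not say which PPS procedure the survey actually used, and the PSU labels and sizes are hypothetical.

```python
import random

def pps_systematic(sizes, n, rng):
    """Pick n units with probability proportional to size (systematic PPS).

    sizes: dict mapping a unit label to its size measure (e.g. households).
    A unit larger than the sampling interval can be picked more than once.
    """
    total = sum(sizes.values())
    step = total / n                      # sampling interval
    start = rng.random() * step           # random start within the first interval
    targets = iter(start + k * step for k in range(n))
    target = next(targets)
    selected = []
    cumulative = 0
    for unit, size in sizes.items():
        cumulative += size
        # Select this unit once for every target that falls inside its range.
        while target is not None and target < cumulative:
            selected.append(unit)
            target = next(targets, None)
    return selected

# Hypothetical PSU sizes, for illustration only.
example_sizes = {"PSU-A": 100, "PSU-B": 300, "PSU-C": 600}
example_draw = pps_systematic(example_sizes, n=2, rng=random.Random(0))
```

In the actual design this kind of draw would be repeated separately within each of the eight strata.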
Within each agronomic zone, PSUs were classified according to the level of adult mortality. The 1988 Tanzanian Census asked a 15 percent sample of households about recent adult deaths. Those answers were aggregated at the level of the "ward", which is an administrative area that is smaller than a district. The adult mortality rate (ages 15-50) was calculated for each ward and each PSU was assigned the mortality rate of its ward.
Because the adult mortality rates were much higher in some zones than others and the distribution was quite different within zones, "high" and "low" mortality PSUs were defined relative to other PSUs within the same zone. A PSU was allocated to the "high" mortality category if its ward adult mortality rate was at the 90th percentile or higher of the ward adult mortality rates within a given agronomic zone.
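The within-zone classification rule can be sketched as follows. This is an illustration under stated assumptions: the 90th percentile is computed with the nearest-rank method over the ward rates assigned to the PSUs of each zone (the source does not specify the percentile method), and the field names `psu`, `zone` and `ward_rate` are hypothetical.

```python
from collections import defaultdict

def classify_mortality(psus, pct=90):
    """Label each PSU 'high' or 'low' relative to other PSUs in its zone.

    psus: list of dicts with hypothetical keys 'psu', 'zone', 'ward_rate'.
    """
    rates_by_zone = defaultdict(list)
    for p in psus:
        rates_by_zone[p["zone"]].append(p["ward_rate"])

    cutoffs = {}
    for zone, rates in rates_by_zone.items():
        ordered = sorted(rates)
        # Nearest-rank percentile: rank = ceil(pct * N / 100), in integers.
        rank = max(1, (pct * len(ordered) + 99) // 100)
        cutoffs[zone] = ordered[rank - 1]

    # "High" means at or above the zone-specific cutoff.
    return {
        p["psu"]: "high" if p["ward_rate"] >= cutoffs[p["zone"]] else "low"
        for p in psus
    }

# Made-up rates: ten urban PSUs with rates 1..10, two riverine PSUs.
example_psus = [{"psu": f"U{i}", "zone": "Urban", "ward_rate": i} for i in range(1, 11)]
example_psus += [
    {"psu": "R1", "zone": "Riverine", "ward_rate": 50},
    {"psu": "R2", "zone": "Riverine", "ward_rate": 60},
]
example_labels = classify_mortality(example_psus)
```

Because the cutoff is computed per zone, a rate that counts as "low" in one zone can count as "high" in another.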
The KHDS 1991-1994 selected 51 communities as primary sampling units (also referred to as enumeration areas or clusters). In actuality, 2 pairs of enumeration areas were within the same community (in the sense of collecting community data on infrastructure, prices or schools). Thus, for community-level surveys, there are 49 areas to interview.
KHDS 91-94 Household Sample: Second Stage
The household selection at the second stage (within enumeration areas) was a stratified random sample. That is, households expected to experience an adult death were over-sampled. In order to stratify the population, an enumeration of all households was undertaken. Between March 15 and June 13, 1991, 29,602 households were enumerated in the 51 areas. In addition to recording the name of the head of each household, the number of adults in the household (15 and older), and the number of children, the enumeration form asked: "Are any adults in this household ill at this moment and unable to work?" If so, the age of the sick adult and the number of weeks he/she had been too sick to work were also noted.
The form also asked: "Has any adult 15-50 in this household died in the past 12 months?" If so, the age of each adult and the cause of death (illness, accident, childbirth, other) were also noted. The enumeration form asked explicitly about illness and death of adults between the ages of 15-50 because this is the age group disproportionately affected by the HIV/AIDS epidemic; it is the impact of these deaths that was of research interest. Out of over 29,000 households enumerated, only 3.7 percent, or 1,101, had experienced the death of an adult aged 15-50 caused by illness during the twelve months before the interview and only 3.9 percent, or 1,145, contained a prime-age adult too sick to work at the time of the interview. Only 77 households had both an adult death due to illness and a sick adult. This supports the point that, even with some stratification based on community mortality rates and in an area with very high adult mortality caused by an AIDS epidemic, a very large sample would have had to be selected to ensure a sufficient number of households that would experience an adult death during the two-year survey.
Using data from the enumeration survey, households were stratified according to the extent of adult illness and mortality. It was assumed that in communities suffering from an HIV epidemic, a history of prior adult death or illness in a household might predict future adult deaths in the same household. The households in each enumeration area were classified into two groups, based on their response to the enumeration:
- "Sick" households: Those that had either an adult death (aged 15-50) due to illness in the past 12 months, an adult too sick to work at the time of the survey, or both (n=2,169).
- "Well" households: Those that had neither an adult death (aged 15-50) due to illness nor an adult (aged 15-50) too sick to work (n=27,433).
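The group sizes follow directly from the enumeration counts reported above. A short arithmetic check, using only figures from the text:

```python
# Counts reported from the 1991 enumeration.
total_enumerated = 29_602
with_adult_death = 1_101      # adult (15-50) death due to illness, past 12 months
with_sick_adult = 1_145       # adult too sick to work at time of enumeration
with_both = 77                # households counted in both groups above

# Inclusion-exclusion: households with a death, a sick adult, or both.
sick_households = with_adult_death + with_sick_adult - with_both
well_households = total_enumerated - sick_households

# Shares of all enumerated households, in percent, rounded to one decimal.
death_share = round(100 * with_adult_death / total_enumerated, 1)
sick_adult_share = round(100 * with_sick_adult / total_enumerated, 1)
```

The results reproduce the reported n=2,169 and n=27,433, and the 3.7 and 3.9 percent shares.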
In selecting the sixteen households to be interviewed in each enumeration area, fourteen were selected at random from the "sick" households in that enumeration area and two were selected at random from the "well" households. In one enumeration area, where the number of "sick" households available was less than fourteen, all available sick households were included in the sample; the numbers were balanced using well households. The final sample drawn for the first passage consisted of 816 households in 51 enumeration areas.
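The second-stage rule, including the top-up with well households when fewer than fourteen sick households are available, can be sketched as below. The function and argument names are hypothetical, and the sketch only mirrors the rule as described here, not the field procedure itself.

```python
import random

def select_households(sick, well, rng, total=16, sick_quota=14):
    """Draw the household sample for one enumeration area.

    sick, well: lists of household identifiers from the enumeration.
    If fewer than sick_quota sick households exist, all of them are taken
    and the remainder of the sample is filled with well households.
    """
    n_sick = min(sick_quota, len(sick))
    chosen_sick = rng.sample(sick, n_sick)
    chosen_well = rng.sample(well, total - n_sick)
    return chosen_sick, chosen_well

rng = random.Random(1)
# Typical area: plenty of sick households available.
typical_sick, typical_well = select_households(list(range(30)), list(range(100, 400)), rng)
# Area with only ten sick households: the shortfall is made up with well households.
short_sick, short_well = select_households(list(range(10)), list(range(100, 400)), rng)
```

With 51 enumeration areas and sixteen households each, this gives the 816 households of the first passage.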
The KHDS 2004 sampling strategy was to reinterview all individuals who were household members in any round of the KHDS 1991-1994 and who were alive at the last interview. [One serious problem that is side-stepped by this approach is constructing a definition of what makes a household the same household as 10 years ago, especially if there are individuals who have migrated, split-off or the household has dissolved.] The household in which these individuals live would be administered the full household questionnaire. For all household members alive during the last interview in 1991-1994, but found to be deceased by 2004, information about the deceased would be collected in the mortality questionnaire. This questionnaire was intended to collect data on the circumstances of their death, as well as on their living arrangements and limited information on health seeking behavior prior to death. The respondents for this questionnaire were typically panel respondents who were previous household members with the deceased, other relatives, neighbors or close friends.
Although the KHDS is a panel of respondents and the concept of a 'household' after 10-13 years is a vague notion, it is common in panel surveys to consider recontact rates in terms of households. Table 7 shows the rate of recontact of the baseline households, where a recontact is defined as having interviewed at least one person from the household. In this case, the term household is defined by the baseline KHDS survey which spans a period of 2.5 years. Due to movements in and out of the household, some household members may not, in fact, have lived together in the household at the same time in the 1991-1994 rounds (for example, consider one sibling of the household head moving into the household for 1 year and then moving out, followed by another sibling moving into the household).
Excluding households in which all previous members are deceased (17 households and 27 people), the field team managed to recontact 93 percent of the baseline households. Not all 912 households received four interviews. Not surprisingly, households that were in the baseline survey for all four rounds had the highest probability of being reinterviewed. Of these 746 households, 96 percent were reinterviewed.
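The household recontact rate can be checked against the counts given in this section. The sketch assumes that 912 is the number of baseline households and that 832 (the number of recontacted baseline households reported in this section) is the numerator; both readings are inferred from the text rather than stated as a formula.

```python
# Figures taken from this section; their roles in the ratio are an assumption.
baseline_households = 912
all_members_deceased = 17
recontacted_households = 832

# Households with at least one surviving baseline member.
eligible_households = baseline_households - all_members_deceased
recontact_rate_pct = round(100 * recontacted_households / eligible_households)
```

Under these assumptions the rate rounds to the 93 percent reported above.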
Because people have moved out of their original household, the new sample in KHDS 2004 consists of over 2,700 households originating from the 832 baseline households that were recontacted. Much of the success in recontacting respondents was due to the effort to track people who had moved out of the baseline villages. One-half of all households interviewed were tracking cases, meaning they did not reside in the baseline communities. Of those households tracked, only 38 percent were located near the baseline community. Overall, 32 percent of all households were not located near the baseline communities. While tracking is costly, it is an important exercise because migration and dissolution of households are often hypothesized to be important responses to hardship. Excluding these households from the sample raises obvious concerns regarding the selectivity of attrition. In particular, out-migration from the village, dissolving of households, and even marriage may be responses to adult mortality. At the same time, tracking provides a unique opportunity to study these coping mechanisms: who uses them, what their effects are, and whether they get people out of poverty or themselves constitute a poverty trap.
Turning to recontact rates of the sample of 6,204 respondents, reinterview rates are monotonically decreasing with age, although the reasons (deceased or not located) vary by age group. The older respondents were much more likely to be located if living, which is consistent with higher migration rates among the young adults in the sample. Among the youngest respondents, over three-quarters were successfully re-interviewed. Excluding people who died, 82 percent of all respondents were re-interviewed. Without tracking, re-interview rates of surviving respondents would have fallen from 82 percent to 52 percent. Non-local migration is not trivial; restricting the tracking to nearby villages would have resulted in 63 percent recontact of survivors. Migration proved to be an important factor in determining whether someone was recontacted. Respondents who were untraced were much more likely to be residing outside Kagera (52 percent) compared to their counterparts who were re-interviewed (9 percent).
KHDS 2004 tracked international migrants to Uganda only. Although the location of those in other countries was known, they were not traced. For those respondents who were not reinterviewed, the KHDS 2004 gives some information about their interactions with the reinterviewed respondents. Survey modules on the frequency of contact with all previous household members provide information on the cash, in-kind and labor interactions between former household members.
Dates of collection
Mode of data collection
The KHDS 2004 mainly consists of a household survey covering a wide range of topics. The KHDS 2004 also includes three community questionnaires to accompany the household survey (community, price, and primary school questionnaires).
The KHDS 2004 project used the original questionnaires from the KHDS 91-94 as the foundation of the survey instruments. The household questionnaire collects information on a wide range of topics, including: housing amenities, consumption, income, assets, time allocation of individuals, business activities, remittances, support from organizations, education, and health, including anthropometric measures. The community questionnaire collects data on the physical, economic and social infrastructure of the baseline communities. The primary school questionnaire collects information on the amenities at schools, composition of the student body, and assistance to schools. Finally, up to three price observations are collected in each community from local markets/stalls on a list of commonly purchased food and non-food items.
Where possible, comparability is maintained with the KHDS 91-94 survey instruments. However, the questionnaires for the KHDS 2004 were revised to reflect changes in the region since 1994. Further, the household questionnaire was redesigned in an effort to capture key transitions that have occurred since the previous survey. These revisions included:
- Inclusion of a module on the incidence of economic shocks from the last 10 years (both positive and negative) for all panel respondents.
- Inclusion of a module on migration for respondents who relocated since the KHDS 91-94.
- Inclusion of a module on informal insurance groups.
- Expansion of questions on the circumstances of deaths.
- Inclusion of information on the remittances, loans, bride price payments, social communication and labor transfers between previous members of the KHDS 91-94.
This section of the Basic Information Document reviews the 4 surveys of the KHDS 2004. For each survey, substantial differences are highlighted between the survey instrument used in the 1991-1994 rounds and in 2004.
Users are encouraged to use this as a general guide to understand the questionnaires; however, this should not substitute for looking at the actual questionnaires directly. Users are encouraged to look directly at the survey instruments for literal question wording and to identify differences between survey instruments. The household questionnaires are available in Swahili (as used in the field) and English (a translated version of the Swahili field questionnaire); the community surveys were produced only in English.
Household Questionnaire: Review of Sections
The household questionnaire is divided into numerous sections, each of which covers a fairly distinct aspect of household activities. Anthropometric measurements and the questionnaire on mortality of household members are administered in separate forms attached to the household questionnaire.
Each section of the household questionnaire has one of four types of respondents, selected according to the content of the section: the interviewer, the household head, the most knowledgeable person in the household, or individual household members. The only section for which household members are not respondents is the first section covering basic survey information (household location, GPS coordinates, interviewing language, completion status of section, etc.).
Household Questionnaire: Highlights of Substantial Differences
Many changes were made in the KHDS 2004 household questionnaire compared to the KHDS 91-94 household questionnaire. Some questions were added and some dropped. Section 4 arrangement was also revised to provide better continuity during interviews. The following are the main changes included in the 2004 questionnaire:
- Section 9 (Fertility) and Section 11C (Age of tree crops) from the 1991-1994 questionnaire were dropped in 2004.
- Section 13 (Fishing) from 1991-1994 was incorporated in Section 13 (non-farm business) in 2004.
- In 1991-1994, information in Section 7 was collected on the main job and secondary job done in the past 12 months, while in 2004 data was collected only on the main job done.
- In 1991-1994, Section 20b collected information on deaths of deceased relatives who were not household members. In 2004, the mortality questionnaire covered only deceased household members from the 1991-1994 survey.
- The following were new sections introduced in 2004: Section 10 (Shocks experienced in the past 10 years), Section 15C (Household Two-Week Expenditures), Section 15D (Inheritance and Bride Price), Section 17B (Ability to cope) and Section 18A (Interactions with Network Members).
Detailed information on the key changes, section-by-section, between the KHDS 91-94 household questionnaire and the KHDS 2004 household questionnaire is provided in Kathleen Beegle, Joachim De Weerdt and Stefan Dercon, March 3, 2006, "Kagera Health and Development Survey 2004 - Basic Information Document".
Community Questionnaire: Highlights of Substantial Differences
The substantial changes to the community questionnaire include:
• A new section was included on shocks experienced in the past 10 years (Section 7).
• Data was collected on population share of ethnic groups.
• GPS coordinates were taken in each community for all enumeration areas.
• Questions on access to roads, electricity and water were introduced.
• Questions on the culture of mourning were asked for three different periods: the time of the survey, 10 years prior to the interview and 20 years prior to the interview.
• Information was collected on access to vocational training and secondary education.
• Information was also collected on temporary migration and seasonal employment of community members.
See details in Kathleen Beegle, Joachim De Weerdt and Stefan Dercon, March 3, 2006, "Kagera Health and Development Survey 2004 - Basic Information Document".
Price Questionnaire: Highlights of Substantial Differences
Overall there were no substantive changes; a few items were added to the list.
See details in Kathleen Beegle, Joachim De Weerdt and Stefan Dercon, March 3, 2006, "Kagera Health and Development Survey 2004 - Basic Information Document".
Data entry was done at the main office in Bukoba, concurrent with the main fieldwork. The data entry team consisted of seven data entry operators and one data entry supervisor. Data was entered in CsPro then transformed to Stata format. Questionnaires were entered and verified after each entry. Internal consistency checks were performed in CsPro; in addition, more elaborate checks for inconsistencies and outliers were done in Stata.
All responses obtained from individual, household, and community level interviews were recorded in questionnaires. In cases where the respondent did not know the answer, the interviewers recorded "DK" (Don't know) in the questionnaires. Data entry operators were trained to input this as nine (9), which represents missing information in the datasets. In cases where nine was an eligible code, the highest value for the number of digits was entered. For example, DKs for questions with up to two eligible digit codes were entered as 99; 999 was entered for DKs for questions with up to three eligible digit codes (assuming 999 was not otherwise an eligible response). For the mortality questionnaire, in some cases, multiple informants were interviewed. The data were consolidated such that each baseline household has one mortality questionnaire in the data files (with, perhaps, multiple deceased therein).
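When analysing the data, the "Don't know" codes need to be treated as missing rather than as numeric values. The helper below is a minimal sketch of that recoding; it assumes the analyst knows the number of digits of each question's code range, and that the all-nines value is not a legitimate response for that question, which should be confirmed against the questionnaire. The function names are hypothetical.

```python
def dk_code(n_digits):
    """Return the all-nines 'Don't know' code for a field of n_digits digits."""
    return int("9" * n_digits)

def recode_dk(value, n_digits):
    """Map the DK code to None (missing); leave every other value unchanged."""
    return None if value == dk_code(n_digits) else value

# Made-up responses to a question with two-digit codes.
raw_answers = [12, 99, 45, 9]
clean_answers = [recode_dk(v, n_digits=2) for v in raw_answers]
```

Note that the value 9 is kept in the two-digit example, since only 99 marks a DK for that field width.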
The community, price and primary school data are only relevant for households located in the vicinity. That is, these questionnaires were only administered in the original 51 enumeration areas (which are 49 unique communities). Households that are located in or near the baseline community can be identified by the question si2c on the first page of the household questionnaire. The enumeration area number for these households is the first two digits of the six-digit household identification number. For example: HHID 150105 has si2c=1, meaning that the household resides in the original sample community. In a strict sense, the community, price and primary school data can only be used for people living in the same village. Some households reside nearby, although not in the same community. Many of the variables collected at community level may be valid for people tracked near the original enumeration areas (variable si2c in hh.dta equal to 2). One can, in theory, link them to their baseline community data, although it is not necessarily the best community data to describe the community of that household, since some of these nearby communities were actually several kilometers away and in another village entirely.
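The linking rule for community-level data can be sketched as a small function. It relies only on the two si2c values described here (1 for the original community, 2 for nearby); any other value is treated as "do not link". The function name and the `include_nearby` switch are hypothetical.

```python
def community_enumeration_area(hhid, si2c, include_nearby=False):
    """Return the enumeration area whose community data applies, or None.

    hhid: six-digit 2004 household identification number, as an integer.
    si2c: location code from the first page of the household questionnaire.
    include_nearby: also link households tracked near the baseline community
        (si2c == 2), accepting that the match may be imperfect.
    """
    if si2c == 1 or (include_nearby and si2c == 2):
        # The enumeration area is the first two digits of the padded ID.
        return int(f"{hhid:06d}"[:2])
    return None

strict_link = community_enumeration_area(150105, si2c=1)
nearby_strict = community_enumeration_area(150105, si2c=2)
nearby_relaxed = community_enumeration_area(150105, si2c=2, include_nearby=True)
```

Keeping the relaxed link behind an explicit switch makes the choice visible in any analysis that uses community data for nearby households.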
Linking Individuals Over Time
In the KHDS 1991-1994 survey, household identification was based on a two-digit enumeration area number (cluster) and a two-digit household number within the enumeration area (hh). Individuals in the household were assigned a person-ID number (equivalent to their roster line number) (id). Since a very small number of people moved out of one panel household and into another during the baseline survey, each person is also assigned a 6-digit panel respondent identifier (pid91_94) in order to identify people uniquely. This identifier is almost always the combination of cluster + hh + id. pid91_94 uniquely identifies every person ever interviewed in the KHDS (be it one of the first four rounds of 1991-1994 or 2004).
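The usual composition of the panel identifier can be illustrated as follows. This is only a sketch of the common case: it assumes two digits each for cluster, hh and id, and it does not reproduce the exceptions for people who moved between panel households, so the recorded pid91_94 variable should always be used for actual linking. The function name is hypothetical.

```python
def typical_pid91_94(cluster, hh, person_id):
    """Compose the usual 6-digit panel ID from cluster, hh and roster id.

    Holds for almost all respondents, but not for the few who moved
    between panel households during the baseline survey.
    """
    return cluster * 10_000 + hh * 100 + person_id

example_pid = typical_pid91_94(cluster=15, hh=1, person_id=3)
```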
Households in the 2004 survey were assigned 6-digit identification numbers (hhid2). Household identification numbers in 2004 were designed to allow the user to easily link back to the 1991-1994 community and household. The first four digits of the 2004 household identification are the same as cluster and hh from 1991-1994. The last two digits number the 2004 household such that households with the same origin household are not given the same 6-digit identification code. In rare cases, two panel respondents from two different baseline (1991-1994) households now reside together. Thus, in these rare cases, hhid2 may not refer to the KHDS 91-94 household for each panel respondent in that household.
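Splitting a 2004 household identifier into its components can be sketched as below. The field name `split` for the last two digits is a hypothetical label, and the caveat about respondents from different baseline households living together applies to the result.

```python
from typing import NamedTuple

class Hhid2Parts(NamedTuple):
    cluster: int   # 1991-1994 enumeration area
    hh: int        # 1991-1994 household number within the enumeration area
    split: int     # numbers the 2004 households from the same origin household

def parse_hhid2(hhid2):
    """Split a 6-digit 2004 household ID into cluster, hh and split number."""
    digits = f"{hhid2:06d}"   # restore a leading zero lost in integer storage
    return Hhid2Parts(int(digits[:2]), int(digits[2:4]), int(digits[4:]))

example_parts = parse_hhid2(150105)
```

The first two fields can then be used to merge 2004 households onto their 1991-1994 household and community records.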
Individuals can be linked back to their 1991-1994 data through data from Section 1 question 10 in the household questionnaire. The respondent’s identification number from the household roster of their baseline household is recorded in this question. The roster ID of a person in KHDS 2004 (id2) does not correspond with their roster ID in the KHDS 91-94 (id). Data from Section 1 question 10 must be used in order to link panel respondents to their KHDS 91-94 data. Section 1 question 10 appears as four variables in the data set, corresponding to the variables described above: cluster, hh, id, and pid91_94.
In very few cases, a panel respondent could reside in two households at the same time. These are cases where two observations in Section 1 have the same pid91_94. Variables in sec1.dta (including s1q10_oth and s1q10_plgm) explain the reasons why this occurred, including:
- Section 1 question 8 is no: the person was listed on the roster by the household head but doesn’t qualify as a household member by the stated criterion.
- The person moved to another sample household during the field work and qualifies as a household member in both households.
- The person was reported as the household head in one household (which automatically qualifies the person as a household member), although the person is actually residing in another location.
- The person is polygamous and maintains two separate households.
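Before merging on pid91_94, it can be useful to list the respondents who appear in more than one 2004 household. The sketch below works on plain dictionaries with the keys `pid91_94` and `hhid2`; how the Section 1 records are loaded from sec1.dta is left out, and the example rows are made up.

```python
from collections import defaultdict

def respondents_in_multiple_households(records):
    """Return {pid91_94: [hhid2, ...]} for panel IDs seen in several households."""
    households = defaultdict(list)
    for row in records:
        households[row["pid91_94"]].append(row["hhid2"])
    return {pid: hhs for pid, hhs in households.items() if len(hhs) > 1}

# Made-up Section 1 rows for illustration.
example_rows = [
    {"pid91_94": 150103, "hhid2": 150101},
    {"pid91_94": 150103, "hhid2": 150105},
    {"pid91_94": 150104, "hhid2": 150101},
]
duplicates = respondents_in_multiple_households(example_rows)
```

Each flagged case can then be resolved by hand using the explanatory variables mentioned above.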
In receiving these data it is recognized that the data are supplied for use within my organization, and I agree to the following stipulations as conditions for the use of the data:
1. The data are supplied solely for the use described in this form and will not be made available to other organizations or individuals. Other organizations or individuals may request the data directly.
2. Three copies of all publications, conference papers, or other research reports based entirely or in part upon the requested data will be supplied to:
The World Bank
Development Economics Research Group
LSMS Database Administrator
1818 H Street, NW
Washington, DC 20433, USA
tel: (202) 473-9041
fax: (202) 522-1153
3. The researcher will refer to the 2004 Kagera, Tanzania Health and Development Survey as the source of the information in all publications, conference papers, and manuscripts. At the same time, the World Bank is not responsible for the estimations reported by the analyst(s).
4. Users who download the data may not pass the data to third parties.
5. The database cannot be used for commercial ends, nor can it be sold.
Kagera Health and Development Survey 2004. Ref. TZA_2004_KHDS_v01_M. The World Bank.
Disclaimer and copyrights
The user of the data acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.