Kagera Health and Development Survey 2004 (Wave 5 Panel)
The Kagera Health and Development Survey 2004 (KHDS 2004) took place in 2004 as a fifth round following on the four rounds of the baseline Kagera Health and Development Survey 1991-1994 (KHDS 91-94). The KHDS 2004 was designed to provide data to understand economic mobility and changes in living standards of the sample of individuals interviewed 10-13 years ago. The KHDS 2004 attempted to reinterview all respondents ever interviewed in the KHDS 91-94. This entailed attempting to track these individuals, even if they had moved out of the village, region or country.
Kind of data
Sample survey data [ssd]
Unit of analysis
Kagera region of Tanzania
Domains: Agronomic zone (Tree Crop, Riverine, Annual Crop, Urban)
Producers and sponsors
Economic Development Initiatives
Danish Agency for Development Assistance
Knowledge for Change Trust Fund at the World Bank
Joachim De Weerdt
Management of data entry
University of Dar es Salaam
Sample size is 900 households
KHDS 91-94 Household Sample: First Stage
The KHDS 91-94 household sample was drawn in two stages, with stratification based on geography in the first stage and mortality risk in both stages. The sampling procedures are described in more detail in "User’s Guide to the Kagera Health and Development Survey Datasets" (World Bank, 2004).
In the first stage of selecting the sample, the 550 primary sampling units (PSUs) in Kagera region were classified according to eight strata defined over four agronomic zones and, within each zone, the level of adult mortality (high and low). A PSU is a geographical area delineated by the 1988 Tanzanian Census that usually corresponds to a community or, in the case of a town, to a neighborhood. Enumeration areas of households were drawn randomly from the PSUs in each stratum, with a probability of selection proportional to the size of the PSU.
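Selection with probability proportional to size can be illustrated with a minimal Python sketch. It assumes systematic PPS selection, which is one common way to implement such a draw; the source does not say which procedure was actually used, and the function name, PSU labels and sizes are hypothetical.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n, rng=None):
    """Pick n units with probability proportional to size (systematic PPS).

    sizes: dict mapping a unit label to its size (e.g. household count).
    This is an illustrative assumption, not the documented KHDS procedure.
    """
    rng = rng or random.Random()
    names = list(sizes)
    cumulative = list(accumulate(sizes[name] for name in names))
    step = cumulative[-1] / n          # sampling interval
    start = rng.random() * step        # random start in [0, step)
    chosen = []
    i = 0
    for k in range(n):
        target = start + k * step
        # Advance to the unit whose cumulative size range contains the target.
        while i < len(cumulative) - 1 and cumulative[i] <= target:
            i += 1
        chosen.append(names[i])
    return chosen


# Hypothetical PSUs in one stratum, with made-up sizes.
example = pps_systematic({"psu_a": 120, "psu_b": 300, "psu_c": 80, "psu_d": 500}, 2)
```

Under this scheme a unit larger than the sampling interval can be picked more than once, so real designs usually handle very large units separately.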
Within each agronomic zone, PSUs were classified according to the level of adult mortality. The 1988 Tanzanian Census asked a 15 percent sample of households about recent adult deaths. Those answers were aggregated at the level of the "ward", which is an administrative area that is smaller than a district. The adult mortality rate (ages 15-50) was calculated for each ward and each PSU was assigned the mortality rate of its ward.
Because the adult mortality rates were much higher in some zones than others and the distribution was quite different within zones, "high" and "low" mortality PSUs were defined relative to other PSUs within the same zone. A PSU was allocated to the "high" mortality category if its ward adult mortality rate was at the 90th percentile or higher of the ward adult mortality rates within a given agronomic zone.
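The within-zone classification rule can be sketched as follows. This is a minimal illustration in Python: the data structures and names are hypothetical, and the interpolation method used to compute the 90th percentile is an assumption, since the source does not specify one.

```python
from collections import defaultdict
from statistics import quantiles


def mortality_strata(wards, psu_ward):
    """Assign each PSU to a (zone, "high"/"low") stratum.

    wards:    {ward: (zone, adult_mortality_rate)}
    psu_ward: {psu: ward}  (each PSU inherits the rate of its ward)
    """
    rates_by_zone = defaultdict(list)
    for zone, rate in wards.values():
        rates_by_zone[zone].append(rate)

    # 90th percentile of ward rates within each zone.
    # The "inclusive" method is an assumption; each zone needs >= 2 wards.
    cutoff = {
        zone: quantiles(rates, n=10, method="inclusive")[-1]
        for zone, rates in rates_by_zone.items()
    }

    strata = {}
    for psu, ward in psu_ward.items():
        zone, rate = wards[ward]
        strata[psu] = (zone, "high" if rate >= cutoff[zone] else "low")
    return strata
```

Because the cutoff is computed separately for each zone, a PSU counted as "high" in a low-mortality zone may have a lower rate than a "low" PSU elsewhere, which matches the relative definition given above.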
The KHDS 1991-1994 selected 51 communities as primary sampling units (also referred to as enumeration areas or clusters). In actuality, 2 pairs of enumeration areas were within the same community (in the sense of collecting community data on infrastructure, prices or schools). Thus, for community-level surveys, there are 49 areas to interview.
KHDS 91-94 Household Sample: Second Stage
The household selection at the second stage (within enumeration areas) was a stratified random sample. That is, households expected to experience an adult death were over-sampled. In order to stratify the population, an enumeration of all households was undertaken. Between March 15 and June 13, 1991, 29,602 households were enumerated in the 51 areas. In addition to recording the name of the head of each household, the number of adults in the household (15 and older), and the number of children, the enumeration form asked: "Are any adults in this household ill at this moment and unable to work?" If so, the age of the sick adult and the number of weeks he/she had been too sick to work were also noted.
"Has any adult 15-50 in this household died in the past 12 months?" If so, the age of each adult and the cause of death (illness, accident, childbirth, other) were also noted. The enumeration form asked explicitly about illness and death of adults between the ages of 15-50 because this is the age group disproportionately affected by the HIV/AIDS epidemic; it is the impact of these deaths that was of research interest. Out of over 29,000 households enumerated, only 3.7 percent, or 1,101, had experienced the death of an adult aged 15-50 caused by illness during the twelve months before the interview, and only 3.9 percent, or 1,145, contained a prime-age adult too sick to work at the time of the interview. Only 77 households had both an adult death due to illness and a sick adult. This supports the point that, even with some stratification based on community mortality rates and in an area with very high adult mortality caused by an AIDS epidemic, a very large sample would have had to be selected to ensure a sufficient number of households that would experience an adult death during the two-year survey.
Using data from the enumeration survey, households were stratified according to the extent of adult illness and mortality. It was assumed that in communities suffering from an HIV epidemic, a history of prior adult death or illness in a household might predict future adult deaths in the same household. The households in each enumeration area were classified into two groups, based on their response to the enumeration:
- "Sick" households: Those that had either an adult death (aged 15-50) due to illness in the past 12 months, an adult too sick to work at the time of the survey, or both (n=2,169).
- "Well" households: Those that had neither an adult death (aged 15-50) due to illness nor an adult (aged 15-50) too sick to work (n=27,433).
In selecting the sixteen households to be interviewed in each enumeration area, fourteen were selected at random from the "sick" households in that enumeration area and two were selected at random from the "well" households. In one enumeration area, where the number of "sick" households available was less than fourteen, all available sick households were included in the sample; the numbers were balanced using well households. The final sample drawn for the first passage consisted of 816 households in 51 enumeration areas.
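The stratum counts and the per-area draw described above can be checked with a short Python sketch. The counts are taken from the text; the function `draw_households` and its parameter names are hypothetical, and the fill-up rule for areas with fewer than fourteen sick households follows the description above rather than any published code.

```python
import random

# Figures reported in the text for the 1991 enumeration.
ENUMERATED = 29_602
DEATH_HOUSEHOLDS = 1_101   # adult (15-50) death due to illness, past 12 months
ILL_HOUSEHOLDS = 1_145     # adult too sick to work at enumeration
BOTH = 77                  # households reporting both

# Inclusion-exclusion: households with a death, a sick adult, or both.
sick_total = DEATH_HOUSEHOLDS + ILL_HOUSEHOLDS - BOTH
well_total = ENUMERATED - sick_total


def draw_households(sick, well, rng, per_area=16, sick_target=14):
    """Draw one enumeration area's sample: up to 14 sick, the rest well."""
    n_sick = min(sick_target, len(sick))
    chosen_sick = rng.sample(sick, n_sick)
    # Any shortfall in sick households is balanced with well households.
    chosen_well = rng.sample(well, per_area - n_sick)
    return chosen_sick, chosen_well


# 51 enumeration areas with 16 households each.
total_sample = 51 * 16
```

The computed totals reproduce the figures in the text: 2,169 sick households, 27,433 well households, and a first-passage sample of 816.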
The KHDS 2004 sampling strategy was to reinterview all individuals who were household members in any round of the KHDS 1991-1994 and who were alive at the last interview. [One serious problem that is side-stepped by this approach is constructing a definition of what makes a household the same household as 10 years ago, especially if there are individuals who have migrated or split off, or the household has dissolved.] The household in which these individuals live would be administered the full household questionnaire. For all household members alive during the last interview in 1991-1994, but found to be deceased by 2004, information about the deceased would be collected in the mortality questionnaire. This questionnaire was intended to collect data on the circumstances of their death, as well as on their living arrangements and limited information on health-seeking behavior prior to death. The respondents for this questionnaire were typically panel respondents who had been household members with the deceased, other relatives, neighbors or close friends.
Although the KHDS is a panel of respondents and the concept of a 'household' after 10-13 years is a vague notion, it is common in panel surveys to consider recontact rates in terms of households. Table 7 shows the rate of recontact of the baseline households, where a recontact is defined as having interviewed at least one person from the household. In this case, the term household is defined by the baseline KHDS survey, which spans a period of 2.5 years. Due to movements in and out of the household, some household members may not, in fact, have lived together in the household at the same time in the 1991-1994 rounds (for example, consider one sibling of the household head moving into the household for 1 year and then moving out, followed by another sibling moving into the household).
Excluding households in which all previous members are deceased (17 households and 27 people), the field team managed to recontact 93 percent of the baseline households. Not all 912 households received four interviews. Not surprisingly, households that were in the baseline survey for all four rounds had the highest probability of being reinterviewed. Of these 746 households, 96 percent were reinterviewed.
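As a rough arithmetic check, the 93 percent figure can be reproduced in Python from numbers given in this section. The sketch assumes that 912 is the full count of baseline households and that the 832 recontacted baseline households reported in this section form the numerator; the variable names are illustrative.

```python
baseline_households = 912    # all baseline households (from the text)
all_members_deceased = 17    # households with no surviving previous member
recontacted = 832            # baseline households recontacted (from the text)

# Households that could in principle be recontacted.
eligible = baseline_households - all_members_deceased

# Recontact rate among eligible baseline households, in percent.
recontact_rate = 100 * recontacted / eligible
```

Under these assumptions the rate comes to 832 out of 895 eligible households, which rounds to the reported 93 percent.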
Because people have moved out of their original households, the new sample in KHDS 2004 consists of over 2,700 households stemming from the 832 baseline households that were recontacted. Much of the success in recontacting respondents was due to the effort to track people who had moved out of the baseline villages. One-half of all households interviewed were tracking cases, meaning they did not reside in the baseline communities. Of those households tracked, only 38 percent were located near the baseline community. Overall, 32 percent of all households were not located near the baseline communities. While tracking is costly, it is an important exercise because migration and dissolution of households are often hypothesized to be important responses to hardship. Excluding these households from the sample raises obvious concerns regarding the selectivity of attrition. In particular, out-migration from the village, dissolution of households, and even marriage may be responses to adult mortality. At the same time, tracking provides a unique opportunity to study these coping mechanisms: who uses them, what their effect is, and whether they lift people out of poverty or themselves constitute a poverty trap.
Turning to recontact rates of the sample of 6,204 respondents, reinterview rates are monotonically decreasing with age, although the reasons (deceased or not located) vary by age group. The older respondents were much more likely to be located if living, which is consistent with higher migration rates among the young adults in the sample. Among the youngest respondents, over three-quarters were successfully re-interviewed. Excluding people who died, 82 percent of all respondents were re-interviewed. Without tracking, re-interview rates of surviving respondents would have fallen from 82 percent to 52 percent. Non-local migration is not trivial; restricting the tracking to nearby villages would have resulted in 63 percent recontact of survivors. Migration proved to be an important factor in determining whether someone was recontacted. Respondents who were untraced were much more likely to be residing outside Kagera (52 percent) compared to their counterparts who were re-interviewed (9 percent).
KHDS 2004 tracked international migrants for Uganda only. Although the location of those in other countries was known, they were not traced. For those respondents who were not reinterviewed, the KHDS 2004 gives some information about their interactions with the reinterviewed respondents. Survey modules on the frequency of contact with all previous household members inform on the cash, in-kind and labor interactions between former household members.
Dates of collection
Mode of data collection
The KHDS 2004 mainly consists of a household survey covering a wide range of topics. The KHDS 2004 also includes three community questionnaires to accompany the household survey (community, price, and primary school questionnaires).
The KHDS 2004 project used the original questionnaires from the KHDS 91-94 as the foundation of the survey instruments. The household questionnaire collects information on a wide range of topics, including: housing amenities, consumption, income, assets, time allocation of individuals, business activities, remittances, support from organizations, education, and health, including anthropometric measures. The community questionnaire collects data on the physical, economic and social infrastructure of the baseline communities. The primary school questionnaire collects information on the amenities at schools, composition of the student body, and assistance to schools. Finally, up to three price observations are collected in each community from local markets/stalls on a list of commonly purchased food and non-food items.
Where possible, comparability is maintained with the KHDS 91-94 survey instruments. However, the questionnaires for the KHDS 2004 were revised to reflect changes in the region since 1994. Further, the household questionnaire was redesigned in an effort to capture key transitions that have occurred since the previous survey. These revisions included:
- Inclusion of a module on the incidence of economic shocks from the last 10 years (both positive and negative) for all panel respondents.
- Inclusion of a module on migration for respondents who relocated since the KHDS 91-94.
- Inclusion of a module on informal insurance groups.
- Expansion of questions on the circumstances of deaths.
- Inclusion of information on the remittances, loans, bride price payments, social communication and labor transfers between previous members of the KHDS 91-94.
This section of the Basic Information Document reviews the 4 surveys of the KHDS 2004. For each survey, substantial differences are highlighted between the survey instrument used in the 1991-1994 rounds and in 2004.
Users are encouraged to use this as a general guide to understand the questionnaires; however, this should not substitute for looking at the actual questionnaires directly. Users are encouraged to look directly at the survey instruments for literal question wording and to identify differences between survey instruments. The household questionnaires are available in Swahili (as used in the field) and English (a translated version of the Swahili field questionnaire); the community surveys were produced only in English.
Household Questionnaire: Review of Sections
The household questionnaire is divided into numerous sections, each of which covers a fairly distinct aspect of household activities. Anthropometric measurements and the questionnaire on mortality of household members are administered in separate forms attached to the household questionnaire.
Each section of the household questionnaire has one of four types of respondents, selected according to the content of the section: the interviewer, the household head, the most knowledgeable person in the household, or individual household members. The only section for which household members are not respondents is the first section covering basic survey information (household location, GPS coordinates, interviewing language, completion status of section, etc.).
Household Questionnaire: Highlights of Substantial Differences
Many changes were made in the KHDS 2004 household questionnaire compared to the KHDS 91-94 household questionnaire. Some questions were added and some dropped. Section 4 arrangement was also revised to provide better continuity during interviews. The following are the main changes included in the 2004 questionnaire:
- Section 9 (Fertility) and Section 11C (Age of tree crops) from the 1991-1994 questionnaire were dropped in 2004.
- Section 13 (Fishing) from 1991-1994 was incorporated in Section 13 (non-farm business) in 2004.
- In 1991-1994, information in Section 7 was collected on the main job and secondary job done in the past 12 months, while in 2004 data was collected only on the main job done.
- In 1991-1994, Section 20b collected information on deaths of deceased relatives who were not household members. In 2004, the mortality questionnaire covered only deceased household members from the 1991-1994 survey.
- The following were new sections introduced in 2004: Section 10 (Shocks experienced in the past 10 years), Section 15C (Household Two-Week Expenditures), Section 15D (Inheritance and Bride Price), Section 17B (Ability to cope) and Section 18A (Interactions with Network Members).
Detailed information on the key changes, section by section, between the KHDS 91-94 household questionnaire and the KHDS 2004 household questionnaire is provided in Kathleen Beegle, Joachim De Weerdt and Stefan Dercon, March 3, 2006, "Kagera Health and Development Survey 2004 - Basic Information Document".
Community Questionnaire: Highlights of Substantial Differences
The substantial changes to the community questionnaire include:
• A new section was included on shocks experienced in the past 10 years (Section 7).
• Data was collected on population share of ethnic groups.
• GPS coordinates were taken in each community for all enumeration areas.
• Questions on access to roads, electricity and water were introduced.
• Questions on the culture of mourning were asked for three different periods: the time of the survey, 10 years prior to the interview and 20 years prior to the interview.
• Information was collected on access to vocational training and secondary education.
• Information was also collected on temporary migration and seasonal employment of community members.
See details in Kathleen Beegle, Joachim De Weerdt and Stefan Dercon, March 3, 2006, "Kagera Health and Development Survey 2004 - Basic Information Document".
Price Questionnaire: Highlights of Substantial Differences
Overall there were no substantive changes; a few items were added to the list.
See details in Kathleen Beegle, Joachim De Weerdt and Stefan Dercon, March 3, 2006, "Kagera Health and Development Survey 2004 - Basic Information Document".
Data entry was done at the main office in Bukoba, concurrent with the main fieldwork. The data entry team consisted of seven data entry operators and one data entry supervisor. Data was entered in CsPro and then transformed to Stata format. Questionnaires were entered and verified after each entry. Internal consistency checks were performed in CsPro; in addition, more elaborate checks for inconsistencies and outliers were done in Stata.
All responses obtained from individual, household, and community level interviews were recorded in questionnaires. In cases where the respondent did not know the answer, the interviewers recorded "DK" (Don't know) in the questionnaires. Data entry operators were trained to input this as nine (9), which represents missing information in the datasets. In cases where nine was an eligible code, the highest value for the number of digits was entered. For example, DKs for questions with eligible codes of up to two digits were entered as 99, and DKs for questions with eligible three-digit codes were entered as 999 (assuming 999 was not otherwise an eligible response). For the mortality questionnaire, in some cases, multiple informants were interviewed. The data were consolidated such that each baseline household has one mortality questionnaire in the data files (with, perhaps, multiple deceased therein).
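The "Don't know" coding convention can be sketched in Python as follows. This is one reading of the rule described above, not the project's actual data entry logic: the helper names `dk_code` and `recode_dk` are hypothetical, and widening the code by one digit when the all-nines value is itself an eligible response is an assumption.

```python
def dk_code(valid_codes):
    """Return the all-nines "Don't know" code for a question.

    valid_codes: collection of eligible integer response codes.
    Starts at the digit width of the largest eligible code and widens
    until the all-nines value is not itself an eligible response.
    """
    width = len(str(max(valid_codes)))
    code = int("9" * width)
    while code in valid_codes:
        width += 1
        code = int("9" * width)
    return code


def recode_dk(values, valid_codes):
    """Replace the DK code with None so it is treated as missing."""
    missing = dk_code(valid_codes)
    return [None if v == missing else v for v in values]


# Hypothetical question with eligible codes 1-12: DK is stored as 99.
cleaned = recode_dk([3, 99, 12], set(range(1, 13)))
```

Users should confirm the DK code for each variable against the questionnaire before recoding, since a value such as 9 or 99 may be a legitimate response for some questions.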
In receiving these data it is recognized that the data are supplied for use within my organization, and I agree to the following stipulations as conditions for the use of the data:
1. The data are supplied solely for the use described in this form and will not be made available to other organizations or individuals. Other organizations or individuals may request the data directly.
2. Three copies of all publications, conference papers, or other research reports based entirely or in part upon the requested data will be supplied to:
The World Bank
Development Economics Research Group
LSMS Database Administrator
1818 H Street, NW
Washington, DC 20433, USA
tel: (202) 473-9041
fax: (202) 522-1153
3. The researcher will refer to the 2004 Kagera, Tanzania Health and Development Survey as the source of the information in all publications, conference papers, and manuscripts. At the same time, the World Bank is not responsible for the estimations reported by the analyst(s).
4. Users who download the data may not pass the data to third parties.
5. The database cannot be used for commercial ends, nor can it be sold.
Kagera Health and Development Survey 2004. Ref. TZA_2004_KHDS_v01_M. The World Bank.
Disclaimer and copyrights
The user of the data acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.