Survai Aspek Kehidupan Rumah Tangga Indonesia 1993
The 1993 Indonesia Family Life Survey (IFLS) provides data at the individual and family level on fertility, health, education, migration, and employment. Extensive community and facility data accompany the household data. The survey was a collaborative effort of Lembaga Demografi of the University of Indonesia and RAND, with support from the National Institute of Child Health and Human Development, USAID, Ford Foundation, and the World Health Organization. In Indonesia, the 1993 IFLS is also referred to as SAKERTI 93 (Survai Aspek Kehidupan Rumah Tangga Indonesia). The IFLS covers a sample of 7,224 households spread across 13 provinces on the islands of Java, Sumatra, Bali, West Nusa Tenggara, Kalimantan, and Sulawesi. Together these provinces encompass approximately 83 percent of the Indonesian population and much of its heterogeneity. The survey brings an interdisciplinary perspective to four broad topic areas:
• fertility, family planning, and contraception
• infant and child health and survival
• education, migration and employment
• the social, economic, and health status of adults, young and old
Additionally, extensive community and facility data accompany the household data. Village leaders and heads of the village women's group provided information in each of the 321 enumeration areas from which households were drawn, and data were collected from 6,385 schools and health facilities serving community residents.
Household Survey data were collected for household members through direct interviews (for adults) and proxy interviews (for children, infants and temporarily absent household members). The IFLS-1 conducted detailed interviews with the following household members:
- The household head and their spouse
- Two randomly selected children of the head and spouse aged 0 to 14 (interviewed by proxy)
- An individual age 50 and above and their spouse, randomly selected from remaining members
- For a randomly selected 25 percent of the households, an individual age 15 to 49 and their spouse, randomly selected from remaining members.
The Community and Facility Survey collected data from a variety of respondents including: the village leader and his staff and the leader of the village women's group; Ministry of Health clinics and subclinics; private practices of doctors, midwives, nurses, and paramedics; community-based health posts and contraceptive distribution centers; public, private, and religious elementary schools; public, private, and religious junior high schools; public, private, and religious senior high schools. Unlike many other surveys, the sample frame for the survey of facilities was drawn from the list of facilities used by household survey respondents in the area.
Producers and sponsors
Lembaga Demografi (LD)
University of Indonesia
National Institute of Child Health and Human Development
United States Agency for International Development
World Health Organization
National Institute of Child Health and Human Development
Funding for revised IFLS1 data (IFLS1-RR) and documentation
The Household Survey Sampling Procedure
The household survey component of the 1993 IFLS was designed to collect contemporaneous and retrospective information on a wide array of family life topics for a representative sample of the Indonesian population. In IFLS1 it was determined to be too costly to interview all household members, so a sampling scheme was used to randomly select several members within a household to provide detailed individual information. IFLS1 conducted detailed interviews with the following household members:
- the household head and his/her spouse
- two randomly selected children of the head and spouse age 0 to 14
- an individual age 50 or older and his/her spouse, randomly selected from remaining members, and
- for a randomly selected 25% of the households, an individual age 15 to 49 and his/her spouse, randomly selected from remaining members.
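As a rough illustration of the selection rules listed above, the following Python sketch picks respondents from a simplified household roster. The roster layout (dicts with the keys id, age, relation, and spouse_id) and the function name select_respondents are assumptions made for this example, not the survey's actual data structures, and details such as fostered children are ignored.

```python
import random

def select_respondents(roster, rng, quarter_sample):
    """Pick respondent ids from one household roster (illustrative only).

    roster: list of dicts with hypothetical keys id, age, relation, spouse_id.
    quarter_sample: True if the household falls in the 25 percent subsample.
    """
    by_id = {m["id"]: m for m in roster}
    selected = []

    def add_with_spouse(member):
        selected.append(member["id"])
        spouse = by_id.get(member.get("spouse_id"))
        if spouse is not None and spouse["id"] not in selected:
            selected.append(spouse["id"])

    def remaining(lo, hi):
        return [m for m in roster
                if m["id"] not in selected and lo <= m["age"] <= hi]

    # 1) The household head and spouse are always interviewed.
    head = next(m for m in roster if m["relation"] == "head")
    add_with_spouse(head)

    # 2) Up to two randomly chosen children of the head/spouse, age 0 to 14.
    children = [m for m in roster if m["relation"] == "child" and m["age"] <= 14]
    selected += [c["id"] for c in rng.sample(children, min(2, len(children)))]

    # 3) One remaining member age 50 or older, plus spouse.
    older = remaining(50, 200)
    if older:
        add_with_spouse(rng.choice(older))

    # 4) In the 25 percent subsample only: one remaining member age 15 to 49, plus spouse.
    if quarter_sample:
        adults = remaining(15, 49)
        if adults:
            add_with_spouse(rng.choice(adults))
    return selected

# A made-up household used only to exercise the sketch.
roster = [
    {"id": 1, "age": 45, "relation": "head", "spouse_id": 2},
    {"id": 2, "age": 42, "relation": "spouse", "spouse_id": 1},
    {"id": 3, "age": 12, "relation": "child", "spouse_id": None},
    {"id": 4, "age": 9, "relation": "child", "spouse_id": None},
    {"id": 5, "age": 4, "relation": "child", "spouse_id": None},
    {"id": 6, "age": 70, "relation": "parent", "spouse_id": None},
    {"id": 7, "age": 20, "relation": "other", "spouse_id": None},
]
picked = select_respondents(roster, random.Random(0), quarter_sample=False)
picked_quarter = select_respondents(roster, random.Random(0), quarter_sample=True)
```

In this made-up household the 20-year-old member is interviewed only when the household belongs to the 25 percent subsample.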
The IFLS sampling scheme stratified on provinces, then randomly sampled within provinces. Provinces were selected to maximize representation of the population, capture the cultural and socioeconomic diversity of Indonesia, and be cost effective given the size and terrain of the country. The far eastern provinces of East Nusa Tenggara, East Timor, Maluku and Irian Jaya were readily excluded due to the high costs of preparing for and conducting fieldwork in these more remote provinces. Aceh, Sumatra's northernmost province, was excluded out of concern for the area's political violence and the potential risk to interviewers. Finally, due to their relatively higher survey costs, we omitted three provinces on each of the major islands of Sumatra (Riau, Jambi, and Bengkulu), Kalimantan (West, Central, East), and Sulawesi (North, Central, Southeast). The resulting sample consists of 13 of Indonesia's 27 provinces: four on Sumatra (North Sumatra, West Sumatra, South Sumatra, and Lampung), all five of the Javanese provinces (DKI Jakarta, West Java, Central Java, DI Yogyakarta, and East Java), and four provinces covering the remaining major island groups (Bali, West Nusa Tenggara, South Kalimantan, and South Sulawesi). The resulting sample represents 83 percent of the Indonesian population (see Figure 1.1 of the Overview and Field Report in External Documents). Table 2.1 of the same document shows the distribution of Indonesia's population across the 27 provinces, highlighting the 13 provinces included in the IFLS sample.
The IFLS randomly selected enumeration areas (EAs) within each of the 13 provinces. The EAs were chosen from a nationally representative sample frame used in the 1993 SUSENAS, a socioeconomic survey of about 60,000 households. The SUSENAS frame, designed by the Indonesian Central Bureau of Statistics (BPS), is based on the 1990 census. The IFLS was based on the SUSENAS sample because the BPS had recently listed and mapped each of the SUSENAS EAs (saving us time and money) and because supplementary EA-level information from the resulting 1993 SUSENAS sample could be matched to the IFLS-1 sample areas. Table 2.1 summarizes the distribution of the approximately 9,000 SUSENAS EAs included in the 13 provinces covered by the IFLS. The SUSENAS EAs each contain some 200 to 300 households, although only a smaller area of about 60 to 70 households was listed by the BPS for purposes of the annual survey. Using the SUSENAS frame, the IFLS randomly selected 321 enumeration areas in the 13 provinces, over-sampling urban EAs and EAs in smaller provinces to facilitate urban-rural and Javanese-non-Javanese comparisons. A straight proportional sample would likely be dominated by Javanese, who comprise more than 50 percent of the population. A total of 7,730 households were sampled to obtain a final sample size goal of 7,000 completed households. Table 2.1 shows the sampling rates that applied to each province and the resulting distribution of EAs in total, and separately by urban and rural status. Within a selected EA, households were randomly selected by field teams based upon the 1993 SUSENAS listings obtained from regional offices of the BPS. A household was defined as a group of people whose members reside in the same dwelling and share food from the same cooking pot (the standard BPS definition). Twenty households were selected from each urban EA, while thirty households were selected from each rural EA. This strategy minimizes expensive travel between rural EAs and reduces intra-cluster correlation across urban households, which tend to be more similar to one another than do rural households. Table 2.2 (Overview and Field Report) shows the resulting sample of IFLS households by province, separately by completion status.
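The within-EA household draw can be sketched as follows. This is a minimal illustration that assumes a simple random sample without replacement from the EA listing; the text states only that households were randomly selected, and the listing contents and the function name sample_households are made up for the example.

```python
import random

# Household targets per EA, as stated in the text.
HOUSEHOLDS_PER_EA = {"urban": 20, "rural": 30}

def sample_households(listing, ea_type, rng):
    """Draw a simple random sample of listed households for one EA."""
    target = HOUSEHOLDS_PER_EA[ea_type]
    return rng.sample(listing, min(target, len(listing)))

# A made-up listing of 65 households, in the 60 to 70 range mentioned above.
listing = [f"HH{n:03d}" for n in range(1, 66)]
urban_sample = sample_households(listing, "urban", random.Random(1))
rural_sample = sample_households(listing, "rural", random.Random(1))
```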
Selection of Respondents within Households
For each household selected, a representative member provided household-level demographic and economic information. In addition, several household members were randomly selected and asked to provide detailed individual information.
The Community Survey Sampling Procedure
The goal of the CFS was to collect information about the communities of respondents to the household questionnaire. The information was solicited in two ways. First, the village leader of each community was interviewed about a variety of aspects of village life (the content of this questionnaire is described in the next section). Information from the village leader was supplemented by interviewing the head of the village women's group, who was asked questions regarding the availability of health facilities and schools in the area, as well as more general questions about family health in the community. In addition to the information on community characteristics provided by the two representatives of the village leadership, we visited a sample of schools and health facilities, in which we conducted detailed interviews regarding the institution's activities.
A priori we wanted data on the major sources of outpatient health care, public and private, and on elementary, junior secondary, and senior secondary schools. We defined eight strata of facilities/institutions from which we wanted data. Different types of health providers make up five of the strata, while schools account for the other three. The five strata of health care providers are: government health centers and subcenters (puskesmas, puskesmas pembantu); private doctors and clinics (praktek umum/klinik); the private practices of midwives, nurses, and paramedics (perawats, bidans, paramedis, mantri); traditional practitioners (dukun, sinshe, tabib, orang pintar); and community health posts (posyandu, PPKBD). The three strata of schools are elementary, junior secondary, and senior secondary. Private, public, religious, vocational, and general schools are all eligible as long as they provide schooling at one of the three levels.
Our protocol for selecting specific schools and health facilities for detailed interview reflects our desire that selected facilities represent the facilities available to members of the communities from which household survey respondents were drawn. For that reason we were hesitant to select facilities based solely either on information from the village leader or on proximity to the village center. The option we selected instead was to sample schools and health care providers from lists provided by respondents to the household survey.
For each enumeration area, lists of facilities in each of the eight strata were constructed by compiling information provided by the household regarding the names and locations of facilities the household respondent either knew about or used. To generate lists of relevant health and family planning facilities, the CFS drew on two pieces of information from the household survey. The IFLS queried wives of household heads as to whether they, a family member, a friend, or someone else they knew had ever used a particular health facility, such as a health center (section PP of Book I, excerpted in Appendix B). When women responded positively, they were asked to provide the name and location of a facility of that type. When women responded negatively, they were asked if they knew of any facilities of that type, and if so, were asked about the name and location of the facility. These responses provided one source of information regarding health facilities of relevance to community members. Information was collected for four types of facilities/providers: government health centers and subcenters; private clinics and doctors' practices; the practices of nurses, midwives, and paramedics; and traditional practitioners.
In Indonesia, health facilities are also a source of contraceptives. Ever-married women between the ages of 15 and 49 were asked whether they knew about various methods of contraception (Section CX, Book IV, excerpted in Appendix B). When women knew of a method, they were asked to identify the specific facility from which they could obtain that method. For three methods (oral contraceptives, IUDs, and injectables), the name and location of the facility that the woman mentioned was added to the list of health providers if it fell into one of the five strata to be visited by the CFS team. The information from the "knowledge of contraceptive methods" section is the only source of information about the names and locations of community health posts.
The two sources of household information about health facilities are not tied solely to use of those facilities/providers by household members. Though it is possible (and probable) that someone in the household has used the facility that is mentioned, any facility known to the respondent may be mentioned. An alternative procedure would be to base the list on facilities the respondent (or another household member) has actually used in the recent past. We rejected this approach because we felt it would result in a more limited picture of community health care options (since use of health care is sporadic), and possibly be biased by factors such as what illnesses were common around the time of the interview.
The lists of schools were obtained in a slightly different manner. The respondent to the household roster (Section AR, Book I, excerpted in Appendix B) provided the name and location of all schools currently attended by household members under 25 years of age. Consequently, the lists of schools compiled from household information are all schools attended by at least one member of at least one IFLS household. For each enumeration area, eight lists of facilities (one per stratum) were constructed based on the combined household responses from that EA. Tables 3.1 and 3.2 (Overview and Field Report) provide the cumulative distributions of the numbers of facilities (by stratum) identified within EAs. For example, the combined number of health centers identified was less than six in 80 percent of the 132 rural EAs in which we interviewed. The combined number of health centers identified was less than six in 68 percent of the 189 urban EAs in which we interviewed. Thus, on average, the combined household responses in urban EAs generate a longer list of health centers than do the combined responses in rural EAs. On average, the lists are longer in urban areas than in rural areas for doctors/clinics and all levels of schools as well. However, on average, the lists are longer in rural areas than in urban areas for nurses/midwives and for traditional practitioners.
Of the 7,730 households sampled, a complete interview was obtained for 7,039 households or 91.1 percent of households. A partial interview (i.e., roster-level information was obtained but only a subset of selected household members were interviewed) was obtained for another 185 households (2.4 percent of households), while 506 sampled households (6.5 percent) were not interviewed. The completion rate ranged from a low of 87 percent to a high of 97 percent across the thirteen provinces. The final sample of 7,224 partially or fully completed households consists of 3,436 households in urban areas (90.7 percent partial/full completion rate), and 3,788 households in rural areas (95.9 percent partial/full completion rate).
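The overall completion figures quoted above can be checked with a few lines of arithmetic. The snippet below uses only the counts given in the text; the variable names are chosen for this illustration.

```python
sampled = 7730
complete = 7039
partial = 185
not_interviewed = 506

# The three outcomes account for every sampled household.
assert complete + partial + not_interviewed == sampled

def pct(n):
    """Percentage of sampled households, rounded to one decimal."""
    return round(100 * n / sampled, 1)

rates = {
    "complete": pct(complete),
    "partial": pct(partial),
    "not_interviewed": pct(not_interviewed),
}
final_sample = complete + partial  # households with a full or partial interview
```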
Community and Facility Survey:
Not all identified facilities are eligible for interview. Facilities were excluded if they had been interviewed in connection with a previous EA, if they were more than a 45-minute motorcycle trip away, or if they were located in another province. The facilities on each list were ranked by frequency of mention. These ranked lists provided frames for each stratum from which a sample of two to four facilities was drawn. In all strata, the most frequently mentioned facility was always visited. Additional facilities were randomly selected to fill the quota for that stratum. In each EA, the interview target for health centers and subcenters was four. The target was three for nurse/midwife/paramedic's practices, community health posts, elementary schools, and junior secondary schools. The target was two for senior secondary schools, traditional practitioners, and doctors' practices/clinics. In some enumeration areas the pooled household responses did not generate a sufficient number of facilities to fill the quota. In these cases information from the village leader was used to supplement the sample. The average number of facilities (by strata) interviewed per EA is presented in Table 3.3. Numbers of facilities (by strata) interviewed in each province are presented in Tables 3.4 and 3.5 (Overview and Field Report).
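The facility draw described above can be sketched in Python. The stratum labels and the function name draw_facilities are hypothetical, and the sketch assumes that the additional facilities are drawn with equal probability; the text says only that they were randomly selected, and the weighting discussion suggests the actual selection probabilities were tied to frequency of mention. Supplementing short lists with village leader information is not modeled.

```python
import random
from collections import Counter

# Interview targets per EA, as stated in the text (stratum keys are hypothetical labels).
TARGETS = {
    "health_center": 4,
    "nurse_midwife_paramedic": 3,
    "community_health_post": 3,
    "elementary_school": 3,
    "junior_secondary_school": 3,
    "senior_secondary_school": 2,
    "traditional_practitioner": 2,
    "doctor_clinic": 2,
}

def draw_facilities(mentions, stratum, rng):
    """Select facilities for one stratum in one EA.

    mentions: facility names, one entry per household mention,
    already filtered for eligibility.
    """
    # Rank by frequency of mention (ties keep first-mentioned order).
    ranked = [name for name, _ in Counter(mentions).most_common()]
    if not ranked:
        return []
    quota = TARGETS[stratum]
    top, rest = ranked[0], ranked[1:]
    # The most-mentioned facility is always taken; the rest fill the quota at random.
    extra = rng.sample(rest, min(quota - 1, len(rest)))
    return [top] + extra

# Made-up mentions for one stratum.
mentions = ["A"] * 5 + ["B"] * 3 + ["C"] * 2 + ["D", "E"]
chosen = draw_facilities(mentions, "elementary_school", random.Random(0))
short_list = draw_facilities(["X"], "health_center", random.Random(0))
```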
Household Survey Weighting
The IFLS Household Survey was designed to support a range of analyses based on a smaller, but richly detailed micro-level database covering a wide array of demographic, economic, and health outcomes. The survey was not envisioned as a database to produce national-level or provincial-level estimates of demographic or economic variables. (Other Indonesian surveys such as the SUSENAS are better suited for this purpose.) The public use file does include a series of household and individual analytic weights so that analysts can adjust, when appropriate, for the IFLS household and within-household sampling procedures. The weights are discussed further in The 1993 Indonesian Family Life Survey: Appendix C, Household Codebook (DRU-1195/4-NICHD/AID).
The household weights are designed to correct for the over-sampling of urban EAs and EAs in smaller provinces discussed above and summarized in Table 2.1 (Overview and Field Report), as well as the differential sampling rates in urban and rural EAs. When the household weights are applied to the IFLS household sample, the resulting weighted distribution will reflect the 1993 distribution of households by urban and rural status within each of the 13 Indonesian provinces covered by the IFLS. The 1993 distribution of households by province and urban/rural status was generated from 1993 projected population counts provided by BPS and from average household sizes computed from the 1993 SUSENAS. BPS projected population counts were divided by average household sizes to get an estimate of the number of households in 1993 in each province/urban–rural strata.
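One way to picture the household weight is as the ratio of a stratum's share of estimated households to its share of the sample. The sketch below assumes that form, which the text does not spell out; the exact construction is documented in the Household Codebook. All figures and names are made up for illustration.

```python
def household_weights(strata):
    """strata: {name: (projected_population, avg_household_size, sampled_households)}"""
    # Estimated households = projected population / average household size.
    est_households = {k: pop / size for k, (pop, size, _) in strata.items()}
    total_est = sum(est_households.values())
    total_sampled = sum(n for _, _, n in strata.values())
    weights = {}
    for k, (_, _, n) in strata.items():
        pop_share = est_households[k] / total_est
        sample_share = n / total_sampled
        # Assumed form: population share over sample share.
        weights[k] = pop_share / sample_share
    return weights

# Made-up figures for two strata, for illustration only.
example = {
    "province_A_urban": (2_000_000, 4.0, 400),
    "province_A_rural": (6_000_000, 5.0, 600),
}
w = household_weights(example)
weighted_urban_share = w["province_A_urban"] * 400 / 1000
```

Under this assumed form, the over-sampled urban stratum receives a weight below one and the weighted sample reproduces the estimated household distribution.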
The public use file contains three types of individual weights: respondent weights, roster weights, and anthropometry weights.
Respondent weights. The respondent weights are designed to adjust for the within household sampling scheme used to select respondents for detailed interview. From the household roster, the number of household members eligible to be a Book III, IV or V respondent within each household was determined based on the intra-household sampling rules discussed above. Sampling probabilities were then computed for individuals in each of four sampling groups:
1) household heads and their spouse;
2) among remaining members, individuals age 50 or over and their spouse;
3) among remaining members, individuals age 15-49 and their spouse;
4) children of household head/spouse age 0-14 (includes fostered children).
Individuals in the third group were eligible for interview in one out of every four households, so individuals in that group had only a 25 percent probability of selection in addition to their probability of selection within that group. Furthermore, a household could have a maximum of four Book III respondents (see the earlier discussion of the within household sampling rules). Because only 13 households had more than 4 selected respondents, no additional adjustment was made to the weights for these cases. The computed sampling probability for the individual respondent was then inverted to create a respondent weight for that person. Only eligible respondents of Books III, IV or V were given a respondent weight; respondents for those books who were incorrectly chosen by interviewers were given a respondent weight of zero. Examples of such “ineligible” respondents are children age 0-14 who are not biological or adopted children of the household head and spouse but who have a parent in the household, and individuals in the third group who were interviewed even though the household was not in the 25 percent of the sample where such respondents were eligible for interview. The respondent weight (i.e., the inverted sampling probability) was then normalized within each of the sampling groups above. By construction, this normalized weight sums to the number of eligible respondents within the respondent’s sampling group across the 7,224 households where a Book I was completed. Finally, the normalized respondent weight was capped at a value of 3 (99 percent had a weight of 3 or less) to adjust for outliers: individuals with tiny probabilities of selection and thus given very large weights could distort weighted tabulations.
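The invert, normalize, and cap steps can be sketched as below. The function name and the example probabilities are made up, and the sketch handles one sampling group at a time; how the individual selection probabilities were computed in practice is described in the Household Codebook.

```python
def respondent_weights(probabilities, cap=3.0):
    """probabilities: selection probabilities of the eligible respondents
    in ONE sampling group (illustrative values only)."""
    # Invert each sampling probability.
    raw = [1.0 / p for p in probabilities]
    # Normalize so the weights sum to the number of eligible respondents.
    scale = len(raw) / sum(raw)
    normalized = [w * scale for w in raw]
    # Cap large weights at 3.
    return [min(w, cap) for w in normalized]

# Four made-up respondents; the last one has a very small selection probability.
weights = respondent_weights([1.0, 1.0, 0.5, 0.05])
```

Note that capping is applied after normalization, as in the text, so the capped weights no longer sum exactly to the group size.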
Roster weights. The roster weights are designed so that the weighted age and sex distribution of individuals in the household roster data will reflect the 1993 population age and sex distribution by urban and rural strata within the 13 provinces covered by the survey. Five-year age groupings were used, where individuals age 75 and older were treated as one group. The population distribution was based on data from the 1993 SUSENAS. The roster weight is the ratio of the 1993 SUSENAS population proportion to the household roster proportion for the given province/urban-rural/sex/age group strata into which the individual falls. A roster weight was calculated for all household members listed in the roster (Book I, section AR). If the individual’s age was missing, an age group for the individual was imputed. The imputation involved examining the age of the individual’s spouse and children; if the individual was a Book III, IV or V respondent, dates and ages provided in those sections were used as part of the imputation.
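The ratio described above can be illustrated with a short sketch. The cell labels, the example proportions, and the function names are hypothetical, and the sketch assumes both proportions are computed over the same set of cells.

```python
from collections import Counter

def age_group(age):
    """Lower bound of the five-year age group; ages 75 and older form one group."""
    return min(age // 5 * 5, 75)

def roster_weights(roster_cells, population_proportions):
    """roster_cells: one cell label per roster member, where a cell is
    (province, urban/rural, sex, age group).
    population_proportions: population share of each cell (made-up here)."""
    counts = Counter(roster_cells)
    n = len(roster_cells)
    # Weight = population proportion / roster proportion for the member's cell.
    return {cell: population_proportions[cell] / (counts[cell] / n) for cell in counts}

# Made-up roster: women are under-represented relative to the population.
f_cell = ("province_A", "urban", "F", age_group(23))
m_cell = ("province_A", "urban", "M", age_group(23))
cells = [f_cell] * 30 + [m_cell] * 70
rw = roster_weights(cells, {f_cell: 0.5, m_cell: 0.5})
```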
Anthropometry weights. The anthropometry weights are designed to account for the intra-household sampling scheme used to select the respondents who were weighed and measured. All respondents of Books III, IV or V and any additional children under age 6 living in the household were eligible for anthropometric measurement. Respondents of Books III, IV and V who were measured were given an anthropometry weight equal to their respondent weight (unnormed and uncapped); other children under age 6 were given the household weight (based on the 7,224 household sample). Household members who were measured but not eligible (i.e., they did not fit the selection criteria) were given an anthropometry weight of zero. The initial anthropometry weight was then normalized to sum to the number of those across all households who were eligible to be measured, to account for the fact that not all household members eligible for anthropometric measurement were actually measured. Finally, as with the respondent weight, the anthropometry weight was capped at 3 to control for those with very small probabilities of selection.
Community and Facility Survey Weighting
The CFS was designed to provide extensive community and facility information to complement the household data. The CFS was not designed to produce nationally representative estimates of community and facility distributions or characteristics. The weights are included so that users can adjust for sampling procedures in their analyses. The CFS database has two basic sets of weights: community weights and facility weights.
The community weights are designed to correct for the over-sampling of urban EAs and EAs in smaller provinces. When weighted, the CFS communities reflect the number of EAs in the province/urban-rural strata in which the community lies. The total number of EAs in a given province and urban-rural strata was computed using 1993 SUSENAS sampling frame data from BPS. The community weight variable is the ratio of the number of actual EAs to the number of sampled EAs.
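As a small worked example of this ratio, with made-up EA counts and hypothetical stratum labels:

```python
def community_weight(frame_eas, sampled_eas):
    """Ratio of EAs in the frame to EAs sampled, for one province/urban-rural stratum."""
    return frame_eas / sampled_eas

# Made-up counts for two strata.
frame = {("province_A", "urban"): 300, ("province_A", "rural"): 900}
sampled = {("province_A", "urban"): 12, ("province_A", "rural"): 9}
cw = {s: community_weight(frame[s], sampled[s]) for s in frame}
weighted_total = sum(cw[s] * sampled[s] for s in frame)
```

Summing the weights over the sampled communities in a stratum recovers the number of EAs in that stratum of the frame.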
Ideally a facility should receive a weight that is equal to that facility's sampling probability, where the sampling probability is a function of the sampling scheme and the sampling frame.
As discussed in the Sample Design and Response Rates section, the sampling frame for the facility survey is generated by household responses to questions about relevant facilities. This frame is incomplete to the extent that the sample of household respondents fails to identify all facilities of relevance to the population of the EA. The sampling scheme specifies that the probability of being sampled is proportional to market share. The construction of weights based on sampling probabilities is complicated by the fact that we do not know each facility's true market share. Instead, we know the market share that a particular facility captures among the sample of household respondents in the EA. We use a model of market shares to simulate observed market shares, assuming a fixed number of household respondents and multinomial sampling. Comparison of the simulated outcomes to the observed outcomes yields an estimate of the true number of facilities in each EA. The estimated number of facilities in each EA specifies the estimated market share and thus the rank for each facility in the EA.
The next step is to determine the place of each observed facility in the estimated distribution of all facilities and their associated market shares. We do not know the true market share (or even the rank) of an observed facility among all facilities. Instead, we observe a facility's rank (as determined by the number of respondents mentioning that facility) among those facilities identified by our sample of EA residents. This observed rank may or may not be the true rank. For example, the most frequently mentioned facility among sampled EA residents might be only the second or third most frequently mentioned facility if one were to interview all EA residents.
Although the observed rank does not necessarily equal the true rank, it provides information about the true rank. Using the observed rank we make a probabilistic determination of each facility's true rank. We then determine its sampling probability using this model. Our final weight can be summarized as an estimate of the probability that we would sample an observed facility if we conducted another survey using the same sample design.
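The gap between observed and true rank can be illustrated with a small simulation. This is not the model used to construct the IFLS facility weights, whose details are not given here; it is a minimal sketch that assumes known true market shares and multinomial sampling of a fixed number of respondents, with made-up shares and a hypothetical function name.

```python
import random
from collections import Counter

def prob_top_is_true_top(true_shares, n_respondents, n_sims, rng):
    """Share of simulated samples in which the most-mentioned facility
    is also the facility with the largest true market share."""
    facilities = list(range(len(true_shares)))
    true_top = max(facilities, key=lambda f: true_shares[f])
    hits = 0
    for _ in range(n_sims):
        # Multinomial sampling: each respondent names one facility.
        draws = rng.choices(facilities, weights=true_shares, k=n_respondents)
        # Ties go to whichever tied facility was mentioned first.
        observed_top = Counter(draws).most_common(1)[0][0]
        hits += observed_top == true_top
    return hits / n_sims

# Three made-up facilities with close market shares and 20 respondents.
p = prob_top_is_true_top([0.4, 0.35, 0.25], n_respondents=20,
                         n_sims=2000, rng=random.Random(42))
```

With close market shares and few respondents, the observed leader often differs from the true leader, which is why the observed rank is treated as informative about the true rank rather than equal to it.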
Dates of collection
Mode of data collection
Data collection supervision
Dr. Sulistinah Achmad was the LD survey director based in Jakarta. She collaborated with a group of seven senior supervisors (Korlaps) who were all staff at LD, and a group of six junior supervisors (Jaksups) who were employed during the survey period. The Jaksups were all recent college graduates, and were teamed with a more senior Korlap who supervised and guided their work. The Korlaps and Jaksups each had overall responsibility for a given province and served as the group of LD field coordinators. Dr. I.G.N. Agung and Dr. Sri Harijati Hatmadji were the LD directors of the CFS. They collaborated with RAND and LD staff in designing and implementing the fieldwork plan. Three CFS Korlaps provided supervision for the CFS teams in the field. The field work in each province was carried out by one to three interviewing teams, depending on the size of the sample; a total of 21 teams covered the 13 provinces. Each household survey team consisted of one team supervisor, six to eight interviewers, one editor and one anthropometrist. CFS teams were composed of one supervisor and three interviewers. This ratio of team supervisors to interviewers allowed proper supervision to ensure the quality of the data collected. Teams differed with respect to their ethnic mix and language skills so that they would closely match the language requirements of the region to which they were assigned.
The team supervisor was responsible for contacting the village leader to make preparations for the arrival of the team and to establish a team ‘base camp’. The HH team supervisor handled the EA sample materials, assigned the work of each interviewer, and was responsible for all record keeping (e.g., production log). The CF team supervisor was responsible for drawing the sample of facilities to be interviewed, record keeping, interviewing the village leader, arranging transportation to facilities (typically motorbikes were rented) and assigning interviewers to specific facilities. HH and CF supervisors reported to Jakarta every week by telephone or fax and were responsible for shipping the hard copy questionnaires to Jakarta. The supervisor was also responsible for ensuring the high quality of the data collection. In this capacity, he conducted regular observations of his interviewers and verifications. He also performed troubleshooting and retrained individual interviewers as needed.
A team of RAND researchers representing a variety of disciplines (e.g., economists, demographers, sociologists, health experts, and survey methodologists), in conjunction with LD research staff, spent nearly 18 months developing the detailed data collection instruments for the household and community-facility components of the IFLS. Other members of the U.S. and Indonesian research community were consulted through a workshop held at RAND in March 1992, and an informal session in Denver at the 1992 annual meetings of the Population Association of America.
The length and complexity of the IFLS household (HH) and community-facility (CF) questionnaires required a wide array of development techniques in Indonesia to refine the instrument. Specifically, small pilot surveys and focus groups were used for initial questionnaire development, while larger pretests were employed for refinement of questionnaires and field procedures. Where appropriate, existing survey instruments were used as the basis for the first versions of the instrument. Sources included the Malaysian Family Life Surveys (MFLS-1 and -2), for all sections; the Indonesian Resource Mobilization Study for health status, provider utilization, and time allocation; and the Demographic and Health Surveys for fertility and contraception questions. However, questions adapted from English questionnaires often required significant alteration to make them culturally appropriate. Facility questionnaires were presented to officials at the Ministry of Health and the Ministry of Education. Suggestions received during these briefings were incorporated into revised versions of the questionnaires.
During the 18-month development period, a series of small-scale pilot tests and two full scale pretests were conducted as part of the household questionnaire development process. The first pretest site was in Sukabumi, an area in West Java, while the second took place in the province of Lampung. Each pretest sampled 20 urban households and 30 rural households for interview. The first pretest focused on the questionnaire instrument, while the second also tested the training and field procedures. The first pretest was conducted by LD staff who served as interviewers. This approach provided optimal feedback since these interviewers were intimately familiar with the study objectives and questionnaire content. For the second pretest, a separate field staff was hired and trained, with the LD staff serving as trainers. RAND and LD staff were onsite during the training, fieldwork and debriefing sessions of both pretests. The CFS questionnaires and field procedures were pretested in several sites in Jakarta and West Java.
The structure of the Household Survey questionnaire is summarized in Table 2.6 of the Overview and Field Report, and the content is summarized under the Survey Instruments section of the same report. The questionnaire subsections are also listed below:
Book K: Control Book
-Module SC: Sampling and enumeration record
-Module IK: Recontact information
-Module PS: Within-household sample selection
-Module FP: Questionnaire tracking form
Book I: Household Roster and Characteristics
-Module AR: Household member roster
-Module KR: Household characteristics
-Module KS: Consumption
-Module PP: Outpatient care provider knowledge
Book II: Household Economy
-Module UT: Farm business
-Module NT: Nonfarm business
-Module PH: Labor and nonlabor income
-Module HR: Household assets
-Module GE: Household economic shocks
-Module AK: Health insurance
Book III: Adult Information
-Module DL: Education history
-Module TK: Employment history
-Module AW: Time allocation
-Module KW: Marital history
-Module BR: Pregnancy summary (women 50+)
-Module MG: Migration history
-Module SR: Circular migration history
-Module KM: Tobacco smoking
-Module KK: Health condition
-Module MA: Acute morbidity
-Module PS: Self-treatment
-Module RJ: Outpatient utilization
-Module RN: Inpatient utilization
-Module BA: Noncoresident family roster and transfers
-Module TF: Other transfers
-Module HI: Individual assets & nonlabor income
Book IV: EMW Information
-Module KW: Marital history
-Module BR: Pregnancy summary
-Module CH: Pregnancy and infant feeding history
-Module CX: Contraceptive knowledge and use
-Module KL: Contraceptive calendar
Book V: Child Information
-Module DLA: Child education history
-Module MAA: Child acute morbidity
-Module PSA: Child self-treatment
-Module RJA: Child outpatient utilization
-Module RNA: Child inpatient utilization
Book CA: Anthropometric Record
-Module CA: Anthropometric Measurements
Community and Facility Survey
Three books constitute the community questionnaire in IFLS-1, while another set of instruments constitutes the facility questionnaire. The questionnaire subsections are summarized in Table 3.6 and listed below:
Book 1: Village Heads
- Module LK: Basic information
- Module A: Transportation
- Module B: Electricity
- Module C: Water sources and sanitation
- Module D: Agriculture and industry
- Module E: History and climate
- Module F: Migration
- Module G: Credit institutions
- Module I: History of schools
- Module J: History of health services availability
- Module K: Respondents’ identities
Book 2: Village Records
- Module LK: Basic information
- Module S: Statistics
- Module OL: Direct observation
Book PKK: Women's Group
- Module LK: Basic information
- Module H: Food prices
- Module I: History of schools
- Module J: History of health services availability
Health facility books:
Book PUSK: Government Health Centers
Book DR: Private doctors and clinics
Book BIDAN: Nurses, midwives, and paramedics
Book PPKB: Community health and FP post
Book TRAD: Traditional healers
Modules in the health facility books:
- Module LK: Basic information
- Module A: Head of facility
- Module B: Development of facility
- Module C: Service availability
- Module D: Staff
- Module E: Equipment and supplies
- Module F: Direct observation
- Module G: Family planning services
- Module H: Family planning vignette
- Module I: Preg exam vignette
- Module J: Cough, fever vignette
- Module K: Vomit, diarrhea vignette
School books:
Book SD: Primary
Book SMP: Junior Secondary
Book SMA: Senior Secondary
Modules in the school books:
- Module LK: Basic information
- Module A: Principal
- Module B: School characteristics
- Module C: Teachers
- Module D: Classrooms
- Module E: Test scores, revenues
RAND Family Life Surveys
University of Indonesia
All data entry was conducted centrally in Jakarta by a staff of data entry personnel. Data entry supervisors were members of LD’s permanent staff, while keypunchers were recruited from local universities for the data entry period. Data entry personnel were trained in data entry techniques and in the use of ISSA, a computer-assisted data entry program that allowed immediate checks on data consistency and logic. Once an enumeration area was completed, the questionnaires were packed and shipped to Jakarta with a packing sheet identifying the enclosed questionnaires by number. Questionnaires were then assigned for data entry in batches by enumeration area. Data were entered using ISSA with 100 percent verification (i.e., double entered). Batch editing programs were used in Indonesia to further check the data for completeness and consistency.
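As an illustration of what 100 percent verification by double entry involves, the following minimal Python sketch compares two independent keying passes and lists every field on which they disagree. It is not the ISSA procedure itself; the function name `compare_entries`, the record identifiers, and the field names are all hypothetical.

```python
def compare_entries(first_pass, second_pass):
    """Return (record_id, field, first_value, second_value) for every mismatch.

    Both arguments map a record identifier to a dict of field values,
    one dict per keying pass. All names here are illustrative only.
    """
    mismatches = []
    for record_id in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(record_id, {})
        b = second_pass.get(record_id, {})
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                mismatches.append((record_id, field, a.get(field), b.get(field)))
    return mismatches


# Hypothetical keyed records: the second record differs on "sex".
first = {"0010100": {"age": 34, "sex": 1}, "0010200": {"age": 7, "sex": 3}}
second = {"0010100": {"age": 34, "sex": 1}, "0010200": {"age": 7, "sex": 2}}
mismatches = compare_entries(first, second)
```

Each mismatch would then be resolved against the paper questionnaire before the record is accepted.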
The data were transcribed from the recording forms into the PC-based data entry system ISSA (Integrated System for Survey Analysis) by staff at Lembaga Demografi (LD) of the University of Indonesia. The data entry program was developed by Nick Murray of RAND with the assistance of LD staff. All data were 100 percent verified at data entry (i.e., double entered), and the data entry program contained checks on valid ranges and skip patterns. Upon receipt of the IFLS data at RAND, the ISSA ASCII files were converted into SAS® files for use in the data cleaning process and the preparation of a public use file version of the data. Because of the double entry and the data entry program checks, data entry errors were essentially nil. The remaining data errors stem from interviewer error and respondent error. Based on problems uncovered so far, the interviewer/respondent error rate appears to be about 1-2 percent. For a file with, for example, 20,000 records, a 1-2 percent error rate suggests 200-400 records with potential problems. In more complicated sections of the questionnaire, this rate may be somewhat higher.
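The arithmetic behind the 20,000-record example can be sketched in a few lines of Python. The function name `expected_problem_records` is hypothetical, and the 1 and 2 percent bounds are simply the rough rates quoted above, not measured constants.

```python
def expected_problem_records(n_records, low_rate=0.01, high_rate=0.02):
    """Return the (low, high) number of records expected to have problems."""
    return round(n_records * low_rate), round(n_records * high_rate)


# The example from the text: a file with 20,000 records.
low, high = expected_problem_records(20_000)
```

The same calculation gives users a rough sense of how many records may need attention in any file, given its record count.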
Given the size and complexity of the IFLS-HH and IFLS-CF databases and the available project resources, the preparation of the public use files required a data cleaning strategy that would meet basic user needs and make the data available to the research community in a reasonable time frame. Because all data were 100 percent verified at data entry, the basic approach was to concentrate on those data cleaning activities that required access to privacy-protected information. Such cleaning activities could only be done at RAND. Priority was given to cleaning identifier variables: respondent identifiers, anthropometry roster identifiers, household members mentioned elsewhere in the IFLS-HH besides the household roster, the non-coresident sibling and children rosters, and facility identifiers. Within the IFLS-HH, efforts also focused on cleaning the household roster data so that it could serve as the main source of basic demographic information on household members.
Users could then take information from the household roster and use it throughout to provide consistency in characteristics. Additional data checking efforts reflect those sections of interest to projects within the P01 grant that included original IFLS funding and those of interest to the report prepared for AID, a sponsor of the survey. Those areas included anthropometric data, income data, outpatient and inpatient utilization, education status and expenses, pregnancy histories and infant feeding, and interhousehold transfers. Efforts also focused on providing as much translated material as possible. Users should be aware that similar information was sometimes collected in more than one section and sometimes from different individuals. One data preparation activity that could not be carried out in much detail before public release was the examination of inconsistencies in responses by household members to the same item or event, or by a given respondent to the same event asked about in more than one place. In general, the public use files do not include efforts to reconcile possible differences.
Cleaning efforts focused on data problems caused by interviewer error. Discrepancies or oddities due to respondent confusion or to different respondents referring to the same event were generally not addressed during data cleaning. After public release, subsequent data cleaning efforts sponsored by RAND projects will continue, and results of those efforts will be made available to the IFLS user community.
Following a policy of not “over-cleaning” data, only those changes for which we had solid information on the correct value were incorporated into the IFLS-HH and IFLS-CF data. Numerous other suggested changes are available in a set of “fixes” files, which contain SAS® programming statements that replace variable values we believe are in error with our best guess at the correct response. Many of these suggested fixes came from preliminary analyses of selected sections of the IFLS database. The IFLS-HH and IFLS-CF codebooks identify those variables which have suggested fixes available. Users are welcome to incorporate these corrections in their data if they so choose. These “fixes” files are listed in the respective IFLS codebook introductory sections.
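The actual “fixes” files are SAS® programs, and their contents are not reproduced here. Purely as an illustration of the opt-in idea, the Python sketch below applies a list of suggested corrections to a copy of the data, and only where the stored value still matches the value believed to be in error. The function `apply_fixes`, the record keys, and the variable names are hypothetical.

```python
import copy


def apply_fixes(records, fixes):
    """Apply suggested fixes to a copy of `records`; return (patched, applied).

    A fix is applied only if the record exists and its current value equals
    the suspect value, so the original data are never changed silently.
    """
    patched = copy.deepcopy(records)
    applied = []
    for fix in fixes:
        row = patched.get(fix["record_id"])
        if row is not None and row.get(fix["variable"]) == fix["suspect"]:
            row[fix["variable"]] = fix["suggested"]
            applied.append(fix)
    return patched, applied


# Hypothetical data and fixes; the second fix targets a record that is absent.
records = {"A1": {"age": 340}, "A2": {"age": 52}}
fixes = [
    {"record_id": "A1", "variable": "age", "suspect": 340, "suggested": 34},
    {"record_id": "A3", "variable": "age", "suspect": 99, "suggested": 9},
]
patched, applied = apply_fixes(records, fixes)
```

Keeping the unpatched records alongside the patched copy lets an analyst compare results with and without the suggested corrections.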
Observations on Data Quality
Data problems that were uncovered in the survey are discussed in detail in the IFLS1 User's Guide under "Observations on Data Quality", pages 8-29. Many of the problems discussed have been corrected in the IFLS public use database and are so noted in the User's Guide data quality section. In some cases, however, only suggested corrections are provided via the “fixes” files described above, and are noted accordingly. In other cases, decisions on how to handle a particular problem belong in the hands of the research analyst; in such cases, we alert users to the type of problems we have uncovered but do not provide suggested fixes. The discussion in the User's Guide may help users understand remaining interviewer and respondent errors not detected before public release. The User's Guide is provided as an external resource.
Our experiences with the public release of other survey data such as MFLS-1 and MFLS-2 have led us to develop a policy of cleaning, but not ‘overcleaning’, public use data. In addition, since most researchers will want to construct their own analysis files, merging and selecting from the data in several ways, the public use files are designed to give users the flexibility they need to put together different types of analysis files. Upon completion of data entry, the keypunched data were shipped to RAND in Santa Monica for data cleaning and public use file preparation. Since all data were 100 percent verified at data entry and the data entry program contained checks on valid ranges and skip patterns, data entry errors were essentially nonexistent. Consequently, data cleaning efforts initially focused on those activities which required access to information that was privacy protected, such as individual and facility identifiers. In addition, the principal survey materials such as questionnaires and interviewer manuals were translated from Bahasa Indonesia to English. After the initial public release of the IFLS data, subsequent data cleaning efforts sponsored by RAND projects will continue, and results of those efforts will be made available to the IFLS user community through the Family Life Surveys Home Page on the World Wide Web (http://www.rand.org/organization/drd/labor/FLS) and the FLS Newsletter.
The IFLS data are placed in the public domain to support research analyses. As a user of the IFLS public use files, you are expected to respect the anonymity of all our respondents. This means that you will make no attempt to identify any individual, household, family, service provider or community other than in terms of the anonymous codes used in the IFLS.
Please do not distribute these data. The data are freely available on our website. It is useful for everyone if we maintain a list of all users. If you plan to work with other people using these data, please ask them to register or register them yourself. If you are a data librarian, please ask users to register on our web page if they obtain a copy of the data from you.
Use of the dataset must be acknowledged using a citation that includes:
- the identification of the Primary Investigator
- the title of the survey (including country, acronym and year of implementation)
- the survey reference number
- the source and date of download
Disclaimer and copyrights
The user of the data acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.
IFLS1 is copyrighted by RAND and Lembaga Demografi.