Survey of Living Standards 2007 and Extension 2008
In 2007-2008 a multi-topic household survey, the Timor Leste Survey of Living Standards (TLSLS2), was conducted in East Timor. Its main objectives were to develop a system of poverty monitoring, to support poverty reduction, and to monitor human development indicators and progress toward the Millennium Development Goals. Information collected in the TLSLS2 questionnaire included: household information, housing, access to facilities, expenditures/consumption, education, health, fertility and maternity history, employment, farming and livestock, transfers, borrowing and saving, other income, social capital, subjective well-being, AIDS and anthropometrics.
The TLSLS2-X extension survey was designed to re-visit one third of the households interviewed under the TLSLS2 2007-08 to explore different facets of household welfare and behavior in the country, while also being able to make use of information collected in the TLSLS2 survey for analytic purposes.
The four new topics investigated in the extension survey are:
- Risk and Vulnerability: This section is designed to help us understand the dimensions and sources of household-level vulnerability to uninsured risks in Timor Leste, and the efficacy and welfare effects of various risk-management strategies (prevention, mitigation, coping) and mechanisms (private as well as public, formal as well as informal) that households do (or do not) have access to. The work in Timor Leste is part of a program of analytic work and policy dialogue throughout the EAP region; more information on this program can be found on the World Bank website.
- Land Degradation and Poverty: This section of the questionnaire is designed to identify proximate causes of deforestation through land use patterns and their links with poverty; to understand the strengths and failures of common land resource management institutions (property rights, enforcement); and to understand the impact of the Siam Weed problem on household welfare.
- Justice for the Poor: The Justice for the Poor/Access to Justice (J4P/A2J) module of the survey will serve mainly as an initial diagnostic for project development in the country. The topics of interest are Dispute Processing/Resolution, Social Legal Norms, and Perceptions of Efficiency in Government (at the Local, Sub-District, District and National levels).
- Access to Financial Services: The financial services work has two objectives: (i) to collect data on access to and use of financial services (savings and credit), both formal and informal, and (ii) to assess the quality of information on access to financial services obtained from household heads vs. from all adults, i.e. whether a bias is introduced by not asking all household members, and whether characteristics of the head or the household (gender, age, nuclear family, urban location, education levels, wealth, etc.) affect this bias.
Kind of data
Sample survey data [ssd]
Unit of analysis
Domains: Urban/rural; Regional
Producers and sponsors
National Statistics Directorate
SAMPLE DESIGN FOR THE 2007-2008 SURVEY:
The TLSLS sample was designed to have two components: (i) a cross-sectional component of 4,500 households selected with the intention of representing the current population of Timor-Leste, and (ii) a panel component of 900 households, in which half of the 1,800 households in the 2001 TLSS sample were randomly selected and re-interviewed, with the purpose of evaluating changes in the living conditions of the same set of households between the two surveys. However, this panel component is not being released at this time, so it is neither covered in the rest of the documentation nor included in the data files. The cross-sectional component is expected to provide independent estimates for the rural and urban areas of each of five recently defined regions, which are groups of districts defined as follows:
- Region 1: Baucau, Lautem and Viqueque;
- Region 2: Ainaro, Manufahi and Manatuto;
- Region 3: Aileu, Dili and Ermera;
- Region 4: Bobonaro, Cova Lima and Liquiçá; and
- Region 5: Oecussi.
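For analysis it can be convenient to hold the region definitions as a lookup table. The following is a minimal sketch; the district spellings follow the list above and may differ from the district labels actually used in the data files.

```python
# Regions of Timor-Leste as groups of districts, as defined for the TLSLS design.
REGIONS: dict[int, list[str]] = {
    1: ["Baucau", "Lautem", "Viqueque"],
    2: ["Ainaro", "Manufahi", "Manatuto"],
    3: ["Aileu", "Dili", "Ermera"],
    4: ["Bobonaro", "Cova Lima", "Liquiçá"],
    5: ["Oecussi"],
}

# Reverse lookup: district name -> region number.
REGION_OF: dict[str, int] = {
    district: region for region, districts in REGIONS.items() for district in districts
}

print(REGION_OF["Dili"])  # 3
```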
The cross-sectional sample is selected in two stages. In the first stage, 300 Census Enumeration Areas (EAs) are selected as the primary sampling units (PSUs). In the second stage, 15 households are selected in each EA. The design recognizes ten explicit strata - the Urban and Rural areas in each of the five regions. The allocation of the 300 cross-sectional PSUs among regions resulted from the following line of reasoning:
- In spite of their different populations and total numbers of households, sampling theory dictates that a sample of roughly the same size (60 EAs) should be allocated to each region in order to produce estimates of similar quality for each of them.
- A similar case could have been made for allocating a sample of the same size (30 EAs) to urban and rural areas within each region, but since the definition of urban and rural areas outside Dili was still a matter of discussion, it was decided to opt for an allocation closer to proportional: 25 EAs to urban areas and 35 EAs to rural areas.
- Region 5 represents a special case. It is composed of a single district of difficult access (Oecussi) that had to be the responsibility of a dedicated team. This imposed a total sample size of 50 EAs for this region, of which only 48 could be allocated to the cross-sectional component, since the panel component contains two EAs in Oecussi.
- The capacity thus freed up to visit an additional 12 EAs (60 - 48) in the rest of the country was devoted to reinforcing the urban sample in Region 3, where Dili is located.
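The allocation reasoning above reduces to a few lines of arithmetic. The Python sketch below reproduces the PSU counts stated in the text; since the text does not give the urban/rural split within Region 5, the sketch keeps only its total. The constant and function names are illustrative only.

```python
# Allocation of the 300 cross-sectional PSUs (EAs), following the reasoning above.
BENCHMARK_URBAN = 25    # EAs allocated to urban areas in each of Regions 1-4
BENCHMARK_RURAL = 35    # EAs allocated to rural areas in each of Regions 1-4
REGION5_TEAM_EAS = 50   # capacity of the dedicated Oecussi team
REGION5_PANEL_EAS = 2   # Oecussi EAs already used by the panel component


def allocate() -> dict[str, dict[str, int]]:
    alloc = {f"Region {r}": {"urban": BENCHMARK_URBAN, "rural": BENCHMARK_RURAL}
             for r in range(1, 5)}
    region5_total = REGION5_TEAM_EAS - REGION5_PANEL_EAS
    # Capacity not used in Region 5 (60 - 48 = 12 EAs) reinforces urban Region 3 (Dili).
    freed = (BENCHMARK_URBAN + BENCHMARK_RURAL) - region5_total
    alloc["Region 3"]["urban"] += freed
    # The urban/rural split within Region 5 is not given in the text; keep only the total.
    alloc["Region 5"] = {"total": region5_total}
    return alloc


allocation = allocate()
total_psus = sum(sum(cells.values()) for cells in allocation.values())
print(allocation)
print(total_psus)  # 300
```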
The first sampling stage used the list of 1,163 Census Enumeration Areas (EAs) generated by the 2004 Census as a sample frame. Within each stratum, the allocated number of EAs was selected with probability proportional to size (pps) using the number of households reported by the census as a measure of size. No efforts were made to append the smaller EAs to neighboring EAs, or to segment the larger EAs in order to make the size of the primary sampling units (PSUs) more uniform.
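The text does not say which pps algorithm was applied. One common implementation, assumed here purely for illustration, is systematic pps selection on the cumulated measures of size; it also shows how an EA larger than the sampling interval can be selected more than once, which is what happened in the TLSLS. The EA sizes and random seed below are hypothetical.

```python
import random


def systematic_pps(sizes: list[int], m: int, rng: random.Random) -> list[int]:
    """Count how many times each EA is hit by systematic PPS selection.

    sizes: measure of size for each EA (here, census household counts).
    m: number of selections allocated to the stratum.
    An EA whose size exceeds the sampling interval total/m is always hit at
    least once and may be hit more than once.
    """
    total = sum(sizes)
    interval = total / m
    start = rng.random() * interval          # random start in [0, interval)
    points = [start + k * interval for k in range(m)]

    hits = [0] * len(sizes)
    lower = 0
    idx = 0
    for i, size in enumerate(sizes):
        upper = lower + size
        # Every selection point falling in [lower, upper) selects EA i.
        while idx < m and points[idx] < upper:
            hits[i] += 1
            idx += 1
        lower = upper
    return hits


# Hypothetical stratum: census household counts for seven EAs, five selections.
rng = random.Random(2007)
sizes = [120, 95, 400, 60, 210, 80, 150]
hits = systematic_pps(sizes, m=5, rng=rng)
# An EA hit t times receives 15 * t sample households in the second stage.
households_to_select = [15 * t for t in hits]
print(hits, households_to_select)
```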
The second sampling stage used an exhaustive household listing operation in all selected EAs as its sample frame. Sample households in each EA were selected from the list by systematic equal probability sampling.
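The second-stage selection can be sketched as follows, assuming the listing is held as a Python list. The integer form used here is equivalent to a fractional sampling interval N/n with a random start, and gives every listed household the same selection probability n/N. The listing and seed are hypothetical; for an EA selected t times at the first stage, n would be 15t.

```python
import random


def systematic_sample(listing: list[str], n: int, rng: random.Random) -> list[str]:
    """Select n households from an EA listing by systematic equal-probability sampling.

    Equivalent to a sampling interval of N/n with a random start r/n, computed
    in integers so that rounding can never push an index out of range.
    """
    N = len(listing)
    if n >= N:
        return list(listing)             # small EA: every listed household is taken
    r = rng.randrange(N)                 # random start, in units of 1/n household
    return [listing[(r + k * N) // n] for k in range(n)]


rng = random.Random(15)
listing = [f"household_{i:03d}" for i in range(1, 41)]   # hypothetical listing of 40 households
sample = systematic_sample(listing, 15, rng)
print(sample)
```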
Because of the relatively large sampling fraction in some of the strata, certain large EAs were selected more than once by the pps procedure adopted at the first sampling stage. In fact, the cross-sectional sample consists of only 269 (rather than 300) different EAs. This necessitated selecting a multiple of 15 households (rather than just 15 households) in the EAs that were selected more than once. The final cross-sectional sample consists of 4,477 households. Table 2 shows the distribution of the total TLSLS sample across the rural and urban areas of the five main regions in the country. The sample can be considered representative at the national level as well as at the level of the ten domains represented by the rural and urban areas of the five regions.
Lastly, it may be helpful to clarify the definition of urban and rural areas. At the time of the 2001 TLSS, 71 of Timor-Leste's 498 sucos were conventionally classified as urban, of which 31 sucos in the Dili and Baucau districts were classified as major urban centers. By the time the sample design for the 2007 TLSLS was prepared, 60 of the 498 sucos defined by the 2001 Suco Survey were conventionally classified as urban. The partition of the country into sucos was also modified in September 2004: with the amalgamation of several sucos, the original 498 sucos were collapsed into 442. Many of the rearrangements took place in urban areas, with the result that the 60 "old" sucos now considered urban constitute only 38 "new" sucos.
SAMPLE DESIGN FOR THE 2008 EXTENSION SURVEY
The sample for the TLSLS2 Extension survey was a sub-sample of the original TLSLS2 sample. The TLSLS2 field work was divided into 52 "weeks", each week being a random subset of the total sample. The sub-sample was chosen by randomly selecting 19 weeks from the original field work schedule.
Each week contained seven Primary Sampling Units (PSUs) for a total of 133 PSUs. In each PSU the teams were to interview 12 of the original 15 households, with the remaining three to serve as replacements. The total nominal sample size was thus 1596.
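The sample sizes quoted above follow from simple arithmetic. The sketch below reproduces them and draws 19 of the 52 fieldwork weeks at random; the seed is hypothetical, so the weeks it selects are not the ones actually used in the survey.

```python
import random

WEEKS_IN_FIELDWORK = 52
WEEKS_SELECTED = 19
PSUS_PER_WEEK = 7
INTERVIEWS_PER_PSU = 12    # the other 3 of the 15 sampled households serve as replacements

rng = random.Random(2008)  # hypothetical seed; not the draw actually used
selected_weeks = sorted(rng.sample(range(1, WEEKS_IN_FIELDWORK + 1), WEEKS_SELECTED))

n_psus = WEEKS_SELECTED * PSUS_PER_WEEK
nominal_sample = n_psus * INTERVIEWS_PER_PSU
print(selected_weeks)
print(n_psus, nominal_sample)  # 133 1596
```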
Following the collection and initial analysis of the data, it was determined that data from one district, Manatuto, and partially from another district, Oecussi, were of insufficient quality in certain modules. Therefore it was decided to repeat the survey in another 25 PSUs of these two districts - six in Manatuto, and 19 in Oecussi. The additional PSUs chosen were randomly selected within the two districts from the remaining non-panel PSUs in the original TLSLS2 sample.
SELECTION PROBABILITIES AND RAISING FACTORS FOR THE 2007-2008 SURVEY:
For the cross-sectional sample of the TLSLS, the selection probabilities and raising factors are determined by the sample design described above.
The probability of selecting Census Enumeration Area ij in stratum i is
$$p_{ij} = \frac{m_i \, n_{ij}}{n_i} \qquad (1)$$
where $n_{ij}$ is the number of households in the EA (as reported by the 2004 Census), $n_i$ is the total number of households in the stratum (also as per the 2004 Census) and $m_i$ is the number of EAs selected in the stratum.
The probability of selecting household ijk in EA ij of stratum i is
$$p_{ijk} = p_{ij} \cdot \frac{15}{n'_{ij}} \qquad (2)$$
where $n'_{ij}$ is the number of households in the EA, as per the household listing operation.
The raising factor or weight $w_{ijk}$ for household ijk is the inverse of the selection probability $p_{ijk}$:
$$w_{ijk} = \frac{1}{p_{ijk}} = \frac{n_i \, n'_{ij}}{15 \, m_i \, n_{ij}}$$
If the number $n'_{ij}$ of households found at the time of the listing operation were equal to the number $n_{ij}$ recorded by the census in all EAs, the sample would be self-weighted in each stratum, with a constant raising factor equal to $n_i / (15 m_i)$. In practice the numbers $n_{ij}$ and $n'_{ij}$ will seldom be equal but will often be close to each other, so the samples will not be exactly self-weighted, but approximately so.
[Note: Strictly speaking, the above formulae are valid only when the size of the EA is such that it can be selected at most once by the pps procedure. However, the device of selecting $15t$ households in the second stage whenever an EA is selected $t$ times in the first stage makes them applicable to computing raising factors even for the large EAs that can be selected more than once. Formula (2) may be inadequate if the actual size $n'_{ij}$ of EA ij happens to be less than 15. In that (quite unlikely) case, all households in the EA will need to be visited, and $p_{ijk}$ simplifies to $p_{ij}$.]
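The selection probabilities and raising factors above can be sketched in Python as follows. The function names are illustrative, and the stratum and EA counts are hypothetical, chosen to show the self-weighting property: when the listing count equals the census count, the weight reduces to $n_i / (15 m_i)$. The small-EA case follows the note above.

```python
HOUSEHOLDS_PER_EA = 15


def ea_probability(m_i: int, n_ij: int, n_i: int) -> float:
    """First stage: p_ij = m_i * n_ij / n_i, using census household counts."""
    return m_i * n_ij / n_i


def household_probability(m_i: int, n_ij: int, n_i: int, n_listed: int) -> float:
    """Second stage: p_ijk = p_ij * 15 / n'_ij.

    If the listing finds 15 or fewer households, all are visited and p_ijk = p_ij.
    """
    p_ij = ea_probability(m_i, n_ij, n_i)
    if n_listed <= HOUSEHOLDS_PER_EA:
        return p_ij
    return p_ij * HOUSEHOLDS_PER_EA / n_listed


def raising_factor(m_i: int, n_ij: int, n_i: int, n_listed: int) -> float:
    """Weight w_ijk = 1 / p_ijk."""
    return 1.0 / household_probability(m_i, n_ij, n_i, n_listed)


# Hypothetical stratum: 12,000 census households, 25 EAs selected.
w_same = raising_factor(m_i=25, n_ij=120, n_i=12_000, n_listed=120)  # listing matches census
w_grew = raising_factor(m_i=25, n_ij=120, n_i=12_000, n_listed=132)  # listing found more households
print(w_same, w_grew)
```

With matching counts the weight equals 12,000 / (15 × 25); a listing that finds more households than the census raises the weight proportionally.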
The household weights are further adjusted so that the population totals estimated from the full sample match the demographic projections for mid-2007 for each stratum. This corresponds to a total mid-2007 population for Timor-Leste of 1,047,632 persons.
[Note: This population total relates to the medium-level projection in DNE (2007), Population Projections 2004-2050: Analysis of Census Results, Report 1, General Population Census of Timor-Leste 2004.]
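The adjustment described above is a post-stratification step: within each stratum, every weight is multiplied by the ratio of the projected population to the population estimated with the unadjusted weights. A minimal sketch follows; it assumes each household record carries hypothetical `stratum`, `weight` and `size` (number of members) fields, and all figures are made up rather than taken from the DNE projections.

```python
from collections import defaultdict


def post_stratify(households: list[dict], projections: dict[str, float]) -> tuple[dict[str, float], list[dict]]:
    """Scale weights so that weighted persons match the projected population of each stratum."""
    estimated: dict[str, float] = defaultdict(float)
    for hh in households:
        estimated[hh["stratum"]] += hh["weight"] * hh["size"]   # weighted persons
    factors = {s: projections[s] / est for s, est in estimated.items()}
    adjusted = [{**hh, "weight": hh["weight"] * factors[hh["stratum"]]} for hh in households]
    return factors, adjusted


households = [
    {"stratum": "R1-urban", "weight": 32.0, "size": 5},
    {"stratum": "R1-urban", "weight": 32.0, "size": 3},
    {"stratum": "R1-rural", "weight": 40.0, "size": 4},
]
projections = {"R1-urban": 320.0, "R1-rural": 240.0}   # hypothetical projected persons
factors, adjusted = post_stratify(households, projections)
print(factors)
```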
WEIGHTS FOR THE 2008 EXTENSION SURVEY:
Due to the necessity of additional interviews, there are three possible combinations of the data, with each combination having its own set of weights:
(1) the original extension data
(2) the original data, excluding the "questionable" data and including the additional interviews
(3) the complete data, including all the original data and the additional interviews
Therefore three different sets of sampling weights have been calculated.
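The three combinations can be expressed as a filter on two household-level flags. The flag names `questionable` (an original interview judged to be of insufficient quality) and `reinterview` (a household from the additional PSUs) are hypothetical; the released files may identify these households differently.

```python
def in_combination(questionable: bool, reinterview: bool, combination: int) -> bool:
    """Return True if a household belongs to data combination 1, 2 or 3."""
    if combination == 1:        # original extension data only
        return not reinterview
    if combination == 2:        # original data minus questionable, plus re-interviews
        return reinterview or not questionable
    if combination == 3:        # all original data plus re-interviews
        return True
    raise ValueError(f"unknown combination: {combination}")


# (questionable, reinterview) flags for four hypothetical households
households = [(False, False), (True, False), (False, True), (False, False)]
counts = {c: sum(in_combination(q, r, c) for q, r in households) for c in (1, 2, 3)}
print(counts)  # {1: 3, 2: 3, 3: 4}
```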
The sample weights for the extension survey are indirect weights based on the original probability weights calculated for the TLSLS2. The TLSLS2 weights were calculated by Juan Muñoz of Sistemas Integrales. They were based on each household's selection probability and then scaled by an adjustment factor intended to match the demographic projections for the population of urban and rural areas in the five major regions of Timor Leste in mid-2007.
As the extension survey was selected as a sub-sample of the original TLSLS sample, the original weights were used as a basis for the construction of the extension weights. In consultation with Juan Muñoz, it was decided to also use adjustment factors, to have the population estimates of the extension survey match the demographic projections in urban and rural areas of the five regions, for each of the three possible combinations of data described above.
The indirect weights are constructed by determining a scaling factor for the original weights that brings the estimated population of each of the ten strata up to the level of the original population projections. Separate scaling factors were calculated for each of the three possible combinations of the data. (The scaling factors are already incorporated into the weights provided in the dataset and do not need to be applied separately in analysis.)
The scaling factors are constant across the three sets of weights for Regions 1, 3 and 4, which were unaffected by re-interviews. Region 2 shows a small change between factors 1 and 2, because the composition of the households is slightly different although the sample size is the same. Factor 3 is lower than factors 1 and 2 because the third combination of the data includes both the original and the re-sampled households, effectively over-sampling Region 2. The lower scaling factor, and by extension the lower weights, corrects estimates for this over-sampling.
Region 5 (Oecussi) shows a large change between factors 1 and 2, in addition to the expected decrease due to over-sampling in factor 3. This large change results from the difficulty of exactly reproducing the original stratification of the PSUs into urban and rural areas in Oecussi. Therefore urban households in Oecussi were effectively over-sampled during the original extension, then under-sampled during the re-interviews. Because of the resulting large differences in factors 1 and 2, it was decided that in Oecussi alone, it would be more logical to calculate the adjustment factors based on the population as a whole rather than including the urban/rural stratifications.
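The Oecussi rule amounts to pooling the urban and rural cells of Region 5 when computing the scaling factor, while every other region keeps separate urban and rural factors. A minimal sketch, assuming household records with hypothetical `region`, `area`, `weight` (the original TLSLS2 weight) and `size` fields, and hypothetical projection figures:

```python
from collections import defaultdict

Cell = tuple[int, str]


def scaling_factors(households: list[dict], projections: dict[Cell, float],
                    pooled_regions: frozenset[int] = frozenset({5})) -> dict[Cell, float]:
    """Scaling factor per (region, area) cell; pooled regions get one factor for urban and rural."""
    def cell(region: int, area: str) -> Cell:
        return (region, "all") if region in pooled_regions else (region, area)

    estimated: dict[Cell, float] = defaultdict(float)
    for hh in households:
        estimated[cell(hh["region"], hh["area"])] += hh["weight"] * hh["size"]

    target: dict[Cell, float] = defaultdict(float)
    for (region, area), population in projections.items():
        target[cell(region, area)] += population

    return {key: target[key] / est for key, est in estimated.items()}


households = [
    {"region": 1, "area": "urban", "weight": 20.0, "size": 4},
    {"region": 1, "area": "rural", "weight": 25.0, "size": 4},
    {"region": 5, "area": "urban", "weight": 10.0, "size": 5},
    {"region": 5, "area": "rural", "weight": 10.0, "size": 10},
]
projections = {(1, "urban"): 100.0, (1, "rural"): 150.0, (5, "urban"): 60.0, (5, "rural"): 240.0}
factors = scaling_factors(households, projections)
print(factors)
```

Pooling means an urban/rural imbalance in the Oecussi sub-sample no longer produces separate, possibly very different, urban and rural factors.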
None of the estimates from the extension data with the indirect weights are statistically different from the original estimates based on the TLSLS2 data, nor are these estimates statistically significantly different from each other.
Dates of collection
Mode of data collection
2008 Extension Survey Data Cleaning
The TLSLS2-X had a significant number of responses coded as "other". In general, if the response clearly fit into a pre-coded response category, it was recoded into that category during the cleaning and compilation process. Some responses that provided additional information were not recoded, even though they fit into pre-coded categories. For example, "agriculture project" would be recoded into the "agriculture" category, while "community garden" would not. Data users can either use the additional information or re-code responses into categories as they see fit.
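Users who prefer to re-code "other" responses themselves could do so with an explicit mapping from specified texts to categories. The sketch below is hypothetical: the mapping contains only the example given above, the category labels are illustrative, and any text not in the mapping is left unchanged.

```python
# Hypothetical mapping from normalized "other (specify)" texts to pre-coded categories.
RECODE_MAP: dict[str, str] = {
    "agriculture project": "agriculture",
}


def recode_other(category: str, specify: str) -> tuple[str, str]:
    """Move an 'other' response into a pre-coded category if its text is in the mapping."""
    if category == "other":
        key = specify.strip().lower()
        if key in RECODE_MAP:
            return RECODE_MAP[key], ""
    return category, specify


print(recode_other("other", "Agriculture project"))  # ('agriculture', '')
print(recode_other("other", "community garden"))     # ('other', 'community garden')
```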
Other forms of data appraisal
Potential Data Quality Issues in 2008 Extension survey
As with the individual roster in the previous section, the plots listed in the previous survey are listed on the pre-printed cover page and all changes are noted. The agricultural section, like the other sections, suffers from problems with open-ended questions. This is particularly the case for the question asking what community restrictions are placed on the clearing of forest land (section 2d). The translation of the original question was vague (using the Tetun word for "boundary" for "restriction"), and therefore many of the responses relate to physical boundaries on the land, such as stone walls and tree lines. Additionally, the translation of all answers from Tetun into English is imperfect, so analysts should be careful when using data from the open-ended questions and are advised to also refer to the original Tetun. It was also noted during the training and field work that many interviewers had significant difficulty understanding the definitions used in some of the land management and investment questions. In general, however, all agricultural data may be used for analysis, using sampling weights w3.
It should be noted that the quality of the data for the finance experiment (comparing the knowledge of the household head to that of other household members) was not sufficient for the experiment to be deemed a success. Subsequent spot-checking revealed that in many cases, interviewers asked the household head about the financial activities of the various household members instead of asking the members directly. Therefore these data should be used only to measure access to finance at the household level. The finance sections were not repeated during the additional interviews in the replacement PSUs. Sampling weights w1 should be used for any analysis with these data.
Shocks and Vulnerability:
It was determined following the initial round of data collection that the shocks and vulnerability module suffered from uneven interview quality. Two potential causes of the data quality issues were identified: (1) a fundamental inability to adequately translate both the word and the concept of a "shock" into the Timorese context, and (2) incomplete or questionable responses to the health shock questions in particular. Analyses of health shocks should drop the "questionable" households, include the "re-interview" households, and use sampling weights w2.
Justice for the Poor:
Like the shocks and vulnerability module, the justice module included a long series of follow-up questions if the household reported having experienced a dispute during the recall period. Again, the number of disputes reported by households seemed extremely low compared to expectations. This was particularly a problem in Manatuto district, where no disputes were recorded during the first set of TLSLS2-X interviews. Analyses of the disputes section of the justice module should drop the "questionable" households, include the "re-interview" households, and use sampling weights w2. The justice module also has a number of instances in which the specifications for "other" were not recorded. Every effort was made to make these data as complete as possible, but gaps remain. Data users should also use caution with the imputed rank variable in section 5D. The rank in terms of importance was not explicitly captured in the data entry software, so the rankings had to be imputed from the order in which items were listed in the original data entry. Inconsistencies may exist in this variable.
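The weight recommendations in this section can be collected into a small lookup, for example to guard an analysis script against using the wrong weight set. The module keys are hypothetical labels, and the correspondence of w1, w2 and w3 to data combinations (1), (2) and (3) is inferred from the descriptions above rather than stated explicitly.

```python
# Recommended weight set per module, as described in this section.
RECOMMENDED_WEIGHTS: dict[str, tuple[str, str]] = {
    "agriculture": ("w3", "all original households plus re-interviews"),
    "finance": ("w1", "original extension households only; not asked in re-interviews"),
    "health_shocks": ("w2", "drop questionable households, include re-interviews"),
    "justice_disputes": ("w2", "drop questionable households, include re-interviews"),
}


def weight_for(module: str) -> str:
    """Return the weight variable recommended for a module, or raise if none is documented."""
    try:
        return RECOMMENDED_WEIGHTS[module][0]
    except KeyError:
        raise KeyError(f"no weight recommendation documented for module {module!r}") from None


print(weight_for("finance"))  # w1
```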
World Bank LSMS
In receiving these data it is recognized that the data are supplied for use within my organization, and I agree to the following stipulations as conditions for the use of the data:
1. The data are supplied solely for the use described in this form and will not be made available to other organizations or individuals. Other organizations or individuals may request the data directly.
2. Three copies of all publications, conference papers, or other research reports based entirely or in part upon the requested data will be supplied to:
National Statistics Directorate
Caicoli, Dili, Timor Leste
The World Bank
Development Economics Research Group
LSMS Database Administrator
1818 H Street, NW
Washington, DC 20433, USA
tel: (202) 473-9041
fax: (202) 522-1153
3. The researcher will refer to the 2007 Timor Leste Living Standards Measurement Survey as the source of the information in all publications, conference papers, and manuscripts. At the same time, the National Statistics Directorate is not responsible for the estimates reported by the analyst(s).
4. Users who download the data may not pass the data to third parties.
5. The database cannot be used for commercial ends, nor can it be sold.
Use of the dataset must be acknowledged using a citation which would include:
- the Identification of the Primary Investigator
- the title of the survey (including country, acronym and year of implementation)
- the survey reference number
- the source and date of download
Disclaimer and copyrights
The user of the data acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.