Survey ID Number
Demographic and Health Survey 2007
The 2007 Ukraine Demographic and Health Survey (UDHS) was the first survey of its kind carried out in Ukraine. The survey was a nationally representative sample survey of 15,000 households, with an expected yield of about 7,900 completed interviews of women age 15-49. It was designed to provide estimates on fertility, infant and child mortality, use of contraception and family planning, knowledge and attitudes toward HIV/AIDS and other sexually transmitted infections (STI), and other family welfare and health indicators. Ukraine is made up of 24 oblasts, the Autonomous Republic of Crimea, and two special cities (Kyiv and Sevastopol), which together make up 27 administrative regions, each subdivided into lower-level administrative units. The 27 administrative regions were grouped for this survey into five geographic regions: North, Central, East, South and West. The five geographic regions are the five study domains of the survey. The estimates obtained from the 2007 UDHS are presented for the country as a whole, for urban and rural areas, and for each of the five geographic regions.
A men's survey was conducted at the same time as the women's survey, in a subsample consisting of one household in every two selected for the female survey. All men age 15-49 living in the selected households were eligible for the men's survey. The survey collected information on men's use of contraception and family planning and their knowledge and attitudes toward HIV/AIDS and other sexually transmitted infections (STI).
The sampling frame used for the 2007 UDHS was the Ukraine Population Census conducted in 2001 (SSC, 2003a), provided by the State Statistical Committee (SSC) of Ukraine. The sampling frame consisted of about 38 thousand enumeration areas (EAs) with an average of 400-500 households per EA. Each EA is subdivided into 4-5 enumeration units (EUs) with an average of 100 households per EU. An EA is a city block in urban areas; in rural areas, an EA is either a village or part of a large village, or a group of small villages (possibly plus a part of a large village). An EU is a list of addresses (in a neighborhood) that was used as a convenient counting unit for the census. Both EAs and EUs include information about the location, type of residence, address of each structure in it, and the number of households in each structure.
Census maps with marked boundaries were available for most of the EAs. In urban areas, the census maps show the boundaries and locations of the EUs. In rural areas, the EUs are defined by detailed descriptions available at the SSC local office. Therefore, either the EA or the EU could be used as the primary sampling unit (PSU) for the 2007 UDHS. Because the EAs in urban areas are large (an average of 500 households), using EAs as PSUs in urban areas would have required a great deal of household listing work, so it was decided to use the EUs as PSUs in urban areas. In rural areas, the EUs are too small (fewer than 100 households) to be used as PSUs, while the EAs are geographically too large. It was therefore decided that in rural areas the large EAs (300 or more households) would be divided to form two PSUs and the small EAs (fewer than 300 households) would be single PSUs. This segmentation was done in the office prior to the selection of the PSUs. Thus, in rural areas, a PSU is either an EA or a part of an EA.
SAMPLE DESIGN AND THE SAMPLING PROCEDURE
The sample for the 2007 UDHS was a stratified sample selected in two stages from the 2001 census frame. Stratification was achieved by separating every administrative region into urban and rural areas. Because the city of Kyiv has only urban areas, the 27 regions yielded 53 sampling strata. Samples were selected independently in every stratum by a two-stage probability selection. In the first stage, a specified number of PSUs were selected in each stratum with probability proportional to PSU size, where the size of a PSU was the number of people enumerated in the 2001 census. Implicit stratification and proportional allocation were achieved at each of the lower administrative levels by sorting the sampling frame according to administrative unit and geographical order, and by using probability proportional to size selection at the first stage.
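The first-stage selection described above can be illustrated with a minimal sketch of systematic probability proportional to size (PPS) selection within one stratum. This is an illustration under stated assumptions, not the procedure actually run for the 2007 UDHS: the function name `pps_systematic` and the example sizes are hypothetical, and the frame is assumed to be already sorted in the administrative and geographical order that provides the implicit stratification.

```python
import random

def pps_systematic(sizes, n, rng=None):
    """Systematic PPS selection of n units from a sorted frame.

    sizes: positive size measures (e.g. census population of each PSU),
           listed in the sorted frame order.
    Returns the frame indices of the selected units. A unit whose size
    exceeds the sampling interval can be selected more than once.
    """
    rng = rng or random.Random()
    total = sum(sizes)
    interval = total / n
    # Random start in [0, interval), then equally spaced selection points.
    start = rng.random() * interval
    targets = [start + k * interval for k in range(n)]

    selected = []
    cumulative = 0
    next_target = 0
    for index, size in enumerate(sizes):
        cumulative += size
        # Select this unit for every target that falls inside its size range.
        while next_target < n and targets[next_target] < cumulative:
            selected.append(index)
            next_target += 1
    return selected

# Hypothetical stratum with six PSUs; select three of them.
example_sizes = [120, 80, 200, 150, 50, 400]
print(pps_systematic(example_sizes, 3, random.Random(1)))
```

Because the selection points are equally spaced along the cumulated sizes, units that are adjacent in the sorted frame are spread across the sample, which is how sorting yields implicit stratification.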
In the first stage, 500 PSUs were selected with probability proportional to PSU size. A household listing operation was carried out in all of the selected PSUs before the main survey, and the resulting lists of households served as the sampling frame for the selection of households in the second stage. In the second stage, a fixed number of 30 households was selected in each selected PSU by equal probability systematic selection. Some of the selected PSUs were large. To reduce the household listing workload, selected PSUs found to contain more than 300 households were segmented during the listing operation. Only one segment was selected for the survey, with probability proportional to segment size, and household listing was conducted only in the selected segment. A 2007 UDHS cluster is therefore either a PSU or a segment of a PSU. With 30 households selected per cluster, a total of 15,000 households were selected. A spreadsheet for household selection was prepared in advance and was used for household selection in the central office. The survey interviewers were asked to interview only the preselected households. To prevent bias, no replacements and no changes of the preselected households were allowed during implementation. All women age 15-49 who slept in a selected household the night before the survey (de facto) were eligible to be interviewed with the Women's Questionnaire. A subsample of one household in every two selected for the women's survey was selected for the men's survey. All men age 15-49 who slept in a selected household the night before the survey were eligible to be interviewed with the Men's Questionnaire.
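The second-stage step, an equal probability systematic selection of 30 households from a cluster's listing, can be sketched as follows. The function `systematic_sample` is a hypothetical illustration, not the spreadsheet used in the central office; the handling of clusters with 30 or fewer listed households (take them all) is an assumption, since the source does not describe that case.

```python
import random

def systematic_sample(n_listed, n_select=30, rng=None):
    """Equal probability systematic selection of household list positions.

    n_listed: number of households on the cluster's listing.
    Returns 0-based positions on the listing, in ascending order.
    """
    rng = rng or random.Random()
    if n_listed <= n_select:
        # Assumption: take every listed household in very small clusters.
        return list(range(n_listed))
    interval = n_listed / n_select          # fractional sampling interval
    start = rng.random() * interval         # random start in [0, interval)
    return [min(int(start + k * interval), n_listed - 1)
            for k in range(n_select)]

# Hypothetical cluster with 137 listed households.
positions = systematic_sample(137, 30, random.Random(7))
print(len(positions), positions[:5])
```

Fixing the selected positions in advance, as here, matches the rule that interviewers visit only preselected households and make no replacements.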
Because of tight budget restrictions, the sample was not allocated proportionally; otherwise some of the small regions would have received too small a sample. To ensure that the precision of most survey indicators is acceptable at the domain level and comparable across study domains, the sampled households were allocated equally to the five study domains, that is, 100 PSUs and 3,000 households per study domain. The 3,000 households in each domain were then allocated to the administrative regions within the domain according to the size of the region and by type of residence. The size of a region was its population as enumerated in the 2001 population census. The resulting allocation gives the expected number of completed women's and men's interviews by administrative region and by type of residence. Of the 500 clusters, 310 are in urban areas and 190 are in rural areas.
The sample allocations were calculated using parameters obtained from the 2001 population census, the 1999 Ukraine Reproductive Health Survey, and empirical knowledge. The average number of women age 15-49 per household was 0.686; the average number of men age 15-49 per household was 0.668; the gross household response rate was 90 percent; the women's response rate was 84 percent in urban areas and 89 percent in rural areas; and the men's response rate was 80 percent in both urban and rural areas. The number of households selected in each cluster was 30.
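As a rough check on how these parameters relate to the expected yield of about 7,900 completed women's interviews, the sketch below multiplies them together. The multiplicative formula (households × household response rate × eligible persons per household × individual response rate) is an assumption about how the figures combine; the source does not state the formula. All names are hypothetical.

```python
HOUSEHOLDS_PER_CLUSTER = 30
HOUSEHOLD_RESPONSE = 0.90
WOMEN_PER_HOUSEHOLD = 0.686
MEN_PER_HOUSEHOLD = 0.668
WOMEN_RESPONSE = {"urban": 0.84, "rural": 0.89}
MEN_RESPONSE = {"urban": 0.80, "rural": 0.80}
CLUSTERS = {"urban": 310, "rural": 190}

def expected_interviews(clusters=CLUSTERS):
    """Return (expected women's interviews, expected men's interviews)."""
    women = 0.0
    men = 0.0
    for area, n_clusters in clusters.items():
        households = n_clusters * HOUSEHOLDS_PER_CLUSTER
        women += (households * HOUSEHOLD_RESPONSE
                  * WOMEN_PER_HOUSEHOLD * WOMEN_RESPONSE[area])
        # The men's survey covers one household in every two.
        men += ((households / 2) * HOUSEHOLD_RESPONSE
                * MEN_PER_HOUSEHOLD * MEN_RESPONSE[area])
    return women, men

women, men = expected_interviews()
print(f"women: {women:.0f}, men: {men:.0f}")
```

Under these assumptions the women's total comes out slightly above 7,900, in line with the expected yield quoted for the survey.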
Estimates of Sampling Error
Sampling errors can be evaluated statistically. The sample of respondents selected in the 2007 UDHS is only one of many samples that could have been selected from the same population, using the same design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.
A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95 percent of all possible samples of identical size and design.
If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulas for calculating sampling errors. However, the 2007 UDHS sample was the result of a multi-stage stratified design, and, consequently, it was necessary to use more complex formulas. The computer software used to calculate sampling errors for the 2007 UDHS is a Macro SAS procedure. This procedure uses the Taylor linearization method of variance estimation for survey estimates that are means or proportions. The Jackknife repeated replication method is used for variance estimation of more complex statistics such as fertility and mortality rates.
The Jackknife repeated replication method derives estimates of complex rates from each of several replications of the parent sample, and calculates standard errors for these estimates using simple formulae. Each replication considers all but one cluster in the calculation of the estimates. Pseudo-independent replications are thus created. In the 2007 UDHS, there were 500 non-empty clusters. Hence, 500 replications were created.
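A minimal sketch of a delete-one-cluster jackknife is given below. It uses a common pseudo-value formulation, r_i = k·r − (k−1)·r_(i) with variance Σ(r_i − r)² / (k(k−1)), where r is the full-sample estimate, r_(i) the estimate with cluster i removed, and k the number of clusters. This formula is assumed here, since the text above does not state it, and the sketch ignores sampling weights and stratification. The function names and data are hypothetical.

```python
import math

def jackknife_se(clusters, estimator):
    """Delete-one-cluster jackknife standard error.

    clusters:  list of per-cluster data (any structure the estimator accepts).
    estimator: function mapping a list of clusters to a single number.
    Returns (full-sample estimate, jackknife standard error).
    """
    k = len(clusters)
    r = estimator(clusters)
    pseudo_values = []
    for i in range(k):
        # Replication i uses all clusters except cluster i.
        r_without_i = estimator(clusters[:i] + clusters[i + 1:])
        pseudo_values.append(k * r - (k - 1) * r_without_i)
    variance = sum((p - r) ** 2 for p in pseudo_values) / (k * (k - 1))
    return r, math.sqrt(variance)

def rate(clusters):
    """Ratio-type rate: total events divided by total exposure."""
    events = sum(c[0] for c in clusters)
    exposure = sum(c[1] for c in clusters)
    return events / exposure

# Hypothetical (events, exposure) pairs for five clusters.
data = [(3, 40.0), (5, 55.0), (2, 38.0), (4, 47.0), (6, 60.0)]
estimate, se = jackknife_se(data, rate)
print(f"rate = {estimate:.4f}, SE = {se:.4f}")
```

With 500 clusters, as in the 2007 UDHS, the loop would run 500 replications, each recomputing the rate without one cluster.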
In addition to the standard error, the design effect (DEFT) for each estimate is calculated, which is defined as the ratio between the standard error using the given sample design and the standard error that would result if a simple random sample had been used. A DEFT value of 1.0 indicates that the sample design is as efficient as a simple random sample, while a value greater than 1.0 indicates the increase in the sampling error due to the use of a more complex and less statistically efficient design. The relative standard error and confidence limits for the estimates are also calculated.
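For a proportion, the DEFT ratio can be sketched as below. The simple random sample standard error is taken as sqrt(p(1−p)/n), with n assumed to be the number of cases; this simplification and the function name `deft_proportion` are illustrative assumptions, not the formula used by the survey software.

```python
import math

def deft_proportion(p, se_design, n):
    """DEFT for a proportion: design-based SE over the SRS SE.

    Returns None when the SRS standard error is zero (p equal to 0 or 1),
    in which case the DEFT is undefined.
    """
    se_srs = math.sqrt(p * (1 - p) / n)
    if se_srs == 0:
        return None
    return se_design / se_srs

# Hypothetical values: p = 0.5, design-based SE = 0.02, n = 1250 cases.
print(deft_proportion(0.5, 0.02, 1250))
```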
Sampling errors for the 2007 UDHS are calculated for selected variables considered to be of primary interest for the women's survey and for the men's survey, respectively. The results are presented in an appendix to the Final Report for the country as a whole, for urban and rural areas, and for each of the five geographic regions. For each variable, the type of statistic (mean, proportion, or rate) and the base population are given in Table B.1 of the Final Report. Tables B.2 to B.9 present the value of the statistic (R), its standard error (SE), the number of unweighted (N) and weighted (WN) cases, the design effect (DEFT), the relative standard error (SE/R), and the 95 percent confidence limits (R±2SE), for each variable. The DEFT is considered undefined when the standard error under simple random sampling is zero (when the estimate is close to 0 or 1). In the case of the total fertility rate and total abortion rate, the number of unweighted cases is not relevant, as there is no known unweighted value for woman-years of exposure to childbearing.
The confidence interval, e.g., as calculated for children ever born, can be interpreted as follows: the overall average from the national sample is 1.118 and its standard error is 0.015. Therefore, to obtain the 95 percent confidence limits, one adds and subtracts twice the standard error to the sample estimate, i.e., 1.118±2×0.015. There is a high probability (95 percent) that the true average number of children ever born is between 1.088 and 1.148. For the total sample, the value of the DEFT, averaged over all women variables, is 1.39. This means that, due to multistage clustering of the sample, the average standard error is increased by a factor of 1.39 over that in an equivalent simple random sample.
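The arithmetic in this example can be reproduced with a short sketch. The function names are hypothetical; the multiplier of 2 follows the R±2SE convention stated for the Final Report tables.

```python
def confidence_limits(estimate, standard_error, multiplier=2.0):
    """Return (lower, upper) limits as estimate -/+ multiplier * SE."""
    half_width = multiplier * standard_error
    return estimate - half_width, estimate + half_width

def relative_standard_error(estimate, standard_error):
    """Relative standard error, SE/R."""
    return standard_error / estimate

# Children ever born, national sample: R = 1.118, SE = 0.015.
low, high = confidence_limits(1.118, 0.015)
print(f"95 percent confidence limits: {low:.3f} to {high:.3f}")
print(f"relative standard error: {relative_standard_error(1.118, 0.015):.3f}")
```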