Survey of Conflict Prevention and Cooperation 2004
The project uses public opinion polling to gather and then analyze a sample that represents the entire population of each of four different countries of Central Asia: Kazakhstan, Kyrgyzstan, Tajikistan, and Uzbekistan.
Kind of Data
Sample survey data [ssd]
Data provided to the World Bank by The Brookings Institution on January 31, 2006
The project uses public opinion polling to gather and then analyze a sample that represents the entire population of the country.
Producers and sponsors
The Brookings Institution
The World Bank
For all four Central Asian countries in this survey, the sampling procedure is a three-stage stratified clustered one. Census data on the territorial dispersion of the population is used as the base for the sampling methodology. The sampling procedure takes the total population of the country, classifies geographic units within the country as either urban or rural, and then uses random procedures to select respondents in three stages: first, by randomly selecting smaller urban and rural geographic units in each province (the primary sampling units, or PSUs); second, by randomly choosing households within these units; and third, by randomly selecting which household member to interview in each household.
The sampling frame used to divide these four countries into smaller geographic units differs slightly for each Central Asian country, based on differences in the available data on the country's population and its dispersion. Subsequent sections explain the sampling methodology used and how the sampling frame differs in each country. All four countries use the same methods for selecting PSUs, randomly selecting households, and randomly sampling individuals within households; these methods are discussed at length only for the first country example, Kazakhstan.
Kazakhstan has 14 provinces plus the cities of Almaty and Astana, which are considered separate units. All provinces are divided into districts, of which there are 198 in the country. Districts incorporate towns (with more than 100,000 inhabitants), small towns (with between 30,000 and 100,000 inhabitants), and villages (fewer than 30,000 inhabitants). A number of villages, in turn, are incorporated into rural districts (selskiy okrug). In total, Kazakhstan has the cities of Almaty and Astana, 17 towns, 258 small towns, 2,140 rural districts, and 7,986 villages. As of January 1, 2004, the population of Kazakhstan was 14,953,126 people, of which 8,377,303 (56%) lived in urban areas and 6,575,823 (44%) lived in rural areas.
In Kazakhstan, interviewers would not be allowed in electoral districts that use administrative restrictions to prohibit access by outsiders or that are unsafe for polling, so 395 city electoral districts are excluded from the sampling frame. These excluded districts, 19.6% of the total number of electoral districts in the country, are hospitals, prisons, and military zones. An estimate of the population in the excluded electoral districts is not available, because there is no resident population in these areas as defined in the census.
The sampling frame for Kazakhstan was developed from a list of three types of small territorial units, which are the primary sampling units (PSUs) used in the survey. The three are: small settlements of fewer than 3,000 inhabitants, each of which is a distinct PSU; parts of large settlements divided into populations of between 2,500 and 5,000 for urban settlements and between 1,500 and 3,000 for rural settlements, each a separate PSU; and electoral districts from large settlements, each a separate PSU. Such a procedure is suboptimal, but it is needed when there is no information on the population of smaller administrative-territorial units (such as exists for makhallas in Uzbekistan).
Sampling is through three-stage stratified clustered sampling. First, PSUs are determined by province, stratified by urban and rural population size. This probability-proportional-to-size sampling (PPS sampling) of PSUs selects a total of 61 PSUs to represent the urban and rural population of Kazakhstan and to generate 1,500 interviews. Second, sequential random sampling of households is done to select secondary sampling units (SSUs) in the selected PSUs. Third, a Kish grid is used to ensure random sampling of respondents within each household.
To generate PSUs, each province is treated as a separate unit for sampling. For each province, sampling is proportionate to the share of the country's population that it comprises, and that share is in turn divided between the province's urban and rural populations. This allocation is done for all 16 provinces. Based on their size relative to the entire urban and rural population of the country, the proportion of the sample that should be drawn from the urban and rural population of each province to represent the nation is determined. For example, Akmola has 748,930 residents, which is 5.0% of the population of Kazakhstan. Thus, in a sample of 1,500 residents of the country, 5% or 75 people are drawn from Akmola. The number of urban and rural interviews is determined from the urban and rural shares of the province's population. Provinces with larger urban and rural populations will have more people selected for interviews than those with smaller populations. Again, for Akmola, 349,153 people are urban residents, which is 46.6% of the province. This leads to sampling 35 of these city dwellers. The remaining 399,777 people are rural inhabitants, 53.4% of the population of the province, which leads to sampling 40 rural residents from Akmola.
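The Akmola allocation above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the survey team's own procedure: the function name `allocate_quota` is hypothetical, and rounding to the nearest whole interview is an assumption, since the source does not state its rounding rule.

```python
def allocate_quota(total_interviews, national_pop, province_pop, province_urban_pop):
    """Proportional allocation of interviews to one province and its urban/rural strata.

    Assumption: quotas are rounded to the nearest integer, and the rural quota
    is whatever remains after the urban quota is set.
    """
    # Province quota: province share of the national population times the sample size.
    province_quota = round(total_interviews * province_pop / national_pop)
    # Urban quota: urban share of the province population times the province quota.
    urban_quota = round(province_quota * province_urban_pop / province_pop)
    rural_quota = province_quota - urban_quota
    return province_quota, urban_quota, rural_quota


# Figures for Akmola and Kazakhstan as reported in the text.
akmola = allocate_quota(
    total_interviews=1500,
    national_pop=14_953_126,
    province_pop=748_930,
    province_urban_pop=349_153,
)
print(akmola)  # (75, 35, 40)
```

Taking the rural quota as the remainder keeps the two strata summing exactly to the province quota, which independent rounding of each stratum would not guarantee.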
The number of PSUs to be sampled to achieve the needed quota of urban and rural residents in each province depends on the minimum number of interviews to be achieved per PSU, the costs of data collection, supervision, control, and follow-up, and the minimum number of interviews that makes it worthwhile to conduct the survey in a PSU. Across the 61 PSUs surveyed in Kazakhstan, the number of people surveyed per PSU varies from a low of 8 to a high of 30. An approximately equal number of interviews is allocated to each selected urban PSU and to each selected rural PSU.
Then the actual geographic units (PSUs) to be polled in each province are determined by a random process. A list of all urban and rural PSUs is composed for each province. The probability that a PSU is selected for the survey depends on the size of the urban or rural population within it. The PPS sampling is carried out by sorting these units by size and randomly choosing PSUs to survey, repeating the draw until the required number of urban and rural units is reached. To continue the Akmola example, the quota of 35 urban residents can reasonably be reached by surveying 2 urban PSUs, querying 18 people in one and 17 in the other. For the quota of 40 rural respondents, again 2 PSUs are selected randomly, and 20 respondents will be selected in each. Thus interviewers will visit 4 different randomly selected PSUs in the province to find these 35 urban and 40 rural Kazakhstanis.
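One way to picture the PSU draw is as repeated random selection in which each remaining unit's chance of being picked is proportional to its population. The sketch below assumes successive draws without replacement, which only approximates strict PPS inclusion probabilities; the source does not specify the exact algorithm. The function names and the PSU labels and sizes are hypothetical, used only for illustration.

```python
import random


def pps_sample(psu_sizes, n_psus, seed=None):
    """Draw n_psus distinct PSUs, each draw weighted by population size.

    Assumption: successive weighted draws without replacement.
    """
    rng = random.Random(seed)
    # Sort by size (largest first), mirroring the sorting step in the text.
    remaining = dict(sorted(psu_sizes.items(), key=lambda kv: kv[1], reverse=True))
    chosen = []
    while len(chosen) < n_psus and remaining:
        names = list(remaining)
        weights = [remaining[name] for name in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # a PSU cannot be drawn twice
    return chosen


def split_quota(quota, n_psus):
    """Spread an interview quota as evenly as possible over the selected PSUs."""
    base, extra = divmod(quota, n_psus)
    return [base + 1 if i < extra else base for i in range(n_psus)]


# Hypothetical urban PSUs in one province (labels and populations are made up).
urban_psus = {"psu_a": 4800, "psu_b": 3100, "psu_c": 2600, "psu_d": 4200}
selected = pps_sample(urban_psus, n_psus=2, seed=1)
print(selected, split_quota(35, 2), split_quota(40, 2))
```

With the Akmola quotas, `split_quota(35, 2)` gives 18 and 17 interviews and `split_quota(40, 2)` gives 20 and 20, matching the split described in the text.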
Sequential random sampling of households is done by supervisors and interviewers during the fieldwork through a special form with random numbers that is used to draw a sample of households. Ideally, when interviewers brief local authorities that they will be conducting a survey in the district, they obtain a list of households from the authorities. However, in many cases the lists of households were made by interviewers without participation of local authorities because the administration was either not willing to provide assistance or was located far away from the district.
Sequential random sampling is done by random numbers associated with serial numbers of households in the list. Once a household has been selected, it cannot be selected again. Any household where the interview fails, from not finding the household or respondent refusal, is replaced with the next one randomly selected, according to the order of the random numbers. Selection is repeated until a required number of interviews is reached in each PSU.
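The household selection rule can be sketched as walking through the household list in a random order and moving on to the next household whenever an interview fails. This is an illustrative sketch only: the field form with random numbers is modeled here by a shuffle, and `sample_households` and the `interview_succeeds` callback are hypothetical names, not part of the survey documentation.

```python
import random


def sample_households(household_ids, required, interview_succeeds, seed=None):
    """Visit households in random order until `required` interviews are completed.

    Assumption: a random permutation of the serial numbers stands in for the
    printed form of random numbers used in the field.
    """
    rng = random.Random(seed)
    order = list(household_ids)
    rng.shuffle(order)  # each household appears once, so none is selected twice
    completed, failed = [], []
    for household in order:
        if len(completed) == required:
            break
        if interview_succeeds(household):
            completed.append(household)
        else:
            # Household not found or respondent refused: replaced by the next one.
            failed.append(household)
    return completed, failed


# Hypothetical PSU with 100 listed households; every fifth one fails for illustration.
done, missed = sample_households(
    range(1, 101), required=20, interview_succeeds=lambda h: h % 5 != 0, seed=7
)
print(len(done), len(missed))
```

Recording the failed households alongside the completed ones is what makes a response rate computable afterwards.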
A Kish grid is used to randomly sample respondents within households. To select a single adult in each selected household, 8 types of Kish grids, each with a different selection of respondents, are combined in strict proportions to ensure an almost equal overall probability for any eligible household member to be chosen to participate in the survey. All household members eligible for the survey are sorted by gender, the primary sorting, and then by age, the secondary sorting. Each is assigned a serial number, and a respondent is determined according to the type of Kish grid. Kish grids were assigned to each sample address randomly and in advance to avoid the tendency for interviewers to interview a "convenient" rather than random household member.
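The source does not reproduce the eight grids it used. As a sketch, the code below assumes the classic eight selection tables from Kish's original design (A, B1, B2, C, D, E1, E2, F) with their standard proportions; whether the survey used exactly these is an assumption. Sorting men before women and older before younger is likewise assumed, since the text names only the sort keys. All function names and household members are hypothetical.

```python
from fractions import Fraction

# Assumed: Kish's classic tables. Each entry is (share of addresses, serial number
# to select when the household has 1, 2, 3, 4, 5, or 6+ eligible adults).
KISH_TABLES = {
    "A":  (Fraction(1, 6),  (1, 1, 1, 1, 1, 1)),
    "B1": (Fraction(1, 12), (1, 1, 1, 1, 2, 2)),
    "B2": (Fraction(1, 12), (1, 1, 1, 2, 2, 2)),
    "C":  (Fraction(1, 6),  (1, 1, 2, 2, 3, 3)),
    "D":  (Fraction(1, 6),  (1, 2, 2, 3, 4, 4)),
    "E1": (Fraction(1, 12), (1, 2, 3, 3, 3, 5)),
    "E2": (Fraction(1, 12), (1, 2, 3, 4, 5, 5)),
    "F":  (Fraction(1, 6),  (1, 2, 3, 4, 5, 6)),
}


def select_respondent(members, table):
    """Pick one adult from `members`, a list of (name, gender, age) tuples."""
    # Primary sort by gender (men first, assumed), secondary by age (oldest first, assumed).
    ordered = sorted(members, key=lambda m: (m[1] != "M", -m[2]))
    column = min(len(ordered), 6) - 1
    serial = KISH_TABLES[table][1][column]
    return ordered[serial - 1]


def selection_probabilities(n_adults):
    """Overall chance of each serial number being chosen across all eight tables."""
    probs = [Fraction(0)] * n_adults
    column = min(n_adults, 6) - 1
    for share, row in KISH_TABLES.values():
        probs[row[column] - 1] += share
    return probs


household = [("Aigul", "F", 34), ("Marat", "M", 40), ("Daniyar", "M", 19)]
print(select_respondent(household, "F"))  # ('Aigul', 'F', 34)
print(selection_probabilities(5))
```

Under these assumed tables the selection probabilities are exactly equal for households of 1, 2, 3, 4, or 6 adults, but for 5 adults they are 1/6, 1/6, 1/4, 1/6, 1/4, and a seventh or later adult can never be selected. That is consistent with the text's wording of an "almost equal" probability.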
As the table below indicates, the achieved sample differs somewhat from the characteristics of the population found in the prior census. Surveys in Central Asia typically have these issues: an underrepresentation of men and youth, who are difficult to find due to their higher geographic mobility. Weighting is used to somewhat reduce these disproportions statistically.
In fieldwork, 181 potential respondents refused to participate, and thus are non-respondents. The overall response rate is thus 89% (1,500 of 1,681 cases). Non-response is registered if a completed interview is not achieved after three interviewer callbacks. High numbers of non-responses were noted in Akmola, where the response rate was 65.8%; elsewhere response rates were always above 80%. Non-response was more common in urban than in rural areas, with a response rate of 86.1% for urban respondents compared to 93.7% for rural residents. Rural residents are more willing to cooperate, are less mobile, and are typically listed in more accurate population registers than those in urban areas. Most non-responses are from respondents emphatically refusing to participate (47.5% of all non-responses), with an additional 18.8% of non-responses the result of family members refusing to call the selected family member in for an interview. Finally, 15.5% of non-responses are the result of the designated respondent not being home for any of the three callbacks.
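The response-rate arithmetic above can be checked directly. The short sketch below uses only the figures reported in the text; the function and variable names are hypothetical, and the per-reason counts are approximations obtained by rounding the reported percentages.

```python
def response_rate(completed, non_responses):
    """Completed interviews as a share of all attempted cases."""
    return completed / (completed + non_responses)


completed, non_responses = 1500, 181
rate = response_rate(completed, non_responses)
print(f"{rate:.1%}")  # 89.2%

# Shares of all non-responses by reason, as reported in the text.
reason_shares = {
    "respondent_refused": 0.475,
    "family_refused_to_call_respondent": 0.188,
    "not_home_after_three_callbacks": 0.155,
}
# Approximate counts implied by those shares (rounded to whole cases).
reason_counts = {k: round(v * non_responses) for k, v in reason_shares.items()}
explained = round(sum(reason_shares.values()), 3)
print(reason_counts, explained)
```

The three listed reasons account for about 81.8% of non-responses; the text does not break down the remainder.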
Dates of Data Collection
Data Collection Mode
- The data and related materials will not be redistributed or sold to other individuals, institutions, or organizations without the written agreement of the Microdata Library.
- The data will be used for statistical and scientific research purposes only. They will be used solely for reporting of aggregated information, and not for investigation of specific individuals or organizations.
- No attempt will be made to re-identify respondents, and no use will be made of the identity of any person or establishment discovered inadvertently. Any such discovery would immediately be reported to the Microdata Library.
- No attempt will be made to produce links among datasets provided by the Microdata Library, or among data from the Microdata Library and other datasets that could identify individuals or organizations.
- Any books, articles, conference papers, theses, dissertations, reports, or other publications that employ data obtained from the Microdata Library will cite the source of data in accordance with the Citation Requirement provided with the dataset.
- An electronic copy of all reports and publications based on the requested data will be sent to the Microdata Library.
- The original collector of the data, the Microdata Library, and the relevant funding agencies bear no responsibility for use of the data or for interpretations or inferences based upon such uses.
Kazakhstan Survey of Conflict Prevention and Cooperation 2004, Ref. KAZ_2004_SCPC_v01_M, dataset downloaded from microdata.worldbank.org on [date]
DDI Document ID
The Brookings Institution
The World Bank / IHSN
Date of Metadata Production