Survey ID Number
Integrated Household Survey 1993
A self-weighting sample design should in principle eliminate the need for weighting. However, a number of factors intervened that made it essential to use weights after all. Amongst these were violence, which prevented survey teams from conducting interviews in two clusters on the East Rand; failure to continue interviewing in a cluster until the required take had been interviewed; and systematic under-representation of whites in the sample. This last problem resulted both from systematic non-response (whites were found to be more likely than other groups to refuse to be interviewed or to be absent) and from problems with the sampling itself.
The importance of race in determining living standards in South Africa is such that the racial distribution of the population has a major bearing on measures of living standards and inequality. It was thus regarded as essential to overcome the problems mentioned above by applying appropriate weights to the data. Ordinarily, the most appropriate approach would be to assign the missing questionnaires in each cluster the average values obtained from the completed questionnaires in that cluster, thereby capturing the homogeneity usually inherent in residential contiguity. However, this presented difficulties for the two clusters in which violence prevented any interviewing, and for those clusters in which only a small number of questionnaires had been completed. This method was therefore judged not to be appropriate.
Accordingly, it was decided to apply weights, as far as possible, at the level of the old provincial/homeland boundaries and race. The listing of households in each cluster, combined with the sampling interval, was used to determine how many households should have been interviewed, and any deviation from the number actually interviewed was taken into account. The assumption was that the households left out were distributed by race in the same proportions as the households actually interviewed. When these numbers were aggregated to the provincial level, a weight could be calculated for each race group to correct for errors made in the fieldwork. These errors typically arose because most of the fieldwork organizations involved had little experience of using anything but a weighted sample, and were used to replacements that could easily be added ex post, not necessarily in the same area. By the time these mistakes were discovered, it was too late to return to the field.
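The weighting step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the function name, the input layout (one record per cluster giving its listed household count, its sampling interval and the race of each interviewed household) and the decision to skip clusters with no completed interviews are all hypothetical, since the text does not say how those clusters were treated.

```python
from collections import Counter


def province_race_weights(clusters):
    """Hypothetical sketch: race-specific correction weights for one province.

    Each cluster is a dict with (assumed) keys:
      'listed'   - households listed in the cluster
      'interval' - sampling interval used in that cluster
      'races'    - race of each household actually interviewed
    Returns {race: expected households / interviewed households}.
    """
    expected = Counter()
    interviewed = Counter()
    for cluster in clusters:
        counts = Counter(cluster["races"])
        n_interviewed = sum(counts.values())
        if n_interviewed == 0:
            # Assumption: clusters with no interviews cannot be allocated by
            # race under this rule, so this sketch skips them.
            continue
        # Households that should have been interviewed in this cluster.
        n_expected = cluster["listed"] / cluster["interval"]
        for race, n in counts.items():
            interviewed[race] += n
            # Households left out are assumed to share the racial
            # distribution of the households actually interviewed.
            expected[race] += n_expected * n / n_interviewed
    # Province-level weight per race group.
    return {race: expected[race] / interviewed[race] for race in interviewed}


# Illustrative data only.
clusters = [
    {"listed": 100, "interval": 4, "races": ["black"] * 15 + ["white"] * 5},
    {"listed": 80, "interval": 4, "races": ["white"] * 20},
]
weights = province_race_weights(clusters)
print(weights)
```

Because the shortfall is allocated within each cluster before being summed, the weights differ by race only to the extent that under-interviewing was concentrated in clusters dominated by particular groups. In the illustrative data the whole shortfall falls in the first, mostly black cluster, so the black weight (1.25) exceeds the white weight (1.05).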
The sample of 360 clusters of 25 households each, based on an expected household size of 5, should have yielded a resident population of 45,000. In fact, a different household size should not affect the results. In any particular cluster the expected take of individuals would remain the same, irrespective of household size, provided the census population were accurate: a smaller household size (as in the case of whites) would simply have yielded more households, of which the same proportion would have been interviewed. For example, if the census population of a particular cluster was 472, every fourth household should have been interviewed (the sampling interval was calculated to produce 125 persons per cluster in 1993, which corresponds to an expected take based on the census data of 118.1 persons per cluster; dividing 472 by 118.1 gives an interval of 4). Irrespective of household size, then, one quarter of the cluster population would have been included in the survey. An average household size of 5 would have given 94 households, of whom 23 would have been interviewed, i.e. 115 resident household members would have been found. If the household size were only 3, on the other hand, one quarter of the 157 households would have been 39, representing 117 household members. Only small differences from the expected take of 118 should thus arise, due to rounding. Only if the census-based population estimate is wrong would the actual take deviate substantially from the expected take. In such a case, one quarter of the actual (i.e. listed or enumerated) population rather than of the census population would have been included in the survey, i.e. there would have been an automatic adjustment. This gives the sample design its self-weighting character.
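The worked example above can be reproduced in a few lines. This sketch assumes, as the quoted figures imply, that household and interview counts are truncated rather than rounded; the function name `cluster_take` is illustrative only.

```python
def cluster_take(census_population, household_size, interval):
    """Persons found when every `interval`-th household is interviewed."""
    households = census_population // household_size  # truncated household count
    interviewed = households // interval               # every interval-th household
    return households, interviewed, interviewed * household_size


# Interval implied by the census-based expected take of 118.1 persons.
interval = round(472 / 118.1)

for size in (5, 3):
    households, interviewed, persons = cluster_take(472, size, interval)
    print(f"household size {size}: {households} households, "
          f"{interviewed} interviewed, {persons} persons")
```

Running it prints `household size 5: 94 households, 23 interviewed, 115 persons` and `household size 3: 157 households, 39 interviewed, 117 persons`: both takes stay close to 118 regardless of household size, which is the self-weighting property.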
The census population at the time of the survey was estimated by applying Sadie's population growth rates to the adjusted 1991 census figures. This gave a population of 40.1 million with a corresponding racial and geographic distribution, presuming, of course, that no migration across provincial and homeland boundaries had occurred since the census. It follows that a raising factor of 891.4154 (the population of approximately 40.1 million divided by the expected take of 45,000) should be applied to the results weighted by enumeration to obtain the population they represent. With the enumeration weights applied, the survey covered 38.1 million people, i.e. there was an under-enumeration of 2 million, or about 5 per cent. Broken down by race, the under-enumeration was particularly large amongst whites, for whom the best census data exist, indicating that the problem lay less with the census than with the survey. However, this is to be expected: a survey of this nature is better at capturing inequality and living standards than population size. Nevertheless, the margin of error in aggregate population estimates is relatively small, considering the presence of some homeless people, uncertainties about ESD boundaries in some areas and the likelihood of incomplete household listings for various reasons. These results are therefore encouraging regarding the accuracy of the survey, and also confirm that the adjusted census does not deviate substantially from population estimates obtained in a different manner.
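The arithmetic behind the raising factor and the under-enumeration figure can be checked directly. Note that 891.4154 × 45,000 is about 40.11 million, so the published factor evidently uses a slightly more precise population figure than the rounded 40.1 million. The variable names below are illustrative.

```python
CLUSTERS = 360
HOUSEHOLDS_PER_CLUSTER = 25
PERSONS_PER_HOUSEHOLD = 5
RAISING_FACTOR = 891.4154  # as published

# Designed sample size: 360 x 25 x 5 persons.
expected_take = CLUSTERS * HOUSEHOLDS_PER_CLUSTER * PERSONS_PER_HOUSEHOLD
# Population implied by the published raising factor.
implied_population = RAISING_FACTOR * expected_take

census_estimate = 40.1e6  # projected census population
survey_estimate = 38.1e6  # enumeration-weighted survey coverage
shortfall = census_estimate - survey_estimate
shortfall_share = shortfall / census_estimate

print(expected_take)               # 45000
print(round(implied_population))   # 40113693
print(f"{shortfall_share:.1%}")    # 5.0%
```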
However, the raised enumeration results deviate more from the census results in the provincial breakdown. The reason is not hard to find. The sample design stratified only by geographic area (statistical region) and by the proportion of the ESD population that was black. South African population clusters are still predominantly racially homogeneous, inter alia because of past controls on residential patterns. It is therefore not surprising that, in particular regions, too few or too many clusters of a particular group were selected. In Natal, for instance, Coloureds and Indians are over-represented in the data, even when weighted by enumeration, while whites are under-represented. At the aggregate level this should have little effect on the validity of the conclusions drawn, but it emphasizes that care should be taken when drawing implications from the survey for small populations. In small provinces (for instance, the new Northern Cape), only a small number of clusters were included, with the result that little can be concluded about living standards there, even though these clusters are important in determining the overall distribution.
As a final comment on weights, the data provided to users contain both weights that correct for the enumeration difficulties discussed above and census-based weights. Users who wish to apply these weights will find them in the data file named "weight02". The enumeration-based weight is the variable "rsweight" and the census-based weight is "rcweight". (Do not use the "sweight" and "cweight" variables.)
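Once the weights are attached to the analysis records, weighted estimates follow the usual formula. A minimal sketch, assuming each record has already been linked to its "rsweight" (and optionally "rcweight") value from "weight02"; how the files are linked, and whether the weights apply to households or individuals, is not described here. The record layout and the "hhsize" field are hypothetical.

```python
def weighted_mean(records, value_key, weight_key="rsweight"):
    """Weighted mean of `value_key`, using the enumeration-based weight by default.

    Pass weight_key="rcweight" for the census-based weight; "sweight" and
    "cweight" should not be used.
    """
    total_weight = sum(r[weight_key] for r in records)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(r[value_key] * r[weight_key] for r in records) / total_weight


# Illustrative records only; real values come from the survey files.
records = [
    {"hhsize": 4, "rsweight": 1.0, "rcweight": 2.0},
    {"hhsize": 2, "rsweight": 3.0, "rcweight": 2.0},
]
print(weighted_mean(records, "hhsize"))                         # 2.5
print(weighted_mean(records, "hhsize", weight_key="rcweight"))  # 3.0
```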