Abstract |
The vast number of study variables or population characteristics and the many domains of interest in a survey make it impractical, and almost impossible, to calculate and publish standard errors for each statistic (the estimated value of a population variable or characteristic) and for each domain individually. However, it is not advisable to publish standard errors for only a small number of statistics in a few selected domains, or to omit them altogether from a survey report. Since the estimated values are subject to statistical variation (resulting from the probability sampling), they can be evaluated only if their precision is known. The purpose of this research project was to study the feasibility of using mathematical models to estimate the standard errors of estimated values of population parameters or characteristics in survey data sets regularly gathered by Statistics South Africa, and to investigate effective and user-friendly methods of presenting these models in reports.
The following data sets were used in the investigation:
· October Household Survey (OHS) 1995
· OHS 1996
· OHS 1997
· Victims of Crime Survey (VOC) 1998
Note that the OHS data sets consist of various sections, of which the persons section (containing information on person demographics), the workers section (containing information on economic activity, employment, etc.) and the household section (containing information on household characteristics) were used. The basic methodology was to calculate estimates of the standard errors of the statistics considered in the survey for a variety of domains (such as the whole country, provinces, urban/rural areas, population groups, gender and age groups, as well as combinations of these). This was done using a computer program that takes into account the complexity of the different sample designs.
The set of domains covered a wide range of sample sizes, from a very small number of sample records up to the whole data set. The standard errors obtained in this way are referred to as direct calculated standard errors. A regression model was then fitted to such a set of estimated domain values of a statistic and the associated direct calculated domain standard errors, with a function of the standard error as the dependent variable and a function of the size of the statistic as the independent variable. A linear model, equating the natural logarithm of the coefficient of relative variation of a statistic to a linear function of the natural logarithm of the size of the statistic, gave an adequate fit in most cases considered in this study. Well-known tests for outliers were applied when fitting the model. When an observation (sample record) was flagged as an outlier, it was established whether the observation could legitimately be deleted (e.g. when the domain sample size was too small, or the estimate was biased). After such observations had been deleted, the fitting process was repeated. |
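As a rough illustration of the log-linear model described above, the following minimal sketch fits ln(CV) = a + b·ln(estimate) by ordinary least squares, where CV is the coefficient of relative variation (standard error divided by the estimate). The function names, the use of plain unweighted least squares, and the domain figures are assumptions made for illustration only; they are not taken from the study, and the outlier screening step is omitted.

```python
import math

def fit_log_cv_model(estimates, std_errors):
    """Fit ln(CV) = a + b * ln(estimate) by ordinary least squares.

    CV is the coefficient of relative variation, std_error / estimate.
    Returns the intercept a and the slope b.
    """
    xs = [math.log(y) for y in estimates]
    ys = [math.log(se / y) for y, se in zip(estimates, std_errors)]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b

def predict_std_error(estimate, a, b):
    """Approximate standard error implied by the fitted model."""
    cv = math.exp(a + b * math.log(estimate))
    return cv * estimate

# Hypothetical domain estimates and direct calculated standard errors
# (made-up numbers, not survey results).
domain_estimates = [1_000.0, 10_000.0, 100_000.0, 1_000_000.0]
domain_std_errors = [300.0, 1_000.0, 3_200.0, 10_000.0]

a, b = fit_log_cv_model(domain_estimates, domain_std_errors)
approx_se = predict_std_error(50_000.0, a, b)
```

Once a and b are known, a report reader could approximate the standard error of any published estimate from its size alone, which is the practical appeal of publishing the model rather than every individual standard error.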