ZAF_2007_GHS_v01_M
General Household Survey 2007
Name | Country code |
---|---|
South Africa | zaf |
Other Household Survey [hh/oth]
Sample survey data [ssd]
The units of analysis for the General Household Survey 2007 are individuals and households.
v1.2 Edited, anonymised dataset for public distribution
2007
Version 1.0 of the General Household Survey 2007 was acquired from Statistics South Africa in 2008. A new version of this dataset was released in 2010. This version (our version 1.1) was reweighted to reflect (a) the findings of the Community Survey 2007 and new HIV/AIDS and mortality data, and (b) the adjusted provincial boundaries that came into effect in December 2006.
This version (1.2) includes the new weights for the GHS 2002-2008, which were released at the same time as the GHS 2009 (6 May 2010).
From 2005 the "Stratum" variable, indicating rural and urban areas for each province, is no longer included in the GHS dataset.
The scope of the General Household Survey 2007 includes:
Household characteristics: Dwelling type, home ownership, access to water and sanitation facilities, access to services, transport, household assets, land ownership, agricultural production
Individuals' characteristics: demographic characteristics, relationship to household head, marital status, language, education, employment, income, health, disability, access to social services, mortality.
Women's characteristics: fertility
Topic | Vocabulary | URI |
---|---|---|
employment [3.1] | CESSDA | http://www.nesstar.org/rdf/common |
in-job training [3.2] | CESSDA | http://www.nesstar.org/rdf/common |
labour relations/conflict [3.3] | CESSDA | http://www.nesstar.org/rdf/common |
retirement [3.4] | CESSDA | http://www.nesstar.org/rdf/common |
unemployment [3.5] | CESSDA | http://www.nesstar.org/rdf/common |
working conditions [3.6] | CESSDA | http://www.nesstar.org/rdf/common |
LABOUR AND EMPLOYMENT [3] | CESSDA | http://www.nesstar.org/rdf/common |
TRADE, INDUSTRY AND MARKETS [2] | CESSDA | http://www.nesstar.org/rdf/common |
DEMOGRAPHY AND POPULATION [14] | CESSDA | http://www.nesstar.org/rdf/common |
The General Household Survey 2007 has national coverage.
The lowest level of geographic aggregation covered by the General Household Survey 2007 is District Council.
The survey covered all de jure household members (usual residents) of households in the nine provinces of South Africa and residents in workers' hostels. The survey does not cover collective living quarters such as students' hostels, old age homes, hospitals, prisons and military barracks.
Name |
---|
Statistics South Africa |
The sample design for the GHS 2007 was based on a master sample (MS) that was designed during 2003 and used for the first time in 2004. This master sample was developed specifically for household sample surveys that were conducted by Statistics South Africa between 2004 and 2007. These include surveys such as the annual Labour Force Surveys (LFS), General Household Survey (GHS) and the Income and Expenditure Survey (IES).
A multi-stage stratified area probability sample design was used. Stratification was done per province (nine provinces) and according to district council (DC) (53 DCs) within provinces. These stratification variables were mainly chosen to ensure better geographical coverage, and to enable analysts to disaggregate the data at DC level.
The design included two stages of sampling. Firstly PSUs were systematically selected using Probability Proportional to Size (PPS) sampling techniques. During the second stage of sampling, Dwelling Units (DUs) were systematically selected as Secondary Sampling Units (SSUs). Census Enumeration Areas (EAs) as delineated for Census 2001 formed the basis of the PSUs. EAs were pooled when needed to form PSUs of adequate size (72 dwelling units or more) for the first stage of sampling. The following criteria were used for PSU formation:
• No overlapping between any two PSUs;
• Complete coverage of the sampling population;
• Fully identifiable (e.g. in the case of a household survey, information on the geographical boundaries of the PSU should enable the exact location of the PSU);
• Secondary sampling units (SSUs) must be clearly identifiable within PSUs;
• Updated information on the number of SSUs within all the PSUs had to be available;
• PSUs must be sufficiently large in respect of the number of SSUs included to enable the forming of a predetermined number of clusters of SSUs, with the size of a cluster equal to the sample take of SSUs within a PSU, taking all types of surveys into consideration; and
• PSUs must also be sufficiently small to facilitate the listing and also regular updating of the SSUs within them.
A PPS sample of PSUs was drawn in each stratum, with the measure of size being the number of households in the PSU. Altogether, approximately 3 000 PSUs were selected. In each selected PSU a systematic sample of ten dwelling units was drawn, resulting in approximately 30 000 dwelling units. All households in the sampled dwelling units were enumerated.
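The two selection stages described above can be sketched in Python as follows. This is a minimal illustration only, not Statistics South Africa's actual procedure: the function names, the PSU household counts and the random seed are hypothetical, and the sketch assumes that no PSU is larger than the sampling interval (otherwise the same PSU could be selected more than once).

```python
import random
from itertools import accumulate


def pps_systematic_sample(sizes, n, rng):
    """Pick n unit indices with probability proportional to size,
    stepping at a fixed interval along the cumulative size scale."""
    total = sum(sizes)
    interval = total / n
    start = rng.random() * interval          # random start in the first interval
    points = [start + k * interval for k in range(n)]
    cumulative = list(accumulate(sizes))
    selected = []
    idx = 0
    for p in points:
        # Advance to the first unit whose cumulative size exceeds the point.
        while idx < len(cumulative) - 1 and cumulative[idx] <= p:
            idx += 1
        selected.append(idx)
    return selected


def systematic_sample(n_units, take, rng):
    """Pick `take` positions from a list of n_units using a fixed skip."""
    interval = n_units / take
    start = rng.random() * interval
    return [min(int(start + k * interval), n_units - 1) for k in range(take)]


rng = random.Random(2007)                    # hypothetical seed, for repeatability
# Hypothetical household counts for eight PSUs in one stratum.
psu_sizes = [80, 120, 95, 150, 72, 110, 130, 90]
chosen_psus = pps_systematic_sample(psu_sizes, 3, rng)
# Second stage: ten dwelling units from a hypothetical listing of 85.
chosen_dus = systematic_sample(85, 10, rng)
print(chosen_psus, chosen_dus)
```

Larger PSUs occupy a longer stretch of the cumulative scale, so a selection point is more likely to land in them, which is what makes the first stage proportional to size.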
Of the expected 34 902 interviews, 29 311 (84,0%) were successfully completed. This response rate is 2,0 percentage points lower than the 86,0% response rate reported in the GHS 2006 report. It was not possible to complete interviews in 5,1% of the sampled dwelling units for reasons such as refusals or absenteeism. An additional 10,9% of all interviews were not conducted for various other reasons, for example because the sampled dwelling units had become vacant or had changed status (e.g. they were used as shops or small businesses at the time of the enumeration, but were originally listed as dwelling units).
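The response figures quoted above can be checked with a few lines of Python. The counts and percentages come from the text; the variable names are illustrative, and decimal points are used in place of the decimal commas of the source.

```python
expected_interviews = 34902
completed_interviews = 29311

# Response rate as a percentage, rounded to one decimal place.
response_rate = round(100 * completed_interviews / expected_interviews, 1)

# Change relative to the 86.0% reported for the GHS 2006, in percentage points.
change_from_2006 = round(response_rate - 86.0, 1)

# The two non-response categories quoted in the text.
non_response_total = round(5.1 + 10.9, 1)
accounted_for = round(response_rate + non_response_total, 1)

print(response_rate, change_from_2006, accounted_for)
```

The completed interviews and the two non-response categories together account for all of the expected interviews.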
A two-stage theoretical weighting procedure was applied to the GHS 2007. In the first stage, primary sampling units (PSUs) were selected with probability proportional to size (PPS) from the census population.
Because there were undercounts in some PSUs (because households could not be traced or because of refusals to answer), the weight of each such PSU was adjusted upwards by a factor equal to the number of households which should have been interviewed divided by the number of households actually reached. Then all household weights were adjusted upwards by a further factor equal to the estimated population at the time of the GHS 2007 survey divided by the 1996 Census population estimate, to account for population growth between the 1996 Census (from which the master sample was drawn) and the date of the survey. These doubly adjusted weights are reported as the household weights in the data set. The person weights are derived by further adjusting the household weights in order to reproduce the marginal totals of the estimated population at the time of the 2007 GHS by gender, population group, province and age group. A SAS macro called CALMAR was used for this purpose.
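The adjustments described above can be sketched as follows. This is a hedged illustration, not the CALMAR macro itself: the function names and all figures are hypothetical, and the margin adjustment is shown as simple raking (iterative proportional fitting), which is one way of reproducing marginal totals. The source does not say which calibration method was selected in CALMAR.

```python
def adjusted_household_weight(design_weight, n_expected, n_reached,
                              pop_now, pop_census):
    """Apply the non-response factor and the population-growth factor."""
    return design_weight * (n_expected / n_reached) * (pop_now / pop_census)


def rake(records, weights, margins, iterations=50):
    """Scale weights repeatedly so weighted totals match each margin.

    records: list of dicts, one per person
    margins: {variable: {category: target_total}}
    """
    w = list(weights)
    for _ in range(iterations):
        for var, targets in margins.items():
            # Current weighted total for each category of this variable.
            totals = {}
            for rec, wi in zip(records, w):
                totals[rec[var]] = totals.get(rec[var], 0.0) + wi
            # Scale every weight by target / current total for its category.
            w = [wi * targets[rec[var]] / totals[rec[var]]
                 for rec, wi in zip(records, w)]
    return w


# Hypothetical PSU: 10 households expected, 8 reached, 20% population growth.
hh_weight = adjusted_household_weight(100.0, 10, 8, 48.0, 40.0)

# Hypothetical persons and margins (only two variables, for brevity).
persons = [
    {"sex": "F", "province": "A"},
    {"sex": "F", "province": "B"},
    {"sex": "M", "province": "A"},
    {"sex": "M", "province": "B"},
]
margins = {"sex": {"F": 60.0, "M": 40.0},
           "province": {"A": 55.0, "B": 45.0}}
person_weights = rake(persons, [25.0] * 4, margins)
print(hh_weight, person_weights)
```

After raking, the weighted totals by sex and by province both match the target margins, which is the property the person weights are described as having.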
The population estimate was derived by a 'bottom up' (cohort-by-cohort) exponential extrapolation from the 1996 and 2001 censuses. Such an estimate is quite reliable for the total population and the gender, population group and provincial subtotals. It is less reliable for the age distribution. Improved population estimates will become available when Statistics South Africa completes its short-term population projection model. The weights in this and other surveys may be modified in the light of model estimates.
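As a rough illustration of exponential extrapolation from two census counts, the sketch below fits a constant growth rate between 1996 and 2001 for a single group and projects it forward. The function name and the counts are hypothetical, census dates are simplified to whole years, and the sketch treats each group independently, which may differ from the cohort-by-cohort procedure actually used.

```python
import math


def extrapolate(count_1996, count_2001, year):
    """Project a group's size to `year`, assuming constant exponential growth."""
    rate = math.log(count_2001 / count_1996) / (2001 - 1996)
    return count_2001 * math.exp(rate * (year - 2001))


# Hypothetical group that grew by 10% between the two censuses.
projected_2006 = extrapolate(1000.0, 1100.0, 2006)
projected_2007 = extrapolate(1000.0, 1100.0, 2007)
print(round(projected_2006), round(projected_2007))
```

Summing such projections over all groups gives a total built from the bottom up, which is consistent with the text's remark that the totals are more reliable than the age distribution.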
Data revisions
In May 2010 Statistics SA revised the population model to produce mid-year population estimates during 2008 in the light of the findings of the Community Survey 2007 and new HIV/AIDS and mortality data. The new data have been used to adjust the benchmarking for all previous datasets. Weighting and benchmarking were also adjusted for the provincial boundaries that came into effect in December 2006. The new weights mean that the data for the GHS 2002 to GHS 2009 are now comparable. The General Household Survey 2007 data files (version 1.2) contain the new weights.
Because the new statistical programs used for weighting discard records with unspecified values for the benchmarking variables (age, sex and population group), it became necessary to impute missing values for these variables. A combination of logical and hot deck imputation methods was used to impute the demographic variables of the whole GHS series from 2002-2009.
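A minimal sketch of hot deck imputation is given below. The source does not say which variant was used, so this assumes a sequential hot deck within imputation classes: a missing value is filled with the most recent observed value from a record in the same class. The function name, field names and records are hypothetical.

```python
def hot_deck_impute(records, target, class_vars):
    """Fill missing `target` values from the last donor in the same class."""
    last_donor = {}
    imputed = []
    for rec in records:
        rec = dict(rec)                       # leave the input untouched
        key = tuple(rec[v] for v in class_vars)
        if rec[target] is None:
            if key in last_donor:
                rec[target] = last_donor[key]  # borrow the donor's value
        else:
            last_donor[key] = rec[target]      # this record becomes the donor
        imputed.append(rec)
    return imputed


people = [
    {"province": "WC", "sex": "F", "age_group": "25-34"},
    {"province": "WC", "sex": "F", "age_group": None},
    {"province": "GP", "sex": "M", "age_group": None},
    {"province": "GP", "sex": "M", "age_group": "35-44"},
]
result = hot_deck_impute(people, "age_group", ["province", "sex"])
print([p["age_group"] for p in result])
```

In this sketch a record with no earlier donor in its class stays missing. A production procedure would need a fallback for such cases, for example a logical rule or a wider imputation class.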
A new weighting system was also introduced for the household files as part of the revision process. This was based on household estimates that were developed using the headship ratio methodology. The databases of Census 1996, Census 2001, Community Survey 2007, Labour Force Survey 2003, Labour Force Survey 2005 and the Quarterly Labour Force Survey (Quarter 3) of 2009 were used to analyse trends and develop models to predict the number of households for each year. The weighting system was based on tables for the expected distribution of household heads for specific age categories, per population group and province.
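In general terms, the headship ratio approach estimates the number of households as the population in each group multiplied by the proportion of that group who head a household, summed over groups. The sketch below illustrates that idea only. The age categories, population counts and headship ratios are hypothetical and are not taken from the Statistics South Africa models.

```python
def estimate_households(population, headship_ratio):
    """Sum of (people in group) x (share of the group who head a household)."""
    return sum(population[g] * headship_ratio[g] for g in population)


# Hypothetical figures for one population group in one province.
population = {"15-34": 1000, "35-64": 800, "65+": 200}
headship_ratio = {"15-34": 0.25, "35-64": 0.5, "65+": 0.6}

households = estimate_households(population, headship_ratio)
print(households)
```

Because each household has one head, the estimated number of heads in each category also gives the expected distribution of household heads to which the household weights can be benchmarked.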
Missing values and unknown values were excluded from totals used as denominators for the calculation of percentages, unless otherwise specified. Frequency values have been rounded off to the nearest thousand. Population totals in all tables reflect the population and sub-populations as calculated with SAS and rounded off. These totals will not always correspond exactly with the sum of the preceding rows, because all numbers are rounded off to the nearest thousand.
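A small example shows why a rounded total need not equal the sum of the rounded rows. The row values are hypothetical.

```python
rows = [1400, 1400, 1300]                     # unrounded frequencies
total = sum(rows)                             # 4100

rounded_rows = [round(v, -3) for v in rows]   # each row to the nearest thousand
rounded_total = round(total, -3)              # total rounded separately

print(rounded_rows, sum(rounded_rows), rounded_total)
```

Here each row rounds down to 1 000, so the rounded rows sum to 3 000, while the separately rounded total is 4 000.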
The GHS 2007 questionnaire collected data on:
Household characteristics: Dwelling type, home ownership, access to water and sanitation facilities, access to services, transport, household assets, land ownership, agricultural production
Individuals' characteristics: demographic characteristics, relationship to household head, marital status, language, education, employment, income, health, disability, access to social services, mortality.
Women's characteristics: fertility
Start | End |
---|---|
2007 | 2007 |
Estimation and use of standard error
The published results of the General Household Survey are based on representative probability samples drawn from the South African population, as discussed in the section on sample design. Consequently, all estimates are subject to sampling variability. This means that the sample estimates may differ from the population figures that would have been produced if the entire South African population had been included in the survey. The measure usually used to indicate the probable difference between a sample estimate and the corresponding population figure is the standard error (SE), which measures the extent to which an estimate might have varied by chance because only a sample of the population was included.
There are two major factors which influence the value of a standard error. The first factor is the sample size. Generally speaking, the larger the sample size, the more precise the estimate and the smaller the standard error. Consequently, in a national household survey such as the GHS, one expects more precise estimates at the national level than at the provincial level due to the larger sample size involved. The second factor is the variability between households in the population parameter being estimated, for example, the number of unemployed persons in the household.
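The effect of sample size on the standard error can be illustrated with the textbook formula for a proportion under simple random sampling. This is a deliberate simplification: the GHS uses a stratified, clustered design, so standard errors for the actual data should be computed with design-based methods. The proportion and the sample sizes below are hypothetical.

```python
import math


def se_proportion(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


p = 0.25                                      # hypothetical proportion
se_national = se_proportion(p, 30000)         # hypothetical national sample size
se_provincial = se_proportion(p, 3000)        # hypothetical provincial sample size

print(se_national, se_provincial, se_provincial / se_national)
```

With one tenth of the sample size, the standard error is larger by a factor of the square root of ten, which is why provincial estimates are less precise than national ones.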
Name | Affiliation | URL | Email |
---|---|---|---|
DataFirst | University of Cape Town | http://www.datafirst.uct.ac.za | info@data1st.org |
The GHS 2007 dataset is a licensed dataset, accessible under conditions.
Publications based on datasets distributed by DataFirst should acknowledge relevant sources by means of bibliographic citations. To ensure that such source attributions are captured for social science bibliographic utilities, citations must appear in footnotes or in the reference section of publications. The bibliographic citation for this dataset is:
General Household Survey 2007 [microdata files]. Pretoria: Statistics South Africa [producer], 2010. Cape Town: DataFirst [distributor], 2011.
The information products and services of Statistics South Africa are protected in terms of the Copyright Act, 1978 (Act 98 of 1978). As the State President is the holder of State copyright, all organs of State enjoy unhindered use of the Department's information products and services, without a need for further permission to copy in terms of that copyright. Where a copy of the information is made available to any third party outside the State, the third party must be made aware of the existence of State copyright and ownership of the information by the State. The State (through Statistics SA) retains the full ownership of its information, products and services at all times; access to information does not give ownership of the information to the client.
The use of any data is subject to acknowledgement of Stats SA as the supplier and owner of copyright. Statistics South Africa (Stats SA) will not be liable for any damages or losses, except to the extent that such losses or damages are attributable to a breach by Stats SA of its obligations in terms of an existing agreement or to the negligence or wilful act or omissions of Stats SA, its servants or agents, arising out of the supply of data and/or digital products in terms of that agreement. The user indemnifies Stats SA against any claims of whatsoever nature (including legal costs) by third parties arising from the reformatting, restructuring, reprocessing and/or addition of the data by the user.
Copyright 2008, Statistics South Africa
Name | Affiliation | Email | URL |
---|---|---|---|
Manager, DataFirst | University of Cape Town | info@data1st.org | http://www.datafirst.uct.ac.za |
DDI_ZAF_2007_GHS_v01_M
Name | Affiliation | Role |
---|---|---|
DataFirst | University of Cape Town | DDI Producer |
2011-10-11
Version 1.1
Version 1.2 - Adapted for use by the World Bank Microdata Library - changed study ID to match Microdata Library Standard