The Community Survey is a nationally representative, large-scale household survey designed to provide information on the prevalence of poor households in South Africa, their access to services and levels of unemployment, at national, provincial and municipal levels. The main objectives of the survey are:
1. To fill data gaps between national population and housing censuses
2. To provide estimates at lower geographical levels than existing household surveys
3. To build capacities for the next census round
4. To provide inputs to the mid-year population projections.
Kind of data
Sample survey data [ssd]
v1: Edited, anonymised dataset for public distribution.
morbidity and mortality [14.4]
social conditions and indicators [13.8]
economic conditions and indicators [1.2]
health care and medical treatment [8.5]
specific social services: use and provision [15.3]
The survey covered the whole of South Africa.
The lowest level of geographic aggregation of the data is local municipality.
Unit of analysis
The Community Survey covered all de jure household members (usual residents) in South Africa. The survey excluded collective living quarters (institutions) and some households in EAs classified as recreational areas or institutions.
Producers and sponsors
Statistics South Africa
The sampling procedure adopted for the CS was a two-stage stratified random sampling process. Stage one involved the selection of enumeration areas (EAs), and stage two the selection of dwelling units (DUs). Since the data are required for each local municipality, each municipality was treated as an explicit stratum. Stratification was done for the municipalities classified as category B municipalities (local municipalities) and category A municipalities (metropolitan areas) as proclaimed at the time of Census 2001. However, the newly proclaimed boundaries, as well as any higher level of geography such as province or district municipality, were treated as domain variables, linked to the data through the smallest geographic unit, the enumeration area.
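The two-stage design described above can be illustrated with a short sketch: municipalities act as explicit strata, EAs are drawn within each stratum, and dwelling units are then drawn within each selected EA. This is a hypothetical illustration, not Stats SA's selection procedure. It assumes simple random sampling at both stages and fixed sample sizes per stratum and per EA; the function name `two_stage_sample` and the frame layout are invented for the example.

```python
import random


def two_stage_sample(frame, eas_per_stratum, dus_per_ea, seed=0):
    """Hypothetical two-stage stratified sample.

    frame: dict mapping municipality (stratum) -> dict mapping EA id -> list of DU ids.
    Returns: dict mapping municipality -> dict mapping selected EA -> list of selected DUs.
    Assumes simple random sampling without replacement at both stages.
    """
    rng = random.Random(seed)  # seeded so the draw is reproducible
    sample = {}
    for municipality, eas in sorted(frame.items()):
        # Stage 1: draw EAs independently within each municipality (explicit stratum)
        ea_ids = sorted(eas)
        chosen_eas = rng.sample(ea_ids, min(eas_per_stratum, len(ea_ids)))
        sample[municipality] = {}
        for ea in chosen_eas:
            # Stage 2: draw dwelling units within each selected EA
            dus = eas[ea]
            sample[municipality][ea] = rng.sample(dus, min(dus_per_ea, len(dus)))
    return sample
```

Treating each municipality as its own stratum guarantees that every municipality receives sampled EAs, which is what makes direct municipal-level estimates possible.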
The main goal of CS 2016 was to produce estimates of key indicators at local municipality level, so the sample was designed to allow direct survey estimates of these indicators at that level. The weighting approach follows from the sample design; details are given in the technical report.
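As a rough illustration of how weights follow from a two-stage design, the sketch below computes a base (design) weight as the inverse of a dwelling unit's inclusion probability and uses it in a Horvitz-Thompson style weighted total. It assumes simple random sampling of n_h of N_h EAs in stratum h and m_hi of M_hi dwelling units in EA i. The actual CS 2016 weighting method is the one documented in the technical report and may include adjustments not shown here; the function names are hypothetical.

```python
from fractions import Fraction


def design_weight(N_h, n_h, M_hi, m_hi):
    """Base weight of a DU under hypothetical SRS at both stages.

    Inclusion probability = (n_h / N_h) * (m_hi / M_hi); the weight is its inverse.
    Fractions keep the arithmetic exact.
    """
    p_ea = Fraction(n_h, N_h)    # probability that EA i is selected in stratum h
    p_du = Fraction(m_hi, M_hi)  # probability that a DU is selected within EA i
    return 1 / (p_ea * p_du)


def weighted_total(records):
    """Estimate a population total from (weight, value) pairs for sampled DUs."""
    return sum(w * y for w, y in records)


# Example: 2 of 10 EAs selected, then 10 of 30 DUs in the EA -> each DU represents 15 DUs.
w = design_weight(N_h=10, n_h=2, M_hi=30, m_hi=10)
estimate = weighted_total([(w, 1), (w, 0), (w, 1)])
```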
Dates of collection
Mode of data collection
The CS 2016 questionnaire consisted of six main sections, 11 sub-sections and a total of 225 questions. A first draft of the paper questionnaire was developed in February 2015, and various versions were reviewed and updated thereafter based on discussions with stakeholders. The target population of the survey was all persons in the sampled dwelling who were present on the reference night (i.e. the night between 6 and 7 March 2016). The final CAPI questionnaire contained three person rosters: one for person information, one for emigration and one for mortality.
Data processing refers to a class of programmes that organise and manipulate large amounts of data, usually numeric. For the CS, it involved processing the completed questionnaires: information collected during fieldwork was converted into data represented by numbers or characters. Two methods were used for this conversion: scanning and manual capturing (key-entry). Scanning was the main method, and the key-entry application was used for questionnaires that were damaged and could not be scanned.
In general, the high-level processes covered the following activities:
Boxes were received and questionnaires were checked to ensure that:
1) they belonged to the box; and
2) they were not damaged.
Data were then captured and converted into electronic format through scanning or Key-from-Paper (KFP). Thereafter, an account of all sampled dwelling units was prepared, and the data were balanced to verify that the data collected for each household contained all four sections: General, Persons, Mortality and Household.
Data were then checked for consistency and prepared for final output based on the tabulation plan.
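The balancing step, which accounts for every sampled dwelling unit and checks that each captured household has all four sections, could be expressed roughly as follows. This is a hypothetical sketch: the input layout (a mapping from DU identifier to the set of section names captured) and the function name `balance_check` are assumptions, not part of the Stats SA processing system.

```python
# The four sections each household record is expected to contain
REQUIRED_SECTIONS = {"General", "Persons", "Mortality", "Household"}


def balance_check(captured, sampled_dus):
    """Hypothetical balancing check.

    captured: dict mapping DU id -> set of section names captured for that household.
    sampled_dus: iterable of all sampled DU ids.
    Returns (missing, incomplete):
      missing    - sorted sampled DUs with no captured data at all
      incomplete - dict mapping DU id -> sorted list of missing section names
    """
    missing = sorted(du for du in sampled_dus if du not in captured)
    incomplete = {
        du: sorted(REQUIRED_SECTIONS - sections)
        for du, sections in captured.items()
        if not REQUIRED_SECTIONS <= sections  # some required section is absent
    }
    return missing, incomplete
```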
Two methods were used for capturing the data, namely scanning and manual capturing (key-entry).
The scanning process proceeded as follows:
The data processor scanned the box number and entered the estimated number of pages in each batch, after which the batches were ready to be scanned. To avoid questionnaires being scanned twice, each of the six Scanning Operators was given one box at a time. The batches were then taken out of the box and placed next to the scanner tray, the box number was scanned with the small hand-held scanner, and the number of pages per batch was entered into the Input Station.
A visual check was performed on the scanned images to ensure that they were free of noise and that the data were clear and readable; both the barcode and the actual data on the questionnaire were checked. Where an image was too light or too dark, the parameters were adjusted and the batch was rescanned. Validations were run automatically to confirm the scanning parameters and image quality. Images were transferred to the server and their barcodes were tracked. Questionnaires that could not be scanned were de-activated from their boxes, assigned to a new box and sent for Key-from-Paper capture.
Manual Capturing (Key-from-Paper)
Key-from-Paper (KFP) is an application for manual data capturing. It was developed to capture questionnaires that were not suitable for scanning, such as those that were torn or otherwise in poor condition, or where pencil entries were too faint for the scanner to interpret. For quality assurance, a duplicate application was created: each questionnaire was captured independently in both applications, and the two captures were compared field by field. No validation checks were implemented in the applications; data processors captured the information exactly as it appeared on the questionnaires. EA and DU numbers were checked against a look-up table to validate them against the sample frame, and where an EA or DU was found to be invalid, the EA Summary Book was used to correct it.
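The double-capture comparison and the EA/DU look-up described above can be sketched as two small functions. This is a hypothetical illustration: the data layouts and the names `compare_captures` and `invalid_ids` are assumptions, and the source does not say how discrepancies between the two captures were resolved.

```python
def compare_captures(capture_one, capture_two):
    """Compare two independent captures of the same questionnaires, field by field.

    Each argument maps questionnaire id -> dict of field name -> captured value.
    Returns a sorted list of (questionnaire_id, field, value_one, value_two) discrepancies.
    A field missing from one capture is reported with the value None.
    """
    discrepancies = []
    for qid in sorted(set(capture_one) | set(capture_two)):
        fields_one = capture_one.get(qid, {})
        fields_two = capture_two.get(qid, {})
        for field in sorted(set(fields_one) | set(fields_two)):
            v1, v2 = fields_one.get(field), fields_two.get(field)
            if v1 != v2:
                discrepancies.append((qid, field, v1, v2))
    return discrepancies


def invalid_ids(questionnaires, sample_lookup):
    """Return questionnaire ids whose (EA, DU) pair is not in the sample look-up table.

    questionnaires: dict mapping questionnaire id -> (EA number, DU number).
    sample_lookup: set of valid (EA number, DU number) pairs from the sample frame.
    """
    return sorted(
        qid for qid, (ea, du) in questionnaires.items()
        if (ea, du) not in sample_lookup
    )
```

Flagged pairs would then be corrected from a reference source, which in the CS was the EA Summary Book.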
Coding of Open-Ended Questions
Coding is the process of assigning numerical values to responses to facilitate data capture and processing. Open-ended responses were coded for three variables: occupation, industry and place names. The code lists for occupation and industry were based on the International Standard Classifications, coded to the five-digit level.
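A minimal sketch of dictionary-based coding, assuming a code list that maps text labels to five-digit codes: responses are normalised (case and whitespace) and looked up exactly, and anything unmatched is set aside for manual coding. The codes in the usage example are hypothetical placeholders, not entries from any real classification, and exact matching is a simplification; production coding systems typically use richer matching rules.

```python
def normalise(text):
    # Lower-case and collapse runs of whitespace so trivial variants match
    return " ".join(text.lower().split())


def build_index(code_list):
    """Index a code list (label -> code), rejecting codes that are not five digits."""
    index = {}
    for label, code in code_list.items():
        if not (len(code) == 5 and code.isdigit()):
            raise ValueError(f"code for {label!r} is not five digits: {code!r}")
        index[normalise(label)] = code
    return index


def code_responses(responses, code_list):
    """Return (coded, unresolved).

    coded maps each matched response to its code; unresolved lists responses
    that need manual coding.
    """
    index = build_index(code_list)
    coded, unresolved = {}, []
    for response in responses:
        code = index.get(normalise(response))
        if code is None:
            unresolved.append(response)
        else:
            coded[response] = code
    return coded, unresolved


# Hypothetical placeholder codes, for illustration only
codes = {"Primary school teacher": "00001", "Bus driver": "00002"}
coded, unresolved = code_responses(["primary  school TEACHER", "Bus driver", "astronaut"], codes)
```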
Other forms of data appraisal
The Community Survey 2016 data were released in 2017 in four data files: households, persons, mortality and emigration. The emigration file is currently not available, and Statistics SA has not explained why. DataFirst is working to obtain this file and will add it to the published dataset once it is received.
The Community Survey 2016 is also missing employment and income data. Data on employment type and employment status were collected with the questions starting at 3.7.6 of the questionnaire, and income data with the questions starting at 3.7.7. According to Statistics SA, the data from these questions were not released because changes in collection methodology meant that they were not comparable with the employment and income data in the Quarterly Labour Force Survey.
Public use files, accessible to all.
Statistics South Africa. Community Survey 2016 [dataset]. Version 1. Pretoria: Statistics South Africa [producer], 2017. Cape Town: DataFirst [distributor], 2017.
Disclaimer and copyrights
The information products and services of Stats SA are protected in terms of the Copyright Act, 1978 (Act 98 of 1978). As the State President is the holder of State copyright, all organs of State enjoy unhindered use of the Department's information products and services, without a need for further permission to copy in terms of that copyright.
Where a copy of the information is made available to any third party outside the State, the third party must be made aware of the existence of State copyright and ownership of the information by the State. The State (through Statistics South Africa) retains the full ownership of its information, products and services at all times. Access to information does not give ownership of the information to the client. The use of any data is subject to acknowledgement of Statistics South Africa as the supplier and owner of copyright.
Statistics South Africa (Stats SA) will not be liable for any damages or losses, except to the extent that such losses or damages are attributable to a breach by Stats SA of its obligations in terms of an existing agreement or to the negligence or wilful act or omissions of Stats SA, its servants or agents, arising out of the supply of data and/or digital products in terms of that agreement. The user indemnifies Stats SA against any claims of whatsoever nature (including legal costs) by third parties arising from the reformatting, restructuring, reprocessing and/or addition of the data by the user.
There have been demographic changes in South Africa associated, inter alia, with internal and external migration and population growth. This means that population profiles may have changed at different geographic levels. Stats SA is not responsible for any damages or losses, arising directly or consequentially, which might result from the application or use of these data.
University of Cape Town
Version 2: Identical to DataFirst zaf-statssa-cs-2016-v1, with ID and DDI fields edited.