The Human Sciences Research Council (HSRC) carried out the Migration and Remittances Survey in South Africa for the World Bank in collaboration with the African Development Bank. The primary mandate of the HSRC in this project was to come up with a migration database that includes both immigrants and emigrants. The specific activities included:
· A household survey with a view to producing a detailed demographic/economic database of immigrants, emigrants and non-migrants
· The collation and preparation of a data set based on the survey
· The production of basic primary statistics for the analysis of migration and remittance behaviour in South Africa.
Like many other African countries, South Africa lacks reliable census or other data on migrants (immigrants and emigrants), and on the flows of resources that accompany the movement of people. This is so because a large proportion of African immigrants are in the country undocumented. A special effort was therefore made to design a household survey that would cover sufficient numbers and proportions of immigrants, and still conform to the principles of probability sampling. The approach that was followed gives a representative picture of migration in two provinces, Limpopo and Gauteng, which should be reflective of migration behaviour and its impacts in South Africa.
Kind of data
Sample survey data [ssd]
Two provinces: Gauteng and Limpopo
Limpopo is the main corridor for migration from African countries to the north of South Africa while Gauteng is the main port of entry as it has the largest airport in Africa. Gauteng is a destination for internal and international migrants because it has three large metropolitan cities with a great economic potential and reputation for offering employment, accommodations and access to many different opportunities within a distance of 56 km. These two provinces therefore were expected to accommodate most African migrants in South Africa, co-existing with a large host population.
Unit of analysis
The target group consists of households in all communities. The survey will be conducted among metro and non-metro households. Non-metro households include those in:
- small towns,
- secondary cities,
- peri-urban settlements and
- deep rural areas.
From each selected household, one adult respondent will be selected to participate in the study.
Producers and sponsors
Human Sciences Research Council (HSRC)
African Development Bank
University of North Carolina
Nelson Mandela Metropolitan University
Community Agency for Social Enquiry (CASE)
Migration data for South Africa are available for 2007 only at the level of local governments or municipalities, from the 2007 Community Survey (CS 2007); for smaller areas called "sub places" (SPs), only from the 2001 Census; and for the desired Enumeration Areas (EAs), only from the Census of 1996. In sum, no single source provided recent data on the five types of migrants of principal interest at the level of the EA. Data were needed at that level to draw the sample, since migrant and non-migrant households had to be identified in the sample areas in order to oversample those with migrants for interview.
In an attempt to overcome the data limitations referred to above, it was necessary to adopt a novel approach to the design of the sample for the World Bank's household migration survey in South Africa, to identify EAs with a high probability of finding immigrants and those with a low probability. This required the combined use of the three sources of data described above.
The starting point was the CS 2007 survey, which provided data on migration at a local government level, classifying each local government cluster in terms of migration level, taking into account the types of migrants identified. The researchers then spatially zoomed in from these clusters to the so-called sub-places (SPs) from the 2001 Census to classify SP clusters by migration level. Finally, the 1996 Census data on migration levels of various types were used to zoom in even further, down to the EA level, to identify the final level of clusters for the survey, namely the spatially small EAs (each typically containing about 200 households, and hence amenable to the listing operation in the field).
A higher score or weight was attached to the 2007 Community Survey municipality-level (MN) data than to the Census 2001 sub-place (SP) data, which in turn was given a greater weight than the 1996 enumerator area (EA) data. The latter was derived exclusively from the Census 1996 EA data, but was then reallocated to the 2001 EAs in proportion to geographical size. Although these weights are purely arbitrary, since they were composed from different sources, they give an indication of the relative importance attached to the different migrant categories. These weighted migrant proportions (secondary strata) therefore constituted the second level of clusters for sampling purposes.
In addition, a system of weighting or scoring the different persons by migrant type was applied to ensure that the likelihood of finding migrants would be optimised. As part of this procedure, recent migrants (who had migrated in the preceding five years) received a higher score than lifetime migrants (who had not migrated during the preceding five years). Similarly, a higher score was attached to international immigrants (both recent and lifetime, who had come to SA from abroad) than to internal migrants (who had only moved within SA's borders). A greater weight also applied to inter-provincial (internal) than to intra-provincial migrants (who only moved within the same South African province).
How the three data sources were combined to provide overall scores for each EA can be briefly described. First, in each of the two provinces, all local government units were given migration scores according to the numbers or relative proportions of the population classified in the various categories of migrants (with non-migrants given a score of 1.0). Migrants were assigned higher scores according to their priority, with international migrants given higher scores than internal migrants and recent migrants higher scores than lifetime migrants. Then, within the local governments, sub-places were assigned scores on the basis of inter- vs. intra-provincial migrants using the 2001 census data. Each SP area in a local government was thus assigned a value which was the product of its local government score (the same for all SPs in the local government) and its own SP score. The third and final stage was to develop relative migration scores for all the EAs from the 1996 census by similarly weighting the proportions of migrants (and non-migrants, always assigned 1.0) of each type. The final migration score for an EA is the product of its own EA score from 1996, the score of the SP of which it is a part (assigned to all the EAs within the SP), and the local government score from the 2007 survey.
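The multiplicative combination described above can be illustrated with a minimal Python sketch. The migrant categories and type weights below are hypothetical placeholders, since the actual scores used in the survey are not reproduced in this section. Only the structure follows the text: a weighted sum of population proportions at each level, with non-migrants fixed at 1.0, and a product across the three levels.

```python
# Hypothetical type weights for illustration only; the survey's actual
# scores are not given here. Non-migrants are fixed at 1.0 as in the text.
TYPE_WEIGHTS = {
    "non_migrant": 1.0,
    "lifetime_internal": 2.0,
    "recent_internal": 3.0,
    "lifetime_international": 4.0,
    "recent_international": 5.0,
}


def area_score(proportions, weights=TYPE_WEIGHTS):
    """Weighted sum of population proportions by migrant type for one area."""
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return sum(weights[kind] * share for kind, share in proportions.items())


def final_ea_score(ea_score, sp_score, local_government_score):
    """Product of the EA (1996), SP (2001) and local government (2007) scores."""
    return ea_score * sp_score * local_government_score
```

An area made up entirely of non-migrants scores 1.0 at every level, so its final score is also 1.0; any migrant presence at any level pushes the product above that baseline.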
Based on all the above principles the set of weights or scores was developed.
In sum, we multiplied the proportion of populations of each migrant type, or their incidence, by the appropriate final corresponding EA scores for persons of each type in the EA (based on multiplying the three weights together), to obtain the overall score for each EA. This takes into account the distribution of persons in the EA according to migration status in 1996, the SP score of the EA in 2001, and the local government score (in which the EA is located) from 2007. Finally, all EAs in each province were then classified into quartiles, prior to sampling from the quartiles.
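The quartile classification can be sketched as a simple rank-based split. This is an illustration under assumptions rather than the procedure actually used: the text does not say how ties or uneven group sizes were handled, and the function name and EA identifiers are hypothetical. The function would be applied separately to the EAs of each province.

```python
def assign_quartiles(scores):
    """Map each EA id to a quartile (1 = lowest scores, 4 = highest).

    EAs are ranked by overall score and split into four groups of
    near-equal size. Ties are broken by the stable sort order, which is
    an assumption of this sketch.
    """
    ranked = sorted(scores, key=lambda ea: scores[ea])
    n = len(ranked)
    return {ea: (rank * 4) // n + 1 for rank, ea in enumerate(ranked)}
```

For example, with eight EAs the two lowest-scoring EAs fall into quartile 1 and the two highest-scoring into quartile 4.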
From the EAs so classified, the sampling took the form of selecting EAs, i.e., primary sampling units (PSUs, which in this case are also Ultimate Sampling Units, since this is a single stage sample), according to their classification into quartiles. The proportions selected from each quartile are based on the range of EA-level scores which are assumed to reflect weighted probabilities of finding desired migrants in each EA. To enhance the likelihood of finding migrants, much higher proportions of EAs were selected into the sample from the quartiles with the higher scores compared to the lower scores (disproportionate sampling). The decision on the most appropriate categorisations was informed by the observed migration levels in the two provinces of the study area during 2007, 2001 and 1996, analysed at the lowest spatial level for which migration data was available in each case.
Because of the differences in their characteristics it was decided that the provinces of Gauteng and Limpopo should each be regarded as an explicit stratum for sampling purposes. These two provinces therefore represented the primary explicit strata. It was decided to select an equal number of EAs from these two primary strata.
The migration-level categories referred to above were treated as secondary explicit strata to ensure optimal coverage of each in the sample. The distribution of migration levels was then used to draw EAs in such a way that greater preference could be given to areas with higher proportions of migrants in general, but especially immigrants (note the relative scores assigned to each type of person above). The proportion of EAs selected into the sample from the quartiles draws upon the relative mean weighted migrant scores (referred to as proportions) found below the table, but this is a coincidence and not necessary, as any disproportionate sampling of EAs from the quartiles could be done, since it would be rectified in the weighting at the end for the analysis.
The resultant proportions of migrants then led to the following allocation of sampled EAs, compared with 25 per cent per quartile under an equal distribution: Quartile 1: 5 per cent; Quartile 2: 15 per cent; Quartile 3: 30 per cent; and Quartile 4: 50 per cent.
It was agreed that a sample size of at least 2 000 households would be required to elicit the required information. It was agreed further that only six (6) households would be selected in the final level of clusters, i.e., the PSUs or the EAs, to reduce clustering effects, viz., the possible impact of spatial interdependence of survey responses. This gave a required total of 334 EAs (2 000 / 6 = 333.33) to be selected.
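The arithmetic behind these figures can be sketched as follows. The split of the 334 EAs into 167 per province follows from the equal allocation to the two primary strata, and the quartile shares are those stated above. The rounding rule (largest remainder) is an assumption of this sketch, since the text does not say how fractional EAs were rounded; the function names are hypothetical.

```python
import math


def required_eas(target_households, households_per_ea):
    """Smallest number of EAs that yields at least the target sample size."""
    return math.ceil(target_households / households_per_ea)


def allocate(total, percent_shares):
    """Split `total` units across strata by integer percentage shares.

    Uses largest-remainder rounding (an assumption) so that the counts
    add up to `total` exactly. Shares must sum to 100.
    """
    quotas = {key: total * pct for key, pct in percent_shares.items()}
    counts = {key: quota // 100 for key, quota in quotas.items()}
    leftover = total - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda key: quotas[key] % 100, reverse=True)
    for key in by_remainder[:leftover]:
        counts[key] += 1
    return counts


total_eas = required_eas(2000, 6)      # 334
eas_per_province = total_eas // 2      # equal allocation to the two provinces
shares = {"Q1": 5, "Q2": 15, "Q3": 30, "Q4": 50}
per_quartile = allocate(eas_per_province, shares)
```

Under this rounding rule the counts per province always sum to 167, with half of the sampled EAs drawn from the highest-scoring quartile.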
An explicit, disproportional stratification of provinces (primary strata) and incidence migrant proportions (secondary strata) was therefore used as a basis for the selection of EAs. The disproportionate distribution of these selected EAs was to be rectified afterwards through the use of EA weights during all data analyses.
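One common way to rectify disproportionate sampling is to weight each sampled EA by the inverse of its selection probability within its stratum. The sketch below illustrates that idea only; the exact weighting formula used for this survey is not given in this section, and the frame and sample counts are hypothetical.

```python
def ea_design_weight(eas_in_stratum, eas_sampled):
    """Inverse selection probability for an EA drawn from one stratum."""
    if not 0 < eas_sampled <= eas_in_stratum:
        raise ValueError("sampled count must be between 1 and the stratum size")
    return eas_in_stratum / eas_sampled


# Hypothetical frame: 400 EAs in each quartile of one province, with
# 5/15/30/50 per cent of a 200-EA sample drawn from quartiles 1 to 4.
frame = {1: 400, 2: 400, 3: 400, 4: 400}
sampled = {1: 10, 2: 30, 3: 60, 4: 100}
weights = {q: ea_design_weight(frame[q], sampled[q]) for q in frame}
```

EAs from the under-sampled low-score quartile carry large weights and EAs from the over-sampled high-score quartile carry small ones, so the weighted sample again represents every quartile in proportion to its size in the frame.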
Within each sample EA selected following the procedures above, an approximate listing of dwellings was undertaken by the survey team, and updated maps (showing streets/roads, potentially eligible dwellings, and other easily identifiable features for orientation purposes) were produced.
When there was more than one household at a particular visiting point, only one was randomly selected. In the case of a block of flats, townhouse complex or retirement village, every occupied flat/unit was to be regarded as a potential visiting point when counting the interval. In the case of single-sex workers' hostels, each room or dormitory constituted a visiting point and every occupied bed in a selected room/dormitory represented a (single-person) household.
The sampling process was according to the following plan.
· Enumerator Areas were randomly selected using the approach outlined earlier
· Maps of the selected EAs were obtained from Statistics South Africa (STATS SA),
· For each EA, the fieldwork supervisor/team identified the physical boundaries from the map and ensured that the map and the physical location were congruent,
· The fieldwork supervisor/team counted the number of houses/dwellings within each EA. Call this Nile,
· 20 households per EA were to be visited, so the sampling interval was calculated as Nile/20. For example, if Nile = 200 houses/stands, the sampling interval was 200/20 = 10. This means that every 10th house/stand was visited,
· The supervisor identified a random starting point, such as a school, a shop, a library, or some similar public point. If none could be identified then one dwelling was identified,
· From this randomly selected starting point, every 10th house/dwelling was visited.
· The actual household interviewed was selected following the procedure below:
o The interviewer approached the first household (call that Household #1) and completed the interview irrespective of whether there were migrants in the household,
o Households #2 to #15 were interviewed only if there was at least one international migrant in the household,
o Households #16 to #20 were interviewed irrespective of whether there were migrants in the household,
o If there were migrants in the first six households visited, the interviewer stopped and did not visit any more households. The remaining households were simply noted,
o This meant that at the outset, 20 households were targeted for interview in each EA, but a maximum of six would be interviewed,
o If dwelling unit replacements were required, e.g., if some households refused to be interviewed, then interviewers were to select the next house to the right, followed if necessary by the next house to the left, and so on.
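The field procedure above can be sketched in Python as follows. This is an interpretation, not the survey's own tooling: the function names are hypothetical, dwellings are assumed to be numbered along a walking route, and the stopping rule is read as "stop once six interviews have been completed". Replacements for refusals are not modelled.

```python
def sampling_interval(dwelling_count, visits_per_ea=20):
    """Interval between visited dwellings, e.g. 200 dwellings / 20 visits = 10."""
    return max(1, dwelling_count // visits_per_ea)


def visiting_points(dwelling_count, start, visits_per_ea=20):
    """Positions (0-based, along the walking route) of the dwellings to visit."""
    visits = min(visits_per_ea, dwelling_count)
    step = sampling_interval(dwelling_count, visits_per_ea)
    return [(start + i * step) % dwelling_count for i in range(visits)]


def select_interviews(has_international_migrant, max_interviews=6):
    """Return the 1-based numbers of the visited households that are interviewed.

    Households #1 and #16 to #20 are interviewed regardless of migration
    status; households #2 to #15 only if they contain at least one
    international migrant. Visiting stops once `max_interviews` is reached.
    """
    interviewed = []
    for number, has_migrant in enumerate(has_international_migrant, start=1):
        if len(interviewed) >= max_interviews:
            break
        unconditional = number == 1 or number >= 16
        if unconditional or has_migrant:
            interviewed.append(number)
    return interviewed
```

Under this reading, an EA with no international migrants among the 20 visited households still yields six interviews (Households #1 and #16 to #20), while an EA with migrants in every household is completed after the first six visits.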
In addition, fieldworkers had to fill in a recording sheet. The purpose of the recording sheet was to make sure that fieldwork teams recorded all the households they visited, recording addresses as well as the status of each household, i.e. whether the household had an international migrant or not.
Deviations from sample design
Only 1 EA substitution was necessary during data collection based on refusals. Three other EAs that did not have dwellings were replaced. The EAs involved and the reasons for substitution were:
1. EA 77409214 in Ormonde (affluent area) was replaced by a similar, nearby EA because most potential respondents refused to participate, denying access.
2. EA 77409008 in Anchorville was substituted because it is an industrial area.
3. EA 91400011 and EA 91400013 in Swartklip SP were substituted because they are in a mining area and no people reside within these two EAs.
Fieldwork teams did not encounter major challenges in accessing sample areas and households. However, there were a number of refusals by households, especially in highly affluent areas in both Gauteng and Limpopo provinces. Lack of interest in the topic was the most common reason given by sample respondents for refusing to participate in the study. To minimize this, fieldwork teams reminded potential respondents of the importance of the study and of their participation before recording those cases as refusals.
Dates of collection
Mode of data collection
Data collection supervision
All fieldworkers in the project were directly supervised throughout the fieldwork, i.e., supervisors were in the field at all times during data collection. Completed questionnaires were checked by supervisors immediately after the interview, both for whether all the relevant questions were answered/coded and for consistency. Supervisors also conducted random callbacks on the completed questionnaires:
· checking if the interview had actually taken place
· checking whether the interview was conducted with the respondent recorded on the questionnaire
· checking whether the people listed on the household grid were correctly identified as members or non-members of that household
· verifying whether the migration status of household members was correctly listed on the questionnaire
The Fieldwork Coordinator was responsible for quality control during the data collection phase of the project. This included checking a percentage of questionnaires from every fieldwork team, and also conducting some callbacks. The Fieldwork Coordinator checked for data consistency and whether routine instructions were followed in the administering of the questionnaire. The callbacks on completed interviews were to verify whether the interview was conducted with the recorded respondent and whether the instructions in randomly selecting the household and the respondent were followed. Very few errors were found.
The questionnaire was jointly designed, in English, by the World Bank and the Human Sciences Research Council.
It was tested in the field and adjusted accordingly.
The questionnaire was translated into four local languages for those areas where English would have been undesirable or inappropriate.
Data entry was carried out while interviews were continuing in the field, but proceeded slowly so that most was done after the fieldwork had been completed. A double data entry procedure was used in which a questionnaire was entered twice by different persons to improve accuracy.
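A double data entry check of this kind can be sketched as a field-by-field comparison of the two captures of the same questionnaire. The record layout and field names below are hypothetical, and the sketch is not the data entry software actually used.

```python
_MISSING = object()  # marks a field that is absent from one of the two entries


def entry_mismatches(first_entry, second_entry):
    """Return the sorted field names on which two entries of a questionnaire differ."""
    fields = sorted(set(first_entry) | set(second_entry))
    return [
        field
        for field in fields
        if first_entry.get(field, _MISSING) != second_entry.get(field, _MISSING)
    ]


# Hypothetical example: the second capturer mistyped the household size.
first = {"ea_code": "77409214", "q1_1_household_size": 5, "q1_2_migrant": "yes"}
second = {"ea_code": "77409214", "q1_1_household_size": 6, "q1_2_migrant": "yes"}
fields_to_recheck = entry_mismatches(first, second)
```

Any field returned by the comparison would be checked against the paper questionnaire before the record is accepted.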
Challenges encountered included:
o Many questions in the questionnaire allowed multiple responses but were not marked as such;
o Some questions included two parts but were not marked as such; and
o Some households had more members than the space allocated.
Thus, data processors had to alter the data entry template to accommodate these situations, causing delays in data processing. For question 1.2, for example, the data entry template was initially designed for nine household members, but was later adapted to allow for more than nine.
Use of the dataset must be acknowledged using a citation which would include:
- the identification of the Primary Investigator
- the title of the survey (including acronym and year of implementation)
- the survey reference number
- the source and date of download
Disclaimer and copyrights
The user of the data acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.