The Pakistan Integrated Household Survey (PIHS) was conducted jointly by the Federal Bureau of Statistics (FBS), Government of Pakistan, and the World Bank. The survey was part of the Living Standards Measurement Study (LSMS) household surveys that have been conducted in a number of developing countries with the assistance of the World Bank. The purpose of these surveys is to provide policy makers and researchers with individual, household, and community level data needed to analyze the impact of policy initiatives on living standards of households.
The Pakistan Integrated Household Survey was carried out in 1991. This nationwide survey gathered individual and household level data using a multi-purpose household questionnaire. Topics covered included housing conditions, education, health, employment characteristics, self-employment activities, consumption, migration, fertility, credit and savings, and household energy consumption. Community level and price data were also collected during the course of the survey.
Kind of Data
Sample survey data [ssd]
Unit of Analysis
1: HOUSEHOLD INFORMATION
- Part A: Household roster
- Part B: Information on parents
2: HOUSING:
- Part A: Type of dwelling
- Part B: Housing expenses
- Part C: Utilities and amenities
3: EDUCATION:
- Part A: Literacy and training
- Part B: Formal education
- Part C: Interruption of education
- Part D: Vocational and technical training
4: HEALTH:
- Part A: Diarrhea
- Part B: Immunizations
- Part C: Other illnesses
5: WAGE EMPLOYMENT:
- Part A: Employment in agriculture
- Part B: Employment outside agriculture
- Part C: Pension, social security, and unemployment
- Part D: Overseas employment
6: FAMILY LABOR:
- Part A: Family labor inputs on own-farm or land rented in/sharecropped
- Part B: Non-farm self-employment
- Part C: Female time use
7M: ENERGY (MALE QUESTIONNAIRE):
- Part A: Electricity usage and appliance ownership
- Part B: Natural gas and appliance ownership
- Part C: LPG and appliance ownership
- Part D: Kerosene oil and appliance ownership
- Part E: Firewood usage
- Part F: Dung cake
- Part I: Other fuels usage
- Part M: Attitudes/behavior
7F: ENERGY (FEMALE QUESTIONNAIRE):
- Part A: Electricity usage and appliance ownership
- Part B: Natural gas and appliance ownership
- Part C: LPG and appliance ownership
- Part D: Kerosene oil and appliance ownership
- Part E: Firewood usage
- Part F: Dung cake
- Part G: Charcoal usage
- Part H: Coal usage
- Part I: Other fuels usage
- Part J: Stoves
- Part K: Cooking habits and implements
- Part L: Fuel switching
- Part M: Attitudes/behavior
9: FARMING AND LIVESTOCK
- Part A: Landholding and tenure
- Part B1: Rabi crop production and distribution
- Part B2: Kharif crop production and distribution
- Part B3: Orchard crops
- Part B4: Sugarcane
- Part C: Assistance and credit
- Part D: Expenditure on agriculture inputs
- Part E: Expenditures and income from agri. services
- Part F: Livestock ownership and production
- Part G: Hired labor on own-farm
- Part H: Income from processing and sales of own-farm products
10: NON-FARM ENTERPRISE ACTIVITIES:
- Part A: General characteristics of the enterprise
- Part B: Operating expenses
- Part C: Ownership of assets
- Part D: Revenues
11: NON-FOOD EXPENDITURES AND INVENTORY OF DURABLE GOODS:
- Part A: Daily expenses
- Part B: Annual expenses
- Part C: Inventory of durable goods
12: FOOD EXPENSES AND HOME PRODUCTION:
- Part A: Food expenses
- Part B: Home production
13: MARRIAGE AND MATERNITY HISTORY:
- Part A: Maternity history for women 14 and older
- Part B: Family planning
- Part C: Maternity history for ever married women who have given birth
- Part D: Infant feeding practices
- Part E: Men's marriage history
15: CREDIT AND SAVINGS:
- Part A: Assets and liabilities position
- Part B: Borrowing and outstanding loans
- Part C: Lending and outstanding loans
- Part D: Property
- Part D1: Personal and investment property
- Part D2: Dowries
- Part D3: Stocks, shares, bonds, and other securities
- Part D4: Bank deposits and postal savings
- Part D5: Bisi or saving committees
16: TRANSFERS AND REMITTANCES:
- Part A: Remittances and transfer expenditure
- Part B: Remittances and transfer income
17: OTHER INCOME
Producers and sponsors
Federal Bureau of Statistics (FBS)
The World Bank
The sample for the PIHS was drawn using a multi-stage stratified sampling procedure from the Master Sample Frame developed by FBS based on the 1981 Population Census.
This sample frame covers all four provinces (Punjab, Sindh, NWFP, and Balochistan) and both urban and rural areas. Excluded, however, are the Federally Administered Tribal Areas, military restricted areas, the districts of Kohistan, Chitral and Malakand and protected areas of NWFP. According to the FBS, the population of the excluded areas amounts to about 4 percent of the total population of Pakistan. Also excluded are households which depend entirely on charity for their living.
The sample frame consists of three main domains: (a) the self-representing cities; (b) other urban areas; and (c) rural areas. These domains are further split up into a number of smaller strata based on the system used by the Government to divide the country into administrative units. The four provinces of Pakistan mentioned above are divided into 20 divisions altogether; each of these divisions in turn is then further split into several districts. The system used to divide the sample frame into the three domains and the various strata is as follows:
(a) Self-representing cities: All cities with a population of 500,000 or more are classified as self-representing cities. These include Karachi, Lahore, Gujranwala, Faisalabad, Rawalpindi, Multan, Hyderabad and Peshawar. In addition to these cities, Islamabad and Quetta are also included in this group as a result of being the national and provincial capitals respectively. Each self-representing city is considered as a separate stratum, and is further sub-stratified into low, medium, and high income groups on the basis of information collected at the time of demarcation or updating of the urban area sample frame.
(b) Other urban areas: All settlements with a population of 5,000 or more at the time of the 1981 Population Census are included in this group (excluding the self-representing cities mentioned above). Urban areas in each division of the four provinces are considered to be separate strata.
(c) Rural areas: Villages and communities with population less than 5,000 (at the time of the Census) are classified as rural areas. Settlements within each district of the country are considered to be separate strata with the exception of Balochistan province where, as a result of the relatively sparse population of the districts, each division instead is taken to be a stratum.
As the above table shows, the sample frame consists of 88 strata altogether. Households in each stratum of the sample frame are exclusively and exhaustively divided into PSUs. In urban areas, each city or town is divided into a number of enumeration blocks with well-defined boundaries and maps. Each enumeration block consists of about 200-250 households, and is taken to be a separate PSU. The list of enumeration blocks is updated every five years or so, with the list used for the PIHS having been modified on the basis of the Census of Establishments conducted in 1988.
In rural areas, demarcation of PSUs has been done on the basis of the list of villages/mouzas/dehs published by the Population Census Organization based on the 1981 Census. Each of these villages/mouzas/dehs is taken to be a separate PSU.
Altogether, the sample frame consists of approximately 18,000 urban and 43,000 rural PSUs.
The PIHS sample comprised 4,800 households drawn from 300 PSUs throughout the country. Sample PSUs were divided equally between urban and rural areas, with at least two PSUs selected from each of the strata. Selection of PSUs from within each stratum was carried out using the probability proportional to estimated size method. In urban areas, estimates of the size of PSUs were based on the household count as found during the 1988 Census of Establishments. In rural areas, these estimates were based on the population count during the 1981 Census.
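The selection of PSUs with probability proportional to estimated size can be illustrated with a minimal sketch. The text does not say which PPS variant was used, so the systematic variant below is an assumption, and the function name `pps_systematic` and the stratum sizes are hypothetical.

```python
import random


def pps_systematic(sizes, n, rng=None):
    """Pick n units with probability proportional to size (systematic PPS).

    sizes: estimated size of each PSU in one stratum (e.g. household counts).
    Returns (index, selection_probability) pairs. A unit larger than the
    sampling interval could be hit more than once; the example avoids that.
    """
    rng = rng or random.Random()
    total = sum(sizes)
    step = total / n                      # sampling interval
    start = rng.random() * step           # random start in [0, step)
    targets = [start + k * step for k in range(n)]

    selected, cumulative, t = [], 0.0, 0
    for idx, size in enumerate(sizes):
        cumulative += size
        # Every target that falls inside this unit's size range selects it.
        while t < n and targets[t] < cumulative:
            selected.append((idx, n * size / total))
            t += 1
    return selected


# Hypothetical stratum of six PSUs with estimated household counts.
estimated_sizes = [200, 250, 220, 240, 210, 230]
sample = pps_systematic(estimated_sizes, n=2, rng=random.Random(1))
for idx, prob in sample:
    print(f"PSU {idx}: estimated size {estimated_sizes[idx]}, P = {prob:.3f}")
```

Under this scheme a PSU with twice the estimated size of another has twice the chance of selection, which is the property the weights described later have to undo.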
Once sample PSUs had been identified, a listing of all households residing in the PSU was made in all those PSUs where such a listing exercise had not been undertaken recently. Using systematic sampling with a random start, a short-list of 24 households was prepared for each PSU. Sixteen households from this list were selected to be interviewed from the PSU; every third household on the list was designated as a replacement household to be interviewed only if it was not possible to interview either of the two households immediately preceding it on the list.
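The household selection step can be sketched as follows. This is an illustration under stated assumptions rather than the procedure FBS actually coded: the function name `shortlist_households`, the fractional-interval form of systematic sampling, and the listing of 230 households are all hypothetical.

```python
import random


def shortlist_households(listing, shortlist_size=24, rng=None):
    """Systematic sample with a random start, split into primary and
    replacement households (every third entry is a replacement)."""
    rng = rng or random.Random()
    n = len(listing)
    if n < shortlist_size:
        raise ValueError("listing is smaller than the short-list")

    step = n / shortlist_size             # sampling interval (>= 1)
    start = rng.random() * step           # random start in [0, step)
    picks = [listing[min(n - 1, int(start + k * step))]
             for k in range(shortlist_size)]

    primary, replacements = [], []
    for position, household in enumerate(picks, start=1):
        if position % 3 == 0:
            # Used only if one of the two preceding households fails.
            replacements.append(household)
        else:
            primary.append(household)
    return primary, replacements


# Hypothetical PSU listing of 230 households, numbered 1 to 230.
listing = list(range(1, 231))
primary, replacements = shortlist_households(listing, rng=random.Random(7))
print(len(primary), "primary,", len(replacements), "replacement households")
```

With a short-list of 24, this split yields 16 households to interview and 8 replacements, matching the counts given above.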
As a result of replacing households that could not be interviewed because of non-responses, temporary absence, and other such reasons, the actual number of households interviewed during the survey - 4,794 - was very close to the planned sample size of 4,800 households. Moreover, following a pre-determined procedure for replacing households had the added advantage of minimizing any biases that may otherwise have arisen had field teams been allowed more discretion in choosing substitute households.
SAMPLE DESIGN EFFECTS:
The three-stage stratified sampling procedure outlined above has several advantages from the point of view of survey organization and implementation. Using this procedure ensures that all regions or strata deemed important are represented in the sample drawn for the survey. Picking clusters of households or PSUs in the various strata rather than directly drawing households randomly from throughout the country greatly reduces travel time and cost. Finally, selecting a fixed number of households in each PSU makes it easier to distribute the workload evenly amongst field teams. However, in using this procedure to select the sample for the survey, two important matters need to be given consideration: (a) sampling weights or raising factors have to be first calculated to get national estimates from the survey data; and (b) the standard errors for estimates obtained from the data need to be adjusted to take account of the use of this procedure.
If a simple random sampling procedure had been used to draw the sample for the survey, the data collected could have been used directly to obtain national as well as regional estimates without the need for sampling weights or raising factors. However, in using data from a sample drawn by the procedure outlined above, allowance needs to be made for the fact that this sampling procedure does not give all households in the country an equal chance of being selected for the survey. If no sampling weights are used with the data, the resulting estimates are likely to be biased as different types of households may not be represented in the sample in the same proportion as they exist in the population as a whole. In simple terms, sample weights attempt to correct for the fact that different households in the country have different chances of being included in the sample for the survey. To allow adjustment to be made for over-sampling of certain strata in the PIHS sample, sampling weights have been calculated, and have been incorporated into the PIHS data sets that are distributed. These raising factors should be used to weight data in order to obtain nationally representative statistics. The way these sampling weights have been calculated is briefly outlined below.
The first aspect of the sampling strategy adopted for the PIHS that needs to be taken into consideration when calculating sampling weights is the stratification of the sample frame. Instead of picking PSUs at random from the country as a whole, PSUs for the PIHS survey were selected so as to ensure that at least 2 were picked from each stratum of the Master Sample Frame. Half the sample was picked from strata in urban areas even though they constituted less than 32 percent of the country's estimated population in 1991. In order to correct for such over-sampling, the weight for households drawn from each stratum needs to include a component that is inversely proportional to the probability of selection of PSUs in that stratum. In other words, the greater the assigned probability for selecting PSUs in a particular stratum, the lower the weight we should give to households picked from this stratum.
The second step of sample selection for the PIHS - i.e. the selection of PSUs within each stratum - was carried out using the probability proportional to estimated size (PPS) procedure. In this method, a large PSU is assigned a higher probability of selection than a smaller PSU by a factor that is directly proportional to their relative size. If an equal number of households are to be interviewed in each selected PSU, then this method in principle results in a self-weighted sample within each stratum. In other words, all households within the stratum have an equal chance of selection in the sample and should therefore be allotted the same weight. In practice, however, allowance almost always needs to be made for the fact that the actual size of the PSU as found during the household listing exercise differs from the estimated size on which the selection of the PSU from the sample frame was based. The weight assigned to households in different PSUs thus includes a second component that is directly proportional to the ratio of the PSU's actual size to its estimated size. Households in a PSU where the count during the listing exercise reveals the population to be 50 percent higher than that earlier supposed are thus given a weight 50 percent higher than that assigned to households in a PSU where these two counts are found to coincide.
Finally, the third step of sample selection - i.e. that of selecting households within each PSU - does not have any effect on sampling weights; therefore, all households within a particular PSU are assigned the same weight. This is because the “systematic sampling with a random start” procedure used to select households gives all households in the PSU an equal chance of selection. Even the use of replacement households in the case of the PIHS does not affect the assignment of weights within the PSU, as the process of selection of replacement households was the same as that used to select the other 16 households to be interviewed from the PSU.
The formula used to calculate the weight assigned to the various PSUs is as follows:
Wij = k × (1/Pij) × (Nj/Sj)
where Wij is the weight assigned to households in PSU j of stratum i; k is some constant; Pij is the assigned probability of selection of PSU j of stratum i (i.e. the higher the given probability of selection, the lower the weight given to the PSU); Nj is the number of households in PSU j as found during the listing exercise; and Sj is the number of households in PSU j on which the PPS selection was based.
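As a worked illustration of the formula, the sketch below computes the weight for two hypothetical PSUs and then uses weights to form a weighted mean. The function names, the selection probability of 0.02, and the household counts are assumptions made for the example; the distributed PIHS data sets already carry the actual weights.

```python
def psu_weight(p_select, listed_households, estimated_households, k=1.0):
    """Weight W = k * (1/P) * (N/S) for all households in one PSU."""
    if p_select <= 0 or estimated_households <= 0:
        raise ValueError("probability and estimated size must be positive")
    return k * (1.0 / p_select) * (listed_households / estimated_households)


def weighted_mean(values, weights):
    """Weighted mean of household-level values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Two hypothetical PSUs with the same assigned selection probability.
# In the first, the listing count matches the estimate used for PPS;
# in the second, the listing found 50 percent more households.
w_same = psu_weight(p_select=0.02, listed_households=200,
                    estimated_households=200)
w_grown = psu_weight(p_select=0.02, listed_households=300,
                     estimated_households=200)
print(w_grown / w_same)   # ratio of the two weights

# Weighted mean of a hypothetical variable for one household per PSU.
print(weighted_mean([10.0, 20.0], [w_same, w_grown]))
```

The ratio of the two weights is 1.5, which reproduces the "50 percent higher" case described in the text.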
Dates of Data Collection
Data Collection Mode
Data Collection Notes
Field work for the PIHS was carried out by 15 teams based at FBS regional offices throughout the country. Two teams each were stationed in Karachi and Lahore, while one team each operated out of the FBS offices in Peshawar, Bannu, Rawalpindi, Gujranwala, Faisalabad, Sargodha, Multan, Bahawalpur, Sukkur, Hyderabad, and Quetta.
Each field team consisted of 7 members; a supervisor (Statistical Officer), two male and two female interviewers (Statistical Assistants), a data entry operator (Key Punch and Verifying Officer), and a driver. The four interviewers were responsible for carrying out the household interviews under the supervision of the Statistical Officer in accordance with the timetable prepared for each team. While the rest of the teams traveled back and forth between the regional office and the PSUs where the interviews were conducted, the data entry operators remained at the regional offices throughout. In order to facilitate travel for the field teams, a vehicle was provided to each team for the duration of the survey.
Overall supervision and coordination of the field work was conducted by the PIHS management team based at the FBS office in Islamabad. During the initial phase of the project, technical assistance was provided to the PIHS management team by local consultants hired for the project. The PIHS management team consisted of six members: a Project Director, a Chief Statistical Officer, three Statistical Officers, and a Data Processing Manager.
The team was headed by the Project Director who was responsible for administering the survey. He directed the work of the team and ensured the smooth running of the overall project. He was assisted in his duties by the Chief PIHS Section, and by the three Statistical Officers. The Data Processing Officer was responsible for working with consultants to develop the data entry software for the survey, and to ensure that the supervisors and data entry operators followed the instructions for running the programs and operating the microcomputers properly.
SCHEDULE OF ACTIVITIES:
Once preliminary arrangements regarding the outline of the project had been finalized, discussions were held between staff from the World Bank, the Federal Bureau of Statistics, Pakistani researchers, and donor agencies in order to develop a draft of the household questionnaire. This questionnaire was then field-tested in June 1990. Following the field test, a workshop was held in Islamabad where the FBS staff that had participated in the field work were invited to give their comments on the questionnaire. The household questionnaire was then revised and finalized in light of these discussions, and translated into Urdu.
Some of the field staff used for the PIHS were drawn from the personnel of the FBS, whereas the rest were recruited by the Bureau for the project. Training of the field staff was conducted in Islamabad during November and December 1990. Initially, a two week training session was organized for the team supervisors. The main topics covered during the course of this training were the organization of the survey and the supervisory checks to be performed on the work of the interviewers. The supervisors were then joined by the interviewers for the main training session. This session spanned four weeks; during the first three weeks, the field staff were given training on completing the household questionnaire itself while in the last week, the teams were taken to neighboring communities to conduct practice interviews. Supervisors were also able to practice supervisory checks during these visits. These household interviews were observed and critiqued by the survey staff.
Data entry operators received training for three weeks which was conducted concurrently with the training for the supervisors and interviewers. This training consisted of three main parts. First, as many of the trainees recruited for data entry had not used computers before, they were provided with training on the use and maintenance of personal computers. During the second part of the training, the data entry operators were instructed on the use of the data entry program. Finally, the training also included a practical training component where data entry operators recorded the data from the household interviews completed as part of the interviewer training. Printouts of the data entered were given to the team supervisors who then discussed the mistakes highlighted by the data entry program in these printouts with the interviewers concerned.
About 20 percent more staff than project requirements were trained during this period. This served two main purposes: (a) the project management team would use the most promising trainees for the main survey; and (b) the staff that dropped out during the survey or were unable to work temporarily could be replaced by the extra personnel that had been trained.
Following completion of the training in Islamabad, the various teams returned to their duty stations, and field work for the survey commenced in January 1991. During the course of the next twelve months, the PIHS field teams covered about 20 PSUs each on average. In the 300 PSUs covered, almost 4,800 households were interviewed.
ORGANIZATION OF FIELD WORK:
The PIHS was the first survey conducted by FBS in which data entry was carried out directly in the field. The main reasons for conducting data entry in the field were to improve data quality (possible errors could be corrected in the field by revisiting the households concerned rather than through office editing), and to reduce the time between the completion of field work and the availability of data for analysis. Decentralizing the data entry process involved installing a microcomputer in each of the regional offices for the immediate entry of data from all questionnaires completed by each team.
The schedule of work for all teams consisted of completing two PSUs each in a four-week period. Each team completed the first round of interviews in PSU 1 during the first week, the first round of interviews in PSU 2 during the second week, returned to PSU 1 to complete the second round of interviews in the third week, and then completed the second round of interviews in PSU 2 during the fourth week. At the end of each week, the team returned to the regional office to give the questionnaires to the data entry operator for data entry. The schedule of household interviews and data entry is summarized in the following table.
As the table shows, data entry of interviews conducted in a particular week was carried out in the following week. Thus, before the team went back to any PSU for the second round, data entry of the first round for that PSU had been completed by the data entry operator. During the second round visit, teams could take with them printouts of the data entered from the first round with a record of data omissions, possible errors, and inconsistencies for correction or verification.
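The four-week cycle described in the two preceding paragraphs can be laid out with a short sketch. It is derived only from that description; the function name `four_week_cycle` and the treatment of the fifth week (when the last batch is entered while the next cycle's interviews begin) are assumptions.

```python
def four_week_cycle():
    """Interview and data entry tasks for one team over one cycle.

    Each task is a (PSU, round) pair. Data entry for a given week's
    interviews happens in the following week, so the last batch is
    entered in week 5, when interviewing for the next cycle has started.
    """
    interviews = [("PSU 1", 1), ("PSU 2", 1), ("PSU 1", 2), ("PSU 2", 2)]
    schedule = []
    for week in range(1, 6):
        field = interviews[week - 1] if week <= 4 else None
        entry = interviews[week - 2] if week >= 2 else None
        schedule.append({"week": week, "interviews": field,
                         "data_entry": entry})
    return schedule


for row in four_week_cycle():
    print(row["week"], row["interviews"], row["data_entry"])
```

Reading the rows in order shows that round 1 data for each PSU is entered in the week before the team returns there for round 2.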
During a week, the team completed one round of interviews for 16 households in the PSU. The teams worked in two pairs of one male and one female interviewer each, with each pair covering on average 2 households per day. During the period when household interviews were being conducted, the team stayed in the PSU. On their return to the office at the end of the week, the supervisor would review the printouts of data from the households for possible interviewer and data entry errors. Data entry errors would then be corrected at the office, while other possible data errors or inconsistencies would be marked on to the questionnaires and given to the interviewers for correction during the next visit.
The PIHS used three questionnaires: a household questionnaire, a community questionnaire, and a price questionnaire.
The PIHS questionnaire comprised 17 sections, each of which covered a separate aspect of household activity. The various sections of the household questionnaire were as follows:
1. HOUSEHOLD INFORMATION
5. WAGE EMPLOYMENT
6. FAMILY LABOR
9. FARMING AND LIVESTOCK
10. NON-FARM ENTERPRISE ACTIVITIES
11. NON-FOOD EXPENDITURES AND INVENTORY OF DURABLE GOODS
12. FOOD EXPENSES AND HOME PRODUCTION
13. MARRIAGE AND MATERNITY HISTORY
15. CREDIT AND SAVINGS
16. TRANSFERS AND REMITTANCES
17. OTHER INCOME
The household questionnaire was designed to be administered in two visits to each sample household. Apart from avoiding the problem of interviewing household members in one long stretch, scheduling two visits also allowed the teams to improve the quality of the data collected.
During the first visit to the household (Round 1), the enumerators covered sections 1 to 8, and fixed a date with the designated respondents of the household for the second visit. During the second visit (Round 2), which was normally held two weeks after the first visit, the enumerators covered the remaining portion of the questionnaire and resolved any omissions or inconsistencies that were detected during data entry of information from the first part of the survey.
Since many of the sections of the questionnaire pertained specifically to female members of the household, female interviewers were included in conducting the survey. The household questionnaire was split into two parts (Male and Female). Sections such as SECTION 3: EDUCATION, which solicited information on all individual members of the household (male as well as female), were included in both parts of the questionnaire. Other sections such as SECTION 2: HOUSING and SECTION 12: FOOD EXPENSES AND HOME PRODUCTION, which collected data at the aggregate household level, were included in either the male questionnaire or the female questionnaire, depending upon which member of the household was more likely to know more about that particular area of household activity. Male and female interviewers were instructed to switch questionnaires where necessary in order to obtain information from the best informed individual in the household.
Information for all male members aged 10 years or more was collected using the male questionnaire. Information on other household members (i.e. all female household members as well as children aged less than 10 years) was collected using the female questionnaire. Individuals covered in the male questionnaire were assigned sequential ID codes beginning with code "01" and those household members covered in the female questionnaire were assigned ID codes starting with code "51".
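The ID coding rule can be sketched as follows. The function name `assign_id_codes`, the tuple layout, and the sample household are hypothetical, and numbering members in roster order within each questionnaire is an assumption.

```python
def assign_id_codes(members):
    """Assign PIHS-style ID codes to (name, sex, age) tuples.

    Males aged 10 or more go in the male questionnaire (codes from "01");
    all females and children under 10 go in the female questionnaire
    (codes from "51").
    """
    next_male, next_female = 1, 51
    codes = {}
    for name, sex, age in members:
        if sex == "M" and age >= 10:
            codes[name] = f"{next_male:02d}"
            next_male += 1
        else:
            codes[name] = f"{next_female:02d}"
            next_female += 1
    return codes


# Hypothetical household roster.
household = [("head", "M", 45), ("wife", "F", 40), ("son", "M", 8),
             ("daughter", "F", 12), ("brother", "M", 30)]
print(assign_id_codes(household))
```

A code below 51 therefore identifies a male aged 10 or more, which is useful when working with the combined household data files.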
It is important to note, however, that the division of the questionnaire into the male and female portions was undertaken solely to facilitate gathering of data in the field. Male and female enumerators could interview respondents of different sexes separately when visiting each household, and thus obtain information pertaining to household members of both sexes directly from the individuals concerned. This was particularly important in the case of sections such as SECTION 13: MARRIAGE AND MATERNITY HISTORY, where assigning female enumerators to directly interview the women concerned was crucial. While information for male and female members was collected in separate questionnaires, these data were combined during data entry so that the household data files contain information on all members of the household. Each section of the household questionnaire was further divided into subsections A, B, C, etc.
COMMUNITY AND PRICE QUESTIONNAIRES:
In each of the 300 communities where household interviews were conducted for the PIHS, a community questionnaire was administered by the team supervisor. Respondents to this questionnaire typically consisted of the head of the village or community, the local school master, local government official, or any other such individual who was knowledgeable about the community. Communities were defined as all households living in the Primary Sampling Unit (PSU) in which the interview was conducted (the concept of PSU is explained in more detail in the next section on Sample Design). While each of the 300 PSUs consisted of roughly the same number of households (generally about 200 - 300), the area covered by individual PSUs varied considerably. In urban areas, communities were, in general, much smaller in terms of area covered, and were defined to be the group of households living within the physical boundaries of the PSU. In rural areas, because of the low population density, the PSU at times consisted of a group of settlements spread over a large area. In such cases, the supervisors were instructed to treat the largest or most central village in the PSU as the community.
The community questionnaire contained questions on characteristics of the community such as the quality of physical infrastructure, provision of amenities such as electricity, gas and water, access to education and health care facilities, and on markets and availability of goods and services in the locality. In order to obtain more information on birth practices used in the community, one of the sections of the community questionnaire was directed at dais (birth attendants) in the community and contained a number of questions on birth practices and pre- and post-birth maternal care. In rural areas, in addition to the section on the general characteristics of the community, two additional sections on health facilities and primary school facilities were also administered. Detailed information was collected on the quality of infrastructure, the equipment and services available, as well as staffing of these facilities.
Finally, a price questionnaire was also administered in all the communities where households were interviewed. Price information for 37 goods was collected. The goods included items such as food staples, tea and sugar, selected vegetables, as well as a few non-food items like fuels, soaps, etc. For all goods, two sets of prices were collected: one from the local shopkeeper and the other from the local mandi or wholesale seller. In rural areas, prices of agricultural inputs as well as other relevant information on local farming practices were also collected.
Estimates of Sampling Error
The PIHS sample was designed to yield representative statistics at the national and urban/rural levels. Care, however, should be taken when interpreting results for smaller analytic domains as the sample was not designed to be representative at a more disaggregated level. Thus, even with the use of the sampling weights, statistics for the smaller provinces such as Balochistan are likely to have high standard errors given the relatively small sample size in these domains. In this regard, it is important to note that when calculating standard errors for estimates derived from the PIHS data, allowance must be made for the fact that the survey used a multi-staged sampling procedure. Calculating standard errors using methods outlined in elementary statistical textbooks is likely to underestimate the true magnitude of errors as the techniques presented in these books often assume that simple random selection was used when drawing the sample.
In general, a multi-staged sampling scheme that involves picking a cluster of households at some stage is less efficient than one which involves simple random sampling. This is because neighboring households tend to have similar characteristics, and so a sample drawn from them reflects less of the population's diversity than a simple random sample of the same given size N. In such an instance, the standard errors associated with estimates based on data from a survey using a multi-stage stratified sampling procedure (such as the PIHS) will be higher than would be indicated by simple random sample-based statistical theory.
The magnitude by which the standard error would be underestimated if no allowance is made for the “cluster” effect depends on the characteristic being estimated. In general, the more homogeneous the households are within a cluster with respect to the characteristic being estimated, the less efficient a sampling scheme based on clustering, and the higher the true standard error of the estimate obtained. For most variables of interest, the degree of homogeneity within the cluster is likely to be low, and so the effect of ignoring the “cluster effect” when estimating standard errors is unlikely to be too serious. However, in some cases, the intra-cluster correlation with respect to the variable of interest may be quite large (for instance whether or not the household has electricity). In these cases, if no allowance is made for clustering, the magnitude by which the true standard error is underestimated will be high.
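A rough sense of the size of the cluster effect can be had from the standard design effect approximation, deff = 1 + (m - 1) × rho, where m is the number of households interviewed per cluster and rho is the correlation between households within a cluster. This approximation does not come from the survey documentation and ignores stratification and unequal weights; the rho values and the simple random sampling standard error below are hypothetical. Only m = 16 is taken from the sample design.

```python
import math


def design_effect(cluster_size, rho):
    """Approximate design effect: 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1) * rho


def adjusted_se(srs_se, cluster_size, rho):
    """Inflate a simple-random-sampling standard error for clustering."""
    return srs_se * math.sqrt(design_effect(cluster_size, rho))


households_per_psu = 16        # households interviewed in each PIHS PSU
srs_se = 0.010                 # hypothetical standard error assuming SRS

for rho in (0.02, 0.20):       # hypothetical low and high homogeneity
    deff = design_effect(households_per_psu, rho)
    se = adjusted_se(srs_se, households_per_psu, rho)
    print(f"rho={rho:.2f}  deff={deff:.2f}  adjusted SE={se:.4f}")
```

With low homogeneity the adjustment is modest, while a strongly clustered characteristic such as access to electricity can roughly double the standard error. For actual analysis, survey estimation routines that take the strata, PSU identifiers, and weights as inputs are the safer choice.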
World Bank LSMS
LSMS Data Manager
The World Bank
Use of the dataset must be acknowledged by including a citation which would include:
- Identification of the Primary Investigator(s) and of the country
- Title of the survey (including the year of implementation)
- Survey reference number
- Source and date of download
Pakistan. Federal Bureau of Statistics and the World Bank. Pakistan Integrated Household Survey (PIHS) 1991. Ref. PAK_1991_PIHS_v01_M. Dataset downloaded from www.microdata.worldbank.org on [date]
Disclaimer and copyrights
The user of the data acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.