# Documentation of paper "Self-employment and migration", by Samuele Giambra and David McKenzie

The analysis of this paper is organized in two separate parts. Section 3 presents results from a descriptive analysis using panel survey data, while section 4 is based on data from randomized experiments. We adopt a parallel structure for the replication files and organize them in a /code/descriptive/ and a /code/causal/ directory.
This document briefly summarizes the replication files contained in each directory and the raw data they use.


## Descriptive analysis

### Datasets
All the raw survey datasets used in the descriptive part can be obtained upon registration at the following webpages:
- CFPS: http://opendata.pku.edu.cn/dataverse/CFPS?language=en
- ELMPS: [1998](http://www.erfdataportal.com/index.php/catalog/28); [2012](http://www.erfdataportal.com/index.php/catalog/45)
- IFLS: https://www.rand.org/well-being/social-and-behavioral-policy/data/FLS/IFLS.html
- IHDS: https://www.ihds.umd.edu/
- KHDS: [1991-94](https://microdata.worldbank.org/index.php/catalog/359); [2004](https://microdata.worldbank.org/index.php/catalog/79); [2010](https://microdata.worldbank.org/index.php/catalog/2251)
- LSMS-ISA: [2010-11](https://microdata.worldbank.org/index.php/catalog/1002); [2012-13](https://microdata.worldbank.org/index.php/catalog/1952); [2015-16](https://microdata.worldbank.org/index.php/catalog/2734)
- MXFLS: http://www.ennvih-mxfls.org/english/index.html
- PSID: https://psidonline.isr.umich.edu/

### Code
All descriptive tables and figures presented in the paper and online appendix can be recreated by running the file /descriptive/master.do. Raw survey datasets are used as inputs for the following do files:
- /descriptive/CFPS.do: cleans data from China Family Panel Studies
- /descriptive/ELMPS.do: cleans data from Egypt Labor Market Panel Survey
- /descriptive/IFLS.do: cleans data from Indonesian Family Life Survey
- /descriptive/IHDS.do: cleans data from India Human Development Survey
- /descriptive/KHDS.do: cleans data from Kagera Health and Development Survey
- /descriptive/LSMS-ISA.do: cleans data from Nigerian General Household Survey
- /descriptive/MXFLS.do: cleans data from Mexican Family Life Survey
- /descriptive/PSID.do: cleans data from Panel Study of Income Dynamics

The file /descriptive/preclean.do appends all survey waves and creates the derived descriptive dataset used to construct all descriptive tables and figures. We provide this dataset at the path /derived_data/master_panel_descriptive.dta.
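As a rough illustration of what an append step of this kind involves, the sketch below stacks per-survey records into one long dataset in Python. It is not the code in /descriptive/preclean.do, which is written in Stata. The field names `pid`, `wave`, `self_employed`, and `migrated`, the function `append_surveys`, and all values are assumptions made up for the example.

```python
def append_surveys(panels):
    """Stack per-survey rows into one long list, tagging each row with its survey name."""
    master = []
    for survey, rows in panels.items():
        for row in rows:
            # Build a new dict so the input rows are left untouched.
            master.append({"survey": survey, **row})
    return master


# Toy rows; every value here is invented for illustration.
cfps = [
    {"pid": 1, "wave": 2010, "self_employed": 0, "migrated": 0},
    {"pid": 1, "wave": 2012, "self_employed": 1, "migrated": 0},
    {"pid": 2, "wave": 2010, "self_employed": 0, "migrated": 1},
]
ifls = [
    {"pid": 1, "wave": 2007, "self_employed": 1, "migrated": 0},
]

master = append_surveys({"CFPS": cfps, "IFLS": ifls})

# Person identifiers are only unique within a survey, so individuals are
# identified by the (survey, pid) pair after appending.
individuals = {(row["survey"], row["pid"]) for row in master}
print(len(master), len(individuals))
```

Tagging each row with its survey before stacking keeps identifiers from different surveys from colliding in the pooled panel.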

The last two do files called in /descriptive/master.do perform the main descriptive analysis:
- /descriptive/build_figures.do: creates figures 1-5 and online appendix figure A.1
- /descriptive/build_tables.do: creates table 1 and online appendix tables A.1-A.4



## Causal analysis

### Datasets
Most of the data used in the causal analysis can be obtained by downloading the replication files of the relevant studies, which are available at the following links:
- YOP: https://dataverse.harvard.edu/dataset.xhtml;jsessionid=b09da1bd44138069102978fd5c2a?persistentId=doi%3A10.7910/DVN/27898&version=1.0
- WINGS: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/QA0R1O
- SLMS: https://microdata.worldbank.org/index.php/catalog/1243/study-description
- SIYB: https://microdata.worldbank.org/index.php/catalog/1553
- Ghana microenterprises: https://microdata.worldbank.org/index.php/catalog/2249
- YouWin!: https://www.aeaweb.org/articles?id=10.1257/aer.20151404

We additionally requested and obtained the following data from the authors:
- YOP 9-year follow-up
- SLMS migration information
- SIYB 6-year follow-up
- Introduction section with respondents' locations for each wave of the Ghana microenterprises survey

The regressions using the Nairobi microfranchising data were kindly run for us by Owen Ozier.

### Code
All causal tables and figures presented in the paper and online appendix can be recreated by running the file /causal/master.do. The raw data from each study are used as inputs for the following do files:
- /causal/yop.do: cleans data from Blattman et al. (2014; 2019)
- /causal/wings.do: cleans data from Blattman et al. (2016)
- /causal/slms.do: cleans data from de Mel et al. (2008; 2012)
- /causal/siyb.do: cleans data from de Mel et al. (2014)
- /causal/ghana.do: cleans data from Fafchamps et al. (2014)
- /causal/youwin.do: cleans data from McKenzie (2017)

The file /causal/preclean_reduced_form.do appends data on migration, self-employment, treatment status, and baseline control variables from the randomized experiments used in the paper. We provide the derived dataset used in the following regressions at the path /derived_data/master_panel_causal.dta.

The file /causal/build_reduced_form_tables.do runs all the regressions underlying tables 3-4 and online appendix table A.7.

The file /causal/preclean_meta_analysis.do creates a dataset that records the estimates and standard errors for each study included in the causal analysis. The output flat file is provided at the path /derived_data/meta_analysis_data.dta.
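To show how a flat file of per-study estimates and standard errors can be combined, here is a minimal Python sketch of textbook fixed-effect (inverse-variance) pooling. This is an illustrative technique only: it is not taken from the do files and may differ from the specification behind table 5. The function `pool_fixed_effect`, the study labels, and the numbers are all hypothetical.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Return the inverse-variance weighted mean of the estimates and its standard error."""
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


# Hypothetical (estimate, standard error) pairs; not results from the paper.
studies = {
    "study_a": (0.02, 0.01),
    "study_b": (-0.01, 0.02),
    "study_c": (0.03, 0.02),
}

estimates = [b for b, _ in studies.values()]
std_errors = [se for _, se in studies.values()]
pooled, pooled_se = pool_fixed_effect(estimates, std_errors)
print(f"pooled estimate = {pooled:.4f}, standard error = {pooled_se:.4f}")
```

More precise studies (smaller standard errors) receive larger weights, which is why the dataset needs to carry a standard error alongside each estimate.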

The last two do files called in /causal/master.do perform the meta-analysis:
- /causal/build_meta_regression_tables.do: creates table 5
- /causal/build_meta_analysis_figures.do: creates online appendix figure A.2 