Final Project Template
The overall structure of your final project should read like a technical blog post, where you motivate the importance of the topic to the readers, introduce the relevant background, give an instructional overview of your study design, and then present your analysis along with the code to generate the results.
You may format your final project however you like, but be sure to address the points outlined under each section. Feel free to add or modify subsections as needed, and to delete these instruction points and the template admonitions (such as the tips below). You can refer to the template on the course website as you work on your final project to make sure you’re addressing all the points.
Tip
If you are interested in how to structure academic writing, I suggest reading Mensh and Kording (2017), “Ten simple rules for structuring papers.”
This is completely optional; you are not required to follow these writing rules.
Tip
If the code to reproduce your results is long, you can use Jupyter Book’s features for hiding or removing content, either to collapse code cells so readers can toggle them open or to omit them from the rendered page.
Prior knowledge [2 pts]
Give an appropriate title for your causal study, replacing “Final Project Template” above.
Motivate the domain of study to a general audience.
Provide information about what is already known about the domain you’re studying.
For both the motivation and prior knowledge, incorporate your literature review summaries as related work for the project.
Causal question [1 pt]
State your causal question, specifying the treatment and outcome variables.
Specify the causal quantity of interest (e.g., ATE, ATT), typesetting it mathematically using LaTeX.
Explain why your selected causal quantity is meaningful for the domain you are studying.
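For example, with a binary treatment \(T\) and potential outcomes \(Y(1)\) and \(Y(0)\) (this notation is an assumption for illustration, not something the template prescribes), the two most common causal quantities could be typeset as:

```latex
\begin{aligned}
\tau_{\text{ATE}} &= \mathbb{E}\left[\,Y(1) - Y(0)\,\right] \\
\tau_{\text{ATT}} &= \mathbb{E}\left[\,Y(1) - Y(0) \mid T = 1\,\right]
\end{aligned}
```

The ATE averages the individual effects over the whole population, whereas the ATT averages them only over the units that actually received treatment.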
Design
Study strategy [1.5 pts]
Describe your chosen causal inference strategy (e.g., randomized experiment, observational study, RDD), explaining how it works to a general audience.
Discuss all key assumptions required for your strategy to be valid, and assess whether these assumptions are plausible for your study. For example, if you are using an observational study design, you should discuss the unconfoundedness assumption and whether you have a way to measure and control for relevant confounders.
Note: your study design does not need to be perfectly valid in terms of satisfying all of the assumptions. The goal is to evaluate the assumptions needed for identification and show that you have considered the strengths and weaknesses of your design strategy.
Provide a DAG image of all the relevant variables in your study.
If there are many confounders, you can represent them as a single node and list them separately.
If you are using a difference-in-differences design, you do not need to include \(T\) and \(Y\); instead, draw a DAG similar to the one shown in the diff-in-diff I lecture.
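To see why the unconfoundedness assumption matters, here is a minimal simulation sketch under an assumed data-generating process: a single binary confounder `X` raises both the probability of treatment and the outcome. The effect sizes and probabilities are arbitrary choices for illustration, not taken from any real dataset. Because treated units are disproportionately `X = 1`, the naive difference in means is biased, while stratifying on `X` recovers the true effect, since in this simulation all confounding flows through `X`.

```python
import numpy as np

# Hypothetical data-generating process (an assumption for illustration only):
# a binary confounder X raises both the chance of treatment and the outcome.
rng = np.random.default_rng(seed=0)
n = 100_000
true_effect = 2.0

x = rng.binomial(1, 0.5, size=n)                           # confounder
t = rng.binomial(1, np.where(x == 1, 0.8, 0.2))            # treatment depends on X
y = true_effect * t + 3.0 * x + rng.normal(0, 1, size=n)   # outcome depends on T and X

# Naive difference in means ignores X, so it mixes the treatment effect
# with the effect of X on Y.
naive = y[t == 1].mean() - y[t == 0].mean()

# Stratify on X: estimate the effect within each stratum,
# then average the strata weighted by the share of units P(X = x).
stratified = 0.0
for value in (0, 1):
    in_stratum = x == value
    effect_in_stratum = (
        y[in_stratum & (t == 1)].mean() - y[in_stratum & (t == 0)].mean()
    )
    stratified += effect_in_stratum * in_stratum.mean()

print(f"naive: {naive:.2f}, stratified: {stratified:.2f}, truth: {true_effect}")
```

If a relevant confounder were left out of the simulation's adjustment, the stratified estimate would be biased as well, which is exactly what the unconfoundedness discussion should assess for your own data.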
Covariates [1 pt]
Describe your chosen dataset, including how it was collected and the composition of the sample.
Describe how your treatment and outcome variables are measured.
Provide a discussion of covariates in your dataset that could be potential confounders, including brief justifications of why you think they might affect both the treatment and the outcome. If you are using a randomized experiment, you should still discuss covariates, but focus instead on how they may affect the outcome, which can inform potential stratification.
Provide summary statistics (mean, std) of relevant covariates in your study, grouped by treatment status if applicable.
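A minimal pandas sketch of such a summary table is shown below. The toy DataFrame and its column names (`treated`, `age`, `income`) are hypothetical placeholders for your own data; the key step is grouping by the treatment indicator and aggregating each covariate with `mean` and `std`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; replace with your own DataFrame.
# Column names "treated", "age", and "income" are illustrative assumptions.
rng = np.random.default_rng(seed=1)
n = 200
df = pd.DataFrame({
    "treated": rng.binomial(1, 0.4, size=n),
    "age": rng.normal(40, 10, size=n).round(),
    "income": rng.lognormal(mean=10, sigma=0.5, size=n).round(-2),
})

covariates = ["age", "income"]

# Mean and standard deviation of each covariate, split by treatment status.
# The result has one row per treatment group and (covariate, statistic) columns.
summary = df.groupby("treated")[covariates].agg(["mean", "std"])
print(summary.round(2))
```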
Estimation [2 pts]
Provide at least one (but more if you’d like!) relevant figure for your chosen study design. For example:
Randomized experiment: a distribution of covariates by treatment status to show balance
Observational study: a Love plot before and after matching or trimming
Instrumental variable: a bar plot showing the mean covariate values for compliers, never-takers, always-takers, and the overall population
Regression discontinuity: a point plot showing the “jump” in the treatment assignment at the cutoff
Difference-in-differences: a plot evaluating the parallel trends assumption pre-treatment
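For the observational-study example, a Love plot shows one point per covariate for the absolute standardized mean difference (SMD) between treated and control units, typically before and after matching or trimming. The sketch below computes those SMDs with the pooled-standard-deviation convention; the function name `standardized_mean_differences` and the toy data are hypothetical. A commonly used rule of thumb treats an absolute SMD below 0.1 as adequate balance.

```python
import numpy as np
import pandas as pd


def standardized_mean_differences(df, treatment, covariates):
    """Absolute standardized mean difference (SMD) for each covariate.

    Uses the pooled-standard-deviation convention:
    |mean_treated - mean_control| / sqrt((var_treated + var_control) / 2).
    """
    treated = df[df[treatment] == 1]
    control = df[df[treatment] == 0]
    smd = {}
    for col in covariates:
        # pandas .var() uses the sample variance (ddof=1) by default.
        pooled_sd = np.sqrt((treated[col].var() + control[col].var()) / 2)
        smd[col] = abs(treated[col].mean() - control[col].mean()) / pooled_sd
    return pd.Series(smd, name="abs_smd")


# Hypothetical example data; "treated", "age", "income" are illustrative names.
df = pd.DataFrame({
    "treated": [1, 1, 1, 0, 0, 0],
    "age":     [30, 35, 40, 20, 25, 30],
    "income":  [50, 60, 70, 50, 60, 70],
})
print(standardized_mean_differences(df, "treated", ["age", "income"]))
```

Computing these values once on the raw sample and once on the matched or trimmed sample gives the two sets of points that a Love plot displays, for example as a dot plot with one row per covariate.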
Provide at least one mathematically typeset equation for the statistical quantity you will be estimating.
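As one illustration (not the required estimand for every design): if your design assumes unconfoundedness given measured covariates \(X\), together with positivity (overlap), the ATE is identified by the adjustment formula below, which involves only observable quantities. In a randomized experiment, where treatment is independent of the potential outcomes, it reduces to a simple difference in means.

```latex
\begin{aligned}
\text{Observational (adjustment formula):}\quad
  \tau_{\text{ATE}} &= \mathbb{E}_X\!\left[\,\mathbb{E}[Y \mid T = 1, X] - \mathbb{E}[Y \mid T = 0, X]\,\right] \\
\text{Randomized experiment:}\quad
  \tau_{\text{ATE}} &= \mathbb{E}[Y \mid T = 1] - \mathbb{E}[Y \mid T = 0]
\end{aligned}
```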
Describe your chosen estimator(s) for the causal quantity and how you will implement them.
Report the results of your estimation, including the point estimate and 95% confidence intervals.
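A minimal sketch of reporting a point estimate with a 95% confidence interval is shown below, assuming a completely randomized experiment with a binary treatment. It uses the difference-in-means estimator with the (conservative) Neyman standard error and a normal approximation; the function name `difference_in_means` and the toy data are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def difference_in_means(y, t, level=0.95):
    """Difference-in-means estimate with a normal-approximation confidence interval.

    Uses the Neyman variance estimate s1^2 / n1 + s0^2 / n0,
    where s1^2 and s0^2 are sample variances (ddof=1) in each group.
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t)
    y1, y0 = y[t == 1], y[t == 0]
    estimate = y1.mean() - y0.mean()
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return estimate, (estimate - z * se, estimate + z * se)


# Hypothetical toy data for illustration only.
y = [3, 5, 7, 1, 2, 3]
t = [1, 1, 1, 0, 0, 0]
est, (lo, hi) = difference_in_means(y, t)
print(f"estimate = {est:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

For estimators without a simple closed-form variance, such as matching or inverse probability weighting, the bootstrap is a common alternative for constructing the interval.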
If you have chosen one of the following unique deliverables, include it here along with a discussion:
Analysis extension
Simulation study
Additional methods exploration
Interpretation [2 pts]
Interpret your estimation results in the context of the domain you are studying. Are they confirming or refuting prior knowledge? Are the effect sizes large or small? Are they practically significant or inconclusive?
Discuss the population to which your results apply, keeping in mind both the causal quantity your estimator targets and the composition of your dataset sample. For example, if your study design and estimator target the ATE, but your data consist only of individuals in the state of Massachusetts, you should highlight how this changes the interpretation or generalizability of your results.
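As a toy illustration of why the target population matters (the numbers below are hypothetical potential outcomes, which are only fully observable in a simulation): when effects vary across units and the units with larger effects are the ones that end up treated, the ATT can differ substantially from the ATE.

```python
import numpy as np

# Hypothetical potential outcomes for six units (only knowable in a simulation).
# By assumption, the two treated units are the ones that benefit most.
y0 = np.array([10, 10, 10, 10, 10, 10], dtype=float)
y1 = np.array([11, 11, 11, 11, 15, 15], dtype=float)
t = np.array([0, 0, 0, 0, 1, 1])

effects = y1 - y0              # individual treatment effects
ate = effects.mean()           # average over all units
att = effects[t == 1].mean()   # average over treated units only
print(f"ATE = {ate:.2f}, ATT = {att:.2f}")
```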
Reflect on the limitations of your study design and analysis.
If your chosen unique deliverable is the future experiment proposal, include that here as a way to address the limitations of your study design.
Conclude with an overall summary of your analysis and its relevance to the domain you are studying.
What to submit [1 pt]
A final_project.zip file containing your typeset Jupyter Book HTML article, following the same build process as described in Project 3
All source code (.ipynb or .py files) used for data cleaning/preprocessing to reproduce your results
Tip
The published HTML article should still include source code cells. However, you may want to put any data cleaning/preprocessing in a separate file so that your rendered article focuses on the analysis and results.