Activity 9: Other causal quantities and matching
2025-03-11
```python
import pandas as pd
import numpy as np
```
Part 1: Computing causal quantities from the “ground truth” table
A healthcare provider implemented a voluntary smoking cessation program for patients who smoke. The program included counseling, nicotine replacement therapy, and support groups. The healthcare provider wants to evaluate the program’s effectiveness. The variables are as follows:
T: Participation in the smoking cessation program (1 = participated, 0 = did not participate)
Y: Lung function score (higher is better) measured one year after the program finished
| Unit | T | Y(1) | Y(0) |
|---|---|---|---|
| 0 | 1 | 80 | 60 |
| 1 | 1 | 80 | 70 |
| 2 | 1 | 60 | 10 |
| 3 | 1 | 30 | 30 |
| 4 | 0 | 50 | 40 |
| 5 | 0 | 30 | 40 |
| 6 | 0 | 70 | 70 |
| 7 | 0 | 60 | 50 |
| 8 | 0 | 50 | 20 |
| 9 | 0 | 50 | 10 |
Suppose we are omniscient or in possession of a time machine giving us access to the ground truth table above, where we can see the potential outcomes for each unit. The code below generates a dataframe with this table:
```python
part1_df = pd.DataFrame({
    'T': [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    'Y1': [80, 80, 60, 30, 50, 30, 70, 60, 50, 50],
    'Y0': [60, 70, 10, 30, 40, 40, 70, 50, 20, 10]
})
part1_df
```
|   | T | Y1 | Y0 |
|---|---|---|---|
| 0 | 1 | 80 | 60 |
| 1 | 1 | 80 | 70 |
| 2 | 1 | 60 | 10 |
| 3 | 1 | 30 | 30 |
| 4 | 0 | 50 | 40 |
| 5 | 0 | 30 | 40 |
| 6 | 0 | 70 | 70 |
| 7 | 0 | 60 | 50 |
| 8 | 0 | 50 | 20 |
| 9 | 0 | 50 | 10 |
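What is the estimated average treatment effect (ATE)? Recall that the ATE is given by:

\[
\text{ATE} = E[Y(1) - Y(0)], \qquad \widehat{\text{ATE}} = \frac{1}{N}\sum_{i=1}^{N}\left(Y_i(1) - Y_i(0)\right),
\]

where \(N\) is the number of units.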
Your response: TODO
```python
# TODO your code here
```
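A minimal sketch of the ATE formula in pandas: average the unit-level effects \(Y_i(1) - Y_i(0)\) over all units. So that the exercise above is still left for you, it runs on a small hypothetical table `toy_df` (not the activity data) that uses the same column names as `part1_df`; `ate` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical potential-outcomes table (NOT the activity data),
# using the same column names as part1_df.
toy_df = pd.DataFrame({
    'T':  [1, 1, 0, 0, 0],
    'Y1': [50, 40, 30, 60, 20],
    'Y0': [30, 40, 20, 50, 30],
})

def ate(df):
    """Average of the unit-level effects Y(1) - Y(0) over all units."""
    return (df['Y1'] - df['Y0']).mean()

print(ate(toy_df))
```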
Next, estimate the ATT and ATU.
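The ATT is given by:

\[
\text{ATT} = E[Y(1) - Y(0) \mid T = 1], \qquad \widehat{\text{ATT}} = \frac{1}{N_1}\sum_{i:\,T_i = 1}\left(Y_i(1) - Y_i(0)\right)
\]

The ATU is given by:

\[
\text{ATU} = E[Y(1) - Y(0) \mid T = 0], \qquad \widehat{\text{ATU}} = \frac{1}{N_0}\sum_{i:\,T_i = 0}\left(Y_i(1) - Y_i(0)\right)
\]

where \(N_1\) and \(N_0\) are the numbers of treated and untreated units.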
Send your ATT estimate to PollEverywhere:
Your ATT estimate: pollev.com/tliu
Your ATU estimate: TODO
```python
# TODO your code here
```
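The ATT and ATU differ from the ATE only in which units are averaged over, so a boolean mask on `T` is enough. This sketch again uses a hypothetical `toy_df` (not the activity data) with the same columns as `part1_df`, and it also checks that the ATE is the share-weighted average of the ATT and ATU.

```python
import pandas as pd

# Hypothetical potential-outcomes table (NOT the activity data).
toy_df = pd.DataFrame({
    'T':  [1, 1, 0, 0, 0],
    'Y1': [50, 40, 30, 60, 20],
    'Y0': [30, 40, 20, 50, 30],
})

effect = toy_df['Y1'] - toy_df['Y0']   # unit-level effects Y_i(1) - Y_i(0)
treated = toy_df['T'] == 1             # boolean mask selecting treated units

att = effect[treated].mean()   # average effect among treated units (T = 1)
atu = effect[~treated].mean()  # average effect among untreated units (T = 0)

# Sanity check: ATE = P(T=1) * ATT + P(T=0) * ATU
share_treated = treated.mean()
ate_check = share_treated * att + (1 - share_treated) * atu
print(att, atu, ate_check, effect.mean())
```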
Part 2: Matching on one covariate
Now suppose we only have access to the observed data table, and we want to estimate the ATT by matching on one covariate, smoking frequency:
X: smoking frequency (number of cigarettes per day)
| Unit | T | Y | X |
|---|---|---|---|
| 0 | 1 | 80 | 5 |
| 1 | 1 | 80 | 10 |
| 2 | 1 | 60 | 8 |
| 3 | 1 | 30 | 1 |
| 4 | 0 | 40 | 5 |
| 5 | 0 | 40 | 1 |
| 6 | 0 | 70 | 8 |
| 7 | 0 | 50 | 13 |
| 8 | 0 | 20 | 10 |
| 9 | 0 | 10 | 16 |
Below is the dataframe for the observed data:
```python
part2_df = pd.DataFrame({
    'T': [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    'Y': [80, 80, 60, 30, 40, 40, 70, 50, 20, 10],
    'X': [5, 10, 8, 1, 5, 1, 8, 13, 10, 16]
})
part2_df
```
|   | T | Y | X |
|---|---|---|---|
| 0 | 1 | 80 | 5 |
| 1 | 1 | 80 | 10 |
| 2 | 1 | 60 | 8 |
| 3 | 1 | 30 | 1 |
| 4 | 0 | 40 | 5 |
| 5 | 0 | 40 | 1 |
| 6 | 0 | 70 | 8 |
| 7 | 0 | 50 | 13 |
| 8 | 0 | 20 | 10 |
| 9 | 0 | 10 | 16 |
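The observed table is the Part 1 table with each unit's unobserved potential outcome hidden: every unit reveals only the outcome for the treatment it actually received, \(Y_i = T_i Y_i(1) + (1 - T_i) Y_i(0)\). The sketch below checks this relationship on the activity's own data (the covariate `X` appears only in Part 2).

```python
import pandas as pd

# Potential outcomes from Part 1 and the observed data from Part 2.
part1_df = pd.DataFrame({
    'T': [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    'Y1': [80, 80, 60, 30, 50, 30, 70, 60, 50, 50],
    'Y0': [60, 70, 10, 30, 40, 40, 70, 50, 20, 10]
})
part2_df = pd.DataFrame({
    'T': [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    'Y': [80, 80, 60, 30, 40, 40, 70, 50, 20, 10],
    'X': [5, 10, 8, 1, 5, 1, 8, 13, 10, 16]
})

# Each unit shows only the potential outcome for the treatment it received:
# Y = T * Y(1) + (1 - T) * Y(0)
observed_Y = part1_df['T'] * part1_df['Y1'] + (1 - part1_df['T']) * part1_df['Y0']
print((observed_Y == part2_df['Y']).all())
```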
Match each treated unit to the control unit that has the same smoking frequency by filling in the list below manually, listing the matched control units in the same order as the treated units:
```python
treated_units = [0, 1, 2, 3]
control_units = [ ]
```
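The manual matching step can be sketched as exact 1:1 matching on `X`. The helper `exact_match` is hypothetical (it is not a pandas function); matching without replacement and taking the first available match are assumptions of this sketch; and it runs on a hypothetical `toy_df` rather than `part2_df`.

```python
import pandas as pd

# Hypothetical observed data (NOT the activity data).
toy_df = pd.DataFrame({
    'T': [1, 1, 0, 0, 0],
    'Y': [50, 40, 20, 35, 30],
    'X': [3, 7, 7, 2, 3],
})

def exact_match(df, treated_units, covariate='X'):
    """For each treated unit, in order, pick the first unused control unit
    with exactly the same covariate value (1:1, without replacement)."""
    available = list(df.index[df['T'] == 0])
    control_units = []
    for i in treated_units:
        matches = [j for j in available if df.loc[j, covariate] == df.loc[i, covariate]]
        if not matches:
            raise ValueError(f"no exact match for treated unit {i}")
        control_units.append(matches[0])
        available.remove(matches[0])  # each control unit is used at most once
    return control_units

print(exact_match(toy_df, [0, 1]))
```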
Next, estimate the ATT as the difference in mean outcomes between the matched treated units and the matched control units.
You can use `df.loc[row_indices, column]` to select rows by their index labels and a column by its name. For example, `part2_df.loc[[0, 1, 2], 'Y']` selects rows 0, 1, and 2 of column `'Y'`.
Your ATT estimate: pollev.com/tliu
```python
# TODO compute difference-in-means for outcome Y between treated and control units
```
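Once the two lists are filled in, the difference in means takes one `.loc` selection per group. The sketch below uses a hypothetical `toy_df` and hypothetical matched lists, not the activity's answer.

```python
import pandas as pd

# Hypothetical observed data and matched lists (NOT the activity answer).
toy_df = pd.DataFrame({
    'T': [1, 1, 0, 0, 0],
    'Y': [50, 40, 20, 35, 30],
    'X': [3, 7, 7, 2, 3],
})
treated_units = [0, 1]
control_units = [4, 2]  # i-th control unit is the match for the i-th treated unit

# .loc selects rows by index label and a column by name
treated_mean = toy_df.loc[treated_units, 'Y'].mean()
control_mean = toy_df.loc[control_units, 'Y'].mean()
att_hat = treated_mean - control_mean
print(att_hat)
```

With 1:1 matching, this difference in means equals the average of the within-pair differences (here \(50 - 30\) and \(40 - 20\), both 20).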
Part 3: Propensity score matching
Suppose we instead want to match on the propensity score \(e_i(X)\), the estimated probability that unit \(i\) receives treatment given its covariates \(X\).
| Unit | T | Y | e(X) |
|---|---|---|---|
| 0 | 1 | 80 | 0.88 |
| 1 | 1 | 80 | 0.6 |
| 2 | 1 | 60 | 0.2 |
| 3 | 1 | 30 | 0.8 |
| 4 | 0 | 40 | 0.2 |
| 5 | 0 | 40 | 0.82 |
| 6 | 0 | 70 | 0.58 |
| 7 | 0 | 50 | 0.1 |
| 8 | 0 | 20 | 0.05 |
| 9 | 0 | 10 | 0.72 |
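The table gives \(e(X)\) directly and does not say how it was estimated. One common choice, assumed here, is logistic regression of \(T\) on the covariates. The sketch below fits it by Newton-Raphson using NumPy alone, on hypothetical data with a single covariate; `fit_logistic` is a hypothetical helper, not a library function.

```python
import numpy as np

# Hypothetical data (NOT the activity data): one covariate and a binary treatment.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
t = np.array([0, 0, 1, 0, 1, 0, 1, 1], dtype=float)

def fit_logistic(x, t, n_iter=25):
    """Maximum-likelihood logistic regression P(T=1 | x) via Newton-Raphson."""
    A = np.column_stack([np.ones_like(x), x])  # intercept column + covariate
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))         # current fitted probabilities
        grad = A.T @ (t - p)                        # gradient of the log-likelihood
        hess = A.T @ (A * (p * (1 - p))[:, None])   # negative Hessian, A^T W A
        beta += np.linalg.solve(hess, grad)         # Newton step
    return beta

beta = fit_logistic(x, t)
e_hat = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x)))  # estimated e(X) per unit
print(np.round(e_hat, 2))
```

At the maximum-likelihood fit with an intercept, the estimated scores average to the observed share of treated units, which is a quick sanity check.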
Let’s again match each treated unit to a control unit, this time to the one with the closest propensity score, by filling in the list of units manually below:

1. Begin with the first treated unit (unit 0), and match it to the control unit with the closest propensity score.
2. Remove the matched control unit from further consideration. (Aside: you can cross out text in Markdown using double tildes, e.g. `~~crossed out~~`.)
3. Repeat until all treated units are matched.
```python
treated_units = [0, 1, 2, 3]
control_units = [ ]
```
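The greedy procedure above can be sketched as nearest-neighbor matching on the propensity score without replacement. `greedy_ps_match` is a hypothetical helper, the data are a hypothetical `toy_df` (column `e` stands in for \(e(X)\)), and ties are broken by taking the first control unit in index order, which is an assumption of this sketch.

```python
import pandas as pd

# Hypothetical data (NOT the activity data); 'e' plays the role of e(X).
toy_df = pd.DataFrame({
    'T': [1, 1, 0, 0, 0],
    'Y': [50, 40, 20, 35, 30],
    'e': [0.7, 0.5, 0.6, 0.3, 0.9],
})

def greedy_ps_match(df, treated_order, score='e'):
    """Greedy 1:1 nearest-neighbor matching without replacement:
    visit treated units in the given order and give each one the closest
    still-available control unit by propensity score."""
    available = list(df.index[df['T'] == 0])
    matches = {}
    for i in treated_order:
        # absolute propensity-score distance to every remaining control unit
        dist = (df.loc[available, score] - df.loc[i, score]).abs()
        j = int(dist.idxmin())   # first control unit with the smallest distance
        matches[i] = j
        available.remove(j)      # a matched control unit is no longer available
    return matches

print(greedy_ps_match(toy_df, [0, 1]))  # forward order
print(greedy_ps_match(toy_df, [1, 0]))  # reverse order
```

Because each matched control unit is removed, the result can depend on the order in which treated units are visited: in this toy example, unit 0 is matched to unit 2 in forward order but to unit 4 in reverse order.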
Suppose that we had instead begun matching in reverse order, starting with the last treated unit (unit 3). Which control unit would be matched to unit 3 instead?
Your response: pollev.com/tliu