Activity 21: Independence testing for causal discovery


2025-12-08

import numpy as np
import pandas as pd

rng = np.random.RandomState(42)
# high number of samples to reduce sampling noise
n_samples = 30000

Run the cell below to generate the simulated data_df used in this activity:

For the purposes of this activity, we will treat two variables as dependent if the absolute value of their correlation is greater than 0.1. Otherwise, we will consider them independent.

We can compute the correlation between any pair of variables by selecting the relevant columns from a dataframe, calling df[['col1', 'col2']].corr(), and reading off an off-diagonal element of the resulting 2x2 matrix (the matrix is symmetric, so both off-diagonal entries are the same). Let’s use this to test the following independence relationships:

Is A independent of B?
Is A independent of C?
Is B independent of C?
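
As a rough sketch of how these checks could be carried out, the snippet below wraps the corr() call in a small helper and applies the 0.1 cutoff. The helper is_independent, the toy_df dataframe, and its columns X, Y, Z are all hypothetical names made up for illustration; the toy data is not the activity's data_df and says nothing about the answer for A, B, and C. It also assumes the cutoff applies to the absolute value of the correlation.

```python
import numpy as np
import pandas as pd

THRESHOLD = 0.1  # dependence cutoff used in this activity


def is_independent(df, col1, col2, threshold=THRESHOLD):
    """Hypothetical helper: True if |corr(col1, col2)| does not exceed the threshold."""
    # .corr() returns a 2x2 matrix; the off-diagonal entry is the pairwise correlation
    corr = df[[col1, col2]].corr().loc[col1, col2]
    return bool(abs(corr) <= threshold)


# Toy data for illustration only (NOT the activity's data_df):
# X and Y are drawn independently, and Z is built from both of them.
toy_rng = np.random.RandomState(0)
n = 30000
x = toy_rng.normal(size=n)
y = toy_rng.normal(size=n)
z = x + y + 0.5 * toy_rng.normal(size=n)
toy_df = pd.DataFrame({"X": x, "Y": y, "Z": z})

for a, b in [("X", "Y"), ("X", "Z"), ("Y", "Z")]:
    corr = toy_df[[a, b]].corr().loc[a, b]
    print(f"corr({a}, {b}) = {corr:.3f} -> independent: {is_independent(toy_df, a, b)}")
```

To apply the same idea to the activity, one could call the helper on data_df with the column pairs (A, B), (A, C), and (B, C), assuming those are the column names in the generated dataframe.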

Input which pair(s) of variables are independent in the PollEverywhere: pollev.com/tliu