Lecture 7 live coding


2025-02-25


Law of Total Expectation

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
import ipywidgets as widgets
# Set random seed for reproducibility
rng = np.random.RandomState(42)
n_samples = 10000

Suppose we're looking at average coffee prices globally for two types of coffee beans: Arabica and Robusta.

\[\begin{split} X = \begin{cases} 1 & \text{if coffee is Arabica} \\ 0 & \text{if coffee is Robusta} \end{cases} \end{split}\]
# # generate a random variable X with 80% arabica and 20% robusta
# COMPLETED CELL
# # generate a random variable X with 80% arabica and 20% robusta
# X = rng.choice([1, 0], size=n_samples, p=[0.8, 0.2])

Let \(Y\) be the price of coffee in dollars per pound.

Arabica prices are higher, on average:

\[\begin{split} Y \sim \begin{cases} \mathcal{N}(4.5, \;0.75^2) & \text{if coffee is Arabica} \\ \mathcal{N}(2.0, \;0.25^2) & \text{if coffee is Robusta} \end{cases} \end{split}\]

# Create Y from two normal distributions

# Create dataframe
# COMPLETED CELL
# # Create Y from two normal distributions
# Y = np.where(
#     X == 1,
#     rng.normal(loc=4.5, scale=0.75, size=n_samples),
#     rng.normal(loc=2, scale=0.25, size=n_samples)
# )

# # create dataframe
# coffee_df = pd.DataFrame({
#     'is_arabica': X,
#     'price': Y
# })

Note

The Law of Total Expectation is generally stated as:

\[ E[Y] = E[E[Y|X]] \]

"The expected value of Y equals the expected value of the conditional expectation of Y given X."

Idea: we take a weighted average of the conditional means of Y given X over all possible values of X, weighting each conditional mean by the probability of that value of X.

For binary X, this expands to:

\[ E[Y] = E[Y|X=1]P(X=1) + E[Y|X=0]P(X=0) \]

\(E[E[Y|X]]\) means we:

  1. First compute the conditional means of \(Y\) for each value of \(X\)

  2. Then take the weighted average of these conditional means, weighted by how often each \(X\) occurs
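Before simulating anything, the two steps above can be worked through with the population parameters from the setup. This is a minimal sketch, assuming \(P(X=1) = 0.8\) and the conditional means 4.5 and 2.0 stated earlier; the variable names are illustrative and not part of the lecture code.

```python
# Population parameters taken from the setup above (assumed, not estimated)
p_arabica = 0.8     # P(X = 1)
mean_arabica = 4.5  # E[Y | X = 1]
mean_robusta = 2.0  # E[Y | X = 0]

# E[Y] = E[Y|X=1] P(X=1) + E[Y|X=0] P(X=0)
#      = 4.5 * 0.8 + 2.0 * 0.2 = 3.6 + 0.4
expected_price = mean_arabica * p_arabica + mean_robusta * (1 - p_arabica)

print("Theoretical mean price: ${:.2f}/lb".format(expected_price))
# prints: Theoretical mean price: $4.00/lb
```

A simulated sample mean should land close to this value, though not exactly on it, because of sampling noise.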

Let's now verify the Law of Total Expectation with our coffee data:

# compute the simple mean of Y
actual_mean = 0

# compute the conditional means

# calculate the weighted average of conditional means
law_of_total_expectation = 0


print("Actual mean price: ${:.2f}/lb".format(actual_mean))
print("Law of total expectation: ${:.2f}/lb".format(law_of_total_expectation))
Actual mean price: $0.00/lb
Law of total expectation: $0.00/lb
# COMPLETED CELL
# # compute the simple mean of Y
# actual_mean = coffee_df['price'].mean()

# # compute the conditional means
# p_arabica = np.mean(coffee_df['is_arabica'])
# mean_price_arabica = coffee_df[coffee_df['is_arabica'] == 1]['price'].mean()
# mean_price_robusta = coffee_df[coffee_df['is_arabica'] == 0]['price'].mean()

# # calculate the weighted average of conditional means
# law_of_total_expectation = (mean_price_arabica * p_arabica + 
#                            mean_price_robusta * (1 - p_arabica))


# print("Actual mean price: ${:.2f}/lb".format(actual_mean))
# print("Law of Total Expectation: ${:.2f}/lb".format(law_of_total_expectation))
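The same check can be written with a `groupby`, which extends to an \(X\) with more than two values. The sketch below regenerates a coffee dataset with the parameters above so that it runs on its own; `conditional_means`, `weights`, and `lte_mean` are illustrative names, not part of the lecture code.

```python
import numpy as np
import pandas as pd

# Re-create a coffee dataset with the same assumed parameters as above
rng = np.random.RandomState(42)
n_samples = 10000
X = rng.choice([1, 0], size=n_samples, p=[0.8, 0.2])
Y = np.where(
    X == 1,
    rng.normal(loc=4.5, scale=0.75, size=n_samples),
    rng.normal(loc=2.0, scale=0.25, size=n_samples),
)
coffee_df = pd.DataFrame({"is_arabica": X, "price": Y})

# Step 1: conditional means E[Y | X = x], one per value of X
conditional_means = coffee_df.groupby("is_arabica")["price"].mean()

# Step 2: empirical P(X = x), used as the weights
weights = coffee_df["is_arabica"].value_counts(normalize=True)

# Weighted average; pandas aligns the two Series on their index (0 and 1)
lte_mean = (conditional_means * weights).sum()
actual_mean = coffee_df["price"].mean()

print("Actual mean price: ${:.2f}/lb".format(actual_mean))
print("Law of Total Expectation: ${:.2f}/lb".format(lte_mean))
```

Within a sample, the two numbers agree up to floating-point error, because the group frequencies are exactly the weights that reproduce the overall mean.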

We can also visualize the distribution of the prices by species:

# COMPLETED CELL
# plt.figure(figsize=(10, 6))
# sns.kdeplot(data=coffee_df, x='price', hue='is_arabica')
# plt.axvline(actual_mean, color='black', linestyle='--', 
#             label='Overall mean')
# plt.axvline(mean_price_arabica, color='orange', linestyle='--', 
#             label='Mean price: Arabica')
# plt.axvline(mean_price_robusta, color='blue', linestyle='--', 
#             label='Mean price: Robusta')
# plt.xlabel('Price ($/lb)')
# plt.ylabel('Density')
# plt.title('Distribution of Coffee Prices by Species')
# plt.legend()

Note

Things to notice:

  • The overall mean (black dashed line) must fall between the two conditional means

  • It's pulled closer to the Arabica mean because there's more Arabica (80%)

  • The area under each curve represents the relative frequency of each type


Keyword argument unpacking

By prefixing a dictionary with **, you can unpack the dictionary into keyword arguments.

# COMPLETED CELL
# def add(a, b):
#     return a + b

# d = {'a': 1, 'b': 2}

# add(**d)
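One detail worth seeing in action: the dictionary keys must match the function's parameter names. The following sketch reuses the `add` function from the cell above; the mismatched dictionary `bad` and the `error_message` variable are hypothetical, added only for illustration.

```python
def add(a, b):
    return a + b

d = {'a': 1, 'b': 2}
print(add(**d))  # equivalent to add(a=1, b=2), prints 3

# A key that is not a parameter name raises a TypeError
bad = {'a': 1, 'c': 2}
error_message = None
try:
    add(**bad)
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)
```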

The ** operator can also be used in the function definition, where it collects any keyword arguments passed by the caller into a dictionary:

# COMPLETED CELL
# def key_print(**kwargs):
#     # we see that the type of kwargs is dict
#     print(type(kwargs))
#     print(kwargs)

#     for k, v in kwargs.items():
#         print("key: ", k)
#         print("value: ", v)

# card_ranks = {"J": 11, "Q": 12, "K": 13, "A": 1}
# key_print(**card_ranks)

Note

See the str.format() method for an example of a function that accepts arbitrary keyword arguments through **kwargs.

help(str.format)
Help on method_descriptor:

format(...) unbound builtins.str method
    S.format(*args, **kwargs) -> str
    
    Return a formatted version of S, using substitutions from args and kwargs.
    The substitutions are identified by braces ('{' and '}').
# can give keyword args
print("In this game, {card} is rank {value}".format(value=14, card="A"))
In this game, A is rank 14
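Because `str.format` accepts `**kwargs`, a dictionary can also be unpacked straight into the call. A small sketch, assuming a hypothetical dictionary `card_info` whose keys match the field names in the template:

```python
# Hypothetical dictionary; its keys must match the {field} names below
card_info = {"card": "A", "value": 14}

# Equivalent to .format(card="A", value=14)
message = "In this game, {card} is rank {value}".format(**card_info)
print(message)
# prints: In this game, A is rank 14
```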