Lecture 7 live coding


2025-02-25

Law of Total Expectation

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
import ipywidgets as widgets

# Set random seed for reproducibility
rng = np.random.RandomState(42)
n_samples = 10000

Suppose we're looking at global coffee prices for two types of coffee beans, Arabica and Robusta. Let $X$ indicate the type of bean:

$$X = \begin{cases} 1 & \text{if coffee is Arabica} \\ 0 & \text{if coffee is Robusta} \end{cases}$$

# generate a random variable X with 80% Arabica and 20% Robusta

# COMPLETED CELL
# # generate a random variable X with 80% arabica and 20% robusta
# X = rng.choice([1, 0], size=n_samples, p=[0.8, 0.2])
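Because $X$ is an indicator (0/1) variable, its sample mean is simply the fraction of Arabica draws, which estimates $P(X = 1) = 0.8$. This is what later lets the notebook estimate p_arabica with np.mean. A minimal sketch, assuming NumPy's newer Generator API (np.random.default_rng) rather than the legacy RandomState used above; the seed 0 is arbitrary.

```python
import numpy as np

# Assumption: Generator API with an arbitrary seed; the notebook uses np.random.RandomState(42)
rng = np.random.default_rng(0)
n_samples = 10_000

# X = 1 for Arabica (probability 0.8), X = 0 for Robusta (probability 0.2)
X = rng.choice([1, 0], size=n_samples, p=[0.8, 0.2])

# The mean of a 0/1 variable is the share of 1s, i.e. an estimate of P(X = 1)
p_arabica_hat = X.mean()
print(f"Estimated P(X = 1): {p_arabica_hat:.3f}")
```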

Let $Y$ be the price of coffee in dollars per pound.

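Arabica prices are higher on average:

$$Y \sim \begin{cases} N(4.5,\ 0.75^{2}) & \text{if coffee is Arabica} \\ N(2.0,\ 0.25^{2}) & \text{if coffee is Robusta} \end{cases}$$

The second parameter of each normal is the variance, so the standard deviations are 0.75 and 0.25 (the scale values used in the code).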

# Create Y from two normal distributions

# Create dataframe

# COMPLETED CELL
# # Create Y from two normal distributions
# Y = np.where(
#     X == 1,
#     rng.normal(loc=4.5, scale=0.75, size=n_samples),
#     rng.normal(loc=2, scale=0.25, size=n_samples)
# )

# # create dataframe
# coffee_df = pd.DataFrame({
#     'is_arabica': X,
#     'price': Y
# })
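As a sanity check on the simulation, the sample mean and standard deviation of price within each group should land close to the parameters above: about 4.5 and 0.75 for Arabica, 2.0 and 0.25 for Robusta. A sketch that rebuilds a comparable coffee_df, again assuming the Generator API and an arbitrary seed:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumption: arbitrary seed, Generator API
n_samples = 10_000

X = rng.choice([1, 0], size=n_samples, p=[0.8, 0.2])
# Draw a candidate price from each distribution for every row, keep the one matching X
Y = np.where(
    X == 1,
    rng.normal(loc=4.5, scale=0.75, size=n_samples),  # Arabica: N(4.5, 0.75^2)
    rng.normal(loc=2.0, scale=0.25, size=n_samples),  # Robusta: N(2.0, 0.25^2)
)
coffee_df = pd.DataFrame({"is_arabica": X, "price": Y})

# Sample mean and standard deviation of price within each group (index: 0, 1)
summary = coffee_df.groupby("is_arabica")["price"].agg(["mean", "std"])
print(summary)
```

Drawing both full-size arrays throws away half of the draws, but it keeps the construction vectorized; that is the trade-off np.where makes here.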

Note

The Law of Total Expectation is generally stated as:

$$E[Y] = E\big[E[Y \mid X]\big]$$

“The expected value of Y equals the expected value of the conditional expectation of Y given X”

Idea: we average the conditional means of Y given X over all possible values of X, weighting each one by the probability of that value of X.

For binary X, this expands to:

$$E[Y] = E[Y \mid X = 1]\,P(X = 1) + E[Y \mid X = 0]\,P(X = 0)$$
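Plugging in the parameters used to simulate the coffee data, $P(X = 1) = 0.8$, $E[Y \mid X = 1] = 4.5$ and $E[Y \mid X = 0] = 2.0$, gives the theoretical mean price that the sample mean should approximate:

```latex
\begin{aligned}
E[Y] &= E[Y \mid X = 1]\,P(X = 1) + E[Y \mid X = 0]\,P(X = 0) \\
     &= 4.5 \times 0.8 + 2.0 \times 0.2 \\
     &= 3.6 + 0.4 = 4.0 \text{ dollars per pound}
\end{aligned}
```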

$E[E[Y \mid X]]$ means we:

1. First compute the conditional means of $Y$ for each value of $X$.
2. Then take the weighted average of these conditional means, weighted by how often each value of $X$ occurs.

Let’s now verify the Law of Total Expectation with our coffee data:

# compute the simple mean of Y
actual_mean = 0

# compute the conditional means

# calculate the weighted average of conditional means
law_of_total_expectation = 0


print("Actual mean price: ${:.2f}/lb".format(actual_mean))
print("Law of total expectation: ${:.2f}/lb".format(law_of_total_expectation))

Actual mean price: $0.00/lb
Law of total expectation: $0.00/lb

# COMPLETED CELL
# # compute the simple mean of Y
# actual_mean = coffee_df['price'].mean()

# # compute P(X = 1) and the conditional means
# p_arabica = np.mean(coffee_df['is_arabica'])
# mean_price_arabica = coffee_df[coffee_df['is_arabica'] == 1]['price'].mean()
# mean_price_robusta = coffee_df[coffee_df['is_arabica'] == 0]['price'].mean()

# # calculate the weighted average of conditional means
# law_of_total_expectation = (mean_price_arabica * p_arabica + 
#                            mean_price_robusta * (1 - p_arabica))


# print("Actual mean price: ${:.2f}/lb".format(actual_mean))
# print("Law of Total Expectation: ${:.2f}/lb".format(law_of_total_expectation))
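The same check can be written more compactly with pandas: value_counts(normalize=True) gives the empirical $P(X = x)$ and groupby gives the conditional means. Worth noticing: for sample means the identity holds exactly (up to floating-point rounding), not just approximately, because weighting each group mean by its share of the rows recovers the overall sum divided by n. A sketch, assuming a freshly simulated coffee_df built the same way as above with an arbitrary seed:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumption: arbitrary seed
n_samples = 10_000
X = rng.choice([1, 0], size=n_samples, p=[0.8, 0.2])
Y = np.where(X == 1,
             rng.normal(4.5, 0.75, size=n_samples),
             rng.normal(2.0, 0.25, size=n_samples))
coffee_df = pd.DataFrame({"is_arabica": X, "price": Y})

# Empirical P(X = x), indexed by the value of is_arabica
p_x = coffee_df["is_arabica"].value_counts(normalize=True)
# Conditional means E[Y | X = x], indexed the same way
cond_means = coffee_df.groupby("is_arabica")["price"].mean()

# Weighted average of conditional means; pandas aligns the two Series on their index
law_of_total_expectation = (cond_means * p_x).sum()
actual_mean = coffee_df["price"].mean()

print(f"Actual mean price: ${actual_mean:.2f}/lb")
print(f"Law of Total Expectation: ${law_of_total_expectation:.2f}/lb")
```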

We can also visualize the distribution of the prices by species:

# COMPLETED CELL
# plt.figure(figsize=(10, 6))
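# # map each hue level to the same color as its mean line below
# sns.kdeplot(data=coffee_df, x='price', hue='is_arabica',
#             palette={1: 'orange', 0: 'blue'})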
# plt.axvline(actual_mean, color='black', linestyle='--', 
#             label='Overall mean')
# plt.axvline(mean_price_arabica, color='orange', linestyle='--', 
#             label='Mean price: Arabica')
# plt.axvline(mean_price_robusta, color='blue', linestyle='--', 
#             label='Mean price: Robusta')
# plt.xlabel('Price ($/lb)')
# plt.ylabel('Density')
# plt.title('Distribution of Coffee Prices by Species')
# plt.legend()

Note

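Things to notice:

- The overall mean (black dashed line) must fall between the two conditional means, because it is a weighted average of them with nonnegative weights that sum to 1.
- It is pulled closer to the Arabica mean because most of the coffee is Arabica (80%).
- The area under each curve represents the relative frequency of each type, since by default (common_norm=True) seaborn's kdeplot scales each group's curve by that group's share of the data.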
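Where exactly the overall mean lands can be read off the binary form of the law. Writing $p = P(X = 1)$, it rearranges to $E[Y] = E[Y \mid X = 0] + p\,(E[Y \mid X = 1] - E[Y \mid X = 0])$, so the overall mean sits a fraction $p$ of the way from the Robusta mean to the Arabica mean: $2.0 + 0.8 \times 2.5 = 4.0$. A small sketch using the theoretical values (mixture_mean is a hypothetical helper name for illustration):

```python
def mixture_mean(p, mean_1, mean_0):
    """Weighted average of two conditional means with weights p and 1 - p."""
    return p * mean_1 + (1 - p) * mean_0

p = 0.8             # P(X = 1): share of Arabica
mean_arabica = 4.5  # E[Y | X = 1]
mean_robusta = 2.0  # E[Y | X = 0]

overall = mixture_mean(p, mean_arabica, mean_robusta)

# Fraction of the way from the Robusta mean to the Arabica mean; equals p
fraction = (overall - mean_robusta) / (mean_arabica - mean_robusta)
print(f"overall mean = {overall:.2f}, fraction of the way = {fraction:.2f}")
```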

Keyword argument unpacking

By prefixing a dictionary with ** in a function call, you can unpack the dictionary into keyword arguments.

# COMPLETED CELL
# def add(a, b):
#     return a + b

# d = {'a': 1, 'b': 2}

# add(**d)
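add(**d) is equivalent to writing add(a=1, b=2): each key is used as a parameter name and its value as the argument. So the keys have to match the function's parameter names, and a key that matches none of them raises a TypeError. Unpacking can also be combined with ordinary positional arguments. A short sketch (the dictionaries partial and bad are illustrative):

```python
def add(a, b):
    return a + b

d = {"a": 1, "b": 2}
result_unpacked = add(**d)       # same as add(a=1, b=2)
result_explicit = add(a=1, b=2)

# Mix an explicit positional argument with an unpacked dict
partial = {"b": 10}
result_mixed = add(5, **partial)  # same as add(5, b=10)

# A key that doesn't match a parameter name is rejected
bad = {"a": 1, "c": 2}
caught = False
try:
    add(**bad)
except TypeError as err:
    caught = True
    print("TypeError:", err)
```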

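The ** operator can also be used in a function definition, where it collects the keyword arguments passed to the function into a dictionary (conventionally named kwargs):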

# COMPLETED CELL
# def key_print(**kwargs):
#     # we see that the type of kwargs is dict
#     print(type(kwargs))
#     print(kwargs)

#     for k, v in kwargs.items():
#         print("key: ", k)
#         print("value: ", v)

# card_ranks = {"J": 11, "Q": 12, "K": 13, "A": 1}
# key_print(**card_ranks)
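When a function has both named parameters and **kwargs, a keyword argument that matches a named parameter is bound to it as usual; only the leftovers are collected into kwargs. A sketch with a hypothetical describe_card function:

```python
def describe_card(name, **kwargs):
    # 'name' is bound normally; every other keyword argument lands in kwargs
    extras = ", ".join(f"{k}={v}" for k, v in kwargs.items())
    return f"{name} ({extras})" if extras else name

print(describe_card("A", rank=1, suit="spades"))   # A (rank=1, suit=spades)
print(describe_card(**{"name": "K", "rank": 13}))  # K (rank=13)
print(describe_card("Q"))                          # Q
```

In the second call the dictionary supplies name as well, so ** unpacking at the call site and ** collection in the definition work together.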

Note

The built-in str.format() method is a real-world example: its signature, S.format(*args, **kwargs), accepts arbitrary keyword arguments.

help(str.format)

Help on method_descriptor:

format(...) unbound builtins.str method
    S.format(*args, **kwargs) -> str
    
    Return a formatted version of S, using substitutions from args and kwargs.
    The substitutions are identified by braces ('{' and '}').

# can give keyword args
print("In this game, {card} is rank {value}".format(value=14, card="A"))

In this game, A is rank 14
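Because str.format accepts **kwargs, a dictionary whose keys match the placeholder names can be unpacked straight into it. The related str.format_map(mapping) takes the mapping directly, without **. Unused keyword arguments are ignored, while a missing placeholder name raises KeyError. A sketch (card_info is an illustrative name):

```python
template = "In this game, {card} is rank {value}"
card_info = {"card": "A", "value": 14}

via_unpacking = template.format(**card_info)
via_format_map = template.format_map(card_info)
print(via_unpacking)   # In this game, A is rank 14
print(via_format_map)  # In this game, A is rank 14

# Extra keyword arguments that no placeholder uses are ignored
with_extra = template.format(**card_info, suit="spades")

# A placeholder with no matching keyword raises KeyError
try:
    template.format(card="A")
except KeyError as err:
    print("KeyError:", err)
```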
