Thanks to this thread I recently learned about the JointDistribution.sample_distributions member function, and it's something I've really wanted for a while.
My workflow used to be as described in the thread, and I've been annoyed that I had to re-implement parts of my model after obtaining posterior samples in order to do stuff like posterior predictive checks. Simply running model.sample() would draw new values from the priors of my parameters, when what I actually wanted was to use the posterior samples I already had. An example of this is in my ordered logistic post.
Anyway, the value argument in sample_distributions handles all of this nicely; here's an example of how it works with a Bernoulli GLM:
1 Setup
import tensorflow as tf
import tensorflow_probability as tfp
import time as tm
import matplotlib.pyplot as plt
import numpy as np

plt.rcParams["figure.figsize"] = (12, 8)

tfd = tfp.distributions
tfl = tf.linalg

N = 10_000
P = 20

intercept_true = tfd.Normal(loc=0.666, scale=1.0).sample()
coefficients_true = tfd.Normal(loc=0.0, scale=3.14).sample(P)
x = tfd.Normal(loc=0.0, scale=1.0).sample([N, P])
y = tfd.Bernoulli(intercept_true + tfl.matvec(x, coefficients_true)).sample()

Root = tfd.JointDistributionCoroutine.Root

print(f"tf: {tf.__version__}")
print(f"tfp: {tfp.__version__}")
tf: 2.2.0-dev20200416
tfp: 0.11.0-dev20200416
2 Model definition
def model():
    intercept = yield Root(tfd.Normal(loc=0.0, scale=10.0))
    coefficients = yield Root(
        tfd.MultivariateNormalLinearOperator(scale=tfl.LinearOperatorIdentity(P))
    )
    y = yield tfd.Independent(
        tfd.Bernoulli(intercept[..., tf.newaxis] + tfl.matvec(x, coefficients)),
        reinterpreted_batch_ndims=1,
    )


joint_dist = tfd.JointDistributionCoroutine(model)
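As a quick sanity check (not in the original post), a single draw from the joint distribution should give a scalar intercept, P coefficients and N Bernoulli outcomes:

single_draw = joint_dist.sample()
print([tuple(part.shape) for part in single_draw])  # expect [(), (20,), (10000,)]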
3 Sampling the prior predictive distribution
def plot_ppc(y, y_rep):
    sampled_counts = [np.bincount(sample, minlength=2) for sample in y_rep]
    observed_counts = np.bincount(y)
    sampled_mean = np.mean(sampled_counts, axis=0)
    sampled_quantiles = np.quantile(sampled_counts, q=[0.025, 0.975], axis=0)
    low_error = sampled_mean - sampled_quantiles[0]
    high_error = sampled_quantiles[1] - sampled_mean
    plt.bar(
        x=[0, 1],
        height=observed_counts,
        label="observed data",
        color="blue",
    )
    plt.errorbar(
        x=[0, 1],
        y=sampled_mean,
        yerr=[low_error, high_error],
        fmt="o",
        label="simulated data",
        color="red",
        capsize=5,
    )
    plt.legend(loc="upper left")
    plt.xlabel("y")
    plt.ylabel("frequency")


prior_intercept, prior_coefficients, prior_y = joint_dist.sample(1_000)
plot_ppc(y, prior_y)
4 Run MCMC
def target_log_prob_fn(*state):
    return joint_dist.log_prob(list(state) + [y])


def trace_fn(states, pkr):
    return (
        pkr.inner_results.target_log_prob,
        pkr.inner_results.leapfrogs_taken,
        pkr.inner_results.has_divergence,
        pkr.inner_results.energy,
        pkr.inner_results.log_accept_ratio,
    )


def step_size_setter_fn(pkr, new_step_size):
    return pkr._replace(step_size=new_step_size)


def step_size_getter_fn(pkr):
    return pkr.step_size


def log_accept_prob_getter_fn(pkr):
    return pkr.log_accept_ratio


num_chains = 6
initial_state = list(joint_dist.sample(num_chains)[:-1])
initial_step_size = [0.1] * len(initial_state)

nuts = tfp.mcmc.NoUTurnSampler(target_log_prob_fn, step_size=initial_step_size)
adaptive_nuts = tfp.mcmc.DualAveragingStepSizeAdaptation(
    inner_kernel=nuts,
    num_adaptation_steps=800,
    target_accept_prob=0.8,
    step_size_setter_fn=step_size_setter_fn,
    step_size_getter_fn=step_size_getter_fn,
    log_accept_prob_getter_fn=log_accept_prob_getter_fn,
)


@tf.function(autograph=False, experimental_compile=True)
def run_mcmc():
    return tfp.mcmc.sample_chain(
        num_results=1_000,
        current_state=initial_state,
        num_burnin_steps=1_000,
        kernel=adaptive_nuts,
        trace_fn=trace_fn,
    )


s = tm.time()
mcmc_samples, mcmc_stats = run_mcmc()
e = tm.time()
print(f"run_mcmc finished in {e - s:.2f} seconds")
run_mcmc finished in 28.60 seconds
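Before using these draws it's worth a quick look at the sampler diagnostics. This check isn't part of the original post; it assumes the trace_fn ordering above (index 2 is has_divergence) and uses tfp.mcmc.potential_scale_reduction:

# Count divergent transitions across all post-warmup draws and chains.
n_divergent = tf.reduce_sum(tf.cast(mcmc_stats[2], tf.int32))
print(f"divergences: {n_divergent}")

# Per-parameter R-hat; mcmc_samples already has the (draws, chains, ...) layout
# that potential_scale_reduction expects.
rhat = tfp.mcmc.potential_scale_reduction(mcmc_samples)
print([tf.reduce_max(r).numpy() for r in rhat])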
5 Sampling the posterior predictive distribution
And here is the magic:
posterior_distributions, posterior_samples = joint_dist.sample_distributions(value=mcmc_samples + [None])
posterior_samples
now contains the samples the MCMC routine gave us earlier for the
intercept and coefficients parameters:
print(f"intercept: {np.allclose(posterior_samples[0], mcmc_samples[0])}") print(f"coefficients: {np.allclose(posterior_samples[1], mcmc_samples[1])}")
intercept: True
coefficients: True
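The last element, on the other hand, holds fresh draws from the Bernoulli node. A quick shape check (not in the original post) shows they inherit the (draws, chains) batch shape of the MCMC samples, which is why they get flattened before plotting below:

print(posterior_samples[-1].shape)  # expect (1000, 6, 10000): (draws, chains, N)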
And we have sampled from our Bernoulli observation model using our posterior/MCMC samples, so the (very basic) posterior predictive check now looks like:
posterior_y = tf.reshape(posterior_samples[-1], [-1, N])
plot_ppc(y, posterior_y)
Additionally, we can easily calculate useful stuff like the "log-likelihood" matrix needed for LOO:
log_lik = posterior_distributions[-1].distribution.log_prob(y)
print(log_lik.shape)  # (draws, chains, num_observations)
(1000, 6, 10000)
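From here the matrix can go straight into something like ArviZ to actually compute LOO. This sketch isn't part of the original post and assumes ArviZ's from_dict / loo API, which expects a (chain, draw, ...) ordering:

import arviz as az

# Transpose (draws, chains, N) -> (chains, draws, N) before handing it to ArviZ.
idata = az.from_dict(log_likelihood={"y": np.transpose(log_lik.numpy(), (1, 0, 2))})
print(az.loo(idata))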
It always annoyed me in Stan that I had to re-write the observation model in the generated quantities block to calculate this, and the same was true for TensorFlow Probability until now.