After my previous post there was some discussion on the Stan forums, so I thought I would have another bash at seeing how fast I can make TensorFlow and Stan find maximum likelihood estimates for a fairly large problem.
This time I'm using the specialised functions for fitting GLMs, namely `bernoulli_logit_glm` in Stan and `tfp.glm.fit` in TensorFlow Probability. Both of these make the code faster and easier to understand than in the previous post. Win.
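Both are fitting the same Bernoulli-logit (logistic regression) model; they just fuse the linear predictor and the likelihood into one specialised call. As a rough NumPy sketch (my own illustration, not code from either library), the log-likelihood being maximised looks like this:

```python
import numpy as np


def bernoulli_logit_log_lik(alpha, beta, x, y):
    """Log-likelihood of y[i] ~ Bernoulli(sigmoid(alpha + x[i] @ beta))."""
    eta = alpha + x @ beta  # linear predictor
    # sum_i [y_i * eta_i - log(1 + exp(eta_i))], written stably with logaddexp
    return np.sum(y * eta - np.logaddexp(0.0, eta))
```

Stan's GLM functions and `tfp.glm.fit` each ship optimised implementations of exactly this kind of objective, which is what makes them quicker than spelling the model out by hand.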
To make the TensorFlow version as fast as possible, I call the fitting function twice: the first call traces and builds the graph, and I don't want to time that:
```python
import tensorflow_probability as tfp
import tensorflow as tf
import time as tm

tfd = tfp.distributions
tfl = tf.linalg

N = 1_000_000
P = 250


@tf.function(
    autograph=False,
    input_signature=[
        tf.TensorSpec(shape=tf.TensorShape([N, P + 1]), dtype=tf.float32),
        tf.TensorSpec(shape=tf.TensorShape([N]), dtype=tf.float32),
    ],
)
def fit(model_matrix, response):
    return tfp.glm.fit(
        model_matrix=model_matrix, response=response, model=tfp.glm.Bernoulli()
    )


tf.random.set_seed(123)

# simulate a design matrix and Bernoulli responses from known parameters
alpha_true = tfd.Normal(loc=0.666, scale=1.0).sample()
beta_true = tfd.Normal(loc=0.0, scale=3.14).sample(P)
x = tfd.Normal(loc=0.0, scale=1.0).sample([N, P])
y = tfd.Bernoulli(alpha_true + tfl.matvec(x, beta_true), dtype=x.dtype).sample()
model_matrix = tf.concat([tf.ones([N, 1]), x], axis=1)

# first call traces and compiles the graph; not timed
_ = fit(model_matrix, y)

# second call reuses the compiled graph; this is the bit that gets timed
start = tm.time()
mle, linear_response, is_converged, num_iterations = fit(model_matrix, y)
end = tm.time()
```
print(f"tf version: {tf.__version__}, tfp version: {tfp.__version__}") print(f"tf took: {end - start:.4f} seconds") print(f"converged: {is_converged}") print(f"iterations: {num_iterations}")
```
tf version: 2.0.0-dev20190724, tfp version: 0.8.0-dev20190724
tf took: 2.1767 seconds
converged: True
iterations: 15
```
Here's the rstan version, starting with the Stan model:
```stan
data {
  int<lower = 0> N;
  int<lower = 0> P;
  matrix[N, P] x;
  int<lower = 0, upper = 1> y[N];
}

parameters {
  real alpha;
  vector[P] beta;
}

model {
  y ~ bernoulli_logit_glm(x, alpha, beta);
}
```
```python
import numpy as np

# export the simulated data so rstan can read it
np.savetxt("misc/x.csv", x.numpy(), delimiter=",")
np.savetxt("misc/y.csv", y.numpy(), delimiter=",")
```
```r
library(rstan)
library(data.table)
library(tictoc)

rstan_options(auto_write = TRUE)

x <- as.matrix(fread("misc/x.csv", header = FALSE))
y <- fread("misc/y.csv", header = FALSE)[[1]]
stan_data <- list(N = nrow(x), P = ncol(x), x = x, y = y)

# stan_file holds the path to the Stan program above
model <- stan_model(stan_file)

tic(paste("rstan version:", packageVersion("rstan")))
fit <- optimizing(model, data = stan_data, hessian = FALSE)
toc()

fwrite(data.table(fit$par), "misc/rstan-take-2.csv", col.names = FALSE)
```
```
rstan version: 2.19.2: 11.213 sec elapsed
```
Which is pretty impressive running on a single core!
As before, let's check the results:
```python
import pandas as pd

true_vector = np.concatenate(
    [np.expand_dims(alpha_true.numpy(), 0), np.squeeze(beta_true.numpy())], 0
)

estimates = pd.DataFrame(
    {
        "true": true_vector,
        "tf2": mle.numpy(),
        "rstan": np.genfromtxt("misc/rstan-take-2.csv", delimiter=","),
    },
    index=["alpha", *[f"beta[{i}]" for i in range(P)]],
)

estimates.sample(10).round(6)
```
|           | true      | tf2       | rstan    |
|-----------|-----------|-----------|----------|
| beta[168] | -1.00876  | -1.04356  | -1.02366 |
| beta[204] | 2.89247   | 2.93403   | 2.87811  |
| beta[148] | 3.16335   | 3.23323   | 3.17151  |
| beta[212] | 0.851386  | 0.871118  | 0.854402 |
| beta[207] | -0.999736 | -1.03336  | -1.0136  |
| beta[87]  | -6.39461  | -6.52071  | -6.3964  |
| beta[5]   | 2.60261   | 2.6781    | 2.62696  |
| beta[209] | 3.11193   | 3.18115   | 3.12059  |
| beta[123] | 2.1765    | 2.22263   | 2.18031  |
| beta[75]  | -4.83231  | -4.92313  | -4.82925 |
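The random sample of rows above is just an eyeball check; if you'd rather have single-number summaries, something like this (operating on the `estimates` data frame built above, not part of the timed comparison) does the job:

```python
# largest absolute disagreement between the two sets of MLEs,
# and between each set of MLEs and the true generating values
print((estimates["tf2"] - estimates["rstan"]).abs().max())
print((estimates["tf2"] - estimates["true"]).abs().max())
print((estimates["rstan"] - estimates["true"]).abs().max())
```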