# Computational Tricks with Turing (Non-Centered Parametrization and QR Decomposition)

There are some computational tricks that we can employ with Turing. Here I will cover two of them:

1. QR Decomposition

2. Non-Centered Parametrization

## QR Decomposition

Back in "Linear Algebra 101" we learned that any matrix (even rectangular ones) can be factored into the product of two matrices:

• $\mathbf{Q}$: an orthogonal matrix (its columns are orthogonal unit vectors, meaning $\mathbf{Q}^T = \mathbf{Q}^{-1}$).

• $\mathbf{R}$: an upper triangular matrix.

This is commonly known as the QR Decomposition:

$\mathbf{A} = \mathbf{Q} \cdot \mathbf{R}$

Let me show you an example with a random matrix $\mathbf{A} \in \mathbb{R}^{3 \times 2}$:

A = rand(3, 2)
3×2 Matrix{Float64}:
0.706461  0.417284
0.695988  0.436497
0.878977  0.196846

Now let's factor A using LinearAlgebra's qr() function:

using LinearAlgebra: qr, I
Q, R = qr(A)
LinearAlgebra.QRCompactWY{Float64, Matrix{Float64}, Matrix{Float64}}
Q factor:
3×3 LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}:
-0.533108  -0.421251  -0.733719
-0.525205  -0.515123   0.677354
-0.663292   0.746455   0.053373
R factor:
2×2 Matrix{Float64}:
-1.32517  -0.582274
0.0      -0.253694

Notice that qr() produced a tuple containing two matrices: Q and R. Q is a 3×3 orthogonal matrix, and R is a 2×2 upper triangular matrix, so that $\mathbf{Q}^T = \mathbf{Q}^{-1}$ (the transpose equals the inverse):

Matrix(Q') ≈ Matrix(Q^-1)
true

Also note that $\mathbf{Q}^T \cdot \mathbf{Q} = \mathbf{I}$ (identity matrix):

Q' * Q ≈ I(3)
true
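
To make this concrete, here is a quick check of my own (not part of the original example): the columns of the thin $\mathbf{Q}$ factor are orthonormal, so their pairwise dot products vanish and each column has unit norm:

```julia
Qm = Matrix(Q)        # thin 3×2 Q factor as a plain matrix
Qm[:, 1]' * Qm[:, 2]  # ≈ 0.0 (columns are orthogonal)
Qm[:, 1]' * Qm[:, 1]  # ≈ 1.0 (columns have unit norm)
```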

This is nice. But what can we do with QR decomposition? It can speed up Turing's sampling by a huge factor while also decorrelating the columns of $\mathbf{X}$, i.e. the independent variables. The orthogonal nature of QR decomposition alters the posterior's topology and makes it easier for HMC or other MCMC samplers to explore it. Let's see how fast we can get with QR decomposition. First, let's go back to the kidiq example in 6. Bayesian Linear Regression:

using Turing
using LinearAlgebra: I
using Statistics: mean, std
using Random: seed!
seed!(123)

@model function linreg(X, y; predictors=size(X, 2))
    #priors
    α ~ Normal(mean(y), 2.5 * std(y))
    β ~ filldist(TDist(3), predictors)
    σ ~ Exponential(1)

    #likelihood
    return y ~ MvNormal(α .+ X * β, σ^2 * I)
end;
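
A brief aside of my own (not from the original): because the covariance matrix $\sigma^2 \mathbf{I}$ is diagonal, the vectorized MvNormal likelihood above is equivalent to giving each observation its own independent Normal likelihood. A minimal sketch of that scalar formulation, for intuition only since the vectorized form is faster:

```julia
using Turing
using Statistics: mean, std
using LinearAlgebra: dot

# hypothetical scalar-likelihood version of linreg, for illustration only
@model function linreg_scalar(X, y; predictors=size(X, 2))
    #priors
    α ~ Normal(mean(y), 2.5 * std(y))
    β ~ filldist(TDist(3), predictors)
    σ ~ Exponential(1)

    #likelihood: one independent Normal per observation
    for i in eachindex(y)
        y[i] ~ Normal(α + dot(X[i, :], β), σ)
    end
end;
```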

using DataFrames
using CSV
using HTTP

url = "https://raw.githubusercontent.com/storopoli/Bayesian-Julia/master/datasets/kidiq.csv"
kidiq = CSV.read(HTTP.get(url).body, DataFrame)
X = Matrix(select(kidiq, Not(:kid_score)))
y = kidiq[:, :kid_score]
model = linreg(X, y)
chain = sample(model, NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×17×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 10.19 seconds
Compute duration  = 16.15 seconds
parameters        = α, β[1], β[2], β[3], σ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
parameters      mean       std   naive_se      mcse         ess      rhat   ess_per_sec
Symbol   Float64   Float64    Float64   Float64     Float64   Float64       Float64

   α   21.5919    8.7175     0.1378    0.2040   1753.4216    1.0017      108.5441
β[1]    2.0147    1.8092     0.0286    0.0452   1914.7422    1.0013      118.5305
β[2]    0.5808    0.0584     0.0009    0.0012   2148.1559    1.0006      132.9798
β[3]    0.2437    0.3056     0.0048    0.0070   2166.7021    1.0009      134.1279
   σ   17.8928    0.5896     0.0093    0.0115   2708.8277    1.0001      167.6877

Quantiles
parameters      2.5%     25.0%     50.0%     75.0%     97.5%
Symbol   Float64   Float64   Float64   Float64   Float64

   α    4.3464   15.7517   21.7247   27.2853   38.7344
β[1]   -0.6530    0.7333    1.6501    3.0029    6.3849
β[2]    0.4687    0.5416    0.5794    0.6197    0.6980
β[3]   -0.3336    0.0285    0.2439    0.4560    0.8316
   σ   16.7749   17.4893   17.8613   18.2915   19.0887


See the wall duration in Turing's chain: for me it took around 10 seconds.

Now let us incorporate the QR decomposition into the linear regression model. Here I will use the "thin" instead of the "fat" QR decomposition, and additionally scale the $\mathbf{Q}$ and $\mathbf{R}$ matrices by a factor of $\sqrt{n-1}$, where $n$ is the number of rows of $\mathbf{X}$. In practice, the thin QR decomposition is preferred over the fat one because it is numerically more stable. Mathematically, the scaled thin QR decomposition is:

$$\begin{aligned}
\mathbf{X} &= \mathbf{Q}^* \mathbf{R}^* \\
\mathbf{Q}^* &= \mathbf{Q} \cdot \sqrt{n - 1} \\
\mathbf{R}^* &= \frac{1}{\sqrt{n - 1}} \cdot \mathbf{R} \\
\boldsymbol{\mu} &= \alpha + \mathbf{X} \cdot \boldsymbol{\beta} \\
&= \alpha + \mathbf{Q}^* \cdot \mathbf{R}^* \cdot \boldsymbol{\beta} \\
&= \alpha + \mathbf{Q}^* \cdot (\mathbf{R}^* \cdot \boldsymbol{\beta}) \\
&= \alpha + \mathbf{Q}^* \cdot \widetilde{\boldsymbol{\beta}}
\end{aligned}$$

Then we can recover the original $\boldsymbol{\beta}$ with:

$\boldsymbol{\beta} = \mathbf{R}^{*-1} \cdot \widetilde{\boldsymbol{\beta}}$

In Turing, a model with QR decomposition is the same linreg but with a different X matrix supplied, since it is a data transformation. First, we decompose the model data X into Q and R:

Q, R = qr(X)
Q_ast = Matrix(Q) * sqrt(size(X, 1) - 1)
R_ast = R / sqrt(size(X, 1) - 1);
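
As a quick sanity check (my own addition, not in the original), the $\sqrt{n-1}$ factors cancel, so the scaled factors still reproduce $\mathbf{X}$:

```julia
# Q* R* = (Q √(n-1)) · (R / √(n-1)) = Q R = X
Q_ast * R_ast ≈ X  # should return true
```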

Then, we instantiate a model with Q* instead of X and sample as usual:

model_qr = linreg(Q_ast, y)
chain_qr = sample(model_qr, NUTS(1_000, 0.65), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×17×4 Array{Float64, 3}):

Iterations        = 1001:1:2000
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 5.16 seconds
Compute duration  = 8.67 seconds
parameters        = α, β[1], β[2], β[3], σ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
parameters       mean       std   naive_se      mcse         ess      rhat   ess_per_sec
Symbol    Float64   Float64    Float64   Float64     Float64   Float64       Float64

   α    32.9480    7.7468     0.1225    0.2364   1152.1382    1.0028      132.9032
β[1]   -49.9599    6.9387     0.1097    0.2156   1136.4348    1.0032      131.0918
β[2]    22.0431    3.5609     0.0563    0.1093   1155.7501    1.0035      133.3199
β[3]     0.2728    0.8819     0.0139    0.0237   1398.8210    1.0020      161.3590
   σ    17.8491    0.6193     0.0098    0.0131   2365.8711    1.0004      272.9116

Quantiles
parameters       2.5%      25.0%      50.0%      75.0%      97.5%
Symbol    Float64    Float64    Float64    Float64    Float64

   α    18.5380    27.7723    32.6057    37.9096    49.3423
β[1]   -62.7857   -54.5494   -50.2855   -45.4741   -35.3443
β[2]    14.7913    19.7396    22.1681    24.4313    28.6965
β[3]    -1.4185    -0.2770     0.2243     0.7787     2.1571
   σ    16.6845    17.4242    17.8210    18.2614    19.0900


See the wall duration in Turing's chain_qr: for me it took around 5 seconds. Much faster than the regular linreg. Now we have to reconstruct our $\boldsymbol{\beta}$s:

betas = mapslices(
    x -> R_ast^-1 * x, chain_qr[:, namesingroup(chain_qr, :β), :].value.data; dims=[2]
)
chain_beta = setrange(
    Chains(betas, ["real_β[$i]" for i in 1:size(Q_ast, 2)]), 1_001:1:2_000
)
chain_qr_reconstructed = hcat(chain_beta, chain_qr)
Chains MCMC chain (1000×20×4 Array{Float64, 3}):

Iterations        = 1001:1:2000
Number of chains  = 4
Samples per chain = 1000
parameters        = real_β[1], real_β[2], real_β[3], α, β[1], β[2], β[3], σ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
parameters       mean       std   naive_se      mcse         ess      rhat
Symbol    Float64   Float64    Float64   Float64     Float64   Float64

real_β[1]     6.2660    2.2533     0.0356    0.0315   3970.3435    1.0002
real_β[2]     0.5045    0.0632     0.0010    0.0015   1846.1138    1.0017
real_β[3]    -0.0666    0.2154     0.0034    0.0058   1398.8210    1.0020
        α    32.9480    7.7468     0.1225    0.2364   1152.1382    1.0028
     β[1]   -49.9599    6.9387     0.1097    0.2156   1136.4348    1.0032
     β[2]    22.0431    3.5609     0.0563    0.1093   1155.7501    1.0035
     β[3]     0.2728    0.8819     0.0139    0.0237   1398.8210    1.0020
        σ    17.8491    0.6193     0.0098    0.0131   2365.8711    1.0004

Quantiles
parameters       2.5%      25.0%      50.0%      75.0%      97.5%
Symbol    Float64    Float64    Float64    Float64    Float64

real_β[1]     1.9309     4.7406     6.2556     7.7621    10.6728
real_β[2]     0.3815     0.4635     0.5050     0.5455     0.6267
real_β[3]    -0.5268    -0.1902    -0.0548     0.0676     0.3464
        α    18.5380    27.7723    32.6057    37.9096    49.3423
     β[1]   -62.7857   -54.5494   -50.2855   -45.4741   -35.3443
     β[2]    14.7913    19.7396    22.1681    24.4313    28.6965
     β[3]    -1.4185    -0.2770     0.2243     0.7787     2.1571
        σ    16.6845    17.4242    17.8210    18.2614    19.0900


## Non-Centered Parametrization

Now let us explore Non-Centered Parametrization (NCP). This is useful when the posterior's topology is very difficult to explore, i.e. it has regions where the HMC sampler has to drastically change the number of leapfrog steps $L$ and the step size $\epsilon$. One of the most infamous cases, which I showed in 5. Markov Chain Monte Carlo (MCMC), is Neal's Funnel (Neal, 2003):

using CairoMakie
using Distributions
funnel_y = rand(Normal(0, 3), 10_000)
funnel_x = rand(Normal(), 10_000) .* exp.(funnel_y / 2)

f, ax, s = scatter(
    funnel_x,
    funnel_y;
    color=(:steelblue, 0.3),
    axis=(; xlabel=L"X", ylabel=L"Y", limits=(-100, 100, nothing, nothing)),
)

[Figure: Neal's Funnel]

Here we see that in the upper part of the funnel HMC has to take few steps $L$ and be more liberal with the $\epsilon$ factor. The opposite holds in the lower part of the funnel: way more steps $L$ and a more conservative $\epsilon$ factor.

To see the devil's funnel (as it is known in some Bayesian circles) in action, let's code it in Turing and then sample:

@model function funnel()
    y ~ Normal(0, 3)
    return x ~ Normal(0, exp(y / 2))
end

chain_funnel = sample(funnel(), NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×14×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 5.03 seconds
Compute duration  = 9.14 seconds
parameters        = y, x
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
parameters      mean       std   naive_se      mcse        ess      rhat   ess_per_sec
Symbol   Float64   Float64    Float64   Float64    Float64   Float64       Float64

y    0.9349    2.5471     0.0403    0.1919   132.2382    1.0329       14.4665
x   -0.3946    6.4402     0.1018    0.3474   280.4549    1.0046       30.6810

Quantiles
parameters       2.5%     25.0%     50.0%     75.0%     97.5%
Symbol    Float64   Float64   Float64   Float64   Float64

y    -4.0096   -0.8760    0.7658    2.6657    6.0989
x   -14.0106   -1.1051   -0.0337    0.7439   10.4440


Wow, take a look at those rhat values...
That sucks: the rhat for y is well above 1.01, even with 4 parallel chains of 1,000 iterations! How do we deal with that? We reparametrize! Note that any normal distribution can be written as a scaled and shifted standard normal: $\text{Normal}(\mu, \sigma) = \text{Standard Normal} \cdot \sigma + \mu$, where the standard normal is the normal with mean $\mu = 0$ and standard deviation $\sigma = 1$. This is why it is called Non-Centered Parametrization: we "decouple" the parameters and reconstruct them afterwards.

@model function ncp_funnel()
    x̃ ~ Normal()
    ỹ ~ Normal()
    y = 3.0 * ỹ                # implies y ~ Normal(0, 3)
    return x = exp(y / 2) * x̃  # implies x ~ Normal(0, exp(y / 2))
end

chain_ncp_funnel = sample(ncp_funnel(), NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×14×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 4.06 seconds
Compute duration  = 7.74 seconds
parameters        = x̃, ỹ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
parameters      mean       std   naive_se      mcse         ess      rhat   ess_per_sec
Symbol   Float64   Float64    Float64   Float64     Float64   Float64       Float64

x̃    0.0130    0.9902     0.0157    0.0182   3639.7339    1.0012      470.0677
ỹ   -0.0447    0.9978     0.0158    0.0160   3864.2412    1.0000      499.0625

Quantiles
parameters      2.5%     25.0%     50.0%     75.0%     97.5%
Symbol   Float64   Float64   Float64   Float64   Float64

x̃   -1.9084   -0.6314    0.0152    0.6657    1.9736
ỹ   -2.0144   -0.7034   -0.0420    0.6243    1.9661


Much better now: all the rhats are well below 1.01. How would we implement this in a real-world model in Turing? Let's go back to the cheese random-intercept model in 10. Multilevel Models (a.k.a. Hierarchical Models). Here was the approach that we took, also known as Centered Parametrization (CP):

@model function varying_intercept(
    X, idx, y; n_gr=length(unique(idx)), predictors=size(X, 2)
)
    #priors
    α ~ Normal(mean(y), 2.5 * std(y))       # population-level intercept
    β ~ filldist(Normal(0, 2), predictors)  # population-level coefficients
    σ ~ Exponential(std(y))                 # residual SD
    #prior for variance of random intercepts
    #usually requires thoughtful specification
    τ ~ truncated(Cauchy(0, 2); lower=0)    # group-level SDs intercepts
    αⱼ ~ filldist(Normal(0, τ), n_gr)       # CP group-level intercepts

    #likelihood
    ŷ = α .+ X * β .+ αⱼ[idx]
    return y ~ MvNormal(ŷ, σ^2 * I)
end;

To perform a Non-Centered Parametrization (NCP) in this model we do it as follows:

@model function varying_intercept_ncp(
    X, idx, y; n_gr=length(unique(idx)), predictors=size(X, 2)
)
    #priors
    α ~ Normal(mean(y), 2.5 * std(y))       # population-level intercept
    β ~ filldist(Normal(0, 2), predictors)  # population-level coefficients
    σ ~ Exponential(std(y))                 # residual SD
    #prior for variance of random intercepts
    #usually requires thoughtful specification
    τ ~ truncated(Cauchy(0, 2); lower=0)    # group-level SDs intercepts
    zⱼ ~ filldist(Normal(0, 1), n_gr)       # NCP group-level intercepts

    #likelihood
    ŷ = α .+ X * β .+ zⱼ[idx] .* τ
    return y ~ MvNormal(ŷ, σ^2 * I)
end;

Here we are using an NCP with the zⱼs following a standard normal, and we reconstruct the group-level intercepts by multiplying the zⱼs by τ. Since the original αⱼs had a prior centered on 0 with standard deviation τ, we only have to multiply by τ to get back the αⱼs. Now let's see how the NCP compares to the CP.
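If you want to convince yourself of the location-scale identity that NCP relies on, here is a quick simulation sketch of my own (not part of the original; the values $\mu = 10$ and $\sigma = 3$ are arbitrary):

```julia
using Distributions
using Statistics: mean, std
using Random: seed!

seed!(123)
z = rand(Normal(), 100_000)  # standard normal draws
x = 3.0 .* z .+ 10.0         # Standard Normal · σ + μ with σ = 3, μ = 10
mean(x), std(x)              # ≈ (10.0, 3.0)
```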
First, let's redo our CP hierarchical model:

url = "https://raw.githubusercontent.com/storopoli/Bayesian-Julia/master/datasets/cheese.csv"
cheese = CSV.read(HTTP.get(url).body, DataFrame)

for c in unique(cheese[:, :cheese])
    cheese[:, "cheese_$c"] = ifelse.(cheese[:, :cheese] .== c, 1, 0)
end

cheese[:, :background_int] = map(cheese[:, :background]) do b
    if b == "rural"
        1
    elseif b == "urban"
        2
    else
        missing
    end
end

X = Matrix(select(cheese, Between(:cheese_A, :cheese_D)));
y = cheese[:, :y];
idx = cheese[:, :background_int];

model_cp = varying_intercept(X, idx, y)
chain_cp = sample(model_cp, NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×21×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 10.52 seconds
Compute duration  = 17.18 seconds
parameters        = α, β[1], β[2], β[3], β[4], σ, τ, αⱼ[1], αⱼ[2]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
parameters       mean       std   naive_se      mcse         ess      rhat   ess_per_sec
Symbol    Float64   Float64    Float64   Float64     Float64   Float64       Float64

    α    70.6428    5.0487     0.0798    0.1737    770.1477    1.0046       44.8360
 β[1]     2.9791    1.3144     0.0208    0.0258   2172.3735    1.0004      126.4699
 β[2]   -10.5427    1.3601     0.0215    0.0303   2084.5327    1.0002      121.3560
 β[3]     6.5850    1.3731     0.0217    0.0292   2098.4036    1.0007      122.1636
 β[4]     1.1433    1.3117     0.0207    0.0299   2069.7945    1.0003      120.4980
    σ     7.3996    0.4542     0.0072    0.0080   2613.2512    1.0008      152.1366
    τ     6.3827    8.5037     0.1345    0.3209    800.8580    1.0184       46.6239
αⱼ[1]    -3.3743    4.9339     0.0780    0.1727    757.3016    1.0050       44.0881
αⱼ[2]     3.6766    4.9634     0.0785    0.1742    748.6351    1.0056       43.5836

Quantiles
parameters       2.5%      25.0%      50.0%     75.0%     97.5%
Symbol    Float64    Float64    Float64   Float64   Float64

    α    60.7302    68.3134    70.7241   73.0491   80.2703
 β[1]     0.3954     2.1190     2.9438    3.8796    5.5242
 β[2]   -13.2151   -11.4684   -10.5399   -9.6328   -7.8694
 β[3]     3.9123     5.6639     6.5755    7.4932    9.2530
 β[4]    -1.4944     0.2537     1.1339    2.0287    3.6481
    σ     6.5570     7.0811     7.3764    7.6938    8.3492
    τ     1.8508     3.2115     4.5971    6.8672   20.0755
αⱼ[1]   -12.7622    -5.6539    -3.3432   -1.1848    6.2812
αⱼ[2]    -5.7783     1.3861     3.5480    5.7955   13.6332


Now let's do the NCP hierarchical model:

model_ncp = varying_intercept_ncp(X, idx, y)
chain_ncp = sample(model_ncp, NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×21×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 15.18 seconds
Compute duration  = 27.18 seconds
parameters        = α, β[1], β[2], β[3], β[4], σ, τ, zⱼ[1], zⱼ[2]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
parameters       mean       std   naive_se      mcse         ess      rhat   ess_per_sec
Symbol    Float64   Float64    Float64   Float64     Float64   Float64       Float64

    α    70.6381    3.9661     0.0627    0.2270    276.3623    1.0032       10.1686
 β[1]     3.0184    1.3067     0.0207    0.0343   1430.7232    1.0010       52.6427
 β[2]   -10.5484    1.3543     0.0214    0.0345   1795.1054    1.0000       66.0499
 β[3]     6.5794    1.3084     0.0207    0.0366   1155.6678    1.0000       42.5222
 β[4]     1.1558    1.3067     0.0207    0.0347   1472.3222    1.0000       54.1733
    σ     7.3846    0.4668     0.0074    0.0103   2309.3984    1.0003       84.9731
    τ     5.0347    2.4221     0.0383    0.1032    576.5776    1.0026       21.2149
zⱼ[1]    -0.8461    0.8249     0.0130    0.0371    543.0646    1.0028       19.9818
zⱼ[2]     0.8675    0.8002     0.0127    0.0256    989.2219    1.0007       36.3979

Quantiles
parameters       2.5%      25.0%      50.0%     75.0%     97.5%
Symbol    Float64    Float64    Float64   Float64   Float64

    α    61.2831    68.3926    70.8893   73.1427   78.1953
 β[1]     0.5016     2.1144     3.0451    3.9067    5.4755
 β[2]   -13.1368   -11.4724   -10.5733   -9.6324   -7.7688
 β[3]     4.0680     5.6634     6.5795    7.4830    9.0380
 β[4]    -1.4639     0.3336     1.1558    2.0353    3.7254
    σ     6.5451     7.0536     7.3621    7.6882    8.3494
    τ     1.8278     3.2063     4.4428    6.4645   11.0497
zⱼ[1]    -2.5191    -1.3926    -0.8403   -0.2763    0.7263
zⱼ[2]    -0.6093     0.2939     0.8572    1.4003    2.4570


Notice that some models are better off with a standard Centered Parametrization (as is our cheese case here), while others are better off with a Non-Centered Parametrization. But now you know how to apply both parametrizations in Turing. Before we conclude, we need to recover our original αⱼs. We can do this by multiplying the zⱼs by τ:

τ = summarystats(chain_ncp)[:τ, :mean]
αⱼ = mapslices(
    x -> x * τ, chain_ncp[:, namesingroup(chain_ncp, :zⱼ), :].value.data; dims=[2]
)
chain_ncp_reconstructed = hcat(
    MCMCChains.resetrange(chain_ncp), Chains(αⱼ, ["αⱼ[$i]" for i in 1:length(unique(idx))])
)
Chains MCMC chain (1000×23×4 Array{Float64, 3}):

Iterations        = 1:1000
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 15.18 seconds
Compute duration  = 27.18 seconds
parameters        = α, β[1], β[2], β[3], β[4], σ, τ, zⱼ[1], zⱼ[2], αⱼ[1], αⱼ[2]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
parameters       mean       std   naive_se      mcse         ess      rhat   ess_per_sec
Symbol    Float64   Float64    Float64   Float64     Float64   Float64       Float64

    α    70.6381    3.9661     0.0627    0.2270    276.3623    1.0032       10.1686
 β[1]     3.0184    1.3067     0.0207    0.0343   1430.7232    1.0010       52.6427
 β[2]   -10.5484    1.3543     0.0214    0.0345   1795.1054    1.0000       66.0499
 β[3]     6.5794    1.3084     0.0207    0.0366   1155.6678    1.0000       42.5222
 β[4]     1.1558    1.3067     0.0207    0.0347   1472.3222    1.0000       54.1733
    σ     7.3846    0.4668     0.0074    0.0103   2309.3984    1.0003       84.9731
    τ     5.0347    2.4221     0.0383    0.1032    576.5776    1.0026       21.2149
zⱼ[1]    -0.8461    0.8249     0.0130    0.0371    543.0646    1.0028       19.9818
zⱼ[2]     0.8675    0.8002     0.0127    0.0256    989.2219    1.0007       36.3979
αⱼ[1]    -4.2599    4.1534     0.0657    0.1869    543.0646    1.0028       19.9818
αⱼ[2]     4.3677    4.0290     0.0637    0.1291    989.2219    1.0007       36.3979

Quantiles
parameters       2.5%      25.0%      50.0%     75.0%     97.5%
Symbol    Float64    Float64    Float64   Float64   Float64

    α    61.2831    68.3926    70.8893   73.1427   78.1953
 β[1]     0.5016     2.1144     3.0451    3.9067    5.4755
 β[2]   -13.1368   -11.4724   -10.5733   -9.6324   -7.7688
 β[3]     4.0680     5.6634     6.5795    7.4830    9.0380
 β[4]    -1.4639     0.3336     1.1558    2.0353    3.7254
    σ     6.5451     7.0536     7.3621    7.6882    8.3494
    τ     1.8278     3.2063     4.4428    6.4645   11.0497
zⱼ[1]    -2.5191    -1.3926    -0.8403   -0.2763    0.7263
zⱼ[2]    -0.6093     0.2939     0.8572    1.4003    2.4570
αⱼ[1]   -12.6830    -7.0114    -4.2305   -1.3910    3.6565
αⱼ[2]    -3.0677     1.4797     4.3158    7.0499   12.3705
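
One caveat of my own, not from the original: the reconstruction above scales every draw of zⱼ by a single point estimate, the posterior mean of τ, which understates the uncertainty in the αⱼs. A sketch of a per-draw reconstruction, assuming the same chain layout as above, could look like this:

```julia
# per-draw reconstruction: scale each zⱼ draw by the matching τ draw
τ_draws = chain_ncp[:, [:τ], :].value.data                            # 1000×1×4
zⱼ_draws = chain_ncp[:, namesingroup(chain_ncp, :zⱼ), :].value.data   # 1000×2×4
αⱼ_draws = zⱼ_draws .* τ_draws  # broadcasts the τ column over both zⱼs
```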