Computational Tricks with Turing (Non-Centered Parametrization and QR Decomposition)

There are some computational tricks that we can employ with Turing. Here I will cover two of them:

  1. QR Decomposition

  2. Non-Centered Parametrization

QR Decomposition

Back in "Linear Algebra 101" we learned that any matrix (even rectangular ones) can be factored into the product of two matrices:

  • $\mathbf{Q}$: an orthogonal matrix (its columns are orthogonal unit vectors, meaning $\mathbf{Q}^T = \mathbf{Q}^{-1}$).

  • $\mathbf{R}$: an upper triangular matrix.

This is commonly known as the QR Decomposition:

$$\mathbf{A} = \mathbf{Q} \cdot \mathbf{R}$$

Let me show you an example with a random matrix $\mathbf{A} \in \mathbb{R}^{3 \times 2}$:

A = rand(3, 2)
3×2 Matrix{Float64}:
 0.587585  0.858524
 0.809433  0.375417
 0.223841  0.676707

Now let's factor A using LinearAlgebra's qr() function:

using LinearAlgebra: qr, I
Q, R = qr(A)
LinearAlgebra.QRCompactWY{Float64, Matrix{Float64}}
Q factor:
3×3 LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}}:
 -0.573276   0.474808  -0.667767
 -0.789722  -0.53741    0.295855
 -0.21839    0.696957   0.68305
R factor:
2×2 Matrix{Float64}:
 -1.02496  -0.936432
  0.0       0.677517

Notice that qr() produced a factorization object from which we destructure the two matrices Q and R. Q is a 3×3 orthogonal matrix and R is a 2×2 upper triangular matrix, so that $\mathbf{Q}^T = \mathbf{Q}^{-1}$ (the transpose equals the inverse):

Matrix(Q') ≈ Matrix(Q^-1)
true

Also note that $\mathbf{Q}^T \cdot \mathbf{Q} = \mathbf{I}$ (identity matrix):

Q' * Q ≈ I(3)
true

This is nice. But what can we do with QR decomposition? It can speed up Turing's sampling by a huge factor while also decorrelating the columns of $\mathbf{X}$, i.e. the independent variables. The orthogonal nature of QR decomposition alters the posterior's topology and makes it easier for HMC or other MCMC samplers to explore it. Let's see how fast we can get with QR decomposition. First, let's go back to the kidiq example in 6. Bayesian Linear Regression:

using Turing
using LinearAlgebra: I
using Statistics: mean, std
using Random: seed!
seed!(123)

@model function linreg(X, y; predictors=size(X, 2))
    #priors
    α ~ Normal(mean(y), 2.5 * std(y))
    β ~ filldist(TDist(3), predictors)
    σ ~ Exponential(1)

    #likelihood
    y ~ MvNormal(α .+ X * β, σ^2 * I)
end;

using DataFrames, CSV, HTTP

url = "https://raw.githubusercontent.com/storopoli/Bayesian-Julia/master/datasets/kidiq.csv"
kidiq = CSV.read(HTTP.get(url).body, DataFrame)
X = Matrix(select(kidiq, Not(:kid_score)))
y = kidiq[:, :kid_score]
model = linreg(X, y)
chain = sample(model, NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×17×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 18.94 seconds
Compute duration  = 31.07 seconds
parameters        = α, β[1], β[2], β[3], σ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std   naive_se      mcse         ess      rhat   ess_per_sec
      Symbol   Float64   Float64    Float64   Float64     Float64   Float64       Float64

           α   21.4725    8.4712     0.1339    0.2235   1603.8646    1.0035       51.6210
        β[1]    2.0972    1.8990     0.0300    0.0512   1552.1325    1.0011       49.9560
        β[2]    0.5794    0.0590     0.0009    0.0013   2016.1323    1.0010       64.8900
        β[3]    0.2528    0.3061     0.0048    0.0075   1944.6793    1.0022       62.5903
           σ   17.8914    0.5927     0.0094    0.0091   3380.2482    0.9994      108.7946

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           α    4.7211   15.6468   21.5424   27.2639   37.5835
        β[1]   -0.6806    0.7088    1.7451    3.1042    6.7604
        β[2]    0.4629    0.5393    0.5792    0.6197    0.6917
        β[3]   -0.3338    0.0485    0.2509    0.4514    0.8594
           σ   16.8026   17.4696   17.8777   18.2930   19.0610

See the wall duration in Turing's chain: for me it took around 19 seconds.

Now let us incorporate QR decomposition into the linear regression model. Here, I will use the "thin" instead of the "fat" QR, which scales $\mathbf{Q}$ by a factor of $\sqrt{n-1}$ and $\mathbf{R}$ by a factor of $\frac{1}{\sqrt{n-1}}$, where $n$ is the number of rows of $\mathbf{X}$. In practice, the thin QR decomposition is preferred over the fat one because it is numerically more stable. Mathematically, the thin QR decomposition is:

$$
\begin{aligned}
\mathbf{X} &= \mathbf{Q}^* \mathbf{R}^* \\
\mathbf{Q}^* &= \mathbf{Q} \cdot \sqrt{n - 1} \\
\mathbf{R}^* &= \frac{1}{\sqrt{n - 1}} \cdot \mathbf{R} \\
\boldsymbol{\mu} &= \alpha + \mathbf{X} \cdot \boldsymbol{\beta} + \sigma \\
&= \alpha + \mathbf{Q}^* \cdot \mathbf{R}^* \cdot \boldsymbol{\beta} + \sigma \\
&= \alpha + \mathbf{Q}^* \cdot (\mathbf{R}^* \cdot \boldsymbol{\beta}) + \sigma \\
&= \alpha + \mathbf{Q}^* \cdot \widetilde{\boldsymbol{\beta}} + \sigma
\end{aligned}
$$

Then we can recover the original $\boldsymbol{\beta}$ with:

$$\boldsymbol{\beta} = \mathbf{R}^{*-1} \cdot \widetilde{\boldsymbol{\beta}}$$

In Turing, a model with QR decomposition would be the same linreg but with a different X matrix supplied, since it is a data transformation. First, we decompose our model data X into Q and R:

Q, R = qr(X)
Q_ast = Matrix(Q) * sqrt(size(X, 1) - 1)
R_ast = R / sqrt(size(X, 1) - 1);
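As a quick sanity check (a minimal sketch using the Q_ast and R_ast we just created, with I from LinearAlgebra already loaded above), we can verify that the scaled factors still reproduce X and that the columns of Q_ast are orthogonal:

Q_ast * R_ast ≈ X                                   # recovers the original design matrix X
Q_ast' * Q_ast ≈ (size(X, 1) - 1) * I(size(X, 2))   # orthogonal columns (a scaled identity)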

Then, we instantiate a model with Q_ast instead of X and sample as usual:

model_qr = linreg(Q_ast, y)
chain_qr = sample(model_qr, NUTS(1_000, 0.65), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×17×4 Array{Float64, 3}):

Iterations        = 1001:1:2000
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 10.98 seconds
Compute duration  = 16.38 seconds
parameters        = α, β[1], β[2], β[3], σ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters       mean       std   naive_se      mcse         ess      rhat   ess_per_sec
      Symbol    Float64   Float64    Float64   Float64     Float64   Float64       Float64

           α    33.1622    8.2304     0.1301    0.2720    851.6991    1.0018       51.9931
        β[1]   -49.7989    7.3689     0.1165    0.2442    853.9196    1.0019       52.1287
        β[2]    21.9550    3.7400     0.0591    0.1253    848.4503    1.0021       51.7948
        β[3]     0.2958    0.9490     0.0150    0.0270   1138.4049    1.0020       69.4954
           σ    17.8884    0.5846     0.0092    0.0126   2268.6950    1.0009      138.4955

Quantiles
  parameters       2.5%      25.0%      50.0%      75.0%      97.5%
      Symbol    Float64    Float64    Float64    Float64    Float64

           α    17.5711    27.8631    33.0967    38.1717    49.8906
        β[1]   -64.0312   -54.5153   -49.7710   -45.3874   -34.9937
        β[2]    14.5360    19.7261    21.9363    24.3812    29.0539
        β[3]    -1.4499    -0.2776     0.2512     0.8294     2.2582
           σ    16.7954    17.4909    17.8674    18.2864    19.0560

See the wall duration in Turing's chain_qr: for me it took around 11 seconds. Much faster than the regular linreg. Now we have to reconstruct our $\boldsymbol{\beta}$s:

# undo the QR transformation: β = R*⁻¹ ⋅ β̃ for every draw
betas = mapslices(x -> R_ast^-1 * x, chain_qr[:, namesingroup(chain_qr, :β), :].value.data, dims=[2])
chain_beta = setrange(Chains(betas, ["real_β[$i]" for i in 1:size(Q_ast, 2)]), 1_001:1:2_000)
chain_qr_reconstructed = hcat(chain_beta, chain_qr)
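As a quick check (a sketch, assuming the reconstruction above), the posterior summaries of the reconstructed real_β coefficients should be close to the βs from the original (non-QR) linreg chain:

summarystats(chain_qr_reconstructed[:, [Symbol("real_β[$i]") for i in 1:size(Q_ast, 2)], :])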

Non-Centered Parametrization

Now let us explore Non-Centered Parametrization (NCP). This is useful when the posterior's topology is very difficult to explore, having regions where the HMC sampler must drastically change the number of leapfrog steps $L$ and the step size $\epsilon$. I showed one of the most infamous cases in 5. Markov Chain Monte Carlo (MCMC): Neal's Funnel (Neal, 2003):

using StatsPlots, Distributions, LaTeXStrings
funnel_y = rand(Normal(0, 3), 10_000)
funnel_x = rand(Normal(), 10_000) .* exp.(funnel_y / 2)

scatter((funnel_x, funnel_y),
    label=false, mc=:steelblue, ma=0.3,
    xlabel=L"X", ylabel=L"Y",
    xlims=(-100, 100))

Neal's Funnel

Here we see that in the upper part of the funnel HMC can take few leapfrog steps $L$ and be more liberal with the step size $\epsilon$. The opposite holds in the lower part of the funnel: it needs way more leapfrog steps $L$ and a more conservative step size $\epsilon$.

To see the devil's funnel (as it is known in some Bayesian circles) in action, let's code it in Turing and then sample:

@model function funnel()
    y ~ Normal(0, 3)
    x ~ Normal(0, exp(y / 2))
end

chain_funnel = sample(funnel(), NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×14×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 7.06 seconds
Compute duration  = 12.91 seconds
parameters        = y, x
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std   naive_se      mcse        ess      rhat   ess_per_sec
      Symbol   Float64   Float64    Float64   Float64    Float64   Float64       Float64

           y    0.2689    2.4962     0.0395    0.2120    78.7145    1.0446        6.0958
           x   -0.2059    4.3276     0.0684    0.2203   358.9343    1.0030       27.7964

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           y   -4.9008   -1.6608    0.1332    2.0066    5.3659
           x   -8.7989   -0.5876   -0.0151    0.6030    6.5841

Wow, take a look at those rhat values... That sucks: the rhat for y is way above 1.01 even with 4 parallel chains of 1,000 iterations!

How do we deal with that? We reparametrize! Note that any normal distribution can be written as a scaled and shifted standard normal:

$$\text{Normal}(\mu, \sigma) = \text{Standard Normal} \cdot \sigma + \mu$$

where the standard normal is the normal with mean $\mu = 0$ and standard deviation $\sigma = 1$. This is why it is called Non-Centered Parametrization: we "decouple" the parameters and reconstruct them afterwards.
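Before translating this into a model, here is a quick numerical sanity check of the identity above (a minimal sketch; the μ = 5 and σ = 2 values are arbitrary): scaling and shifting standard normal draws reproduces a Normal(μ, σ).

z = rand(Normal(), 10_000)   # standard normal draws
x = 2.0 .* z .+ 5.0          # scale by σ = 2 and shift by μ = 5
mean(x), std(x)              # should be ≈ (5.0, 2.0)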

@model function ncp_funnel()
    x̃ ~ Normal()
    ỹ ~ Normal()
    y = 3.0 * ỹ         # implies y ~ Normal(0, 3)
    x = exp(y / 2) * x̃  # implies x ~ Normal(0, exp(y / 2))
end

chain_ncp_funnel = sample(ncp_funnel(), NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×14×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 6.76 seconds
Compute duration  = 12.61 seconds
parameters        = x̃, ỹ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std   naive_se      mcse         ess      rhat   ess_per_sec
      Symbol   Float64   Float64    Float64   Float64     Float64   Float64       Float64

           x̃    0.0331    1.0038     0.0159    0.0144   4361.0430    0.9998      345.8675
           ỹ    0.0063    0.9731     0.0154    0.0149   4152.5341    1.0005      329.3310

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           x̃   -1.9074   -0.6348    0.0358    0.7130    2.0217
           ỹ   -1.9087   -0.6352   -0.0096    0.6453    1.9045

Much better now: all rhat are well below 1.01 (some even slightly below 1.0).
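If we want the funnel's original x and y back, we can apply the same deterministic transformations from the model to the draws (a small sketch using the chain_ncp_funnel above):

ỹ_draws = vec(chain_ncp_funnel[:ỹ])
x̃_draws = vec(chain_ncp_funnel[:x̃])
y_draws = 3.0 .* ỹ_draws                  # recovers y ~ Normal(0, 3)
x_draws = exp.(y_draws ./ 2) .* x̃_draws   # recovers x ~ Normal(0, exp(y / 2))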

How would we implement this in a real-world model in Turing? Let's go back to the cheese random-intercept model in 10. Multilevel Models (a.k.a. Hierarchical Models). Here is the approach that we took, also known as Centered Parametrization (CP):

@model function varying_intercept(X, idx, y; n_gr=length(unique(idx)), predictors=size(X, 2))
    #priors
    α ~ Normal(mean(y), 2.5 * std(y))       # population-level intercept
    β ~ filldist(Normal(0, 2), predictors)  # population-level coefficients
    σ ~ Exponential(1 / std(y))             # residual SD
    #prior for variance of random intercepts
    #usually requires thoughtful specification
    τ ~ truncated(Cauchy(0, 2); lower=0)    # group-level SDs intercepts
    αⱼ ~ filldist(Normal(0, τ), n_gr)       # CP group-level intercepts

    #likelihood
    ŷ = α .+ X * β .+ αⱼ[idx]
    y ~ MvNormal(ŷ, σ^2 * I)
end;

To perform a Non-Centered Parametrization (NCP) in this model we do the following:

@model function varying_intercept_ncp(X, idx, y; n_gr=length(unique(idx)), predictors=size(X, 2))
    #priors
    α ~ Normal(mean(y), 2.5 * std(y))       # population-level intercept
    β ~ filldist(Normal(0, 2), predictors)  # population-level coefficients
    σ ~ Exponential(1 / std(y))             # residual SD

    #prior for variance of random intercepts
    #usually requires thoughtful specification
    τ ~ truncated(Cauchy(0, 2); lower=0)   # group-level SDs intercepts
    zⱼ ~ filldist(Normal(0, 1), n_gr)      # NCP group-level intercepts

    #likelihood
    ŷ = α .+ X * β .+ zⱼ[idx] .* τ
    y ~ MvNormal(ŷ, σ^2 * I)
end;

Here we are using an NCP with the zⱼs following a standard normal, and we reconstruct the group-level intercepts by multiplying the zⱼs by τ. Since the original αⱼs had a prior centered on 0 with standard deviation τ, we only need to multiply by τ to get back the αⱼs.

Now let's see how NCP compares to the CP. First, let's redo our CP hierarchical model:

url = "https://raw.githubusercontent.com/storopoli/Bayesian-Julia/master/datasets/cheese.csv"
cheese = CSV.read(HTTP.get(url).body, DataFrame)

for c in unique(cheese[:, :cheese])
    cheese[:, "cheese_$c"] = ifelse.(cheese[:, :cheese] .== c, 1, 0)
end

cheese[:, :background_int] = map(cheese[:, :background]) do b
    b == "rural" ? 1 :
    b == "urban" ? 2 : missing
end

X = Matrix(select(cheese, Between(:cheese_A, :cheese_D)));
y = cheese[:, :y];
idx = cheese[:, :background_int];

model_cp = varying_intercept(X, idx, y)
chain_cp = sample(model_cp, NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×21×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 14.1 seconds
Compute duration  = 27.06 seconds
parameters        = α, β[1], β[2], β[3], β[4], σ, τ, αⱼ[1], αⱼ[2]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters       mean       std   naive_se      mcse         ess      rhat   ess_per_sec
      Symbol    Float64   Float64    Float64   Float64     Float64   Float64       Float64

           α    70.6589    4.9930     0.0789    0.2182    556.9945    1.0058       20.5860
        β[1]     3.2302    1.2695     0.0201    0.0291   1960.5738    1.0018       72.4609
        β[2]   -11.6001    1.2470     0.0197    0.0291   1965.9459    1.0015       72.6594
        β[3]     7.1584    1.2247     0.0194    0.0285   2001.6490    1.0017       73.9790
        β[4]     1.1940    1.2255     0.0194    0.0288   1812.6376    1.0014       66.9933
           σ     6.0039    0.2715     0.0043    0.0050   3466.5880    0.9997      128.1217
           τ     6.3320    6.0941     0.0964    0.1936   1020.9959    1.0034       37.7350
       αⱼ[1]    -3.4079    4.9026     0.0775    0.2140    548.9518    1.0061       20.2887
       αⱼ[2]     3.7879    4.9209     0.0778    0.2162    544.9800    1.0068       20.1419

Quantiles
  parameters       2.5%      25.0%      50.0%      75.0%     97.5%
      Symbol    Float64    Float64    Float64    Float64   Float64

           α    59.0962    68.3422    70.7057    73.1201   80.1042
        β[1]     0.7913     2.3715     3.2387     4.0770    5.7145
        β[2]   -14.0651   -12.4275   -11.6126   -10.7751   -9.1552
        β[3]     4.7643     6.3444     7.1584     8.0039    9.5094
        β[4]    -1.2806     0.3902     1.2131     2.0047    3.5834
           σ     5.4983     5.8172     5.9975     6.1750    6.5576
           τ     1.9093     3.2499     4.6306     7.2682   19.8367
       αⱼ[1]   -13.2132    -5.7270    -3.4215    -1.2137    7.9739
       αⱼ[2]    -5.9677     1.3776     3.6508     5.9295   15.2966

Now let's do the NCP hierarchical model:

model_ncp = varying_intercept_ncp(X, idx, y)
chain_ncp = sample(model_ncp, NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×21×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 23.52 seconds
Compute duration  = 45.39 seconds
parameters        = α, β[1], β[2], β[3], β[4], σ, τ, zⱼ[1], zⱼ[2]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters       mean       std   naive_se      mcse        ess      rhat   ess_per_sec
      Symbol    Float64   Float64    Float64   Float64    Float64   Float64       Float64

           α    66.6729    8.2016     0.1297    0.9474    10.5950    1.8941        0.2334
        β[1]     3.4402    1.2160     0.0192    0.0590   101.2901    1.0522        2.2317
        β[2]   -11.4689    1.2415     0.0196    0.0599   107.3475    1.0445        2.3652
        β[3]     7.1476    1.1527     0.0182    0.0444   549.3184    1.0157       12.1033
        β[4]     1.1758    1.1634     0.0184    0.0448   436.9062    1.0084        9.6265
           σ     5.9591    0.2607     0.0041    0.0138   119.5446    1.0302        2.6340
           τ     6.5594    3.5619     0.0563    0.3580    16.2915    1.3504        0.3590
       zⱼ[1]    -0.3813    1.0916     0.0173    0.1059    15.1939    1.4208        0.3348
       zⱼ[2]     1.1008    0.7923     0.0125    0.0535    32.2106    1.1630        0.7097

Quantiles
  parameters       2.5%      25.0%      50.0%      75.0%     97.5%
      Symbol    Float64    Float64    Float64    Float64   Float64

           α    48.3094    64.6642    69.4575    72.0048   77.0475
        β[1]     0.9278     2.6142     3.6063     4.2625    5.6648
        β[2]   -14.0039   -12.2803   -11.3668   -10.6277   -9.2489
        β[3]     4.9071     6.4296     7.1147     7.8288    9.5011
        β[4]    -1.1389     0.4485     1.1551     1.9180    3.5224
           σ     5.5047     5.7746     5.9453     6.1215    6.5157
           τ     1.9838     3.5771     5.4532     9.7781   13.1560
       zⱼ[1]    -2.3506    -1.1482    -0.4922     0.3572    1.5519
       zⱼ[2]    -0.4172     0.5059     1.1135     1.6916    2.3822

Notice that some models are better off with a standard Centered Parametrization (as is our cheese case here), while others are better off with a Non-Centered Parametrization. But now you know how to apply both parametrizations in Turing. Before we conclude, we need to recover our original αⱼs. We can do this by multiplying zⱼ .* τ:

τ = summarystats(chain_ncp)[:τ, :mean]  # posterior mean of τ
αⱼ = mapslices(x -> x * τ, chain_ncp[:, namesingroup(chain_ncp, :zⱼ), :].value.data, dims=[2])  # αⱼ = zⱼ ⋅ τ
chain_ncp_reconstructed = hcat(MCMCChains.resetrange(chain_ncp), Chains(αⱼ, ["αⱼ[$i]" for i in 1:length(unique(idx))]))
Chains MCMC chain (1000×23×4 Array{Float64, 3}):

Iterations        = 1:1000
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 23.52 seconds
Compute duration  = 45.39 seconds
parameters        = α, β[1], β[2], β[3], β[4], σ, τ, zⱼ[1], zⱼ[2], αⱼ[1], αⱼ[2]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters       mean       std   naive_se      mcse        ess      rhat   ess_per_sec
      Symbol    Float64   Float64    Float64   Float64    Float64   Float64       Float64

           α    66.6729    8.2016     0.1297    0.9474    10.5950    1.8941        0.2334
        β[1]     3.4402    1.2160     0.0192    0.0590   101.2901    1.0522        2.2317
        β[2]   -11.4689    1.2415     0.0196    0.0599   107.3475    1.0445        2.3652
        β[3]     7.1476    1.1527     0.0182    0.0444   549.3184    1.0157       12.1033
        β[4]     1.1758    1.1634     0.0184    0.0448   436.9062    1.0084        9.6265
           σ     5.9591    0.2607     0.0041    0.0138   119.5446    1.0302        2.6340
           τ     6.5594    3.5619     0.0563    0.3580    16.2915    1.3504        0.3590
       zⱼ[1]    -0.3813    1.0916     0.0173    0.1059    15.1939    1.4208        0.3348
       zⱼ[2]     1.1008    0.7923     0.0125    0.0535    32.2106    1.1630        0.7097
       αⱼ[1]    -2.5012    7.1601     0.1132    0.6944    15.1939    1.4208        0.3348
       αⱼ[2]     7.2205    5.1972     0.0822    0.3507    32.2106    1.1630        0.7097

Quantiles
  parameters       2.5%      25.0%      50.0%      75.0%     97.5%
      Symbol    Float64    Float64    Float64    Float64   Float64

           α    48.3094    64.6642    69.4575    72.0048   77.0475
        β[1]     0.9278     2.6142     3.6063     4.2625    5.6648
        β[2]   -14.0039   -12.2803   -11.3668   -10.6277   -9.2489
        β[3]     4.9071     6.4296     7.1147     7.8288    9.5011
        β[4]    -1.1389     0.4485     1.1551     1.9180    3.5224
           σ     5.5047     5.7746     5.9453     6.1215    6.5157
           τ     1.9838     3.5771     5.4532     9.7781   13.1560
       zⱼ[1]    -2.3506    -1.1482    -0.4922     0.3572    1.5519
       zⱼ[2]    -0.4172     0.5059     1.1135     1.6916    2.3822
       αⱼ[1]   -15.4184    -7.5313    -3.2287     2.3427   10.1797
       αⱼ[2]    -2.7367     3.3182     7.3039    11.0962   15.6256

References

Neal, Radford M. (2003). Slice Sampling. The Annals of Statistics, 31(3), 705–741. Retrieved from https://www.jstor.org/stable/3448413