Computational Tricks with Turing (Non-Centered Parametrization and QR Decomposition)

There are some computational tricks that we can employ with Turing. I will cover here two computational tricks:

  1. QR Decomposition

  2. Non-Centered Parametrization

QR Decomposition

Back in "Linear Algebra 101" we learned that any matrix (even rectangular ones) can be factored into the product of two matrices:

  • $\mathbf{Q}$: an orthogonal matrix (its columns are orthogonal unit vectors, meaning $\mathbf{Q}^T = \mathbf{Q}^{-1}$).

  • $\mathbf{R}$: an upper triangular matrix.

This is commonly known as the QR Decomposition:

$$\mathbf{A} = \mathbf{Q} \cdot \mathbf{R}$$

Let me show you an example with a random matrix $\mathbf{A} \in \mathbb{R}^{3 \times 2}$:

A = rand(3, 2)
3×2 Matrix{Float64}:
 0.706461  0.417284
 0.695988  0.436497
 0.878977  0.196846

Now let's factor A using LinearAlgebra's qr() function:

using LinearAlgebra: qr, I
Q, R = qr(A)
LinearAlgebra.QRCompactWY{Float64, Matrix{Float64}, Matrix{Float64}}
Q factor:
3×3 LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}:
 -0.533108  -0.421251  -0.733719
 -0.525205  -0.515123   0.677354
 -0.663292   0.746455   0.053373
R factor:
2×2 Matrix{Float64}:
 -1.32517  -0.582274
  0.0      -0.253694

Notice that qr() produced a factorization object holding the two matrices Q and R, which we destructured. Q is a 3×3 orthogonal matrix and R is a 2×2 upper triangular matrix. Hence $\mathbf{Q}^T = \mathbf{Q}^{-1}$ (the transpose equals the inverse):

Matrix(Q') ≈ Matrix(Q^-1)
true

Also note that $\mathbf{Q}^T \cdot \mathbf{Q} = \mathbf{I}$ (identity matrix):

Q' * Q ≈ I(3)
true
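
As a quick sanity check, we can also verify that the two factors reconstruct the original matrix. This is a minimal sketch using the objects defined above; Matrix(Q) extracts the "thin" 3×2 version of Q, so the product has the same shape as A:

Matrix(Q) * R ≈ A  # should return true (up to floating-point error)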

This is nice. But what can we do with QR decomposition? It can speed up Turing's sampling by a huge factor while also decorrelating the columns of $\mathbf{X}$, i.e. the independent variables. The orthogonal nature of the QR decomposition alters the posterior's topology and makes it easier for HMC or other MCMC samplers to explore it. Let's see how fast we can get with QR decomposition. First, let's go back to the kidiq example in 6. Bayesian Linear Regression:

using Turing
using LinearAlgebra: I
using Statistics: mean, std
using Random: seed!
seed!(123)

@model function linreg(X, y; predictors=size(X, 2))
    #priors
    α ~ Normal(mean(y), 2.5 * std(y))
    β ~ filldist(TDist(3), predictors)
    σ ~ Exponential(1)

    #likelihood
    return y ~ MvNormal(α .+ X * β, σ^2 * I)
end;

using DataFrames
using CSV
using HTTP

url = "https://raw.githubusercontent.com/storopoli/Bayesian-Julia/master/datasets/kidiq.csv"
kidiq = CSV.read(HTTP.get(url).body, DataFrame)
X = Matrix(select(kidiq, Not(:kid_score)))
y = kidiq[:, :kid_score]
model = linreg(X, y)
chain = sample(model, NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×17×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 10.19 seconds
Compute duration  = 16.15 seconds
parameters        = α, β[1], β[2], β[3], σ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std   naive_se      mcse         ess      rhat   ess_per_sec
      Symbol   Float64   Float64    Float64   Float64     Float64   Float64       Float64

           α   21.5919    8.7175     0.1378    0.2040   1753.4216    1.0017      108.5441
        β[1]    2.0147    1.8092     0.0286    0.0452   1914.7422    1.0013      118.5305
        β[2]    0.5808    0.0584     0.0009    0.0012   2148.1559    1.0006      132.9798
        β[3]    0.2437    0.3056     0.0048    0.0070   2166.7021    1.0009      134.1279
           σ   17.8928    0.5896     0.0093    0.0115   2708.8277    1.0001      167.6877

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           α    4.3464   15.7517   21.7247   27.2853   38.7344
        β[1]   -0.6530    0.7333    1.6501    3.0029    6.3849
        β[2]    0.4687    0.5416    0.5794    0.6197    0.6980
        β[3]   -0.3336    0.0285    0.2439    0.4560    0.8316
           σ   16.7749   17.4893   17.8613   18.2915   19.0887

See the wall duration in Turing's chain: for me it took around 10 seconds.

Now let's incorporate the QR decomposition into the linear regression model. Here I will use the "thin" QR instead of the "fat" QR, which scales the $\mathbf{Q}$ and $\mathbf{R}$ matrices by a factor of $\sqrt{n-1}$, where $n$ is the number of rows of $\mathbf{X}$. In practice the thin QR decomposition is preferred, since it is more numerically stable. Mathematically, the thin QR decomposition is:

$$\begin{aligned}
\mathbf{X} &= \mathbf{Q}^* \mathbf{R}^* \\
\mathbf{Q}^* &= \mathbf{Q} \cdot \sqrt{n - 1} \\
\mathbf{R}^* &= \frac{1}{\sqrt{n - 1}} \cdot \mathbf{R} \\
\boldsymbol{\mu} &= \alpha + \mathbf{X} \cdot \boldsymbol{\beta} + \sigma \\
&= \alpha + \mathbf{Q}^* \cdot \mathbf{R}^* \cdot \boldsymbol{\beta} + \sigma \\
&= \alpha + \mathbf{Q}^* \cdot (\mathbf{R}^* \cdot \boldsymbol{\beta}) + \sigma \\
&= \alpha + \mathbf{Q}^* \cdot \widetilde{\boldsymbol{\beta}} + \sigma
\end{aligned}$$

Then we can recover the original $\boldsymbol{\beta}$ with:

$$\boldsymbol{\beta} = \mathbf{R}^{*-1} \cdot \widetilde{\boldsymbol{\beta}}$$

In Turing, a model with QR decomposition is the same linreg but with a different X matrix supplied, since it is purely a data transformation. First, we decompose the model data X into Q and R:

Q, R = qr(X)
Q_ast = Matrix(Q) * sqrt(size(X, 1) - 1)
R_ast = R / sqrt(size(X, 1) - 1);
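
Two quick optional sanity checks on this transformation (a minimal sketch, assuming X, Q_ast, and R_ast from above): the $\sqrt{n-1}$ factors cancel, so the product of the scaled factors still reconstructs X; and the columns of Q_ast are mutually orthogonal, which is exactly the decorrelation of the predictors mentioned earlier:

Q_ast * R_ast ≈ X                                  # the scaling factors cancel
Q_ast' * Q_ast ≈ (size(X, 1) - 1) * I(size(X, 2))  # orthogonal (decorrelated) columns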

Then, we instantiate a model with Q_ast instead of X and sample as usual:

model_qr = linreg(Q_ast, y)
chain_qr = sample(model_qr, NUTS(1_000, 0.65), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×17×4 Array{Float64, 3}):

Iterations        = 1001:1:2000
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 5.16 seconds
Compute duration  = 8.67 seconds
parameters        = α, β[1], β[2], β[3], σ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters       mean       std   naive_se      mcse         ess      rhat   ess_per_sec
      Symbol    Float64   Float64    Float64   Float64     Float64   Float64       Float64

           α    32.9480    7.7468     0.1225    0.2364   1152.1382    1.0028      132.9032
        β[1]   -49.9599    6.9387     0.1097    0.2156   1136.4348    1.0032      131.0918
        β[2]    22.0431    3.5609     0.0563    0.1093   1155.7501    1.0035      133.3199
        β[3]     0.2728    0.8819     0.0139    0.0237   1398.8210    1.0020      161.3590
           σ    17.8491    0.6193     0.0098    0.0131   2365.8711    1.0004      272.9116

Quantiles
  parameters       2.5%      25.0%      50.0%      75.0%      97.5%
      Symbol    Float64    Float64    Float64    Float64    Float64

           α    18.5380    27.7723    32.6057    37.9096    49.3423
        β[1]   -62.7857   -54.5494   -50.2855   -45.4741   -35.3443
        β[2]    14.7913    19.7396    22.1681    24.4313    28.6965
        β[3]    -1.4185    -0.2770     0.2243     0.7787     2.1571
           σ    16.6845    17.4242    17.8210    18.2614    19.0900

See the wall duration in Turing's chain_qr: for me it took around 5 seconds. Much faster than the regular linreg. Now we have to reconstruct our $\boldsymbol{\beta}$s:

# apply β = R*⁻¹ ⋅ β̃ to every posterior draw of β̃
betas = mapslices(
    x -> R_ast^-1 * x, chain_qr[:, namesingroup(chain_qr, :β), :].value.data; dims=[2]
)
# wrap the reconstructed draws in a Chains object with the matching iteration range
chain_beta = setrange(
    Chains(betas, ["real_β[$i]" for i in 1:size(Q_ast, 2)]), 1_001:1:2_000
)
chain_qr_reconstructed = hcat(chain_beta, chain_qr)
Chains MCMC chain (1000×20×4 Array{Float64, 3}):

Iterations        = 1001:1:2000
Number of chains  = 4
Samples per chain = 1000
parameters        = real_β[1], real_β[2], real_β[3], α, β[1], β[2], β[3], σ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters       mean       std   naive_se      mcse         ess      rhat
      Symbol    Float64   Float64    Float64   Float64     Float64   Float64

   real_β[1]     6.2660    2.2533     0.0356    0.0315   3970.3435    1.0002
   real_β[2]     0.5045    0.0632     0.0010    0.0015   1846.1138    1.0017
   real_β[3]    -0.0666    0.2154     0.0034    0.0058   1398.8210    1.0020
           α    32.9480    7.7468     0.1225    0.2364   1152.1382    1.0028
        β[1]   -49.9599    6.9387     0.1097    0.2156   1136.4348    1.0032
        β[2]    22.0431    3.5609     0.0563    0.1093   1155.7501    1.0035
        β[3]     0.2728    0.8819     0.0139    0.0237   1398.8210    1.0020
           σ    17.8491    0.6193     0.0098    0.0131   2365.8711    1.0004

Quantiles
  parameters       2.5%      25.0%      50.0%      75.0%      97.5%
      Symbol    Float64    Float64    Float64    Float64    Float64

   real_β[1]     1.9309     4.7406     6.2556     7.7621    10.6728
   real_β[2]     0.3815     0.4635     0.5050     0.5455     0.6267
   real_β[3]    -0.5268    -0.1902    -0.0548     0.0676     0.3464
           α    18.5380    27.7723    32.6057    37.9096    49.3423
        β[1]   -62.7857   -54.5494   -50.2855   -45.4741   -35.3443
        β[2]    14.7913    19.7396    22.1681    24.4313    28.6965
        β[3]    -1.4185    -0.2770     0.2243     0.7787     2.1571
           σ    16.6845    17.4242    17.8210    18.2614    19.0900

Non-Centered Parametrization

Now let's explore the Non-Centered Parametrization (NCP). This is useful when the posterior's topology is very difficult to explore, with regions where the HMC sampler has to drastically change the number of leapfrog steps $L$ and the step size $\epsilon$. I showed one of the most infamous cases of this in 5. Markov Chain Monte Carlo (MCMC): Neal's Funnel (Neal, 2003):

using CairoMakie
using Distributions
funnel_y = rand(Normal(0, 3), 10_000)
funnel_x = rand(Normal(), 10_000) .* exp.(funnel_y / 2)

f, ax, s = scatter(
    funnel_x,
    funnel_y;
    color=(:steelblue, 0.3),
    axis=(; xlabel=L"X", ylabel=L"Y", limits=(-100, 100, nothing, nothing)),
)

Neal's Funnel

Here we see that in the upper part of the funnel HMC has to take only a few steps $L$ and can be more liberal with the step size $\epsilon$. The opposite holds in the lower part of the funnel: it needs way more steps $L$ and a much more conservative $\epsilon$.

To see the devil's funnel (as it is known in some Bayesian circles) in action, let's code it in Turing and then sample from it:

@model function funnel()
    y ~ Normal(0, 3)
    return x ~ Normal(0, exp(y / 2))
end

chain_funnel = sample(funnel(), NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×14×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 5.03 seconds
Compute duration  = 9.14 seconds
parameters        = y, x
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std   naive_se      mcse        ess      rhat   ess_per_sec
      Symbol   Float64   Float64    Float64   Float64    Float64   Float64       Float64

           y    0.9349    2.5471     0.0403    0.1919   132.2382    1.0329       14.4665
           x   -0.3946    6.4402     0.1018    0.3474   280.4549    1.0046       30.6810

Quantiles
  parameters       2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol    Float64   Float64   Float64   Float64   Float64

           y    -4.0096   -0.8760    0.7658    2.6657    6.0989
           x   -14.0106   -1.1051   -0.0337    0.7439   10.4440

Wow, take a look at those rhat values... That sucks: rhat above 1.01 even with 4 parallel chains of 1,000 iterations each!

How do we deal with that? We reparametrize! Note that we can express any normal distribution as a standard normal that is scaled and shifted:

$$\text{Normal}(\mu, \sigma) = \text{Standard Normal} \cdot \sigma + \mu$$

where the standard normal is the normal with mean $\mu = 0$ and standard deviation $\sigma = 1$. This is why it is called Non-Centered Parametrization: we "decouple" the parameters, sample their standardized versions, and reconstruct the original ones afterwards.
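
We can convince ourselves of this location-scale identity with a quick simulation. This is a minimal sketch with arbitrary values $\mu = 70$ and $\sigma = 3$ (Distributions and Statistics are already loaded above):

z = rand(Normal(), 10_000)     # standard normal draws
x_scaled = 70 .+ 3 .* z        # scale by σ = 3 and shift by μ = 70
mean(x_scaled), std(x_scaled)  # ≈ (70, 3) up to Monte Carlo error

With that identity in hand, here is the funnel reparametrized in Turing: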

@model function ncp_funnel()
    x̃ ~ Normal()
    ỹ ~ Normal()
    y = 3.0 * ỹ         # implies y ~ Normal(0, 3)
    return x = exp(y / 2) * x̃  # implies x ~ Normal(0, exp(y / 2))
end

chain_ncp_funnel = sample(ncp_funnel(), NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×14×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 4.06 seconds
Compute duration  = 7.74 seconds
parameters        = x̃, ỹ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std   naive_se      mcse         ess      rhat   ess_per_sec
      Symbol   Float64   Float64    Float64   Float64     Float64   Float64       Float64

           x̃    0.0130    0.9902     0.0157    0.0182   3639.7339    1.0012      470.0677
           ỹ   -0.0447    0.9978     0.0158    0.0160   3864.2412    1.0000      499.0625

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           x̃   -1.9084   -0.6314    0.0152    0.6657    1.9736
           ỹ   -2.0144   -0.7034   -0.0420    0.6243    1.9661

Much better now: all rhat are well below 1.01.
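
Note that the chain now contains only the auxiliary parameters x̃ and ỹ. If we want draws of the original x and y, we can reconstruct them from the sampled values by mirroring the deterministic transformations inside the model. A minimal sketch, assuming chain_ncp_funnel from above and the usual Chains symbol indexing:

x̃_draws = Array(chain_ncp_funnel[:x̃])
ỹ_draws = Array(chain_ncp_funnel[:ỹ])
y_draws = 3.0 .* ỹ_draws                 # y = 3ỹ, i.e. y ~ Normal(0, 3)
x_draws = exp.(y_draws ./ 2) .* x̃_draws  # x = exp(y/2) x̃, i.e. x ~ Normal(0, exp(y / 2))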

How would we implement this in a real-world model in Turing? Let's go back to the cheese random-intercept model in 10. Multilevel Models (a.k.a. Hierarchical Models). Here is the approach that we took, also known as Centered Parametrization (CP):

@model function varying_intercept(
    X, idx, y; n_gr=length(unique(idx)), predictors=size(X, 2)
)
    #priors
    α ~ Normal(mean(y), 2.5 * std(y))       # population-level intercept
    β ~ filldist(Normal(0, 2), predictors)  # population-level coefficients
    σ ~ Exponential(std(y))                 # residual SD
    #prior for variance of random intercepts
    #usually requires thoughtful specification
    τ ~ truncated(Cauchy(0, 2); lower=0)    # group-level SDs intercepts
    αⱼ ~ filldist(Normal(0, τ), n_gr)       # CP group-level intercepts

    #likelihood
    ŷ = α .+ X * β .+ αⱼ[idx]
    return y ~ MvNormal(ŷ, σ^2 * I)
end;

To perform a Non-Centered Parametrization (NCP) in this model we do the following:

@model function varying_intercept_ncp(
    X, idx, y; n_gr=length(unique(idx)), predictors=size(X, 2)
)
    #priors
    α ~ Normal(mean(y), 2.5 * std(y))       # population-level intercept
    β ~ filldist(Normal(0, 2), predictors)  # population-level coefficients
    σ ~ Exponential(std(y))                 # residual SD

    #prior for variance of random intercepts
    #usually requires thoughtful specification
    τ ~ truncated(Cauchy(0, 2); lower=0)   # group-level SDs intercepts
    zⱼ ~ filldist(Normal(0, 1), n_gr)      # NCP group-level intercepts

    #likelihood
    ŷ = α .+ X * β .+ zⱼ[idx] .* τ
    return y ~ MvNormal(ŷ, σ^2 * I)
end;

Here we are using an NCP: the zⱼs follow a standard normal and we reconstruct the group-level intercepts by multiplying the zⱼs by τ. Since the original αⱼs had a prior centered on 0 with standard deviation τ, we only have to multiply by τ to get the αⱼs back.

Now let's see how NCP compares to the CP. First, let's redo our CP hierarchical model:

url = "https://raw.githubusercontent.com/storopoli/Bayesian-Julia/master/datasets/cheese.csv"
cheese = CSV.read(HTTP.get(url).body, DataFrame)

for c in unique(cheese[:, :cheese])
    cheese[:, "cheese_$c"] = ifelse.(cheese[:, :cheese] .== c, 1, 0)
end

cheese[:, :background_int] = map(cheese[:, :background]) do b
    if b == "rural"
        1
    elseif b == "urban"
        2
    else
        missing
    end
end

X = Matrix(select(cheese, Between(:cheese_A, :cheese_D)));
y = cheese[:, :y];
idx = cheese[:, :background_int];

model_cp = varying_intercept(X, idx, y)
chain_cp = sample(model_cp, NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×21×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 10.52 seconds
Compute duration  = 17.18 seconds
parameters        = α, β[1], β[2], β[3], β[4], σ, τ, αⱼ[1], αⱼ[2]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters       mean       std   naive_se      mcse         ess      rhat   ess_per_sec
      Symbol    Float64   Float64    Float64   Float64     Float64   Float64       Float64

           α    70.6428    5.0487     0.0798    0.1737    770.1477    1.0046       44.8360
        β[1]     2.9791    1.3144     0.0208    0.0258   2172.3735    1.0004      126.4699
        β[2]   -10.5427    1.3601     0.0215    0.0303   2084.5327    1.0002      121.3560
        β[3]     6.5850    1.3731     0.0217    0.0292   2098.4036    1.0007      122.1636
        β[4]     1.1433    1.3117     0.0207    0.0299   2069.7945    1.0003      120.4980
           σ     7.3996    0.4542     0.0072    0.0080   2613.2512    1.0008      152.1366
           τ     6.3827    8.5037     0.1345    0.3209    800.8580    1.0184       46.6239
       αⱼ[1]    -3.3743    4.9339     0.0780    0.1727    757.3016    1.0050       44.0881
       αⱼ[2]     3.6766    4.9634     0.0785    0.1742    748.6351    1.0056       43.5836

Quantiles
  parameters       2.5%      25.0%      50.0%     75.0%     97.5%
      Symbol    Float64    Float64    Float64   Float64   Float64

           α    60.7302    68.3134    70.7241   73.0491   80.2703
        β[1]     0.3954     2.1190     2.9438    3.8796    5.5242
        β[2]   -13.2151   -11.4684   -10.5399   -9.6328   -7.8694
        β[3]     3.9123     5.6639     6.5755    7.4932    9.2530
        β[4]    -1.4944     0.2537     1.1339    2.0287    3.6481
           σ     6.5570     7.0811     7.3764    7.6938    8.3492
           τ     1.8508     3.2115     4.5971    6.8672   20.0755
       αⱼ[1]   -12.7622    -5.6539    -3.3432   -1.1848    6.2812
       αⱼ[2]    -5.7783     1.3861     3.5480    5.7955   13.6332

Now let's do the NCP hierarchical model:

model_ncp = varying_intercept_ncp(X, idx, y)
chain_ncp = sample(model_ncp, NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×21×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 15.18 seconds
Compute duration  = 27.18 seconds
parameters        = α, β[1], β[2], β[3], β[4], σ, τ, zⱼ[1], zⱼ[2]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters       mean       std   naive_se      mcse         ess      rhat   ess_per_sec
      Symbol    Float64   Float64    Float64   Float64     Float64   Float64       Float64

           α    70.6381    3.9661     0.0627    0.2270    276.3623    1.0032       10.1686
        β[1]     3.0184    1.3067     0.0207    0.0343   1430.7232    1.0010       52.6427
        β[2]   -10.5484    1.3543     0.0214    0.0345   1795.1054    1.0000       66.0499
        β[3]     6.5794    1.3084     0.0207    0.0366   1155.6678    1.0000       42.5222
        β[4]     1.1558    1.3067     0.0207    0.0347   1472.3222    1.0000       54.1733
           σ     7.3846    0.4668     0.0074    0.0103   2309.3984    1.0003       84.9731
           τ     5.0347    2.4221     0.0383    0.1032    576.5776    1.0026       21.2149
       zⱼ[1]    -0.8461    0.8249     0.0130    0.0371    543.0646    1.0028       19.9818
       zⱼ[2]     0.8675    0.8002     0.0127    0.0256    989.2219    1.0007       36.3979

Quantiles
  parameters       2.5%      25.0%      50.0%     75.0%     97.5%
      Symbol    Float64    Float64    Float64   Float64   Float64

           α    61.2831    68.3926    70.8893   73.1427   78.1953
        β[1]     0.5016     2.1144     3.0451    3.9067    5.4755
        β[2]   -13.1368   -11.4724   -10.5733   -9.6324   -7.7688
        β[3]     4.0680     5.6634     6.5795    7.4830    9.0380
        β[4]    -1.4639     0.3336     1.1558    2.0353    3.7254
           σ     6.5451     7.0536     7.3621    7.6882    8.3494
           τ     1.8278     3.2063     4.4428    6.4645   11.0497
       zⱼ[1]    -2.5191    -1.3926    -0.8403   -0.2763    0.7263
       zⱼ[2]    -0.6093     0.2939     0.8572    1.4003    2.4570
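
A quick way to compare the two parametrizations quantitatively is the effective sample size per second of the group-level scale τ. This is a sketch that uses the same summarystats indexing as the reconstruction code below and assumes the :ess_per_sec column shown in the summary tables above:

summarystats(chain_cp)[:τ, :ess_per_sec], summarystats(chain_ncp)[:τ, :ess_per_sec]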

Notice that some models are better off with a standard Centered Parametrization (as is our cheese case here), while others are better off with a Non-Centered Parametrization. But now you know how to apply both parametrizations in Turing. Before we conclude, we need to recover our original αⱼs. We can do this by multiplying the zⱼs by τ:

τ = summarystats(chain_ncp)[:τ, :mean]  # posterior mean of the group-level SD τ
αⱼ = mapslices(
    x -> x * τ, chain_ncp[:, namesingroup(chain_ncp, :zⱼ), :].value.data; dims=[2]
)
chain_ncp_reconstructed = hcat(
    MCMCChains.resetrange(chain_ncp), Chains(αⱼ, ["αⱼ[$i]" for i in 1:length(unique(idx))])
)
Chains MCMC chain (1000×23×4 Array{Float64, 3}):

Iterations        = 1:1000
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 15.18 seconds
Compute duration  = 27.18 seconds
parameters        = α, β[1], β[2], β[3], β[4], σ, τ, zⱼ[1], zⱼ[2], αⱼ[1], αⱼ[2]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters       mean       std   naive_se      mcse         ess      rhat   ess_per_sec
      Symbol    Float64   Float64    Float64   Float64     Float64   Float64       Float64

           α    70.6381    3.9661     0.0627    0.2270    276.3623    1.0032       10.1686
        β[1]     3.0184    1.3067     0.0207    0.0343   1430.7232    1.0010       52.6427
        β[2]   -10.5484    1.3543     0.0214    0.0345   1795.1054    1.0000       66.0499
        β[3]     6.5794    1.3084     0.0207    0.0366   1155.6678    1.0000       42.5222
        β[4]     1.1558    1.3067     0.0207    0.0347   1472.3222    1.0000       54.1733
           σ     7.3846    0.4668     0.0074    0.0103   2309.3984    1.0003       84.9731
           τ     5.0347    2.4221     0.0383    0.1032    576.5776    1.0026       21.2149
       zⱼ[1]    -0.8461    0.8249     0.0130    0.0371    543.0646    1.0028       19.9818
       zⱼ[2]     0.8675    0.8002     0.0127    0.0256    989.2219    1.0007       36.3979
       αⱼ[1]    -4.2599    4.1534     0.0657    0.1869    543.0646    1.0028       19.9818
       αⱼ[2]     4.3677    4.0290     0.0637    0.1291    989.2219    1.0007       36.3979

Quantiles
  parameters       2.5%      25.0%      50.0%     75.0%     97.5%
      Symbol    Float64    Float64    Float64   Float64   Float64

           α    61.2831    68.3926    70.8893   73.1427   78.1953
        β[1]     0.5016     2.1144     3.0451    3.9067    5.4755
        β[2]   -13.1368   -11.4724   -10.5733   -9.6324   -7.7688
        β[3]     4.0680     5.6634     6.5795    7.4830    9.0380
        β[4]    -1.4639     0.3336     1.1558    2.0353    3.7254
           σ     6.5451     7.0536     7.3621    7.6882    8.3494
           τ     1.8278     3.2063     4.4428    6.4645   11.0497
       zⱼ[1]    -2.5191    -1.3926    -0.8403   -0.2763    0.7263
       zⱼ[2]    -0.6093     0.2939     0.8572    1.4003    2.4570
       αⱼ[1]   -12.6830    -7.0114    -4.2305   -1.3910    3.6565
       αⱼ[2]    -3.0677     1.4797     4.3158    7.0499   12.3705

References

Neal, Radford M. (2003). Slice Sampling. The Annals of Statistics, 31(3), 705–741. Retrieved from https://www.jstor.org/stable/3448413