Computational Tricks with Turing (Non-Centered Parametrization and QR Decomposition)

There are some computational tricks that we can employ with Turing. Here I will cover two of them:

  1. QR Decomposition

  2. Non-Centered Parametrization

QR Decomposition

Back in "Linear Algebra 101" we've learned that any matrix (even rectangular ones) can be factored into the product of two matrices:

  • $\mathbf{Q}$: an orthogonal matrix (its columns are orthogonal unit vectors, meaning $\mathbf{Q}^T = \mathbf{Q}^{-1}$).

  • $\mathbf{R}$: an upper triangular matrix.

This is commonly known as the QR Decomposition:

$$ \mathbf{A} = \mathbf{Q} \cdot \mathbf{R} $$

Let me show you an example with a random matrix $\mathbf{A} \in \mathbb{R}^{3 \times 2}$:

A = rand(3, 2)
3×2 Matrix{Float64}:
 0.720103  0.295367
 0.573619  0.276597
 0.664468  0.983436

Now let's factor A using LinearAlgebra's qr() function:

using LinearAlgebra: qr, I
Q, R = qr(A)
LinearAlgebra.QRCompactWY{Float64, Matrix{Float64}, Matrix{Float64}}
Q factor: 3×3 LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}
R factor:
2×2 Matrix{Float64}:
 -1.13539  -0.902615
  0.0       0.562299

Notice that qr() produced a tuple containing two matrices: Q and R. Q is a 3×3 orthogonal matrix and R is a 2×2 upper triangular matrix, so that $\mathbf{Q}^T = \mathbf{Q}^{-1}$ (the transpose is equal to the inverse):

Matrix(Q') ≈ Matrix(Q^-1)
MethodError: no method matching ^(::LinearAlgebra.AdjointQ{Float64, LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}}, ::Int64)

This MethodError happens because qr() returns Q in a compact (lazy) form that does not support the ^ operator, not because the identity fails.

We can instead verify the orthogonality directly with the lazy Q object, since $\mathbf{Q}^T \cdot \mathbf{Q} = \mathbf{I}$ (identity matrix):

Q' * Q ≈ I(3)
true

This is nice. But what can we do with QR decomposition? It can speed up Turing's sampling by a huge factor while also decorrelating the columns of $\mathbf{X}$, i.e. the independent variables. The orthogonal nature of QR decomposition alters the posterior's topology and makes it easier for HMC or other MCMC samplers to explore it. Let's see how fast we can get with QR decomposition. First, let's go back to the kidiq example in 6. Bayesian Linear Regression:

using Turing
using LinearAlgebra: I
using Statistics: mean, std
using Random: seed!
seed!(123)

@model function linreg(X, y; predictors=size(X, 2))
    #priors
    α ~ Normal(mean(y), 2.5 * std(y))
    β ~ filldist(TDist(3), predictors)
    σ ~ Exponential(1)

    #likelihood
    return y ~ MvNormal(α .+ X * β, σ^2 * I)
end;

using DataFrames
using CSV
using HTTP

url = "https://raw.githubusercontent.com/storopoli/Bayesian-Julia/master/datasets/kidiq.csv"
kidiq = CSV.read(HTTP.get(url).body, DataFrame)
X = Matrix(select(kidiq, Not(:kid_score)))
y = kidiq[:, :kid_score]
model = linreg(X, y)
chain = sample(model, NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×17×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 6.24 seconds
Compute duration  = 23.56 seconds
parameters        = α, β[1], β[2], β[3], σ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ess_per_sec
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64       Float64

           α   21.5126    8.6720    0.2217   1528.8886   1868.2612    1.0009       64.8962
        β[1]    2.0014    1.7943    0.0419   2281.1940   1734.6512    1.0016       96.8290
        β[2]    0.5788    0.0584    0.0013   2163.9754   2292.8814    1.0006       91.8534
        β[3]    0.2566    0.3092    0.0074   1762.0214   2135.6795    1.0010       74.7919
           σ   17.8859    0.6033    0.0106   3271.1669   2347.2435    1.0008      138.8500

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           α    4.7278   15.7633   21.2942   27.4322   38.4426
        β[1]   -0.5876    0.7324    1.6761    2.9919    6.3388
        β[2]    0.4662    0.5392    0.5793    0.6184    0.6924
        β[3]   -0.3477    0.0440    0.2588    0.4733    0.8490
           σ   16.7525   17.4685   17.8796   18.2703   19.1238

See the compute duration in Turing's chain: for me it took around 24 seconds.

Now let us incorporate QR decomposition in the linear regression model. Here, I will use the "thin" instead of the "fat" QR, and additionally scale the $\mathbf{Q}$ and $\mathbf{R}$ matrices by a factor of $\sqrt{n-1}$, where $n$ is the number of rows of $\mathbf{X}$. In practice the thin QR decomposition is preferred over the fat one, since it is numerically more stable; the scaling puts the columns of $\mathbf{Q}^*$ on roughly unit scale, so the transformed coefficients stay on a scale comparable to the original ones. Mathematically, the scaled thin QR decomposition is:

$$
\begin{aligned}
\mathbf{X} &= \mathbf{Q}^* \mathbf{R}^* \\
\mathbf{Q}^* &= \mathbf{Q} \cdot \sqrt{n - 1} \\
\mathbf{R}^* &= \frac{1}{\sqrt{n - 1}} \cdot \mathbf{R} \\
\boldsymbol{\mu} &= \alpha + \mathbf{X} \cdot \boldsymbol{\beta} \\
&= \alpha + \mathbf{Q}^* \cdot \mathbf{R}^* \cdot \boldsymbol{\beta} \\
&= \alpha + \mathbf{Q}^* \cdot (\mathbf{R}^* \cdot \boldsymbol{\beta}) \\
&= \alpha + \mathbf{Q}^* \cdot \widetilde{\boldsymbol{\beta}}
\end{aligned}
$$

Then we can recover the original $\boldsymbol{\beta}$ with:

$$ \boldsymbol{\beta} = \mathbf{R}^{*-1} \cdot \widetilde{\boldsymbol{\beta}} $$

In Turing, a model with QR decomposition is the same linreg but with a different X matrix supplied, since this is only a data transformation. First, we decompose the model data X into Q and R:

Q, R = qr(X)
Q_ast = Matrix(Q) * sqrt(size(X, 1) - 1)
R_ast = R / sqrt(size(X, 1) - 1);
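
As a quick sanity check (a minimal sketch, using only the X, Q_ast, and R_ast defined above), the scaling factors cancel so the scaled factors still reproduce X, and the columns of Q_ast stay orthogonal:

# the sqrt(n - 1) factors cancel, so Q_ast * R_ast still reproduces X
Q_ast * R_ast ≈ X

# orthogonal columns: Q_astᵀ * Q_ast ≈ (n - 1) * I
Q_ast' * Q_ast ≈ (size(X, 1) - 1) * I(size(X, 2))

Both checks should return true.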

Then, we instantiate a model with Q_ast instead of X and sample as usual:

model_qr = linreg(Q_ast, y)
chain_qr = sample(model_qr, NUTS(1_000, 0.65), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×17×4 Array{Float64, 3}):

Iterations        = 1001:1:2000
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 2.65 seconds
Compute duration  = 7.82 seconds
parameters        = α, β[1], β[2], β[3], σ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters       mean       std      mcse    ess_bulk    ess_tail      rhat   ess_per_sec
      Symbol    Float64   Float64   Float64     Float64     Float64   Float64       Float64

           α    32.9057    7.8274    0.2470   1006.3265   1242.3293    1.0026      128.7521
        β[1]   -49.9922    7.0245    0.2210   1009.4898   1401.5358    1.0022      129.1568
        β[2]    22.0858    3.5735    0.1101   1054.2580   1418.8568    1.0028      134.8846
        β[3]     0.2869    0.8775    0.0238   1370.8734   1800.6496    1.0010      175.3932
           σ    17.8703    0.5859    0.0113   2699.9100   2464.9195    1.0019      345.4337

Quantiles
  parameters       2.5%      25.0%      50.0%      75.0%      97.5%
      Symbol    Float64    Float64    Float64    Float64    Float64

           α    17.6798    27.6922    32.7076    38.1740    47.9793
        β[1]   -63.6746   -54.6970   -50.1683   -45.2700   -36.2948
        β[2]    15.1066    19.7019    22.1804    24.4485    29.1569
        β[3]    -1.3554    -0.2981     0.2697     0.8374     2.1505
           σ    16.7293    17.4690    17.8683    18.2608    19.0084

See the compute duration in Turing's chain_qr: for me it took around 8 seconds, roughly three times faster than the regular linreg. Now we have to reconstruct our $\boldsymbol{\beta}$s:

betas = mapslices(
    x -> R_ast^-1 * x, chain_qr[:, namesingroup(chain_qr, :β), :].value.data; dims=[2]
)
chain_beta = setrange(
    Chains(betas, ["real_β[$i]" for i in 1:size(Q_ast, 2)]), 1_001:1:2_000
)
chain_qr_reconstructed = hcat(chain_beta, chain_qr)
Chains MCMC chain (1000×20×4 Array{Float64, 3}):

Iterations        = 1001:1:2000
Number of chains  = 4
Samples per chain = 1000
parameters        = real_β[1], real_β[2], real_β[3], α, β[1], β[2], β[3], σ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters       mean       std      mcse    ess_bulk    ess_tail      rhat   ess_per_sec
      Symbol    Float64   Float64   Float64     Float64     Float64   Float64       Missing

   real_β[1]     6.2103    2.1904    0.0339   4190.5377   2075.4888    1.0015       missing
   real_β[2]     0.5062    0.0616    0.0015   1657.4420   2135.3528    1.0021       missing
   real_β[3]    -0.0701    0.2143    0.0058   1370.8734   1800.6496    1.0010       missing
           α    32.9057    7.8274    0.2470   1006.3265   1242.3293    1.0026       missing
        β[1]   -49.9922    7.0245    0.2210   1009.4898   1401.5358    1.0022       missing
        β[2]    22.0858    3.5735    0.1101   1054.2580   1418.8568    1.0028       missing
        β[3]     0.2869    0.8775    0.0238   1370.8734   1800.6496    1.0010       missing
           σ    17.8703    0.5859    0.0113   2699.9100   2464.9195    1.0019       missing

Quantiles
  parameters       2.5%      25.0%      50.0%      75.0%      97.5%
      Symbol    Float64    Float64    Float64    Float64    Float64

   real_β[1]     1.8485     4.7441     6.2339     7.7050    10.3766
   real_β[2]     0.3815     0.4656     0.5071     0.5480     0.6252
   real_β[3]    -0.5252    -0.2045    -0.0659     0.0728     0.3310
           α    17.6798    27.6922    32.7076    38.1740    47.9793
        β[1]   -63.6746   -54.6970   -50.1683   -45.2700   -36.2948
        β[2]    15.1066    19.7019    22.1804    24.4485    29.1569
        β[3]    -1.3554    -0.2981     0.2697     0.8374     2.1505
           σ    16.7293    17.4690    17.8683    18.2608    19.0084
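
A small numerical aside (a hedged suggestion, not part of the original workflow): instead of explicitly inverting R_ast for every posterior draw, we can solve the triangular system with Julia's left-division operator, which is generally cheaper and numerically safer while giving the same reconstructed values:

# same reconstruction as above, but solving R_ast * β = β̃ per draw instead of inverting R_ast
betas = mapslices(
    x -> R_ast \ x, chain_qr[:, namesingroup(chain_qr, :β), :].value.data; dims=[2]
)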

Non-Centered Parametrization

Now let us explore Non-Centered Parametrization (NCP). This is useful when the posterior's topology is very difficult to explore, having regions where the HMC sampler has to drastically change the number of leapfrog steps $L$ and the step size $\epsilon$. I showed one of the most infamous cases in 5. Markov Chain Monte Carlo (MCMC): Neal's Funnel (Neal, 2003):

using CairoMakie
using Distributions
funnel_y = rand(Normal(0, 3), 10_000)
funnel_x = rand(Normal(), 10_000) .* exp.(funnel_y / 2)

f, ax, s = scatter(
    funnel_x,
    funnel_y;
    color=(:steelblue, 0.3),
    axis=(; xlabel=L"X", ylabel=L"Y", limits=(-100, 100, nothing, nothing)),
)

Neal's Funnel

Here we see that in the upper part of the funnel HMC has to take few steps $L$ and be more liberal with the $\epsilon$ factor. The opposite holds in the lower part of the funnel: way more steps $L$ and a much more conservative $\epsilon$ factor.

To see the devil's funnel (as it is known in some Bayesian circles) in action, let's code it in Turing and then sample:

@model function funnel()
    y ~ Normal(0, 3)
    return x ~ Normal(0, exp(y / 2))
end

chain_funnel = sample(funnel(), NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×14×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 5.91 seconds
Compute duration  = 23.42 seconds
parameters        = y, x
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std      mcse   ess_bulk   ess_tail      rhat   ess_per_sec
      Symbol   Float64   Float64   Float64    Float64    Float64   Float64       Float64

           y    0.3304    2.5005    0.4266    34.1757    64.0933    1.0981        1.4589
           x   -0.7113    8.8403    0.6150   467.3926   188.6939    1.0944       19.9527

Quantiles
  parameters       2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol    Float64   Float64   Float64   Float64   Float64

           y    -3.2368   -2.1271    0.0819    1.9243    5.8793
           x   -13.3185   -0.6603   -0.2764    0.4686    6.4055

Wow, take a look at those rhat values... That sucks: all are above 1.01 even with 4 parallel chains with 1,000 iterations!

How do we deal with that? We reparametrize! Note that we can express any normal distribution as a scaled and shifted standard normal in the following manner:

$$ \text{Normal}(\mu, \sigma) = \text{Standard Normal} \cdot \sigma + \mu $$

where the standard normal is the normal with mean $\mu = 0$ and standard deviation $\sigma = 1$. This is why it is called Non-Centered Parametrization: we "decouple" the parameters and reconstruct them afterwards.
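
Here is a minimal sketch of that identity in action (the values μ = 2 and σ = 5 are arbitrary, chosen only for illustration):

using Distributions
using Statistics: mean, std

z = rand(Normal(), 10_000)  # standard normal draws
x = 2 .+ 5 .* z             # shifted and scaled: behaves like Normal(2, 5)
mean(x), std(x)             # should be close to (2, 5)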

@model function ncp_funnel()
    x̃ ~ Normal()
    ỹ ~ Normal()
    y = 3.0 * ỹ         # implies y ~ Normal(0, 3)
    return x = exp(y / 2) * x̃  # implies x ~ Normal(0, exp(y / 2))
end

chain_ncp_funnel = sample(ncp_funnel(), NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×14×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 5.38 seconds
Compute duration  = 21.32 seconds
parameters        = x̃, ỹ
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ess_per_sec
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64       Float64

           x̃   -0.0006    1.0002    0.0161   3829.5014   2972.2855    0.9999      179.6454
           ỹ   -0.0444    1.0054    0.0160   3917.0636   2987.0567    0.9999      183.7530

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           x̃   -1.8916   -0.6865   -0.0231    0.6704    2.0065
           ỹ   -2.0393   -0.7107   -0.0524    0.6362    1.9541

Much better now: all rhat values are very close to 1.0 and well below 1.01.
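
Note that the chain now contains only the auxiliary parameters x̃ and ỹ. If we want the original x and y, we have to reconstruct them from the draws. A minimal sketch (assuming the chain_ncp_funnel object sampled above):

# recover the original funnel coordinates from the NCP draws
ỹ_draws = vec(chain_ncp_funnel[:ỹ])
x̃_draws = vec(chain_ncp_funnel[:x̃])
y_draws = 3.0 .* ỹ_draws                   # implies y ~ Normal(0, 3)
x_draws = exp.(y_draws ./ 2) .* x̃_draws    # implies x ~ Normal(0, exp(y / 2))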

How would we implement this in a real-world model in Turing? Let's go back to the cheese random-intercept model in 11. Multilevel Models (a.k.a. Hierarchical Models). Here is the approach that we took, also known as Centered Parametrization (CP):

@model function varying_intercept(
    X, idx, y; n_gr=length(unique(idx)), predictors=size(X, 2)
)
    #priors
    α ~ Normal(mean(y), 2.5 * std(y))       # population-level intercept
    β ~ filldist(Normal(0, 2), predictors)  # population-level coefficients
    σ ~ Exponential(std(y))                 # residual SD
    #prior for variance of random intercepts
    #usually requires thoughtful specification
    τ ~ truncated(Cauchy(0, 2); lower=0)    # group-level SDs intercepts
    αⱼ ~ filldist(Normal(0, τ), n_gr)       # CP group-level intercepts

    #likelihood
    ŷ = α .+ X * β .+ αⱼ[idx]
    return y ~ MvNormal(ŷ, σ^2 * I)
end;

To perform a Non-Centered Parametrization (NCP) in this model we do the following:

@model function varying_intercept_ncp(
    X, idx, y; n_gr=length(unique(idx)), predictors=size(X, 2)
)
    #priors
    α ~ Normal(mean(y), 2.5 * std(y))       # population-level intercept
    β ~ filldist(Normal(0, 2), predictors)  # population-level coefficients
    σ ~ Exponential(std(y))                 # residual SD

    #prior for variance of random intercepts
    #usually requires thoughtful specification
    τ ~ truncated(Cauchy(0, 2); lower=0)   # group-level SDs intercepts
    zⱼ ~ filldist(Normal(0, 1), n_gr)      # NCP group-level intercepts

    #likelihood
    ŷ = α .+ X * β .+ zⱼ[idx] .* τ
    return y ~ MvNormal(ŷ, σ^2 * I)
end;

Here we are using an NCP with the zⱼs following a standard normal, and we reconstruct the group-level intercepts by multiplying the zⱼs by τ. Since the original αⱼs had a prior centered on 0 with standard deviation τ, we only need the multiplication by τ to get the αⱼs back.
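
In equation form, this is the same location-scale trick from before, applied group-wise:

$$ \alpha_j = \tau \cdot z_j, \quad z_j \sim \text{Normal}(0, 1) \implies \alpha_j \sim \text{Normal}(0, \tau) $$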

Now let's see how NCP compares to the CP. First, let's redo our CP hierarchical model:

url = "https://raw.githubusercontent.com/storopoli/Bayesian-Julia/master/datasets/cheese.csv"
cheese = CSV.read(HTTP.get(url).body, DataFrame)

for c in unique(cheese[:, :cheese])
    cheese[:, "cheese_$c"] = ifelse.(cheese[:, :cheese] .== c, 1, 0)
end

cheese[:, :background_int] = map(cheese[:, :background]) do b
    if b == "rural"
        1
    elseif b == "urban"
        2
    else
        missing
    end
end

X = Matrix(select(cheese, Between(:cheese_A, :cheese_D)));
y = cheese[:, :y];
idx = cheese[:, :background_int];

model_cp = varying_intercept(X, idx, y)
chain_cp = sample(model_cp, NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×21×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 7.72 seconds
Compute duration  = 26.82 seconds
parameters        = α, β[1], β[2], β[3], β[4], σ, τ, αⱼ[1], αⱼ[2]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters       mean       std      mcse    ess_bulk    ess_tail      rhat   ess_per_sec
      Symbol    Float64   Float64   Float64     Float64     Float64   Float64       Float64

           α    70.6388    5.4554    0.2162    915.6164    759.2940    1.0028       34.1457
        β[1]     2.9639    1.3435    0.0265   2565.3867   2773.4856    1.0012       95.6698
        β[2]   -10.5915    1.3904    0.0277   2522.8976   2504.4545    1.0016       94.0853
        β[3]     6.5485    1.3619    0.0264   2668.6532   2413.9865    0.9997       99.5209
        β[4]     1.0905    1.3750    0.0285   2338.0486   2471.5094    1.0010       87.1918
           σ     7.4087    0.4507    0.0088   2643.0429   2581.2624    1.0001       98.5658
           τ     6.2641    5.7017    0.1952   1222.7006   1203.5556    1.0038       45.5976
       αⱼ[1]    -3.3228    5.3516    0.2167    865.3478    780.3551    1.0027       32.2710
       αⱼ[2]     3.7386    5.3657    0.2173    894.2647    746.3891    1.0034       33.3494

Quantiles
  parameters       2.5%      25.0%      50.0%     75.0%     97.5%
      Symbol    Float64    Float64    Float64   Float64   Float64

           α    59.8857    68.3215    70.7533   73.2141   81.3633
        β[1]     0.3781     2.0754     2.9757    3.8536    5.5841
        β[2]   -13.2884   -11.5392   -10.6060   -9.6587   -7.8122
        β[3]     3.9443     5.6304     6.5351    7.4435    9.2425
        β[4]    -1.6999     0.1515     1.1072    2.0425    3.7349
           σ     6.5891     7.0826     7.3872    7.7026    8.3437
           τ     1.8181     3.2859     4.6436    7.1806   20.4141
       αⱼ[1]   -14.0825    -5.6883    -3.3988   -1.1424    7.5331
       αⱼ[2]    -6.5136     1.3252     3.5179    5.9215   14.7231

Now let's do the NCP hierarchical model:

model_ncp = varying_intercept_ncp(X, idx, y)
chain_ncp = sample(model_ncp, NUTS(), MCMCThreads(), 1_000, 4)
Chains MCMC chain (1000×21×4 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 11.84 seconds
Compute duration  = 45.18 seconds
parameters        = α, β[1], β[2], β[3], β[4], σ, τ, zⱼ[1], zⱼ[2]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters       mean       std      mcse    ess_bulk    ess_tail      rhat   ess_per_sec
      Symbol    Float64   Float64   Float64     Float64     Float64   Float64       Float64

           α    71.1250    3.5397    0.1218    878.2014    784.3499    1.0054       19.4365
        β[1]     2.9093    1.2858    0.0297   1876.0233   2481.1601    1.0026       41.5206
        β[2]   -10.6841    1.3789    0.0608    571.3700    192.4652    1.0074       12.6457
        β[3]     6.5318    1.3379    0.0326   1713.4983   1455.9135    1.0030       37.9235
        β[4]     1.0746    1.3279    0.0316   1767.8948   2009.5112    1.0071       39.1274
           σ     7.3765    0.4561    0.0158    777.6440    188.6146    1.0139       17.2110
           τ     5.0157    2.6654    0.1354    604.8463    193.9385    1.0075       13.3866
       zⱼ[1]    -0.9156    0.7846    0.0225   1184.6917   1334.6682    1.0041       26.2199
       zⱼ[2]     0.8326    0.8046    0.0223   1253.2053    947.4690    1.0023       27.7362

Quantiles
  parameters       2.5%      25.0%      50.0%     75.0%     97.5%
      Symbol    Float64    Float64    Float64   Float64   Float64

           α    63.6608    68.9974    71.0956   73.1433   78.5094
        β[1]     0.3584     2.0549     2.9066    3.7865    5.4293
        β[2]   -13.5656   -11.5643   -10.6594   -9.7752   -7.9711
        β[3]     3.9994     5.5582     6.4857    7.4393    9.0957
        β[4]    -1.4433     0.2116     1.0321    1.9658    3.6415
           σ     6.5332     7.0626     7.3592    7.6827    8.3100
           τ     1.8047     3.1023     4.2758    6.3272   11.7515
       zⱼ[1]    -2.5756    -1.4160    -0.8862   -0.3486    0.5239
       zⱼ[2]    -0.6131     0.2661     0.7795    1.3822    2.4805

Notice that some models are better off with a standard Centered Parametrization (as is our cheese case here), while others are better off with a Non-Centered Parametrization. But now you know how to apply both parametrizations in Turing. Before we conclude, we need to recover our original αⱼs. We can do this by multiplying the zⱼ draws by τ:

τ = summarystats(chain_ncp)[:τ, :mean]
αⱼ = mapslices(
    x -> x * τ, chain_ncp[:, namesingroup(chain_ncp, :zⱼ), :].value.data; dims=[2]
)
chain_ncp_reconstructed = hcat(
    MCMCChains.resetrange(chain_ncp), Chains(αⱼ, ["αⱼ[$i]" for i in 1:length(unique(idx))])
)
Chains MCMC chain (1000×23×4 Array{Float64, 3}):

Iterations        = 1:1000
Number of chains  = 4
Samples per chain = 1000
Wall duration     = 11.84 seconds
Compute duration  = 45.18 seconds
parameters        = α, β[1], β[2], β[3], β[4], σ, τ, zⱼ[1], zⱼ[2], αⱼ[1], αⱼ[2]
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters       mean       std      mcse    ess_bulk    ess_tail      rhat   ess_per_sec
      Symbol    Float64   Float64   Float64     Float64     Float64   Float64       Float64

           α    71.1250    3.5397    0.1218    878.2014    784.3499    1.0054       19.4365
        β[1]     2.9093    1.2858    0.0297   1876.0233   2481.1601    1.0026       41.5206
        β[2]   -10.6841    1.3789    0.0608    571.3700    192.4652    1.0074       12.6457
        β[3]     6.5318    1.3379    0.0326   1713.4983   1455.9135    1.0030       37.9235
        β[4]     1.0746    1.3279    0.0316   1767.8948   2009.5112    1.0071       39.1274
           σ     7.3765    0.4561    0.0158    777.6440    188.6146    1.0139       17.2110
           τ     5.0157    2.6654    0.1354    604.8463    193.9385    1.0075       13.3866
       zⱼ[1]    -0.9156    0.7846    0.0225   1184.6917   1334.6682    1.0041       26.2199
       zⱼ[2]     0.8326    0.8046    0.0223   1253.2053    947.4690    1.0023       27.7362
       αⱼ[1]    -4.5922    3.9351    0.1128   1184.6917   1334.6682    1.0041       26.2199
       αⱼ[2]     4.1762    4.0358    0.1118   1253.2053    947.4690    1.0023       27.7362

Quantiles
  parameters       2.5%      25.0%      50.0%     75.0%     97.5%
      Symbol    Float64    Float64    Float64   Float64   Float64

           α    63.6608    68.9974    71.0956   73.1433   78.5094
        β[1]     0.3584     2.0549     2.9066    3.7865    5.4293
        β[2]   -13.5656   -11.5643   -10.6594   -9.7752   -7.9711
        β[3]     3.9994     5.5582     6.4857    7.4393    9.0957
        β[4]    -1.4433     0.2116     1.0321    1.9658    3.6415
           σ     6.5332     7.0626     7.3592    7.6827    8.3100
           τ     1.8047     3.1023     4.2758    6.3272   11.7515
       zⱼ[1]    -2.5756    -1.4160    -0.8862   -0.3486    0.5239
       zⱼ[2]    -0.6131     0.2661     0.7795    1.3822    2.4805
       αⱼ[1]   -12.9183    -7.1023    -4.4447   -1.7483    2.6277
       αⱼ[2]    -3.0749     1.3348     3.9098    6.9328   12.4414
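
One caveat (a hedged aside, not part of the original workflow): the reconstruction above fixes τ at its posterior mean, so the uncertainty in τ is not propagated into the reconstructed αⱼs. A draw-by-draw alternative multiplies each zⱼ draw by the τ from the same draw:

# per-draw reconstruction of αⱼ, propagating the uncertainty in τ
τ_draws = chain_ncp[:, [:τ], :].value.data                           # draws × 1 × chains
zⱼ_draws = chain_ncp[:, namesingroup(chain_ncp, :zⱼ), :].value.data  # draws × 2 × chains
αⱼ_draws = zⱼ_draws .* τ_draws                                       # broadcast τ over the zⱼ columns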

References

Neal, Radford M. (2003). Slice Sampling. The Annals of Statistics, 31(3), 705–741. Retrieved from https://www.jstor.org/stable/3448413