
Julia vs Stan performance comparison

Author

Nikolas Siccha

Caveats

This page compares the performance of Julia’s and Stan’s log density and log density gradient computations for the implemented posteriors. Several caveats apply:

  • The posteriordb Stan implementations were never meant to represent “perfect and best-performant” practices.
  • The StanBlocks.jl implementations are not line-by-line translations of the Stan implementations. Sometimes small optimizations were applied to bring the implementations more in line with common Julia practices, or to make the code friendlier for Julia’s AD packages, e.g. by avoiding mutation.
  • Stan often automatically drops constant terms (unless configured differently), see https://mc-stan.org/docs/reference-manual/statements.html#log-probability-increment-vs.-distribution-statement, thus avoiding computation that is superfluous for its purposes, while the StanBlocks.jl implementations do not (see the first sketch after this list).
  • Stan implements a lot of custom AD rules, while StanBlocks.jl implements none and Enzyme.jl relies on few (if any). I suspect that adding AD rules for _glm_ type functions would further improve Julia’s performance.
  • The StanBlocks.jl “sampling” statements try to be clever about avoiding repeated computations. I am not sure whether Stan applies the same optimizations, but in principle it could do so without extra work by the user.
  • While preliminary benchmark runs included “all” Julia AD packages, all of them are almost always much slower than Enzyme.jl for the implemented posteriors; on top of that performance advantage, Enzyme.jl also supports more Julia language features than some of the other AD packages. As such, I am only comparing Enzyme and Mooncake against Stan. Enzyme outperforming every other AD package for these posteriors/loss functions does of course not mean that it will necessarily do as well for other applications.
  • Enzyme’s development is progressing quickly. While it currently sometimes crashes Julia or errors while trying to compute a gradient, its performance and reliability are improving continuously and rapidly.
  • Stan’s benchmark is run from Julia via BridgeStan.jl (see the second sketch after this list). While I think that any resulting performance penalty should be extremely small, I am not 100% sure. BridgeStan uses the -O3 compiler flag by default, but no additional ones.
  • All benchmarks are run with a single thread on my local machine.
  • There are probably more caveats!
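
Two of the caveats above are easier to see with a small example. First, the constant-term caveat: Julia's logpdf keeps the full normalization of a density, while a Stan sampling statement such as `y ~ normal(mu, sigma);` may drop additive terms that do not depend on the parameters. A minimal sketch (plain Distributions.jl, not code from this page; the data and parameter values are made up):

```julia
# Hypothetical illustration of Stan dropping constant terms.
using Distributions

y = [0.3, -1.2, 2.1]
μ, σ = 0.5, 1.3

# Fully normalized log density, as the StanBlocks.jl implementations compute it:
lp_full = sum(logpdf.(Normal(μ, σ), y))

# What a Stan `y ~ normal(mu, sigma);` statement may effectively compute once the
# parameter-independent constant is dropped (σ is treated as a parameter, so -log(σ) stays):
lp_dropped = sum(@. -0.5 * ((y - μ) / σ)^2 - log(σ))

# The two differ only by a constant, which is one reason the "median lpdf difference"
# column in the table below is often far from zero without anything being wrong:
lp_full - lp_dropped ≈ -length(y) * log(sqrt(2π))   # true
```

Second, the BridgeStan caveat: the Stan side of each benchmark is evaluated from Julia roughly like the following sketch. The file names are placeholders, and the exact StanModel constructor signature can differ between BridgeStan.jl versions, so treat this as an outline rather than the code actually used here:

```julia
# Hypothetical sketch of evaluating a Stan model's log density and gradient from Julia.
using BridgeStan

# Compiles and loads the model; BridgeStan passes -O3 by default, nothing extra.
model = BridgeStan.StanModel("model.stan", "data.json")   # placeholder paths

x = randn(BridgeStan.param_unc_num(model))        # a point on the unconstrained scale
lp = BridgeStan.log_density(model, x)
lp2, grad = BridgeStan.log_density_gradient(model, x)
```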
Warning

In general, doing performance comparisons is quite tricky, for more reasons than just the ones mentioned above. The below plot and tables should most definitely NOT be interpreted as “A is X-times faster than B”.


Visualization

Warning

In general, doing performance comparisons is quite tricky, for more reasons than just the ones mentioned above. The below plot and tables should most definitely NOT be interpreted as “A is X-times faster than B”.

The below plot shows the relative primitive runtime (x-axis, Julia vs Stan, left: Julia is faster) and the relative gradient runtime (y-axis, Julia+X vs Stan, bottom: Julia is faster) for the posteriordb models for which the overview table has a value less than 1e-8 in the median relative ulpdf error column. The color of the points represents the Julia AD framework used, which currently includes Enzyme.jl and Mooncake.jl. Hovering over a data point shows the posterior name, its dimension, the allocations required by Julia during the primitive and gradient runs, and a short explanation, e.g. mesquite-logmesquite_logvash (D=7, #allocs=0->70) - Julia's primitive is ~4.5 times faster, but Enzyme's gradient is ~16.0 times slower. Any time spent on garbage collection has been subtracted from the measured wall times. Each mean runtime was estimated by running until the estimated standard error of the mean was smaller than roughly 0.5% of the estimated mean; because of this, I have removed the credible intervals and standard errors from the plot and table. A sketch of this timing scheme follows.
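
The timing scheme described above roughly amounts to the following sketch. It is a simplification of the RuntimeDistribution2/UncertainMean machinery in the source listing at the bottom of this page; `f` stands for any of the benchmarked log density or gradient closures, and the helper name is mine:

```julia
# Sketch: estimate the mean runtime of f(), excluding GC time, running until the
# standard error of the mean drops below ~0.5% of the mean.
function adaptive_mean_runtime(f; rtol=0.005, n_min=10, n_max=1_000_000)
    s, s2, n = 0.0, 0.0, 0
    while n < n_max
        stats = @timed f()                     # `@timed` reports wall time and GC time
        t = stats.time - stats.gctime          # subtract time spent in garbage collection
        s += t; s2 += t^2; n += 1
        μ = s / n
        se = sqrt(max(s2 / n - μ^2, 0.0) / n)  # standard error of the running mean
        n ≥ n_min && se < rtol * μ && return μ
    end
    return s / n
end
```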


Tabular data

The below table shows information about the implemented posteriors. I will elaborate on the exact meaning of the columns later.
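
Two column conventions are worth spelling out already. The "relative mean ... runtime" columns are normalized per posterior so that the fastest implementation reads 1.0 (this mirrors the relative! helper in the source code below). The "relative ... error" columns report a scale-free difference between two evaluations of the same quantity; the exact definition lives in julia/common.jl, which is not shown on this page, so the formula below is an assumption about what it computes:

```julia
# Per-posterior normalization of the runtime columns: divide each mean runtime by
# the fastest one, so the fastest entry shows as 1.0.
relative(times...) = times ./ minimum(times)
relative(2.1e-6, 8.2e-6, 4.6e-6)    # -> (1.0, 3.9047..., 2.1904...)

# Assumed form of the "relative ... error" columns (definition not shown on this page):
relative_error(a, b) = abs(a - b) / max(abs(a), abs(b))
```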

| posterior name | implementations | dimension | median relative ulpdf error | relative mean primitive Julia runtime | relative mean primitive Stan runtime | relative mean Enzyme runtime | relative mean Mooncake runtime | relative mean Stan gradient runtime | primitive Julia allocations | Enzyme allocations | Mooncake allocations | median lpdf difference | median Enzyme relative gradient error | median Mooncake relative gradient error | Bridgestan | Enzyme | Mooncake |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GLMM_data-GLMM1_model | Stan, Julia | 237 | 1.2e-16 | 1.0 | 3.9 | 1.0 | 3.7 | 2.2 | 0 | 0 | 18 | -2.3 | 1.2e-17 | 1.2e-17 | 2.6.1 | 0.13.30 | 0.4.86 |
| GLM_Binomial_data-GLM_Binomial_model | Stan, Julia | 3 | 1.4e-16 | 1.0 | 1.5 | 1.2 | 3.1 | 1.0 | 0 | 0 | 47 | -14.0 | 3.3e-16 | 3.8e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| GLM_Poisson_Data-GLM_Poisson_model | Stan, Julia | 4 | 3.6e-16 | 1.0 | 2.3 | 1.0 | 6.6 | 1.5 | 0 | 0 | 47 | 0.0 | 4.3e-16 | 4.0e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| M0_data-M0_model | Stan, Julia | 2 | 1.7e-16 | 1.0 | 1.2 | 1.1 | 2.0 | 1.0 | 0 | 0 | 5 | 0.0 | 3.0e-15 | 8.9e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| Mb_data-Mb_model | Stan, Julia | 3 | 0.0 | 1.0 | 2.6 | 2.3 | 3.2 | 1.0 | 3 | 32 | 5770 | 0.0 | 6.6e-15 | 4.8e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| Mh_data-Mh_model | Stan, Julia | 388 | 4.8e-16 | 1.0 | 1.7 | 1.0 | 1.6 | 1.2 | 1 | 2 | 14 | -160.0 | 1.2e-15 | 7.4e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| Mt_data-Mt_model | Stan, Julia | 4 | 1.5e-16 | 1.0 | 1.9 | 3.9 | 3.9 | 1.0 | 1 | 26 | 4297 | 0.0 | 1.9e-15 | 2.6e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| Mtbh_data-Mtbh_model | Stan, Julia | 154 | 1.8e-16 | 1.0 | 2.4 | 1.8 | 2.5 | 1.0 | 6 | 38 | 2718 | -2.3 | 6.1e-16 | 3.4e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| Mth_data-Mth_model | Stan, Julia | 394 | 1.7e-16 | 1.0 | 2.2 | 2.0 | 2.9 | 1.0 | 3 | 30 | 7015 | 0.0 | 1.9e-15 | 6.6e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| Rate_1_data-Rate_1_model | Stan, Julia | 1 | 9.2e-17 | 1.0 | 2.3 | 1.0 | 2.3 | 2.1 | 0 | 0 | 0 | 5.5 | 1.5e-16 | 2.8e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| Rate_2_data-Rate_2_model | Stan, Julia | 2 | 8.7e-17 | 1.0 | 2.0 | 1.0 | 2.6 | 1.7 | 0 | 0 | 0 | 10.0 | 2.1e-16 | 2.7e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| Rate_3_data-Rate_3_model | Stan, Julia | 1 | 0.0 | 1.0 | 1.7 | 1.0 | 2.6 | 1.6 | 0 | 0 | 0 | 10.0 | 2.0e-16 | 2.0e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| Rate_4_data-Rate_4_model | Stan, Julia | 2 | 8.7e-17 | 1.0 | 2.5 | 1.0 | 2.5 | 2.2 | 0 | 0 | 0 | 2.7 | 1.6e-16 | 1.9e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| Rate_5_data-Rate_5_model | Stan, Julia | 1 | 0.0 | 1.0 | 2.0 | 1.0 | 3.1 | 2.1 | 0 | 0 | 0 | 0.0 | 1.8e-16 | 2.4e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| Survey_data-Survey_model | Stan, Julia | 1 | 5.4e-16 | 1.0 | 1.1 | 2.8 | 6.8 | 1.0 | 1 | 40 | 20865 | 1.4e-14 | 7.4e-14 | 5.4e-14 | 2.6.1 | 0.13.30 | 0.4.86 |
| arK-arK | Stan, Julia | 7 | 1.4e-16 | 1.0 | 5.7 | 1.0 | 5.6 | 4.6 | 0 | 0 | 15 | -15.0 | 1.4e-15 | 1.5e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| arma-arma11 | Stan, Julia | 4 | 2.3e-16 | 1.0 | 3.8 | 1.0 | 5.4 | 5.5 | 0 | 0 | 5 | -4.6 | 1.5e-16 | 2.6e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| bball_drive_event_0-hmm_drive_0 | Stan, Julia | 6 | 1.2e-16 | 1.0 | 2.4 | 3.8 | 2.8 | 1.0 | 9 | 40 | 77 | -2.3 | 4.0e-12 | 2.9e-12 | 2.6.1 | 0.13.30 | 0.4.86 |
| bball_drive_event_1-hmm_drive_1 | Stan, Julia | 6 | 9.1e-16 | 1.0 | 2.6 | 2.7 | 2.5 | 1.0 | 11 | 44 | 84 | 760.0 | 3.2e-12 | 3.8e-12 | 2.6.1 | 0.13.30 | 0.4.86 |
| bones_data-bones_model | Stan, Julia | 13 | 1.3e-16 | 1.0 | 2.6 | 1.1 | 1.6 | 1.0 | 3 | 6 | 25 | -47.0 | 1.4e-16 | 1.4e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| diamonds-diamonds | Stan, Julia | 26 | 1.2e-15 | 1.0 | 1.5 | 1.6 | 5.0 | 1.0 | 2 | 4 | 43 | 4600.0 | 2.1e-15 | 1.9e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| dogs-dogs | Stan, Julia | 3 | 4.2e-16 | 1.0 | 5.1 | 1.0 | 1.6 | 3.4 | 0 | 0 | 5 | -14.0 | 1.8e-16 | 1.8e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| dogs-dogs_hierarchical | Stan, Julia | 2 | 1.1e-15 | 1.0 | 1.7 | 2.1 | 1.9 | 1.0 | 0 | 0 | 23 | 2.3e-13 | 1.2e-15 | 1.2e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| dogs-dogs_log | Stan, Julia | 2 | 5.0e-16 | 1.0 | 2.5 | 1.0 | 1.2 | 1.9 | 0 | 0 | 5 | -9.2 | 3.7e-15 | 1.8e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| dugongs_data-dugongs_model | Stan, Julia | 4 | 1.7e-16 | 1.0 | 1.7 | 1.0 | 2.6 | 1.1 | 0 | 0 | 5 | -22.0 | 1.7e-16 | 2.9e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| earnings-earn_height | Stan, Julia | 3 | 6.6e-16 | 1.0 | 9.3 | 1.0 | 16.0 | 7.4 | 0 | 0 | 25 | 0.0 | 8.2e-16 | 7.6e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| earnings-log10earn_height | Stan, Julia | 3 | 8.1e-16 | 1.0 | 9.5 | 1.0 | 12.0 | 5.7 | 0 | 0 | 25 | 0.0 | 1.0e-15 | 1.0e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| earnings-logearn_height | Stan, Julia | 3 | 8.1e-16 | 1.0 | 9.2 | 1.0 | 12.0 | 5.6 | 0 | 0 | 20 | 0.0 | 1.0e-15 | 9.9e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| earnings-logearn_height_male | Stan, Julia | 4 | 7.6e-16 | 1.0 | 2.8 | 1.0 | 20.0 | 6.7 | 0 | 0 | 29 | 0.0 | 9.7e-16 | 1.1e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| earnings-logearn_interaction | Stan, Julia | 5 | 1.0e-15 | 1.0 | 3.2 | 1.0 | 17.0 | 6.9 | 0 | 0 | 29 | -1.2e-10 | 1.0e-15 | 9.0e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| earnings-logearn_interaction_z | Stan, Julia | 5 | 7.6e-16 | 1.0 | 3.2 | 1.0 | 19.0 | 7.6 | 0 | 0 | 24 | 0.0 | 1.0e-15 | 8.1e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| earnings-logearn_logheight_male | Stan, Julia | 4 | 6.9e-16 | 1.0 | 3.0 | 1.0 | 19.0 | 6.6 | 0 | 0 | 29 | -1.8e-12 | 8.9e-16 | 9.5e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| eight_schools-eight_schools_centered | Stan, Julia | 10 | 2.1e-16 | 1.0 | 1.9 | 1.0 | 9.8 | 1.6 | 0 | 0 | 47 | -23.0 | 1.8e-16 | 1.9e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| eight_schools-eight_schools_noncentered | Stan, Julia | 10 | 0.0 | 1.0 | 2.4 | 1.0 | 9.9 | 2.0 | 0 | 0 | 44 | -23.0 | 3.5e-17 | 3.6e-17 | 2.6.1 | 0.13.30 | 0.4.86 |
| election88-election88_full | Stan, Julia | 90 | 9.1e-17 | 1.0 | 1.9 | 1.0 | 3.7 | 1.8 | 10 | 10 | 123 | -23.0 | 1.1e-16 | 1.4e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| garch-garch11 | Stan, Julia | 4 | 3.2e-16 | 1.0 | 3.4 | 1.0 | 1.9 | 2.6 | 0 | 0 | 0 | 0.0 | 5.0e-16 | 5.3e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| gp_pois_regr-gp_pois_regr | Stan, Julia | 13 | 1.8e-16 | 1.0 | 1.5 | 1.8 | 5.5 | 1.0 | 6 | 10 | 88 | -21.0 | 2.3e-16 | 2.9e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| gp_pois_regr-gp_regr | Stan, Julia | 3 | 2.0e-16 | 1.0 | 1.9 | 1.6 | 4.4 | 1.0 | 9 | 15 | 62 | -31.0 | 2.0e-16 | 2.1e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| hmm_example-hmm_example | Stan, Julia | 4 | 3.2e-16 | 1.0 | 2.8 | 2.2 | 1.0 | 2.2 | 7 | 23 | 28 | 94.0 | 1.8e-13 | 2.0e-13 | 2.6.1 | 0.13.30 | 0.4.86 |
| hmm_gaussian_simulated-hmm_gaussian | Stan, Julia | 14 | 6.9e-16 | 1.0 | 2.3 | 1.9 | 1.8 | 1.0 | 8 | 17 | 67 | 460.0 | 1.8e-12 | 2.1e-12 | 2.6.1 | 0.13.30 | 0.4.86 |
| irt_2pl-irt_2pl | Stan, Julia | 144 | 3.4e-16 | 1.0 | 1.5 | 1.2 | 1.9 | 1.0 | 21 | 34 | 501 | 15.0 | 8.2e-16 | 1.4e-12 | 2.6.1 | 0.13.30 | 0.4.86 |
| kidiq-kidscore_interaction | Stan, Julia | 5 | 4.0e-16 | 1.0 | 3.4 | 1.0 | 17.0 | 5.9 | 0 | 0 | 29 | -0.92 | 6.2e-16 | 5.8e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| kidiq-kidscore_momhs | Stan, Julia | 3 | 5.3e-16 | 1.0 | 9.0 | 1.0 | 13.0 | 6.3 | 0 | 0 | 20 | -0.92 | 5.0e-16 | 5.9e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| kidiq-kidscore_momhsiq | Stan, Julia | 4 | 3.8e-16 | 1.0 | 2.9 | 1.0 | 20.0 | 5.8 | 0 | 0 | 29 | -0.92 | 5.7e-16 | 5.3e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| kidiq-kidscore_momiq | Stan, Julia | 3 | 4.1e-16 | 1.0 | 9.2 | 1.0 | 17.0 | 5.1 | 0 | 0 | 20 | -0.92 | 5.5e-16 | 6.3e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| kidiq_with_mom_work-kidscore_interaction_c | Stan, Julia | 5 | 4.3e-16 | 1.0 | 3.4 | 1.0 | 16.0 | 6.3 | 0 | 0 | 24 | 0.0 | 4.5e-16 | 5.8e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| kidiq_with_mom_work-kidscore_interaction_c2 | Stan, Julia | 5 | 4.8e-16 | 1.0 | 3.5 | 1.0 | 16.0 | 6.4 | 0 | 0 | 24 | 0.0 | 3.8e-16 | 4.3e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| kidiq_with_mom_work-kidscore_interaction_z | Stan, Julia | 5 | 4.4e-16 | 1.0 | 3.2 | 1.0 | 16.0 | 6.1 | 0 | 0 | 24 | 0.0 | 4.4e-16 | 4.3e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| kidiq_with_mom_work-kidscore_mom_work | Stan, Julia | 5 | 4.0e-16 | 1.0 | 3.2 | 1.0 | 16.0 | 6.1 | 0 | 0 | 24 | 0.0 | 6.0e-16 | 4.6e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| kilpisjarvi_mod-kilpisjarvi | Stan, Julia | 3 | 2.0e-16 | 1.0 | 7.0 | 1.0 | 9.7 | 4.1 | 0 | 0 | 20 | -1.2 | 2.0e-16 | 2.3e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| loss_curves-losscurve_sislob | Stan, Julia | 15 | 1.8e-16 | 1.0 | 1.6 | 6.5 | 4.2 | 1.0 | 3 | 48 | 68 | 16.0 | 1.9e-16 | 1.8e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| low_dim_gauss_mix-low_dim_gauss_mix | Stan, Julia | 5 | 5.1e-16 | 1.0 | 2.2 | 1.0 | 1.6 | 1.4 | 4 | 9 | 38 | 920.0 | 3.9e-16 | 2.8e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| low_dim_gauss_mix_collapse-low_dim_gauss_mix_collapse | Stan, Julia | 5 | 5.2e-16 | 1.0 | 2.1 | 1.0 | 1.8 | 1.4 | 1 | 2 | 14 | 920.0 | 4.6e-16 | 3.0e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| lsat_data-lsat_model | Stan, Julia | 1006 | 1.5e-16 | 1.0 | 1.4 | 1.3 | 2.9 | 1.0 | 15 | 74 | 195 | -28.0 | 1.1e-15 | 1.2e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| mcycle_gp-accel_gp | Stan, Julia | 66 | 3.3e-16 | 1.0 | 2.5 | 1.0 | 3.7 | 1.2 | 12 | 24 | 716 | 180.0 | 3.8e-16 | 3.7e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| mcycle_splines-accel_splines | Stan, Julia | 82 | 6.2e-16 | 1.0 | 2.5 | 1.0 | 3.6 | 1.3 | 11 | 22 | 102 | 190.0 | 6.2e-16 | 6.3e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| mesquite-logmesquite | Stan, Julia | 8 | 1.5e-16 | 1.0 | 3.6 | 14.0 | 3.2 | 1.0 | 0 | 84 | 29 | 0.0 | 2.0e-16 | 2.1e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| mesquite-logmesquite_logva | Stan, Julia | 5 | 1.8e-16 | 1.0 | 2.9 | 1.0 | 13.0 | 4.3 | 0 | 0 | 24 | 0.0 | 2.3e-16 | 2.4e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| mesquite-logmesquite_logvas | Stan, Julia | 8 | 1.3e-16 | 1.0 | 3.5 | 14.0 | 2.6 | 1.0 | 0 | 84 | 24 | 0.0 | 2.4e-16 | 2.2e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| mesquite-logmesquite_logvash | Stan, Julia | 7 | 1.8e-16 | 1.0 | 3.6 | 14.0 | 3.5 | 1.0 | 0 | 70 | 29 | 0.0 | 2.3e-16 | 2.3e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| mesquite-logmesquite_logvolume | Stan, Julia | 3 | 1.6e-16 | 1.0 | 7.3 | 1.0 | 13.0 | 3.7 | 0 | 0 | 25 | 0.0 | 2.2e-16 | 2.1e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| mesquite-mesquite | Stan, Julia | 8 | 1.5e-16 | 1.0 | 3.5 | 11.0 | 3.3 | 1.0 | 0 | 70 | 29 | 0.0 | 1.8e-16 | 1.7e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| nes1972-nes | Stan, Julia | 10 | 6.4e-16 | 1.0 | 3.8 | 6.1 | 2.4 | 1.0 | 0 | 88 | 29 | 0.0 | 1.3e-15 | 1.1e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| nes1976-nes | Stan, Julia | 10 | 6.1e-16 | 1.0 | 3.8 | 6.6 | 2.2 | 1.0 | 0 | 88 | 29 | 0.0 | 9.8e-16 | 1.2e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| nes1980-nes | Stan, Julia | 10 | 4.4e-16 | 1.0 | 3.9 | 6.2 | 2.1 | 1.0 | 0 | 88 | 29 | 0.0 | 8.6e-16 | 8.1e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| nes1984-nes | Stan, Julia | 10 | 6.0e-16 | 1.0 | 3.8 | 6.1 | 2.2 | 1.0 | 0 | 88 | 29 | 0.0 | 1.1e-15 | 1.1e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| nes1988-nes | Stan, Julia | 10 | 5.1e-16 | 1.0 | 3.9 | 6.0 | 2.1 | 1.0 | 0 | 88 | 29 | 4.5e-13 | 1.1e-15 | 1.0e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| nes1992-nes | Stan, Julia | 10 | 6.4e-16 | 1.0 | 3.9 | 6.2 | 2.1 | 1.0 | 0 | 88 | 29 | -4.5e-13 | 1.1e-15 | 1.1e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| nes1996-nes | Stan, Julia | 10 | 6.0e-16 | 1.0 | 3.8 | 6.1 | 2.1 | 1.0 | 0 | 88 | 29 | 0.0 | 1.0e-15 | 1.0e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| nes2000-nes | Stan, Julia | 10 | 3.0e-16 | 1.0 | 4.0 | 6.1 | 2.1 | 1.0 | 0 | 56 | 29 | 0.0 | 6.7e-16 | 6.7e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| nes_logit_data-nes_logit_model | Stan, Julia | 2 | 0.0 | 1.0 | 1.0 | 2.2 | 4.5 | 1.0 | 1 | 2 | 29 | 0.0 | 5.8e-15 | 6.7e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| normal_2-normal_mixture | Stan, Julia | 3 | 1.0e-15 | 1.0 | 2.7 | 1.0 | 1.9 | 1.8 | 0 | 0 | 0 | 910.0 | 1.4e-16 | 1.2e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| normal_5-normal_mixture_k | Stan, Julia | 14 | 7.9e-16 | 1.0 | 1.8 | 3.1 | 2.8 | 1.0 | 2 | 10 | 35761 | 1600.0 | 8.1e-16 | 8.1e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| ovarian-logistic_regression_rhs | Stan, Julia | 3075 | 1.5e-13 | 1.0 | 1.3 | 2.0 | 4.1 | 1.0 | 3 | 23 | 58 | -1700.0 | 8.0e-11 | 8.7e-11 | 2.6.1 | 0.13.30 | 0.4.86 |
| pilots-pilots | Stan, Julia | 18 | 0.0 | 1.0 | 2.3 | 1.0 | 7.3 | 1.2 | 2 | 4 | 67 | 0.0 | 1.7e-16 | 1.6e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| prideprejudice_chapter-ldaK5 | Stan, Julia | 7714 | 5.5e-15 | 1.0 | 1.3 | 2.8 | 3.2 | 1.0 | 3 | 24 | 658743 | -3.5e-10 | 5.2e-16 | 5.4e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| prideprejudice_paragraph-ldaK5 | Stan, Julia | 15570 | 3.5e-15 | 1.0 | 1.4 | 2.8 | 2.7 | 1.0 | 4 | 32 | 694097 | 2.9e-11 | 3.0e-16 | 3.1e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| prostate-logistic_regression_rhs | Stan, Julia | 11935 | 4.4e-15 | 1.0 | 1.5 | 1.6 | 2.9 | 1.0 | 5 | 27 | 62 | -6800.0 | 5.6e-15 | 5.4e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_all-radon_county_intercept | Stan, Julia | 388 | 4.5e-15 | 1.0 | 4.1 | 1.0 | 11.0 | 12.0 | 0 | 0 | 10 | 11000.0 | 6.6e-15 | 1.2e-13 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_all-radon_hierarchical_intercept_centered | Stan, Julia | 391 | 3.9e-15 | 1.0 | 4.6 | 1.0 | 12.0 | 13.0 | 0 | 0 | 28 | 12000.0 | 8.7e-15 | 1.1e-13 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_all-radon_hierarchical_intercept_noncentered | Stan, Julia | 391 | 3.2e-15 | 1.0 | 4.6 | 1.0 | 9.5 | 11.0 | 1 | 2 | 29 | 12000.0 | 5.3e-15 | 9.7e-14 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_all-radon_partially_pooled_centered | Stan, Julia | 389 | 3.6e-15 | 1.0 | 3.6 | 1.0 | 12.0 | 11.0 | 0 | 0 | 18 | 12000.0 | 8.5e-15 | 1.6e-13 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_all-radon_partially_pooled_noncentered | Stan, Julia | 389 | 3.5e-15 | 1.0 | 3.7 | 1.0 | 9.6 | 11.0 | 1 | 2 | 19 | 12000.0 | 6.9e-15 | 1.2e-13 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_all-radon_pooled | Stan, Julia | 3 | 1.4e-14 | 1.0 | 33.0 | 1.0 | 12.0 | 20.0 | 0 | 0 | 20 | 12000.0 | 1.7e-14 | 1.9e-14 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_all-radon_variable_intercept_centered | Stan, Julia | 390 | 3.6e-15 | 1.0 | 4.2 | 1.0 | 11.0 | 12.0 | 0 | 0 | 13 | 12000.0 | 7.9e-15 | 1.1e-13 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_all-radon_variable_intercept_noncentered | Stan, Julia | 390 | 4.0e-15 | 1.0 | 4.4 | 1.0 | 8.1 | 11.0 | 1 | 2 | 14 | 12000.0 | 6.3e-15 | 9.7e-14 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_all-radon_variable_intercept_slope_centered | Stan, Julia | 777 | 3.7e-15 | 1.0 | 4.0 | 1.0 | 11.0 | 11.0 | 0 | 0 | 26 | 12000.0 | 6.1e-15 | 9.8e-14 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_all-radon_variable_intercept_slope_noncentered | Stan, Julia | 777 | 5.1e-15 | 1.0 | 4.2 | 1.0 | 6.8 | 8.6 | 2 | 4 | 28 | 12000.0 | 5.5e-15 | 8.2e-14 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_all-radon_variable_slope_centered | Stan, Julia | 390 | 1.4e-14 | 1.0 | 4.3 | 1.0 | 11.0 | 11.0 | 0 | 0 | 13 | 12000.0 | 1.1e-14 | 1.0e-13 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_all-radon_variable_slope_noncentered | Stan, Julia | 390 | 1.6e-14 | 1.0 | 4.5 | 1.0 | 7.7 | 10.0 | 1 | 2 | 14 | 12000.0 | 8.1e-15 | 7.8e-14 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_mn-radon_county_intercept | Stan, Julia | 87 | 6.9e-16 | 1.0 | 4.0 | 1.0 | 9.8 | 10.0 | 0 | 0 | 10 | 650.0 | 1.3e-15 | 1.0e-14 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_mn-radon_hierarchical_intercept_centered | Stan, Julia | 90 | 7.3e-16 | 1.0 | 4.6 | 1.0 | 11.0 | 11.0 | 0 | 0 | 28 | 840.0 | 8.3e-16 | 8.7e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_mn-radon_hierarchical_intercept_noncentered | Stan, Julia | 90 | 6.4e-16 | 1.0 | 4.7 | 1.0 | 9.1 | 9.6 | 1 | 2 | 29 | 840.0 | 9.2e-16 | 9.0e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_mn-radon_partially_pooled_centered | Stan, Julia | 88 | 8.4e-16 | 1.0 | 3.5 | 1.0 | 12.0 | 9.7 | 0 | 0 | 18 | 840.0 | 1.0e-15 | 7.1e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_mn-radon_partially_pooled_noncentered | Stan, Julia | 88 | 8.3e-16 | 1.0 | 3.6 | 1.0 | 8.8 | 8.7 | 1 | 2 | 19 | 840.0 | 7.8e-16 | 8.4e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_mn-radon_pooled | Stan, Julia | 3 | 1.2e-15 | 1.0 | 28.0 | 1.0 | 11.0 | 16.0 | 0 | 0 | 20 | 840.0 | 1.5e-15 | 1.3e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_mn-radon_variable_intercept_centered | Stan, Julia | 89 | 7.1e-16 | 1.0 | 4.1 | 1.0 | 10.0 | 11.0 | 0 | 0 | 13 | 840.0 | 7.2e-16 | 8.3e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_mn-radon_variable_intercept_noncentered | Stan, Julia | 89 | 8.0e-16 | 1.0 | 4.4 | 1.0 | 7.4 | 9.7 | 1 | 2 | 14 | 840.0 | 1.0e-15 | 8.2e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_mn-radon_variable_intercept_slope_centered | Stan, Julia | 175 | 8.3e-16 | 1.0 | 3.9 | 1.0 | 11.0 | 9.3 | 0 | 0 | 26 | 840.0 | 7.4e-16 | 8.3e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_mn-radon_variable_intercept_slope_noncentered | Stan, Julia | 175 | 8.6e-16 | 1.0 | 4.3 | 1.0 | 6.5 | 7.5 | 2 | 4 | 28 | 840.0 | 8.1e-16 | 8.0e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_mn-radon_variable_slope_centered | Stan, Julia | 89 | 1.1e-15 | 1.0 | 4.2 | 1.0 | 10.0 | 10.0 | 0 | 0 | 13 | 840.0 | 9.7e-16 | 6.5e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_mn-radon_variable_slope_noncentered | Stan, Julia | 89 | 1.1e-15 | 1.0 | 4.5 | 1.0 | 7.0 | 9.2 | 1 | 2 | 14 | 840.0 | 9.0e-16 | 7.0e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| radon_mod-radon_county | Stan, Julia | 389 | 0.0 | 1.0 | 6.5 | 1.0 | 7.4 | 2.3 | 0 | 0 | 33 | 0.0 | 6.8e-16 | 7.8e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| rats_data-rats_model | Stan, Julia | 65 | 9.7e-17 | 1.0 | 3.4 | 1.0 | 9.8 | 6.8 | 0 | 0 | 31 | -9.2 | 4.5e-16 | 2.7e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| sblrc-blr | Stan, Julia | 6 | 2.6e-16 | 1.0 | 4.1 | 1.0 | 7.7 | 2.0 | 1 | 2 | 40 | 97.0 | 2.3e-16 | 2.2e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| sblri-blr | Stan, Julia | 6 | 2.3e-16 | 1.0 | 4.4 | 1.0 | 7.8 | 2.0 | 1 | 2 | 40 | 97.0 | 2.5e-16 | 2.0e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| science_irt-grsm_latent_reg_irt | Stan, Julia | 408 | 1.1e-16 | 1.0 | 1.7 | 1.5 | 2.6 | 1.0 | 10980 | 30202 | 49500 | 15.0 | 1.5e-16 | 1.4e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| seeds_data-seeds_centered_model | Stan, Julia | 26 | 1.5e-16 | 1.0 | 1.6 | 1.0 | 3.9 | 1.0 | 1 | 2 | 62 | 0.0 | 2.1e-16 | 2.3e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| seeds_data-seeds_model | Stan, Julia | 26 | 1.4e-16 | 1.0 | 1.3 | 1.1 | 4.2 | 1.0 | 0 | 0 | 58 | -35.0 | 2.3e-16 | 2.2e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| seeds_data-seeds_stanified_model | Stan, Julia | 26 | 1.4e-16 | 1.0 | 1.4 | 1.1 | 4.1 | 1.0 | 0 | 0 | 58 | 0.0 | 2.5e-16 | 2.2e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| sesame_data-sesame_one_pred_a | Stan, Julia | 3 | 7.4e-16 | 1.0 | 9.4 | 1.0 | 12.0 | 5.9 | 0 | 0 | 20 | 0.0 | 1.7e-15 | 1.3e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| surgical_data-surgical_model | Stan, Julia | 14 | 1.5e-16 | 1.0 | 2.1 | 1.0 | 2.7 | 1.2 | 0 | 0 | 53 | -14.0 | 1.2e-16 | 1.4e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| three_docs1200-ldaK2 | Stan, Julia | 7 | 4.4e-16 | 1.0 | 1.5 | 4.8 | 4.8 | 1.0 | 2 | 34 | 24103 | 0.0 | 2.9e-16 | 3.0e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| three_men1-ldaK2 | Stan, Julia | 502 | 1.7e-15 | 1.0 | 1.4 | 3.0 | 4.4 | 1.0 | 2 | 34 | 100137 | 2.2e-11 | 5.6e-16 | 5.8e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| three_men2-ldaK2 | Stan, Julia | 510 | 1.6e-15 | 1.0 | 1.4 | 3.0 | 4.4 | 1.0 | 2 | 34 | 100281 | -7.3e-12 | 4.1e-16 | 4.0e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| three_men3-ldaK2 | Stan, Julia | 505 | 1.7e-15 | 1.0 | 1.4 | 3.1 | 4.5 | 1.0 | 2 | 34 | 100191 | -7.3e-12 | 5.0e-16 | 4.6e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| traffic_accident_nyc-bym2_offset_only | Stan, Julia | 3845 | 1.1e-15 | 1.0 | 2.6 | 1.0 | 5.6 | 1.3 | 6 | 8 | 79 | -0.65 | 1.4e-15 | 1.4e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| wells_data-wells_daae_c_model | Stan, Julia | 6 | 0.0 | 1.0 | 1.0 | 2.0 | 2.7 | 1.0 | 2 | 4 | 31 | 0.0 | 9.1e-16 | 9.3e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| wells_data-wells_dae_c_model | Stan, Julia | 5 | 1.5e-16 | 1.0 | 1.0 | 2.1 | 2.6 | 1.0 | 2 | 4 | 31 | 0.0 | 9.3e-16 | 9.9e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| wells_data-wells_dae_inter_model | Stan, Julia | 7 | 1.5e-16 | 1.0 | 1.0 | 2.0 | 2.6 | 1.0 | 2 | 4 | 31 | 0.0 | 9.8e-16 | 1.1e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| wells_data-wells_dae_model | Stan, Julia | 4 | 0.0 | 1.0 | 1.0 | 2.0 | 2.4 | 1.0 | 2 | 4 | 31 | 0.0 | 7.2e-16 | 8.5e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| wells_data-wells_dist | Stan, Julia | 2 | 2.6e-13 | 1.0 | 1.7 | 1.2 | 1.5 | 1.0 | 0 | 0 | 18 | -4.6e-8 | 4.2e-11 | 3.6e-11 | 2.6.1 | 0.13.30 | 0.4.86 |
| wells_data-wells_dist100_model | Stan, Julia | 2 | 0.0 | 1.0 | 1.0 | 2.0 | 2.5 | 1.0 | 2 | 4 | 31 | 0.0 | 9.9e-16 | 8.8e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| wells_data-wells_dist100ars_model | Stan, Julia | 3 | 0.0 | 1.0 | 1.0 | 2.0 | 2.5 | 1.0 | 2 | 4 | 31 | 0.0 | 8.9e-16 | 7.8e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
| wells_data-wells_interaction_c_model | Stan, Julia | 4 | 1.5e-16 | 1.0 | 1.0 | 2.0 | 2.6 | 1.0 | 2 | 4 | 31 | 0.0 | 1.1e-15 | 1.0e-15 | 2.6.1 | 0.13.30 | 0.4.86 |
| wells_data-wells_interaction_model | Stan, Julia | 4 | 0.0 | 1.0 | 1.0 | 2.0 | 2.5 | 1.0 | 2 | 4 | 31 | 0.0 | 8.6e-16 | 8.7e-16 | 2.6.1 | 0.13.30 | 0.4.86 |
Source Code
---
title: "Julia vs Stan performance comparison"
---

# Caveats

This page compares the performance of Julia's and Stan's log density and log density gradient computations for the implemented posteriors. Several caveats apply:

* The `posteriordb` Stan implementations were never meant to represent "perfect and best-performant" practices.
* The StanBlocks.jl implementations are not line-by-line translations of the Stan implementations. Sometimes small optimizations were applied to bring the implementations more in line with common Julia practices, or to make the code friendlier for Julia's AD packages, e.g. by avoiding mutation.
* Stan often automatically drops constant terms (unless configured differently), see [https://mc-stan.org/docs/reference-manual/statements.html#log-probability-increment-vs.-distribution-statement](https://mc-stan.org/docs/reference-manual/statements.html#log-probability-increment-vs.-distribution-statement), thus avoiding computation that is superfluous for its purposes, while the StanBlocks.jl implementations do not.
* Stan implements a lot of custom AD rules, while StanBlocks.jl implements none and Enzyme.jl relies on few (if any). I suspect that adding AD rules for `_glm_` type functions would further improve Julia's performance.
* The StanBlocks.jl "sampling" statements try to be clever about avoiding repeated computations. I am not sure whether Stan applies the same optimizations, but in principle it could do so without extra work by the user.
* While preliminary benchmark runs included "all" Julia AD packages, all of them are almost always much slower than Enzyme.jl for the implemented posteriors; on top of that performance advantage, Enzyme.jl also supports more Julia language features than some of the other AD packages. As such, I am only comparing Enzyme and Mooncake against Stan. Enzyme outperforming every other AD package for *these* posteriors/loss functions does of course not mean that it will necessarily do as well for other applications.
* Enzyme's development is progressing quickly. While it currently sometimes crashes Julia or errors while trying to compute a gradient, its performance and reliability are improving continuously and rapidly.
* Stan's benchmark is run from Julia via `BridgeStan.jl`. While I think that any resulting performance penalty should be extremely small, I am not 100% sure. BridgeStan uses the `-O3` compiler flag by default, but no additional ones.
* All benchmarks are run with a single thread on my local machine.
* **There are probably more caveats!**

::: {.callout-warning}
**In general, doing performance comparisons is quite tricky, for more reasons than just the ones mentioned above. The below plot and tables should most definitely NOT be interpreted as "A is X-times faster than B".**
::: 
```{julia}
using Random, Chairmarks, Statistics, LinearAlgebra, Distributions, Plots 
import Plots, OnlineStats, StanBlocks, PosteriorDB
include("julia/common.jl")
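# `RuntimeDistribution2(f, n)` treats "the average wall time of n calls to f(rng), with
# GC time subtracted" as a random draw, so runtimes can be fed to the online mean
# estimators below; `Base.donotdelete` keeps the compiler from eliding the calls.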
struct RuntimeDistribution2{F}
    f::F
    n::Int64
end
RuntimeDistribution2(f) = RuntimeDistribution2(f, 1)
RuntimeDistribution2(f::RuntimeDistribution2, n::Int64) = RuntimeDistribution2(f.f, n * f.n)
@inline Random.rand(rng::AbstractRNG, d::RuntimeDistribution2) = begin
    stats = @timed begin
        rv = d.f(rng)
        Base.donotdelete(rv)
        for i in 1:d.n-1
            rv = max(rv, d.f(rng))
            Base.donotdelete(rv)
        end
    end
    (stats.time - stats.gctime) / d.n
end

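# `IterableDistribution` lets `map(iterate, args)` in `adaptive_mean3` pull one fresh
# runtime sample per benchmarked function; `UncertainMean` tracks an online mean plus a
# normal-approximation band (mean ± qmul * standard error) used for comparisons below.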
struct IterableDistribution{R,D}
    rng::R
    dist::D
end
@inline Base.iterate(d::IterableDistribution) = rand(d.rng, d.dist)
struct UncertainMean{V,Q}
    var::V
    qmul::Q
end
round2(x) = round(x; sigdigits=2)
Base.show(io::IO, s::UncertainMean) = print(io, "[", round2(lower(s)), " -- ", round2(upper(s)), "] (via ", OnlineStats.nobs(s), " evaluations)")
Base.show(io::IO, ::MIME"text/plain", s::UncertainMean) = show(io, s)
UncertainMean(q::Real) = UncertainMean(OnlineStats.Variance(), quantile(Normal(), 1-q))
OnlineStats.fit!(s::UncertainMean, args...) = OnlineStats.fit!(s.var, args...)
OnlineStats.nobs(s::UncertainMean) = OnlineStats.nobs(s.var)
se(s::UncertainMean) = sqrt(var(s.var)/OnlineStats.nobs(s.var))
Statistics.mean(s::UncertainMean) = mean(s.var)
upper(s::UncertainMean) = mean(s) + atol(s)
lower(s::UncertainMean) = mean(s) - atol(s)
atol(s::UncertainMean) = s.qmul * se(s)
rtol(s::UncertainMean) = atol(s) / abs(mean(s))
Base.:isless(s1::UncertainMean, s2::UncertainMean) = upper(s1) < lower(s2) ? true : false#(upper(s2) < lower(s1) ? false : missing)
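# Draw from all runtime distributions in lockstep until every mean is estimated to a
# relative precision of `rtol` and the ordering of the means is unambiguous (their
# uncertainty bands no longer overlap), or until `n_max` draws have been used.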
adaptive_mean3(args...; q=.025, kwargs...) = adaptive_mean3(args, [UncertainMean(q) for arg in args]; kwargs...)
adaptive_mean3(args::Tuple, means::Vector; n_min=10, n_max=1_000_000, rtol=.01) = begin
    N = length(args)
    vmeans = map(mean, means)
    perm = sortperm(vmeans)
    draws = map(iterate, args)
    n_start = 1+minimum(OnlineStats.nobs, means)
    n_start > 1 && display("Resuming at $n_start")
    for i in n_start:n_max
        draws = map(iterate, args)
        for (m, d) in zip(means, draws)
            OnlineStats.fit!(m, d)
        end
        i < n_min && continue
        map!(mean, vmeans, means)
        sortperm!(perm, vmeans)
        is_sorted = all(i->means[perm[i-1]] < means[perm[i]], 2:N)
        is_precise = all(m->Main.rtol(m) < rtol, means)
        if is_sorted && is_precise
            display("Stopping early at i=$i")
            break
        end
    end
    means
end
firstfinite(f) = ff(args...) = while true
    rv = f(args...)
    isfinite(rv) && return rv
end
nonzero(x) = x == zero(x) ? one(x) : x
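# Estimate a quantile of f's output (default: the median), skipping non-finite values.
# Stops early once a Bayesian-bootstrap-style reweighted replicate of the quantile
# agrees with the plain estimate to within `rtol`.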
adaptive_median(f::Function; n_min=100, n_max=1000, rtol=.001, q=.5) = begin 
    ff = firstfinite(f)
    vals = [ff()]
    w = [1.]
    qs = [0.]
    rv = vals[1]
    for i in 2:n_max
        val = ff()
        insert!(vals, searchsortedfirst(vals, val), val)
        # @assert issorted(vals)
        rv = quantile(vals, q; sorted=true)
        push!(w, 0.)
        push!(qs, 0.)
        rand!(w)
        @. w = -log1p(-w)
        cumsum!(qs, w)
        qm = q * qs[end]
        ridx = searchsortedfirst(qs, qm)
        rep = if ridx == 1
            vals[1]
        else
            lidx = ridx-1
            wl = (qm - qs[lidx]) / (qs[ridx] - qs[lidx])
            wr = 1 - wl
            wl * vals[lidx] + wr * vals[ridx]
        end
        rel_err = norm((rv - rep)/nonzero(max(norm(rv),norm(rep))))
        i < n_min && continue
        # Stopping criterion should be more sophisticated
        if rel_err < rtol
            display((;vals, rv, rep, rel_err))
            display("Stopping early at i=$i")
            break
        end 
    end
    rv
end

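# `skip_names` lists posteriors excluded from this comparison. Below, one row of
# results is collected per posterior via `PosteriorEvaluation` (defined in
# julia/common.jl, not shown here); skipped posteriors only contribute their name,
# and `pad_missing` presumably fills their remaining columns with `missing`.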
skip_names = split("""
uk_drivers-state_space_stochastic_level_stochastic_seasonal
timssAusTwn_irt-gpcm_latent_reg_irt
synthetic_grid_RBF_kernels-kronecker_gp
state_wide_presidential_votes-hierarchical_gp   
soil_carbon-soil_incubation 
sir-sir 
sat-hier_2pl    
rstan_downloads-prophet 
one_comp_mm_elim_abs-one_comp_mm_elim_abs   
mnist_100-nn_rbm1bJ10   
mnist-nn_rbm1bJ100  
iohmm_reg_simulated-iohmm_reg   
hudson_lynx_hare-lotka_volterra 
fims_Aus_Jpn_irt-2pl_latent_reg_irt
ecdc0501-covid19imperial_v3     
ecdc0501-covid19imperial_v2 
ecdc0401-covid19imperial_v3 
ecdc0401-covid19imperial_v2 
dogs-dogs_nonhierarchical   
butterfly-multi_occupancy   
""")
jdf = map(posterior_names) do posterior_name
    posterior_name in skip_names && return (;posterior_name)
    try
        e = PosteriorEvaluation(posterior_name)
        e.df_row
    catch e
        isa(e, InterruptException) && rethrow()
        @error posterior_name e
        rethrow()
        return (;posterior_name)
    end
end |> filter(!isnothing) |> pad_missing |> DataFrame;
```
:::{.column-page}
 
# Visualization

::: {.callout-warning}
**In general, doing performance comparisons is quite tricky, for more reasons than just the ones mentioned above. The below plot and tables should most definitely NOT be interpreted as "A is X-times faster than B".**
:::

The below plot shows the relative primitive runtime (x-axis, Julia vs Stan, left: Julia is faster) and the relative gradient runtime (y-axis, Julia+X vs Stan, bottom: Julia is faster) for the `posteriordb` models for which the [overview table](#tabular-data) has a value less than `1e-8` in the `median relative ulpdf error` column. **The color of the points represents the Julia AD framework used**, which currently includes [Enzyme.jl](https://github.com/EnzymeAD/Enzyme.jl) and [Mooncake.jl](https://github.com/compintell/Mooncake.jl).
Hovering over a data point shows the posterior name, its dimension, the allocations required by Julia during the primitive and gradient runs, and a short explanation, e.g. `mesquite-logmesquite_logvash (D=7, #allocs=0->70) - Julia's primitive is ~4.5 times faster, but Enzyme's gradient is ~16.0 times slower.` **Any time spent on garbage collection has been subtracted from the measured wall times. Each mean runtime was estimated by running until the estimated standard error of the mean was smaller than roughly 0.5% of the estimated mean; because of this, I have removed the credible intervals and standard errors from the plot and table.**
```{julia}
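# Build the hover tooltips and the scatter plot of relative runtimes
# (primitives on the x-axis, gradients on the y-axis, both relative to Stan).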
hover_string(posterior_name, dimension, ptime, gtime, pallocs, gallocs; AD) = begin  
    pdescr = if ptime > 1
        "$(round(ptime; sigdigits=2)) times slower" 
    else
        "$(round(inv(ptime); sigdigits=2)) times faster"
    end
    gdescr = if gtime > 1
        "$(round(gtime; sigdigits=2)) times slower"
    else
        "$(round(inv(gtime); sigdigits=2)) times faster"
    end
    jdescr = if (ptime > 1) == (gtime > 1) 
        "and"
    else
        "but"
    end
    descr = "Julia's primitive is ~$pdescr, $jdescr $AD's gradient is ~$gdescr."
    "$(posterior_name) (D=$(dimension), #allocs=$(round(pallocs))->$(round(gallocs))) <br> $descr"
end
usable_row(row) = !any(ismissing, (row.stan_gradient_times, row.julia_lpdf_times, row.mooncake_times))
pdf = jdf = DataFrame(filter(usable_row, eachrow(jdf)))
pdf.ptime = mean.(pdf.julia_lpdf_times) ./ mean.(pdf.stan_lpdf_times)
pdf.etime = mean.(pdf.enzyme_times) ./ mean.(pdf.stan_gradient_times) 
pdf.mtime = mean.(pdf.mooncake_times) ./ mean.(pdf.stan_gradient_times)

colors = palette(:tab10)[[1,2]]
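# Scatter of relative runtimes on log-log axes; the nested `hline!`/`vline!` calls
# just add black reference lines at x = 1 and y = 1 (parity with Stan).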
Plots.vline!(
Plots.hline!(
    Plots.scatter(
        pdf.ptime,
        [pdf.etime, pdf.mtime];
        label=["Enzyme" "Mooncake"],
        color=permutedims(colors),
        hover=hcat(
            hover_string.(pdf.posterior_name, pdf.dimension, pdf.ptime, pdf.etime, pdf.julia_allocations, pdf.enzyme_allocations; AD="Enzyme"), 
            hover_string.(pdf.posterior_name, pdf.dimension, pdf.ptime, pdf.mtime, pdf.julia_allocations, pdf.mooncake_allocations; AD="Mooncake") 
        ),
        xlabel="Relative primitive runtime\n(Julia vs Stan, left: Julia is faster)", 
        ylabel="Relative gradient runtime\n(Julia+X vs Stan, bottom: Julia+X is faster)",
        title="Relative mean runtimes, EXCLUDING GARBAGE COLLECTION",
        xscale=:log10, yscale=:log10, 
        size=(1000, 600)
    ),
    [1], color=:black, label="", hover=""
),
    [1], color=:black, label="", hover=""
)
```
:::

# Tabular data

The below table shows information about the implemented posteriors. I will elaborate on the exact meaning of the columns later.

:::{.column-screen}

```{julia}
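# Table highlighters: cells equal to exactly 1.0 (the fastest relative runtime in a
# row) are shown in bold blue, and cells reading "FAILED" are shown in red.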
ternary(c, t, f) = c ? t : f 
hl_best = HtmlHighlighter(
    (data, i, j) -> data[i,j] === 1.,
    HtmlDecoration(color="blue", font_weight="bold")
);
hl_failed = HtmlHighlighter(
    (data, i, j) -> ((data[i,j]==="FAILED")),
    HtmlDecoration(color = "red")
);
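# For each posterior (index i), divide the runtimes passed in `args` by the smallest
# of them, so the fastest implementation reads 1.0.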
relative!(args...) = begin
    for i in eachindex(args[1])
        setindex!.(args, getindex.(args, i) ./ minimum(getindex.(args, i)), i)
    end
    args
end
julia_lpdf_times, stan_lpdf_times = relative!(mean.(jdf.julia_lpdf_times), mean.(jdf.stan_lpdf_times))
enzyme_times, mooncake_times, stan_gradient_times = relative!(mean.(jdf.enzyme_times), mean.(jdf.mooncake_times), mean.(jdf.stan_gradient_times))
pretty_table( 
    DataFrame(OrderedDict(  
        "posterior name"=>jdf.posterior_name,
        "implementations"=>implementations_string.(jdf.posterior_name),
        "dimension"=>jdf.dimension,
        "median relative ulpdf error"=>round2.(jdf.lpdf_accuracy),
        "relative mean primitive Julia runtime"=>round2.(julia_lpdf_times),
        "relative mean primitive Stan runtime"=>round2.(stan_lpdf_times),
        "relative mean Enzyme runtime"=>round2.(enzyme_times),
        "relative mean Mooncake runtime"=>round2.(mooncake_times),
        "relative mean Stan gradient runtime"=>round2.(stan_gradient_times),
        "primitive Julia allocations"=>jdf.julia_allocations,
        "Enzyme allocations"=>jdf.enzyme_allocations,
        "Mooncake allocations"=>jdf.mooncake_allocations,
        "median lpdf difference"=>round2.(jdf.lpdf_difference),
        "median Enzyme relative gradient error"=>round2.(jdf.enzyme_accuracy),
        "median Mooncake relative gradient error"=>round2.(jdf.mooncake_accuracy), 
        "Bridgestan"=>jdf.BRIDGESTAN_VERSION,
        "Enzyme"=>jdf.ENZYME_VERSION,
        "Mooncake"=>jdf.MOONCAKE_VERSION,
    ));
    backend=Val(:html),
    highlighters=(hl_best, hl_failed,), 
    show_subheader=false, 
    table_class="interactive"
)
```
:::