<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Stan on Methods Bites</title>
    <link>https://socialsciencedatalab.mzes.uni-mannheim.de/tags/stan/</link>
    <description>Recent content in Stan on Methods Bites</description>
    <generator>Hugo -- gohugo.io</generator>
    <lastBuildDate>Thu, 30 Jan 2020 01:00:00 +0100</lastBuildDate>
    
        <atom:link href="https://socialsciencedatalab.mzes.uni-mannheim.de/tags/stan/index.xml" rel="self" type="application/rss+xml" />
    
    
    <item>
      <title>Applied Bayesian Statistics Using Stan and R</title>
      <link>https://socialsciencedatalab.mzes.uni-mannheim.de/article/applied-bayesian-statistics/</link>
      <pubDate>Thu, 30 Jan 2020 01:00:00 +0100</pubDate>
      
      <guid>https://socialsciencedatalab.mzes.uni-mannheim.de/article/applied-bayesian-statistics/</guid>
      <description><![CDATA[
<p>Whether researchers turn to Bayesian statistical methods out of convenience or firmly subscribe to the Bayesian paradigm for philosophical reasons, the use of Bayesian statistics in the social sciences is becoming increasingly widespread. However, seemingly high entry costs still keep many applied researchers from embracing Bayesian methods. Next to a lack of familiarity with the underlying conceptual foundations, the need to implement statistical models in specific programming languages remains one of the biggest hurdles. In this <a href="https://socialsciencedatalab.mzes.uni-mannheim.de/categories/tutorials/">Methods Bites Tutorial</a>, <a href="https://twitter.com/denis_cohen">Denis Cohen</a> provides an applied introduction to Stan, a platform for statistical modeling and Bayesian statistical inference.</p>
<p>Readers will learn about:</p>
<ul>
<li>fundamental concepts in Bayesian statistics</li>
<li>the Stan programming language</li>
<li>the R interface RStan</li>
<li>the workflow for Bayesian model building, inference, and convergence diagnosis</li>
<li>additional R packages that facilitate statistical modeling using Stan</li>
</ul>
<p>Through numerous applied examples, readers will also learn how to write and run their own models.</p>
<p>This blog post is based on Denis’ workshop in the <a href="https://socialsciencedatalab.mzes.uni-mannheim.de/page/about/">MZES Social Science Data Lab</a> in Spring 2019. The original workshop materials can be found on our <a href="https://github.com/SocialScienceDataLab/Stan_Intro">GitHub</a>.</p>
<div id="contents" class="section level5">
<h5>Contents</h5>
<ol style="list-style-type: decimal">
<li><a href="#stan">Stan</a>
<ol style="list-style-type: decimal">
<li><a href="#what-is-stan">What Is Stan?</a></li>
<li><a href="#why-stan">Why Stan?</a></li>
</ol></li>
<li><a href="#bayesian-fundamentals">Bayesian Fundamentals</a>
<ol style="list-style-type: decimal">
<li><a href="#likelihood-function">Likelihood Function</a></li>
<li><a href="#prior-distribution">Prior Distribution</a></li>
<li><a href="#posterior-distribution">Posterior Distribution</a></li>
<li><a href="#example-flipping-a-coin-200-times">Example: Flipping a Coin 200 Times</a></li>
<li><a href="#markov-chain-monte-carlo-mcmc">Markov Chain Monte Carlo (MCMC)</a></li>
</ol></li>
<li><a href="#applied-bayesian-statistics-using-stan-and-r">Applied Bayesian Statistics Using Stan and R</a>
<ol style="list-style-type: decimal">
<li><a href="#the-bayesian-workflow">The Bayesian Workflow</a></li>
<li><a href="#step-1-specification">Step 1: Specification</a></li>
<li><a href="#step-2-model-building">Step 2: Model Building</a></li>
<li><a href="#step-3-validation">Step 3: Validation</a></li>
<li><a href="#step-4-inference">Step 4: Inference</a></li>
<li><a href="#step-5-convergence-diagnostics">Step 5: Convergence Diagnostics</a></li>
</ol></li>
<li><a href="#additional-interfaces">Additional Interfaces</a>
<ol style="list-style-type: decimal">
<li><a href="#rstanarm">rstanarm</a></li>
<li><a href="#brms">brms</a></li>
</ol></li>
<li><a href="#concluding-remarks">Concluding Remarks</a>
<ol style="list-style-type: decimal">
<li><a href="#reproducibility">Reproducibility</a></li>
<li><a href="#summary">Summary</a></li>
</ol></li>
<li><a href="#about-the-presenter">About the Presenter</a></li>
<li><a href="#further-reading">Further Reading</a></li>
<li><a href="#references">References</a></li>
</ol>
</div>
<div id="setup" class="section level5">
<h5>Setup</h5>
<p>Setting up Stan and its R interface RStan can be somewhat time-consuming as it requires the installation of a C++ compiler. Readers should follow <a href="https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started">these instructions</a> on the Stan Development Team’s GitHub to install and configure the <a href="https://cran.r-project.org/web/packages/rstan/index.html"><strong>rstan</strong></a> package and its prerequisites on their operating system.</p>
<p>The installation of some additional packages is necessary for working through this tutorial and for exploring further opportunities for applied Bayesian modeling using RStan.</p>
<details>
<p><summary> Code: Packages used in this tutorial</summary></p>
<pre class="r"><code>## Packages
pkgs &lt;- c(&quot;rstan&quot;, &quot;bayesplot&quot;, &quot;shinystan&quot;, &quot;coda&quot;, &quot;dplyr&quot;)

## Install uninstalled packages
lapply(pkgs[!(pkgs %in% installed.packages())], install.packages)

## Load all packages to library
lapply(pkgs, library, character.only = TRUE)</code></pre>
</details>
<p><br /></p>
</div>
<div id="stan" class="section level3">
<h3>Stan</h3>
<div id="what-is-stan" class="section level5">
<h5>What Is Stan?</h5>
<p>In the words of the developers,</p>
<blockquote>
<p><font size="-1">
Stan is a state-of-the-art platform for statistical modeling and high-performance statistical computation. Thousands of users rely on Stan for statistical modeling, data analysis, and prediction in the social, biological, and physical sciences, engineering, and business. Users specify log density functions in Stan’s probabilistic programming language and get:</p>
<ul>
<li>full Bayesian statistical inference with MCMC sampling (NUTS, HMC)</li>
<li>approximate Bayesian inference with variational inference (ADVI)</li>
<li>penalized maximum likelihood estimation with optimization (L-BFGS)</li>
</ul>
</font>
</blockquote>
<div style="text-align: right">
<p><sub><sup>
Source: <a href="https://mc-stan.org/" class="uri">https://mc-stan.org/</a>
</sup></sub></p>
</div>
</div>
<div id="why-stan" class="section level5">
<h5>Why Stan?</h5>
<p>Stan is open-source software that provides an intuitive language for statistical modeling along with fast and stable algorithms for fully Bayesian inference. The software offers high flexibility with few limitations. The development process is highly transparent and publicly documented in the <a href="https://github.com/stan-dev/stan">Stan Development Repository</a> on GitHub.</p>
<p>Users benefit from various helpful resources. This includes, firstly, the extensive official documentation of Stan’s functionality in the <a href="https://mc-stan.org/docs/2_19/stan-users-guide/index.html">User’s Guide</a>, <a href="https://mc-stan.org/docs/2_19/reference-manual/index.html">Language Reference Manual</a>, and <a href="https://mc-stan.org/docs/2_19/functions-reference/index.html">Language Functions Reference</a>. Secondly, users benefit from a large and active online community in the <a href="https://discourse.mc-stan.org/">Stan Forums</a> and on <a href="https://stackoverflow.com/questions/tagged/stan">Stack Overflow</a>, where many members of the <a href="https://mc-stan.org/about/team/">Stan Development Team</a> regularly address users’ questions and troubleshoot their code.
Additionally, an expanding collection of <a href="https://mc-stan.org/users/documentation/case-studies.html">case studies</a>, <a href="https://mc-stan.org/users/documentation/tutorials.html">tutorials</a>, <a href="https://mc-stan.org/users/documentation/external.html">papers and textbooks</a> offer valuable inputs for both new and experienced users.</p>
<p>The Stan language is compatible with various editors for syntax highlighting, formatting, and checking (incl. <a href="https://www.rstudio.com/">RStudio</a> and <a href="https://www.gnu.org/software/emacs/">Emacs</a>).</p>
</div>
<div id="rstan-and-other-stan-interfaces" class="section level5">
<h5>RStan and Other Stan Interfaces</h5>
<p>Stan interfaces with numerous software environments, including</p>
<ul>
<li>RStan (R)</li>
<li>PyStan (Python)</li>
<li>CmdStan (shell, command-line terminal)</li>
<li>MatlabStan (MATLAB)</li>
<li>Stan.jl (Julia)</li>
<li>StataStan (Stata)</li>
<li>MathematicaStan (Mathematica)</li>
<li>ScalaStan (Scala)</li>
</ul>
<p>While this blog post illustrates the use of the R interface RStan, users with other preferences may use the corresponding interface to call Stan from their preferred software.</p>
<p>Aside from the widespread popularity of R for programming, statistical computing and graphics, one of the primary reasons why we focus on RStan in this blog post is the availability of a broad range of packages that facilitate the use of RStan for applied researchers. These include</p>
<ul>
<li><a href="https://cran.r-project.org/package=rstan"><strong>rstan</strong></a>: General R Interface to Stan</li>
<li><a href="https://cran.r-project.org/package=shinystan"><strong>shinystan</strong></a>: Interactive Visual and Numerical Diagnostics and Posterior Analysis for Bayesian Models</li>
<li><a href="https://cran.r-project.org/web/packages/bayesplot/index.html"><strong>bayesplot</strong></a>: Plotting Functions for Posterior Analysis, Model Checking, and MCMC Diagnostics</li>
<li><a href="https://cran.r-project.org/package=brms"><strong>brms</strong></a>: Bayesian Regression Models using ‘Stan’, covering a growing number of model types</li>
<li><a href="https://cran.r-project.org/package=rstanarm"><strong>rstanarm</strong></a>: Bayesian Applied Regression Modeling via Stan, with an emphasis on hierarchical/multilevel models</li>
<li><a href="https://cran.r-project.org/package=edstan"><strong>edstan</strong></a>: Stan Models for Item Response Theory</li>
<li><a href="https://cran.r-project.org/package=rstantools"><strong>rstantools</strong></a>: Tools for Developing R Packages Interfacing with ‘Stan’</li>
</ul>
</div>
</div>
<div id="bayesian-fundamentals" class="section level3">
<h3>Bayesian Fundamentals</h3>
<p>We start our discussions of the fundamental concepts of Bayesian statistics and inference with the following excerpt:</p>
<blockquote>
<font size="-1">
In the Bayesian world the unobserved quantities are assigned distributional properties and, therefore, become random variables in the analysis. These distributions come in two basic flavors. If the distribution of the unknown quantity is not conditioned on fixed data, it is called prior distribution because it describes knowledge prior to seeing data. Alternatively, if the distribution is conditioned on data that we observe, it is clearly updated from the unconditioned state and, therefore, more informed. This distribution is called posterior distribution. […] The punchline is this: All likelihood-based models are Bayesian models in which the prior distribution is an appropriately selected uniform prior, and as the size of the data gets large they are identical given any finite appropriate prior. So such empirical researchers are really Bayesian; they just do not know it yet.
</font>
</blockquote>
<div style="text-align: right">
<p><sub><sup>
Source: <span class="citation">Gill and Witko (2013)</span>
</sup></sub></p>
</div>
<p>As we have just learned, Bayesians, contrary to frequentists, treat data as known and fixed, whereas parameters are considered unknown random quantities. Bayesians express their knowledge about parameters distributionally. To build some intuition about this, we consider the classical example of a series of coin tosses below. In doing so, we introduce and discuss three fundamental concepts in Bayesian statistics: The likelihood, the prior distribution, and the posterior distribution. Readers already familiar with these concepts may continue reading <a href="#applied-bayesian-statistics-using-stan-and-r">here</a>.</p>
<div id="likelihood-function" class="section level5">
<h5>Likelihood function</h5>
<p>A likelihood function is an assignment of a parametric form for the data, <span class="math inline">\(\mathbf{y}\)</span>. This involves stipulating a data generating process, where we specify a probability density function (pdf) or probability mass function (pmf) governed by a (set of) parameter(s) <span class="math inline">\(\theta\)</span>. This is commonly denoted <span class="math inline">\(p(\mathbf{y}|\theta)\)</span>. Given <span class="math inline">\(\theta\)</span>, one can specify the relative likelihood of observing a given value <span class="math inline">\(y\)</span>. Treating the observed values <span class="math inline">\(\mathbf{y}\)</span> as given yields the logical inversion “which unknown <span class="math inline">\(\theta\)</span> most likely produces the known data <span class="math inline">\(\mathbf{y}\)</span>?”. This is what we call the likelihood function, often denoted <span class="math inline">\(L(\theta | \mathbf{y})\)</span>. In practice, Bayesian practitioners often use the notations <span class="math inline">\(p(\mathbf{y}|\theta)\)</span> and <span class="math inline">\(L(\theta | \mathbf{y})\)</span> interchangeably. Here, we will stick to the former.</p>
<p>How can we think of the likelihood function in our present example? First, we need to think about the data generating process that produces the results of our series of coin flips. Coin flips produce a series of binary outcomes, i.e., a series of heads and tails. We can think of these as realizations of a series of Bernoulli trials following a binomial distribution. The binomial distribution characterizes a series of independent realizations of Bernoulli trials, where a probability parameter <span class="math inline">\(\pi\)</span> governs the number of successes, <span class="math inline">\(k\)</span>, out of the total number of trials, <span class="math inline">\(n\)</span>. It is this probability parameter <span class="math inline">\(\pi\)</span> that characterizes the fairness of the coin. For example, a value of <span class="math inline">\(\pi=0.5\)</span> would indicate a perfectly fair coin, equally likely to produce heads or tails.</p>
<p>As <span class="math inline">\(\pi\)</span> determines the probability mass of <span class="math inline">\(k\)</span> for any number of flips, <span class="math inline">\(n\)</span>, in the coin flip example, we can think of it as the equivalent to what we referred to as <span class="math inline">\(\theta\)</span> in the general notation above. When we flip a coin <span class="math inline">\(n\)</span> times and observe <span class="math inline">\(k\)</span> heads (and, thus, <span class="math inline">\(n-k\)</span> tails), we can think of the data as the number of heads, <span class="math inline">\(k\)</span>, given the total number of flips, <span class="math inline">\(n\)</span>. This is the equivalent to what we referred to as <span class="math inline">\(\mathbf{y}\)</span> in the general notation. So given a fixed number of flips, <span class="math inline">\(k\)</span> is generated according to a binomial distribution governed by the parameter <span class="math inline">\(\pi\)</span>: <span class="math inline">\(k \sim \text{Binomial}(n, \pi)\)</span>. The parameter <span class="math inline">\(\pi\)</span>, then, is our unknown quantity of interest.</p>
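<p>To build intuition, we can evaluate this likelihood directly in R. The snippet below (a minimal sketch using hypothetical values of <span class="math inline">\(n=10\)</span> flips and <span class="math inline">\(k=7\)</span> heads) computes the binomial likelihood across a grid of candidate values for <span class="math inline">\(\pi\)</span> and retrieves the value that maximizes it:</p>
<details>
<p><summary> Code: Evaluating the binomial likelihood</summary></p>
<pre class="r"><code>n &lt;- 10                              ### hypothetical number of flips
k &lt;- 7                               ### hypothetical number of heads
pi &lt;- seq(0, 1, length.out = 1001)   ### candidate values for pi
lik &lt;- dbinom(k, n, pi)              ### binomial likelihood at each candidate
pi[which.max(lik)]                   ### maximum likelihood estimate: k/n = 0.7</code></pre>
</details>
<p>Unsurprisingly, the grid value that makes the observed data most likely is the sample proportion of heads, <span class="math inline">\(\frac{k}{n}\)</span>.</p>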
</div>
<div id="prior-distribution" class="section level5">
<h5>Prior distribution</h5>
<p>The prior distribution is a distributional characterization of our belief about a parameter prior to seeing the data. We denote the corresponding probability function as <span class="math inline">\(p(\theta)\)</span>. The prior distribution can be either (1) substantively informed by previous research or expert assessment, (2) purposefully vague, and thus, rather uninformative, or (3) weakly informative, purposefully assigning low density for unreasonable value ranges of a given parameter without otherwise conveying substantive information. Specifying a prior distribution includes statements about the distribution’s <em>family</em>, <em>density</em>, and <em>support</em>.</p>
<p>What does this mean in the context of our example? Prior to first flipping the coin, we may have a (more or less specific) belief about the parameter <span class="math inline">\(\pi\)</span> that characterizes the fairness of the coin. In absence of evidence that suggests otherwise, we would likely think that the coin was fair, though we would probably also reserve some skepticism and acknowledge that the coin might in fact not be fair. So how can this belief be expressed distributionally?</p>
<p>We know that <span class="math inline">\(\pi\)</span> is a probability parameter. Probabilities cannot lie outside of the unit interval, i.e., <span class="math inline">\(\pi \in [0, 1]\)</span>. This defines the <em>support</em> of our prior distribution. We also know that probabilities can take any real number inside the unit interval. So we need a probability density function that governs the relative likelihood of <span class="math inline">\(\pi\)</span> for every possible value inside the unit interval. The <em>beta distribution</em> is an ideal candidate because it ticks both boxes, offering a continuous probability distribution over the unit interval. This gives us the <em>family</em> of our prior distribution.</p>
<p>This leaves us with the <em>density</em>. After all, we have a weakly informative belief that the coin is more likely to be fair than it is to be (extremely) unfair. In other words, we want our prior probability density of <span class="math inline">\(\pi\)</span> to be highest at the value <span class="math inline">\(0.5\)</span> and lower the farther we move away from <span class="math inline">\(0.5\)</span>. Given our support, the probability will be deterministically equal to zero for values <span class="math inline">\(\pi&lt;0\)</span> and <span class="math inline">\(\pi&gt;1\)</span>. So how do we get there? The trick is to specify suitable <em>hyperparameters</em> for the beta distribution that governs the prior probability density of our parameter <span class="math inline">\(\pi\)</span>. As the beta distribution is defined by two shape parameters, <span class="math inline">\(\alpha\)</span> and <span class="math inline">\(\beta\)</span>, we need to choose values for the two that yield a distributional shape that conforms to our prior belief. As we can see below, <span class="math inline">\(\alpha = 5\)</span> and <span class="math inline">\(\beta = 5\)</span> are suitable candidates:</p>
<details>
<p><summary> Code: Defining and plotting the prior distribution</summary></p>
<pre class="r"><code>len.pi &lt;- 1001L                      ### number of candidate values for pi
pi &lt;- seq(0, 1, length.out = len.pi) ### candidate values for pi
a &lt;- b &lt;- 5                          ### hyperparameters
prior &lt;- dbeta(pi, a, b)             ### prior distribution

## Plot
plot(                                ### set up empty plot, specify labels
  pi, prior,
  type = &#39;n&#39;,
  xlab = expression(pi),
  ylab = expression(paste(&quot;Prior Density of &quot;, pi))
)
polygon(                             ### draw density distribution
  c(rep(0, length(pi)), pi),
  c(prior, rev(prior)),
  col = adjustcolor(&#39;red&#39;, alpha.f = .4),
  border = NA
)
abline(                              ### add vertical at pi = 0.5 
  v = .5,
  col = &#39;white&#39;
)</code></pre>
</details>
<p><img src="/../../../../../article/applied-bayesian-statistics_files/figure-html/coin-sim0-print-1.png" width="75%" style="display: block; margin: auto;" /></p>
</div>
<div id="posterior-distribution" class="section level5">
<h5>Posterior distribution</h5>
<p>The posterior distribution results from updating our prior belief about the parameter(s), <span class="math inline">\(p(\theta)\)</span>, through the observed data included in the likelihood function, <span class="math inline">\(p(\mathbf{y}|\theta)\)</span>. It thus yields our distributional belief about <span class="math inline">\(\theta\)</span> given the data: <span class="math inline">\(p(\theta | \mathbf{y})\)</span>. The underlying calculation follows the proportional version of <a href="https://en.wikipedia.org/wiki/Bayes%27_theorem">Bayes’ Law</a>: <span class="math inline">\(p(\theta | \mathbf{y}) \propto p(\theta) \times p(\mathbf{y}|\theta)\)</span>. By multiplying the prior probability density function and the likelihood function, we get the posterior probability function of <span class="math inline">\(\theta\)</span>. It provides us with a weighted combination of likelihood and prior: The prior pulls the posterior density toward the center of gravity of the prior distribution, but as the data grows large, the likelihood becomes increasingly influential and eventually dominates the prior.</p>
<p>What does this mean in the context of our example of a series of coin flips? As discussed above, we start out with our prior belief summarized by the following beta distribution: <span class="math inline">\(\pi \sim \text{beta}(\alpha=5,\beta=5)\)</span>. After every coin flip <span class="math inline">\(i=1,...,n\)</span>, we update our belief about <span class="math inline">\(\pi\)</span> by multiplying the prior pdf with the probability mass function of the binomial distribution, which is given by <span class="math inline">\(p(k|n,\pi) = {n \choose k} \pi^k (1-\pi)^{(n-k)}\)</span>. As <span class="math inline">\(n\)</span> grows large, this latter component becomes increasingly influential in determining the posterior distribution.</p>
<p>As the <a href="https://youtu.be/hKYvZF9wXkk">beta distribution is conjugate to the binomial likelihood</a> (i.e., in the same probability distribution family), the resulting posterior distribution, <span class="math inline">\(p(\pi|n,k)\)</span>, will also be a beta distribution with updated hyperparameters <span class="math inline">\(\alpha^{\prime}\)</span> and <span class="math inline">\(\beta^{\prime}\)</span>: <span class="math inline">\(\pi|n,k \sim \text{beta}(\alpha^{\prime}=\alpha+k,\beta^{\prime}=\beta+n-k)\)</span>. These updated hyperparameters then determine the probability density of the resulting posterior distribution after every additional coin flip of the series.</p>
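<p>The conjugate updating rule is easy to verify in R. The following snippet (a minimal sketch using hypothetical data of <span class="math inline">\(k=7\)</span> heads in <span class="math inline">\(n=10\)</span> flips) computes the updated hyperparameters and the resulting posterior mean, which falls between the prior mean of 0.5 and the sample proportion of 0.7:</p>
<details>
<p><summary> Code: Conjugate beta-binomial updating</summary></p>
<pre class="r"><code>a &lt;- b &lt;- 5                          ### prior hyperparameters
n &lt;- 10                              ### hypothetical number of flips
k &lt;- 7                               ### hypothetical number of heads
a.prime &lt;- a + k                     ### updated alpha
b.prime &lt;- b + n - k                 ### updated beta
a.prime / (a.prime + b.prime)        ### posterior mean: 12 / 20 = 0.6</code></pre>
</details>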
</div>
<div id="example-flipping-a-coin-200-times" class="section level5">
<h5>Example: Flipping a Coin 200 Times</h5>
<p>Suppose we flip a coin up to 200 times. Unbeknownst to us, the coin is far from fair – it is four times as likely to produce heads as tails – that is, <span class="math inline">\(\pi=0.8\)</span>. We slowly learn about this in the process of flipping the coin repeatedly, keeping score of the number of flips <span class="math inline">\(n\)</span> and the number of heads <span class="math inline">\(k\)</span> after each flip. This is called <em>Bayesian updating</em>.</p>
<p>The code below implements this experiment. It simulates a series of <span class="math inline">\(n=200\)</span> coin flips and records the number of heads <span class="math inline">\(k_i\)</span> at every <span class="math inline">\(i\)</span>th flip. Based on this information, it retrieves the analytical solutions for the posterior mean along with its 95% credible interval at every turn.</p>
<details>
<p><summary> Code: Simulating the experiment</summary></p>
<pre class="r"><code>set.seed(20190417)                   ### set seed for replicability
len.pi &lt;- 1001L                      ### number of candidate values for pi
pi &lt;- seq(0, 1, length.out = len.pi) ### candidate values for pi
a &lt;- b &lt;- 5                          ### hyperparameters
n &lt;- 200                             ### num. of coin flips
pi_true &lt;- .8                        ### true parameter
data &lt;- rbinom(n, 1, pi_true)        ### n coin flips
posterior &lt;- matrix(NA, 3L, n)       ### matrix container for posterior

for (i in seq_len(n)) {    
  current.sequence &lt;- data[1:i]      ### sequence up until ith draw
  k &lt;- sum(current.sequence)         ### number of heads in current sequence
  
  ##### Updating
  a.prime &lt;- a + k               
  b.prime &lt;- b + i - k
  
  ### Analytical means and credible intervals
  posterior[1, i] &lt;- a.prime / (a.prime + b.prime)
  posterior[2, i] &lt;- qbeta(0.025, a.prime, b.prime)
  posterior[3, i] &lt;- qbeta(0.975, a.prime, b.prime)
}

## Plot
plot(                                ### set up empty plot with labels
  1:n, 1:n,
  type = &#39;n&#39;,
  xlab = &quot;Number of Coin Flips&quot;,
  ylab = expression(
    paste(
      &quot;Posterior Means of &quot;,
      pi, &quot; (with 95% Credible Intervals)&quot;,
      sep = &quot; &quot;
    )
  ),
  ylim = c(0, 1),
  xlim = c(1, n)
)
abline(                              ### reference lines: prior mean and true pi
  h = c(.5, .8),
  col = &quot;gray80&quot;
)
rect(-.5, qbeta(0.025, 5, 5),        ### prior mean + interval at i = 0
     0.5, qbeta(0.975, 5, 5),
     col = adjustcolor(&#39;red&#39;, .4),
     border = adjustcolor(&#39;red&#39;, .2))
segments(-.5, .5,
         0.5, .5,
         col = adjustcolor(&#39;red&#39;, .9),
         lwd = 1.5)
polygon(                             ### posterior means + intervals
  c(seq_len(n), rev(seq_len(n))),
  c(posterior[2, ], rev(posterior[3, ])),
  col = adjustcolor(&#39;blue&#39;, .4),
  border = adjustcolor(&#39;blue&#39;, .2)
)
lines(
  seq_len(n),
  posterior[1, ],
  col = adjustcolor(&#39;blue&#39;, .9),
  lwd = 1.5
)</code></pre>
</details>
<p><img src="/../../../../../article/applied-bayesian-statistics_files/figure-html/coin-sim2-1.png" width="75%" style="display: block; margin: auto;" /></p>
<p>The plot above shows the prior distribution with its 95% credible interval at <span class="math inline">\(i=0\)</span> (in red) and the updated posterior distributions with their 95% credible intervals at every coin flip <span class="math inline">\(i=1,...,n\)</span>. As we can see, even after just a couple of coin flips, the posterior distribution departs from the center of gravity of the prior distribution and converges toward the proportion of heads, <span class="math inline">\(\frac{k}{n}\)</span>, in the data. After <span class="math inline">\(n=200\)</span> coin flips, we have k=173 heads, a proportion of 0.865. By chance, this happens to be higher than the underlying true parameter value. After 200 flips, our posterior mean and its corresponding 95% interval are 0.848 (0.796, 0.893), which shows how strongly the likelihood has come to dominate the prior.</p>
</div>
<div id="markov-chain-monte-carlo-mcmc" class="section level5">
<h5>Markov Chain Monte Carlo (MCMC)</h5>
<p>Attentive readers may have noticed that one buzzword frequently used in the context of applied Bayesian statistics – Markov Chain Monte Carlo (MCMC), an umbrella term for algorithms used for sampling from a posterior distribution – has been entirely absent from the coin flip example. Instead of using such MCMC algorithms, we have relied on an analytical solution for the posterior, exploiting the conjugacy of the beta and binomial distributions and the fact that this simple example with a single parameter (i.e., a unidimensional parameter space) allowed us to get the desired results with some quick and simple math.</p>
<p>For complex multi-dimensional posterior distributions, however, finding analytical solutions through integration becomes cumbersome, if not outright impossible. That’s where numerical approximation through MCMC algorithms comes in. MCMC algorithms are iterative computational processes that allow us to explore and describe posterior distributions. Essentially, we let <a href="https://en.wikipedia.org/wiki/Markov_chain">Markov Chains</a> wander through the parameter space. These should eventually, following an initial warmup period, converge to high-density regions in the underlying posterior distribution. The relative number of iterations that a chain spends in a given region of the multidimensional parameter space provides a stochastic approximation of the posterior probability density in that region. Marginalizing the joint multidimensional posterior distribution with respect to a given single parameter, then, gives the posterior distribution for that parameter.</p>
<p>Some of the most frequently used MCMC algorithms include</p>
<ol style="list-style-type: decimal">
<li><strong>Gibbs Sampler</strong>: Draws iteratively through a complete set of conditional probability statements of each estimated parameter.</li>
<li><strong>Metropolis-Hastings</strong>: Considers a single multidimensional move on each iteration depending on the quality of the proposed candidate draw.</li>
<li><strong>Hamiltonian Monte Carlo (HMC)</strong>, used in Stan:</li>
</ol>
<blockquote>
<sub>
The Hamiltonian Monte Carlo algorithm starts at a specified initial set of parameters <span class="math inline">\(\theta\)</span>; in Stan, this value is either user-specified or generated randomly. Then, for a given number of iterations, a new momentum vector is sampled and the current value of the parameter <span class="math inline">\(\theta\)</span> is updated using the leapfrog integrator with discretization time <span class="math inline">\(\epsilon\)</span> and number of steps <span class="math inline">\(L\)</span> according to the Hamiltonian dynamics. Then a Metropolis acceptance step is applied, and a decision is made whether to update to the new state <span class="math inline">\((\theta^{\ast},\rho^{\ast})\)</span> or keep the existing state.
</sub>
</blockquote>
<div style="text-align: right">
<p><sub><sup>
Source: <a href="https://mc-stan.org/docs/2_19/reference-manual/hamiltonian-monte-carlo.html">Stan Reference Manual, Section 14.1</a>
</sup></sub></p>
</div>
<p>Readers interested in MCMC algorithms may want to consult the referenced section in the Stan Reference Manual and Chapter 9 of <span class="citation">Gill (2015)</span>.</p>
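<p>To demystify these algorithms somewhat, the snippet below sketches a simple random-walk Metropolis sampler for the coin flip example. This is purely illustrative and not Stan’s HMC; the proposal standard deviation of 0.05, the number of iterations, and the warmup length are arbitrary choices. Because the analytical posterior is known in this conjugate case, we can check that the sampler approximately recovers its mean:</p>
<details>
<p><summary> Code: A random-walk Metropolis sampler for the coin flip example</summary></p>
<pre class="r"><code>set.seed(20190417)
a &lt;- b &lt;- 5                          ### prior hyperparameters
n &lt;- 200                             ### number of flips (as in the example)
k &lt;- 173                             ### number of heads (as in the example)

log_post &lt;- function(pi) {           ### unnormalized log posterior
  if (pi &lt;= 0 || pi &gt;= 1) return(-Inf)
  dbeta(pi, a, b, log = TRUE) + dbinom(k, n, pi, log = TRUE)
}

n.iter &lt;- 20000L
draws &lt;- numeric(n.iter)
current &lt;- 0.5                       ### starting value
for (i in seq_len(n.iter)) {
  proposal &lt;- current + rnorm(1, 0, 0.05)  ### random-walk proposal
  if (log(runif(1)) &lt; log_post(proposal) - log_post(current)) {
    current &lt;- proposal              ### accept; otherwise keep current value
  }
  draws[i] &lt;- current
}
mean(draws[-(1:2000)])               ### close to analytical mean (5 + 173) / 210</code></pre>
</details>
<p>After discarding the first 2,000 iterations as warmup, the mean of the retained draws lies close to the analytical posterior mean of <span class="math inline">\(\frac{\alpha + k}{\alpha + \beta + n} \approx 0.848\)</span>.</p>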
</div>
</div>
<div id="applied-bayesian-statistics-using-stan-and-r" class="section level3">
<h3>Applied Bayesian Statistics Using Stan and R</h3>
<div id="the-bayesian-workflow" class="section level5">
<h5>The Bayesian Workflow</h5>
<p>Before we jump into the applications, we need to discuss the Bayesian workflow. Following <a href="https://rpubs.com/jimsavage/stanintro">Savage (2016)</a>, the typical workflow can be summarized as follows:</p>
<ol style="list-style-type: decimal">
<li><strong>Specification</strong>: Specify the full probability model including
<ul>
<li>Data</li>
<li>Parameters</li>
<li>Priors</li>
<li>Likelihood</li>
</ul></li>
<li><strong>Model Building</strong>: Translate the model into code</li>
<li><strong>Validation</strong>: Validate the model with fake data</li>
<li><strong>Fitting</strong>: Fit the model to actual data</li>
<li><strong>Diagnosis</strong>: Check generic and algorithm-specific diagnostics to assess convergence</li>
</ol>
<p>We thus start with the notational <em>specification</em> of a probability model and its translation into Stan <em>code</em> (steps 1 + 2). Unless we use ‘canned’ solutions, i.e., packages that generate accurate model code for us, <em>validation</em> of our Stan program using artificial data is essential: It allows us to test whether our model accurately retrieves the prespecified parameters that we used for simulating the artificial data in the first place (step 3). Only once this hurdle has been cleared can we move on to <em>fitting</em>: The process of estimating the model parameters based on an actual data sample (step 4). In order to trust the corresponding estimates, we must then <em>diagnose</em> our estimates using both generic and algorithm-specific diagnostic tools (step 5). In the following, we work through these steps using the example of the linear model.</p>
</div>
<div id="step-1-specification" class="section level5">
<h5>Step 1: Specification</h5>
<p>Stan requires that we be explicit about the known and unknown quantities in our model. This not only requires distinguishing the quantities we know from those we don’t know but also declaring object types (such as integer or real scalars, vectors, matrices, and arrays) and their respective dimensions. In this regard, it is worth quickly reviewing four different (yet fully equivalent) ways of denoting the linear model formula: The scalar, row-vector, column-vector, and matrix forms.</p>
<ol style="list-style-type: decimal">
<li>Scalar form: <span class="math display">\[y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \epsilon_i \text{ for all } i=1,...,N\]</span></li>
<li>Row-vector form: <span class="math display">\[y_i = \mathbf{x_i^{\prime}} \mathbf{\beta} + \epsilon_i  \text{ for all } i=1,...,N\]</span></li>
<li>Column-vector form: <span class="math display">\[\mathbf{y} = \beta_1 \mathbf{x_{1}} + \beta_2 \mathbf{x_{2}} + \beta_3 \mathbf{x_{3}} + \mathbf{\epsilon}\]</span></li>
<li>Matrix form: <span class="math display">\[\mathbf{y = X \beta + \epsilon}\]</span></li>
</ol>
<p>Of course, all four notations are based on the same quantities, which are compactly denoted by an outcome vector <span class="math inline">\(\mathbf{y}\)</span>, a model matrix <span class="math inline">\(\mathbf{X}\)</span>, a coefficient vector <span class="math inline">\(\beta\)</span>, and a vector of idiosyncratic error terms <span class="math inline">\(\epsilon\)</span> in the matrix form. The first three variants only differ from the fourth in that they separate the multiplicative and additive operations entailed in <span class="math inline">\(\mathbf{X \beta}\)</span> across the columns, rows, or cells of <span class="math inline">\(\mathbf{X}\)</span>. Hence, we can most compactly denote our objects according to the matrix form. Note that the model matrix typically contains a leading column of 1’s to multiply the intercept, <span class="math inline">\(\beta_1\)</span>. Therefore, <span class="math inline">\(x_{i1}\)</span> in the scalar form and <span class="math inline">\(\mathbf{x_{1}}\)</span> in the column-vector form are merely placeholders for 1’s and may be omitted notationally.</p>
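<p>A quick R sketch (with arbitrary toy values, not part of the tutorial code) confirms the equivalence: the matrix form <code>X %*% beta</code> reproduces the scalar form computed row by row:</p>

```r
## Equivalence of scalar and matrix forms (toy example)
set.seed(1)
N <- 4; K <- 3
X <- cbind(1, matrix(rnorm(N * (K - 1)), N, K - 1)) # leading column of 1's
beta <- c(0.5, -1, 2)

## Matrix form: X %*% beta
mu_matrix <- as.vector(X %*% beta)

## Scalar form: sum_k beta_k * x_ik, for each observation i
mu_scalar <- sapply(1:N, function(i) sum(beta * X[i, ]))

all.equal(mu_matrix, mu_scalar) # TRUE
```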
<p>Let’s start by denoting the likelihood component of our model, summarized by the three core components of every generalized linear model:</p>
<ul>
<li>family: <span class="math inline">\(\mathbf{y} \sim \text{Normal}(\mu, \sigma)\)</span></li>
<li>(inverse) link function: <span class="math inline">\(\text{id}(\mu) = \mu\)</span></li>
<li>linear component: <span class="math inline">\(\mu = \mathbf{X} \beta\)</span></li>
</ul>
<p>In words, we stipulate that the data be normally distributed with mean <span class="math inline">\(\mu = \mathbf{X} \beta\)</span> and variance <span class="math inline">\(\sigma^2\)</span>. So what are our known and unknown quantities? The unknown quantities are</p>
<ul>
<li><span class="math inline">\(\beta\)</span>, the coefficient vector</li>
<li><span class="math inline">\(\sigma\)</span>, the scale parameter of the normal</li>
<li><span class="math inline">\(\mu\)</span>, the location parameter of the normal</li>
</ul>
<p>whereas the known quantities include</p>
<ul>
<li><span class="math inline">\(\mathbf{y}\)</span>, the outcome vector</li>
<li><span class="math inline">\(\mathbf{X}\)</span>, the model matrix</li>
</ul>
<p>as well as the dimensions of <span class="math inline">\(\mathbf{y}_{N \times 1}\)</span> and <span class="math inline">\(\mathbf{X}_{N \times K}\)</span> and the dimensions of <span class="math inline">\(\beta_{K \times 1}\)</span>, <span class="math inline">\(\sigma\)</span> (a scalar), and <span class="math inline">\(\mu_{N \times 1}\)</span>.</p>
</div>
<div id="step-2-model-building" class="section level5">
<h5>Step 2: Model Building</h5>
<div id="stan-program-blocks" class="section level6">
<h6>Stan Program Blocks</h6>
<p>Stan programs are defined in terms of several <a href="https://mc-stan.org/docs/2_19/reference-manual/overview-of-stans-program-blocks.html">program blocks</a>:</p>
<ol style="list-style-type: decimal">
<li><strong>Functions</strong>: Declare user-written functions</li>
<li><strong>Data</strong>: Declare all known quantities</li>
<li><strong>Transformed Data</strong>: Transform declared data inputs</li>
<li><strong>Parameters</strong>: Declare all unknown quantities</li>
<li><strong>Transformed Parameters</strong>: Transform declared parameters</li>
<li><strong>Model</strong>: Transform parameters, specify prior distributions and likelihoods to define posterior</li>
<li><strong>Generated Quantities</strong>: Generate quantities derived from the updated parameters without feedback into the likelihood</li>
</ol>
<p>The parameter and model blocks are strictly required in order to define the sampling space and draw from the corresponding posterior distribution. Usually, applications also feature data inputs in the data block. The remaining four blocks are optional: Users may or may not specify their own functions, transform the initial data inputs, specify transformed parameters, or compute generated quantities.</p>
<p>Importantly, the blocks have different logics with respect to variable scopes. Functions, (transformed) data and (transformed) parameters have global scope: Once defined, they can be accessed and used in other program blocks. The scope of variables defined in the model and generated quantities blocks, in contrast, is local: Data and parameters defined here can only be accessed within the respective block.</p>
<p>The blocks also differ with respect to execution timing. Functions and (transformed) data are declared once and passed to each of the chains. The processing of the initially declared parameters in the transformed parameters block and definition of the posterior in the model block are evaluated at every so-called <a href="https://mc-stan.org/docs/2_21/reference-manual/hamiltonian-monte-carlo.html">leapfrog step</a> during every iteration of the HMC algorithm. The generated quantities block, in contrast, is only evaluated once per iteration, i.e., it uses the parameter values found as a result of the multi-step operations of the algorithm within each given iteration. Thus, for reasons of computational efficiency, quantities of interest that are derived from our (transformed) parameters but do not need to feed back into the likelihood (e.g., expected values or average marginal effects) should be declared and computed in the generated quantities block.</p>
</div>
<div id="model-scripts" class="section level6">
<h6>Model Scripts</h6>
<p>For the following example, start with a blank script in your preferred code editor and save it as <code>lm.stan</code>. Using the <code>stan</code> suffix will enable syntax highlighting, formatting, and checking in RStudio and Emacs. Throughout the remainder of this subsection, we are coding in the Stan language: We separate declarations and statements by <code>;</code>, type comments after <code>//</code>, and end each script with a blank line. The language is fully documented in the <a href="https://mc-stan.org/docs/2_19/reference-manual/index.html">Stan Language Reference Manual</a> and the <a href="https://mc-stan.org/docs/2_19/functions-reference/index.html">Stan Language Functions Reference</a>.</p>
<p>In the <strong>data block</strong>, we declare all known quantities, including data types, dimensions, and constraints:</p>
<ul>
<li><span class="math inline">\(\mathbf{y}\)</span>, the outcome vector of length <span class="math inline">\(N\)</span></li>
<li><span class="math inline">\(\mathbf{X}\)</span>, the model matrix of dimensions <span class="math inline">\(N \times K\)</span> (including a leading column of 1’s to multiply the intercept)</li>
</ul>
<pre class="stan"><code>data {
  int&lt;lower=1&gt; N; // num. observations
  int&lt;lower=1&gt; K; // num. predictors
  matrix[N,K] x;  // model matrix
  vector[N] y;    // outcome vector
}</code></pre>
<p>As we can see, we need to declare the integers <span class="math inline">\(N\)</span> and <span class="math inline">\(K\)</span> before we can specify the dimensions of the objects <span class="math inline">\(\mathbf{X}\)</span> and <span class="math inline">\(\mathbf{y}\)</span>.</p>
<p>Next, we use the <strong>parameters block</strong> to declare all primitive unknown quantities, including their respective storage types, dimensions, and constraints:</p>
<ul>
<li><span class="math inline">\(\beta\)</span>, the coefficient vector of length <span class="math inline">\(K\)</span></li>
<li><span class="math inline">\(\sigma\)</span>, the scale parameter of the normal, a non-negative real number</li>
</ul>
<pre class="stan"><code>parameters {
  vector[K] beta;      // coef vector
  real&lt;lower=0&gt; sigma; // scale parameter
}</code></pre>
<p>In the <strong>transformed parameters block</strong>, we declare and specify unknown transformed quantities, including storage types, dimensions, and constraints. In the following example, we use the transformed parameters block to compute our linear prediction, <span class="math inline">\(\mu = \mathbf{X} \beta\)</span>.</p>
<p>Note that we could just as well declare <span class="math inline">\(\mu\)</span> in the model block – or not declare <span class="math inline">\(\mu\)</span> as a variable at all but simply supply <span class="math inline">\(\mathbf{X} \beta\)</span> in the log-likelihood instead. While all of these approaches would yield the same posterior, specifying <span class="math inline">\(\mu\)</span> in the transformed parameters block makes an important difference: It ensures that <span class="math inline">\(\mu\)</span> is stored as a global variable. As a result, samples of <span class="math inline">\(\mu\)</span> will also be stored in the resulting output object.</p>
<pre class="stan"><code>transformed parameters {
  vector[N] mu;  // declare
  mu = x * beta; // assign
}</code></pre>
<p>Lastly, we use the <strong>model block</strong> to declare and specify our sampling statements:</p>
<ul>
<li><span class="math inline">\(\beta_k \sim \text{Normal}(0, 10) \text{ for } k = 1,...,K\)</span>; i.e., we assign every <span class="math inline">\(\beta\)</span> coefficient an independent normal prior with a mean of 0 and standard deviation of 10</li>
<li><span class="math inline">\(\sigma \sim \text{Cauchy}^{+}(0, 5)\)</span>; i.e., we assign the scale parameter a Cauchy prior with a location of 0 and a scale of 5. Given that we have constrained the support for <span class="math inline">\(\sigma\)</span> to non-negative values, the values will effectively be sampled from a half-Cauchy distribution</li>
<li><span class="math inline">\(\mathbf{y} \sim \text{Normal}(\mu, \sigma)\)</span>; i.e., we specify a normal log-likelihood, where every observation <span class="math inline">\(y_i\)</span> follows a normal distribution with mean <span class="math inline">\(\mu_i\)</span> and standard deviation <span class="math inline">\(\sigma\)</span>.</li>
</ul>
<pre class="stan"><code>model {
  // priors
  beta ~ normal(0, 10);  // priors for beta
  sigma ~ cauchy(0, 5);  // prior for sigma
  
  // log-likelihood
  target += normal_lpdf(y | mu, sigma);
}</code></pre>
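<p>As a brief aside on the half-Cauchy prior (an illustrative R sketch; the function name <code>dhalfcauchy</code> is ours, not a base R function): constraining a Cauchy(0, 5) distribution to non-negative values simply doubles its density on that support, which still integrates to 1:</p>

```r
## Half-Cauchy density: twice the Cauchy density for x >= 0, zero otherwise
dhalfcauchy <- function(x, location = 0, scale = 5) {
  ifelse(x >= 0, 2 * dcauchy(x, location, scale), 0)
}

x <- seq(0, 20, by = 0.5)
all.equal(dhalfcauchy(x), 2 * dcauchy(x, 0, 5)) # TRUE by construction

## The half-Cauchy density integrates to 1 over [0, Inf)
integrate(dhalfcauchy, 0, Inf)$value
```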
<p>Putting all model blocks together in a single script then gives us our first Stan program.</p>
<details>
<p><summary> Code: Full Stan program for the linear model</summary></p>
<pre class="stan"><code>data {
  int&lt;lower=1&gt; N; // num. observations
  int&lt;lower=1&gt; K; // num. predictors
  matrix[N,K] x; // model matrix
  vector[N] y;    // outcome vector
}

parameters {
  vector[K] beta;      // coef vector
  real&lt;lower=0&gt; sigma; // scale parameter
}

transformed parameters {
  vector[N] mu;  // declare lin. pred.
  mu = x * beta; // assign lin. pred.
}

model {
  // priors
  beta ~ normal(0, 10);  // priors for beta
  sigma ~ cauchy(0, 5);  // prior for sigma
  
  // log-likelihood
  target += normal_lpdf(y | mu, sigma);
}
</code></pre>
</details>
<p><br />
Before we proceed with the next steps of validation, fitting, and diagnosis, we discuss two extensions below: Specifying a <a href="#extension-1-weights">weighted log-likelihood</a> and <a href="#extension-2-standardized-data">standardizing our data</a> as part of our program. These also illustrate the use of the <strong>functions</strong>, <strong>transformed data</strong>, and <strong>generated quantities</strong> blocks. Readers who wish to skip these extensions may continue reading <a href="#validation-and-inference">here</a>.</p>
</div>
<div id="extension-1-weights" class="section level6">
<h6>Extension 1: Weights</h6>
<p>Using sampling, design, or poststratification weights is common practice among survey researchers. Weights ensure that observations do not contribute to the log-likelihood equally, but proportionally to their idiosyncratic weights. We can easily incorporate this feature into our Stan program for the linear model. The hack is fairly straightforward: It requires the definition of a new function as well as a single-line modification in the model block.</p>
<p>Before we define a function that allows us to retrieve the weighted log-likelihood, we first need to understand what the built-in function for the unweighted log-likelihood, <code>normal_lpdf</code>, actually does. <code>normal_lpdf</code> computes the log of the <a href="https://en.wikipedia.org/wiki/Normal_distribution">normal probability density function</a> (pdf) for every observation and sums across the resulting values, which returns a scalar:</p>
<p><span class="math display">\[ \mathtt{normal\_lpdf(y | mu, sigma)} = -\frac{1}{2} \sum_{i=1}^{N} \Bigg[ \log (2 \pi \sigma^2) + \Big( \frac{y_i-\mu_i}{\sigma}\Big)^2 \Bigg]\]</span></p>
<p>The fact that <code>normal_lpdf</code> sums across all observations is somewhat problematic for our purposes. To include weights, we need to weight every single entry in the log normal pdf prior to aggregation. In other words, we need a length-<span class="math inline">\(N\)</span> vector of <span class="math inline">\(\mathtt{normal\_lpdf}\)</span> values that we can then multiply with a length-<span class="math inline">\(N\)</span> vector of weights before we sum across all observations. We therefore define a new function that returns the point-wise log normal pdf:</p>
<pre class="stan"><code>functions {
  vector pw_norm(vector y, vector mu, real sigma) {
    return -0.5 * (log(2 * pi() * square(sigma)) + 
                     square((y - mu) / sigma));
  }
}</code></pre>
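<p>Before committing this to Stan code, we can sanity-check the point-wise formula in R (a quick sketch with arbitrary values; <code>pw_norm_r</code> is our own helper name): it should reproduce R’s built-in <code>dnorm(..., log = TRUE)</code>:</p>

```r
## R analogue of the pw_norm() Stan function
pw_norm_r <- function(y, mu, sigma) {
  -0.5 * (log(2 * pi * sigma^2) + ((y - mu) / sigma)^2)
}

y <- c(1.2, -0.4, 3.1)
mu <- c(1.0, 0.0, 2.5)
sigma <- 1.5

all.equal(pw_norm_r(y, mu, sigma), dnorm(y, mu, sigma, log = TRUE)) # TRUE
```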
<p>The remaining modifications are straightforward. First, we declare a vector of length <span class="math inline">\(N\)</span> with idiosyncratic weights in the data block. We then use this vector in the model block, where we take the dot product of the <code>weights</code> and the vector of log-likelihood entries generated by <code>pw_norm</code>:</p>
<pre class="stan"><code>data {
  ...
  vector&lt;lower=0&gt;[N] weights;  // weights
}

...

model {
  ...
  
  // weighted log-likelihood
  target += dot_product(weights, pw_norm(y, mu, sigma));
}
</code></pre>
<p>The dot product returns the sum of the pairwise products of entries of both vectors, which gives us our weighted log-likelihood:</p>
<p><span class="math display">\[\mathtt{dot\_product(weights, pw\_norm(y, mu, sigma))} = \\ -\frac{1}{2} \sum_{i=1}^{N}\mathtt{weights}_i \Bigg[ \log (2 \pi \sigma^2) + \Big( \frac{y_i-\mu_i}{\sigma}\Big)^2 \Bigg]\]</span></p>
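<p>A small R sketch (with arbitrary toy values) confirms that the dot product of weights and point-wise log-densities equals the weighted sum of <code>dnorm(..., log = TRUE)</code> terms:</p>

```r
## Weighted normal log-likelihood via a dot product (toy example)
y <- c(1.2, -0.4, 3.1)
mu <- c(1.0, 0.0, 2.5)
sigma <- 1.5
weights <- c(0.8, 1.1, 1.4)

## Point-wise log normal pdf, as computed by the Stan function pw_norm()
lpdf <- -0.5 * (log(2 * pi * sigma^2) + ((y - mu) / sigma)^2)

## Dot product = sum of pairwise products
wll <- sum(weights * lpdf)
all.equal(wll, sum(weights * dnorm(y, mu, sigma, log = TRUE))) # TRUE
```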
</div>
<div id="extension-2-standardized-data" class="section level6">
<h6>Extension 2: Standardized Data</h6>
<p>For our second extension, we follow <a href="https://mc-stan.org/docs/2_19/stan-users-guide/standardizing-predictors-and-outputs.html">this example</a> from the Stan User’s Guide. Here, we use the transformed data block to standardize our outcome variable and predictors. Standardization can improve sampling efficiency, helping the sampler converge to the posterior distribution faster.</p>
<p>Our initial data inputs remain the same as in the original example: We declare <span class="math inline">\(N\)</span>, <span class="math inline">\(K\)</span>, <span class="math inline">\(\mathbf{X}\)</span> and <span class="math inline">\(\mathbf{y}\)</span> in the data block. We then use the transformed data block to standardize both <span class="math inline">\(\mathbf{y}\)</span> and every column of <span class="math inline">\(\mathbf{X}\)</span> (except the leading column of 1’s that multiplies the intercept). This means that for each variable, we first subtract its mean from each value and then divide the centered values by the variable’s standard deviation.</p>
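<p>In R terms, standardization is a two-line operation (a sketch with toy data; in the Stan program it happens in the transformed data block):</p>

```r
## Standardizing a variable: center, then divide by the standard deviation
set.seed(42)
x <- rnorm(100, mean = 10, sd = 3)
x_std <- (x - mean(x)) / sd(x)

## The standardized variable has mean 0 and standard deviation 1
mean(x_std) # numerically 0
sd(x_std)   # 1
```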
<pre class="stan"><code>transformed data {
  vector[K-1] sd_x;       // std. dev. of predictors (excl. intercept)
  vector[K-1] mean_x;     // mean of predictors (excl. intercept)
  vector[N] y_std;        // std. outcome 
  matrix[N,K] x_std = x;  // std. predictors
  
  y_std = (y - mean(y)) / sd(y);
  for (k in 2:K) {
    mean_x[k-1] = mean(x[,k]);
    sd_x[k-1] = sd(x[,k]);
    x_std[,k] = (x[,k] - mean_x[k-1]) / sd_x[k-1]; 
  }
}</code></pre>
<p>As a result of standardization, the posterior distributions of our estimated parameters change. This is, however, no cause for concern: We can simply use the generated quantities block to transform these parameters back to their original scale. Before we get there, we simply declare the parameters that we retrieve using the standardized data in the parameters block (named <code>beta_std</code> and <code>sigma_std</code> here) and complete the transformed parameters and model blocks analogous to the original example, using the standardized data, <code>x_std</code> and <code>y_std</code>, and the alternative parameters, <code>beta_std</code>, <code>sigma_std</code> and <code>mu_std</code>.</p>
<pre class="stan"><code>parameters {
  vector[K] beta_std;      // coef vector
  real&lt;lower=0&gt; sigma_std; // scale parameter
}

transformed parameters {
  vector[N] mu_std;          // declare lin. pred.
  mu_std = x_std * beta_std; // assign lin. pred.
}

model {
  // priors
  beta_std ~ normal(0, 10);  // priors for beta
  sigma_std ~ cauchy(0, 5);  // prior for sigma
  
  // log-likelihood
  target += normal_lpdf(y_std | mu_std, sigma_std); // likelihood
}</code></pre>
<p>Using our draws for the alternative parameters <code>beta_std</code> and <code>sigma_std</code>, we can then use a little algebra to retrieve the original parameters:</p>
<ul>
<li><span class="math inline">\(\beta_1 = \text{sd}(y) \Big(\beta_1^{\text{std}} - \sum_{k=2}^{K} \beta_k^{\text{std}} \frac{\bar{x}_k}{\text{sd}(x_k)}\Big) + \bar{y}\)</span></li>
<li><span class="math inline">\(\beta_k = \beta_k^{\text{std}} \frac{\text{sd}(y)}{\text{sd}(x_k)} \text{ for } k = 2,...,K\)</span></li>
<li><span class="math inline">\(\sigma = \text{sd}(y) \sigma^{\text{std}}\)</span></li>
</ul>
<p>These calculations are implemented in the generated quantities block below:</p>
<pre class="stan"><code>generated quantities {
  vector[K] beta;          // coef vector
  real&lt;lower=0&gt; sigma;     // scale parameter
  
  beta[1] = sd(y) * (beta_std[1] - 
    dot_product(beta_std[2:K], mean_x ./ sd_x)) + mean(y);
  beta[2:K] = beta_std[2:K] ./ sd_x * sd(y);
  sigma = sd(y) * sigma_std;
}</code></pre>
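<p>As a quick check of this algebra outside of Stan (a sketch using OLS via <code>lm()</code> on toy data), back-transformed coefficients from a standardized fit exactly match a fit on the raw data:</p>

```r
## Back-transforming coefficients from standardized to original scale
set.seed(123)
N <- 1000
x1 <- rnorm(N); x2 <- rnorm(N, 2, 3)
y <- 1 + 0.5 * x1 - 2 * x2 + rnorm(N)

## Fit on standardized data
x1s <- (x1 - mean(x1)) / sd(x1)
x2s <- (x2 - mean(x2)) / sd(x2)
ys  <- (y - mean(y)) / sd(y)
b_std <- coef(lm(ys ~ x1s + x2s))

## Back-transform the slopes and the intercept
b2 <- b_std["x1s"] * sd(y) / sd(x1)
b3 <- b_std["x2s"] * sd(y) / sd(x2)
b1 <- sd(y) * (b_std["(Intercept)"] -
        b_std["x1s"] * mean(x1) / sd(x1) -
        b_std["x2s"] * mean(x2) / sd(x2)) + mean(y)

## Compare to the fit on raw data: coefficients agree
coef(lm(y ~ x1 + x2))
c(b1, b2, b3)
```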
</div>
</div>
<div id="step-3-validation" class="section level5">
<h5>Step 3: Validation</h5>
<p>Validation and inference are two sides of the same coin. Whereas validation means that we fit our Stan program to fake data, inference means that we fit the same model to actual data to retrieve posterior distributions that speak to substantively meaningful questions.</p>
<p>Why start with fake data? Generating fake data allows us to mimic the data generating process underlying our model: By generating artificial predictors and arbitrarily choosing the ‘true’ model parameters, we can simulate the data generating process stipulated by our model to compute values of our outcome variable that we should observe given the predictors, the model parameters, and the likelihood function.</p>
<p>In turn, running our model using the simulated outcomes and artificial predictors should return parameter estimates close to the ‘true’ model parameters from which the outcome variable was simulated. If this is not the case, we know something went wrong: Assuming that our data-generating program is correct, there must be a problem in our model program. We should then go back to the model program and make sure that all parts of the script, including functions, data, parameterization, transformations, and the likelihood, are correctly specified.</p>
<p>The R code below implements our data-generating program: It simulates fake data which we will use to validate our Stan program for the linear model. After setting a seed for reproducibility, we simulate a model matrix <span class="math inline">\(\mathbf{X}\)</span> with <span class="math inline">\(N=10000\)</span> rows and <span class="math inline">\(K=5\)</span> columns. Next to a leading column of 1’s, this matrix has four predictors generated from independent standard normal distributions. Next, we generate the true model parameters from their respective prior distributions: The <span class="math inline">\(\beta\)</span> vector and the scale coefficient <span class="math inline">\(\sigma\)</span>. We then retrieve the linear prediction, <span class="math inline">\(\mu = \mathbf{X} \beta\)</span>, and use <span class="math inline">\(\mu\)</span> and <span class="math inline">\(\sigma\)</span> to simulate <span class="math inline">\(\mathbf{y}\)</span> according to the likelihood of our model, <span class="math inline">\(\mathbf{y} \sim \text{N}(\mu,\sigma)\)</span>.</p>
<details>
<p><summary> Code: Fake data generation</summary></p>
<pre class="r"><code>set.seed(20190417)
N.sim &lt;- 10000L                               ### num. observations
K.sim &lt;- 5L                                   ### num. predictors
x.sim &lt;- cbind(                               ### model matrix
  rep(1, N.sim), 
  matrix(rnorm(N.sim * (K.sim - 1)), N.sim, (K.sim - 1))
  )
beta.sim &lt;- rnorm(K.sim, 0, 10)               ### coef. vector
sigma.sim &lt;- abs(rcauchy(1, 0, 5))            ### scale parameter
mu.sim &lt;- x.sim %*% beta.sim                  ### linear prediction
y.sim &lt;- rnorm(N.sim, mu.sim, sigma.sim)      ### simulated outcome</code></pre>
</details>
<p><br />
Now that we have our fake data, we can set up for the next step, model validation. First, we load the <code>rstan</code> package and adjust some options to improve performance. Second, we collect the data we want to pass to our Stan program in a list named <code>standat.sim</code>. <strong>It is important that the object names in this list match the data declared in the data block of our program</strong>. This means that all data declared in the data block must be included in the list, named exactly as in the model script, and must have matching object and storage types (e.g., matrices should be matrices, integers should be integers, etc). Lastly, we compile our linear model using the <code>stan_model()</code> function. In this step, Stan uses a C++ compiler that translates the Stan program to C++ code.</pre>
<pre class="r"><code>## Setup
library(rstan)
rstan_options(auto_write = TRUE)             ### avoid recompilation of models
options(mc.cores = parallel::detectCores())  ### parallelize across all CPUs
Sys.setenv(LOCAL_CPPFLAGS = &#39;-march=native&#39;) ### improve execution time

## Data (see data block) as list
standat.sim &lt;- list(
  N = N.sim,
  K = K.sim,
  x = x.sim,
  y = y.sim
)

## C++ Compilation
lm.mod &lt;- stan_model(file = &quot;code/lm.stan&quot;)</code></pre>
<p>We are now ready to run our model. Using the <code>sampling()</code> command, we retrieve a total of 2000 samples from the posterior distribution of <code>pars = c("beta", "sigma")</code>. Specifically, we let <code>chains = 4L</code> run in parallel across <code>cores = 4L</code> for <code>iter = 2000L</code>, each using <code>algorithm = "NUTS"</code>, the No U-Turn Sampler variant of the Hamiltonian Monte Carlo algorithm. We then discard the first <code>warmup = 1000L</code> samples of each chain and thin the remaining samples of each chain by a factor of <code>thin = 2L</code>. For an explanation of additional options, see <code>?rstan::sampling</code>.</p>
<pre class="r"><code>lm.sim &lt;- sampling(lm.mod,                            ### compiled model
                   data = standat.sim,                ### data input
                   algorithm = &quot;NUTS&quot;,                ### algorithm
                   control = list(                    ### control arguments
                     adapt_delta = .85
                     ),
                   save_warmup = FALSE,               ### discard warmup samples
                   sample_file = NULL,                ### no sample file
                   diagnostic_file = NULL,            ### no diagnostic file
                   pars = c(&quot;beta&quot;, &quot;sigma&quot;),         ### select parameters
                   iter = 2000L,                      ### iter per chain
                   warmup = 1000L,                    ### warmup period
                   thin = 2L,                         ### thinning factor
                   chains = 4L,                       ### num. chains
                   cores = 4L,                        ### num. cores
                   seed = 20190417)                   ### seed</code></pre>
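<p>The total of 2000 retained samples follows directly from these settings (a quick arithmetic check): each chain keeps <code>(iter - warmup) / thin</code> draws, and four chains run in parallel:</p>

```r
## Retained posterior draws implied by the sampling() settings above
iter   <- 2000L
warmup <- 1000L
thin   <- 2L
chains <- 4L

draws_per_chain <- (iter - warmup) / thin  # 500
total_draws <- draws_per_chain * chains    # 2000
total_draws
```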
<p>The <code>sampling()</code> command generates a fitted Stan model of class <code>stanfit</code>, which we have stored as an object named <code>lm.sim</code>. Next to the samples from the posterior distributions of all chains (stored under <code>lm.sim@sim$samples</code>), this object contains extensive information on the specification of our Stan model and the (default) inputs to the <code>sampling()</code> command. The full structure of the <code>stanfit</code> object can be inspected below. <code>?rstan::stanfit</code> also presents functions and methods for retrieving and summarizing the desired information from a <code>stanfit</code> object.</p>
<details>
<p><summary> Code: Structure of the <code>stanfit</code> object</summary></p>
<pre class="r"><code>str(lm.sim)</code></pre>
<pre><code>## Formal class &#39;stanfit&#39; [package &quot;rstan&quot;] with 10 slots
##   ..@ model_name: chr &quot;lm&quot;
##   ..@ model_pars: chr [1:4] &quot;beta&quot; &quot;sigma&quot; &quot;mu&quot; &quot;lp__&quot;
##   ..@ par_dims  :List of 4
##   .. ..$ beta : num 5
##   .. ..$ sigma: num(0) 
##   .. ..$ mu   : num 10000
##   .. ..$ lp__ : num(0) 
##   ..@ mode      : int 0
##   ..@ sim       :List of 12
##   .. ..$ samples    :List of 4
##   .. .. ..$ :List of 7
##   .. .. .. ..$ beta[1]: num [1:500] -1.15 -0.953 -1.059 -1.144 -1.092 ...
##   .. .. .. ..$ beta[2]: num [1:500] 5.55 5.55 5.72 5.69 5.67 ...
##   .. .. .. ..$ beta[3]: num [1:500] 17.5 17.5 17.4 17.4 17.4 ...
##   .. .. .. ..$ beta[4]: num [1:500] -0.914 -0.897 -0.979 -1.044 -1.105 ...
##   .. .. .. ..$ beta[5]: num [1:500] -8.81 -8.75 -9.05 -8.91 -8.92 ...
##   .. .. .. ..$ sigma  : num [1:500] 11.7 12 11.9 11.9 11.9 ...
##   .. .. .. ..$ lp__   : num [1:500] -38892 -38894 -38893 -38893 -38892 ...
##   .. .. .. ..- attr(*, &quot;test_grad&quot;)= logi FALSE
##   .. .. .. ..- attr(*, &quot;args&quot;)=List of 16
##   .. .. .. .. ..$ append_samples    : logi FALSE
##   .. .. .. .. ..$ chain_id          : num 1
##   .. .. .. .. ..$ control           :List of 12
##   .. .. .. .. .. ..$ adapt_delta      : num 0.85
##   .. .. .. .. .. ..$ adapt_engaged    : logi TRUE
##   .. .. .. .. .. ..$ adapt_gamma      : num 0.05
##   .. .. .. .. .. ..$ adapt_init_buffer: num 75
##   .. .. .. .. .. ..$ adapt_kappa      : num 0.75
##   .. .. .. .. .. ..$ adapt_t0         : num 10
##   .. .. .. .. .. ..$ adapt_term_buffer: num 50
##   .. .. .. .. .. ..$ adapt_window     : num 25
##   .. .. .. .. .. ..$ max_treedepth    : int 10
##   .. .. .. .. .. ..$ metric           : chr &quot;diag_e&quot;
##   .. .. .. .. .. ..$ stepsize         : num 1
##   .. .. .. .. .. ..$ stepsize_jitter  : num 0
##   .. .. .. .. ..$ enable_random_init: logi TRUE
##   .. .. .. .. ..$ init              : chr &quot;random&quot;
##   .. .. .. .. ..$ init_list         : NULL
##   .. .. .. .. ..$ init_radius       : num 2
##   .. .. .. .. ..$ iter              : int 2000
##   .. .. .. .. ..$ method            : chr &quot;sampling&quot;
##   .. .. .. .. ..$ random_seed       : chr &quot;20190417&quot;
##   .. .. .. .. ..$ refresh           : int 200
##   .. .. .. .. ..$ sampler_t         : chr &quot;NUTS(diag_e)&quot;
##   .. .. .. .. ..$ save_warmup       : logi FALSE
##   .. .. .. .. ..$ test_grad         : logi FALSE
##   .. .. .. .. ..$ thin              : int 2
##   .. .. .. .. ..$ warmup            : int 1000
##   .. .. .. ..- attr(*, &quot;inits&quot;)= num [1:10006] -0.5082 0.2635 -0.3416 -0.2569 -0.0969 ...
##   .. .. .. ..- attr(*, &quot;mean_pars&quot;)= num [1:10006] -1.07 5.58 17.53 -1.01 -8.9 ...
##   .. .. .. ..- attr(*, &quot;mean_lp__&quot;)= num -38894
##   .. .. .. ..- attr(*, &quot;adaptation_info&quot;)= chr &quot;# Adaptation terminated\n# Step size = 0.678845\n# Diagonal elements of inverse mass matrix:\n# 0.0129191, 0.01&quot;| __truncated__
##   .. .. .. ..- attr(*, &quot;elapsed_time&quot;)= Named num [1:2] 6.45 5.53
##   .. .. .. .. ..- attr(*, &quot;names&quot;)= chr [1:2] &quot;warmup&quot; &quot;sample&quot;
##   .. .. .. ..- attr(*, &quot;sampler_params&quot;)=List of 6
##   .. .. .. .. ..$ accept_stat__: num [1:500] 0.799 0.844 0.82 0.802 0.959 ...
##   .. .. .. .. ..$ stepsize__   : num [1:500] 0.679 0.679 0.679 0.679 0.679 ...
##   .. .. .. .. ..$ treedepth__  : num [1:500] 3 3 2 3 3 2 3 3 3 2 ...
##   .. .. .. .. ..$ n_leapfrog__ : num [1:500] 7 7 7 7 7 3 7 7 7 3 ...
##   .. .. .. .. ..$ divergent__  : num [1:500] 0 0 0 0 0 0 0 0 0 0 ...
##   .. .. .. .. ..$ energy__     : num [1:500] 38896 38896 38899 38898 38896 ...
##   .. .. .. ..- attr(*, &quot;return_code&quot;)= int 0
##   .. .. ..$ :List of 7
##   .. .. .. ..$ beta[1]: num [1:500] -0.946 -0.908 -1.164 -1.224 -1.206 ...
##   .. .. .. ..$ beta[2]: num [1:500] 5.66 5.76 5.64 5.45 5.61 ...
##   .. .. .. ..$ beta[3]: num [1:500] 17.6 17.6 17.8 17.7 17.4 ...
##   .. .. .. ..$ beta[4]: num [1:500] -1.094 -1.124 -1.104 -1.111 -0.972 ...
##   .. .. .. ..$ beta[5]: num [1:500] -8.78 -8.92 -9.11 -8.85 -9 ...
##   .. .. .. ..$ sigma  : num [1:500] 12 11.8 12 11.9 11.9 ...
##   .. .. .. ..$ lp__   : num [1:500] -38894 -38893 -38897 -38894 -38892 ...
##   .. .. .. ..- attr(*, &quot;test_grad&quot;)= logi FALSE
##   .. .. .. ..- attr(*, &quot;args&quot;)=List of 16
##   .. .. .. .. ..$ append_samples    : logi FALSE
##   .. .. .. .. ..$ chain_id          : num 2
##   .. .. .. .. ..$ control           :List of 12
##   .. .. .. .. .. ..$ adapt_delta      : num 0.85
##   .. .. .. .. .. ..$ adapt_engaged    : logi TRUE
##   .. .. .. .. .. ..$ adapt_gamma      : num 0.05
##   .. .. .. .. .. ..$ adapt_init_buffer: num 75
##   .. .. .. .. .. ..$ adapt_kappa      : num 0.75
##   .. .. .. .. .. ..$ adapt_t0         : num 10
##   .. .. .. .. .. ..$ adapt_term_buffer: num 50
##   .. .. .. .. .. ..$ adapt_window     : num 25
##   .. .. .. .. .. ..$ max_treedepth    : int 10
##   .. .. .. .. .. ..$ metric           : chr &quot;diag_e&quot;
##   .. .. .. .. .. ..$ stepsize         : num 1
##   .. .. .. .. .. ..$ stepsize_jitter  : num 0
##   .. .. .. .. ..$ enable_random_init: logi TRUE
##   .. .. .. .. ..$ init              : chr &quot;random&quot;
##   .. .. .. .. ..$ init_list         : NULL
##   .. .. .. .. ..$ init_radius       : num 2
##   .. .. .. .. ..$ iter              : int 2000
##   .. .. .. .. ..$ method            : chr &quot;sampling&quot;
##   .. .. .. .. ..$ random_seed       : chr &quot;20190417&quot;
##   .. .. .. .. ..$ refresh           : int 200
##   .. .. .. .. ..$ sampler_t         : chr &quot;NUTS(diag_e)&quot;
##   .. .. .. .. ..$ save_warmup       : logi FALSE
##   .. .. .. .. ..$ test_grad         : logi FALSE
##   .. .. .. .. ..$ thin              : int 2
##   .. .. .. .. ..$ warmup            : int 1000
##   .. .. .. ..- attr(*, &quot;inits&quot;)= num [1:10006] 1.822 -1.301 0.447 1.979 1.029 ...
##   .. .. .. ..- attr(*, &quot;mean_pars&quot;)= num [1:10006] -1.07 5.57 17.53 -1.01 -8.91 ...
##   .. .. .. ..- attr(*, &quot;mean_lp__&quot;)= num -38894
##   .. .. .. ..- attr(*, &quot;adaptation_info&quot;)= chr &quot;# Adaptation terminated\n# Step size = 0.598511\n# Diagonal elements of inverse mass matrix:\n# 0.0145165, 0.01&quot;| __truncated__
##   .. .. .. ..- attr(*, &quot;elapsed_time&quot;)= Named num [1:2] 5.77 5.77
##   .. .. .. .. ..- attr(*, &quot;names&quot;)= chr [1:2] &quot;warmup&quot; &quot;sample&quot;
##   .. .. .. ..- attr(*, &quot;sampler_params&quot;)=List of 6
##   .. .. .. .. ..$ accept_stat__: num [1:500] 0.86 0.936 0.878 0.98 0.996 ...
##   .. .. .. .. ..$ stepsize__   : num [1:500] 0.599 0.599 0.599 0.599 0.599 ...
##   .. .. .. .. ..$ treedepth__  : num [1:500] 3 3 3 3 3 3 3 3 2 3 ...
##   .. .. .. .. ..$ n_leapfrog__ : num [1:500] 7 7 7 7 7 7 7 7 3 7 ...
##   .. .. .. .. ..$ divergent__  : num [1:500] 0 0 0 0 0 0 0 0 0 0 ...
##   .. .. .. .. ..$ energy__     : num [1:500] 38897 38897 38900 38897 38895 ...
##   .. .. .. ..- attr(*, &quot;return_code&quot;)= int 0
##   .. .. ..$ :List of 7
##   .. .. .. ..$ beta[1]: num [1:500] -0.887 -0.944 -1.084 -1.181 -1.011 ...
##   .. .. .. ..$ beta[2]: num [1:500] 5.6 5.51 5.56 5.55 5.49 ...
##   .. .. .. ..$ beta[3]: num [1:500] 17.5 17.5 17.6 17.4 17.6 ...
##   .. .. .. ..$ beta[4]: num [1:500] -1.058 -1.033 -0.991 -0.96 -0.919 ...
##   .. .. .. ..$ beta[5]: num [1:500] -8.85 -8.99 -8.81 -9.1 -8.88 ...
##   .. .. .. ..$ sigma  : num [1:500] 11.8 11.8 11.9 11.7 11.9 ...
##   .. .. .. ..$ lp__   : num [1:500] -38892 -38892 -38891 -38894 -38891 ...
##   .. .. .. ..- attr(*, &quot;test_grad&quot;)= logi FALSE
##   .. .. .. ..- attr(*, &quot;args&quot;)=List of 16
##   .. .. .. .. ..$ append_samples    : logi FALSE
##   .. .. .. .. ..$ chain_id          : num 3
##   .. .. .. .. ..$ control           :List of 12
##   .. .. .. .. .. ..$ adapt_delta      : num 0.85
##   .. .. .. .. .. ..$ adapt_engaged    : logi TRUE
##   .. .. .. .. .. ..$ adapt_gamma      : num 0.05
##   .. .. .. .. .. ..$ adapt_init_buffer: num 75
##   .. .. .. .. .. ..$ adapt_kappa      : num 0.75
##   .. .. .. .. .. ..$ adapt_t0         : num 10
##   .. .. .. .. .. ..$ adapt_term_buffer: num 50
##   .. .. .. .. .. ..$ adapt_window     : num 25
##   .. .. .. .. .. ..$ max_treedepth    : int 10
##   .. .. .. .. .. ..$ metric           : chr &quot;diag_e&quot;
##   .. .. .. .. .. ..$ stepsize         : num 1
##   .. .. .. .. .. ..$ stepsize_jitter  : num 0
##   .. .. .. .. ..$ enable_random_init: logi TRUE
##   .. .. .. .. ..$ init              : chr &quot;random&quot;
##   .. .. .. .. ..$ init_list         : NULL
##   .. .. .. .. ..$ init_radius       : num 2
##   .. .. .. .. ..$ iter              : int 2000
##   .. .. .. .. ..$ method            : chr &quot;sampling&quot;
##   .. .. .. .. ..$ random_seed       : chr &quot;20190417&quot;
##   .. .. .. .. ..$ refresh           : int 200
##   .. .. .. .. ..$ sampler_t         : chr &quot;NUTS(diag_e)&quot;
##   .. .. .. .. ..$ save_warmup       : logi FALSE
##   .. .. .. .. ..$ test_grad         : logi FALSE
##   .. .. .. .. ..$ thin              : int 2
##   .. .. .. .. ..$ warmup            : int 1000
##   .. .. .. ..- attr(*, &quot;inits&quot;)= num [1:10006] -0.563 1.732 -1.012 1.965 -1.216 ...
##   .. .. .. ..- attr(*, &quot;mean_pars&quot;)= num [1:10006] -1.07 5.56 17.54 -1.01 -8.92 ...
##   .. .. .. ..- attr(*, &quot;mean_lp__&quot;)= num -38893
##   .. .. .. ..- attr(*, &quot;adaptation_info&quot;)= chr &quot;# Adaptation terminated\n# Step size = 0.673836\n# Diagonal elements of inverse mass matrix:\n# 0.0157556, 0.01&quot;| __truncated__
##   .. .. .. ..- attr(*, &quot;elapsed_time&quot;)= Named num [1:2] 6.92 4.42
##   .. .. .. .. ..- attr(*, &quot;names&quot;)= chr [1:2] &quot;warmup&quot; &quot;sample&quot;
##   .. .. .. ..- attr(*, &quot;sampler_params&quot;)=List of 6
##   .. .. .. .. ..$ accept_stat__: num [1:500] 0.967 0.784 0.85 0.869 0.995 ...
##   .. .. .. .. ..$ stepsize__   : num [1:500] 0.674 0.674 0.674 0.674 0.674 ...
##   .. .. .. .. ..$ treedepth__  : num [1:500] 3 3 3 3 3 3 2 3 3 3 ...
##   .. .. .. .. ..$ n_leapfrog__ : num [1:500] 7 7 7 7 7 7 3 7 7 7 ...
##   .. .. .. .. ..$ divergent__  : num [1:500] 0 0 0 0 0 0 0 0 0 0 ...
##   .. .. .. .. ..$ energy__     : num [1:500] 38894 38896 38895 38896 38892 ...
##   .. .. .. ..- attr(*, &quot;return_code&quot;)= int 0
##   .. .. ..$ :List of 7
##   .. .. .. ..$ beta[1]: num [1:500] -0.866 -1.339 -1.297 -1.238 -1.016 ...
##   .. .. .. ..$ beta[2]: num [1:500] 5.23 5.63 5.7 5.56 5.64 ...
##   .. .. .. ..$ beta[3]: num [1:500] 17.5 17.7 17.4 17.8 17.7 ...
##   .. .. .. ..$ beta[4]: num [1:500] -1 -1 -1.3 -1.08 -1.08 ...
##   .. .. .. ..$ beta[5]: num [1:500] -8.82 -8.9 -8.76 -8.99 -8.82 ...
##   .. .. .. ..$ sigma  : num [1:500] 11.7 11.9 11.9 11.9 11.8 ...
##   .. .. .. ..$ lp__   : num [1:500] -38897 -38894 -38898 -38894 -38893 ...
##   .. .. .. ..- attr(*, &quot;test_grad&quot;)= logi FALSE
##   .. .. .. ..- attr(*, &quot;args&quot;)=List of 16
##   .. .. .. .. ..$ append_samples    : logi FALSE
##   .. .. .. .. ..$ chain_id          : num 4
##   .. .. .. .. ..$ control           :List of 12
##   .. .. .. .. .. ..$ adapt_delta      : num 0.85
##   .. .. .. .. .. ..$ adapt_engaged    : logi TRUE
##   .. .. .. .. .. ..$ adapt_gamma      : num 0.05
##   .. .. .. .. .. ..$ adapt_init_buffer: num 75
##   .. .. .. .. .. ..$ adapt_kappa      : num 0.75
##   .. .. .. .. .. ..$ adapt_t0         : num 10
##   .. .. .. .. .. ..$ adapt_term_buffer: num 50
##   .. .. .. .. .. ..$ adapt_window     : num 25
##   .. .. .. .. .. ..$ max_treedepth    : int 10
##   .. .. .. .. .. ..$ metric           : chr &quot;diag_e&quot;
##   .. .. .. .. .. ..$ stepsize         : num 1
##   .. .. .. .. .. ..$ stepsize_jitter  : num 0
##   .. .. .. .. ..$ enable_random_init: logi TRUE
##   .. .. .. .. ..$ init              : chr &quot;random&quot;
##   .. .. .. .. ..$ init_list         : NULL
##   .. .. .. .. ..$ init_radius       : num 2
##   .. .. .. .. ..$ iter              : int 2000
##   .. .. .. .. ..$ method            : chr &quot;sampling&quot;
##   .. .. .. .. ..$ random_seed       : chr &quot;20190417&quot;
##   .. .. .. .. ..$ refresh           : int 200
##   .. .. .. .. ..$ sampler_t         : chr &quot;NUTS(diag_e)&quot;
##   .. .. .. .. ..$ save_warmup       : logi FALSE
##   .. .. .. .. ..$ test_grad         : logi FALSE
##   .. .. .. .. ..$ thin              : int 2
##   .. .. .. .. ..$ warmup            : int 1000
##   .. .. .. ..- attr(*, &quot;inits&quot;)= num [1:10006] 0.6569 1.5602 -0.0956 -0.5841 -0.6011 ...
##   .. .. .. ..- attr(*, &quot;mean_pars&quot;)= num [1:10006] -1.06 5.56 17.54 -1.01 -8.92 ...
##   .. .. .. ..- attr(*, &quot;mean_lp__&quot;)= num -38894
##   .. .. .. ..- attr(*, &quot;adaptation_info&quot;)= chr &quot;# Adaptation terminated\n# Step size = 0.611781\n# Diagonal elements of inverse mass matrix:\n# 0.0148783, 0.01&quot;| __truncated__
##   .. .. .. ..- attr(*, &quot;elapsed_time&quot;)= Named num [1:2] 9.33 3.28
##   .. .. .. .. ..- attr(*, &quot;names&quot;)= chr [1:2] &quot;warmup&quot; &quot;sample&quot;
##   .. .. .. ..- attr(*, &quot;sampler_params&quot;)=List of 6
##   .. .. .. .. ..$ accept_stat__: num [1:500] 0.99 0.995 0.754 1 0.88 ...
##   .. .. .. .. ..$ stepsize__   : num [1:500] 0.612 0.612 0.612 0.612 0.612 ...
##   .. .. .. .. ..$ treedepth__  : num [1:500] 3 3 3 3 2 2 3 3 2 3 ...
##   .. .. .. .. ..$ n_leapfrog__ : num [1:500] 7 7 7 7 7 7 7 7 3 7 ...
##   .. .. .. .. ..$ divergent__  : num [1:500] 0 0 0 0 0 0 0 0 0 0 ...
##   .. .. .. .. ..$ energy__     : num [1:500] 38899 38899 38900 38899 38898 ...
##   .. .. .. ..- attr(*, &quot;return_code&quot;)= int 0
##   .. ..$ chains     : int 4
##   .. ..$ iter       : int 2000
##   .. ..$ thin       : int 2
##   .. ..$ warmup     : int 1000
##   .. ..$ n_save     : num [1:4] 500 500 500 500
##   .. ..$ warmup2    : int [1:4] 0 0 0 0
##   .. ..$ permutation:List of 4
##   .. .. ..$ : int [1:500] 164 31 283 233 388 51 98 306 468 328 ...
##   .. .. ..$ : int [1:500] 175 474 361 87 172 74 413 150 6 111 ...
##   .. .. ..$ : int [1:500] 378 237 220 182 196 409 485 78 414 405 ...
##   .. .. ..$ : int [1:500] 177 453 82 86 436 365 300 483 415 135 ...
##   .. ..$ pars_oi    : chr [1:3] &quot;beta&quot; &quot;sigma&quot; &quot;lp__&quot;
##   .. ..$ dims_oi    :List of 3
##   .. .. ..$ beta : num 5
##   .. .. ..$ sigma: num(0) 
##   .. .. ..$ lp__ : num(0) 
##   .. ..$ fnames_oi  : chr [1:7] &quot;beta[1]&quot; &quot;beta[2]&quot; &quot;beta[3]&quot; &quot;beta[4]&quot; ...
##   .. ..$ n_flatnames: int 7
##   ..@ inits     :List of 4
##   .. ..$ :List of 3
##   .. .. ..$ beta : num [1:5(1d)] -0.5082 0.2635 -0.3416 -0.2569 -0.0969
##   .. .. ..$ sigma: num 0.587
##   .. .. ..$ mu   : num [1:10000(1d)] -0.9569 0.5379 -0.9832 -0.0281 -0.5676 ...
##   .. ..$ :List of 3
##   .. .. ..$ beta : num [1:5(1d)] 1.822 -1.301 0.447 1.979 1.029
##   .. .. ..$ sigma: num 5.6
##   .. .. ..$ mu   : num [1:10000(1d)] 6.32 -1.64 4.65 -1.37 3.48 ...
##   .. ..$ :List of 3
##   .. .. ..$ beta : num [1:5(1d)] -0.563 1.732 -1.012 1.965 -1.216
##   .. .. ..$ sigma: num 0.177
##   .. .. ..$ mu   : num [1:10000(1d)] -0.3923 1.2545 3.5917 0.0717 0.9679 ...
##   .. ..$ :List of 3
##   .. .. ..$ beta : num [1:5(1d)] 0.6569 1.5602 -0.0956 -0.5841 -0.6011
##   .. .. ..$ sigma: num 5.47
##   .. .. ..$ mu   : num [1:10000(1d)] -3.0095 1.6166 0.0462 2.83 0.3272 ...
##   ..@ stan_args :List of 4
##   .. ..$ :List of 10
##   .. .. ..$ chain_id   : int 1
##   .. .. ..$ iter       : int 2000
##   .. .. ..$ thin       : int 2
##   .. .. ..$ seed       : int 20190417
##   .. .. ..$ warmup     : num 1000
##   .. .. ..$ init       : chr &quot;random&quot;
##   .. .. ..$ algorithm  : chr &quot;NUTS&quot;
##   .. .. ..$ save_warmup: logi FALSE
##   .. .. ..$ method     : chr &quot;sampling&quot;
##   .. .. ..$ control    :List of 1
##   .. .. .. ..$ adapt_delta: num 0.85
##   .. ..$ :List of 10
##   .. .. ..$ chain_id   : int 2
##   .. .. ..$ iter       : int 2000
##   .. .. ..$ thin       : int 2
##   .. .. ..$ seed       : int 20190417
##   .. .. ..$ warmup     : num 1000
##   .. .. ..$ init       : chr &quot;random&quot;
##   .. .. ..$ algorithm  : chr &quot;NUTS&quot;
##   .. .. ..$ save_warmup: logi FALSE
##   .. .. ..$ method     : chr &quot;sampling&quot;
##   .. .. ..$ control    :List of 1
##   .. .. .. ..$ adapt_delta: num 0.85
##   .. ..$ :List of 10
##   .. .. ..$ chain_id   : int 3
##   .. .. ..$ iter       : int 2000
##   .. .. ..$ thin       : int 2
##   .. .. ..$ seed       : int 20190417
##   .. .. ..$ warmup     : num 1000
##   .. .. ..$ init       : chr &quot;random&quot;
##   .. .. ..$ algorithm  : chr &quot;NUTS&quot;
##   .. .. ..$ save_warmup: logi FALSE
##   .. .. ..$ method     : chr &quot;sampling&quot;
##   .. .. ..$ control    :List of 1
##   .. .. .. ..$ adapt_delta: num 0.85
##   .. ..$ :List of 10
##   .. .. ..$ chain_id   : int 4
##   .. .. ..$ iter       : int 2000
##   .. .. ..$ thin       : int 2
##   .. .. ..$ seed       : int 20190417
##   .. .. ..$ warmup     : num 1000
##   .. .. ..$ init       : chr &quot;random&quot;
##   .. .. ..$ algorithm  : chr &quot;NUTS&quot;
##   .. .. ..$ save_warmup: logi FALSE
##   .. .. ..$ method     : chr &quot;sampling&quot;
##   .. .. ..$ control    :List of 1
##   .. .. .. ..$ adapt_delta: num 0.85
##   ..@ stanmodel :Formal class &#39;stanmodel&#39; [package &quot;rstan&quot;] with 5 slots
##   .. .. ..@ model_name  : chr &quot;lm&quot;
##   .. .. ..@ model_code  : chr &quot;data {\n  int&lt;lower=1&gt; N; // num. observations\n  int&lt;lower=1&gt; K; // num. predictors\n  matrix[N, K] x; // desi&quot;| __truncated__
##   .. .. .. ..- attr(*, &quot;model_name2&quot;)= chr &quot;lm&quot;
##   .. .. ..@ model_cpp   :List of 2
##   .. .. .. ..$ model_cppname: chr &quot;model362854de58e0_lm&quot;
##   .. .. .. ..$ model_cppcode: chr &quot;// Code generated by Stan version 2.19.1\n\n#include &lt;stan/model/model_header.hpp&gt;\n\nnamespace model362854de58&quot;| __truncated__
##   .. .. ..@ mk_cppmodule:function (object)  
##   .. .. ..@ dso         :Formal class &#39;cxxdso&#39; [package &quot;rstan&quot;] with 7 slots
##   .. .. .. .. ..@ sig         :List of 1
##   .. .. .. .. .. ..$ file36287001298d: chr(0) 
##   .. .. .. .. ..@ dso_saved   : logi TRUE
##   .. .. .. .. ..@ dso_filename: chr &quot;file36287001298d&quot;
##   .. .. .. .. ..@ modulename  : chr &quot;stan_fit4model362854de58e0_lm_mod&quot;
##   .. .. .. .. ..@ system      : chr &quot;x86_64, mingw32&quot;
##   .. .. .. .. ..@ cxxflags    : chr &quot;CXXFLAGS=-O3 -Wno-unused-variable -Wno-unused-function&quot;
##   .. .. .. .. ..@ .CXXDSOMISC :&lt;environment: 0x000000002647e630&gt; 
##   ..@ date      : chr &quot;Wed Jan 22 14:29:59 2020&quot;
##   ..@ .MISC     :&lt;environment: 0x00000000268f4988&gt;</code></pre>
</details>
<p><br />
For our purposes, (summaries of) the posterior distributions of the model parameters <span class="math inline">\(\beta\)</span> and <span class="math inline">\(\sigma\)</span> are of primary importance. In particular, we want to know if our Stan program accurately retrieves the ‘true’ parameters that we used to generate our artificial outcome data. We thus compare the true parameters</p>
<pre class="r"><code>true.pars &lt;- c(beta.sim, sigma.sim)
names(true.pars) &lt;- c(paste0(&quot;beta[&quot;, 1:5, &quot;]&quot;), &quot;sigma&quot;)
round(true.pars, 2L)</code></pre>
<pre><code>## beta[1] beta[2] beta[3] beta[4] beta[5]   sigma 
##   -1.03    5.50   17.33   -1.00   -8.88   11.98</code></pre>
<p>with the parameter estimates from our model:</p>
<pre class="r"><code>print(lm.sim, pars = c(&quot;beta&quot;, &quot;sigma&quot;))</code></pre>
<pre><code>## Inference for Stan model: lm.
## 4 chains, each with iter=2000; warmup=1000; thin=2; 
## post-warmup draws per chain=500, total post-warmup draws=2000.
## 
##          mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
## beta[1] -1.06       0 0.12 -1.30 -1.15 -1.06 -0.98 -0.84  1735    1
## beta[2]  5.57       0 0.12  5.35  5.49  5.57  5.65  5.79  1666    1
## beta[3] 17.54       0 0.12 17.31 17.46 17.53 17.62 17.76  1486    1
## beta[4] -1.01       0 0.12 -1.23 -1.09 -1.01 -0.93 -0.78  1472    1
## beta[5] -8.91       0 0.12 -9.15 -9.00 -8.92 -8.83 -8.67  1725    1
## sigma   11.82       0 0.08 11.66 11.77 11.82 11.88 11.99  1656    1
## 
## Samples were drawn using NUTS(diag_e) at Wed Jan 22 14:29:59 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).</code></pre>
<p>Unlike the typical output of likelihood-based inference (i.e., point estimates and standard errors), we receive detailed summaries of the posterior distribution of each parameter. Here, we can think of the <code>mean</code> of a given posterior distribution as the counterpart to the point estimate, the <code>sd</code> as the counterpart to the standard error, and the 2.5% and 97.5% posterior percentiles as counterparts to the boundaries of a 95% confidence interval.</p>
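<p>The columns of the printed summary are simple summaries of the posterior draws. As a quick sanity check, we can reproduce them by hand for a single parameter (a minimal sketch, assuming the fitted <code>lm.sim</code> object from above is still in the workspace):</p>

```r
## Reproduce the summary columns for beta[1] directly from the draws
beta1.draws <- rstan::extract(lm.sim, pars = "beta")$beta[, 1]

c(mean = mean(beta1.draws),             ### counterpart to the point estimate
  sd   = sd(beta1.draws))               ### counterpart to the standard error
quantile(beta1.draws, c(.025, .975))    ### counterparts to a 95% CI
```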
<p>Ideally, we would want to see that the true parameter values lie near the center of their respective posterior distributions. As we can see above, the posterior means of all model parameters are close to the true parameter values and display only minor deviations. The question, of course, is how much deviation should worry us. Moreover, the deviation in any single validation run may be atypically large, e.g., due to a chance draw of extreme ‘true’ parameter values when simulating the data-generating process. <span class="citation">Cook, Gelman, and Rubin (2006)</span> therefore recommend running many replications of validation simulations, drawing different true parameter values from the respective prior distributions to generate a series of different simulated outcome values. They also propose test statistics that allow researchers to systematically analyze the magnitude of deviations between true parameters and posterior distributions.</p>
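<p>Such a replication exercise can be sketched as a simple loop: draw ‘true’ parameters, simulate outcomes, refit the model, and record where each true value falls within its posterior. The sketch below is purely illustrative; it reuses the simulation objects <code>x</code>, <code>N</code>, and <code>K</code> from above, and the distributions from which the ‘true’ parameters are drawn are placeholders that should be matched to the priors in your Stan program:</p>

```r
## Illustrative validation loop in the spirit of Cook, Gelman, and Rubin (2006)
n.rep   <- 20L
q.beta1 <- numeric(n.rep)
for (r in seq_len(n.rep)) {
  beta.true  <- rnorm(K, 0, 10)                 ### placeholder 'prior' draws
  sigma.true <- abs(rnorm(1, 0, 5))             ### placeholder 'prior' draw
  y.rep <- as.vector(x %*% beta.true) + rnorm(N, 0, sigma.true)
  fit.rep <- sampling(lm.mod,                   ### compiled model from above
                      data = list(N = N, K = K, x = x, y = y.rep),
                      chains = 2L, iter = 1000L, refresh = 0L)
  draws <- extract(fit.rep, pars = "beta")$beta[, 1]
  q.beta1[r] <- mean(draws < beta.true[1])      ### posterior quantile of truth
}
## If the model is correctly specified, these quantiles should look
## approximately uniform on [0, 1]
hist(q.beta1)
```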
</div>
<div id="step-4-inference" class="section level5">
<h5>Step 4: Inference</h5>
<p>Having successfully validated our model, we can now run it on actual data. Model fitting works exactly as in the validation example above, using the same compiled Stan program and the <code>sampling()</code> command. The key difference is the data we supply. Instead of using simulated data, we now want to use an actual data set.</p>
<p>For the sake of illustration, we use the replication data for Table 1, Model 4, of <span class="citation">Bischof and Wagner (2019)</span>, made available through the <a href="https://doi.org/10.7910/DVN/DZ1NFG">American Journal of Political Science Dataverse</a>. The original analysis uses Ordinary Least Squares estimation to gauge the effect of the assassination of the populist radical right politician Pim Fortuyn prior to the 2002 Dutch Parliamentary Election on micro-level ideological polarization. For this purpose, it analyzes 1551 respondents interviewed in the pre-election wave of the 2002 Dutch Parliamentary Election Study. The outcome variable contains the squared distances of respondents’ left-right self-placements from the pre-election median self-placement of all respondents. The main predictor is a binary indicator of whether the interview was conducted before or after Fortuyn’s assassination. The original analysis reports point estimates (standard errors) of 1.644 (0.036) for the intercept and -0.112 (0.076) for the before-/after indicator.</p>
<p>We now want to use our Stan program for the linear model to replicate these results. We thus retrieve the data from the AJPS dataverse and load them into R. Following this, we drop all but the three variables of interest, subset the data to the pre-election wave of the survey, and drop all incomplete rows. Now, we can easily extract the required input data for our linear model program (<span class="math inline">\(N\)</span>, <span class="math inline">\(K\)</span>, <span class="math inline">\(\mathbf{X}\)</span> and <span class="math inline">\(\mathbf{y}\)</span>) using the following code:</p>
<details>
<p><summary> Code: Retrieving and managing the data</summary></p>
<pre class="r"><code>## Retrieve and manage data
bw.ajps19 &lt;-
  read.table(
    paste0(
      &quot;https://dataverse.harvard.edu/api/access/datafile/&quot;,
      &quot;:persistentId?persistentId=doi:10.7910/DVN/DZ1NFG/LFX4A9&quot;
    ),
    header = TRUE,
    stringsAsFactors = FALSE,
    sep = &quot;\t&quot;,
    fill = TRUE
  ) %&gt;% 
  select(wave, fortuyn, polarization) %&gt;% ### select relevant variables
  subset(wave == 1) %&gt;%                   ### subset to pre-election wave
  na.omit()                               ### drop incomplete rows

x &lt;- model.matrix(~ fortuyn, data = bw.ajps19)
y &lt;- bw.ajps19$polarization
N &lt;- nrow(x)
K &lt;- ncol(x)</code></pre>
</details>
<p><br />
The rest then repeats the same steps we followed for validating our model with fake data: We collect the data objects in a list and pass the new list object, together with the compiled model, to the <code>sampling()</code> command.</p>
<details>
<p><summary> Code: Fitting the model</summary></p>
<pre class="r"><code>## data as list
standat &lt;- list(
  N = N,
  K = K,
  x = x,
  y = y)

## inference
lm.inf &lt;- sampling(lm.mod,                            ### compiled model
                   data = standat,                    ### data input
                   algorithm = &quot;NUTS&quot;,                ### algorithm
                   control = list(                    ### control arguments
                     adapt_delta = .85
                     ),
                   save_warmup = FALSE,               ### discard warmup samples
                   sample_file = NULL,                ### no sample file
                   diagnostic_file = NULL,            ### no diagnostic file
                   pars = c(&quot;beta&quot;, &quot;sigma&quot;),         ### select parameters
                   iter = 2000L,                      ### iter per chain
                   warmup = 1000L,                    ### warmup period
                   thin = 2L,                         ### thinning factor
                   chains = 4L,                       ### num. chains
                   cores = 4L,                        ### num. cores
                   seed = 20190417)                   ### seed</code></pre>
</details>
<p><br />
The resulting model output is summarized below:</p>
<pre class="r"><code>print(lm.inf,
      pars = c(&quot;beta&quot;, &quot;sigma&quot;),
      digits_summary = 3L)</code></pre>
<pre><code>## Inference for Stan model: lm.
## 4 chains, each with iter=2000; warmup=1000; thin=2; 
## post-warmup draws per chain=500, total post-warmup draws=2000.
## 
##           mean se_mean    sd   2.5%    25%    50%    75% 97.5% n_eff  Rhat
## beta[1]  1.644   0.001 0.035  1.575  1.620  1.644  1.669 1.712  1763 1.000
## beta[2] -0.112   0.002 0.074 -0.250 -0.163 -0.113 -0.062 0.038  1898 0.999
## sigma    1.239   0.001 0.023  1.195  1.223  1.239  1.254 1.288  1891 1.000
## 
## Samples were drawn using NUTS(diag_e) at Wed Jan 22 14:31:33 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).</code></pre>
<p>As we can see, our posterior distribution summary closely matches the output reported in the original analysis. In line with the original article, which concludes that Fortuyn’s assassination had no significant positive effect on ideological polarization at the 95% level, we find that the probability of a positive effect for <code>beta[2]</code> is very low: merely 6.5% of the corresponding posterior draws are greater than zero.</p>
<pre class="r"><code>## Extract posterior samples for beta[2]
beta2_posterior &lt;- extract(lm.inf)$beta[, 2]

## Probability that beta[2] is greater than zero
mean(beta2_posterior &gt; 0)</code></pre>
<pre><code>## [1] 0.065</code></pre>
</div>
<div id="step-5-convergence-diagnostics" class="section level5">
<h5>Step 5: Convergence Diagnostics</h5>
<p>When using Markov Chain Monte Carlo (MCMC) algorithms for Bayesian inference, we must always make sure that our samples from the parameter space are what we want them to be: draws from the actual posterior distribution of interest. While simple, well-known models typically converge quickly to the target posterior distribution, things may be more complicated for complex model programs. Multiple chains may not converge to the same posterior distribution; the exploration of the parameter space may be so slow that convergence is achieved only after a long warmup period; consecutive samples from the same chain may be highly correlated, necessitating long post-warmup runs to retrieve a reasonably large number of effective samples from the posterior distribution. These are just a few examples of things that can go wrong in Bayesian inference using MCMC methods. In the following, we introduce a number of diagnostic tools that allow users to assess the convergence of their Stan models.</p>
<div id="generic-diagnostics-rhat-and-n_eff" class="section level6">
<h6>Generic Diagnostics: <code>Rhat</code> and <code>n_eff</code></h6>
<p>By default, RStan reports two generic diagnostics when printing the summary output from a <code>stanfit</code> object (as in our linear model summaries above). The first of these is <span class="math inline">\(\hat{R}\)</span>, also known as the potential scale reduction statistic or as the Gelman-Rubin convergence diagnostic:</p>
<p><span class="math display">\[\small \widehat{Var}(\theta) = (1 - \frac{1}{\mathtt{n_{iter}}})
    \underbrace{\Bigg(\frac{1}{ \mathtt{n_{chains}} (\mathtt{n_{iter}} - 1)} \sum_{j=1}^{\mathtt{n_{chains}}} \sum_{i=1}^{\mathtt{n_{iter}}} (\theta_{ij} - \bar{\theta_j})^2 \Bigg)}_{\text{Within chain var}} + \\
    \frac{1}{\mathtt{n_{iter}}}  \underbrace{\Bigg(\frac{\mathtt{n_{iter}}}{\mathtt{n_{chains}} - 1} \sum_{j=1}^{\mathtt{n_{chains}}} (\bar{\theta_j} - \bar{\bar{\theta}})^2\Bigg)}_{\text{Between chain var}}\]</span></p>
<p>This statistic combines information on the variation within and between chains, thus assessing whether each chain has converged to a stationary target distribution and whether all chains have converged to the same target distribution. The reported statistic is <span class="math inline">\(\hat{R} = \sqrt{\widehat{Var}(\theta) / W}\)</span>, where <span class="math inline">\(W\)</span> denotes the within-chain variance. Low values indicate that chains are stationary and mix well; high values are a cause for concern. As a rule of thumb, we want <span class="math inline">\(\hat{R}&lt;1.1\)</span> for all model parameters.</p>
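<p>To see that <span class="math inline">\(\hat{R}\)</span> is just a function of the within- and between-chain variation, we can compute it ‘by hand’ for a single parameter. This is a sketch, assuming the fitted <code>lm.inf</code> object from above; note that recent versions of RStan report a split-chain variant, so the hand-computed value may differ slightly from the printed <code>Rhat</code>:</p>

```r
## Gelman-Rubin Rhat 'by hand' for beta[1]
theta <- rstan::extract(lm.inf, pars = "beta[1]", permuted = FALSE)[, , 1]
n.iter   <- nrow(theta)                    ### post-warmup draws per chain
n.chains <- ncol(theta)                    ### number of chains
W <- mean(apply(theta, 2, var))            ### within-chain variance
B <- n.iter * var(colMeans(theta))         ### between-chain variance
var.hat <- (1 - 1 / n.iter) * W + B / n.iter
sqrt(var.hat / W)                          ### Rhat; should be close to 1
```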
<p>The second generic diagnostic tool is <span class="math inline">\(\mathtt{n_{eff}}\)</span>, the effective size of our posterior samples. A small <a href="https://mc-stan.org/docs/2_19/reference-manual/effective-sample-size-section.html">effective sample size</a> indicates high autocorrelation within chains, which in turn indicates that the chains explore the posterior density slowly and inefficiently. As a rule of thumb, we want the ratio of effective samples to total iterations to be <span class="math inline">\(\frac{\mathtt{n_{eff}}}{\mathtt{n_{iter}}} &gt; 0.001\)</span>, i.e., at least one effective sample per 1000 post-warmup iterations. Such a rate is far from ideal, however, as it requires running the sampler for many iterations to retrieve an effective sample size large enough for valid inferences. In practice, we would thus explore options for <a href="https://mc-stan.org/docs/2_19/stan-users-guide/optimization-chapter.html">efficiency tuning</a> to improve the rate at which we retrieve effective samples.</p>
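<p>Both rules of thumb can be checked programmatically for all parameters at once. A minimal sketch, again assuming the fitted <code>lm.inf</code> object from above:</p>

```r
## Flag parameters with a high Rhat or a low effective-sample rate
fit.sum <- summary(lm.inf)$summary
n.total <- 4 * 500                          ### total post-warmup draws
any(fit.sum[, "Rhat"] >= 1.1)               ### should be FALSE
any(fit.sum[, "n_eff"] / n.total <= 0.001)  ### should be FALSE
```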
<p>Additional empirical diagnostics <span class="citation">(see Gill 2015, Ch. 14.3.3)</span> include</p>
<ul>
<li><strong>Geweke Time-Series Diagnostic</strong>: Compare non-overlapping post-warmup portions of each chain to test within-convergence</li>
<li><strong>Heidelberger and Welch Diagnostic</strong>: Compare early post-warmup portion of each chain with late portion to test within-convergence</li>
<li><strong>Raftery and Lewis Integrated Diagnostic</strong>: Evaluate the full chain of a pilot run (requires <code>save_warmup = TRUE</code>) to estimate the minimum required length of warmup and sampling</li>
</ul>
<p>and are implemented as part of the <a href="https://cran.r-project.org/web/packages/coda/index.html"><strong>coda</strong></a> package (Output Analysis and Diagnostics for MCMC). These can be used on <code>stanfit</code> objects after storing the posterior simulations as <code>mcmc.list</code> objects, as illustrated below.</p>
<details>
<p><summary> Code: Further generic diagnostics</summary></p>
<pre class="r"><code>## Stanfit to mcmc.list
lm.mcmc &lt;- As.mcmc.list(lm.inf,
                        pars = c(&quot;beta&quot;, &quot;sigma&quot;, &quot;lp__&quot;))

## Diagnostics
geweke.diag(lm.mcmc, frac1 = .1, frac2 = .5)              ### Geweke</code></pre>
<pre><code>## [[1]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
## beta[1] beta[2]   sigma    lp__ 
##  0.9361 -1.9100  1.1021  1.8148 
## 
## 
## [[2]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
## beta[1] beta[2]   sigma    lp__ 
##  0.7879 -0.9223  0.6536  0.9696 
## 
## 
## [[3]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
## beta[1] beta[2]   sigma    lp__ 
## -0.5272 -1.2241  0.0946  0.4136 
## 
## 
## [[4]]
## 
## Fraction in 1st window = 0.1
## Fraction in 2nd window = 0.5 
## 
## beta[1] beta[2]   sigma    lp__ 
## -0.9422  0.8782 -0.1088 -2.6450</code></pre>
<pre class="r"><code>heidel.diag(lm.mcmc, pvalue = .1)                         ### Heidelberger-Welch</code></pre>
<pre><code>## [[1]]
##                                       
##         Stationarity start     p-value
##         test         iteration        
## beta[1] passed       1         0.482  
## beta[2] passed       1         0.594  
## sigma   passed       1         0.548  
## lp__    passed       1         0.308  
##                                      
##         Halfwidth Mean      Halfwidth
##         test                         
## beta[1] passed        1.643 0.00317  
## beta[2] passed       -0.111 0.00740  
## sigma   passed        1.240 0.00206  
## lp__    passed    -2532.284 0.10164  
## 
## [[2]]
##                                       
##         Stationarity start     p-value
##         test         iteration        
## beta[1] passed       1         0.627  
## beta[2] passed       1         0.817  
## sigma   passed       1         0.883  
## lp__    passed       1         0.688  
##                                      
##         Halfwidth Mean      Halfwidth
##         test                         
## beta[1] passed        1.645 0.00290  
## beta[2] passed       -0.111 0.00612  
## sigma   passed        1.240 0.00212  
## lp__    passed    -2532.403 0.12362  
## 
## [[3]]
##                                       
##         Stationarity start     p-value
##         test         iteration        
## beta[1] passed        1        0.2911 
## beta[2] passed       51        0.0653 
## sigma   passed        1        0.9587 
## lp__    passed        1        0.6963 
##                                     
##         Halfwidth Mean     Halfwidth
##         test                        
## beta[1] passed        1.64 0.00311  
## beta[2] passed       -0.11 0.00719  
## sigma   passed        1.24 0.00195  
## lp__    passed    -2532.39 0.13209  
## 
## [[4]]
##                                       
##         Stationarity start     p-value
##         test         iteration        
## beta[1] passed       1         0.593  
## beta[2] passed       1         0.638  
## sigma   passed       1         0.526  
## lp__    passed       1         0.118  
##                                      
##         Halfwidth Mean      Halfwidth
##         test                         
## beta[1] passed        1.646 0.00310  
## beta[2] passed       -0.114 0.00665  
## sigma   passed        1.239 0.00195  
## lp__    passed    -2532.447 0.11456</code></pre>
<pre class="r"><code>raftery.diag(lm.mcmc,                                     ### Raftery-Lewis
             q = 0.025,
             r = 0.005,
             s = 0.95,
             converge.eps = 0.001)</code></pre>
<pre><code>## [[1]]
## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95 
## 
## You need a sample size of at least 3746 with these values of q, r and s
## 
## [[2]]
## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95 
## 
## You need a sample size of at least 3746 with these values of q, r and s
## 
## [[3]]
## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95 
## 
## You need a sample size of at least 3746 with these values of q, r and s
## 
## [[4]]
## 
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95 
## 
## You need a sample size of at least 3746 with these values of q, r and s</code></pre>
</details>
<p><br/></p>
</div>
<div id="algorithm-specific-diagnostics" class="section level6">
<h6>Algorithm-specific Diagnostics</h6>
<p>In addition to these generic empirical diagnostics, Stan offers algorithm-specific diagnostic tools after Hamiltonian Monte Carlo sampling. In the words of the developers,</p>
<blockquote>
<font size="-1">
Hamiltonian Monte Carlo provides not only state-of-the-art sampling speed, it also provides state-of-the-art diagnostics. Unlike other algorithms, when Hamiltonian Monte Carlo fails it fails sufficiently spectacularly that we can easily identify the problems.
</font>
</blockquote>
<div style="text-align: right">
<p><sub><sup>
Source: <a href="https://github.com/stan-dev/stan/wiki/Stan-Best-Practices" class="uri">https://github.com/stan-dev/stan/wiki/Stan-Best-Practices</a>
</sub></sup></p>
</div>
<p>These diagnostic tools can identify various problems. The first of these is <em>divergent transitions</em> after warmup, which can be tackled by increasing <code>adapt_delta</code> (the target acceptance rate) or by optimizing the model program (see <a href="https://mc-stan.org/docs/2_19/stan-users-guide/optimization-chapter.html">efficiency tuning</a>). Tackling this problem is imperative because even a single divergent transition can undermine the validity of our estimates. Another problem is <em>exceeding the maximum treedepth</em>, which can be tackled by increasing <code>max_treedepth</code>. Unlike divergent transitions, exceeding the maximum treedepth is merely an efficiency concern that slows down computation. Moreover, the <code>check_hmc_diagnostics()</code> command offers an extensive diagnostics summary for <code>stanfit</code> objects. For further information, see the <a href="https://mc-stan.org/misc/warnings.html">Guide to Stan’s warnings</a>.</p>
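<p>As a minimal sketch, the diagnostics can be retrieved for a fitted model and the sampler re-run with adjusted control arguments. Here, <code>lm.inf</code> is the <code>stanfit</code> object estimated above; the file and data names as well as the control values are illustrative:</p>
<pre class="r"><code>## Summarize divergences, treedepth, and energy diagnostics
check_hmc_diagnostics(lm.inf)

## Re-run the sampler with a higher target acceptance rate
## and a deeper maximum treedepth (illustrative values)
lm.inf &lt;- stan(file = &quot;lm_model.stan&quot;,
               data = standat,
               control = list(adapt_delta = 0.99,
                              max_treedepth = 15))</code></pre>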
</div>
<div id="visual-diagnostics" class="section level6">
<h6>Visual Diagnostics</h6>
<p>In addition to empirical diagnostics, users can also use visual diagnostics. One way of doing so is using <a href="https://cran.r-project.org/web/packages/shinystan/index.html"><strong>shinystan</strong></a>. Maintained by the Stan Development Team, this package launches a local <a href="https://socialsciencedatalab.mzes.uni-mannheim.de/article/shiny-apps/">Shiny web application</a> that allows users to interactively explore visual and numerical diagnostics for posterior draws and algorithm performance in their web browser. Among other things, users can view and export multiple variants of visual diagnostics as well as uni- and multivariate graphical and numerical posterior summaries of parameters. Its additional functionality includes the <code>generate_quantity()</code> function, which creates a new parameter as a function of one or two existing parameters, and the <code>deploy_shinystan()</code> function, which allows users to deploy a ‘ShinyStan’ app on <a href="https://www.shinyapps.io/">shinyapps.io</a>. To explore <strong>shinystan</strong>, users can access a demo using <code>launch_shinystan_demo()</code>.</p>
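<p>Launching the app for a fitted model requires a single call (assuming <code>lm.inf</code> is the <code>stanfit</code> object estimated above):</p>
<pre class="r"><code>library(shinystan)

## Launch the interactive diagnostics app in the browser
launch_shinystan(lm.inf)

## Alternatively, explore the package using the built-in demo
launch_shinystan_demo()</code></pre>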
<p>An alternative is <a href="https://cran.r-project.org/web/packages/bayesplot/index.html"><strong>bayesplot</strong></a>. This package offers a vast selection of visual diagnostics for <code>stanfit</code> objects, including various diagnostics for the No-U-Turn Sampler (divergent transitions, energy, Bayesian fraction of missing information) and generic MCMC diagnostics (<span class="math inline">\(\hat{R}\)</span>, <span class="math inline">\(\mathtt{n_{eff}}\)</span>, autocorrelation, and mixing through trace plots). For full functionality, examples, and vignettes, see these examples on <a href="https://github.com/stan-dev/bayesplot">GitHub</a>, the <a href="https://cran.r-project.org/web/packages/bayesplot/vignettes/visual-mcmc-diagnostics.html">CRAN Vignettes</a>, and the <code>available_mcmc()</code> function.</p>
<p>The example below illustrates how <strong>bayesplot</strong> can be used to assess within- and between-chain convergence using traceplots. Traceplots show the post-warmup samples from all chains as a time series (plotted against the iteration index). In case of convergence, each chain and all chains together should fluctuate around the same value throughout the post-warmup period. For instance, no single chain should shift its center of gravity from low to high values over the course of the chain, and all chains should mix around the same range of values. As we can see below, this is the case for all three model parameters.</p>
<pre class="r"><code>## Extract posterior draws from stanfit object
lm.post.draws &lt;- extract(lm.inf, permuted = FALSE)

## Traceplot
mcmc_trace(lm.post.draws, pars = c(&quot;beta[1]&quot;, &quot;beta[2]&quot;, &quot;sigma&quot;))</code></pre>
<p><img src="/../../../../../article/applied-bayesian-statistics_files/figure-html/bayesplot-1.png" width="75%" style="display: block; margin: auto;" /></p>
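<p>Beyond traceplots, a few further <strong>bayesplot</strong> functions cover the generic diagnostics mentioned above. The following sketch assumes the <code>stanfit</code> object <code>lm.inf</code> and the posterior draws <code>lm.post.draws</code> extracted above:</p>
<pre class="r"><code>## R-hat and effective sample size summaries
mcmc_rhat(rhat(lm.inf))
mcmc_neff(neff_ratio(lm.inf))

## Autocorrelation of the posterior draws
mcmc_acf(lm.post.draws, pars = c(&quot;beta[1]&quot;, &quot;beta[2]&quot;, &quot;sigma&quot;))</code></pre>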
</div>
</div>
</div>
<div id="additional-interfaces" class="section level3">
<h3>Additional Interfaces</h3>
<p>One of the key advantages of Stan over other programming languages for Bayesian inference is the availability of additional interfaces that offer ‘canned solutions’, i.e., pre-programmed functions for the estimation and analysis of widely used statistical models. Two of the most popular interfaces are <a href="https://cran.r-project.org/web/packages/rstanarm/index.html"><strong>rstanarm</strong></a> and <a href="https://cran.r-project.org/web/packages/brms/index.html"><strong>brms</strong></a>.</p>
<div id="rstanarm" class="section level5">
<h5>rstanarm</h5>
<p><strong>rstanarm</strong> has been developed by Stan Development Team members Jonah Gabry and Ben Goodrich, along with numerous contributors. In the words of its developers,</p>
<blockquote>
<font size="-1">
“<strong>rstanarm</strong> is an R package that emulates other R model-fitting functions but uses Stan (via the rstan package) for the back-end estimation. The primary target audience is people who would be open to Bayesian inference if using Bayesian software were easier but would use frequentist software otherwise.”
</font>
</blockquote>
<div style="text-align: right">
<p><sub><sup>
Source: <a href="https://mc-stan.org/rstanarm/" class="uri">https://mc-stan.org/rstanarm/</a>
</sub></sup></p>
</div>
<p><strong>rstanarm</strong> covers a broad range of model types, including</p>
<ul>
<li><code>stan_lm</code>, <code>stan_aov</code>, <code>stan_biglm</code>: Regularized linear models; similar to <code>lm</code> and <code>aov</code></li>
<li><code>stan_glm</code>, <code>stan_glm.nb</code>: Generalized linear models; similar to <code>glm</code></li>
<li><code>stan_glmer</code>, <code>stan_glmer.nb</code>, <code>stan_lmer</code>: Generalized linear models with group-specific terms; similar to <code>glmer</code>, <code>glmer.nb</code>, and <code>lmer</code> (<strong>lme4</strong>)</li>
<li><code>stan_nlmer</code>: Nonlinear models with group-specific terms; similar to <code>nlmer</code> (<strong>lme4</strong>)</li>
<li><code>stan_gamm4</code>: Generalized linear additive models with optional group-specific terms; similar to <code>gamm4</code> (<strong>gamm4</strong>)</li>
<li><code>stan_polr</code>: Ordinal regression models; similar to <code>polr</code> (<strong>MASS</strong>)</li>
<li><code>stan_betareg</code>: Beta regression models; similar to <code>betareg</code> (<strong>betareg</strong>)</li>
<li><code>stan_clogit</code>: Conditional logistic regression models; similar to <code>clogit</code> (<strong>survival</strong>)</li>
<li><code>stan_mvmer</code>: Multivariate generalized linear models with correlated group-specific terms; a multivariate form of <code>stan_glmer</code></li>
<li><code>stan_jm</code>: Shared parameter joint models for longitudinal and time-to-event models</li>
</ul>
<div style="text-align: right">
<p><sub><sup>
Source: <a href="https://github.com/stan-dev/rstanarm" class="uri">https://github.com/stan-dev/rstanarm</a> | <a href="https://cran.r-project.org/web/packages/rstanarm/index.html">Vignettes</a>
</sub></sup></p>
</div>
<p>The package comes with its own <a href="https://mc-stan.org/rstanarm/">web site</a> which features an extensive collection of vignettes and tutorials.</p>
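<p>As a minimal sketch, a Bayesian linear model can be estimated in <strong>rstanarm</strong> with a single function call that mirrors the familiar <code>glm</code> syntax (the data and variable names below are illustrative):</p>
<pre class="r"><code>library(rstanarm)

## Bayesian linear model with rstanarm&#39;s weakly informative default priors
fit &lt;- stan_glm(y ~ x,
                data = dat,
                family = gaussian(),
                chains = 4,
                iter = 2000)

## Posterior summaries and convergence diagnostics
summary(fit)</code></pre>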
</div>
<div id="brms" class="section level5">
<h5>brms</h5>
<p>Developed by <a href="https://paul-buerkner.github.io/">Paul-Christian Bürkner</a>, <strong>brms</strong> offers another interface for statistical modeling using the Stan back-end:</p>
<blockquote>
<font size="-1">
“The <strong>brms</strong> package provides an interface to fit Bayesian generalized (non-)linear multivariate multilevel models using Stan, which is a C++ package for performing full Bayesian inference (see <a href="http://mc-stan.org/" class="uri">http://mc-stan.org/</a>). The formula syntax is very similar to that of the package lme4 to provide a familiar and simple interface for performing regression analyses.”
</font>
</blockquote>
<div style="text-align: right">
<p><sub><sup>
Source: <a href="https://paul-buerkner.github.io/brms/" class="uri">https://paul-buerkner.github.io/brms/</a>
</sub></sup></p>
</div>
<p>The package has extensive functionality, supporting a large collection of model types, including</p>
<ul>
<li>linear models</li>
<li>robust linear models</li>
<li>binomial models</li>
<li>categorical models</li>
<li>multinomial models</li>
<li>count data models</li>
<li>survival models</li>
<li>ordinal models</li>
<li>zero-inflated and hurdle models</li>
<li>generalized additive models</li>
<li>non-linear models</li>
</ul>
<div style="text-align: right">
<p><sub><sup>
Source: <a href="https://cran.r-project.org/web/packages/brms/vignettes/brms_overview.pdf">BRMS Overview Vignette</a>
</sub></sup></p>
</div>
<p>In addition to a detailed <a href="https://cran.r-project.org/web/packages/brms/vignettes/brms_overview.pdf">overview</a> and a <a href="https://github.com/paul-buerkner/brms">web site</a> that features extensive documentation, vignettes, and tutorials, <strong>brms</strong> comes with some additional useful features. These include</p>
<ul>
<li><code>marginal_effects()</code>: Display marginal effects of one or more numeric and/or categorical predictors including two-way interaction effects</li>
<li><code>brm_multiple()</code>: Inference across imputations generated by <code>mice</code> prior to model fitting in <strong>brms</strong></li>
<li><code>mi()</code>: Fully Bayesian imputation during model fitting</li>
</ul>
<br>
<div style="text-align: right">
<p><sub><sup>
Sources: <a href="https://cran.r-project.org/web/packages/brms/brms.pdf">BRMS Reference Manual</a>, <a href="https://cran.r-project.org/web/packages/brms/vignettes/brms_missings.html">BRMS Missing Values Vignette</a>
</sub></sup></p>
</div>
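<p>As a minimal sketch, the same kind of linear model can be estimated in <strong>brms</strong> using <strong>lme4</strong>-style formula syntax (the data and variable names below are illustrative):</p>
<pre class="r"><code>library(brms)

## Bayesian linear model; brms generates, compiles, and runs the Stan program
fit &lt;- brm(y ~ x,
           data = dat,
           family = gaussian(),
           chains = 4,
           iter = 2000)

## Posterior summaries and marginal effects of the predictor
summary(fit)
marginal_effects(fit)</code></pre>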
</div>
</div>
<div id="concluding-remarks" class="section level3">
<h3>Concluding Remarks</h3>
<div id="reproducibility" class="section level5">
<h5>Reproducibility</h5>
<p>Reproducibility is an important standard for the quality of quantitative research. Unfortunately, one and the same Stan program can produce (slightly) different parameter estimates even when identical seeds are supplied to the sampler. According to the Stan Reference Manual, Stan is only guaranteed to fully reproduce results if <em>all</em> of the following are held constant:</p>
<blockquote>
<p><font size="-1"></p>
<ul>
<li>Stan version</li>
<li>Stan interface (RStan, PyStan, CmdStan) and version, plus version of interface language (R, Python, shell)</li>
<li>versions of included libraries (Boost and Eigen)</li>
<li>operating system version</li>
<li>computer hardware including CPU, motherboard and memory</li>
<li>C++ compiler, including version, compiler flags, and linked libraries</li>
<li>same configuration of call to Stan, including random seed, chain ID, initialization and data</li>
</ul>
</font>
</blockquote>
<div style="text-align: right">
<p><sub><sup>
Source: <a href="https://mc-stan.org/docs/2_19/reference-manual/reproducibility-chapter.html">Stan Reference Manual</a>
</sub></sup></p>
</div>
<p>While deviations from users’ original results due to version mismatches are likely to be minor, users are advised to document their software and hardware configurations.</p>
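<p>In R, a minimal way of documenting the software configuration alongside one’s results, and of fixing the seed in the call to the sampler, is the following (the file names and seed value are illustrative):</p>
<pre class="r"><code>## Record R version, OS, and attached package versions
writeLines(capture.output(sessionInfo()), &quot;sessionInfo.txt&quot;)

## Supply an explicit seed when calling the sampler
lm.inf &lt;- stan(file = &quot;lm_model.stan&quot;,
               data = standat,
               seed = 20200130)</code></pre>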
</div>
<div id="summary" class="section level5">
<h5>Summary</h5>
<p>Stan offers a powerful tool for statistical inference, supporting not only full Bayesian inference but also variational inference and penalized maximum likelihood estimation. It provides an intuitive language for statistical modeling that accommodates most, though perhaps not every, user’s needs (most notably, it does not support discrete parameters). Stan greatly facilitates parallelization and convergence diagnosis. R packages like <strong>rstanarm</strong> and <strong>brms</strong> offer a vast array of ‘canned’ solutions. These make Bayesian inference easily accessible to a broad audience but arguably invite pitfalls when users rely too heavily on default settings.</p>
<p>Applied modeling beyond canned solutions may require some specialized knowledge, e.g., when it comes to <a href="https://mc-stan.org/docs/2_21/stan-users-guide/changes-of-variables.html">Jacobian adjustments</a> when sampling transformed parameters, the <a href="https://mc-stan.org/docs/2_21/stan-users-guide/latent-discrete-chapter.html">marginalization of discrete parameters</a>, or <a href="https://mc-stan.org/docs/2_19/stan-users-guide/optimization-chapter.html">efficiency tuning</a>. Furthermore, Stan’s primary algorithm, the NUTS variant of HMC <span class="citation">(see Hoffman and Gelman 2014)</span>, tends to remain a black box even for long-time Bayesian practitioners. This should, however, by no means discourage aspiring users from learning Stan. After all, one of the core strengths of the Stan project is its extensive documentation, its vast collection of freely available learning materials, and its active and responsive online community.</p>
</div>
</div>
<div id="about-the-presenter" class="section level3">
<h3>About the Presenter</h3>
<p>Denis Cohen <a href="mailto:denis.cohen@mzes.uni-mannheim.de"><i class="fa
              fa-envelope"></i> </a>
<a href="https://denis-cohen.github.io"><i class="fa
              fa-globe"></i> </a>
<a href="https://twitter.com/denis_cohen"><i class="fa
              fa-twitter"></i></a> is a postdoctoral fellow in the Data and Methods Unit at the Mannheim Centre for European Social Research (MZES), University of Mannheim, and one of the organizers of the <a href="https://socialsciencedatalab.mzes.uni-mannheim.de/page/about/">MZES Social Science Data Lab</a>. His research focus lies at the intersection of political preference formation, electoral behavior, and political competition. His methodological interests include quantitative approaches to the analysis of clustered data, measurement models, data visualization, strategies for causal identification, and Bayesian statistics.</p>
</div>
<div id="further-reading" class="section level3">
<h3>Further Reading</h3>
<ul>
<li><a href="https://www.nature.com/articles/s41562-019-0807-z">Aczel, Balazs, Rink Hoekstra, Andrew Gelman, Eric-Jan Wagenmakers, Irene G. Klugkist, Jeffrey N. Rouder, Joachim Vandekerckhove, Michael D. Lee, Richard D. Morey, Wolf Vanpaemel, Zoltan Dienes and Don van Ravenzwaaij (2020). Discussion points for Bayesian inference. <em>Nature Human Behaviour</em>.</a></li>
<li><a href="https://github.com/SocialScienceDataLab/intro-bayesian-statistics">Schlierholz, Malte. SSDL Input Talk <em>Fundamentals in Bayesian Statistics</em>. December 14, 2016.</a></li>
<li><a href="https://github.com/stan-dev/stan/wiki/Stan-Best-Practices">Stan Best Practices</a></li>
<li><a href="https://staudtlex.de">staudtlex.de</a> – Alex Staudt’s web site, with R and C++ implementations of the Metropolis-Hastings algorithm</li>
<li><a href="https://mc-stan.org/users/documentation/case-studies/rstan_workflow.html">Tutorial: Robust Statistical Workflow with RStan</a></li>
</ul>
</div>
<div id="references" class="section level3 unnumbered">
<h3 class="unnumbered">References</h3>
<div id="refs" class="references hanging-indent">
<div id="ref-Bischof2019">
<p>Bischof, Daniel, and Markus Wagner. 2019. “Do Voters Polarize When Radical Parties Enter Parliament?” <em>American Journal of Political Science</em>. <a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/ajps.12449">https://onlinelibrary.wiley.com/doi/abs/10.1111/ajps.12449</a>.</p>
</div>
<div id="ref-Burkner2017">
<p>Bürkner, Paul-Christian. 2017a. “Advanced Bayesian Multilevel Modeling with the R Package brms.” <em>The R Journal</em> 10 (1): 395–411. <a href="http://arxiv.org/abs/1705.11123">http://arxiv.org/abs/1705.11123</a>.</p>
</div>
<div id="ref-Burkner2017a">
<p>———. 2017b. “brms: An R Package for Bayesian Multilevel Models Using Stan.” <em>Journal of Statistical Software</em> 80 (1). <a href="https://doi.org/10.18637/jss.v080.i01">https://doi.org/10.18637/jss.v080.i01</a>.</p>
</div>
<div id="ref-Carpenter2017">
<p>Carpenter, Bob, Andrew Gelman, Matthew D Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A Probabilistic Programming Language.” <em>Journal of Statistical Software</em> 76 (1). <a href="https://doi.org/10.18637/jss.v076.i01">https://doi.org/10.18637/jss.v076.i01</a>.</p>
</div>
<div id="ref-Cook2006">
<p>Cook, Samantha R., Andrew Gelman, and Donald B. Rubin. 2006. “Validation of software for Bayesian models using posterior quantiles.” <em>Journal of Computational and Graphical Statistics</em> 15 (3): 675–92. <a href="https://doi.org/10.1198/106186006X136976">https://doi.org/10.1198/106186006X136976</a>.</p>
</div>
<div id="ref-Gill2015">
<p>Gill, Jeff. 2015. <em>Bayesian Methods. A Social and Behavioral Sciences Approach</em>. 3rd ed. Boca Raton, FL: CRC Press.</p>
</div>
<div id="ref-Gill2013">
<p>Gill, Jeff, and Christopher Witko. 2013. “Bayesian analytical methods: A methodological prescription for public administration.” <em>Journal of Public Administration Research and Theory</em> 23 (2): 457–94. <a href="https://doi.org/10.1093/jopart/mus091">https://doi.org/10.1093/jopart/mus091</a>.</p>
</div>
<div id="ref-Hoffman2014">
<p>Hoffman, Matthew D., and Andrew Gelman. 2014. “The No-U-Turn Sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo.” <em>Journal of Machine Learning Research</em> 15: 1593–1623. <a href="http://arxiv.org/abs/1111.4246">http://arxiv.org/abs/1111.4246</a>.</p>
</div>
<div id="ref-Stan2.19a">
<p>Stan Development Team. 2019. “Stan Functions Reference. Version 2.19.”</p>
</div>
</div>
</div>
]]>
      </description>
    </item>
    
  </channel>
</rss>