Exercise solution for Chapter 2, Part 2

Sarah Cooper

Feb 24, 2020 3 min read exercises, Chapter 2

Exercise 2.6

The first part of the exercise asks you to:

Choose your own prior for the parameters of the beta distribution. You can do this by sketching it here: https://jhubiostatistics.shinyapps.io/drawyourprior.

After sketching a plot, I chose the parameters to set up a prior: $α$ = 2.47 and $β$ = 8.5.

Using this prior

Next, the exercise asks you:

Once you have set up a prior, re-analyse the data from Section 2.9.2, where we saw Y = 40 successes out of n = 300 trials.

To be able to use the loglikelihood function from the text, I first needed to redefine it here:

loglikelihood = function(theta, n = 300, k = 40) { ## Function definition from the textbook
  log(choose(n, k)) + k * log(theta) + (n - k) * log(1 - theta)
}

Then, I created a vector of $θ$ values between 0 and 1, spaced 0.001 units wide. The plot below shows different possible values of $θ$ and the log likelihood for each of these values:

thetas = seq(0, 1, by = 0.001)
plot(thetas, loglikelihood(thetas), xlab = expression(theta),
     ylab = expression(paste("log f(", theta, " | y)")),type = "l")

Next, I used rbeta to draw 1,000,000 random samples from a beta distribution with my new picks for the parameters for $α$ and $β$ :

rtheta = rbeta(1000000, shape1 = 2.47, shape2 = 8.5)

After running the above, for each of these $θ$ values, we then generate a random sample of $Y$ as observed in the histogram (with orange bars):

y = vapply(rtheta, function(th) {
  rbinom(1, prob = th, size = 300)
}, numeric(1))
hist(y, breaks = 50, col = "orange", main = "", xlab = "")

Our next step is to use this information to generate a posterior distribution of $θ$ at a fixed $Y$ value. In this example they used $Y = 40$ .

After running the above, for each of these thetas, we generated simulated values for the posterior distribution of $θ$ at $Y = 40$ as observed in this histogram (with green bars).

thetaPostEmp = rtheta[ y == 40 ]
hist(thetaPostEmp, breaks = 40, col = "chartreuse4", main = "",
     probability = TRUE, xlab = expression("posterior"~theta), ylim=c(0,40))

densPostTheory  =  dbeta(thetas, 42.47, 268.5)

You can check how this compares to the theoretical posterior distribution for $θ$ at $Y = 40$ :

hist(thetaPostEmp, breaks = 40, col = "chartreuse4", main = "",
  probability = TRUE, xlab = expression("posterior"~theta))
lines(thetas, densPostTheory, type="l", lwd = 3)

We can also check the means of both distributions computed above.

mean(thetaPostEmp) # Empirical

## [1] 0.1363246

dtheta = thetas[2]-thetas[1]
sum(thetas * densPostTheory * dtheta) # Theoretical

## [1] 0.1365727

Monte Carlo integration

We can use Monte Carlo integration instead and then check the agreement between our Monte Carlo sample thetaPostMC and our sample thetaPostEmp with a QQ plot:

thetaPostMC = rbeta(n = 1e6, 42.47, 268.5)
mean(thetaPostMC)

## [1] 0.1365813

qqplot(thetaPostMC, thetaPostEmp, type = "l", asp = 1)
abline(a = 0, b = 1, col = "blue")

densPost2 = dbeta(thetas, 42.47, 268.5)
mcPost2   = rbeta(1e6, 42.47, 268.5)
sum(thetas * densPost2 * dtheta)  # mean, by numeric integration

## [1] 0.1365727

mean(mcPost2)                     # mean, by MC

## [1] 0.1365487

thetas[which.max(densPost2)]      # MAP estimate

## [1] 0.134

quantile(mcPost2, c(0.025, 0.975))

##      2.5%     97.5% 
## 0.1007321 0.1767921

exercises Chapter 2

Exercise solution for Chapter 2, Part 2

Exercise 2.6

Using this prior

Monte Carlo integration

Sarah Cooper

Graduate Student in MIP

Related