
This model is not perfect: it does not exactly fit the data, and it describes continuous-valued quantities while game scores and point spreads are discrete. However, such a model provides an approximation that can be used to assign probabilities to events. Indeed:

P(y > 0 | x) = P(z + x > 0 | x) = P(z > -x | x) = 1 - P(z < -x | x)

As z follows a Gaussian distribution, we can easily compute its Cumulative Distribution Function (CDF).

**4) Making use of the approximated distribution of z|x, compute the following probabilities:**

- **P7 = Pr(Favorite wins | point spread = 3.5)**
- **P8 = Pr(Favorite wins | point spread = 8.5)**
- **P9 = Pr(Favorite wins | point spread = 9)**

We derive the calculation for P7:

$$\textrm{P7} = \textrm{Pr(Favorite wins | point spread} = 3.5) = \textrm{Pr}(y > 0 \mid x=3.5) = \textrm{Pr}(z+x>0 \mid x=3.5) = \textrm{Pr}(z>-x \mid x=3.5) = 1-\textrm{Pr}(z<-x \mid x=3.5).$$

```python
# dist is the Gaussian approximation of z fitted earlier in the notebook
P7 = 1 - dist.cdf(-3.5)
P8 = 1 - dist.cdf(-8.5)
P9 = 1 - dist.cdf(-9)

print("P7 = " + str(P7))
print("P8 = " + str(P8))
print("P9 = " + str(P9))
```

We see that the probability for a 3.5-point favorite agrees with the empirical value given earlier, while the probabilities for 8.5- and 9-point favorites make more sense than the empirical values derived before.

# Posterior inference (Bayesian Data Analysis, Gelman et al., Chapter 2)

This exercise illustrates how to do posterior inference using standard probability distributions introduced in the class.

Suppose you have a Beta(4,4) prior distribution on the probability $\theta$ that a coin will yield a "head" when spun in a specified manner.
The coin is independently spun ten times, and "heads" appear fewer than 3 times. You don't know how many heads were seen, only that their number is less than 3. We denote by y the random variable giving the number of heads obtained after the 10 throws.

**1) Write the prior probability distribution of $\theta$ and the conditional y|$\theta$.**

$\theta$: probability that a coin will yield a "head", $p(\theta) \propto \theta^{3}(1-\theta)^{3}.$

y: random variable giving the number of heads over the n = 10 independent throws, $p(y=k|\theta) = \binom{n}{k}\theta^{k}(1-\theta)^{n-k}.$

**2) Write the data likelihood.**

$$
\begin{align}
p(y<3|\theta) = p(y=0|\theta) + p(y=1|\theta) + p(y=2|\theta) = (1-\theta)^{10} + 10\theta(1-\theta)^{9} + 45\theta^{2}(1-\theta)^{8}
\end{align}
$$

**3) Calculate the posterior density of $\theta$ (up to a constant).**

We want to update our knowledge of the probability $\theta$ of getting a "head", knowing that over 10 throws there were fewer than 3 heads. In other words, we want to infer the quantity $p(\theta|y<3)$.
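Before computing the posterior, the closed-form likelihood derived in question 2 can be sanity-checked against the binomial CDF. This is a standalone check, not part of the original exercise; `scipy` is assumed available:

```python
import numpy as np
from scipy.stats import binom

def likelihood(theta):
    # Closed-form p(y < 3 | theta): the three binomial terms expanded above
    return (1 - theta)**10 + 10*theta*(1 - theta)**9 + 45*theta**2*(1 - theta)**8

# p(y < 3 | theta) is exactly the Binomial(10, theta) CDF evaluated at 2
for theta in (0.1, 0.3, 0.5):
    assert np.isclose(likelihood(theta), binom.cdf(2, 10, theta))
print("closed-form likelihood matches binom.cdf(2, 10, theta)")
```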
Making use of Bayes' rule, we can write $p(\theta|y<3) \propto p(y<3|\theta)p(\theta)$, which gives:

$$
\begin{align}
p(\theta|y<3) \propto \theta^{3}(1-\theta)^{13} + 10\theta^{4}(1-\theta)^{12} + 45\theta^{5}(1-\theta)^{11}
\end{align}
$$

**4) Plot the posterior distribution of $\theta$ (up to a constant).**

```python
theta = np.linspace(0, 1, 100)
# Unnormalized posterior: likelihood times the Beta(4,4) prior kernel
posterior_theta = theta**3*(1 - theta)**13 + 10*theta**4*(1 - theta)**12 + 45*theta**5*(1 - theta)**11

plt.figure()
plt.plot(theta, posterior_theta)
plt.xlabel("theta")
plt.title("Posterior distribution of theta (up to a constant)")
plt.show()
```

# Predictive prior distribution (Bayesian Data Analysis, Gelman et al., Chapter 2)

In this exercise, we show how to incorporate all the information we have about the parameters of an experiment in order to derive a predictive prior over the results of this experiment.

Let y be the number of 6's in 1000 independent rolls of a particular real die, which may be unfair. Let $\theta$ be the probability that the die lands on 6. We assume the following prior distribution for $\theta$:

$$
\begin{align}
Pr(\theta = \frac{1}{12}) & = 0.25 \\
Pr(\theta = \frac{1}{6}) & = 0.5 \\
Pr(\theta = \frac{1}{4}) & = 0.25.
\end{align}
$$

**1) Using the normal approximation, give the conditional distribution p(y|$\theta$).**

In order to approximate p(y|$\theta$) by a normal distribution, we need to estimate its mean and variance. We know that the experiment consists of 1000 independent rolls of the die, so we can model p(y|$\theta$) by a binomial distribution.
This distribution has mean $\mu = n\theta$, where n is the number of trials, and variance $\sigma^{2} = n\theta(1-\theta)$. Therefore we can compute estimates of the mean and variance of y for the different values of $\theta$. For $\theta = 0.25$, for instance, we have $\mu = 1000 \times 0.25 = 250$ and $\sigma^{2} = 1000 \times 0.25 \times 0.75 \approx 188$, so p(y|$\theta$=0.25) $\approx \mathcal{N}(250, 188)$.

**2) Give an approximate prior distribution for p(y) and plot it.**

Using the law of total probability, the prior distribution p(y) can be written:

$$
\begin{align}
p(y) = \sum_i p(y, \theta = p_i) = \sum_i p(y|\theta = p_i)p(\theta = p_i)
\end{align}
$$

Making use of the previous result, we can compute approximate distributions for the different values of $\theta$. The approximate prior predictive distribution for y is:

$$
\begin{align}
p(y) = \frac{1}{4}\mathcal{N}(83, 76) + \frac{1}{2}\mathcal{N}(167, 139) + \frac{1}{4}\mathcal{N}(250, 188)
\end{align}
$$

```python
y = np.arange(50, 300)
# Mixture of the three normal approximations, weighted by the prior on theta
pred_prior_y = 0.25*norm.pdf(y, 83, np.sqrt(76)) + 0.5*norm.pdf(y, 167, np.sqrt(139)) + 0.25*norm.pdf(y, 250, np.sqrt(188))

plt.figure()
plt.plot(y, pred_prior_y)
plt.xlabel('y')
plt.ylabel('density')
plt.title('Approximate predictive prior distribution')
plt.show()
```

**3) Give approximate 5%, 25%, 50%, 75%, 95% points for the distribution of y.**

The prior predictive distribution of y is a sum of three Gaussian distributions. Moreover, we can see on the plot above that there is almost no overlap between them.
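The rounded parameters of the three mixture components can be reproduced directly from the binomial moment formulas. A standalone check:

```python
n = 1000
# (mean, variance) of Binomial(n, theta) for each prior value of theta,
# rounded as in the mixture written above
params = [(round(n*t), round(n*t*(1 - t))) for t in (1/12, 1/6, 1/4)]
print(params)  # [(83, 76), (167, 139), (250, 188)]
```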
So we can consider that 1/4 of the distribution's mass is in the first hump, 1/2 in the second, and 1/4 in the last. We introduce the following probability distributions: $y_1 \sim \mathcal{N}(\mu = 83, \sigma^2 = 76)$, $y_2 \sim \mathcal{N}(\mu = 167, \sigma^2 = 139)$, $y_3 \sim \mathcal{N}(\mu = 250, \sigma^2 = 188)$.

We also recall that, for a random variable X with CDF $F_X$, the $\alpha$% point of the distribution of X is defined by:

$$P(X < x) = \alpha \iff F_X(x) = \alpha \iff x = F_{X}^{-1}(\alpha). $$

Relying on this, we observe that the 5% point of y falls entirely within the first hump. However, this hump contains only 1/4 of the mass, so the 5% point of y is given by the 20% point of $y_1$ (since 0.05 / 0.25 = 0.20).

```python
dist1 = norm(83, np.sqrt(76))
five_percent_points = dist1.ppf(0.20)
print("5% point = " + str(np.rint(five_percent_points)))
```

We know that the first hump contains 25% of the distribution of y. Therefore, the 25% point lies between the first and second humps, at approximately 120 on the graph.

The second hump contains 1/2 of the distribution of y, so the 50% point is given by its mean, 167.

The last hump contains 1/4 of the distribution, so the 75% point lies between the second and third humps, at approximately 207.

Finally, the last hump contains the last 25% of the distribution. Before this hump we have 75% of the distribution of y, and to reach 95% we need the remaining 20%, which is entirely determined by $y_3$.
Since this distribution represents only 1/4 of y's mass, the 95% point of y is given by the 80% point of $y_3$ (since 0.20 / 0.25 = 0.80).

```python
dist3 = norm(250, np.sqrt(188))
ninety_five_percent_points = dist3.ppf(0.80)
print("95% point = " + str(np.rint(ninety_five_percent_points)))
```
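The hump-by-hump reasoning above can be cross-checked by evaluating the mixture CDF on a fine grid and inverting it numerically. This sketch (not part of the original notebook) recomputes all five points at once:

```python
import numpy as np
from scipy.stats import norm

y = np.linspace(0, 400, 100001)
# CDF of the three-component normal mixture derived earlier
mix_cdf = (0.25 * norm.cdf(y, 83, np.sqrt(76))
           + 0.50 * norm.cdf(y, 167, np.sqrt(139))
           + 0.25 * norm.cdf(y, 250, np.sqrt(188)))

points = {}
for alpha in (0.05, 0.25, 0.50, 0.75, 0.95):
    # first grid value where the mixture CDF reaches alpha
    points[alpha] = y[np.searchsorted(mix_cdf, alpha)]
    print("%.0f%% point = %.0f" % (100*alpha, points[alpha]))
```

The 5%, 50%, and 95% values agree with the per-hump computations above (about 76, 167, and 262), and the 25% and 75% values land in the gaps between humps (roughly 118 and 207), consistent with the graphical readings of about 120 and 207.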