Bayes Factors for Dummies
Kevin Boone has a great introduction to Bayesian statistics with some very simple examples to begin to comprehend how Bayesian statistics are calculated. But, as he notes, it is only scratching the surface of how to apply Bayesian stats.
Currently, people report Bayesian statistics in the form of a Bayes factor, which is a ratio of the probability of one hypothesis against the probability of the other (here, the alternative hypothesis against the null hypothesis). A review from Jarosz & Wiley (2014) and a methods paper from Masson (2011) boil down the Bayes factor into this broad equation:
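BF_{10} = p(H_{1}) / p(H_{0})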
Where p(x) = the probability of x occurring. Also, the notation for the Bayes Factor, BF_{10}, is read as "the Bayes Factor of H_{1} against H_{0}". Some may see BF_{01}, which is the inverse of this Bayes Factor, or the BF of H_{0} against H_{1}. Some people report BF_{10} whereas others report BF_{01}. So long as you understand the notation (the first character in the subscript is the numerator), you should be fine in interpreting the BF.
In this Bayes Factor post, I'll try to explain how to compute a Bayes Factor using some made-up data, explaining the equations along the way. Unlike with Kevin Boone's tutorial, some basic statistical knowledge is required -- ANOVA, sums of squares, and degrees of freedom won't be explained in this post. However, I'm assuming most people who are looking up information on Bayes Factors are already familiar with traditional, null hypothesis significance testing.
I would highly recommend first reading Kevin Boone's post and then this post! I'll also be referring to equations from Jarosz & Wiley (2014) and Masson (2011) -- which I recommend reading, but should be summarized succinctly (I hope) in this post.
Dogmeat, revisited
In Kevin Boone's Bayesian statistics for dummies, he uses an example of a horse named Dogmeat who is racing against another horse. Boone's example was created to be as simple as possible to show how the Bayes Theorem is computed to generate a probability.
I expand on Boone's data to eight different horses, each of which either came in 1st place or lost (i.e., didn't come in 1st place) in dry or rainy weather.
I generate my own data here:
Horse | FirstPlaceDry | FirstPlaceRain | LoseDry | LoseRain |
Dogmeat | 2 | 3 | 6 | 1 |
Fleetfoot | 2 | 0 | 1 | 9 |
Dobby | 1 | 0 | 1 | 10 |
Isabella | 2 | 0 | 0 | 10 |
Pignose | 1 | 0 | 2 | 9 |
Nana | 0 | 0 | 1 | 11 |
Felix | 0 | 0 | 5 | 7 |
Thuggy | 0 | 1 | 6 | 5 |
You can also download/view these data more easily by clicking this link to the Google Spreadsheet I created.
Like in Boone's example, Dogmeat "wins" (i.e., comes in first place) 5 times and "loses" (in our case, doesn't come in 1st place) 7 times. Also like Boone's example, Dogmeat wins 3 times and loses once when it's raining.
In total, Dogmeat races 12 times. To be consistent, there are only 12 first place finishes total across all horses.
I run a repeated measures ANOVA on these data to generate some statistics (you can view these data analyses on this spreadsheet in the "Stats" tab, this SPSS output, or this PDF). Place was divided into whether they came in first or not. Weather is the weather condition during the race:
Source | SS (Type III) | df | F | p | η_{p}^{2} |
Place | 162.00 | 1 | 63.000 | <0.001 | 0.900 |
Error (Place) | 18.00 | 7 | | | |
Weather | 40.50 | 1 | 7.177 | 0.032 | 0.506 |
Error (Weather) | 39.50 | 7 | | | |
Place X Weather | 60.50 | 1 | 5.465 | 0.052 | 0.438 |
Error (Place X Weather) | 77.50 | 7 | | | |
By traditional, null hypothesis significance testing (NHST) standards, our main effects of place and weather are significant whereas our interaction effect is not significant (some people may call this "near significant").
Two things we could use to calculate the BF are either the sums of squares (SS) or the partial eta squared (η_{p}^{2}). There exists a relationship between SS and η_{p}^{2}:
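η_{p}^{2} = SS_{effect} / (SS_{effect} + SS_{error})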
People who report statistics find value in η_{p}^{2} because it gives an indication of the strength of a finding. As I discussed in a previous post on Bayesian stats, NHST is really just asking how likely it is that your result is due to chance. η_{p}^{2} helps by asking "how much of my effect is actually due to the effect, and how much is due to error?" If we look at the main effect of place, the effect size is 0.900, indicating 90% of the effect is actually due to how the horses placed and only 10% is due to error. For weather, 50.6% is due to the weather effect whereas 49.4% is due to stuff we can't account for. Lastly, the place X weather interaction is 43.8% due to the interaction effect and 56.2% due to error, or stuff we can't account for. The larger the effect size, the more confident you can be in your p-value.
Why use a Bayes Factor when partial-eta-squared is basically doing the same thing?
On the contrary, they are doing different things. A Bayes Factor is trying to predict how likely your hypothesis will be in the future, whereas η_{p}^{2} is computing how much of your current dataset is explained by your hypothesis. BF asks "what is the likelihood this will occur the next time?" and η_{p}^{2} asks "what was the likelihood this was due to our observed effect?" The keyword between the two is likelihood.
So two statisticians, Akaike and Schwarz, created a series of likelihood functions (the Akaike information criterion and the Bayesian information criterion, respectively) that estimate the maximum likelihood of an event occurring (in our case, our event is based on η_{p}^{2}). You can read a comparison between the two here.
The Bayesian information criterion (BIC) is important for calculating a BF because it estimates the maximum likelihood of an event occurring. When you have two likelihood estimates, or BICs, you can calculate how much more likely one is to occur versus the other.
The Bayes Factor equation
These equations are compiled from Jarosz & Wiley (2014), Masson (2011), and Wagenmakers (2007). Don't fear, my Bayesian readers -- I'll be walking you through each of these computations. I just want to lay down the equations before we dive in.
The initial set up before computing your Bayes Factor and posterior odds is to gather your necessary data (step 1) and compute the unexplained variance (step 2).
Then (step 3 in the following sections), you need to compute the BIC for your null (H_{0}) and your alternative (H_{1}) hypotheses:
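BIC_{H1} = n × ln(SS_{error} / SS_{total}) + k_{1} × ln(n)

BIC_{H0} = n × ln((SS_{effect} + SS_{error}) / SS_{total}) + k_{0} × ln(n)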
Where n is the number of participants/subjects (horses in our case), ln() is the natural logarithm function, SS is a sum of squares (for either your effect, error, or total), and k is the number of free parameters (e.g. how many conditions are in H_{1} and how many are in H_{0}).
Then, you need to subtract BIC_{H0} from BIC_{H1}:
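ΔBIC_{10} = BIC_{H1} − BIC_{H0}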
Alternatively (step 3b), you could solve for this difference using this simplified equation:
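ΔBIC_{10} = n × ln(SS_{error} / (SS_{effect} + SS_{error})) + (k_{1} − k_{0}) × ln(n)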
You might notice that, inside the first natural log function, the division of SS_{error} by SS_{effect} + SS_{error} is equal to 1 − η_{p}^{2}. So you could also compute the BIC difference using the following formula:
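ΔBIC_{10} = n × ln(1 − η_{p}^{2}) + (k_{1} − k_{0}) × ln(n)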
Step 4, once we complete the BIC difference, we can then generate our Bayes Factor:
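BF_{01} ≈ e^(ΔBIC_{10} / 2)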
In this case, this would compute the BF for the null hypothesis against the alternative hypothesis. To get BF_{10 }, we simply find the inverse:
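BF_{10} = 1 / BF_{01}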
Step 5, we can calculate the odds of your alternative hypothesis occurring again (also known as your posterior probability):
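p(H_{1} | D) = BF_{10} / (BF_{10} + 1)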
Where D is your data. This equation reads: the probability of your alternative hypothesis happening given your data is equal to the Bayes Factor of your alternative hypothesis against your null hypothesis divided by that same Bayes Factor plus one. The decimal odds you get from this equation can be reported as:
"The likelihood of this event occurring again is X%."
Whew, that's a lot of algebra. But hey, I was a dummy like you before. So now I'll walk you through it, step-by-step!
Step 1: Get the numbers you know written down somewhere
You need:
- SS (condition)
- SS (error)
- η_{p}^{2 }
- The number of parameters in your null hypothesis model (k_{0})
- The number of parameters in your alternative hypothesis model (k_{1})
Let's just work with the main effect of place for now.
- SS(place) = 162
- SS(error) = 18
- η_{p}^{2 } = 0.9
- Number of free parameters in the null model (H_{0}: there is no difference in place): k_{0} = 0
- Number of free parameters in the alt model (H_{1}: there is a difference in place): k_{1} = 1
Step 2a: Calculate the unexplained variance for H_{0} and H_{1}
Step 2 can be calculated in two different ways, as shown in the Bayes Factor equation section. I'll explain one as 2a and the other as 2b. These will lead into 3a and 3b, but both will arrive at the same answer to be used in step 4. Bear with me, folks.
Unexplained variance is all the variance you can't account for (sounds pretty straightforward). In the previous section on effect size and η_{p}^{2}, I discussed how η_{p}^{2} is a measure of how much of your effect is due to your effect and how much is due to chance, or unexplained variance. We can calculate the unexplained variance using the sums of squares:
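UV_{H1} = SS_{error} / SS_{total} = 18 / 180 = 0.1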
How did I calculate SStotal, you ask?
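SS_{total} = SS_{effect} + SS_{error} = 162 + 18 = 180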
We can do this again for the unexplained variance in H_{0}. Just remember what the hypothesis is: there is no difference in place. This helps to understand that the unexplained variance for H_{0} is really asking how much variance is due to both the supposed alternative hypothesis AND the known error, treating your SS for place as part of the error term as well.
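UV_{H0} = (SS_{place} + SS_{error}) / SS_{total} = (162 + 18) / 180 = 1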
This isn't too surprising since there isn't a between-subjects measure (e.g. a group factor like sex or age range). If there were a between-subjects measure, the unexplained variance for H_{0} would be different from 1. But since there isn't, the unexplained variance for H_{0} will always be 1.
Step 2b: Another way to calculate unexplained variance between H_{0} and H_{1}
As noted in the Bayes Factor equation section, you can either compute BIC for each hypothesis separately and then subtract them in the end or you can compute the difference of the BIC in one fell swoop with a simplified equation. Some like the piece-wise equations, some like the one-big-wrench approach.
Again, the concept is to calculate unexplained variance between the two hypotheses. We can do that with the following equation:
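UV_{10} = SS_{error} / (SS_{effect} + SS_{error}) = 18 / (162 + 18) = 0.1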
Recall that 1-η_{p}^{2} is equal to the relationship above. You can alternately compute the unexplained variance like so:
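UV_{10} = 1 − η_{p}^{2} = 1 − 0.9 = 0.1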
Man, math is so fun!
While we are here, we should calculate the difference in free parameters. Luckily, that's a very easy equation. Using the number of free parameters for H1 (1) and the number of free parameters for H0 (0) we get:
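Δk = k_{1} − k_{0} = 1 − 0 = 1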
Step 3a: Estimating the difference of BIC
The BIC equations are:
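BIC_{H1} = n × ln(SS_{error} / SS_{total}) + k_{1} × ln(n)

BIC_{H0} = n × ln((SS_{effect} + SS_{error}) / SS_{total}) + k_{0} × ln(n)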
But hey! The first natural log'd term is something we computed already! Holler! We can rewrite these equations:
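BIC_{H1} = n × ln(UV_{H1}) + k_{1} × ln(n)

BIC_{H0} = n × ln(UV_{H0}) + k_{0} × ln(n)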
Where UV is "unexplained variance". We know n = 8 and our null model free parameters are 0 and our alternative model free parameters are 1. So, we plug-and-chug:
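BIC_{H1} = 8 × ln(0.1) + 1 × ln(8) = −18.42 + 2.08 = −16.34

BIC_{H0} = 8 × ln(1) + 0 × ln(8) = 0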
To get our difference in BIC, we use this equation:
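ΔBIC_{10} = BIC_{H1} − BIC_{H0} = −16.34 − 0 = −16.34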
Not too difficult so far, right? If you are just following along with step 2a and step 3a, skip to step 4. Otherwise, check out step 3b and how you get the same answer as in step 3a! Mathemagic, y'all.
Step 3b: Estimating the difference in BIC with the big-wrench
The simplified equation for the difference in BIC is:
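ΔBIC_{10} = n × ln(SS_{error} / (SS_{effect} + SS_{error})) + (k_{1} − k_{0}) × ln(n)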
We've calculated the first natural log term already in 2b, as well as the difference in free parameters. We can rewrite this equation:
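ΔBIC_{10} = n × ln(UV_{10}) + Δk × ln(n)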
Where UV_{10} is the unexplained variance of the alternative hypothesis divided by the unexplained variance of the null hypothesis (see: step 2b) and the change in k as k_{1}-k_{0} (also see step 2b). The only other number we need is n, which we know is 8. So we just plug-and-chug:
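ΔBIC_{10} = 8 × ln(0.1) + 1 × ln(8) = −16.34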
It's exactly like the answer in step 3a! Which makes sense, since this is just the simplified version of the step 3a equations.
Both step 3a and 3b converge at step 4.
Step 4: From BIC to Bayes Factor
The Bayes Factor transformation is:
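BF_{01} ≈ e^(ΔBIC_{10} / 2)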
Using our BIC difference we calculated in step 3, we get:
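BF_{01} = e^(−16.34 / 2) = e^(−8.17) ≈ 0.000283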
If we calculate the inverse BF:
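BF_{10} = 1 / BF_{01} ≈ 1 / 0.000283 ≈ 3534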
Here is an interpretation table from Jeffreys (1961, Appendix B):
BF_{XY}=... | Interpretation |
>100 | Decisive for X |
30-100 | Very strong for X |
10-30 | Strong for X |
3-10 | Moderate for X |
1-3 | Anecdotal for X |
1 | No evidence |
0.3-1 | Anecdotal for Y |
0.1-0.3 | Moderate for Y |
0.03-0.1 | Strong for Y |
0.01-0.03 | Very strong for Y |
<0.01 | Decisive for Y |
As mentioned in the very beginning, understanding the notation is important with Bayes Factors. If we use BF_{10}, then larger numbers generate more support in favor of the alternative hypothesis. Alternatively, BF_{01} is the value that is initially calculated, for which larger numbers generate more support in favor of the null hypothesis.
The general consensus is that a Bayes Factor of 3 or greater (or 1/3 or less) is considered roughly comparable evidence to a p-value of 0.05 or less.
Step 5: From Bayes Factor to likelihood
So what does it mean? Here's the equation for converting your BF into a likelihood:
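p(H_{1} | D) = BF_{10} / (BF_{10} + 1)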
Since we know BF_{10}, we just need to plug-and-chug:
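p(H_{1} | D) = 3534 / (3534 + 1) ≈ 0.9997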
Our posterior probability, or the likelihood this will occur next time, is 0.9997 or 99.97%. In contrast, the likelihood that this would not occur next time is 0.0003, 0.03%, or 1 out of 3333 times.
Compare this to the η_{p}^{2} result, 0.9. The interpretation would be that 90% of the variance is in fact due to the alternative hypothesis and not the null. Furthermore, the chance of observing a result this extreme if the null hypothesis were true is less than 0.001.
You can see in these two short explanations of the results that the BF likelihood is much easier to understand on a surface level. People can connect with: "Next time, there should be a 99% chance of more horses finishing after 1st place."
People don't connect to this as much: "Since 90% of the data on who comes in first are due to the horses finishing in first, and the likelihood of the amount of horses finishing in first or in any other position is less than 0.1%, then the chance of more horses finishing after 1st place is.... uh... well, I can't really predict the future with these stats, but if it's anything like the past then there's a pretty good chance!"
Finally, the way I would report Bayes Factors would be something like this:
We observed a main effect of place (F=63.00, p<.001, η_{p}^{2}=0.9, BF_{10}>100), indicating more horses will finish after first place with a 99.97% decisive likelihood of occurring again.
Alternatively, people have recently been adding separate Bayesian analysis sections into results sections.
I have not seen any original research articles in psychology or neuroscience that specifically only use Bayesian statistics or report only Bayes Factor -- at least, not in cognitive psychology or cognitive neuroscience. But that doesn't mean I won't.
Whew! That was long.
If you would like to practice calculating BF on your own with my data, I have provided the results you should see in the tables below:
Effect | SSeffect | SSerror | SStotal | partial-eta^2 | BF10 | p(H1|D) |
Weather | 40.5 | 39.5 | 80 | 0.506 | 5.949 | 0.8561 |
Place x Weather | 60.5 | 77.5 | 138 | 0.438 | 3.554 | 0.7804 |
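If you'd like to check these numbers programmatically, here is a small Python sketch of steps 2b through 5 (the function names are mine, not from any stats package):

```python
import math

def bf10_from_ss(ss_effect, ss_error, n, delta_k=1):
    """Approximate BF10 from ANOVA sums of squares via the BIC
    (Wagenmakers, 2007; Masson, 2011)."""
    # Step 2b: unexplained variance of H1 relative to H0
    uv10 = ss_error / (ss_effect + ss_error)
    # Step 3b: difference in BIC, in one fell swoop
    delta_bic10 = n * math.log(uv10) + delta_k * math.log(n)
    # Step 4: BF01 = e^(dBIC10 / 2); invert it to get BF10
    return 1.0 / math.exp(delta_bic10 / 2.0)

def posterior_h1(bf10):
    # Step 5: p(H1 | D) = BF10 / (BF10 + 1)
    return bf10 / (bf10 + 1.0)

# Main effect of weather: SSeffect = 40.5, SSerror = 39.5, n = 8 horses
bf = bf10_from_ss(40.5, 39.5, 8)
print(round(bf, 3), round(posterior_h1(bf), 4))  # matches the table above
```

Plugging in the other rows of the table should reproduce their BF10 and p(H1|D) columns as well.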
With pairwise comparisons for a two-way interaction, I just use the partial-eta^2 to calculate my BF:
Place x Weather pairwise | partial-eta^2 | BF10 | p(H1|D) |
Dry: Lose > 1st | 0.292 | 1.407 | 0.5846 |
Rain: Lose > 1st | 0.762 | 110.19 | 0.991 |
1st: Dry > Rain | 0.167 | 0.734 | 0.4234 |
Lose: Rain > Dry | 0.472 | 4.549 | 0.8198 |
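The same calculation works when all you have is η_{p}^{2}, as in the pairwise table above (again, a sketch under my own naming, not an established API):

```python
import math

def bf10_from_eta(eta_p_sq, n, delta_k=1):
    # Step 3b, with 1 - partial eta squared standing in for the SS ratio
    delta_bic10 = n * math.log(1.0 - eta_p_sq) + delta_k * math.log(n)
    # Step 4: invert BF01 = e^(dBIC10 / 2) to get BF10
    return 1.0 / math.exp(delta_bic10 / 2.0)

# "Rain: Lose > 1st" from the pairwise table: eta_p^2 = 0.762, n = 8
print(round(bf10_from_eta(0.762, 8), 2))
```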
As Boone says at the end of his page, this is only the surface of what Bayesian statistics are -- and in this case, what Bayes Factors are. Other things I didn't cover: where each component of the BF equation comes from; what do you do with 3-way interactions; how do you plot a Bayesian distribution; why doesn't this equation look like the Bayes Theorem; etc.
I can't say I can address all of these questions, as I am still trying to master the basics myself, but hopefully this puts you one step ahead in your education of Bayesian stats.
In case you missed it:
Read about the differences between traditional stats and Bayesian stats
Read about how counting your participants in your analysis can change your results
Hi Nick,
Thanks for an overall nice tutorial. I was confused about some of the formulas though, since it seems there is an inconsistency in what you write.
In step 2b you explain that SS_{error} / (SS_{error} + SS_{effect}) is basically the same as 1 − η_{p}^{2}. This makes sense to me because both are about unexplained variance. Also, computing 1 − .9 = .1 and .1/1 both give the same answer.
However, earlier in the text you say: "Alternatively (step 3b), you could solve for this difference using this simplified equation: ΔBIC_{10} = n × ln(SS_{effect} / (SS_{effect} + SS_{error})) + (k_{H1} − k_{H0}) × ln(n). You might notice inside the first natural log function, the division of SS_{effect} and SS_{effect} + SS_{error} is the inverse of η_{p}^{2}. You could compute the BIC using the following formula ΔBIC_{10} = n × ln(1 − η_{p}^{2}) + (k_{H1} − k_{H0}) × ln(n)." So here, you equate 1 − η_{p}^{2} to SS_{effect} / (SS_{effect} + SS_{error}). That's the opposite of what you write later and seems to be wrong: (1 − .9) ≠ (162/180).
Could it be that you confused SSerror and SSeffect in this part of your text? Or am I misunderstanding something?
Erik
Good catch, Erik! Never noticed that. Yeah, SSerror should be on top, not SSeffect. Afterwards, it does say it's the same log as calculated from before... maybe that's why no one mentioned this until now!
Hi,
Excuse me if this is an obvious question, but what is the proof (or support/rationale) that the formula in step 5, provides the likelihood that this finding would occur 'next time'. And can you explain what you mean by 'next time'.
Thanks,
Hi S, thanks for reading; not an obvious question at all. Although I personally don't have the algebraic aptitude to provide a full proof, you can find that equation in Masson's 2011 tutorial http://link.springer.com/article/10.3758/s13428-010-0049-5/fulltext.html (it's equation 6). "Next time" is a rough colloquialism for what Bayesian stats refer to as the "posterior probability," which refers to the strength of the finding and likelihood of occurrence. This would be in contrast to the "prior probability," or the likelihood of a particular occurrence from previous observations (I believe; that's my interpretation anyway).
Hope that helps!
This is a great post - extremely useful! Thanks for your help! I was wondering if I could get a little extra guidance on how you determine kH0 and kH1. Here you set kH1 = 1 and kH0 = 0. You mention this is due to your number of conditions, so I figured your kH1 here is because your df for your first factor is 1, and maybe kH0 is always 0. So if you had 6 conditions, would kH1 = 5 and kH0 = 0? Thanks so much!
Hi Alice, I think so, yes. Double-check with the Masson paper -- I think that outlines how to consider conditions pretty well -- but conditions are basically degrees of freedom. Check around equation 10 and "Example 1" in the Masson paper. He calls it "free parameters," but that's simply the number of parameters that can vary freely which is literally a Wikipedia definition of "degrees of freedom".
Here's a direct quote from his paper when talking about kH1 - kH0: "This difference will be equal to the degrees of freedom associated with an effect when null and alternative hypotheses are contrasted. For example, when the difference between two condition means are tested in an ANOVA, the alternative hypothesis has one additional free parameter (the size of the difference between the two condition means), relative to the null hypothesis, so that, in that case, k1−k0=1."
Again, I'm not an expert in this but my understanding of it is this post. Hope that helps!
Hi, would you please have a look at this: http://callistoscraters.com/node/101
I do exactly what you describe, yet I can't get to, for example, a BF10 of 1.074 for the "Priming(A)" model.
This is a two-by-two between-subjects, fixed-effects factorial design.
Could you advise on how I get the values in the Bayesian ANOVA table? (Created with Jasp).
Thank you very much!
Hi Perseus, this post wasn't intended to be a complete guide. Although I'm very much in support of the JASP/Jamovi teams, they are far more involved with the ins-and-outs of Bayesian analytics than I am. If you are already using JASP, then I would recommend reading the papers that support JASP (which can be found on the JASP site). Good luck!
I was just googling for Bayes Factor reporting and stumbled across this. Names in the example were a little too real. Thanks for the primer, though!