Bayes Factors for Dummies
Kevin Boone has a great introduction to Bayesian statistics, with some very simple examples showing how Bayesian probabilities are calculated. But, as he notes, it only scratches the surface of how to apply Bayesian stats.
Currently, people report Bayesian statistics in the form of a Bayes factor, which is a ratio of the probability of the observed data under the alternative hypothesis to the probability of the data under the null hypothesis. A review from Jarosz & Wiley (2014) and a methods paper from Masson (2011) boil the Bayes factor down to this broad equation:
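In the notation of those papers, with D standing for the observed data:

```latex
BF_{10} = \frac{p(D \mid H_1)}{p(D \mid H_0)}
```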
Where p(x) is the probability of x occurring. The notation for the Bayes factor, BF10, is read as "the Bayes factor of H1 against H0". You may also see BF01, which is the inverse of this Bayes factor, or the BF of H0 against H1. Some people report BF10 whereas others report BF01. So long as you understand the notation (the first number in the subscript is the numerator), you should be fine interpreting the BF.
In this Bayes Factor post, I'll try to explain how to compute a Bayes factor using some made-up data, explaining the equations along the way. Unlike Kevin Boone's tutorial, some basic statistical knowledge is required -- ANOVA, sums of squares, and degrees of freedom won't be explained in this post. However, I'm assuming most people looking up information on Bayes factors are already familiar with traditional null hypothesis significance testing (NHST).
I would highly recommend reading Kevin Boone's post first and then this one! I'll also be referring to equations from Jarosz & Wiley (2014) and Masson (2011) -- which I recommend reading, but which should be summarized succinctly (I hope) in this post.
In Kevin Boone's Bayesian statistics for dummies, he uses an example of a horse named Dogmeat who is racing against another horse. Boone's example was created to be as simple as possible to show how the Bayes Theorem is computed to generate a probability.
I expand on Boone's data with eight different horses that race against Dogmeat, where Dogmeat either comes in 1st place or loses (i.e. doesn't come in 1st place).
I generate my own data here:
Like in Boone's example, Dogmeat "wins" (i.e. comes in first place) 5 times and "loses" (in our case, doesn't come in 1st place) 7 times. Also like Boone's example, Dogmeat wins 3 times and loses once when it's raining.
In total, Dogmeat races 12 times. To be consistent, there are only 12 first place finishes total across all horses.
I run a repeated measures ANOVA on these data to generate some statistics (you can view these data analyses on this spreadsheet in the "Stats" tab, this SPSS output, or this PDF). Place was divided into whether they came in first or not. Weather is the weather condition during the race:
| Source | SS (Type III) | df | F | p | ηp2 |
|---|---|---|---|---|---|
| Place X Weather | 60.50 | 1 | 5.465 | 0.052 | 0.438 |
| Error (Place X Weather) | 77.50 | 7 | | | |
By traditional null hypothesis significance testing (NHST) standards, our main effects of place and weather are significant whereas our interaction effect is not (some people may call it "near significant").
Two things we could use to calculate the BF are either the sums of squares (SS) or the partial eta squared (ηp2). There exists a relationship between SS and ηp2:
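For any effect in the ANOVA, partial eta squared is the effect's sum of squares over the effect-plus-error sum of squares:

```latex
\eta_p^2 = \frac{SS_\text{effect}}{SS_\text{effect} + SS_\text{error}}
```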
People who report statistics find value in ηp2 because it gives an indication of the strength of a finding. As I discussed in a previous post on Bayesian stats, NHST is really just asking how likely your result is due to chance. ηp2 helps by asking "how much of the variance is actually due to the effect and how much is due to error?" If we look at the main effect of place, ηp2 is 0.9, indicating 90% of the relevant variance is due to how the horses placed and only 10% is due to error. For weather, 50.6% of the variance is due to the weather effect whereas 49.4% is due to stuff we can't account for. Lastly, for the place X weather interaction, 43.8% of the variance is due to the interaction and 56.2% is due to error or stuff we can't account for. The larger the effect size, the more confident you can be in your p-value.
Why use a Bayes Factor when partial-eta-squared is basically doing the same thing?
Actually, they are doing different things. A Bayes factor estimates how likely your hypothesis is to hold up in the future, whereas ηp2 computes how much of your current dataset is explained by your hypothesis. BF asks "what is the likelihood this will occur the next time?" and ηp2 asks "what was the likelihood this was due to our observed effect?" The key word shared between the two is likelihood.
So two statisticians, Akaike and Schwarz, created a pair of likelihood-based measures (the Akaike information criterion and the Bayesian information criterion, respectively) that estimate the likelihood of a model given the data (in our case, our model's fit is based on ηp2). You can read a comparison between the two here.
The Bayesian information criterion (BIC) is important for calculating a BF because it estimates the likelihood of a hypothesis given the data. When you have two likelihood estimates, or BICs, you can calculate how much more likely one hypothesis is than the other.
The Bayes Factor equation
These equations are compiled from Jarosz & Wiley (2014), Masson (2011), and Wagenmakers (2007). Don't fear, my Bayesian readers -- I'll be walking you through each of these computations. I just want to lay down the equations before we dive in.
The initial set up before computing your Bayes Factor and posterior odds is to gather your necessary data (step 1) and compute the unexplained variance (step 2).
Then (step 3 in the following sections), you need to compute the BIC for your null (H0) and your alternative (H1) hypotheses:
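Following Masson's (2011) BIC approximation, where the first term in each equation is the proportion of variance left unexplained under that hypothesis:

```latex
BIC_{H_1} = n \ln\!\left(\frac{SS_\text{error}}{SS_\text{effect} + SS_\text{error}}\right) + k_1 \ln(n)

BIC_{H_0} = n \ln\!\left(\frac{SS_\text{effect} + SS_\text{error}}{SS_\text{effect} + SS_\text{error}}\right) + k_0 \ln(n)
```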
Where n is the number of participants/subjects (horses in our case), ln() is the natural logarithm function, SS is a sum of squares (for either your effect, error, or total), and k is the number of free parameters (i.e. how many conditions are in H1 and how many are in H0).
Then, you need to subtract BICH0 from BICH1:
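In symbols:

```latex
\Delta BIC_{10} = BIC_{H_1} - BIC_{H_0}
```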
Alternatively (step 3b), you could solve for this difference using this simplified equation:
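That simplified equation, from Masson (2011):

```latex
\Delta BIC_{10} = n \ln\!\left(\frac{SS_\text{error}}{SS_\text{effect} + SS_\text{error}}\right) + (k_1 - k_0)\ln(n)
```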
You might notice that inside the first natural log function, the division of SSerror by SSeffect + SSerror is equal to 1 - ηp2 (the complement of ηp2, i.e. the unexplained variance). You could therefore compute the BIC difference using the following formula:
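Substituting 1 - ηp2 for the unexplained variance:

```latex
\Delta BIC_{10} = n \ln\!\left(1 - \eta_p^2\right) + (k_1 - k_0)\ln(n)
```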
Step 4, once we complete the BIC difference, we can then generate our Bayes Factor:
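From Wagenmakers (2007), the BIC difference approximates a Bayes factor:

```latex
BF_{01} \approx e^{\,\Delta BIC_{10}/2}
```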
In this case, this would compute the BF for the null hypothesis against the alternative hypothesis. To get BF10 , we simply find the inverse:
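```latex
BF_{10} = \frac{1}{BF_{01}} \approx e^{-\Delta BIC_{10}/2}
```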
Step 5, we can calculate the odds of your alternative hypothesis occurring again (also known as your posterior probability):
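Assuming equal prior odds for the two hypotheses (as Masson, 2011, does):

```latex
p(H_1 \mid D) = \frac{BF_{10}}{BF_{10} + 1}
```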
Where D is your data. This equation reads: the probability of your alternative hypothesis given your data is equal to the Bayes factor of your alternative hypothesis against your null hypothesis, divided by that same Bayes factor plus one. The decimal odds you get from this equation can be reported as:
The likelihood of this event occurring again is X%.
Whew, that's a lot of algebra. But hey, I was a dummy like you before. So now I'll walk you through it, step-by-step!
Step 1: Get the numbers you know written down somewhere
- SS (condition)
- SS (error)
- The number of parameters in your null hypothesis model (k0)
- The number of parameters in your alternative hypothesis model (k1)
Let's just work with the main effect of place for now.
- SS(place) = 162
- SS(error) = 18
- ηp2 = 0.9
- Number of parameters in the null model (H0: there is no difference in place): k0 = 0
- Number of parameters in the alternative model (H1: there is a difference in place): k1 = 1
Step 2a: Calculate the unexplained variance for H0 and H1
Step 2 can be calculated in two different ways, as shown in the Bayes Factor equation section. I'll explain one as 2a and the other as 2b. These lead into 3a and 3b, but both arrive at the same answer to be used in step 4. Bear with me, folks.
Unexplained variance is all the variance you can't account for (sounds pretty straightforward). In the previous section on effect size and ηp2, I discussed how ηp2 is a measure of how much of your effect is due to the effect itself and how much is due to chance, or unexplained variance. We can calculate the unexplained variance using the sums of squares:
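For H1, using the numbers from step 1:

```latex
UV_{H_1} = \frac{SS_\text{error}}{SS_\text{total}} = \frac{18}{180} = 0.1
```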
How did I calculate SStotal, you ask?
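By adding the effect and error terms:

```latex
SS_\text{total} = SS_\text{effect} + SS_\text{error} = 162 + 18 = 180
```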
We can do this again for the unexplained variance in H0. Just remember what the hypothesis is: there is no difference in place. This helps in understanding that the unexplained variance for H0 is really asking how much variance is due to both the supposed effect AND the known error -- treating your SS for place as part of the error term as well.
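So for H0:

```latex
UV_{H_0} = \frac{SS_\text{effect} + SS_\text{error}}{SS_\text{total}} = \frac{162 + 18}{180} = 1
```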
This isn't too surprising since there isn't a between-subjects measure (e.g. a group factor like sex or age range). If there were a between-subjects measure, the unexplained variance for H0 would differ from 1. But since there isn't, the unexplained variance for H0 will always be 1.
Step 2b: Another way to calculate unexplained variance between H0 and H1
As noted in the Bayes Factor equation section, you can either compute the BIC for each hypothesis separately and then subtract them at the end, or you can compute the difference of the BICs in one fell swoop with a simplified equation. Some like the piecewise equations, some like the one-big-wrench approach.
Again, the concept is to calculate unexplained variance between the two hypotheses. We can do that with the following equation:
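Dividing the two unexplained variances from step 2a:

```latex
\frac{UV_{H_1}}{UV_{H_0}} = \frac{SS_\text{error}/SS_\text{total}}{(SS_\text{effect} + SS_\text{error})/SS_\text{total}} = \frac{0.1}{1} = 0.1
```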
Recall that 1-ηp2 is equal to the relationship above. You can alternately compute the unexplained variance like so:
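With ηp2 = 0.9 for place:

```latex
1 - \eta_p^2 = 1 - 0.9 = 0.1
```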
Man, math is so fun!
While we are here, we should calculate the difference in free parameters. Luckily, that's a very easy equation. Using the number of free parameters for H1 (1) and the number of free parameters for H0 (0) we get:
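In symbols:

```latex
\Delta k = k_1 - k_0 = 1 - 0 = 1
```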
Step 3a: Estimating the difference of BIC
The BIC equations are:
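As given in the Bayes Factor equation section:

```latex
BIC_{H_1} = n \ln\!\left(\frac{SS_\text{error}}{SS_\text{effect} + SS_\text{error}}\right) + k_1 \ln(n)

BIC_{H_0} = n \ln\!\left(\frac{SS_\text{effect} + SS_\text{error}}{SS_\text{effect} + SS_\text{error}}\right) + k_0 \ln(n)
```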
But hey! The first natural log'd term is something we computed already! Holler! We can rewrite these equations:
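With the unexplained variance substituted in:

```latex
BIC_{H_1} = n \ln(UV_{H_1}) + k_1 \ln(n), \qquad BIC_{H_0} = n \ln(UV_{H_0}) + k_0 \ln(n)
```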
Where UV is "unexplained variance". We know n = 8 and our null model free parameters are 0 and our alternative model free parameters are 1. So, we plug-and-chug:
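Using n = 8, UV for H1 of 0.1, and UV for H0 of 1:

```latex
BIC_{H_1} = 8 \ln(0.1) + 1 \cdot \ln(8) \approx -18.421 + 2.079 = -16.341

BIC_{H_0} = 8 \ln(1) + 0 \cdot \ln(8) = 0
```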
To get our difference in BIC, we use this equation:
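That is:

```latex
\Delta BIC_{10} = BIC_{H_1} - BIC_{H_0} = -16.341 - 0 = -16.341
```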
Not too difficult so far, right? If you are just following along with step 2a and step 3a, skip to step 4. Otherwise, check out step 3b and how you get the same answer as in step 3a! Mathemagic, y'all.
Step 3b: Estimating the difference in BIC with the big-wrench
The simplified equation for the difference in BIC is:
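As given in the Bayes Factor equation section:

```latex
\Delta BIC_{10} = n \ln\!\left(\frac{SS_\text{error}/SS_\text{total}}{(SS_\text{effect} + SS_\text{error})/SS_\text{total}}\right) + (k_1 - k_0)\ln(n)
```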
We've calculated the first natural log term already in 2b, as well as the difference in free parameters. We can rewrite this equation:
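In the more compact notation:

```latex
\Delta BIC_{10} = n \ln(UV_{10}) + \Delta k \,\ln(n)
```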
Where UV10 is the unexplained variance of the alternative hypothesis divided by the unexplained variance of the null hypothesis (see: step 2b) and the change in k as k1-k0 (also see step 2b). The only other number we need is n, which we know is 8. So we just plug-and-chug:
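With UV10 = 0.1 and Δk = 1 from step 2b:

```latex
\Delta BIC_{10} = 8 \ln(0.1) + 1 \cdot \ln(8) \approx -18.421 + 2.079 = -16.341
```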
It's exactly like the answer in step 3a! Which makes sense, since this is just the simplified version of the step 3a equations.
Both step 3a and 3b converge at step 4.
Step 4: From BIC to Bayes Factor
The Bayes Factor transformation is:
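```latex
BF_{01} \approx e^{\,\Delta BIC_{10}/2}
```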
Using our BIC difference we calculated in step 3, we get:
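```latex
BF_{01} \approx e^{-16.341/2} = e^{-8.171} \approx 0.00028
```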
If we calculate the inverse BF:
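```latex
BF_{10} = \frac{1}{BF_{01}} \approx \frac{1}{0.00028} \approx 3535
```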
Here is an interpretation table adapted from Jeffreys (1961, Appendix B), where X is the hypothesis in the numerator of the BF and Y is the hypothesis in the denominator:
| BF | Evidence |
|---|---|
| >100 | Decisive for X |
| 30-100 | Very strong for X |
| 10-30 | Strong for X |
| 3-10 | Moderate for X |
| 1-3 | Anecdotal for X |
| 0.3-1 | Anecdotal for Y |
| 0.1-0.3 | Moderate for Y |
| 0.03-0.1 | Strong for Y |
| 0.01-0.03 | Very strong for Y |
| <0.01 | Decisive for Y |
As mentioned at the very beginning, understanding the notation is important with Bayes factors. If we use BF10, then larger numbers indicate more support for the alternative hypothesis. Alternately, with BF01 (the value that is initially calculated from the BIC difference), larger numbers indicate more support for the null hypothesis.
The general consensus is that a Bayes factor of 3 or greater (or 1/3 or less) carries roughly the same evidential weight as a p-value of 0.05 or less.
Step 5: From Bayes Factor to likelihood
So what does it mean? Here's the equation for converting your BF into a likelihood:
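```latex
p(H_1 \mid D) = \frac{BF_{10}}{BF_{10} + 1}
```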
Since we know BF10, we just need to plug-and-chug:
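```latex
p(H_1 \mid D) = \frac{3535}{3535 + 1} \approx 0.9997
```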
Our posterior probability, or the likelihood this will occur next time, is 0.9997 or 99.97%. In contrast, the likelihood that this would not occur next time is 0.0003, 0.03%, or 1 out of 3333 times.
Compare this to the ηp2 result, 0.9. The interpretation would be that 90% of the variance in the data is explained by the alternative hypothesis rather than the null. Furthermore, the probability of obtaining these data if the null hypothesis were true is less than 0.001.
You can see in these two short explanations of the results that the BF likelihood is much easier to understand on a surface level. People can connect with "Next time, there should be a 99% chance of more horses finishing after 1st place."
People don't connect to this as much: "Since 90% of the data on who comes in first are due to the horses finishing in first, and the likelihood of the amount of horses finishing in first or in any other position is less than 0.1%, then the chance of more horses finishing after 1st place is.... uh... well, I can't really predict the future with these stats, but if it's anything like the past then there's a pretty good chance!"
Finally, the way I would report Bayes Factors would be something like this:
We observed a main effect of place (F=63.00, p<.001, ηp2=0.9, BF10>100), indicating more horses will finish after first place with a 99.97% decisive likelihood of occurring again.
Alternately, people have recently been adding separate Bayesian analysis sections into results sections.
I have not seen any original research articles in psychology or neuroscience that use only Bayesian statistics or report only Bayes factors -- at least, not in cognitive psychology or cognitive neuroscience. But that doesn't mean I won't.
Whew! That was long.
If you would like to practice calculating BF on your own with my data, I have provided the results you should see in the tables below:
| Source | SS (effect) | SS (error) | SS (total) | ηp2 | BF10 | p(H1\|D) |
|---|---|---|---|---|---|---|
| Place x Weather | 60.5 | 77.5 | 138 | 0.438 | 3.554 | 0.7804 |
With pairwise comparisons for a two-way interaction, I just use the ηp2 to calculate my BF:
| Place x Weather pairwise | ηp2 | BF10 | p(H1\|D) |
|---|---|---|---|
| Dry: Lose > 1st | 0.292 | 1.407 | 0.5846 |
| Rain: Lose > 1st | 0.762 | 110.19 | 0.991 |
| 1st: Dry > Rain | 0.167 | 0.734 | 0.4234 |
| Lose: Rain > Dry | 0.472 | 4.549 | 0.8198 |
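If you'd rather check your work programmatically, here is a minimal Python sketch of the whole pipeline. The function names (`bf10_from_ss`, `posterior_h1`) are my own, and it assumes the fully within-subjects design used in this post, where the unexplained variance under H0 is 1:

```python
from math import exp, log

def bf10_from_ss(ss_effect, ss_error, n, delta_k=1):
    """BIC approximation to BF10 (Wagenmakers, 2007; Masson, 2011).

    Assumes a fully within-subjects design, so the unexplained
    variance under H0 is 1 and under H1 it is
    SSerror / (SSeffect + SSerror), i.e. 1 - partial eta squared.
    """
    uv1 = ss_error / (ss_effect + ss_error)            # unexplained variance, H1
    uv0 = 1.0                                          # unexplained variance, H0
    delta_bic = n * log(uv1 / uv0) + delta_k * log(n)  # BIC(H1) - BIC(H0)
    bf01 = exp(delta_bic / 2)                          # BF of H0 against H1
    return 1 / bf01                                    # invert to get BF10

def posterior_h1(bf10):
    """p(H1 | D), assuming equal prior odds for H0 and H1."""
    return bf10 / (bf10 + 1)

# Main effect of place: SS(place) = 162, SS(error) = 18, n = 8 horses
bf_place = bf10_from_ss(162, 18, n=8)
print(round(bf_place))                   # ~3535: decisive on Jeffreys' scale
print(round(posterior_h1(bf_place), 4))  # 0.9997

# Place x Weather interaction: SS = 60.5, SS(error) = 77.5
bf_int = bf10_from_ss(60.5, 77.5, n=8)
print(round(bf_int, 3))                  # 3.554
print(round(posterior_h1(bf_int), 4))    # 0.7804
```

Running this reproduces the numbers in the tables above, which is a handy sanity check on the hand calculations.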
As Boone says at the end of his page, this is only the surface of what Bayesian statistics are -- and in this case, what Bayes Factors are. Other things I didn't cover: where each component of the BF equation comes from; what do you do with 3-way interactions; how do you plot a Bayesian distribution; why doesn't this equation look like the Bayes Theorem; etc.
I can't say I can address all of these questions, as I am still trying to master the basics myself, but hopefully this puts you one step ahead in your education of Bayesian stats.