Two common problems in statistics are model selection and robustness testing. A big issue is that the usefulness of a test statistic is conditional on the model being correctly specified. This means that we cannot rely on simple tests alone.

One broad category of solutions to this problem is to try a range of different models and somehow compare them. In practice this normally involves sequentially adding or removing covariates and seeing how the overall model performance changes.

At the end of this process, the “best” model is selected and used for all future inference, while the other models are discarded.

However, this approach is far from problem-free. For example, it can create false confidence in our results if we ignore model uncertainty. If we use test statistics that are conditional on our model being valid, then we are ignoring the additional uncertainty about the validity of the model itself.

Bayesian model averaging seeks to fix this by directly incorporating model uncertainty into our analysis. If we somehow know that model A has a 70% chance of being the correct model while model B has a 30% chance, then we can include this uncertainty in our output by taking a weighted average of the coefficients that the models produce.

$$ \beta = \beta_A \cdot P(\text{model A is correctly specified}) + \beta_B \cdot P(\text{model B is correctly specified}) $$
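The weighted average above can be sketched in a few lines of Python. The coefficient values and model probabilities here are made up for illustration:

```python
# Hypothetical coefficient estimates from models A and B
beta_a, beta_b = 2.5, 1.8

# Assumed probabilities that each model is correctly specified
p_a, p_b = 0.7, 0.3

# Model-averaged coefficient: weight each estimate by its model probability
beta_avg = beta_a * p_a + beta_b * p_b
print(beta_avg)  # roughly 2.29
```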

I glossed over how we would get the probabilities of models being correct because it is somewhat complex. The good news is that this kind of complexity is easy to hand over to a stats package (e.g. [pyBMA](/post/pyBMA)). The key takeaway is that averaging across models allows us to produce better estimates than any single model.
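To give a flavour of what the stats package is doing under the hood, here is a hedged sketch of one common approximation: weighting each model by $\exp(-\mathrm{BIC}_k / 2)$, normalised across models. The BIC values below are invented for illustration; a real package would compute them from fitted models:

```python
import math

# Made-up BIC values for two candidate models (lower BIC = better fit,
# penalised for complexity)
bics = {"model_A": 210.3, "model_B": 212.0}

# Subtract the minimum BIC before exponentiating for numerical stability
min_bic = min(bics.values())
raw = {m: math.exp(-(b - min_bic) / 2) for m, b in bics.items()}

# Normalise so the weights sum to 1; these serve as approximate
# posterior model probabilities
total = sum(raw.values())
probs = {m: w / total for m, w in raw.items()}
print(probs)  # roughly 0.70 for model_A and 0.30 for model_B
```

With these weights in hand, the coefficient averaging from the equation above can be applied directly.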

#### Testing the claim

The value proposition of model averaging is that it allows us to make better models, but this claim needs to be substantiated. There is a wide literature detailing the success of BMA in a range of different applications:

Raftery, A. E., Madigan, D. and Volinsky, C. T. (1995) Accounting for Model Uncertainty in Survival Analysis Improves Predictive Performance