Approximately Bayesian Reasoning: Knightian Uncertainty, Goodhart, and the Look-Elsewhere Effect


This line of reasoning is interesting and I think it deserves some empirical exploration, which could be done with modern LLMs and agents.

E.g. make a complicated process that generates a distribution of agents via RLHF on a variety of base models and a variety of RLHF datasets, and then test all those agents on some simple tests. Pick the best agents according to their mean scores on those simple tests, and then vet them with some much more thorough tests.

I think such an experiment could be done more easily than that: simply apply standard Bayesian learning to a test set of observations and a large set of hypotheses, some of which are themselves probabilistic, yielding a situation with both Knightian and statistical uncertainty, in which you would normally expect to be able to observe Regressional Goodhart/the Look-Elsewhere Effect. Repeat this, and confirm that the effect does indeed occur without the statistical adjustment, and then that applying the adjustment makes it go away (at least to second order).
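
A minimal version of this experiment can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: the "agents" are reduced to fixed per-item success rates, and the test sizes and seed are illustrative.

```python
import random
from statistics import mean

random.seed(1)

# Hypothetical miniature of the experiment: many "agents" with identical
# true quality, screened on a small cheap test; the winner is then vetted
# on a much more thorough test.
N_AGENTS = 50
SMALL_TEST = 20    # items in the cheap screening test
BIG_TEST = 5000    # items in the thorough vetting test
TRUE_P = 0.6       # every agent's true per-item success rate

def score(p, n):
    """Fraction of n i.i.d. test items passed by an agent of true quality p."""
    return mean(1 if random.random() < p else 0 for _ in range(n))

screening = [score(TRUE_P, SMALL_TEST) for _ in range(N_AGENTS)]
best = max(range(N_AGENTS), key=lambda i: screening[i])
vetted = score(TRUE_P, BIG_TEST)

# The winner's screening score predictably exceeds its vetted score:
# regressional Goodhart, produced purely by selecting on noisy estimates.
print(screening[best], vetted)
```

Since all agents have the same true quality, the apparent winner wins purely through estimation error, and the thorough test reveals the regression to the mean.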

However, I'm a little unclear why you feel the need to experimentally confirm a fairly well-known statistical technique: correctly compensating for the Look-Elsewhere Effect is standard procedure in the statistical analysis of experimental High-Energy Physics — which is of course a Bayesian learning process where you have both statistical uncertainty within individual hypotheses and Knightian uncertainty across alternative hypotheses, so exactly the situation in which this applies.

## Approximate Bayesian Reasoning

AIXI is a well-known model of a Bayesian reasoner. However, it's not a practical one, since both Kolmogorov complexity (the basis of the Solomonoff prior) and the rest of the Solomonoff induction process of Bayesian reasoning as used by AIXI are uncomputable: they require unlimited computational resources. [Even the carefully more limited case of attempting to approximate, using polynomial computational resources, the polynomial-limited Kolmogorov complexity of a program for outputting the object in polynomial time turns out to raise some fascinating issues in computational complexity theory.] Any real-world agent will need to figure out how to make do with finite computational resources, just as human practitioners of STEM and rationalism do. This introduces a wide range of concerns into our approximate Bayesianism:

At the end of the day, the purpose of a set of approximately Bayesian beliefs is to let us make informed decisions as well as we can given our available computational resources. Generically, we have some number of available options that we need to choose between: these may be some mix of discrete choices or effectively-continuous parameters. To make the choice, we need to compute the utility of each available option, or for continuous parameters estimate a parameters-to-utility function, and then attempt to maximize the utility.

## The Look-Elsewhere Effect and Regressional Goodhart

At this point we risk running into the Look-Elsewhere effect. Suppose, for simplicity, that each utility computation has an error distribution; that each of these distributions is exactly a normal distribution; that we had a good enough understanding of the error distribution in our approximately Bayesian reasoning process that we could actually estimate the standard deviation of each utility computation (our estimate of the degree of bias will be zero: if we had a known degree of bias, we would have adjusted our utility computation to account for it); and that the utility computations for the options we are considering are all sufficiently complex and different, based on the complexity of our remaining set of hypotheses, for their errors to be independently distributed. (This is a fairly long list of fairly unlikely assumptions, but it makes explaining and calculating the strength of the Look-Elsewhere effect a lot simpler.) Then, if we are comparing around 40 different options, we would expect around 5% of the utility calculations, i.e. around two of them, to be off by around two standard deviations, each with an equal chance of being either high or low. If the standard deviations were all similar in size and the actual differences in utility were very small compared to twice the standard deviation, then most likely we would end up picking an option because of an error in our utility calculation: likely whichever one we happened to be overestimating by around two standard deviations, so that it won. (This might help us learn more about its actual utility, improving our world model, in a form of multi-armed-bandit-like exploration, but it generally isn't a good way to make a decision.) If the standard deviations of the options varied significantly in size, then we are more likely to pick one with a high standard deviation, simply because its larger potential for error makes it more likely to be overestimated enough to stand out unjustifiably.
This is a form of Goodhart's law: in Scott Garrabrant's Goodhart Taxonomy it falls into the "regressional Goodhart" category. We are selecting an optimum action based on our estimate of the utility, which is correlated with but not identical to the true utility, so we end up selecting on the sum of the true utility and our error. Our best estimate of the bias of the error is zero (otherwise we'd have corrected for it by adjusting our posteriors), but we may well have information about the likely size of the error, which is also correlated with our optimum via the Look-Elsewhere effect: an option that has wide error bars on its utility has a higher chance of being sufficiently overestimated to appear to be the optimum when it isn't than one with narrow error bars does.
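
The claim that options with larger error bars disproportionately win the argmax is easy to check numerically. Here is a sketch under the idealized assumptions above; the option counts and standard deviations are illustrative.

```python
import random
from statistics import mean

random.seed(0)

# 40 options with identical true utility (0), but half have utility
# estimates that are twice as noisy as the other half.
N_OPTIONS = 40
TRIALS = 5000
sigmas = [1.0] * 20 + [2.0] * 20

def winning_sigma():
    estimates = [random.gauss(0.0, s) for s in sigmas]
    best = max(range(N_OPTIONS), key=lambda i: estimates[i])
    return sigmas[best]

wide_share = mean(1 if winning_sigma() == 2.0 else 0 for _ in range(TRIALS))
print(wide_share)  # fraction of trials won by a wide-error-bar option
```

With equal true utilities the noisier half should win far more than half the time, purely because larger error bars make large overestimates more likely.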

## Compensating for Look-Elsewhere Regressional Goodhart

We can compensate for this effect: if we converted our uncertainty on the posteriors of hypotheses into an uncertainty on the utility, then subtracted two standard deviations from the mean estimated utility of each option, that would give us an estimated 95% lower confidence bound on the utility, and would compensate for the tendency to favor options with larger uncertainties on their utility out of 40 choices. Similarly, if we were picking from around 200,000 options under the same set of assumptions, likely around two of them would be five standard deviations from the norm, one in the overestimate direction, and the best way to compensate, if they were all "in the running", would be to subtract five standard deviations from the estimated utility to estimate a 99.999% lower confidence bound. So we need to add some pessimizing over utility errors to compensate for regressional Goodhart's law, before optimizing over the resulting lower confidence bound. If we are estimating the optimum of a continuous utility function of some set of continuous parameters, the number of options being picked between is not in fact infinite, as it might at first appear: for sufficiently nearby parameter values, the results and/or errors of the utility function calculations will not be sufficiently independent to count as separate draws in this statistical process. While the effective number of draws may be hard to estimate, it is in theory deducible from the complexity of the form of the function and its dependence on the errors on the posteriors of the underlying hypotheses.
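
A minimal sketch of this correction, using the standard library's `NormalDist`. The 1 − 1/N quantile rule for setting the penalty is an illustrative choice here, not the only way to do it.

```python
from statistics import NormalDist

def look_elsewhere_penalty(n_options):
    # With N comparable options, the largest of N independent errors is
    # expected near the (1 - 1/N) quantile, so subtract that many sigmas.
    return NormalDist().inv_cdf(1 - 1 / n_options)

def pick(options):
    """options: list of (name, estimated_utility, utility_sd) tuples."""
    k = look_elsewhere_penalty(len(options))
    # Optimize the penalized lower confidence bound, not the raw estimate.
    return max(options, key=lambda o: o[1] - k * o[2])

# One precisely-estimated option vs. 39 noisier ones with higher means:
options = [("narrow", 1.0, 0.1)] + [(f"wide{i}", 1.2, 2.0) for i in range(39)]
print(round(look_elsewhere_penalty(40), 2))  # ≈ 1.96, the ~2-sigma figure
print(pick(options)[0])                      # "narrow" wins on the bound
```

For 200,000 options this rule gives a penalty of about 4.4 sigma, in the same ballpark as the five-sigma figure above; the exact value depends on how the tail probability is apportioned.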

Note that the number of independent draws here is also effectively capped by the number of independent errors on posteriors of hypotheses in your hypothesis set, which for discrete hypotheses is just the number of hypotheses (whether this is the sum or the product of the number of world-model hypotheses and the number of utility-function hypotheses depends on the complexity of how their errors interact: in general it will be up to the product). Helpfully, this is a number that only changes when you add hypotheses to your hypothesis set, or prune ones whose posterior has got close enough to 0.000… to not be worth including in the calculation. Much as before, if some of your hypotheses have posterior distributions over continuous parameters that form part of the hypothesis (as is very often the case), then the effective number of independent variables may be hard to estimate (and isn't fixed), but is in theory defined by the form and complexity of the parameters-to-utilities mapping function.

In practice, life is harder than the set of simplifying assumptions we made three paragraphs above. The error distributions of utility calculations may well not be normal: normal distributions arise, due to the central limit theorem, when you have a large number of statistically independent effects whose individual sizes are not too abnormally distributed (as could happen when gradually accumulating evidence about a hypothesis), but in practice distributions are often fatter-tailed due to unlikely large effects (such as when some pieces of evidence are particularly dispositive, or when a prior was seriously flawed), or sometimes less fat-tailed because there are only a small number of independent errors (such as when we don't yet have much evidence). It's also fairly unlikely that all of our available options are so different that their utility estimation errors are entirely independent: some causes of utility estimation error may be shared across many of them, and this may vary a lot between hypotheses and sets of options, making doing the overall calculation accurately very complex.

It's also very unlikely that all of the options are so similar in estimated utility as to be "in the running" due to possible estimation errors. This issue can actually be approximately compensated for: if we have 200,000 independent options, plot their ±5 sigma error bars, look at the overlaps, and compute the number that are in the running to be the winner. Take that smaller number, convert it to a smaller number of sigmas according to the standard properties of the normal distribution, recompute the now-smaller error bars, and repeat. This process should converge rapidly to a pretty good estimate of the number of candidates that are actually "in the running". If you have a rough estimate of the degree of correlation between their utility estimate errors, then you should combine that with this process to reach a lower number. If the number of candidates gets quite small, you can then redo the estimation of the correlation of their error estimates with a lot more detailed specifics. The way this process converges means that our initial estimate of N isn't very important: the only hard-to-estimate parameter with much effect is the degree of correlation between the errors of the (generally small) number of options that are close enough to be "in the running".
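
The iterative narrowing described above can be sketched as follows, under the earlier idealized assumptions (independent normal errors). The convergence test and the 1 − 1/(N+1) quantile rule are illustrative choices.

```python
from statistics import NormalDist

def in_the_running(means, sds, start_k=5.0, max_iter=20):
    """Iteratively prune options to those whose error bars overlap the leader's."""
    nd = NormalDist()
    candidates = list(range(len(means)))
    k = start_k
    for _ in range(max_iter):
        # An option stays "in the running" if its upper bound reaches the
        # best lower bound among the current candidates.
        best_lower = max(means[i] - k * sds[i] for i in candidates)
        survivors = [i for i in candidates if means[i] + k * sds[i] >= best_lower]
        # Fewer survivors justify a smaller number of sigmas.
        k_new = nd.inv_cdf(1 - 1 / (len(survivors) + 1))
        if survivors == candidates and abs(k_new - k) < 1e-9:
            break
        candidates, k = survivors, k_new
    return candidates

# 97 clearly-losing options plus three near-tied leaders:
means = [0.0] * 97 + [9.8, 9.9, 10.0]
sds = [1.0] * 100
print(len(in_the_running(means, sds)))  # converges to the 3 near-tied leaders
```

Starting from wide ±5 sigma bars, the candidate set collapses to the near-tied leaders in a couple of iterations, illustrating why the initial estimate of N has little effect.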

So it is very hard to come up with a simple, exact, elegant algorithm to use here. Nevertheless, this calculation looks likely to be fairly tractable in practice, and several key elements can be observed that suggest a number of useful-looking and hopefully-computable heuristics:

Most of these heuristics should of course already be familiar to STEM practitioners, and any AI that uses approximate Bayesian reasoning should also incorporate them.

## Why we Need Knightian Uncertainty as well as Risk in Approximate Bayesian Reasoning

Notice that in order to correctly compensate for Look-Elsewhere Regressional Goodhart's law when doing approximate Bayesian reasoning, we need to keep more than just a Bayesian posterior for each hypothesis: we also need to be able to estimate error bars on it at various confidence levels. So (unlike in the theoretical but uncomputable case of true Bayesian reasoning), we actually do need to distinguish between risk and Knightian uncertainty, because when choosing between large numbers of options we need to handle them differently to beat Look-Elsewhere Regressional Goodhart's law. Risk can be due to either a hypothesis that makes stochastic predictions, and/or a set of hypotheses that isn't heavily dominated by just one hypothesis with a posterior close to 99.999…%. Knightian uncertainty happens when not only do we have a set of hypotheses that isn't heavily dominated by just one, but also some of those non-extremal posteriors have wide estimated error bars.

Note that the distinction between risk and Knightian uncertainty isn't important when choosing to accept or reject a single bet (where there are only two options, and we know their probabilities of being the best option sum to 100%, so there is effectively only one independent variable being drawn here: the difference between the two utility estimates — and thus the distinction between risk and Knightian uncertainty disappears).
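
Under an illustrative 1 − 1/N quantile rule for the look-elsewhere penalty (an assumption, not the unique choice), this disappearance falls out directly:

```python
from statistics import NormalDist

def look_elsewhere_penalty(n_options):
    # Illustrative penalty, in standard deviations, for the largest of
    # N comparable independent estimation errors.
    return NormalDist().inv_cdf(1 - 1 / n_options)

# A single accept/reject bet is one effective draw between two options:
# the penalty on error-bar width vanishes, so Knightian uncertainty
# behaves exactly like risk.
print(look_elsewhere_penalty(2))   # 0.0

# With many options on the table, wide error bars are penalized, and the
# risk/Knightian distinction reappears as a correction.
print(round(look_elsewhere_penalty(40), 2))  # ≈ 1.96
```

With only one effective draw, no pessimization over error-bar width is needed, and only the mean estimate matters; the distinction only re-emerges as the number of independent options grows.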

The distinction only becomes important when you have a large number of possibly-viable bets or options and you need to choose between them, when it discourages you from accepting bets whose favorability depends more strongly on things that you are Knightianly uncertain about compared to ones that are only risky, or not even risky: not out of risk minimization, but because compensating for the Look-Elsewhere effect means that in the presence of Knightian uncertainty, your utility estimates on favorable-looking options have a predictable tendency for some to be overestimates, which needs to be corrected for.[1]

[1]: In Knightian uncertainty in a Bayesian framework and Knightian uncertainty: a rejection of the MMEU rule, So8res correctly demonstrated that for deciding whether to take a single bet there is no distinction between risk and Knightian uncertainty, and then suggested that this meant that Knightian uncertainty was a useless concept, unnecessary even for computationally-limited approximately Bayesian reasoners, which as I showed above is incorrect. The distinction between them matters only when there are multiple options on the table, and is basically only a second-order effect: risk and Knightian uncertainty with the same mean estimated probability are equivalent to first order, but the non-zero standard deviation of Knightian uncertainty has second-and-higher-order effects that differ from the zero standard deviation of a pure, accurately quantified risk, due to the need to compensate for the Look-Elsewhere Regressional Goodhart effect.