Monday, February 1, 2016

What to optimize when guessing right for a higher percentage of people (on average) means guessing right for fewer people (on average)

Previous: Why do Anthropic arguments work?

Anthropic Probability: The 'Imperfect Test for X' (ITX) Thought Experiment

Suppose that an experiment is not 100% reliable (as is the case for all real experiments). e.g. If X (which is some proposal about physics) is true, then the experiment probably shows result #1, but 'only' in 99% of such experiments; there is a 1% chance it will show result #2. If X is false (which can be written as ~X or "Not X"), then these percentages are reversed. Suppose that the experiment is expensive and time-consuming to perform, as is often the case, so on each planet people will only do it once. For this thought experiment, assume that these planets in question are the only place where any kind of people live in all of existence; in the "Variant thought experiments" section below I'll discuss the effect of changing that assumption. Let's use anthropic probability (based on observer-counting) to determine what the people should conclude about how likely X is to be true.


The Fixed Population Case:

Suppose also that the total number of people in the universe (as well as their likelihood to learn the result of such an experiment) does not depend on X.

Now, if before performing the experiment I have a 50% subjective credence that X is true, then by Bayes' Theorem, if the experiment shows result #1, I should then be 99% confident that X is true.

Note on notation: P(X|1) is "the conditional probability of X given 1", which is a standard notation, but I mention it since not all readers are necessarily familiar with it.

Explicit Bayes theorem:

P(X is true | I see result #1) =

P(I see result #1 | X is true) P(X) / [P(I see result #1 | X is true) P(X) + P(I see result #1 | X is false) P(~X)]

The same equation in more compact notation:

P(X|1) = P(1|X) P(X) / [P(1|X) P(X) + P(1|~X) P(~X)]

P(X) = P(~X) = 0.5

For P(I see result #1 | X is true) = P(1|X), I will assume for now that it makes sense to use 0.99; and for P(1|~X), 0.01.

Then P(X|1) = (0.99)(0.5) / [(0.99)(0.5) + (0.01)(0.5)] = 0.99.
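The arithmetic above can be checked with a few lines of Python. This is only a sketch of the same Bayes update; the function name posterior_x_given_1 is my own label, not anything standard.

```python
from fractions import Fraction

def posterior_x_given_1(prior_x, p1_given_x, p1_given_not_x):
    """P(X|1) = P(1|X) P(X) / [P(1|X) P(X) + P(1|~X) P(~X)]."""
    numerator = p1_given_x * prior_x
    denominator = numerator + p1_given_not_x * (1 - prior_x)
    return numerator / denominator

# Values from the text: P(X) = 0.5, P(1|X) = 0.99, P(1|~X) = 0.01.
p_x_given_1 = posterior_x_given_1(
    Fraction(1, 2), Fraction(99, 100), Fraction(1, 100)
)
print(float(p_x_given_1))
```

Exact fractions are used so that no rounding error sneaks into the comparison with 0.99.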

If the universe is large enough, there will be at least a trillion people who learn the result of this kind of experiment. If X is true, 99% of those people see result #1, and 1% see result #2. If X is false, 1% see #1, and 99% see #2. No matter what the real truth value of X is, some people will see result #1 and some will see result #2, but if we follow the above procedure most people (99%) will guess correctly whether X is true. This is what we want to happen if we want more people (see below) to be right about things instead of being wrong.

I am then 99% subjectively likely to be in the group that got the right answer, since 99% of people who actually exist did so (regardless of the truth value of X), and I know I am one of the people who exist in the real world with its fixed truth value for X.
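The claim that 99% of people guess correctly regardless of the truth value of X can be spelled out by counting. A minimal sketch, assuming everyone follows the rule "guess X after result #1, guess not-X after result #2" (the names here are just illustrative):

```python
from fractions import Fraction

RELIABILITY = Fraction(99, 100)  # chance the experiment gives the "right" result

def fraction_guessing_correctly(x_is_true):
    # Share of experimenters who see each result in this world.
    sees_1 = RELIABILITY if x_is_true else 1 - RELIABILITY
    sees_2 = 1 - sees_1
    # Those who see #1 guess "X"; those who see #2 guess "not X".
    return sees_1 if x_is_true else sees_2

for truth in (True, False):
    print("X is", truth, "->", fraction_guessing_correctly(truth))
```

Both cases give 99/100, which is why the guesser does not need to know which world is the real one.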


How likely was I to observe that really?

It might at first seem obvious that P(1|X) must indeed be 0.99 practically by definition. But it is far from obvious, because in order for me to see any result, I must first exist. P(1|X) can be written more explicitly as
P(I exist and see result #1 | X is true).

Since the experiment is 99% reliable, this equals (0.99) P(I exist | X is true) = (0.99) L(X).

How can we evaluate the subjective probability L(X) that I would exist if X were to be true? One attempt is to note that I do exist, so if X were to be true or false, I could still always take the subjective probability that I exist as 1. But I don't know if X is true. Obviously, if the physics of X were such that no observers could exist were X to be true, and I know that, then I can conclude that X is false.

If both X and not-X would lead to observers, then what is the likelihood that one of those observers would be me in each case?

For the case in which both situations result in the same number of observers, we can assume that this likelihood is the same for both X and not-X. Then it cancels out in Bayes' theorem, and we can ignore the issue. That is equivalent to what I did above when I used 0.99 for P(1|X) and 0.01 for P(1|~X). To be more precise I could have multiplied them both by the same number L, and L would appear in both numerator and denominator, and would cancel out.


The Variable (X-Dependent) Population Case:

That's simple enough, but now consider a case in which the total number of observers in the universe DOES depend on the truth of X.

(This is equivalent to a particular variant of the Sleeping Beauty Problem (SBP) - a famous thought experiment - but I won't use that one here as it must be properly formulated to be more relevant to uncertainty about cosmological issues. Most statements of it are not so formulated. My example is a more natural thought experiment for such issues.)

Suppose that we know that if X is true, there are 100 N observers in the world, but if X is false there are 9900 N observers. Each planet has N observers. Now:

If X is true, then 99 N observers see result #1, and N observers see result #2.
If X is false, then 99 N observers see result #1, and 9801 N observers see result #2.
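These counts follow directly from the totals and the 99% / 1% reliability. As a quick check, here is a sketch that works in units of N (the helper name observers_by_result is hypothetical):

```python
from fractions import Fraction

def observers_by_result(total_in_units_of_n, frac_result_1):
    """Return (observers who see #1, observers who see #2), in units of N."""
    see_1 = total_in_units_of_n * frac_result_1
    return see_1, total_in_units_of_n - see_1

# X true: 100 N observers, 99% see #1.  X false: 9900 N observers, 1% see #1.
if_x_true = observers_by_result(100, Fraction(99, 100))
if_x_false = observers_by_result(9900, Fraction(1, 100))
print("X true: ", if_x_true)
print("X false:", if_x_false)
```

Note that the number who see result #1 comes out the same (99 N) in both worlds; only the number who see result #2 differs.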

What should an observer think if he sees result #1? There are two commonly argued views. A case could be made for either, depending on what we mean by wanting "more people" to be right:

The Self-Sampling Assumption (SSA): (aka 'halfer' in the SBP)
If we want to optimize the average percentage of observers who guess (or bet) correctly, then as before, we should want that observer to think that X is true.

This can be done by assuming that L(X) = P(I exist | X is true) is independent of the number of observers who would exist if X is true, and so on, as long as at least one observer would exist in each case.


The Self-Indication Assumption (SIA): (aka 'thirder' in the SBP)
However, if we want to optimize the average NUMBER of observers who guess (or bet) correctly, then we will have no preference here; in either case, 99 N observers will guess correctly.

This can be done by making the updated probability proportional to the number of observers rather than to the % of observers. e.g. With the SIA, we assume that L(X) = P(I exist | X is true) is proportional to the number of observers who would exist if X is true, and so on. Unlike the SSA, this has a nice smooth limit in the case where X would lead to zero observers, since then L(X) = 0.


With the SIA, as the number of observers who see result #2 increases, the total number of observers increases while the fraction of observers who see result #1 decreases. The SIA probability is proportional to both the total number and the fraction, so the two changes cancel, i.e.

P(I see #1 | X) is proportional to [(total number of observers if X is true) x (fraction of observers who see #1 if X is true)] = [number of observers who see #1 if X is true]

and

P(I see #1 | not X) is proportional to [(total number of observers if X is false) x (fraction of observers who see #1 if X is false)] = [number of observers who see #1 if X is false]

To put it (SIA) another way, suppose that the same 99 N observers who see result #1 would have existed no matter whether X is true or false. If X is 50% likely to be true, these observers would ideally assign a 50% chance to the hypothesis that X is true. There will also exist additional observers, all of whom would see result #2; no given observer will be less likely to see result #1 as a result of X being false. Thus, those observers who see result #1 have no evidence that X is false.
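To compare the two assumptions side by side, here is a sketch of the update on seeing result #1 with a 50% prior. It treats the SSA as weighting each world by the fraction of observers who see #1, and the SIA as weighting by the number who see #1, which is my reading of the definitions above; the dictionary layout and names are illustrative.

```python
from fractions import Fraction

PRIOR_X = Fraction(1, 2)
# Observer totals are in units of N, as in the text.
WORLDS = {
    "X": {"total": 100, "frac_1": Fraction(99, 100)},
    "not X": {"total": 9900, "frac_1": Fraction(1, 100)},
}

def posterior_x(weight):
    """Posterior for X after result #1, given a per-world likelihood weight."""
    w_x = weight(WORLDS["X"]) * PRIOR_X
    w_not_x = weight(WORLDS["not X"]) * (1 - PRIOR_X)
    return w_x / (w_x + w_not_x)

# SSA: likelihood is the fraction of observers who see #1.
ssa = posterior_x(lambda w: w["frac_1"])
# SIA: likelihood is proportional to the number of observers who see #1.
sia = posterior_x(lambda w: w["total"] * w["frac_1"])
print("SSA:", ssa, " SIA:", sia)
```

The SSA gives 99/100 and the SIA gives 1/2, matching the two views described above.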


To see why it could matter, introduce consequences to believing correctly about X. Suppose that each planet has the option to try a new experiment which could either unlock a much-needed unlimited source of energy or blow up catastrophically (making everyone on the planet horribly sick). The only thing that determines which outcome happens is that a certain parameter must be set to a value that depends on whether X is true. For a planet that measured result #1 in the first experiment, what truth value should they assume for X when designing the second experiment?

If the designers go with SSA, they will believe that X is true with 99% confidence.

If they go with SIA, however, they will have only 50% confidence in X. Perhaps in this case they will not take the risk. Of course if they see result #2, they will be confident that X is false and in that case would take the risk.

And that is the correct policy: Because if result #1 is seen, there will be the same number of planets which believe that X is true whether it is true or not.

It would seem that SIA leads to better decision-making than SSA in such cases. That is a strong argument in favor of SIA.


Define Non-Anthropic Probability (NAP): The subjective probability that a model of reality is true prior to taking into account the numbers or fractions of any kinds of observers predicted by each model of reality.

There is a catch with SIA. Suppose now that the NAP that X is true is 90%, not 50%.

Suppose also that the safe experiment to try to find out whether X is true or not can't even be done prior to doing the dangerous experiment which could unlock energy supplies.

Using SSA, with no experimental data that just leaves us with the NAP (90% that X is true) as our subjective probability to work with.

Using SIA, we weight each case by the number of observers. So we will end up with an 8.3% subjective probability that X is true: (90% * 100 N) / [(90% * 100 N)+(10% * 9900 N)] ≈ 0.083
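The same weighting, written out as a small sketch (sia_credence is a hypothetical helper name, and observer counts are in units of N since N cancels):

```python
from fractions import Fraction

def sia_credence(nap_x, observers_if_x, observers_if_not_x):
    """Weight each hypothesis by its NAP times its number of observers."""
    w_x = nap_x * observers_if_x
    w_not_x = (1 - nap_x) * observers_if_not_x
    return w_x / (w_x + w_not_x)

p_x = sia_credence(Fraction(9, 10), 100, 9900)
print(p_x, "=", round(float(p_x), 3))
```

The exact value is 1/12, which rounds to the 8.3% quoted above.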

In this case, we will believe that X is probably false. A wise thing to believe perhaps, since the consequences of choosing the setting on the dangerous experiment will affect far more people if X is actually false. But it is disturbing that there is a high NAP that we are mistaken and if so are condemning the actual people who do exist to a miserable life.


Pascal's Mugging:

A more extreme example is known as a "Pascal's mugging". A mugger claims that he has the power to create and eternally torture M copies of you (while first giving them your current experiences, so that as far as you can know you might be one of them), and that he will do so unless you hand him your wallet (or in other words, he threatens some outcome with a large negative utility). You assign a subjective non-anthropic probability P that he has the god-like powers he claims to have, which will be very small but finite. If (as is commonly assumed in decision theory) you try to maximize the average utility, then for any fixed P, no matter how small, there is some large number which, if given as M, will convince you to hand over your wallet if you believe SIA. Once again, we are "probably" (according to the NAP) disregarding the interests of the actual observer (you) in favor of the many, many unlikely but possible observers whom we are trying to save from eternal torture.
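To make the "some large number M" claim concrete, here is a toy expected-utility sketch. The utilities are made-up numbers for illustration only (a wallet worth 100 units, each tortured copy counted as a loss of 10^6 units), and the decision rule is the simplest one I can think of: pay whenever P x M x (loss per copy) exceeds the wallet's value.

```python
from fractions import Fraction
import math

def smallest_convincing_m(p, wallet_cost, loss_per_copy):
    """Smallest M for which the expected loss from refusing exceeds the wallet."""
    threshold = Fraction(wallet_cost) / (p * loss_per_copy)
    return math.floor(threshold) + 1

# Hypothetical numbers, not from any real estimate.
P = Fraction(1, 10**12)   # NAP that the mugger has the powers he claims
WALLET_COST = 100
LOSS_PER_COPY = 10**6

m = smallest_convincing_m(P, WALLET_COST, LOSS_PER_COPY)
print(m)
```

Making P smaller only pushes the required M up in proportion; it never removes it, which is the point of the mugging.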


Dual-Objective Optimization (DOO)*

IMO, both SSA and SIA can lead to bad policies. SIA can benefit more people on average in simple cases like the "is X true" experiment but leaves all people vulnerable to being asked to sacrifice their own interests in favor of numerous-but-unlikely hypothetical people.

The best thing to do IMO depends on one's individual preference for how to trade off probable benefit with mean benefit, where mean here includes huge but highly unlikely benefits to large numbers of possible people weighted with the small probability of that happening. This is much like selecting a utility function, except that in standard decision theory, only mean benefit is maximized. It is better if a high % of people will guess correctly (SSA), and also if as many people as possible guess correctly (SIA).

One way to deal with a Pascal's Mugging is to ignore the threat if the NAP that the mugger can do as he threatens to is less than some predetermined value.


A Special Observer:

Consider the perspective of a special observer S, such that S would exist whether X is true or false. S can send a message to the other observers, but it is a one-way communication; he can never receive a reply. S does not know whether X is true or false and can't perform any experiment to try to find out.

Suppose that the NAP for X to be true is 90%. If all observers must perform a test to find out whether they are S or not, then prior to performing any such test or experiments, an observer who believes SSA assigns a 90% chance for X to be true, which is simply equal to his NAP. If he then discovers that he is S, he will further favor the belief that fewer observers exist (X is true), since a higher percentage of observers find themselves to be S in that scenario.

An observer who believes SIA at first assigns an 8.3% chance for X to be true, which is proportional to his NAP and to the total number of observers who would exist if X is true as compared to X is false. If he then discovers that he is S, he will adjust his probabilities towards the scenario with fewer observers, leaving him with a 90% subjective probability for X to be true, which is (and this is true in general for such cases) equal to his NAP.
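Both updates can be followed numerically. This sketch assumes exactly one S exists in each world, so that the chance of a randomly chosen observer being S is 1 / (total observers); totals are in units of N, which cancels out. The names are illustrative.

```python
from fractions import Fraction

NAP_X = Fraction(9, 10)
TOTAL = {"X": 100, "not X": 9900}  # observers per world, in units of N

def normalize(weights):
    s = sum(weights.values())
    return {h: w / s for h, w in weights.items()}

# SSA starts from the NAP; SIA also weights by the number of observers.
ssa_prior = {"X": NAP_X, "not X": 1 - NAP_X}
sia_prior = normalize({h: ssa_prior[h] * TOTAL[h] for h in TOTAL})

def update_on_being_s(prior):
    # Likelihood of finding oneself to be S is 1 / (number of observers).
    return normalize({h: prior[h] * Fraction(1, TOTAL[h]) for h in TOTAL})

ssa_post = update_on_being_s(ssa_prior)
sia_post = update_on_being_s(sia_prior)
print("SSA:", ssa_prior["X"], "->", ssa_post["X"])
print("SIA:", sia_prior["X"], "->", sia_post["X"])
```

With these numbers the SSA believer ends up at 891/892 (about 99.9%) for X, while the SIA believer returns from 1/12 to exactly the 90% NAP.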

Knowing that a safe experiment to try to determine if X is true can't be done, S decides to recommend a decision-making policy to the other observers, which will be either SSA or SIA. If he recommends SSA (and they follow his advice) then they will tend to assume that X is true. If he recommends SIA, then they will tend to assume that X is false.

So even if he believes SIA, he figures that telling the others to follow SIA is 90% likely to lead to the wrong choice. But, if X is indeed false (10% chance), then many more people will have made the right choice. What should he recommend?

If we are in the beginning stages of a long-lived human civilization, then we are in a similar position to that of S with respect to future people. Would a gamble be worthwhile which pays off if there are many future people, but which makes things worse in a higher-NAP scenario with fewer people? If so, can a high enough number of possible future people always outweigh a very small but fixed NAP for them to exist despite the probable cost to the fewer people who are more likely to exist?


The Adam Paradox (credit is due to Nick Bostrom):

Suppose that the NAP for X is 50%, but S has the power to make X true or false (e.g. by reproducing). If he believes SSA, then he believes that fewer people are likely to exist, so he thinks he will probably not reproduce. He decides to hunt deer as follows: Unless a deer drops dead in front of him, he will reproduce and have a huge number of descendants; if the hunt works he will have none. Since he thinks it unlikely that many people will exist, he is surprised if it doesn't work. On the other hand, if he believed SIA, then he would have had no such delusion; he would just think there is a 50% chance he will reproduce. Of course, the correct belief is probably that one.

The Iterated Reproducers Counter to the Adam Paradox:

The Adam Paradox seems to be a very strong argument against the SSA, but remember that the SSA is chosen to maximize the average % of people who guess correctly. So now consider a series of people. First comes Adam, who guessed wrong, and reproduced, producing a tribe of N people. The tribe goes through similar reasoning, and this results in a larger tribe of N^2 people. Again the same thing goes on resulting in a nation of N^3 people, and so on. Eventually, after some finite number of iterations M, the people then will guess right about their event in question; they will not reproduce. So the N^M people then alive will be correct in their guess based on the SSA, and they will be the majority of the people to ever have lived. In this series, if they had all used the SIA instead, the majority of people to have ever lived would have guessed wrong for any number M. (This is a version of the Doomsday Argument and is essentially the same as the "Shooting Room Paradox" of John Leslie.)
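The "majority of the people to ever have lived" claim is just a statement about a geometric series. A sketch, assuming the generations are 1 (Adam), N, N^2, ..., N^M and that each person is counted once:

```python
from fractions import Fraction

def final_generation_share(n, m):
    """Share of everyone who ever lived that belongs to the last generation."""
    sizes = [n**k for k in range(m + 1)]  # Adam, tribe, ..., N^M people
    return Fraction(sizes[-1], sum(sizes))

for n, m in [(2, 1), (2, 10), (10, 3)]:
    print(n, m, final_generation_share(n, m))
```

For any N of at least 2 the last generation is more than half of the total, since all earlier generations together number (N^M - 1) / (N - 1), which is less than N^M.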


Variant thought experiments that give SIA - like probabilities even within the SSA:

- Independent Trials:

Suppose there are many universes which really exist, and the truth of X within each depends on the local environment; say X is true in 50% of them. For example X might be the statement that the electron to proton mass ratio is less than some value, and the ratio might vary from universe to universe. (As usual, the word universe here does not necessarily refer to the totality of existence; the word 'multiverse' would be used for that. A universe is some part of existence, which may or may not be the whole thing, but beyond which the observers in question can't make any observations.)

In each universe in which X is true, there are 100 N observers, and in one with X false there are 9900 N observers.

In this case, the observers in X-false universes outnumber those in X-true universes by 99:1. There is no doubt in this case that each observer should assign a 99% probability that X is false in his universe. The SIA gives that result for a single universe or any number of universes, while the SSA result for a single universe is 50% but approaches the SIA value as the number of universes increases.
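The approach of the SSA value to 99% can be computed exactly for a small multiverse. This is one way to model it, not the only one: assume K universes, each independently X-true with probability 1/2, and take the SSA probability of "X is false in my universe" to be the expected fraction of all observers who live in X-false universes.

```python
from fractions import Fraction
from math import comb

def ssa_p_x_false(k, n_if_x=100, n_if_not_x=9900):
    """Expected fraction of observers in X-false universes, over K universes."""
    total = Fraction(0)
    for j in range(k + 1):               # j = number of X-false universes
        p_j = Fraction(comb(k, j), 2**k)  # binomial probability of that split
        in_false = n_if_not_x * j
        in_true = n_if_x * (k - j)
        total += p_j * Fraction(in_false, in_false + in_true)
    return total

for k in (1, 2, 10, 200):
    print(k, float(ssa_p_x_false(k)))
```

A single universe gives exactly 1/2, and the value climbs toward 0.99 from below as K grows.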

In practice, for any non-cosmological statement used in place of X, the SIA very likely gives the correct result even according to the SSA, because such situations no doubt arise many times throughout the universe. For example, if we tried to perform a real Sleeping Beauty experiment, it is very likely that in all of time and space throughout the multiverse other people do so as well. These act as independent trials, and the SIA - like observer counting therefore applies, giving the 1/3 result in the SB experiment.

- A Quantum Trial:

Likewise, if MWI is true and a quantum measurement is used in place of X, and the Born Rule probability is used in place of the NAP that X is true, the SIA - like observer counting automatically applies when guessing which branch an observer is on, since the likelihood for that is proportional to the number of observers on that branch. This case is similar to that of many independent trials, but instead of relying on a large number of trials to produce an average % of people in each type of universe, the actual % of people in each type of universe is pre-determined and precisely known even for a single quantum coin flip.

- Observers Within a Larger Fixed Context:

If there are many other observers M known to exist besides those Y which depend on X, then the SSA result reduces to the SIA result if the other observers greatly outnumber the X-dependent observers (M >> Y).

That's because the prior likelihood that a given observer is X-dependent is proportional in the SSA to the % of observers who are X-dependent, or in other words, is proportional to Y / (M+Y). Since M >> Y, that is approximately proportional to Y, which is like the SIA case.
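Here is a sketch of that limit. For illustration I reuse the earlier numbers as assumptions: Y = 100 X-dependent observers if X is true, Y = 9900 if X is false, and a 50% prior. The SSA likelihood of being an X-dependent observer is Y / (M+Y) in each case.

```python
from fractions import Fraction

def ssa_p_x(m, y_if_x=100, y_if_not_x=9900, prior_x=Fraction(1, 2)):
    """SSA posterior for X, given that I am one of the X-dependent observers."""
    w_x = prior_x * Fraction(y_if_x, m + y_if_x)
    w_not_x = (1 - prior_x) * Fraction(y_if_not_x, m + y_if_not_x)
    return w_x / (w_x + w_not_x)

sia_value = Fraction(100, 100 + 9900)  # SIA: proportional to Y alone
for m in (0, 10**3, 10**6, 10**9):
    print(m, float(ssa_p_x(m)))
print("SIA:", float(sia_value))
```

With M = 0 the SSA gives 1/2; as M grows the value falls toward the SIA value of 0.01.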


Such variant thought experiments are sometimes used as arguments in favor of the SIA, because they are always much more like any possible non-cosmological experiment as compared to the original thought experiment. However, the whole point of trying to figure out the right way to do anthropic probability is to apply it to questions of cosmological uncertainty, such as whether one given theory of everything is more likely than another, or to questions which similarly affect the overall number of observers in the multiverse, such as that of whether artificial digital computers could be conscious (given that they have the potential to greatly outnumber biological observers) or to the Doomsday Argument question of how long-lived a typical line of sentient beings is likely to be.

These variant thought experiments don't help to answer those questions, but they do help point out the limitations of the SSA as a practical tool. Any advocate of the SSA must be fully aware of its nuances which make it give the same answer as the SIA in almost all practical situations (such as 1/3 in a realistic Sleeping Beauty experiment), while an SIA-believer has the luxury of just always giving the same answer to almost any question about anthropic probabilities.


The Reference Class Problem:

As noted above, the probabilities for X predicted by the SSA depend not only on the observers doing a particular kind of experiment or which can be correlated with X, but on the number of all observers. That makes it important to know what counts as an observer.

But, with a more careful formulation, we can say that it depends on the number of observers within the Reference Class, allowing that not all observers are really relevant to the probabilities we are studying.

For example, suppose there are 10 trillion mice, and 10 billion men if X is true; while there are 10 trillion mice, and 90 billion men if X is false. The NAP for X is 50%. Using the SSA, does finding oneself to be a man instead of a mouse provide evidence for X being false? (With the SIA, it would.) If mice don't count as observers, then the SSA likelihood in this case remains equal to the NAP, at 50%. But if they do, then the SSA likelihood approaches the SIA likelihood, which is 90% for X to be false.
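The mice-and-men numbers can be worked through as follows. This sketch takes the SSA likelihood of "I am a man" to be the fraction of the Reference Class who are men, with mice either included or excluded; the names are illustrative.

```python
from fractions import Fraction

MICE = 10 * 10**12
MEN_IF_X = 10 * 10**9
MEN_IF_NOT_X = 90 * 10**9
NAP_X = Fraction(1, 2)

def ssa_p_not_x_given_man(mice_count_as_observers):
    others = MICE if mice_count_as_observers else 0
    like_x = Fraction(MEN_IF_X, MEN_IF_X + others)
    like_not_x = Fraction(MEN_IF_NOT_X, MEN_IF_NOT_X + others)
    w_x = NAP_X * like_x
    w_not_x = (1 - NAP_X) * like_not_x
    return w_not_x / (w_x + w_not_x)

# SIA: weight each case by the number of men, whatever else exists.
sia_p_not_x = Fraction(MEN_IF_NOT_X, MEN_IF_X + MEN_IF_NOT_X)

print("SSA, mice excluded:", float(ssa_p_not_x_given_man(False)))
print("SSA, mice included:", float(ssa_p_not_x_given_man(True)))
print("SIA:", float(sia_p_not_x))
```

Excluding mice leaves the SSA at 50%; including them gives about 89.9%, already very close to the SIA's 90%.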

The Questioner Reference Class:*

This is a difficulty for the SSA, but in my view, a reasonable choice for the Reference Class is to include only those observers who might ask this type of question. Since the SSA aims to maximize the % of observers who guess correctly, those observers must indeed guess. Mice very probably know nothing about X and are not capable of using anthropic reasoning to guess the likelihood that X is true, so they should not be included in the Reference Class.

Note that with the SIA, the Reference Class issue doesn't arise, because as the number of "other" observers increases, the total number of observers increases while the fraction of observers involved in the question at hand decreases. The SIA probability is proportional to both the total number and the fraction, so the two changes cancel, i.e.
P(I see #1 | X) is proportional to [(total number of observers) x (fraction of observers who see #1)] = [number of observers who see #1]


Easy confirmation of the Many-Worlds idea?

If you believe the SIA, then you must believe that the probability for you to exist would be much higher if MW (with its many observers) is true compared to if there is only one world and thus relatively few observers. So even if your NAP for MW is low, once you take the numbers of observers into account, your subjective likelihood for some kind of MW to be true is almost 100%.


Boltzmann Brains (BBs):

BBs are randomly assembled brains that may momentarily appear, such as in an infinitely long future of the universe. They could potentially vastly outnumber normal observers who result from Darwinian selection.

The SIA favors belief that a high number of observers have observations like ours. Using the SIA, even if there is just a small NAP that BBs vastly outnumber normal observers, there is a high subjective likelihood that they do, and that we are just BBs who by coincidence temporarily resemble brains of normal observers. I am not comfortable with that conclusion, and see it as an argument against the SIA.

Just using the NAP will not help enough. Suppose the NAP is 50% that BBs vastly outnumber normal observers. Shouldn't we think that our normal-seeming observations make that less likely? To reach that conclusion, we must use the SSA, since it favors belief that a high % of observers have observations like ours.


Infinite Universe / Multiverse:

If the universe or the set of many worlds is infinite, the number of observers is also infinite.

Then there are problems with anthropic probabilities:

First, the % of observers of each type becomes undefined, because there are infinitely many observers who see result #1 and also infinitely many who see result #2, regardless of whether X is true or not.

I think that problem is not so serious, because it seems to me that ratios of infinite numbers can still be well defined in physical situations. For example, suppose that widgets always occur in clusters of three, with always one blue widget and two green widgets. Then I would say that 1/3 of them are blue, even if there are infinitely many clusters. In practice, only if widgets are conscious does the question matter. So this principle is really a new philosophical assumption about how physics gives rise to a measure distribution of consciousness.

However, it is impossible to compare the number of observers in two incompatible models of physics if both are infinite. This makes it impossible in practice to use the SIA to compare models of physics, since thanks to SIA's easy confirmation of MW the viable ones will all have infinitely many observers.

Since the SSA deals only in fractions of observers within each model, it can still be used.

The SIA can still be used to compare closely related models if we assume something about how to make the comparison, such as models that are the same up until a special event occurs having the same number of observers before the event, and then using the ratios to compare the two models after it. Such an event may be a human decision whether or not to reproduce more, for example. Similarly, even in an infinite universe it is still desirable to maximize (all else being equal) the total number of people within the history of our Hubble volume.


Conclusion:
Due to the SSA's advantage in comparing infinite multiverses, and also influenced by the BBs argument and the Iterated Reproducers, I use SSA when comparing possible routes to the Born rule for MW QM. However, I do not consider the question settled. The Adam Paradox in particular seems to be a strong argument against the SSA, and I'm not sure that the Iterated Reproducers counter is sufficient. Also, both the SSA and SIA can lead to poor decisions depending on the situation, and it remains disturbing that the SSA can lead to a lower average utility. That failure for decision-making purposes may indicate a failure for informational purposes as well. However, while Dual-Objective Optimization may be best for decision making, it does not provide a recipe for assigning probabilities.


References:
* Indicates a point that is original in this post as far as I know.

As usual, external links may contain misleading arguments which I disagree with for good reasons not always worth mentioning here. Read at your own risk :)

https://en.wikipedia.org/wiki/Sleeping_Beauty_problem
http://www.princeton.edu/~adame/papers/sleeping/sleeping.html
https://wiki.lesswrong.com/wiki/Pascal's_mugging
http://www.nickbostrom.com/
http://www.anthropic-principle.com/?q=resources/preprints
http://www.anthropic-principle.com/preprints/spacetime.pdf
https://sites.google.com/site/darrenbradleyphilosophy/home/research
http://philsci-archive.pitt.edu/11864/
