Friday, February 19, 2016

Counterfactual Computations or Generalized Causal Networks?

When determining whether a physical system implements a particular computation, if only the actual sequence of states is considered, rather than the whole set of counterfactual states that the computation could have been in, then there is no way to distinguish a valid implementation from a false implementation by a system that is much too simple to implement the computation of interest.

For example, consider a set of clocks, c_1 through c_N. The initial state of the clocks can be mapped to any desired initial bitstring, and at each time step thereafter, the state of each clock can be mapped to whatever the state of the corresponding bit would be under any desired computation on bitstrings that evolve according to a given set of rules. There is no problem here with the first criterion for independence: each bit depends on a different physical variable. Even the second criterion for basic independence could also be satisfied by using a new set of clocks or dials at each time step, instead of the same clocks.

Requiring the correct counterfactual transitions rules out such false implementations, since a hypothetical change in the state of one clock at the initial time step would no longer result in states at the next time step that are correct according to the transition rules for the computation, because each clock is unaffected by the others. This was Chalmers' first move in his formulation of the combinatorial-state automaton (CSA) to rule out the false implementations discussed by Searle and Putnam, and in many discussions of that work it is the only thing considered worth mentioning.

Counterfactual relationships are thus considered key to ruling out false implementations, but relying on them introduces a new problem for computationalists: It seems implausible that parts of a system which are never called upon to actually do anything would have any effect on consciousness, yet such parts can determine the counterfactual relationships between system components.

For example, consider a NAND (Not-AND) gate. Sufficiently many NAND gates can be connected to each other in such a way as to perform any desired bitwise computation (neglecting for now any further structure in the computation), and historically ordinary digital computers were sometimes constructed solely out of NAND gates as they tended to be cheap.

The NAND gate takes input bits A,B and outputs a bit C. If A=0 or B=0, then C=1, and otherwise (meaning A=1 and B=1) C=0.

Such a NAND gate can be implemented as follows: Daphne, Fred, Shaggy, and Velma are all recruited to help. Each is assigned a two bit code: 00 for Daphne, 01 for Fred, 10 for Shaggy, and 11 for Velma. The values of A and B are announced in order while the recruits get a snack. Meanwhile, a coin is placed face down on a table signifying C=0.

During the meal, a monster shows up and chases the recruits, but they manage to escape. Shaggy is the last to get away, though he is separated from the others. He smokes weed to calm himself down.

After the meal, the person whose code matches the numbers read has the job of making sure the coin is correctly positioned for the next time step. If C is supposed to equal 0 then it should be face down, and if C is supposed to be 1 it should be face up.

The actual numbers were 1 and 1, so Velma returns to do her job and verifies that the coin is face down. Seeing that the coin is already face down, she leaves it alone, just as she does with the other coins on other tables in the room. Was this a valid implementation of the NAND gate? That depends on what would have happened in the counterfactual situations with other values for A and B. For example, if Shaggy had forgotten to do his job, it would not have been a valid implementation. Yet (though very slowly), we could use such NAND gates to run any desired artificial intelligence program.

In pseudo-code, this NAND gate works as follows, where for example C(0) means the coin state at time 0:

LET C(0) = 0
SELECT CASE A,B
CASE 0,0: Daphne sets C(1)=1
CASE 0,1: Fred sets C(1)=1
CASE 1,0: Shaggy sets C(1)=1
CASE 1,1: Velma verifies C(0)=0 (which implies C(1)=0) or sets C(1)=0 otherwise
END SELECT
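To make the role of counterfactuals concrete, here is a Python sketch (my own illustration, not part of the original scenario) in which the validity of the implementation depends on handlers that are never actually invoked:

```python
# Each recruit is a handler for one (A, B) case; the validity of the
# implementation depends on every handler, not just the one actually used.

def run_gate(a, b, handlers):
    """Return C(1) given inputs and a dict mapping (A, B) -> handler.

    Each handler takes the initial coin state C(0) and returns C(1).
    The coin starts face down: C(0) = 0.
    """
    c0 = 0
    return handlers[(a, b)](c0)

# The intended handlers: Daphne, Fred, and Shaggy set the coin face up (1);
# Velma only verifies that it is face down (0), changing nothing.
handlers = {
    (0, 0): lambda c0: 1,   # Daphne
    (0, 1): lambda c0: 1,   # Fred
    (1, 0): lambda c0: 1,   # Shaggy
    (1, 1): lambda c0: 0,   # Velma (verify / correct)
}

def is_valid_nand(handlers):
    """Check the full counterfactual profile, not just the actual run."""
    nand = lambda a, b: 0 if (a == 1 and b == 1) else 1
    return all(run_gate(a, b, handlers) == nand(a, b)
               for a in (0, 1) for b in (0, 1))

print(is_valid_nand(handlers))  # True

# If Shaggy would have forgotten his job (his handler leaves the coin
# alone), the actual run with A=1, B=1 is unchanged, but the
# implementation as a whole fails:
broken = dict(handlers)
broken[(1, 0)] = lambda c0: c0   # coin stays face down: wrong output
print(is_valid_nand(broken))     # False
```

The actual run with A=1, B=1 produces the same output either way; only the counterfactual check distinguishes the two systems.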

Since Velma doesn't change anything, her role can be eliminated while preserving the proper transition rule, and it would still be a NAND gate. But that is also troubling: In the actual situation, no coin was directly influenced by things resulting from the values of A,B. It is merely that the other recruits refrain from influencing the coin in question, just as they do the other coins. It may be that the other coins are being used by other groups of recruits for their own NAND gates. Some of the other coins may be initialized to face up instead, and by coincidence, perhaps none of the coins will ever need to be flipped.

Even more troubling is the following: Suppose that Shaggy would have remembered his job, but in order to get back to the coin room in time, he would have needed to climb a rope hidden in a dusty room. If the rope were strong enough, he would have made it, but actually the rope is too weak. No one realizes that, and he never actually entered the dusty room or even saw the rope. Yet the state of that rope determines whether or not this was a valid NAND gate implementation, and by extension, such things could apparently determine whether our AI is conscious or not.

It would seem that the rope should not matter in the actual situation, but as we have seen, such counterfactual situations are key to ruling out false implementations and can't be ignored. Note that the mere fact that the implementation is baroque isn't the problem; our own cells operate like miniature Rube Goldberg machines. The problem is that the rope never actually played a role in what happened, yet its strength determined the success of the implementation. Similarly, perhaps Daphne would have found a dusty old Turing machine and run a program on it, and would only do her job after the program halted. In that case, the halting problem for a program that was never run determines the validity of the NAND gate implementation!

Maudlin (Computation and consciousness) gave a similar example as an attack on computationalism: He described a water-based Turing Machine which follows a predetermined sequence for one set of initial conditions, but calls upon different machinery if the state is different from that which is expected. He points out that the link to the other machinery may be broken, and if so, the computation is not implemented. However, it seems implausible that the other machinery could matter in the case in which it's not used.

In a similar vein, Muhlestein (Counterfactuals, Computation, and Consciousness) discussed a light-based cellular automaton which can have its counterfactual sensitivity over-ridden by a projection of the correct pattern and sequence of lights onto its elements, and concluded that computationalism must be false since it's implausible that the projection makes a difference as the pattern of which elements are lit remains the same as does the operation of each component device.

Such things do seem implausible, but I must also note that there is no logical contradiction in it, and a case could be made that seemingly inactive components are doing more than one might think: they propagate forward in time, have well defined transition rules, and refrain from changing the value of the states of interest in cases where they should not. It is possible to retain the role of counterfactual situations as I have described for determining what computations are implemented, and that is the standard approach among computationalists.

Nevertheless, the above implications of computationalism are bizarre and perhaps too absurd to accept, and if an approach can be formulated that avoids them, it would be more plausible. Computationalist philosophy of mind is by no means firmly established enough to dismiss such concerns; on the contrary, it is hard to see how anything can give rise to consciousness, whether computational or otherwise.

In the above example, there are two problems:

1) With the actual values of A=1,B=1, the computation could be implemented in a way that didn't seem to require that these variables are the _cause_ of the output being what it was (since Velma could be removed without any change in the output), and so the system seems to have the wrong causal structure.

2) For other values of A,B, it seems like the exact complicated events that _would have_ transpired in those cases should be irrelevant to whatever consciousness the system would give rise to.

To address these problems, I want to first require that in the actual situation A=1,B=1 are the _cause_ of C(1)=0 in a sense to be defined. If that is so, then a network of such causal relationships among variables may be enough for consciousness, without requiring the correct counterfactual behavior of the whole system for other values of A,B.

But that seems problematic, because causation is defined in terms of counterfactual behaviors! A cause is typically defined as a "but-for" cause: the value of A causes the value of C(1) if there is a change in the value of A that _would have_ resulted in a change in the value of C(1). Therefore, in order to establish causation, we still need to know what would have occurred in the counterfactual situations.

Intuitively, it seems like there should be a way to establish causation without knowing what would have occurred in the counterfactual cases. But what if the output's value is totally unrelated to A,B? That possibility needs to be ruled out, or else the system could be rather trivial and certainly not the sort of system that should give rise to consciousness.

Consider each CASE of A,B values as a different channel of potential influence on C(1). The counterfactual channels need to be selectively blocked if we are to establish that the actual channel was a cause of the value of the output, without having to look at the full counterfactual behavior of the system. If they are blocked, and changing A,B results in no change in the output, then there is no causation between A,B and the output; if the output would have changed, then there is such causation.

The channels could be selectively blocked if the mapping were augmented in such a way that the pseudo-code could be described as follows:

LET C(0) = 0
SELECT CASE A,B
CASE 0,0: IF D THEN Daphne sets C(1)=1
CASE 0,1: IF F THEN Fred sets C(1)=1
CASE 1,0: IF S THEN Shaggy sets C(1)=1
CASE 1,1: IF V THEN Velma verifies C(0)=0 (which implies C(1)=0) or sets C(1)=0 otherwise
END SELECT

If D,F,S were (counterfactually) set to FALSE, then the counterfactual channels would be blocked.

Now if V is TRUE, is there causation between A,B and C(1)? Since the coin would have been face down anyway, it would appear not.

But there is a difference between this coin and all of the other coins: If the coin had been face up, it would be changed by Velma to face down. This difference can be exploited by allowing consideration of the counterfactual case in which C(0)=1. The system implements something along the lines of: IF [(V=TRUE AND A=1 AND B=1) OR C(0)=0] THEN C(1)=0. In situations of this type, I will say that A,B are "generalized causes" of C(1)=0.
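This condition can be sketched in Python (my formalization, not the author's notation; the function and flag names are made up):

```python
# The coin's next state with the D, F, S channels blocked by default,
# implementing: IF [(V AND A=1 AND B=1) OR C(0)=0] THEN C(1)=0.

def c1(a, b, c0, v=True, d=False, f=False, s=False):
    """Coin state at time 1; D, F, S are blocked unless set to True."""
    if (a, b) == (0, 0) and d: return 1
    if (a, b) == (0, 1) and f: return 1
    if (a, b) == (1, 0) and s: return 1
    if (a, b) == (1, 1) and v: return 0   # Velma verifies or corrects
    return c0                             # nobody touches the coin

# With C(0) = 0, changing A, B makes no difference (the coin was going
# to be face down anyway), so simple but-for causation fails:
print(c1(1, 1, 0), c1(0, 1, 0))  # 0 0

# But allowing the counterfactual C(0) = 1 exposes the sensitivity:
print(c1(1, 1, 1), c1(0, 1, 1))  # 0 1

# If V were FALSE, even that sensitivity would vanish, and the
# generalized causal link from A, B to C(1)=0 would be missing:
print(c1(1, 1, 1, v=False))      # 1
```

The middle pair of calls is what distinguishes this coin from all the other coins in the room: there is some admissible counterfactual (here C(0)=1) in which, with the other channels blocked, changing A,B changes C(1).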

Note that if V is FALSE in the actual situation, then even if D,F,S are TRUE so that the NAND gate is implemented, the generalized causal link between A,B and C(1)=0 is missing, so this system would NOT play the same role as part of a Generalized Causal Network (GCN).

Also, if C(0) not equaling 0 is not possible, then the generalized causal link between A,B and C(1)=0 is likewise missing.

What if the underlying physical system is too simple to allow this kind of augmented mapping with D,F,S variables? If so, then I don't think the problem arises in the first place: A,B are causes of C=0 if the NAND gate is implemented by a system which is too simple to involve complicated chains of counterfactual events.

Another interesting example of a GCN is a "computer with a straight-jacket". This works as follows: With the actual initial conditions, the computer is allowed to run normally. However, it is being watched. If the state of the computer at any given time is different from the expected sequence, it will be altered to match the state of the expected sequence; otherwise the watcher won't touch it. Could this system implement a conscious AI, if the computer could do so had it not been watched? Since the computer is not actually touched, it would seem that it could, but it does not have the right counterfactual behavior for the computation due to the effect of the potentially interfering watcher. It does however have the right GC network, because removing or blocking the watcher is analogous to setting D,F,S to FALSE in the above example. It should be noted though that there are other ways to deal with the "computer with a straight-jacket", such as noting that at each time step it implements a computation 'closely related' to the original one which can be enough for the conscious AI; the watcher is treated as input in this case.
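A minimal sketch of such a watched computer (my own toy example; the doubling rule is arbitrary) shows that on the actual run the watcher never intervenes:

```python
# The machine runs freely, but a watcher overwrites any state that
# deviates from the expected sequence. If the actual trajectory matches
# the expected one, the watcher never acts.

def run_watched(initial, transition, expected, steps):
    state = initial
    interventions = 0
    for t in range(steps):
        state = transition(state)
        if state != expected[t]:
            state = expected[t]       # watcher forces the expected state
            interventions += 1
    return state, interventions

# A toy computation: repeated doubling mod 7. The expected sequence is
# precomputed from the same rule, so the watcher is never triggered.
double = lambda x: (2 * x) % 7
expected = []
x = 3
for _ in range(5):
    x = double(x)
    expected.append(x)

print(run_watched(3, double, expected, 5))   # (5, 0): no interventions
```

The computer's actual state sequence is identical with or without the watcher; only its counterfactual profile differs.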

In the case of Muhlestein's cellular automaton, the projection of a spot of light onto a cell (call it P=TRUE) is analogous to the command LET C(1)=0 being placed after the END SELECT. This ruins the NAND gate as well as the causal link from A=1,B=1 to C(1)=0. However it is much like a combination of a "straight-jacket" and the initialization C(0)=0.

The underlying system appears to implement something analogous to: IF [(A=1 AND B=1) OR P=TRUE] THEN C(1)=0. In this generalized mapping, there is sensitivity to A,B, so I'd say they count as generalized causes in this case; in other words, the cellular automaton with the projected lights would still be conscious if the original one would have been.

Klein gave an account of Dispositional Implementation, based on 'episodic' implementation of computations, to solve the problem of the implausibility of relying on inert machinery, which he dubbed the Superfluous Structure Problem (SSP). My solution in terms of GCNs is essentially the same as his solution in terms of dispositions, except that it is better-defined. Bartlett criticized Klein's solution on the grounds that it conflicts with the 'activity thesis' (which Bartlett found plausible) that only physical 'activity' matters; as a result, Bartlett thought that Klein's solution was really just computationalism. Klein's idea does conflict with the 'activity thesis', since it also brings in dispositions, which ultimately rely on physical laws instead of just physical structure. The 'activity thesis' ought to be discarded; to me it was never plausible in the least. I read Klein as someone who actually rejects the standard 'activity thesis' for the right reasons, yet uses variant language in which he relies on his own modified 'activity thesis'. Perhaps if Klein had written in terms of GCNs instead, Bartlett would have better understood the idea as being distinct from both the 'activity thesis' and standard computationalism.

Is it a form of computationalism? It is not a form of standard computationalism because a system in which standard NAND gates are all replaced by "gates" which produce the same output on all inputs could still implement the same GCN as the original system. In the above example, that would involve Daphne, Fred, and Shaggy all deciding in advance that they would place the coin face down instead of face up like they were supposed to, while in the actual situation they are never called upon to attend to the coin since Velma's numbers were called instead.

However, if we consider the entire spectrum of computations implemented by the whole system - in other words, we consider not just the original mapping but also the generalized one - then we have enough information to know what GCNs are implemented or not implemented. GCNs are simply a way of characterizing the structure and function of a dynamical system, just like computations are. In that sense, I would say that it is a generalized computationalism, which evades the SSP while being philosophically the same as computationalism in all other ways that matter. I will not make a distinction between 'generalized computationalism' and computationalism unless the technical difference is relevant in a particular case.

So far I have discussed discrete computations here. What about analog continuous computations, such as those implemented by systems described by coupled differential equations such as those of fluid mechanics? In that case the generalization is as follows: Instead of looking at differential equations that tell us what the system would do with any initial condition, we need only concern ourselves with the differential equations that hold for the actual situation; if those are the same for two systems, and the initial conditions are the same, then they implement the same generalized computation even if they would have behaved differently from each other given a different set of initial conditions.

Monday, February 1, 2016

What to optimize when guessing right for a higher percentage of people (on average) means guessing right for fewer people (on average)

Previous: Why do Anthropic arguments work?

Anthropic Probability: The 'Imperfect Test for X' (ITX) Thought Experiment

Suppose that an experiment is not 100% reliable (as is the case for all real experiments). e.g. If X (which is some proposal about physics) is true, then the experiment probably shows result #1, but 'only' in 99% of such experiments; there is a 1% chance it will show result #2. If X is false (which can be written as ~X or "Not X"), then these percentages are reversed. Suppose that the experiment is expensive and time-consuming to perform, as is often the case, so on each planet people will only do it once. For this thought experiment, assume that these planets in question are the only place where any kind of people live in all of existence; in the "Variant thought experiments" section below I'll discuss the effect of changing that assumption. Let's use anthropic probability (based on observer-counting) to determine what the people should conclude about how likely X is to be true.


The Fixed Population Case:

Suppose also that the total number of people in the universe (as well as their likelihood to learn the result of such an experiment) does not depend on X.

Now, if before performing the experiment I have a 50% subjective credence that X is true, then by Bayes' Theorem, if the experiment shows result #1, I should then be 99% confident that X is true.

Note on notation: P(X|1) is "the conditional probability of X given 1", which is a standard notation, but I mention it since not all readers are necessarily familiar with it.

Explicit Bayes theorem:

P(X is true | I see result #1) =

P(I see result #1 | X is true) P(X) / [P(I see result #1 | X is true) P(X) + P(I see result #1 | X is false) P(~X)]

The same equation in more compact notation:

P(X|1) = P(1|X) P(X) / [P(1|X) P(X) + P(1|~X) P(~X)]

P(X) = P(~X) = 0.5

For P(I see result #1 | X is true) = P(1|X), I will assume for now that it makes sense to use 0.99; and for P(1|~X), 0.01.

Then P(X|1) = (0.99)(0.5) / [(0.99)(0.5) + (0.01)(0.5)] = 0.99.
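As a sanity check, the calculation can be wrapped in a few lines of Python (my own snippet; the numbers are just those used above):

```python
# P(X | result #1) by Bayes' theorem, for a two-hypothesis case.

def posterior(prior_x, p1_given_x=0.99, p1_given_notx=0.01):
    """Posterior probability of X after seeing result #1."""
    num = p1_given_x * prior_x
    return num / (num + p1_given_notx * (1 - prior_x))

print(posterior(0.5))   # 0.99
```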

If the universe is large enough, there will be at least a trillion people who learn the result of this kind of experiment. If X is true, 99% of those people see result #1, and 1% see result #2. If X is false, 1% see #1, and 99% see #2. No matter what the real truth value of X is, some people will see result #1 and some will see result #2, but if we follow the above procedure most people (99%) will guess correctly whether X is true. This is what we want to happen if we want more people (see below) to be right about things instead of being wrong.

I am then 99% subjectively likely to be in the group that got the right answer, since 99% of people who actually exist did so (regardless of the truth value of X), and I know I am one of the people who exist in the real world with its fixed truth value for X.


How likely was I to observe that really?

It might at first seem obvious that P(1|X) must indeed be 0.99 practically by definition. But it is far from obvious, because in order for me to see any result, I must first exist. P(1|X) can be written more explicitly as
P(I exist and see result #1 | X is true).

Since the experiment is 99% reliable, this equals (0.99) P(I exist | X is true) = (0.99) L(X).

How can we evaluate the subjective probability L(X) that I would exist if X were true? One attempt is to note that I do exist, so whether X is true or false, I could always take the subjective probability that I exist to be 1. But I don't know if X is true. Obviously, if the physics of X were such that no observers could exist were X true, and I know that, then I can conclude that X is false.

If both X and not-X would lead to observers, then what is the likelihood that one of those observers would be me in each case?

For the case in which both situations result in the same number of observers, we can assume that this likelihood is the same for both X and not-X. Then it cancels out in Bayes' theorem, and we can ignore the issue. That is equivalent to what I did above when I used 0.99 for P(1|X) and 0.01 for P(1|~X). To be more precise I could have multiplied them both by the same number L, and L would appear in both numerator and denominator, and would cancel out.


The Variable (X-Dependent) Population Case:

That's simple enough, but now consider a case in which the total number of observers in the universe DOES depend on the truth of X.

(This is equivalent to a particular variant of the Sleeping Beauty Problem (SBP) - a famous thought experiment - but I won't use that one here as it must be properly formulated to be more relevant to uncertainty about cosmological issues. Most statements of it are not so formulated. My example is a more natural thought experiment for such issues.)

Suppose that we know that if X is true, there are 100 N observers in the world, but if X is false there are 9900 N observers. Each planet has N observers. Now:

If X is true, then 99 N observers see result #1, and N observers see result #2.
If X is false, then 99 N observers see result #1, and 9801 N observers see result #2.

What should an observer think if he sees result #1? There are two commonly argued views. A case could be made for either, depending on what we mean by wanting "more people" to be right:

The Self-Sampling Assumption (SSA): (aka 'halfer' in the SBP)
If we want to optimize the average percentage of observers who guess (or bet) correctly, then as before, we should want that observer to think that X is true.

This can be done by assuming that L(X) = P(I exist | X is true) is independent of the number of observers who would exist if X is true, and so on, as long as at least one observer would exist in each case.


The Self-Indication Assumption (SIA): (aka 'thirder' in the SBP)
However, if we want to optimize the average NUMBER of observers who guess (or bet) correctly, then we will have no preference here; in either case, 99 N observers will guess correctly.

This can be done by making the updated probability proportional to the number of observers rather than to the % of observers. e.g. With the SIA, we assume that L(X) = P(I exist | X is true) is proportional to the number of observers who would exist if X is true, and so on. Unlike the SSA, this has a nice smooth limit in the case where X would lead to zero observers, since then L(X) = 0.


With the SIA, as the number of observers who see result #2 increases, the fraction of observers who see result #1 decreases. The SIA probability is proportional both to the total number of observers and to that fraction, i.e.

P(I see #1 | X) is proportional to [(total number of observers if X is true) x (fraction of observers who see #1 if X is true)] = [number of observers who see #1 if X is true]

and

P(I see #1 | not X) is proportional to [(total number of observers if X is false) x (fraction of observers who see #1 if X is false)] = [number of observers who see #1 if X is false]

To put it (SIA) another way, suppose that the same 99 N observers who see result #1 would have existed no matter whether X is true or false. If X is 50% likely to be true, these observers would ideally assign a 50% chance to the hypothesis that X is true. There will also exist additional observers, all of whom would see result #2; no given observer will be less likely to see result #1 as a result of X being false. Thus, those observers who see result #1 have no evidence that X is false.
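The contrast between the two rules can be sketched numerically (my own code; the observer counts are those given above, with N set to 1):

```python
# SSA vs SIA posteriors for an observer who sees result #1 in the
# variable-population case: 100 observers if X, 9900 if not-X.

pops  = {'X': 100, 'notX': 9900}   # total observers under each hypothesis
see1  = {'X':  99, 'notX':   99}   # observers who see result #1
prior = {'X': 0.5, 'notX': 0.5}

def ssa_posterior():
    # SSA: likelihood of seeing #1 = FRACTION of observers who see #1
    lik = {h: see1[h] / pops[h] for h in pops}
    num = lik['X'] * prior['X']
    return num / (num + lik['notX'] * prior['notX'])

def sia_posterior():
    # SIA: likelihood proportional to the NUMBER of observers who see #1
    num = see1['X'] * prior['X']
    return num / (num + see1['notX'] * prior['notX'])

print(ssa_posterior())  # 0.99
print(sia_posterior())  # 0.5
```

The SIA answer is 0.5 precisely because the same number (99 N) of observers sees result #1 whether X is true or false.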


To see why it could matter, introduce consequences of believing correctly about X. Suppose that each planet has the option to try a new experiment which could either unlock a much-needed unlimited source of energy or blow up catastrophically (making everyone on the planet horribly sick). The only thing that determines which outcome happens is that a certain parameter must be set to a value that depends on whether X is true. For a planet that measured result #1 in the first experiment, what truth value should they assume for X when designing the second experiment?

If the designers go with SSA, they will believe that X is true with 99% confidence.

If they go with SIA, however, they will have only 50% confidence in X. Perhaps in this case they will not take the risk. Of course if they see result #2, they will be confident that X is false and in that case would take the risk.

And that is the correct policy: Because if result #1 is seen, there will be the same number of planets which believe that X is true whether it is true or not.

It would seem that SIA leads to better decision-making than SSA in such cases. That is a strong argument in favor of SIA.


Define Non-Anthropic Probability (NAP): The subjective probability that a model of reality is true prior to taking into account the numbers or fractions of any kinds of observers predicted by each model of reality.

There is a catch with SIA. Suppose now that the NAP that X is true is 90%, not 50%.

Suppose also that the safe experiment to try to find out whether X is true or not can't even be done prior to doing the dangerous experiment which could unlock energy supplies.

Using SSA, with no experimental data that just leaves us with the NAP (90% that X is true) as our subjective probability to work with.

Using SIA, we weight each case by the number of observers. So we will end up with an 8.3% subjective probability that X is true: (90% * 100 N) / [(90% * 100 N) + (10% * 9900 N)] ≈ 0.083
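As a quick check on the arithmetic (a throwaway snippet of mine; N cancels, so it is set to 1):

```python
# SIA posterior when the non-anthropic prior (NAP) is 90% for X.
nap_x, nap_notx = 0.9, 0.1        # non-anthropic priors
n_x, n_notx = 100, 9900           # observer counts if X true / false

p_x_sia = nap_x * n_x / (nap_x * n_x + nap_notx * n_notx)
print(round(p_x_sia, 3))  # 0.083
```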

In this case, we will believe that X is probably false. A wise thing to believe perhaps, since the consequences of choosing the setting on the dangerous experiment will affect far more people if X is actually false. But it is disturbing that there is a high NAP that we are mistaken and if so are condemning the actual people who do exist to a miserable life.


Pascal's Mugging:

A more extreme example is known as a "Pascal's mugging". A mugger claims that he has the power to create and eternally torture M copies of you, and will do so (while first giving them your current experiences, so that for all you know you might be one of those copies) unless you hand him your wallet; in other words, he threatens an outcome with a very large negative utility. You assign a subjective non-anthropic probability P that he has the god-like powers he claims, which will be very small but finite. If (as is commonly assumed in decision theory) you try to maximize the average utility, then for any fixed P, no matter how small, there is some large number which, if given as M, will convince you to hand over your wallet if you believe SIA. Once again, we are "probably" (according to the NAP) disregarding the interests of the actual observer (you) in favor of the many, many unlikely but possible observers whom we are trying to save from eternal torture.
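A toy expected-utility comparison (entirely my illustration; the utility scale and the example values of P and M are made up) shows how a large enough M swamps any fixed P:

```python
# Pascal's mugging as an expected-utility comparison. Keeping the wallet
# risks P * M copies being tortured; handing it over is a certain small
# loss. Utilities are on an arbitrary scale.

def hand_over_wallet(p, m, wallet_utility=1.0, torture_cost_per_copy=1.0):
    """Return True if handing over the wallet has higher expected utility."""
    eu_keep = -p * m * torture_cost_per_copy   # expected torture cost
    eu_hand = -wallet_utility                  # certain small loss
    return eu_hand > eu_keep

print(hand_over_wallet(p=1e-12, m=10**6))   # False: threat too unlikely
print(hand_over_wallet(p=1e-12, m=10**15))  # True: large enough M wins
```

For any fixed p > 0, increasing m eventually flips the decision, which is the disturbing feature of the setup.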


Dual-Objective Optimization (DOO)

IMO, both SSA and SIA can lead to bad policies. SIA can benefit more people on average in simple cases like the "is X true" experiment but leaves all people vulnerable to being asked to sacrifice their own interests in favor of numerous-but-unlikely hypothetical people.

The best thing to do IMO depends on one's individual preference for how to trade off probable benefit with mean benefit, where mean here includes huge but highly unlikely benefits to large numbers of possible people weighted with the small probability of that happening. This is much like selecting a utility function, except that in standard decision theory, only mean benefit is maximized. It is better if a high % of people will guess correctly (SSA), and also if as many people as possible guess correctly (SIA).

One way to deal with a Pascal's Mugging is to ignore the threat if the NAP that the mugger can do as he threatens to is less than some predetermined value.


A Special Observer:

Consider the perspective of a special observer S, such that S would exist whether X is true or false. S can send a message to the other observers, but it is a one-way communication; he can never receive a reply. S does not know whether X is true or false and can't perform any experiment to try to find out.

Suppose that the NAP for X to be true is 90%. If all observers must perform a test to find out whether they are S or not, then prior to performing any such test or experiments, an observer who believes SSA assigns a 90% chance for X to be true, which is simply equal to his NAP. If he then discovers that he is S, he will further favor the belief that fewer observers exist (X is true), since a higher percentage of observers finds himself to be S in that scenario.

An observer who believes SIA at first assigns an 8.3% chance for X to be true, which is proportional to his NAP and to the total number of observers who would exist if X is true as compared to X being false. If he then discovers that he is S, he will adjust his probabilities towards the scenario with fewer observers, leaving him with a 90% subjective probability for X to be true, which is (and this is true in general for such cases) equal to his NAP.
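These updates can be checked numerically (my code; I assume each observer is equally likely to be S, so P(I am S | hypothesis) = 1 / population, with N = 1):

```python
# Bayesian updates for the special observer S under SSA and SIA.

nap = {'X': 0.9, 'notX': 0.1}     # non-anthropic priors
pop = {'X': 100, 'notX': 9900}    # total observers under each hypothesis

def normalize(w):
    total = sum(w.values())
    return {h: v / total for h, v in w.items()}

# SIA: weight the NAP by population, then update on "I am S"
# (likelihood 1 / population). The population factors cancel exactly.
sia_prior = normalize({h: nap[h] * pop[h] for h in nap})
sia_after = normalize({h: sia_prior[h] / pop[h] for h in nap})
print(round(sia_prior['X'], 3))   # 0.083
print(round(sia_after['X'], 3))   # 0.9  (back to the NAP)

# SSA: the prior is just the NAP; updating on "I am S" then favors the
# smaller-population hypothesis even more strongly.
ssa_after = normalize({h: nap[h] / pop[h] for h in nap})
print(ssa_after['X'] > 0.9)       # True
```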

Knowing that a safe experiment to try to determine if X is true can't be done, S decides to recommend a decision-making policy to the other observers, which will be either SSA or SIA. If he recommends SSA (and they follow his advice) then they will tend to assume that X is true. If he recommends SIA, then they will tend to assume that X is false.

So even if he believes SIA, he figures that telling the others to follow SIA is 90% likely to lead to the wrong choice. But, if X is indeed false (10% chance), then many more people will have made the right choice. What should he recommend?

If we are in the beginning stages of a long-lived human civilization, then we are in a similar position to that of S with respect to future people. Would a gamble be worthwhile which pays off if there are many future people, but which makes things worse in a higher-NAP scenario with fewer people? If so, can a high enough number of possible future people always outweigh a very small but fixed NAP for them to exist despite the probable cost to the fewer people who are more likely to exist?


The Adam Paradox (credit is due to Nick Bostrom):

Suppose that the NAP for X is 50%, but S has the power to make X true or false (e.g. by reproducing). If he believes SSA, then he believes that fewer people are likely to exist, so he thinks he will probably not reproduce. He decides to hunt deer as follows: unless a deer drops dead in front of him, he will reproduce and have a huge number of descendants; if the hunt works, he will have none. Since he thinks it unlikely that many people will exist, he expects the deer to drop dead, and is surprised if it doesn't. On the other hand, if he believed SIA, he would have had no such delusion; he would just think there is a 50% chance he will reproduce. Of course, the latter is probably the correct belief.

The Iterated Reproducers Counter to the Adam Paradox:

The Adam Paradox seems to be a very strong argument against the SSA, but remember that the SSA is chosen to maximize the average % of people who guess correctly. So now consider a series of people. First comes Adam, who guessed wrong and reproduced, producing a tribe of N people. The tribe goes through similar reasoning, and this results in a larger tribe of N^2 people. Again the same thing goes on, resulting in a nation of N^3 people, and so on. Eventually, after some finite number of iterations M, the people then alive will guess right about their event in question: they will not reproduce. So the N^M people alive at that point will have guessed correctly based on the SSA, and they will be the majority of the people ever to have lived. If everyone in this series had used the SIA instead, the majority of people ever to have lived would have guessed wrong, for any value of M. (This is a version of the Doomsday Argument and is essentially the same as the "Shooting Room Paradox" of John Leslie.)


Variant thought experiments that give SIA-like probabilities even within the SSA:

- Independent Trials:

Suppose there are many universes which really exist, and the truth of X within each depends on the local environment; say X is true in 50% of them. For example X might be the statement that the electron to proton mass ratio is less than some value, and the ratio might vary from universe to universe. (As usual, the word universe here does not necessarily refer to the totality of existence; the word 'multiverse' would be used for that. A universe is some part of existence, which may or may not be the whole thing, but beyond which the observers in question can't make any observations.)

In each universe in which X is true there are 100N observers, and in each universe in which X is false there are 9900N observers.

In this case, the observers in X-false universes outnumber those in X-true universes by 99:1. There is no doubt in this case that each observer should assign a 99% probability that X is false in his universe. The SIA gives that result for a single universe or any number of universes, while the SSA result for a single universe is 50% but approaches the SIA value as the number of universes increases.
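The convergence of the SSA result towards the SIA value can be computed exactly. A sketch over K independent universes, using the 100N / 9900N counts from the text (N cancels out) and a 50% chance of X being false in each universe:

```python
# SSA probability that a randomly chosen observer lives in an X-false
# universe, given K independent universes (X false in each with
# probability 1/2), with 100 observers per X-true universe and 9900 per
# X-false universe (the common factor N cancels).
from math import comb

def ssa_prob_x_false(K, n_true=100, n_false=9900):
    total = 0.0
    for F in range(K + 1):                    # F = number of X-false universes
        p_config = comb(K, F) * 0.5**K        # chance of this configuration
        frac = (n_false * F) / (n_false * F + n_true * (K - F))  # X-false share
        total += p_config * frac
    return total

print(ssa_prob_x_false(1))             # 0.5  (single universe: just the NAP)
print(round(ssa_prob_x_false(20), 3))  # close to the SIA value of 0.99
```

With one universe the SSA gives only the NAP, but with even a modest number of independent trials the fraction of X-false observers pins the answer near 99%.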

In practice, for any non-cosmological statement used in place of X, the SIA very likely gives the correct result even according to the SSA, because such situations no doubt arise many times throughout the universe. For example, if we tried to perform a real Sleeping Beauty experiment, it is very likely that other people throughout time and space in the multiverse do so as well. These act as independent trials, and SIA-like observer counting therefore applies, giving the 1/3 result in the SB experiment.

- A Quantum Trial:

Likewise, if MWI is true and a quantum measurement is used in place of X, and the Born Rule probability is used in place of the NAP that X is true, then SIA-like observer counting automatically applies when guessing which branch an observer is on, since the likelihood for that is proportional to the number of observers on that branch. This case is similar to that of many independent trials, but instead of relying on a large number of trials to produce an average % of people in each type of universe, the actual % of people in each type of universe is pre-determined and precisely known even for a single quantum coin flip.

- Observers Within a Larger Fixed Context:

If there are many other observers M known to exist besides those Y whose existence depends on X, then the SSA result reduces to the SIA result when the other observers greatly outnumber the X-dependent observers (M >> Y).

That's because, in the SSA, the prior likelihood that a given observer is X-dependent is proportional to the fraction of observers who are X-dependent, i.e. to Y / (M + Y). Since M >> Y, that is approximately proportional to Y, which is just like the SIA case.


Such variant thought experiments are sometimes used as arguments in favor of the SIA, because they resemble any realizable non-cosmological experiment much more closely than the original thought experiment does. However, the whole point of trying to figure out the right way to do anthropic probability is to apply it to questions of cosmological uncertainty, such as whether one given theory of everything is more likely than another, or to questions which similarly affect the overall number of observers in the multiverse, such as whether artificial digital computers could be conscious (given that they have the potential to greatly outnumber biological observers), or the Doomsday Argument question of how long-lived a typical line of sentient beings is likely to be.

These variant thought experiments don't help to answer those questions, but they do help point out the limitations of the SSA as a practical tool. Any advocate of the SSA must be fully aware of the nuances which make it give the same answer as the SIA in almost all practical situations (such as 1/3 in a realistic Sleeping Beauty experiment), while an SIA-believer has the luxury of always giving the same answer to almost any question about anthropic probabilities.


The Reference Class Problem:

As noted above, the probabilities for X predicted by the SSA depend not only on the observers doing a particular kind of experiment or which can be correlated with X, but on the number of all observers. That makes it important to know what counts as an observer.

But, with a more careful formulation, we can say that it depends on the number of observers within the Reference Class, allowing that not all observers are really relevant to the probabilities we are studying.

For example, suppose there are 10 trillion mice, and 10 billion men if X is true; while there are 10 trillion mice, and 90 billion men if X is false. The NAP for X is 50%. Using the SSA, does finding oneself to be a man instead of a mouse provide evidence for X being false? (With the SIA, it would.) If mice don't count as observers, then the SSA likelihood in this case remains equal to the NAP, at 50%. But if they do, then the SSA likelihood approaches the SIA likelihood, which is 90% for X to be false.
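The mice-and-men numbers from the text can be plugged in directly. A sketch of the SSA posterior for "X is false" given the observation "I am a man", with and without mice in the Reference Class:

```python
# SSA posterior for X false given "I am a man", using the numbers from
# the text: 10 trillion mice either way; 10 billion men if X is true,
# 90 billion men if X is false; NAP = 50% (so it cancels in the ratio).
def ssa_posterior_x_false(include_mice):
    mice = 10_000 if include_mice else 0   # counts in billions
    men_true, men_false = 10, 90
    # Likelihood of "I am a man" = fraction of reference-class observers
    # who are men, under each hypothesis.
    like_true = men_true / (men_true + mice)
    like_false = men_false / (men_false + mice)
    return like_false / (like_true + like_false)

print(ssa_posterior_x_false(include_mice=False))          # 0.5 (just the NAP)
print(round(ssa_posterior_x_false(include_mice=True), 2)) # 0.9 (the SIA value)
```

Including the mice makes them a huge fixed background population, which is exactly the "observers within a larger fixed context" situation above, so the SSA answer collapses onto the SIA's 90%.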

The Questioner Reference Class:*

This is a difficulty for the SSA, but in my view, a reasonable choice for the Reference Class is to include only those observers who might ask this type of question. Since the SSA aims to maximize the % of observers who guess correctly, those observers must indeed guess. Mice very probably know nothing about X and are not capable of using anthropic reasoning to guess the likelihood that X is true, so they should not be included in the Reference Class.

Note that with the SIA, the Reference Class issue doesn't arise, because as the number of "other" observers increases, the fraction of observers involved in the question at hand decreases, and the SIA probability is proportional to the former but also to the latter, i.e.
P(I see #1 | X) is proportional to [(total number of observers) x (fraction of observers who see #1)] = [number of observers who see #1]


Easy confirmation of the Many-Worlds idea?

If you believe the SIA, then you must believe that the probability for you to exist would be much higher if MW (with its many observers) is true compared to if there is only one world and thus relatively few observers. So even if your NAP for MW is low, once you take the numbers of observers into account, your subjective likelihood for some kind of MW to be true is almost 100%.
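A sketch of that update, where the NAP and the observer counts are assumed purely for illustration (the point is only that MW's observer count can dwarf any modest NAP penalty):

```python
# SIA update for many-worlds (MW).  The 1% NAP and the observer counts
# are illustrative assumptions, not figures from the text.
def normalize(d):
    total = sum(d.values())
    return {k: v / total for k, v in d.items()}

nap = {"MW": 0.01, "one world": 0.99}
observers = {"MW": 10**20, "one world": 10**10}  # assumed counts

sia = normalize({h: nap[h] * observers[h] for h in nap})
print(round(sia["MW"], 3))  # 1.0 -- near-certainty despite the 1% NAP
```

Any hypothesis multiplying the observer count by a large enough factor gets the same near-automatic confirmation under the SIA, which is also what drives the Boltzmann Brain worry below.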


Boltzmann Brains (BBs):

BBs are randomly assembled brains that may momentarily appear, such as in an infinitely long future of the universe. They could potentially vastly outnumber normal observers who result from Darwinian selection.

The SIA favors belief that a high number of observers have observations like ours. Using the SIA, even if there is just a small NAP that BBs vastly outnumber normal observers, there is a high subjective likelihood that they do, and that we are just BBs who by coincidence temporarily resemble brains of normal observers. I am not comfortable with that conclusion, and see it as an argument against the SIA.

Just using the NAP will not help enough. Suppose the NAP is 50% that BBs vastly outnumber normal observers. Shouldn't we think that our normal-seeming observations make that less likely? To reach that conclusion, we must use the SSA, since it favors belief that a high % of observers have observations like ours.


Infinite Universe / Multiverse:

If the universe or the set of many worlds is infinite, the number of observers is also infinite.

Then there are problems with anthropic probabilities:

First, the % of observers of each type becomes undefined, because there are infinitely many observers who see result #1 and also infinitely many who see result #2, regardless of whether X is true or not.

I think that problem is not so serious, because it seems to me that ratios of infinite numbers can still be well defined in physical situations. For example, suppose that widgets always occur in clusters of three, with always one blue widget and two green widgets. Then I would say that 1/3 of them are blue, even if there are infinitely many clusters. In practice, only if widgets are conscious does the question matter. So this principle is really a new philosophical assumption about how physics gives rise to a measure distribution of consciousness.
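The widget example can be phrased as a statement about limits of finite prefixes: in any finite number of clusters the blue fraction is exactly 1/3, so the limiting ratio is well defined even though the totals diverge. A trivial sketch:

```python
# Density version of the widget example: every prefix of an infinite
# sequence of three-widget clusters (1 blue, 2 green) has blue
# fraction exactly 1/3, so the limiting ratio is well defined.
def blue_fraction(n_clusters):
    blue = n_clusters          # one blue widget per cluster
    total = 3 * n_clusters     # three widgets per cluster
    return blue / total

print(all(blue_fraction(n) == 1 / 3 for n in (1, 10, 10**6)))  # True
```

The philosophical assumption is that physics supplies a natural ordering (here, by cluster) along which such limits can be taken; without one, ratios of infinities are notoriously reorderable.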

However, it is impossible to compare the number of observers in two incompatible models of physics if both are infinite. This makes it impossible in practice to use the SIA to compare models of physics, since thanks to SIA's easy confirmation of MW the viable ones will all have infinitely many observers.

Since the SSA deals only in fractions of observers within each model, it can still be used.

The SIA can still be used to compare closely related models if we assume something about how to make the comparison; for example, models that are identical up until some special event can be assumed to have the same number of observers before the event, and the ratios of observers after it can then be used to compare the two models. Such an event may be a human decision whether or not to reproduce more, for example. Similarly, even in an infinite universe it is still desirable to maximize (all else being equal) the total number of people within the history of our Hubble volume.


Conclusion:
Due to the SSA's advantage in comparing infinite multiverses, and also influenced by the BBs argument and the Iterated Reproducers, I use SSA when comparing possible routes to the Born rule for MW QM. However, I do not consider the question settled. The Adam Paradox in particular seems to be a strong argument against the SSA, and I'm not sure that the Iterated Reproducers counter is sufficient. Also, both the SSA and SIA can lead to poor decisions depending on the situation, and it remains disturbing that the SSA can lead to a lower average utility. That failure for decision-making purposes may indicate a failure for informational purposes as well. However, while Dual-Objective Optimization may be best for decision making, it does not provide a recipe for assigning probabilities.


References:
* Indicates a point that is original in this post as far as I know.

As usual, external links may contain misleading arguments which I disagree with for good reasons not always worth mentioning here. Read at your own risk :)

https://en.wikipedia.org/wiki/Sleeping_Beauty_problem
http://www.princeton.edu/~adame/papers/sleeping/sleeping.html
https://wiki.lesswrong.com/wiki/Pascal's_mugging
http://www.nickbostrom.com/
http://www.anthropic-principle.com/?q=resources/preprints
http://www.anthropic-principle.com/preprints/spacetime.pdf
https://sites.google.com/site/darrenbradleyphilosophy/home/research
http://philsci-archive.pitt.edu/11864/
