<b>Ontology & Quantum Mechanics</b><br />
<br />
This blog is an experimental way to discuss topics in philosophy of physics, especially interpretation of quantum mechanics (QM), and some philosophy of mind.<br />
<br />
I tend to support the many worlds interpretation (MWI) of QM, and computationalist philosophy of mind. However, I try to be objective and lay out the difficulties clearly.<br />
<br />
I welcome comments and criticism. However, if you think that mainstream physics is nonsense and that you are a lone genius, please go elsewhere until you learn some physics.<br />
<br />
Jacques Mallah, PhD. (<a href="mailto:jackmallah@yahoo.com">jackmallah@yahoo.com</a>)<br />
<br />
Table of Posts:<br />
Ontology & Quantum Mechanics<br />
<br />
Chapter 1: Basics of Quantum Mechanics<br />
<br />
1.1. <a href="http://onqm.blogspot.com/2009/07/simple-proof-of-bells-theorem.html">Simple proof of Bell's Theorem</a><br />
<br />
1.2. <a href="http://onqm.blogspot.com/2009/07/why-mwi.html">Why MWI?</a><br />
1.3. <a href="http://onqm.blogspot.com/2009/07/top-12-things-to-know-about-physics.html">Top 12 things to know about physics</a><br />
1.4. <a href="http://onqm.blogspot.com/2009/08/on-external-links.html">on external links</a><br />
1.5. <a href="http://onqm.blogspot.com/2009/08/studying-quantum-mechanics-delayed.html">Studying Quantum Mechanics: the Delayed Choice example</a><br />
1.6. <a href="http://onqm.blogspot.com/2009/08/few-key-definitions-for-qm.html">Key definitions for QM: Part 1</a><br />
1.7. <a href="http://onqm.blogspot.com/2009/08/key-definitions-for-qm-part-2.html">Key definitions for QM: Part 2</a><br />
1.8. <a href="http://onqm.blogspot.com/2009/08/key-definitions-for-qm-part-3.html">Key definitions for QM: Part 3</a><br />
1.9. <a href="http://onqm.blogspot.com/2009/08/studying-quantum-mechanics-measurement.html">Studying Quantum Mechanics: Measurement and Conservation Laws</a><br />
1.10 Studying Quantum Mechanics: Decoherence, Macroscopic Superpositions, and the 'Preferred-Basis Problem'<br />
1.11. <a href="http://onqm.blogspot.com/2009/08/futher-study.html">Further Study</a><br />
<br />
Chapter 2: Probability in Many Worlds Interpretations<br />
<br />
I. <a href="http://onqm.blogspot.com/2009/08/interlude-anticipating-2007-many-worlds.html">Interlude: Anticipating the 2007 Many Worlds conference</a><br />
II. <a href="http://onqm.blogspot.com/2009/09/interlude-2007-perimeter-institute.html">Interlude: The 2007 Perimeter Institute conference Many Worlds @ 50</a><br />
<br />
2.1. <a href="http://onqm.blogspot.com/2009/09/meaning-of-probability-in-mwi.html">Meaning of Probability in an MWI</a><br />
2.2. <a href="http://onqm.blogspot.com/2012/05/why-do-anthropic-arguments-work.html">Why do Anthropic arguments work?</a><br />
2.3. <a href="http://onqm.blogspot.com/2016/02/what-to-maximize-when-guessing-right.html">What to optimize when guessing right for a higher percentage of people (on average) means guessing right for fewer people (on average)</a><br />
<br />
2.4. <a href="http://onqm.blogspot.com/2009/09/measure-of-consciousness-versus.html">Measure of Consciousness versus Probability</a><br />
2.5. <a href="http://onqm.blogspot.com/2009/09/why-quantum-immortality-is-false.html">Why 'Quantum Immortality' is false</a><br />
2.6. <a href="http://onqm.blogspot.com/2009/09/early-attempts-to-derive-born-rule-in.html">Early attempts to derive the Born Rule in the MWI</a><br />
2.7. <a href="http://onqm.blogspot.com/2009/09/decision-theory-other-approaches-to-mwi.html">Decision Theory & other approaches to the MWI Born Rule problem, 1999-2009</a><br />
2.8. <a href="http://onqm.blogspot.com/2009/10/mwi-proposals-that-include.html">MWI proposals that include modifications of physics</a><br />
2.9. <a href="http://onqm.blogspot.com/2011/09/computationalist-approach-to-measure.html">The Computationalist approach to Measure</a><br />
<br />
III. <a href="http://onqm.blogspot.com/2012/09/on-dualism.html">Tangent: On Dualism</a><br />
IV. <a href="http://onqm.blogspot.com/2011/10/everything-hypothesis-its-predictions.html">Tangent: The Everything Hypothesis: Its Predictions and Problems</a><br />
<br />
Chapter 3: Making Computationalism Precise: Defining Implementations<br />
<br />
3.1. <a href="http://onqm.blogspot.com/2011/10/basic-idea-of-implementation.html">Basic idea of an implementation</a><br />
3.2. <a href="http://onqm.blogspot.com/2011/10/putnam-searle-chalmers-theorem.html">The Putnam-Searle-Chalmers theorem</a><br />
3.3. <a href="http://onqm.blogspot.com/2011/10/restrictions-on-mappings-1-independence.html">Restrictions on mappings 1: Independence and Inheritance</a><br />
3.4. <a href="http://onqm.blogspot.com/2011/12/restrictions-on-mappings-2-transference.html">Restrictions on mappings 2: Transference</a><br />
3.5. The hierarchy of implementations by a real physical system<br />
3.6. <a href="http://onqm.blogspot.com/2011/12/counterfactual-computations-or.html">Counterfactual Computations or Generalized Causal Networks?</a><br />
<br />
V. <a href="http://onqm.blogspot.com/2011/12/interlude-partial-brain-thought.html">Tangent: The Partial Brain thought experiment</a><br />
<br />
VI. Tangent: A biologically inspired example of a simple computation: Integration of a signal<br />
<br />
Chapter 4: Making Computationalism Precise: Counting Implementations<br />
<br />
4.1. <a href="http://onqm.blogspot.com/2017/08/counting-implementations-problem-of-size.html">Counting Implementations: The Problem of Size</a><br />
4.2. <a href="http://onqm.blogspot.com/2017/09/independence-criteria-for.html">Independence Criteria for Implementations</a><br />
4.3. Linear dynamics, independence, & noise<br />
4.4. Born Rule compatible?<br />
4.5. The problem of Boltzmann Brains<br />
4.6. Possible changes to the physics<br />
<br />
VII. Tangent: Implications for artificial intelligence<br />
<br />
VIII. Tangent: Ideas on quantum gravity<br />
- The problem of time and reference frames<br />
- Indiscernibles are Not Identical<br />
- The pseudo-Heisenberg-operator possible ontology<br />
<br />
---------------------------------------------------------------------<br />
My related eprints:<br />
<br />
<a href="http://philpapers.org/rec/MALSAD">Structure and Dynamics in Implementation of Computations</a><br />
This is my paper about implementation of computations which appears in the proceedings of the 7th AISB Symposium on Computing and Philosophy (2014). It is largely compatible with the ideas presented on this blog, but contains a few other ideas and explanations.<br />
<br />
<a href="http://arxiv.org/abs/0709.0544">The Many Computations Interpretation (MCI) of Quantum Mechanics</a><br />
This is my 2007 eprint explaining my ideas on the Many-Worlds Interpretation of QM. It applies computationalism to QM and in order to do so first covers some of the same material as 'Structure and Dynamics' though the latter is more up to date. I intend to create another paper which focuses more on the application to QM and which should incorporate more thinking on the issue.<br />
<br />
<a href="http://arxiv.org/abs/0902.0187">Many-Worlds Interpretations Can Not Imply 'Quantum Immortality'</a><br />
This is my 2009 (revised 2011) refutation of the 'Quantum Immortality' fallacy. It includes a thorough discussion of how measure of consciousness relates to effective probabilities, though some of the material is somewhat dated. I intend to revise it to increase its clarity, while also updating the section on anthropic probability to reflect the material discussed in post 2.3 above.<br />
<br />
<a href="https://www.academia.edu/24083239/_Quantum_Suicide_Could_Never_be_a_Beneficial_Decision">Quantum Suicide Could Never be a Beneficial Decision</a><br />
A newer draft paper on the subject.<br />
<br />
---------------------------------------------------------------------<br />
<br />
<a href="https://drive.google.com/open?id=1A8PF0w7L43o36zFaHJ5DlZ6GAWXSvZbw">Introduction to Bayesian Probability</a> (PDF slides)<br />
<br />
---------------------------------------------------------------------<br />
<br />
<b>Independence Criteria for Implementations</b><br />
<br />
As explained in the previous post <a href="http://onqm.blogspot.com/2017/08/counting-implementations-problem-of-size.html">Counting Implementations: The Problem of Size</a>, the measure of a conscious computation within a system will be assumed to be proportional to the number of independent implementations of that computation. Independence Criteria for Implementations (ICI) must be chosen appropriately, and all possible mappings must be considered so as to maximize the number of simultaneous independent implementations for all conscious computations.<br />
<br />
Intuitively, as well as based on human observations, measure should not be a super-strong function of brain size or flexibility, so criteria that lead to exponential dependence on size are unlikely to be correct. In particular, mappings that include a functional channel plus an arbitrary combination of additional channels (any of which could have been a functional one given different initial conditions) would intuitively not be independent, since the functional channel does all of the explanatory work about what computations are occurring, and since (problematically) the number of possible combinations grows exponentially with the number of channels. However, the Born Rule suggests that the size (squared amplitude) of terms in the wavefunction is indeed a factor, either directly or indirectly.<br />
<br />
<br />
<b>ICI proposal 1: Substate-style Criteria</b><br />
<br />
For substates within a computation, the criterion for their independence from each other is <i>roughly speaking</i> that each substate should depend (in whole or in part) on a different physical variable (which may be either actual variables much like particle positions in classical configuration space, or an index which a group of variables depends on much like the quantum wavefunction depends on directions in particle-configuration space). Could a similar idea hold for independence of implementations?<br />
<br />
Consider a digital variable that can take on 4 possible values. It can correspond to a pair of bits with possible values 00, 01, 10, 11. Within a computation, these can not be considered as independent substate bits because allowing that sort of thing would open the door to clock-and-dial-style false implementations. But for two different implementations (whether of the same computation or of different ones), each of which is by itself an allowed implementation, there seems to be no problem with allowing each of the above bits to play a role in a different one of the implementations. So while it is <i>possible</i> to use similar independence criteria across implementations as within them, that seems unnecessarily restrictive, and I do not find it intuitively appealing.<br />
<br />
But that's not the only problem. Consider a system with N channels each of which can perform the same kinds of computations, similar to the earlier example of the Problem of Size, but for this example, suppose that it is a system with linear dynamics. It could for example be a quantum system, although any linear system would do.<br />
<br />
For the initial conditions suppose that channel #1 is set up to perform a computation of interest, while the other channels are in their static conditions. For a quantum system, that means that the wavefunction is zero in the other channels, and remains so.<br />
<br />
Consider mappings which are based on sums over various combinations of the channels. For example, a mapping M1 maps values of sums in corresponding substates of channels #1, 3, 5, and 8 to computational states. Another mapping M2 uses different channels, such as #1, 2, 3, 4, and 5.<br />
<br />
How many implementations of the computation are performed by this system? If the Born Rule can be derived for this system, that depends on the amplitude of the wavefunction, but that's not what I want to focus on yet. In any case, the substate-style criterion leaves no direct role for the amplitude, though see below for how it might play an indirect role based on overcoming 'noise' in a many-particle world.<br />
<br />
The question now is: How does the number of implementations depend on the number of channels, N? Intuitively, because there is no activity in channels other than #1, the number of channels should not affect the number of implementations. But mappings that combine different channels linearly, such as M1 and M2 do, satisfy the substate-style independence criteria. If each new mapping can either include or reject each channel (other than channel #1, which must be included in order to implement the computation), there are 2^(N-1) such mappings. This is actually an overestimate, because for any pair of mappings there must be at least one channel that the first includes but the second does not, and also vice versa. For example, the mapping that includes all odd # channels is not independent of the one that includes all channels. Still, this is not the proper dependence on N, which should be none.<br />
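<br />
To make the counting concrete, here is a minimal sketch in Python (treating a mapping as simply the subset of channels it sums over is my own simplification for illustration, not part of the formal proposal):<br />
<br />
from itertools import combinations

def count_channel_mappings(n_channels):
    """Count the subsets of channels 1..N that include the functional channel #1."""
    others = list(range(2, n_channels + 1))
    total = 0
    for k in range(len(others) + 1):
        total += len(list(combinations(others, k)))
    return total  # equals 2 ** (n_channels - 1)

for n in (2, 4, 8, 16):
    print(n, count_channel_mappings(n))  # 2, 8, 128, 32768 -- exponential in N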
<br />
It can be concluded that the substate-style criteria are not appropriate.<br />
<br />
<br />
<b>ICI proposal 2: Full Independence</b><br />
<br />
An alternative is to allow any set of mappings such that being given each implementation mapping and the corresponding computational state places no restrictions on what computational states any of the other mappings correspond to. For the case of the classical ball system considered for the Problem of Size this rules out the problematic arbitrary combinations of channels, because all of the mappings using the same functional channel would have to be in the same computational state (or in one which can be mapped to it using similar substates and transition rules). It also allows the digital variable with 4 possible values to play a role in two different implementations.<br />
<br />
For the linear system with N channels, this alone is not sufficient, because the same problem that the substate-style criteria encountered applies here. An added restriction can be placed as follows:<br />
<br />
<i>Change in Private:</i><br />
If there is a region either of physical variables, physical indices, or physical values which is not shared by the mappings:<br />
For both the initial and subsequent time steps, a difference in any computational sub-state that the mapping uses the non-shared region to determine must require a corresponding difference in the non-shared region; if there is no such difference there, then no computational state is mapped to. For a given mapping, the shared region may or may not also have to change (this condition places no restriction on that).<br />
<br />
This solves the N-channel problem because the non-shared channels don't meet the above conditions if the mappings give computational states that are determined by the state of channel #1 when the other channels remain at zero wavefunction.<br />
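<br />
As a toy numerical illustration of why the sum-over-channels mappings fail this condition, consider the following sketch (the array model and names are my own; it is not a full quantum treatment, just N real-valued channels per time step):<br />
<br />
import numpy as np

# Toy model: T time steps, N channels, activity only in channel #1 (index 0).
T, N = 5, 8
rng = np.random.default_rng(0)
channels = np.zeros((T, N))
channels[:, 0] = rng.standard_normal(T)

# Two "sum over a subset of channels" mappings, like M1 and M2 above.
m1 = channels[:, [0, 2, 4, 7]].sum(axis=1)     # channels #1, 3, 5, 8
m2 = channels[:, [0, 1, 2, 3, 4]].sum(axis=1)  # channels #1, 2, 3, 4, 5

# The non-shared channels carry zero amplitude, so the mapped values are
# identical to those from channel #1 alone: a difference in computational
# state never requires a difference in the non-shared region, so the
# "Change in Private" condition is not met.
print(np.allclose(m1, channels[:, 0]), np.allclose(m2, channels[:, 0]))  # True True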
<br />
The reference to a nonshared region of "physical values" for a particular physical variable is something that wouldn't make sense for substate independence within a single mapping, since all substates must be mapped to simultaneously within a single mapping. For different implementations, there is no such restriction.<br />
<br />
What does it imply for quantum mechanics? That question will be studied in more detail in later posts, but note that scaling up wavefunction amplitudes does not appear to open up any possibilities for additional implementations under these criteria, for either of the above ICI proposals. As will be explored, the only way to produce the appearance of the Born Rule under such circumstances would be if a fixed 'noise level' in the wavefunction were somehow competing with the 'signal' and if increasing the amplitude allows more implementations indirectly because it allows a smaller volume of configuration space to support an implementation capable of beating the noise. This idea is interesting but has its own problems.<br />
<br />
<br />
<b>ICI proposal 3: Partial Independence</b> (Note: This is a work in progress)<br />
<br />
Knowing that the Born Rule is the goal, perhaps independence criteria could be found that satisfy the restrictions on independence suggested by the Problem of Size, while also opening the door to a more direct relationship between wavefunction amplitude and number of implementations. Increased complexity would argue against it, but if the problems of the competing-noise approach can be avoided that would be a point in its favor. <i>Can</i> such criteria be formulated, and if so, are they at all an <i>intuitively reasonable</i> possibility?<br />
<br />
---------------------------------------------------------------------<br />
<br />
<b>Counting Implementations: The Problem of Size</b><br />
<br />
In order to get predictions out of a computationalist model, several ingredients are required:<br />
1) Some way of determining which computations are implemented;<br />
2) some way of determining which ones give rise to consciousness; <br />
3) some prediction (which need not be exhaustive) about what such a consciousness would observe; <br />
4) and some way of determining the effective probability of different types of such observations.<br />
<br />
The implementation problem has already been addressed in previous posts on this blog. Problems 2) and 3) are hard in principle, but for present purposes, it should suffice to assume that the human brain performs the appropriate computations (whether analog, digital, or mixed) to give rise to human-like observations, and that systems that are not brain-like (or AI-like) give rise to no observations. Going beyond that assumption is not something that I will pursue, mainly because I have no way to do so.<br />
<br />
The effective probability question is quantitative, and the answer to it could have important implications. To start with, it would provide the link between computationalism and the Born Rule of quantum mechanics, and if the Born Rule were shown to be incompatible with the effective probabilities of computationalism, then the model would have to be discarded (either by modifying the assumed physics or by rejecting computationalism). In practice, it may be enough to find a formula such that the computationalist probabilities need not be incorrect, but it would be better to derive them on other grounds.<br />
<br />
A formula for determining effective probabilities - which is to say, relative amounts of consciousness - could have other implications. For example, certain types of brain structures might have more consciousness than others, and if we know that they do, an argument could be made that those brains ought to receive more privileges. This is especially relevant to the prospects of humans peacefully coexisting with potentially conscious AIs. This could be a dangerous thing to study, but a sufficiently intelligent AI could probably study the question on its own, so for us to study it before AIs even exist might be a good idea.<br />
<br />
The effective probability of an observation is the fraction of consciousness that makes that observation. The amount of consciousness will be assumed to be proportional to the "number of independent implementations" of each type of conscious computation. Criteria for "independence" in this context must be chosen for this purpose; at this point they may or may not be similar to the criteria for substate independence within a computation, although such similarity is certainly a hypothesis to explore.<br />
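<br />
As a trivial numerical illustration (the counts below are made up), the effective probabilities are then just the normalized implementation counts:<br />
<br />
# Hypothetical counts of independent implementations for two observation types.
counts = {"observes outcome A": 99, "observes outcome B": 1}
total = sum(counts.values())
effective_prob = {obs: n / total for obs, n in counts.items()}
print(effective_prob)  # {'observes outcome A': 0.99, 'observes outcome B': 0.01}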
<br />
In principle, the amount of consciousness might also depend on other characteristics of a computation, such as its complexity.<br />
<br />
In the case of a system with continuous variables and/or infinite extent, there might be an infinite variety of independent implementation mappings, and in such a case a "correct" regularization must be used to find a ratio of finite numbers and then a limit taken towards infinity.<br />
<br />
If there were distinct systems, each of which implements one computation, we would need only count the systems of each type - in other words, just do a head count for each type of observation.<br />
<br />
But in general, every system implements many computations. It is quite possible for one system to implement the same computation many times over - and indeed, that is what must happen if the MWI of QM is true. Even in classical mechanics though, it could certainly happen.<br />
<br />
Is there any sense in which the <b>size</b> of a computer affects the number of implementations which it performs? One might think so, because a larger system could provide more ways to make mappings. On the other hand, it is not generally believed that the size of a computer would affect the amount of consciousness it gives rise to, and if it did, that would have strange implications. Yet also, in MWI QM the "size" (in the sense of squared amplitude) of each branch of the wavefunction must indeed affect the effective probabilities if that model is correct.<br />
<br />
Consider a simple computer that operates using the collisions of balls to create logic gates. There may be several parallel channels through which the balls can pass.<br />
<br />
A variety of mappings could be made by including or not including each channel, with the one actually used always included. The number of such mappings would grow exponentially with the number of channels. Clearly such a strong dependence on size is problematic. It makes sense that these mappings would not be "independent" in the appropriate way, and so this exponential growth of measure can be ruled out. The exact criteria for such independence are still to be determined, but should reproduce that result.<br />
<br />
What about the size of the ball itself? A mapping can be made for each small part of the ball to an appropriate computational substate, and these can be combined to form an (exponential) multitude of overall implementation mappings, each of which will have the correct causal behavior given appropriate restrictions. But these all must "implement" the same computation at the same time; any difference would fragment the balls and ruin the causal behaviors. These mappings, too, should not be considered independent.<br />
<br />
Similarly, consider the light that reflects off of the ball. It will be correlated with the position of the ball, so it provides an alternative thing to make an implementation mapping with - at least for final computational states. But implementations based on these mappings will not be independent of each other or of those based on mappings from the ball itself.<br />
<br />
Next, I'll consider some possible <a href="http://onqm.blogspot.com/2017/09/independence-criteria-for.html">Independence Criteria for Implementations</a> (ICI) that meet these restrictions, and the implications of each of these flavors of ICI.<br />
<br />
---------------------------------------------------------------------<br />
<br />
<b>Counterfactual Computations or Generalized Causal Networks?</b><br />
<br />
When determining whether a physical system implements a particular computation, if only the actual sequence of states is considered, rather than the whole set of counterfactual states that the computation could have been in, then it is not possible to distinguish a valid implementation from a false "implementation" by a system that is much too simple to implement the computation of interest.<br />
<br />
For example, consider a set of clocks, c_1 through c_N. The initial state of the clocks can be mapped to any desired initial bitstring, and at each time step thereafter, the state of each clock can be mapped to what the state of that bit would be according to any desired computation involving bitstrings which change in time according to a given set of rules. There is no problem here with the first criterion for independence: Each bit depends on a different physical variable. Even the second criterion for basic independence could also be satisfied by using a new set of clocks or dials at each time step, instead of the same clocks.<br />
<br />
Requiring the correct counterfactual transitions rules out such false implementations, since a hypothetical change in the state of one clock at the initial time step would no longer result in states at the next time step that are correct according to the transition rules for the computation, since each clock is unaffected by the others. This was Chalmers' first move in his formulation of the CSA to rule out the false implementations discussed by Searle and Putnam, and in many discussions of that work it is the only thing considered worth mentioning.<br />
<br />
Counterfactual relationships are thus considered key to ruling out false implementations, but relying on them introduces a new problem for computationalists: It seems implausible that parts of a system which are never called upon to actually do anything would have any effect on consciousness, yet such parts can determine the counterfactual relationships between system components.<br />
<br />
For example, consider a NAND (Not-AND) gate. Sufficiently many NAND gates can be connected to each other in such a way as to perform any desired bitwise computation (neglecting for now any further structure in the computation), and historically ordinary digital computers were sometimes constructed solely out of NAND gates as they tended to be cheap.<br />
<br />
The NAND gate takes input bits A,B and outputs a bit C. If A=0 or B=0, then C=1, and otherwise (meaning A=1 and B=1) C=0.<br />
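<br />
In code, the gate itself is trivial (a sketch for reference):<br />
<br />
def nand(a, b):
    """NAND gate: C = 0 only when A = 1 and B = 1."""
    return 0 if (a == 1 and b == 1) else 1

for a in (0, 1):
    for b in (0, 1):
        print(a, b, nand(a, b))  # truth table: (0,0)->1, (0,1)->1, (1,0)->1, (1,1)->0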
<br />
Such a NAND gate can be implemented as follows: Daphne, Fred, Shaggy, and Velma are all recruited to help. Each is assigned a two bit code: 00 for Daphne, 01 for Fred, 10 for Shaggy, and 11 for Velma. The values of A and B are announced in order while the recruits get a snack. Meanwhile, a coin is placed face down on a table signifying C=0.<br />
<br />
During the meal, a monster shows up and chases the recruits, but they manage to escape. Shaggy is the last to get away, though he is separated from the others. He smokes weed to calm himself down.<br />
<br />
After the meal, the person whose code matches the numbers read has the job of making sure the coin is correctly positioned for the next time step. If C is supposed to equal 0 then it should be face down, and if C is supposed to be 1 it should be face up.<br />
<br />
The actual numbers were 1 and 1, so Velma returns to do her job and verifies that the coin is face down. Seeing that the coin is already face down she leaves it alone, just as she does with the other coins on other tables in the room. Was this a valid implementation of the NAND gate? That depends on what would have happened in the counterfactual situations of other values for A and B. For example if Shaggy would have forgotten to do his job, it would not have been a valid implementation. Yet (though very slowly), we could use such NAND gates to run any desired artificial intelligence program.<br />
<br />
In pseudo-code, this NAND gate works as follows, where for example C(0) means the coin state at time 0:<br />
<br />
LET C(0) = 0<br />
SELECT CASE A,B<br />
CASE 0,0: Daphne sets C(1)=1<br />
CASE 0,1: Fred sets C(1)=1<br />
CASE 1,0: Shaggy sets C(1)=1<br />
CASE 1,1: Velma verifies C(0)=0 (which implies C(1)=0) or sets C(1)=0 otherwise<br />
END SELECT<br />
<br />
Since Velma doesn't change anything, her role can be eliminated while preserving the proper transition rule, and it would still be a NAND gate. But that is also troubling: In the actual situation, no coin was directly influenced by things resulting from the values of A,B. It is merely that the other recruits refrain from influencing the coin in question, just as they do the other coins. It may be that the other coins are being used by other groups of recruits for their own NAND gates. Some of the other coins may be initialized to face up instead, and by coincidence, perhaps none of the coins will ever need to be flipped.<br />
<br />
Even more troubling is the following: Suppose that Shaggy would have remembered his job, but in order to get back to the coin room in time, he would have needed to climb a rope hidden in a dusty room. If the rope were strong enough, he would have made it, but actually the rope is too weak. No one realizes that, and he never actually entered the dusty room or even saw the rope. Yet the state of that rope determines whether or not this was a valid NAND gate implementation, and by extension, such things could apparently determine whether our AI is conscious or not.<br />
<br />
It would seem that the rope should not matter in the actual situation, but as we have seen such counterfactual situations are key to ruling out false implementations and can't be ignored. Note that the mere fact that the implementation is baroque isn't the problem; our own cells do operate like miniature Rube Goldberg machines. The problem is that the rope never actually played a role in what happened, yet its strength determined the success of the implementation. Similarly, perhaps Daphne would have found a dusty old Turing machine and run a program on it, and would only have done her job after the program halted. In that case, the halting problem for a program that was never run determines the validity of the NAND gate implementation!<br />
<br />
Maudlin (<a href="http://web.csulb.edu/~cwallis/labs/stanford/Computation&consc.pdf">Computation and consciousness</a>) gave a similar example as an attack on computationalism: He described a water-based Turing Machine which follows a predetermined sequence for one set of initial conditions, but calls upon different machinery if the state is different from that which is expected. He points out that the link to the other machinery may be broken, and if so, the computation is not implemented. However, it seems implausible that the other machinery could matter in the case in which it's not used.<br />
<br />
In a similar vein, Muhlestein (<a href="http://muhlestein.com/consciousness/ccc.html">Counterfactuals, Computation, and Consciousness</a>) discussed a light-based cellular automaton which can have its counterfactual sensitivity over-ridden by a projection of the correct pattern and sequence of lights onto its elements, and concluded that computationalism must be false since it's implausible that the projection makes a difference as the pattern of which elements are lit remains the same as does the operation of each component device.<br />
<br />
Such things do seem implausible, but I must also note that there is no logical contradiction in it, and a case could be made that seemingly inactive components are doing more than one might think: they propagate forward in time, have well defined transition rules, and refrain from changing the value of the states of interest in cases where they should not. It is possible to retain the role of counterfactual situations as I have described for determining what computations are implemented, and that is the standard approach among computationalists.<br />
<br />
Nevertheless, the above implications of computationalism are bizarre and perhaps too absurd to accept, and if an approach can be formulated that avoids them, it would be more plausible. Computationalist philosophy of mind is by no means firmly established enough to dismiss such concerns; on the contrary, it is hard to see how anything can give rise to consciousness, whether computational or otherwise.<br />
<br />
In the above example, there are two problems:<br />
<br />
1) With the actual values of A=1,B=1, the computation could be implemented in a way that didn't seem to require that these variables are the _cause_ of the output being what it was (since Velma could be removed without any change in the output), and so the system seems to have the wrong causal structure.<br />
<br />
2) For other values of A,B, it seems like the exact complicated events that _would have_ transpired in those cases should be irrelevant to whatever consciousness the system would give rise to.<br />
<br />
To address these problems, I want to first require that in the actual situation A=1,B=1 are the _cause_ of C(1)=0 in a sense to be defined. If that is so, then a network of such causal relationships among variables may be enough for consciousness, without requiring the correct counterfactual behavior of the whole system for other values of A,B.<br />
<br />
But that seems problematic, because causation is defined in terms of counterfactual behaviors! A cause is typically defined as a "but-for" cause: The value of A causes the value of C(1) if there is a change in the value of A that _would have_ resulted in a change in the value of C(1). Therefore, in order to establish causation, we still need to know what would have occurred in the counterfactual situations.<br />
<br />
Intuitively, it seems like there should be a way to establish causation without knowing what would have occurred in the counterfactual cases. But what if the output's value is totally unrelated to A,B? That possibility needs to be ruled out, or else the system could be rather trivial and certainly not the sort of system that should give rise to consciousness.<br />
<br />
Consider each CASE of A,B values as a different channel of potential influence on C(1). The counterfactual channels need to be selectively blocked if we are to establish that the actual channel was a cause of the value of the output, without having to look at the full counterfactual behavior of the system. If they are blocked, and changing A,B results in no change in the output, then there is no causation between A,B and the output; if the output would have changed, then there is such causation.<br />
<br />
The channels could be selectively blocked if the mapping were augmented in such a way that the pseudo-code could be described as follows:<br />
<br />
LET C(0) = 0<br />
SELECT CASE A,B<br />
CASE 0,0: IF D THEN Daphne sets C(1)=1<br />
CASE 0,1: IF F THEN Fred sets C(1)=1<br />
CASE 1,0: IF S THEN Shaggy sets C(1)=1<br />
CASE 1,1: IF V THEN Velma verifies C(0)=0 (which implies C(1)=0) or sets C(1)=0 otherwise<br />
END SELECT<br />
<br />
If D,F,S were (counterfactually) set to FALSE, then the counterfactual channels would be blocked.<br />
<br />
Now if V is TRUE, is there causation between A,B and C(1)? Since the coin would have been face down anyway, it would appear not.<br />
<br />
But there is a difference between this coin and all of the other coins: If the coin had been face up, it would be changed by Velma to face down. This difference can be exploited by allowing consideration of the counterfactual case in which C(0)=1. The system implements something along the lines of: IF [(V=TRUE AND A=1 AND B=1) OR C(0)=0] THEN C(1)=0. In situations of this type, I will say that A,B are "generalized causes" of C(1)=0.<br />
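<br />
A minimal Python sketch of this augmented mapping may help (the function and flag names are my own; the flags d, f, s, v play the roles of D, F, S, V above):<br />
<br />
def coin_next(a, b, c0, d=True, f=True, s=True, v=True):
    """Return C(1) given inputs A, B, the initial coin state C(0),
    and flags for whether each recruit would actually do their job."""
    if a == 0 and b == 0 and d:
        return 1
    if a == 0 and b == 1 and f:
        return 1
    if a == 1 and b == 0 and s:
        return 1
    if a == 1 and b == 1 and v:
        return 0  # Velma verifies or sets the coin face down
    return c0     # nobody touches the coin, so it keeps its initial state

# With all recruits reliable, this is a NAND gate on A, B (for C(0) = 0):
assert all(coin_next(a, b, 0) == (0 if (a, b) == (1, 1) else 1)
           for a in (0, 1) for b in (0, 1))

# Block the counterfactual channels (D = F = S = FALSE).  With C(0) = 0,
# changing A, B no longer changes the output:
print([coin_next(a, b, 0, d=False, f=False, s=False) for a in (0, 1) for b in (0, 1)])
# -> [0, 0, 0, 0]

# But allowing the counterfactual C(0) = 1 reveals the remaining dependence
# on A = 1, B = 1 -- the "generalized causal" link through Velma's channel:
print([coin_next(a, b, 1, d=False, f=False, s=False) for a in (0, 1) for b in (0, 1)])
# -> [1, 1, 1, 0]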
<br />
Note that if V is FALSE in the actual situation, then even if D,F,S are TRUE so that the NAND gate is implemented, the generalized causal link between A,B and C(1)=0 is missing, so this system would NOT play the same role as part of a Generalized Causal Network (GCN).<br />
<br />
Also, if C(0) not equaling 0 is not possible, then the generalized causal link between A,B and C(1)=0 is likewise missing.<br />
<br />
What if the underlying physical system is too simple to allow this kind of augmented mapping with D,F,S variables? If so, then I don't think the problem arises in the first place: A,B are causes of C=0 if the NAND gate is implemented by a system which is too simple to involve complicated chains of counterfactual events.<br />
<br />
Another interesting example of a GCN is a "computer with a straight-jacket". This works as follows: With the actual initial conditions, the computer is allowed to run normally. However, it is being watched. If the state of the computer at any given time is different from the expected sequence, it will be altered to match the state of the expected sequence; otherwise the watcher won't touch it. Could this system implement a conscious AI, if the computer could do so had it not been watched? Since the computer is not actually touched, it would seem that it could, but it does not have the right counterfactual behavior for the computation due to the effect of the potentially interfering watcher. It does however have the right GC network, because removing or blocking the watcher is analogous to setting D,F,S to FALSE in the above example. It should be noted though that there are other ways to deal with the "computer with a straight-jacket", such as noting that at each time step it implements a computation 'closely related' to the original one which can be enough for the conscious AI; the watcher is treated as input in this case.<br />
<br />
In the case of Muhlestein's cellular automaton, the projection of a spot of light onto a cell (call it P=TRUE) is analogous to the command LET C(1)=0 being placed after the END SELECT. This ruins the NAND gate as well as the causal link from A=1,B=1 to C(1)=0. However it is much like a combination of a "straight-jacket" and the initialization C(0)=0.<br />
<br />
The underlying system appears to implement something analogous to: IF (A=1 AND B=1) OR P=TRUE then C(1)=0. In this generalized mapping, there is sensitivity to A,B, so I'd say they count as generalized causes in this case; in other words, the cellular automaton with the projected lights would still be conscious if the original one would have been.<br />
<br />
Klein gave an account of <a href="http://www.colinklein.org/papers/SyntheseFD.pdf">Dispositional Implementation</a>, based on 'episodic' implementation of computations, to solve the problem of the implausibility of relying on inert machinery, which he dubbed the Superfluous Structure Problem (SSP). My solution in terms of GCNs is essentially the same as his solution in terms of dispositions, except that it is better-defined. Bartlett <a href="http://philpapers.org/archive/BARCTO-2.pdf">criticized</a> Klein's solution on the basis that it conflicts with the 'activity thesis' (which Bartlett found plausible) that only physical 'activity' matters; as a result Bartlett thought that Klein's solution was really just computationalism. Klein's idea does conflict with the 'activity thesis' since it also brings in dispositions which ultimately rely on physical laws instead of just physical structure. The 'activity thesis' <i>ought to</i> be discarded, and to me it was never plausible in the least. I read Klein as one who actually rejects the standard 'activity thesis' for the right reasons, yet uses variant language in which he relies on his own modified 'activity thesis'. Perhaps if Klein had instead written in terms of GCNs, Bartlett would have better understood the idea as being distinct from both the 'activity thesis' and standard computationalism.<br />
<br />
Is it a form of computationalism? It is not a form of standard computationalism because a system in which standard NAND gates are all replaced by "gates" which produce the same output on all inputs could still implement the same GCN as the original system. In the above example, that would involve Daphne, Fred, and Shaggy all deciding in advance that they would place the coin face down instead of face up like they were supposed to, while in the actual situation they are never called upon to attend to the coin since Velma's numbers were called instead.<br />
<br />
However, if we consider the entire <i>spectrum of computations</i> implemented by the whole system - in other words, we consider not just the original mapping but also the generalized one - then we have enough information to know what GCNs are implemented or not implemented. GCNs are simply a way of characterizing the structure and function of a dynamical system, just like computations are. In that sense, I would say that it is a generalized computationalism, which evades the SSP while being philosophically the same as computationalism in all other ways that matter. I will not make a distinction between 'generalized computationalism' and computationalism unless the technical difference is relevant in a particular case.<br />
<br />
So far I have discussed discrete computations here. What about analog continuous computations, such as those implemented by systems described by coupled differential equations such as those of fluid mechanics? In that case the generalization is as follows: Instead of looking at differential equations that tell us what the system would do with any initial condition, we need only concern ourselves with the differential equations that hold for the actual situation; if those are the same for two systems, and the initial conditions are the same, then they implement the same generalized computation even if they would have behaved differently from each other given a different set of initial conditions.<br />
<br />
---------------------------------------------------------------------<br />
<br />
<b>What to optimize when guessing right for a higher percentage of people (on average) means guessing right for fewer people (on average)</b><br />
<br />
Previous: <a href="http://onqm.blogspot.com/2012/05/why-do-anthropic-arguments-work.html">Why do Anthropic arguments work?</a><br />
<br />
<b>Anthropic Probability: The 'Imperfect Test for X' (ITX) Thought Experiment</b><br />
<br />
Suppose that an experiment is not 100% reliable (as is the case for all real experiments). e.g. If X (which is some proposal about physics) is true, then the experiment probably shows result #1, but 'only' in 99% of such experiments; there is a 1% chance it will show result #2. If X is false (which can be written as ~X or "Not X"), then these percentages are reversed. Suppose that the experiment is expensive and time-consuming to perform, as is often the case, so on each planet people will only do it once. For this thought experiment, assume that these planets in question are the only place where any kind of people live in all of existence; in the "Variant thought experiments" section below I'll discuss the effect of changing that assumption. Let's use anthropic probability (based on observer-counting) to determine what the people should conclude about how likely X is to be true.<br />
<br />
<br />
<b>The Fixed Population Case:</b><br />
<br />
Suppose also that the total number of people in the universe (as well as their likelihood to learn the result of such an experiment) does not depend on X.<br />
<br />
Now, if before performing the experiment I have a 50% subjective credence that X is true, then by Bayes' Theorem, if the experiment shows result #1, I should then be 99% confident that X is true.<br />
<br />
Note on notation: P(X|1) is "the conditional probability of X given 1", which is a standard notation, but I mention it since not all readers are necessarily familiar with it. <br />
<br />
Explicit Bayes theorem:<br />
<br />
P(X is true | I see result #1) = <br />
<br />
P(I see result #1 | X is true) P(X) / [P(I see result #1 | X is true) P(X) + P(I see result #1 | X is false) P(~X)]<br />
<br />
The same equation in more compact notation:<br />
<br />
P(X|1) = P(1|X) P(X) / [P(1|X) P(X) + P(1|~X) P(~X)]<br />
<br />
P(X) = P(~X) = 0.5<br />
<br />
For P(I see result #1 | X is true) = P(1|X), I will assume for now that it makes sense to use 0.99; and for P(1|~X), 0.01. <br />
<br />
Then P(X|1) = (0.99)(0.5) / [(0.99)(0.5) + (0.01)(0.5)] = 0.99.<br />
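<br />
The same calculation as a few lines of Python (just restating the arithmetic above):<br />
<br />
p_x = 0.5             # prior P(X)
p1_given_x = 0.99     # P(result #1 | X true)
p1_given_notx = 0.01  # P(result #1 | X false)

p_x_given_1 = p1_given_x * p_x / (p1_given_x * p_x + p1_given_notx * (1 - p_x))
print(p_x_given_1)    # 0.99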
<br />
If the universe is large enough, there will be at least a trillion people who learn the result of this kind of experiment. If X is true, 99% of those people see result #1, and 1% see result #2. If X is false, 1% see #1, and 99% see #2. No matter what the real truth value of X is, some people will see result #1 and some will see result #2, but if we follow the above procedure <b>most people (99%) will guess correctly</b> whether X is true. This is what we want to happen if we want more people (see below) to be right about things instead of being wrong.<br />
<br />
I am then 99% subjectively likely to be in the group that got the right answer, since 99% of people who actually exist did so (regardless of the truth value of X), and I know I am one of the people who exist in the real world with its fixed truth value for X.<br />
<br />
<br />
<b>How likely was I to observe that really?</b><br />
<br />
It might at first seem <b>obvious</b> that P(1|X) must indeed be 0.99 practically by definition. But it is far from obvious, because in order for me to see any result, I must first exist. P(1|X) can be written more explicitly as <br />
P(I exist and see result #1 | X is true).<br />
<br />
Since the experiment is 99% reliable, this equals (0.99) P(I exist | X is true) = (0.99) L(X).<br />
<br />
How can we evaluate the subjective probability L(X) that I would exist <b>if X were to be true</b>? One attempt is to note that I do exist, so whether X is true or false, I could always take the subjective probability that I exist to be 1. But I don't know whether X is true. Obviously, if the physics of X were such that no observers could exist were X true, and I know that, then I can conclude that X is false.<br />
<br />
If <b>both</b> X and not-X would lead to observers, then what is the likelihood that one of those observers would be me in each case?<br />
<br />
For the case in which both situations result in the <b>same number</b> of observers, we can assume that this likelihood is the same for both X and not-X. Then it cancels out in Bayes' theorem, and we can ignore the issue. That is equivalent to what I did above when I used 0.99 for P(1|X) and 0.01 for P(1|~X). To be more precise I could have multiplied them both by the same number L, and L would appear in both numerator and denominator, and would cancel out.<br />
<br />
<br />
<b>The Variable (X-Dependent) Population Case:</b><br />
<br />
That's simple enough, but now consider a case in which the total number of observers in the universe DOES depend on the truth of X.<br />
<br />
(This is equivalent to a particular variant of the <b>Sleeping Beauty Problem (SBP)</b> - a famous thought experiment - but I won't use that one here as it must be properly formulated to be more relevant to uncertainty about cosmological issues. Most statements of it are not so formulated. My example is a more natural thought experiment for such issues.)<br />
<br />
Suppose that we know that if X is true, there are 100 N observers in the world, but if X is false there are 9900 N observers. Each planet has N observers. Now:<br />
<br />
If X is true, then 99 N observers see result #1, and N observers see result #2.<br />
If X is false, then 99 N observers see result #1, and 9801 N observers see result #2.<br />
<br />
What should an observer think if he sees result #1? There are two commonly argued views. A case could be made for either, depending on what we mean by wanting "more people" to be right:<br />
<br />
<b>The Self-Sampling Assumption (SSA):</b> (aka 'halfer' in the SBP)<br />
If we want to <b>optimize the average percentage of observers who guess (or bet) correctly</b>, then as before, we should want that observer to think that X is true.<br />
<br />
This can be done by assuming that L(X) = P(I exist | X is true) is independent of the number of observers who would exist if X is true, and so on, as long as at least one observer would exist in each case.<br />
<br />
<br />
<b>The Self-Indication Assumption (SIA):</b> (aka 'thirder' in the SBP)<br />
However, if we want to <b>optimize the average NUMBER of observers who guess (or bet) correctly</b>, then we will have no preference here; in either case, 99 N observers will guess correctly.<br />
<br />
This can be done by making the updated probability proportional to the number of observers rather than to the % of observers. e.g. With the SIA, we assume that L(X) = P(I exist | X is true) is proportional to the number of observers who would exist if X is true, and so on. Unlike the SSA, this has a nice smooth limit in the case where X would lead to zero observers, since then L(X) = 0.<br />
<br />
<br />
With the SIA, as the number of observers who see result #2 increases, the fraction of observers who see result #1 decreases, and the SIA probability is proportional to the former but also to the latter, i.e.<br />
<br />
P(I see #1 | X) is proportional to [(total number of observers if X is true) x (fraction of observers who see #1 if X is true)] = [number of observers who see #1 if X is true]<br />
<br />
and <br />
<br />
P(I see #1 | not X) is proportional to [(total number of observers if X is false) x (fraction of observers who see #1 if X is false)] = [number of observers who see #1 if X is false]<br />
<br />
To put it (SIA) another way, suppose that the same 99 N observers who see result #1 would have existed no matter whether X is true or false. If X is 50% likely to be true, these observers would ideally assign a 50% chance to the hypothesis that X is true. There will also exist additional observers, all of whom would see result #2; no given observer will be less likely to see result #1 as a result of X being false. Thus, those observers who see result #1 have no evidence that X is false.<br />
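<br />
Here is a short sketch of the two updating rules for this example, for an observer who sees result #1 (the weighting schemes are as described above; the variable names are mine):<br />
<br />
prior_x = 0.5
n_if_x, n_if_notx = 100, 9900      # total observers in each case (in units of N)
see1_if_x, see1_if_notx = 99, 99   # observers who see result #1 (in units of N)

# SSA: weight by the *fraction* of observers who see result #1.
ssa_like_x = see1_if_x / n_if_x            # 0.99
ssa_like_notx = see1_if_notx / n_if_notx   # 0.01
ssa_posterior = ssa_like_x * prior_x / (ssa_like_x * prior_x + ssa_like_notx * (1 - prior_x))

# SIA: weight by the *number* of observers who see result #1.
sia_posterior = see1_if_x * prior_x / (see1_if_x * prior_x + see1_if_notx * (1 - prior_x))

print(ssa_posterior, sia_posterior)  # 0.99 (SSA) versus 0.5 (SIA)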
<br />
<br />
To see why it could matter, introduce consequences to believing correctly about X. Suppose that each planet has the option to try a new experiment which could either unlock a much-needed unlimited source of energy or blow up catastrophically (making everyone on the planet horribly sick). The only thing that determines which outcome happens is that a certain parameter must be set to a value that depends on whether X is true. For a planet that measured result #1 in the first experiment, what truth value should they assume for X when designing the second experiment?<br />
<br />
If the designers go with SSA, they will believe that X is true with 99% confidence.<br />
<br />
If they go with SIA, however, they will have only 50% confidence in X. Perhaps in this case they will not take the risk. Of course if they see result #2, they will be confident that X is false and in that case would take the risk.<br />
<br />
And that is the correct policy: Because if result #1 is seen, there will be the same number of planets which believe that X is true whether it is true or not.<br />
<br />
It would seem that SIA leads to better decision-making than SSA in such cases. That is a strong argument in favor of SIA.<br />
<br />
<br />
<b>Define Non-Anthropic Probability (NAP):</b> The subjective probability that a model of reality is true prior to taking into account the numbers or fractions of any kinds of observers predicted by each model of reality.<br />
<br />
There is a catch with SIA. Suppose now that the NAP that X is true is 90%, not 50%.<br />
<br />
Suppose also that the safe experiment to try to find out whether X is true or not can't even be done prior to doing the dangerous experiment which could unlock energy supplies.<br />
<br />
Using SSA, with no experimental data, we are simply left with the NAP (90% that X is true) as our subjective probability to work with.<br />
<br />
Using SIA, we weight each case by the number of observers. So we will end up with an 8.3% subjective probability that X is true: (90% * 100 N) / [(90% * 100 N)+(10% * 9900 N)] = 0.083<br />
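<br />
Explicitly, restating that arithmetic:<br />
<br />
nap_x = 0.9                    # non-anthropic probability that X is true
n_if_x, n_if_notx = 100, 9900  # total observers in each case (in units of N)

sia_prob_x = nap_x * n_if_x / (nap_x * n_if_x + (1 - nap_x) * n_if_notx)
print(round(sia_prob_x, 3))    # 0.083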
<br />
In this case, we will believe that X is probably false. A wise thing to believe perhaps, since the consequences of choosing the setting on the dangerous experiment will affect far more people if X is actually false. But it is disturbing that there is a high NAP that we are mistaken and if so are condemning the actual people who do exist to a miserable life.<br />
<br />
<br />
<b>Pascal's Mugging:</b><br />
<br />
A more extreme example is known as a "Pascal's mugging". A mugger claims that he has the power to create and eternally torture M copies of you, and will do so (while first giving them your current experiences, so that for all you know you might be one of those copies) unless you hand him your wallet (or, in other words, unless you accept some outcome with a large negative utility). You assign a subjective non-anthropic probability P that he has the god-like powers he claims to have, which will be very small but finite. If (as is commonly assumed in decision theory) you try to maximize the average utility, then for any fixed P, no matter how small, there is some large number which, if given as M, will convince you to hand over your wallet if you believe SIA. Once again, we are "probably" (according to the NAP) disregarding the interests of the actual observer (you) in favor of the many, many unlikely but possible observers whom we are trying to save from eternal torture.<br />
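<br />
A crude sketch of the magnitude comparison involved (the probability and the utility scale below are made up purely for illustration, and the decision rule is just naive expected-utility maximization):<br />
<br />
p = 1e-12            # non-anthropic probability that the mugger's threat is real
wallet_cost = 100.0  # disutility of handing over the wallet
torture_cost = 1.0   # disutility of eternal torture, per copy

def hand_over_wallet(m_copies):
    """True if the naive expected-utility comparison says to pay the mugger."""
    return p * m_copies * torture_cost > wallet_cost

print(hand_over_wallet(1e10), hand_over_wallet(1e20))  # False True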
<br />
<br />
<b>Dual-Objective Optimization (DOO)</b>*<br />
<br />
IMO, both SSA and SIA can lead to bad policies. SIA can benefit more people on average in simple cases like the "is X true" experiment but leaves all people vulnerable to being asked to sacrifice their own interests in favor of numerous-but-unlikely hypothetical people.<br />
<br />
The best thing to do IMO depends on one's individual preference for how to trade off probable benefit with mean benefit, where mean here includes huge but highly unlikely benefits to large numbers of possible people weighted with the small probability of that happening. This is much like selecting a utility function, except that in standard decision theory, only mean benefit is maximized. It is better if a high % of people will guess correctly (SSA), and also if as many people as possible guess correctly (SIA).<br />
<br />
One way to deal with a Pascal's Mugging is to ignore the threat if the NAP that the mugger can do as he threatens to is less than some predetermined value.<br />
<br />
<br />
<b>A Special Observer:</b><br />
<br />
Consider the perspective of a special observer S, such that S would exist whether X is true or false. S can send a message to the other observers, but it is a one-way communication; he can never receive a reply. S does not know whether X is true or false and can't perform any experiment to try to find out.<br />
<br />
Suppose that the NAP for X to be true is 90%. If all observers must perform a test to find out whether they are S or not, then prior to performing any such test or experiments, an observer who believes SSA assigns a 90% chance for X to be true, which is simply equal to his NAP. If he then discovers that he is S, he will further favor the belief that fewer observers exist (X is true), since a higher percentage of observers finds himself to be S in that scenario.<br />
<br />
An observer who believes SIA at first assigns an 8.3% chance for X to be true, which is proportional to his NAP and to the total number of observers who would exist if X is true as compared to if X is false. If he then discovers that he is S, he will adjust his probabilities towards the scenario with fewer observers, leaving him with a 90% subjective probability for X to be true, which is (and this is true in general for such cases) equal to his NAP.<br />
<br />
Knowing that a safe experiment to try to determine if X is true can't be done, S decides to recommend a decision-making policy to the other observers, which will be either SSA or SIA. If he recommends SSA (and they follow his advice) then they will tend to assume that X is true. If he recommends SIA, then they will tend to assume that X is false.<br />
<br />
So even if he believes SIA, he figures that telling the others to follow SIA is 90% likely to lead to the wrong choice. But, if X is indeed false (10% chance), then many more people will have made the right choice. What should he recommend?<br />
<br />
If we are in the beginning stages of a long-lived human civilization, then we are in a similar position to that of S with respect to future people. Would a gamble be worthwhile which pays off if there are many future people, but which makes things worse in a higher-NAP scenario with fewer people? If so, can a high enough number of possible future people always outweigh a very small but fixed NAP for them to exist despite the probable cost to the fewer people who are more likely to exist?<br />
<br />
<br />
<b>The Adam Paradox</b> (credit is due to Nick Bostrom):<br />
<br />
Suppose that the NAP for X is 50%, but S has the power to make X true or false (e.g. by reproducing). If he believes SSA, then he believes that fewer people are likely to exist, so he thinks he will probably not reproduce. He decides to hunt deer as follows: unless a deer drops dead in front of him, he will reproduce and have a huge number of descendants; if the hunt works, he will have none. Since he thinks it unlikely that many people will exist, he expects the deer to drop dead, and he is surprised if it doesn't. On the other hand, if he believed SIA, then he would have had no such delusion; he would just think there is a 50% chance that he will reproduce. Of course, the correct belief is probably the latter.<br />
<br />
<b>The Iterated Reproducers Counter to the Adam Paradox:</b><br />
<br />
The Adam Paradox seems to be a very strong argument against the SSA, but remember that the SSA is chosen to maximize the average % of people who guess correctly. So now consider a series of people. First comes Adam, who guessed wrong and reproduced, producing a tribe of N people. The tribe goes through similar reasoning, and this results in a larger tribe of N^2 people. The same thing happens again, resulting in a nation of N^3 people, and so on. Eventually, after some finite number of iterations M, the people alive at that point will guess right about their event in question: they will not reproduce. So the N^M people then alive will be correct in their SSA-based guess, and they will be the majority of the people ever to have lived. In this series, if they had all used the SIA instead, the majority of people ever to have lived would have guessed wrong, for any number M. (This is a version of the <b>Doomsday Argument</b> and is essentially the same as the "Shooting Room Paradox" of John Leslie.)<br />
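<br />
Some toy bookkeeping makes the point explicit (the particular values of N and M are arbitrary):<br />
<br />
N, M = 10, 6<br />
sizes = [N**k for k in range(M + 1)]    # generation sizes: 1, N, N^2, ..., N^M<br />
final_fraction = sizes[-1] / sum(sizes)<br />
print(round(final_fraction, 3))   # 0.9: the non-reproducing final generation, whose SSA-style guess was right,<br />
                                  # makes up roughly a fraction 1 - 1/N of everyone who ever lived<br />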
<br />
<br />
<b>Variant thought experiments that give SIA-like probabilities even within the SSA:</b><br />
<br />
- Independent Trials:<br />
<br />
Suppose there are many universes which really exist, and the truth of X within each depends on the local environment; say X is true in 50% of them. For example X might be the statement that the electron to proton mass ratio is less than some value, and the ratio might vary from universe to universe. (As usual, the word universe here does not necessarily refer to the totality of existence; the word 'multiverse' would be used for that. A universe is some part of existence, which may or may not be the whole thing, but beyond which the observers in question can't make any observations.)<br />
<br />
In each universe in which X is true, there are 100 N observers, and in each universe in which X is false there are 9900 N observers.<br />
<br />
In this case, the observers in X-false universes outnumber those in X-true universes by 99:1. There is no doubt that each observer should assign a 99% probability that X is false in his universe. The SIA gives that result for a single universe or any number of universes, while the SSA result for a single universe is 50% but approaches the SIA value as the number of universes increases.<br />
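<br />
Here is a minimal sketch of that SSA calculation for K universes, treating each universe as independently having a 50% chance that X is true in it, and using the observer counts above (the function is just my bookkeeping for this example):<br />
<br />
from math import comb<br />
<br />
def ssa_prob_X_false(K, n_true=100, n_false=9900, p_true=0.5):<br />
    # Average, over the NAP-weighted worlds (k of the K universes being X-true), of the<br />
    # fraction of observers in that world who live in X-false universes.<br />
    total = 0.0<br />
    for k in range(K + 1):<br />
        world_prob = comb(K, k) * p_true**k * (1 - p_true)**(K - k)<br />
        obs_true, obs_false = k * n_true, (K - k) * n_false<br />
        total += world_prob * obs_false / (obs_true + obs_false)<br />
    return total<br />
<br />
for K in (1, 2, 5, 50):<br />
    print(K, round(ssa_prob_X_false(K), 3))<br />
# 0.5, 0.745, 0.956, ... approaching the SIA answer of 0.99 as K grows<br />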
<br />
In practice, for any non-cosmological statement used in place of X, the SIA very likely gives the correct result even according to the SSA, because such situations no doubt arise many times throughout the universe. For example, if we tried to perform a real Sleeping Beauty experiment, it is very likely that other people throughout all of time and space in the multiverse would do so as well. These act as independent trials, and the SIA-like observer counting therefore applies, giving the 1/3 result in the SB experiment.<br />
<br />
- A Quantum Trial:<br />
<br />
Likewise, if MWI is true and a quantum measurement is used in place of X, and the Born Rule probability is used in place of the NAP that X is true, the SIA-like observer counting automatically applies when guessing which branch an observer is on, since the likelihood of that is proportional to the number of observers on that branch. This case is similar to that of many independent trials, but instead of relying on a large number of trials to produce an average % of people in each type of universe, the actual % of people in each type of universe is pre-determined and precisely known even for a single quantum coin flip.<br />
<br />
- Observers Within a Larger Fixed Context:<br />
<br />
If there are M other observers known to exist besides the Y observers whose existence depends on X, then the SSA result reduces to the SIA result if the other observers greatly outnumber the X-dependent observers (M >> Y).<br />
<br />
That's because, in the SSA, the prior likelihood that a given observer is X-dependent is proportional to the % of observers who are X-dependent, or in other words, proportional to Y / (M+Y). Since M >> Y, that is approximately Y/M, which is proportional to Y - just as in the SIA case.<br />
<br />
<br />
Such variant thought experiments are sometimes used as arguments in favor of the SIA, because they resemble any experiment we could actually perform far more closely than the original thought experiment does. However, the whole point of trying to figure out the right way to do anthropic probability is to apply it to questions of cosmological uncertainty, such as whether one given theory of everything is more likely than another, or to questions which similarly affect the overall number of observers in the multiverse - such as whether artificial digital computers could be conscious (given that they have the potential to greatly outnumber biological observers), or the Doomsday Argument question of how long-lived a typical line of sentient beings is likely to be.<br />
<br />
These variant thought experiments don't help to answer those questions, but they do help point out the limitations of the SSA as a practical tool. Any advocate of the SSA must keep track of the nuances that make it give the same answer as the SIA in almost all practical situations (such as 1/3 in a realistic Sleeping Beauty experiment), while an SIA-believer has the luxury of just always giving the same answer to almost any question about anthropic probabilities.<br />
<br />
<br />
<b>The Reference Class Problem:</b><br />
<br />
As noted above, the probabilities for X predicted by the SSA depend not only on the observers who perform a particular kind of experiment or whose existence is correlated with X, but on the total number of all observers. That makes it important to know what counts as an observer.<br />
<br />
But, with a more careful formulation, we can say that they depend on the number of observers within the Reference Class, allowing that not all observers are really relevant to the probabilities we are studying.<br />
<br />
For example, suppose there are 10 trillion mice, and 10 billion men if X is true; while there are 10 trillion mice, and 90 billion men if X is false. The NAP for X is 50%. Using the SSA, does finding oneself to be a man instead of a mouse provide evidence for X being false? (With the SIA, it would.) If mice don't count as observers, then the SSA likelihood in this case remains equal to the NAP, at 50%. But if they do, then the SSA likelihood approaches the SIA likelihood, which is 90% for X to be false.<br />
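<br />
A quick check of those numbers (a minimal sketch; the function is just bookkeeping for this example):<br />
<br />
def ssa_prob_X_false_given_man(mice, men_if_true, men_if_false, include_mice, nap_false=0.5):<br />
    ref_true = men_if_true + (mice if include_mice else 0)<br />
    ref_false = men_if_false + (mice if include_mice else 0)<br />
    w_true = (1 - nap_false) * men_if_true / ref_true     # NAP weight times the fraction of the<br />
    w_false = nap_false * men_if_false / ref_false        # reference class who are men<br />
    return w_false / (w_true + w_false)<br />
<br />
print(ssa_prob_X_false_given_man(10e12, 10e9, 90e9, include_mice=False))  # 0.5: stays at the NAP<br />
print(ssa_prob_X_false_given_man(10e12, 10e9, 90e9, include_mice=True))   # ~0.9: close to the SIA value<br />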
<br />
<b>The Questioner Reference Class:</b>*<br />
<br />
This is a difficulty for the SSA, but in my view, a reasonable choice for the Reference Class is to include only those observers who might ask this type of question. Since the SSA aims to maximize the % of observers who guess correctly, those observers must indeed guess. Mice very probably know nothing about X and are not capable of using anthropic reasoning to guess the likelihood that X is true, so they should not be included in the Reference Class.<br />
<br />
Note that with the SIA, the Reference Class issue doesn't arise: as the number of "other" observers increases, the fraction of observers involved in the question at hand decreases, and since the SIA probability is proportional to both factors, the dependence on the other observers cancels out, i.e. <br />
P(I see #1 | X) is proportional to [(total number of observers) x (fraction of observers who see #1)] = [number of observers who see #1]<br />
<br />
<br />
<b>Easy confirmation of the Many-Worlds idea?</b><br />
<br />
If you believe the SIA, then you must believe that you would be much more likely to exist if MW (with its many observers) is true than if there is only one world and thus relatively few observers. So even if your NAP for MW is low, once you take the numbers of observers into account, your subjective likelihood for some kind of MW to be true is almost 100%.<br />
<br />
<br />
<b>Boltzmann Brains (BBs):</b><br />
<br />
BBs are randomly assembled brains that may momentarily appear, such as in an infinitely long future of the universe. They could potentially vastly outnumber normal observers who result from Darwinian selection.<br />
<br />
The SIA favors belief that a high number of observers have observations like ours. Using the SIA, even if there is just a small NAP that BBs vastly outnumber normal observers, there is a high subjective likelihood that they do, and that we are just BBs who by coincidence temporarily resemble brains of normal observers. I am not comfortable with that conclusion, and see it as an argument against the SIA.<br />
<br />
Just using the NAP will not help enough. Suppose the NAP is 50% that BBs vastly outnumber normal observers. Shouldn't we think that our normal-seeming observations make that less likely? To reach that conclusion, we must use the SSA, since it favors belief that a high % of observers have observations like ours.<br />
<br />
<br />
<b>Infinite Universe / Multiverse:</b><br />
<br />
If the universe or the set of many worlds is infinite, the number of observers is also infinite.<br />
<br />
Then there are problems with anthropic probabilities:<br />
<br />
First, the % of observers of each type becomes undefined, because there are infinitely many observers who see result #1 and also infinitely many who see result #2, regardless of whether X is true or not.<br />
<br />
I think that problem is not so serious, because it seems to me that ratios of infinite numbers can still be well defined in physical situations. For example, suppose that widgets always occur in clusters of three, each cluster containing one blue widget and two green widgets. Then I would say that 1/3 of them are blue, even if there are infinitely many clusters. In practice, the question only matters if the widgets are conscious. So this principle is really a new philosophical assumption about how physics gives rise to a measure distribution of consciousness.<br />
<br />
However, it is impossible to compare the number of observers in two incompatible models of physics if both are infinite. This makes it impossible in practice to use the SIA to compare models of physics, since, thanks to the SIA's easy confirmation of MW, the viable ones will all have infinitely many observers.<br />
<br />
Since the SSA deals only in fractions of observers within each model, it can still be used.<br />
<br />
The SIA can still be used to compare closely related models if we assume something about how to make the comparison - for example, by taking models that are identical up until some special event to have the same number of observers before that event, and then using ratios to compare the two models after it. Such an event might be a human decision whether or not to reproduce more, for example. Similarly, even in an infinite universe it is still desirable to maximize (all else being equal) the total number of people within the history of our Hubble volume.<br />
<br />
<br />
<b>Conclusion:</b><br />
Due to the SSA's advantage in comparing infinite multiverses, and also influenced by the BBs argument and the Iterated Reproducers, I use SSA when comparing possible routes to the Born rule for MW QM. However, I do not consider the question settled. The Adam Paradox in particular seems to be a strong argument against the SSA, and I'm not sure that the Iterated Reproducers counter is sufficient. Also, both the SSA and SIA can lead to poor decisions depending on the situation, and it remains disturbing that the SSA can lead to a lower average utility. That failure for decision-making purposes may indicate a failure for informational purposes as well. However, while Dual-Objective Optimization may be best for decision making, it does not provide a recipe for assigning probabilities.<br />
<br />
<br />
<b>References:</b><br />
* Indicates a point that is original in this post as far as I know.<br />
<br />
As usual, external links may contain misleading arguments which I disagree with for good reasons not always worth mentioning here. Read at your own risk :)<br />
<br />
https://en.wikipedia.org/wiki/Sleeping_Beauty_problem<br />
http://www.princeton.edu/~adame/papers/sleeping/sleeping.html<br />
https://wiki.lesswrong.com/wiki/Pascal's_mugging<br />
http://www.nickbostrom.com/<br />
http://www.anthropic-principle.com/?q=resources/preprints<br />
http://www.anthropic-principle.com/preprints/spacetime.pdf<br />
https://sites.google.com/site/darrenbradleyphilosophy/home/research<br />
http://philsci-archive.pitt.edu/11864/<br />
<br />
<b>On dualism</b> (2012-09-17)<br />
<br />
My approach to interpretation of QM assumes that the mathematically describable aspects of the world are ultimately responsible for why some conscious observations are more commonly experienced than others. I'll call that "reductionism". Many philosophers don't share that view, and here, I will consider the alternative viewpoint, which is generally known as "dualism" as it usually posits that "mind" (consciousness, qualia) and "body" (the aspects of the physical world which can be described mathematically) are two very different things with no logically necessary connection between them.<br />
<br />
Both views have counter-intuitive implications, which is one reason that no consensus has been reached on the issue in philosophical circles. The other reason is that no consensus is ever reached on any philosophical question :)<br />
<br />
The counter-intuitive implication of reductionism is that qualia - the way the colors appear to us, for example - either 1) are caused by mathematical properties, or 2) don't actually exist (known as eliminativism). <br />
<br />
The problem with 1) is that our perceptions of color, for example, seem to have a "qualitative" aspect (e.g. red, green, blue) that doesn't seem like the sort of thing that mathematics could explain. There is an "explanatory gap" between them. The philosopher David Chalmers famously called the problem of understanding how math could explain things like qualia the "hard problem" of consciousness.<br />
<br />
The idea 2) that qualia don't actually exist may seem absurd on the face of it, but upon closer inspection, it's a viable possibility. We think that qualia exist because <i>our brains tell us that they do</i>, but our brains are often wrong about what they are experiencing. This could make sense if the brain is composed of several parts or modules. The part of the brain that decides what it's experiencing need not be the same part that is undergoing the experience in question, if any. Thus, it could decide wrongly. This argument is related to my <a href=http://onqm.blogspot.com/2011/12/interlude-partial-brain-thought.html>partial brain</a> thought experiment.<br />
<br />
Given these issues, it's not surprising that people find reductionism implausible. The alternative hypothesis is dualism: that qualia really do exist, but not due to mathematically describable properties. This avoids the problem of how math could cause the way that colors look, as well as the difficulty of believing that our brains are wrong about what they experience.<br />
<br />
It's hard to understand what minds could be if they are not mathematically describable. Dualism introduces the question "What is mind?", which seems to me to be as hard a problem as "How could qualia be mathematically caused?"<br />
<br />
The other problem with dualism is that it doesn't explain why our brains tell us we have qualia. Telling us is a physical action, mathematically describable as processes in the brain, and it has ordinary physical consequences such as me typing this sentence. The brain's actions are determined by mathematically describable physics: electrical signals, chemicals, etc. So dualist qualia are epiphenomenal; they can't be what causes our brains to tell us we have qualia.<br />
<br />
So if dualism is true, then there are two things going on at the same time: We have qualia, and unrelated to that, for some other reason our brains tell us that we have qualia. A dualist could argue that while that seems counter-intuitive, there is at least no problem <i>in principle</i> as there would be in trying to connect qualia to a mathematical explanation. However, the same is true for eliminativism, and that has the advantage of being less complicated.<br />
<br />
If dualism were true, we might expect that some explanation of the "coincidence" between our qualia and the brain's belief in its own qualia must be rooted in anthropic selection; e.g. that there are many sets of laws linking minds to physics, and that minds in the cases where the "coincidence" doesn't hold usually see just a random jumble. It seems unlikely to me that such an explanation would work, but I won't say that for sure. Partially it makes sense: by hitching its wagon to the mathematically describable functions of a brain, a dualist law connecting minds to physics would be more likely to produce a complex but coherent set of qualia. However, there is leeway. For example, if what the brain thinks it enjoys gave rise to pain qualia, and what the brain thinks it dislikes gave rise to pleasure qualia, would that not be anthropically valid? Another problem is that all of those laws (some of which wouldn't respect the Born Rule) could give rise to partially coherent "Boltzmann brains" that outnumber normal observers.<br />
<br />
Intuitively, we would want an explanation in which we do have qualia <i>and</i> they are responsible for why our brains think we have them. There is a version of dualism, called interactionism, in which that would happen - but it requires that our brains' thoughts and behavior are not based on the mathematically describable physical world, and that is highly implausible given what we know about both brains and physics. There is another major problem with it: Even a mental world could be divided up into pure experience qualia versus mathematically describable interactions; thus, interactionism reduces to epiphenomenal dualism, just with some of the mathematically describable action hidden away from the world described by known physics.<br />
<br />
This should not be confused with 'quantum approaches to consciousness', which while also implausible, assume that the brains' behavior is due to their ability to collapse the wavefunction - which, even if it could occur, would just be another mathematically describable physical process. Likewise, even if psychic abilities existed, they would have mathematically describable causes and effects.<br />
<br />
In general, ANY type of physical <i>behavior</i> can be described mathematically.<br />
<br />
"Mental monism" aka "idealism" is the view that only minds exist; i.e. the physical world is sort of like a shared dream. This avoids the problem of linking minds to physics, but still would need to explain physics, with all of its mathematical behavior. It's hard to see how it could avoid introducing some mathematically describable things to help it along, and thus just become dualism.<br />
<br />
In any case, supposing that dualism is true, what would it imply for interpretation of QM?<br />
<br />
First, there would have to be a new law of nature linking the mathematically describable physics to mental properties. This would revive the possibility of single-world hidden variables, because unlike reductive computationalism, there is no logical reason that the new law couldn't just take the hidden variables into account. However, while that would be logically possible, it wouldn't be the simplest possibility, which is that the wavefunction (which must exist anyway) is what is taken into account. So based on Occam's Razor, we'd still have reason to believe in some kind of many-worlds interpretation. This kind of argument has been made by David Chalmers, who also argues that computationalism would still hold, and Don Page has made a proposal for this kind of dualist MWI.<br />
<br />
The explanation for Born's Rule would probably be different with dualism than with reductionism, though. While it's possible that (as Chalmers argues) computationalism would still hold, and that Born's rule would follow from counting implementations, a more direct explanation becomes available: the dualist law could simply mandate that measure is proportional to the squared amplitude of the wavefunction, just as it does in Page's model.<br />
<br />
There are few limits that we can place on what such a law of nature could be like - if it does exist, that is. It's not something we could investigate or deduce by logic. That is one reason that my investigations focus on the more restricted possibilities given by reductionism/eliminativism; the other is that I find it more plausible.<br />
<br />
<b>1-page Bell's theorem</b> (2012-07-10)<br />
<br />
One particle is sent to Alice; the other to Bob; they may be very far apart.<br />
<br />
Alice <----------------------------- source --------------------------------> Bob<br />
<br />
Each can measure the ‘spin’ of their particle along some direction; each result is + or –. The probability that Alice and Bob get the same (+ or -) result as each other depends on the directions they measure along. For a certain type of source, if they both measure in the same direction, they always get opposite results.<br />
<br />
Each of the Observers will ‘choose’ one of three directions: A, B, or C. This ‘choice’ can be made using any procedure or device, however complicated; therefore, it should be considered unpredictable, even though it may be made using deterministic physics.<br />
<br />
Distant-Measurement-Independent Result (DMIR): The assumption that THE (single) result of each measurement can’t depend on which direction the other Observer ‘chose’.<br />
<br />
Note: If DMIR is false there are 3 possibilities, of which the first two are taken seriously:<br />
1) Nonlocality: An instant (faster-than-light) hidden signal which conveys the information about the measurement angle (which can be ‘chosen’ right before measurement) to the other particle, no matter where it is or how far away.<br />
2) Multiple outcomes of each measurement actually do occur (as in the MWI).<br />
3) “Conspiracy theories” in which the other particle somehow can predict the angle.<br />
<br />
Assume DMIR. Then each particle needs to know in advance what result to give for any angle so that they will always give opposite results when both Observers choose the same angle. Hypothetical properties that’d determine the results are called hidden variables.<br />
<br />
Notation: Let P(A+ & B-) mean the probability that the hidden variables are such that result + would be found by Alice if she measures along A, and result – if along B.<br />
<br />
Since more-general cases are at least as probable as less-general ones:<br />
<br />
P(A+ & B-) = P(A+ & B- & C-) + P(A+ & B- & C+) ≤ P(A+ & C-) + P(B- & C+)<br />
<br />
It is not possible to measure Alice's particle along more than one direction, but measuring Bob’s particle should reveal the opposite of the result Alice's particle would have given. Let P(A+ // B+) be the probability that result + would be found by Alice in direction A, and result + would be found by Bob in direction B.<br />
<br />
P(A+ // B+) ≤ P(A+ // C+) + P(B- // C-)<br />
<br />
Quantum mechanically, if A and B are at a 90 degree angle, with the C direction halfway in between them, then P(A+ // B+) = .25, <br />
and P(A+ // C+) = P(B- // C-) = .073<br />
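<br />
These values follow from the standard singlet-state prediction that the probability of the two Observers both getting + along directions an angle theta apart is (1/2) sin^2(theta/2). A quick numerical check of the values and of the inequality:<br />
<br />
from math import sin, radians<br />
<br />
def p_both_plus(theta_deg):          # singlet state: P(+ along a for Alice AND + along b for Bob)<br />
    return 0.5 * sin(radians(theta_deg) / 2) ** 2<br />
<br />
p_AB = p_both_plus(90)               # A and B are 90 degrees apart<br />
p_AC = p_both_plus(45)               # C is halfway between them<br />
p_BC = p_both_plus(45)               # P(B- // C-) equals P(B+ // C+) by symmetry<br />
<br />
print(round(p_AB, 3), round(p_AC, 3), round(p_BC, 3))   # 0.25 0.073 0.073<br />
print(round(p_AB - (p_AC + p_BC), 3))                    # 0.104: positive, so the inequality is violated<br />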
<br />
Since .25 > .146, the inequality is violated; DMIR is not consistent with QM.<br />
Violations of such inequalities have been confirmed experimentally; DMIR is false!<br />
<br />
<b>Why do Anthropic arguments work?</b> (2012-05-25)<br />
<br />
See <a href="http://onqm.blogspot.com/2009/09/meaning-of-probability-in-mwi.html">"Meaning of Probability in an MWI"</a><br />
<br />
Anthropic arguments set the subjective probability of a type of observation equal to the fraction of such observations within a reference class. This is what I use for "effective probabilities" in the MWI after a split has occurred (the 'Reflection Argument' in the previous post).<br />
<br />
There is sometimes some confusion and controversy about such arguments, so I will go into more detail here about how and why the argument works.<br />
<br />
The anthropic 'probability' of an observation is equal to what the probability would be of obtaining that observation if an observation is randomly chosen.<br />
<br />
Does this imply that a random selection is being assumed? Is it implied that there is some non-deterministic process in which observers are randomly placed among the possible observations?<br />
<br />
No! I am not assuming any kind of randomness at all. All that I am doing is using a general procedure - known as anthropic reasoning - to maximize the amount* of consciousness that correctly guesses the overall situation.<br />
<br />
Suppose that, prior to making any observations, X is thought 50% likely to be true. If X is true then one person sees red and nine people see blue. If X is false, then nine people see red and one person sees blue. If you see red, you should think that X is probably false, with 90% confidence.<br />
<br />
If people always follow this advice, then in cases like this, 90% of the people will be right. True, 10% will be wrong, but it’s the best we can do. The given confidence level should be used for betting and/or used as a prior probability for taking into account additional evidence using Bayes' theorem.<br />
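<br />
For concreteness, here is the arithmetic behind both 90% figures (the confidence level and the fraction of people who end up being right):<br />
<br />
prior_X = 0.5<br />
# X true: 1 person sees red and 9 see blue.  X false: 9 see red and 1 sees blue.<br />
p_X_false_given_red = (1 - prior_X) * 9 / ((1 - prior_X) * 9 + prior_X * 1)<br />
print(p_X_false_given_red)                    # 0.9<br />
<br />
# If everyone bets the way their own color suggests:<br />
right_if_X_true  = 9 / (9 + 1)                # the 9 blue-seers correctly bet that X is true<br />
right_if_X_false = 9 / (9 + 1)                # the 9 red-seers correctly bet that X is false<br />
print(right_if_X_true, right_if_X_false)      # 0.9 0.9 - either way, 90% of the people are right<br />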
<br />
That is why the “effective probability” is proportional to the number of people or amount of consciousness; it is <b>not</b> because of some kind of ‘random’ selection process.<br />
<br />
The next point is that "number of people" is not always the right thing to use for the anthropic effective probabilities. In fact, it only works as an approximation, and only in classical mechanics even then. The reason is that the amount of consciousness is not always the same for each "person". This is especially true if we consider effective probabilities in quantum mechanics, which are proportional to the squared amplitude of the branch of the wavefunction. In such a case, we must set effective probabilities proportional to the "amount of consciousness", which is a generalization of the idea of "number of people". I call this amount "measure of consciousness" (MOC) or "measure".<br />
<br />
Note: In my interpretation of QM - the many computations interpretation - I do assume that the measure is proportional to the number of implementations of the computation, which can be thought of as the number of observers. However, many of the points I make in posts here do not rely on that interpretation, so the more general concept of measure is generally used.<br />
<br />
There is no reason not to apply the same kind of reasoning to cases in which time is involved: In such cases, this maximizes the fraction* of consciousness which is associated with correct guesses. In a large enough population (which is certainly the case with the MWI), this is the same as maximizing the amount of consciousness associated with correct guesses at a given global time.<br />
<br />
With all of this talk about consciousness, am I assuming any particular hypothesis about what consciousness is? No, I am not.<br />
<br />
What about eliminativism - the idea that consciousness as it is commonly understood does not really exist? That's no problem either! I am just using consciousness as a way to talk about the thing that observers do when they observe. Even the most radical eliminativist does not deny that there is something about the brain that is related to observational processes; whatever that is, more people would have more of it.<br />
<br />
Rather than "consciousness", perhaps it would be more precise to talk about "observations" or "queries". Remember, effective probability maximizes the fraction of correct answers; this implies that queries are being made. What about the quantum case, in which the "amount of queries" is proportional to squared amplitude? To make sense of this in an eliminativist view, it may be necessary to take a computationalist view, and let the "amount of queries" be the number of implementations of an appropriate computation. On the other hand, for a dualist, the effective probability should be set proportional to the amount of consciousness that sees a given "query".<br />
<br />
Given these different philosophies, <i>without</i> implying any position on whether "consciousness" really exists or not, I will continue to use the term "amount of consciousness" to stand for whatever the quantity of interest is that generalizes the notion of "number of people" to give anthropic effective probabilities.<br />
<br />
* When considering the consequences of a fixed model of reality, there is no difference between maximizing the number of people who guess correctly and maximizing the fraction of people who guess correctly. However, if different hypotheses which predict different numbers of people are compared, there is a difference. This is closely tied to the philosophical arguments known as the Sleeping Beauty Problem and the Doomsday Argument. I discuss this important topic in <a href="http://onqm.blogspot.com/2016/02/what-to-maximize-when-guessing-right.html">the following post</a>.<br />
<br />
<b>Interlude: The Partial Brain thought experiment</b> (2011-12-29)<br />
<br />
Mark Bishop's 2002 paper "Counterfactuals Cannot Count" attacked the use of counterfactuals in computationalism using a neural replacement scenario, in which the components (e.g. neurons) of a brain (which may already be an artificial neural net) are replaced one at a time with components that merely pass through a predetermined sequence of states - the same sequence that the computational components would have passed through. Eventually, the whole brain is replaced with the equivalent of a player piano, unable to perform nontrivial computations.<br />
<br />
There is a long history in philosophy of mind of using the neural replacement thought experiment to argue that various factors (such as what the components are made of) can't affect conscious experiences. For example, David Chalmers used it as an argument for computationalism. It works like this: the rest of the brain can't have any reaction to the replacement process, since by definition the replaced components provide the same signals to the rest of the brain. It's been argued that it doesn't seem plausible that the brain could be very much mistaken about its own experiences, so a gradual change in or vanishing of consciousness is taken to be implausible. A sudden change isn't plausible either, since there's no reason why a particular threshold of how far the replacement has gone should be singled out. <br />
<br />
Bishop's argument is really no different from other neural replacement thought experiments, except in the radical (to a computationalist) nature of its conclusions. So, if neural replacement thought experiments do establish that consciousness must be invariant in these scenarios, then computationalism must be rejected.<br />
<br />
My <a href="http://cogprints.org/6321/">Partial Brain</a> thought experiment shows that neural replacement thought experiments are completely worthless. It works like this: Instead of replacing the components of the brain with (whatever), just remove them, but provide the same inputs to the remainder of the brain as the missing components would have provided.<br />
<br />
What would it be like to be such a partial brain? Some important features seem obvious: it is not plausible that as we let the partial brain decrease in size, consciousness would vanish suddenly. But now it's not even possible (unlike in neural replacement scenarios) that consciousness will remain unchanged; it must vanish when the removal of the brain is complete.<br />
<br />
Therefore, progressively less and less of its consciousness will remain. In a sense it can't notice this - its beliefs will disappear as certain parts of the brain vanish, but they won't otherwise change - but that just means its beliefs will become more wrong until they vanish. For example, if the higher-order belief center remains intact but the visual system is gone, the partial brain will believe it is experiencing vision but will in fact not be.<br />
<br />
The same things would happen - by definition - in any neural replacement scenario in which the new components don't support consciousness; the remaining brain would have partial consciousness. So neural replacement scenarios can't show us anything about what sorts of components would support consciousness.<br />
<br />
The partial brain thought experiment also shows that consciousness isn't a unified whole. It also illustrates that the brain can indeed be wrong about its own conscious experiences; for example, just because a brain is sure that it has qualitative experiences of color, that is not strong evidence in favor of the idea that it actually does, since a partial brain with higher-order thoughts about color but no visual system would be just as sure that it does.<br />
<br />
<b>Restrictions on mappings 2: Transference</b> (2011-12-14)<br />
<br />
In the previous post, <a href="http://onqm.blogspot.com/2011/10/restrictions-on-mappings-1-independence.html">Restrictions on mappings 1: Independence and Inheritance</a>, the "inheritance" of structured state labels was explained; it allows the same group of underlying variables to be mapped to more than one independent formal variable. In the example, a function on a 2-d grid was mapped to a pair of variables.<br />
<br />
Transference is something like the reverse process: it allows a set of simpler variables to be mapped to a structured state function on a grid.<br />
<br />
This allows ordinary digital computers to implement wave dynamics on a 3-d space, which could matter for the question of whether the universe could be ultimately digital. The AdS/CFT correspondence in some models of string theory would need something similar if the bulk model is to be implemented on the boundary in the computational sense.<br />
<br />
Transference can be Direct or Indirect. It works like this:<br />
<br />
Direct Transference could be used in a mapping by taking the value from a given variable and turning it into a label for structuring a set of new variables.<br />
<br />
For example, if there is a single integer variable I(t), we can transfer its value to label a set of bits B(j), each of which depends only on whether I(t) equals the value of its label, e.g.<br />
<br />
B(j) = 1 if I = j<br />
B(j) = 0 if I does not = j<br />
<br />
These bits can be considered an ordered series of "occupation tests" of the different regions that the underlying variable's value could be in.<br />
<br />
Of course, only one of these bits at a time will be nonzero. But they are to be considered independent variables. At this point you might object: if you know the value of the nonzero one, don't you know the other bit values must be zero? But just as Inheritance carved out an exception to the rule for independence, so would Direct Transference carve out an exception to it.<br />
<br />
Going the other direction is no problem: if we restrict a mapping such that only one bit in an ordered set B(i) is nonzero, then a new variable I can be constructed such that I has a value equal to the index i of the nonzero bit. Here we are doing the reverse.<br />
<br />
We can't double count, though; if we make the new set of variables b(i), we can't make a second independent new set of variables c(i) which gets its label transferred from the same underlying variable I(t) for the same values of I.<br />
<br />
If we have two underlying variables I and J, we could similarly use Direct Transference to map them to a 2-d grid of bits, B(i,j), in which only one bit is nonzero.<br />
<br />
If we then re-map this grid using inheritance, we could arrive back at our original I and J variables.
So, basically, what Direct Transference is saying is that these two pictures are really equivalent.<br />
<br />
We could also map the two of them to a single 1-d series of variables, e.g. variables which are the sums of the respective 1-d series of bits. (Since the value of the sum becomes 2 when I=J, these are trits, not bits.)<br />
<br />
Can the variables that were obtained using Direct Transference be used to make a mapping so flexible that it must not be allowed? Something like a clock and dial mapping? The answer to that certainly appears to be no. And that may be justification enough for allowing it; my philosophy is to be liberal in allowing mappings, as long as those mappings don't allow implementations of arbitrary computations.<br />
<br />
Indirect Transference is a little more complicated. Consider a computer simulation of dynamics on a 2-d grid, f(x,y). When the value of f is updated at the pair of parameters x and y, this can be done by setting one variable equal to x, another equal to y, and using them to find the memory location M in the computer at which the corresponding value of f is to be changed. Since updates of f at (x,y) always involve fixed values for each of those parameters, f(x,y) can be labeled by those values. In this way, mapping the values of f to the actual function of x and y, f(x,y), is considered a valid mapping, even though the computer's memory is not laid out in that manner. This is an example of Indirect Transference. It can be generalized to any case in which a parameter of a function is used.<br />
<br />
<b>The Everything Hypothesis: Its Predictions and Problems</b> (2011-10-06)<br />
<br />
There are some basic questions about the world:<br />
1) Why does anything exist, instead of just nothing?<br />
2) What does exist?<br />
3) Why does that stuff exist instead of other stuff?<br />
4) Given that stuff does exist, why is there consciousness instead of just behavior?<br />
<br />
These questions are so basic that it would be nice to know the answers to them before worrying about specifics of our own situation.<br />
<br />
Regarding these questions we can make highly counter-intuitive observations:<br />
<br />
Suppose that somehow you didn't know that anything exists, and you are asked to guess: Does anything exist? My guess would certainly be "No, it would make a lot more sense if nothing exists. That is not only much simpler, it's also a lot easier to understand how it might be that nothing exists." <br />
<br />
OK, but things do exist. So why THIS instead of THAT? We don't know. And not only that: It seems impossible to even imagine any reason why one possible thing would be selected over another. You can't say "it's because of this other thing" (whether the other thing is a law of physics, a god, or whatever) because that doesn't explain anything, it just begs the question of "So why does that other thing exist?" and we are back to the start.<br />
<br />
There are basically two schools of thought on consciousness: 1] Dualism: Consciousness is a basic thing, which can not be due to something mathematically describable; or 2] Reductionism: Consciousness is an inevitable consequence of certain mathematically describable things. While I tend to fall into the latter camp, both ideas have counter-intuitive implications which I will not address in this post.<br />
<br />
At least we do have ideas to debate about consciousness. Are there even any ideas about possible answers to the other questions, controversial or not?<br />
<br />
It turns out that there is an idea which might begin to address them to some extent, called the Everything Hypothesis: <br />
<br />
What if everything that possibly could exist does exist?<br />
<br />
This would seemingly avoid the apparent paradox of some things existing rather than other things despite there being no possible reason why that could be the case. It is also the simplest possible set of things that could have existed (other than just nothing), due to being fully symmetrical.<br />
<br />
However, it doesn't really answer question 3). To really put question 3) to rest we would still need to know "Why does everything exist?" which would also cover question 1).<br />
<br />
It turns out that there is an idea that could address that; this idea is sometimes called "radical Platonism" in analogy with certain ideas that Plato had, but it is really a modern idea - one that was perhaps, though not necessarily, inspired by Plato, and that in any case gains a bit of philosophical confidence from his precedent.<br />
<br />
The idea is that there is no fundamental physical reality; instead, the fundamental reality is the world of logical and mathematical possibilities (which would thus better be called actualities). Of course, it remains difficult to understand why a logically possible world would automatically be real enough to have real observers inside it; but if that is the way it is, the Everything Hypothesis would have to be true. This can be seen as a 'new version' of either Question 1 or Question 4.<br />
<br />
While <a href="http://arxiv.org/abs/0704.0646">Max Tegmark</a> is credited with the first paper on the Everything Hypothesis, various other people came up with similar ideas on their own. I was one of them.<br />
<br />
There are variations or more limited versions of the Everything Hypothesis based on what is meant by 'Everything'. Tegmark's version of the Everything Hypothesis is explicitly mathematical: namely, that every possible mathematical structure exists. If Dualism is false, then that is equivalent to the full Everything Hypothesis. However, if Dualism is true, then consciousness and laws related to it are other things within the set of Everything.<br />
<br />
The next question is: What does the Everything Hypothesis predict?<br />
<br />
This can be put into familiar terms for a many-worlds model: What measure distribution on conscious observations does it predict?<br />
<br />
At first glance, one might think that a typical observer within the Everything would see a quite random mess. If so, the Everything Hypothesis must be false, since we see reliable laws and a highly ordered universe.<br />
<br />
However, taking a computationalist view of mind, only certain mathematical structures would support observers: those that have the equivalent of time evolution with respect to some parameter (or some suitable substitute), and have reliable laws for such dynamics. We should therefore consider the set of such dynamical structures with all possible dynamical laws. The starting state may be completely random, but the state will no longer be so random once time evolution occurs.<br />
<br />
While we don't know how to deal with the set of all possible dynamical systems, there is one subset that is much easier to handle: Turing machines, which are digital computers. A universal Turing machine can run the equivalent of any digital computer program.<br />
<br />
So we can look at a simple universal Turing machine and consider the set of all programs for it. Such a program is just an infinitely long symbol string, which I'll call a bit string for simplicity. The machine has a 'head' that moves along the string, changing bits as well as a few values internal to the head.<br />
<br />
Most programs have only a finite number of bits that actually do anything (the active region), because the head is likely to be instructed either to halt or to enter an infinite loop. The number of programs that are the same within the active region but differ in the bits that don't do anything decreases exponentially with the number of bits in the active region, because each bit brought into the active region is one that can no longer be varied freely in the inactive region.<br />
<br />
Therefore, shorter and simpler programs have a higher measure (more copies) than longer programs do. The typical laws of worlds simulated by these programs are therefore likely to be as simple as possible, consistent with the requirement that observers exist within them. Perhaps, then, our own world is such a world.<br />
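<br />
Here is a toy counting sketch of that argument, treating the active region as a prefix of a fixed-length bit string; the real argument concerns infinite strings, so take this only as an illustration of the exponential falloff:<br />
<br />
from itertools import product<br />
<br />
L = 12                       # toy "program" length<br />
def copies(active_region):   # number of length-L bit strings that share a given active-region prefix<br />
    return sum("".join(bits).startswith(active_region) for bits in product("01", repeat=L))<br />
<br />
for active in ("0", "0110", "01101001"):<br />
    print(len(active), copies(active), 2 ** (L - len(active)))<br />
# 1 2048 2048 / 4 256 256 / 8 16 16: the count falls off as 2^(-k) with the active-region length k<br />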
<br />
That is an impressive result! It would certainly be interesting - though probably it will always be beyond our capabilities - to know exactly what would be typical for observers in the set of all Turing machine program runs to see.<br />
<br />
However, there's a big problem here for the Everything Hypothesis: there are infinitely many possible Turing machines and digital computers in general. We can pick one, but that contradicts the fact that radical Platonism must have no arbitrary choices - no free parameters - if it is to explain why the world is the way it is. So why not just pick all of them and weight them equally? The problem there is that since there are infinitely many, the order in which we list them makes a difference to the result we get when trying to get the measure distribution. It's a very small difference for 'subjectively reasonable' choices of these parameters, but that's not the point; ANY arbitrariness completely ruins the explanatory power of radical Platonism.<br />
<br />
Besides, what about continuous systems? Some people are content to assume that the fact that there appears to be no natural measure for them means we don't need to include them in the Everything even if we are Platonists. However, it seems to me that they are just as much legitimate candidates for existence as digital systems.<br />
<br />
Reluctantly, I am forced to conclude that - unless there is some way of overcoming these mathematical problems that we don't know about, which seems unlikely - Platonism <span style="font-style:italic;">does not</span> provide the explanation we were looking for. This is a paradox, because it also seems that Platonism is the <span style="font-style:italic;">only</span> thing that even <span style="font-style:italic;">could</span> have been a real explanation for why things are the way they are.<br />
<br />
With the Everything Hypothesis we are still left with a 'new version' of Questions 1-2: NOT "What does exist, and why?" but "What is the measure distribution, and why?" This is a change, at least! Perhaps it is a more tractable question, but for now it is still a question which demands explanation for which we have none.<br />
<br />
I still think that the set of all things which exist is probably very simple in some sense and that the physics we see is just a small part of it. We see the part of it that we see due to being fairly typical observers.<br />
<br />
Perhaps the right way to derive the Born Rule of quantum mechanics would be to start with something like the set of all possible Turing machine programs and derive from it the measure distribution on conscious observations, but obviously, such a project would be all but impossible to carry out in practice. My work will focus instead on studying the consequences of the standard physics equations (which are based in large part on experimental observations). However, my criteria for implementation of a computation are general and do not depend on the assumption that the underlying system is a physical one, so in principle, they should apply even to underlying systems that are Platonic Turing machines.<br />
<br />
<b>Restrictions on mappings 1: Independence and Inheritance</b> (2011-10-04)<br />
<br />
Previous: <a href="http://onqm.blogspot.com/2011/10/putnam-searle-chalmers-theorem.html">The Putnam-Searle-Chalmers Theorem</a><br />
<br />
Chalmers proposed that a computation state be represented by a string of numbers, where each part of the string is "independent"; he proposed that independence can be achieved if each part of the string depends on a different part of the underlying system. He called this a Combinatorial State Automaton (CSA), because the state is given not by a single number but by the combination of the different parts of the string. The CSA avoids the false "clock and dial" implementations discussed in the previous post.<br />
<br />
However, as Chalmers noted, there are still false implementations that can be found for the CSA. Suppose that each part of the underlying system which is mapped to part of the string at the current time step contains all of the relevant information about every other relevant part of the system at the previous time step.<br />
<br />
In that case, a mapping can be made in which - although each part of the string depends only on its own part of the underlying system - a sequence of values can be assigned to each part of the string in such a way as to mimic any desired computation: enough information is available for the value assigned to that part to depend on any desired combination of the values of the parts of the string at the previous time step.<br />
<br />
The amount of information that must be stored would grow exponentially with time, which leads to systems that quickly become physically unrealistic after a few time steps, but the fact that the problem exists in principle is enough to show that a different restriction on mappings is needed.<br />
<br />
<span style="font-weight:bold;">CSSA:</span><br />
<br />
An implementation criterion will be given for a formal system consisting of a structured set of variables (also called substates), and a transition rule. Such a system will be called a Combinatorial Structured State Automaton (CSSA). It differs from Chalmers' CSA in that the structure of the state is not restricted to that of a string of values. For example, the substates could be arranged into a rectangular matrix, or a higher-dimensional array, and this arrangement and its specification would be considered part of the specification of the CSSA.<br />
<br />
One reason for including such structure is to allow label inheritance (explained below) from an underlying computation to another computation. While the ultimate underlying system is assumed to be the physical universe (or whatever underlies that!), the definitions must work in general so that the underlying system can be anything that can be described mathematically, because the criteria for implementation of a computation should not depend on experimentally discovered facts of physics but rather on pure mathematics.<br />
<br />
It will also be useful in analyzing systems to allow computations to implement other computations. For example, if an object implements a "Windows machine" computation, then we only need to check variables on the level of <span style="font-style:italic;">that</span> computation (rather than looking at fundamental systems variables such as the wavefunction's behavior in terms of electron positions) to see if it <span style="font-style:italic;">also</span> implements the "Firefox" computation. In this case the "Windows machine" is now treated as the underlying system. Many of the simple examples I will use will involve a simple underlying digital system.<br />
<br />
The criteria given below are a modification of the ones given in my MCI paper, and will be incorporated into a revised paper. <br />
<br />
<span style="font-weight:bold;">Independence:</span><br />
<br />
The basic rule for independence is that it must not be possible to determine from knowledge of the physical states that are mapped to the value of the current substate and of the system dynamics<br />
<br />
- the values of any of those substates (at the previous time step) that determined what the value of a given substate is at the current time step, except when the current value itself provides enough information; or<br />
<br />
- the values of other substates (at the same time step), except when the current value itself provides enough information.<br />
<br />
This rules out clock and dial mappings, because the clock and dial physical state which any substate depends on would reveal all formal values of the other substates and the previous substates. It also rules out the other false implementations discussed above in which the variables record information that would reveal the values of the previous states that determine them.<br />
<br />
<span style="font-weight:bold;">Inheritance:</span><br />
<br />
However, it is sensible to allow cases in which interacting formal substates share the underlying variables that they depend on. This is especially important with quantum mechanics (see below).<br />
<br />
In these cases, independence is instead established with the help of “inheritance” of labels. An underlying label index is inherited when a substate depends on the values of that index, associating the substate with that index.<br />
<br />
Knowledge of the variables that determine the other substates - modulo permutations (or in other words, limited knowledge that would not be affected by such swapping) of the labeling indexes that are inherited by the given substate - must not reveal the value that the given substate has (at the same time step).<br />
<br />
The restriction that knowledge of the underlying variables that determine the substate must not reveal the values of the substates at the previous time step that determine its value (with the exceptions noted in the basic rule above) still applies as well, but modulo permutations of the inherited indexes for the respective substates.<br />
<br />
An example helps show what is involved here:<br />
<br />
Bits on a 2-d grid plus time --> (mapped to) 2 integers as functions of time<br />
<br />
b(i,j,t) --> I(t), J(t)<br />
<br />
The computation would then involve the dynamics of I and J, such as<br />
I(t+1) = f(I(t),J(t))<br />
J(t+1) = g(I(t),J(t))<br />
<br />
where f and g are appropriate functions.<br />
<br />
Each bit takes on the value 0 or 1. If the number of nonzero bits is not equal to 1, this mapping will be undefined (there is no need for the mapping to cover all possible states of the underlying system).<br />
<br />
If only one bit is nonzero at a given time, then let<br />
I(t) = the value of i at which b(i,j,t) is 1<br />
J(t) = the value of j at which b(i,j,t) is 1<br />
<br />
If we were to swap around the values of the i-index, it would affect the value of I(t) but not of J(t), and similarly for the j-index affecting J(t) but not I(t). Therefore the conditions are met so that I(t) “inherits” the i label and J(t) the j label.<br />
<br />
To verify independence - even though both I and J depend on all bits on the grid in the sense that only one nonzero bit is allowed - knowing the full state of the 2-d grid modulo a permutation of the j-index (that is, knowing it only up to a possible swapping around of the j-index values) reveals the value of I(t), but not J(t), and vice versa. Thus they are independent.<br />
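<br />
Here is a minimal sketch of this mapping and of the swapping test (the function name and the particular grid size are mine, chosen just to make the example concrete):<br />
<br />
def map_to_I_J(b):<br />
    # b maps (i, j) to 0 or 1 at one time step; the mapping is undefined unless exactly one bit is set<br />
    ones = [(i, j) for (i, j), v in b.items() if v == 1]<br />
    if len(ones) != 1:<br />
        return None<br />
    return ones[0]            # I(t) inherits the i label, J(t) inherits the j label<br />
<br />
b = {(i, j): 0 for i in range(3) for j in range(3)}<br />
b[(2, 1)] = 1<br />
print(map_to_I_J(b))          # (2, 1)<br />
<br />
# Swapping the j-index values 0 and 1 changes J but leaves I unchanged (and vice versa for i),<br />
# which is the sense in which I and J are independent even though they share all the underlying bits:<br />
swap_j = {0: 1, 1: 0, 2: 2}<br />
b_swapped = {(i, swap_j[j]): v for (i, j), v in b.items()}<br />
print(map_to_I_J(b_swapped))  # (2, 0)<br />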
<br />
In this example, an underlying system which consists of a function ON a 2-dimensional grid was reduced to a system of two variables, which together define a single point IN a 2-d space. This can be summarized as “on --> in”.<br />
<br />
In cases where the underlying system has other variables, e.g. the k index in b(i,j,k,t), a mapping might not be symmetric with respect to the other variables. For example, we could have<br />
<br />
X(t) = b(1,1,1,t)+ b(1,2,2,t) - b(2,1,3,t) - b(2,2,4,t)<br />
Y(t) = b(1,1,1,t)+ b(2,1,3,t) - b(1,2,2,t) - b(2,2,4,t)<br />
<br />
In this case, are X and Y independent, with X inheriting from i and Y from j? If two values of j (say j=1 and j=2) are swapped as indexes of b(), then the value of X could change: b(1,1,1,t) + b(1,2,2,t) is not b(1,2,1,t) + b(1,1,2,t). On the other hand, if we only had a limited knowledge of b() and that knowledge is symmetric with respect to such swaps, we wouldn't know that. This mapping is OK. It is perhaps better to think of it as if the terms in X(t) are labeled by different i-values and j-values (the values the mapping is supposed to inherit from) while ignoring their functional dependence on k (which is not being inherited by anything).<br />
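<br />
To make the arithmetic point concrete, here is a toy check with made-up values of b and with the time index suppressed (the helper names are my own):<br />
<br />
b = {(i, j, k): (i + 2*j + 3*k) % 5 for i in (1, 2) for j in (1, 2) for k in (1, 2, 3, 4)}<br />
<br />
def X(b):<br />
    return b[(1, 1, 1)] + b[(1, 2, 2)] - b[(2, 1, 3)] - b[(2, 2, 4)]<br />
<br />
def swap_j(b):<br />
    # exchange the roles of j=1 and j=2 in the underlying variables<br />
    return {(i, 3 - j, k): v for (i, j, k), v in b.items()}<br />
<br />
print(X(b), X(swap_j(b)))   # generally two different numbers (-4 and 6 here)<br />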
<br />
The concept of inheritance can be generalized to cases in which more than one underlying variable is responsible for the distinction being inherited. For example, here X inherits from i, but Y is to inherit from j and k:<br />
<br />
X(t) = b(1,1,1,t)+ b(1,2,1,t) - b(2,1,1,t) - b(2,1,2,t)<br />
Y(t) = b(1,1,1,t)+ b(2,1,1,t) - b(1,2,1,t) - b(2,1,2,t)<br />
<br />
Inheritance can also occur with multiple levels of <a href="http://en.wikipedia.org/wiki/Functional_%28mathematics%29">functionals</a>. For example, suppose the underlying variables are a functional g of function f(i,j), e.g. if i and j are bits, here g=g(f(0,0),f(0,1),f(1,0),f(1,1)). An intermediate mapping could be made with substates that inherit from f and are functions on (i,j). This can then be mapped to a pair of substates X,Y that inherit from i and j respectively.<br />
<br />
In quantum mechanics, the wavefunction is a function ON a high-dimensional configuration space. In classical mechanics, the set of particle positions defines a point IN configuration space. It certainly seems that, in the classical limit, the computations that would be performed by the related classical system (such as a system of classical switches) would in fact be performed by the actual quantum system. Inheritance of labels - from quantum directions in configuration space to classical position variables - permits that. Quantum field theory, which is an even more realistic model, involves functional dependence of the wavefunctional on a function of space.<br />
<br />
<b>Union:</b><br />
<br />
A substate can be mapped based on a region that includes a number of subregions each of which is associated with certain label values. If each such subregion meets a disinheritance criterion, those certain label values can be considered separately for determining disinheritance. This rule is particularly useful when dealing with Quantum Field Theory.<br />
<br />
For example, suppose that several bits are to be mapped from psi(F1,F2,F3). In a straightforward inheritance scenario, perhaps one such bit, bit A, depends on how psi is distributed among values of F1, with F2 disinherited (perhaps summed over) and a particular value of F3 is chosen for the mapping. Another bit, bit B, might then disinherit F1 but similarly depend on the F2 distribution and use the same F3 value. Recall that disinheriting F1 means that swapping the values of psi evaluated at different F1 values with the other F's held fixed has no effect on B. Since A and B disinherit different variables, they are independent.<br />
<br />
The Union rule means that we can choose a second F3 value and, holding it fixed, let A also depend on the distribution of psi on F2 (but not F1) at the new F3 value (in addition to what A depended on before, with the first F3 value still fixed for that part), while B also depends on the distribution of psi on F1 (but not F2) at the new F3 value - and A and B are still considered independent.<br />
<br />
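As a rough sketch of what this structure looks like (a toy rendering only; the values of psi, the choice of two F3 values, and the helper names are all made up):<br />
<br />
# toy discrete "wavefunction" over (F1, F2, F3), each F taking the values 0 and 1<br />
psi = {(f1, f2, f3): 0.1 * (1 + f1 + 2*f2 + 4*f3)<br />
       for f1 in (0, 1) for f2 in (0, 1) for f3 in (0, 1)}<br />
<br />
def dist_over_F1(psi, f3):<br />
    # distribution of |psi|^2 among F1 values, with F2 summed over (disinherited), F3 fixed<br />
    return tuple(sum(abs(psi[(f1, f2, f3)])**2 for f2 in (0, 1)) for f1 in (0, 1))<br />
<br />
def dist_over_F2(psi, f3):<br />
    # distribution of |psi|^2 among F2 values, with F1 summed over (disinherited), F3 fixed<br />
    return tuple(sum(abs(psi[(f1, f2, f3)])**2 for f1 in (0, 1)) for f2 in (0, 1))<br />
<br />
# Bit A: a function of the F1 distribution at the first F3 value and, by the Union rule,<br />
# of the F2 distribution at the second F3 value.  Bit B makes the complementary choice<br />
# in each F3 subregion, so in each subregion A and B disinherit different indexes.<br />
A_inputs = (dist_over_F1(psi, f3=0), dist_over_F2(psi, f3=1))<br />
B_inputs = (dist_over_F2(psi, f3=0), dist_over_F1(psi, f3=1))<br />
<br />
The actual bits would be some functions of these numbers; the point is only which permutations of the underlying labels each bit is insensitive to.<br />
<br />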
This sort of thing lets two substates in QFT depend on the same areas of space as long as the configuration of the environment distinguishes them, since particle identity is not available to do so. For example, given a row of evenly spaced switches, for certain combinations of states the row might be translated so as to replace one switch with the next; this should not ruin the computation.Jackhttp://www.blogger.com/profile/13062788545650525735noreply@blogger.com0tag:blogger.com,1999:blog-4392083100115115630.post-36440718133605413052011-10-03T16:19:00.007-04:002011-10-05T12:33:42.705-04:00The Putnam-Searle-Chalmers TheoremPrevious: <a href="http://onqm.blogspot.com/2011/10/basic-idea-of-implementation.html">Basic idea of an implementation</a><br /><br />If a computation is implemented, there must be a mapping from states of the underlying system to formal states of the computation, and the states must have the correct behavior (transition rule) as they change over time.<br /><br />For example, for an electronic digital computer, we can map a high-voltage state of a transistor lead to a "1" and a low voltage state to a "0". By doing the same for various other circuit elements, we can obtain an ordered string of 1's and 0's. This string will change over time as the voltages change. If, due to the laws of physics and the way the circuit is connected, this string must always change in accordance with the transition rules for computation C, then the system implements C.<br /><br />We must allow as much flexibility as possible in the choice of mapping, because we are trying to understand the behavior of the system without any reference to things outside the system. Human convenience in recognizing the mapping is not a consideration.<br /><br />The obvious way to try to do this is simply to allow any mapping that is mathematically possible. This leads to what I will call the naive implementation criterion, because while it may sound good at first it is not a viable option for a satisfactory criterion. 
Chalmers' paper <a href="http://consc.net/papers/rock.html">Does a Rock Implement Every Finite-State Automaton?</a> explained this in detail.<br /><br />The following is my version of what I'll call the "Putnam-Searle-Chalmers (PSC) Theorem" which shows that unrestricted mappings are not a viable option:<br /><br />Suppose that a system consists of two parts, S and T, each of which has a numerical value, and that the dynamics of our system are as follows:<br />S(t+1) = S(t)<br />T(t+1) = T(t) + 1<br /><br />These dynamics are fairly trivial; S is a dial that maintains a constant setting, while T is a clock.<br /><br />We will check to see if this system implements a computation, C, which has the transition rule X(t+1) = F(X(t)) where t is a time index.<br /><br />Here X need not be a single number; it might be a string of bits, for example.<br /><br />F could be a complicated function, such as F(X) = the Xth prime number (expressed in base 2, where X is an integer expressed in base 2).<br /><br />Now make a mapping M going from (S,T) to X with the following properties:<br />X = M(S,T)<br />M(S,T+1) = F(M(S,T))<br /><br />We do need to make sure that any possible starting value of X is allowed by the mapping, which we can always do if our system has enough possible dial values.<br /><br />Now, according to the mapping, X will change as a function of time based on the dynamics of the system as follows:<br />X(t+1) = M(S(t+1),T(t+1)) <br /> = M(S(t),T(t)+1)<br /> = F(M(S(t),T(t)))<br /> = F(X(t))<br /><br />Therefore, the system would implement the computation if this mapping is allowed. But the system's dynamics are trivial while the computation can be a very complicated function. Obviously, this is not acceptable; the computation does not characterize the behavior of the system at all. This is what I call a "false implementation". All of the complicated dynamics of the computation have been put into the mapping. It is therefore necessary to put restrictions on what mappings are allowed.<br /><br />One possibility is to require that each part of a string that defines a formal computational state (e.g. each bit in a bit string) takes its value based on a different part of the underlying system. That is basically what Chalmers proposed to overcome the problem.<br /><br />While it somewhat counter-intuitively rules out distinctions based on the value of a single number (since a single number can still be mapped to any other single number), it goes a long way towards ruling out false implementations, while still allowing important standard examples of implementations that we want to retain, such as mapping switch positions to bit values (assuming classical physics).<br /><br />However, it is not quite right. There are some systems for which it still allows false implementations, and there are other cases where it rules out what seem to be legitimate implementations - and these become clearly important when quantum mechanics is brought into the picture. 
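<br /><br />To make the clock-and-dial construction above concrete, here is one explicit way to build such a mapping M (a minimal sketch; the particular choice of F is arbitrary and the names are mine):<br />
<br />
def F(X):<br />
    return (3 * X + 1) % 101        # stands in for any complicated transition rule<br />
<br />
def M(S, T):<br />
    # map dial value S and clock value T to a formal state X by applying F, T times, to S;<br />
    # then M(S, T+1) = F(M(S, T)) holds automatically, and X(0) = S covers all starting values<br />
    X = S<br />
    for _ in range(T):<br />
        X = F(X)<br />
    return X<br />
<br />
S, T = 42, 0<br />
for _ in range(5):<br />
    X_now = M(S, T)<br />
    T = T + 1                        # the trivial dynamics: S stays fixed, the clock ticks<br />
    assert M(S, T) == F(X_now)       # so the mapped state "obeys" X(t+1) = F(X(t))<br />
<br />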
In classical mechanics, different particles are different parts of the system; in quantum mechanics, different particles are different directions in the shared configuration space on which the wavefunction evolves.<br /><br />Next: <a href="http://onqm.blogspot.com/2011/10/restrictions-on-mappings-1-independence.html">Restrictions on mappings 1: Independence and Inheritance</a>Jackhttp://www.blogger.com/profile/13062788545650525735noreply@blogger.com0tag:blogger.com,1999:blog-4392083100115115630.post-791191134101881842011-10-01T16:55:00.011-04:002011-12-28T13:58:44.286-05:00Basic idea of an implementationIn conceptual terms, to say that a system implements a given computation is to say that something about that system - some aspect of its behavior - is described by the computation. It is thus a way of characterizing the system.<br /><br />For example, if a system can take two numbers as inputs and produce their sum as an output, we could characterize it as an "adder". If instead it can take two numbers as inputs and produce their product as an output, we could characterize it as a "multiplier". "Adding" and "multiplying" are two different _types_ of computations. These behaviors are fairly general, so they can be further divided into specific computations if more details are specified, such as what format the inputs are in, internal functioning during intermediate steps, and so on.<br /><br />Two "multipliers" with different intermediate steps would implement different computations. Thus, while the word "computation" suggests something that is done to get a result, that is misleading in the context of the computationalist view of consciousness, which instead focuses on internal functioning of the system in question.<br /><br />If we find a third system and discover that it behaves like an "adder", then it has a lot in common with our other "adder": we now know a lot about how it can behave. But there's also a lot that this characterization does NOT tell us. It doesn't tell us what the system is made out of; or what its other behaviors might be; or what internal processes it uses to perform the additions. It also does not tell us whether it performs the addition multiple times, redundantly, perhaps performing the same addition again but sending that (same) result to some other person.<br /><br />In addition to its capabilities, a system's behavior is further characterized by the exact sequence of computer states that are actually involved in its behavior. For example, an adder which added the numbers 45 and 66 and got 111 is characterized by those numbers, in addition to just being an adder. Such a sequence of numbers (or more precisely the sequence that includes all intermediate states of the computation) can be called the "activity" of the computation, and together with the computation it specifies what is known as a <span style="font-weight:bold;">run</span> of the computation.<br /><br />It is important to note that neither the computation alone nor the activity alone is enough to adequately describe the behavior of the system; both are needed.<br /><br />Often, I will speak only of computations, but it should be understood by the context that "runs of computations" are often meant as well. For example, if I say that a particular computation gives rise to a particular conscious observation, I mean that a particular run of that computation - corresponding to a particular starting state - gives rise to that observation. 
A different run of the same computation (such as the same type of brain except in a different starting state) might then give rise to a different kind of observation or to none at all.<br /><br />In the case of programmable systems, the distinction between computations and runs becomes somewhat arbitrary, but this is not a problem as we can always specify what computations we are interested in and use that to make the choice.<br /><br />A system could be both an adder and a multiplier; for example, the universe is both if at least one of each exists as a sub-system of it. Yet a hypothetical universe with 100 adders and 1 multiplier would be significantly different from one with 1 adder and 100 multipliers. A more detailed way of characterizing such systems would be to state how many instances of each kind of computation are performed by it: in other words, to give a measure distribution on computations. Such a measure distribution (or more exactly, a measure distribution on runs of computations) is the main tool needed to evaluate the computationalist version of the Many Worlds Interpretation of QM, and will be addressed in later posts.<br /><br />While artificial computers tend to operate on command, there is nothing about the concept of implementing computations that requires that. For example, a robot that walks around and listens for numbers, then once it has heard two of them, adds them and says the result, then starts again, would still be an "adder". Such a robot would behave according to its internal agenda, ignoring the desires of the people around it even if they beg it to stop.<br /><br />It's also important to note that "outputs" need not be distinguished from internal parts of the system. Any parts of the system that produce the behavior in question are the relevant parts, regardless of how they interact with things outside the system.<br /><br />For a computation with more than one step, it is useful to define "inputs" as substates that are affected by influences outside the scope of the computation in such a way that their values are not determined by the previous state of the computation. Influences outside the scope of the computation are not necessarily outside of the system (such as the universe) which performs that computation. <br /><br />Those familiar with computer science might be surprised that the concept of the Turing Machine will play no role in using computations to characterize systems. A Turing Machine is a type of computer that is useful for specifying which functions could be calculated in principle by digital computers (if unlimited memory - an infinite number of substates - is available, but the transition rule for each substate depends on a finite number of substates) and how easily (setting bounds for how memory needs and processing time scale with the size of a problem).<br /><br />Because this class of computers is so ubiquitous in computer science, those computer scientists who dare venture into the swamps of philosophy far enough to read a paper or attend a lecture on the implementation problem sometimes completely lose interest in the problem when they realize that no one is mentioning Turing Machines. In fact, it would be quite easy to describe Turing Machines as a special case of the computers that I will describe.<br /><br />But here we are concerned with characterizing the behavior of actual systems as they exist, not with finding what size of problems they could be programmed to handle. 
It is also important to note that digital computing is just a special case of the behaviors that could be characterized as computation; analog computing can certainly be considered as well, and most of the definitions I will make are general enough to cover all cases. However, I will focus on digital computation in most of the examples I look at.<br /><br />The next step is to formalize the idea of implementation by giving a mathematical criterion for whether a computation is implemented by a given mathematically described system. This will be done by requiring a mapping from states of the underlying system to formal states of the computation, and requiring the correct behavior of the states as they change over time.<br /><br />However, as will be seen in <a href="http://onqm.blogspot.com/2011/10/putnam-searle-chalmers-theorem.html">the next post</a>, this approach quickly runs into a problem: without restrictions on allowed mappings, 'almost any' physical system would seem to implement 'almost any' computation. This absurd result would imply that a rock implements the same computations as a brain. The likely solution is to impose restrictions on the allowed mappings, but finding a fully satisfactory set of restrictions has proven to be a difficult task. My proposals for this will be presented in subsequent posts; I think that I have been successful in finding (at least close to) the right set of restrictions.Jackhttp://www.blogger.com/profile/13062788545650525735noreply@blogger.com1tag:blogger.com,1999:blog-4392083100115115630.post-71900074676510253472011-09-26T15:14:00.004-04:002019-01-27T21:19:48.042-05:00The Computationalist approach to MeasureFor reference, before reading this post, read the posts on <a href="http://onqm.blogspot.com/2009/09/meaning-of-probability-in-mwi.html">Meaning of Probability in an MWI</a> and <a href="http://onqm.blogspot.com/2009/09/measure-of-consciousness-versus.html">Measure of Consciousness versus Probability</a>.<br />
<br />
Here, I'll outline the way in which the Computationalist philosophy of mind suggests that it might be possible to calculate the measures of consciousness for various outcomes.<br />
<br />
My goal in this is to either derive from the MWI the Born Rule (which allows us to calculate the probabilities in QM), thus providing strong support for the computationalist view of the MWI, or to show that the experimentally discovered Born Rule is inconsistent with these popular views of mind and physics.<br />
<br />
Computationalism is basically the idea that a brain gives rise to conscious observations due to its mathematically describable functioning: the motions of its parts together with the laws of physics that restrict and determine those motions.<br />
<br />
With the existence of consciousness thus being in principle describable mathematically, we can hypothesize that it might be possible to analyze the mathematical description of a physical system and, based on it, determine not only whether consciousness is present, but also that multiple different types of observations are being made in different parts of the system. We can make a reasonable guess as to the nature of those observations.<br />
<br />
We can also attempt to determine what the "quantity of consciousness" is for each significantly different type of observation. If we can, then we have a measure distribution, which we can compare to what the Born Rule predicts.<br />
<br />
In doing so, it is necessary to work with computations as proxies for the conscious observations which are assumed to accompany them. The nature of the link between computations and conscious observations, while an important topic in its own right, is for the most part not something we need to analyze for this purpose, which is fortunate as understanding it is famously hard.<br />
<br />
In comparing the computationalist + MWI prediction to the Born Rule, it would not be a problem if there are very small deviations which would not yet have been detected experimentally. In theory, such a situation could open the door to a straightforward experimental test of the computationalist approach to the MWI, which would be a great thing to have. However, it seems likely that IF small deviations are predicted then any such deviations would be too small to ever measure.<br />
<br />
On the other hand, if the measure distribution we get does NOT agree with the Born Rule even approximately, then the computationalist picture of the MWI would be refuted.<br />
<br />
Unfortunately, all of this is easier said than done. While the general idea of Computationalism is fairly popular, it has not yet been developed into something precise enough to carry out this program. Needless to say, experiments can be of no help in this regard.<br />
<br />
As will be explained in detail in forthcoming posts, I have made proposals as to how to make computationalism precise enough. While it is technically challenging to state necessary and sufficient conditions to be able to say that a given computation is carried out by a given system, I am satisfied that this can be done and that my proposal is at least quite close to the correct way to do so.<br />
<br />
Given such conditions, the next thing that we need is a way to determine the accompanying quantity of consciousness. Hopefully, we can avoid dealing with the link between computation and consciousness even here, by determining as a proxy the "quantity of each computation" that is present, and assuming that it is proportional to the quantity of consciousness.<br />
<br />
However, the link may not be so easily bypassed, as it turns out that there are different options for determining the "quantity of computation" and naturally the "right" choice would be the one that best suits the application we have in mind, namely consciousness. It is not clear, however, what that one is.<br />
<br />
Given the choices that I would tend to pick based upon subjective criteria of what "seems most natural" to me, the application of those choices to the standard physics of the MWI apparently DOES NOT give the Born Rule. This fact is not sufficient for me to declare that the MWI has been disproven; however, it does help motivate a search for <a href="http://onqm.blogspot.com/2009/10/mwi-proposals-that-include.html">MWI proposals that include modifications of physics</a>, as well as further work on how to determine "quantities of computations".<br />
<br />
I would like other people to learn of and study these issues. Perhaps, in the future, people who have studied these issues deeply and explored the implications and alternatives in great detail will find that the criteria which happen to give the Born Rule from the unmodified MWI are the same that they prefer for quite different reasons.Jackhttp://www.blogger.com/profile/13062788545650525735noreply@blogger.com0tag:blogger.com,1999:blog-4392083100115115630.post-76207353231435502502009-10-21T13:59:00.013-04:002011-10-30T03:15:53.798-04:00MWI proposals that include modifications of physicsPrevious: <a href="http://onqm.blogspot.com/2009/09/decision-theory-other-approaches-to-mwi.html">Decision Theory and other approaches to the MWI Born Rule, 1999-2009</a><br /><br />The greatest appeal of the Everett-style Many-Worlds Interpretation of QM - that is, the wave equation alone, or standard MWI - is its simplicity in terms of mathematics and physics. While all other interpretations need to add extra things to the wave equation - adding in hidden variables with dynamics of their own, or modifying the wave equation itself to include random wave collapse processes - the Everett MWI states that the standard wave equation alone can explain everything we observe.<br /><br />Yet, despite numerous attempts and claims to the contrary (and putting aside the possibilities for my own approach for now), the Born Rule probabilities have not been derived from the Everett picture. Thus, it may prove necessary to add new physics to our description of QM after all.<br /><br />However, some approaches attempt to retain much of the advantage in simplicity of the MWI, as well as its multiple-worlds character, while making such modifications. It's a promising idea, and these MWIs certainly take inspiration from the Everett-style MWI, but they add too much to the pure wave picture to satisfy true Everett-style MWI partisans.<br /><br />a) Hidden variables introduce complexity not only because of the extra dynamic equations, but because they require some choice of initial conditions. The wavefunction of QM also requires initial conditions, of course, but there is reason to hope that some simple equation could govern those initial conditions; the Hartle-Hawking 'no boundary' condition for the wavefunction of the universe is a well-known example of such a proposal (though it has problems of its own). Particle-like hidden variables do not seem amenable to such simple specification of initial conditions. However, if all possible sets of hidden variable initial conditions are equally real, then the overall simplicity of initial conditions for the multiverse is restored.<br /><br />'Continuum Bohmian Mechanics' (CBM) is the best-known example of this approach. Like the Pilot Wave Interpretation, it has particle-like hidden variables; but instead of just one set of them, it has a continuous distribution of such sets, which act much like a continuous fluid. In addition to the possibility of simpler initial conditions, CBM may be immune to the fatal flaw of the PWI, which is being 'many-worlds in denial'. In other words, in the 'single-world' PWI, with one set of hidden variables, most of the observers will end up being implemented by the many worlds of the wavefunction, so the hidden variables won't matter. With CBM, the number of hidden variable sets is also infinite, so a typical observer could depend on the hidden variables after all. 
(This latter claim still needs to be proven compatible with computationalist considerations, but it is plausible.)<br /><br />The hidden variables in the PWI follow the Born Rule, so CBM should be OK in that regard. But CBM retains the other features of the PWI that many physicists dislike, namely non-locality and a preferred reference frame. It is also not clear how well a relativistic version of the PWI works, and CBM inherits such problems. (Granted, no physics that works for quantum general relativity is known yet.) Also, even CBM is not as simple as the standard MWI.<br /><br />I regard CBM (and more generally, MWI's with hidden variables) as something useful to keep in mind, as it is one of the few interpretations of nonrelativistic QM that seems to actually work in terms of being compatible with the Born Rule and not having an 'MWI in denial' problem. But other possibilities must be thoroughly explored before I would consider endorsing CBM as being likely to be true.<br /><br />b) Another approach is to retain a pure wavefunction picture as in Everett's MWI, but to make the wavefunction be discrete instead of continuous. Discrete space is not what is meant here, but rather a discrete nature of the wavefunction itself. <a href="http://arxiv.org/abs/hep-th/0606062">Buniy et al</a> advocate such an approach.<br /><br />The most obvious way to do that might be to assume that the wavefunction is represented by an integer function on configuration space rather than a continuous function. (If configuration space is also discrete, that is one way an approximate discrete numerical representation of a continuous wave function might be done on a digital computer.)<br /><br />Buniy et al propose a somewhat different assumption, in which wavefunctions separated by a term of some minimum squared amplitude are considered to be the same.<br /><br />Because the wave function (or perhaps I should say its 'populated' region) is constantly and rapidly expanding into new areas of configuration space (e.g. as entropy increases), its numerical value is constantly imploding. I will call this the Wavefunction Value Implosion (WVI). If the universe is finite, then this effect will be finite, but exponentially large as a function of the number of particles in the universe. Thus, a discrete wavefunction could not be detected experimentally if its discrete nature is small enough, until such time as the WVI brings the populated part of the wavefunction below that scale, and then presumably time evolution will radically change or effectively stop; I will call this the Crash.<br /><br />Discrete physics has a certain appeal to some people, independent of any possible role in quantum mechanics. Wolfram's book "A New Kind of Science" discusses such views. Also, the idea that all possible mathematical universes physically exist (the Everything Hypothesis, which will be discussed in a later post) may be somewhat more tractable if it is restricted to digital systems (though, despite its undeniable appeal, it still has problems even then).<br /><br />If the wavefunction is discrete, would that help explain the Born probabilities? Buniy et al argue that it would, by shoring up the old "frequency operator" attempted derivation by extending it to finite numbers of measurements rather than infinite. This argument notes that, after repeated measurements, terms in the wavefunction which don't have the Born frequencies have much smaller amplitudes than the terms with the 'right' frequencies. 
With a minimum amplitude cutoff, most of the un-Born terms would be eliminated. This argument does not seem very satisfactory, as we are interested in situations with small numbers of measurements, and thus small factors of difference in amplitude, while the digital cutoff would have to be very far from a significant fraction of the total amplitude if the Crash is not yet upon us. In practical situations, other factors would affect amplitudes much more. For example, entropy production is not associated with low probability, but it results in numerous sub-branches each of which shares a fraction of the original squared amplitude.<br /><br />Shoring up the 'Mangled Worlds' argument would seem a more promising approach. There are many sub-branches comprising each macroscopically distinguishable world, and they tend to have a log-normal distribution in squared amplitude. As Hanson showed, a cutoff in the right range of squared amplitudes would lead to Born Rule probabilities. This cutoff must be uniform across branches, which Hanson's 'mangling' mechanism by larger branches actually fails to provide, but a digital cutoff could provide it. I will tentatively say that this is a possible mechanism for the Born Rule, though I need to study it more before I can say for sure that there are no problems that would ruin it. In particular, if the number of worlds changes too much over time or the era in which the Born Rule holds is too short, that would indicate a problem.<br /><br />c) Another mechanism that improves on 'Mangled Worlds' is my own idea in which random noise in the initial wavefunction means that larger volumes in configuration space per implemented computation are required for low-amplitude sub-branches, which can lead to the Born Rule. This requires new physics in the form of special initial conditions, but hopefully not in terms of time evolution. It is possible that this leads to a Boltzmann Brains problem. I will discuss this, as well as an alternative in which the Born Rule is due to a special way to count computations (which avoids new physics - <em>if</em> it can be justified) in later posts.<br /><br />d) The Everything Hypothesis (that all possible mathematical structures exist) can be used directly in an attempt to predict what a typical observer would observe. Some have argued that this explains what we observe, including the Born Rule. The Everything Hypothesis will be discussed in a post of its own.<br /><br />e) Other MW schemes for modifying physics have been proposed.<br /><br />One example is Michael Weismann's idea involving sudden splitting of existing worlds into proposed new degrees of freedom, with a higher rate of such splitting events for higher amplitude worlds. The problem with it is that if new worlds are constantly being produced, then the number of observers would be growing exponentially. The probability of future observations, as far into the future as possible, would be much greater than that of our current observations. Thus, the scheme must be false unless we are highly atypical observers, which is highly unlikely.<br /><br />David Strayhorn had an idea based on general relativity, in which different topologies correspond to different sub-branches of the wavefunction. This approach is not well-developed as of yet and it has problems that I think will prevent it from working. 
I discussed it in various posts on the <a href="http://tech.groups.yahoo.com/group/OCQM/">OCQM</a> yahoo group.Jackhttp://www.blogger.com/profile/13062788545650525735noreply@blogger.com26tag:blogger.com,1999:blog-4392083100115115630.post-53584793945288302742009-09-26T18:13:00.018-04:002009-10-23T16:55:03.490-04:00Decision Theory & other approaches to the MWI Born Rule problem, 1999-2009In the <a href="http://onqm.blogspot.com/2009/09/early-attempts-to-derive-born-rule-in.html">previous post</a>, I explained the early attempts to derive the Born Rule for the MWI. These attempts required assumptions for which no justification was given; as a result, critics of the MWI pointed to the lack of justification for the Born Rule as a major weakness of the interpretation.<br /><br />MWI supporters often had to resort to simply postulating the Born Rule as an additional law of physics. That is not as good as a derivation, which would be a great advantage for the MWI, but it at least puts the MWI on the same footing as most other interpretations. However, it is by no means clear that it is legitimate to do that, either. Many people think that branch-counting (or some form of observer-counting) must be the basis for probabilities in an MWI, as Graham had suggested. Since branch-counting gives the wrong probabilities (as Graham failed to realize), a critic might argue that experiments (which confirm the Born rule) show the MWI must be false.<br /><br />Thus, MWI supporters were forced to argue that branch-counting did not, in fact, matter. The MWI still had supporters due to its mathematical simplicity and elegance, but when it came to the Born Rule, it was in a weak position.<br /><br />In the famous <a href="http://www.hedweb.com/everett/everett.htm#probabilities">Everett FAQ</a> of 1995, Price cited the old 'infinite measurements frequency operator' argument. That was my own first encounter with the problem of deriving the Born Rule for the MWI, and despite my being an MWI supporter, the finite-number-of-measurements hole in the infinite-measurements argument was immediately obvious to me.<br /><br />5) The decision-theoretic approach to deriving the Born Rule<br /><br />In 1999, David Deutsch created a new approach to deriving the Born Rule for the MWI, based on decision theory. He wrote "Previous attempts ... applied only to infinite sets of measurements (which do not occur in nature), and not to the outcomes of individual measurements (which do). My method is to analyse the behaviour of a rational decision maker who is faced with decisions involving the outcomes of future quantum-mechanical measurements. I shall prove that if he does not assume [the Born Rule], or any other probabilistic postulate, but does believe the rest of quantum theory, he necessarily makes decisions as if [the Born Rule] were true."<br /><br />Deutsch's approach quickly attracted both supporters and critics. David Wallace came out with a series of papers that defended, simplified and built on the decision theory approach, which is now known as the Deutsch-Wallace approach.<br /><br />Deutsch's derivation contained an implicit assumption, which Wallace made explicit, and called 'measurement neutrality'. Basically, it means that the details of how a measurement is made don't matter. For example, if a second measurement is made along with the first, it is assumed that the probabilities for the outcomes of the first won't be affected. This implies that unitary transformations, which preserve the amplitudes, don't matter. 
That implies 'equivalence', which states that two branches of equal amplitudes have equal probabilities, and which is essentially equivalent to the Born Rule. The Born Rule is then derived from 'equivalence' using simple assumptions cast in the language of decision theory.<br /><br />Wallace acknowledged that 'measurement neutrality' was controversial, admitting "The reasons why we treat the state/observable description as complete are not independent of the quantum probability rule." Indeed, if probabilities depend on something other than amplitudes, then clearly they can change under unitary transformations. <br /><br />So he offered a direct defense of the 'equivalence' assumption, which formed the basis of the paper that was for a long time considered the best statement of the DW approach, certainly as of the 2007 conferences. New Scientist magazine proclaimed that his derivation of the Born Rule in the MWI was "rigorous" and was forcing people to take the MWI seriously.<br /><br />His basic argument was that things that the person making a decision doesn't care about won't matter. This included the number of sub-branches, but he also took care to argue that the number of sub-branches can't matter because it is not well-defined.<br /><br />Consider Albert's hypothetical fatness rule, in which probabilities are proportional both to the squared amplitudes and to the observer's mass. This obviously violates 'equivalence'. According to Wallace's argument, the decider should ignore his mass unless it comes into play for the decision, so that is impossible. But it is a circular argument; the decider <em>should</em> care about his mass if it in fact affects the probabilities.<br /><br />My critique of Wallace's approach is presented in more detail <a href="http://arxiv.org/abs/0808.2415">here</a>, where I also cover his more recent paper.<br /><br />In his 2009 paper, Wallace takes a different approach. Perhaps recognizing that assuming 'equivalence' is practically the same as just assuming the Born Rule, he makes some other assumptions instead, couched in the language of decision theory, which allow him to derive 'equivalence'. The crucial new assumption is what he calls 'diachronic consistency'. In addition to consistency of desires over time, it <em>contains the assumption of conservation of measure as a function of time</em>, which there is no justification to assume. Of course, the classical version of diachronic consistency is unproblematic, and only a very careful reading of the paper would reveal the important difference if it were not for the fact that Wallace helpfully notes that Albert's fatness rule violates it.<br /><br />6) Zurek's envariance<br /><br />W. Zurek attempted to derive the Born Rule using symmetries that he called 'envariance' or environment-assisted invariance. While interesting, his assumptions are not justified. The most important assumption is that all parts of a branch, and all observers in a branch, have the same "probability". Albert's fatness rule provides an obvious counterexample. I also note that a substate with no observers in it can not meaningfully be assigned any effective probability.<br /><br />He uses this, together with another unjustified assumption that is similar to locality of probabilities, to obtain what Wallace called 'equivalence' and then the Born Rule from that. 
Because the latter part of Zurek's derivation is similar to the DW approach, the two approaches are sometimes considered similar, although Zurek does not invoke decision theory.<br /><br />7) Hanson's Mangled Worlds<br /><br />Robin Hanson came up with a radical new attempt to derive the Born Rule in 2003. It was similar to Graham's old world-counting proposal in that Hanson proposed to count sub-branches of the wavefunction as the basis for the probabilities.<br /><br />The new element Hanson proposed was that the dynamics of sub-branches of small amplitude would be ruined, or 'mangled', by interference from larger sub-branches of the wavefunction. Thus, rather than simply count sub-branches, he would count only the ones with large enough amplitude to escape the 'mangling'.<br /><br />Due to microscopic scattering events, a log-normal squared-amplitude distribution of sub-branches arises, as it is a random walk in terms of multiplication of the original squared-amplitude. Interference ('mangling') from large amplitude branches imposes a minimum amplitude cutoff. If the cutoff is in the right numerical range and is uniform for all branches, then due to the mathematical form of the log-normal function, the number of branches above the cutoff is proportional to the square of the original amplitude, yielding the Born Rule.<br /><br />Unfortunately, this Mangled Worlds picture relies on many highly dubious assumptions; most importantly, the uniformity of the ‘mangling’ cutoff. Branches will not interfere much with other branches unless they are very similar, so there will be no uniformity; small-amplitude main branches will have smaller sub-branches but also smaller interference from large main branches and thus a smaller cutoff.<br /><br />Even aside from that, while the idea of branch-counting has some appeal, it is clear that observer-counting (with computationalism, implementation-counting) is what is fundamentally of interest. Nonetheless, 'Mangled Worlds' is an interesting proposal, and is the inspiration for a possible approach to attempt to count implementations of computations for the MWI, which will be discussed in more detail in later posts. <em>That does require some new physics though</em>, in the form of random noise in the initial conditions which acts to provide the uniform cutoff scale that is otherwise not present.<br /><br />In the <a href="http://onqm.blogspot.com/2009/10/mwi-proposals-that-include.html">next post</a>, proposals for MWIs that include modifications of physics will be discussed.Jackhttp://www.blogger.com/profile/13062788545650525735noreply@blogger.com17tag:blogger.com,1999:blog-4392083100115115630.post-42874245367476424882009-09-23T14:57:00.016-04:002009-10-21T14:10:08.089-04:00Early attempts to derive the Born Rule in the MWIWhen Everett wrote his thesis in 1957 on the '"Relative State" Formulation of Quantum Mechanics', he certainly needed to address how the Born Rule probabilities fit into his new interpretation of QM. While the MWI remains provocative even today, it was not taken seriously in 1957 except by a few people, to the extent that Everett had to call it "Relative State" rather than "Many Worlds". So it is perhaps fortunate that he did not realize the true challenges of fitting the Born Rule into the MWI, which could have derailed his paper. Instead, he came up with a short derivation of the Born Rule, using assumptions that he did not realize lacked justification.<br /><br />Of course, the Born Rule issue has long since returned to haunt the MWI. 
Historically, what has happened several times was that a derivation of the Born Rule that seemed plausible to MWI supporters was produced, but soon it attracted critics. After a few years it became clear to most physicists that the critics were right, and the MWI fell into disrespect until a new justification for the Born Rule was produced. This cycle continues today, with the decision-theoretic Deutsch-Wallace approach being considered the best by many, and now attracting growing (and deserved) criticism.<br /><br />When considering claimed derivations of the Born Rule in the MWI, it is often useful to keep in mind an 'alternative rule' that is being ruled out, and to question the justification for doing so. Two useful ones are as follows:<br /><br /><strong>a) The unification rule:</strong> <em>All observations that exist have the same measure.</em> In this case, branch amplitudes don't matter, as long as they are nonzero (and they always are, in practice).<br /><br /><strong>b) David Albert's fatness rule:</strong> <em>The measure of an observer is proportional to the squared amplitude (of the branch he's on) multiplied by his mass.</em> Here, amplitudes matter, but so does something else. This one is especially interesting because it illustrates that not all observers necessarily have the same measure, even if they are on the same branch of the wavefunction. While it is obviously implausible, it's a useful stand-in for other possibilities that may seem more justifiable, such as using the number of neurons in the observer's brain instead of his mass, or any other detail of the wavefunction.<br /><br />Another useful thing to keep in mind is the possibility of a modified counterpart to quantum mechanics, in which squared-amplitude would not be a conserved quantity. We would expect that the Born Rule might no longer hold, but some other Rule should, even in the absence of conserved quantities. Presumably, if the modification is small, so would be any departure from the Born Rule. Thus, one should not think that conserved quantities must have any special a priori importance without which no measure distribution is possible.<br /><br />Let us examine a few of the early attempts to derive the Born Rule within the MWI:<br /><br />1) Everett's original recipe<br /><br />In <a href="http://www.univer.omsk.su/omsk/Sci/Everett/paper1957.html">Everett's 1957 paper</a>, he models an observer in a fairly simple way, considering only a set of memory elements. This is a sort of rough approximation of a computational model, but without the dynamics (which are crucial for a well-defined account of computation). Thus, Everett was a visionary pioneer in applying computationalist thinking to quantum mechanics, but he never confronted the complexity of what would be required to do a satisfactory job of it.<br /><br />He assumed that the measure of a branch would be a function of its amplitude only, and thus would not depend on the specific nature of that branch. This is a very strong assumption, and arguably contains his next assumption as a special case already. [A more general approach would allow other properties to be considered, such as in Albert's fatness rule.]<br /><br />[Note: Everett's use of the term 'measure' is not stated to refer specifically to the amount of consciousness, but in this context, the role it plays is essentially the same as if it did. 
Some authors use 'measure of existence' to specifically mean the squared amplitude by definition; obviously Everett did not, since he wanted to prove that his measure was equal to the squared amplitude. I recommend avoiding overly suggestive terms (like 'weight') for the squared amplitude.]<br /><br />Next, he assumed that measure is 'additive' in the sense that if two orthogonal branches are in superposition, they can be regarded as a single branch, and the same function of amplitude must give the same total measure in either case.<br /><br />If the definition of a 'branch' is arbitrary in allowing combinations of orthogonal components, the 'additivity' assumption makes sense, since it means that it does not matter how the branches are considered to be divided up into orthogonal components. [An argument similar to that would be presented years later in Wallace's 2005 paper, in which Wallace defended the assumption of 'equivalence' (branches of equal amplitude must have equal measure) against the idea of sub-branch-counting, based on the impossibility of defining the specific number of sub-branches. Everett did not get into such detail.]<br /><br />With the previous assumption, 'additivity' would only hold if the measure is proportional to the squared amplitude; thus, he concluded that the Born Rule holds.<br /><br />Everett considered the additivity requirement equivalent to saying that measure is conserved; thus, when a branch splits into two branches, the sum of the new measures is equal to the measure of the original branch. He gave no justification for the conservation of measure, perhaps considering it self-evident.<br /><br />In classical mechanics, conservation of probability is self-evident because the probability just indicates something about what state the single system is likely to be in. If the probabilities summed to 2, for example, a single system couldn't explain it; perhaps there would have to be 2 copies instead of one. Yet the existence of multiple copies is precisely what the MWI of QM describes, and in this case, there is no a priori reason to believe that the total measure can not change over time.<br /><br />Everett's attempted derivation of the Born Rule is not considered satisfactory even by other supporters of the MWI, because he did not justify his assumptions. Soon, other attempts to explain the probabilities emerged.<br /><br />2) Gleason's Theorem<br /><br />Also discovered in 1957, Gleason's theorem shows that if probabilities are non-contextual, meaning that the probability of a term in the superposition does not depend on what other terms are in the superposition, then the only formula which could give the probabilities is based on squared expansion coefficients. It is straightforward to argue that the correct expansion to use is that for the current wavefunction; thus, these coefficients are the amplitudes, which gives Born's Rule.<br /><br />Unfortunately, there is no known justification for assuming non-contextuality of the probabilities. If measure is not conserved, the probabilities can not generally be noncontextual. 
Gleason's theorem is sometimes cited in attempts to show that the MWI yields the Born Rule, but it is not a popular approach since usually those attempts make (unjustified) assumptions which are strong enough to select the Born Rule without having to rely on the more complicated math required to prove Gleason's theorem.<br /><br />3) The infinite-measurements limit and its frequency operator<br /><br />The frequency operator is the operator associated with the observable that is the number of cases in a series of experiments in which a particular result occurs, divided by the total number of experiments. If it is assumed that just the frequency itself is measured, and if the limit of the number of experiments is taken to infinity, the eigenvalue of this frequency operator is unique and equal to the Born Rule probability. The quantum system is then left in the eigenstate with that frequency; all other terms have zero amplitude, as shown by Finkelstein (1963) and Hartle (1968).<br /><br />This scheme is irrelevant for two reasons. First, an infinite number of experiments can never be performed. As a result, terms of all possible frequencies remain in the superposition. Unless the Born Rule is assumed, there is no reason to discard branches of small amplitude. Assuming that they just disappear is equivalent to assuming collapse of the wavefunction.<br /><br />Second, in real experiments, individual outcomes are recorded as well as the overall frequency. As a result, there are many branches with the same frequency and the amplitude of any one branch tends towards zero as the number of experiments is increased. If one discards branches that approach zero amplitude in the limit of infinite experiments, then all branches should be discarded. Furthermore, prior to taking the infinite limit, the very largest individual branch is the one where the highest amplitude outcome of each individual experiment occurred, if there is one.<br /><br />A more detailed critique of the frequency operator approach is given <a href="http://arxiv.org/abs/quant-ph/0409144">here</a>. The same basic approach of using infinite ensembles of measurements has been taken recently by certain Japanese physicists, Tanaka (who seems unaware of Hartle's work) and (separately) Wada. Their work contains no significant improvements on the old, failed approach.<br /><br />4) Graham's branch counting<br /><br />Neil Graham came out with a paper in 1973 that appears in the book "The Many Worlds Interpretation of Quantum Mechanics" along with Everett's papers and others.<br /><br />Graham claimed that the actual number of fine-grained branches is proportional to the total squared amplitude of a coarse-grained macroscopically defined branch. Such sub-branches would be produced by splits due to microscopic scattering events and so on which act as natural analogues of measurements.<br /><br />If it were true, it could also begin to give some insight into why the Born Rule would be true, beyond just a mathematical proof; that is, each fine-grained branch would presumably support the same number of copies of the observer. (That assumption would still need to be explained, of course.)<br /><br />Unfortunately, and even aside from the lack of precise definition for fine-grained branches, he failed to justify his statistical claims, which stand in contradiction to straightforward counting of outcomes. 
He simply assumed that fine-grained branches would on average have equal amplitudes regardless of the amplitude of the macroscopic branch that they split from.<br /><br />In the <a href="http://onqm.blogspot.com/2009/09/decision-theory-other-approaches-to-mwi.html">next post</a>, the more recent attempts (other than my own) to derive the Born Rule within the MWI will be described.Jackhttp://www.blogger.com/profile/13062788545650525735noreply@blogger.com3tag:blogger.com,1999:blog-4392083100115115630.post-9185031230783551722009-09-21T18:09:00.007-04:002020-05-17T14:34:31.328-04:00Why 'Quantum Immortality' is falseIn the previous posts, I explained that <a href="http://onqm.blogspot.com/2009/09/meaning-of-probability-in-mwi.html">effective 'probabilities'</a> in an MWI are proportional to the amount (measure) of consciousness that sees the various outcomes. Because this measure need not be a conserved quantity, this can lead to nonclassical selection effects, with 'probabilities' for a given outcome still changing as a function of time even after the outcomes have been observed and recorded. That can lead to an <a href="http://onqm.blogspot.com/2009/09/measure-of-consciousness-versus.html">illusion of nonlocality</a>, which can only be properly understood by thinking in terms of the measures directly, as opposed to thinking only in terms of 'probabilities'.<br />
<br />
The most extreme example in which it is crucial to think in terms of the measures, rather than 'probabilities' only, is the so-called 'Quantum Suicide' (QS) experiment. Failure to realize this leads to a literally dangerous misunderstanding. The issue is explained at length in my eprint "<a href="http://arxiv.org/abs/0902.0187">Many-Worlds Interpretations Can Not Imply 'Quantum Immortality'</a>".<br />
<br />
The idea of QS is as follows: Suppose Bob plays Russian Roulette, but instead of using a classical revolver chamber to determine if he lives or dies, he uses a quantum process. In the MWI, there will be branches in which he lives, and branches in which he dies. The QS fallacy is that, as far as he is concerned, he will simply find himself to survive with no ill effects, and that the experiment is therefore harmless to him.<br />
<br />
A common variation is for him to arrange a bet, such that he gets rich in the surviving branches only, which would thus seem to benefit him. Of course in the branches where he does not survive, his friends will be upset, and this is often cited as the main reason for not doing the experiment.<br />
<br />
That it is a fallacy can be seen in several ways. Most basically, the removal of copies of Bob in some branches does nothing to benefit the copies in the surviving branches; they would have existed anyway. Their measure is <em>no larger than it would have been</em> without the QS - no extra consciousness magically flows into the surviving branches, while the measure in the dead branches is removed. If our utility function states that more human life is a good thing, then clearly the overall measure reduction is bad, just as killing your twin would be bad in a classical case.<br />
<br />
It is true that the effective probability (conditional on Bob making an observation after the QS event) of the surviving branches becomes 1. That is what creates the QS confusion; in fact, it leads to the fallacy of "Quantum Immortality" - the belief that since there are some branches in which you will always survive, then for practical purposes you are immortal.<br />
<br />
But such a conditional effective probability being 1 is not at all the same as saying that the probability that Bob will survive is 1. Effective probability is simply a ratio of measures, and while it often plays the role we would expect a probability to play, this is not a case in which such an assumption is justified.<br />
<br />
We can get at what corresponds, for practical purposes, to the concept of 'the probability that Bob will survive' in a few equivalent ways. In a case of causal differentiation, it is simple: the fraction of copies that survive is the probability we want, since the initial copy of Bob is effectively a randomly chosen one.<br />
<br />
A more general argument is as follows: Suppose Bob makes an observation at 12:00, undergoes a QS with a 50% survival chance at 12:30, and his surviving copies make an observation at 1:00. Given that Bob is observing at either 12:00 or 1:00, what is the effective probability that it is 12:00? (Perhaps he forgets the time, and wants to guess it in advance of looking at a clock, so that the Reflection Argument can be used here.) The answer is the ratio of the measure of observations at 12:00 to the total measure at both times, which is therefore 2/3.<br />
<br />
That is just what we would expect if Bob had a 50% chance to survive the QS: Since there are twice as many copies at 12:00 compared to 1:00, he is twice as likely to make the observation at 12:00.<br />
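<br />
To make the arithmetic explicit, here is a minimal Python sketch of that measure-counting argument. The specific numbers (one unit of measure for the 12:00 copies of Bob, a 50% survival chance) are illustrative assumptions, not anything derived from the physics:<br />
<pre>
# Measure-counting sketch for the quantum suicide example (illustrative numbers).
# Before the QS event (12:00) there is 1 unit of Bob-measure; after a QS with a
# 50% survival chance (1:00), only half of that measure remains.

measure_at_noon = 1.0        # total measure of Bob-copies observing at 12:00
survival_fraction = 0.5      # fraction of measure that survives the QS event
measure_at_one = measure_at_noon * survival_fraction  # measure observing at 1:00

total = measure_at_noon + measure_at_one

# Effective probability that a given observation is the 12:00 one vs. the 1:00 one
p_noon = measure_at_noon / total
p_one = measure_at_one / total
print(f"P(observing at 12:00) = {p_noon:.3f}")  # 0.667
print(f"P(observing at 1:00)  = {p_one:.3f}")   # 0.333

# Conditional on observing after the QS event, survival looks certain (effective
# probability 1), but the measure at 1:00 is still only half of what it was at
# 12:00 - which is what a 50% chance of surviving means for practical purposes.
print(f"measure ratio 1:00 / 12:00 = {measure_at_one / measure_at_noon:.2f}")
</pre>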
<br />
The same reasoning applies to death in general: since your measure falls off sharply at the end of a normal lifespan, most of your observations will be made within the span of your normal lifetime. Thus 'Quantum Immortality' (QI) is a fallacy; for practical purposes, people are just as mortal in the MWI as in classical models.<br />
<br />
It's worth mentioning another argument against a person's measure being constant:<br />
<br />
1) "MWI immortality" believers typically think that a person's total amount of consciousness does not change even if their quantum amplitude changes, while I argue that the contrary is true.<br />
<br />
2) In the MWI, there are definitely some (very small but nonzero) amplitudes for branches that contain Boltzmann brains (brains formed by uncoordinated processes such as thermal fluctuations) very early on. The exact amplitudes are irrelevant to the point being made.<br />
<br />
3) Once a Boltzmann brain that matches yours has some amplitude, you start to exist. It's true that evolution, much later, will also cause <em>much larger amplitude</em> branches to contain versions of you. But if the belief described in point #1 were true, that would <em>not</em> mean that your amount of consciousness increased. Thus, you would still be on even footing with the other Boltzmann brains. That is not plausible, so the immortality belief is not plausible.<br />
<br />
Next up: <a href="http://onqm.blogspot.com/2009/09/early-attempts-to-derive-born-rule-in.html">Early attempts to derive the Born Rule in the MWI</a>Jackhttp://www.blogger.com/profile/13062788545650525735noreply@blogger.com51tag:blogger.com,1999:blog-4392083100115115630.post-67165959174201268872009-09-16T13:47:00.005-04:002016-02-02T18:12:07.106-05:00Measure of Consciousness versus ProbabilityIn the last post, <a href="http://onqm.blogspot.com/2009/09/meaning-of-probability-in-mwi.html">Meaning of Probability in an MWI</a>, it was explained that in a deterministic Many-Worlds model, with known initial conditions, that which plays the role of a probability for practical purposes is the ratio<br />
<br />
(the measure (amount) of consciousness which sees a given outcome) <br />
/ (the total measure summed over outcomes)<br />
<br />
I call that the <em>effective probability</em> of the outcome.<br />
<br />
Although the effective probability is quite similar to what we normally think of as a probability in terms of its practical uses, there are also important differences, which will be explored here.<br />
<br />
The most important differences stem from the fact that measure of consciousness need not be a conserved quantity. By definition, probabilities sum to 1, but that is not all there is to it. In a traditional, single-world model, a transfer of probability from one outcome to another indicates a causal process, while the total measure remains constant over time. This is not necessarily so in an MW model.<br />
<br />
For example, suppose there are two branches, A and B. A has 10 observers at all times. B starts off with 5 observers at T0, which increases to 10 observers at T1 and to 20 observers at T2. All observers have the same measure, and observe which branch they are in.<br />
<br />
So the effective probability of A starts off at 2/3 at T0, while the effective probability of B is 1/3. At T1, A and B have effective probabilities of 1/2 each. At T2, the effective probability of A is 1/3 and that of B is 2/3.<br />
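<br />
Here is a minimal Python sketch of this bookkeeping, using the observer counts from the example above (each observer is assumed to carry one unit of measure):<br />
<pre>
# Bookkeeping for the two-branch example above.
# Each observer is assumed to carry one unit of measure.
counts = {
    "T0": {"A": 10, "B": 5},
    "T1": {"A": 10, "B": 10},
    "T2": {"A": 10, "B": 20},
}

def effective_probabilities(branch_measures):
    """Effective probability = measure of a branch / total measure over branches."""
    total = sum(branch_measures.values())
    return {branch: m / total for branch, m in branch_measures.items()}

for time, branch_counts in counts.items():
    probs = effective_probabilities(branch_counts)
    print(time, {b: round(p, 3) for b, p in probs.items()})

# T0 {'A': 0.667, 'B': 0.333}
# T1 {'A': 0.5, 'B': 0.5}
# T2 {'A': 0.333, 'B': 0.667}
# B's effective probability rises even though nothing flows from A to B;
# only the measure within B itself has changed.
</pre>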
<br />
There are two important effects here. First, the effective probability of B increased with time. In a single-world situation, that would mean that a system which was actually in A was more likely to change over to B as time passes. But in this MW model, there is no transfer of systems, just changes in B itself.<br />
<br />
This means that probability changes that would require nonlocality in a single-world model don't necessarily mean nonlocality in an MW model. If A is localized at X1, and B is localized at X2, which is a light-year away, there need not be a year's delay before the effective probability of B suddenly increases.<br />
<br />
In a single-world local hidden variable model, probability must be locally conserved, so that the change of probability in a region is equal to the transitions into and out of adjacent regions only. This need not be so in an MW model.<br />
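<br />
As a toy illustration of what 'locally conserved' means here, the following sketch shows discrete local conservation on a one-dimensional lattice; the regions, currents, and numbers are arbitrary assumptions chosen only to show that the change in each region's probability comes entirely from flows to and from its neighbors:<br />
<pre>
# Toy discrete version of local conservation of probability in a single-world
# model: the change in each region's probability comes only from currents to
# and from adjacent regions. All numbers are arbitrary illustrations.

p = [0.2, 0.5, 0.3]    # probability in regions 0, 1, 2
j = [0.10, -0.05]      # current from region i into region i+1 during one step

def step(p, j):
    new_p = list(p)
    for i, flow in enumerate(j):
        new_p[i] -= flow       # probability leaving region i
        new_p[i + 1] += flow   # arriving in the adjacent region i+1
    return new_p

p_next = step(p, j)
print(p_next, sum(p_next))     # [0.1, 0.65, 0.25], total stays 1.0
</pre>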
<br />
The second important effect of nonconservation of measure in an MW model is that the total measure changes as a function of time. Observers can measure not only what branch they are on, but also what time it is. They will be more likely to observe times with higher measure than times with lower measure, just as with any other kind of observation.<br />
<br />
A good example of this is a model proposed by Michael Weissman - a modification of physics designed to make world-counting yield the Born Rule. His scheme involved sudden splitting of existing worlds into proposed new degrees of freedom, with a higher rate of such splitting events for higher amplitude worlds. The problem with it is that if new worlds are constantly being produced, then the number of observers would be growing exponentially. The probability of future observations, as far into the future as possible, would be much greater than that of our current observations. Thus, the scheme must be false unless we are highly atypical observers, which is highly unlikely.<br />
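<br />
A toy calculation shows why such growth is a problem; the doubling rate and time horizon below are arbitrary assumptions, chosen only to illustrate how quickly present-day observations become atypical:<br />
<pre>
# Toy illustration: if total observer measure grows exponentially with time,
# almost all observations occur at the latest times, so observations like ours
# would be highly atypical. The doubling rate and horizon are arbitrary.

growth_per_step = 2.0   # total measure doubles each time step (assumption)
n_future_steps = 50     # how far into the future we count (assumption)

measure_now = 1.0
future = [measure_now * growth_per_step ** k for k in range(1, n_future_steps + 1)]

p_now = measure_now / (measure_now + sum(future))
print(f"effective probability of observing 'now': {p_now:.2e}")
# about 4.4e-16: observers with observations like ours would be a vanishingly
# small fraction of all observers under such a scheme.
</pre>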
<br />
Edit (2/2/16): See however <a href="http://onqm.blogspot.com/2016/02/what-to-maximize-when-guessing-right.html">this post</a>. If the SIA is correct, the above argument against Weissman's idea fails, since the SIA gives extra likelihood to theories with more observers, exactly cancelling out the effect of reducing the fraction of observers which have observations like ours. However, as discussed in that post, I don't think the SIA is the right thing to use for comparing MWIs.<br />
<br />
It is important to realize that since changes in measure mean changes in the number of observers, decreases in measure are undesirable. This will be discussed further in <a href="http://onqm.blogspot.com/2009/09/why-quantum-immortality-is-false.html">the next post</a>.Jackhttp://www.blogger.com/profile/13062788545650525735noreply@blogger.com0tag:blogger.com,1999:blog-4392083100115115630.post-57563535504260858282009-09-11T13:43:00.010-04:002019-08-11T19:34:30.816-04:00Meaning of Probability in an MWIThe quantitative problem of whether the Born Rule for quantum probabilities is consistent with the many-worlds interpretation is the key issue for interpretation of QM. Before addressing that, it is important to understand in general what probabilities mean in a many-worlds situation, because ideas from single-world thinking can lead to unjustified assumptions regarding how the probabilities must behave. Many failed attempts to derive the Born Rule make that mistake.<br />
<br />
The issue of what probabilities mean in a Many-Worlds model is covered in greatest detail in my eprint "<a href="http://arxiv.org/abs/0902.0187">Many-Worlds Interpretations Can Not Imply 'Quantum Immortality'</a>". Certain work by Hilary Greaves is directly relevant.<br />
<br />
First, note that for a single-world, deterministic model, such as classical mechanics provides, <a href="https://drive.google.com/open?id=1A8PF0w7L43o36zFaHJ5DlZ6GAWXSvZbw">probabilities are subjective</a>. The classic example is tossing a coin: the outcome will depend deterministically on initial conditions, but since we don't know the details, we have to assign a subjective probability to each outcome. This may be 50%, or it may be different, depending on other information we may have such as the coin's weight distribution or a historical record of outcomes. Bayes' rule is used to update prior probabilities to reflect new information that we have.<br />
<br />
In such a model, consciousness comes into play in a fairly trivial way: As long as we register the outcome correctly, our experienced outcome will be whatever the actual outcome was. Thus, if we are crazy and always see a coin as being heads up, then the probability that we see "up" is 100%. Physics must explain this, but the explanation will be grounded in details of our brain defects, not in the physics of coin trajectories.<br />
<br />
By contrast, in any normal situation, the probability that we see "up" is simply equal to the probability that the coin lands face up. [Even this is really nontrivial: it means that randomly occurring "Boltzmann brains" are not as common as "normal people". As we will see, if we believe in computationalism, it also means that rocks don't compute everything that brains do, which is nontrivial to prove.]<br />
<br />
In a many-worlds situation, it may still be the case that we don't know the initial conditions. However, even if we do know the initial conditions, as we do for many simple quantum systems, there would still be more than one outcome and there is some distribution of observers that see those outcomes.<br />
<br />
Assume that we do know the initial conditions. The question of interest becomes (roughly speaking): 'What is the probability of being among the observers that see a particular given outcome?'<br />
<br />
It is important to note that in a many-worlds situation, the total number of observers might vary with time, which can lead to observer selection effects not seen in single-world situations. Because of this, the fundamental quantity of interest is not probability as such, but rather the number, or quantity, of observers that sees each outcome. The amount of conscious observers that see a given outcome will be called the <em>measure</em> (of consciousness) for that outcome.<br />
<br />
In a deterministic MWI with known initial conditions, it will be seen that what plays the role of the “probability” of a given observation in various situations relates to the <em>commonness</em> of that observation among observers.<br />
<br />
Define the 'effective probability' for a given outcome as (the measure of observers that see a given outcome) divided by (the total measure summed over observed outcomes). <br />
<br />
1) The Reflection Argument<br />
<br />
When a measurement <em>has already been performed</em>, but the result has not yet been revealed to the experimenter, he has <strong>subjective uncertainty</strong> as to which outcome occurred in the branch of the wavefunction that he is in.<br />
<br />
He must assign some subjective probabilities to his expectations of seeing each outcome when the result is revealed. He should set these equal to the effective probabilities. For example, if 2/3 of his copies (or measure) will see outcome A while the other 1/3 see B, he should assign a subjective probability to A of 2/3.<br />
<br />
Why? Because that way, the amount of consciousness seeing each outcome will be proportional to its subjective probability, just as one would expect on average for many trials with a regular probability.<br />
<br />
See <a href="http://onqm.blogspot.com/2012/05/why-do-anthropic-arguments-work.html">Why do Anthropic Arguments work?</a> for more details.<br />
<br />
2) Theory Confirmation<br />
<br />
It may be that an experimental <em>outcome is already known</em>, but the person does not know what situation produced it. For example, suppose a spin is measured and the result is either “up” or “down”. The probability of each outcome depends on the angle that the preparation apparatus is set to. There are two possible preparation angles; angle A gives a 90% effective probability for spin up, while angle B gives 10%. Bob knows that the result is “up”, but he does not know the preparation angle.<br />
<br />
In this case, he will probably guess that the preparation angle was A. In general, Bayesian updating should be used to relate his prior subjective probabilities for the preparation angle to take the measured outcome into account. For the conditional probability that he should use for outcome “up” given angle A, he should use the <em>effective probability</em> of seeing “up” given angle A, and so on.<br />
<br />
This procedure is justified on the basis that most observers (the greatest amount of conscious measure) who use it will get the right answer. Thus, if the preparation angle really was B, then only 10% of Bob’s measure would experience the guess that A is more likely, and the other 90% will see a “down” result and correctly guess B is more likely.<br />
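<br />
The same bookkeeping can be written as a short Python sketch; the 50/50 prior over the two preparation angles is an assumption added purely for illustration:<br />
<pre>
# Bayesian updating with effective probabilities playing the role of likelihoods.
# Setup from the example above: angle A gives effective probability 0.9 for "up",
# angle B gives 0.1. The 50/50 prior over angles is assumed for illustration.

prior = {"A": 0.5, "B": 0.5}
likelihood_up = {"A": 0.9, "B": 0.1}  # effective probability of "up" given each angle

unnormalized = {angle: prior[angle] * likelihood_up[angle] for angle in prior}
evidence = sum(unnormalized.values())
posterior = {angle: w / evidence for angle, w in unnormalized.items()}

print(posterior)  # roughly {'A': 0.9, 'B': 0.1} -- Bob should guess angle A
</pre>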
<br />
3) Causal Differentiation<br />
<br />
It may be the case that some copies of a person have the ability to affect particular future events such as the fate of particular copies of the future person. The observer does not know which copy he is. Pure Causal Differentiation situations are the most similar to classical single-world situations, since there is genuine ignorance about the future, and normal decision theory applies. Effective probabilities here are equal to subjective probabilities just like in the Reflection Argument.<br />
<br />
4) Caring Coefficients<br />
<br />
As opposed to Causal Differentiation, which may not apply to the standard MWI, the most standard way to think of what happens to a person when a “split” occurs is that of personal fission. Perhaps this is the most interesting case when an experiment has not yet been performed. Decision theory comes into play here: In a single-world case, one would make a decision so as to maximize the average utility, where the probabilities are used to find the average. What is the Many-Worlds analogue?<br />
<br />
If it is a deterministic situation and the decider knows the initial conditions, including his own place in the situation, it is important to note that he should <em>not</em> use some bastardized form of ‘decision theory in the presence of subjective uncertainty’ for this case. It is a case in which the decider would know all of the facts, and only his decision selects what the future will be among the options he has. He must maximize, not a probability-weighted average utility, but simply the actual utility for the decision that is chosen.<br />
<br />
Rationality does not constrain utility functions, so at first glance it might seem that the decider’s utility function might have little to do with the effective probabilities. However, as products of Darwinian evolution and members of the human species, many people have common features among their utility functions. The feature that is important here is that of “the most good for the most people”. Typically, the decider will want his future ‘copies’ to be happy, and the more of them are happy the better. <br />
<br />
In principle he may care about whether the copies all see the same thing or if they see different things, but in practice, most believers in the MWI would tend to adopt a utility function that is linear in the measures of each branch outcome:<br />
<br />
U_total = Σ_i Σ_p m_ip[Choice] q_ip<br />
<br />
where i labels the branch, p denotes the different people and other things in each branch, m_ip is the measure of consciousness of person (or animal) p which sees outcome i, and is a function of the Choice that the decider will make, and q_ip is the decider’s utility per unit measure (quality-of-life factor) for that outcome for that person.<br />
<br />
The measures here can be called “caring measures” since the decider cares about the quality of life in each branch in proportion to them.<br />
<br />
Utility here is linear in the measures. For cases in which measure is conserved over time, this is equivalent to adopting a utility function which is linear in the effective probabilities, which would then differ from the measures by only a constant factor. In such a case, effective probabilities are used to find the average utility in the same way that actual probabilities would have been used in a single-world model in which one outcome occurs randomly.<br />
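<br />
As a concrete illustration of the utility function above, here is a sketch that evaluates U_total for two candidate choices; the choice labels, branch measures, and quality-of-life factors are entirely made-up numbers, included only to show how the sum is evaluated:<br />
<pre>
# Sketch of the caring-measure utility U_total = sum_i sum_p m_ip[Choice] * q_ip.
# The choices, branch measures, and quality-of-life factors below are all
# made-up numbers, chosen only to illustrate how the sum is evaluated.

# measures[choice][branch][person] = m_ip for that choice
measures = {
    "safe_bet":  {"win": {"bob": 0.5}, "lose": {"bob": 0.5}},
    "risky_bet": {"win": {"bob": 0.1}, "lose": {"bob": 0.9}},
}

# quality[branch][person] = q_ip, the utility per unit measure
quality = {
    "win":  {"bob": 10.0},
    "lose": {"bob": 1.0},
}

def total_utility(choice):
    return sum(
        m_ip * quality[branch][person]
        for branch, people in measures[choice].items()
        for person, m_ip in people.items()
    )

for choice in measures:
    print(choice, total_utility(choice))
# safe_bet 5.5, risky_bet 1.9: the decider simply picks the choice with the
# larger actual utility; no probability-weighted averaging over branches occurs.
</pre>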
<br />
Next: <a href="http://onqm.blogspot.com/2009/09/measure-of-consciousness-versus.html">Measure of Consciousness versus Probability</a>Jackhttp://www.blogger.com/profile/13062788545650525735noreply@blogger.com0tag:blogger.com,1999:blog-4392083100115115630.post-16584052975509189972009-09-07T13:14:00.011-04:002011-10-05T16:32:24.944-04:00Interlude: The 2007 Perimeter Institute conference Many Worlds @ 50As explained in the <a href="http://onqm.blogspot.com/2009/08/interlude-anticipating-2007-many-worlds.html">previous post</a>, I had long been anticipating a conference on the MWI in 2007, and attended the Perimeter Institute conference <a href="http://www.perimeterinstitute.ca/en/Events/Many_Worlds_at_50/Many_Worlds_at_50/">Many Worlds at 50</a>, armed with a copy of my then-new eprint on the <a href="http://arxiv.org/abs/0709.0544">Many Computations Interpretation</a>.<br /><br />When I arrived at my hotel the night before the conference, an older couple was checking in at the same time as I was. Someone asked the clerk for directions to the Perimeter Institute. It turned out that this couple was also attending the conference, and they were a couple of the friendliest and most interesting people I met there.<br /><br />George Pugh had worked with Hugh Everett (founder of the MWI) at a defense contractor, Lambda Corp. (The work Everett did there is not so famous as his MWI but was actually important during the Cold War.) George and his impressive wife Mary had talked about the MWI with Everett himself, and they support it. They asked me which side I was on, as both pro- and con- people were attending the conference. I told them I was in favor of the MWI. They liked to hear that. We ended up having meals together on several occasions over the course of the conference.<br /><br />The conference itself consisted mostly of lectures in a classroom-like atmosphere, followed by questions from the audience. Appropriately, most of <a href="http://www.perimeterinstitute.ca/en/Events/Many_Worlds_at_50/Schedule/">the talks</a> focused on the question of probability in the MWI.<br /><br />However, and unfortunately, they mainly focused on the attempt to derive the Born Rule from decision-theoretic considerations. That approach was proposed by David Deutsch in 2000, and further developed by Simon Saunders and especially by David Wallace. Saunders and Wallace gave talks that mainly reiterated what is in their papers. There were also talks that (correctly, though of course this was not accepted by Wallace's supporters) pointed out the failures of that approach, such as those by Adrian Kent and David Albert.<br /><br />The only other approach to the Born Rule that was presented at a talk was that of W. Zurek, who talked about his (equally fallacious) 'envariance' approach. Most people seemed to agree that Zurek's approach was similar to Wallace's. There was little discussion of it beyond that. When Zurek was asked about Wallace's approach during an informal discussion, he basically said that he didn't know if Wallace's approach was correct also, but he didn't seem to think it matters much, because his own approach showed that the Born Rule followed from the MWI. 
When I tried to point out to him why his approach fails - a task made all the more difficult by his somewhat intimidating large physical presence and lion-like bearded appearance - he didn't understand my point and soon ended the conversation.<br /><br />Max Tegmark was a speaker, and he briefly discussed his hierarchy of many-worlds types, up to the Everything Hypothesis for which he is known.<br /><br />Besides that, the only other controversy addressed in the talks was that of the legitimacy and meaning of talking about probability in the deterministic MWI, which is a separate question from the quantitative problem of deriving the Born Rule. This focused on Hilary Greaves' 'caring measure' approach. She is sometimes lumped in with the decision-theoretic approach to the Born Rule, because she uses decision theory in another way, but in fact her ideas are independent of that and are basically correct though not the full story.<br /><br />The official speakers were basically divided into two camps: Those MWI-supporters who supported Wallace's attempted derivation of the Born Rule or who were considered allies of it (like Zurek and Greaves), versus those who not only rejected it but also were against the MWI in general (like Kent and Albert). Tegmark was neither but his one talk was largely ignored, and he did not address the Born Rule controversy.<br /><br />Among the attendees, however, the situation was more complicated. I was not the only one who supported some kind of MWI, and considered understanding the Born Rule to be the key issue of interest, but utterly rejected the approaches to the Born Rule that had been presented. The alternatives that we wanted to discuss involved some form of observer-counting as the basis for probabilities in an MWI, even if it required some new physics. This led to a minor rebellion, in which a few of us tried to talk about our ideas during a lunch period in the room set aside for the conference lunch. The only official speaker that we got any help from was Hilary Greaves. We were able to speak in the lunchroom for a little while, but it didn't get much attention.<br /><br />There was another young woman by the name of Hillary, I think a physicist studying at the Institute, who also helped us set up the lunchtime discussion.<br /><br />The 'counter' camp included Michael Weissman, who proposed a modification of physics in order for world-counting to yield the Born Rule. His scheme involved sudden splitting of existing worlds into proposed new degrees of freedom, with a higher rate of such splitting events for higher amplitude worlds. This was interesting, but I was skeptical, and after thinking about it for a while I found the fatal flaw in it. If new worlds were constantly being produced, then the number of observers would be growing exponentially. The probability of future observations, as far into the future as possible, would be much greater than that of our current observations. Thus, the scheme must be false unless we are highly atypical observers, which is highly unlikely. While false, Mike's model serves as a good way to discuss the need for approximate conservation of measure for a successful model. In any case, Mike proved to be a good guy to talk to.<br /><br />Also among the 'counters' was David Strayhorn, who proposed that an indeterminacy in General Relativity could lead to a Many Worlds model in which spacetime topologies were distributed according to, and formed the basis for, the Born Rule.
His ideas did not seem fully developed, and I was skeptical of them as well, but we had interesting discussions.<br /><br />Another guy with us was Allan Randall. He supports Tegmark's Everything Hypothesis, and is also interested in transhumanism and immortality. As I explained to Allan and to Max Tegmark, I wasn't sure about the Everything hypothesis, because of the problem of what would determine a unique measure distribution, but I used to support it and still like it. I think it's important and maybe useful. After all, and like many supporters of the hypothesis, I discovered a version of it on my own long before I ever heard of Tegmark.<br /><br />Which brings me to a subject that received little official mention at the conference, the 'Quantum Immortality / Quantum Suicide' fallacy which Tegmark had publicized. This is the belief, which many MWI supporters have come to endorse, that the MWI implies that people always survive because some copies of them survive in branches of the wavefunction. I had always regarded this as the worst form of crackpot thinking, and had hoped to discuss it at the conference as something that MWI supporters must crush before it gets out of hand. My brief discussions about it at the conference convinced me that it was not getting the condemnation that it deserves. This ultimately led me to write my own eprint against it, <a href="http://arxiv.org/abs/0902.0187">Many-Worlds Interpretations Can Not Imply 'Quantum Immortality'</a>, despite my misgivings that even discussing the subject could give the dangerous idea extra publicity.<br /><br />I also had interesting discussions with Mark Rubin, who had shown an explicit local formulation of the MWI using the Heisenberg picture, which is something I still need to study more. Mark and I had dinner with the Pughs. I liked the Swiss Chalet restaurant and Canadian beer.<br /><br />I also happened to run into a friend of mine from NYU, where I got my Ph.D. in physics. Andre is a Russian who came to the US to study, and he had a postdoc at the Perimeter Institute. He's not an MWI supporter or really into interpretation of QM, but he knew that I am, so he was not too surprised that I showed up at the conference. I was lucky to run into him, because the next day he was heading to England for a postdoc there, studying quark-gluon plasmas using the methods he learned from models of string theory. He said he might never return to the US.<br /><br />All in all, it was certainly an interesting experience. Ultimately, though, it was disappointing because I didn't get to discuss my paper much, and I never was able to have a substantive discussion with the well-known figures in the field who were there to present their own work. It was largely a lecture series rather than an egalitarian discussion group. Some discussion took place on the sidelines, such as at meals, but that was limited by who you happened to be next to. Well-known people mainly talked to each other.<br /><br />One thing that grew out of the discussions on observer-counting was that a group of us decided to continue the discussion on-line. This led to the creation of the <a href="http://tech.groups.yahoo.com/group/OCQM/">OCQM</a> yahoo group, which included David Strayhorn, Michael Weissman, Allan Randall, Robin Hanson, and myself. Robin had not been at the conference, but he was the originator of the Mangled Worlds approach to the Born Rule, and accepted our invitation to join the group.
In practice, however, posts to the group largely came from just David and myself. We all supported some form of observer-counting, but our approaches were quite different. We had some very interesting discussions, and it was a good place to 'think out loud', but ultimately even David's posting to the group petered out and it seems dead at this point.<br /><br />I gave the Pughs my printed copy of the MCI paper. They were compiling a book in which they would quote various people about why Everett's interpretation of QM was important, so I wrote a few lines for them. Ultimately they decided not to use it though. I think they didn't like my criticism of the current status of the Born Rule in the MWI.Jackhttp://www.blogger.com/profile/13062788545650525735noreply@blogger.com2tag:blogger.com,1999:blog-4392083100115115630.post-40933356267665331932009-08-27T22:02:00.009-04:002009-09-07T17:36:25.833-04:00Interlude: Anticipating the 2007 Many Worlds conferenceFor many years, I knew it was coming. You just had to do the math: Hugh Everett III had published his thesis, which introduced the Many Worlds Interpretation (MWI) of quantum mechanics, in 1957. So, somewhere, there would be a 'Many Worlds at 50' conference in 2007. And I would be there.<br /><br />-------------------------------------------------------------------------------------<br /><br />Back in 2000, I attended the conference ‘One Hundred Years of the Quantum: From Max Planck to Entanglement’ at the University of Puget Sound, which commemorated Planck's paper that first introduced the concept of energy quantization, used to explain why the equilibrium density of thermal radiation is not infinite.<br /><br />I had already started exploring the concepts behind the Many Computations Interpretation (MCI). [I called it the 'Computationalist Wavefunction Interpretation' (CWI) but that just didn't have the same ring to it.] It grew out of David Chalmers' suggestion, in the last chapter of his book The Conscious Mind, that applying computationalism to quantum mechanics was the right way to make sense of the MWI. But I knew that computationalism had to be made more precise before that could be done, and I knew that the Born Rule would be the key issue.<br /><br />I submitted a short paper about it for the conference book. The paper is still available online at<br /><a href="http://www.finney.org/~hal/mallah1.html">http://www.finney.org/~hal/mallah1.html</a><br /><br />At the conference I met a few well-known physicists, the most famous of whom was James Hartle. At the time, the 'Consistent Histories' approach to interpretation of QM was getting a lot of attention, and Hartle and Murray Gell-Mann had written a book about it. As far as I was concerned, that approach was not of much interest, because it pretended that single-world-style probabilities could be assigned to terms in the wavefunction 'once decoherence occurred' despite the fact that decoherence is never truly complete. (Probabilities can not generally be assigned in the sense that, prior to decoherence, interference effects can occur and only be understood as showing the simultaneous existence of multiple terms in the wavefunction.)<br /><br />It was also maddeningly vague about what exactly was supposed to really exist, and declared that some questions must not be asked.
It was not clear whether it was really just the MWI in drag, deliberately using vague language so as not to scare away those who thought the MWI is too weird, or if it was some new variant of the single-world Copenhagen Interpretation. Its advocates publicly claimed inspiration from both sources!<br /><br />I got the chance to ask Hartle a question. I asked him two things:<br /><br />1) Is Consistent Histories the same as the MWI?<br /><br />He said it is. That provoked a gasp from the audience! You see, Consistent Histories was looked on quite favorably by many physicists at the time, while the MWI was still largely dismissed as material for science fiction.<br /><br />2) Is it the same as the Pilot Wave Interpretation?<br /><br />He said it's not. The second question was necessary because some people, especially those who like the Copenhagen Interpretation, consider experimental predictions to be the only thing that matters - so that they would consider all interpretations which give the same predictions to be the same thing. Now I knew that was not the case with him, so the first answer really did mean something.<br /><br />Anyway, after that conference I resolved to try to make my interpretation of QM precise in time to discuss it at the inevitable 2007 conference. Seven years should be enough time, right? Of course, it was never my day job, just a hobby of sorts.<br /><br />--------------------------------------------------------------------------------------<br /><br />In 2002 I attended ‘Towards a Science of Consciousness’ (TSC), a yearly philosophy conference which was held at the University of Arizona that year and every even year. That was interesting in its own right, as I met interesting people and learned about issues and thought experiments in philosophy of mind which I had not previously been exposed to. (I don't think it would be as interesting to attend another TSC, because many of the issues are the same every year, unless I have published something of my own that will be talked about. But it's not bad so perhaps I will.)<br /><br />At that 2002 TSC, I participated in the poster session, with a poster called “What Does a Physical System Compute?” which laid out my ideas about an implementation criterion for computations. It got little attention, except that David Chalmers himself was kind enough to stop by and consider it. He made some comments and criticisms. I'd had many false starts at formulating a criterion, and had discussed it by email with him, so he knew what it was about. The criteria I listed weren't good enough, and we both knew it, but I believed it was a step in the right direction.<br /><br />[Some of the other posters there were interesting, but I remember only one, because it stood out as being the most crackpot idea I'd yet encountered - and I'd encountered many on the Usenet newsgroups. This guy was combining the kooky notion that humans only became conscious when language was invented, with the crazy idea that only consciousness causes wavefunction collapse, to argue that <em>the biblical age of the Earth is correct</em> (a few thousand years) because that's when the first wavefunction collapse brought the universe into real existence! Quite a combination!]<br /><br />--------------------------------------------------------------------------------------<br /><br />So, years passed by and before I knew it the 2007 Perimeter Institute conference Many Worlds @ 50 was approaching.
This was it; the conference I'd been looking forward to for so long, in which I hoped to discuss my ideas about the MWI with other supporters of the interpretation. Would I be ready? I'd had some success in refining my implementation ideas, and scrambled to write up what I had.<br /><br />The Born Rule still eluded me, though. I had hoped that once I found the precise criteria for existence of an implementation, I could apply it to quantum mechanics and the Born Rule might pop out. After all, it's actually fairly easy to get the Born Rule to pop out if you impose certain simple requirements such as conservation of measure. People have been doing it for years without even realizing they'd made unjustified assumptions. All I had to do was find a reason to justify an assumption like that for the counting of implementations.<br /><br />I didn't find that justification, and time was getting short. I turned to an unusual approach for inspiration - Robin Hanson's 'Mangled Worlds' papers. He had a rather innovative approach to the MWI, in which large terms in the wavefunction 'mangle' small ones, leading to an effective minimum amplitude, and he argued that the Born Rule followed from counting worlds (lumps of wavefunction) in the distribution of survivors. The world-counting appealed to me, as it could easily be translated into implementation-counting, but I did not believe his scheme could work: large worlds would not 'mangle' worlds they had decohered from nearly as much as Hanson had assumed.<br /><br />To get that kind of thing to work, I had to assume new physics, contrary to Everett. But the new physics was fairly simple: random background noise in the wavefunction (which could be part of the initial conditions rather than new dynamics) could 'mangle small worlds' and if it does the Born Rule pops out (in an interesting new way). There were still some real questions about whether this could work out right, so I explored a more direct approach as well in which I tried to rig the way implementations are to be counted in order for it to come out right. That turned out to be easier said than done, and it remains an open question about whether it can or should be done, though I regard it more favorably now. All of this will be discussed in later posts.<br /><br /><p>I also discussed other alternatives, such as an MWI with hidden variables, and other ways that a minimum amplitude could be introduced. The basic conclusion was that computationalism strongly favors some kind of MWI over single-world interpretations, even if both have hidden variables, but the details are unknown (and might always remain so). </p>I wrote all this up and added criticisms of the incorrect attempts to derive the Born Rule in the MWI, including the one based on decision theory, which was widely considered the strongest of the attempted derivations although it had its critics. 
This became my MCI paper, which I placed on the preprint arxiv: <a href="http://arxiv.org/abs/0709.0544">http://arxiv.org/abs/0709.0544</a><br /><p>I knew that I was cutting it close, so I emailed some of the people who had written about the MWI and who would attend the conference to tell them about my paper on the arxiv.</p>It was time to go to Canada and see if the <a href="http://onqm.blogspot.com/2009/09/interlude-2007-perimeter-institute.html">2007 MWI Perimeter Institute conference</a> would live up to the anticipation.Jackhttp://www.blogger.com/profile/13062788545650525735noreply@blogger.com0tag:blogger.com,1999:blog-4392083100115115630.post-14541575009147445582009-08-27T21:39:00.003-04:002019-02-13T13:06:44.190-05:00Further StudyI'd like to wait for some comments for this one. What do you want to learn?<br />
<br />
I assume you know how to search the web. The Stanford Encyclopedia is good for many topics, as is Wikipedia. Though as always, don't assume that something is true just because you read it there. You must develop an eye for controversial issues.<br />
<br />
What I have attempted to do so far here is twofold: First, to provide an easy-to-understand overview of many issues surrounding interpretation of quantum mechanics. That should be useful to students who intend to pursue a serious interest in philosophy of physics. Second, to convey my own ideas about philosophy of physics; some of that requires a lot of background in very specific issues to properly understand.<br />
<br />
I will add references here on an irregular basis. Traffic on this 'blog' is not high as of yet so there is no typical reader. If that changes, I expect some requests. Unlike a typical blog, I edit these posts as needed to cover a topic, rather than just making new posts all the time.<br />
<br />
You can email <a href="mailto:jackmallah@yahoo.com">jackmallah@yahoo.com</a> if you don't want to post a comment.<br />
<br />
You can also add your own links in your comments.<br />
<br />
From here on out, the focus of the 'blog' will change from review of QM to discussion of contemporary research topics related to the MWI, but still will hopefully be understandable.<br />
<br />
<br />
Primers on Basic QM:<br />
<br />
https://arxiv.org/abs/1803.07098<br />
<br />
http://theoreticalminimum.com/courses/quantum-mechanics/2012/winter<br />
Jackhttp://www.blogger.com/profile/13062788545650525735noreply@blogger.com0tag:blogger.com,1999:blog-4392083100115115630.post-14937794065916929672009-08-20T14:16:00.006-04:002019-03-11T16:04:35.293-04:00Studying Quantum Mechanics: Measurement and Conservation LawsWhen you learned that the results of measurements in quantum mechanics are random, it may have raised a question in your mind: What about conservation laws? Do they only hold on average? For example, if you measure the energy of an atom, you might end up with a different amount of energy than the average, right? If there are random fluctuations in 'conserved' quantities, could the effect be used to violate conservation laws in a systematic way?<br />
<br />
For example, consider a spin measurement for spin-1/2 particles. Each particle's spin carries an amount of angular momentum equal to hbar/2 in the direction it points. The particles are prepared so that their spins point in the +Z direction, and then sent into a Stern-Gerlach (SG) device, which we can rotate to measure spin along any direction. If we measure a spin in the X direction, the result is that the spin ends up in either the +X or -X direction. So it looks like we are violating conservation of angular momentum in a systematic way, destroying the +Z direction angular momentum we prepared the particles with. If that were true and the experiment is done in an isolated satellite, we could use it to build up a net angular momentum in the -Z direction.<br />
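<br />
For readers who want to see the naive bookkeeping, here is a small sketch (using NumPy and the standard spin-1/2 operators, with hbar set to 1) of the apparent loss of Z angular momentum when a +Z spin is measured along X. It illustrates the puzzle only, not its resolution, which is described below:<br />
<pre>
import numpy as np

# Spin-1/2 sketch of the apparent conservation puzzle (hbar = 1).
# A spin prepared along +Z is measured along X; each outcome (+X or -X) carries
# zero angular momentum along Z, so the prepared Z angular momentum seems to
# vanish. This shows the naive picture only, not the resolution discussed below.

sz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)   # Z spin operator

up_z = np.array([1, 0], dtype=complex)                   # prepared state: spin +Z
up_x = np.array([1, 1], dtype=complex) / np.sqrt(2)      # +X outcome state
down_x = np.array([1, -1], dtype=complex) / np.sqrt(2)   # -X outcome state

# Born-rule probabilities for the X measurement outcomes
p_up_x = abs(np.vdot(up_x, up_z)) ** 2
p_down_x = abs(np.vdot(down_x, up_z)) ** 2
print(p_up_x, p_down_x)        # both approximately 0.5

# Z angular momentum of the spin before the measurement, and its average over
# the post-measurement outcomes (treating each outcome as a definite X state)
sz_before = np.vdot(up_z, sz @ up_z).real
sz_after = (p_up_x * np.vdot(up_x, sz @ up_x).real
            + p_down_x * np.vdot(down_x, sz @ down_x).real)
print(sz_before, sz_after)     # 0.5 and (approximately) 0.0: the apparent violation
</pre>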
<br />
If conservation laws mean anything, there must be something wrong with the above picture. Perhaps, one might think, there must be some back-action of the particles on the Stern-Gerlach device. That is, the missing angular momentum is being transferred into the SG device, as the particles exert torques on it with their magnetic moments as they come through.<br />
<br />
The problem we run into next is that this seems to violate linearity: A +Z spin can be written as a superposition of a +X term and a -X term. After going through the SG device, there is decoherence (or as some people wrongly assume, wavefunction collapse), and what is observed is just a +X result or a -X result. Since QM is linear, the final wavefunction is a linear superposition of the terms that would have resulted if the original spins had been +X or -X. Such terms do not take the original +Z spin into account. So at least as far as an observer within such a term is concerned, there is no residual effect of the original spin direction, such as we would need if the SG device had received angular momentum that depended on that direction.<br />
<br />
The solution to this puzzle, naturally, is to treat the measuring device as a fully quantum-mechanical system. That means that its angular orientation can not be precisely known, due to its finite uncertainty in angular momentum. (The uncertainty principle applies, limiting how small the product of the uncertainties of angle and angular momentum can get.) As a result, there will be very small 'error' terms in which the wrong spin outcome is measured, i.e. -X instead of +X, or an incoming spin is flipped.<br />
<br />
This effect may seem negligible, but it is enough to allow the information about the original direction of the particle spin to be encoded in the final state of the SG device. It works out to be exactly enough of an effect to enforce the conservation law. The uncertainty in the SG device's angular momentum allows a sort of selection effect; in effect, the 'lost' angular momentum does end up in the SG device. The same kind of effect holds for all conservation laws. This is explained in detail in my eprint "<a href="http://arxiv.org/abs/quant-ph/0207071">There is No Violation of Conservation Laws in Quantum Measurement</a>". It was first studied by Wigner in 1952, and is related to the Wigner-Araki-Yanase theorem (1960).<br />
<br />
See also<br />
<a href="https://arxiv.org/abs/1611.05905">"WAY beyond conservation laws"</a>Jackhttp://www.blogger.com/profile/13062788545650525735noreply@blogger.com0