Thursday, December 29, 2011

Interlude: The Partial Brain thought experiment

Mark Bishop's 2002 paper "Counterfactuals Cannot Count" attacked the use of counterfactuals in computationalism using a neural replacement scenario, in which the components (e.g. neurons) of a brain (which may already be an artificial neural net) are replaced one at a time with components that merely pass through a predetermined sequence of states that happens to be the same sequence that the computational components would have passed through. Eventually, the whole brain is replaced with the equivalent of a player piano, unable to perform nontrivial computations.

There is a long history in philosophy of mind of using the neural replacement thought experiment to argue that various factors (such as what the components are made of) can't affect conscious experiences. For example, David Chalmers used it as an argument for computationalism. It works like this: the rest of the brain can't have any reaction to the replacement process, since by definition the replaced components provide the same signals to the rest of the brain. It's been argued that it doesn't seem plausible that the brain could be very much mistaken about its own experiences, so a gradual change in or vanishing of consciousness is taken to be implausible. A sudden change isn't plausible either, since there's no reason why a particular threshold of how far the replacement has gone should be singled out.

Bishop's argument is really no different from other neural replacement thought experiments, except in the radical (to a computationalist) nature of its conclusions. So, if neural replacement thought experiments do establish that consciousness must be invariant in these scenarios, then computationalism must be rejected.

My Partial Brain thought experiment shows that neural replacement thought experiments are completely worthless. It works like this: Instead of replacing the components of the brain with (whatever), just remove them, but provide the same inputs to the remainder of the brain as the missing components would have provided.

What would it be like to be such a partial brain? Some important features seem obvious: it is not plausible that as we let the partial brain decrease in size, consciousness would vanish suddenly. But now it's not even possible (unlike in neural replacement scenarios) that consciousness will remain unchanged; it must vanish when the removal of the brain is complete.

Therefore, progressively less and less of its consciousness will remain. In a sense it can't notice this - its beliefs will disappear as certain parts of the brain vanish, but they won't otherwise change - but that just means its beliefs will become more wrong until they vanish. For example, if the higher-order belief center remains intact but the visual system is gone, the partial brain will believe it is experiencing vision but will in fact not be.

The same things would happen - by definition - in any neural replacement scenario in which the new components don't support consciousness; the remaining brain would have partial consciousness. So neural replacement scenarios can't show us anything about what sorts of components would support consciousness.

The partial brain thought experiment also shows that consciousness isn't a unified whole. It also illustrates that the brain can indeed be wrong about its own conscious experiences; for example, just because a brain is sure that it has qualitative experiences of color, that is not strong evidence in favor of the idea that it actually does, since a partial brain with higher-order thoughts about color but no visual system would be just as sure that it does.

Wednesday, December 14, 2011

Restrictions on mappings 2: Transference

In the previous post, Restrictions on mappings 1: Independence and Inheritance, the "inheritance" of structured state labels was explained; it allows the same group of underlying variables to be mapped to more than one independent formal variable. In the example a function on a 2-d grid was mapped to a pair of variables.

Transference is something like the reverse process: It allows a set of simpler variables to be mapped to a structured state function on a grid.

This allows ordinary digital computers to implement wave dynamics in 3-d space, which could matter for the question of whether the universe could ultimately be digital. The AdS/CFT correspondence in some models of string theory would need something similar if the bulk model is to be implemented on the boundary in the computational sense.

Transference can be Direct or Indirect. It works like this:

Direct Transference could be used in a mapping by taking the value from a given variable and turning it into a label for structuring a set of new variables.

For example, if there is a single integer variable I(t), we can transfer its value as a label to a set of bits B(j), each of which depends only on whether I(t) equals the value of its label, e.g.

B(j) = 1 if I = j
B(j) = 0 if I does not = j

These bits can be considered an ordered series of "occupation tests" of the different regions that the underlying variable's value could be in.
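As a minimal sketch of this construction (in Python; the function name is mine, not part of the formalism):

```python
def occupation_bits(i_value, labels):
    """Direct Transference: turn the value of one integer variable I
    into a set of 'occupation test' bits B(j), one per label j,
    where B(j) = 1 iff I equals j and B(j) = 0 otherwise."""
    return {j: (1 if i_value == j else 0) for j in labels}

# If I = 3, exactly one occupation test succeeds: the one labeled j = 3.
bits = occupation_bits(3, range(5))
```

As the next paragraph notes, exactly one of these bits is nonzero at a time, yet they are to be treated as independent variables.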

Of course, only one of these bits at a time will be nonzero. But they are to be considered independent variables. At this point you might object: If you know the value of the nonzero one, don't you know the other bit values must be zero? But just as Inheritance carved out an exception to the rule for independence, so would Direct Transference carve out an exception to it.

Going the other direction is no problem: If we restrict a mapping such that only one bit in an ordered set B(i) is nonzero, then a new variable I can be constructed such that I has a value equal to the index i of the nonzero bit. Here we are doing the reverse.
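The reverse construction can be sketched the same way (a toy Python illustration, with a name of my own choosing):

```python
def index_of_nonzero(bits):
    """Reverse construction: given an ordered series of bits in which
    exactly one entry is nonzero, define a new variable I equal to the
    index i of the nonzero bit.  The mapping is undefined otherwise."""
    hits = [i for i, b in enumerate(bits) if b != 0]
    if len(hits) != 1:
        return None   # mapping undefined for these underlying states
    return hits[0]

assert index_of_nonzero([0, 0, 0, 1, 0]) == 3
```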

We can't double count, though; if we make the new set of variables b(i), we can't make a second independent new set of variables c(i) which gets its label transferred from the same underlying variable I(t) for the same values of I.

If we have two underlying variables I and J, we could similarly use Direct Transference to map them to a 2-d grid of bits, B(i,j), in which only one bit is nonzero.

If we then re-map this grid using inheritance we could arrive back at our original I and J variables. So, basically, what Direct Transference is saying is that these two pictures are really equivalent.

We could also map the two of them to a single 1-d series of variables, e.g. the sums of the respective 1-d series of bits. (Since the sum becomes 2 when I = J, these are trits, not bits.)

Can the variables that were obtained using Direct Transference be used to make a mapping so flexible that it must not be allowed - something like a clock and dial mapping? The answer certainly appears to be no. And that may be justification enough for allowing it; my philosophy is to be liberal in allowing mappings, as long as those mappings don't allow implementations of arbitrary computations.

Indirect Transference is a little more complicated. Consider a computer simulation of dynamics on a 2-d grid, f(x,y). When the value of f is updated at the pair of parameters x and y, this can be done by setting one variable equal to x, another equal to y, and using them to find the memory location M in the computer at which the corresponding value of f is to be changed. Since updates of f at (x,y) always involve fixed values for each of those parameters, f(x,y) can be labeled by those values. In this way, mapping the values of f to the actual function of x and y, f(x,y), is considered a valid mapping, even though the computer's memory is not laid out in that manner. This is an example of Indirect Transference. It can be generalized to any case in which a parameter of a function is used.
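A minimal sketch of the situation just described (Python; the flat row-major layout is a hypothetical choice, the point being only that the memory is not itself a grid):

```python
# A simulated 2-d grid f(x, y) stored in a computer's flat memory.
WIDTH = 4
memory = [0.0] * (WIDTH * WIDTH)

def location(x, y):
    # x and y are used to find the memory location M at which the
    # corresponding value of f is stored; the stored value can thus be
    # labeled by (x, y) even though memory is one flat array.
    return y * WIDTH + x

def update_f(x, y, value):
    memory[location(x, y)] = value

update_f(2, 1, 7.5)   # the value of f at (2, 1) becomes 7.5
```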

Thursday, October 6, 2011

The Everything Hypothesis: Its Predictions and Problems

There are some basic questions about the world:
1) Why does anything exist, instead of just nothing?
2) What does exist?
3) Why does that stuff exist instead of other stuff?
4) Given that stuff does exist, why is there consciousness instead of just behavior?

These questions are so basic that it would be nice to know the answers to them before worrying about specifics of our own situation.

Regarding these questions we can make highly counter-intuitive observations:

Suppose that somehow you didn't know that anything exists, and you are asked to guess: Does anything exist? My guess would certainly be "No, it would make a lot more sense if nothing exists. That is not only much simpler, it's also a lot easier to understand how it might be that nothing exists."

OK, but things do exist. So why THIS instead of THAT? We don't know. And not only that: It seems impossible to even imagine any reason why one possible thing would be selected over another. You can't say "it's because of this other thing" (whether the other thing is a law of physics, a god, or whatever) because that doesn't explain anything, it just begs the question of "So why does that other thing exist?" and we are back to the start.

There are basically two schools of thought on consciousness: 1] Dualism: Consciousness is a basic thing, which can not be due to something mathematically describable; or 2] Reductionism: Consciousness is an inevitable consequence of certain mathematically describable things. While I tend to fall into the latter camp, both ideas have counter-intuitive implications which I will not address in this post.

At least we do have ideas to debate about consciousness. Are there even any ideas about possible answers to the other questions, controversial or not?

It turns out that there is an idea which might begin to address them to some extent, called the Everything Hypothesis:

What if everything that possibly could exist does exist?

This would seemingly avoid the apparent paradox of some things existing rather than other things despite there being no possible reason why that could be the case. It is also the simplest possible set of things that could have existed (other than just nothing), due to being fully symmetrical.

However, it doesn't really answer question 3). To really put question 3) to rest we would still need to know "Why does everything exist?" which would also cover question 1).

It turns out that there is an idea that could address that; this idea is sometimes called "radical Platonism" in analogy with certain ideas that Plato had, but it is really a modern idea that, while not necessarily inspired by Plato, was perhaps given a bit of philosophical confidence by his precedent.

The idea is that there is no fundamental physical reality; instead, the fundamental reality is the world of logical and mathematical possibilities (which would thus better be called actualities). Of course, it remains difficult to understand why a logically possible world would automatically be real enough to have real observers inside it; but if that is the way it is, the Everything Hypothesis would have to be true. This can be seen as a 'new version' of either Question 1 or Question 4.

While Max Tegmark is credited with the first paper on the Everything Hypothesis, various other people came up with similar ideas on their own. I was one of them.

There are variations or more limited versions of the Everything Hypothesis based on what is meant by 'Everything'. Tegmark's version of the Everything Hypothesis is explicitly mathematical: namely, that every possible mathematical structure exists. If Dualism is false, then that is equivalent to the full Everything Hypothesis. However, if Dualism is true, then consciousness and laws related to it are other things within the set of Everything.

The next question is: What does the Everything Hypothesis predict?

This can be put into familiar terms for a many-worlds model: What measure distribution on conscious observations does it predict?

At first glance, one might think that a typical observer within the Everything would see a quite random mess. If so, the Everything Hypothesis must be false, since we see reliable laws and a highly ordered universe.

However, taking a computationalist view of mind, only certain mathematical structures would support observers: those that have the equivalent of time evolution with respect to some parameter (or some suitable substitute), and have reliable laws for such dynamics. We should therefore consider the set of such dynamical structures with all possible dynamical laws. The starting state may be completely random, but the state will no longer be so random once time evolution occurs.

While we don't know how to deal with the set of all possible dynamical systems, there is one subset that is much easier to handle: Turing machines, which are digital computers. A universal Turing machine can run the equivalent of any digital computer program.

So we can look at a simple universal Turing machine and consider the set of all programs for it. Such a program is just an infinitely long symbol string, which I'll call a bit string for simplicity. The machine has a 'head' that moves along the string, changing bits and a few values internal to the head.

Most programs have only a finite number of bits that actually do anything (the active region), because the head is likely to be instructed either to halt or to enter an infinite loop. The measure of programs that share a given active region decreases exponentially with the number of bits in that region, because each bit brought into the active region is a bit that can no longer be varied freely as part of the inactive region.

Therefore, shorter and simpler programs have a higher measure (more copies) than longer programs do. The typical laws of worlds simulated by these programs are therefore likely to be as simple as possible, consistent with the requirement that observers exist within them. Perhaps, then, our own world is such a world.
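A toy counting argument makes this concrete (Python; a binary alphabet is assumed, and the infinite program strings are truncated to a hypothetical fixed length L for counting purposes):

```python
# Among all length-L bit strings, those that begin with a given active
# region of n bits number 2**(L - n): each bit added to the active
# region halves the count, since only the inactive bits vary freely.
def copies(active_region, L):
    return 2 ** (L - len(active_region))

L = 20   # hypothetical truncation length
assert copies("101", L) == 2 ** 17
assert copies("101101", L) == 2 ** 14
# The 3-bit program has 2**3 = 8 times the measure of the 6-bit one.
assert copies("101", L) // copies("101101", L) == 8
```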

That is an impressive result! It would certainly be interesting - though probably it will always be beyond our capabilities - to know exactly what would be typical for observers in the set of all Turing machine program runs to see.

However, there's a big problem here for the Everything Hypothesis: there are infinitely many possible Turing machines and digital computers in general. We can pick one, but that contradicts the fact that radical Platonism must have no arbitrary choices - no free parameters - if it is to explain why the world is the way it is. So why not just pick all of them and weight them equally? The problem there is that since there are infinitely many, the order in which we list them makes a difference to the result we get when trying to get the measure distribution. It's a very small difference for 'subjectively reasonable' choices of these parameters, but that's not the point; ANY arbitrariness completely ruins the explanatory power of radical Platonism.

Besides, what about continuous systems? Some people are content to assume that the fact that there appears to be no natural measure for them means we don't need to include them in the Everything even if we are Platonists. However, it seems to me that they are just as much legitimate candidates for existence as digital systems.

Reluctantly, I am forced to conclude that - unless there is some way of overcoming these mathematical problems that we don't know about, which seems unlikely - Platonism does not provide the explanation we were looking for. This is a paradox, because it also seems that Platonism is the only thing that even could have been a real explanation for why things are the way they are.

With the Everything Hypothesis we are still left with a 'new version' of Questions 1-2: NOT "What does exist, and why?" but "What is the measure distribution, and why?" This is a change, at least! Perhaps it is a more tractable question, but for now it is still a question which demands explanation for which we have none.

I still think that the set of all things which exist is probably very simple in some sense and that the physics we see is just a small part of it. We see the part of it that we see due to being fairly typical observers.

Perhaps the right way to derive the Born Rule of quantum mechanics would be to start with something like the set of all possible Turing machine programs and derive from it the measure distribution on conscious observations, but obviously, such a project would be all but impossible to carry out in practice. My work will focus instead on studying the consequences of the standard physics equations (which are based in large part on experimental observations). However, my criteria for implementation of a computation are general and do not depend on the assumption that the underlying system is a physical one, so in principle, they should apply even to underlying systems that are Platonic Turing machines.

Tuesday, October 4, 2011

Restrictions on mappings 1: Independence and Inheritance

Previous: The Putnam-Searle-Chalmers Theorem

Chalmers proposed that a computation state be represented by a string of numbers, where each part of the string is "independent"; he proposed that independence can be achieved if each part of the string depends on a different part of the underlying system. He called this a Combinatorial State Automaton (CSA), because the state is given not by a single number but by the combination of the different parts of the string. The CSA avoids the false "clock and dial" implementations discussed in the previous post.

However, as Chalmers noted, there are still false implementations that can be found for the CSA. Suppose that each part of the underlying system which is mapped to part of the string at the current time step contains all of the relevant information about every other relevant part of the system at the previous time step.

In that case, a mapping can be made in which - although each part of the string depends only on its own part of the underlying system - a sequence of values can be assigned to each part of the string in such a way as to mimic any desired computation: enough information is available for the value assigned to that part to depend on any desired combination of the values of the parts of the string at the previous time step.

The amount of information that must be stored would grow exponentially with time, which leads to systems that quickly become physically unrealistic after a few time steps, but the fact that the problem exists in principle is enough to show that a different restriction on mappings is needed.

CSSA:

An implementation criterion will be given for a formal system consisting of a structured set of variables (also called substates), and a transition rule. Such a system will be called a Combinatorial Structured State Automaton (CSSA). It differs from Chalmers' CSA in that the structure of the state is not restricted to that of a string of values. For example, the substates could be arranged into a rectangular matrix, or a higher-dimensional array, and this arrangement and its specification would be considered part of the specification of the CSSA.

One reason for including such structure is to allow label inheritance (explained below) from an underlying computation to another computation. While the ultimate underlying system is assumed to be the physical universe (or whatever underlies that!), the definitions must work in general so that the underlying system can be anything that can be described mathematically, because the criteria for implementation of a computation should not depend on experimentally discovered facts of physics but rather on pure mathematics.

It will also be useful in analyzing systems to allow computations to implement other computations. For example, if an object implements a "Windows machine" computation, then we only need to check variables on the level of that computation (rather than looking at fundamental system variables such as the wavefunction's behavior in terms of electron positions) to see if it also implements the "Firefox" computation. In this case the "Windows machine" is now treated as the underlying system. Many of the simple examples I will use will involve a simple underlying digital system.

The criteria given below are a modification of the ones given in my MCI paper, and will be incorporated into a revised paper.

Independence:

The basic rule for independence is that, from knowledge of the physical states that are mapped to the value of the current substate, together with the system dynamics, it must not be possible to determine:

- the values of any of those substates (at the previous time step) that determined what the value of a given substate is at the current time step, except when the current value itself provides enough information; or

- the values of other substates (at the same time step), except when the current value itself provides enough information.

This rules out clock and dial mappings, because the clock and dial physical state which any substate depends on would reveal all formal values of the other substates and the previous substates. It also rules out the other false implementations discussed above in which the variables record information that would reveal the values of the previous states that determine them.

Inheritance:

However, it is sensible to allow cases in which interacting formal substates share the underlying variables that they depend on. This is especially important with quantum mechanics (see below).

In these cases, independence is instead established with the help of “inheritance” of labels. An underlying label index is inherited when a substate depends on the values of that index, which associates the substate with that index.

Knowledge of the variables that determine the other substates - modulo permutations (or in other words, limited knowledge that would not be affected by such swapping) of the labeling indexes that are inherited by the given substate - must not reveal the value that the given substate has (at the same time step).

The restriction that knowledge of the underlying variables that determine the substate must not reveal the values of the substates at the previous time step that determine its value (with exceptions as in the basic rule for independence above) still applies as well, but modulo permutations of the inherited indexes for the respective substates.

An example helps show what is involved here:

Bits on a 2-d grid plus time --> (mapped to) 2 integers as functions of time

b(i,j,t) --> I(t), J(t)

The computation would then involve the dynamics of I and J, such as
I(t+1) = f(I(t),J(t))
J(t+1) = g(I(t),J(t))

where f and g are appropriate functions.

Each bit takes on the value 0 or 1. If the number of nonzero bits is not equal to 1, this mapping will be undefined (there is no need for the mapping to cover all possible states of the underlying system).

If only one bit is nonzero at a given time, then let
I(t) = the value of i at which b(i,j,t) is 1
J(t) = the value of j at which b(i,j,t) is 1

If we were to swap around the values of the i-index, it would affect the value of I(t) but not of J(t), and similarly for the j-index affecting J(t) but not I(t). Therefore the conditions are met so that I(t) “inherits” the i label and J(t) the j label.

To verify independence - even though both I and J depend on all bits on the grid in the sense that only one nonzero bit is allowed - knowing the full state of the 2-d grid modulo a permutation of the j-index (that is, knowing it only up to a possible swapping around of the j-index values) reveals the value of I(t), but not J(t), and vice versa. Thus they are independent.
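A toy Python version of this mapping (the function and variable names are mine) may make the inheritance check concrete:

```python
def map_grid_to_point(grid):
    """Map a function ON a 2-d grid of bits - with exactly one nonzero
    bit allowed - to a point IN 2-d space: the pair (I, J).  I inherits
    the i label and J inherits the j label."""
    hits = [(i, j) for i, row in enumerate(grid)
                   for j, b in enumerate(row) if b == 1]
    if len(hits) != 1:
        return None   # mapping undefined for these underlying states
    return hits[0]

grid = [[0, 0, 0],
        [0, 0, 1],
        [0, 0, 0]]
assert map_grid_to_point(grid) == (1, 2)   # I = 1, J = 2

# Swapping around j-values (here j=0 and j=2) changes J but not I, and
# vice versa - the sense in which I inherits i and J inherits j.
permuted = [list(reversed(row)) for row in grid]
assert map_grid_to_point(permuted) == (1, 0)
```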

In this example, an underlying system which consists of a function ON a 2-dimensional grid was reduced to a system of two variables, which together define a single point IN a 2-d space. This can be summarized as “on --> in”.

In cases where the underlying system has other variables, e.g. the k index in b(i,j,k,t), a mapping might not be symmetric with respect to the other variables. For example, we could have

X(t) = b(1,1,1,t)+ b(1,2,2,t) - b(2,1,3,t) - b(2,2,4,t)
Y(t) = b(1,1,1,t)+ b(2,1,3,t) - b(1,2,2,t) - b(2,2,4,t)

In this case, are X and Y independent, with X inheriting from i and Y from j? If two values of j (say j=1 and j=2) are swapped as indexes of b(), then the value of X could change: b(1,1,1,t) + b(1,2,2,t) is not b(1,2,1,t) + b(1,1,2,t). On the other hand, if we only had a limited knowledge of b() and that knowledge is symmetric with respect to such swaps, we wouldn't know that. This mapping is OK. It is perhaps better to think of it as if the terms in X(t) are labeled by different i-values and j-values (the values the mapping is supposed to inherit from) while ignoring their functional dependence on k (which is not being inherited by anything).

The concept of inheritance can be generalized to cases in which more than one underlying variable is responsible for the distinction being inherited. For example, here X inherits from i, but Y is to inherit from j and k:

X(t) = b(1,1,1,t)+ b(1,2,1,t) - b(2,1,1,t) - b(2,1,2,t)
Y(t) = b(1,1,1,t)+ b(2,1,1,t) - b(1,2,1,t) - b(2,1,2,t)

Inheritance can also occur with multiple levels of functionals. For example, suppose the underlying variables are a functional g of function f(i,j), e.g. if i and j are bits, here g=g(f(0,0),f(0,1),f(1,0),f(1,1)). An intermediate mapping could be made with substates that inherit from f and are functions on (i,j). This can then be mapped to a pair of substates X,Y that inherit from i and j respectively.

In quantum mechanics, the wavefunction is a function ON a high dimensional configuration space. In classical mechanics, the set of particle positions defines a point IN configuration space. It certainly seems that, in the classical limit, the computations that would be performed by the related classical system (such as a system of classical switches) would in fact be performed by the actual quantum system. Inheritance of labels - from quantum directions in configuration space to classical position variables - permits that. Quantum field theory, an even more realistic model, involves functional dependence of the wavefunctional on a function of space.

Union:

A substate can be mapped based on a region that includes a number of subregions each of which is associated with certain label values. If each such subregion meets a disinheritance criterion, those certain label values can be considered separately for determining disinheritance. This rule is particularly useful when dealing with Quantum Field Theory.

For example, suppose that several bits are to be mapped from psi(F1,F2,F3). In a straightforward inheritance scenario, perhaps one such bit, bit A, depends on how psi is distributed among values of F1, with F2 disinherited (perhaps summed over) and a particular value of F3 chosen for the mapping. Another bit, bit B, might then disinherit F1 but similarly depend on the F2 distribution and use the same F3 value. Recall that disinheriting F1 means that swapping the values of psi evaluated at different F1 values, with the other F's held fixed, has no effect on B. Since A and B disinherit different variables, they are independent.

The Union rule means that we can choose a second F3 value and with it held fixed let A (besides for what it depended on before with the first F3 value still fixed for that) also depend on the distribution of psi on F2 (but not F1) with the new F3 value, while B also depends on the distribution of psi on F1 (but not F2) with the new F3 value, and A and B are still considered independent.

This sort of thing lets two substates in QFT depend on the same areas of space as long as the configuration of the environment distinguishes them, since particle identity is not available to do so. For example, given a row of evenly spaced switches, for certain combinations of states the row might be translated so as to replace one switch with the next; this should not ruin the computation.

Monday, October 3, 2011

The Putnam-Searle-Chalmers Theorem

Previous: Basic idea of an implementation

If a computation is implemented, there must be a mapping from states of the underlying system to formal states of the computation, and the states must have the correct behavior (transition rule) as they change over time.

For example, for an electronic digital computer, we can map a high-voltage state of a transistor lead to a "1" and a low voltage state to a "0". By doing the same for various other circuit elements, we can obtain an ordered string of 1's and 0's. This string will change over time as the voltages change. If, due to the laws of physics and the way the circuit is connected, this string must always change in accordance with the transition rules for computation C, then the system implements C.
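As a minimal sketch of that voltage-to-bit mapping (Python; the threshold value and lead voltages are hypothetical):

```python
THRESHOLD = 2.5   # volts; an assumed boundary between "low" and "high"

def to_bit_string(lead_voltages):
    """Map each circuit element's voltage to '1' (high) or '0' (low),
    yielding the ordered string of bits that the mapping assigns to
    the current physical state."""
    return "".join("1" if v > THRESHOLD else "0" for v in lead_voltages)

assert to_bit_string([5.0, 0.2, 4.8, 0.1]) == "1010"
```

As the voltages change over time, the string changes with them; the implementation claim is about whether those changes always follow the computation's transition rules.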

We must allow as much flexibility as possible in the choice of mapping, because we are trying to understand the behavior of the system without any reference to things outside the system. Human convenience in recognizing the mapping is not a consideration.

The obvious way to try to do this is simply to allow any mapping that is mathematically possible. This leads to what I will call the naive implementation criterion, because while it may sound good at first it is not a viable option for a satisfactory criterion. Chalmers' paper Does a Rock Implement Every Finite-State Automaton? explained this in detail.

The following is my version of what I'll call the "Putnam-Searle-Chalmers (PSC) Theorem" which shows that unrestricted mappings are not a viable option:

Suppose that a system consists of two parts, S and T, each of which has a numerical value, and that the dynamics of our system are as follows:
S(t+1) = S(t)
T(t+1) = T(t) + 1

These dynamics are fairly trivial; S is a dial that maintains a constant setting, while T is a clock.

We will check to see if this system implements a computation, C, which has the transition rule X(t+1) = F(X(t)) where t is a time index.

Here X need not be a single number; it might be a string of bits, for example.

F could be a complicated function, such as F(X) = the Xth prime number (expressed in base 2, where X is an integer expressed in base 2).

Now make a mapping M going from (S,T) to X with the following properties:
X = M(S,T)
M(S,T+1) = F(M(S,T))

We do need to make sure that any possible starting value of X is allowed by the mapping, which we can always do if our system has enough possible dial values.

Now, according to the mapping, X will change as a function of time based on the dynamics of the system as follows:
X(t+1) = M(S(t+1),T(t+1))
= M(S(t),T(t)+1)
= F(M(S(t),T(t)))
= F(X(t))

Therefore, the system would implement the computation if this mapping is allowed. But the system's dynamics are trivial while the computation can be a very complicated function. Obviously, this is not acceptable; the computation does not characterize the behavior of the system at all. This is what I call a "false implementation". All of the complicated dynamics of the computation have been put into the mapping. It is therefore necessary to put restrictions on what mappings are allowed.
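The construction can be made concrete with a toy version in Python (the particular F below is just a stand-in for any complicated rule, such as "the Xth prime"):

```python
def F(x):
    # Stand-in for an arbitrarily complicated transition function.
    return (3 * x + 1) % 17

def M(S, T):
    """The unrestricted mapping does all the work: treat the dial value
    S as a starting value of X, and iterate F as many times as the
    clock value T indicates.  By construction M(S, T+1) = F(M(S, T))."""
    x = S
    for _ in range(T):
        x = F(x)
    return x

# System dynamics: S(t+1) = S(t), T(t+1) = T(t) + 1.
S, T = 5, 0
for _ in range(10):
    assert M(S, T + 1) == F(M(S, T))   # hence X(t+1) = F(X(t))
    T += 1
```

The trivial clock-and-dial dynamics thus "implement" the complicated rule X(t+1) = F(X(t)), which is exactly the false implementation the theorem warns about.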

One possibility is to require that each part of a string that defines a formal computational state (e.g. each bit in a bit string) takes its value based on a different part of the underlying system. That is basically what Chalmers proposed to overcome the problem.

While it somewhat counter-intuitively rules out distinctions based on the value of a single number (since a single number can still be mapped to any other single number), it goes a long way towards ruling out false implementations, while still allowing important standard examples of implementations that we want to retain, such as mapping switch positions to bit values (assuming classical physics).

However, it is not quite right. There are some systems for which it still allows false implementations, and there are other cases where it rules out what seem to be legitimate implementations - and these become clearly important when quantum mechanics is brought into the picture. In classical mechanics, different particles are different parts of the system; in quantum mechanics, different particles are different directions in the shared configuration space on which the wavefunction evolves.

Next: Restrictions on mappings 1: Independence and Inheritance

Saturday, October 1, 2011

Basic idea of an implementation

In conceptual terms, to say that a system implements a given computation is to say that something about that system - some aspect of its behavior - is described by the computation. It is thus a way of characterizing the system.

For example, if a system can take two numbers as inputs and produces their sum as an output, we could characterize it as an "adder". If instead it can take two numbers as inputs and produces their product as an output, we could characterize it as a "multiplier". "Adding" and "multiplying" are two different _types_ of computations. These behaviors are fairly general, so they can be further divided into specific computations if more details are specified, such as what format the inputs are in, internal functioning during intermediate steps, and so on.

Two "multipliers" with different intermediate steps would implement different computations. Thus, while the word "computation" suggests something that is done to get a result, that is misleading in the context of the computationalist view of consciousness, which instead focuses on internal functioning of the system in question.
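For instance, here are two routines that both multiply, but through different intermediate steps, and so on this view implement different computations. The algorithms are just standard examples chosen for illustration:

```python
# Two multipliers that agree on inputs and outputs but differ in their
# intermediate states, and hence in which computation they implement.

def multiply_by_shifts(a, b):
    # Shift-and-add multiplication: intermediate states are (a, b, acc).
    trace = []
    acc = 0
    while b > 0:
        trace.append((a, b, acc))
        if b & 1:
            acc += a
        a, b = a << 1, b >> 1
    return acc, trace

def multiply_by_addition(a, b):
    # Repeated addition: intermediate states are (count, acc).
    trace = []
    acc = 0
    for count in range(b):
        trace.append((count, acc))
        acc += a
    return acc, trace

r1, t1 = multiply_by_shifts(6, 7)
r2, t2 = multiply_by_addition(6, 7)
assert r1 == r2 == 42    # same input-output behavior...
assert t1 != t2          # ...but different internal activity
```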

If we find a third system and discover that it behaves like an "adder", then it has a lot in common with our other "adder": we now know a lot about how it can behave. But there's also a lot that this characterization does NOT tell us. It doesn't tell us what the system is made out of; or what its other behaviors might be; or what internal processes it uses to perform the additions. Nor does it tell us whether the system performs the addition redundantly, perhaps computing the same sum more than once and sending the (same) result to different recipients.

In addition to its capabilities, a system's behavior is further characterized by the exact sequence of computational states that it actually passes through. For example, an adder which added the numbers 45 and 66 and got 111 is characterized by those numbers, in addition to just being an adder. Such a sequence of numbers (or more precisely the sequence that includes all intermediate states of the computation) can be called the "activity" of the computation, and together with the computation it specifies what is known as a run of the computation.

It is important to note that neither the computation alone nor the activity alone is enough to adequately describe the behavior of the system; both are needed.

Often, I will speak only of computations, but it should be understood by the context that "runs of computations" are often meant as well. For example, if I say that a particular computation gives rise to a particular conscious observation, I mean that a particular run of that computation - corresponding to a particular starting state - gives rise to that observation. A different run of the same computation (such as the same type of brain except in a different starting state) might then give rise to a different kind of observation or to none at all.
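A minimal illustration of the distinction between a computation and its runs (the transition rule here is arbitrary, chosen only to make the traces short):

```python
# One computation, two runs: the transition rule is the same, but the
# activity (the actual sequence of states) depends on the starting state.

def computation_step(x):
    return (x * x) % 10            # the computation's transition rule

def run(x0, steps):
    # A run is the computation together with its activity from one start state.
    activity = [x0]
    for _ in range(steps):
        activity.append(computation_step(activity[-1]))
    return activity

assert run(2, 3) == [2, 4, 6, 6]   # one run of the computation
assert run(3, 3) == [3, 9, 1, 1]   # a different run of the same computation
```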

In the case of programmable systems, the distinction between computations and runs becomes somewhat arbitrary, but this is not a problem as we can always specify what computations we are interested in and use that to make the choice.

A system could be both an adder and a multiplier; for example, the universe is both if at least one of each exists as a sub-system of it. Yet a hypothetical universe with 100 adders and 1 multiplier would be significantly different from one with 1 adder and 100 multipliers. A more detailed way of characterizing such systems would be to state how many instances of each kind of computation are performed by it: in other words, to give a measure distribution on computations. Such a measure distribution (or more exactly, a measure distribution on runs of computations) is the main tool needed to evaluate the computationalist version of the Many Worlds Interpretation of QM, and will be addressed in later posts.
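The idea of a measure distribution on computations can be sketched as follows (the "universes" here are of course invented toy examples):

```python
# A toy measure distribution: count how many instances of each kind of
# computation a hypothetical universe contains, then normalize.

from collections import Counter

universe_a = ["adder"] * 100 + ["multiplier"]
universe_b = ["adder"] + ["multiplier"] * 100

def measure_distribution(subsystems):
    counts = Counter(subsystems)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()}

# Both universes contain both kinds of computation, but their measure
# distributions differ sharply.
dist_a = measure_distribution(universe_a)   # adder-dominated
dist_b = measure_distribution(universe_b)   # multiplier-dominated
```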

While artificial computers tend to operate on command, there is nothing about the concept of implementing computations that requires that. For example, a robot that walks around and listens for numbers, then once it has heard two of them, adds them and says the result, then starts again, would still be an "adder". Such a robot would behave according to its internal agenda, ignoring the desires of the people around it even if they beg it to stop.

It's also important to note that "outputs" need not be distinguished from internal parts of the system. Any parts of the system that produce the behavior in question are the relevant parts, regardless of how they interact with things outside the system.

For a computation with more than one step, it is useful to define "inputs" as substates that are affected by influences outside the scope of the computation in such a way that their values are not determined by the previous state of the computation. Influences outside the scope of the computation are not necessarily outside of the system (such as the universe) which performs that computation.

Those familiar with computer science might be surprised that the concept of the Turing Machine will play no role in using computations to characterize systems. A Turing Machine is a type of computer that is useful for specifying which functions could, in principle, be calculated by digital computers (given unlimited memory - an infinite number of substates - with the transition rule for each substate depending on only a finite number of substates) and how easily (setting bounds on how memory needs and processing time scale with the size of a problem).

Because this class of computers is so ubiquitous in computer science, those computer scientists who dare venture into the swamps of philosophy far enough to read a paper or attend a lecture on the implementation problem sometimes completely lose interest in the problem when they realize that no one is mentioning Turing Machines. In fact, it would be quite easy to describe Turing Machines as a special case of the computers that I will describe.

But here we are concerned with characterizing the behavior of actual systems as they exist, not with finding what size of problems they could be programmed to handle. It is also important to note that digital computing is just a special case of the behaviors that could be characterized as computation; analog computing can certainly be considered as well, and most of the definitions I will make are general enough to cover all cases. However, I will focus on digital computation in most of the examples I look at.

The next step is to formalize the idea of implementation by giving a mathematical criterion for whether a computation is implemented by a given mathematically described system. This will be done by requiring a mapping from states of the underlying system to formal states of the computation, and requiring the correct behavior of the states as they change over time.
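In code, the bare criterion, before any restrictions on mappings are imposed, might look like this. It is a simplified sketch assuming discrete states and deterministic dynamics, not a finished formalization:

```python
# A system implements computation F under mapping M if, whenever the system
# steps from state s to step(s), the mapped formal states track F:
#     M(step(s)) == F(M(s))  for every reachable state s.

def implements(states, step, M, F):
    return all(M(step(s)) == F(M(s)) for s in states)

# Example: a 3-bit counter implementing "increment mod 8".
states = range(8)
step = lambda s: (s + 1) % 8    # the system's dynamics
M = lambda s: s                 # mapping from system states to formal states
F = lambda x: (x + 1) % 8       # the formal computation

assert implements(states, step, M, F)
assert not implements(states, step, M, lambda x: (x + 2) % 8)
```

With no constraints on M, the clock-and-dial construction above shows how this bare condition can be satisfied vacuously; hence the need for restrictions.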

However, as will be seen in the next post, this approach quickly runs into a problem: without restrictions on allowed mappings, 'almost any' physical system would seem to implement 'almost any' computation. This absurd result would imply that a rock implements the same computations as a brain. The likely solution is to impose restrictions on the allowed mappings, but finding a fully satisfactory set of restrictions has proven to be a difficult task. My proposals for this will be presented in subsequent posts; I think that I have been successful in finding (at least close to) the right set of restrictions.

Monday, September 26, 2011

The Computationalist approach to Measure

For reference, before reading this post, read the posts on Meaning of Probability in an MWI and Measure of Consciousness versus Probability.

Here, I'll outline the way in which the Computationalist philosophy of mind suggests that it might be possible to calculate the measures of consciousness for various outcomes.

My goal in this is to either derive from the MWI the Born Rule (which allows us to calculate the probabilities in QM), thus providing strong support for the computationalist view of the MWI, or to show that the experimentally discovered Born Rule is inconsistent with these popular views of mind and physics.

Computationalism is basically the idea that a brain gives rise to conscious observations due to its mathematically describable functioning: the motions of its parts together with the laws of physics that restrict and determine those motions.

With the existence of consciousness thus being in principle describable mathematically, we can hypothesize that it might be possible to analyze the mathematical description of a physical system, and based on it determine not only whether consciousness is present, but also to determine that multiple different types of observations are being made in different parts of the system. We can make a reasonable guess as to the nature of those observations.

We can also attempt to determine what the "quantity of consciousness" is for each significantly different type of observation. If we can, then we have a measure distribution, which we can compare to what the Born Rule predicts.
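As a toy illustration of the comparison being proposed (the amplitudes and the naive counting measure are invented; nothing here is a claim about actual physics):

```python
# Compare a candidate measure distribution against Born weights for two
# hypothetical branches of a wavefunction.

amplitudes = {"outcome_A": 0.6, "outcome_B": 0.8}   # hypothetical branch amplitudes

born = {k: a ** 2 for k, a in amplitudes.items()}   # Born weights |a|^2

# A naive "branch counting" measure gives each outcome equal weight:
counting = {k: 1 / len(amplitudes) for k in amplitudes}

assert abs(sum(born.values()) - 1.0) < 1e-9   # amplitudes chosen to normalize
assert born != counting                       # the two distributions disagree
```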

In doing so, it is necessary to work with computations as proxies for the conscious observations which are assumed to accompany them. The nature of the link between computations and conscious observations, while an important topic in its own right, is for the most part not something we need to analyze for this purpose, which is fortunate as understanding it is famously hard.

In comparing the computationalist + MWI prediction to the Born Rule, it would not be a problem if there are very small deviations which would not yet have been detected experimentally. In theory, such a situation could open the door to a straightforward experimental test of the computationalist approach to the MWI, which would be a great thing to have. However, it seems likely that IF small deviations are predicted then any such deviations would be too small to ever measure.

On the other hand, if the measure distribution we get does NOT agree with the Born Rule even approximately, then the computationalist picture of the MWI would be refuted.

Unfortunately, all of this is easier said than done. While the general idea of Computationalism is fairly popular, it has not yet been developed into something precise enough to carry out this program. Needless to say, experiments can be of no help in this regard.

As will be explained in detail in forthcoming posts, I have made proposals as to how to make computationalism precise enough. While it is technically challenging to state necessary and sufficient conditions to be able to say that a given computation is carried out by a given system, I am satisfied that this can be done and that my proposal is at least quite close to the correct way to do so.

Given such conditions, the next thing that we need is a way to determine the accompanying quantity of consciousness. Hopefully, we can avoid dealing with the link between computation and consciousness even here, by determining as a proxy the "quantity of each computation" that is present, and assuming that it is proportional to the quantity of consciousness.

However, the link may not be so easily bypassed, as it turns out that there are different options for determining the "quantity of computation" and naturally the "right" choice would be the one that best suits the application we have in mind, namely consciousness. It is not clear, however, what that one is.

Given the choices that I would tend to pick based upon subjective criteria of what "seems most natural" to me, the application of those choices to the standard physics of the MWI apparently DOES NOT give the Born Rule. This fact is not sufficient for me to declare that the MWI has been disproven; however, it does help motivate a search for MWI proposals that include modifications of physics, as well as further work on how to determine "quantities of computations".

I would like other people to learn of and study these issues. Perhaps, in the future, people who have studied these issues deeply and explored the implications and alternatives in great detail will find that the criteria which happen to give the Born Rule from the unmodified MWI are the same that they prefer for quite different reasons.
