There are some basic questions about the world:
1) Why does anything exist, instead of just nothing?
2) What does exist?
3) Why does that stuff exist instead of other stuff?
4) Given that stuff does exist, why is there consciousness instead of just behavior?
These questions are so basic that it would be nice to know the answers to them before worrying about specifics of our own situation.
Regarding these questions we can make highly counter-intuitive observations:
Suppose that somehow you didn't know that anything exists, and you are asked to guess: Does anything exist? My guess would certainly be "No, it would make a lot more sense if nothing exists. That is not only much simpler, it's also a lot easier to understand how it might be that nothing exists."
OK, but things do exist. So why THIS instead of THAT? We don't know. And not only that: It seems impossible to even imagine any reason why one possible thing would be selected over another. You can't say "it's because of this other thing" (whether the other thing is a law of physics, a god, or whatever) because that doesn't explain anything, it just begs the question of "So why does that other thing exist?" and we are back to the start.
There are basically two schools of thought on consciousness: 1] Dualism: Consciousness is a basic thing, which can not be due to something mathematically describable; or 2] Reductionism: Consciousness is an inevitable consequence of certain mathematically describable things. While I tend to fall into the latter camp, both ideas have counter-intuitive implications which I will not address in this post.
At least we do have ideas to debate about consciousness. Are there even any ideas about possible answers to the other questions, controversial or not?
It turns out that there is an idea which might begin to address them to some extent, called the Everything Hypothesis:
What if everything that possibly could exist does exist?
This would seemingly avoid avoid the apparent paradox of some things existing rather than other things despite there being no possible reason why that could be the case. It is also the simplest possible set of things that could have existed (other than just nothing) due to being fully symmetrical.
However, it doesn't really answer question 3). To really put question 3) to rest we would still need to know "Why does everything exist?" which would also cover question 1).
It turns out that there is an idea that could address that; this idea is sometimes called "radical Platonism" in analogy with certain ideas that Plato had, but it is really a modern idea that was perhaps, although not necessarily inspired by Plato, given a bit of philosophical confidence by his precedent.
The idea is that there is no fundamental physical reality; instead, the fundamental reality is the world of logical and mathematical possibilities (which would thus better be called actualities). Of course, it remains difficult to understand why a logically possible world would automatically be real enough to have real observers inside it; but if that is the way it is, the Everything Hypothesis would have to be true. This can be seen as a 'new version' of either Question 1 or Question 4.
While Max Tegmark is credited with the first paper on the Everything Hypothesis, various other people came up with similar ideas on their own. I was one of them.
There are variations or more limited versions of the Everything Hypothesis based on what is meant by 'Everything'. Tegmark's version of the Everything Hypothesis is explicitly mathematical: namely, that every possible mathematical structure exists. If Dualism is false, then that is equivalent to the full Everything Hypothesis. However, if Dualism is true, then consciousness and laws related to it are other things within the set of Everything.
The next question is: What does the Everything Hypothesis predict?
This can be put into familiar terms for a many-worlds model: What measure distribution on conscious observations does it predict?
At first glance, one might think that a typical observer within the Everything would see a quite random mess. If so, the Everything Hypothesis must be false, since we see reliable laws and a highly ordered universe.
However, taking a computationalist view of mind, only certain mathematical structures would support observers: those that have the equivalent of time evolution with respect to some parameter (or some suitable substitute), and have reliable laws for such dynamics. We should therefore consider the set of such dynamical structures with all possible dynamical laws. The starting state may be completely random, but the state will no longer be so random once time evolution occurs.
While we don't know how deal with the set of all possible dynamical systems, there is one subset that is much easier to handle: Turing machines, which are digital computers. A universal Turing machine can run the equivalent of any digital computer program.
So we can look at a simple universal Turing machine and consider the set of all programs for it. Such a program is just an infinitely long symbol string, which I'll call a bit string for simplicity. The machine has a 'head' that move along the string and changes bits and a few value values internal to the head.
Most programs have only a finite number of bits that actually do anything (the active region), because the head is likely to be instructed either to halt or to enter an infinite loop. The number of programs that are the same within the active region but different in the region of bits that don't do anything decreases exponentially with the number of bits in the active region, because each bit brought into the active region is a bit that can't be randomly varied in the inactive region.
Therefore, shorter and simpler programs have a higher measure (more copies) than longer programs do. The typical laws of worlds simulated by these programs are therefore likely to be as simple as possible, consistent with the requirement that observers exist within them. Perhaps, then, our own world is such a world.
That is an impressive result! It would certainly be interesting - though probably it will always be beyond our capabilities - to know exactly what would be typical for observers in the set of all Turing machine program runs to see.
However, there's a big problem here for the Everything Hypothesis: there are infinitely many possible Turing machines and digital computers in general. We can pick one, but that contradicts the fact that radical Platonism must have no arbitrary choices - no free parameters - if it is to explain why the world is the way it is. So why not just pick all of them and weight them equally? The problem there is that since there are infinitely many, the order in which we list them makes a difference to the result we get when trying to get the measure distribution. It's a very small difference for 'subjectively reasonable' choices of these parameters, but that's not the point; ANY arbitrariness completely ruins the explanatory power of radical Platonism.
Besides, what about continuous systems? Some people are content to assume that the fact that there appears to be no natural measure for them means we don't need to include them in the Everything even if we are Platonists. However, it seems to me that they are just as much legitimate candidates for existence as digital systems.
Reluctantly, I am forced to conclude that - unless there is some way of overcoming these mathematical problems that we don't know about, which seems unlikely - Platonism does not provide the explanation we were looking for. This is a paradox, because it also seems that Platonism is the only thing that even could have been a real explanation for why things are the way they are.
With the Everything Hypothesis we are still left with a 'new version' of Questions 1-2: NOT "What does exist, and why?" but "What is the measure distribution, and why?" This is a change, at least! Perhaps it is a more tractable question, but for now it is still a question which demands explanation for which we have none.
I still think that the set of all things which exist is probably very simple in some sense and that the physics we see is just a small part of it. We see the part of it that we see due to being fairly typical observers.
Perhaps the right way to derive the Born Rule of quantum mechanics would be to start with something like the set of all possible Turing machine programs and derive from it the measure distribution on conscious observations, but obviously, such a project would be all but impossible to carry out in practice. My work will focus instead on studying the consequences of the standard physics equations (which are based in large part on experimental observations). However, my criteria for implementation of a computation are general and do not depend on the assumption that the underlying system is a physical one, so in principle, they should apply even to underlying systems that are Platonic Turing machines.
Thursday, October 6, 2011
Tuesday, October 4, 2011
Restrictions on mappings 1: Independence and Inheritance
Previous: The Putnam-Searle-Chalmers Theorem
Chalmers proposed that a computation state be represented by a string of numbers, where each part of the string is "independent"; he proposed that independence can be achieved if each part of the string depends on a different part of the underlying system. He called this a Combinatorial State Automaton (CSA), because the state is given not by a single number but by the combination of the different parts of the string. The CSA avoids the false "clock and dial" implementations discussed in the previous post.
However, as Chalmers noted, there are still false implementations that can be found for the CSA. Suppose that each part of the underlying system which is mapped to part of the string at the current time step contains all of the relevant information about every other relevant part of the system at the previous time step.
In that case, a mapping can be made in which - although each part of the string depends only on its own part of the underlying system - a sequence of values can be assigned to each part of the string in such a way as to mimic any desired computation: enough information is available for the value assigned to that part to depend on any desired combination of the values of the parts of the string at the previous time step.
The amount of information that must be stored would grow exponentially with time, which leads to systems that quickly become physically unrealistic after a few time steps, but the fact that the problem exists in principle is enough to show that a different restriction on mappings is needed.
CSSA:
An implementation criterion will be given for a formal system consisting of a structured set of variables (also called substates), and a transition rule. Such a system will be called a Combinatorial Structured State Automaton (CSSA). It differs from Chalmers' CSA in that the structure of the state is not restricted to that of a string of values. For example, the substates could be arranged into a rectangular matrix, or a higher-dimensional array, and this arrangement and its specification would be considered part of the specification of the CSA.
One reason for including such structure is to allow label inheritance (explained below) from an underlying computation to another computation. While the ultimate underlying system is assumed to be the physical universe (or whatever underlies that!), the definitions must work in general so that the underlying system can be anything that can be described mathematically, because the criteria for implementation of a computation should not depend on experimentally discovered facts of physics but rather on pure mathematics.
It will also be useful in analyzing systems to allow computations to implement other computations. For example, if an object implements a "Windows machine" computation, then we only need to check variables on the level of that computation (rather than looking at fundamental systems variables such as the wavefunction's behavior in terms of electron positions) to see if it also implements the "Firefox" computation. In this case the "Windows machine" is now treated as the underlying system. Many of the simple examples I will use will involve a simple underlying digital system.
The criteria given below are a modification of the ones given in my MCI paper, and will be incorporated into a revised paper.
Independence:
The basic rule for independence is that it must not be possible to determine from knowledge of the physical states that are mapped to the value of the current substate and of the system dynamics
- the values of any of those substates (at the previous time step) that determined what the value of a given substate is at the current time step, except when the current value itself provides enough information; or
- the values of other substates (at the same time step), except when the current value itself provides enough information.
This rules out clock and dial mappings, because the clock and dial physical state which any substate depends on would reveal all formal values of the other substates and the previous substates. It also rules out the other false implementations discussed above in which the variables record information that would reveal the values of the previous states that determine them.
Inheritance:
However, it is sensible to allow cases in which interacting formal substates share the underlying variables that they depend on. This is especially important with quantum mechanics (see below).
In these cases, independence is instead established with the help of “inheritance” of labels. Underlying label indices are inherited when a substate depends on the values of that index, associating the substate with that index.
Knowledge of the variables that determine the other substates - modulo permutations (or in other words, limited knowledge that would not be affected by such swapping) of the labeling indexes that are inherited by the given substate - must not reveal the value that the given substate has (at the same time step).
The restriction that knowledge of the underlying variables that determine the substate must not reveal the values of the substates at previous time step that determine its value (with exceptions as the basic rule for inheritance above) still applies as well, but modulo permutations of the inherited indexes for the respective substates.
An example helps show what is involved here:
Bits on a 2-d grid plus time --> (mapped to) 2 integers as functions of time
b(i,j,t) --> I(t), J(t)
The computation would then involve the dynamics of I and J, such as
I(t+1) = f(I(t),J(t))
J(t+1) = g(I(t),J(t))
where f and g are appropriate functions.
Each bit takes on the value 0 or 1. If the number of nonzero bits is not equal to 1, this mapping will be undefined (there is no need for the mapping to cover all possible states of the underlying system).
If only one bit is nonzero at a given time, then let
I(t) = the value of i at which b(i,j,t) is 1
J(t) = the value of j at which b(i,j,t) is 1
If we were to swap around the values of the i-index, it would affect the value of I(t) but not of J(t), and similarly for the j-index affecting J(t) but not I(t). Therefore the conditions are met so that I(t) “inherits” the i label and J(t) the j label.
To verify independence - even though both I and J depend on all bits on the grid in the sense that only one nonzero bit is allowed - knowing the full state of the 2-d grid modulo a permutation of the j-index (that is, knowing it only up to a possible swapping around of the j-index values) reveals the value of I(t), but not J(t), and vice versa. Thus they are independent.
In this example, an underlying system which consists of a function ON a 2-dimensional grid was reduced to a system of two variables, which together define a single point IN a 2-d space. This can be summarized as “on --> in”.
In cases where the underlying system has other variables, e.g. the k index in b(i,j,k,t), a mapping might not be symmetric with respect to the other variables. For example, we could have
X(t) = b(1,1,1,t)+ b(1,2,2,t) - b(2,1,3,t) - b(2,2,4,t)
Y(t) = b(1,1,1,t)+ b(2,1,3,t) - b(1,2,2,t) - b(2,2,4,t)
In this case, are X and Y independent, with X inheriting from i and Y from j? If two values of j (say j=1 and j=2) are swapped as indexes of b(), then the value of X could change: b(1,1,1,t) + b(1,2,2,t) is not b(1,2,1,t) + b(1,1,2,t). On the other hand, if we only had a limited knowledge of b() and that knowledge is symmetric with respect to such swaps, we wouldn't know that. This mapping is OK. It is perhaps better to think of it as if the terms in X(t) are labeled by different i-values and j-values (the values the mapping is supposed to inherit from) while ignoring their functional dependence on k (which is not being inherited by anything).
The concept of inheritance can be generalized to cases in which more than one underlying variable is responsible for the distinction being inherited. For example, here X inherits from i, but Y is to inherit from j and k:
X(t) = b(1,1,1,t)+ b(1,2,1,t) - b(2,1,1,t) - b(2,1,2,t)
Y(t) = b(1,1,1,t)+ b(2,1,1,t) - b(1,2,1,t) - b(2,1,2,t)
Inheritance can also occur with multiple levels of functionals. For example, suppose the underlying variables are a functional g of function f(i,j), e.g. if i and j are bits, here g=g(f(0,0),f(0,1),f(1,0),f(1,1)). An intermediate mapping could be made with substates that inherit from f and are functions on (i,j). This can then be mapped to a pair of substates X,Y that inherit from i and j respectively.
In quantum mechanics, the wavefunction is a function ON a high dimensional configuration space. In classical mechanics, the set of particle positions defines a point IN configuration space. It certainly seems that, in the classical limit, the computations that would performed by the related classical system (such as a system of classical switches) would in fact be performed by the actual quantum system. Inheritance of labels - from quantum directions in configuration space to classical position variables - permits that. Quantum field theory, which is even more of a realistic model, involves functional dependency of the wavefunctional on a function of space.
Union:
A substate can be mapped based on a region that includes a number of subregions each of which is associated with certain label values. If each such subregion meets a disinheritance criterion, those certain label values can be considered separately for determining disinheritance. This rule is particularly useful when dealing with Quantum Field Theory.
For example, suppose that several bits are to be mapped from psi(F1,F2,F3). In a straightforward inheritance scenario, perhaps one such bit, bit A, depends on how psi is distributed among values of F1, with F2 disinherited (perhaps summed over) and a particular value of F3 is chosen for the mapping. Another bit, bit B, might then disinherit F1 but similarly depend on the F2 distribution and use the same F3 value. Recall that disinheriting F1 means that swapping the values of psi evaluated at different F1 values with the other F's held fixed has no effect on B. Since A and B disinherit different variables, they are independent.
The Union rule means that we can choose a second F3 value and with it held fixed let A (besides for what it depended on before with the first F3 value still fixed for that) also depend on the distribution of psi on F2 (but not F1) with the new F3 value, while B also depends on the distribution of psi on F1 (but not F2) with the new F3 value, and A and B are still considered independent.
This sort of thing lets two substates in QFT depend on the same areas of space as long as the configuration of the environment distinguishes them, since particle identity is not available to do so. For example, given a row of evenly spaced switches, for certain combinations of states the row might be translated so as to replace one switch with the next; this should not ruin the computation.
Chalmers proposed that a computation state be represented by a string of numbers, where each part of the string is "independent"; he proposed that independence can be achieved if each part of the string depends on a different part of the underlying system. He called this a Combinatorial State Automaton (CSA), because the state is given not by a single number but by the combination of the different parts of the string. The CSA avoids the false "clock and dial" implementations discussed in the previous post.
However, as Chalmers noted, there are still false implementations that can be found for the CSA. Suppose that each part of the underlying system which is mapped to part of the string at the current time step contains all of the relevant information about every other relevant part of the system at the previous time step.
In that case, a mapping can be made in which - although each part of the string depends only on its own part of the underlying system - a sequence of values can be assigned to each part of the string in such a way as to mimic any desired computation: enough information is available for the value assigned to that part to depend on any desired combination of the values of the parts of the string at the previous time step.
The amount of information that must be stored would grow exponentially with time, which leads to systems that quickly become physically unrealistic after a few time steps, but the fact that the problem exists in principle is enough to show that a different restriction on mappings is needed.
CSSA:
An implementation criterion will be given for a formal system consisting of a structured set of variables (also called substates), and a transition rule. Such a system will be called a Combinatorial Structured State Automaton (CSSA). It differs from Chalmers' CSA in that the structure of the state is not restricted to that of a string of values. For example, the substates could be arranged into a rectangular matrix, or a higher-dimensional array, and this arrangement and its specification would be considered part of the specification of the CSA.
One reason for including such structure is to allow label inheritance (explained below) from an underlying computation to another computation. While the ultimate underlying system is assumed to be the physical universe (or whatever underlies that!), the definitions must work in general so that the underlying system can be anything that can be described mathematically, because the criteria for implementation of a computation should not depend on experimentally discovered facts of physics but rather on pure mathematics.
It will also be useful in analyzing systems to allow computations to implement other computations. For example, if an object implements a "Windows machine" computation, then we only need to check variables on the level of that computation (rather than looking at fundamental systems variables such as the wavefunction's behavior in terms of electron positions) to see if it also implements the "Firefox" computation. In this case the "Windows machine" is now treated as the underlying system. Many of the simple examples I will use will involve a simple underlying digital system.
The criteria given below are a modification of the ones given in my MCI paper, and will be incorporated into a revised paper.
Independence:
The basic rule for independence is that it must not be possible to determine from knowledge of the physical states that are mapped to the value of the current substate and of the system dynamics
- the values of any of those substates (at the previous time step) that determined what the value of a given substate is at the current time step, except when the current value itself provides enough information; or
- the values of other substates (at the same time step), except when the current value itself provides enough information.
This rules out clock and dial mappings, because the clock and dial physical state which any substate depends on would reveal all formal values of the other substates and the previous substates. It also rules out the other false implementations discussed above in which the variables record information that would reveal the values of the previous states that determine them.
Inheritance:
However, it is sensible to allow cases in which interacting formal substates share the underlying variables that they depend on. This is especially important with quantum mechanics (see below).
In these cases, independence is instead established with the help of “inheritance” of labels. Underlying label indices are inherited when a substate depends on the values of that index, associating the substate with that index.
Knowledge of the variables that determine the other substates - modulo permutations (or in other words, limited knowledge that would not be affected by such swapping) of the labeling indexes that are inherited by the given substate - must not reveal the value that the given substate has (at the same time step).
The restriction that knowledge of the underlying variables that determine the substate must not reveal the values of the substates at previous time step that determine its value (with exceptions as the basic rule for inheritance above) still applies as well, but modulo permutations of the inherited indexes for the respective substates.
An example helps show what is involved here:
Bits on a 2-d grid plus time --> (mapped to) 2 integers as functions of time
b(i,j,t) --> I(t), J(t)
The computation would then involve the dynamics of I and J, such as
I(t+1) = f(I(t),J(t))
J(t+1) = g(I(t),J(t))
where f and g are appropriate functions.
Each bit takes on the value 0 or 1. If the number of nonzero bits is not equal to 1, this mapping will be undefined (there is no need for the mapping to cover all possible states of the underlying system).
If only one bit is nonzero at a given time, then let
I(t) = the value of i at which b(i,j,t) is 1
J(t) = the value of j at which b(i,j,t) is 1
If we were to swap around the values of the i-index, it would affect the value of I(t) but not of J(t), and similarly for the j-index affecting J(t) but not I(t). Therefore the conditions are met so that I(t) “inherits” the i label and J(t) the j label.
To verify independence - even though both I and J depend on all bits on the grid in the sense that only one nonzero bit is allowed - knowing the full state of the 2-d grid modulo a permutation of the j-index (that is, knowing it only up to a possible swapping around of the j-index values) reveals the value of I(t), but not J(t), and vice versa. Thus they are independent.
In this example, an underlying system which consists of a function ON a 2-dimensional grid was reduced to a system of two variables, which together define a single point IN a 2-d space. This can be summarized as “on --> in”.
In cases where the underlying system has other variables, e.g. the k index in b(i,j,k,t), a mapping might not be symmetric with respect to the other variables. For example, we could have
X(t) = b(1,1,1,t)+ b(1,2,2,t) - b(2,1,3,t) - b(2,2,4,t)
Y(t) = b(1,1,1,t)+ b(2,1,3,t) - b(1,2,2,t) - b(2,2,4,t)
In this case, are X and Y independent, with X inheriting from i and Y from j? If two values of j (say j=1 and j=2) are swapped as indexes of b(), then the value of X could change: b(1,1,1,t) + b(1,2,2,t) is not b(1,2,1,t) + b(1,1,2,t). On the other hand, if we only had a limited knowledge of b() and that knowledge is symmetric with respect to such swaps, we wouldn't know that. This mapping is OK. It is perhaps better to think of it as if the terms in X(t) are labeled by different i-values and j-values (the values the mapping is supposed to inherit from) while ignoring their functional dependence on k (which is not being inherited by anything).
The concept of inheritance can be generalized to cases in which more than one underlying variable is responsible for the distinction being inherited. For example, here X inherits from i, but Y is to inherit from j and k:
X(t) = b(1,1,1,t)+ b(1,2,1,t) - b(2,1,1,t) - b(2,1,2,t)
Y(t) = b(1,1,1,t)+ b(2,1,1,t) - b(1,2,1,t) - b(2,1,2,t)
Inheritance can also occur with multiple levels of functionals. For example, suppose the underlying variables are a functional g of function f(i,j), e.g. if i and j are bits, here g=g(f(0,0),f(0,1),f(1,0),f(1,1)). An intermediate mapping could be made with substates that inherit from f and are functions on (i,j). This can then be mapped to a pair of substates X,Y that inherit from i and j respectively.
In quantum mechanics, the wavefunction is a function ON a high dimensional configuration space. In classical mechanics, the set of particle positions defines a point IN configuration space. It certainly seems that, in the classical limit, the computations that would performed by the related classical system (such as a system of classical switches) would in fact be performed by the actual quantum system. Inheritance of labels - from quantum directions in configuration space to classical position variables - permits that. Quantum field theory, which is even more of a realistic model, involves functional dependency of the wavefunctional on a function of space.
Union:
A substate can be mapped based on a region that includes a number of subregions each of which is associated with certain label values. If each such subregion meets a disinheritance criterion, those certain label values can be considered separately for determining disinheritance. This rule is particularly useful when dealing with Quantum Field Theory.
For example, suppose that several bits are to be mapped from psi(F1,F2,F3). In a straightforward inheritance scenario, perhaps one such bit, bit A, depends on how psi is distributed among values of F1, with F2 disinherited (perhaps summed over) and a particular value of F3 is chosen for the mapping. Another bit, bit B, might then disinherit F1 but similarly depend on the F2 distribution and use the same F3 value. Recall that disinheriting F1 means that swapping the values of psi evaluated at different F1 values with the other F's held fixed has no effect on B. Since A and B disinherit different variables, they are independent.
The Union rule means that we can choose a second F3 value and with it held fixed let A (besides for what it depended on before with the first F3 value still fixed for that) also depend on the distribution of psi on F2 (but not F1) with the new F3 value, while B also depends on the distribution of psi on F1 (but not F2) with the new F3 value, and A and B are still considered independent.
This sort of thing lets two substates in QFT depend on the same areas of space as long as the configuration of the environment distinguishes them, since particle identity is not available to do so. For example, given a row of evenly spaced switches, for certain combinations of states the row might be translated so as to replace one switch with the next; this should not ruin the computation.
Monday, October 3, 2011
The Putnam-Searle-Chalmers Theorem
Previous: Basic idea of an implementation
If a computation is implemented, there must be a mapping from states of the underlying system to formal states of the computation, and the states must have the correct behavior (transition rule) as they change over time.
For example, for an electronic digital computer, we can map a high-voltage state of a transistor lead to a "1" and a low voltage state to a "0". By doing the same for various other circuit elements, we can obtain an ordered string of 1's and 0's. This string will change over time as the voltages change. If, due to the laws of physics and the way the circuit is connected, this string must always change in accordance with the transition rules for computation C, then the system implements C.
We must allow as much flexibility as possible in the choice of mapping, because we are trying to understand the behavior of the system without any reference to things outside the system. Human convenience in recognizing the mapping is not a consideration.
The obvious way to try do this is simply to allow any mapping that is mathematically possible. This leads to what I will call the naive implementation criterion, because while it may sound good at first it is not a viable option for a satisfactory criterion. Chalmers' paper Does a Rock Implement Every Finite-State Automaton? explained this in detail.
The following is my version of what I'll call the "Putnam-Searle-Chalmers (PSC) Theorem" which shows that unrestricted mappings are not a viable option:
Suppose that a system consists of two parts, S and T, each of which has a numerical value, and that the dynamics of our system are as follows:
S(t+1) = S(t)
T(t+1) = T(t) + 1
These dynamics are fairly trivial; S is a dial that maintains a constant setting, while T is a clock.
We will check to see if this system implements a computation, C, which has the transition rule X(t+1) = F(X(t)) where t is a time index.
Here X need not be a single number; it might be a string of bits, for example.
F could be a complicated function, such as F(X) = the Xth prime number (expressed in base 2, where X is an integer expressed in base 2).
Now make a mapping M going from (S,T) to X with the following properties:
X = M(S,T)
M(S,T+1) = F(M(S,T))
We do need to make sure that any possible starting value of X is allowed by the mapping, which we can always do if our system has enough possible dial values.
Now, according to the mapping, X will change as a function of time based on the dynamics of the system as follows:
X(t+1) = M(S(t+1),T(t+1))
= M(S(t),T(t)+1)
= F(M(S(t),T(t)))
= F(X(t))
Therefore, the system would implement the computation if this mapping is allowed. But the system's dynamics are trivial while the computation can be a very complicated function. Obviously, this is not acceptable; the computation does not characterize the behavior of the system at all. This is what I call a "false implementation". All of the complicated dynamics of the computation have been put into the mapping. It is therefore necessary to put restrictions on what mappings are allowed.
One possibility is to require that each part of a string that defines a formal computational state (e.g. each bit in a bit string) takes its value based on a different part of the underlying system. That is basically what Chalmers proposed to overcome the problem.
While it somewhat counter-intuitively rules out distinctions based on the value of a single number (since a single number can still be mapped to any other single number), it goes a long way towards ruling out false implementations, while still allowing important standard examples of implementations that we want to retain, such as mapping switch positions to bit values (assuming classical physics).
However, it is not quite right. There are some systems for which it still allows false implementations, and there are other cases where it rules out what seem to be legitimate implementations - and these become clearly important when quantum mechanics is brought into the picture. In classical mechanics, different particles are different parts of the system; in quantum mechanics, different particles are different directions in the shared configuration space on which the wavefunction evolves.
Next: Restrictions on mappings 1: Independence and Inheritance
If a computation is implemented, there must be a mapping from states of the underlying system to formal states of the computation, and the states must have the correct behavior (transition rule) as they change over time.
For example, for an electronic digital computer, we can map a high-voltage state of a transistor lead to a "1" and a low voltage state to a "0". By doing the same for various other circuit elements, we can obtain an ordered string of 1's and 0's. This string will change over time as the voltages change. If, due to the laws of physics and the way the circuit is connected, this string must always change in accordance with the transition rules for computation C, then the system implements C.
We must allow as much flexibility as possible in the choice of mapping, because we are trying to understand the behavior of the system without any reference to things outside the system. Human convenience in recognizing the mapping is not a consideration.
The obvious way to try do this is simply to allow any mapping that is mathematically possible. This leads to what I will call the naive implementation criterion, because while it may sound good at first it is not a viable option for a satisfactory criterion. Chalmers' paper Does a Rock Implement Every Finite-State Automaton? explained this in detail.
The following is my version of what I'll call the "Putnam-Searle-Chalmers (PSC) Theorem" which shows that unrestricted mappings are not a viable option:
Suppose that a system consists of two parts, S and T, each of which has a numerical value, and that the dynamics of our system are as follows:
S(t+1) = S(t)
T(t+1) = T(t) + 1
These dynamics are fairly trivial; S is a dial that maintains a constant setting, while T is a clock.
We will check to see if this system implements a computation, C, which has the transition rule X(t+1) = F(X(t)) where t is a time index.
Here X need not be a single number; it might be a string of bits, for example.
F could be a complicated function, such as F(X) = the Xth prime number (expressed in base 2, where X is an integer expressed in base 2).
Now make a mapping M going from (S,T) to X with the following properties:
X = M(S,T)
M(S,T+1) = F(M(S,T))
We do need to make sure that any possible starting value of X is allowed by the mapping, which we can always do if our system has enough possible dial values.
Now, according to the mapping, X will change as a function of time based on the dynamics of the system as follows:
X(t+1) = M(S(t+1),T(t+1))
= M(S(t),T(t)+1)
= F(M(S(t),T(t)))
= F(X(t))
Therefore, the system would implement the computation if this mapping is allowed. But the system's dynamics are trivial while the computation can be a very complicated function. Obviously, this is not acceptable; the computation does not characterize the behavior of the system at all. This is what I call a "false implementation". All of the complicated dynamics of the computation have been put into the mapping. It is therefore necessary to put restrictions on what mappings are allowed.
One possibility is to require that each part of a string that defines a formal computational state (e.g. each bit in a bit string) takes its value based on a different part of the underlying system. That is basically what Chalmers proposed to overcome the problem.
While it somewhat counter-intuitively rules out distinctions based on the value of a single number (since a single number can still be mapped to any other single number), it goes a long way towards ruling out false implementations, while still allowing important standard examples of implementations that we want to retain, such as mapping switch positions to bit values (assuming classical physics).
However, it is not quite right. There are some systems for which it still allows false implementations, and there are other cases where it rules out what seem to be legitimate implementations - and these become clearly important when quantum mechanics is brought into the picture. In classical mechanics, different particles are different parts of the system; in quantum mechanics, different particles are different directions in the shared configuration space on which the wavefunction evolves.
Next: Restrictions on mappings 1: Independence and Inheritance
Saturday, October 1, 2011
Basic idea of an implementation
In conceptual terms, to say that a system implements a given computation is to say that something about that system - some aspect of its behavior - is described by the computation. It is thus a way of characterizing the system.
For example, if a system can take two numbers as inputs and produces their sum as an output, we could characterize it as an "adder". If instead it can take two numbers as inputs and produces their product as an output, we could characterize it as a "multiplier". "Adding" and "multiplying" are two different _types_ of computations. These behaviors are fairly general, so they can be can be further divided into specific computations if more details are specified, such as what format the inputs are in, internal functioning during intermediate steps, and so on.
Two "multipliers" with different intermediate steps would implement different computations. Thus, while the word "computation" suggests something that is done to get a result, that is misleading in the context of the computationalist view of consciousness, which instead focuses on internal functioning of the system in question.
If we find a third system and discover that it behaves like an "adder", then it has a lot in common with our other "adder": we now know a lot about how it can behave. But there's also a lot that this characterization does NOT tell us. It doesn't tell us what the system is made out of; or what its other behaviors might be; or what internal processes it uses to perform the additions. It also does not tell us whether it performs the addition multiple times, redundantly, perhaps performing the same addition again but sending that (same) result to some other person.
In addition to its capabilities, a system's behavior is further characterized by the exact sequence of computer states that are actually involved in its behavior. For example, an adder which added the numbers 45 and 66 and got 111 is characterized by those numbers, in addition to just being an adder. Such a sequence of numbers (or more precisely the sequence that includes all intermediate states of the computation) can be called the "activity" of the computation, and together with the computation it specifies what is known as a run of the computation.
It is important to note that neither the computation alone nor the activity alone are enough to adequately describe the behavior of the system; both are needed.
Often, I will speak only of computations, but it should be understood by the context that "runs of computations" are often meant as well. For example, if I say that a particular computation gives rise to a particular conscious observation, I mean that a particular run of that computation - corresponding to a particular starting state - gives rise to that observation. A different run of the same computation (such as the same type of brain except in a different starting state) might then give rise to a different kind of observation or to none at all.
In the case of programmable systems, the distinction between computations and runs becomes somewhat arbitrary, but this is not a problem as we can always specify what computations we are interested in and use that to make the choice.
A system could be both an adder and a multiplier; for example, the universe is both if at least one of each exists as a sub-system of it. Yet a hypothetical universe with 100 adders and 1 multiplier would be significantly different from one with 1 adder and 100 multipliers. A more detailed way of characterizing such systems would be to state how many instances of each kind of computation is performed by it: in other words, to give a measure distribution on computations. Such a measure distribution (or more exactly, a measure distribution on runs of computations) is the main tool needed to evaluate the computationalist version of the Many Worlds Interpretation of QM, and will be addressed in later posts.
While artificial computers tend to operate on command, there is nothing about the concept of implementing computations that requires that. For example, a robot that walks around and listens for numbers, then once it has heard two of them, adds them and says the result, then starts again, would still be an "adder". Such a robot would behave according to its internal agenda, ignoring the desires of the people around it even if they beg it to stop.
It's also important to note "outputs" need not be distinguished from internal parts of the system. Any parts of the system that produce the behavior in question are the relevant parts, regardless of how they interact with things outside the system.
For a computation with more than one step, it is useful to define "inputs" as substates that are affected by influences outside the scope of the computation in such a way that their values are not determined by the previous state of the computation. Influences outside the scope of the computation are not necessarily outside of the system (such as the universe) which performs that computation.
Those familiar with computer science might be surprised that the concept of the Turing Machine will play no role in using computations to characterize systems. A Turing Machine is a type of computer that is useful for specifying which functions could be calculated by in principle by digital computers (if unlimited memory - an infinite number of substates - is available, but the transition rule for each substate depends on a finite number of substates) and how easily (setting bounds for how memory needs and processing time scale with the size of a problem).
Because this class of computers is so ubiquitous in computer science, those computer scientists who dare venture into the swamps of philosophy far enough to read a paper or attend a lecture on the implementation problem sometimes completely lose interest in the problem when they realize that no one is mentioning Turing Machines. In fact, it would be quite easy to describe Turing Machines as a special case of the computers that I will describe.
But here we are concerned with characterizing the behavior of actual systems as they exist, not with finding what size of problems they could be programmed to handle. It is also important to note that digital computing is just a special case of the behaviors that could be characterized as computation; analog computing can certainly be considered as well, and most of the definitions I will make are general enough to cover all cases. However, I will focus on digital computation in most of the examples I look at.
The next step is to formalize the idea of implementation by giving a mathematical criterion for whether a computation is implemented by a given mathematically described system. This will be done by requiring a mapping from states of the underlying system to formal states of the computation, and requiring the correct behavior of the states as they change over time.
However, as will be seen in the next post, this approach quickly runs into a problem: without restrictions on allowed mappings, 'almost any' physical system would seem to implement 'almost any' computation. This absurd result would imply that a rock implements the same computations as a brain. The likely solution is to impose restrictions on the allowed mappings, but finding a fully satisfactory set of restrictions has proven to be a difficult task. My proposals for this will be presented in subsequent posts; I think that I have been successful in finding (at least close to) the right set of restrictions.
For example, if a system can take two numbers as inputs and produces their sum as an output, we could characterize it as an "adder". If instead it can take two numbers as inputs and produces their product as an output, we could characterize it as a "multiplier". "Adding" and "multiplying" are two different _types_ of computations. These behaviors are fairly general, so they can be can be further divided into specific computations if more details are specified, such as what format the inputs are in, internal functioning during intermediate steps, and so on.
Two "multipliers" with different intermediate steps would implement different computations. Thus, while the word "computation" suggests something that is done to get a result, that is misleading in the context of the computationalist view of consciousness, which instead focuses on internal functioning of the system in question.
If we find a third system and discover that it behaves like an "adder", then it has a lot in common with our other "adder": we now know a lot about how it can behave. But there's also a lot that this characterization does NOT tell us. It doesn't tell us what the system is made out of; or what its other behaviors might be; or what internal processes it uses to perform the additions. It also does not tell us whether it performs the addition multiple times, redundantly, perhaps performing the same addition again but sending that (same) result to some other person.
In addition to its capabilities, a system's behavior is further characterized by the exact sequence of computer states that are actually involved in its behavior. For example, an adder which added the numbers 45 and 66 and got 111 is characterized by those numbers, in addition to just being an adder. Such a sequence of numbers (or more precisely the sequence that includes all intermediate states of the computation) can be called the "activity" of the computation, and together with the computation it specifies what is known as a run of the computation.
It is important to note that neither the computation alone nor the activity alone are enough to adequately describe the behavior of the system; both are needed.
Often, I will speak only of computations, but it should be understood by the context that "runs of computations" are often meant as well. For example, if I say that a particular computation gives rise to a particular conscious observation, I mean that a particular run of that computation - corresponding to a particular starting state - gives rise to that observation. A different run of the same computation (such as the same type of brain except in a different starting state) might then give rise to a different kind of observation or to none at all.
In the case of programmable systems, the distinction between computations and runs becomes somewhat arbitrary, but this is not a problem as we can always specify what computations we are interested in and use that to make the choice.
A system could be both an adder and a multiplier; for example, the universe is both if at least one of each exists as a sub-system of it. Yet a hypothetical universe with 100 adders and 1 multiplier would be significantly different from one with 1 adder and 100 multipliers. A more detailed way of characterizing such systems would be to state how many instances of each kind of computation is performed by it: in other words, to give a measure distribution on computations. Such a measure distribution (or more exactly, a measure distribution on runs of computations) is the main tool needed to evaluate the computationalist version of the Many Worlds Interpretation of QM, and will be addressed in later posts.
While artificial computers tend to operate on command, there is nothing about the concept of implementing computations that requires that. For example, a robot that walks around and listens for numbers, then once it has heard two of them, adds them and says the result, then starts again, would still be an "adder". Such a robot would behave according to its internal agenda, ignoring the desires of the people around it even if they beg it to stop.
It's also important to note "outputs" need not be distinguished from internal parts of the system. Any parts of the system that produce the behavior in question are the relevant parts, regardless of how they interact with things outside the system.
For a computation with more than one step, it is useful to define "inputs" as substates that are affected by influences outside the scope of the computation in such a way that their values are not determined by the previous state of the computation. Influences outside the scope of the computation are not necessarily outside of the system (such as the universe) which performs that computation.
Those familiar with computer science might be surprised that the concept of the Turing Machine will play no role in using computations to characterize systems. A Turing Machine is a type of computer that is useful for specifying which functions could be calculated by in principle by digital computers (if unlimited memory - an infinite number of substates - is available, but the transition rule for each substate depends on a finite number of substates) and how easily (setting bounds for how memory needs and processing time scale with the size of a problem).
Because this class of computers is so ubiquitous in computer science, those computer scientists who dare venture into the swamps of philosophy far enough to read a paper or attend a lecture on the implementation problem sometimes completely lose interest in the problem when they realize that no one is mentioning Turing Machines. In fact, it would be quite easy to describe Turing Machines as a special case of the computers that I will describe.
But here we are concerned with characterizing the behavior of actual systems as they exist, not with finding what size of problems they could be programmed to handle. It is also important to note that digital computing is just a special case of the behaviors that could be characterized as computation; analog computing can certainly be considered as well, and most of the definitions I will make are general enough to cover all cases. However, I will focus on digital computation in most of the examples I look at.
The next step is to formalize the idea of implementation by giving a mathematical criterion for whether a computation is implemented by a given mathematically described system. This will be done by requiring a mapping from states of the underlying system to formal states of the computation, and requiring the correct behavior of the states as they change over time.
However, as will be seen in the next post, this approach quickly runs into a problem: without restrictions on allowed mappings, 'almost any' physical system would seem to implement 'almost any' computation. This absurd result would imply that a rock implements the same computations as a brain. The likely solution is to impose restrictions on the allowed mappings, but finding a fully satisfactory set of restrictions has proven to be a difficult task. My proposals for this will be presented in subsequent posts; I think that I have been successful in finding (at least close to) the right set of restrictions.
Subscribe to:
Posts (Atom)
Featured Post
Why MWI?
Before getting into the details of the problems facing the Many-Worlds Interpretation (MWI), it's a good idea to explain why I believe t...