Friday, February 19, 2016

Counterfactual Computations or Generalized Causal Networks?

When determining whether a physical system implements a particular computation, if only the actual sequence of states is considered, rather than the whole set of counterfactual states that the computation could have been in, it is not possible to distinguish a valid implementation from a false "implementation" by a system that is much too simple to implement the computation of interest.

For example, consider a set of clocks, c_1 through c_N. The initial state of the clocks can be mapped to any desired initial bitstring, and at each time step thereafter, the state of each clock can be mapped to whatever the corresponding bit would be according to any desired computation on bitstrings that change in time according to a given set of rules. There is no problem here with the first criterion for independence: each bit depends on a different physical variable. Even the second criterion for basic independence could also be satisfied by using a new set of clocks or dials at each time step, instead of the same clocks.

Requiring the correct counterfactual transitions rules out such false implementations: since each clock is unaffected by the others, a hypothetical change in the state of one clock at the initial time step would not result in states at the next time step that are correct according to the transition rules for the computation. This was Chalmers' first move in his formulation of the combinatorial-state automaton (CSA) to rule out the false implementations discussed by Searle and Putnam, and in many discussions of that work it is the only thing considered worth mentioning.

Counterfactual relationships are thus considered key to ruling out false implementations, but relying on them introduces a new problem for computationalists: It seems implausible that parts of a system which are never called upon to actually do anything would have any effect on consciousness, yet such parts can determine the counterfactual relationships between system components.

For example, consider a NAND (Not-AND) gate. Sufficiently many NAND gates can be connected to each other in such a way as to perform any desired Boolean computation (neglecting for now any further structure in the computation), and historically, ordinary digital computers were sometimes constructed solely out of NAND gates, as they tended to be cheap.

The NAND gate takes input bits A,B and outputs a bit C. If A=0 or B=0, then C=1, and otherwise (meaning A=1 and B=1) C=0.
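For concreteness, here is a minimal Python sketch of the NAND truth table, together with the standard constructions of NOT, AND, and OR from NAND that underlie the universality claim above (the function names are my own):

def nand(a, b):
    # Output 0 only when both inputs are 1.
    return 0 if (a == 1 and b == 1) else 1

def NOT(a):    return nand(a, a)
def AND(a, b): return NOT(nand(a, b))
def OR(a, b):  return nand(NOT(a), NOT(b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", nand(a, b))   # 0 0 -> 1, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0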

Such a NAND gate can be implemented as follows: Daphne, Fred, Shaggy, and Velma are all recruited to help. Each is assigned a two-bit code: 00 for Daphne, 01 for Fred, 10 for Shaggy, and 11 for Velma. The values of A and B are announced in order while the recruits get a snack. Meanwhile, a coin is placed face down on a table, signifying C=0.

During the meal, a monster shows up and chases the recruits, but they manage to escape. Shaggy is the last to get away, though he is separated from the others. He smokes weed to calm himself down.

After the meal, the recruit whose code matches the announced values has the job of making sure the coin is correctly positioned for the next time step: if C is supposed to equal 0, it should be face down, and if C is supposed to be 1, it should be face up.

The actual numbers were 1 and 1, so Velma returns to do her job and verifies that the coin is face down. Seeing that the coin is already face down, she leaves it alone, just as she does with the other coins on other tables in the room. Was this a valid implementation of the NAND gate? That depends on what would have happened in the counterfactual situations of other values for A and B. For example, if Shaggy had forgotten to do his job (for the counterfactual inputs A=1, B=0), it would not have been a valid implementation. Yet we could use such NAND gates (though very slowly) to run any desired artificial intelligence program.

In pseudo-code, this NAND gate works as follows, where for example C(0) means the coin state at time 0:

LET C(0) = 0
SELECT CASE A,B
CASE 0,0: Daphne sets C(1)=1
CASE 0,1: Fred sets C(1)=1
CASE 1,0: Shaggy sets C(1)=1
CASE 1,1: Velma verifies C(0)=0 (which implies C(1)=0) or sets C(1)=0 otherwise
END SELECT
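For what it's worth, here is a runnable Python rendering of the same protocol; the function name and the way the recruits are folded into a single dispatch are my own choices:

def coin_gate(a, b, c0=0):
    # One time step of the coin protocol; returns C(1) given A, B, and the
    # initial coin state C(0).
    if (a, b) == (1, 1):
        # Velma's case: she verifies the coin is face down (c0 == 0) and
        # would turn it face down otherwise; either way C(1) = 0.
        return 0
    # Daphne (0,0), Fred (0,1), or Shaggy (1,0) turns the coin face up.
    return 1

# The protocol reproduces the NAND truth table:
assert [coin_gate(a, b) for a, b in ((0,0), (0,1), (1,0), (1,1))] == [1, 1, 1, 0]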

Since Velma doesn't change anything, her role can be eliminated while preserving the proper transition rule, and it would still be a NAND gate. But that is also troubling: In the actual situation, no coin was directly influenced by things resulting from the values of A,B. It is merely that the other recruits refrain from influencing the coin in question, just as they do the other coins. It may be that the other coins are being used by other groups of recruits for their own NAND gates. Some of the other coins may be initialized to face up instead, and by coincidence, perhaps none of the coins will ever need to be flipped.

Even more troubling is the following: suppose that Shaggy would have remembered his job, but in order to get back to the coin room in time, he would have needed to climb a rope hidden in a dusty room. If the rope were strong enough, he would have made it, but in fact the rope is too weak. No one realizes this, and he never actually enters the dusty room or even sees the rope. Yet the state of that rope determines whether or not this was a valid NAND gate implementation, and by extension, such things could apparently determine whether our AI is conscious or not.

It would seem that the rope should not matter in the actual situation, but as we have seen, such counterfactual situations are key to ruling out false implementations and can't be ignored. Note that the mere fact that the implementation is baroque isn't the problem; our own cells do operate like miniature Rube Goldberg machines. The problem is that the rope never actually played a role in what happened, yet its strength determined the success of the implementation. Similarly, perhaps Daphne would have found a dusty old Turing machine and run a program on it, doing her job only after the program halted. In that case, the halting problem for a program that was never run determines the validity of the NAND gate implementation!

Maudlin ("Computation and Consciousness") gave a similar example as an attack on computationalism: he described a water-based Turing machine which follows a predetermined sequence for one set of initial conditions, but calls upon different machinery if the state differs from the expected one. He points out that the link to the other machinery may be broken, and if so, the computation is not implemented. However, it seems implausible that the other machinery could matter in the case in which it is not used.

In a similar vein, Muhlestein ("Counterfactuals, Computation, and Consciousness") discussed a light-based cellular automaton whose counterfactual sensitivity can be overridden by projecting the correct pattern and sequence of lights onto its elements. He concluded that computationalism must be false, since it is implausible that the projection makes a difference: the pattern of which elements are lit remains the same, as does the operation of each component device.

Such things do seem implausible, but I must also note that there is no logical contradiction in them, and a case could be made that seemingly inactive components are doing more than one might think: they propagate forward in time, have well-defined transition rules, and refrain from changing the values of the states of interest in cases where they should not. It is possible to retain the role of counterfactual situations, as I have described, for determining what computations are implemented, and that is the standard approach among computationalists.

Nevertheless, the above implications of computationalism are bizarre and perhaps too absurd to accept, and if an approach can be formulated that avoids them, it would be more plausible. Computationalist philosophy of mind is by no means firmly established enough to dismiss such concerns; on the contrary, it is hard to see how anything can give rise to consciousness, whether computational or otherwise.

In the above example, there are two problems:

1) With the actual values A=1, B=1, the computation could be implemented in a way that did not seem to require these variables to be the _cause_ of the output being what it was (since Velma could be removed without any change in the output), and so the system seems to have the wrong causal structure.

2) For other values of A,B, it seems like the exact complicated events that _would have_ transpired in those cases should be irrelevant to whatever consciousness the system would give rise to.

To address these problems, I want to first require that in the actual situation A=1,B=1 are the _cause_ of C(1)=0 in a sense to be defined. If that is so, then a network of such causal relationships among variables may be enough for consciousness, without requiring the correct counterfactual behavior of the whole system for other values of A,B.

But that seems problematic, because causation is defined in terms of counterfactual behavior! A cause is typically defined as a "but-for" cause: the value of A causes the value of C(1) if there is a change in the value of A that _would have_ resulted in a change in the value of C(1). Therefore, in order to establish causation, we still need to know what would have occurred in the counterfactual situations.
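In code, a but-for test is just a rerun with the candidate cause flipped; here is a sketch using the coin_gate function from the earlier snippet (the helper name is mine):

def but_for_cause(f, inputs, var):
    # True if flipping the bit `var` would have changed f's output.
    actual = f(**inputs)
    counterfactual = f(**dict(inputs, **{var: 1 - inputs[var]}))
    return counterfactual != actual

# A=1 is a but-for cause of C(1)=0, but only because we know the
# counterfactual behavior of coin_gate for other inputs:
assert but_for_cause(coin_gate, {"a": 1, "b": 1}, "a")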

Intuitively, it seems like there should be a way to establish causation without knowing what would have occurred in the counterfactual cases. But what if the output's value is totally unrelated to A,B? That possibility needs to be ruled out, or else the system could be rather trivial and certainly not the sort of system that should give rise to consciousness.

Consider each CASE of A,B values as a different channel of potential influence on C(1). The counterfactual channels need to be selectively blocked if we are to establish that the actual channel was a cause of the value of the output, without having to look at the full counterfactual behavior of the system. If they are blocked, and changing A,B results in no change in the output, then there is no causation between A,B and the output; if the output would have changed, then there is such causation.

The channels could be selectively blocked if the mapping were augmented in such a way that the pseudo-code could be described as follows:

LET C(0) = 0
SELECT CASE A,B
CASE 0,0: IF D THEN Daphne sets C(1)=1
CASE 0,1: IF F THEN Fred sets C(1)=1
CASE 1,0: IF S THEN Shaggy sets C(1)=1
CASE 1,1: IF V THEN Velma verifies C(0)=0 (which implies C(1)=0) or sets C(1)=0 otherwise
END SELECT

If D,F,S were (counterfactually) set to FALSE, then the counterfactual channels would be blocked.
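As a sketch, the augmented mapping can be modeled by gating each recruit's channel on a Boolean flag (the function name, argument names, and defaults are my own):

def gated_gate(a, b, d=True, f=True, s=True, v=True, c0=0):
    # Coin protocol with each recruit's channel gated by a flag.
    if (a, b) == (0, 0) and d: return 1   # Daphne
    if (a, b) == (0, 1) and f: return 1   # Fred
    if (a, b) == (1, 0) and s: return 1   # Shaggy
    if (a, b) == (1, 1) and v: return 0   # Velma verifies or sets C(1)=0
    return c0                             # nobody touches the coin

# With the counterfactual channels blocked (D=F=S=FALSE) and C(0)=0,
# flipping A,B away from 1,1 leaves the output at 0 anyway:
assert gated_gate(1, 1, d=False, f=False, s=False) == 0
assert gated_gate(0, 1, d=False, f=False, s=False) == 0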

Now if V is TRUE, is there causation between A,B and C(1)? Since the coin would have been face down anyway, it would appear not.

But there is a difference between this coin and all of the other coins: If the coin had been face up, it would be changed by Velma to face down. This difference can be exploited by allowing consideration of the counterfactual case in which C(0)=1. The system implements something along the lines of: IF [(V=TRUE AND A=1 AND B=1) OR C(0)=0] THEN C(1)=0. In situations of this type, I will say that A,B are "generalized causes" of C(1)=0.
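Continuing the sketch, the generalized mapping IF [(V=TRUE AND A=1 AND B=1) OR C(0)=0] THEN C(1)=0 can be checked by also varying the initial coin state:

# With C(0)=1, the blocked system is again sensitive to A,B:
assert gated_gate(1, 1, d=False, f=False, s=False, c0=1) == 0  # Velma flips it down
assert gated_gate(0, 1, d=False, f=False, s=False, c0=1) == 1  # nobody touches it

# More generally, C(1)=0 exactly when (A=1 AND B=1) OR C(0)=0:
for a in (0, 1):
    for b in (0, 1):
        for c0 in (0, 1):
            out = gated_gate(a, b, d=False, f=False, s=False, c0=c0)
            assert (out == 0) == ((a == 1 and b == 1) or c0 == 0)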

Note that if V is FALSE in the actual situation, then even if D,F,S are TRUE so that the NAND gate is implemented, the generalized causal link between A,B and C(1)=0 is missing, so this system would NOT play the same role as part of a Generalized Causal Network (GCN).

Also, if it is impossible for C(0) to be anything other than 0, then the generalized causal link between A,B and C(1)=0 is likewise missing.

What if the underlying physical system is too simple to allow this kind of augmented mapping with D,F,S variables? If so, then I don't think the problem arises in the first place: A,B are causes of C=0 if the NAND gate is implemented by a system which is too simple to involve complicated chains of counterfactual events.

Another interesting example of a GCN is a "computer with a straight-jacket". This works as follows: with the actual initial conditions, the computer is allowed to run normally. However, it is being watched. If the state of the computer at any given time differs from the expected sequence, it will be altered to match the expected state; otherwise the watcher won't touch it. Could this system implement a conscious AI, if the computer would have done so had it not been watched? Since the computer is never actually touched, it would seem that it could, but it does not have the right counterfactual behavior for the computation, due to the effect of the potentially interfering watcher. It does, however, have the right GC network, because removing or blocking the watcher is analogous to setting D,F,S to FALSE in the above example. It should be noted, though, that there are other ways to deal with the "computer with a straight-jacket", such as noting that at each time step it implements a computation 'closely related' to the original one, which can be enough for the conscious AI; the watcher is treated as input in this case.
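A sketch of the straight-jacket (the step function and expected sequence below are toy stand-ins of my own):

def run_watched(step, x0, expected):
    # Run the computer from x0; the watcher overwrites any state that
    # deviates from the expected sequence, and otherwise never touches it.
    x, history = x0, [x0]
    for t in range(1, len(expected)):
        x = step(x)
        if x != expected[t]:   # intervention happens only on deviation
            x = expected[t]
        history.append(x)
    return history

step = lambda x: (x + 1) % 8              # toy "computer"
expected = [0, 1, 2, 3, 4, 5]             # its unperturbed trajectory
assert run_watched(step, 0, expected) == expected   # watcher never intervenes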

In the case of Muhlestein's cellular automaton, the projection of a spot of light onto a cell (call it P=TRUE) is analogous to the command LET C(1)=0 being placed after the END SELECT. This ruins the NAND gate, as well as the causal link from A=1,B=1 to C(1)=0. However, it is much like a combination of a "straight-jacket" and the initialization C(0)=0.

The underlying system appears to implement something analogous to: IF (A=1 AND B=1) OR P=TRUE THEN C(1)=0. In this generalized mapping, there is sensitivity to A,B, so I'd say they count as generalized causes in this case; in other words, the cellular automaton with the projected lights would still be conscious if the original one would have been.
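A sketch of that generalized mapping, with P modeling the projected light (naming mine):

def projected_cell(a, b, p):
    # Generalized mapping: C(1)=0 when (A=1 AND B=1) or the light is projected.
    return 0 if ((a == 1 and b == 1) or p) else 1

# With the projection counterfactually removed, the output is still
# sensitive to A,B, so A=1,B=1 remain generalized causes of C(1)=0:
assert projected_cell(1, 1, p=False) == 0
assert projected_cell(0, 1, p=False) == 1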

Klein gave an account of Dispositional Implementation, based on 'episodic' implementation of computations, to solve the problem of the implausibility of relying on inert machinery, which he dubbed the Superfluous Structure Problem (SSP). My solution in terms of GCNs is essentially the same as his solution in terms of dispositions, except that it is better defined. Bartlett criticized Klein's solution on the grounds that it conflicts with the 'activity thesis' (which Bartlett found plausible) that only physical 'activity' matters; as a result, Bartlett thought that Klein's solution was really just computationalism. Klein's idea does conflict with the 'activity thesis', since it also brings in dispositions, which ultimately rely on physical laws rather than just physical structure. The 'activity thesis' ought to be discarded; to me it was never plausible in the least. I read Klein as someone who actually rejects the standard 'activity thesis' for the right reasons, yet uses variant language in which he relies on his own modified 'activity thesis'. Perhaps if Klein had instead written in terms of GCNs, Bartlett would have better understood the idea as being distinct from both the 'activity thesis' and standard computationalism.

Is the GCN approach a form of computationalism? It is not a form of standard computationalism, because a system in which standard NAND gates are all replaced by "gates" which produce the same output on all inputs could still implement the same GCN as the original system. In the above example, that would involve Daphne, Fred, and Shaggy all deciding in advance to place the coin face down instead of face up as they were supposed to, while in the actual situation they are never called upon to attend to the coin, since Velma's numbers were called instead.

However, if we consider the entire spectrum of computations implemented by the whole system - in other words, we consider not just the original mapping but also the generalized one - then we have enough information to know what GCNs are implemented or not implemented. GCNs are simply a way of characterizing the structure and function of a dynamical system, just like computations are. In that sense, I would say that it is a generalized computationalism, which evades the SSP while being philosophically the same as computationalism in all other ways that matter. I will not make a distinction between 'generalized computationalism' and computationalism unless the technical difference is relevant in a particular case.

So far I have discussed discrete computations. What about analog, continuous computations, such as those implemented by systems described by coupled differential equations (e.g. the equations of fluid mechanics)? In that case the generalization is as follows: instead of looking at differential equations that tell us what the system would do with any initial condition, we need only concern ourselves with the differential equations that hold for the actual situation. If those are the same for two systems, and the initial conditions are the same, then the two systems implement the same generalized computation, even if they would have behaved differently from each other given a different set of initial conditions.
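A toy illustration (my own choice of equations, with simple Euler integration to keep it self-contained): two systems whose governing laws differ off the actual trajectory, but which obey the same equation along it, evolve identically from the actual initial condition.

def system_1(x):
    return -x                        # dx/dt = -x everywhere

def system_2(x):
    return -x if x > 0 else x        # agrees with system_1 only for x > 0

def euler(f, x0, dt=1e-3, steps=5000):
    # Integrate dx/dt = f(x) from x0 with fixed-step Euler.
    x = x0
    for _ in range(steps):
        x += dt * f(x)
    return x

# From x(0) = 1 the trajectory stays positive, so the actual evolutions
# coincide even though the two laws disagree for x < 0:
assert euler(system_1, 1.0) == euler(system_2, 1.0)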

7 comments:

  1. Dear Jack, you have been gone for a while now. Are you still developing the constitutionalism approach?

  2. Computationalism, I assume you mean. I would like to continue work on it, and plan to do so, but it has not been a priority in my life. Anyone willing and able to pick up the torch is welcome to do so.

    1. What other approaches to deriving the Born rule from the MWI look promising? Even ones that add new physics?

    2. And what about the many-minds interpretation of QM? Can the Born rule and probability be defined from that?
      Mike

  3. On the first question, this page is still relevant:
    http://onqm.blogspot.com/2009/10/mwi-proposals-that-include.html

    You could say that computationalism is a version of many-minds. Effective probabilities can be defined, and might be consistent with the Born rule, but the Born rule has not been derived.

    1. Are there other approaches that are not included in the link you gave?
      Mike

  4. In terms of other approaches to deriving the Born rule from the MWI that look promising: none that I know of.

