Today I'll come back to an old topic -- my old chum Eliezer Yudkowsky's intriguing yet ill-founded notion of "Coherent Extrapolated Volition" (CEV).
The core idea of CEV is, as Eli put it,
In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.
This is a beautiful concept, but I think it fundamentally doesn't make sense and will never be directly useful (perhaps this is part of its beauty!).
Obviously you should not judge the CEV concept by the above "poetic" gloss, though -- if you're curious, read the whole paper linked above. It's interesting.
In the past I have suggested a few variations like averaging together what everyone on the planet wants, or making a conceptual blend of what everyone on the planet wants. However, these variations do lose a key aspect of the original CEV idea: that it's not peoples' current desires that we're accounting for, but rather the desires of other "better" beings that have been hypothetically created based on current people.
Here I will present a new variant, CEVa (Coherent Extrapolated Valuation), which I believe captures more of the spirit of the original.
The main reason I think the original CEV idea is incoherent is that "what person X wants to be" is not a coherent notion. Quite often, when a person becomes what they (thought they) wanted to be, they realize they didn't want that at all. To talk about "what a person wants to be, deep deep down" as distinct from what they consciously THINK they want to be -- this just wanders into the realm of the unacceptably nebulous, even though I do sorta grok what it means on an intuitive basis.
What I want to do here is try to rescue the original CEV idea by replacing the "what a person wants to be" part with something a bit more concrete (though still not anywhere close to feasible to implement at the present time).
Eliezer has more recently talked less about CEV and more about "Rawlsian reflective equilibrium" as a conceptually related idea that's easier to formulate, or even as a near-equivalent of CEV. See this recent review of CEV and related ideas by Nick Tarleton. But I think the Rawlsian approach lacks the bite of the original CEV, somehow. I'm more inspired to keep pushing on the original CEV to see if it can be made in some sense workable.
Continuity of Self
In a previous paper published in the Journal of Machine Consciousness, I addressed the question: when does a descendant of a certain mind count as a continuation of that mind? For instance, I am a continuation of my 2-year-old self, even though we are very, very different. What if tomorrow I got a brain implant and became 5% machine ... then a year later I became 10% machine ... then in a few decades I was essentially all machine. Suppose that as I got more and more machine in my brain, I became more and more cognitively different. Would I still be "myself" by 2050? In a sense yes, in a sense no.
What I introduced there was a notion of "continuity of self" -- i.e., when a mind M changes into a different mind M', there is the question of whether M' feels it is (and models itself as) the same entity as M. What I suggest is that, if one has a long chain of minds such that each element in the chain has continuity of self with the previous one, then a later entity on the chain should be considered, in a sense, a later version of every earlier entity on the chain.
So if I upgraded my brain with machine parts on a gradual schedule as I suggested above, probably there would be continuity of self all along, and at each stage I would feel like I was continuously growing and evolving (just as I've done over my life so far), even though eventually the changes would accumulate and become tremendous. But if I upgraded 50% of my brain at once, the change might be so sudden and discontinuous that after the upgrade, I really did not feel like myself anymore.
Coherent Extrapolated Valuation: individualized version
Most probably you've seen where I'm going already.
Suppose we consider, for each person in a society at a certain point in time, the set of forward-going paths beginning from that person -- but possessing continuity of self at each step along the way.
Now let's add one more ingredient: Let's ask at each step of the way, whether the change is recognized as desirable. There are two aspects here: desirable in hindsight and desirable in foresight. When mind M changes into mind M', we can ask: if M could see M', would it think the change was for the better ... and we can ask: does M', looking backward, think the change is for the better? How to weight these two aspects of desirability is basically a "parameter choice" in CEVa.
If we can weight each step on a path of mind-evolution as to desirability, then we can also weight a whole path as to desirability, by averaging the desirabilities of the various steps. This requires assuming some time-discounting scheme: nearer-term changes must be weighted higher than further-term changes, according to some sequence of weights with a finite sum. This set of temporal weights is another parameter choice in CEVa.
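To make the two parameter choices concrete, here is a minimal sketch of how step and path desirability might be computed, under purely illustrative assumptions: each mind-change is somehow scored in foresight and in hindsight on [0, 1], and the temporal weights are a geometric series. All function names and parameter values here are my own hypothetical choices, not anything from the CEV literature.

```python
# Hypothetical sketch of CEVa step/path desirability. Assumes each
# mind-change M -> M' can be scored on [0, 1] from both viewpoints;
# how one would actually obtain such scores is the hard open problem.

def step_desirability(foresight: float, hindsight: float,
                      w_foresight: float = 0.5) -> float:
    """Blend M's foresight view of the change M -> M' with M''s
    hindsight view. The blend weight is the first free parameter."""
    return w_foresight * foresight + (1.0 - w_foresight) * hindsight

def path_desirability(steps, gamma: float = 0.9,
                      w_foresight: float = 0.5) -> float:
    """Time-discounted average of step desirabilities along one
    mind-evolution path. `steps` is a list of (foresight, hindsight)
    pairs; gamma < 1 gives temporal weights with a finite sum, so
    nearer-term changes count more than further-term ones."""
    weights = [gamma ** t for t in range(len(steps))]
    scores = [step_desirability(f, h, w_foresight) for f, h in steps]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
```

With these choices, a path whose good steps come early scores higher than one whose good steps come late, which is exactly what the time-discounting assumption buys us.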
Given a person at a particular time, then, we can look at the self-continuing forward-going paths started at that person, and we can weight each of these paths via its desirability.
This gives the first version of CEVa: We can associate with a person, not just their value judgments at the present time, but the value judgments of all the minds existing along self-continuing forward-going mind-evolution paths from their present mind. We can then weight these different minds, and make an overall weighted average of "the judgment of the current person M and all the minds M' they might eventually become, where the latter are weighted by the desirability along the path from M to M' ".
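The aggregate just described can be sketched in a few lines, again under toy assumptions of my own: value judgments are represented as numbers in [0, 1] purely for concreteness, and each mind along each path arrives pre-tagged with the desirability of the path leading from M to it.

```python
# Illustrative-only sketch of the first (individual) CEVa aggregate:
# average the value judgments of the current person M and all minds M'
# along self-continuing forward paths, each weighted by the
# desirability of the path from M to M'.

def ceva_judgment(paths):
    """`paths` is a list of paths; each path is a list of
    (judgment, path_desirability_up_to_here) pairs, one per mind
    along the path, starting with the current person M. Returns the
    desirability-weighted average judgment."""
    total, norm = 0.0, 0.0
    for path in paths:
        for judgment, weight in path:
            total += weight * judgment
            norm += weight
    return total / norm if norm > 0 else 0.0
```

Note that the hypothetical future minds reached along highly desirable paths dominate the average, while minds reached along undesirable paths barely register -- which is the sense in which this tracks "the person M wishes they were" rather than all possible futures equally.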
There are a lot of free parameters here and I certainly don't know how to compute this in practice. However, it seems like a reasonably fair interpretation of Eliezer's original notion of "the person that a certain person wishes they were."
Coherent Extrapolated Valuation: collective version
There is still a gaping flaw in the CEVa version I've just outlined, though: it's too individual-centric. It doesn't really make sense to think about the evolution of human minds as individuals, given the degree of collective experience and collective intelligence in modern humanity.
Instead it probably makes more sense to look at potential futures of a whole SOCIETY of minds. One can then ask, for a society S and then a slightly changed society S': how desirable is the change, from the point of view of S, and also from the point of view of S'?
One can calculate desirability based on individual minds within the society -- but also based on "group intelligences" existing within the society, such as families, corporations or even the whole society considered as a sort of "global brain."
Weighting the desirabilities of individuals versus those of larger groups involves some subtlety in terms of "subtracting off for overlap." Also, identifying what is a coherent enough entity to count in the average may become subtle, especially if we see the emergence of "mindplexes" in which multiple minds fuse together in various partial ways to form mixed individual/collective intelligences. But these complexities are not really bugs in CEVa -- they're just complexities of the actual situation being analyzed.
This "collective" CEVa -- CEVav2 -- is my current suggestion regarding how to transform the original CEV idea into something related that is at least conceptually sound.
Now, one possibility is that when one does CEVa (version 1 or 2) one does not find anything coherent. One may find that some individuals or groups and their self-continuing descendants have values X, and others have values Y, and X and Y are very different. In that case, if one needs to come up with a single coherent value system, one can try to do a conceptual blend and come up with something new and coherent that incorporates key aspects of X and Y and also has other desirable merits like simplicity or various aesthetic qualities.
Ethics is Solved! Woo hoo!!
Ethics now becomes simple! To figure out if you should run in front of that train to save that baby, at risk of your own life -- you merely simulate all possible future evolutions of human society (including those involving transcendence to various transhuman entities), calculate a certain weighting function for each one, and then figure out what each mind at each level of organization in each possible future evolution of society would want you to do regarding the baby. Simple as pie! Ah, and you'd better do the calculation quickly or the baby will get squashed while you're programming your simulator... and then no pie for you ...
Oh yeah -- and there are some further subtleties I swept under the transhuman rug in the above. For instance, what if a trajectory of self-modification results in something without a self, or something that makes judgments about some situations but not others? Does one assume continuity-of-self or not, when dealing with selfless hypothetical future entities and their hypothetical future evolutions? How, quantitatively, does one incorporate "number of judgments" (weight of evidence) into a composite value assessment? But I am reasonably comfortable assuming that a superhuman AGI capable of doing the CEVa calculations will also be capable of handling these matters and the various other loose ends.
No But Really -- So What?
To my own taste, at least, CEVa is a lot clearer conceptually than the original CEV, and meatier than Rawlsian reflective equilibrium and related notions. Perhaps it's less beautiful, in some correlated way, but so it goes....
On the other hand, CEVa does share with the original CEV the trait of not being remotely useful in practice at the present time. We simply have no way to compute this sort of thing.
Furthermore, there are so many free parameters in the definition of CEVa that it seems likely one could tweak it in many different ways to get many different answers to the same question. This is not a bug in CEVa, either -- it would be the case in any reasonably concrete idea in the vicinity of CEV....
If there is any value to this sort of thought-exercise -- aside from its inarguable value as weird-brow entertainment for a small crew of futurist geeks -- it is probably as a way of clarifying conceptually what we actually mean by "desirable" or "valuable" in a future-looking sense. I, for one, genuinely DO want to make choices that my future self-continuing descendants would think are good, not just choices that my current incarnation thinks are good based on its own immediate knowledge and reactions. I don't want to make choices that my current self HATES just because my future evolutions have a very different set of values than my current self -- but very often I'm faced with hard choices between different options that seem confusingly, roughly equally valuable to me... and I would really LOVE to get input from the superminds I will one day give rise to. I have no good way to get such input, alas (despite what Terence McKenna said sometimes, mushrooms are a pretty noisy channel...), but still, the fact that I like this idea says something about how I am thinking about value systems and mind evolution.
I doubt very much we are going to "hard-code" complex ethical systems into future AGIs. Ethics is just not that simple. Rather, we will code in some general principles and processes, and AGI systems will learn ethics via experience and instruction and self-reflection, as intelligent minds in the world must.
HOWEVER -- at very least -- when we guide AGI systems to create their own value systems, we can point them to CEV and CEVa and Rawlsian coherence and the whole mess of other approaches to human ethics ... and who knows, maybe this may help them understand what the heck we mean by "what we want deep down."
Or on the other hand, such notions may end up being no use to the first superhuman AGIs at all -- they may be able to form their own ideas about what humans want deep down via their own examination of the nitty-gritty of human life. They may find our hairy human abstractions less informative than specific data about human behaviors, from which they can then abstract in their own ways.
But hey, providing them with multiple forms of guidance seems more likely to help than to hurt.... And at very least, this stuff is fun to think about! (And if you read the first link above, you will know that Mr. Yudkowsky has warned us against the dangers of things that are fun to think about ... but please rest assured I spend most of my time thinking about more useful but more tedious aspects of AGI ;-p )