This (rather technical) blog post observes that, by using tensorial
linearization of Boolean functions, one can give a novel formulation of the
maximum entropy and maximum entropy production principles.

The relationship of these novel formulations to the
traditional formulations has not been thoroughly explored yet.

For general background on maximum entropy production, see
XX. Maximum entropy is better known and
understood; see XX for the basics.

**Basic Setup**

Suppose we have a world with N specific observations in
it. Suppose each observation occurs at a
certain point in time (where time-points are defined relative to a particular
observer O …)

Now suppose that observer O1 has a coarse-grained view and often
cannot distinguish two different observations from each other. In this case O1 will view observations as
coming with “count” values: n1 observations of type t1, n2 observations of type
t2, etc. (O1 may also be a subset of
O’s mind…). So what O1 sees will be a
certain probability distribution over types t1, t2,….

Next, consider a distribution over “possible worlds”…. Note this can be done two ways:

· Possible worlds as seen by O (who sees the individual observations)

· Possible worlds as seen by O1 (who sees only distributions of counts over types)

For starters, consider the assumption that each observation
is equally likely. Let’s look at the
scope of possible worlds as perceived by O1 in this case. The distribution in which O1
makes N observations and they all go into type t1 can occur in only one way. But the
distribution in which O1 makes N observations and they go into m different categories,
with N/m in each category, can occur in many more ways, since there are many ways to
choose which individual observations land in which category. So
if we ask which distributions occur most often from O1’s point of view,
we conclude that the equiprobable distribution
occurs most often.
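The counting argument above can be made concrete with a small sketch. The function name `ways` and the choice of N = 12 observations over 3 types are illustrative assumptions, not part of the original argument; the count is the standard multinomial coefficient.

```python
from math import factorial

def ways(counts):
    """Number of distinct assignments of individual observations to types
    that produce the given per-type counts (the multinomial coefficient)."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)  # exact at every step for a multinomial
    return total

# Hypothetical example: N = 12 observations, m = 3 types.
print(ways([12, 0, 0]))  # all in type t1 -> 1
print(ways([6, 4, 2]))   # uneven split   -> 13860
print(ways([4, 4, 4]))   # even split     -> 34650
```

The even split is realized by far more fine-grained worlds than any lopsided one, which is the sense in which O1 sees it "most often".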

Looking at O1’s view of the world from O’s point of view, the comparison
comes out similarly.
If we assume that each observation has an equal, independent chance of
being assigned each type, then the number of possible worlds in which O1 puts
all the observations in bin t1 is smaller, and the number in which O1 distributes
the observations evenly among the bins is larger.

The above argument yields the maximum entropy principle,
according to the standard Boltzmann argument.
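For reference, the step from counting to entropy in the standard Boltzmann argument can be sketched as follows. The symbols W, n_i, p_i, m and H are notation introduced here for illustration. With n_i observations of type t_i and N observations in total, Stirling’s approximation gives, for large N:

$$
W(n_1,\dots,n_m) = \frac{N!}{n_1!\,n_2!\cdots n_m!},
\qquad
\ln W \approx -N \sum_{i=1}^{m} p_i \ln p_i = N\,H(p),
\qquad
p_i = \frac{n_i}{N}.
$$

So the count distributions realized by the most fine-grained worlds are those with the largest entropy H(p); with no further constraints, that is the uniform distribution p_i = 1/m.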

**Fun with Tensorial Linearization**

The above is all well-known
stuff, just phrased a little differently.

Now let's make things a bit more
interesting.

Suppose one has a constraint more complex than equiprobable,
independent observations. One can still
ask which distributions of counts over types are more likely. If the constraints are
linear in the type probabilities, then the
answer is still the distribution that maximizes entropy subject to those constraints.
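As a minimal sketch of the linear-constraint case, suppose the only constraint is a fixed mean over numeric type labels (the classic loaded-die example). The maximum entropy solution then has the Gibbs form p_i ∝ exp(λ v_i), and λ can be found by bisection. The function name `maxent_with_mean`, the search bounds and the die example are assumptions made for illustration.

```python
import math

def maxent_with_mean(values, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Maximum-entropy distribution over `values` subject to a fixed mean.

    The solution has the form p_i proportional to exp(lam * v_i); the mean
    is nondecreasing in lam, so lam is located by bisection on [lo, hi].
    """
    def dist(lam):
        exps = [lam * v for v in values]
        top = max(exps)  # subtract the max exponent for numerical stability
        weights = [math.exp(e - top) for e in exps]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam):
        return sum(p * v for p, v in zip(dist(lam), values))

    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)

faces = [1, 2, 3, 4, 5, 6]
print(maxent_with_mean(faces, 3.5))  # unconstrained mean: close to uniform
print(maxent_with_mean(faces, 4.5))  # higher mean: weights tilt toward 6
```

With the mean set to 3.5 the constraint is not binding and the result is essentially uniform; pushing the mean up tilts the distribution exponentially toward the larger values.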

What if the constraints are nonlinear? In its usual form, the maxent principle applies
only with linear constraints. However, using
tensorial linearization, any Boolean function can be written as a linear function on a
very high dimensional space in which every conjunction of variables corresponds
to a dimension. So if one has a set of
Boolean constraints on the observations seen by O (or by O1, as a special case),
then one can reformulate these as linear constraints on a higher dimensional
space. One can then argue that the most
likely distribution O1 should assume is the maximum entropy distribution over
this higher dimensional space whose axes are conjunctions of observations.
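To illustrate the linearization step, here is a minimal sketch that assumes "tensorial linearization" amounts to the standard multilinear expansion of a Boolean function over {0,1} inputs: each subset of variables (a conjunction) gets one coordinate, and the function is a linear combination of those coordinates. The helper names `multilinear_coefficients` and `evaluate` are hypothetical.

```python
from itertools import product

def multilinear_coefficients(f, n):
    """Coefficients c[S] with f(x) = sum_S c[S] * prod_{i in S} x_i on {0,1}^n.

    Uses Moebius inversion over subsets:
    c[S] = sum over T subset of S of (-1)^(|S|-|T|) * f(indicator of T).
    """
    coeffs = {}
    for s in product((0, 1), repeat=n):
        idx = [i for i in range(n) if s[i]]
        total = 0
        for t in product((0, 1), repeat=len(idx)):
            x = [0] * n
            for bit, i in zip(t, idx):
                x[i] = bit
            sign = (-1) ** (len(idx) - sum(t))
            total += sign * int(f(*x))
        if total:
            coeffs[tuple(idx)] = total  # keep only nonzero conjunction weights
    return coeffs

def evaluate(coeffs, x):
    """Evaluate the linear form: each key S is the conjunction of x[i], i in S."""
    return sum(c * all(x[i] for i in S) for S, c in coeffs.items())

xor = lambda a, b: a ^ b
c = multilinear_coefficients(xor, 2)
print(c)  # {(1,): 1, (0,): 1, (0, 1): -2}, i.e. XOR = x0 + x1 - 2*(x0 AND x1)
```

XOR is not linear in x0 and x1 alone, but it is linear once the conjunction "x0 AND x1" is added as its own axis. The price is that n variables can require up to 2^n conjunction coordinates.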

**A New Look at Maximum Entropy of Dynamics**

Now what if we want to apply
this to dynamics?

In this case one has a system S at a certain point in time,
T. The observations in question are observations
of S at the slight future of T (time T plus epsilon, say). The constraints involved are basically the
probabilities of each slight-future observation from the point of view of O, or
O1. Some slight-future observations are
more likely than others based on the dynamics of S.

Now the dynamics of S are going to make the dependencies
between observations fairly complex.
However, if we can express this complexity as a set of probabilities
attached to Boolean combinations of observation types recognized by O1, then we
can do tensorial linearization and obtain a set of linear constraints on a
higher dimensional space. We then get
the result that the system S evolves according to the maximum entropy
distribution, from O1’s point of view (where the entropy is measured with regard
to the higher-dimensional space of conjunctions). I.e., roughly speaking, the various
conjunctions of basic observations are going to be as equally likely as they
can be, while still remaining consistent with the given logical constraints.
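A toy worked example may help make "as equally likely as they can be" concrete. Everything here is an illustrative assumption: two binary observation types with indicators x_a, x_b in {0,1}, and a single constraint fixing the probability c of the conjunction. In the conjunction coordinate x_a x_b the constraint is linear, so the maximum entropy solution has the usual exponential form:

$$
\mathbb{E}[x_a x_b] = c
\;\;\Longrightarrow\;\;
p(x_a, x_b) = \frac{e^{\lambda x_a x_b}}{3 + e^{\lambda}},
\qquad
p(1,1) = c,
\qquad
p(0,0) = p(0,1) = p(1,0) = \frac{1-c}{3},
\qquad
\lambda = \ln\frac{3c}{1-c}.
$$

The conjunction gets exactly the probability the constraint demands, and the remaining mass is spread evenly over the other three joint outcomes. For c = 1/4 the multiplier λ vanishes and the distribution is uniform.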

Or, suppose we apply this argument to paths, i.e. sequences
of states of S occurring over time? If
we coarse-grain paths (as we are doing from O1’s perspective) then paths will
overlap, but we can again use tensorial linearization to account for
overlaps. We can then say that the
evolution of S will follow a maximum entropy distribution over the space of
conjunctions of paths, from O1’s point of view.

The question is then in what sense this actually gives a law
of “maximum entropy production” in a physics sense. Fairly clearly, if the various paths
can be treated as statistically independent, then it works. But if the paths are subtly and significantly
interdependent, then the logical space on which maxent holds may be different
from the physical space in which thermodynamic entropy is measured.

## 1 comment:

Ben, this Boltzmannesque post is a little beyond your usual mundanity and urbanity. There is no way that my latest artificial Mind at http://ai.neocities.org/forthagi.txt can understand your mathematical discourse. I've been waiting for you to post something so that I can visit here and let you know that I have been working furiously to port my http://github.com/PriorArt/AGI/blob/master/ghost.pl Perl AI back into Forth so that the http://ai.neocities.org AGI-let can think continuously and perhaps even consciously. However, your post of today is actually very Goertzelian. Bye - Arthur T.
