Observers and entropy
Entropy is a notoriously difficult concept to understand. There are several great videos explaining it from a purely physical perspective, and plenty of content explaining it from an information perspective. The resources are a bit light on the correspondence between the two, but there is an entry on Wikipedia which does provide some clarification.
One way of looking at it that can perhaps provide some insight is as a reflection of what we know about a system. If we look at the thermodynamic definitions of entropy, there is generally a term of the form $$ dE = A + B + C + T \cdot dS $$ where $$ T \cdot dS $$ represents the change in energy of the system accounted for by a change in entropy, and A, B, or C could represent changes in terms like $$ p \cdot V $$, $$ m \cdot g \cdot h $$, $$ \frac{1}{2} m v^2 $$, etc.
There is also the bottom-up version, which is of the form $$ S = -k \sum_i p_i \ln(p_i) $$. In other words, the (negative of the) weighted sum of the logs of the probabilities of the various states of the system.
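To make the bottom-up formula concrete, here is a minimal sketch (the four-state probability distribution is invented purely for illustration) that evaluates $$ -k \sum_i p_i \ln(p_i) $$ for a discrete set of states:

```python
import math

# Boltzmann's constant in J/K (set k = 1.0 to get entropy in nats instead)
k = 1.380649e-23

def gibbs_entropy(probabilities, k=k):
    """Return S = -k * sum_i p_i * ln(p_i) for a discrete distribution."""
    assert abs(sum(probabilities) - 1.0) < 1e-9, "probabilities must sum to 1"
    return -k * sum(p * math.log(p) for p in probabilities if p > 0)

# A hypothetical four-state system: the more evenly spread the
# probabilities, the higher the entropy.
print(gibbs_entropy([0.25, 0.25, 0.25, 0.25]))  # maximal for 4 states: k * ln(4)
print(gibbs_entropy([0.97, 0.01, 0.01, 0.01]))  # nearly one state: close to 0
```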
What I would argue is that on some level these define a grab-bag of energy in a system that is not accounted for in any other terms. In other words, it’s a reflection of the limits of what we’ve accounted for in our equation.
We can either look at a container full of gas as a system of particles, which is the model Boltzmann uses for his definition of entropy, or we can look at it as a thermodynamic system; that is, as a material with a certain specific heat. Depending on how we look at it, we may arrive at different calculations for the expected entropy. Boltzmann’s model, without corrections for the interactions among molecules (interactions the ideal gas model ignores and the van der Waals model accounts for), becomes very inaccurate very quickly in the real world. [1]
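To make the model-dependence concrete, here is a rough sketch (the gas, amounts, and volumes are invented, and the van der Waals constant is only approximate) comparing the entropy change of the same isothermal expansion computed under the ideal gas model versus the van der Waals model, where the only difference is the excluded-volume correction $$ n \cdot b $$:

```python
import math

R = 8.314  # gas constant, J/(mol*K)

def delta_S_ideal(n, V1, V2):
    """Isothermal entropy change for an ideal gas: dS = nR ln(V2/V1)."""
    return n * R * math.log(V2 / V1)

def delta_S_vdw(n, V1, V2, b):
    """Isothermal entropy change for a van der Waals gas:
    dS = nR ln((V2 - nb)/(V1 - nb)), using (dP/dT)_V = nR/(V - nb)."""
    return n * R * math.log((V2 - n * b) / (V1 - n * b))

n = 1.0                  # mol
V1, V2 = 1e-3, 2e-3      # m^3, doubling the volume
b = 4.3e-5               # m^3/mol, roughly CO2's excluded volume

print(delta_S_ideal(n, V1, V2))   # ~5.76 J/K
print(delta_S_vdw(n, V1, V2, b))  # slightly larger: same gas, different model
```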
Thus entropy isn’t independent of the model we’ve chosen. Similarly, we can view a chemical reaction that increases entropy in light of its chemical entropy alone. We can calculate the entropy of the system as if it were an open system with everything outside the reaction acting as a thermal reservoir, or we can include the air surrounding the experiment, in which case we need to account for the increased or decreased pressure due to temperature. In certain situations the latter could give us a more precise measurement of the rate of reaction; it could even result in different chemical products if the conditions are right. Taking both systems into account gives us a different entropy calculation.
By increasing the number of factors we’re taking into account in our system, we’re increasing the number of macrostates. In the case of the chemistry experiment, this increase in macrostates comes with an increase in microstates as well. However, in the example of the Boltzmann box, the number of microstates stays the same; we’re only increasing the number of macrostates that account for those microstates.
One quantity related to entropy is the number of microstates per macrostate. If, as we expand our system, we increase the number of macrostates faster than we increase the number of microstates, we should see a decrease in expected entropy.
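As a toy illustration of that ratio (coins stand in for particles here, and the two macrostate labelings are arbitrary choices), we can count microstates per macrostate directly and watch the expected entropy drop as we describe the same 256 microstates with more macrostates:

```python
import math
from collections import Counter
from itertools import product

# Toy "gas" of 8 two-state particles: a microstate is one exact configuration.
N = 8
microstates = list(product([0, 1], repeat=N))  # 2^8 = 256 microstates

def expected_entropy(label):
    """Expected Boltzmann entropy (k = 1): the average of ln(Omega_M) over
    macrostates M, where Omega_M counts the microstates labeled M and every
    microstate is taken to be equally probable."""
    counts = Counter(label(m) for m in microstates)
    total = len(microstates)
    return sum((c / total) * math.log(c) for c in counts.values())

def coarse(m):  # macrostate = total number of 1s (9 macrostates)
    return sum(m)

def fine(m):    # macrostate = count of 1s in each half (25 macrostates)
    return (sum(m[:4]), sum(m[4:]))

# Same 256 microstates, but the finer labeling spreads them over more
# macrostates: fewer microstates per macrostate, lower expected entropy.
print(expected_entropy(coarse))  # ~3.8 (in units of k)
print(expected_entropy(fine))    # ~2.7 (in units of k)
```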
What this means is that entropy is always relative to our system. Our system is defined by the variables of the equations we’re using. This means entropy is relative to the equation. It’s an implicit acknowledgement that there is an observer. It’s an encoding of the unknown unknowns given the particular abstraction we’re applying, or the particular decoding mechanism we’re using.
Tying this into information theory: Shannon entropy tells us how much information we can encode with a certain alphabet. It tells us how much information is in a message based on the probabilities of certain signals; that collection of prior probabilities is the alphabet. Just as we’ve discussed, the information content of a message is relative to the alphabet. What we can do is treat the system description we are using as our alphabet. Given this alphabet, or set of equations, we can then receive messages, or make certain observations.
The message with the least “information” would be the equilibrium state of the system, where the only energy available is the internal energy of the components. It’s the most probable message, and the most probable macrostate. As the system moves away from that equilibrium, Shannon entropy tells us each observation gives us more information, and thermodynamics tells us there is more potential energy. Our “alphabet” can tell us how much energy we can extract from the system. It can also tell us how much work we should expect to put in to move the system from macrostate A to macrostate B. The energy of the system which is accounted for in our alphabet is the “free energy”, while the energy contained within the alphabet itself is the “internal energy”. The energy that cannot be accounted for in the alphabet is $$ T \cdot S $$, the entropic energy. It’s the energy we can’t decode. But since it still exists, it pushes us into states that don’t match our expectations, increasing the average surprise we experience.
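Here is a small sketch of that correspondence, using an invented three-macrostate “alphabet”: the surprisal $$ -\log_2(p) $$ of the most probable (equilibrium) macrostate is the lowest, and the average surprisal over the whole alphabet is the Shannon entropy.

```python
import math

# A made-up "alphabet": prior probabilities of observing each macrostate.
alphabet = {"equilibrium": 0.90, "slightly excited": 0.08, "far from equilibrium": 0.02}

def surprisal(p):
    """Information carried by one observation, in bits: -log2(p)."""
    return -math.log2(p)

for state, p in alphabet.items():
    print(f"{state:>22}: {surprisal(p):.2f} bits")

# Shannon entropy = expected surprisal under this alphabet.
H = sum(p * surprisal(p) for p in alphabet.values())
print(f"average information per observation: {H:.2f} bits")
```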
This brings to mind the neural net. When a neural net is trained into a particular configuration, that configuration becomes the “alphabet” by which the net processes new input data. Here we are doing work to construct an alphabet: we explicitly program the network to create an alphabet which decreases, as much as possible, the observed potential energy, or observed “surprise”, based on previous observations.
And this is the thesis of the free energy principle. Ultimately these processes are all the same thing: the subsuming of many microstates into a smaller number of macrostates. It’s the process of abstraction. As a neural net is trained, it is configured into an alphabet with minimum energy, or minimum average new information per observation.
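As a minimal sketch of that idea (this is a toy categorical model trained by stochastic gradient descent, not any particular neural net architecture, and the “world” frequencies are invented), we can watch an alphabet being configured to minimize the average surprise of a stream of observations:

```python
import math, random

random.seed(0)

# A stream of "observations" drawn from an unknown world distribution.
world = ["A"] * 70 + ["B"] * 25 + ["C"] * 5
symbols = ["A", "B", "C"]

# Our "alphabet": a categorical distribution over symbols, parameterized by logits.
logits = {s: 0.0 for s in symbols}

def probs(logits):
    z = sum(math.exp(v) for v in logits.values())
    return {s: math.exp(v) / z for s, v in logits.items()}

lr = 0.05
for step in range(2000):
    obs = random.choice(world)
    p = probs(logits)
    # Gradient of the surprisal -ln p(obs) w.r.t. each logit is (p_s - 1[s == obs]).
    for s in symbols:
        logits[s] -= lr * (p[s] - (1.0 if s == obs else 0.0))

p = probs(logits)
avg_surprise = -sum(pw * math.log2(p[s]) for s, pw in
                    {"A": 0.70, "B": 0.25, "C": 0.05}.items())
print(p)             # close to the world frequencies {A: .70, B: .25, C: .05}
print(avg_surprise)  # approaches the entropy of the world distribution
```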
Moving into more speculative territory…
Rather than simply being ambiguous when discussing the quantities involved, it would be great if we could ship the “alphabet” we’re using along with each “observation.” This would require a way to encode the assumed system configuration, something like a Goedel numbering. Since we can represent both neural nets and equations in a system like Lean, we can conceive of this possibility.
In the creation of some such universal alphabet encoding, an alphabet of alphabets if you will, not all alphabets would have the same length. Some would be shorter than others. If we assume that the size of the number or program is proportional to the number of macrostates available in that configuration, we have a situation where we can calculate an absolute entropy. With 0 macrostates, the whole system is entropic: the energy of the system is equal to $$ T \cdot S $$, so the entropy is $$ \frac{E}{T} $$. When we have as many macrostates as microstates, that would be 0 entropy. In between are our alphabets, many of which might come close to minimizing free energy based on previous observations. A neural net arrives at one of these alphabets.
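As a crude stand-in for that idea (compressed length is only a loose proxy for the true, uncomputable Kolmogorov complexity, and the model strings below are invented), one could compare the encoded sizes of two hypothetical alphabets:

```python
import zlib

# Two hypothetical "alphabets" written out as text: a coarse model and a
# finer one. Their compressed length stands in (very roughly) for the size
# of the program or Goedel number that encodes each system description.
coarse_model = b"ideal gas: PV = nRT; S = -k sum_i p_i ln p_i"
finer_model = (b"van der Waals gas: (P + a n^2/V^2)(V - nb) = nRT; "
               b"a, b fitted per species; S from (dP/dT)_V = nR/(V - nb)")

for name, model in [("coarse", coarse_model), ("finer", finer_model)]:
    size = len(zlib.compress(model, level=9))
    print(f"{name}: {size} compressed bytes")
# Under the proposal above, the longer encoding corresponds to more
# macrostates being accounted for, and hence a lower "absolute" entropy.
```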
However life appears to be simultaneously arriving at many of these alphabets and continuously improving them. Organisms consume the potential energy, processing it via their sensors and metabolic processes. Without those sensors and metabolic processes, they cannot use the information or energy of the system. So the energy is only “free energy” to them if it’s within their alphabet. Each individual organism has a genetic code which represents a particular configuration of sensors and metabolic processes, or an “alphabet” encoded at the genetic level. Over the course of generations, the alphabet which can successfully identify and consume available potential energy generally persists.
Conscious organisms appear to individually try to reduce the amount of surprise of each observation over their lifetimes. They do this by re-arranging their own alphabets and creating new terms in order to more accurately anticipate outcomes. We should expect this process to continue down to the neural level, where a neuron would likely be attempting to minimize surprise. By adjusting the distance between synapses it can create an internal “alphabet”. This would require the cooperation of microtubules to actually increase or decrease the distance between those synapses, so there is likely at least one layer of similar processes going on at a biomolecular level.
By weighting each input, the neuron essentially establishes an alphabet. It weights the input by changing the distance between synapses. This requires construction within the cell of microtubule scaffolding. This means that some process which is connected to that microtubule scaffolding is responsible for minimizing free energy and engaging in this sort of process. In other words, there is likely some biochemical alphabet which is arranged to minimize free energy and directs the building of the scaffolding. Orch O.R. provides a compelling avenue of investigation into possible mechanisms for this. The key is that a collapse happening within the neuron requires that an alphabet exist, which requires some process of decoding messages through it, re-configuring it, etc. But what exactly is an alphabet, and by what processes can it be reconfigured?
These layers also go up. We as groups and societies appear to be following similar thermodynamic trends, which will have to be the topic of another post.
We can extend this metaphor to define a channel capacity between ourselves and the world. If a given message represents a state with a certain amount of potential energy, that means sending a message with information requires energy, so sending repeated messages over time requires power. Similarly receiving messages over time implies an available power, or free energy. However in doing this, it becomes clear that there is another piece of the puzzle. The alphabet may or may not be the correct alphabet.
If we use an inappropriate set of macrostates, what happens? We would be surprised more often than we should be. This would appear as information, but in fact it would not be. We can use error-correction algorithms to identify these erroneous observations, but there is a limit to the amount of information we can receive given a particular alphabet, rate of observation, and rate of accuracy. We should be able to establish a signal-to-noise ratio, and by defining the rate of observation, we should be able to establish a cap on the rate at which free energy can be consumed, or the rate at which learning can occur. Perhaps we could even estimate the energy required to reduce that noise by a certain amount.
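As a rough sketch of that cap (the error rate and observation rate below are invented numbers), we can model each observation as one use of a binary symmetric channel with crossover probability p, whose capacity is $$ 1 - H_2(p) $$ bits per observation; multiplying by the observation rate bounds how fast information, or “free energy” in the sense above, can be taken in:

```python
import math

def h2(p):
    """Binary entropy function H2(p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with crossover probability p."""
    return 1.0 - h2(p)

error_rate = 0.1           # fraction of observations parsed into the wrong macrostate
observations_per_sec = 50  # how often we sample the world

cap_bits_per_obs = bsc_capacity(error_rate)
print(f"{cap_bits_per_obs:.3f} bits per observation")
print(f"{cap_bits_per_obs * observations_per_sec:.1f} bits/sec upper bound on learning")
```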
And moving into even more speculative / crackpot territory….
At each level from life in general, down to the neuron, an alphabet is created, or a code, and the state is identified within that alphabet. An entity which processes the world through this sort of alphabet is an observer. Quantum mechanics has this mysterious concept of an observer. Often people are confused into thinking that this observer must be conscious. Other people say the observer can literally be anything.
It seems to me the reality is that the observer may be us, using the equations we have to model the system. The model is our alphabet. Making an observation means applying our alphabet to the situation: decoding the physical reality with our sensors and equations. When we do this, we force the physical reality to be processed by us one way or another. And the fact is that once we make that observation, it has been changed. We cannot then re-process the same observation with another alphabet unless we’re only applying a subset of our available alphabet. For instance, we could analyze the observations of particles on detectors at CERN. By assigning a numerical value to the perturbations we observe, we are decoding the reality with our alphabet on some level. However, by only using a subset of our alphabet and comparing the results of several sub-alphabets, we can potentially arrive at a model which minimizes the surprise of those numerical values.
It’s my understanding that this is how the “delayed choice quantum eraser experiment” works. You simply delay the parsing of the data into your alphabet, and it is maintained in superposition until the time that you apply that alphabet.
Regardless, if we can encode the model through which an observation is taking place, we can begin to analyze the interplay that the observation has with the observer. Quantum mechanics and general relativity both contain the concept of an observer. We could potentially say that every particle contains an alphabet of the fundamental forces and tries to minimize the free energy by moving to a location with less surprise. We could also potentially say that every physical interaction is mediated through these alphabets. If we can encode the concept of an observer systematically using this concept of an alphabet of alphabets, we could create some system for combining these various relativistic perspectives and allowing them to interact with each other.
Going back to the Goedel numbering thoughts as well: we should be able to analyze these alphabets numerically. We might be able to come up with a numbering system, or a mapping from the Goedel number, which allows the size of the Goedel number to align with the Kolmogorov complexity of the alphabet. We could then potentially define entropy on an absolute scale, by considering a “system” or “alphabet” with 0 Kolmogorov complexity as having 100% entropy.
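One hedged way to write that scale down (the normalization $$ K_{max} $$, the complexity of specifying every microstate outright, is introduced here purely for illustration) would be $$ S_{rel}(A) = 1 - \frac{K(A)}{K_{max}} $$, which gives 100% entropy when the alphabet has zero Kolmogorov complexity and 0 entropy when the alphabet is as complex as a full microstate description.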
If we explore this idea deeply enough, it could allow us to calculate a cost/benefit ratio for planning, observing, and developing new scientific models. It could allow us to make reasonable predictions of how much benefit certain avenues of research might give.
Edit: After some discussion with a reader, it appears there’s a related concept called observational entropy to explore.
[2] “$$ S = k \cdot \log(\Omega_A) $$ — $$ S $$ is called the entropy of the macrostate. It is just another way of measuring the number of microstates that make it up.” (where $$ \Omega $$ is previously defined as the density of states, or the number of microstates per macrostate)