Memory-prediction asymmetry

I recently read “Recognizing vs Generating.” The author poses the question, “both reading the textbook and writing proofs feel like they fit the definition ‘studying’, so why can’t the easier one work?” The answer: “Recognition does not necessarily imply understanding, and pushing for something you can Generate can help expose what you actually know.”

This brings to mind the “memory-prediction” framework promoted by Jeff Hawkins. The framework says that recognizing is accomplished by generating: there is a continual dialogue between predictions coming top-down from generative models and observations coming up from the senses. Discrepancies are registered as surprises that leap from the unconscious to the conscious level:

If you have ever missed a step on a flight of stairs, you know how quickly you realize something is wrong. You lower your foot and the moment it “passes through” the anticipated stair tread you know you are in trouble. The foot doesn’t feel anything, but your brain made a prediction and the prediction was not met.

— Jeff Hawkins, On Intelligence, p. 91

At first glance, the framework doesn’t accord well with an asymmetry between recognizing & generating. But the asymmetry can be accommodated by emphasizing the fact that most predictions are non-specific, or ‘fuzzy.’ A fuzzy prediction doesn’t result in surprise if any one of a set of expected outcomes occurs. Hawkins acknowledges this idea: “Prediction is not always exact. Rather, our minds make probabilistic predictions concerning what is about to happen. Sometimes we know exactly what is going to happen, other times our expectations are distributed among several possibilities.” (ibid., p. 92) Hawkins doesn’t make much of this point in the rest of the book, but it seems crucial to me. In particular, it explains the asymmetry between recognition and generation.

To return to the illustration of studying math: subjectively, I feel like I know what is going on when I read the proof, because I can see that each line follows by application of a valid logical rule. (That is, the step is among the set of things consistent with my expectations.) Then, when I am called on to reproduce the step, I am surprised to find that I don’t know how: because my prediction is fuzzy, there are multiple reasonable things to do at each step, and I don’t know exactly which one I should do. If, on the other hand, I know why each particular step in the proof was taken, then I can uniquely predict each one & reproduce the proof.

An aside about the mechanism of probabilistic predictions: it sounds difficult if you imagine that “probabilistic predictions” means calculating a probability distribution over all possible sensory experiences. However, all that is necessary is for the ‘prediction’ to be abstract: the more abstract it is, the larger the set of observations consistent with it, and hence the wider the probability distribution that is implicitly associated with it. It’s not necessary to represent the probability distribution as a blurry activation pattern in the low-level sensory areas of the brain; it is more efficient to activate a single sharp, high-level abstract label, which is functionally equivalent. The brain can then lazily evaluate the degree of surprise (i.e., the improbability) of whatever observation occurs, with respect to the expectation.
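
To make the “lazy evaluation” picture concrete, here is a toy sketch in Python. It is entirely my own illustration, not anything from the book: the function names and the 0.95 probability mass are made up. A prediction is stored intensionally, as a predicate over observations plus the probability mass claimed for the whole predicted set, and the surprisal of an observation is computed only when that observation actually arrives.

```python
import math

# Toy sketch: a prediction stored as an abstract predicate rather than an
# explicit distribution over raw observations. Surprisal is evaluated
# lazily, only for the observation that actually occurs.
# (Illustrative names and numbers; nothing here comes from the book.)

def make_prediction(is_consistent, mass):
    """is_consistent: predicate defining the set of expected outcomes.
    mass: probability assigned to that whole set."""
    def surprisal(observation):
        p = mass if is_consistent(observation) else 1.0 - mass
        return -math.log(p)  # surprise in nats; low p means high surprise
    return surprisal

# A fuzzy prediction ("the next step follows by some valid rule") covers
# a broad set, so most observations produce little surprise...
fuzzy = make_prediction(lambda s: s in {"modus ponens", "substitution", "induction"}, 0.95)
# ...while a sharp prediction ("the next step is substitution") does not.
sharp = make_prediction(lambda s: s == "substitution", 0.95)

print(fuzzy("induction"))  # ~0.05 nats: consistent, barely surprising
print(sharp("induction"))  # ~3.0 nats: outside the predicted set
```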

In this sense, a word is worth a thousand pictures: “brown dog” is consistent with a huge number of images. That phrase may not seem like a probability distribution; it seems pretty specific. From a certain perspective, though, it is a noncommittal distribution over all the possible attributes that an image of a dog could have, and in fact would be required to have in order to be made concrete. I may know a brown dog when I see one, but that doesn’t mean I can draw one, or that the one I’m imagining is much like the one you imagine.

This actually connects to statistical physics. There is a well-defined procedure, the maximum-entropy principle, for constructing an explicit probability distribution when only a high-level abstraction (in this case, the expectation value or ‘moment’ of a function of the random variables) is known and all other details of the distribution are uncertain. I suspect that the brain can accomplish something like this, in much the same way that dogs can do calculus. (TL;DR: of course they don’t, but they can approximate the solution to a problem that one could also pose and formally solve using calculus.)
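
As a concrete sketch of that procedure (a standard maximum-entropy exercise; the target mean of 4.5 is an arbitrary choice of mine, not anything the brain literally computes): suppose all we know about a die is its mean. The least-committal distribution consistent with that single moment has the exponential-family form p(x) ∝ exp(-λx), and one scalar equation pins down λ:

```python
import numpy as np
from scipy.optimize import brentq

# Maximum-entropy sketch: the only thing known about a die is its mean.
# The maxent distribution has the form p(x) ∝ exp(-lam * x), with lam
# chosen so the constraint E[x] = mu holds. (The target mean 4.5 is an
# arbitrary illustrative choice.)

xs = np.arange(1, 7)  # faces of the die

def mean_given_lam(lam):
    w = np.exp(-lam * xs)
    p = w / w.sum()
    return p @ xs

mu = 4.5
# Solve the one-dimensional constraint equation for lam.
lam = brentq(lambda l: mean_given_lam(l) - mu, -10.0, 10.0)

p = np.exp(-lam * xs)
p /= p.sum()
print(np.round(p, 3))  # the least-committal distribution with mean 4.5
print(p @ xs)          # check: ≈ 4.5
```

The point is just that a single abstract constraint determines a whole distribution in a principled way, which is the sense in which a sharp high-level label implicitly carries a wide probability distribution.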

As another aside about math & mental models, I’ve always thought of proofs or derivations as stories: the protagonist is stuck with a problem. Our hero has the axioms and the rules (setup); she wants to get to the result, but doesn’t immediately know how (conflict). Then she remembers a cool trick (like differentiating the integrand) and voilà, problem solved (resolution). I suspect this framing helps with recall. It also puts the focus on why each step is done (the intuition for choosing the step), not just how (i.e., the rule that justifies the step).

2 thoughts on “Memory-prediction asymmetry”

  1. This is a cool extension using the ideas I wrote about.
    Reminds me of the predictive coding stuff Scott Alexander wrote about in reference to Surfing Uncertainty.
    The rest of your blog looks great too!
