To follow up on my previous post, I am going to talk about symmetries and image recognition. As mentioned, symmetries crop up both in theoretical physics and in machine learning contexts. Quote of the day: “I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description [“hard-core pornography”], and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the motion picture involved in this case is not that.” — Supreme Court Justice Potter Stewart.
My pet cat once found a harmless black garden snake in our lawn. She was intensely curious about it, and kept batting at the poor snake. I picked the cat up and set her in a different spot in the lawn, to let the snake escape. My cat started batting at a plastic shovel handle and a green plastic water hose instead. My first thought: “Silly cat! Can’t she tell the difference between a garden hose and a snake?” My second thought: “Ah-hah, I have learned something about how a cat sees the world. These are things that look alike to a cat. She can see that they are long and round, but she evidently doesn’t make much distinction about the color of the object.” If you don’t believe me, watch this video in which cats are terrified of cucumbers. (You should watch it anyway; it’s hilarious.)
If you have some mathematical background, you may be familiar with the concept of equivalence classes. An equivalence class is a set of things that are “the same” according to some rule (the equivalence relation). They don’t have to be indistinguishable things in general; the rule just tells you what sort of things are alike, and what are not, under that rule. What I learned by observing my cat was that snakes, shovel handles, and water hoses are all in one equivalence class according to the rule “they look alike to a cat.” For the cat, the important property was the shape of the object, and possibly the smooth surface texture. Equivalence relations not only tell you what properties matter, they tell you what properties don’t matter, which is just as important. In the example of the cat, the colors of the snake-like objects were different, but that was an irrelevant detail.
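In code, an equivalence relation can often be represented as a key function: two things are “the same” exactly when they map to the same key, which makes the relation reflexive, symmetric, and transitive for free. Here is a toy sketch of the cat’s relation (the feature names and values are, of course, invented for illustration):

```python
from collections import defaultdict

# Invented "cat's-eye" features for a few objects. The cat's equivalence
# relation ignores color and keeps only shape and texture.
objects = {
    "snake":         {"shape": "long-round", "texture": "smooth", "color": "black"},
    "shovel handle": {"shape": "long-round", "texture": "smooth", "color": "brown"},
    "water hose":    {"shape": "long-round", "texture": "smooth", "color": "green"},
    "pine cone":     {"shape": "ovoid",      "texture": "rough",  "color": "brown"},
}

def cat_key(features):
    """Project onto the properties the cat cares about; color is dropped."""
    return (features["shape"], features["texture"])

# Group objects into equivalence classes: same key, same class.
classes = defaultdict(list)
for name, features in objects.items():
    classes[cat_key(features)].append(name)

for key, members in classes.items():
    print(key, members)
```

The snake, the shovel handle, and the hose all land in the `("long-round", "smooth")` class; the pine cone sits alone in its own class.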
Some bright folks have taken the same approach to machine learning. What they wanted to know was: what things look the same to a state-of-the-art neural network trained to recognize objects from pictures? Have a look at Figure 1 in their preprint, especially the first 8 images. The images appear to be TV static, but a neural network classified them with a high degree of certainty as recognizable objects (fruits and animals mostly). Another group asked a similar question: can we tweak a familiar image in such subtle ways that a neural network no longer recognizes it, but a human cannot see the difference? The answer is yes, and it’s quite easy to do. These images are effectively optical illusions for artificial neural networks. These neural networks are clearly not sensitive to the “right things,” even though they do very well on the training data. They are sensitive to noise (irrelevant detail), when they should instead be sensitive to large-scale features.
What seems to be the problem? The networks are probably over-fitting the training data. Basically, despite seeing thousands of images, the neural nets haven’t been given enough examples to ensure that the only thing that (for instance) all the images in the set of ‘panda’ images have in common is that they have a panda in them. One can imagine several approaches to fixing such a problem.
The obvious approach is to train the network with noise. That is, feed it a sufficiently large number of copies of the same image of the panda bear, but each with a different random noise added. Thus, the learned representation of the panda bear should become noise-insensitive. The trouble with this approach is that it won’t scale. One would also need to show all different panda bears, in all possible poses, in all lighting conditions, different positions in the image, different levels of zoom, etc. Then it needs to be done all over again, but with dogs. This quickly becomes prohibitive. It is also not how humans operate. You don’t have to show a child 1 million labeled images of an elephant in order for the child to avoid confusing elephants with TV static. (We start with a base level of ability to distinguish signals from noise: newborns track faces better than scrambled faces or blanks. We have capabilities of generalization even at 5 months.)
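The noise-augmentation idea is just a few lines of code (a minimal numpy sketch; the “panda image” here is a random array standing in for a real photo):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "panda image": any HxW array of pixel intensities in [0, 1].
image = rng.random((32, 32))

def augment_with_noise(img, n_copies, noise_scale=0.05, rng=rng):
    """Return n_copies of img, each with independent Gaussian noise
    added, clipped back into the valid pixel range."""
    noise = rng.normal(0.0, noise_scale, (n_copies,) + img.shape)
    return np.clip(img[None, :, :] + noise, 0.0, 1.0)

# One labeled image becomes a hundred noisy training examples.
batch = augment_with_noise(image, n_copies=100)
print(batch.shape)  # (100, 32, 32)
```

Notice what this does and doesn’t buy you: the network sees a hundred noisy pandas, but they are all the *same* panda, in the same pose, at the same scale. Every other nuisance transformation still has to be covered by more data.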
A slightly more refined approach to training networks: seek out ‘fooling examples’ and train the network to classify them correctly. This is in the spirit of a great anecdote (see Jaynes’s Probability Theory: The Logic of Science p.4):
In reply to the canonical question from the audience (‘But of course, a mere machine can’t really think, can it?’), he [John von Neumann] said: “You insist that there is something a machine cannot do. If you will tell me precisely what it is that a machine cannot do, then I can always make a machine which will do just that!”
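Seeking out fooling examples is strikingly easy, which is part of why adversarial training is such a natural fix. For a linear model the recipe can be written in a few lines; this is a toy numpy sketch of the “fast gradient sign” idea (the dimensions and weights are invented for illustration, and a real attack would differentiate through a deep network rather than use a hand-built linear score):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear classifier: label = sign(w . x). For a linear model the
# gradient of the score with respect to the input is just w, so the
# worst-case perturbation of size eps per pixel is eps * sign(w).
d = 10_000                     # number of "pixels"
w = rng.normal(size=d)         # made-up trained weights
x = rng.normal(size=d)         # made-up input image

eps = 0.1
score = w @ x

# Nudge every pixel by at most eps, in the direction that pushes the
# score toward the opposite sign.
x_adv = x - eps * np.sign(w) * np.sign(score)

# Each pixel changed imperceptibly, but the score moved by
# eps * sum(|w|), which grows linearly with the dimension -- easily
# enough to flip the label in high dimensions.
print(np.sign(score), np.sign(w @ x_adv))
```

The high-dimensional geometry is the whole trick: a per-pixel change of 0.1 is invisible, but ten thousand aligned changes of 0.1 add up to an enormous shift in the score.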
Another approach is to give the network a head start by designing it in such a way that it obeys the rules (noise insensitivity, translation insensitivity, etc.) that we know it ought to obey. We may not be able to define precisely what constitutes a panda (that is, we can’t write down the equivalence relation), but we at least know what sorts of image manipulations don’t change a panda into a gibbon (moving the panda 5 pixels left, or zooming in on the panda, for instance). A property that is preserved by a remapping operation is called an “invariant.” The set of operations that preserve certain invariants is called a “symmetry group.” The renormalization group is an example of a symmetry group known from theoretical physics that has cropped up in the area of machine learning.
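Convolutional layers are the textbook example of designing a symmetry in: sliding the same filter across the image guarantees that translating the input translates the output (equivariance), and a global pooling step on top makes the final feature fully translation-invariant. A toy 1-D sketch, using circular (wrap-around) padding to keep the bookkeeping trivial:

```python
import numpy as np

rng = np.random.default_rng(2)

def circular_conv(signal, kernel):
    """1-D circular cross-correlation: the building block of a
    convolutional layer, with wrap-around padding for simplicity."""
    n = len(signal)
    return np.array([
        sum(signal[(i + j) % n] * kernel[j] for j in range(len(kernel)))
        for i in range(n)
    ])

signal = rng.random(16)
kernel = rng.random(3)

# Translating the input and then convolving gives the same result as
# convolving and then translating: convolution is translation-equivariant.
shifted_then_conv = circular_conv(np.roll(signal, 5), kernel)
conv_then_shifted = np.roll(circular_conv(signal, kernel), 5)
print(np.allclose(shifted_then_conv, conv_then_shifted))  # True

# A global max-pool on top turns equivariance into full invariance:
# the pooled feature doesn't change when the input is translated.
print(np.isclose(shifted_then_conv.max(),
                 circular_conv(signal, kernel).max()))  # True
```

The symmetry is baked into the architecture, so the network never has to *learn* from examples that a panda shifted 5 pixels left is still a panda.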
The benefit of incorporating symmetries is that they rule out a great number of possible network configurations. Having too many possible network configurations (that is, fitting parameters) and not enough data to narrow them down is exactly how the problem of over-fitting arises. Thus it seems that incorporating symmetries would be an obvious way to improve machine learning performance. The symmetry connection brings in a great deal of work that has already been done in physics to solve these types of mathematical problems. This is the other part of the question I was discussing last week: why do abstruse concepts like the free energy show up in discussions of neural networks? In both areas of research, one is conducting optimal inference subject to the available information (as discussed last time), and that information is invariant under certain symmetries (today’s topic).
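To see how drastically a symmetry prunes the configuration space, compare parameter counts for mapping a small image to a same-sized feature map with and without weight sharing (the sizes here are just illustrative):

```python
# Mapping a 32x32 image to a 32x32 feature map (biases ignored).
h = w = 32

# Fully connected: every output pixel gets its own weight for every
# input pixel -- no symmetry assumed.
fully_connected_params = (h * w) * (h * w)

# Convolutional layer with one 5x5 filter: translation symmetry forces
# every output pixel to reuse the same 25 weights.
conv_params = 5 * 5

print(fully_connected_params, conv_params)  # 1048576 25
```

Over a million free parameters collapse to twenty-five, precisely because the symmetry declares most of those configurations equivalent; that is far fewer knobs for the same amount of data to pin down.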
To round out the discussion, I want to talk briefly about generalization. One of the chief issues of “good old-fashioned AI” (that is, algorithms constructed by experts to solve particular tasks) is that it doesn’t generalize well. The neural networks discussed here are also doing a poor job of generalizing. What does it mean to generalize? Basically, it means extrapolating in such a way that the extrapolations match human expectations. Extrapolation is an ill-posed problem, unless you have some principle to constrain the solution in the region beyond the edge of the training dataset. (For a one-dimensional example, with splines, one might impose constraints on the various derivatives of a function to make the extrapolation smooth.) The remarkable thing is not that neural networks are bad at extrapolating. It is remarkable that people expect them to be able to generalize at all, without some sort of constraints.
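The ill-posedness is easy to demonstrate in one dimension: two polynomial fits that agree almost perfectly on the training interval can disagree wildly just one interval-length outside it (a minimal numpy sketch; the function, degrees, and test point are arbitrary choices):

```python
import numpy as np

# Training data: a smooth function sampled on [0, 1].
x_train = np.linspace(0.0, 1.0, 20)
y_train = np.sin(2 * np.pi * x_train)

# Two least-squares fits that are both good on the training interval...
cubic = np.polynomial.Polynomial.fit(x_train, y_train, deg=3)
nonic = np.polynomial.Polynomial.fit(x_train, y_train, deg=9)

# ...but that diverge from each other once we step outside it.
x_out = 2.0
print(abs(cubic(x_out) - nonic(x_out)) > 1.0)  # True: they disagree badly
```

Nothing in the data distinguishes these two hypotheses; only an extra constraint (a degree limit, a smoothness penalty, a symmetry) can pick one extrapolation over the other, and that is exactly the role symmetries play for neural networks.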