Using geometric concepts to describe mental imagery
A post by Thomas Naselaris
For most people, it feels right to talk about mental imagery as a kind of visual experience. However, few experience mental imagery as being exactly like seeing, and describing the ways in which mental imagery differs from seeing is challenging. It is a bit like trying to explain the style of a visual artist whose works only you have ever seen. If it were easy to convey a visual style in words, there would probably be nothing special about it.
One of the goals of empirical research on mental imagery should be to develop and name concepts that explain, rigorously and measurably, what makes mental images so different from seen ones, while making clear the connection between seeing and imagining.
An important first stab at this is the characterization of mental imagery as “weak vision” (Pearson et al., 2015). This characterization is certainly consistent with my subjective experience of imagery, which is weaker than my experience of seeing, and it is consistent with objective measurements of brain activity, which is indeed much weaker during imagery than during vision (Breedlove et al., 2018).
Although useful, “weak” is an underspecified description of the kind of visual experience that mental imagery is. Interpreted as a property of seen images, “weakness” might be something like dimness or blurriness. But my mental images are somehow worse than even the dimmest or blurriest seen images. At the same time, my mental images can also be much more informative than a dimmed seen image. Regardless of how “mentally dim” it might be, my mental image of “Betty” by Gerhard Richter (a favorite) retains the painting’s radiance, which would vanish from a seen depiction of the painting if one were to literally dim it via digital manipulation.
“Weak” also implies that seen images can yield the experience of mental imagery if they are weakened (e.g., if their luminance or contrast is reduced, or if they are obscured by noise). However, I am skeptical that there is any operation on pixel values that can transform an image that is seen into one that is experienced as a mental image. Outside of controlled experimental settings, weak seen images (e.g., the back of one’s eyelids in a dark room) are rarely confused for mental images. Moreover, mental images are rarely confused for weak seen images.
So, I think that “weakness” only begins to explain the differences between seen and mental imagery. Recently, my lab has begun to consider whether the differences between seen and mental imagery can be helpfully understood as differences in neural geometry, and to think about how geometric differences might cash out in subjective experience.
By “neural geometry”, I mean the structure of a distribution of points in neural space. To construct a “neural space”, we erect axes that measure the amplitude of activity sampled at different locations in the brain. Let’s say we use fMRI to record human brain activity in two voxels in the visual cortex. Then we can construct a 2D neural space in which the activity measured in voxel 1 is the x-axis, and the activity of voxel 2 is the y-axis. Each time the owner of this brain activity sees an image, we plot the activity caused by seeing it as a point in this neural space. If we repeat this process for many seen images, we obtain a distribution of points in neural space that we refer to as the “visual manifold”. If we then ask the person to generate as many mental images as they saw, and again plot the resulting brain activity as points in neural space, we obtain an “imagery manifold”. Now we are in a position to investigate differences in the geometry of visual and imagery manifolds, and to speculate about the possible implications for differences in the subjective experience of seeing and imagining.
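To make this construction concrete, here is a minimal sketch in Python (NumPy) with synthetic numbers standing in for real fMRI measurements; the array shapes and variable names are illustrative assumptions, not our actual analysis pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

n_voxels = 2          # two voxels -> a 2D neural space
n_images = 500        # number of seen (and imagined) images

# Each row is one brain state: the activity of both voxels measured
# while one image is seen (or imagined). Synthetic stand-ins here.
seen_activity = rng.normal(size=(n_images, n_voxels))       # vision trials
imagined_activity = rng.normal(size=(n_images, n_voxels))   # imagery trials

# The "visual manifold" and "imagery manifold" are simply these
# two point clouds in the shared neural space.
visual_manifold = seen_activity
imagery_manifold = imagined_activity

# One simple geometric comparison: the variance captured along
# each principal axis of each manifold.
def principal_variances(points):
    cov = np.cov(points, rowvar=False)
    return np.sort(np.linalg.eigvalsh(cov))[::-1]

print("visual manifold variances: ", principal_variances(visual_manifold))
print("imagery manifold variances:", principal_variances(imagery_manifold))
```

In a real experiment each row would be a measured activity pattern from one vision or imagery trial; the geometric questions that follow are questions about the shapes of these two point clouds.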
One very plausible hypothesis about the difference between visual and imagery manifolds is that imagery manifolds are compressed relative to visual manifolds (Figure, left panel). In geometric terms, compression usually means a loss of variance along one or more dimensions. To illustrate this, let’s assume that the visual manifold is an oriented oscillation, while the imagery manifold is just a boring line with the same orientation. We can imagine (mental imagery is so useful!) obtaining the imagery manifold by taking the visual manifold and squeezing, or compressing, it along the dimension that is perpendicular to their shared orientation. Eliminating variance along this dimension discards potentially important differences between seen images. Compression thus offers an explanation for the paucity of visual detail that many attribute to mental imagery, and provides a potential explanation for the relative weakness of high spatial frequency representations during imagery (Breedlove et al., 2018). Compression may be thought of as “weakness” along a specific dimension in neural space. By identifying which dimensions of neural variance are compressed and which are not, we might obtain hypotheses about what, specifically, is missing from mental imagery, and what is not.
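As a toy illustration of what compression means geometrically, the sketch below (again with made-up numbers) takes points along an oriented oscillation standing in for the visual manifold and squeezes out the variance along the perpendicular direction; the particular curve and compression factor are assumptions chosen only for illustration.

```python
import numpy as np

# Toy "visual manifold": an oriented oscillation in a 2D neural space.
t = np.linspace(0, 4 * np.pi, 400)
u = np.array([np.cos(np.pi / 6), np.sin(np.pi / 6)])   # shared orientation
v = np.array([-u[1], u[0]])                            # perpendicular direction
visual_manifold = np.outer(t, u) + np.outer(np.sin(t), v)

def compress(points, direction, factor):
    """Shrink the spread along `direction` by `factor`, leaving other
    directions untouched: a toy model of 'weakness along one dimension'."""
    direction = direction / np.linalg.norm(direction)
    center = points.mean(axis=0)
    along = (points - center) @ direction          # coordinates along `direction`
    return points - np.outer((1 - factor) * along, direction)

# Toy "imagery manifold": the visual manifold with the oscillation
# squeezed out (factor=0 removes all variance along v, leaving a line).
imagery_manifold = compress(visual_manifold, v, factor=0.0)

print("variance along v, vision: ", np.var(visual_manifold @ v))
print("variance along v, imagery:", np.var(imagery_manifold @ v))
```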
We have illustrated the imagery and visual manifolds as intersecting repeatedly. Under this configuration, indistinguishable seen and mental images would be commonplace. It seems unlikely, for the reasons discussed above, that there are many mental images that cannot be distinguished from seen ones by their beholders. So there must be something wrong with the configuration we have illustrated. Indeed, in this configuration the dynamic range of imagery and visual activity is identical, which is not consistent with the empirically observed weakening of activity during imagery in many (but not all) voxels. Rotating and scaling the imagery manifold so that it spans a realistic range of imagery activity (Figure, right panel) eliminates intersections between the manifolds. Seen and mental images are now perfectly distinguishable. Moreover, non-intersecting vision and imagery manifolds would provide a natural explanation for the difficulty of describing mental imagery. Although mental imagery clearly engages the visual cortex, if the visual and imagery manifolds do not intersect, it will not be possible to look at anything that evokes the brain states associated with mental imagery. The language we use to describe visual images would thus be guaranteed to fail.
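The sketch below shows one hypothetical way to produce such a configuration: a toy affine transformation that rotates and scales the imagery manifold about its centroid and then displaces it, so that its dynamic range shrinks and it no longer overlaps the region of neural space occupied by vision. The specific angle, scale, and offset are arbitrary choices for illustration.

```python
import numpy as np

# Toy visual manifold (same construction as in the previous sketch).
t = np.linspace(0, 4 * np.pi, 400)
u = np.array([np.cos(np.pi / 6), np.sin(np.pi / 6)])
v = np.array([-u[1], u[0]])
visual_manifold = np.outer(t, u) + np.outer(np.sin(t), v)

def rotate_scale_shift(points, angle, scale, offset):
    """Rotate and scale a point cloud about its centroid, then shift it:
    a toy affine model of an imagery manifold with reduced dynamic range
    that no longer intersects the visual manifold."""
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s], [s, c]])
    center = points.mean(axis=0)
    return scale * (points - center) @ rotation.T + center + offset

imagery_manifold = rotate_scale_shift(visual_manifold, angle=0.3, scale=0.4,
                                      offset=np.array([0.0, 4.0]))

# Reduced dynamic range, displaced away from the region vision occupies.
print("vision activity range: ", visual_manifold.min(), visual_manifold.max())
print("imagery activity range:", imagery_manifold.min(), imagery_manifold.max())
```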
If the imagery manifold is indeed compressed and displaced relative to the visual manifold, how might we account for the obvious correspondence between specific seen and mental images? For example, when I speak of my mental image of “Betty”, I am drawing a direct connection between my experience of seeing and imagining the same thing (a painting). One possible way to model this connection is via geometric projection: to obtain the brain state corresponding to a mental image of the painting, we simply project the corresponding point on the visual manifold onto the imagery manifold (Figure, left panel, dashed line). Thus, all points on the imagery manifold may be considered mental images “of” something that has been seen, or could be seen. Importantly, because of the compression of the imagery manifold, multiple points on the visual manifold may project to the same point on the imagery manifold. It is unclear how a subjective visual experience might be read out in this case, but it is tempting to speculate that this indeterminacy is a special property of mental imagery that makes the experience of it so different from seeing.
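Here is a minimal sketch of the projection idea, assuming (purely for illustration) that the imagery manifold is a line in a 2D neural space: two seen images that differ only along the compressed dimension project to the same mental image.

```python
import numpy as np

# Toy imagery manifold: a line through `origin` along unit direction `u`
# (the compressed, displaced stand-in for the visual manifold).
u = np.array([np.cos(np.pi / 6), np.sin(np.pi / 6)])
origin = np.array([0.0, 4.0])

def project_to_imagery(visual_points, origin, direction):
    """Orthogonally project points on the visual manifold onto a line
    standing in for the imagery manifold. Because the line has lost one
    dimension of variance, distinct seen images can land on the same
    mental image (a many-to-one mapping)."""
    direction = direction / np.linalg.norm(direction)
    along = (visual_points - origin) @ direction
    return origin + np.outer(along, direction)

# Two distinct seen images that differ only along the compressed dimension...
seen_a = np.array([3.0, 1.0])
seen_b = seen_a + 2.0 * np.array([-u[1], u[0]])   # shifted perpendicular to the line

# ...project to the same point on the imagery manifold.
imagined_a = project_to_imagery(seen_a[None, :], origin, u)[0]
imagined_b = project_to_imagery(seen_b[None, :], origin, u)[0]
print(np.allclose(imagined_a, imagined_b))   # True
```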
As these extremely simple illustrations show, by taking a neural geometric approach we quickly pick up new ways to express potential differences between seen and mental imagery: in addition to being weak, mental imagery may be compressed, displaced, and/or an indeterminate projection of vision. These descriptors are appealing, but demonstrating that any of them apply to mental imagery will require some challenging empirical work. Characterizing the geometry of neural manifolds is difficult, especially when one considers that the dimensionality of neural spaces often exceeds the number of samples of brain activity that it is practical to acquire.
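A toy example of why this is hard: when the number of sampled brain states is far smaller than the number of dimensions, the sample covariance of the point cloud is rank-deficient, so most directions in neural space are simply unmeasured. The numbers below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

n_voxels, n_samples = 2000, 50        # far more dimensions than sampled brain states

activity = rng.normal(size=(n_samples, n_voxels))
centered = activity - activity.mean(axis=0)

# The sample covariance can have rank at most n_samples - 1, so almost all
# directions in this 2000-dimensional neural space go unconstrained.
rank = np.linalg.matrix_rank(centered)
print("rank of sampled manifold:", rank)                  # 49
print("unmeasured directions:   ", n_voxels - rank)       # 1951
```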
Recent work from our lab suggests some possible ways forward. By acquiring massive samples of brain activity in response to many thousands of seen images (Allen et al., 2021), we have been able to construct encoding models of the entire visual system (St-Yves et al., 2023a) that allow us to obtain arbitrarily dense samples of neural manifolds (St-Yves et al., 2023b). These methodological developments make it possible to make contact with theories that link neural manifold geometry to representation (Sorscher et al., 2022).
In addition, we have had some success at constructing mappings between brain activity patterns (Mell et al., 2021) during vision and imagery (Saha Roy et al., 2023), effectively learning how points on the visual manifold project onto the imagery manifold. Early results do indeed suggest that imagery manifolds are compressed and displaced, and that projecting from vision to imagery may be a decent way to model the relation between seen and mental images of the same things.
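The sketch below is not our actual analysis, but a minimal illustration of the general idea: given paired (here, synthetic) activity patterns for the same items during vision and imagery, fit a simple linear (ridge) mapping from one to the other and use it to predict where a new seen image should land on the imagery manifold.

```python
import numpy as np

rng = np.random.default_rng(2)

n_items, n_voxels = 200, 50

# Paired brain-activity patterns: the same items seen and then imagined
# (synthetic stand-ins; a real analysis would use measured fMRI patterns).
vision_activity = rng.normal(size=(n_items, n_voxels))
true_map = 0.4 * rng.normal(size=(n_voxels, n_voxels))          # unknown to us
imagery_activity = vision_activity @ true_map \
    + 0.1 * rng.normal(size=(n_items, n_voxels))

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression from vision patterns X to imagery patterns Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

W = fit_ridge(vision_activity, imagery_activity)

# Predicted imagery pattern for a new seen image: an estimate of where that
# point on the visual manifold lands on the imagery manifold.
new_seen = rng.normal(size=(1, n_voxels))
predicted_imagined = new_seen @ W
print(predicted_imagined.shape)   # (1, 50)
```

A linear mapping is only one modeling choice; the point is that any such learned vision-to-imagery mapping is an empirical estimate of the projection discussed above.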
I believe that using geometric concepts to describe mental imagery could be clarifying, and could lead to important new empirical findings. The geometric properties I gave to the imagery manifold in my illustrations were motivated mostly by my own introspection. An important future direction for imagery research, I believe, is to develop normative theories that explain what geometric properties an imagery manifold should have, given the computational work that mental images are assumed to do.
References
Pearson, J., Naselaris, T., Holmes, E.A. and Kosslyn, S.M., 2015. Mental imagery: functional mechanisms and clinical applications. Trends in cognitive sciences, 19(10), pp.590-602.
Breedlove, J., St-Yves, G., Olman, C. and Naselaris, T., 2018. Human brain activity during mental imagery exhibits signatures of inference in a hierarchical generative model. bioRxiv, p.462226.
Allen, E.J., St-Yves, G., Wu, Y., Breedlove, J.L., Dowdle, L.T., Caron, B., Pestilli, F., Charest, I., Hutchinson, J.B., Naselaris, T. and Kay, K., 2021. A massive 7T fMRI dataset to bridge cognitive and computational neuroscience. bioRxiv, pp.2021-02.
St-Yves, G., Allen, E.J., Wu, Y., Kay, K. and Naselaris, T., 2023a. Brain-optimized deep neural network models of human visual areas learn non-hierarchical representations. Nature Communications, 14(1), p.3329.
St-Yves, G., Kay, K. and Naselaris, T., 2023b. Brain-optimized models reveal increase in few-shot concept learning accuracy across human visual cortex. Journal of Vision, 23(9), pp.5913-5913.
Sorscher, B., Ganguli, S. and Sompolinsky, H., 2022. Neural representational geometry underlies few-shot concept learning. Proceedings of the National Academy of Sciences, 119(43), p.e2200800119.
Mell, M.M., St-Yves, G. and Naselaris, T., 2021. Voxel-to-voxel predictive models reveal unexpected structure in unexplained variance. NeuroImage, 238, p.118266.
Saha Roy, T., Breedlove, J., St-Yves, G., Kay, K. and Naselaris, T., 2023. Mental imagery: weak vision or compressed vision? 2023 Conference on Cognitive Computational Neuroscience, Oxford, UK.