Mapping the Mental Image

Sam Clarke is a VISTA Research Fellow in the Department of Philosophy and Centre for Vision Research at York University, Toronto. He’s particularly interested in perception and the origins of abstract thought.

A post by Sam Clarke

A seminal study

Mainstream approaches to cognitive science conceive of the mind as a computational system. On this view, the mind’s inferential transitions proceed (largely, perhaps entirely) in virtue of the formal properties our mental representations possess (Fodor 1980). This gives the format of a representation, and not just its content, an explanatory role.

Image of mental rotation test items (modeled after Shepard and Metzler 1971, 702) as found in Twisswell (2014). Visualisation in Applied Learning Contexts: A Review. Educational Technology & Society, 17 (3), 180–191.

Take Shepard and Metzler’s (1971) seminal work on mental rotation. Here, subjects were presented with pairs of line drawings, each depicting a three-dimensional shape. In each pair, the depicted shapes were either identical, albeit oriented at different angles (like Pair A from the Figure), or mirror images of one another (like Pair C from the Figure). Subjects pressed a button indicating which of these possibilities was true of successively presented pairs, and did so as quickly and accurately as possible. It was found that the time subjects took to correctly identify the shapes in a pair as identical, albeit rotated at different angles (if and when they were), increased in proportion to the angular deviation between the shapes as depicted on the screen.

Shepard and Metzler recommended a (now standard) format-based explanation for their findings. They proposed that subjects had completed the task by, first, forming a mental image of one or other of the shapes and, next, ‘rotating’ it to check for a match (or lack thereof) with its perceived partner. The idea was that if the mental image represented the shape like a literal picture (or perhaps a 3D model), rather than a lingua-form description, and ‘rotation’ of the representation proceeded at a steady rate, angular differences would be proportional to the amount of time taken to bring paired shapes (imagined and perceived) into correspondence, so as to confirm a match. In this way, Shepard and Metzler posited that mental imagery has a picture-like format and took this to illuminate their results.
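The prediction at work here can be put as a simple linear relation. What follows is a minimal sketch, assuming a constant rotation rate plus a fixed amount of time for everything else (encoding the shapes, pressing the button). The function name and both parameter values are hypothetical placeholders chosen for illustration, not figures taken from the study.

```python
def predicted_response_time(angle_deg, rotation_rate_deg_per_s=60.0, base_time_s=1.0):
    """Predicted time (in seconds) to confirm a match under constant-rate rotation.

    Both default values are illustrative assumptions, not empirical estimates.
    """
    if rotation_rate_deg_per_s <= 0:
        raise ValueError("rotation rate must be positive")
    # Time spent rotating grows linearly with the angle to be covered.
    return base_time_s + angle_deg / rotation_rate_deg_per_s


for angle in (0, 60, 120, 180):
    print(f"{angle:>3} deg -> {predicted_response_time(angle):.2f} s")
```

On this toy model, equal increases in angular difference add equal increments of response time, which is the pattern the format-based explanation is meant to capture.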

Picture-like, how?

The postulation of pictorial representations in mental imagery tasks has not gone unquestioned. Dubious characters like Wittgenstein, Ryle and Sartre considered it a non-starter on conceptual grounds (cf. Block 1983). More recently, Pylyshyn (2003) offered alternative explanations for the relevant results. Regardless, countless studies have been motivated by the above proposal, and found to vindicate its core predictions. For this reason, the idea that mental imagery is subserved by pictorial representations now enjoys orthodox status in cognitive science (Kosslyn et al. 2007).

With this in view, an obvious question arises: What does it mean to say that mental imagery representations are picture-like in format? Picture-like, how?

A fairly literal interpretation of the proposal seems to be endorsed by many (e.g. Carey 2009; Dretske 1981; Fodor 2008; Kosslyn et al. 2007; Quilty-Dunn 2019). As Quilty-Dunn notes (approvingly), these literalists hold that mental imagery is pictorial in the following respects. First, in its conformity to a parts principle: mental images depict things by picturing their parts, in the way a photograph depicts a scene by having spatially arranged parts of the vehicle picture parts of that scene (Fodor 2007; Kosslyn 1980). Second, and as a direct result of this, in its holism. So, just as one might struggle to realistically depict an object as brown without depicting its shape, and struggle to picture its shape without depicting its (relative) orientation, the literalist posits that mental images cannot help but encode these “multiple properties at once” (Quilty-Dunn, 2019, p.4; see also Dretske 1981).

This literalist interpretation is not unmotivated. Literalists consider their characterisation implicit in the format-based explanations under consideration. For, in the mental rotation tasks noted above, Quilty-Dunn suggests that the rotation process would be irrelevant to the identification of a match unless it served to bring spatial parts of depicted shapes into correspondence at relevant orientations. And, as he understands matters, the mental image being rotated would only do so if it conformed to the parts principle and exhibited the kind of holism found in a literal picture. Indeed, Quilty-Dunn considers this particularly pressing, for if the depiction of shape/shape parts were encoded independently of things like orientation, it is unclear why mental rotation would even be useful (Quilty-Dunn, 2019, p.8). Representations of shape might just be directly compared, independently of (distinct) constituents specifying orientations, generating the (false) prediction that angular deviation should fail to predict response time.

A problem

Unfortunately, there is a problem with the literalist’s straightforward and elegant proposal.

Given a neuropsychological overlap in the mechanisms involved, literalists tend to think the pictorial format of mental imagery applies equally well to pre-attentive visual representation (Kosslyn et al. 2007; Quilty-Dunn 2019). But note, this point swings both ways. For if there’s good reason to deny that pre-attentive visual representations are pictorial in the recommended sense, an overlap in the mechanisms of vision and visual imagery would (by parity of reasoning) undermine the literalist’s proposal. There is considerable evidence to this effect.

To give just one example: if pre-attentive visual representations of shape, orientation and color were encoded independently, and not holistically like a literal picture, we might expect pre-attentive vision to occasionally mis-combine these. Thus, we might expect the (surprising) existence of illusory conjunctions. The existence of these is well-documented (Prinzmetal, 2012) but puzzling on the literalist’s characterisation.

In one experiment, subjects were presented with an array containing three coloured letters. Provided that presentation times were brief (e.g., 200 ms), and subjects were prevented from directly attending to the shapes, they would regularly report incorrect letter-color combinations. For instance, in an array containing a blue ‘T’, a green ‘X’ and a red ‘O’, subjects might report seeing a blue ‘O’, or a red ‘T’, and so forth (Treisman & Schmidt, 1982). That subjects were significantly less likely to report colors and letters absent from the arrays suggests that subjects’ reports reflected genuine mis-combinations of perceived features as opposed to misperceptions of the features themselves. And since these results obtain in simultaneous matching tasks, it is generally agreed that they do not simply reflect post-perceptual failures of memory. Rather, mis-combination occurs within visual processing itself (Prinzmetal, 2012). Similar findings obtain when the relevant features are all spatially defined and (e.g.) individual edges of a triangle are mis-combined with ‘S’ shapes to produce illusory dollar signs (Treisman & Patterson, 1984). Boldly stated, these results are puzzling unless the representation of relevant feature types enjoys an “independent psychological existence” (Treisman, 1986, p.117) and relevant feature types are encoded non-holistically.
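To see why independent encoding makes this pattern unsurprising, consider a toy simulation. It is only a sketch under assumptions of my own: letters and colors are stored separately, and on some fraction of trials (a hypothetical `misbinding_rate`) the color reported for an item is borrowed from a neighbour. It is not a model drawn from Treisman and Schmidt’s paper, and the function name is illustrative.

```python
import random


def report_feature_pair(display, target_index, misbinding_rate, rng):
    """Report (letter, color) for one item when color is encoded independently.

    The letter is always read from the target item. With probability
    `misbinding_rate` the color is taken from a different item in the
    display, so a reported color is never one absent from the display.
    """
    letter, color = display[target_index]
    if len(display) > 1 and rng.random() < misbinding_rate:
        other_colors = [c for i, (_, c) in enumerate(display) if i != target_index]
        color = rng.choice(other_colors)
    return letter, color


display = [("T", "blue"), ("X", "green"), ("O", "red")]
rng = random.Random(0)  # fixed seed so that runs are repeatable
reports = [report_feature_pair(display, 0, 0.3, rng) for _ in range(1000)]
illusory = [r for r in reports if r != ("T", "blue")]
print(f"{len(illusory)} of {len(reports)} reports were illusory conjunctions")
```

Every error this toy observer makes is a re-pairing of features that really were in the array (a green or red ‘T’), never a report of an absent color. A strictly holistic encoding would leave no separate color constituent to be borrowed in the first place.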

Towards a cartographic alternative

The preceding remarks generate a puzzle. Pictorial representations are meant to explain performance in mental imagery (e.g. mental rotation) tasks. But the explanatory purchase pictorial representations afford here has seemed to stem from their representing shapes and orientations holistically, like a photograph. The trouble is: neuropsychologically overlapping representations in early vision do not encode properties in this way. They represent relevant feature types independently and non-holistically. In this respect, they are quite unlike photographs. But if the non-holistic nature of pre-attentive vision implies the non-holistic nature of mental imagery representations, how is the appeal to picture-like representations to explain performance in the tasks under consideration?

The answer is not straightforward. But we can, again, draw inspiration from research on early vision. Here, there is considerable evidence that distinct feature types are represented in independent ‘feature maps’ (Treisman, 1988). My suggestion is that the same is true of mental imagery.

The neat thing about maps, for present purposes, is that while they have a spatial structure (like a picture), with spatial relations between constituents specifying spatial relations between the things those constituents pick out, these constituents need not picture the things they represent (Camp 2007). For while a map of a park might represent the location of a pond by picturing it from a bird’s eye perspective (thereby depicting its shape and orientation holistically), a cartographer might use a linguistic label instead (‘POND’). Alternatively, they might use non-pictorial analog constituents (for instance, a map of the US might represent the total number of ticks in each state by having each state vary in redness as a function of tick count therein). Crucially, in these latter examples, the pond or tick count would be represented without the map representing the size or shape of the pond/ticks. So, by analogy, an appeal to cartographic representations in pre-attentive vision/mental imagery offers one way in which the holistic representation of properties/features that naturally, or inevitably, gloop together in realistic pictures might be avoided. One feature map might represent the location of colors in visual space, another might represent the location of edges, and so on. But since each of these representations would still have a fundamentally spatial structure, with spatial relations between constituents specifying spatial relations between the things those constituents pick out, we can see how rotating these would still bring features into spatial correspondence, when confirming a match during mental rotation tasks.
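The idea can be illustrated with a deliberately crude sketch. Here each feature map is a Python dictionary from grid locations to feature values, rotation is restricted to quarter turns about the origin, and a match is found when rotating one map makes it identical to another. All of this (the grid, the quarter-turn restriction, the function names) is an assumption made for illustration, not a claim about how the visual system implements feature maps.

```python
def rotate_quarter_turns(feature_map, turns):
    """Rotate a sparse map {(x, y): value} counterclockwise by turns * 90 degrees."""
    rotated = dict(feature_map)
    for _ in range(turns % 4):
        # A 90-degree counterclockwise turn about the origin sends (x, y) to (-y, x).
        rotated = {(-y, x): value for (x, y), value in rotated.items()}
    return rotated


def turns_to_match(map_a, map_b):
    """Return the number of quarter turns that aligns map_a with map_b, or None."""
    for turns in range(4):
        if rotate_quarter_turns(map_a, turns) == map_b:
            return turns
    return None


cells_a = [(0, 0), (1, 0), (2, 0), (2, 1)]            # an L-like shape
cells_b = [(0, 0), (0, 1), (0, 2), (-1, 2)]           # the same shape, one quarter turn on
cells_mirror = [(0, 0), (-1, 0), (-2, 0), (-2, 1)]    # its mirror image

# Separate maps for separate feature types, each with its own spatial layout.
edges_a = {cell: "edge" for cell in cells_a}
edges_b = {cell: "edge" for cell in cells_b}
edges_mirror = {cell: "edge" for cell in cells_mirror}
colors_a = {cell: "red" for cell in cells_a}
colors_b = {cell: "blue" for cell in cells_b}

print(turns_to_match(edges_a, edges_b))
print(turns_to_match(edges_a, edges_mirror))
print(turns_to_match(colors_a, colors_b))
```

Rotating the edge map brings the first two shapes into correspondence after one quarter turn, while no rotation aligns a shape with its mirror image. The color maps fail to match because the shapes are colored differently, but since they are separate maps, the edge comparison goes through regardless.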

Of course, the Devil’s in the details. I try to flesh some of these out in a longer paper that I’ll hopefully be able to share soon. But to close, I want to briefly note an attractive feature of this general account. Namely, that it helps explain why subjects are often able to abstract away from certain feature types in mental rotation tasks. For example, Khooshabeh and Hegarty (2008) found that subjects with relatively poor spatial acuity perform better in mental rotation tasks when matching pairs of shapes are consistently colored. By contrast, subjects who perform well in mental rotation tasks enjoy no such benefit. Indeed, inconsistent coloring failed to interfere with their performance at all, suggesting that they were able to disregard color entirely.

Both results can be made sense of on a cartographic characterisation of the representations involved. Those with poor spatial acuity identified matches by rotating spatially arranged color maps and observing whether this brought color-representing constituents into correspondence. Meanwhile, those with better spatial acuity completed the task by rotating spatial feature maps. But since spatial maps are (by hypothesis) independent of color maps, subjects could disregard color maps entirely, avoiding Stroop-like interference from these (something which would be surprising on a literalist interpretation of their pictorial format). In any case, a modest point to note is that our understanding of mental imagery representations may have much to gain from perception science itself.


References

Block, N. 1983. Mental Pictures and Cognitive Science. Philosophical Review. 92(4): 499-541.

Camp, E. 2007. Thinking with maps. Philosophical Perspectives. 21: 145-182.

Carey, S. 2009. The Origin of Concepts. Oxford: OUP.

Dretske, F. 1981. Knowledge and the Flow of Information. Cambridge: MIT Press.

Fodor, J. 1980. Methodological Solipsism Considered as a Research Strategy in Cognitive Psychology. Behavioral & Brain Sciences. 3(1): 63-109.

Fodor, J. 2007. Revenge of the Given. In Brian P. McLaughlin & Jonathan D. Cohen (eds.), Contemporary Debates in Philosophy of Mind.

Khooshabeh, P. & Hegarty, M. 2008. Differential effects of color on mental rotation as a function of spatial ability. In: International Spatial Cognition Conference. Freiburg, Germany.

Kosslyn, S. 1980. Image and Mind. Cambridge: Harvard University Press.

Kosslyn, S. et al. 2007. The Case for Mental Imagery. Oxford: OUP.

Prinzmetal, W. 2012. At the core of feature integration theory: on Treisman and Schmidt (1982). In J. Wolfe & L. Robertson (eds.), Oxford series in visual cognition. From perception to consciousness: Searching with Anne Treisman (pp. 211–216).

Pylyshyn, Z. 2003. Seeing and Visualizing: It’s Not What You Think. Cambridge: MIT Press.

Quilty-Dunn, J. 2019. Perceptual Pluralism. Noûs.

Shepard, R. N. & Metzler, J. 1971. Mental rotation of three-dimensional objects. Science. 171(3972): 701–703.

Treisman, A. 1986. Features and Objects in Visual Processing. Scientific American. 255(5): 114b-25.

Treisman, A. 1988. Features and Objects: The Fourteenth Bartlett Memorial Lecture. The Quarterly Journal of Experimental Psychology A: Human Experimental Psychology. 40A(2): 201–237.

Treisman & Patterson, 1984. Emergent features, attention, and object perception. Journal of Experimental Psychology: Human Perception and Performance. 10(1): 12–31.

Treisman, A. & Schmidt, H. 1982. Illusory Conjunctions in the Perception of Objects. Cognitive Psychology. 14(1): 107-41.