A post by Sean Riley
Brains are weird. And minds are super confusing. Some people are just out there minding their own business, and then bam! Hallucination. Something is there that isn’t actually there, and that’s weird. Incredibly debilitating and tragic, yes, but also weird. Even weirder is that we can do this intentionally. You, right now, can sit pensively in your favourite chair and ask, “What would a giraffe in a necktie look like?” And you can hem and haw and conjure a giant giraffe in your mind, wrapping the latest fashion around its neck, just below the jawline, only to take a moment’s pause and go, “Wait a second, that doesn’t look right...” So you quickly slide that tie straight down to the bottom of its neck, just like God intended. That’s super weird. Not the fashionable giraffe part, the other part. The seeing part. Surely there aren’t tiny giraffes chilling in your mind, so what are you actually seeing? And why does it all look just a little bit “off,” if that’s even the right word? Imagery just seems different from perception in some way, and it’s difficult to say what that way is. Vividness appears to play a role in this (e.g., Kind, 2017), but what even is vividness? In many respects I think vividness is kind of like knowledge: we know it when we see it, but it’s a nightmare to define. So let’s take a swing at it and see what happens to fall out.
In my humble opinion, one of natural language’s grand annoyances is its flexibility. This flexibility does enable great works of art, and that’s awesome, but the cost is ambiguity and the blurry boundaries that come with it. There seem to be things in the world, like the definition of vividness, that can’t, at present, be expressed solely in natural language. They appear to require a level of precision and specificity beyond that offered by the likes of English or Farsi or any of their counterparts: we first have to express these things using a stricter representation system, then use natural language to describe this stricter system. Or in other words, we need computation. We need to lay out, very concretely, the what and how of it all, then use the flexibility of natural language to make the what and how more accessible. An electron is a particle. But also a wave. But it’s defined by the Dirac equation. “Particle” and “wave” are descriptive words used to make electrons more accessible. That’s not to say all things require computational definitions, or even all cognitive things, but there does seem to be a set of things that do, and I suspect vividness is one such thing.
This raises a lot of unanswered questions, and our inability to stick vividness under a microscope complicates the matter. But it’s foolhardy to think our minds are exempt from the shadowy world of symbols (Eddington, 1927), and more foolhardy still to believe that the operational definition can fully penetrate reality. Both the mind and brain are information processing devices, and information processing is computation. Our task is to figure out what the computations are, and the first step in this process is to determine what type of property vividness is. Is it intrinsic to a visual mental image, like mass is to an object; or is it a relational property, like weight? The default thinking seems to be that vividness is intrinsic to the generated imagery (e.g., Fazekas et al., 2020), but I’m going to suggest that vividness is relational.
To this end, we (Riley & Davies, 2023) assume weak perceptualism. Given this, the higher-level (and thus slightly inaccurate) definition we propose is that vividness is the extent to which generated imagery matches an internal model. We view imagery through the lens of predictive coding and suggest that during inspection the generated image is compared against a predictive model born from our beliefs and expectations. The more the image and model align, the more vivid said image is. Our model implementation limits this comparison to object properties such as colour, texture, and the like, but spatial features are liable to play a role in this comparison as well.
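At this level of description, the computation is just a match score. Here is a purely illustrative one-liner of that reading; the vector stand-ins for the image and model, and cosine similarity as the alignment measure, are my assumptions, not part of the model:

```python
import numpy as np

def vividness(generated_image, predictive_model):
    """Mid-level reading: vividness as the degree of alignment between
    generated imagery and an internal predictive model (illustrative)."""
    dot = np.dot(generated_image, predictive_model)
    return float(dot / (np.linalg.norm(generated_image) *
                        np.linalg.norm(predictive_model)))
```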
In terms of making this comparison, we suggest the representations of these object properties encode the relevant regions of their respective quality space. Or in other words, we take Lee’s approach to quality spaces and co-opt fractional binding to encode regions into vectors (Lee, 2021; Komer & Eliasmith, 2020). This gives us a way to generate representations that carry phenomenal content, then assemble them into a mental image. When imagery generation is initiated, symbolic labels are first retrieved from long-term memory. These labels are then used to extract the relevant regions from quality space models (presumably stored in memory, though we remain agnostic on which type of memory is involved), with these regions then being incorporated into the mental image during the generation phase. The generated imagery is then inspected, during which time the generated object properties are compared against all the regions that comprise their respective quality space model. The most similar region in the model is then identified and the conceptual label attached to that region is extracted. This label is then compared against the label initially retrieved from memory, and if a mismatch is detected, the mental image is passed down through (at least some of) the visual hierarchy as a means of error correction. Vividness is an auxiliary computation born out of this inspection process. The specifics can be found in Riley and Davies (2023), but the punchline is that we can bootstrap our way to computing vividness at the property, object, and scene levels. Property vividness is determined by taking the largest similarity metric encountered during inspection and dividing it by the size of the quality space region that elicited it. Object vividness is a weighted average of property-level vividness. Scene-level vividness is a weighted average of object-level vividness. We’re still in the process of implementing this in spiking neurons, but it does give us a precise definition of vividness that we can begin to wrap with descriptive language.
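Since that is a lot of moving parts, here is a minimal Python sketch of how I picture the core pieces fitting together. To be clear, everything here rests on my own simplifying assumptions: a one-dimensional quality space, cosine similarity as the comparison metric, region vectors built by uniformly sampling point encodings, and invented labels, boundaries, and weights. It is a sketch of the idea, not the Riley and Davies (2023) implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_unitary(d):
    """A random unitary base vector: unit-magnitude Fourier coefficients,
    so fractional powers neither grow nor shrink the vector."""
    coeffs = np.fft.fft(rng.standard_normal(d))
    return np.fft.ifft(coeffs / np.abs(coeffs)).real

def fractional_bind(base, x):
    """Encode the scalar x as base**x, computed in the Fourier domain
    (fractional binding via circular convolution)."""
    return np.fft.ifft(np.fft.fft(base) ** x).real

def encode_region(base, lo, hi, n=50):
    """A quality-space region as the normalised sum of the point
    encodings inside it (a uniform sample stands in for the integral)."""
    v = sum(fractional_bind(base, x) for x in np.linspace(lo, hi, n))
    return v / np.linalg.norm(v)

def similarity(a, b):
    """Cosine similarity, standing in for the comparison metric."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def property_vividness(generated, regions):
    """Largest similarity encountered during inspection, divided by the
    size of the region that produced it; the winning label comes along
    for the mismatch check against the label retrieved from memory."""
    sims = {label: similarity(generated, vec)
            for label, (vec, _) in regions.items()}
    best = max(sims, key=sims.get)
    return sims[best] / regions[best][1], best

def object_vividness(property_scores, weights):
    """Object-level vividness: a weighted average of property-level scores."""
    return float(np.average(property_scores, weights=weights))

def scene_vividness(object_scores, weights):
    """Scene-level vividness: a weighted average of object-level scores."""
    return float(np.average(object_scores, weights=weights))

# A toy one-dimensional "hue" quality space carved into labelled regions.
BASE = make_unitary(512)
HUE_REGIONS = {                     # label -> (region vector, region size)
    "brick red": (encode_region(BASE, 0.00, 0.10), 0.10),
    "red":       (encode_region(BASE, 0.00, 0.30), 0.30),
    "green":     (encode_region(BASE, 0.50, 0.80), 0.30),
}

# Imagine a brick whose colour lands squarely inside "brick red".
colour = fractional_bind(BASE, 0.05)
score, label = property_vividness(colour, HUE_REGIONS)
print(label, round(score, 2))       # a tight, well-matched region scores high

# One colour score plus a hypothetical texture score make an object;
# two object scores make a scene.
brick = object_vividness([score, 2.5], weights=[0.7, 0.3])
print(round(scene_vividness([brick, 3.0], weights=[0.5, 0.5]), 2))
```

The fractional binding step follows the Fourier-domain trick from Komer and Eliasmith (2020): raising a unitary base vector to a real-valued power, which makes nearby points of the quality space map to similar vectors, and so makes region comparisons meaningful.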
To wit, at the highest level one might say, “Vividness is the extent to which generated imagery looks how it’s supposed to look,” and at the mid-level, “Vividness is the extent to which generated imagery matches a predictive model.” However, this predictive model is composed of labels, and the actual comparison is made between the generated imagery and quality space regions, including the regions tied to the labels comprising said predictive model. So from a purely computational point of view, this means that if one intends to generate a red brick, but through neural dysfunction generates a green brick, this green brick can be highly vivid even though it is far removed from the model. I don’t think such a scenario is biologically possible; I suspect every imaged property will fall somewhere between the most precise and least precise region for the intended property “class” (e.g., between the most precise shade of brick red and the least precise red). This means our higher-level definitions are on the right track, but give up accuracy to avoid verbosity. The colour of the generated brick is supposed to match the quality space region corresponding to the colour label of the predictive brick, but the actual computations (and thus the definition) are indifferent to intention. Our linguistic definitions take for granted that the generated brick will not come out lime green, that its colour will at least be in the direction of the intended colour; hence the mid-level definition.
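Continuing the sketch above (same caveats), this indifference to intention is easy to make concrete: the inspection computation only ever sees the generated vector and the regions, so a colour that lands in the “wrong” region still earns its vividness from whichever region it actually matches:

```python
# Reuses BASE, HUE_REGIONS, fractional_bind, and property_vividness
# from the sketch above; all values are illustrative.
intended_label = "red"                  # label retrieved from long-term memory
colour = fractional_bind(BASE, 0.65)    # dysfunction: lands inside "green"

score, winning_label = property_vividness(colour, HUE_REGIONS)
print(winning_label, round(score, 2))   # "green": the score reflects the match
                                        # to green, not the failed intention

if winning_label != intended_label:
    # The label mismatch, not a low vividness score, is what would send the
    # image back down the visual hierarchy for error correction.
    print("mismatch: pass image back down the visual hierarchy")
```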
This said, there’s an experiential quality to vividness that’s noticeably absent from the above. Vividness seems to have a certain “force” to it, and this may be where vividness judgments come in. We can construct a quality space for vividness and partition it into regions à la Lee (2021), and we can encode computed vividness into a vector that can then be compared against the regions in this vividness quality space. Using a computation similar to the one for property vividness, we can then get a “force” measurement, and in doing so provide a concrete definition of what this “force” actually is: a specific computation that is automatically and unconsciously performed during imagery inspection. This automatic and unconscious judgment is likely separate from conscious and deliberate vividness judgments (e.g., VVIQ ratings), and although it is hard to speculate on how this computation elicits feeling, it’s harder to turn our backs on the primacy of computation. Somehow. Someway. Everything born of our mind and brain has its roots in the shadowy world of symbols, and I suggest we owe it to ourselves to step away from the operational definition and attempt to embrace more grounded and exact definitions.
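This judgment, too, can be sketched with the same machinery: partition a vividness quality space into regions, encode the computed score, and apply the same best-match-over-region-size rule. The region names and boundaries below are invented, as is the ceiling I use to squash raw scores into the space’s range; again, a sketch of the idea and nothing more:

```python
# Reuses make_unitary, encode_region, fractional_bind, and
# property_vividness from the sketches above; regions are invented.
VIV_BASE = make_unitary(512)
VIV_REGIONS = {
    "faint":    (encode_region(VIV_BASE, 0.00, 0.40), 0.40),
    "clear":    (encode_region(VIV_BASE, 0.40, 0.75), 0.35),
    "forceful": (encode_region(VIV_BASE, 0.75, 1.00), 0.25),
}

def force_of(raw_score, ceiling=10.0):
    """Encode a computed vividness score, then judge its 'force' with the
    same computation used for property vividness (the ceiling is invented)."""
    encoded = fractional_bind(VIV_BASE, min(raw_score / ceiling, 1.0))
    return property_vividness(encoded, VIV_REGIONS)

print(force_of(10.0))   # a property score of ~10 comes out "forceful":
                        # an automatic, unconscious judgment during inspection
```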
References
Eddington, A. S. (1928). The nature of the physical world: The Gifford Lectures 1927. Cambridge University Press.
Fazekas, P., Nemeth, G., & Overgaard, M. (2020). Perceptual representations and the vividness of stimulus-triggered and stimulus-independent experiences. Perspectives on Psychological Science, 15(5), 1200-1213.
Kind, A. (2017). Imaginative vividness. Journal of the American Philosophical Association, 3(1), 32-50.
Komer, B., & Eliasmith, C. (2020). Efficient navigation using a scalable, biologically inspired spatial representation. In Proceedings of the Annual Meeting of the Cognitive Science Society (Vol. 42, pp. 1532-1538).
Lee, A. Y. (2021). Modeling mental qualities. Philosophical Review, 130(2), 263-298.
Riley, S. N., & Davies, J. (2023). Vividness as the similarity between generated imagery and an internal model. Brain and Cognition, 169, 105988.