The Debate over Deep Learning Needs More Imagination

Cameron Buckner is an Associate Professor in the Department of Philosophy at the University of Houston. His research interests lie at the intersection of philosophy of mind, philosophy of science, animal cognition, and artificial intelligence.

A post by Cameron Buckner

Nativists in psychology like Steven Pinker and Gary Marcus often warn their readers about the dangers of empiricism.  In particular, they worry that many neural network modelers are reviving the minimalist goals of behaviorist psychology, which “through most of the 20th century…tried to explain all of human behavior by appealing to a couple of simple mechanisms of association and conditioning” (Pinker 2003) without any other forms of innate structure whatsoever (see, e.g., Marcus, 2018).  Unfortunately, as their fellow nativists Laurence & Margolis (2015) observe, casting current disputes in cognitive science in these terms has the consequence that “the empiricists” no longer really exist…and maybe never did.  While empiricists do join the radical behaviorists in eschewing innate ideas, nearly all empiricists other than the behaviorists agree that a significant amount of innate, general-purpose cognitive machinery is required to extract abstract ideas from experience.

Recent criticisms of “Deep Learning” (LeCun, Bengio, & Hinton 2015) are a case in point (e.g. Marcus 2018; Lake et al. 2017). Critics worry that all the impressive things that Deep Neural Networks (DNNs) appear to do—from recognizing objects in photographs at human or superhuman levels of accuracy, to beating human experts at chess, Go, or StarCraft II, to predicting protein folds better than molecular biologists who have devoted their lives to the task—are just the results of massive amounts of computation being directed at “statistics”, “linear algebra”, or “curve-fitting”, which, without the structure provided by innate ideas, will never scale up to human-like intelligence.  Of course, everything the brain does could be described as mere “neural firings”, so the problem can’t just be that a DNN’s operations can be thinly redescribed; there must be specific things that human brains can do which DNNs, in principle, cannot. Many suggestions have been offered here, but to illustrate my points about empiricism and cognitive architecture, I will focus on a brilliant list of operations that Jerry Fodor (2003) thinks are required for rational cognition but that empiricists cannot explain. This list includes: 1) synthesizing exemplars of abstract categories for use in reasoning, 2) fusing together simpler ideas into novel composites (e.g. unicorn), 3) making decisions in novel contexts on the basis of simulated experience, and 4) distinguishing causal and semantic relations between thoughts.

Fodor’s primary empiricist target, Hume, concedes that it is not obvious how his three principles of associative learning could explain these mental maneuvers, and so he assigns all of them to the faculty of imagination (or “fancy”).  The appeal to the imagination here constitutes perhaps the most prominent feature of both empiricist philosophy of mind and deep learning that does not fit into the behaviorist caricature.  Rather than re-assessing their reading of Hume in light of these appeals, Fodor & Pylyshyn describe them as “cheating,” because the imagination is something to which Hume “qua associationist, had, of course, no right” (Fodor & Pylyshyn, 1988, p. 50, fn. 29).

We should instead acknowledge that this interpretation of Hume and of empiricism more generally is due for a serious revision (Demeter, 2021).  Nearly all of the major empiricist philosophers were faculty psychologists; that is, they held that the mind begins with a set of innate domain-general faculties, which are actively involved in the extraction of abstract knowledge from experience.  These faculties include at least perception, memory, imagination, and attention.  Contemporary rationalists still think we need some innate domain-specific knowledge to get human-like learning off the ground (Carey & Spelke, 1996; Marcus, 2018; Chollet, 2019; Mitchell, 2019), whereas contemporary empiricists insist that domain-general, faculty-enhanced learning is sufficient (Botvinick et al., 2017; Goyal & Bengio, 2020).  In the case of artificial intelligence, this becomes a specific debate about engineering methodology: should we manually program domain-specific abstract concepts into artificial agents from the beginning, or focus on architectural innovations deploying modules corresponding to domain-general faculties that allow agents to extract domain-specific representations by themselves?  In my current book project, I defend the latter strategy, which I dub “the new empiricist DoGMA” (for Domain-General Modular Architecture)—arguing that the DoGMA is a more charitable way to construe the empiricist assumptions that undergird both Hume and deep learning, assumptions that are in turn bolstered by deep learning’s recent successes.

Fodor has some cause for objecting to these appeals to the imagination, because Hume plays them like a “get out of associationism for free” card without ever explaining how the faculty actually works.  As if to bite the bullet in style, Hume even says that the imagination does these things “as if by magic” and worries that its operations “cannot be explained by the utmost efforts of human understanding” (Treatise 1.1.7).  Perhaps, however, the new empiricist DoGMA—which aims to replace black-box faculties with specific neural network architectures—can cash out such promissory notes written by the historical empiricists in a scientifically compelling way (Buckner, 2018).

Philosophers of cognitive science and perception may already be familiar with the most common form of DNN architecture, the Deep Convolutional Neural Network (DCNN).  I have written about this architecture and its links to the neuroscience of human perception before (Buckner, 2018, 2019) and so will not reprise those details here.  Less widely appreciated, however, is that there may be similar correspondences between other kinds of DNN architectures and “generative” processing hierarchies in the brain.  In the remainder of this post, I will focus on the promise that Generative Adversarial Networks (GANs) hold for modelling the imagination and its operations (for other speculations on this promise, see Aronowitz & Lombrozo, 2020; Gershman, 2019; Goodfellow et al., 2014; Hamrick, 2019; Langland-Hassan, 2020, Ch. 12; Mahadevan, 2018; Vedantam et al., 2017; Weber et al., Unpublished).

DCNNs implement an “abstraction hierarchy”, in which highly complex inputs from vision and other senses are iteratively compressed into more and more abstract encodings that capture the stimulus stream’s high-level structure (Buckner, 2018; DiCarlo et al., 2012; DiCarlo & Cox, 2007).  This hierarchy of layers gradually transforms input signals in systematic ways to control for what machine learning researchers call “nuisance parameters”—systematic sources of variance in the input that must be overcome, such as pose, rotation, color, contrast, size, and location in the visual field.  The abstracted, nuisance-invariant encodings are called “latent space” representations, because an enormous amount of abstract information—even about combinations and situations that never appeared in the system’s training set—can be extracted from them for classification and decision-making purposes.
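For readers who find it helpful to see the idea in code, here is a deliberately toy sketch in PyTorch of such an abstraction hierarchy.  It is not any published model; the layer sizes and names are purely illustrative.  But it shows how stacked convolutional stages compress an image into a compact latent-space encoding:

    import torch
    import torch.nn as nn

    # A toy "abstraction hierarchy": each convolution-and-downsampling stage
    # discards some nuisance detail (exact position, fine texture) while
    # preserving more abstract structure, ending in a compact latent vector.
    class ToyEncoder(nn.Module):
        def __init__(self, latent_dim=128):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1),   # 64x64 -> 32x32
                nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),  # 32x32 -> 16x16
                nn.ReLU(),
                nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1), # 16x16 -> 8x8
                nn.ReLU(),
            )
            self.to_latent = nn.Linear(128 * 8 * 8, latent_dim)

        def forward(self, image):
            h = self.features(image)             # progressively more abstract feature maps
            return self.to_latent(h.flatten(1))  # compact latent-space encoding

    encoder = ToyEncoder()
    z = encoder(torch.randn(1, 3, 64, 64))  # a random 64x64 RGB "image" -> 128-d latent code
    print(z.shape)                           # torch.Size([1, 128])

Real DCNNs of the kind discussed in Buckner (2018) are far deeper and are trained on millions of images, but the direction of processing, from detailed input to abstract latent code, is the same.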

An image of a cat that, sadly, does not exist, generated by StyleGAN at https://thiscatdoesnotexist.com/

Generative architectures such as GANs—the technology behind “deepfakes”—leverage this latent information in the other direction.  By passing latent encodings through a “specification hierarchy”, GANs render them into detailed and realistic outputs, iteratively filling in plausible combinations of nuisance parameters.  I won’t get into the details of how these networks are trained—for empiricist purposes it suffices that they are trained end-to-end from experience, without any domain-specific innate knowledge.  Since talk of latent space can also sound a bit spooky, it might be more straightforward to show what these systems can do by considering the state of the art at a website like This Person Does Not Exist.  Refreshing this website repeatedly renders random latent space encodings through a GAN hierarchy to generate photorealistic faces of people (or cats) who do not exist.
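Again purely as a toy illustration (the layer sizes mirror the encoder sketch above and are not drawn from StyleGAN or any other published system), a generator running in the opposite direction looks roughly like this:

    import torch
    import torch.nn as nn

    # A toy "specification hierarchy": the mirror image of the encoder sketch.
    # Starting from a latent vector, each upsampling stage fills in a plausible
    # choice of nuisance parameters (pose, lighting, texture) until a full
    # image has been specified.
    class ToyGenerator(nn.Module):
        def __init__(self, latent_dim=128):
            super().__init__()
            self.from_latent = nn.Linear(latent_dim, 128 * 8 * 8)
            self.render = nn.Sequential(
                nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),  # 8x8 -> 16x16
                nn.ReLU(),
                nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),   # 16x16 -> 32x32
                nn.ReLU(),
                nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),    # 32x32 -> 64x64
                nn.Tanh(),
            )

        def forward(self, z):
            h = self.from_latent(z).view(-1, 128, 8, 8)
            return self.render(h)  # a 64x64 RGB image specified from the latent code

    generator = ToyGenerator()
    z = torch.randn(1, 128)    # "refreshing the page": sample a random latent point
    fake_image = generator(z)  # an image of something that does not exist
    print(fake_image.shape)    # torch.Size([1, 3, 64, 64])

Systems like StyleGAN add many refinements, and during training a second, discriminator network supplies the adversarial learning signal; but conceptually each refresh of This Person Does Not Exist is just this, a random latent point rendered through a trained specification hierarchy.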

GANs show that a physical system based on empiricist principles and a vaguely brain-like structure can solve Fodor’s first problem—synthesizing exemplars for abstract categories—without any magic or innate domain-specific knowledge.  Even more impressively, latent space representations can be systematically manipulated to perform other operations on Fodor’s list, such as creating composite images with some desired combination of features.  Dimensions of latent space can even be located that correspond to specific features or nuisance parameters (Chen et al., 2016; Radford et al., 2015).  If we travel some distance along one of these dimensions and render the novel location’s encoding through a GAN, we can create a photorealistic exemplar with a particular combination of desired features.  By identifying and moving along a “rotation” dimension, for example, a GAN can be used to “mentally rotate” an object, even if the network never saw those exact intermediate poses in its training set.  Even more ambitiously, adding or subtracting latent space coordinates can produce novel exemplars with a desired combination of abstract features (a code-level sketch of these manipulations appears after the images below).  For example, I passed some of Hume’s favored examples of novel composites—“winged horses, fiery dragons, and monstrous giants”, and “the New Jerusalem, with streets of gold and walls inlaid with gemstones”—through a latent space encoder and rendered the results through the specification hierarchy of a GAN.  The very first outputs generated using this approach are below; the images contain features recognizably corresponding to the target categories and have a dreamlike quality that many find compelling (a primer on this approach—which has spawned its own fascinating digital art community—can be found here).

The text prompt “winged horses, fiery dragons, and monstrous giants” rendered by the author in Latent3Visions, a Google Colab notebook created by the computational artist @Advadnoun.
https://colab.research.google.com/drive/1KApkVfF0LdoImjU7u0UnGlptH0XamwgX

The prompt “the new Jerusalem with streets of gold and walls of gemstones” rendered in Latent3Visions.

The text prompt “A tiny pink elephant wearing a tutu and holding an umbrella and hanging from the projector in the back of the classroom while singing a sea shanty”, which I used to use as a standard example of the mind’s productivity when teaching Fodor.
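Here is the promised sketch of those latent-space manipulations.  It assumes the ToyGenerator from the earlier sketch (or any pretrained generator) and pretends that meaningful directions such as “rotation” or “wings” have already been identified, for instance by InfoGAN-style disentanglement (Chen et al., 2016) or the vector-averaging trick of Radford et al. (2015); the random vectors below are merely stand-ins for such learned directions:

    import torch

    # Hypothetical latent-space manipulation, assuming `generator` is the
    # ToyGenerator defined above (or any pretrained generator) and that the
    # "direction" vectors below stand in for feature dimensions discovered by
    # methods like InfoGAN (Chen et al., 2016) or Radford et al. (2015).
    latent_dim = 128
    rotation_dir = torch.randn(1, latent_dim)  # stand-in for a learned "rotation" direction
    wings_dir = torch.randn(1, latent_dim)     # stand-in for a learned "wings" direction

    # "Mental rotation": step along the rotation dimension and re-render,
    # producing intermediate poses the network never saw during training.
    z = torch.randn(1, latent_dim)             # latent encoding of some exemplar
    rotated_views = [generator(z + t * rotation_dir) for t in torch.linspace(0.0, 1.0, 5)]

    # A crude "winged horse": add the wings feature to a horse encoding.
    z_horse = torch.randn(1, latent_dim)       # stand-in for the encoding of a horse image
    winged_horse = generator(z_horse + wings_dir)

With real learned directions and a real generator, the same arithmetic yields the rotations and composites described above; the point of the sketch is only to show where the operations live: in simple additions in latent space, not in any hand-coded symbolic rule.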

This only scratches the surface of the potential generative DNNs hold for understanding the operations of the human imagination.  Some of the most impressive examples come from OpenAI’s DALL-E system, which is capable of coherently depicting combinations of abstract categories that could not possibly have appeared in its training set, such as “an illustration of a baby daikon radish in a tutu walking a dog” (which can be viewed with other examples here).  Even more ambitious applications of the method can tackle the other items on Fodor’s “to do” list, such as synthesizing entire trajectories of predicted experience for events that never happened.  These trajectories can then be used for further offline learning (as might occur in human REM sleep or daydreaming), or to allow a consequence- and probability-sensitive choice among future options, based on which simulated outcome is judged likeliest to lead to a desired result.  In fact, something much like Hume’s solution to these problems is implemented in DeepMind’s “Imagination-Augmented Agents” (I2As—Racanière et al., 2017).
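To convey the general shape of simulation-based choice (and only that; this is nothing like the actual I2A architecture, and `world_model` and `reward_model` below are hypothetical stand-ins for learned networks), a bare-bones sketch might look like this:

    import torch
    import torch.nn as nn

    # Choice by simulated experience, in caricature: roll out short imagined
    # trajectories under each candidate action and pick the action whose
    # imagined future looks best, without ever acting in the world.
    state_dim, action_dim, horizon = 16, 4, 5
    world_model = nn.Linear(state_dim + action_dim, state_dim)  # toy next-state predictor
    reward_model = nn.Linear(state_dim, 1)                      # toy reward predictor

    def imagined_return(state, action_index):
        """Sum the predicted rewards along a short imagined trajectory."""
        total = 0.0
        for _ in range(horizon):
            action = torch.zeros(1, action_dim)
            action[0, action_index] = 1.0
            state = world_model(torch.cat([state, action], dim=1))  # imagined next state
            total += reward_model(state).item()                     # imagined payoff
        return total

    current_state = torch.randn(1, state_dim)
    best_action = max(range(action_dim), key=lambda a: imagined_return(current_state, a))
    print(best_action)

In Racanière et al.’s actual agents the imagined rollouts come from a learned environment model and are summarized by a rollout encoder before being combined with a model-free policy; the caricature is only meant to show that the same generative machinery that paints pictures of non-existent cats can also be pressed into service for prospection and planning.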

In my book, I highlight how many of deep learning’s other recent achievements similarly resulted from adding modules that implement roles the historical empiricists attributed to other domain-general faculties.  In particular, I show how recent architectures in deep learning are illuminated by, and in turn empirically vindicate, some of the more speculative (and often derided) empiricist ideas about these faculties from the history of philosophy, such as Locke on abstract perception, Ibn Sina on memory, James on attention, and Sophie de Grouchy on empathy/sympathy.  There might even be reciprocal benefit to engineers in reviewing the history of empiricist philosophy more closely, as these philosophers anticipated control and coordination problems that will confront programmers as they integrate more semi-independent faculty modules into a coherent cognitive architecture.  In short, empiricism today seems to be thriving, without magic or behaviorist strictures anywhere in sight; we just need to have a little more imagination.


References

Aronowitz, S., & Lombrozo, T. (2020). Learning through simulation. Ann Arbor, MI: Michigan Publishing, University of Michigan Library.

Botvinick, M., Barrett, D. G., Battaglia, P., de Freitas, N., Kumaran, D., Leibo, J. Z., Lillicrap, T., Modayil, J., Mohamed, S., & Rabinowitz, N. C. (2017). Building machines that learn and think for themselves. Behavioral and Brain Sciences, 40.

Buckner, C. (2018). Empiricism without magic: Transformational abstraction in deep convolutional neural networks. Synthese, 195(12), 5339–5372.

Buckner, C. (2019). Deep learning: A philosophical introduction. Philosophy Compass, 14(10), e12625.

Carey, S., & Spelke, E. (1996). Science and core knowledge. Philosophy of Science, 63(4), 515–533.

Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. (2016). InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. ArXiv Preprint ArXiv:1606.03657.

Chollet, F. (2019). On the measure of intelligence. ArXiv Preprint ArXiv:1911.01547.

Demeter, T. (2021). Fodor’s guide to the Humean mind. Synthese, 1–21.

DiCarlo, J. J., & Cox, D. D. (2007). Untangling invariant object recognition. Trends in Cognitive Sciences, 11(8), 333–341. https://doi.org/10.1016/j.tics.2007.06.010

DiCarlo, J. J., Zoccolan, D., & Rust, N. C. (2012). How does the brain solve visual object recognition? http://preprints.sissa.it/xmlui/handle/1963/7028

Fodor, J. A. (2003). Hume variations. Oxford University Press.

Fodor, J., & Pylyshyn, Z. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1–2), 3–71.

Gershman, S. J. (2019). The generative adversarial brain. Frontiers in Artificial Intelligence, 2. https://doi.org/10.3389/frai.2019.00018

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems, 2672–2680.

Goyal, A., & Bengio, Y. (2020). Inductive biases for deep learning of higher-level cognition. ArXiv Preprint ArXiv:2011.15091.

Hamrick, J. B. (2019). Analogues of mental simulation and imagination in deep learning. Current Opinion in Behavioral Sciences, 29, 8–16.

Langland-Hassan, P. (2020). Explaining imagination. Oxford University Press.

Laurence, S., & Margolis, E. (2015). Concept nativism and neural plasticity. In S. Laurence & E. Margolis (Eds.), The conceptual mind: New directions in the study of concepts (pp. 117–147). MIT Press.

Mahadevan, S. (2018). Imagination machines: A new challenge for artificial intelligence. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1).

Marcus, G. (2018). Innateness, AlphaZero, and artificial intelligence. ArXiv Preprint ArXiv:1801.05667.

Mitchell, M. (2019). Artificial intelligence: A guide for thinking humans. Penguin UK.

Racanière, S., Weber, T., Reichert, D. P., Buesing, L., Guez, A., Rezende, D., Badia, A. P., Vinyals, O., Heess, N., & Li, Y. (2017). Imagination-augmented agents for deep reinforcement learning. Proceedings of the 31st International Conference on Neural Information Processing Systems, 5694–5705.

Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. ArXiv Preprint ArXiv:1511.06434.

Vedantam, R., Fischer, I., Huang, J., & Murphy, K. (2017). Generative models of visually grounded imagination. ArXiv Preprint ArXiv:1705.10762.

Weber, T., Racanière, S., Reichert, D. P., Buesing, L., Guez, A., Rezende, D. J., Badia, A. P., Vinyals, O., Heess, N., & Li, Y. (Unpublished). Imagination-augmented agents for deep reinforcement learning. ArXiv Preprint ArXiv:1707.06203.