What impact will Generative AI have on work, education and society?
5-second summary
The new generation of AI applications, like ChatGPT and Stable Diffusion, are just early demos of the generative capability that will be built into our knowledge work tools going forward
As generative AI increasingly permeates knowledge work in the coming years, it is important to understand what it is, what its limitations are and how it can be used productively
AI-powered tools work best as co-pilots that complement human understanding, creativity and common sense, rather than as replacements for human judgement and oversight
They also provoke important questions for the future of work, learning and our broader information ecosystem
We Finally Built A Machine We Can Talk To
In 2022, a new set of generative Artificial Intelligence (AI) applications became available for the public to use. These began with image-creating tools like Dall-E 2, Midjourney, Stable Diffusion and Lensa, which mostly made a splash creating weird AI art. These image-generating tools are fun to play with, and do raise some interesting and important questions for visual designers and the art world, but it’s hard to see how most of us would use them in our work, studies or day-to-day life. But in November, OpenAI released ChatGPT, the first intelligent chat tool that appears to…actually work. You can ask it any question and it appears not only to understand you easily, but also to give high-quality, coherent answers. Normally it’s wise to treat technology demos, especially about AI, with a decent dose of scepticism. So it’s a wonderful thing that you can try ChatGPT yourself. If you haven’t done this yet, it’s worth pausing, opening an account and playing with it. It can frequently generate Wikipedia-like answers to most questions in real time, tell pretty decent jokes and write amusing lyrics and music. It’s worth tinkering to discover how you might use it productively, and probing for its shortcomings, by giving it a task you actually need to do – ask it to outline an idea for an article, give you some recipe ideas, or edit a paragraph you’re working on. It really is a remarkable accomplishment. But you might also notice something slightly ‘off’ about some of its answers (more on that later). Sometimes it gets things completely wrong, but with an air of confidence that’s unsettling.
ChatGPT and similar applications have turned public attention back towards AI and its consequences for work and society, and they will likely continue to be much discussed in 2023. They’ve also reopened old debates about the possibility and proximity of developing Artificial General Intelligence (AGI) more broadly – machines that exhibit the kind of flexibility and complexity in solving problems that we witness in human cognition. I’m not a computer (or neuro) scientist, but I do study technological innovation and its impact on work and society, and I find these debates fascinating. I also believe that as AI continues to shape our world more visibly, it’s useful to learn the basics of what these tools are actually doing, so we might better appreciate their uses and limitations.
I’m optimistic about the broad direction of science, technology and progress, and excited about the potential of AI. But I recognise that digital technology doesn’t have to be ‘superintelligent’ to be extremely useful or to cause significant social harm. The growth of social media over the past decade offers a cautionary tale of how letting loose new digital technologies, with features underwritten by the same machine learning models that make ChatGPT so impressive, can have perverse consequences. I also know that when evaluating the claims of AI entrepreneurs and investors, we need to remember that their financial incentives favour entrancing visions of what AI might eventually accomplish over sober reporting on its current limitations, let alone on the harm these technologies might cause if inappropriately applied or poorly incentivised. So what follows is a simple overview of my understanding of machine and human intelligence, and some reflections on what tools like ChatGPT might mean for education, work and society in the coming years.
Artificial General Intelligence: The Horizon Perpetually 10 Years Away
The notion that superintelligent machines will utterly transform society is actually quite an old idea, both as an early trope of science fiction and as a serious claim made by computer scientists working in the field. In the late 1950s, for example, polymath and AI pioneer Herbert Simon proclaimed that by 1967 computers would beat the world champion of chess, write music that critics would value, discover new mathematical theorems and soon be capable of “doing any work a man can do”. Needless to say, none of this happened by then. But this didn’t deter Simon’s equally prodigious colleague Marvin Minsky from proclaiming in 1970 that within “three to eight years we will have a machine with the intelligence of an average human being”.
Despite these setbacks, even bolder claims followed, often in the wake of a genuinely impressive accomplishment. After IBM’s Deep Blue defeated chess grandmaster Garry Kasparov in 1997, Raymond Kurzweil began to popularise the notion of a coming technological singularity, a point beyond which AGI with god-like powers would usher in an age of abundance by 2029. Sam Altman, the CEO of OpenAI, continues this optimistic vision of AGI in his recent essay Moore’s Law for Everything, and in a recent talk about ChatGPT. But real general intelligence – the kind that we often take for granted as common sense in humans – remains elusive. Just over the horizon. Some 10 years away.
I don’t raise these examples to poke fun. It’s tough to make predictions, as Niels Bohr and Yogi Berra have observed, especially about the future. Herbert Simon and Marvin Minsky were intellectual giants who made extraordinary contributions to multiple fields. Kurzweil led important advances in machines reading printed text and, in one of the cooler moves for a computer scientist, in building synthesisers for Stevie Wonder. What OpenAI has demonstrated through ChatGPT is an astonishing feat of engineering. But, at least in this field, even deep domain expertise doesn’t translate into reliable predictions about what machines will be able to do in the future, or even within the next few years. In fact, very smart people with deep domain expertise often disagree about what is fundamentally possible. This is because no-one understands how the human mind really works, not even the world’s leading computer scientists and neuroscientists. We have no general theory of the kind of general intelligence humans routinely demonstrate when solving the minutiae of everyday problems, like doing household chores, and even the cognitive capabilities of a young child remain one of science’s deepest mysteries. As the philosopher of mind Jerry Fodor once put it: “How does the mind work? I don’t know. You don’t know. [Steven] Pinker doesn’t know. And, I rather suspect, such is the current state of the art, that if God were to tell us, we wouldn’t understand”.
The shifting goal posts of intelligence
Criticism of the upper limits of AI also dates back to the founding period of research and development. Not long after Herbert Simon made his bold predictions about the future of AI in the late 1950s, the philosopher Hubert Dreyfus released the report Alchemy and Artificial Intelligence (1965), and a later book called ‘What computers can’t do: the limits of artificial intelligence’ (1972). Dreyfus, one of the first serious critics of AI, argued that computers will never develop human-like intelligence, because our conscious mental abilities rest on a myriad of unconscious and inscrutable processes that do not follow the rules of symbolic logic – the dominant approach to AI at the time. Just as AGI proponents have a long history of bold predictions, numerous experts have given more sober evaluations of the possibility of AGI over the years. Some of these, like renowned linguists and cognitive scientists Noam Chomsky and Steven Pinker, argue it’s highly improbable we will ever build artificial minds with human-like flexibility. Others, like computer scientists and AI researchers Judea Pearl (REF) and Gary Marcus (REF), argue that AGI might be feasible one day, but would require some fundamental breakthrough in cognitive and computer science, a paradigm shift in our understanding of the order we associate with a Newton, Darwin, or Einstein, before we can even embark on the journey. In fact, a recent book argues that AGI is fundamentally impossible based on our current understanding of mathematics. This, so the authors argue, is ‘why machines will never rule the world’.
Much of this debate rests upon our definition of intelligence, and perhaps even the very coherence of the abstract concept. Try defining intelligence in general terms, without any reference to a specific task. It’s quite difficult, isn’t it? One view is that the notion of intelligence isn’t a very useful idea in the abstract, but only when applied to a context, such as the ability to achieve a specific goal in the face of obstacles. This certainly provides a more pragmatic way of defining intelligence, and we’ve long used specific tests we consider difficult as proxies for intelligence to evaluate the capabilities of humans, animals and machines. Is your calculator smart? Well, it’s smarter than any human who has ever lived at arithmetic, but is pretty stupid at everything else. Is an octopus smart? It’s much better than humans at catching crustaceans on an ocean floor, but not so good at driving to the airport. It is for this reason that Alan Turing, one of the founders of computer science, largely gave up on defining machine intelligence in general terms, and suggested we use the ‘imitation game’ as a proxy for intelligence. Now called the Turing Test, it involves a machine – basically a chatbot – that can imitate natural language in a conversation so effectively as to be indistinguishable from a real human.
No machine has passed an ‘officially’ judged Turing Test yet, although some have come close, and ChatGPT can generate text that many have taken as human generated. But on the question of passing specific tests, AI optimists have a pretty good story to tell about the past. Machines can now surpass humans at many tasks that detractors once claimed require human-like intelligence to perform, from reading handwriting and playing chess to transcribing speech, composing music and recognising human faces. If progress towards ‘strong’ AGI remains contested, our progress on ‘weak’ or narrow AI – building superhuman abilities at specific tasks – should be celebrated rather than downplayed. However, every time a machine passes a new, previously unassailable test, proponents argue that these same critics play a version of ‘No True Scotsman’ and simply shift the goalposts of intelligence tests to a new challenge. Dreyfus, for example, the original gadfly of AI enthusiasts, once confidently declared that an ‘AI system would never beat a grandmaster at chess’. Whenever these bars are cleared, as this one was in 1997, we tend to downgrade our reverence for the nature of the challenges they present. Games like Chess and Go, once considered emblems of human intellectual achievement, are cast aside for new, apparently even more vexing challenges, like writing a novel. This routine raising of the bar actually has a name, the AI Effect, where intelligence is constantly redefined as ‘whatever machines haven’t done yet’.
The Mystery of Common Sense
Part of my fascination with the elusive horizon of AGI comes from the fact that both the successes and failures teach us a little more about the enigma of human cognition. We’ve learned that our minds are more mysterious than the pioneering AI researchers assumed. But will machine intelligence eventually outcompete humans at every intellectual feat we cherish? No one knows with certainty. But my reading of the various arguments leads me to believe that this is still a distant prospect, possibly akin to early maritime explorers imagining they could sail to the moon. In fact, there might be something to this analogy. An inadequate theoretical model of the world, say thinking that the earth was a flat disk, would lead to bad ideas about where we could get to by sailing. Each new discovery in sailing technology might look like it could bring us closer to the edge of the world. A bold captain might even predict that we will reach the horizon within the next 10 years. But, as is now obvious, travelling to the moon required a completely different direction of travel, only possible through fundamentally new discoveries in science and a completely different kind of ship.
What has sailing to the moon got to do with machine and human intelligence? Many argue we still have an inadequate theory of the mind and brain, and like flat-earth sailors, this can lead us astray when we imagine what’s possible. For all the advances of artificial intelligence, humans continue to effortlessly exhibit a way of thinking, in fact a specific mode of reasoning, that we still really don’t understand, and so can’t reproduce in machines. We constantly update our understanding of the world (and other people) by using three distinct modes of reasoning:
Induction is perhaps the most fundamental way we acquire knowledge through experience. It is the process by which we order raw data by counting and classifying observations, laying the groundwork to move from specific cases to formulating general rules. For this reason it’s sometimes called bottom-up reasoning.
Deduction is the method of reasoning used to infer logical consequences from premises. We use this to assess the validity of arguments, such as famous syllogisms like: ‘All men are mortal. Socrates is a man. Therefore Socrates is mortal.’ It is sometimes called top-down reasoning, because it moves from general statements and uses logical rules to deduce specific conclusions.
Abduction is a method of reasoning that involves inferring the best explanation for a given observation. We use it to ‘guess’ at likely causes or explanations for what we encounter, and use it to construct flexible mental models of the world around us.
We rapidly cycle between these three modes of reasoning when navigating the world. Solving even the most mundane problems like ‘where did I leave my keys’ will require us to jump back and forth between them – looking around the room to take in new information, drawing on past experience to formulate likely scenarios, checking these guesses and updating our mental models based on what we find.
Thinking Machines
We can program machines to approximate deductive and inductive reasoning, but we have no idea how to get computers to do abduction, mainly because how humans do it is still a mystery. In very simple terms, there are two main approaches to building artificial intelligence. The first, Symbolic AI, involves creating symbolic representations of functions using formal, algebra-like code. This approach is similar to what a calculator does: it is programmed to perform mathematical calculations by following specific sequences of code, and performs this narrow set of operations much faster and more accurately than we can. This approach employs deductive reasoning – machines can execute a set of logical operations that they have been manually coded to perform. If we input Y=X+2 and X=5 to a computer, it is excellent at performing the logical operations and outputting Y=7. The initial success of this kind of symbolic manipulation is what first excited Herbert Simon and others to imagine, back in the 1950s, that we could use these methods to model general intelligence, and it’s now called classical or even Good Old-Fashioned AI.
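To make the Y=X+2 example concrete, here is a minimal sketch in Python. It is only an illustration of the rule-following idea, not a description of any real symbolic AI system, and the names apply_rule and facts are made up for this example.

```python
# Toy illustration of the symbolic approach: a hand-coded rule applied to a
# hand-supplied premise. All names here are hypothetical.

def apply_rule(x: int) -> int:
    """The rule a human wrote down in advance: Y = X + 2."""
    return x + 2

# The premise X = 5 is simply taken as given. The program has no way of
# checking whether it is actually true.
facts = {"X": 5}

# Deduction: apply the rule to the premise to derive a new fact.
facts["Y"] = apply_rule(facts["X"])

print(facts["Y"])  # 7
```

The program gets the right answer every time, but it can only ever do what the rule says, and it cannot learn a rule it wasn’t given.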
The second approach, Connectionism, involves creating large networks of simple processing units, called artificial neurons, that can identify patterns via statistical approximation when trained on sufficiently large amounts of data. Although the basic methods of this process have been understood since the 1950s, this approach, and more specifically deep learning with neural networks, has dominated the past decade as the web has made abundant training data available. The benefit of this approach is that machines can learn to do things, say recognise images or play a computer game, that they were not explicitly programmed to do. It is this deep learning approach that has led to the recent explosion of AI applications, including Large Language Models like ChatGPT.
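As a rough illustration of learning from examples rather than from hand-written rules, here is a sketch of a single artificial neuron (a perceptron) in Python. It is a deliberately tiny stand-in: real deep learning systems use millions or billions of these units and different training methods, and the function names, learning rate and number of passes below are assumptions chosen for the example.

```python
# A single artificial neuron that learns logical OR from labelled examples.
# Nobody writes the OR rule into the code; the weights are adjusted until
# the neuron's outputs match the examples.

def predict(weights, bias, x1, x2):
    """Fire (1) if the weighted sum of the inputs is positive, else 0."""
    total = weights[0] * x1 + weights[1] * x2 + bias
    return 1 if total > 0 else 0

def train_perceptron(examples, epochs=20, learning_rate=0.1):
    """Nudge the weights a little every time the neuron gets an example wrong."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in examples:
            error = target - predict(weights, bias, x1, x2)
            weights[0] += learning_rate * error * x1
            weights[1] += learning_rate * error * x2
            bias += learning_rate * error
    return weights, bias

# Training data: inputs paired with the answer we want.
examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_perceptron(examples)

for (x1, x2), target in examples:
    print((x1, x2), "->", predict(weights, bias, x1, x2))
```

After training, the neuron reproduces the pattern in the examples, but all it has stored is a pair of numbers and a bias, not anything we would recognise as a concept of ‘or’.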
The first approach uses symbolic manipulation and is largely deductive. It works extremely well for performing logical operations that it has been programmed to execute, but has no way of working out whether the premises are true or false – does X really equal 5? – and can’t learn anything new beyond what it has been programmed to do. The second, connectionist approach is largely inductive. It can ‘learn’ associations between data that it wasn’t explicitly programmed to find, but it doesn’t transform these connections into understanding by constructing abstracted models and representations of the world. This lack of symbolic representation is part of the reason machine learning models can appear as ‘black boxes’, where the reasons they form associations remain inscrutable to us. This can raise all manner of practical and ethical problems when they are used to make consequential decisions. But it also might put some hard constraints around what these neural network approaches can ultimately get machines to learn. Deep learning models can develop superhuman abilities when trained on sufficient data along narrow tasks within constrained parameters, such as playing computer games and recognising images. But these abilities tend to be brittle: introduce a few novel elements into the mix and they get confused by problems that look like common sense to us. This tendency to be easily fooled has meant that developing fully autonomous vehicles that can drive around a busy city is much more difficult and dangerous than many had hoped.
The Magic of Abduction
The problem is that most of the real world is, and has always been, volatile, uncertain, complex and ambiguous. Neither rigidly logical deduction nor naively open induction were sufficient to solve the problems our forebears faced. These processes had to be complemented and coordinated by abduction. We reason by guessing, we conjure up theories that might explain what we see before us, and help us make predictions about what might come next. But, most importantly, our representations of the world are provisional, able to be updated as we encounter new information. It is abduction, drawing upon our prior knowledge to formulate our best guess, that enables us to deal with outliers and novel situations, as it frees us from merely applying rigid rules or haphazardly groping for more data. Guessing comes naturally to us, so easily in fact that in school we often disparage it (‘don’t just guess!’), and hold deductive reasoning and memorisation in higher regard – ironically capacities that machines can excel in. But abductive inference, or guessing, infuses the very way we perceive and learn about the world. Charles Sanders Peirce, the first real theorist of abduction, put it like this:
“All that makes knowledge applicable comes to us via abduction…I perform an abduction when I so much as express in a sentence anything I see. The truth is that the whole fabric of our knowledge is one matted felt of pure hypothesis confirmed and refined by induction. Not the smallest advance can be made in knowledge beyond the stage of vacant staring, without making an abduction at every step.”
Moravec’s Paradox: why the hard problems are easy and the easy problems hard
Our capacity for conjecture is not a bug, but a startling feature of human cognition. Our brains are abduction machines, constantly generating stories to make sense of and explain what we observe around us. It’s a key reason that our minds work differently to machines, and why things we find effortlessly easy are so difficult for machines to do. In fact, the optimism of the pioneering AI researchers was propelled by the reverse discovery, that the things we think of as difficult, such as solving algebraic problems, machines do exceptionally well – much faster and with fewer errors than humans. They assumed, perhaps understandably, that if machines can so effortlessly do things we think of as difficult, things we find easy, such as visually recognising faces, or walking around a room and picking up objects, should be achieved with relative ease. But by the 1980s, after decades of research saw minimal progress on these problems, it was clear this wasn’t the case. Despite all the advances in computation, the skills in perception, comprehension and mobility that toddlers exhibit still appeared beyond the world’s leading AI labs. This problem has come to be known as Moravec’s Paradox, after the AI researcher Hans Moravec, who observed:
“It is comparatively easy to make computers exhibit adult-level performance in solving problems on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility.”
Chess actually continues to be an interesting example of Moravec’s Paradox, because while we can build machines that can easily defeat any human at the game, we still can’t build machines that “can walk to the shelf, take down the chess set, set up the pieces, and move them around during a game.”
Back to ChatGPT: Fun Toy or Productive Tool?
So what are ChatGPT and other Large Language Models actually doing when they respond to a prompt or answer a query? GPT’s deep learning neural network (in the connectionist approach) has been trained on a mind-boggling amount of data (570GB) to predict the next word, and now sequences of words, in response to a text prompt. At its base, it’s doing something similar to the predictive text features that try to autocomplete our sentences when we perform a Google search. It’s just much, much better at it. So impressive, in fact, that it seems like it understands language. These models are extraordinary examples of Natural Language Processing, and have even convinced a few engineers who work with them that they’re actually sentient. The hope of deep learning enthusiasts is that feeding ever larger amounts of data into the system will bootstrap emergent properties that function more like general, flexible intelligence.
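The idea of predicting the next word can be sketched in a few lines of Python. This is a crude word-pair counter, far simpler than the neural network behind GPT, and the corpus and function names are invented for the illustration. It is only meant to show what ‘predicting the next word from training text’ looks like in its most basic form.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, which words followed it in the training text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the word that most often followed `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# A made-up, tiny 'training set'.
corpus = "the cat sat on the mat and the cat slept on the sofa"
model = train_bigrams(corpus)

print(predict_next(model, "the"))  # cat
print(predict_next(model, "on"))   # the
print(predict_next(model, "dog"))  # None
```

The counter has no idea what a cat is. It only knows that ‘cat’ followed ‘the’ more often than anything else in the text it was shown.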
What GPT can do
The first few times you use ChatGPT, it seems phenomenally impressive. But use it a hundred times and you start to see its limitations. For example, it’s generally excellent at summarising information, either based on its training data or from text you directly input. It seems extraordinarily competent at translating between different languages too. It can read computer code in most programming languages, and can often help detect errors and actually write lines of code. It’s very good at imitating styles of writing, and consequently you can give it all sorts of amusing stylistic challenges.
What GPT can’t do
But sometimes it completely makes things up, and offers them with a confidence indistinguishable from the truth. This is easy to see if you give it some maths problems. Based on its training data, it can predict the correct answers to simple operations: it will tell you that 2+2 = 4. But give it a few more complicated operations, and it will start to spit out errors. This is because it’s extrapolating text predictions, not actually calculating the numbers. So what, some might say, that’s why we have calculators and spreadsheets. We use different tools for different tasks, and don’t say a rolling pin is bad because it can’t cut like a knife. But unless we keep its mechanism clearly in mind, this kind of text generation might lead to all sorts of problems. It can’t cite sources for the information it provides (at least not yet, but other models are working on this). Not only this, but it often invents plausible-sounding references to scientific papers and books, ones that look legitimate until you actually try to find them, because they’re pastiches of real authors, journals and papers.
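The difference between predicting text and calculating can be caricatured in Python. To be clear, this is not how GPT works internally: the ‘memory’ of seen sums and the fallback of reusing the answer from the most similar-looking prompt are assumptions invented for this sketch. The point is only that a system which continues text from familiar patterns will answer every question, whether or not it has any means of working the answer out.

```python
import difflib

# Hypothetical sums the 'text predictor' has seen written out before.
SEEN = {"2+2=": "4", "3+5=": "8", "10+10=": "20"}

def predict_answer(prompt: str) -> str:
    """Continue the text from memory, or from the closest-looking memory."""
    if prompt in SEEN:
        return SEEN[prompt]
    # No calculation happens here: we reuse whatever followed the most
    # similar-looking prompt, and return it with the same confidence.
    closest = difflib.get_close_matches(prompt, list(SEEN), n=1, cutoff=0.0)
    return SEEN[closest[0]]

def calculate(prompt: str) -> int:
    """What a calculator does: parse the numbers and actually add them."""
    left, right = prompt.rstrip("=").split("+")
    return int(left) + int(right)

print(predict_answer("2+2="))        # 4, because it has seen this before
print(predict_answer("1234+5678="))  # a confident answer borrowed from a seen sum
print(calculate("1234+5678="))       # 6912
```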
How can you use it?
In my own work, I’ve been experimenting with using it as another tool alongside the standard suite of knowledge work applications – Google, Google Scholar, Wikipedia, word processors, spreadsheets etc. Sometimes it feels like having a savant-like research assistant with a secret ayahuasca addiction. It can function as a super skilled assistant for some tasks, such as providing quick summaries of areas of knowledge I don’t know much about (but you need to check these), or providing alternative edits to paragraphs of writing. But it can also contradict itself, get confused and confidently hallucinate complete bullshit, especially if you press it to come up with sources for its information. There’s actually a growing skill in directing these models to output desired information, called Prompt Engineering, which is worth learning about.
Zero-Marginal-Cost Bullshit
“The bullshitter is faking things. But this does not mean that he necessarily gets them wrong.”
—Harry Frankfurt
Bullshit might actually be quite an apt word for what these models do. On Bullshit, a now famous essay by the philosopher Harry Frankfurt, makes an important distinction between lying, or deliberate deception, and bullshitting. A liar knows what the truth actually is, but chooses to say something else. For a bullshitter, the difference between truth and falsehood is irrelevant, inconsequential and perhaps even completely unknown. Whether statements map on to reality or deviate from it will be due to factors other than a regard for the truth. Frankfurt writes: “[It is just this] lack of connection to a concern with truth—this indifference to how things really are—that I regard as the essence of bullshit.”
GPT and similar models appear to be doing something very similar, which is why some critics have called them stochastic parrots. This isn’t necessarily a problem if we use the tools appropriately. I’ve had interesting exchanges with ChatGPT that I’ve actually found helpful in thinking through a new idea. But I treat the truthfulness of its responses as I would a set of Tarot cards (or as I imagine I would if I used Tarot cards): as an interesting stimulus for thinking about something, but not as a reliable map of reality.
As with social media, we might find that unleashing an industrial grade bullshit machine in the wild has all sorts of negative effects. Schools and universities, already besieged by the cat and mouse game of detecting plagiarism and protecting academic integrity that the web has introduced, might have to give up on the take-home, unsupervised essay as a meaningful model of assessment, just as calculators did for arithmetic and Google has done for rote memorisation of facts. But for genuinely self-directed learning (rather than institutional assessment) these technologies might end up being as helpful as Google searches can be.
Beyond testing students’ writing abilities, enabling people to rapidly generate highly plausible-sounding mistruths might have grave consequences for social attitudes towards the trustworthiness of information in general. Public trust in the credibility and reliability of scientific knowledge and other official information has already come under great strain in recent years, and an entire industry of alternative information sources has emerged across the new media landscape: social media channels, podcasts, newsletters etc. The financial incentives that drive these new media entrepreneurs are usually to attract and retain as much attention as possible, since this maximises advertising or subscription revenues, rather than to communicate as accurately, cautiously and truthfully as possible. The prospect of introducing extremely powerful tools that can instantaneously generate not just persuasive text, but also images, audio and even video, driven by the current attention-maximising business models of the web, should clearly cause alarm. The combination of these tools, coupled with the data-rich psychological profiles social media companies develop, could make micro-targeted, personalised propaganda much easier. Zero-marginal-cost bullshit will no doubt be of great interest to actors that see benefit in stirring a public fog of misinformation.
Orchestrating A Suite of Smarter Tools
These reservations notwithstanding, deep learning and large language models like ChatGPT are clearly extremely powerful. As a platform infrastructure, I think they will power a suite of smarter knowledge work tools that we can instruct through natural language, and that can offer useful predictions, powerful versions of autocomplete. The analogy I would draw is with electrification. We’ve made many tools more effective and easy to use by powering them with electricity, from lights to fridges to drills, but they use the energy to do very different things. For example, Microsoft-owned GitHub has already built one of these tools for computer programmers, called Copilot, which integrates into code editors and translates natural language prompts into coding suggestions. Some computer engineers claim it frees them up to work on the higher order and more important problems of software design. I think this is a good way to think about these tools, as co-pilots, rather than substitutes for human knowledge and skill. I’m excited about the new array of tools that will become available to support human creativity and ingenuity over the coming years, as long as we understand their strengths and remember their limitations.