Natural-language interaction is here: now what?
I had the great good fortune to spend my first few formative years in digital product design and development at the seminal consultancy Cooper, founded and led by the brilliant Alan Cooper. Often called the "Father of Visual Basic", Alan created the visual programming tool that he sold to Microsoft and that became Visual Basic. He is a visionary software developer who recognized the need to deliver software that would better serve non-technical users.
Alan taught us to center our work on serving the goals of personas — the actual, targeted users. Target personas for most software usually don’t have the same mindset and skills as a software developer or an interaction designer, and so we learned to design solutions that would bring software’s behavior closer to the mental model of the actual user, instead of reflecting the implementation model of the engineered system.
In those days, and until quite recently, software always needed to provide a graphical user interface (UI) to its users. UIs basically offer a set of inputs and outputs, along with content. You know: a dialog box that displays a set of settings to be configured, a drop-down menu with predefined choices, a checkbox, and a button labeled "Submit". Over time, of course, UIs have become more dynamic, with richer visual pattern languages and refinements such as animated state transitions. We have also seen a rise in alternative input mechanisms such as voice and gesture, particularly in mobile contexts and services.
From the start, interaction designers at Cooper frequently sought to provide natural-language inputs that better matched people's mental models, while still working within the constrained set of capabilities built into the software and its databases. These natural-language tools were often used for search interfaces and usually took the form of a smart set of inputs: logically linked drop-down menus and free-text entry fields that together formed a complete sentence. We also usually had to actively advocate for engineers to build relational databases on the backend to support this innovative, user-centered UI pattern.
Given my deep roots in Cooper's design principles, I can't overstate my excitement at the recent explosion of so-called generative AI tools (that is, LLM/ML systems such as ChatGPT or Bard) that can accept highly unstructured natural-language inputs and then respond with natural-language outputs, making the system seem capable of conversation. (Of course, we've been typing natural-language strings into Google's search box for ages, but what came back was still a structured list of results that we had to parse in detail ourselves.)
Being humans, we’re highly skilled at having conversations. After all, we’re trained from birth to interact with other people who share enough of our language (whether spoken, written or gestural) to make sense together. Conversations are a powerful way for the entities involved to pass along information, develop shared understanding, clarify the root of matters, and make plans; if those conversations are recorded and shared they can also provide similar value for those who observe them.
This new mode of interaction will continue to transform access to information for people the world over, and I truly believe we'll look back on the arrival of these systems as an inflection point for software design and development. Those old-school dialog boxes, where the software displays a constrained set of options for the user to choose from and submit, are going to become "dialog spaces" where users can freely express their needs, preferences, and choices in their own words. This shift is already happening in the mainstream, for example with customer service bots. When conceived and applied more broadly, it will be a quantum leap forward in people's ability to interact effectively with software-driven systems.
What’s less clear at this time is how the information created and embodied within such conversations can serve as a useful, meaningful repository that a person can re-access and utilize in an efficient way. Are we always going to have to go through a back-and-forth conversation to elicit information via prompts? I would certainly hope not.
Take, for example, the potential for AI/LLMs in healthcare settings to democratize patients' access to their own health records. The complex domain of medical care is relatively impenetrable to non-clinicians (not least because information is siloed in disparate electronic health records, or EHRs). Now imagine experiencing some back pain and turning to your personal AI, which, in an ideal world, has access to all of your health records. You could ask it: "When did I last have that back problem? I think it started with an 's'…" and it would respond: "You were treated for sciatica, a condition in which pain radiates down the leg, about 3 years ago."
And then…what? Does it offer to display the specific medical record from the past? Does it offer a review of the previously prescribed treatment? Does the system have any inkling that the question might mean the health problem is recurring? While a human interlocutor could leap to all of these ideas, such behaviors still need to be designed and programmed into the software (no matter how opaque its LLM may be). Nor should we have to shoulder the burden of those next steps ourselves, drilling into the information like a miner extracting a gem from the surrounding rock. After all, we're dealing with a real-life health problem.
Furthermore, while a linear conversation can produce valuable results, it's not an efficient way to store large bodies of interrelated information, nor is it a particularly efficient way to specify highly complex or contextual inputs. Let's not be forced into linear conversations that we must drive forward ourselves as the interaction model for every kind of task and need!
I’m eager to work with teams exploring these opportunities in healthcare. We need to expand on the clear potential to create novel AI-enabled interaction designs that are able to transform linear conversations into more actionable and accessible information spaces. We’re creating new and welcome input mechanisms, yet we must keep innovating on their outputs and next steps so that we can make the software we build behave in ever more contextually-appropriate, human-centered ways.