Partner modelling: heuristic criteria for LLMs

Interaction design & the traits of LLMs

I’m OK You’re OK

Interaction design might be coming into focus soon for the design of generative AI. This week Claude 3 is reaping praise for its personality and affective realism. Two weeks ago, it was Groq and its near-real-time responses. We are increasingly, and with some inevitability, appreciating the stylistic presence of generative AI. The more human-like AI becomes, the more we appreciate the nuances of interaction.

Benchmarking falls short of AI’s social characteristics

This brings us to an interesting place for the design and implementation of generative AI. Benchmarks for AI tend to focus on measurable capabilities and performance. This, for obvious reasons — you can only benchmark what you can standardize and formalize.

But there are clearly degrees of affect and presence to these large language models that are subjective. Our impressions of the human-like character of these models are individual and personal. They depend on our prompts and interactions. They may also depend on the content we’re engaging with, and on our degree of sophistication in interacting with chat agents. These are factors that are more difficult to benchmark.

Partner Modelling

I don’t know of any system or test with which to benchmark the affective aspects of LLMs. I’ve pulled some sections from a white paper on partner modelling below. This paper covers research into the characteristics and traits of chat agents and natural language interfaces prior to the GPT era. So Alexa, Siri, etc. These are types of chat interfaces that are neither as creative nor as human-like as current LLMs, nor mistaken for such. So the characteristics that users report about them befit those of conventional technologies and services.

I think the criteria that the research surfaced are interesting. They include aspects of personality that could easily describe people: consistency, reliability, competence, honesty, and so on. But they don’t include relational criteria. In none of these measures would one see a relationship. There’s no consideration of friendship, trust, or kindness. The criteria do include communication skills, but not performative skills. They cover ways of characterizing the functionality and operation of a chat agent, but not the agent’s ability to engage in open dialog and conversation (let alone some kind of theater or “social” situation).

Depth of Interface — new aspects of chat agents?

It is early days yet for theorizing about the interaction design of LLMs, but I suspect that there is a depth of interface at play in the degree to which chat agents can engage with users. There will be attributes and measures of this interactional, conversational, and quasi-social fluency and competence. We will evaluate models not only for their ability to seem human and personable, but for their competence in following along and engaging in a strip of real-time interaction.

The more a model is able to approximate this, the more it will seem “real” to us. And the more it seems real, the more we will use social interaction as an approach to model design. For social competencies (of people) are themselves natural prompting techniques. (We essentially prompt one another in face-to-face social interactions every day, using context, purpose, intent, convention, and so on.)

Front of face, back of face

I think of front-of-face as the surface, back-of-face as thinking. When in real time conversation or chat with an agent, then, front-of-face attributes govern interaction. When using an LLM to perform tasks on documents or to automate workflows, back-of-face, or operational and functional attributes will govern.

Both of these are to a degree included in the partner modelling list of characteristics. The back-of-face traits would be those pertaining to machine intelligence, such as reliability, efficiency, expertise, and precision, etc. Front-of-face traits would be those ascribed to human social dimensions, such as warm, empathetic, authentic, social, interactive, and spontaneous.

There might then also be new attributes corresponding to social interaction: awareness, attentiveness, interestedness, engagement, synchronization. These would describe the agent’s ability to make us (as users) feel co-presence, feel seen, and feel attention. LLMs could do some of this with non-verbal signaling. Imagine that the interface has a small animation that changes color to indicate that the agent is listening, thinking, or ready to reply. These would be front of face characteristics that would signal to users a sense of being with, of talking with not at.
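As a rough illustration of that kind of non-verbal signal, here is a minimal sketch in TypeScript; the state names, colours, and function names are assumptions made for the example, not any existing chat product’s API.

```typescript
// Minimal sketch: map hypothetical agent states to a colour used by a small
// "presence" indicator in the chat UI. States and colours are illustrative.
type AgentState = "listening" | "thinking" | "ready";

const presenceColour: Record<AgentState, string> = {
  listening: "#4caf50", // green: attending to the user's input
  thinking: "#ff9800",  // amber: composing a response
  ready: "#2196f3",     // blue: a reply is available
};

// Returns markup for a small coloured dot the interface could render beside
// the agent's avatar to signal "being with" the user.
function renderPresenceDot(state: AgentState): string {
  const colour = presenceColour[state];
  return `<span style="width:10px;height:10px;border-radius:50%;display:inline-block;background:${colour}"></span>`;
}

// Example: the dot changes as the agent moves through an exchange.
(["listening", "thinking", "ready"] as AgentState[]).forEach((state) =>
  console.log(state, renderPresenceDot(state))
);
```

The point is not the markup but the channel: a persistent, low-bandwidth signal that the agent is attending, distinct from the reply itself.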

Intelligence and speed

The traits listed in the partner modelling survey need to be supplemented to cover the high degree of intelligence exhibited by LLMs. Conventional NLP-based chat agents are functionally capable, but not “intelligent.” LLMs now exhibit an intelligence that exceeds our own in some respects, if their range of capabilities and depth of knowledge is taken into account. When coupled to Groq’s real-time LPU (Language Processing Unit), the speed of this performance is surprising.

Machines that are this capable and fast now present novel qualities of “mind” or “thought” that beg for new criteria of evaluation. Chat agents as fast as the Groq demo are too fast to feel like conversation partners. They don’t feel human because they’re too fast. What then is the measure of an LLM that is clearly better than us? Is it a measure of speed or one of comprehension? Do we claim that an agent is real-time or instantaneous, or that it is super intelligent and capable?

Speed certainly is an attribute of these agents. Speed is variable, from too slow and laggy to real-time and synchronized. Faster than that and it becomes machine-like again. In fact, Groq chipsets are so fast that models practically print results instantly. The streaming quality of GPT gives way to a page- or screen-printing quality. Is speed then a spectrum that runs from agent interaction to content publishing: chat agents are slowed to seem like conversation partners, while accelerated agents become quasi content publishers?
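One way to make that spectrum concrete is to treat pacing as a parameter on the same token stream: delivered slowly it reads as conversation, delivered all at once it reads as publishing. The sketch below is illustrative only; the delay values are assumptions, not measured conversational norms, and the stream source is a stand-in.

```typescript
// Sketch: pace a token stream. msPerToken around 50-80 feels like streamed
// speech; 0 releases everything at once, the "page-printing" end of the
// spectrum.
async function* paceTokens(
  tokens: Iterable<string> | AsyncIterable<string>,
  msPerToken: number
): AsyncGenerator<string> {
  for await (const token of tokens) {
    if (msPerToken > 0) {
      await new Promise((resolve) => setTimeout(resolve, msPerToken));
    }
    yield token;
  }
}

// Usage with a stand-in reply; a real model client would supply the stream.
async function demo(): Promise<void> {
  const reply = ["Sure", ",", " here", " is", " a", " thought", "."];
  let shown = "";
  for await (const token of paceTokens(reply, 60)) {
    shown += token;
    console.log(shown); // a UI would re-render the partial message here
  }
}

demo();
```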

Personalization

Our relationships to LLMs will change when they become personalized. Agents that remember our previous interactions with them, and which can anticipate our interests based on previous experience (along with private and personal data and profile information), will open up new possibilities for human-like interaction again. Agents will not only be able to personalize their conversation, they will be able to anticipate and act on this personal information. At this point we will be able to speak of quasi-relationships. Attributes of relationships belong to psychology, and will likely include degrees and variations on trust, friendliness, like-mindedness, kindness, humor, and the like.
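As a hedged sketch of the mechanics, the example below folds a remembered profile into a system prompt before a conversation starts. The UserProfile shape and buildSystemPrompt helper are invented for illustration; real personalization would also need consent, storage, and retrieval layers that are out of scope here.

```typescript
// Illustrative only: a remembered profile, distilled from prior sessions,
// shaping the system prompt so the agent can anticipate the user's interests.
interface UserProfile {
  name: string;
  interests: string[];
  recentTopics: string[]; // summarized from previous conversations
}

function buildSystemPrompt(profile: UserProfile): string {
  return [
    `You are a conversational assistant speaking with ${profile.name}.`,
    `They have shown interest in: ${profile.interests.join(", ")}.`,
    `Recent conversations touched on: ${profile.recentTopics.join(", ")}.`,
    `Anticipate related needs, and ask before acting on personal data.`,
  ].join("\n");
}

// Example profile; in practice this would be assembled from stored history
// and explicit preferences rather than hard-coded.
const profile: UserProfile = {
  name: "Jordan",
  interests: ["interaction design", "generative AI"],
  recentTopics: ["partner modelling", "benchmarking affect"],
};

console.log(buildSystemPrompt(profile));
```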

Where prompting for human-like interaction with LLMs currently draws on different ways of reasoning, prompting for social compatibility might involve far more degrees of social competence. We will learn that the interaction itself can be used to achieve results. Interaction design will emphasize front-of-face characteristics. Holding a user in an interaction, being flexible with prompt responses, using personal information to anticipate specific individual interests and requirements, and customizing the experience will become ways of differentiating branded model offerings.

Partner Modelling

Here are a few excerpts from the partner modelling research paper. The attributes of communicative ability are presented in a bulleted list.

The Partner Modelling Questionnaire: A validated self-report measure of perceptions toward machines as dialogue partners

“Partner models are said to reflect perceptions of a dialogue partner’s communicative ability, and have been shown to influence language production in both human-human dialogue (HHD) and human-machine dialogue (HMD) [13, 28], with people adapting their speech and language behaviours based on their partner model for both human and machine dialogue partners [13, 30].”

Partner modelling comes from the characteristics of communication between people. It can be debated whether “people form a mental representation of their dialogue partner as a communicative and social entity.” The part up for debate would be whether we form mental representations. But regardless of whether we “model” our behaviors on conceptual models of one another, the characteristics identified in this research were found useful in evaluating machine natural language interfaces:

“These models are thought to inform people’s language production in dialogue, leading them to design utterances around their audience’s perceived capabilities and social background with the aim of increasing the chance of communicative success.”

“The basic tenet of partner modelling is that people form a mental representation of their dialogue partner as a communicative and social entity [13, 30]. Originating in psycholinguistics, the concept proposes that this mental representation informs what people say to a given interlocutor, how they say it, and the types of tasks someone might entrust their partner to carry out [13, 15]. Hence, partner models might also be understood as a heuristic account of a partner’s communicative ability and social relevance that guides a speaker toward interaction and language behaviours that are appropriate for a given interlocutor.”

The research proposes to inform the interface design of chat agents:

“Speech interfaces may include a broad range of technologies, from Voice Assistants such as Amazon’s Alexa, Apple’s Siri, Google Assistant or Microsoft’s Cortana… Essentially, when referring to speech interfaces we mean any computer system you have interacted with using speech. You may have accessed these using, among other things, a smartphone, smart speaker, desktop or laptop computer, and/or in-car.”

More specifically, the model comprises:

“an interlocutor’s cognitive representation of beliefs about their dialogue partner’s communicative ability. These perceptions are multidimensional and include judgements about cognitive, empathetic and/or functional capabilities of a dialogue partner. Initially informed by previous experience, assumptions and stereotypes, partner models are dynamically updated based on a dialogue partner’s behaviour and/or events during dialogue.”

“In this sense it is similar to accounts of mental models in cognitive psychology [e.g., 45, 46] and Norman’s explanation of mental models in human-computer interaction (HCI) [65]. Indeed, a partner model can be broadly understood as a mental model of the dialogue partner.”

Model factors:

  • Competent/Incompetent
  • Dependable/Unreliable
  • Capable/Incapable
  • Consistent/Inconsistent
  • Reliable/Uncertain
  • Clear/Ambiguous
  • Direct/Meandering
  • Expert/Amateur
  • Efficient/Inefficient
  • Honest/Misleading
  • Precise/Vague
  • Cooperative/Uncooperative
  • Human-like/Machine-like
  • Life-like/Tool-like
  • Warm/Cold
  • Empathetic/Apathetic
  • Personal/Generic
  • Authentic/Fake
  • Social/Transactional
  • Flexible/Inflexible
  • Interactive/Stop-Start
  • Interpretive/Literal
  • Spontaneous/Predetermined

The communicative competence and dependability factor consisted of 12 items that accounted for 49 % of the variance within the model and demonstrated strong internal reliability (𝛼=0.88). The strongest loading items were competent/incompetent, dependable/unreliable and capable/incapable. Collectively, these items reflect perceptions towards whether the machine is a dependable and competent dialogue partner.

The human-likeness in communication factor contained 7 items that accounted for 32 % of the variance within the model, and also demonstrated strong internal reliability (𝛼=0.8). The strongest loading items were human-like/machine-like, life-like/tool-like and warm/cold, which were accompanied by other items that reflect on how alike or unlike humans a system is seen to be in the way it communicates. This supports previous intuitions that humans act as an archetype for people when reasoning about or evaluating speech interface systems [34].

Finally, the communicative flexibility factor contained four items that accounted for 19 % of the variance within the model and also had good internal reliability (𝛼=0.72). Items within factor 3 included flexible/inflexible, interactive/stop-start, interpretive/literal and spontaneous/predetermined, coalescing around the concept of how flexible or predetermined dialogue agent capabilities are perceived to be.
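For anyone who wants to play with these criteria against an LLM, here is a rough scoring sketch that groups ratings into the three factors as listed above. The 1-to-7 semantic-differential scale, the scoring direction, and the per-item groupings are assumptions made for illustration; the validated questionnaire should be consulted for the actual instrument and scoring rules.

```typescript
// Sketch: average semantic-differential ratings (assumed 1 = left anchor,
// 7 = right anchor) into the three factors as grouped in the excerpt.
const factors: Record<string, string[]> = {
  competenceAndDependability: [
    "Competent/Incompetent", "Dependable/Unreliable", "Capable/Incapable",
    "Consistent/Inconsistent", "Reliable/Uncertain", "Clear/Ambiguous",
    "Direct/Meandering", "Expert/Amateur", "Efficient/Inefficient",
    "Honest/Misleading", "Precise/Vague", "Cooperative/Uncooperative",
  ],
  humanLikeness: [
    "Human-like/Machine-like", "Life-like/Tool-like", "Warm/Cold",
    "Empathetic/Apathetic", "Personal/Generic", "Authentic/Fake",
    "Social/Transactional",
  ],
  communicativeFlexibility: [
    "Flexible/Inflexible", "Interactive/Stop-Start",
    "Interpretive/Literal", "Spontaneous/Predetermined",
  ],
};

// Average the available ratings per factor; unrated items are skipped.
function scoreFactors(ratings: Record<string, number>): Record<string, number> {
  const scores: Record<string, number> = {};
  for (const [factor, items] of Object.entries(factors)) {
    const values = items
      .map((item) => ratings[item])
      .filter((v): v is number => typeof v === "number");
    scores[factor] =
      values.reduce((sum, v) => sum + v, 0) / Math.max(values.length, 1);
  }
  return scores;
}

// Example: a few ratings of a hypothetical agent.
console.log(
  scoreFactors({ "Competent/Incompetent": 2, "Warm/Cold": 5, "Flexible/Inflexible": 3 })
);
```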

Conclusion

Large language models are quickly becoming human-like in their interactions with us. They are faster and their comprehension is more life-like. Their stylistic qualities are more accurate. They are becoming more “intelligent” and capable. And soon will begin to use personal information to individualize their interactions.

The more life-like they become, the more interaction design of generative AI will draw on aspects of real-time human face-to-face interaction. The partner modelling research summarized here offers a good starting place for some of this interaction design theory. But we will want to augment it with characteristics that are more psychological and social in nature. And we will want to distinguish between interaction design guidelines for conversational experiences vs. AI as functional intelligence. I’ve suggested this might be characterized with a depth of interface metaphor, where front-of-face attributes govern conversation, and back-of-face attributes govern intelligence. The metaphor itself is unimportant. But I do think that the more lifelike we want these agents to be, the more our design theories will need to incorporate elements of social interaction.



