Prompts should be designed — not engineered
Rallying design’s relevance in the AI era.
“Any sufficiently advanced technology is indistinguishable from magic”…
Arthur C. Clarke’s third law seems to be cropping up everywhere these days, serving as a testament to the awe-inspiring power of AI. It perfectly encapsulates a feeling many of us know all too well…waking up to a new AI advancement and being gobsmacked about what it can do.
“Now it can create videos!” I find myself exclaiming to my partner over morning coffee.
However, I’ve interpreted Clarke’s third law as a reminder that technology isn’t actually magic, even if it appears to be.
It is a crucial reality check: if we become too enamored with this “magical” technology, and we don’t question its power, we will inadvertently minimize our own.
For example, there’s a weird misconception in the product world that AI is so powerful it doesn’t need designing — that simply connecting a foundational model’s API to any application will instantly unlock a value geyser.
On the contrary, as powerful as these models are, they will be useless without design. It seems like, as an industry, we haven’t shipped enough AI features — or seen enough AI failures — to fully realize this fact yet.
Nothing has convinced me more of design’s critical role in the AI era than when I’m engineering a model’s prompts, as I find myself relying on my core design skill set to do the job well.
Wire prompt, cloth prompt
Prompt engineering is how you program a general-purpose model, like ChatGPT or Gemini, to complete a specific task. This involves providing the instructions, or “chaining” instructions together for how to complete this job. However, there is far more art than science in this process.
Remember Harlow’s monkey experiments from Psychology 101? Baby monkeys were given the choice between two “mothers”: one made of wire that provided food and another made of cloth that offered comfort. Unsurprisingly, the baby monkeys showed a strong preference for the soft, comforting cloth mother.
This is how I see the difference between “engineering” and “designing” a prompt.
A wire prompt is engineered to complete a job — while a cloth prompt intuitively understands a user’s needs and is designed to provide a fluid and supportive experience for the user.
Anyone can access an API and prompt a foundational model to execute a task. Sophistication, in prompting, comes from deeply understanding the user — their attitudes and behaviors towards this experience — and integrating that knowledge into the prompt.
Cloth “mother” (left), Wire “mother” (right)
This is even more important than with traditional software interactions because of the open-ended nature of conversational experiences. As noted by designer, Emily Campbell, “the more autonomy that users have to direct computers to their personal whim, the better they will expect computers to understand them and anticipate their needs.”
Logan Kilpatrick, head of developer relations for OpenAI, boils it down to “context.” The model needs context to do its perform a specific task well. This makes prompting an “inherently human” task because it requires that we understand the nuances of what people are trying to achieve.
At risk of stating the obvious: this. is. what. we. do.
Without design, the next wave of AI applications will be brittle little things that waste an organization’s precious AI resources.
Prompt design is an iterative process
There’s a fundamental difference between traditional software engineering and prompt engineering that is worth highlighting.
In traditional software engineering, we can pinpoint the lines of code necessary to produce the desired outcome. For example, we know what kind of function to use in order to validate a user’s credentials against a database.
In prompt engineering, we can’t predict how the model will respond to the prompts we craft. Instead of direct programming, we’re steering the model towards the outcomes we aim for.
This is why an iterative prompt design process is required. In this process, we continuously design and refine, based on user and model insights. (I’ve written recently about the “two-sided user test” where a researcher studies the user and the model’s behaviors simultaneously.)
The discipline of prompt engineering is still in its infancy, with its best practices yet to be discovered. For instance, even OpenAPI only defines three categories for prompt in its playground: system, user, and assistant.
Personally, I use six categories to structure the prompt design process:
- Flow
- Role
- Mission
- Guidelines
- Examples
- Output parameters
I’m working on a prompt design guide that outlines each of these. You can subscribe for access when it’s complete. In this article, my goal is to bring the iterative nature of the prompt design process to life.
The power of user insights in prompting
I’m not implying that engineering is unnecessary. For example, consider a ‘clothing copilot’ designed to recommend outfits to a fashion company’s customers. Creating an architecture to support this functionality demands significant engineering effort; however, the prompts within this architecture require meticulous design.
Flow
A designer’s first job is to map the steps that go into achieving the end goal. Conversational AI interactions are far more open-ended than standard software workflows, yet the copilot still needs to lead users through a logical sequence. Without this, the model and your users can become lost. Or as highlighted by designer, Paz Perez, users “don’t know what to do or ask for when interacting with an AI-powered chat.”
The iterative prompt design process…
Let’s imagine that the designer initially outlined a flow consisting of three steps: 1) identify the occasion for the outfit 2) discern the user’s style and 3) present options to the user. However, during testing, the designer notices that users frequently provide extra context about the inspiration behind their outfit (e.g. “I always wear neutral colors, but I really want to start wearing more funky patterns”). Because of this, the designer recommends adding a fourth prompt to the flow, wherein the copilot probes into the user’s goal for the outfit.
Examples
The most powerful method to guide a non-deterministic model towards desired outcomes is by providing examples of what you want the model to do. This approach, known as few-shot prompting, allows the model to mimic the examples provided.
The iterative prompt design process…
To help the model identify the occasion for the outfit, the designer creates a single example within the prompt, illustrating how the model should respond to a user’s input.
- User input “I need a dress for prom.”
- Model response: occasion, prom
During testing, however, the designer notices a pattern: sometimes users mention a specific event (e.g. “my birthday”) and other times they mention a motive for buying (e.g. “I broke up with my boyfriend”). This insight leads the designer to develop two categories of examples for the prompt: one for event-based occasions and another for motivation-based reasons.
Guidelines
Examples are a powerful tool, but occasionally, the model requires extra direction. This is where guidelines become useful. A guideline helps further refine the model’s behavior, more explicitly than an example alone.
The iterative prompt design process…
In testing, the designer notices that the model is paraphrasing a user’s response and losing important detail.
For example:
- User input: “I need a dress for prom.”
- Model response: occasion, the user needs a dress for a dance
The designer adds a guideline to the prompt that explicitly specifies: “use the exact wording that the user provides; try not to paraphrase their response.” This guideline can provide additional context to shape the outcome.
It’s Aries Season
I’m hoping these simple examples help highlight how crucial designers and the iterative design process are in creating AI experiences that are usable and valuable.
Let’s face it: no one’s going to sit around waiting for designers to become valuable in the AI era, but companies need design more than ever.
When imposter syndrome strikes, just remember, nobody has all the answers. We’re all trying to figure it out as we go, together.
This article was originally published in Empathy & AI, follow for more human-centered AI content or reach out on Linkedin.
References
- Esther Inglis-Arkell, Technology isn’t Magic: Why Clarke’s Third Law always bugged me https://gizmodo.com/technology-isnt-magic-why-clarkes-third-law-always-bug-479194151
- Prompting Engineering Guide, Prompt Chaining https://www.promptingguide.ai/introduction/basics
- Emily Campbell, My emerging heuristics for assessing AI Design https://open.substack.com/pub/shapeofai/p/my-emerging-heuristics-for-assessing?r=2bjdh0&utm_campaign=post&utm_medium=email
- Logan Kilpatrick, Inside Open API Lenny’s Podcast https://www.lennyspodcast.com/inside-openai-logan-kilpatrick-head-of-developer-relations/
- Paz Perez, Designers: AI needs context https://uxdesign.cc/ai-needs-context-74ed452696e2
- Open AI prompt guide https://platform.openai.com/docs/guides/fine-tuning/use-a-fine-tuned-model
Prompts should be designed — not engineered was originally published in UX Collective on Medium, where people are continuing the conversation by highlighting and responding to this story.
Leave a Reply