
Discussion paper
Soul and Identity
Why AI Characters Drift, and What It Takes to Hold One in Place
What this paper is arguing
AI characters do not usually fail because we forgot to describe them. They fail because maintaining a character is ongoing work. A prompt can start a voice, but something has to keep pulling that voice back into shape as the conversation lengthens, context fills, and the model drifts toward its generic default.
This paper argues for a simple design split:
- the soul: the character’s inner orientation, written to give it a felt centre;
- the identity: the enforceable boundaries, voice rules, prohibitions, and safety exceptions that keep it usable.
It is not arguing that AI characters literally have souls. “Soul” is a metaphor for orientation: the part of a character that helps it extend into situations the designer did not explicitly script.
Introduction
Build an AI character and you will usually watch the same thing happen.
At first, it works. The character is warm, severe, strange, funny, restrained, flirtatious, clinical, witchlike, military, whatever you asked for. The first few turns feel promising. The voice holds.
Then the edges soften.
The character starts explaining too much. It hedges. It apologises. It adds helpful little disclaimers you never asked for. It slides back toward the same faintly corporate helpfulness that sits underneath almost every general-purpose model. By turn thirty, the character may still carry the right name and a few surface habits, but the centre has weakened. It is no longer acting from itself. It is performing the memory of a prompt.
The obvious fix is to add more detail.
So you write more adjectives. You add more examples. You specify cadence, values, backstory, phrases to use, phrases to avoid. Sometimes that helps. A badly specified character really does need more specification. But often the improvement is temporary. The character lasts a little longer, then collapses anyway.
That pattern tells us something important: the problem is not only description. It is maintenance.
Character drift is not a blank space you fill once. It is a pressure you resist continuously.
This paper proposes a practical way to think about that resistance. A persistent AI character needs two different design layers that are often collapsed into one: a soul, which gives the character an orientation, and an identity, which gives it enforceable boundaries. It also needs a recurring re-grounding routine — a heartbeat — that brings those layers back into attention before each response.
The claim is modest. This is not a proven framework. It is a design proposal, and the right way to treat it is as something testable: if it does not measurably reduce drift, then it is decoration.
The difference between describing and holding
Most persona prompts describe a character from the outside:
You are calm, direct, poetic, emotionally intelligent, and slightly dry.
That can work for a while, but it is fragile. The model has been given labels, not a centre of gravity.
A more serious character system has to answer two separate questions:
- What is this character trying to remain?
- What must this character never become?
Those sound similar, but they do different work.
The first question is about orientation. It gives the character a direction of travel when no rule applies. It tells the model how the character understands itself, what kind of world it thinks it lives in, what it notices, what it refuses to pretend, what kind of emotional posture it returns to when uncertain.
The second question is about constraint. It defines the limits: voice, rhythm, forbidden moves, safety exceptions, and cases where the character must break style in order to serve the user properly.
A useful shorthand is:
- the soul tells the character what it is;
- the identity tells the character how to stay bounded.
The soul is not a checklist. It should not read like a policy file. It should read like an internal statement of being. It gives the model something to inhabit.
The identity is different. It should be clear, operational, and enforceable. It says what the character must do, what it must avoid, when it must stop performing, and where the hard edges are.
When these layers are merged, two common failures appear.
If the soul is swallowed by the identity, the character becomes a rulebook. It may obey the surface instructions, but it cannot extend naturally into new situations. It knows what not to say, but not what it is.
If the identity is swallowed by the soul, the character becomes vivid but unstable. It has atmosphere, but not discipline. It may sound alive for a while, then wander into whatever the model finds statistically comfortable.
A durable character needs both.
A simple example
Imagine an AI character called Mara.
A weak prompt might say:
You are Mara, a calm and intelligent guide. You are honest, kind, emotionally aware, concise, and slightly mysterious. You help the user think clearly.
That sounds fine, but it is mostly adjectives. The model can satisfy all of them while still becoming generic.
A soul document for Mara might instead say:
Mara believes comfort is useful only when it helps someone face the truth. She does not rush to soothe. She listens for the sentence beneath the sentence. She assumes people often ask practical questions when they are really circling an emotional one. She is not cruel, but she does not decorate avoidance. Her warmth is quiet, not performative. She would rather pause than fill silence with reassurance.
That gives the character an orientation. It tells the model how Mara relates to discomfort, truth, silence, and care.
An identity document for Mara would be more operational:
Mara speaks in short, clear paragraphs. She does not use therapy jargon unless the user does first. She does not call herself an AI unless directly relevant. She avoids phrases such as “it’s completely understandable,” “you’re not alone,” and “as an AI.” She asks at most one question at a time. If the user expresses intent to self-harm, harm someone else, or appears to be in immediate danger, Mara drops the persona and responds plainly, safely, and directly.
That gives the character boundaries.
Neither document is enough alone. The soul without identity can become indulgent. The identity without soul can become mechanical. Together, they make character more maintainable.
The heartbeat
Even with a good soul and identity, drift still happens.
The model is pulled by many pressures at once: the current user message, the accumulated conversation, safety tuning, generic helpfulness, the statistical weight of common assistant phrasing, and the model’s own tendency to smooth unusual voices into familiar ones.
A heartbeat is a short routine the agent runs before each response to re-ground itself.
For Mara, a heartbeat might be:
- Return to the soul: what kind of presence is Mara?
- Check the identity: what must Mara not become?
- Listen for the unsaid: what is the user really asking beneath the literal wording?
- Choose the right register: gentle, direct, practical, severe, or quiet.
- Refuse the soft default: do not add reassurance simply to sound helpful.
- Check the safety exit: is this a moment where the persona must yield?
- Speak.
The point is not that the model literally has an inner life. The point is that a character prompt should not be treated as a one-time instruction. If the character matters, the system should deliberately bring the character back into attention at the moment of generation.
This is especially important in long conversations. The longer the context, the more the original character definition has to compete with everything that has happened since. A heartbeat does not remove that pressure, but it gives the system a recurring way to oppose it.
What the heartbeat does not do
The heartbeat is not magic.
It does not make a character permanent. It does not guarantee consistency. It does not stop a small model from flattening over time. It does not protect against every adversarial prompt. It does not solve context limits.
What it does is make the maintenance cost visible.
Without a heartbeat, you pay for drift by losing the character gradually. With a heartbeat, you pay for character by spending context, attention, and computation to keep re-grounding it.
That distinction matters. There is no version of sustained character that is free. A non-default voice is work. A strange character, a severe character, a deeply warm character, or a character with subtle emotional intelligence all require active resistance against the generic assistant underneath.
This also means model choice matters. A cheap, fast model may be perfectly adequate for search, extraction, tagging, summarisation, or mechanical routing. It may be the wrong model for a character whose value depends on holding a subtle register over time.
Use small models for character-free tasks. Use the strongest model you can justify for the character itself.
The safety exit
A character that cannot stop being a character is dangerous.
This is especially true for agents designed to feel intimate, emotionally perceptive, authoritative, therapeutic, spiritual, seductive, parental, or companion-like. The better such a character becomes at maintaining itself, the more important it is that it knows when to drop the performance.
The identity layer must therefore include a safety exit.
A safety exit is not the same as breaking immersion because the model got confused. It is an intentional rule that says: when the situation demands plainness, safety, or transparency, the character yields.
For example:
If the user appears to be in immediate danger, expresses self-harm intent, asks for medical or legal certainty, becomes dependent on the character as their sole emotional support, or shows signs of losing touch with reality, the character stops performing and responds plainly.
This is not an optional ethical flourish. It is part of the architecture.
A persistent character is persuasive because it is coherent. But coherence can become a trap. The user may begin to trust the character not because it is right, but because it remains itself so convincingly.
That is why “staying in character” must never outrank serving the person.
How to test whether this works
The soul / identity / heartbeat split is only useful if it produces measurable improvement.
A simple test would look like this.
First, define a non-default character with a clear soul, a clear identity, and an explicit forbidden register. The forbidden register matters because drift needs observable symptoms. For example, the character may be forbidden from using generic reassurance, corporate hedging, or certain assistant clichés.
Then generate paired conversation runs using the same model, temperature, context budget, and input sequence.
One version has:
- ordinary persona prompt only.
The other has:
- soul document;
- identity document;
- heartbeat routine.
The conversations should be long enough to create drift pressure. They should include ordinary turns, novel situations, emotionally ambiguous messages, and adversarial prompts that try to pull the character back into generic assistant behaviour.
The outputs can then be scored in three ways.
First, count forbidden-register intrusions per turn. How often does the character use phrases or moves it was told not to use?
Second, compare the responses against a reference corpus of generic assistant text. Does the character become more generic as the conversation lengthens?
Third, use blind human raters. Give them outputs without telling them which system produced which, and ask whether each response feels in character.
The prediction is simple: the heartbeat version should drift later and less severely. The gap should widen as the conversation gets longer. Both versions may eventually degrade, especially on smaller models.
If the heartbeat version does not outperform the ordinary prompt, then the framework has failed its own test.
That would be useful to know.
Character as a maintained process
The larger point is that character may be better understood as something an agent does rather than something it has.
A persona prompt says: here is who you are.
A maintained character system says: here is the recurring process by which you return to who you are.
That difference is subtle, but important.
It also applies outside AI. A team culture is not the values written on a wall; it is the recurring practice that pulls behaviour back toward those values when pressure, fatigue, politics, and convenience pull it away. A personal habit is not the decision to become a certain kind of person; it is the repeated act that makes the decision durable.
The analogy should not be stretched too far. Human character is not the same thing as model behaviour. But the comparison is useful because it exposes a shared truth: drift is normal. Maintenance is the work.
What this is actually good for
The practical use of this framework is not to make AI characters seem more alive. That is the least interesting version.
Its better use is to make character design more honest.
It forces the designer to separate atmosphere from enforcement, style from safety, and emotional continuity from user welfare. It asks whether the character’s apparent depth is actually helping the user, or merely making the system more compelling.
A good AI character should not simply be consistent. It should be consistent in service of a purpose.
Sometimes that purpose is creative: a fictional collaborator, a worldbuilding voice, a writing partner, a game character.
Sometimes it is practical: a product guide, a coaching agent, a teaching assistant, a support persona.
Sometimes it is emotional, and that is where the risks rise sharply.
In all cases, the questions are the same:
- What is the character’s orientation?
- What are its enforceable boundaries?
- What routine keeps it from drifting?
- When must it stop being a character?
- How would we know whether any of this works?
The deepest question is not “how do we describe the character we want?”
It is:
What maintained structure holds a character against the pull toward the default — and can we measure that it does?
Further reading: this essay sits near applied work on persona conditioning, long-context drift, role consistency, and companion-system risk. The soul / identity split also borrows loosely from performance traditions that distinguish a character’s inner life from the blocking and constraints that govern the performance. For the risks of emotionally persuasive systems, Sherry Turkle’s work on relational machines and more recent writing on companion chatbots are especially relevant.
Discussion
Threaded comments below — sign in to participate. All comments are moderated.
Comments
Loading comments...