Learn

Synthetic Personality

Before an AI agent can be useful to anyone, it has to be something — a coherent identity that holds up across users, sessions, and adversarial pressure. This is the research track that defines what that means and how to keep it stable.

What it is

Synthetic personality is what stays the same about an AI agent no matter who it's talking to. The same beliefs, the same tone, the same things it will and will not do, the same way it escalates when it's out of its depth. If you imagine an agent as a coworker, synthetic personality is the part that makes that coworker recognizable on Tuesday morning and Friday afternoon, with the new hire and with the demanding customer. Without it, agents drift — they get talked into different personas by different users, they refuse things on Monday they happily do on Wednesday, they sound like a different product after a model update.

Why it matters

An agent without a stable identity is impossible to trust and impossible to evaluate. You cannot tell a user how the agent will behave, because the agent will behave differently for them than it did for the last user. You cannot fix a regression, because the regression is a moving target. You cannot pass an audit, because the auditor's prompt will produce a different response than the production prompt. Synthetic personality turns 'how the agent behaves' from a probabilistic mood into a contract.

How it actually works

We encode the personality as a system charter — a versioned artifact in source control that declares the constraints, posture, communication register, refusal patterns, and escalation defaults. The charter is compiled into the prompt layer and tested with a behavioral test suite that runs on every model and prompt change. The suite includes invariant probes (same input, expected stable response shape), adversarial probes (attempts to flip register or refusal stance), and consistency probes (same question asked in different framings). Drift is detected when a candidate version's outputs deviate beyond a calibrated threshold from the baseline.

What it works with

Sits below Agent Memory and Closed-Loop Knowledge. Memory adapts the relationship to each user; knowledge improves capability across users. Neither should mutate the agent's foundation. Sits beside Skill Distillation: the skill documents an educated agent writes for an apprentice should never override the apprentice's personality charter — they describe what to do, not who to be. Reads from Workflow Evals: the behavioral test suite plugs into the same eval harness that gates other changes.

Open questions we are studying

How small can a personality charter be and still hold under adversarial pressure? How do we measure persona consistency across modalities (text, voice) and across model swaps? When the underlying model deprecates and a new one takes its place, how much of the personality is recoverable through the charter alone versus needing re-calibration? Where does a user-specific Agent Memory legitimately reshape behavior versus illegitimately drift the personality?

Prior art and adjacent work

Builds on Constitutional AI (Anthropic, 2022) and the broader RLAIF line. Engages with Sleeper Agents (Anthropic, 2024) on the question of behavior stability under fine-tuning. Adjacent to persona consistency benchmarks, character-card frameworks, and the alignment literature on stable refusal patterns.