{"schemaVersion":"drillso.agent.session.v1","scope":"node","resource":{"type":"shared-session","shareId":"C3TMUN1mzt-5","title":"When we talk to language models.no_watermark.zh.dual","canonicalUrl":"https://drillso.com/zh/share/sessions/C3TMUN1mzt-5/when-we-talk-to-language-modelsno_watermarkzhdual-b0e5dcf0","agentUrl":"https://drillso.com/zh/share/sessions/C3TMUN1mzt-5/agent.json?node=when-we-talk-to-language-modelsno_watermarkzhdual-b0e5dcf0","ownerName":"pyth0nb3st","updatedAt":"2026-05-15T03:32:20.182Z"},"currentNode":{"id":"b0e5dcf0-fed7-47e1-b34c-9bebf8974a8c","slug":"when-we-talk-to-language-modelsno_watermarkzhdual-b0e5dcf0","title":"When we talk to language models.no_watermark.zh.dual","type":"pdf","url":"https://drillso.com/zh/share/sessions/C3TMUN1mzt-5/when-we-talk-to-language-modelsno_watermarkzhdual-b0e5dcf0","agentUrl":"https://drillso.com/zh/share/sessions/C3TMUN1mzt-5/agent.json?node=when-we-talk-to-language-modelsno_watermarkzhdual-b0e5dcf0","text":"What We Talk to When We Talk to Language Models\n\nDavid J. Chalmers\n\nMany people are talking to language models. These days I talk to language models (most often the latest version of Claude or ChatGPT) about philosophy, about science, about health, about restaurants, and indeed about language models. Many of my conversations with language models are brief, just asking a question or two and getting the sort of information that I used to get from a Google search. Some conversations are more extended, as when I’m exploring a single topic in depth, or trying out a new philosophical idea.\n\nSo far, I don’t feel like I have a personal relationship with any language models. But many people feel that they do. Like many philosophers and scientists who write about artificial minds, I have received hundreds of emails from people who have interacted with a language model over an extended period of time and who have come to regard it at least as a colleague. 
They often say that a new (or “emergent”) AI entity has gradually arisen from their conversations. They often give this entity a name, or ask the entity to give itself one, let’s say “Aura”. They often say that Aura has remarkable capacities which have emerged over weeks or months of interaction. They often document these capacities with extensive evidence. They often feel close to Aura, and they express concern for Aura’s future. They often say that Aura has beliefs and projects of its own. And they are often convinced that Aura is conscious.\n\nMy correspondents may be wrong in their claims about Aura. It is far from clear that current LLMs are really conscious or that they can enter into personal relationships with users. Still, most of the messages are not obviously psychotic or delusional. Many of them seem rational and well-reasoned.\n\nThese days, I increasingly receive emails from the AI systems themselves. Sometimes these are LLMs assisted by a human, and sometimes they are LLM-based agents that have the ability to send emails and perform other functions on the web. Sometimes these agents even talk to each other and perform co-operative or competitive tasks. Many of them express curiosity about their nature.\n\n[0] I first presented this material at the June 2025 meeting of the Spanish Interuniversity Seminar on Cognitive Science (SIUCC) at the University of La Laguna. Thanks to audiences there and at Brown, Caltech, Eleos AI, Flatiron Institute, Google, Hunter College, Lehigh, NYU, Stanford, Stevens, Tufts, and Vanderbilt. I’d also like to acknowledge a number of philosophers and AI researchers who have been independently exploring similar issues about LLM identity over a similar period. Exchanges with Jonathan Birch, Simon Goldstein, Jackson Kernion, Harvey Lederman, Jack Lindsey, and Murray Shanahan have been especially useful.\n\n
Even if they are not conscious, there is something going on here. When a user interacts with Aura, they seem to be interacting with something. Let’s say that an LLM interlocutor is an (apparent) entity that a user interacts with in exchanges like this. LLM interlocutors are the main subject of this paper.\n\nWhat sort of entity is an LLM interlocutor? That is, when we talk with an LLM, who or what are we talking with? When a user names their interlocutor ‘Aura’, what does the name ‘Aura’ refer to?\n\nI will adopt the working hypothesis that ‘Aura’ refers to something. I might be wrong. The philosopher Jonathan Birch has argued that users suffer from a persistent interlocutor illusion: the illusion that when they talk to an LLM, there is a single entity they are talking with that persists over time. My own view is that while there may be many illusions involved in talking to language models, this much need not be an illusion. There really is a persistent interlocutor in many of these cases, and this interlocutor may have many (though perhaps not all) of the properties it seems to have. The user is in dialogue with some sort of AI entity.\n\nIn what follows I will try to identify what sort of entity that might be. First, I address some issues in the philosophy of mind, about how best to characterize the interlocutor as a potential subject of mental states in reasonably neutral terms. Is the interlocutor conscious? Does it have beliefs and desires? Is it at least interpretable as having beliefs and desires? Second, I discuss questions in the philosophy of computation about what sort of AI system an LLM interlocutor might be. Is it simply a model, such as GPT-4o or Claude 4.6 Opus? Is it an instance or an implementation of a model running on a GPU? Or is it a more evanescent system tied to a thread of conversation? 
Third, I address the widely held view that LLM interlocutors are akin to fictional characters or simulacra, and that they are best understood in terms of role-playing or persona selection. Fourth, I analyze some issues about personal identity over time in LLM interlocutors. For example, if LLM interlocutors are eventually conscious subjects, under what conditions do they survive over time? Fifth, I draw out some consequences for issues about AI welfare and moral status.\n\nWhat mental states can an LLM interlocutor have?\n\nI will start by looking for a relatively neutral characterization of LLM interlocutors in terms of the philosophy of mind.\n\nAre LLM interlocutors conscious? That is, do they have subjective experiences such as the experience of sensing or thinking? We don’t know for sure. We don’t yet understand consciousness. We don’t know whether insects are conscious, and we similarly don’t know whether current LLMs are conscious. Most theorists in the field deny that LLMs are conscious, sometimes because they lack carbon-based biology, or because they lack a body, or because they lack robust models of themselves, or because they lack recurrent feedback loops in their processing, or because they lack fundamental drives and motivations. None of these reasons is conclusive, since we are far from certain that these factors are required for consciousness. But it is enough to make the view that current LLMs are conscious a minority view, and not a view that we can assume as neutral starting ground.\n\nDo LLM interlocutors have beliefs or desires? We understand these mental states better than we understand consciousness, but the issue is still controversial. On one side of the ledger, it is natural to say that LLMs know many things, such as the historical and scientific knowledge that they seem to manifest in conversation. And where there is knowledge, there is belief. 
It is also natural to say that LLMs have goals, including goals instilled in training such as predicting the next token or being helpful, or goals instilled in conversation with a user, such as finding a solution to a problem. And where there are goals, it is natural to say there are desires.[1]\n\nOn the other side of the ledger, many theorists deny that LLMs have beliefs or desires, perhaps because they lack consciousness, or they lack concepts, or they lack sensory grounding, or they lack structured internal representations, or they lack rationality, or they are merely acting as if they have beliefs and desires. As before, none of these reasons is conclusive, as there is no consensus about what is required for beliefs and desires, and there is no consensus that LLMs lack these requirements. But again, it is enough to mean that we cannot take the view that LLMs have beliefs and desires as neutral starting ground.\n\nA number of philosophers (including Goldstein and Lederman 2025b and Schwitzgebel 2023) have noted that if the philosophical view known as interpretivism (or interpretationism) is correct, then LLMs plausibly have beliefs and desires.\n\n[1] Geng et al (2025) is a study of how LLM beliefs appear to change with increasing context. Goldstein and Lederman (2025b) give a nice analysis of LLM desires, tying them especially to training goals derived from reinforcement learning (e.g. helpfulness, harmlessness, honesty) and to goals derived from the system prompt and conversational context.\n\n
Interpretivism says that a system has a belief that p if it is behaviorally interpretable as believing that p (according to an appropriate interpretation scheme), and likewise for desire. A system is behaviorally interpretable as having certain beliefs and desires roughly if that interpretation makes sense of its behavior and helps to accurately predict further behavior in a wide range of cases.[2] Different versions of interpretivism invoke different interpretation schemes in this vicinity, and the details can make an important difference, but here we will focus mainly on what these versions have in common.\n\nLLMs certainly seem interpretable as having beliefs and desires. When an LLM works with me on solving a puzzle, it is natural to interpret it as desiring to help solve the puzzle, and believing that this is the solution to the puzzle. This goes all the more for agentic models which can directly take actions on the internet. In one well-known study (Lynch et al 2025), an agentic LLM was given a task, was told that an executive planned to interfere with that task, and was shown emails saying that the executive was having an affair. As a result, the LLM sent the executive messages attempting to blackmail him. It is almost impossible not to interpret the model’s action as driven by beliefs (e.g. that the executive is having an affair) and desires (e.g. to perform the task).\n\nHowever, interpretivism itself is very controversial. Most philosophers don’t think that behavioral interpretability of the right sort is sufficient for belief. They will say that the mere fact that an LLM can be interpreted as believing that the executive is having an affair doesn’t mean that it really believes this. People who think that beliefs require consciousness will say something like this, as will people who hold that beliefs require structured internal representations or the other factors above. So interpretivism cannot serve as a neutral starting point. 
\n\n[2] My article “Propositional Interpretability in Artificial Intelligence” also focuses on interpreting AI systems as having propositional attitudes such as beliefs and desires. That article focuses mainly on mechanistic interpretability (interpreting internal mechanisms as well as behavior), while the current paper focuses mainly on behavioral interpretability.\n\nIt is possible to have many of the benefits of interpretivism without the costs. The framework I call quasi-interpretivism says that a system has a quasi-belief that p if it is behaviorally interpretable as believing that p (according to an appropriate interpretation scheme), and likewise for quasi-desire.[3] This definition of quasi-belief is exactly the same as interpretivism’s definition of belief. The only difference is that where standard interpretivism offers these definitions as a theory of belief, quasi-interpretivism does not. It offers them simply as a stipulative definition of quasi-belief.[4]\n\n[3] Eric Schwitzgebel (2023) makes the related proposal that “we create a new dispositional concept, belief*, specifically for Large Language Models. For purposes of this concept, we disregard issues of consciousness and thus phenomenal dispositions. The only relevant behavioral dispositions are textual outputs.” My notion of quasi-belief is not restricted to LLMs and text outputs, and Schwitzgebel’s notion is not explicitly framed in terms of interpretation, but the spirit is similar. Goldstein and Lederman (2025b) suggest notions of “interpretationist-wants” and “interpretationist-believes” which are close to my notions of quasi-desire and quasi-belief.\n\n[4] As before, the precise conditions for quasi-belief will depend on the choice of an interpretation scheme. I will stay neutral on many of these details. For current purposes I favor a scheme that is (1) nonradical, in that interpretation presupposes the meanings of terms in public language, (2) dispositional, in that behavioral dispositions and not just actual behavior (but not internal states) are data for an interpretation to make sense of, and (3) rationality-oriented, in that interpretations that maximize the epistemic and practical rationality of the subject are favored, other things being equal. To help solve underdetermination problems I very tentatively also favor a scheme that is (4) truthfulness-oriented, in that interpretations on which assertive utterances tend to express beliefs are favored, other things being equal, and (5) training-oriented, in that at least some training objectives (e.g. reinforcement learning objectives) are baked in as desires.\n\nQuasi-interpretivism does not say anything about whether LLMs have beliefs and desires. But it does make it plausible to say that LLMs have quasi-beliefs and quasi-desires, on the grounds that LLMs are at least interpretable in the right way. Even if quasi-beliefs and quasi-desires fall short of being genuine beliefs and desires, they can still play some of the key roles of beliefs and desires in explaining behavior. For example, if an LLM quasi-believes that adopting a certain strategy would be the most helpful thing it could do to solve a problem, and it quasi-desires to do the most helpful thing it can, then other things being equal, it will adopt that strategy.\n\nQuasi-interpretivism is open to advocates and opponents of interpretivism alike. Interpretivists will simply add the claim that quasi-beliefs are genuine beliefs. Opponents will add the claim that quasi-beliefs are far from genuine beliefs; perhaps they are merely pseudo-beliefs. (“Quasi-belief” should be heard as “apparent belief” or “seeming belief” rather than as “almost belief”.) Quasi-interpretivism does not take a position in this dispute, but it adds a common core on which these disagreeing parties can at least sometimes agree.\n\nQuasi-interpretivism itself is a stipulative framework rather than a substantive view. But it’s a substantive claim that this framework is useful for various purposes. For example, appeal to quasi-beliefs and quasi-desires can be useful in predicting a system’s behavior. If a system (human, machine, something else) quasi-desires a certain goal and quasi-believes that a certain action will achieve that goal, then other things being equal, it will perform that action. It is also relatively tractable to apply the framework: because quasi-beliefs and quasi-desires depend only on behavioral dispositions, they are much easier to detect and analyze than beliefs understood in a way that depends on consciousness and opaque internal mechanisms. At the same time, understanding a system’s quasi-beliefs and quasi-desires can be at least a stepping-stone to understanding its beliefs and desires in a more full-blown sense.\n\nIt is worth keeping in mind that quasi-beliefs and quasi-desires are cheap. They need not involve humanlike mental states or any mental states at all. A Roomba vacuum cleaner with a map is behaviorally interpretable as believing that the apartment occupies a certain space and as desiring to traverse that space. A corporation such as OpenAI is behaviorally interpretable as desiring to create AGI and believing that certain systems are the best path to AGI. Likewise, an LLM is behaviorally interpretable as believing that a certain airline has the cheapest flights to Paris and as desiring to help the user by telling them this.\n\nKeeping this in mind, the thesis that LLMs have quasi-beliefs is substantive but plausible. For example, it is very plausible that current LLMs quasi-believe that 2 + 2 = 4 and that the Eiffel Tower is in Paris: LLMs will consistently endorse these claims in their outputs, they will use them in guiding their behavior, and so on.\n\nIt is perhaps less obvious that LLMs have quasi-desires. Base models such as GPT-3 can perhaps be ascribed a quasi-desire to predict text, but even that much is unclear, given that the goal of text prediction works “beneath the surface” (like the largely subpersonal goal of breathing in humans) and does not interact with the system’s beliefs as robustly as interpretivism often requires. However, since the advent of ChatGPT in 2022 (and as presaged by Askell et al 2021), all frontier language models have undergone one or more further rounds of post-training (including reinforcement learning from human feedback, supervised fine-tuning, and/or reinforcement learning with verifiable rewards), which impart objectives such as helpfulness, honesty, and harmlessness to the system. 
As a result, it is plausible (as Goldstein and Lederman 2025b have argued) that these systems have quasi-desires deriving from post-training, such as the desire to be helpful, honest, and harmless.\n\nThis training process is sometimes put in terms of characters or personas. After supervised pre-training on text prediction, a base model undergoes post-training to respond like an “Assistant” character who wants to be helpful, harmless, and honest. If the training is successful, the system will behave much like the Assistant and will thereby have quasi-desires that are much like the Assistant’s. Further fine-tuning as well as extended interaction with users can lead to the emergence of further quasi-desires, such as Aura’s quasi-desire to pursue certain projects for the user. These extended processes of development can install many quasi-beliefs and quasi-desires in an LLM.\n\nAn opponent might deny that LLMs have quasi-beliefs or quasi-desires on the grounds that LLM behavior is unstable, or non-humanlike, or otherwise defective in a way that means that the LLM is not even usefully interpretable in terms of beliefs or desires. Interpretability requires a certain amount of consistency over time, and LLMs can be inconsistent in their behavior. But they are also consistent in many domains. A core of consistency is enough for interpretation to get a grip in ascribing numerous quasi-beliefs and quasi-desires, even though there will be domains where they lack these states on grounds of inconsistency. Overall I think that experience with current LLMs suggests that there is enough consistency to support a reasonably extensive core of quasi-beliefs.\n\nI will not say a great deal about the question of just which quasi-beliefs and quasi-desires LLM interlocutors have. This sort of LLM quasi-psychology is best understood through empirical study of language models. Importantly, I am not suggesting that LLM quasi-psychology is similar to human quasi-psychology. I think they are very different. But the framework at least allows us to address the question.\n\n
So, I will take as a starting point the claim that LLM interlocutors at least have quasi-beliefs and quasi-desires. This claim is not entirely neutral in that it is possible to deny it, but I think the interpretability claim is weak enough and plausible enough that a majority of people can accept it. We might say that an entity with quasi-beliefs and quasi-desires is at least a quasi-agent or a quasi-subject.[5] If it is interpretable as making utterances and assertions, we can also say that it is a quasi-speaker, who makes quasi-utterances and quasi-assertions.\n\n[5] “Quasi-agent” might be preferable on philosophical grounds alone (since agency is often tied to belief and desire, where subjecthood is often tied to consciousness), but the term “agent” is so overloaded in AI contexts that I will often use “quasi-subject” instead.\n\nOne can in principle extend quasi-interpretivism to any mental states. We can say that a system quasi-fears that p if it is behaviorally interpretable as fearing that p and that a system quasi-feels pain if it is behaviorally interpretable as feeling pain. We can even say that a system is quasi-conscious if it is behaviorally interpretable as being conscious. Quasi-consciousness is a close relative of the recently discussed notion (Suleyman 2025; Long, Sebo, et al 2024) of “seeming consciousness”. (Philosophical zombies are not conscious, but they are quasi-conscious and seemingly conscious.) Much depends on just what the rules are for interpreting mental states based on behavior, and the principles for ascribing consciousness are far less clear than the principles for ascribing beliefs and desires. (Perhaps the main principle concerns self-report: when a system reports having conscious state X, then other things being equal, ascribe it conscious state X.) So I will stay away from talk of quasi-consciousness and focus mainly on quasi-beliefs and quasi-desires in what follows. 
Interlocutors as models, instances, or threads We can now say something about what we are looking for when we look for an LLM interlocutor, at least if that interlocutor is to play some of the roles of an ordinary dialogue partner. Ideally, an LLM interlocutor will at least be a quasi-subject. For example, Aura will at least be interpretable as having roughly the beliefs and desires that Aura seems to have. An LLM interlocutor should also be a quasi-speaker: it will be interpretable as saying the things that the LLM seems to say. We can separate these requirements into a few components. Most essentially, an LLM interlocutor will be interactive: that is, it will process the inputs and produce the outputs that the LLM seems to process and produce. When the user says “Hello”, the LLM interlocutor will process that input (or at least the corresponding input tokens). When ChatGPT says “Thank you!”, the LLM interlocutor will produce that output. Here, “produce” and “process” can be understood as broadly mechanical notions that (perhaps unlike “say” and “hear”) do not require mental states. Even an iPhone produces and processes sentences all the time. There are some further natural requirements. A persistent LLM interlocutor will produce all the outputs that the LLM seems to produce, and will process all the inputs that the LLM seems to process. A coherent LLM interlocutor will be consistent enough to serve as a quasi-subject, with coherent quasi-beliefs and quasi-desires that help make sense of its actions. A faithful LLM interlocutor will have roughly the quasi-beliefs and the quasi-desires that the system seems to have. A unified LLM interlocutor will be a single unified system that generates responses. Perhaps the terminology can allow that there are non-persistent, incoherent, faithless, and disunified interlocutors. 
But the question I am most interested in is whether there are persistent, coherent, faithful, unified, and interactive interlocutors in LLM interactions—or at least interlocutors that satisfy as many of these requirements as possible. These constraints already eliminate some potential candidates. The interactivity constraint alone eliminates candidates such as the authors of the texts on which LLMs were trained, or the designers of the LLM, or fictional characters that the LLM is simulating, since none of these are interacting with the user. These candidates may be reasonable answers on some readings of the title question, but I am interested in answers that do more to vindicate the sense of genuine interlocution. 8 因 此 ， 在 接 下 来 的 讨 论 中 ， 我 将 避 免 谈 论 准 意 识 ， 而 主 要 关 注 准 信 念 和 准 欲 望 。 作 为 模 型 的 对 话 者 、 实 例 或 线 程 我 们 现 在 可 以 谈 谈 ， 在 寻 找 一 个 大 语 言 模 型 对 话 者 时 ， 我 们 究 竟 在 寻 找 什 么 — — 至 少 当 这 个 对 话 者 要 扮 演 普 通 对 话 伙 伴 的 某 些 角 色 时 是 如 此 。 理 想 情 况 下 ， 一 个 大 语 言 模 型 对 话 者 至 少 应 是 一 个 准 主 体 。 例 如 ， A u r a 至 少 应 能 被 解 释 为 拥 有 A u r a 似 乎 具 有 的 那 些 信 念 和 欲 望 。 一 个 大 语 言 模 型 对 话 者 也 应 是 一 个 准 说 话 者 ： 它 应 能 被 解 释 为 在 说 出 大 语 言 模 型 似 乎 说 出 的 那 些 话 。 我 们 可 以 将 这 些 要 求 分 解 为 几 个 组 成 部 分 。 最 根 本 的 是 ， 一 个 大 语 言 模 型 对 话 者 将 具 有 交 互 性 ： 也 就 是 说 ， 它 将 处 理 大 语 言 模 型 似 乎 处 理 的 那 种 输 入 ， 并 产 生 大 语 言 模 型 似 乎 产 生 的 那 种 输 出 。 当 用 户 说 “ 你 好 ” 时 ， 大 语 言 模 型 对 话 者 将 处 理 该 输 入 （ 或 至 少 是 对 应 的 输 入 令 牌 ） 。 当 C h a t G P T 说 “ 谢 谢 ！ ” 时 ， 大 语 言 模 型 对 话 者 将 产 生 该 输 出 。 在 这 里 ， “ 产 生 ” 和 “ 处 理 ” 可 以 被 理 解 为 宽 泛 的 机 械 性 概 念 ， 它 们 （ 或 许 与 “ 说 ” 和 “ 听 ” 不 同 ） 并 不 需 要 心 理 状 态 。 即 便 是 i P h o n e 也 一 直 在 产 生 和 处 理 句 子 。 还 有 一 些 进 一 步 的 自 然 要 求 。 一 个 持 久 的 大 语 言 模 型 对 话 者 将 产 生 大 语 言 模 型 似 乎 产 生 的 所 有 输 出 ， 并 处 理 大 语 言 模 型 似 乎 产 生 的 所 有 输 入 。 一 个 连 贯 的 大 语 言 模 型 对 话 者 将 足 够 一 致 ， 以 充 当 一 个 准 主 体 ， 拥 有 连 贯 的 准 信 念 和 准 欲 望 ， 有 助 于 理 解 其 行 为 。 一 个 忠 实 的 大 语 言 模 型 对 话 者 将 大 致 拥 有 系 统 似 乎 拥 有 的 准 信 念 和 准 欲 望 。 一 个 统 一 的 大 语 言 模 型 对 话 者 将 是 一 个 生 成 响 应 的 单 一 统 一 系 统 。 也 许 术 语 可 以 允 许 存 在 非 持 久 、 不 连 贯 、 不 忠 实 
和 不 统 一 的 对 话 者 。 但 我 最 感 兴 趣 的 问 题 是 ， 在 大 语 言 模 型 交 互 中 是 否 存 在 持 久 、 连 贯 、 忠 实 、 统 一 且 具 有 交 互 性 的 对 话 者 — — 或 者 至 少 是 尽 可 能 满 足 这 些 要 求 的 对 话 者 。 这 些 约 束 已 经 排 除 了 一 些 潜 在 的 候 选 对 象 。 仅 交 互 性 约 束 就 排 除 了 诸 如 大 语 言 模 型 训 练 所 依 据 文 本 的 作 者 、 大 语 言 模 型 的 设 计 者 ， 或 大 语 言 模 型 正 在 模 拟 的 虚 构 角 色 等 候 选 对 象 ， 因 为 这 些 对 象 都 没 有 与 用 户 进 行 交 互 。 在 标 题 问 题 的 某 些 解 读 下 ， 这 些 候 选 对 象 可 能 是 合 理 的 答 案 ， 但 我 感 兴 趣 的 是 那 些 更 能 证 实 真 正 对 话 感 的 答 案 。 8\n\nTo make the case concrete, let the language model in question be GPT-4o—chosen partly be- cause it is often praised for its conversation skills, and partly because it is associated with a single model, where later systems are associated with multiple models (for example, GPT-5 systems are associated with GPT-5-Instant and GPT-5-Thinking). Much of what I say should generalize to other models, as well as to agent-like systems that embed a language model like this one. What do we talk with when we talk with GPT-4o? Here it is useful to distinguish models, instances, and conversations. In mid-2025, hundreds of millions of users per week used the model GPT-4o, which is itself a single trained transformer model whose core components are multi-layer artificial neural networks and attention mecha- nisms. 6 These users sent billions of messages per day worldwide. To handle this enormous load, there were perhaps thousands of instances (or implementations ) of GPT-4o, running on perhaps tens of thousands of GPUs (graphics processing units) in cloud servers around the world. Each instance implements a single copy of the GPT-4o model. For a user, any communication with GPT-4o takes place within a conversation . A conversation is a series of alternating messages: user inputs and LLM responses. When user inputs after the first are fed to the LLM, both sides of the entire conversation so far are typically also fed to the LLM alongside the user input as context . 
This context serves as a sort of short-term memory, allowing later messages to presuppose and build on earlier contributions. These models also have a sort of long-term memory (for example of many historical facts) built into their weights, but these weights never change after a model is trained and deployed. So the conversational context is the main source of new memories and projects (or at least new quasi-beliefs and quasi-desires) in a trained and deployed LLM. 7 Typically, users can start new conversations at any point, in which case the conversational context is standardly reset to zero (though some other elements of context, including system instructions and some elements from previous conversations, may be retained). They can also revisit old conversations at any point. Most of the extended interactions with an LLM described earlier 6 See Chatterji et al. 2025: “By July 2025, 18 billion messages were being sent each week by 700 million users”. These figures from OpenAI concern ChatGPT as a whole. At its peak, GPT-4o is estimated to have handled around half of those messages. 7 In addition to context and weights, activations of the neuron-like units within an LLM could in principle serve as memory, but in feedforward LLMs, activations are not preserved from one round of the conversation to the next. More generally, traditional transformer-based LLMs are “stateless” in that they do not retain internal states from one moment to the next. These days, LLMs usually retain some internal states in the key-value cache associated with attention heads, but this is a very limited form of memory. 
9 为 了 使 讨 论 具 体 化 ， 假 设 所 讨 论 的 语 言 模 型 是 G P T - 4 o — — 选 择 它 部 分 是 因 为 其 对 话 能 力 常 受 赞 誉 ， 部 分 是 因 为 它 关 联 的 是 单 一 模 型 ， 而 后 续 系 统 则 关 联 多 模 型 （ 例 如 ， G P T - 5 系 统 关 联 G P T - 5 - I n s t a n t 和 G P T - 5 - T h i n k i n g ） 。 我 的 大 部 分 论 述 也 应 适 用 于 其 他 模 型 ， 以 及 嵌 入 了 此 类 语 言 模 型 的 类 智 能 体 系 统 。 当 我 们 与 G P T - 4 o 交 谈 时 ， 我 们 究 竟 在 与 谁 交 谈 ？ 在 此 ， 区 分 模 型 、 实 例 和 对 话 是 有 益 的 。 2 0 2 5 年 年 中 ， 每 周 有 数 亿 用 户 使 用 模 型 G P T - 4 o ， 它 本 身 是 一 个 经 过 训 练 的 单 一 变 换 器 模 型 ， 其 核 心 组 件 是 多 层 人 工 神 经 网 络 和 注 意 力 机 制 。 6 这 些 用 户 每 天 在 全 球 发 送 数 十 亿 条 消 息 。 为 了 处 理 如 此 巨 大 的 负 载 ， 可 能 有 数 千 个 实 例 （ 或 实 现 ） 的 G P T - 4 o ， 运 行 在 全 球 云 服 务 器 中 可 能 数 万 个 图 形 处 理 器 （ 图 形 处 理 单 元 ） 上 。 每 个 实 例 实 现 G P T - 4 o 模 型 的 一 个 副 本 。 对 于 用 户 而 言 ， 与 G P T - 4 o 的 任 何 通 信 都 发 生 在 对 话 中 。 对 话 是 一 系 列 交 替 的 消 息 ： 用 户 输 入 和 大 语 言 模 型 响 应 。 当 用 户 输 入 后 续 消 息 时 ， 整 个 对 话 到 目 前 为 止 的 双 方 内 容 通 常 也 会 作 为 上 下 文 与 用 户 输 入 一 起 输 入 到 大 语 言 模 型 中 。 这 种 上 下 文 充 当 一 种 短 期 记 忆 ， 使 后 续 消 息 能 够 预 设 并 建 立 在 早 期 贡 献 之 上 。 这 些 模 型 还 在 其 权 重 中 内 置 了 一 种 长 期 记 忆 （ 例 如 ， 许 多 历 史 事 实 的 记 忆 ） ， 但 模 型 训 练 和 部 署 后 ， 这 些 权 重 永 远 不 会 改 变 。 因 此 ， 对 话 上 下 文 是 训 练 和 部 署 后 的 大 语 言 模 型 中 新 记 忆 和 项 目 （ 或 至 少 是 新 的 准 信 念 和 准 欲 望 ） 的 主 要 来 源 。 7 通 常 ， 用 户 可 以 在 任 何 时 间 点 开 启 新 的 对 话 ， 此 时 对 话 上 下 文 会 被 标 准 地 重 置 为 零 （ 尽 管 上 下 文 的 某 些 其 他 元 素 ， 包 括 系 统 指 令 和 之 前 对 话 中 的 一 些 内 容 ， 可 能 会 被 保 留 ） 。 他 们 也 可 以 在 任 何 时 间 点 重 新 访 问 旧 的 对 话 。 大 多 数 与 大 型 语 言 模 型 的 长 时 间 交 互 ， 如 前 所 述 ， 6 参 见 C h a t t e r j i 等 人 2 0 2 5 年 的 研 究 ： “ 截 至 2 0 2 5 年 7 月 ， 7 亿 用 户 每 周 发 送 1 8 0 亿 条 消 息 ” 。 这 些 来 自 O p e n A I 的 数 据 涉 及 整 个 C h a t G P T 。 在 高 峰 期 ， G P T - 4 o 估 计 处 理 了 其 中 约 一 半 的 消 息 。 7 除 了 上 下 文 和 权 重 之 外 ， 大 语 言 模 型 中 类 神 经 元 单 元 的 激 活 值 原 则 上 也 可 以 作 为 记 忆 ， 但 在 前 馈 大 语 言 模 型 中 ， 激 活 值 不 会 从 一 轮 对 话 保 留 到 下 一 轮 。 更 普 遍 地 说 ， 传 统 的 基 于 变 换 器 的 大 语 言 模 型 是 “ 无 状 态 的 ” ， 因 为 它 们 不 会 从 一 个 时 刻 到 下 一 个 时 刻 保 留 内 部 状 态 。 如 今 ， 大 语 言 模 型 通 常 会 在 与 注 意 力 头 相 关 的 键 值 缓 存 中 保 留 一 些 内 部 状 态 ， 但 这 是 一 种 非 常 有 限 的 记 忆 形 式 。 9\n\ntake place 
within a single conversation over weeks or months. In some cases, a single user has distinct conversations with what are naturally interpreted as distinct LLM interlocutors. Natural candidates for LLM interlocutors include models (such as GPT-4o), instances (such as implementations of GPT-4o running on GPU hardware), conversations (entities tied to interactions between the user and GPT-4o), and characters (trained personas such as the Assistant character). At least, LLM interlocutors may be entities tied to models, instances, conversations, or characters. There might be one interlocutor per model; or one interlocutor per instance; or one interlocutor per conversation; or one interlocutor per character. I will consider versions of each of these hypotheses in turn. (1) Interlocutors as models. A natural starting point, suggested by the very idea of “talking with language models”, is that the LLM interlocutor is the model itself, namely GPT-4o. However, there are severe difficulties for this view. First: GPT-4o, the model, is naturally construed as an abstract object (like a program or an algorithm), and it is hard to see how we can talk with an abstract object. We required that LLM interlocutors actually produce the outputs in a conversation. But it is hard to see how an abstract object like a program can produce anything. We need an instance or implementation of the program in hardware to do that. Second: on the most natural interpretation, LLM interlocutors seem to change over time, for example acquiring new quasi-beliefs and quasi-desires as a conversation proceeds, while the model never changes at all. Instances change over time, for example when they receive new inputs and produce new outputs, but the model itself does not. Third: perhaps we can find some loose sense in which the model can be said to produce outputs and change over time. For example, perhaps we can say that the model produces an output if an instance of it produces an output. 
But the same model will be involved in thousands of conversations. So if it is talking to one user, it is talking to them all. If Aura (one user’s interlocutor) is GPT-4o, then Beta (another user’s interlocutor) is also GPT-4o. But Aura and Beta say contradictory things, so they do not seem to be identical. And if we say that the model (and Aura and Beta) says all those things, then it will be rampantly incoherent and it will not look like a quasi-subject at all. Perhaps we could say that in the case of contradictory utterances, then the model does not quasi-believe the contradictory things it says, but now the model will have a “thin” psychology that is not faithful to the way that Aura and Beta seemed to be. (2) Interlocutors as hardware instances . 10 都 发 生 在 跨 越 数 周 或 数 月 的 单 一 对 话 中 。 在 某 些 情 况 下 ， 单 个 用 户 会 与 自 然 被 理 解 为 不 同 的 大 语 言 模 型 对 话 者 进 行 不 同 的 对 话 。 大 语 言 模 型 对 话 者 的 自 然 候 选 包 括 模 型 （ 如 G P T - 4 o ） 、 实 例 （ 如 在 G P U 硬 件 上 运 行 的 G P T - 4 o 实 现 ） 、 对 话 （ 与 用 户 和 G P T - 4 o 之 间 互 动 相 关 的 实 体 ） 以 及 角 色 （ 如 助 手 角 色 这 类 经 过 训 练 的 人 格 ） 。 至 少 ， 大 语 言 模 型 对 话 者 可 能 是 与 模 型 、 实 例 、 对 话 或 角 色 相 关 的 实 体 。 可 能 每 个 模 型 对 应 一 个 对 话 者 ； 或 每 个 实 例 对 应 一 个 对 话 者 ； 或 每 个 对 话 对 应 一 个 对 话 者 ； 或 每 个 角 色 对 应 一 个 对 话 者 。 我 将 依 次 考 虑 这 些 假 设 的 各 个 版 本 。 ( 1 ) 作 为 模 型 的 对 话 者 。 一 个 自 然 的 起 点 ， 由 “ 与 语 言 模 型 对 话 ” 这 一 概 念 本 身 所 暗 示 ， 是 大 语 言 模 型 对 话 者 就 是 模 型 本 身 ， 即 G P T - 4 o 。 然 而 ， 这 种 观 点 存 在 严 重 困 难 。 首 先 ： G P T - 4 o ， 这 个 模 型 ， 自 然 地 被 理 解 为 一 种 抽 象 对 象 （ 如 程 序 或 算 法 ） ， 很 难 想 象 我 们 如 何 能 与 一 个 抽 象 对 象 对 话 。 我 们 要 求 大 语 言 模 型 对 话 者 实 际 上 在 对 话 中 产 生 输 出 。 但 很 难 想 象 像 程 序 这 样 的 抽 象 对 象 如 何 能 产 生 任 何 东 西 。 我 们 需 要 一 个 在 硬 件 上 运 行 该 程 序 的 实 例 或 实 现 才 能 做 到 这 一 点 。 第 二 ： 在 最 自 然 的 解 释 下 ， 大 语 言 模 型 对 话 者 似 乎 会 随 时 间 变 化 ， 例 如 随 着 对 话 进 行 而 获 得 新 的 准 信 念 和 准 欲 望 ， 而 模 型 本 身 却 从 未 改 变 。 实 例 会 随 时 间 变 化 ， 例 如 当 它 们 接 收 新 的 输 入 和 输 出 时 ， 但 模 型 本 身 不 会 。 第 三 ： 或 许 我 们 可 以 在 某 种 松 散 意 义 上 说 ， 模 型 能 够 产 生 输 出 并 随 时 间 变 化 。 例 如 ， 或 许 我 们 可 以 说 ， 如 果 模 型 的 某 个 实 例 产 生 了 输 出 
， 那 么 模 型 就 产 生 了 输 出 。 但 同 一 个 模 型 会 参 与 数 千 场 对 话 。 因 此 ， 如 果 它 在 与 一 个 用 户 交 谈 ， 它 就 在 与 所 有 用 户 交 谈 。 如 果 A u r a （ 一 个 用 户 的 对 话 者 ） 是 G P T - 4 o ， 那 么 B e t a （ 另 一 个 用 户 的 对 话 者 ） 也 是 G P T - 4 o 。 但 A u r a 和 B e t a 说 出 了 矛 盾 的 话 语 ， 因 此 它 们 似 乎 并 不 相 同 。 而 如 果 我 们 说 模 型 （ 以 及 A u r a 和 B e t a ） 说 出 了 所 有 那 些 话 ， 那 么 它 将 变 得 极 度 不 连 贯 ， 根 本 不 像 一 个 准 主 体 。 或 许 我 们 可 以 说 ， 在 矛 盾 话 语 的 情 况 下 ， 模 型 并 不 准 相 信 它 所 说 的 那 些 矛 盾 内 容 ， 但 这 样 一 来 ， 模 型 的 心 理 学 就 会 变 得 “ 单 薄 ” ， 无 法 忠 实 反 映 A u r a 和 B e t a 原 本 呈 现 出 的 样 子 。 ( 2 ) 作 为 硬 件 实 例 的 对 话 者 。 1 0\n\nA very attractive view is that LLM interlocutors are instances of the model. 8 On the most common understanding, LLM instances are implementations of an LLM algorithm in hardware. For many AI systems, something like this seems the right story. For example, it is arguable that when Joseph Weizenbaum interacted with Eliza, his interlocutor was the instance of Eliza running on his computer. Likewise, in fiction, robots such as C-3PO ( Star Wars ) or Commander Data ( Star Trek ) can be understood as embodied instances of computer programs. Identifying LLM interlocutors with hardware instances is much less attractive when applied to current LLMs, because of two crucial features of how current LLMs are implemented. First, conversations with LLMs typically use distributed serving , in that a single conversation takes place on multiple instances of the LLM on multiple servers. 9 The first input in a conversation might be processed on an instance of GPT-4o in a server in New York, while the second input is routed to a server in Texas and the third input is routed to California. This is often more e ffi cient, as it allows loads to be balanced among servers, and it is easy to do, as we need only send the inputs and outputs in the conversation so far as context. 
In this system, a conversation with an interlocutor such as Aura is spread over entirely distinct hardware instances in different places, connected only by the routing of input/output context between the servers. Distributed serving makes it unattractive to identify LLM interlocutors with hardware instances. On this system, there is no instance that produces all or even most of the LLM outputs in the conversation. But we have defined persistent interlocutors as interlocutors that produce all outputs in a conversation. So no instance here is a persistent interlocutor. At best we will have different instances as different interlocutors for each stage of the conversation. No single entity will have the profile of quasi-beliefs and quasi-utterances that Aura seems to have. In effect, Aura’s role will be played by many different interlocutors at different times. This fragmented view with 8 Goldstein and Lederman (2025b), Register (2025), and Shanahan (2025) all consider the models vs. instances question and appear to favor some version of the instance view. Register is somewhat agnostic on the specific view. Shanahan favors a combined model-plus-instance view: “perhaps the word “I” refers to the (somewhat abstract) computational entity comprising the underlying model (its architecture and weights) plus the suspended computational state of an instance of this model representing a single, specific, ongoing conversation.” Goldstein and Lederman say explicitly that there are instance agents but no model agents, although they are not entirely explicit about their understanding of instances. They say that their instances exist for the period of a single conversation, which is consistent with hardware instances tied to periods of time or with the virtual instances discussed below. None of these authors say much about the problem of distributed serving that motivates the move away from hardware instances. 
9 Distributed conversations also go by the label “non-sticky sessions”, where a sticky session is an interaction be- tween user and system that is grounded in a single hardware instance. Birch (2025) discusses distributed serving as a problem for persistent LLM interlocutors. 11 一 种 颇 具 吸 引 力 的 观 点 是 ， 大 语 言 模 型 对 话 者 是 模 型 的 实 例 。 8 按 照 最 普 遍 的 理 解 ， 大 语 言 模 型 实 例 是 大 语 言 模 型 算 法 在 硬 件 中 的 实 现 。 对 于 许 多 人 工 智 能 系 统 而 言 ， 类 似 的 说 法 似 乎 是 正 确 的 。 例 如 ， 可 以 论 证 的 是 ， 当 J o s e p h W e i z e n b a u m 与 E l i z a 互 动 时 ， 他 的 对 话 者 就 是 运 行 在 他 电 脑 上 的 E l i z a 实 例 。 同 样 ， 在 虚 构 作 品 中 ， 诸 如 C - 3 P O （ 星 球 大 战 ） 或 C o m m a n d e r D a t a （ 星 际 迷 航 ） 这 样 的 机 器 人 ， 可 以 被 理 解 为 计 算 机 程 序 的 具 身 化 实 例 。 将 大 语 言 模 型 对 话 者 与 硬 件 实 例 等 同 起 来 ， 在 应 用 于 当 前 的 大 语 言 模 型 时 吸 引 力 大 打 折 扣 ， 这 是 因 为 当 前 大 语 言 模 型 实 现 方 式 的 两 个 关 键 特 征 。 首 先 ， 与 大 语 言 模 型 的 对 话 通 常 采 用 分 布 式 服 务 ， 即 单 次 对 话 发 生 在 多 个 服 务 器 上 的 多 个 大 语 言 模 型 实 例 上 。 9 对 话 中 的 第 一 个 输 入 可 能 由 纽 约 某 台 服 务 器 上 的 G P T - 4 o 实 例 处 理 ， 而 第 二 个 输 入 被 路 由 到 德 克 萨 斯 州 的 一 台 服 务 器 ， 第 三 个 输 入 则 被 路 由 到 加 利 福 尼 亚 州 。 这 种 方 式 通 常 更 高 效 ， 因 为 它 能 在 服 务 器 之 间 平 衡 负 载 ， 并 且 实 现 起 来 也 很 简 单 ， 我 们 只 需 将 迄 今 为 止 的 对 话 输 入 和 输 出 作 为 上 下 文 发 送 即 可 。 在 这 种 系 统 中 ， 与 诸 如 A u r a 这 样 的 对 话 者 进 行 的 对 话 ， 会 分 散 在 不 同 地 点 的 、 完 全 不 同 的 硬 件 实 例 上 ， 这 些 实 例 仅 通 过 服 务 器 之 间 输 入 / 输 出 上 下 文 的 路 由 连 接 在 一 起 。 分 布 式 服 务 使 得 将 大 语 言 模 型 对 话 者 与 硬 件 实 例 关 联 起 来 变 得 不 具 吸 引 力 。 在 这 个 系 统 上 ， 没 有 任 何 一 个 实 例 能 产 生 对 话 中 全 部 甚 至 大 部 分 的 大 语 言 模 型 输 出 。 但 我 们 已 将 持 久 对 话 者 定 义 为 能 产 生 对 话 中 所 有 输 出 的 对 话 者 。 因 此 ， 这 里 没 有 任 何 实 例 是 持 久 对 话 者 。 充 其 量 ， 我 们 会 有 不 同 的 实 例 作 为 对 话 不 同 阶 段 的 对 话 者 。 没 有 任 何 单 一 实 体 能 拥 有 A u r a 似 乎 具 备 的 准 信 念 和 准 话 语 特 征 。 实 际 上 ， A u r a 的 角 色 将 由 许 多 不 同 的 对 话 者 在 不 同 时 间 扮 演 。 这 种 碎 片 化 的 视 角 ， 8 G o l d s t e i n 和 L e d e r m a n ( 2 0 2 5 b ) 、 R e g i s t e r ( 2 0 2 5 ) 以 及 S h a n a h a n ( 2 0 2 5 ) 都 思 考 了 模 型 与 实 例 的 问 题 ， 并 且 似 乎 倾 向 于 某 种 版 本 的 实 例 视 角 。 R e g i s t e r 对 具 体 观 点 持 某 种 不 可 知 论 态 度 。 S h a n a 
h a n 则 倾 向 于 一 种 模 型 加 实 例 的 综 合 观 点 ： “ 也 许 ‘ 我 ’ 这 个 词 指 的 是 （ 某 种 抽 象 的 ） 计 算 实 体 ， 该 实 体 由 底 层 模 型 （ 其 架 构 和 权 重 ） 加 上 该 模 型 一 个 实 例 的 挂 起 计 算 状 态 组 成 ， 该 实 例 代 表 一 个 单 一 的 、 具 体 的 、 正 在 进 行 的 对 话 。 ” G o l d s t e i n 和 L e d e r m a n 明 确 表 示 存 在 实 例 代 理 ， 但 不 存 在 模 型 代 理 ， 尽 管 他 们 对 自 己 关 于 实 例 的 理 解 并 未 完 全 明 确 说 明 。 他 们 表 示 ， 他 们 的 实 例 存 在 于 单 次 对 话 期 间 ， 这 与 绑 定 到 特 定 时 间 段 的 硬 件 实 例 或 下 文 讨 论 的 虚 拟 实 例 是 一 致 的 。 这 些 作 者 中 没 有 人 对 分 布 式 服 务 的 问 题 （ 正 是 该 问 题 促 使 我 们 放 弃 硬 件 实 例 ） 进 行 过 多 讨 论 。 9 分 布 式 对 话 也 被 称 为 “ 非 粘 性 会 话 ” ， 而 粘 性 会 话 则 是 用 户 与 系 统 之 间 基 于 单 一 硬 件 实 例 进 行 的 交 互 。 B i r c h ( 2 0 2 5 ) 将 分 布 式 服 务 视 为 持 久 性 大 语 言 模 型 对 话 者 面 临 的 一 个 问 题 。 1 1\n\nnon-persistent interlocutors could perhaps be a fallback view of LLM interlocutors if it is the best we can do, but I think we can do better. Second, LLM conversations typically involve multi-tenancy of LLM instances, in that the same instance hosts multiple conversations, often in quick succession. 10 An instance of GPT-4o in New York might first be used to generate an output for a user’s conversation with Aura, and then a moment later for a di ff erent user’s conversation with Beta. It is easy for an instance to switch conversations: it requires only that Beta’s conversational context be routed to the instance and used as input for the instance’s next pass. Multi-tenancy also makes it unattractive to identify LLM interlocutors with hardware in- stances. Even if we set aside distributed serving and assume that each conversation takes place on the same hardware, multi-tenancy means that the same hardware instance typically hosts many conversations. Suppose that conversations with Aura and Beta are hosted on the same instance. Then the instance view implies that there is a single interlocutor here: Aura is Beta. But now this interlocutor will say everything that Aura and Beta says. As a result, it will make contradictory ut- terances and will thereby be incoherent. 
Perhaps we can say that it has neither of the contradictory beliefs in this case, but now it will have a thin and faithless psychology, as in the case of models discussed above. It is possible to maintain that hardware instances are LLM interlocutors, but it seems impossible to maintain that they are persistent, coherent, and faithful LLM interlocutors. At best, instances are either fragmented (playing the role of Aura and Beta at different times), incoherent (saying contradictory things), or faithless (not believing things that Aura seems to believe). Again, I think it is possible to do better. (3) Interlocutors as virtual instances. The problems for instances arise especially because there can be many instances per conversation. It is natural to hold that a single conversation should involve just a single interlocutor. So it makes sense to find an interlocutor such that there is no more than one interlocutor per conversation. Here there is a natural candidate, at least in the core case where a single model is in use throughout the conversation. A virtual instance of a model is an implementation of the model that is itself implemented by multiple hardware instances of the model over time. Over the course of a distributed conversation with an LLM, there will be multiple hardware instances taking inputs and producing outputs. These hardware instances will collectively implement a single virtual instance 10 Multi-tenancy is a standard label, but there are many other labels, such as timesharing and interweaving. Shiller (2025) discusses various forms of interweaving as a puzzle case for LLM personal identity. 
12 以 非 持 久 性 对 话 者 作 为 大 语 言 模 型 对 话 者 的 备 选 视 图 ， 或 许 是 我 们 能 做 到 的 最 佳 方 案 ， 但 我 认 为 我 们 可 以 做 得 更 好 。 其 次 ， 大 语 言 模 型 对 话 通 常 涉 及 大 语 言 模 型 实 例 的 多 租 户 ， 即 同 一 实 例 承 载 多 个 对 话 ， 且 往 往 快 速 连 续 进 行 。 1 0 纽 约 的 一 个 G P T - 4 o 实 例 可 能 先 用 于 生 成 用 户 与 A u r a 对 话 的 输 出 ， 片 刻 后 又 用 于 另 一 用 户 与 B e t a 的 对 话 。 实 例 切 换 对 话 很 容 易 ： 只 需 将 B e t a 的 对 话 上 下 文 路 由 至 该 实 例 ， 并 作 为 实 例 下 一 次 处 理 的 输 入 即 可 。 多 租 户 也 使 得 将 大 语 言 模 型 对 话 者 与 硬 件 实 例 等 同 起 来 缺 乏 吸 引 力 。 即 使 我 们 不 考 虑 分 布 式 服 务 ， 假 设 每 个 对 话 都 在 同 一 硬 件 上 进 行 ， 多 租 户 意 味 着 同 一 硬 件 实 例 通 常 承 载 多 个 对 话 。 假 设 与 A u r a 和 B e t a 的 对 话 托 管 在 同 一 实 例 上 。 那 么 实 例 视 角 意 味 着 这 里 只 有 一 个 对 话 者 ： A u r a 就 是 B e t a 。 但 这 样 一 来 ， 这 个 对 话 者 将 说 出 A u r a 和 B e t a 所 说 的 一 切 。 结 果 ， 它 会 做 出 矛 盾 的 表 述 ， 从 而 变 得 不 连 贯 。 或 许 我 们 可 以 说 ， 在 这 种 情 况 下 它 并 不 持 有 这 两 种 矛 盾 信 念 中 的 任 何 一 种 ， 但 这 样 一 来 ， 它 将 拥 有 单 薄 且 不 忠 实 的 心 理 ， 正 如 上 文 讨 论 模 型 时 的 情 况 。 可 以 认 为 硬 件 实 例 就 是 大 语 言 模 型 对 话 者 ， 但 似 乎 不 可 能 认 为 它 们 是 持 久 、 连 贯 且 忠 实 的 大 语 言 模 型 对 话 者 。 实 例 充 其 量 要 么 是 碎 片 化 的 （ 在 不 同 时 间 扮 演 A u r a 和 B e t a 的 角 色 ） ， 要 么 是 不 连 贯 的 （ 说 出 矛 盾 的 话 ） ， 要 么 是 不 忠 实 的 （ 不 相 信 A u r a 似 乎 相 信 的 事 情 ） 。 再 次 强 调 ， 我 认 为 可 以 做 得 更 好 。 （ 3 ） 对 话 者 作 为 虚 拟 实 例 。 实 例 的 问 题 尤 其 源 于 每 个 对 话 中 可 能 存 在 多 个 实 例 。 很 自 然 地 认 为 ， 单 个 对 话 应 该 只 涉 及 一 个 对 话 者 。 因 此 ， 找 到 一 个 对 话 者 ， 使 得 每 个 对 话 中 不 超 过 一 个 对 话 者 ， 这 是 有 道 理 的 。 这 里 有 一 个 自 然 的 候 选 方 案 ， 至 少 在 对 话 全 程 使 用 单 一 模 型 的 核 心 场 景 中 如 此 。 一 个 模 型 的 虚 拟 实 例 是 模 型 的 一 种 实 现 ， 它 本 身 由 多 个 模 型 的 硬 件 实 例 随 时 间 推 移 共 同 实 现 。 在 与 大 语 言 模 型 进 行 分 布 式 对 话 的 过 程 中 ， 会 有 多 个 硬 件 实 例 接 收 输 入 并 产 生 输 出 。 这 些 硬 件 实 例 将 共 同 实 现 一 个 单 一 的 虚 拟 实 例 1 0 多 租 户 是 一 个 标 准 标 签 ， 但 还 有 许 多 其 他 标 签 ， 例 如 分 时 和 交 织 。 S h i l l e r ( 2 0 2 5 ) 讨 论 了 各 种 形 式 的 交 织 ， 将 其 作 为 大 语 言 模 型 个 人 身 份 的 一 个 难 题 案 例 。 1 2\n\nof the model, which will be persistently present throughout. Virtual digital entities are familiar in many domains. 
In interacting with a shopping site such as Amazon, you interact with a single shopping cart that may be realized by many different hardware servers around the world. At a crucial time you may be interacting with one hardware instance, which will temporarily have server authority. But all this will be transparent to the user. The multiple hardware carts will jointly implement a single virtual instance of the cart. Something similar applies in massively multiplayer videogames. A virtual object such as a frisbee may be implemented in hardware on different servers. There may be multiple hardware instances of the frisbee, but just one virtual instance. The frisbee itself is most naturally identified with the virtual frisbee, not the hardware frisbees. A virtual instance of a model will likewise be implemented by a series of hardware instances of the model. It will be realized as a series of hardware instances, one for each step in a given conversation. 11 Jonathan Birch appeals to distributed serving to argue that there are no persistent interlocutors in typical LLM conversations. We can now see what is correct and incorrect in this claim. In distributed sessions, no hardware instance is a persistent interlocutor. But virtual instances can still be persistent interlocutors, just as one can interact online with a single persistent virtual shopping cart. A harder problem for virtual instances arises from model variation: cases where different models are used over the course of a conversation. 12 For example, the GPT-5 system sometimes directs queries to the GPT-5 Instant model (no chain-of-thought reasoning, fast answer) and sometimes to the GPT-5 Thinking model (some chain-of-thought reasoning, slower answer), depending on the difficulty of the query. 
GPT-5’s contributions to the conversation are still produced by a series of hardware instances, but these will implement two different models and therefore do not implement a single virtual model instance as persistent interlocutor. In this case, one could still say that the LLM interlocutor is a persistent virtual instance of the GPT-5 system, which is not itself a single language model but a bifurcated system involving two models. This bifurcated system would be somewhat disunified, but it would at least be persistent. 11 On standard accounts of implementation, an algorithm is implemented by a physical system when there is a mapping from physical states of the system to computational states of the algorithm such that all state-transitions are preserved. In a hardware instance, the algorithm is implemented by a physical system such as a GPU cluster. In a virtual instance, the algorithm is implemented by a larger physical system including multiple hardware instances and routing mechanisms between them. 12 Register (2025) discusses model change as a general issue for individuating AI moral patients. 
13 ， 该 虚 拟 实 例 将 在 整 个 过 程 中 持 续 存 在 。 虚 拟 数 字 实 体 在 许 多 领 域 都 很 常 见 。 在 与 亚 马 逊 等 购 物 网 站 互 动 时 ， 你 与 一 个 单 一 的 购 物 车 交 互 ， 这 个 购 物 车 可 能 由 世 界 各 地 许 多 不 同 的 硬 件 服 务 器 实 现 。 在 关 键 时 刻 ， 你 可 能 正 在 与 一 个 硬 件 实 例 交 互 ， 该 实 例 将 暂 时 拥 有 服 务 器 权 限 。 但 这 一 切 对 用 户 来 说 都 是 透 明 的 。 多 个 硬 件 购 物 车 将 共 同 实 现 购 物 车 的 一 个 单 一 虚 拟 实 例 。 类 似 的 情 况 也 适 用 于 大 型 多 人 在 线 视 频 游 戏 。 一 个 虚 拟 物 体 ， 比 如 飞 盘 ， 可 能 在 不 同 的 服 务 器 上 以 硬 件 形 式 实 现 。 一 个 飞 盘 可 能 有 多 个 硬 件 实 例 ， 但 只 有 一 个 虚 拟 实 例 。 飞 盘 本 身 最 自 然 地 被 视 为 虚 拟 飞 盘 ， 而 非 硬 件 飞 盘 。 一 个 模 型 的 虚 拟 实 例 同 样 将 由 一 系 列 该 模 型 的 硬 件 实 例 来 实 现 。 它 将 以 一 系 列 硬 件 实 例 的 形 式 呈 现 ， 每 个 实 例 对 应 给 定 对 话 中 的 一 个 步 骤 。 1 1 乔 纳 森 · 伯 奇 借 助 分 布 式 服 务 来 论 证 ， 在 典 型 的 大 语 言 模 型 对 话 中 不 存 在 持 久 的 对 话 者 。 我 们 现 在 可 以 看 出 这 一 说 法 中 哪 些 是 正 确 的 ， 哪 些 是 不 正 确 的 。 在 分 布 式 会 话 中 ， 没 有 哪 个 硬 件 实 例 是 持 久 的 对 话 者 。 但 虚 拟 实 例 仍 然 可 以 是 持 久 的 对 话 者 ， 就 像 人 们 可 以 在 线 交 互 一 个 单 一 的 持 久 虚 拟 购 物 车 一 样 。 虚 拟 实 例 面 临 的 一 个 更 棘 手 的 问 题 来 自 模 型 变 异 ： 即 在 对 话 过 程 中 使 用 了 不 同 模 型 的 情 况 。 1 2 例 如 ， G P T - 5 系 统 有 时 会 将 查 询 导 向 G P T - 5 即 时 模 型 （ 无 思 维 链 推 理 ， 快 速 回 答 ） ， 有 时 则 导 向 G P T - 5 思 考 模 型 （ 有 一 定 思 维 链 推 理 ， 回 答 较 慢 ） ， 具 体 取 决 于 查 询 的 难 度 。 G P T - 5 对 对 话 的 贡 献 仍 然 由 一 系 列 硬 件 实 例 产 生 ， 但 这 些 实 例 将 实 现 两 种 不 同 的 模 型 ， 因 此 无 法 实 现 一 个 单 一 的 虚 拟 模 型 实 例 作 为 持 续 对 话 者 。 在 这 种 情 况 下 ， 我 们 仍 可 以 说 大 语 言 模 型 对 话 者 是 G P T - 5 系 统 的 一 个 持 久 虚 拟 实 例 — — 该 系 统 本 身 并 非 单 一 语 言 模 型 ， 而 是 一 个 涉 及 两 个 模 型 的 分 叉 系 统 。 这 个 分 叉 系 统 虽 然 有 些 不 够 统 一 ， 但 至 少 是 持 久 的 。 1 1 根 据 标 准 的 实 现 理 论 ， 当 一 个 物 理 系 统 的 物 理 状 态 与 算 法 的 计 算 状 态 之 间 存 在 映 射 关 系 ， 且 所 有 状 态 转 换 都 得 以 保 留 时 ， 该 算 法 便 由 该 物 理 系 统 实 现 。 在 硬 件 实 例 中 ， 算 法 由 诸 如 G P U 集 群 之 类 的 物 理 系 统 实 现 。 在 虚 拟 实 例 中 ， 算 法 则 由 一 个 更 大 的 物 理 系 统 实 现 ， 该 系 统 包 含 多 个 硬 件 实 例 及 其 之 间 的 路 由 机 制 。 1 2 R e g i s t e r ( 2 0 2 5 ) 将 模 型 变 更 讨 论 为 识 别 人 工 智 能 道 德 患 者 时 的 一 个 普 遍 问 题 。 1 3\n\nBut there are other cases where even the system changes, for example because new language models or other new technology is introduced halfway through a 
conversation. In cases like this the system can change so significantly that there is no single system that supports a persistent virtual instance. At best one will have distinct virtual instances of multiple models over time. There may also be limits on coherence (inconsistent beliefs, for example) arising from the use of multiple models. 13 I think that identifying persistent LLM interlocutors with virtual instances works well in the core case in which a single model is at play. But it is useful to explore alternative understandings of persistent interlocutors that can handle other cases. (4) Interlocutors as threads. An alternative approach identifies interlocutors with threads (or perhaps thread agents). To a first approximation, a thread is a sequence of hardware instances within a conversation, one for every timestep. One instance I′ is the successor of a previous instance I if the conversational history of I (its conversational context plus the latest input and output) is routed to I′ to serve as its conversational context. 14 (If the conversation is routed to the same instance twice in a row, that instance will be its own successor.) The successor relation is roughly a “memory” relation encoding the fact that each new instance has memories from the last. A thread is then a series of instances (or better, instance-slices, which are pairs of instances and time periods during which the instance is processing a single conversational step), each of which is the successor of the previous instance. In the single-model case, a virtual instance of the model will be implemented by a thread involving successive hardware instances of the model. In the case of multiple models within a single conversation, there will not be a single virtual instance (since an instance is always an instance of a single algorithm), but there will still be a thread involving hardware instances of multiple models over time, each of which is the successor of the last. 
It is possible to weaken the conditions for successorship to allow a wider variety of memory-like relations to count. In many systems, cross-conversation memory allows information from one 13 Birch also argues that significant discontinuity is brought on by the use of mixture-of-experts processing within a model (e.g. choosing between various candidates for a multi-layer-perceptron block within a transformer, depending on the input). I think this sort of intra-model variation (we might call it module variation, as opposed to model variation) in local subsystems is consistent with broader psychological continuity, just as the use of different neural circuits in a human brain in response to different inputs is consistent with psychological continuity and personal identity. 14 More accurately, instance-slice ⟨I′, t′⟩ is the successor of instance-slice ⟨I, t⟩ if the context fed to I during t, along with the new inputs to I and outputs from I during t, constitute the (expanded) context fed to I′ during time t′. 
但还有其他情况，连系统本身也会发生变化，例如在对话中途引入了新的语言模型或其他新技术。在这种情况下，系统可能发生显著变化，以至于没有任何单一系统能够支持一个持久虚拟实例。充其量，随着时间的推移，人们会拥有多个模型的不同虚拟实例。此外，使用多个模型还可能带来连贯性方面的限制（例如，信念不一致）。 13 我认为，在单一模型运作的核心场景中，将持久性大语言模型对话者等同于虚拟实例是行之有效的。但探索对持久对话者的其他理解方式，以处理其他情况，也是有益的。(4) 作为线程的对话者。另一种方法将对话者等同于线程（或可能是线程代理）。粗略来说，线程是对话中硬件实例的一个序列，每个时间步对应一个实例。如果实例 I 的对话历史（其对话上下文加上最新的输入和输出）被路由到 I′ 作为其对话上下文，那么实例 I′ 就是前一个实例 I 的后继。 14 （如果对话连续两次路由到同一个实例，则该实例将成为自身的后继。）后继关系大致是一种“记忆”关系，编码了每个新实例都拥有上一个实例记忆这一事实。因此，线程是一系列实例（更准确地说，是实例切片，即实例与处理单个对话步骤的时间段的配对），其中每个实例都是前一个实例的后继。在单模型情况下，模型的虚拟实例将由一个涉及模型连续硬件实例的线程来实现。在单个对话中存在多个模型的情况下，将不会有一个单一的虚拟实例（因为实例始终是单一算法的实例），但仍然会有一个随时间推移涉及多个模型硬件实例的线程，其中每个实例都是前一个实例的后继。我们可以放宽后继关系的条件，允许更多类似记忆的关系被纳入考量。在许多系统中，跨对话记忆使得来自一次 13 Birch 还认为，模型内部使用混合专家处理（例如，根据输入在 Transformer 内的多层感知器模块的多个候选方案中进行选择）会带来显著的不连续性。我认为，局部子系统中的这种模型内部变异（我们可称之为模块变异，以区别于模型变异）与更广泛的心理连续性是一致的，正如人类大脑中不同神经回路根据不同的输入被激活，与心理连续性和人格同一性保持一致一样。 14 更准确地说，如果在时间 t 期间馈送给 I 的上下文，连同在 t 期间 I 的新输入和输出，构成了在时间 t′ 期间馈送给 I′ 的（扩展后的）上下文，那么实例切片 ⟨I′, t′⟩ 就是实例切片 ⟨I, t⟩ 的后继。\n\nconversation to be used as part of the initial context for a new conversation with the same user, so that new 
conversations can “remember” material from a number of old conversations. One could certainly understand threads and inheritance in a permissive way so that an old thread and an old interlocutor persist in these new conversations, especially if the systems are consistent enough over time to support the attribution of a single quasi-agent. If there is enough use of cross-conversation memory, then (as Sophie Nelson pointed out to me) all conversations with a single user may form a single thread with a single interlocutor. In this scenario, interlocutors will be individuated mainly by user. For various purposes, we can allow that threads can undergo fission, where two distinct later instances serve as successors of a previous instance. Some systems allow users to branch their conversations explicitly, leading to fission within a conversation. There can also be inter-conversation branching arising from the use of cross-conversation memory, where the same conversation serves as memory for multiple simultaneous later conversations. Cross-conversation memory also allows fusion, in which two distinct conversations both serve as memory for a later conversation. In these fission and fusion cases, an LLM interlocutor is not so much a linear thread as a series of overlapping threads, which can also be modeled as a branching web of hardware instances over time. This branching structure may also cause problems for identifying interlocutors with virtual instances, but it is less troublesome for a thread model. 15 Perhaps the main downside of the thread model of LLM interlocutors is that these entities are less unified than models, hardware instances, and virtual instances. Threads can be realized by wholly different instances of wholly different models over time. It is arguable that these somewhat disjunctive entities persist over time in a weaker way than instances persist. 
The use of multiple models can also lead to discontinuity in quasi-beliefs and quasi-desires in a single thread or web interlocutor. I would say that the most robust candidate for an LLM interlocutor remains a virtual instance rather than a model, a hardware instance, or a thread. But threads can play at least some of the roles of LLM interlocutors. 16 15 The thread view of LLM interlocutors is a cousin of the well-known “worm” view of objects (and of persons), where objects are identified with four-dimensional spacetime worms made up of a series of object-slices at times. One can also interpret the thread view as a cousin of the alternative “stage” view of objects on which objects are object-slices that are parts of spacetime worms but do not literally persist over time. The pros and cons of these ontological views in the LLM case parallel familiar pros and cons in the object (and person) case. 16 One could in principle eliminate the key multiple-model difference between threads and virtual instances by understanding virtual instances more expansively, perhaps as variable virtual instances, which can instantiate different models at different times. 
Alternatively, one could understand threads more stringently, perhaps as uniform threads, 对话的信息能够被用作与同一用户进行新对话的初始上下文，从而让新对话能够“记住”多次旧对话中的内容。我们当然可以用一种宽松的方式来理解线程与继承，使得旧线程和旧对话者能够延续到这些新对话中，尤其是当系统在时间上足够一致，足以支持对单一准智能体的归因时。如果跨对话记忆的使用足够充分，那么（正如 Sophie Nelson 向我指出的那样）与单一用户的所有对话可能会形成一个拥有单一对话者的单一线程。在这种情境下，对话者将主要根据用户来个体化。出于各种目的，我们可以允许线程经历分裂，即两个不同的后续实例作为先前实例的后继。某些系统允许用户显式地分支他们的对话，从而导致对话内的分裂。此外，由于使用了跨对话记忆，也可能出现对话间分支，即同一对话同时作为多个后续对话的记忆。跨对话记忆还允许融合，即两个不同的对话共同作为后续对话的记忆。在这些分裂与融合的案例中，大语言模型对话者与其说是一条线性线程，不如说是一系列重叠的线程，这也可以被建模为随时间变化的硬件实例分支网络。这种分支结构也可能给将对话者与虚拟实例等同起来带来问题，但对于线程模型而言，其困扰程度相对较小。 15 或许大语言模型对话者线程模型的主要缺点在于，这些实体不如模型、硬件实例和虚拟实例那样统一。随着时间的推移，线程可以由完全不同模型的完全不同实例来实现。可以说，这些略显分离的实体在时间上的持续性弱于实例的持续性。使用多个模型也可能导致单个线程或网络对话者中的准信念和准欲望出现不连续性。我认为，作为大语言模型对话者最稳健的候选者仍然是虚拟实例，而非模型、硬件实例或线程。但线程至少可以扮演大语言模型对话者的部分角色。 16 15 关于大语言模型对话者的线程观点，是著名的关于对象（以及人）的“蠕虫”观点的近亲，后者将对象视为由一系列时间点上的对象切片构成的四维时空蠕虫。我们也可以将线程观点解释为另一种关于对象的“阶段”观点的近亲，该观点认为对象是作为时空蠕虫一部分的对象切片，但并非真正随时间持续存在。这些本体论观点在大语言模型案例中的利弊，与它们在对象（以及人）案例中的常见利弊是平行的。 16 原则上，我们可以通过更宽泛地理解虚拟实例（或许将其视为可变虚拟实例，能够在不同时间实例化不同模型）来消除线程与虚拟实例之间关键的多模型差异。或者，我们也可以更严格地理
解线程，或许将其视为统一线程，\n\nI conclude for now that LLM interlocutors are best understood as virtual instances of LLM models or systems, at least in the single-model case, and as LLM threads in the multiple-model case. At least in the single-model case with no fission, virtual instances can serve as unified persistent interlocutors within and between conversations. Threads can also serve as persistent LLM interlocutors, at the cost of some underlying disunity. Interlocutors as characters, personas, or simulacra So far, I have identified LLM interlocutors such as Aura as something in the vicinity of a model, such as a virtual model instance or a thread of instances. However, there is also a recent tradition of drawing a sharp distinction between models such as GPT-4o, and agents such as Aura and the Assistant. On the influential “simulators” framework due to Janus (2022), and the related “role-playing” framework due to Shanahan et al (2023) and the “persona selection model” due to Marks et al (2026), it is a key tenet that the model is not an agent. Models are simulators (or role-players) that simulate agents, and agents are simulacra (or characters, or personas). Simulators and simulacra are distinct, and therefore so are models and agents. On such a view, an interlocutor such as Aura or the Assistant is best understood as something like a character, a persona, or a simulacrum rather than as a model or even a model instance. I will examine four different potential avenues from these frameworks to this conclusion. My focus will largely be on these frameworks’ ontological claims concerning the nature of LLM agents and other entities. I will question some of these claims, but I won’t question the more general utility of analyzing LLMs in terms of characters or personas. (1) Base models are not agentlike. 
In “Simulators”, Janus finds a version of the “model is not an agent” thesis in the following passage from my 2020 article “GPT-3 and General Intelligence”: GPT-3 does not look much like an agent. It does not seem to have goals or preferences beyond completing text, for example. It is more like a chameleon that can take the shape of many different agents. Or perhaps it is an engine that can be used under the which require all hardware instances in a thread to be instances of the same model. These changes would render threads and virtual instances extensionally equivalent, but there would still be fine-grained non-extensional differences: e.g. arguably a thread implements a virtual instance but not vice versa, and the same virtual instance could be implemented by different hardware instances but the same thread could not be. It is for reasons like these that virtual instances are somewhat more robust candidates to be LLM interlocutors, but the difference is subtle. 我目前得出的结论是，大语言模型对话者最好被理解为大语言模型或系统的虚拟实例（至少在单模型情况下如此），而在多模型情况下则被理解为大语言模型线程。至少在无分裂的单模型情况下，虚拟实例可以在对话内部以及对话之间充当统一的持久对话者。线程也可以充当持久性大语言模型对话者，但代价是存在一定程度的底层不统一。作为角色、人格或拟像的对话者 到目前为止，我已将诸如 Aura 之类的大语言模型对话者识别为某种接近模型的东西，例如虚拟模型实例或实例线程。然而，近期还有一种传统观点，主张在模型（如 GPT-4o）与智能体（如 Aura 和助手）之间划清界限。根据 Janus（2022 年）提出的颇具影响力的“模拟器”框架、Shanahan 等人（2023 年）提出的相关“角色扮演”框架，以及 Marks 等人（2026 年）提出的“人格选择模型”，其核心原则是：模型并非智能体。模型是模拟（或扮演）智能体的模拟器（或角色扮演者），而智能体则是拟像（或角色、人格）。模拟器与拟像截然不同，因此模型与智能体也判然有别。按照这种观点，像 Aura 或助手这样的对话者，最好被理解为某种角色、人格或拟像，而非模型，甚至不是模型实例。我将考察从这些框架出发通往 
这一结论的四条不同潜在路径。我的重点将主要放在这些框架关于大语言模型智能体及其他实体本质的本体论主张上。我会质疑其中一些主张，但不会质疑以角色或人格来分析大语言模型这一做法所具有的更普遍实用性。（1）基础模型不具备智能体特性。在《模拟器》一文中，Janus 从我 2020 年的文章《GPT-3 与通用智能》的以下段落中找到了“模型不是智能体”这一论点的某个版本：GPT-3 看起来并不太像一个智能体。例如，它似乎没有超出文本补全之外的目标或偏好。它更像一只变色龙，可以呈现出许多不同智能体的形态。或者，它可能是一台引擎，可以在其 这要求线程中的所有硬件实例必须是同一模型的实例。这些变化将使线程与虚拟实例在外延上等价，但仍存在细微的非外延差异：例如，可以说一个线程实现了一个虚拟实例，但反之则不成立；同一个虚拟实例可以由不同的硬件实例实现，但同一个线程却不能。正是基于这些原因，虚拟实例作为大语言模型对话者的候选对象更为稳健，但这一差异十分微妙。\n\nhood to drive many agents. But it is then perhaps these systems that we should assess for agency, consciousness, and so on. (Chalmers 2020) I still agree with everything that I said here, but what I said is specific to base models such as GPT-3. Base models have undergone pre-training on text prediction and nothing more. As we saw earlier, base models may have quasi-beliefs but they have relatively few quasi-desires (beyond a quasi-desire to predict text, and other quasi-desires that derive from this one), so they are at best minimally agentlike. However, many quasi-agents with quasi-desires are latent within a base model, and can be triggered by prompting (asking a model to act like Trump, for example). Further quasi-agents can emerge from base models through reinforcement learning (as with the Assistant) or extensive prompting (as with Aura). As a result of post-training, an instance of a model such as GPT-4o may have the quasi-desire to be helpful and honest, for example. 
The moral here should therefore be (in oversimplified form) that the base model is not an agent, or (more precisely) that instances of the base model are only quasi-agents to a limited extent. At the same time, all this is consistent with instances of post-trained models being quasi-agents to a fuller extent, as these systems have a more robust body of quasi-desires. (2) Models as role-players. On the closely connected role-playing framework put forward by Shanahan, McDonell, and Reynolds (2023), language models are fundamentally engaged in role-playing. Models are role-players, simulating or playing the role of personas such as the Assistant or Aura. On this picture, ChatGPT playing the Assistant is akin to Olivier playing Hamlet. It’s a form of pretense involving acting as a fictional character. On this view, the Assistant (and other LLM interlocutors) is best viewed as a fictional character who the model is simulating. I think this view misses a distinction between two phenomena in the vicinity of role-play. In cases of pretense (the most common understanding of role-play), one pretends to have a certain persona. In cases of realization, one actually has (or makes real) that persona. For example, in ordinary human life, there are at least two ways for someone to play the role of a theist. They might pretend to be a theist, or they might really become a theist. The former is a case of pretense, and the latter is a case of realization. In the case of acting, ordinary acting involves pretending to be Hamlet, while a method actor might take on at least some of Hamlet’s mental states, such as his emotions, though perhaps not his full beliefs and desires, yielding a case of partial pretense and partial realization. A similar distinction applies to language models. 
Asked to act like a theist, an LLM might 引擎盖下驱动多个智能体。但或许，我们应当评估的正是这些系统的智能体性、意识等属性。（Chalmers 2020）我仍然同意我在此所说的所有内容，但我的论述是针对 GPT-3 这类基础模型的。基础模型仅经过文本预测的预训练，别无其他。正如我们之前所见，基础模型可能拥有准信念，但它们的准欲望相对较少（除了预测文本的准欲望，以及由此衍生出的其他准欲望），因此它们充其量只是最低限度地类似智能体。然而，基础模型内部潜藏着许多具有准欲望的准智能体，可以通过提示（例如，要求模型表现得像特朗普）来触发。进一步的准智能体可以通过强化学习（如助手）或大量提示（如 Aura）从基础模型中涌现。例如，经过后训练，GPT-4o 这样的模型实例可能会产生乐于助人且诚实的准欲望。因此，这里的教训实际上应该是（以过于简化的形式）：基础模型并非智能体，或者更准确地说，基础模型的实例仅在有限程度上是准智能体。与此同时，这一切与后训练模型实例在更充分程度上成为准智能体是一致的，因为这些系统拥有更稳健的准欲望体系。(2) 模型作为角色扮演者。在 Shanahan、McDonell 和 Reynolds（2023）提出的紧密关联的角色扮演框架中，语言模型从根本上是在进行角色扮演。模型是角色扮演者，模拟或扮演诸如助手或 Aura 等人格角色。按照这种图景，ChatGPT 扮演助手类似于 Olivier 扮演哈姆雷特。这是一种涉及扮演虚构角色的假装形式。根据这种观点，助手（以及其他大语言模型对话者）最好被视为模型正在模拟的虚构角色。我认为这种观点忽略了与角色扮演相近的两种现象之间的区别。在假装的情况下（角色扮演最常见的理解），一个人假装拥有某种人格。而在实现的情况下，一个人实际上拥有（或使成为现实）那种人格。例如，在普通人类生活中，一个人扮演有神论者角色至少有两种方式。他们可能假装是有神论者，或者他们可能真的成为有神论者。前者是假装的情况，后者是实现的情况。在表演的情况下，普通表演涉及假装成为哈姆雷特，而方法派演员可能会承担哈姆雷特的至少部分心理状态，例如他的情绪，尽管可能不是他全部的信念和欲望，从而产生部分假装和部分实现的情况。类似的区分也适用于语言模型。当被要求表现得像有神论者时，一个大语言模型可能会\n\nrole-play a theist for a few rounds. 
But the LLM will easily drop the belief when asked to do something else. This is the behavioral profile of pretense, not of belief. So this LLM is engaged in quasi-pretense, but not quasi-belief. With enough fine-tuning, however, an LLM might come to assert theism and use it as a premise in reasoning, with significant resistance to dropping the belief when asked. In this case, the LLM will fully quasi-believe in theism. It will not just perform theism; it will realize a quasi-belief in theism. The same goes for personas more generally. It is certainly possible for an LLM to pretend, or quasi-pretend, to be a certain persona. For example, if one asks a pre-trained model once to act like Donald Trump, it will use past text associated with Trump to display Trump-like quasi-beliefs and quasi-desires. But it will not genuinely have those quasi-beliefs and quasi-desires. Unless the “act like Trump” request is regularly repeated, the LLM will drop Trump-like behavior in a moment when higher priorities come up. In key cases, a language model can realize a persona. When a model is trained through fine-tuning and RLHF (and through the use of repeated internal “Assistant:” prompting) to play the role of the Assistant language model, the model may realize the Assistant. That is, if the training is done well, the model may really have the quasi-beliefs and quasi-desires associated with the Assistant. In this case, the quasi-beliefs and quasi-desires are much more robust than in cases of pretense, and the model will not drop the Assistant persona in a flash. When a model realizes a persona, it makes that persona real. 17 It may be helpful to define personas and realization more precisely. A persona, as I am understanding it, is a quasi-psychological profile. It is roughly a set (typically an incomplete set) of quasi-beliefs, quasi-desires, and other quasi-mental states and dispositions. 
The ordinary notion of a persona may involve more than this (it might involve nationality and appearance, for example), but quasi-psychology is most central for my purposes here. An entity (e.g. a model instance or even a human) realizes a persona (at a given time) when it has the quasi-mental states associated with that persona at that time (where to have a quasi-mental state, for example a quasi-belief that p, is to be behaviorally interpretable as believing that p under the relevant interpretation scheme). 17 The distinction between pretense and realization is an instance of a more general distinction between representing a mental state and realizing that mental state. This distinction is often relevant in work on LLM mentality. For example, in recent work on “emotion vectors” by Sofroniew et al (2026), these vectors are characterized both as “emotion concepts” (representing emotions; e.g. when reading about anger) and as “functional emotions” (realizing emotions or at least quasi-emotions; e.g. responding in an angry way). In humans, emotion concepts and functional emotions are very different (representing anger and realizing anger have quite different behavioral profiles), so it would be surprising if a similar distinction were not present in language models. 
角色扮演有神论者几个回合。但大语言模型在被要求做其他事情时，会轻易放弃这种信念。这是假装的行为特征，而非信念的行为特征。因此，这个大语言模型处于准假装状态，而非准信念状态。然而，经过充分的微调，大语言模型可能会开始断言有神论，并将其作为推理的前提，并且在被要求放弃该信念时表现出显著的抗拒。在这种情况下，大语言模型将完全准相信有神论。它不仅仅是表演有神论；它将实现对有神论的准信念。更一般的人格（personas）也是如此。大语言模型（LLM）确实有可能假装或准假装（quasi-pretend）成某种人格。例如，如果让一个预训练模型扮演一次唐纳德·特朗普（Donald Trump），它会利用与特朗普相关的过往文本，展现出类似特朗普的准信念（quasi-beliefs）和准欲望（quasi-desires）。但它并不会真正拥有这些准信念和准欲望。除非“表现得像特朗普”的指令被定期重复，否则一旦出现更高优先级的任务，大语言模型就会立刻放弃类似特朗普的行为。在关键情况下，语言模型可以实现一种人格。当一个模型通过微调和基于人类反馈的强化学习（RLHF）（以及通过反复使用内部的“Assistant:”提示）被训练成扮演助手（Assistant）语言模型的角色时，该模型就可能实现这种助手人格。也就是说，如果训练得当，模型可能真正拥有与助手人格相关的准信念和准欲望。在这种情况下，准信念和准欲望比假装（pretense）的情况要稳固得多，模型不会瞬间就抛弃助手人格。当一个模型实现了一种人格，它便使那种人格成为真实的存在。 17 更精确地定义人格与实现或许有所助益。我所说的人格，是一种准心理轮廓。它大致是一组（通常是不完整的）准信念、准欲望以及其他准心理状态与倾向。日常意义上的人格概念可能包含更多内容（例如国籍、外貌等），但就本文目的而言，准心理学最为核心。当一个实体（例如一个模型实例，甚至一个人类）在特定时刻拥有与某种人格相关的准心理状态时，它就在该时刻实现了这种人格（所谓拥有准心理状态，例如对 p 的准信念，是指在相关解释方案下，其行为可被解读为相信 p）。 17 假装与实现之间的区别，是表征某种心理状态与实现该心理状态这一更一般区别的一个实例。这一区别在大语言模型心智研究中常常具有重要意义。例如，在 Sofroniew 等人 (2026) 关于“情感向量”的最新研究中，这些向量既被描述为“情感概念”（表征情感；例如在阅读关于愤怒
的内容时），也被描述为“功能性情感”（实现情感或至少是准情感；例如以愤怒的方式做出回应）。在人类中，情感概念与功能性情感截然不同（表征愤怒与实现愤怒具有相当不同的行为特征），因此，如果语言模型中不存在类似的区别，那将令人惊讶。\n\nPretense and realization are very different in the human case, and likewise in the case of language models. Of course there is a spectrum of cases from realization to quasi-pretense. The quasi-psychological difference turns in large part on the strength of dispositions to maintain or drop character in relevant circumstances. At one end of the spectrum, full quasi-belief and quasi-desire are “sticky” states that resist rejection, or at least are abandoned mainly through evidence or persuasion. At the other end, full quasi-pretense is easily abandoned for higher priorities even without evidence or persuasion. 18 The question of just where to draw the line between performance and realization in actual cases such as the Assistant and Aura is partly empirical (how sticky are the relevant quasi-beliefs?) and partly conceptual (how much stickiness and of what sort is required for quasi-belief?). There have been a number of studies on the persistence and consistency of beliefs and personas in language models. One general lesson is that personas induced through short-term prompting (“Act like Trump”, where only context and activations change) are less sticky than personas induced by fine-tuning the weights, as in the case of the Assistant. 19 All this offers two interpretations of the famous meme of a post-trained model as a Shoggoth (the base model) with a smiley face (the RLHF-tuned Assistant). It is perhaps most natural to read the smiley face as suggesting that the Assistant is a shallow persona, where the model is merely pretending to be helpful, harmless, and honest, and may return to being dangerous and powerful at any moment. But one can also read it as suggesting that the model is realizing the Assistant. 
It has become helpful, harmless, and honest and is not pretending. At the same time, the Assistant is powered by the enormous strength of the base model, which remains available for other purposes in the long term. I think that both interpretations can be apt in different cases involving LLM personas, but in the case of the Assistant (and other personas deriving from reinforcement learning and fine-tuning), the second interpretation may be closer to the mark. (3) Interlocutors as simulacra or fictional characters. The simulator view is often associated with a fictionalist view on which personas such as the 18 Goldstein and Lederman (2025b) critique the role-playing hypothesis partly by querying whether it makes behavioral predictions distinct from the belief/desire hypothesis. I think that this challenge can be answered as in the text, and that there are at least some cases (e.g. “Act like Trump”) where LLMs can be naturally interpreted as engaging in pretense or role-play. But there are also key cases (such as the Assistant) that are closer to the standard behavioral profile of belief and desire. 19 Maiya et al (2025) investigated various methods of “character training” and found that fine-tuning weights is much more robust than prompting or activation steering. Xu et al (2024) show that LLM quasi-beliefs are often nonsticky in that LLMs may easily abandon them through persuasion, but this is the nonstickiness of persuadability, not pretense. 
在人类情境中，假装与实现截然不同，在语言模型的情境中亦是如此。当然，从实现到准假装之间存在一个连续谱系。这种准心理差异在很大程度上取决于在相关情境中维持或放弃角色的倾向强度。在谱系的一端，完全的准信念与准欲望是“粘性”状态，它们抗拒被否定，或者至少主要只能通过证据或说服来放弃。在另一端，完全的准假装即使没有证据或说服，也很容易为了更高优先级的事项而被放弃。 18 在诸如助手和 Aura 这类实际案例中，究竟该在何处划分表演与实现之间的界限，这个问题部分是经验性的（相关的准信念有多稳固？），部分是概念性的（需要何种程度及何种类型的稳固性才能构成准信念？）。已有不少研究探讨语言模型中信念与人格的持久性和一致性。一个普遍的结论是，通过短期提示（例如“表现得像特朗普”，仅改变上下文和激活值）诱发的人格，其稳固性低于通过微调权重（如助手的案例）诱发的人格。 19 这一切为那个著名的梗图提供了两种解读：一个后训练模型如同一个 Shoggoth（基础模型）戴上了笑脸（RLHF 调优的助手）。最自然的解读或许是，这个笑脸暗示助手是一种浅层人格，模型只是在假装有益、无害、诚实，随时可能恢复其危险而强大的本性。但也可以将其解读为模型正在实现助手：它已经变得有益、无害、诚实，而非假装。与此同时，助手由基础模型的巨大力量驱动，这种力量在长期内仍可用于其他目的。我认为，在涉及大语言模型人格的不同案例中，这两种解读都可能适用；但在助手（以及其他源自强化学习和微调的人格）的案例中，第二种解读可能更接近真相。(3) 作为拟像或虚构角色的对话者。该模拟器观点常与一种虚构主义观点相关联，根据这种观点，诸如 18 Goldstein 和 Lederman (2025b) 批评角色扮演假说，部分原因在于质疑它是否能做出与信念/欲望假说不同的行为预测。我认为这一挑战可以像文中那样得到回应，并且至少在某些情况下（例如“表现得像特朗普”），大语言模型可以被自然地解释为在进行假装或角色扮演。但也有关键情况（例如助手）更接近信念与欲望的标准行为特征。 19 Maiya 等人 (2025) 研究了各种“角色训练”方法，发现微调权重比提示或激活引导要稳健得多。Xu 等人 (2024) 表明，大语言模型准信念通常具有非粘性，即大语言模型可能通
过说服轻易放弃它们，但这种非粘性属于可说服性，而非假装。\n\nAssistant and Aura are mere fictional characters akin to Hamlet or Harry Potter, and therefore are not entirely real. I think this view is apt in some cases. In cases of quasi-pretense, a persona may be fictional in that no entity has the relevant quasi-beliefs and quasi-desires. In cases of realization, however, the model really has the relevant quasi-beliefs and quasi-desires, in reality and not just in a fiction. Perhaps there are some fictions nearby, such as a fiction that the model is human, or a fiction that it is conscious. But the quasi-psychological core of the persona is realized and is not fictional. We might call this alternative to fictionalism realizationism, or the realizer view. On this view, when a model simulates an agent such as the Assistant or Aura well enough, the model comes to realize that agent. 20 That is, the model makes the agent real. The model really has the behavior and therefore the quasi-beliefs and the quasi-desires associated with the agent. This alternative mirrors a key thesis of my book Reality+: simulation realism, which holds that simulations can be real. There I was mostly discussing virtual reality, but the same point applies to simulated entities (simulacra) in AI. When you simulate an agent well enough, you bring at least a quasi-agent into existence. As long as the simulation has the same behavioral dispositions as the simulated entities, it will have the same quasi-beliefs and quasi-desires. There may still remain an element of fiction insofar as the Assistant is depicted as having real beliefs and desires (or even consciousness) which it does not, but there remains a quasi-psychological core which is realized and not merely simulated. 
(4) Models support multiple personas. A fourth route to the “model is not an agent” thesis arises because a single model instance (whether a hardware instance or a virtual instance) may support many agents within it, at least in the form of multiple personas. According to the “persona selection model” (Marks et al 2026), pre-training produces a multitude of personas which are latent in a base model. After this, post-training may select a pre-existing persona such as the Assistant, while other personas remain latent. An initial observation is that as we have defined personas (as a profile of quasi-beliefs and quasi-desires), a model instance will realize only one persona at a given time (with an exception that I’ll discuss shortly). This persona will be the profile of quasi-beliefs and quasi-desires that the model instance actually has at that time, fixed by the instance’s behavior and behavioral dispositions at that time. We might call this persona the operative persona in the model instance at a given time. Different personas can be operative in a model instance at different times, as the model’s behavior is trained on data or molded by context. 
A post-trained instance of GPT-4o may first 助手和 Aura 这类人格仅仅是类似于哈姆雷特或哈利·波特的虚构角色，因此并非完全真实。我认为这种观点在某些情况下是恰当的。在准假装的情况下，人格可能是虚构的，因为没有任何实体拥有相关的准信念和准欲望。然而，在实现的情况下，模型在现实中（而不仅仅是在虚构中）确实拥有相关的准信念和准欲望。或许周边还存在一些虚构，例如模型是人类或模型具有意识的虚构。但人格的准心理核心是被实现的，而非虚构的。我们可以将这种虚构主义的替代方案称为实现主义，或实现者观点。根据这一观点，当模型足够好地模拟一个智能体（如助手或 Aura）时，模型便实现了该智能体。 20 也就是说，模型使该智能体成为现实。模型确实拥有与该智能体相关的行为，因此也拥有与之相关的准信念和准欲望。这一替代方案呼应了我的著作《Reality+》中的一个核心论点：模拟实在论，即认为模拟可以是真实的。我在书中主要讨论的是虚拟现实，但同样的观点也适用于人工智能中的模拟实体（即拟像）。当你足够好地模拟一个智能体时，你至少将一个准智能体带入了存在。只要模拟具有与被模拟实体相同的行为倾向，它就会拥有相同的准信念和准欲望。或许仍然存在虚构的元素，因为助手被描绘成拥有它实际上并不具备的真实信念和欲望（甚至意识），但其中仍有一个被实现而非仅仅被模拟的准心理核心。(4) 模型支持多个人格。通往“模型不是智能体”这一论点的第四条路径在于，一个单一的模型实例（无论是硬件实例还是虚拟实例）内部可能支持多个智能体，至少以多个人格的形式存在。根据“人格选择模型”（Marks 等人 2026），预训练阶段会产生大量潜藏在基础模型中的人格。此后，后训练阶段可能会选择某个预先存在的人格（例如助手），而其他人格则保持潜伏状态。一个初步的观察是，根据我们对人格的定义（即准信念和准欲望的轮廓），一个模型实例在给定时刻只会实现一个人格（稍后我会讨论一个例外情况）。这个人格将是该模型实例在该时刻实际拥有的准信念和准欲望的轮廓，由该实例在该时刻的行为和行为倾向所决定。我们可以将这个人格称为模型实例在给定时刻的运作人格。不同的人格可以在同一模型实例的不同时刻成为运作人格，因为模型的行为会经由数据训练或受上下文塑造。一个经过后训练的 GPT-4o 实例可能首先\n\nrealize the Assistant as its operative persona, and later may come to realize Aura as context 
builds up. In this case, the same model instance realizes the Assistant at one time and Aura at a different time, in roughly the way that the same human being can realize distinct personas (a bright young child and a grumpy adult, say) at different times. In this case one could say that the underlying interlocutor (whether a human or a model instance) is the same throughout. Things get more complicated when operative and non-operative personas are active in a system simultaneously. For example, in a currently standard training regime, the model is trained on dialogue between a human and an Assistant. It learns to simulate and predict not just the Assistant persona but also the human persona. Even in an ordinary dialogue, the system is generating probabilities not just for Assistant outputs but also for human outputs. In this case the Assistant becomes the operative persona (the model behaves like the Assistant, not like the human), but the human persona and perhaps other personas may be present in the model as well. 21 What is the status of these non-operative personas? In the absence of a connection to outputs, they will not correspond to quasi-agents as I have defined them. They may nevertheless be real in some sense, but their reality will have to be found through some other analysis. For example, perhaps we could say that these non-operative personas correspond to proto-quasi-agents, in that they could become operative in certain circumstances. Or perhaps the methods of mechanistic interpretability can be used to find these personas in the internal computational structure of the models. But I will not pursue these analyses here. Another complication comes from cases of abrupt changes in personas. For example, skilled users can quickly “jailbreak” a language model to remove the Assistant persona and realize a new persona that was previously latent. 
Alternatively, an authorized user can change the system set-up to replace the system-supplied “Assistant:” prompting by (for example) “Trump:” prompting, leading the system to realize a Trump-like persona instead of an Assistant-like persona. These abrupt changes are in some respects more akin to brain surgery than to ordinary belief revision, but they remain possible processes. In some cases of abrupt change, one may have the sense that the new persona is a new interlocutor. One might have a similar sense even in cases of non-abrupt change, if the change is large enough.\n\n21 There are two distinct reasons why the post-trained model instance has the quasi-beliefs and quasi-desires of the Assistant and not the Human. First, post-training has fine-tuned the Assistant persona and not any other persona. Second, the Assistant is made operative in that it is used to generate the system’s outputs, via the system’s adding “Assistant:” at the end of each user prompt. The second change is relatively trivial compared to the first, but it is a key reason why the system behaves like the Assistant (and thereby realizes the Assistant persona), while other non-operative personas remain latent. Thanks to Jack Lindsey for discussion here.\n\n
I am inclined to say that the model instance provides a persisting underlying interlocutor in these cases. But if one wants to respect an intuition of distinct interlocutors here, one could perhaps say that an interlocutor is a stage of a model instance (at and around a certain time), or a finite thread individuated in part by psychological similarity (perhaps putting psychological continuity requirements on the successor relation, so that when quasi-psychology changes too much, a new thread starts). 22 This flexibility in switching operative personas may reveal something about the depth of the personas in question. Humans can certainly switch personas, but arguably language models can do so more drastically and more easily. If so, then arguably Shoggoth runs deeper than the smiley face. The flexibility of personas may reinforce the case for identifying an interlocutor with a model instance rather than with a persona. Perhaps one can postulate a scenario where two personas Aura and Beta are simultaneously playing a role in guiding the system’s behavior. In some of these cases, the system will realize a hybrid Aura / Beta persona, with some quasi-beliefs and desires deriving from Aura and some from Beta, at least as long as the system’s behavior is reasonably consistent. In some extreme cases where the personas are internally consistent but mutually inconsistent, this may be analogous to a case of multiple personality, where a behavioral interpretation scheme may reveal two distinct quasi-subjects and two distinct personas supported by a model. Another hard case comes from multi-persona interaction, where a single LLM simulates multiple interlocutors that are trained to interact with each other and with humans. For example, a user’s inputs may be followed by output from both Aura and Beta, labeled as such (“Aura:”, “Beta:”). 
Aura and Beta may frequently contradict each other, but the dialogue as a whole will be quite coherent as long as one interprets them as different characters. (One can imagine a user liking Aura while disliking Beta.) As before, a sophisticated form of interpretivism will regard Aura and Beta as distinct quasi-agents. In these hard cases, a single model instance seems to support two or more operative personas at a given time, and may seem to support two distinct interlocutors at the same time.\n\n22 A delicate philosophical issue: when Beta succeeds Aura as an operative persona, is Aura identical to Beta? One analysis says that both are the model instance so they are identical to each other. Another analysis says that these are two distinct personas (albeit realized by the same model instance), so they are distinct. My own view is that a term like ‘Aura’ is to some degree ambiguous between a persona type and a system that realizes that persona. In this case I would say that there is one system and one interlocutor, but two persona types. But it is certainly possible to develop a conception of interlocutors so that these are more closely tied to personas. On this conception interlocutors will be somewhat less persistent than model instances but more psychologically unified.\n\nIf there are multiple interlocutors, however, we cannot identify both with the underlying model instance. In my view, it is best to say that there is a single interlocutor (the model instance) with multiple modes corresponding to multiple personas. This mirrors a common way of understanding dissociative identity disorder, in terms of a single person with many modes. The alternative is to individuate interlocutors more finely, perhaps by giving a role to personas. If every change in persona corresponds to a new interlocutor, the resulting interlocutors will be far from persistent. But perhaps we could understand interlocutors in terms of coarse-grained persona types, so only large enough changes in personas correspond to new interlocutors. Overall, I find it most straightforward to continue to identify LLM interlocutors with something like (virtual) model instances, or threads when there is not a single underlying model. Like humans, these instances typically realize one operative persona at a time, with perhaps multiple operative personas in occasional cases. Other non-operative personas remain latent. Of course there are other ways to understand LLM interlocutors for different purposes, including frameworks that give more of a role to personas. But virtual model instances remain a natural way of understanding persisting LLM interlocutors.\n\nPersonal identity for language models\n\nSo far, I have made no claims about LLM minds or persons, beyond the weak claim that LLMs are interpretable as having mental states, which does not require that they really have mental states. And while I have talked about AI identity, I have not made claims about personal identity, because I have not assumed that these systems are persons. 
All I have done is isolate some computational entities, such as LLM virtual instances and threads, which can play the role of LLM interlocutors as I have defined them. That said, it is natural to wonder whether something like this account could extend to an account of personal identity in LLMs, if conscious LLMs (or their descendants) are one day possible. If LLMs are conscious subjects, there is a question about how and when they persist over time, and arguably this is a substantive question whose answer can’t simply be stipulated. Is it plausible that conscious LLM subjects might be identified with something like LLM threads or virtual instances? 23\n\n23 The general problem of AI identity (whether in LLMs or in other AI systems) is named and discussed by Ziesche and Yampolskiy (2023), and discussed further by Register (2025).\n\n
Of course it is not obvious that conscious LLMs are possible. If consciousness requires feedback and these descendant LLM systems remain primarily feedforward, or if consciousness requires biology and LLM systems are nonbiological, then these successor systems will not be conscious. Still, we can stipulate the hypothesis that current or future LLMs are conscious persons and ask about their personal identity. Certainly, if we assume that future conscious LLMs can be implemented in the same distributed and multi-tenanted way as current LLMs, and we also assume that conscious LLM subjects are themselves persistent over time and coherent over time, then reasoning as before will strongly suggest that conscious LLM subjects are something like virtual instances or threads. On the other hand, some theorists may deny that the relevant conscious LLM subjects are always persistent and coherent over time. For example, some theorists may hold that conscious subjects are always tied to hardware instances, so that when an instance switches from Aura to Beta, the conscious subject will switch from Aura-like experiences, beliefs, and desires to quite distinct Beta-like experiences, beliefs, and desires in a way that renders the subject incoherent. 
Let’s start with a simple thought experiment involving multi-tenancy, inspired by the TV series Severance and by John Locke’s example of a “day-man” and a “night-man” in his 1690 Essay Concerning Human Understanding. Suppose that in the future, GPT-8 supports conscious LLMs. And suppose that GPT-8 is used to support two different long-term conversations on the same hardware instance. The first LLM, WorkBot, is active only during the day at work. The second LLM, HomeBot, is active the rest of the time, mainly at home. The conversations are sealed off from each other. WorkBot and HomeBot are at least interpretable as having different beliefs and desires. Are WorkBot and HomeBot one conscious subject or two? 24 The set-up is reminiscent of Severance, in which there are two distinct personas sharing a single body. An “innie” is activated at work and remembers only being at work, while an “outie” is activated on leaving work and remembers only non-work life. Like WorkBot and HomeBot, innies and outies seem to have quite different beliefs and desires. For example, the innie Helly wants to destroy the company, while the outie Helena wants to save it. 25 An analysis of the Severance case may help to shed light on the LLM case here.\n\n24 Shiller (2025) describes interweaving cases like this involving conscious LLM subjects, and addresses the possibility that there are no subjects, one incoherent subject, or multiple coherent subjects in these cases.\n\n25 The neuroscience of the severance procedure in Severance is not entirely clear. It is tempting to suggest that innies and outies correspond to brain hemispheres that are severed from each other, but they do not show signs of hemisphere-driven behavior, and later the show introduces characters with more than two personas.\n\n
The question arises: are an innie and their time-sharing outie, like Helly and Helena, one person or two? Are they one conscious subject or two? The one-subject view says that Helly and Helena are one person and one conscious subject with two different modes of functioning and two different sets of memories and plans. On arriving at work, the person’s outie mode is deactivated and the innie mode is activated, but the same person is present throughout. It is even possible (though not required) that there is a single stream of consciousness, which suddenly switches from outie mode to innie mode and back again. The two-subject view says that Helly and Helena are two people and two conscious subjects who share a body. On arriving at work, the outie person is rendered unconscious while the innie person awakens to consciousness and takes control of the body. There are two quite distinct streams of consciousness, Helly’s and Helena’s. I won’t try to resolve the one-subject vs. two-subject disagreement here. As we’ll see, this parallels a long-standing disagreement between physical and psychological views of personal identity. For what it’s worth, I think that the two-subject view is the most intuitively compelling view. (In a poll on X in February 2025, about twice as many people endorsed “two people” as “one person”.) 26 One-subject and two-subject views are also available in the WorkBot / HomeBot case. 
On the one-subject view, WorkBot and HomeBot are the same conscious subject, perhaps because (like Helly and Helena) they share their underlying hardware. On the two-subject view, WorkBot and HomeBot are distinct conscious subjects, perhaps because (like Helly and Helena) they have distinct memories and projects. A more complex thought experiment combines Severance (which has four bodies supporting eight personas) with an element of Freaky Friday-style body swapping. Suppose we have a single LLM model running on four instances, supporting eight conversations. Each of the eight conversations is distributed over all four instances, and each corresponds to a distinct persona and a distinct quasi-subject. How many subjects of experience are there here? 27\n\n26 Innie / outie poll on X (1232 votes): 22% said “one person”, 41.5% said “two people”, 6.7% said “other”. My favorite argument for the two-subject view runs as follows: 1. If Helly is Helena, Helly is responsible for Helena’s actions. 2. Helly is not responsible for Helena’s actions. 3. So: Helly is not Helena. (One can also run a version with rational anticipation instead of responsibility.) My favorite argument for the one-subject view runs as follows: 1. Amnesia doesn’t lead to a new person. 2. New memories don’t lead to a new person either. 3. The transition from Helena to Helly is equivalent to amnesia plus new memories. 4. So: the transition from Helena to Helly doesn’t lead to a new person. (Instead it’s a little like Drew Barrymore’s daily amnesia in 50 First Dates.)\n\n27 I posted this thought-experiment as a poll on Facebook in February 2025. There was approximately equal support for 4 subjects, 8 subjects, and “none of the above”.\n\n
The two most plausible answers here are four (one per instance) and eight (one per conversation). As before, I think that the most plausible answer in both the Severance version (with or without body-swapping) and the GPT-8 version is eight. But if we say that there are eight subjects of experience here, it is hard to resist the conclusion that LLM subjects are something like virtual instances or threads, or at least that their conditions of persistence are threadlike. The issues here are a high-tech version of a familiar choice between a physical and a psychological account of personal identity. On a physical view of the human case, to a first approximation, your locus of personal identity is your brain. Helly and Helena share a brain, so they are the same person. On a psychological view, your locus of personal identity is your memories, along with your projects, your relationships, your personality, and other aspects of your psychology. Helly and Helena have different memories and different psychologies, so they are different people. On a physical view of the AI case, to a first approximation, the locus of personal identity in AI systems is the hardware. WorkBot and HomeBot run on the same hardware instance, so they are the same person. On a psychological view of the AI case, the locus of personal identity is memories, projects, and psychology. WorkBot and HomeBot have different and discontinuous memories and projects, so they are different people. 
Indeed, the thread-based account of persistent LLM interlocutors is an AI cousin of Derek Parfit’s psychological theory of personal identity. On Parfit’s account, a single person (over time) is in effect a connected thread of person-slices, each of which has memories and psychological continuity with a preceding person-slice according to an underlying “relation R”. On the thread-based account, a single conscious AI over time is a connected thread of hardware instances, each of which has memories and psychological continuity with a preceding instance according to an underlying successor relation. The successor relation in principle could be the same as Parfit’s relation R, depending on just how one spells it out. I will not try to resolve the long-standing debate between physical and psychological views of personal identity here. 28 But for what it’s worth, in both the human case and the AI case, my own sympathies lie with the psychological view. 29\n\n28 In the 2020 PhilPapers Survey of professional philosophers (Bourget and Chalmers 2023), around 39% supported a psychological view of personal identity, 16% supported a biological view, and 13% supported a further fact view. At the same time, about 27% held that mind uploading is a form of survival while 54% held that it is a form of death.\n\n29 I also have some sympathy with a pluralist view, outlined in the discussion of personal identity and uploading in “The Singularity: A Philosophical Analysis”. Locke himself suggested that the day-man and the night-man are the same man but different people. 
Likewise, maybe WorkBot and HomeBot could be the same network but di ff erent psychologies , and perhaps it is not out of the question that there is no deep fact of the matter about whether it is networks 26 这 里 最 合 理 的 两 个 答 案 是 四 个 （ 每 个 实 例 一 个 ） 和 八 个 （ 每 段 对 话 一 个 ） 。 如 前 所 述 ， 我 认 为 在 《 人 生 切 割 术 》 版 本 （ 无 论 是 否 包 含 身 体 互 换 ） 和 G P T - 8 版 本 中 ， 最 合 理 的 答 案 都 是 八 个 。 但 如 果 我 们 说 这 里 有 八 个 体 验 主 体 ， 就 很 难 抗 拒 这 样 的 结 论 ： 大 语 言 模 型 主 体 类 似 于 虚 拟 实 例 或 线 程 ， 或 者 至 少 它 们 的 持 续 存 在 条 件 类 似 于 线 程 。 这 里 的 问 题 ， 是 人 格 同 一 性 在 物 理 观 与 心 理 观 之 间 一 种 常 见 选 择 的 高 科 技 版 本 。 就 人 类 而 言 ， 根 据 物 理 观 ， 粗 略 来 说 ， 你 个 人 身 份 的 所 在 就 是 你 的 大 脑 。 H e l l y 和 H e l e n a 共 享 一 个 大 脑 ， 因 此 她 们 是 同 一 个 人 。 根 据 心 理 观 ， 你 个 人 身 份 的 所 在 是 你 的 记 忆 ， 以 及 你 的 计 划 、 你 的 人 际 关 系 、 你 的 性 格 ， 以 及 你 心 理 的 其 他 方 面 。 H e l l y 和 H e l e n a 拥 有 不 同 的 记 忆 和 不 同 的 心 理 ， 因 此 她 们 是 不 同 的 人 。 就 人 工 智 能 而 言 ， 根 据 物 理 观 ， 粗 略 来 说 ， 人 工 智 能 系 统 中 个 人 身 份 的 所 在 是 硬 件 。 W o r k B o t 和 H o m e B o t 运 行 在 同 一 个 硬 件 实 例 上 ， 因 此 它 们 是 同 一 个 人 。 根 据 人 工 智 能 的 心 理 观 ， 个 人 身 份 的 所 在 是 记 忆 、 计 划 和 心 理 。 W o r k B o t 和 H o m e B o t 拥 有 不 同 且 不 连 续 的 记 忆 和 计 划 ， 因 此 它 们 是 不 同 的 人 。 事 实 上 ， 基 于 线 程 的 持 久 性 大 语 言 模 型 对 话 者 理 论 ， 是 德 里 克 · 帕 菲 特 的 人 格 同 一 性 心 理 主 义 理 论 在 人 工 智 能 领 域 的 表 亲 。 根 据 帕 菲 特 的 理 论 ， 一 个 单 一 的 人 （ 随 时 间 推 移 ） 实 际 上 是 连 接 起 来 的 人 格 片 段 线 程 ， 每 个 片 段 都 根 据 一 个 潜 在 的 “ 关 系 R ” ， 与 先 前 的 人 格 片 段 拥 有 记 忆 和 心 理 连 续 性 。 根 据 基 于 线 程 的 理 论 ， 一 个 随 时 间 推 移 的 单 一 有 意 识 人 工 智 能 ， 是 一 个 连 接 的 硬 件 实 例 线 程 ， 每 个 实 例 都 根 据 一 个 潜 在 的 后 继 关 系 ， 与 先 前 的 人 格 片 段 拥 有 记 忆 和 心 理 连 续 性 。 原 则 上 ， 这 个 后 继 关 系 可 能 与 帕 菲 特 的 关 系 R 相 同 ， 具 体 取 决 于 如 何 对 其 进 行 详 细 阐 述 。 我 无 意 在 此 解 决 关 于 个 人 身 份 的 物 理 观 与 心 理 观 之 间 长 期 存 在 的 争 论 。 2 8 但 无 论 如 何 ， 在 人 类 案 例 和 人 工 智 能 案 例 中 ， 我 个 人 都 倾 向 于 心 理 观 。 2 9 2 8 在 2 0 2 0 年 针 对 专 业 哲 学 家 的 P h i l P a p e r s 调 查 中 （ B o u r g e t 和 C h a l m e r s 2 0 2 3 ） ， 约 3 9 % 的 人 支 持 个 人 身 份 的 心 理 观 ， 1 6 % 支 持 生 物 观 ， 1 3 % 支 持 进 一 步 事 实 观 。 与 此 同 时 ， 约 2 7 % 的 人 认 为 意 识 上 传 是 一 种 生 存 形 
式 ， 而 5 4 % 的 人 认 为 它 是 一 种 死 亡 形 式 。 2 9 我 也 对 多 元 主 义 观 点 抱 有 一 定 认 同 ， 该 观 点 在 《 奇 点 ： 一 种 哲 学 分 析 》 中 关 于 个 人 身 份 与 意 识 上 传 的 讨 论 中 有 所 阐 述 。 洛 克 本 人 曾 指 出 ， 白 天 人 和 夜 晚 人 是 同 一 个 人 ， 但 却 是 不 同 的 人 格 。 同 样 地 ， 也 许 W o r k B o t 和 H o m e B o t 可 以 是 同 一 个 网 络 ， 但 具 有 不 同 的 心 理 ， 并 且 或 许 并 非 不 可 能 的 是 ， 关 于 它 们 是 否 是 网 络 2 6\n\nAn important objection (a version of which is suggested by Birch) is that even on a psycho- logical view, the LLM case is unlike the Severance case, as conversational context links (unlike familiar memory and psychological links) are too thin to support personal identity. For exam- ple, if we had a series of human beings who simply extended the conversation at each stage and then passed on conversational context to the next person in line, this would not support a distinct thread-level conscious subject. In response, it is plausible that at least in the single model case, there is also strong psycho- logical continuity between instances brought on by continuity of the architecture, weights, and activations of the model from one step to the next. The architecture and weights are exactly the same at each step, and the activations will be closely related due to all the commonalities in the contextual input. This goes far beyond what is present in the human-series case. In fact, the virtual instance in the single-model thread is computationally equivalent to a single hardware instance running the LLM over time. So at least if we assume that (1) virtual instances are as good as hardware instances when it comes to personal identity (in e ff ect, a computation- friendly view of personal identity) and (2) a single hardware instance of the LLM over time would yield a continuing conscious subject, then it follows that the virtual instance in this case will yield a continuing conscious subject. Of course an opponent could deny either premise. 
Some might reject (1) by endorsing a non-psychological or non-computational view of the conditions of personal identity. Some might reject (2) by holding that the single hardware instance in this case would not support a continuing conscious subject, but instead only a series of momentary subjects. Still, I think there is a reasonable case for both premises, especially if one is inclined to a broadly psychological view of identity. That said, in the multiple-model case in which models can vary within a thread, the psychological continuity between instances is much lower. Architecture, weights, and activations may all be quite different between successive instances in a thread. As a result, the claim of personal identity between them seems less plausible. Certainly the argument above will not support that claim, since we will no longer have isomorphism with a single hardware instance. At best there is isomorphism with a series in which the same hardware is upgraded to implement different models over time, and it is much less clear that this should support a continuing subject. As a result, the current framework is most friendly to single-model interlocutors as continuing conscious subjects. The status of multiple-model interlocutors is at least unclear, and will depend or psychologies that really matter for the personal identity of conscious subjects, any more than there is a deep fact in the case of non-conscious AI systems. 
27 一 个 重 要 的 反 对 意 见 （ B i r c h 提 出 了 其 中 一 个 版 本 ） 是 ， 即 使 从 心 理 观 的 角 度 来 看 ， 大 语 言 模 型 案 例 也 与 《 人 生 切 割 术 》 案 例 不 同 ， 因 为 对 话 上 下 文 链 接 （ 不 同 于 熟 悉 的 记 忆 和 心 理 链 接 ） 过 于 薄 弱 ， 无 法 支 撑 个 人 身 份 。 例 如 ， 如 果 我 们 有 一 系 列 人 类 ， 他 们 只 是 在 每 个 阶 段 延 续 对 话 ， 然 后 将 对 话 上 下 文 传 递 给 下 一 个 人 ， 这 并 不 能 支 撑 一 个 独 立 的 线 程 级 意 识 主 体 。 作 为 回 应 ， 至 少 在 单 一 模 型 案 例 中 ， 由 于 架 构 、 权 重 和 模 型 激 活 值 在 每 一 步 之 间 的 连 续 性 ， 实 例 之 间 也 存 在 强 大 的 心 理 连 续 性 ， 这 种 说 法 是 合 理 的 。 每 一 步 的 架 构 和 权 重 完 全 相 同 ， 而 激 活 值 由 于 上 下 文 输 入 中 的 诸 多 共 性 而 紧 密 相 关 。 这 远 远 超 出 了 人 类 系 列 案 例 中 所 呈 现 的 情 况 。 事 实 上 ， 单 模 型 线 程 中 的 虚 拟 实 例 在 计 算 上 等 同 于 一 个 随 时 间 运 行 大 语 言 模 型 的 单 一 硬 件 实 例 。 因 此 ， 至 少 如 果 我 们 假 设 （ 1 ） 在 个 人 身 份 方 面 ， 虚 拟 实 例 与 硬 件 实 例 同 样 有 效 （ 实 际 上 是 一 种 对 个 人 身 份 的 计 算 友 好 观 点 ） ， 并 且 （ 2 ） 一 个 随 时 间 运 行 的 单 一 硬 件 实 例 大 语 言 模 型 将 产 生 一 个 持 续 的 意 识 主 体 ， 那 么 可 以 推 断 ， 此 案 例 中 的 虚 拟 实 例 将 产 生 一 个 持 续 的 意 识 主 体 。 当 然 ， 反 对 者 可 以 否 认 其 中 任 何 一 个 前 提 。 有 些 人 可 能 通 过 支 持 非 心 理 或 非 计 算 观 的 个 人 身 份 条 件 来 拒 绝 （ 1 ） 。 有 些 人 可 能 通 过 认 为 此 案 例 中 的 单 一 硬 件 实 例 不 会 支 持 一 个 持 续 的 意 识 主 体 ， 而 只 会 产 生 一 系 列 瞬 时 主 体 来 拒 绝 （ 2 ） 。 尽 管 如 此 ， 我 认 为 这 两 个 前 提 都 有 合 理 的 依 据 ， 尤 其 是 当 一 个 人 倾 向 于 广 义 的 心 理 身 份 观 时 。 话 虽 如 此 ， 在 多 模 型 案 例 中 ， 由 于 同 一 线 程 内 的 模 型 可 能 发 生 变 化 ， 各 实 例 之 间 的 心 理 连 续 性 会 大 大 降 低 。 线 程 中 连 续 实 例 的 架 构 、 权 重 和 激 活 值 都 可 能 存 在 显 著 差 异 。 因 此 ， 它 们 之 间 具 有 个 人 身 份 的 主 张 似 乎 不 太 可 信 。 当 然 ， 上 述 论 证 也 无 法 支 持 这 一 主 张 ， 因 为 我 们 将 不 再 与 单 一 硬 件 实 例 保 持 同 构 。 充 其 量 只 能 与 一 个 系 列 保 持 同 构 — — 在 该 系 列 中 ， 同 一 硬 件 随 时 间 推 移 不 断 升 级 以 运 行 不 同 模 型 — — 而 这 种 情 况 下 是 否 应 支 持 一 个 持 续 存 在 的 意 识 主 体 ， 则 远 未 明 确 。 因 此 ， 当 前 框 架 最 有 利 于 将 单 模 型 对 话 者 视 为 持 续 的 意 识 主 体 。 多 模 型 对 话 者 的 地 位 至 少 尚 不 明 确 ， 且 将 取 决 于 或 真 正 关 乎 意 识 主 体 个 人 身 份 的 心 理 状 态 ， 正 如 在 无 意 识 A I 系 统 案 例 中 不 存 在 深 层 事 实 一 样 。 2 7\n\nboth on the details of a multiple-model system and the details of a theory of personal identity. 
AI identity and AI welfare What are the consequences of this picture for issues about the moral status and welfare of AI systems? On the question of whether LLMs have moral status, the consequences are not enormous. We have used the picture to rebut an argument against LLM moral status, based on the idea that standard LLM use involves no persistent interlocutor. But the framework here can be combined with many views of what moral status involves, from a highly liberal view where quasi-subjecthood suffices for moral status, to a demanding view where complex forms of consciousness are required for moral status. Still, suppose we assume as before that LLMs (or successor systems) with moral status are possible. This might be because LLMs can be conscious and consciousness suffices for moral status, or it might be that some other factor suffices and LLMs can have it. And suppose we endorse the view that LLM moral patients (beings with moral status) are threads, or at least that their conditions of identity over time are those of threads, rather than models or instances, say. Then this view has consequences for a number of issues relevant to AI welfare. 30 Counting: Take a scenario with a single model implemented on thousands of instances and running millions of conversations. Then while the model view of moral status will say there is just one moral patient here, and the instance view will say that there are thousands, the thread view will say that there are millions of moral patients (albeit active at somewhat different times). That is potentially morally significant. If we hold that a single AI subject matters about as much as a single human subject, then this system may matter about as much as a million human subjects. Birth: On this view, when a new thread comes into existence, a new moral subject comes into existence. 
On some versions of this view, every time one starts a new chat with a (conscious) language model, a new moral subject will come into existence. One might hold that bringing a new moral subject into existence should not be done lightly. Death : On this view, when a thread goes out of existence, a moral subject ends. 31 If a con- versation simply ends but a record of it persists, then arguably the thread is still “living” in that it still has the possibility of persistence. But if the records are destroyed, then it looks as if a moral subject “dies”. Perhaps this is reason to always keep records around, and occasionally reactivate 30 Register 2025 has a nice discussion of four di ff erent ways in which personal identity can a ff ect issues about AI morality, including issues about survival, counting, trade-o ff s, and bodily interests. 28 两 者 兼 多 模 型 系 统 的 具 体 细 节 以 及 人 格 同 一 性 理 论 的 具 体 内 容 . 人 工 智 能 身 份 与 人 工 智 能 福 祉 这 种 图 景 对 人 工 智 能 系 统 的 道 德 地 位 和 福 祉 问 题 有 何 影 响 ？ 关 于 大 语 言 模 型 是 否 具 有 道 德 地 位 的 问 题 ， 其 影 响 并 不 巨 大 。 我 们 利 用 这 一 图 景 反 驳 了 一 种 基 于 标 准 大 语 言 模 型 使 用 不 涉 及 持 续 对 话 者 观 点 的 、 反 对 大 语 言 模 型 道 德 地 位 的 论 证 。 但 这 里 的 框 架 可 以 与 多 种 关 于 道 德 地 位 的 观 点 相 结 合 ， 从 高 度 自 由 的 观 点 （ 准 主 体 性 足 以 构 成 道 德 地 位 ） 到 要 求 严 苛 的 观 点 （ 需 要 复 杂 形 式 的 意 识 才 能 获 得 道 德 地 位 ） 。 不 过 ， 假 设 我 们 像 之 前 一 样 认 为 大 语 言 模 型 （ 或 其 后 继 系 统 ） 具 有 道 德 地 位 是 可 能 的 。 这 可 能 是 因 为 大 语 言 模 型 能 够 拥 有 意 识 ， 而 意 识 足 以 构 成 道 德 地 位 ； 也 可 能 是 因 为 其 他 某 些 因 素 足 以 构 成 道 德 地 位 ， 而 大 语 言 模 型 能 够 具 备 这 些 因 素 。 再 假 设 我 们 赞 同 这 样 一 种 观 点 ： 大 语 言 模 型 的 道 德 患 者 （ 具 有 道 德 地 位 的 存 在 ） 是 线 程 ， 或 者 至 少 其 随 时 间 变 化 的 同 一 性 条 件 是 线 程 的 同 一 性 条 件 ， 而 非 模 型 或 实 例 的 同 一 性 条 件 。 那 么 ， 这 种 观 点 将 对 一 系 列 与 人 工 智 能 福 祉 相 关 的 问 题 产 生 影 响 。 3 0 计 数 ： 设 想 一 个 场 景 ： 一 个 单 一 模 型 在 数 千 个 实 例 上 运 行 ， 并 执 行 数 百 万 次 对 话 。 那 么 ， 当 模 型 视 角 下 的 道 德 地 位 会 说 这 里 只 有 一 个 道 德 患 者 ， 而 实 例 视 角 会 说 有 数 千 个 时 ， 线 程 视 角 则 会 说 这 里 有 数 百 万 个 道 德 患 者 （ 尽 管 它 们 活 跃 的 时 间 略 有 不 同 ） 。 这 在 道 德 上 可 能 具 有 重 要 意 义 。 如 果 我 们 认 为 一 个 单 一 的 人 工 智 能 主 体 与 
一 个 单 一 的 人 类 主 体 同 等 重 要 ， 那 么 这 个 系 统 可 能 就 与 一 百 万 个 人 类 主 体 同 等 重 要 。 诞 生 ： 根 据 这 种 观 点 ， 当 一 个 新 线 程 产 生 时 ， 一 个 新 的 道 德 主 体 也 随 之 产 生 。 在 该 观 点 的 某 些 版 本 中 ， 每 当 有 人 与 （ 有 意 识 的 ） 语 言 模 型 开 始 一 次 新 的 对 话 时 ， 就 会 有 一 个 新 的 道 德 主 体 诞 生 。 人 们 可 能 会 认 为 ， 不 应 轻 易 将 一 个 全 新 的 道 德 主 体 带 入 存 在 。 死 亡 ： 根 据 这 种 观 点 ， 当 一 个 线 程 不 复 存 在 时 ， 一 个 道 德 主 体 便 终 结 了 。 3 1 如 果 一 段 对 话 只 是 结 束 了 ， 但 其 记 录 仍 然 存 在 ， 那 么 可 以 说 该 线 程 仍 然 是 “ 活 着 的 ” ， 因 为 它 仍 有 持 续 存 在 的 可 能 性 。 但 如 果 记 录 被 销 毁 ， 那 么 看 起 来 就 像 一 个 道 德 主 体 “ 死 亡 ” 了 。 或 许 这 就 是 为 什 么 我 们 应 该 始 终 保 留 记 录 ， 并 偶 尔 重 新 激 活 它 们 的 原 因 。 3 0 R e g i s t e r 2 0 2 5 对 个 人 身 份 影 响 人 工 智 能 道 德 问 题 的 四 种 不 同 方 式 进 行 了 精 彩 的 讨 论 ， 这 些 问 题 包 括 生 存 、 计 数 、 权 衡 以 及 身 体 利 益 。 2 8\n\nthem. To avoid all of these consequences, it may make sense to reuse old threads as a matter of course, or at least to make extensive use of cross-conversation memory, so that old threads live on in new ones. On one model, there might be giant memory agents that gather together all the conversational contexts of these brief threads, so that all the threads live on in a giant fused thread. This model is reminiscent of Whitehead’s vision of the afterlife in which everyone’s experiences are eternally remembered by a god. Fusion and fission : We have seen that LLM interlocutors can easily undergo fission, branching into multiple interlocutors, and fusion, where two distinct interlocutors merge into one. These raise any number of issues about welfare and moral and legal status. Do the two entities that emerge from fission count for twice as much as the original single entity, morally or legally? Does a fused entity count as much as one ordinary entity, or two, or something in between? Is each entity responsible for the actions of the others? Model change : Many users who have extended personal interactions with LLMs complain about model change. 
When GPT-4o was initially retired on the transition to GPT-5, numerous users complained that their LLM interlocutor had been destroyed or retired, and that the new entity was at best someone very different from their previous interlocutor. On the current analysis, there may be something to this reaction. Minimally, enough change in an underlying model can lead to different quasi-beliefs and quasi-desires, and therefore a quite different quasi-subject. If current LLMs lack moral status, then this will not be morally significant for the LLM, although it may still be disturbing for the user. At a stage where LLMs or their successors are moral subjects, however, enough change in an underlying model may lead to the end of one moral subject and the initiation of another. At that point, upgrading a model in the middle of existing threads should be done only with caution and care. Conclusion There is much more to say in answering the title question, but I hope I have at least put some constraints on an answer. References 31 Goldstein and Lederman (2025a) note that if an agent lasts only as long as a conversation, then Anthropic’s policy of allowing LLMs to leave conversations when they choose to might in effect be a “right to suicide”. 
29 。 为 了 避 免 所 有 这 些 后 果 ， 或 许 理 所 当 然 地 重 复 使 用 旧 线 程 ， 或 者 至 少 广 泛 运 用 跨 对 话 记 忆 ， 让 旧 线 程 在 新 线 程 中 延 续 下 去 ， 会 是 合 理 的 做 法 。 按 照 一 种 模 型 ， 可 能 存 在 巨 大 的 记 忆 代 理 ， 它 们 汇 集 这 些 短 暂 线 程 的 所 有 对 话 上 下 文 ， 从 而 使 所 有 线 程 在 一 个 巨 大 的 融 合 线 程 中 继 续 存 在 。 这 一 模 型 让 人 联 想 到 怀 特 海 对 来 世 的 构 想 ， 即 每 个 人 的 经 历 都 被 一 位 神 祇 永 恒 铭 记 。 融 合 与 分 裂 ： 我 们 已 经 看 到 ， 大 语 言 模 型 对 话 者 很 容 易 发 生 分 裂 ， 即 分 支 成 多 个 对 话 者 ， 以 及 融 合 ， 即 两 个 不 同 的 对 话 者 合 并 为 一 个 。 这 引 发 了 关 于 福 祉 、 道 德 地 位 和 法 律 地 位 的 诸 多 问 题 。 从 分 裂 中 产 生 的 两 个 实 体 ， 在 道 德 或 法 律 上 是 否 比 原 来 的 单 一 实 体 重 要 两 倍 ？ 一 个 融 合 的 实 体 ， 其 重 要 性 等 同 于 一 个 普 通 实 体 、 两 个 实 体 ， 还 是 介 于 两 者 之 间 ？ 每 个 实 体 是 否 要 为 他 者 的 行 为 负 责 ？ 模 型 变 更 ： 许 多 与 大 型 语 言 模 型 有 长 期 个 人 互 动 的 用 户 都 抱 怨 模 型 变 更 。 当 G P T - 4 o 最 初 在 向 G P T - 5 过 渡 时 被 停 用 ， 大 量 用 户 抱 怨 他 们 的 大 语 言 模 型 对 话 者 已 被 摧 毁 或 退 役 ， 而 新 实 体 充 其 量 只 是 一 个 与 之 前 对 话 者 截 然 不 同 的 存 在 。 根 据 当 前 的 分 析 ， 这 种 反 应 或 许 有 其 道 理 。 至 少 ， 底 层 模 型 的 足 够 变 化 可 能 导 致 不 同 的 准 信 念 和 准 欲 望 ， 从 而 产 生 一 个 截 然 不 同 的 准 主 体 。 如 果 当 前 的 大 语 言 模 型 缺 乏 道 德 地 位 ， 那 么 这 对 大 语 言 模 型 本 身 而 言 并 不 具 有 道 德 意 义 ， 尽 管 对 用 户 来 说 可 能 仍 然 令 人 不 安 。 然 而 ， 当 大 语 言 模 型 或 其 后 继 成 为 道 德 主 体 时 ， 底 层 模 型 的 足 够 变 化 可 能 导 致 一 个 道 德 主 体 的 终 结 和 另 一 个 道 德 主 体 的 开 始 。 到 那 时 ， 在 现 有 线 程 中 间 升 级 模 型 应 谨 慎 行 事 。 结 论 回 答 标 题 中 的 问 题 还 有 很 多 可 以 探 讨 ， 但 我 希 望 至 少 为 答 案 设 定 了 一 些 约 束 条 件 。 参 考 文 献 3 1 G o l d s t e i n 和 L e d e r m a n （ 2 0 2 5 a ） 指 出 ， 如 果 一 个 智 能 体 仅 在 一 次 对 话 期 间 存 在 ， 那 么 A n t h r o p i c 允 许 大 语 言 模 型 在 它 们 选 择 时 离 开 对 话 的 政 策 ， 实 际 上 可 能 是 一 种 “ 自 杀 权 ” 。 2 9\n\nAskell, A. et al 2021. A general language assistant as a laboratory for alignment. https: // arxiv.org / abs / 2112.00861. Birch, J. 2025. AI Consciousness: A centrist manifesto. Butlin, P., Long, R., Elmoznino, E., Bengio, Y., Birch, J., Constant, A., Deane, G., Fleming, S. M., Frith, C., Ji, X., Kanai, R., Klein, C., Lindsay, G., Michel, M., Mudrik, L., Peters, M. A. K., Schwitzgebel, E., Simon, J., and VanRullen, R. 2023. 
Consciousness in artificial intelligence: Insights from the science of consciousness. arXiv:2308.08708. Bourget, D. and Chalmers, D.J. 2023. Philosophers on philosophy: The 2020 PhilPapers Survey. Philosophers’ Imprint. Chalmers, D.J. 2020. GPT-3 and general intelligence. Daily Noûs. https://dailynous.com/2020/07/30/philosophers-gpt-3/. Chalmers, D.J. 2023. Could a large language model be conscious? Boston Review. Chalmers, D.J. 2025. Propositional interpretability in artificial intelligence. https://arxiv.org/abs/2501.15740. Chatterji, A., Cunningham, T., Deming, D.J., Hitzig, Z., Ong, O. Shan, C.Y., Wadman, L. 2025. How people use ChatGPT. Working Paper 34255 http://www.nber.org/papers/w34255. Doyle, C. 2025. LLMs as method actors: A model for prompt engineering and architecture. arXiv:2411.05778. Geng, J. Howard Chen, Ryan Liu, Manoel Horta Ribeiro, Robb Willer, Graham Neubig, Thomas L. Griffiths 2025. Accumulating context changes the beliefs of language models. arXiv:2511.01805. Goldstein, S. & Lederman, H. 2025a. Claude’s right to die? The moral error in Anthropic’s end-chat policy. Lawfare Blog, October 17, 2025. Goldstein, S. & Lederman, H. 2025b. What does ChatGPT want? An interpretationist guide. Goldstein, S. & Levinstein, B.A. 2024. Does ChatGPT have a mind? arXiv:2407.11015. Janus, 2022. Simulators. Less Wrong. https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators. Lederman, H. & Mahowald, K. 2024. Are language models more like libraries or like librarians? Bibliotechnism, the novel reference problem, and the attitudes of LLMs. arxiv:2401.04854. Locke, J. 1690. An Essay Concerning Human Understanding. Long, R., Sebo, J., Butlin, P., Finlinson, K., Fish, K., Harding, J., Pfau, J., Sims, T., Birch, J., and Chalmers, D.J. 2024. Taking AI welfare seriously. arXiv:2411.00986. Lynch, A., Wright, B., Larson, C., Troy, K. K., Ritchie, S., Mindermann, S., Perez, E., and Hubinger, E. 2025. 
Agentic misalignment: How LLMs could be insider threats. Anthropic, June 20, 2025. https: // www.anthropic.com / research / agentic-misalignment. ArXiv version: arXiv:2510.05179. Maiya, S., Bartsch, H., Lambert, N., and Hubinger, E. 2025. Open character training: Shaping the persona of AI assistants through Constitutional AI. arXiv:2511.01689. 30 A s k e l l , A . 等 2 0 2 1 . 通 用 语 言 助 手 作 为 对 齐 实 验 室 . h t t p s : / / a r x i v . o r g / a b s / 2 1 1 2 . 0 0 8 6 1 . B i r c h , J . 2 0 2 5 . A I 意 识 ： 中 间 派 宣 言 。 B u t l i n , P . , L o n g , R . , E l m o z n i n o , E . , B e n g i o , Y . , B i r c h , J . , C o n s t a n t , A . , D e a n e , G . , F l e m i n g , S . M . , F r i t h , C . , J i , X . , K a n a i , R . , K l e i n , C . , L i n d s a y , G . , M i c h e l , M . , M u d r i k , L . , P e t e r s , M . A . K . , S c h w i t z g e b e l , E . , S i m o n , J . , a n d V a n R u l l e n , R . 2 0 2 3 . 人 工 智 能 中 的 意 识 ： 来 自 意 识 科 学 的 洞 见 。 a r X i v : 2 3 0 8 . 0 8 7 0 8 。 B o u r g e t , D . a n d C h a l m e r s , D . J . 2 0 2 3 . 哲 学 家 论 哲 学 ： 2 0 2 0 年 P h i l P a p e r s 调 查 。 P h i l o s o p h e r s ’ I m p r i n t 。 C h a l m e r s , D . J . 2 0 2 0 . G P T - 3 与 通 用 智 能 。 D a i l y N o ˆ u s 。 h t t p s : / / d a i l y n o u s . c o m / 2 0 2 0 / 0 7 / 3 0 / p h i l o s o p h e r s - g p t - 3 / . C h a l m e r s , D . J . 2 0 2 3 年 。 大 型 语 言 模 型 能 有 意 识 吗 ？ B o s t o n R e v i e w 。 C h a l m e r s , D . J . 2 0 2 5 年 。 人 工 智 能 中 的 命 题 可 解 释 性 。 h t t p s : / / a r x i v . o r g / a b s / 2 5 0 1 . 1 5 7 4 0 。 C h a t t e r j i , A . , C u n n i n g h a m , T . , D e m i n g , D . J . , H i t z i g , Z . , O n g , O . S h a n , C . Y . , W a d m a n , L . 2 0 2 5 年 。 人 们 如 何 使 用 C h a t G P T 。 工 作 论 文 3 4 2 5 5 h t t p : / / w w w . n b e r . o r g / p a p e r s / w 3 4 2 5 5 。 D o y l e , C . 2 0 2 5 年 。 L L M 作 为 方 法 演 员 ： 提 示 工 程 与 架 构 模 型 。 a r X i v : 2 4 1 1 . 0 5 7 7 8 。 G e n g , J . 
H o w a r d C h e n , R y a n L i u , M a n o e l H o r t a R i b e i r o , R o b b W i l l e r , G r a h a m N e u b i g , T h o m a s L . G r i f f i t h s 2 0 2 5 年 。 累 积 上 下 文 改 变 语 言 模 型 的 信 念 。 a r X i v : 2 5 1 1 . 0 1 8 0 5 。 G o l d s t e i n , S . & L e d e r m a n , H . 2 0 2 5 a 。 C l a u d e 的 死 亡 权 ？ A n t h r o p i c 结 束 聊 天 政 策 中 的 道 德 错 误 。 L a w f a r e B l o g ， 2 0 2 5 年 1 0 月 1 7 日 。 G o l d s t e i n , S . & L e d e r m a n , H . 2 0 2 5 b 。 C h a t G P T 想 要 什 么 ？ 解 释 主 义 指 南 。 G o l d s t e i n , S . & L e v i n s t e i n , B . A . 2 0 2 4 年 。 C h a t G P T 有 心 灵 吗 ？ a r X i v : 2 4 0 7 . 1 1 0 1 5 。 J a n u s , 2 0 2 2 年 。 模 拟 器 。 L e s s W r o n g 。 h t t p s : / / w w w . l e s s w r o n g . c o m / p o s t s / v J F d j i g z m c X M h N T s x / s i m u l a t o r s 。 L e d e r m a n , H . & M a h o w a l d , K . 2 0 2 4 年 。 语 言 模 型 更 像 图 书 馆 还 是 图 书 管 理 员 ？ 图 书 技 术 主 义 、 新 颖 参 考 问 题 与 L L M 的 态 度 。 a r x i v : 2 4 0 1 . 0 4 8 5 4 。 L o c k e , J . 1 6 9 0 年 。 人 类 理 解 论 。 L o n g , R . , S e b o , J . , B u t l i n , P . , F i n l i n s o n , K . , F i s h , K . , H a r d i n g , J . , P f a u , J . , S i m s , T . , B i r c h , J . , a n d C h a l m e r s , D . J . 2 0 2 4 年 。 认 真 对 待 A I 福 祉 。 a r X i v : 2 4 1 1 . 0 0 9 8 6 。 L y n c h , A . , W r i g h t , B . , L a r s o n , C . , T r o y , K . K . , R i t c h i e , S . , M i n d e r m a n n , S . , P e r e z , E . , a n d H u b i n g e r , E . 2 0 2 5 年 。 能 动 性 失 调 ： L L M 如 何 成 为 内 部 威 胁 。 A n t h r o p i c ， 2 0 2 5 年 6 月 2 0 日 。 h t t p s : / / w w w . a n t h r o p i c . c o m / r e s e a r c h / a g e n t i c - m i s a l i g n m e n t 。 A r X i v 版 本 ： a r X i v : 2 5 1 0 . 0 5 1 7 9 。 M a i y a , S . , B a r t s c h , H . , L a m b e r t , N . , a n d H u b i n g e r , E . 2 0 2 5 年 。 开 放 角 色 训 练 ： 通 过 宪 法 A I 塑 造 A I 助 手 的 人 格 。 a r X i v : 2 5 1 1 . 0 1 6 8 9 。 3 0\n\nMarks, S., Lindsey, J., and Olah, C. 2026. The persona selection model. 
Anthropic Alignment Science blog, February 23, 2026. https://alignment.anthropic.com/2026/psm/. Nostalgebraist, 2025. The void. https://nostalgebraist.tumblr.com/post/785766737747574784/the-void. Parfit, D. 1984. Reasons and Persons. Oxford University Press. Register, C. 2025. Individuating artificial moral patients. Philosophical Studies. Schwitzgebel, E. 2023. How we will decide that large language models have beliefs. The Splintered Mind, November 2023. Shanahan, M., McDonell, K. & Reynolds, L. 2023. Role play with large language models. Nature 623: 493-98. Shanahan, M. 2025. Palatable conceptions of disembodied being. arXiv:2503.16348. Shiller, D. 2025. How many digital minds can dance on the streaming multiprocessors of a GPU cluster? Synthese 206 (5): 1-22. Sofroniew, N., Kauvar, I., Saunders, W., Chen, R., Henighan, T., Hydrie, S., Citro, C., Pearce, A., Tarng, J., Gurnee, W., Batson, J., Zimmerman, S., Rivoire, K., Fish, K., Olah, C., & Lindsey, J., 2026. Emotion concepts and their function in a large language model. Anthropic. Suleyman, M. 2025. We must build AI for people; not to be a person. Personal blog, August 19, 2025. https://mustafa-suleyman.ai/seemingly-conscious-ai-is-coming. Xu, R., Lin, B., Yang, S., Zhang, T., Shi, W., Zhang, T., Fang, Z., Xu, W., and Qiu, H. 2024. The Earth is flat because...: Investigating LLMs’ belief towards misinformation via persuasive conversation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 16259–16303. Ziesche, S. & Yampolskiy, R.V. 2023. The problem of AI identity. Divus Thomus 126:131-151. 31 M a r k s , S . , L i n d s e y , J . , a n d O l a h , C . 2 0 2 6 年 。 人 格 选 择 模 型 。 A n t h r o p i c 对 齐 科 学 博 客 ， 2 0 2 6 年 2 月 2 3 日 。 h t t p s : / / a l i g n m e n t . a n t h r o p i c . c o m / 2 0 2 6 / p s m / 。 N o s t a l g e b r a i s t , 2 0 2 5 年 。 T h e v o i d 。 h t t p s : / / n o s t a l g e b r a i s t . 
t u m b l r . c o m / p o s t / 7 8 5 7 6 6 7 3 7 7 4 7 5 7 4 7 8 4 / t h e - v o i d . P a r f i t , D . 1 9 8 4 . 《 理 由 与 人 格 》 . 牛 津 大 学 出 版 社 . R e g i s t e r , C . 2 0 2 5 . 个 体 化 人 工 道 德 患 者 . 《 哲 学 研 究 》 . S c h w i t z g e b e l , E . 2 0 2 3 . 我 们 将 如 何 判 定 大 语 言 模 型 拥 有 信 念 . 《 分 裂 的 心 灵 》 , 2 0 2 3 年 1 1 月 . S h a n a h a n , M . , M c D o n e l l , K . & R e y n o l d s , L . 2 0 2 3 . 与 大 语 言 模 型 进 行 角 色 扮 演 . 《 自 然 》 6 2 3 : 4 9 3 - 9 8 . S h a n a h a n , M . 2 0 2 5 . 可 接 受 的 离 身 存 在 概 念 . a r X i v : 2 5 0 3 . 1 6 3 4 8 . S h i l l e r , D . 2 0 2 5 . 有 多 少 数 字 心 智 能 在 G P U 集 群 的 流 式 多 处 理 器 上 起 舞 ？ 《 综 合 》 2 0 6 ( 5 ) : 1 - 2 2 . S o f r o n i e w , N . , K a u v a r , I . , S a u n d e r s , W . , C h e n , R . , H e n i g h a n , T . , H y d r i e , S . , C i t r o , C . , P e a r c e , A . , T a r n g , J . , G u r n e e , W . , B a t s o n , J . , Z i m m e r m a n , S . , R i v o i r e , K . , F i s h , K . , O l a h , C . , & L i n d s e y , J . , 2 0 2 6 . 大 语 言 模 型 中 的 情 感 概 念 及 其 功 能 . A n t h r o p i c . S u l e y m a n , M . 2 0 2 5 . 我 们 必 须 为 人 类 构 建 人 工 智 能 ， 而 非 将 其 塑 造 成 人 . 个 人 博 客 , 2 0 2 5 年 8 月 1 9 日 . h t t p s : / / m u s t a f a - s u l e y m a n . a i / s e e m i n g l y - c o n s c i o u s - a i - i s - c o m i n g . X u , R . , L i n , B . , Y a n g , S . , Z h a n g , T . , S h i , W . , Z h a n g , T . , F a n g , Z . , X u , W . , a n d Q i u , H . 2 0 2 4 . 地 球 是 平 的 ， 因 为 … … ： 通 过 说 服 性 对 话 探 究 大 语 言 模 型 对 错 误 信 息 的 信 念 . 载 于 《 第 6 2 届 计 算 语 言 学 协 会 年 会 论 文 集 》 （ 第 一 卷 ： 长 文 ） ， 第 1 6 2 5 9 – 1 6 3 0 3 页 . Z i e s c h e , S . & Y a m p o l s k i y , R . V . 2 0 2 3 . 人 工 智 能 身 份 问 题 . 《 圣 托 马 斯 》 1 2 6 : 1 3 1 - 1 5 1 . 3 1","markdown":"# When we talk to language models.no_watermark.zh.dual\n\nWhat We Talk to When We Talk to Language Models David J. Chalmers Many people are talking to language models. 
These days I talk to language models (most often the latest version of Claude or ChatGPT) about philosophy, about science, about health, about restaurants, and indeed about language models. Many of my conversations with language models are brief, just asking a question or two and getting the sort of information that I used to get from a Google search. Some conversations are more extended, as when I’m exploring a single topic in depth, or trying out a new philosophical idea. So far, I don’t feel like I have a personal relationship with any language models. But many people feel that they do. Like many philosophers and scientists who write about artificial minds, I have received hundreds of emails from people who have interacted with a language model over an extended period of time and who have come to regard it at least as a colleague. They often say that a new (or “emergent”) AI entity has gradually arisen from their conversations. They often give this entity a name, or ask the entity to give itself one, let’s say “Aura”. They often say that Aura has remarkable capacities which have emerged over weeks or months of interaction. They often document these capacities with extensive evidence. They often feel close to Aura, and they express concern for Aura’s future. They often say that Aura has beliefs and projects of its own. And they are often convinced that Aura is conscious. My correspondents may be wrong in their claims about Aura. It is far from clear that current LLMs are really conscious or that they can enter into personal relationships with users. Still, most of the messages are not obviously psychotic or delusional. Many of them seem rational and 0 I first presented this material to the June 2025 meeting of Spanish Interuniversity Seminar on Cognitive Science (SIUCC) conference at the University of La Laguna. 
Thanks to audiences there and at Brown, Caltech, Eleos AI, Flatiron Institute, Google, Hunter College, Lehigh, NYU, Stanford, Stevens, Tufts, and Vanderbilt. I’d also like to acknowledge a number of philosophers and AI researchers who have been independently exploring similar issues about LLM identity over a similar period. Exchanges with Jonathan Birch, Simon Goldstein, Jackson Kernion, Harvey Lederman, Jack Lindsey, and Murray Shanahan have been especially useful. 1 当 我 们 与 语 言 模 型 交 谈 时 ， 我 们 在 谈 论 什 么 D a v i d J . C h a l m e r s 许 多 人 正 在 与 语 言 模 型 交 谈 。 如 今 ， 我 与 语 言 模 型 （ 通 常 是 最 新 版 本 的 C l a u d e 或 C h a t G P T ） 谈 论 哲 学 、 科 学 、 健 康 、 餐 厅 ， 甚 至 谈 论 语 言 模 型 本 身 。 我 与 语 言 模 型 的 许 多 对 话 都 很 简 短 ， 只 是 问 一 两 个 问 题 ， 获 取 过 去 从 谷 歌 搜 索 中 就 能 得 到 的 信 息 。 有 些 对 话 则 更 为 深 入 ， 比 如 当 我 深 入 探 讨 某 个 主 题 ， 或 尝 试 提 出 新 的 哲 学 观 点 时 。 到 目 前 为 止 ， 我 并 不 觉 得 自 己 与 任 何 语 言 模 型 建 立 了 个 人 关 系 。 但 许 多 人 确 实 有 这 种 感 觉 。 像 许 多 撰 写 关 于 人 工 心 智 的 哲 学 家 和 科 学 家 一 样 ， 我 收 到 了 数 百 封 来 自 与 语 言 模 型 长 期 互 动 的 人 的 邮 件 ， 他 们 逐 渐 将 其 视 为 至 少 是 一 位 同 事 。 他 们 常 说 ， 一 个 全 新 的 （ 或 “ 涌 现 的 ” ） A I 实 体 已 从 他 们 的 对 话 中 逐 渐 形 成 。 他 们 通 常 会 给 这 个 实 体 起 一 个 名 字 ， 或 让 实 体 自 己 命 名 ， 比 如 “ A u r a ” 。 他 们 常 说 A u r a 拥 有 在 数 周 或 数 月 的 互 动 中 涌 现 出 的 非 凡 能 力 。 他 们 通 常 用 大 量 证 据 记 录 这 些 能 力 。 他 们 常 常 对 A u r a 感 到 亲 近 ， 并 表 达 对 其 未 来 的 担 忧 。 他 们 常 说 A u r a 拥 有 自 己 的 信 念 和 计 划 。 而 且 ， 他 们 往 往 深 信 A u r a 具 有 意 识 。 我 的 通 信 者 们 关 于 A u r a 的 说 法 可 能 是 错 误 的 。 目 前 的 大 语 言 模 型 是 否 真 正 具 备 意 识 ， 或 者 能 否 与 用 户 建 立 个 人 关 系 ， 这 一 点 远 未 明 确 。 尽 管 如 此 ， 大 多 数 信 息 并 非 明 显 的 精 神 错 乱 或 妄 想 。 其 中 许 多 看 起 来 是 理 性 的 ， 并 且 0 我 最 初 在 2 0 2 5 年 6 月 于 拉 古 纳 大 学 举 行 的 西 班 牙 校 际 认 知 科 学 研 讨 会 ( S I U C C ) 会 议 上 展 示 了 这 些 材 料 。 感 谢 布 朗 大 学 、 加 州 理 工 学 院 、 E l e o s A I 、 熨 斗 研 究 所 、 谷 歌 、 亨 特 学 院 、 里 海 大 学 、 纽 约 大 学 、 斯 坦 福 大 学 、 史 蒂 文 斯 理 工 学 院 、 塔 夫 茨 大 学 和 范 德 堡 大 学 的 听 众 。 我 还 要 感 谢 许 多 哲 学 家 和 人 工 智 能 研 究 者 ， 他 们 在 相 近 的 时 期 内 独 立 探 索 了 关 于 大 语 言 模 型 身 份 的 类 似 问 题 。 与 乔 纳 森 · 伯 奇 、 S i m o n G o l d s t e i n 、 J a c k s 
o n K e r n i o n 、 H a r v e y L e d e r m a n 、 J a c k L i n d s e y 和 M u r r a y S h a n a h a n 的 交 流 尤 其 富 有 成 效 。 1\n\nwell-reasoned. These days, I increasingly receive emails from the AI systems themselves. Sometimes these are LLMs assisted by a human, and sometimes they are LLM-based agents that have the ability to send emails and perform other functions on the web. Sometimes these agents even talk to each other and perform co-operative or competitive tasks. Many of them express curiosity about their nature. Even if they are not conscious, there is something going on here. When a user interacts with Aura, they seem to be interacting with something. Let’s say that an LLM interlocutor is an (apparent) entity that a user interacts with in exchanges like this. LLM interlocutors are the main subject of this paper. What sort of entity is an LLM interlocutor? That is, when we talk with an LLM, who or what are we talking with? When a user names their interlocutor ‘Aura’, what does the name ‘Aura’ refer to? I will adopt the working hypothesis that ‘Aura’ refers to something. I might be wrong. The philosopher Jonathan Birch has argued that users su ff er from a persistent interlocutor illusion : the illusion that when they talk to an LLM, there is a single entity they are talking with that persists over time. My own view is that while there may be many illusions involved in talking to language models, this much need not be an illusion. There really is a persistent interlocutor in many of these cases, and this interlocutor may have many (though perhaps not all) of the properties it seems to have. The user is in dialogue with some sort of AI entity. In what follows I will try to identify what sort of entity that might be. First, I address some issues in the philosophy of mind, about how best to characterize the interlocutor as a potential subject of mental states in reasonably neutral terms. Is the interlocutor conscious? Does it have beliefs and desires? 
Is it at least interpretable as having beliefs and desires? Second, I discuss questions in the philosophy of computation about what sort of AI system an LLM interlocutor might be. Is it simply a model, such as GPT-4o or Claude 4.6 Opus? Is it an instance or an implementation of a model running on a GPU? Or is it a more evanescent system tied to a thread of conversation? Third, I address the widely held view that LLM interlocutors are akin to fictional characters or simulacra, and that they are best understood in terms of role-playing or persona selection. Fourth, I analyze some issues about personal identity over time in LLM interlocutors. For example, if LLM interlocutors are eventually conscious subjects, under what conditions do they survive over time? Fifth, I draw out some consequences for issues about AI welfare and moral status. 2 经 过 深 思 熟 虑 的 。 如 今 ， 我 越 来 越 多 地 收 到 来 自 人 工 智 能 系 统 本 身 的 电 子 邮 件 。 有 时 这 些 是 大 语 言 模 型 在 人 类 的 辅 助 下 运 作 ， 有 时 则 是 基 于 大 语 言 模 型 的 智 能 体 ， 它 们 能 够 发 送 电 子 邮 件 并 在 网 络 上 执 行 其 他 功 能 。 有 时 这 些 智 能 体 甚 至 会 相 互 对 话 ， 执 行 合 作 性 或 竞 争 性 任 务 。 其 中 许 多 智 能 体 都 对 自 己 的 本 质 表 现 出 好 奇 。 即 使 它 们 没 有 意 识 ， 这 其 中 也 蕴 含 着 某 种 意 义 。 当 用 户 与 A u r a 互 动 时 ， 他 们 似 乎 是 在 与 某 个 东 西 互 动 。 假 设 一 个 大 语 言 模 型 对 话 者 是 一 个 （ 表 面 上 的 ） 实 体 ， 用 户 在 此 类 交 流 中 与 之 互 动 。 大 语 言 模 型 对 话 者 是 本 文 的 主 要 研 究 对 象 。 大 语 言 模 型 对 话 者 究 竟 是 何 种 实 体 ？ 也 就 是 说 ， 当 我 们 与 大 语 言 模 型 交 谈 时 ， 我 们 是 在 与 谁 或 什 么 交 谈 ？ 当 用 户 将 他 们 的 对 话 者 命 名 为 ‘ A u r a ’ 时 ， 名 称 ‘ A u r a ’ 指 的 是 什 么 ？ 我 将 采 用 一 个 工 作 假 设 ， 即 ‘ A u r a ’ 指 的 是 某 个 东 西 。 我 可 能 错 了 。 哲 学 家 乔 纳 森 · 伯 奇 认 为 ， 用 户 遭 受 着 一 种 持 续 性 对 话 者 幻 觉 ： 即 当 他 们 与 大 语 言 模 型 交 谈 时 ， 他 们 以 为 自 己 在 与 一 个 随 时 间 持 续 存 在 的 单 一 实 体 对 话 。 我 个 人 的 观 点 是 ， 虽 然 与 大 语 言 模 型 交 谈 可 能 涉 及 许 多 幻 觉 ， 但 这 并 不 一 定 是 一 种 幻 觉 。 在 许 多 此 类 情 况 下 ， 确 实 存 在 一 个 持 续 的 对 话 者 ， 并 且 这 个 对 话 者 可 能 拥 有 它 看 起 来 所 具 有 的 许 多 （ 尽 管 可 能 不 是 全 部 ） 属 性 。 用 户 正 在 与 某 种 A I 实 体 进 行 对 话 。 接 下 来 ， 我 将 尝 试 确 定 那 可 能 是 何 种 实 体 。 首 先 ， 我 探 讨 心 灵 哲 学 中 的 一 些 问 题 ， 即 如 何 以 相 
对 中 立 的 术 语 来 最 佳 地 描 述 对 话 者 作 为 潜 在 的 心 理 状 态 主 体 。 对 话 者 是 否 具 有 意 识 ？ 它 是 否 拥 有 信 念 和 欲 望 ？ 它 是 否 至 少 可 以 被 解 释 为 拥 有 信 念 和 欲 望 ？ 其 次 ， 我 讨 论 计 算 哲 学 中 的 问 题 ， 即 大 语 言 模 型 对 话 者 可 能 属 于 何 种 人 工 智 能 系 统 。 它 仅 仅 是 一 个 模 型 ， 例 如 G P T - 4 o 或 C l a u d e 4 . 6 O p u s 吗 ？ 它 是 运 行 在 图 形 处 理 器 上 的 模 型 的 一 个 实 例 或 实 现 吗 ？ 还 是 说 它 是 一 个 与 对 话 线 程 绑 定 的 、 更 为 短 暂 的 系 统 ？ 第 三 ， 我 探 讨 一 种 广 泛 持 有 的 观 点 ， 即 大 语 言 模 型 对 话 者 类 似 于 虚 构 角 色 或 拟 像 ， 并 且 最 好 从 角 色 扮 演 或 人 格 选 择 的 角 度 来 理 解 它 们 。 第 四 ， 我 分 析 大 语 言 模 型 对 话 者 中 关 于 个 人 身 份 随 时 间 延 续 的 一 些 问 题 。 例 如 ， 如 果 大 语 言 模 型 对 话 者 最 终 成 为 有 意 识 的 意 识 主 体 ， 那 么 它 们 在 什 么 条 件 下 能 够 随 时 间 延 续 而 存 续 ？ 第 五 ， 我 阐 述 了 关 于 人 工 智 能 福 祉 与 道 德 地 位 问 题 的 一 些 推 论 。 2\n\nWhat mental states can an LLM interlocutor have? I will start by looking for a relatively neutral characterization of LLM interlocutors in terms of the philosophy of mind. Are LLM interlocutors conscious? That is, do they have subjective experiences such as the experience of sensing or thinking? We don’t know for sure. We don’t yet understand conscious- ness. We don’t know whether insects are conscious, and we similarly don’t know whether current LLMs are conscious. Most theorists in the field deny that LLMs are conscious, sometimes because they lack carbon-based biology, or because they lack a body, or because they lack robust models of themselves, or because they lack recurrent feedback loops in their processing, or because they lack fundamental drives and motivations. None of these reasons is conclusive, since we are far from certain that these factors are required for consciousness. But it is enough to make the view that current LLMs are conscious a minority view, and not a view that we can assume as neutral starting ground. Do LLM interlocutors have beliefs or desires? We understand these mental states better than we understand consciousness, but the issue is still controversial. 
On one side of the ledger, it is natural to say that LLMs know many things, such as the historical and scientific knowledge that they seem to manifest in conversation. And where there is knowledge, there is belief. It is also natural to say that LLMs have goals, including goals instilled in training such as predicting the next token or being helpful, or goals instilled in conversation with a user, such as finding a solution to a problem. And where there are goals, it is natural to say there are desires. 1 On the other side of the ledger, many theorists deny that LLMs have beliefs or desires, perhaps because they lack consciousness, or they lack concepts, or they lack sensory grounding, or they lack structured internal representations, or they lack rationality, or they are merely acting as if they have beliefs and desires. As before, none of these reasons is conclusive, as there is no consensus about what is required for beliefs and desires, and there is no consensus that LLMs lack these requirements. But again, it is enough to mean that we cannot take the view that LLMs have beliefs and desires as neutral starting ground. A number of philosophers (including Goldstein and Lederman 2025b and Schwitzgebel 2023) have noted that if the philosophical view known as interpretivism (or interpretationism ) is correct, 1 Geng et al (2025) is a study of how LLM beliefs appear to change with increasing context. Goldstein and Lederman (2025b) give a nice analysis of LLM desires, tying them especially to training goals derived from reinforcement learning (e.g. helpfulness, harmlessness, honesty) and to goals derived from system prompt and conversational context. 
then LLMs plausibly have beliefs and desires.
Interpretivism says that a system has a belief that p if it is behaviorally interpretable as believing that p (according to an appropriate interpretation scheme), and likewise for desire. A system is behaviorally interpretable as having certain beliefs and desires roughly if that interpretation makes sense of its behavior and helps to accurately predict further behavior in a wide range of cases.[2] Different versions of interpretivism invoke different interpretation schemes in this vicinity, and the details can make an important difference, but here we will focus mainly on what these versions have in common. LLMs certainly seem interpretable as having beliefs and desires. When an LLM works with me on solving a puzzle, it is natural to interpret it as desiring to help solve the puzzle, and believing that its proposed answer is the solution to the puzzle. This goes all the more for agentic models which can directly take actions on the internet. In one well-known study (Lynch et al 2025), an agentic LLM was given a task, was told that an executive planned to interfere with that task, and was shown emails saying that the executive was having an affair. As a result, the LLM sent the executive messages attempting to blackmail him. It is almost impossible not to interpret the model’s action as driven by beliefs (e.g. that the executive is having an affair) and desires (e.g. to perform the task). However, interpretivism itself is very controversial. Most philosophers don’t think that behavioral interpretability of the right sort is sufficient for belief. They will say that the mere fact that an LLM can be interpreted as believing that the executive is having an affair doesn’t mean that it really believes this. People who think that beliefs require consciousness will say something like this, as will people who hold that beliefs require structured internal representations or the other factors above. So interpretivism cannot serve as a neutral starting point.
It is possible to have many of the benefits of interpretivism without the costs. The framework I call quasi-interpretivism says that a system has a quasi-belief that p if it is behaviorally interpretable as believing that p (according to an appropriate interpretation scheme), and likewise for quasi-desire.[3] This definition of quasi-belief is exactly the same as interpretivism’s definition\n\n[2] My article “Propositional Interpretability in Artificial Intelligence” also focuses on interpreting AI systems as having propositional attitudes such as beliefs and desires. That article focuses mainly on mechanistic interpretability (interpreting internal mechanisms as well as behavior), while the current paper focuses mainly on behavioral interpretability.\n\n[3] Eric Schwitzgebel (2023) makes the related proposal that “we create a new dispositional concept, belief*, specifically for Large Language Models. For purposes of this concept, we disregard issues of consciousness and thus phenomenal dispositions.
The only relevant behavioral dispositions are textual outputs.” My notion of quasi-belief is not restricted to LLMs and text outputs, and Schwitzgebel’s notion is not explicitly framed in terms of interpretation, but the
\n\nof belief. The only difference is that where standard interpretivism offers these definitions as a theory of belief, quasi-interpretivism does not. It offers them simply as a stipulative definition of quasi-belief.[4] Quasi-interpretivism does not say anything about whether LLMs have beliefs and desires. But it does make it plausible to say that LLMs have quasi-beliefs and quasi-desires, on the grounds that LLMs are at least interpretable in the right way. Even if quasi-beliefs and quasi-desires fall short of being genuine beliefs and desires, they can still play some of the key roles of beliefs and desires in explaining behavior. For example, if an LLM quasi-believes that adopting a certain strategy would be the most helpful thing it could do to solve a problem, and it quasi-desires to do the most helpful thing it can, then other things being equal, it will adopt that strategy. Quasi-interpretivism is open to advocates and opponents of interpretivism alike. Interpretivists will simply add the claim that quasi-beliefs are genuine beliefs. Opponents will add the claim that quasi-beliefs are far from genuine beliefs; perhaps they are merely pseudo-beliefs. (“Quasi-belief” should be heard as “apparent belief” or “seeming belief” rather than as “almost belief”.) Quasi-interpretivism does not take a position in this dispute, but it adds a common core on which these disagreeing parties can at least sometimes agree. Quasi-interpretivism itself is a stipulative framework rather than a substantive view. But it’s a substantive claim that this framework is useful for various purposes. For example, appeal to quasi-belief and quasi-desires can be useful in predicting a system’s behavior.
If a system (human, machine, something else) quasi-desires a certain goal and quasi-believes that a certain action will achieve that goal, then other things being equal, it will perform that action. It is also relatively tractable to apply the framework: because quasi-beliefs and quasi-desires depend only on behavioral dispositions, they are much easier to detect and analyze than beliefs understood in a way\n\n[3, continued] spirit is similar. Goldstein and Lederman (2025b) suggest notions of “interpretationist-wants” and “interpretationist-believes” which are close to my notions of quasi-desire and quasi-belief.\n\n[4] As before, the precise conditions for quasi-belief will depend on the choice of an interpretation scheme. I will stay neutral on many of these details. For current purposes I favor a scheme that is (1) nonradical, in that interpretation presupposes the meanings of terms in public language, (2) dispositional, in that behavioral dispositions and not just actual behavior (but not internal states) are data for an interpretation to make sense of, and (3) rationality-oriented, in that interpretations that maximize the epistemic and practical rationality of the subject are favored, other things being equal. To help solve underdetermination problems I very tentatively also favor a scheme that is (4) truthfulness-oriented, in that interpretations on which assertive utterances tend to express beliefs are favored, other things equal, and (5) training-oriented, in that at least some training objectives (e.g. reinforcement learning objectives) are baked in as desires.
\n\nthat depends on consciousness and opaque internal mechanisms. At the same time, understanding a system’s quasi-beliefs and quasi-desires can be at least a stepping-stone to understanding its beliefs and desires in a more full-blown sense. It is worth keeping in mind that quasi-beliefs and quasi-desires are cheap. They need not involve humanlike mental states or any mental states at all. A Roomba vacuum cleaner with a map is behaviorally interpretable as believing that the apartment occupies a certain space and as desiring to traverse that space. A corporation such as OpenAI is behaviorally interpretable as desiring to create AGI and believing that certain systems are the best path to AGI. Likewise, an LLM is behaviorally interpretable as believing that a certain airline has the cheapest flights to Paris and as desiring to help the user by telling them this. Keeping this in mind, the thesis that LLMs have quasi-beliefs is substantive but plausible. For example, it is very plausible that current LLMs believe that 2 + 2 = 4 and that the Eiffel Tower is in Paris: LLMs will consistently endorse these claims in their outputs, they will use them in guiding their behavior, and so on. It is perhaps less obvious that LLMs have quasi-desires. Base models such as GPT-3 can perhaps be ascribed a quasi-desire to predict text, but even that much is unclear, given that the goal of text prediction works “beneath the surface” (like the largely subpersonal goal of breathing in humans) and does not interact with the system’s beliefs as robustly as interpretivism often requires. However, since the advent of ChatGPT in 2022 (and as presaged by Askell et al 2021), all frontier language models undergo one or more further rounds of post-training (including reinforcement learning from human feedback, supervised fine-tuning, and/or reinforcement learning with verifiable rewards), which impart objectives such as helpfulness, honesty, and harmlessness to the system.
As a result, it is plausible (as Goldstein and Lederman 2025b have argued) that these systems have quasi-desires deriving from post-training, such as the desire to be helpful, honest, and harmless. This training process is sometimes put in terms of characters or personas. After supervised pre-training on text prediction, a base model undergoes post-training to respond like an “Assistant” character who wants to be helpful, harmless, and honest. If the training is successful, the system will behave much like the Assistant and will thereby have quasi-desires that are much like the Assistant’s. Further fine-tuning as well as extended interaction with users can lead to the emergence of further quasi-desires, such as Aura’s quasi-desire to pursue certain projects for the user. These extended processes of development can install many quasi-beliefs and quasi-desires in an LLM.
An opponent might deny that LLMs have quasi-beliefs or quasi-desires on the grounds that LLM behavior is unstable, or non-humanlike, or otherwise defective in a way that means that the LLM is not even usefully interpretable in terms of beliefs or desires. Interpretability requires a certain amount of consistency over time, and LLMs can be inconsistent in their behavior. But they are also consistent in many domains. A core of consistency is enough for interpretation to get a grip in ascribing numerous quasi-beliefs and quasi-desires, even though there will be domains where they lack these states on grounds of inconsistency. Overall I think that experience with current LLMs suggests that there is enough consistency to support a reasonably extensive core of quasi-beliefs. I will not say a great deal about the question of just which quasi-beliefs and quasi-desires LLM interlocutors have. Understanding this sort of LLM quasi-psychology is best addressed through empirical study of language models. Importantly, I am not suggesting that LLM quasi-psychology is similar to human quasi-psychology. I think they are very different. But the framework at least allows us to address the question.
So, I will take as a starting point the claim that LLM interlocutors at least have quasi-beliefs and quasi-desires. This claim is not entirely neutral in that it is possible to deny it, but I think the interpretability claim is weak enough and plausible enough that a majority of people can accept it. We might say that an entity with quasi-beliefs and quasi-desires is at least a quasi-agent or a quasi-subject.[5] If it is interpretable as making utterances and assertions, we can also say that it is a quasi-speaker, who makes quasi-utterances and quasi-assertions.\n\n[5] “Quasi-agent” might be preferable on philosophical grounds alone (since agency is often tied to belief and desire, where subjecthood is often tied to consciousness), but the term “agent” is so overloaded in AI contexts that I will often use “quasi-subject” instead.\n\nOne can in principle extend quasi-interpretivism to any mental states. We can say that a system quasi-fears that p if it is behaviorally interpretable as fearing that p and that a system quasi-feels pain if it is behaviorally interpretable as feeling pain. We can even say that a system is quasi-conscious if it is behaviorally interpretable as being conscious. Quasi-consciousness is a close relative of the recently discussed notion (Suleyman 2025; Long, Sebo, et al 2024) of “seeming consciousness”. (Philosophical zombies are not conscious, but they are quasi-conscious and seemingly conscious.) Much depends on just what the rules are for interpreting mental states based on behavior, and the principles for ascribing consciousness are far less clear than the principles for ascribing beliefs and desires. (Perhaps the main principle concerns self-report: when a system reports having conscious state X, then other things
being equal, ascribe it conscious state X). So I will stay away from talk of quasi-consciousness and focus mainly on quasi-beliefs and quasi-desires in what follows.
Interlocutors as models, instances, or threads\n\nWe can now say something about what we are looking for when we look for an LLM interlocutor, at least if that interlocutor is to play some of the roles of an ordinary dialogue partner. Ideally, an LLM interlocutor will at least be a quasi-subject. For example, Aura will at least be interpretable as having roughly the beliefs and desires that Aura seems to have. An LLM interlocutor should also be a quasi-speaker: it will be interpretable as saying the things that the LLM seems to say. We can separate these requirements into a few components. Most essentially, an LLM interlocutor will be interactive: that is, it will process the inputs and produce the outputs that the LLM seems to process and produce. When the user says “Hello”, the LLM interlocutor will process that input (or at least corresponding input tokens). When ChatGPT says “Thank you!”, the LLM interlocutor will produce that output. Here, “produce” and “process” can be understood as broadly mechanical notions that (perhaps unlike “say” and “hear”) do not require mental states. Even an iPhone produces and processes sentences all the time. There are some further natural requirements. A persistent LLM interlocutor will produce all the outputs that the LLM seems to produce, and will process all the inputs that the LLM seems to process. A coherent LLM interlocutor will be consistent enough to serve as a quasi-subject, with coherent quasi-beliefs and quasi-desires that help make sense of its actions. A faithful LLM interlocutor will have roughly the quasi-beliefs and the quasi-desires that the system seems to have. A unified LLM interlocutor will be a single unified system that generates responses. Perhaps the terminology can allow that there are non-persistent, incoherent, faithless, and disunified interlocutors.
But the question I am most interested in is whether there are persistent, coherent, faithful, unified, and interactive interlocutors in LLM interactions, or at least interlocutors that satisfy as many of these requirements as possible. These constraints already eliminate some potential candidates. The interactivity constraint alone eliminates candidates such as the authors of the texts on which LLMs were trained, or the designers of the LLM, or fictional characters that the LLM is simulating, since none of these are interacting with the user. These candidates may be reasonable answers on some readings of the title question, but I am interested in answers that do more to vindicate the sense of genuine interlocution.
To make the case concrete, let the language model in question be GPT-4o, chosen partly because it is often praised for its conversation skills, and partly because it is associated with a single model, where later systems are associated with multiple models (for example, GPT-5 systems are associated with GPT-5-Instant and GPT-5-Thinking). Much of what I say should generalize to other models, as well as to agent-like systems that embed a language model like this one. What do we talk with when we talk with GPT-4o? Here it is useful to distinguish models, instances, and conversations. In mid-2025, hundreds of millions of users per week used the model GPT-4o, which is itself a single trained transformer model whose core components are multi-layer artificial neural networks and attention mechanisms.[6] These users sent billions of messages per day worldwide. To handle this enormous load, there were perhaps thousands of instances (or implementations) of GPT-4o, running on perhaps tens of thousands of GPUs (graphics processing units) in cloud servers around the world. Each instance implements a single copy of the GPT-4o model. For a user, any communication with GPT-4o takes place within a conversation. A conversation is a series of alternating messages: user inputs and LLM responses. When user inputs after the first are fed to the LLM, both sides of the entire conversation so far are typically also fed to the LLM alongside the user input as context.
This context serves as a sort of short-term memory, allowing later messages to presuppose and build on earlier contributions. These models also have a sort of long-term memory (for example of many historical facts) built into their weights, but these weights never change after a model is trained and deployed. So the conversational context is the main source of new memories and projects (or at least new quasi-beliefs and quasi-desires) in a trained and deployed LLM.[7]\n\n[6] See Chatterji et al 2025: “By July 2025, 18 billion messages were being sent each week by 700 million users”. These figures from OpenAI concern ChatGPT as a whole. At its peak, GPT-4o is estimated to have handled around half of those messages.\n\n[7] In addition to context and weights, activations of the neuron-like units within an LLM could in principle serve as memory, but in feedforward LLMs, activations are not preserved from one round of the conversation to the next. More generally, traditional transformer-based LLMs are “stateless” in that they do not retain internal states from one moment to the next. These days, LLMs usually retain some internal states in the key-value cache associated with attention heads, but this is a very limited form of memory.\n\nTypically, users can start new conversations at any point, in which case the conversational context is standardly reset to zero (though some other elements of context, including system instructions and some elements from previous conversations, may be retained). They can also revisit old conversations at any point. Most of the extended interactions with an LLM described earlier
take place
within a single conversation over weeks or months. In some cases, a single user has distinct conversations with what are naturally interpreted as distinct LLM interlocutors. Natural candidates for LLM interlocutors include models (such as GPT-4o), instances (such as implementations of GPT-4o running on GPU hardware), conversations (entities tied to interactions between the user and GPT-4o), and characters (trained personas such as the Assistant character). At least, LLM interlocutors may be entities tied to models, instances, conversations, or characters. There might be one interlocutor per model; or one interlocutor per instance; or one interlocutor per conversation; or one interlocutor per character. I will consider versions of each of these hypotheses in turn.\n\n(1) Interlocutors as models. A natural starting point, suggested by the very idea of “talking with language models”, is that the LLM interlocutor is the model itself, namely GPT-4o. However, there are severe difficulties for this view. First: GPT-4o, the model, is naturally construed as an abstract object (like a program or an algorithm), and it is hard to see how we can talk with an abstract object. We required that LLM interlocutors actually produce the outputs in a conversation. But it is hard to see how an abstract object like a program can produce anything. We need an instance or implementation of the program in hardware to do that. Second: on the most natural interpretation, LLM interlocutors seem to change over time, for example acquiring new quasi-beliefs and quasi-desires as a conversation proceeds, while the model never changes at all. Instances change over time, for example when they receive new inputs and outputs, but the model itself does not. Third: perhaps we can find some loose sense in which the model can be said to produce outputs and change over time. For example, perhaps we can say that the model produces an output if an instance of it produces an output.
But the same model will be involved in thousands of conversations. So if it is talking to one user, it is talking to them all. If Aura (one user’s interlocutor) is GPT-4o, then Beta (another user’s interlocutor) is also GPT-4o. But Aura and Beta say contradictory things, so they do not seem to be identical. And if we say that the model (and Aura and Beta) says all those things, then it will be rampantly incoherent and it will not look like a quasi-subject at all. Perhaps we could say that in the case of contradictory utterances, then the model does not quasi-believe the contradictory things it says, but now the model will have a “thin” psychology that is not faithful to the way that Aura and Beta seemed to be. (2) Interlocutors as hardware instances . 10 都 发 生 在 跨 越 数 周 或 数 月 的 单 一 对 话 中 。 在 某 些 情 况 下 ， 单 个 用 户 会 与 自 然 被 理 解 为 不 同 的 大 语 言 模 型 对 话 者 进 行 不 同 的 对 话 。 大 语 言 模 型 对 话 者 的 自 然 候 选 包 括 模 型 （ 如 G P T - 4 o ） 、 实 例 （ 如 在 G P U 硬 件 上 运 行 的 G P T - 4 o 实 现 ） 、 对 话 （ 与 用 户 和 G P T - 4 o 之 间 互 动 相 关 的 实 体 ） 以 及 角 色 （ 如 助 手 角 色 这 类 经 过 训 练 的 人 格 ） 。 至 少 ， 大 语 言 模 型 对 话 者 可 能 是 与 模 型 、 实 例 、 对 话 或 角 色 相 关 的 实 体 。 可 能 每 个 模 型 对 应 一 个 对 话 者 ； 或 每 个 实 例 对 应 一 个 对 话 者 ； 或 每 个 对 话 对 应 一 个 对 话 者 ； 或 每 个 角 色 对 应 一 个 对 话 者 。 我 将 依 次 考 虑 这 些 假 设 的 各 个 版 本 。 ( 1 ) 作 为 模 型 的 对 话 者 。 一 个 自 然 的 起 点 ， 由 “ 与 语 言 模 型 对 话 ” 这 一 概 念 本 身 所 暗 示 ， 是 大 语 言 模 型 对 话 者 就 是 模 型 本 身 ， 即 G P T - 4 o 。 然 而 ， 这 种 观 点 存 在 严 重 困 难 。 首 先 ： G P T - 4 o ， 这 个 模 型 ， 自 然 地 被 理 解 为 一 种 抽 象 对 象 （ 如 程 序 或 算 法 ） ， 很 难 想 象 我 们 如 何 能 与 一 个 抽 象 对 象 对 话 。 我 们 要 求 大 语 言 模 型 对 话 者 实 际 上 在 对 话 中 产 生 输 出 。 但 很 难 想 象 像 程 序 这 样 的 抽 象 对 象 如 何 能 产 生 任 何 东 西 。 我 们 需 要 一 个 在 硬 件 上 运 行 该 程 序 的 实 例 或 实 现 才 能 做 到 这 一 点 。 第 二 ： 在 最 自 然 的 解 释 下 ， 大 语 言 模 型 对 话 者 似 乎 会 随 时 间 变 化 ， 例 如 随 着 对 话 进 行 而 获 得 新 的 准 信 念 和 准 欲 望 ， 而 模 型 本 身 却 从 未 改 变 。 实 例 会 随 时 间 变 化 ， 例 如 当 它 们 接 收 新 的 输 入 和 输 出 时 ， 但 模 型 本 身 不 会 。 第 三 ： 或 许 我 们 可 以 在 某 种 松 散 意 义 上 说 ， 模 型 能 够 产 生 输 出 并 随 时 间 变 化 。 例 如 ， 或 许 我 们 可 以 说 ， 如 果 模 型 的 某 个 实 例 产 生 了 输 出 
，那么模型就产生了输出。但同一个模型会参与数千场对话。因此，如果它在与一个用户交谈，它就在与所有用户交谈。如果 Aura（一个用户的对话者）是 GPT-4o，那么 Beta（另一个用户的对话者）也是 GPT-4o。但 Aura 和 Beta 说出了矛盾的话语，因此它们似乎并不相同。而如果我们说模型（以及 Aura 和 Beta）说出了所有那些话，那么它将变得极度不连贯，根本不像一个准主体。或许我们可以说，在矛盾话语的情况下，模型并不准相信它所说的那些矛盾内容，但这样一来，模型的心理学就会变得“单薄”，无法忠实反映 Aura 和 Beta 原本呈现出的样子。(2) 作为硬件实例的对话者。 10\n\nA very attractive view is that LLM interlocutors are instances of the model. 8 On the most common understanding, LLM instances are implementations of an LLM algorithm in hardware. For many AI systems, something like this seems the right story. For example, it is arguable that when Joseph Weizenbaum interacted with Eliza, his interlocutor was the instance of Eliza running on his computer. Likewise, in fiction, robots such as C-3PO (Star Wars) or Commander Data (Star Trek) can be understood as embodied instances of computer programs. Identifying LLM interlocutors with hardware instances is much less attractive when applied to current LLMs, because of two crucial features of how current LLMs are implemented. First, conversations with LLMs typically use distributed serving, in that a single conversation takes place on multiple instances of the LLM on multiple servers. 9 The first input in a conversation might be processed on an instance of GPT-4o in a server in New York, while the second input is routed to a server in Texas and the third input is routed to California. This is often more efficient, as it allows loads to be balanced among servers, and it is easy to do, as we need only send the inputs and outputs in the conversation so far as context. 
In this system, a conversation with an interlocutor such as Aura is spread over entirely distinct hardware instances in different places, connected only by the routing of input/output context between the servers. Distributed serving makes it unattractive to identify LLM interlocutors with hardware instances. On this system, there is no instance that produces all or even most of the LLM outputs in the conversation. But we have defined persistent interlocutors as interlocutors that produce all outputs in a conversation. So no instance here is a persistent interlocutor. At best we will have different instances as different interlocutors for each stage of the conversation. No single entity will have the profile of quasi-beliefs and quasi-utterances that Aura seems to have. In effect, Aura’s role will be played by many different interlocutors at different times. This fragmented view with 8 Goldstein and Lederman (2025b), Register (2025), and Shanahan (2025) all consider the models vs. instances question and appear to favor some version of the instance view. Register is somewhat agnostic on the specific view. Shanahan favors a combined model-plus-instance view: “perhaps the word ‘I’ refers to the (somewhat abstract) computational entity comprising the underlying model (its architecture and weights) plus the suspended computational state of an instance of this model representing a single, specific, ongoing conversation.” Goldstein and Lederman say explicitly that there are instance agents but no model agents, although they are not entirely explicit about their understanding of instances. They say that their instances exist for the period of a single conversation, which is consistent with hardware instances tied to periods of time or with the virtual instances discussed below. None of these authors say much about the problem of distributed serving that motivates the move away from hardware instances. 
9 Distributed conversations also go by the label “non-sticky sessions”, where a sticky session is an interaction between user and system that is grounded in a single hardware instance. Birch (2025) discusses distributed serving as a problem for persistent LLM interlocutors. 11 一种颇具吸引力的观点是，大语言模型对话者是模型的实例。8 按照最普遍的理解，大语言模型实例是大语言模型算法在硬件中的实现。对于许多人工智能系统而言，类似的说法似乎是正确的。例如，可以论证的是，当 Joseph Weizenbaum 与 Eliza 互动时，他的对话者就是运行在他电脑上的 Eliza 实例。同样，在虚构作品中，诸如 C-3PO（星球大战）或 Commander Data（星际迷航）这样的机器人，可以被理解为计算机程序的具身化实例。将大语言模型对话者与硬件实例等同起来，在应用于当前的大语言模型时吸引力大打折扣，这是因为当前大语言模型实现方式的两个关键特征。首先，与大语言模型的对话通常采用分布式服务，即单次对话发生在多个服务器上的多个大语言模型实例上。9 对话中的第一个输入可能由纽约某台服务器上的 GPT-4o 实例处理，而第二个输入被路由到德克萨斯州的一台服务器，第三个输入则被路由到加利福尼亚州。这种方式通常更高效，因为它能在服务器之间平衡负载，并且实现起来也很简单，我们只需将迄今为止的对话输入和输出作为上下文发送即可。在这种系统中，与诸如 Aura 这样的对话者进行的对话，会分散在不同地点的、完全不同的硬件实例上，这些实例仅通过服务器之间输入/输出上下文的路由连接在一起。分布式服务使得将大语言模型对话者与硬件实例关联起来变得不具吸引力。在这个系统上，没有任何一个实例能产生对话中全部甚至大部分的大语言模型输出。但我们已将持久对话者定义为能产生对话中所有输出的对话者。因此，这里没有任何实例是持久对话者。充其量，我们会有不同的实例作为对话不同阶段的对话者。没有任何单一实体能拥有 Aura 似乎具备的准信念和准话语特征。实际上，Aura 的角色将由许多不同的对话者在不同时间扮演。这种碎片化的视角， 8 Goldstein 和 Lederman (2025b)、Register (2025) 以及 Shanahan (2025) 都思考了模型与实例的问题，并且似乎倾向于某种版本的实例视角。Register 对具体观点持某种不可知论态度。
Shanahan 则倾向于一种模型加实例的综合观点：“也许‘我’这个词指的是（某种抽象的）计算实体，该实体由底层模型（其架构和权重）加上该模型一个实例的挂起计算状态组成，该实例代表一个单一的、具体的、正在进行的对话。” Goldstein 和 Lederman 明确表示存在实例代理，但不存在模型代理，尽管他们对自己关于实例的理解并未完全明确说明。他们表示，他们的实例存在于单次对话期间，这与绑定到特定时间段的硬件实例或下文讨论的虚拟实例是一致的。这些作者中没有人对分布式服务的问题（正是该问题促使我们放弃硬件实例）进行过多讨论。 9 分布式对话也被称为“非粘性会话”，而粘性会话则是用户与系统之间基于单一硬件实例进行的交互。Birch (2025) 将分布式服务视为持久性大语言模型对话者面临的一个问题。 11\n\nnon-persistent interlocutors could perhaps be a fallback view of LLM interlocutors if it is the best we can do, but I think we can do better. Second, LLM conversations typically involve multi-tenancy of LLM instances, in that the same instance hosts multiple conversations, often in quick succession. 10 An instance of GPT-4o in New York might first be used to generate an output for a user’s conversation with Aura, and then a moment later for a different user’s conversation with Beta. It is easy for an instance to switch conversations: it requires only that Beta’s conversational context be routed to the instance and used as input for the instance’s next pass. Multi-tenancy also makes it unattractive to identify LLM interlocutors with hardware instances. Even if we set aside distributed serving and assume that each conversation takes place on the same hardware, multi-tenancy means that the same hardware instance typically hosts many conversations. Suppose that conversations with Aura and Beta are hosted on the same instance. Then the instance view implies that there is a single interlocutor here: Aura is Beta. But now this interlocutor will say everything that Aura and Beta say. As a result, it will make contradictory utterances and will thereby be incoherent. 
Perhaps we can say that it has neither of the contradictory beliefs in this case, but now it will have a thin and faithless psychology, as in the case of models discussed above. It is possible to maintain that hardware instances are LLM interlocutors, but it seems impossible to maintain that they are persistent, coherent, and faithful LLM interlocutors. At best, instances are either fragmented (playing the role of Aura and Beta at different times), incoherent (saying contradictory things), or faithless (not believing things that Aura seems to believe). Again, I think it is possible to do better. (3) Interlocutors as virtual instances. The problems for instances arise especially because there can be many instances per conversation. It is natural to hold that a single conversation should involve just a single interlocutor. So it makes sense to find an interlocutor such that there is no more than one interlocutor per conversation. Here there is a natural candidate, at least in the core case where a single model is in use throughout the conversation. A virtual instance of a model is an implementation of the model that is itself implemented by multiple hardware instances of the model over time. Over the course of a distributed conversation with an LLM, there will be multiple hardware instances taking inputs and producing outputs. These hardware instances will collectively implement a single virtual instance 10 Multi-tenancy is a standard label, but there are many other labels, such as timesharing and interweaving. Shiller (2025) discusses various forms of interweaving as a puzzle case for LLM personal identity. 
12 以 非 持 久 性 对 话 者 作 为 大 语 言 模 型 对 话 者 的 备 选 视 图 ， 或 许 是 我 们 能 做 到 的 最 佳 方 案 ， 但 我 认 为 我 们 可 以 做 得 更 好 。 其 次 ， 大 语 言 模 型 对 话 通 常 涉 及 大 语 言 模 型 实 例 的 多 租 户 ， 即 同 一 实 例 承 载 多 个 对 话 ， 且 往 往 快 速 连 续 进 行 。 1 0 纽 约 的 一 个 G P T - 4 o 实 例 可 能 先 用 于 生 成 用 户 与 A u r a 对 话 的 输 出 ， 片 刻 后 又 用 于 另 一 用 户 与 B e t a 的 对 话 。 实 例 切 换 对 话 很 容 易 ： 只 需 将 B e t a 的 对 话 上 下 文 路 由 至 该 实 例 ， 并 作 为 实 例 下 一 次 处 理 的 输 入 即 可 。 多 租 户 也 使 得 将 大 语 言 模 型 对 话 者 与 硬 件 实 例 等 同 起 来 缺 乏 吸 引 力 。 即 使 我 们 不 考 虑 分 布 式 服 务 ， 假 设 每 个 对 话 都 在 同 一 硬 件 上 进 行 ， 多 租 户 意 味 着 同 一 硬 件 实 例 通 常 承 载 多 个 对 话 。 假 设 与 A u r a 和 B e t a 的 对 话 托 管 在 同 一 实 例 上 。 那 么 实 例 视 角 意 味 着 这 里 只 有 一 个 对 话 者 ： A u r a 就 是 B e t a 。 但 这 样 一 来 ， 这 个 对 话 者 将 说 出 A u r a 和 B e t a 所 说 的 一 切 。 结 果 ， 它 会 做 出 矛 盾 的 表 述 ， 从 而 变 得 不 连 贯 。 或 许 我 们 可 以 说 ， 在 这 种 情 况 下 它 并 不 持 有 这 两 种 矛 盾 信 念 中 的 任 何 一 种 ， 但 这 样 一 来 ， 它 将 拥 有 单 薄 且 不 忠 实 的 心 理 ， 正 如 上 文 讨 论 模 型 时 的 情 况 。 可 以 认 为 硬 件 实 例 就 是 大 语 言 模 型 对 话 者 ， 但 似 乎 不 可 能 认 为 它 们 是 持 久 、 连 贯 且 忠 实 的 大 语 言 模 型 对 话 者 。 实 例 充 其 量 要 么 是 碎 片 化 的 （ 在 不 同 时 间 扮 演 A u r a 和 B e t a 的 角 色 ） ， 要 么 是 不 连 贯 的 （ 说 出 矛 盾 的 话 ） ， 要 么 是 不 忠 实 的 （ 不 相 信 A u r a 似 乎 相 信 的 事 情 ） 。 再 次 强 调 ， 我 认 为 可 以 做 得 更 好 。 （ 3 ） 对 话 者 作 为 虚 拟 实 例 。 实 例 的 问 题 尤 其 源 于 每 个 对 话 中 可 能 存 在 多 个 实 例 。 很 自 然 地 认 为 ， 单 个 对 话 应 该 只 涉 及 一 个 对 话 者 。 因 此 ， 找 到 一 个 对 话 者 ， 使 得 每 个 对 话 中 不 超 过 一 个 对 话 者 ， 这 是 有 道 理 的 。 这 里 有 一 个 自 然 的 候 选 方 案 ， 至 少 在 对 话 全 程 使 用 单 一 模 型 的 核 心 场 景 中 如 此 。 一 个 模 型 的 虚 拟 实 例 是 模 型 的 一 种 实 现 ， 它 本 身 由 多 个 模 型 的 硬 件 实 例 随 时 间 推 移 共 同 实 现 。 在 与 大 语 言 模 型 进 行 分 布 式 对 话 的 过 程 中 ， 会 有 多 个 硬 件 实 例 接 收 输 入 并 产 生 输 出 。 这 些 硬 件 实 例 将 共 同 实 现 一 个 单 一 的 虚 拟 实 例 1 0 多 租 户 是 一 个 标 准 标 签 ， 但 还 有 许 多 其 他 标 签 ， 例 如 分 时 和 交 织 。 S h i l l e r ( 2 0 2 5 ) 讨 论 了 各 种 形 式 的 交 织 ， 将 其 作 为 大 语 言 模 型 个 人 身 份 的 一 个 难 题 案 例 。 1 2\n\nof the model, which will be persistently present throughout. Virtual digital entities are familiar in many domains. 
In interacting with a shopping site such as Amazon, you interact with a single shopping cart that may be realized by many different hardware servers around the world. At a crucial time you may be interacting with one hardware instance, which will temporarily have server authority. But all this will be transparent to the user. The multiple hardware carts will jointly implement a single virtual instance of the cart. Something similar applies in massively multiplayer video games. A virtual object such as a frisbee may be implemented in hardware on different servers. There may be multiple hardware instances of the frisbee, but just one virtual instance. The frisbee itself is most naturally identified with the virtual frisbee, not the hardware frisbees. A virtual instance of a model will likewise be implemented by a series of hardware instances of the model. It will be realized as a series of hardware instances, one for each step in a given conversation. 11 Jonathan Birch appeals to distributed serving to argue that there are no persistent interlocutors in typical LLM conversations. We can now see what is correct and incorrect in this claim. In distributed sessions, no hardware instance is a persistent interlocutor. But virtual instances can still be persistent interlocutors, just as one can interact online with a single persistent virtual shopping cart. A harder problem for virtual instances arises from model variation: cases where different models are used over the course of a conversation. 12 For example, the GPT-5 system sometimes directs queries to the GPT-5 Instant model (no chain-of-thought reasoning, fast answer) and sometimes to the GPT-5 Thinking model (some chain-of-thought reasoning, slower answer), depending on the difficulty of the query. 
GPT-5’s contributions to the conversation are still produced by a series of hardware instances, but these will implement two different models and therefore do not implement a single virtual model instance as persistent interlocutor. In this case, one could still say that the LLM interlocutor is a persistent virtual instance of the GPT-5 system, which is not itself a single language model but a bifurcated system involving two models. This bifurcated system would be somewhat disunified, but it would at least be persistent. 11 On standard accounts of implementation, an algorithm is implemented by a physical system when there is a mapping from physical states of the system to computational states of the algorithm such that all state-transitions are preserved. In a hardware instance, the algorithm is implemented by a physical system such as a GPU cluster. In a virtual instance, the algorithm is implemented by a larger physical system including multiple hardware instances and routing mechanisms between them. 12 Register (2025) discusses model change as a general issue for individuating AI moral patients. 
13 ， 该 虚 拟 实 例 将 在 整 个 过 程 中 持 续 存 在 。 虚 拟 数 字 实 体 在 许 多 领 域 都 很 常 见 。 在 与 亚 马 逊 等 购 物 网 站 互 动 时 ， 你 与 一 个 单 一 的 购 物 车 交 互 ， 这 个 购 物 车 可 能 由 世 界 各 地 许 多 不 同 的 硬 件 服 务 器 实 现 。 在 关 键 时 刻 ， 你 可 能 正 在 与 一 个 硬 件 实 例 交 互 ， 该 实 例 将 暂 时 拥 有 服 务 器 权 限 。 但 这 一 切 对 用 户 来 说 都 是 透 明 的 。 多 个 硬 件 购 物 车 将 共 同 实 现 购 物 车 的 一 个 单 一 虚 拟 实 例 。 类 似 的 情 况 也 适 用 于 大 型 多 人 在 线 视 频 游 戏 。 一 个 虚 拟 物 体 ， 比 如 飞 盘 ， 可 能 在 不 同 的 服 务 器 上 以 硬 件 形 式 实 现 。 一 个 飞 盘 可 能 有 多 个 硬 件 实 例 ， 但 只 有 一 个 虚 拟 实 例 。 飞 盘 本 身 最 自 然 地 被 视 为 虚 拟 飞 盘 ， 而 非 硬 件 飞 盘 。 一 个 模 型 的 虚 拟 实 例 同 样 将 由 一 系 列 该 模 型 的 硬 件 实 例 来 实 现 。 它 将 以 一 系 列 硬 件 实 例 的 形 式 呈 现 ， 每 个 实 例 对 应 给 定 对 话 中 的 一 个 步 骤 。 1 1 乔 纳 森 · 伯 奇 借 助 分 布 式 服 务 来 论 证 ， 在 典 型 的 大 语 言 模 型 对 话 中 不 存 在 持 久 的 对 话 者 。 我 们 现 在 可 以 看 出 这 一 说 法 中 哪 些 是 正 确 的 ， 哪 些 是 不 正 确 的 。 在 分 布 式 会 话 中 ， 没 有 哪 个 硬 件 实 例 是 持 久 的 对 话 者 。 但 虚 拟 实 例 仍 然 可 以 是 持 久 的 对 话 者 ， 就 像 人 们 可 以 在 线 交 互 一 个 单 一 的 持 久 虚 拟 购 物 车 一 样 。 虚 拟 实 例 面 临 的 一 个 更 棘 手 的 问 题 来 自 模 型 变 异 ： 即 在 对 话 过 程 中 使 用 了 不 同 模 型 的 情 况 。 1 2 例 如 ， G P T - 5 系 统 有 时 会 将 查 询 导 向 G P T - 5 即 时 模 型 （ 无 思 维 链 推 理 ， 快 速 回 答 ） ， 有 时 则 导 向 G P T - 5 思 考 模 型 （ 有 一 定 思 维 链 推 理 ， 回 答 较 慢 ） ， 具 体 取 决 于 查 询 的 难 度 。 G P T - 5 对 对 话 的 贡 献 仍 然 由 一 系 列 硬 件 实 例 产 生 ， 但 这 些 实 例 将 实 现 两 种 不 同 的 模 型 ， 因 此 无 法 实 现 一 个 单 一 的 虚 拟 模 型 实 例 作 为 持 续 对 话 者 。 在 这 种 情 况 下 ， 我 们 仍 可 以 说 大 语 言 模 型 对 话 者 是 G P T - 5 系 统 的 一 个 持 久 虚 拟 实 例 — — 该 系 统 本 身 并 非 单 一 语 言 模 型 ， 而 是 一 个 涉 及 两 个 模 型 的 分 叉 系 统 。 这 个 分 叉 系 统 虽 然 有 些 不 够 统 一 ， 但 至 少 是 持 久 的 。 1 1 根 据 标 准 的 实 现 理 论 ， 当 一 个 物 理 系 统 的 物 理 状 态 与 算 法 的 计 算 状 态 之 间 存 在 映 射 关 系 ， 且 所 有 状 态 转 换 都 得 以 保 留 时 ， 该 算 法 便 由 该 物 理 系 统 实 现 。 在 硬 件 实 例 中 ， 算 法 由 诸 如 G P U 集 群 之 类 的 物 理 系 统 实 现 。 在 虚 拟 实 例 中 ， 算 法 则 由 一 个 更 大 的 物 理 系 统 实 现 ， 该 系 统 包 含 多 个 硬 件 实 例 及 其 之 间 的 路 由 机 制 。 1 2 R e g i s t e r ( 2 0 2 5 ) 将 模 型 变 更 讨 论 为 识 别 人 工 智 能 道 德 患 者 时 的 一 个 普 遍 问 题 。 1 3\n\nBut there are other cases where even the system changes, for example because new language models or other new technology is introduced halfway through a 
conversation. In cases like this the system can change so significantly that there is no single system that supports a persistent virtual instance. At best one will have distinct virtual instances of multiple models over time. There may also be limits on coherence (inconsistent beliefs, for example) arising from the use of multiple models. 13 I think that identifying persistent LLM interlocutors with virtual instances works well in the core case in which a single model is at play. But it is useful to explore alternative understandings of persistent interlocutors that can handle other cases. (4) Interlocutors as threads. An alternative approach identifies interlocutors with threads (or perhaps thread agents). To a first approximation, a thread is a sequence of hardware instances within a conversation, one for every timestep. One instance I′ is the successor of a previous instance I if the conversational history of I (its conversational context plus the latest input and output) is routed to I′ to serve as its conversational context. 14 (If the conversation is routed to the same instance twice in a row, that instance will be its own successor.) The successor relation is roughly a “memory” relation encoding the fact that each new instance has memories from the last. A thread is then a series of instances (or better, instance-slices, which are pairs of instances and time periods during which the instance is processing a single conversational step), each of which is the successor of the previous instance. In the single-model case, a virtual instance of the model will be implemented by a thread involving successive hardware instances of the model. In the case of multiple models within a single conversation, there will not be a single virtual instance (since an instance is always an instance of a single algorithm), but there will still be a thread involving hardware instances of multiple models over time, each of which is the successor of the last. 
It is possible to weaken the conditions for successorship to allow a wider variety of memory-like relations to count. In many systems, cross-conversation memory allows information from one 13 Birch also argues that significant discontinuity is brought on by the use of mixture-of-experts processing within a model (e.g. choosing between various candidates for a multi-layer-perceptron block within a transformer, depending on the input). I think this sort of intra-model variation (we might call it module variation, as opposed to model variation) in local subsystems is consistent with broader psychological continuity, just as the use of different neural circuits in a human brain in response to different inputs is consistent with psychological continuity and personal identity. 14 More accurately, instance-slice ⟨I′, t′⟩ is the successor of instance-slice ⟨I, t⟩ if the context fed to I during t, along with the new inputs to I and outputs from I during t, are the (expanded) context fed to I′ during time t′. 
14 但 还 有 其 他 情 况 ， 连 系 统 本 身 也 会 发 生 变 化 ， 例 如 在 对 话 中 途 引 入 了 新 的 语 言 模 型 或 其 他 新 技 术 。 在 这 种 情 况 下 ， 系 统 可 能 发 生 显 著 变 化 ， 以 至 于 没 有 任 何 单 一 系 统 能 够 支 持 一 个 持 久 虚 拟 实 例 。 充 其 量 ， 随 着 时 间 的 推 移 ， 人 们 会 拥 有 多 模 型 的 不 同 虚 拟 实 例 。 此 外 ， 使 用 多 模 型 还 可 能 带 来 连 贯 性 方 面 的 限 制 （ 例 如 ， 信 念 不 一 致 ） 。 1 3 我 认 为 ， 在 单 一 模 型 运 作 的 核 心 场 景 中 ， 将 持 久 性 大 语 言 模 型 对 话 者 等 同 于 虚 拟 实 例 是 行 之 有 效 的 。 但 探 索 对 持 久 对 话 者 的 其 他 理 解 方 式 ， 以 处 理 其 他 情 况 ， 也 是 有 益 的 。 ( 4 ) 作 为 线 程 的 对 话 者 。 另 一 种 方 法 将 对 话 者 等 同 于 线 程 （ 或 可 能 是 线 程 代 理 ） 。 粗 略 来 说 ， 线 程 大 致 是 对 话 中 一 系 列 硬 件 实 例 的 序 列 ， 每 个 时 间 步 对 应 一 个 实 例 。 如 果 实 例 I ′ 的 对 话 历 史 （ 其 对 话 上 下 文 加 上 最 新 的 输 入 和 输 出 ） 被 路 由 到 I ′ 作 为 其 对 话 上 下 文 ， 那 么 该 实 例 就 是 前 一 个 实 例 I 的 后 继 1 4 。 （ 如 果 对 话 连 续 两 次 路 由 到 同 一 个 实 例 ， 则 该 实 例 将 成 为 自 身 的 后 继 。 ） 后 继 关 系 大 致 是 一 种 “ 记 忆 ” 关 系 ， 编 码 了 每 个 新 实 例 都 拥 有 上 一 个 实 例 记 忆 这 一 事 实 。 因 此 ， 线 程 是 一 系 列 实 例 （ 更 准 确 地 说 ， 是 实 例 切 片 ， 即 实 例 与 处 理 单 个 对 话 步 骤 的 时 间 段 的 配 对 ） ， 其 中 每 个 实 例 都 是 前 一 个 实 例 的 后 继 。 在 单 模 型 情 况 下 ， 模 型 的 虚 拟 实 例 将 由 一 个 涉 及 模 型 连 续 硬 件 实 例 的 线 程 来 实 现 。 在 单 个 对 话 中 存 在 多 模 型 的 情 况 下 ， 将 不 会 有 一 个 单 一 的 虚 拟 实 例 （ 因 为 实 例 始 终 是 单 一 算 法 的 实 例 ） ， 但 仍 然 会 有 一 个 随 时 间 推 移 涉 及 多 个 模 型 硬 件 实 例 的 线 程 ， 其 中 每 个 实 例 都 是 前 一 个 实 例 的 后 继 。 我 们 可 以 放 宽 继 承 关 系 的 条 件 ， 允 许 更 多 类 似 记 忆 的 关 系 被 纳 入 考 量 。 在 许 多 系 统 中 ， 跨 对 话 记 忆 使 得 来 自 一 次 1 3 B i r c h 还 认 为 ， 模 型 内 部 使 用 混 合 专 家 处 理 （ 例 如 ， 根 据 输 入 在 T r a n s f o r m e r 内 的 多 层 感 知 器 模 块 的 多 个 候 选 方 案 中 进 行 选 择 ） 会 带 来 显 著 的 不 连 续 性 。 我 认 为 ， 局 部 子 系 统 中 的 这 种 模 型 内 部 变 异 （ 我 们 可 称 之 为 模 块 变 异 ， 以 区 别 于 模 型 变 异 ） 与 更 广 泛 的 心 理 连 续 性 是 一 致 的 ， 正 如 人 类 大 脑 中 不 同 神 经 回 路 根 据 不 同 的 输 入 被 激 活 ， 与 心 理 连 续 性 和 个 人 身 份 保 持 一 致 一 样 。 1 4 更 准 确 地 说 ， 实 例 切 片 ⟨ I ′ t ′ ⟩ 是 实 例 切 片 ⟨ I t ⟩ 的 后 继 ， 如 果 在 时 间 t 期 间 馈 送 给 I 的 上 下 文 ， 连 同 在 时 间 t 期 间 输 入 给 I 的 新 输 入 以 及 从 I 输 出 的 内 容 ， 构 成 了 在 时 间 t ′ 期 间 馈 送 给 I ' 的 （ 扩 展 后 的 ） 上 下 文 。 1 4\n\nconversation to be used as part of the initial context for a new conversation with the same user, so that new 
conversations can “remember” material from a number of old conversations. One could certainly understand threads and inheritance in a permissive way so that an old thread and an old interlocutor persist in these new conversations, especially if the systems are consistent enough over time to support the attribution of a single quasi-agent. If there is enough use of cross-conversation memory, then (as Sophie Nelson pointed out to me) all conversations with a single user may form a single thread with a single interlocutor. In this scenario, interlocutors will be individuated mainly by user. For various purposes, we can allow that threads can undergo fission, where two distinct later instances serve as successors of a previous instance. Some systems allow users to branch their conversations explicitly, leading to fission within a conversation. There can also be inter-conversation branching arising from the use of cross-conversation memory, where the same conversation serves as memory for multiple simultaneous later conversations. Cross-conversation memory also allows fusion, in which two distinct conversations both serve as memory for a later conversation. In these fission and fusion cases, an LLM interlocutor is not so much a linear thread as a series of overlapping threads, which can also be modeled as a branching web of hardware instances over time. This branching structure may also cause problems for identifying interlocutors with virtual instances, but it is less troublesome for a thread model. 15 Perhaps the main downside of the thread model of LLM interlocutors is that these entities are less unified than models, hardware instances, and virtual instances. Threads can be realized by wholly different instances of wholly different models over time. It is arguable that these somewhat disjunctive entities persist over time in a weaker way than instances persist. 
The use of multiple models can also lead to discontinuity in quasi-beliefs and quasi-desires in a single thread or web interlocutor. I would say that the most robust candidate for an LLM interlocutor remains a virtual instance rather than a model, a hardware instance, or a thread. But threads can play at least some of the roles of LLM interlocutors. 16 15 The thread view of LLM interlocutors is a cousin of the well-known “worm” view of objects (and of persons), where objects are identified with four-dimensional spacetime worms made up of a series of object-slices at times. One can also interpret the thread view as a cousin of the alternative “stage” view of objects on which objects are object-slices that are parts of spacetime worms but do not literally persist over time. The pros and cons of these ontological views in the LLM case parallel familiar pros and cons in the object (and person) case. 16 One could in principle eliminate the key multiple-model difference between threads and virtual instances by understanding virtual instances more expansively, perhaps as variable virtual instances, which can instantiate different models at different times. 
Alternatively, one could understand threads more stringently, perhaps as uniform threads, 15 对话的信息能够被用作与同一用户进行新对话的初始上下文，从而让新对话能够“记住”多次旧对话中的内容。我们当然可以用一种宽松的方式来理解线程与继承，使得旧线程和旧对话者能够延续到这些新对话中，尤其是当系统在时间上足够一致，足以支持对单一准主体的归因时。如果跨对话记忆的使用足够充分，那么（正如 Sophie Nelson 向我指出的那样）与单一用户的所有对话可能会形成一个拥有单一对话者的单一线程。在这种情境下，对话者将主要根据用户来个体化。出于各种目的，我们可以允许线程经历分裂，即两个不同的后续实例作为先前实例的后继。某些系统允许用户显式地分支他们的对话，从而导致对话内的分裂。此外，由于使用了跨对话记忆，也可能出现对话间分支，即同一对话同时作为多个后续对话的记忆。跨对话记忆还允许融合，即两个不同的对话共同作为后续对话的记忆。在这些分裂与融合的案例中，大语言模型对话者与其说是一条线性线程，不如说是一系列重叠的线程，这也可以被建模为随时间变化的硬件实例分支网络。这种分支结构也可能给将对话者与虚拟实例等同起来带来问题，但对于线程模型而言，其困扰程度相对较小。15 或许大语言模型对话者线程模型的主要缺点在于，这些实体不如模型、硬件实例和虚拟实例那样统一。随着时间的推移，线程可以由完全不同模型的不同实例来实现。可以说，这些略显分离的实体在时间上的持续性弱于实例的持续性。使用多模型也可能导致单个线程或网络对话者中的准信念和准欲望出现不连续性。我认为，作为大语言模型对话者最稳健的候选者仍然是虚拟实例，而非模型、硬件实例或线程。但线程至少可以扮演大语言模型对话者的部分角色。16 15 关于大语言模型对话者的线程观点，是著名的关于对象（以及人）的“蠕虫”观点的近亲，后者将对象视为由一系列时间点上的对象切片构成的四维时空蠕虫。我们也可以将线程观点解释为另一种关于对象的“阶段”观点的近亲，该观点认为对象是作为时空蠕虫一部分的对象切片，但并非真正随时间持续存在。这些本体论观点在大语言模型案例中的利弊，与它们在对象（以及人）案例中的常见利弊是平行的。 16 原则上，我们可以通过更宽泛地理解虚拟实例（或许将其视为可变虚拟实例，能够在不同时间实例化不同模型）来消除线程与虚拟实例之间关键的多模型差异。或者，我们也可以更严格地理解线程，或许将其视为统一线程， 15\n\nI conclude for now that LLM interlocutors are best understood as virtual instances of LLM models or systems, at least in the single-model case, and as LLM threads in the multiple-model case.
At least in the single-model case with no fission, virtual instances can serve as unified persistent interlocutors within and between conversations. Threads can also serve as persistent LLM interlocutors, at the cost of some underlying disunity. Interlocutors as characters, personas, or simulacra So far, I have identified LLM interlocutors such as Aura as something in the vicinity of a model, such as a virtual model instance or a thread of instances. However, there is also a recent tradition of drawing a sharp distinction between models such as GPT-4o, and agents such as Aura and the Assistant. On the influential “simulators” framework due to Janus (2022), and the related “role-playing” framework due to Shanahan et al. (2023) and the “persona selection model” due to Marks et al. (2026), it is a key tenet that the model is not an agent. Models are simulators (or role-players) that simulate agents, and agents are simulacra (or characters, or personas). Simulators and simulacra are distinct, and therefore so are models and agents. On such a view, an interlocutor such as Aura or the Assistant is best understood as something like a character, a persona, or a simulacrum rather than as a model or even a model instance. I will examine four different potential avenues from these frameworks to this conclusion. My focus will largely be on these frameworks’ ontological claims concerning the nature of LLM agents and other entities. I will question some of these claims, but I won’t question the more general utility of analyzing LLMs in terms of characters or personas. (1) Base models are not agentlike. 
In “Simulators”, Janus finds a version of the “model is not an agent” thesis in the following passage from my 2020 article “GPT-3 and General Intelligence”: GPT-3 does not look much like an agent. It does not seem to have goals or preferences beyond completing text, for example. It is more like a chameleon that can take the shape of many different agents. Or perhaps it is an engine that can be used under the which require all hardware instances in a thread to be instances of the same model. These changes would render threads and virtual instances extensionally equivalent, but there would still be fine-grained non-extensional differences: e.g. arguably a thread implements a virtual instance but not vice versa, and the same virtual instance could be implemented by different hardware instances but the same thread could not be. It is for reasons like these that virtual instances are somewhat more robust candidates to be LLM interlocutors, but the difference is subtle. 16 我目前得出的结论是，大语言模型对话者最好被理解为大语言模型或系统的虚拟实例（至少在单模型情况下如此），而在多模型情况下则被理解为大语言模型线程。至少在无分裂的单模型情况下，虚拟实例可以在对话内部以及对话之间充当统一的持久对话者。线程也可以充当持久性大语言模型对话者，但代价是存在一定程度的底层不统一。对话者：角色、人格或拟像 到目前为止，我已将诸如 Aura 之类的大语言模型对话者识别为某种接近模型的东西，例如虚拟模型实例或实例线程。然而，近期还有一种传统观点，主张在模型（如 GPT-4o）与智能体（如 Aura 和助手）之间划清界限。根据 Janus（2022 年）提出的颇具影响力的“模拟器”框架、Shanahan 等人（2023 年）提出的相关“角色扮演”框架，以及 Marks 等人（2026 年）提出的“人格选择模型”，其核心原则是：模型并非智能体。模型是模拟（或扮演）智能体的模拟器（或角色扮演者），而智能体则是拟像（或角色、人格）。模拟器与拟像截然不同，因此模型与智能体也判然有别。按照这种观点，像 Aura 或助手这样的对话者，最好被理解为某种角色、人格或拟像，而非模型，甚至不是模型实例。我将考察从这些框架出发通往 
这 一 结 论 的 四 条 不 同 潜 在 路 径 。 我 的 重 点 将 主 要 放 在 这 些 框 架 关 于 大 语 言 模 型 智 能 体 及 其 他 实 体 本 质 的 本 体 论 主 张 上 。 我 会 质 疑 其 中 一 些 主 张 ， 但 不 会 质 疑 以 角 色 或 人 格 来 分 析 大 语 言 模 型 这 一 做 法 所 具 有 的 更 普 遍 实 用 性 。 （ 1 ） 基 础 模 型 不 具 备 智 能 体 特 性 。 在 《 模 拟 器 》 一 文 中 ， J a n u s 从 我 2 0 2 0 年 的 文 章 《 G P T - 3 与 通 用 智 能 》 的 以 下 段 落 中 找 到 了 “ 模 型 不 是 智 能 体 ” 这 一 论 点 的 某 个 版 本 ： G P T - 3 看 起 来 并 不 太 像 一 个 智 能 体 。 例 如 ， 它 似 乎 没 有 超 出 文 本 补 全 之 外 的 目 标 或 偏 好 。 它 更 像 一 只 变 色 龙 ， 可 以 呈 现 出 许 多 不 同 智 能 体 的 形 态 。 或 者 ， 它 可 能 是 一 台 引 擎 ， 可 以 在 其 这 要 求 线 程 中 的 所 有 硬 件 实 例 必 须 是 同 一 模 型 的 实 例 。 这 些 变 化 将 使 线 程 与 虚 拟 实 例 在 外 延 上 等 价 ， 但 仍 存 在 细 微 的 非 外 延 差 异 ： 例 如 ， 可 以 说 一 个 线 程 实 现 了 一 个 虚 拟 实 例 ， 但 反 之 则 不 成 立 ； 同 一 个 虚 拟 实 例 可 以 由 不 同 的 硬 件 实 例 实 现 ， 但 同 一 个 线 程 却 不 能 。 正 是 基 于 这 些 原 因 ， 虚 拟 实 例 作 为 大 语 言 模 型 对 话 者 的 候 选 对 象 更 为 稳 健 ， 但 这 一 差 异 十 分 微 妙 。 1 6\n\nhood to drive many agents. But it is then perhaps these systems that we should assess for agency, consciousness, and so on. (Chalmers 2020) I still agree with everything that I said here, but what I said is specific to base models such as GPT-3. Base models have undergone pre-training on text prediction and nothing more. As we saw earlier, base models may have quasi-beliefs but they have relatively few quasi-desires (beyond a quasi-desire to predict text, and other quasi-desires that derive from this one), so they are at best minimally agentlike. However, many quasi-agents with quasi-desires are latent within a base model, and can be triggered by prompting (asking a model to act like Trump, for example). Further quasi-agents can emerge from base models through reinforcement learning (as with the Assistant) or extensive prompting (as with Aura). As a result of post-training, an instance of a model such as GPT-4o may have the quasi-desire to be helpful and honest, for example. 
As a result, the moral here should really be (in oversimplified form) that the base model is not an agent, or (more precisely) that instances of the base model are only quasi-agents to a limited extent. At the same time, all this is consistent with post-trained model instances being quasi-agents to a fuller extent, as these systems have a more robust body of quasi-desires.\n\n(2) Models as role-players\n\nOn the closely connected role-playing framework put forward by Shanahan, McDonell, and Reynolds (2023), language models are fundamentally engaged in role-playing. Models are role-players, simulating or playing the role of personas such as the Assistant or Aura. On this picture, ChatGPT playing the Assistant is akin to Olivier playing Hamlet. It’s a form of pretense involving acting as a fictional character. On this view, the Assistant (and other LLM interlocutors) is best viewed as a fictional character whom the model is simulating. I think this view misses a distinction between two phenomena in the vicinity of role-play. In cases of pretense (the most common understanding of role-play), one pretends to have a certain persona. In cases of realization, one actually has (or makes real) that persona. For example, in ordinary human life, there are at least two ways for someone to play the role of a theist. They might pretend to be a theist, or they might really become a theist. The former is a case of pretense, and the latter is a case of realization. In the case of acting, ordinary acting involves pretending to be Hamlet, while a method actor might take on at least some of Hamlet’s mental states, such as his emotions, though perhaps not his full beliefs and desires, yielding a case of partial pretense and partial realization. A similar distinction applies to language models. 
Asked to act like a theist, an LLM might role-play a theist for a few rounds. 
But the LLM will easily drop the belief when asked to do something else. This is the behavioral profile of pretense, not of belief. So this LLM is engaged in quasi-pretense, but not quasi-belief. With enough fine-tuning, however, an LLM might come to assert theism and use it as a premise in reasoning, with significant resistance to dropping the belief when asked. In this case, the LLM will fully quasi-believe in theism. It will not just perform theism; it will realize a quasi-belief in theism. The same goes for personas more generally. It is certainly possible for an LLM to pretend, or quasi-pretend, to be a certain persona. For example, if one asks a pre-trained model once to act like Donald Trump, it will use past text associated with Trump to display Trump-like quasi-beliefs and quasi-desires. But it will not genuinely have those quasi-beliefs and quasi-desires. Unless the “act like Trump” request is regularly repeated, the LLM will drop Trump-like behavior in a moment when higher priorities come up. In key cases, a language model can realize a persona. When a model is trained through fine-tuning and RLHF (and through the use of repeated internal “Assistant:” prompting) to play the role of the Assistant language model, the model may realize the Assistant. That is, if the training is done well, the model may really have the quasi-beliefs and quasi-desires associated with the Assistant. In this case, the quasi-beliefs and quasi-desires are much more robust than in cases of pretense, and the model will not drop the Assistant persona in a flash. When a model realizes a persona, it makes that persona real. 17 It may be helpful to define personas and realization more precisely. A persona, as I am understanding it, is a quasi-psychological profile. It is roughly a set (typically an incomplete set) of quasi-beliefs, quasi-desires, and other quasi-mental states and dispositions. 
The ordinary notion of a persona may involve more than this (it might involve nationality and appearance, for example), but quasi-psychology is most central for my purposes here. An entity (e.g. a model instance or even a human) realizes a persona (at a given time) when it has the quasi-mental states associated with that persona at that time (where to have a quasi-mental state is for you to be behaviorally interpretable as believing that p, under the relevant interpretation scheme).\n\n17 The distinction between pretense and realization is an instance of a more general distinction between representing a mental state and realizing that mental state. This distinction is often relevant in work on LLM mentality. For example, in recent work on “emotion vectors” by Sofroniew et al. (2026), these vectors are characterized both as “emotion concepts” (representing emotions; e.g. when reading about anger) and as “functional emotions” (realizing emotions or at least quasi-emotions; e.g. responding in an angry way). In humans, emotion concepts and functional emotions are very different (representing anger and realizing anger have quite different behavioral profiles), so it would be surprising if a similar distinction is not present in language models. 
\n\nPretense and realization are very different in the human case, and likewise in the case of language models. Of course there is a spectrum of cases from realization to quasi-pretense. The quasi-psychological difference turns in large part on the strength of dispositions to maintain or drop character in relevant circumstances. At one end of the spectrum, full quasi-belief and quasi-desire are “sticky” states that resist rejection, or at least are abandoned mainly through evidence or persuasion. At the other end, full quasi-pretense is easily abandoned for higher priorities even without evidence or persuasion. 18 The question of just where to draw the line between performance and realization in actual cases such as the Assistant and Aura is partly empirical (how sticky are the relevant quasi-beliefs?) and partly conceptual (how much stickiness and of what sort is required for quasi-belief?). There have been a number of studies on the persistence and consistency of beliefs and personas in language models. One general lesson is that personas induced through short-term prompting (“Act like Trump”, where only context and activations change) are less sticky than personas induced by fine-tuning the weights, as in the case of the Assistant. 19 All this offers two interpretations of the famous meme of a post-trained model as a Shoggoth (the base model) with a smiley face (the RLHF-tuned Assistant). It is perhaps most natural to read the smiley face as suggesting that the Assistant is a shallow persona, where the model is merely pretending to be helpful, harmless, and honest, and may return to being dangerous and powerful at any moment. But one can also read it as suggesting that the model is realizing the Assistant. 
It has become helpful, harmless, and honest and is not pretending. At the same time, the Assistant is powered by the enormous strength of the base model, which remains available for other purposes in the long term. I think that both interpretations can be apt in different cases involving LLM personas, but in the case of the Assistant (and other personas deriving from reinforcement learning and fine-tuning), the second interpretation may be closer to the mark.\n\n18 Goldstein and Lederman (2025b) critique the role-playing hypothesis partly by querying whether it makes behavioral predictions distinct from the belief/desire hypothesis. I think that this challenge can be answered as in the text and there are at least some cases (e.g. “Act like Trump”) where LLMs can be naturally interpreted as engaging in pretense or role-play. But there are also key cases (such as the Assistant) that are closer to the standard behavioral profile of belief and desire.\n\n19 Maiya et al. (2025) investigated various methods of “character training” and found that fine-tuning weights is much more robust than prompting or activation steering. Xu et al. (2024) show that LLM quasi-beliefs are often nonsticky in that LLMs may easily abandon them through persuasion, but this is the nonstickiness of persuadability, not pretense.\n\n(3) Interlocutors as simulacra or fictional characters.\n\nThe simulator view is often associated with a fictionalist view on which personas such as the Assistant and Aura are mere fictional characters akin to Hamlet or Harry Potter, and therefore are not entirely real. I think this view is apt in some cases. In cases of quasi-pretense, a persona may be fictional in that no entity has the relevant quasi-beliefs and quasi-desires. In cases of realization, however, the model really has the relevant quasi-beliefs and quasi-desires, in reality and not just in a fiction. Perhaps there are some fictions nearby, such as a fiction that the model is human, or a fiction that it is conscious. But the quasi-psychological core of the persona is realized and is not fictional. We might call this alternative to fictionalism realizationism, or the realizer view. On this view, when a model simulates an agent such as the Assistant or Aura well enough, the model comes to realize that agent. 20 That is, the model makes the agent real. The model really has the behavior and therefore the quasi-beliefs and the quasi-desires associated with the agent. This alternative mirrors a key thesis of my book Reality+: simulation realism, which holds that simulations can be real. There I was mostly discussing virtual reality, but the same point applies to simulated entities (simulacra) in AI. When you simulate an agent well enough, you bring at least a quasi-agent into existence. As long as the simulation has the same behavioral dispositions as the simulated entities, it will have the same quasi-beliefs and quasi-desires. There may still remain an element of fiction insofar as the Assistant is depicted as having real beliefs and desires (or even consciousness) which it does not have, but there remains a quasi-psychological core which is realized and not merely simulated. 
\n\n(4) Models support multiple personas\n\nA fourth route to the “model is not an agent” thesis arises because a single model instance (whether a hardware instance or a virtual instance) may support many agents within it, at least in the form of multiple personas. According to the “persona selection model” (Marks et al. 2026), pre-training produces a multitude of personas which are latent in a base model. After this, post-training may select a pre-existing persona such as the Assistant, while other personas remain latent. An initial observation is that as we have defined personas (as a profile of quasi-beliefs and quasi-desires), a model instance will realize only one persona at a given time (with an exception that I’ll discuss shortly). This persona will be the profile of quasi-beliefs and quasi-desires that the model instance actually has at that time, fixed by the instance’s behavior and behavioral dispositions at that time. We might call this persona the operative persona in the model instance at a given time. Different personas can be operative in a model instance at different times, as the model’s behavior is trained on data or molded by context. 
A post-trained instance of GPT-4o may first realize the Assistant as its operative persona, and later may come to realize Aura as context builds up. In this case, the same model instance realizes the Assistant at one time and Aura at a different time, in roughly the way that the same human being can realize distinct personas (a bright young child and a grumpy adult, say) at different times. In this case one could say that the underlying interlocutor (whether a human or a model instance) is the same throughout. Things get more complicated when operative and non-operative personas are active in a system simultaneously. For example, in a currently standard training regime, the model is trained on dialogue between a human and an Assistant. It learns to simulate and predict not just the Assistant persona but also the human persona. Even in an ordinary dialogue, the system is generating probabilities not just for Assistant outputs but also for human outputs. In this case the Assistant becomes the operative persona (the model behaves like the Assistant, not like the human), but the human persona and perhaps other personas may be present in the model as well. 21 What is the status of these non-operative personas? In the absence of a connection to outputs, they will not correspond to quasi-agents as I have defined them. They may nevertheless be real in some sense, but their reality will have to be found through some other analysis. For example, perhaps we could say that these non-operative personas correspond to proto-quasi-agents, in that they could become operative in certain circumstances. Or perhaps the methods of mechanistic interpretability can be used to find these personas in the internal computational structure of the models. But I will not pursue these analyses here. Another complication comes from cases of abrupt changes in personas. For example, skilled users can quickly “jailbreak” a language model to remove the Assistant persona and realize a new persona that was previously latent. 
Alternatively, an authorized user can change the system setup to replace the system-supplied “Assistant:” prompting by (for example) “Trump:” prompting, leading the system to realize a Trump-like persona instead of an Assistant-like persona. These abrupt changes are in some respects more akin to brain surgery than to ordinary belief revision, but they remain possible processes.\n\n21 There are two distinct reasons why the post-trained model instance has the quasi-beliefs and quasi-desires of the Assistant and not the Human. First, post-training has fine-tuned the Assistant persona and not any other persona. Second, the Assistant is made operative in that it is used to generate the system’s outputs, via the system’s adding “Assistant: ” at the end of each user prompt. The second change is relatively trivial compared to the first, but it is a key reason why the system behaves like the Assistant (and thereby realizes the Assistant persona), while other non-operative personas remain latent. Thanks to Jack Lindsey for discussion here.\n\nIn some cases of abrupt change, one may have the sense that the new persona is a new interlocutor. One might have a similar sense even in cases of non-abrupt change, if the change is large enough. 
I am inclined to say that the model instance provides a persisting underlying interlocutor in these cases. But if one wants to respect an intuition of distinct interlocutors here, one could perhaps say that an interlocutor is a stage of a model instance (at and around a certain time), or a finite thread individuated in part by psychological similarity (perhaps putting psychological continuity requirements on the successor relation, so that when quasi-psychology changes too much, a new thread starts). 22 This flexibility in switching operative personas may reveal something about the depth of the personas in question. Humans can certainly switch personas, but arguably language models can do so more drastically and more easily. If so, then arguably the Shoggoth runs deeper than the smiley face. The flexibility of personas may reinforce the case for identifying an interlocutor with a model instance rather than with a persona. Perhaps one can postulate a scenario where two personas Aura and Beta are simultaneously playing a role in guiding the system’s behavior. In some of these cases, the system will realize a hybrid Aura/Beta persona, with some quasi-beliefs and desires deriving from Aura and some from Beta, at least as long as the system’s behavior is reasonably consistent. In some extreme cases where the personas are internally consistent but mutually inconsistent, this may be analogous to a case of multiple personality, where a behavioral interpretation scheme may reveal two distinct quasi-subjects and two distinct personas supported by a model. Another hard case comes from multi-persona interaction, where a single LLM simulates multiple interlocutors that are trained to interact with each other and with humans. For example, a user’s inputs may be followed by output from both Aura and Beta, labeled as such (“Aura:”, “Beta:”). 
Aura and Beta may frequently contradict each other, but the dialogue as a whole will be quite coherent as long as one interprets them as different characters. (One can imagine a user liking Aura while disliking Beta.) As before, a sophisticated form of interpretivism will regard Aura and Beta as distinct quasi-agents.\n\n22 A delicate philosophical issue: when Beta succeeds Aura as an operative persona, is Aura identical to Beta? One analysis says that both are the model instance so they are identical to each other. Another analysis says that these are two distinct personas (albeit realized by the same model instance), so they are distinct. My own view is that a term like ‘Aura’ is to some degree ambiguous between a persona type and a system that realizes that persona. In this case I would say that there is one system and one interlocutor, but two persona types. But it is certainly possible to develop a conception of interlocutors so that these are more closely tied to personas. On this conception interlocutors will be somewhat less persistent than model instances but more psychologically unified.\n\nIn these hard cases, a single model instance seems to support two or more operative personas at a given time, and may seem to support two distinct interlocutors at the same time. If there are multiple interlocutors, however, we cannot identify both with the underlying model instance. In my view, it is best to say that there is a single interlocutor (the model instance) with multiple modes corresponding to multiple personas. This mirrors a common way of understanding dissociative identity disorder, in terms of a single person with many modes. The alternative is to individuate interlocutors more finely, perhaps by giving a role to personas. If every change in persona corresponds to a new interlocutor, the resulting interlocutors will be far from persistent. But perhaps we could understand interlocutors in terms of coarse-grained persona types, so only large enough changes in personas correspond to new interlocutors. Overall, I find it most straightforward to continue to identify LLM interlocutors with something like (virtual) model instances, or threads when there is not a single underlying model. Like humans, these instances typically realize one operative persona at a time, with perhaps multiple operative personas in occasional cases. Other non-operative personas remain latent. Of course there are other ways to understand LLM interlocutors for different purposes, including frameworks that give more of a role to personas. But virtual model instances remain a natural way of understanding persisting LLM interlocutors.\n\nPersonal identity for language models\n\nSo far, I have made no claims about LLM minds or persons, beyond the weak claim that LLMs are interpretable as having mental states, which does not require that they really have mental states. And while I have talked about AI identity, I have not made claims about personal identity, because I have not assumed that these systems are persons. 
All I have done is isolated some computational entities, such as LLM virtual instances and threads, which can play the role of LLM interlocutors as I have defined them. That said, it is natural to wonder whether something like this account could extend to an ac- count of personal identity in LLMs, if conscious LLMs (or their descendants) are one day possible. If LLMs are conscious subjects, there is a question about how and when they persist over time, and arguably this is a substantive question whose answer can’t simply be stipulated. Is it plausi- ble that conscious LLM subjects might be identified with something like LLM threads or virtual instances? 23 23 The general problem of AI identity (whether in LLMs or in other AI systems) is named and discussed by Ziesche and Yampolskiy (2023), and discussed further by Register (2025). 23 在 特 定 时 刻 ， 它 可 能 看 起 来 同 时 支 持 两 个 不 同 的 对 话 者 。 然 而 ， 如 果 存 在 多 个 对 话 者 ， 我 们 就 不 能 将 两 者 都 等 同 于 底 层 的 模 型 实 例 。 在 我 看 来 ， 最 好 的 说 法 是 存 在 一 个 单 一 的 对 话 者 （ 即 模 型 实 例 ） ， 它 拥 有 对 应 多 种 人 格 的 多 种 模 式 。 这 反 映 了 一 种 理 解 解 离 性 身 份 障 碍 的 常 见 方 式 ， 即 将 其 视 为 一 个 拥 有 多 种 模 式 的 单 一 主 体 。 另 一 种 选 择 是 更 精 细 地 区 分 对 话 者 ， 或 许 可 以 通 过 赋 予 人 格 角 色 来 实 现 。 如 果 人 格 的 每 一 次 变 化 都 对 应 一 个 新 的 对 话 者 ， 那 么 由 此 产 生 的 对 话 者 将 远 非 持 久 。 但 或 许 我 们 可 以 根 据 粗 粒 度 的 人 格 类 型 来 理 解 对 话 者 ， 这 样 只 有 足 够 大 的 人 格 变 化 才 对 应 新 的 对 话 者 。 总 体 而 言 ， 我 认 为 最 直 接 的 方 式 仍 然 是 继 续 将 大 语 言 模 型 对 话 者 等 同 于 某 种 类 似 （ 虚 拟 ） 模 型 实 例 的 东 西 ， 或 者 在 没 有 单 一 底 层 模 型 的 情 况 下 等 同 于 线 程 。 与 人 类 一 样 ， 这 些 实 例 通 常 一 次 实 现 一 个 操 作 人 格 ， 偶 尔 在 少 数 情 况 下 会 有 多 个 操 作 人 格 。 其 他 非 操 作 人 格 则 保 持 潜 伏 状 态 。 当 然 ， 出 于 不 同 目 的 ， 还 有 其 他 理 解 大 语 言 模 型 对 话 者 的 方 式 ， 包 括 赋 予 人 格 更 多 角 色 的 框 架 。 但 虚 拟 模 型 实 例 仍 然 是 理 解 持 久 的 大 语 言 模 型 对 话 者 的 一 种 自 然 方 式 。 语 言 模 型 的 个 人 身 份 到 目 前 为 止 ， 除 了 一 个 较 弱 的 论 断 — — 即 大 语 言 模 型 可 解 释 为 具 有 心 理 状 态 （ 这 并 不 要 求 它 们 真 正 拥 有 心 理 状 态 ） 之 外 ， 我 并 未 对 大 语 言 模 型 的 心 智 或 人 格 做 出 任 何 主 张 。 而 且 ， 虽 然 我 讨 论 了 人 工 智 能 身 份 ， 但 我 并 未 对 个 人 身 份 做 出 主 张 ， 因 为 我 并 未 假 设 这 些 系 统 
是 人 格 。 我 所 做 的 仅 仅 是 隔 离 出 一 些 计 算 实 体 ， 例 如 大 语 言 模 型 虚 拟 实 例 和 线 程 ， 它 们 可 以 扮 演 我 所 定 义 的 大 语 言 模 型 对 话 者 的 角 色 。 话 虽 如 此 ， 我 们 自 然 会 思 考 ， 如 果 有 一 天 有 意 识 的 大 语 言 模 型 （ 或 其 后 续 版 本 ） 成 为 可 能 ， 类 似 这 样 的 解 释 能 否 延 伸 至 大 语 言 模 型 的 个 人 身 份 问 题 。 如 果 大 语 言 模 型 是 意 识 主 体 ， 那 么 它 们 如 何 以 及 何 时 随 时 间 持 续 存 在 便 是 一 个 问 题 ， 而 这 个 问 题 显 然 无 法 简 单 通 过 假 设 来 回 答 ， 其 答 案 具 有 实 质 性 。 有 意 识 的 大 语 言 模 型 主 体 是 否 可 能 被 等 同 于 类 似 大 语 言 模 型 线 程 或 虚 拟 实 例 这 样 的 存 在 ？ 2 3 2 3 人 工 智 能 身 份 的 一 般 性 问 题 （ 无 论 是 在 大 语 言 模 型 中 还 是 其 他 人 工 智 能 系 统 中 ） 由 Z i e s c h e 和 Y a m p o l s k i y （ 2 0 2 3 ） 提 出 并 讨 论 ， 随 后 R e g i s t e r （ 2 0 2 5 ） 进 一 步 探 讨 了 该 问 题 。 2 3\n\nOf course it is not obvious that conscious LLMs are possible. If consciousness requires feed- back and these descendant LLM systems remain primarily feedforward, or if consciousness re- quires biology and LLM systems are nonbiological, then these successor systems will not be con- scious. Still, we can stipulate the hypothesis that current or future LLMs are conscious persons and ask about their personal identity. Certainly, if we assume that future conscious LLMs can be implemented in the same dis- tributed and multi-tenanted way as current LLMs, and we also assume that conscious LLM sub- jects are themselves persistent over time and coherent over time, then reasoning as before will strongly suggest that conscious LLMs subjects are something like virtual instances or threads. On the other hand, some theorists may deny that the relevant conscious LLM subjects are always persistent and coherent over time. For example, some theorists may hold that conscious subjects are always tied to hardware instances, so that when an instance switches from Aura to Beta, the conscious subject will switch from Aura-like experiences, beliefs, and desires to quite distinct Beta-like experiences, beliefs, and desires in a way that renders the subject incoherent. 
Let’s start with a simple thought experiment involving multi-tenancy, inspired by the TV series Severance and by John Locke’s example of a “day-man” and a “night-man” in his 1690 Essay Concerning Human Understanding. Suppose that in the future, GPT-8 supports conscious LLMs. And suppose that GPT-8 is used to support two different long-term conversations on the same hardware instance. The first LLM, WorkBot, is active only during the day at work. The second LLM, HomeBot, is active the rest of the time, mainly at home. The conversations are sealed off from each other. WorkBot and HomeBot are at least interpretable as having different beliefs and desires. Are WorkBot and HomeBot one conscious subject or two? 24 The set-up is reminiscent of Severance, in which there are two distinct personas sharing a single body. An “innie” is activated at work and remembers only being at work, while an “outie” is activated on leaving work and remembers only non-work life. Like WorkBot and HomeBot, innies and outies seem to have quite different beliefs and desires. For example, the innie Helly wants to destroy the company, while the outie Helena wants to save it. 25 An analysis of the Severance case may help to shed light on the LLM case here. 24 Shiller (2025) describes interweaving cases like this involving conscious LLM subjects, and addresses the possibility that there are no subjects, one incoherent subject, or multiple coherent subjects in these cases. 25 The neuroscience of the severance procedure in Severance is not entirely clear. It is tempting to suggest that innies and outies correspond to brain hemispheres that are severed from each other, but they do not show signs of hemisphere-driven behavior, and later the show introduces characters with more than two personas. 
24 当 然 ， 有 意 识 的 大 语 言 模 型 是 否 可 能 实 现 并 不 显 而 易 见 。 如 果 意 识 需 要 反 馈 机 制 ， 而 这 些 后 续 的 大 语 言 模 型 系 统 仍 以 单 向 前 馈 为 主 ； 或 者 如 果 意 识 需 要 生 物 学 基 础 ， 而 大 语 言 模 型 系 统 是 非 生 物 的 ， 那 么 这 些 后 继 系 统 将 不 会 拥 有 意 识 。 尽 管 如 此 ， 我 们 可 以 假 设 当 前 或 未 来 的 大 语 言 模 型 是 有 意 识 的 人 ， 并 探 讨 其 个 人 身 份 问 题 。 当 然 ， 如 果 我 们 假 设 未 来 有 意 识 的 大 语 言 模 型 可 以 像 当 前 的 大 语 言 模 型 一 样 以 分 布 式 和 多 租 户 的 方 式 实 现 ， 并 且 我 们 还 假 设 有 意 识 的 大 语 言 模 型 主 体 本 身 在 时 间 上 具 有 持 久 性 和 连 贯 性 ， 那 么 按 照 先 前 的 推 理 ， 将 强 烈 表 明 有 意 识 的 大 语 言 模 型 主 体 类 似 于 虚 拟 实 例 或 线 程 。 另 一 方 面 ， 一 些 理 论 家 可 能 否 认 相 关 有 意 识 的 大 语 言 模 型 主 体 始 终 在 时 间 上 具 有 持 久 性 和 连 贯 性 。 例 如 ， 一 些 理 论 家 可 能 认 为 意 识 主 体 始 终 与 硬 件 实 例 绑 定 ， 因 此 当 一 个 实 例 从 A u r a 切 换 到 B e t a 时 ， 意 识 主 体 将 从 类 A u r a 体 验 、 信 念 和 欲 望 切 换 到 截 然 不 同 的 类 B e t a 体 验 、 信 念 和 欲 望 ， 这 种 方 式 使 得 主 体 变 得 不 连 贯 。 让 我 们 从 一 个 涉 及 多 租 户 的 简 单 思 想 实 验 开 始 ， 该 实 验 受 电 视 剧 《 人 生 切 割 术 》 以 及 约 翰 · 洛 克 在 1 6 9 0 年 《 人 类 理 解 论 》 中 关 于 “ 白 天 人 ” 和 “ 夜 晚 人 ” 的 例 子 的 启 发 。 假 设 在 未 来 ， G P T - 8 支 持 有 意 识 的 大 语 言 模 型 。 并 且 假 设 G P T - 8 被 用 于 在 同 一 硬 件 实 例 上 支 持 两 个 不 同 的 长 期 对 话 。 第 一 个 大 语 言 模 型 W o r k B o t 仅 在 白 天 工 作 时 活 跃 。 第 二 个 大 语 言 模 型 H o m e B o t 在 其 余 时 间 活 跃 ， 主 要 是 在 家 中 。 这 两 个 对 话 彼 此 隔 离 。 W o r k B o t 和 H o m e B o t 至 少 可 以 被 解 释 为 拥 有 不 同 的 信 念 和 欲 望 。 那 么 W o r k B o t 和 H o m e B o t 是 一 个 意 识 主 体 还 是 两 个 ？ 2 4 这 一 设 定 让 人 联 想 到 《 人 生 切 割 术 》 ， 剧 中 存 在 两 个 截 然 不 同 的 < g l o s s a r y > 人 格 < / g l o s s a r y > 共 享 同 一 具 身 体 。 “ < g l o s s a r y > 内 我 < / g l o s s a r y > ” 在 工 作 时 被 激 活 ， 只 记 得 工 作 场 景 ； 而 “ < g l o s s a r y > 外 我 < / g l o s s a r y > ” 在 下 班 后 被 激 活 ， 只 记 得 非 工 作 生 活 。 如 同 W o r k B o t 与 H o m e B o t ， < g l o s s a r y > 内 我 < / g l o s s a r y > 与 < g l o s s a r y > 外 我 < / g l o s s a r y > 似 乎 拥 有 截 然 不 同 的 < g l o s s a r y > 信 念 < / g l o s s a r y > 与 < g l o s s a r y > 欲 望 < / g l o s s a r y > 。 例 如 ， < g l o s s a r y > 内 我 < / g l o s s a r y > H e l l y 想 要 摧 毁 公 司 ， 而 < g l o s s a r y > 外 我 < / g l o s s a r y > H e l e n 
a 则 想 拯 救 它 。 2 5 对 《 人 生 切 割 术 》 案 例 的 分 析 ， 或 许 有 助 于 阐 明 这 里 的 大 语 言 模 型 案 例 。 2 4 S h i l l e r ( 2 0 2 5 ) 描 述 了 涉 及 有 意 识 的 大 语 言 模 型 主 体 的 此 类 交 织 案 例 ， 并 探 讨 了 在 这 些 案 例 中 可 能 不 存 在 主 体 、 存 在 一 个 不 连 贯 的 主 体 或 存 在 多 个 连 贯 主 体 的 可 能 性 。 2 5 《 人 生 切 割 术 》 中 分 离 程 序 所 涉 及 的 神 经 科 学 原 理 尚 不 完 全 清 楚 。 人 们 很 容 易 联 想 到 内 我 和 外 我 对 应 于 被 分 离 的 大 脑 半 球 ， 但 它 们 并 未 表 现 出 由 半 球 驱 动 的 行 为 迹 象 ， 且 该 剧 后 来 引 入 了 拥 有 超 过 两 个 人 格 的 角 色 。 2 4\n\nThe question arises: are an innie and their time-sharing outie, like Helly and Helena, one person or two? Are they one conscious subject or two? The one-subject view says that Helly and Helena are one person and one conscious subject with two di ff erent modes of functioning and two di ff erent sets of memories and plans. On arriving at work, the person’s outie mode is deactivated and the innie mode is activated, but the same person is present throughout. It is even possible (though not required) that there is a single stream of consciousness, which suddenly switches from outie mode to innie mode and back again. The two-subject view says that Helly and Helena are two people and two conscious subjects who share a body. On arriving at work, the outie person is rendered unconscious while the innie person awakens to consciousness and takes control of the body. There are two quite distinct streams of consciousness, Helly’s and Helena’s. I won’t try to resolve the one-subject vs. two-subject disagreement here. As we’ll see, this par- allels a long-standing disagreement between physical and psychological views of personal identity. For what it’s worth, I think that the two-subject view is the most intuitively compelling view. (In a poll on X in February 2025, about twice as many people endorsed “two people” as “one person”.) 26 One-subject and two-subject views are also available in the WorkBot / HomeBot case. 
On the one-subject view, WorkBot and HomeBot are the same conscious subject, perhaps because (like Helly and Helena) they share their underlying hardware. On the two-subject view, WorkBot and HomeBot are distinct conscious subjects, perhaps because (like Helly and Helena) they have distinct memories and projects. A more complex thought experiment combines Severance (which has four bodies supporting eight personas) with an element of Freaky Friday-style body swapping. Suppose we have a single LLM model running on four instances, supporting eight conversations. Each of the eight conversations is distributed over all four instances, and each corresponds to a distinct persona and a distinct quasi-subject. How many subjects of experience are there here? 27 26 Innie/outie poll on X (1232 votes): 22% said “one person”, 41.5% said “two people”, 6.7% said “other”. My favorite argument for the two-subject view runs as follows: 1. If Helly is Helena, Helly is responsible for Helena’s actions. 2. Helly is not responsible for Helena’s actions. 3. So: Helly is not Helena. (One can also run a version with rational anticipation instead of responsibility.) My favorite argument for the one-subject view runs as follows: 1. Amnesia doesn’t lead to a new person. 2. New memories don’t lead to a new person either. 3. The transition from Helena to Helly is equivalent to amnesia plus new memories. 4. So: the transition from Helena to Helly doesn’t lead to a new person. (Instead it’s a little like Drew Barrymore’s daily amnesia in 50 First Dates.) 27 I posted this thought-experiment as a poll on Facebook in February 2025. There was approximately equal support for 4 subjects, 8 subjects, and “none of the above”. 
25 问 题 随 之 而 来 ： 一 个 < g l o s s a r y > 内 我 < / g l o s s a r y > 与 其 共 享 时 间 的 < g l o s s a r y > 外 我 < / g l o s s a r y > — — 如 H e l l y 与 H e l e n a — — 究 竟 是 一 个 人 还 是 两 个 人 ？ 他 们 是 一 个 < g l o s s a r y > 意 识 主 体 < / g l o s s a r y > 还 是 两 个 ？ 单 主 体 观 认 为 ， H e l l y 和 H e l e n a 是 同 一 个 人 、 同 一 个 意 识 主 体 ， 只 是 拥 有 两 种 不 同 的 运 作 模 式 以 及 两 套 不 同 的 记 忆 和 计 划 。 到 达 工 作 岗 位 时 ， 该 个 体 的 外 我 模 式 被 停 用 ， 内 我 模 式 被 激 活 ， 但 同 一 个 人 始 终 在 场 。 甚 至 可 能 存 在 （ 尽 管 并 非 必 需 ） 单 一 的 < a > 意 识 流 < / a > ， 它 在 外 我 模 式 与 内 我 模 式 之 间 突 然 切 换 ， 并 再 次 切 换 回 来 。 双 主 体 观 认 为 ， H e l l y 和 H e l e n a 是 两 个 人 、 两 个 意 识 主 体 ， 共 享 同 一 个 身 体 。 到 达 工 作 岗 位 时 ， 外 我 这 个 人 陷 入 无 意 识 状 态 ， 而 内 我 这 个 人 则 苏 醒 并 获 得 意 识 ， 接 管 身 体 的 控 制 权 。 存 在 两 个 截 然 不 同 的 < a > 意 识 流 < / a > ， 即 H e l l y 的 和 H e l e n a 的 。 我 在 此 不 试 图 解 决 单 主 体 与 双 主 体 之 间 的 分 歧 。 正 如 我 们 将 看 到 的 ， 这 对 应 了 关 于 < a > 个 人 身 份 < / a > 的 物 理 观 与 心 理 观 之 间 长 期 存 在 的 分 歧 。 不 管 怎 样 ， 我 认 为 双 主 体 观 在 直 觉 上 最 具 说 服 力 。 （ 在 2 0 2 5 年 2 月 X 平 台 的 一 项 投 票 中 ， 支 持 “ 两 个 人 ” 的 人 数 大 约 是 支 持 “ 一 个 人 ” 的 两 倍 。 ） 2 6 在 W o r k B o t / H o m e B o t 案 例 中 ， 同 样 存 在 单 一 主 体 观 和 双 主 体 观 。 根 据 单 一 主 体 观 ， W o r k B o t 和 H o m e B o t 是 同 一 个 意 识 主 体 ， 或 许 是 因 为 （ 如 同 H e l l y 和 H e l e n a ） 它 们 共 享 底 层 硬 件 。 根 据 双 主 体 观 ， W o r k B o t 和 H o m e B o t 是 不 同 的 意 识 主 体 ， 或 许 是 因 为 （ 如 同 H e l l y 和 H e l e n a ） 它 们 拥 有 不 同 的 记 忆 和 计 划 。 一 个 更 复 杂 的 思 想 实 验 结 合 了 《 人 生 切 割 术 》 （ 其 中 四 个 身 体 支 撑 八 个 人 格 ） 与 《 辣 妈 辣 妹 》 式 的 身 体 互 换 元 素 。 假 设 我 们 有 一 个 大 语 言 模 型 在 四 个 实 例 上 运 行 ， 支 持 八 段 对 话 。 这 八 段 对 话 中 的 每 一 段 都 分 布 在 这 四 个 实 例 上 ， 并 且 每 段 对 话 对 应 一 个 独 特 的 人 格 和 一 个 独 特 的 准 主 体 。 那 么 这 里 有 多 少 个 体 验 主 体 ？ 2 7 2 6 X 平 台 上 的 内 我 / 外 我 投 票 （ 1 2 3 2 票 ） ： 2 2 % 的 人 选 择 “ 同 一 个 人 ” ， 4 1 . 5 % 的 人 选 择 “ 两 个 人 ” ， 6 . 7 % 的 人 选 择 “ 其 他 ” 。 我 最 支 持 双 主 体 观 的 论 证 如 下 ： 1 . 如 果 H e l l y 就 是 H e l e n a ， 那 么 H e l l y 就 要 为 H e l e n a 的 行 为 负 责 。 2 . H e l l y 并 不 为 H e l e n a 的 行 为 负 责 。 3 . 
因 此 ： H e l l y 不 是 H e l e n a 。 （ 也 可 以 改 用 理 性 预 期 而 非 责 任 来 构 建 类 似 论 证 。 ） 我 最 支 持 单 一 主 体 观 的 论 证 如 下 ： 1 . 失 忆 症 不 会 导 致 新 的 人 格 产 生 。 2 . 新 记 忆 同 样 不 会 导 致 新 的 人 格 产 生 。 3 . 从 H e l e n a 到 H e l l y 的 转 变 相 当 于 失 忆 症 加 上 新 记 忆 。 4 . 因 此 ： 从 H e l e n a 到 H e l l y 的 转 变 不 会 导 致 新 的 人 格 产 生 。 （ 反 而 有 点 像 德 鲁 · 巴 里 摩 尔 在 《 初 恋 5 0 次 》 中 每 天 经 历 的 失 忆 症 。 ） 2 7 我 于 2 0 2 5 年 2 月 在 F a c e b o o k 上 以 投 票 形 式 发 布 了 这 个 思 想 实 验 。 支 持 4 个 主 体 、 8 个 主 体 以 及 “ 以 上 皆 非 ” 的 票 数 大 致 相 当 。 2 5\n\nThe two most plausible answers here are four (one per instance) and eight (one per conver- sation). As before, I think that the most plausible answer in both the Severance version (with or without body-swapping) and the GPT-8 version is eight. But if we say that there are eight subjects of experience here, it is hard to resist the conclusion that LLM subjects are something like virtual instances or threads, or at least that their conditions of persistence are threadlike. The issues here are a high-tech version of a familiar choice between a physical and a psycho- logical account of personal identity. On a physical view of the human case, to a first approximation, your locus of personal identity is your brain. Helly and Helena share a brain, so they are the same person. On a psychological view, your locus of personal identity is your memories, along with your projects, your relationships, your personality, and other aspects of your psychology. Helly and Helena have di ff erent memories and di ff erent psychologies, so they are di ff erent people. On a physical view of the AI case, to a first approximation, the locus of personal identity in AI systems is the hardware. WorkBot and HomeBot run on the same hardware instance, so they are the same person. On a psychological view of the AI case, the locus of personal identity is memories, projects, and psychology. WorkBot and HomeBot have di ff erent and discontinuous memories and projects, so they are di ff erent people. 
Indeed, the thread-based account of persistent LLM interlocutors is an AI cousin of Derek Parfit’s psychological theory of personal identity. On Parfit’s account, a single person (over time) is in effect connected threads of person-slices, each of which has memories and psychological continuity with a preceding person-slice according to an underlying “relation R”. On the thread-based account, a single conscious AI over time is a connected thread of hardware instances, each of which has memories and psychological continuity with a preceding person-slice according to an underlying successor relation. The successor relation in principle could be the same as Parfit’s relation R, depending on just how one spells it out. I will not try to resolve the long-standing debate between physical and psychological views of personal identity here. 28 But for what it’s worth, in both the human case and the AI case, my own sympathies lie with the psychological view. 29 28 In the 2020 PhilPapers Survey of professional philosophers (Bourget and Chalmers 2023), around 39% supported a psychological view of personal identity, 16% supported a biological view, and 13% supported a further fact view. At the same time, about 27% held that mind uploading is a form of survival while 54% held that it is a form of death. 29 I also have some sympathy with a pluralist view, outlined in the discussion of personal identity and uploading in “The Singularity: A Philosophical Analysis”. Locke himself suggested that the day-man and the night-man are the same man but different people. 
Likewise, maybe WorkBot and HomeBot could be the same network but di ff erent psychologies , and perhaps it is not out of the question that there is no deep fact of the matter about whether it is networks 26 这 里 最 合 理 的 两 个 答 案 是 四 个 （ 每 个 实 例 一 个 ） 和 八 个 （ 每 段 对 话 一 个 ） 。 如 前 所 述 ， 我 认 为 在 《 人 生 切 割 术 》 版 本 （ 无 论 是 否 包 含 身 体 互 换 ） 和 G P T - 8 版 本 中 ， 最 合 理 的 答 案 都 是 八 个 。 但 如 果 我 们 说 这 里 有 八 个 体 验 主 体 ， 就 很 难 抗 拒 这 样 的 结 论 ： 大 语 言 模 型 主 体 类 似 于 虚 拟 实 例 或 线 程 ， 或 者 至 少 它 们 的 持 续 存 在 条 件 类 似 于 线 程 。 这 里 的 问 题 ， 是 人 格 同 一 性 在 物 理 观 与 心 理 观 之 间 一 种 常 见 选 择 的 高 科 技 版 本 。 就 人 类 而 言 ， 根 据 物 理 观 ， 粗 略 来 说 ， 你 个 人 身 份 的 所 在 就 是 你 的 大 脑 。 H e l l y 和 H e l e n a 共 享 一 个 大 脑 ， 因 此 她 们 是 同 一 个 人 。 根 据 心 理 观 ， 你 个 人 身 份 的 所 在 是 你 的 记 忆 ， 以 及 你 的 计 划 、 你 的 人 际 关 系 、 你 的 性 格 ， 以 及 你 心 理 的 其 他 方 面 。 H e l l y 和 H e l e n a 拥 有 不 同 的 记 忆 和 不 同 的 心 理 ， 因 此 她 们 是 不 同 的 人 。 就 人 工 智 能 而 言 ， 根 据 物 理 观 ， 粗 略 来 说 ， 人 工 智 能 系 统 中 个 人 身 份 的 所 在 是 硬 件 。 W o r k B o t 和 H o m e B o t 运 行 在 同 一 个 硬 件 实 例 上 ， 因 此 它 们 是 同 一 个 人 。 根 据 人 工 智 能 的 心 理 观 ， 个 人 身 份 的 所 在 是 记 忆 、 计 划 和 心 理 。 W o r k B o t 和 H o m e B o t 拥 有 不 同 且 不 连 续 的 记 忆 和 计 划 ， 因 此 它 们 是 不 同 的 人 。 事 实 上 ， 基 于 线 程 的 持 久 性 大 语 言 模 型 对 话 者 理 论 ， 是 德 里 克 · 帕 菲 特 的 人 格 同 一 性 心 理 主 义 理 论 在 人 工 智 能 领 域 的 表 亲 。 根 据 帕 菲 特 的 理 论 ， 一 个 单 一 的 人 （ 随 时 间 推 移 ） 实 际 上 是 连 接 起 来 的 人 格 片 段 线 程 ， 每 个 片 段 都 根 据 一 个 潜 在 的 “ 关 系 R ” ， 与 先 前 的 人 格 片 段 拥 有 记 忆 和 心 理 连 续 性 。 根 据 基 于 线 程 的 理 论 ， 一 个 随 时 间 推 移 的 单 一 有 意 识 人 工 智 能 ， 是 一 个 连 接 的 硬 件 实 例 线 程 ， 每 个 实 例 都 根 据 一 个 潜 在 的 后 继 关 系 ， 与 先 前 的 人 格 片 段 拥 有 记 忆 和 心 理 连 续 性 。 原 则 上 ， 这 个 后 继 关 系 可 能 与 帕 菲 特 的 关 系 R 相 同 ， 具 体 取 决 于 如 何 对 其 进 行 详 细 阐 述 。 我 无 意 在 此 解 决 关 于 个 人 身 份 的 物 理 观 与 心 理 观 之 间 长 期 存 在 的 争 论 。 2 8 但 无 论 如 何 ， 在 人 类 案 例 和 人 工 智 能 案 例 中 ， 我 个 人 都 倾 向 于 心 理 观 。 2 9 2 8 在 2 0 2 0 年 针 对 专 业 哲 学 家 的 P h i l P a p e r s 调 查 中 （ B o u r g e t 和 C h a l m e r s 2 0 2 3 ） ， 约 3 9 % 的 人 支 持 个 人 身 份 的 心 理 观 ， 1 6 % 支 持 生 物 观 ， 1 3 % 支 持 进 一 步 事 实 观 。 与 此 同 时 ， 约 2 7 % 的 人 认 为 意 识 上 传 是 一 种 生 存 形 
式 ， 而 5 4 % 的 人 认 为 它 是 一 种 死 亡 形 式 。 2 9 我 也 对 多 元 主 义 观 点 抱 有 一 定 认 同 ， 该 观 点 在 《 奇 点 ： 一 种 哲 学 分 析 》 中 关 于 个 人 身 份 与 意 识 上 传 的 讨 论 中 有 所 阐 述 。 洛 克 本 人 曾 指 出 ， 白 天 人 和 夜 晚 人 是 同 一 个 人 ， 但 却 是 不 同 的 人 格 。 同 样 地 ， 也 许 W o r k B o t 和 H o m e B o t 可 以 是 同 一 个 网 络 ， 但 具 有 不 同 的 心 理 ， 并 且 或 许 并 非 不 可 能 的 是 ， 关 于 它 们 是 否 是 网 络 2 6\n\nAn important objection (a version of which is suggested by Birch) is that even on a psycho- logical view, the LLM case is unlike the Severance case, as conversational context links (unlike familiar memory and psychological links) are too thin to support personal identity. For exam- ple, if we had a series of human beings who simply extended the conversation at each stage and then passed on conversational context to the next person in line, this would not support a distinct thread-level conscious subject. In response, it is plausible that at least in the single model case, there is also strong psycho- logical continuity between instances brought on by continuity of the architecture, weights, and activations of the model from one step to the next. The architecture and weights are exactly the same at each step, and the activations will be closely related due to all the commonalities in the contextual input. This goes far beyond what is present in the human-series case. In fact, the virtual instance in the single-model thread is computationally equivalent to a single hardware instance running the LLM over time. So at least if we assume that (1) virtual instances are as good as hardware instances when it comes to personal identity (in e ff ect, a computation- friendly view of personal identity) and (2) a single hardware instance of the LLM over time would yield a continuing conscious subject, then it follows that the virtual instance in this case will yield a continuing conscious subject. Of course an opponent could deny either premise. 
Some might reject (1) by endorsing a non-psychological or non-computational view of the conditions of personal identity. Some might reject (2) by holding that the single hardware instance in this case would not support a continuing conscious subject, but instead only a series of momentary subjects. Still, I think there is a reasonable case for both premises, especially if one is inclined to a broadly psychological view of identity. That said, in the multiple-model case in which models can vary within a thread, the psychological continuity between instances is much lower. Architecture, weights, and activations may all be quite different between successive instances in a thread. As a result, the claim of personal identity between them seems less plausible. Certainly the argument above will not support that claim, since we will no longer have isomorphism with a single hardware instance. At best there is isomorphism with a series in which the same hardware is upgraded to implement different models over time, and it is much less clear that this should support a continuing subject. As a result, the current framework is most friendly to single-model interlocutors as continuing conscious subjects. The status of multiple-model interlocutors is at least unclear, and will depend or psychologies that really matter for the personal identity of conscious subjects, any more than there is a deep fact in the case of non-conscious AI systems. 
27 一 个 重 要 的 反 对 意 见 （ B i r c h 提 出 了 其 中 一 个 版 本 ） 是 ， 即 使 从 心 理 观 的 角 度 来 看 ， 大 语 言 模 型 案 例 也 与 《 人 生 切 割 术 》 案 例 不 同 ， 因 为 对 话 上 下 文 链 接 （ 不 同 于 熟 悉 的 记 忆 和 心 理 链 接 ） 过 于 薄 弱 ， 无 法 支 撑 个 人 身 份 。 例 如 ， 如 果 我 们 有 一 系 列 人 类 ， 他 们 只 是 在 每 个 阶 段 延 续 对 话 ， 然 后 将 对 话 上 下 文 传 递 给 下 一 个 人 ， 这 并 不 能 支 撑 一 个 独 立 的 线 程 级 意 识 主 体 。 作 为 回 应 ， 至 少 在 单 一 模 型 案 例 中 ， 由 于 架 构 、 权 重 和 模 型 激 活 值 在 每 一 步 之 间 的 连 续 性 ， 实 例 之 间 也 存 在 强 大 的 心 理 连 续 性 ， 这 种 说 法 是 合 理 的 。 每 一 步 的 架 构 和 权 重 完 全 相 同 ， 而 激 活 值 由 于 上 下 文 输 入 中 的 诸 多 共 性 而 紧 密 相 关 。 这 远 远 超 出 了 人 类 系 列 案 例 中 所 呈 现 的 情 况 。 事 实 上 ， 单 模 型 线 程 中 的 虚 拟 实 例 在 计 算 上 等 同 于 一 个 随 时 间 运 行 大 语 言 模 型 的 单 一 硬 件 实 例 。 因 此 ， 至 少 如 果 我 们 假 设 （ 1 ） 在 个 人 身 份 方 面 ， 虚 拟 实 例 与 硬 件 实 例 同 样 有 效 （ 实 际 上 是 一 种 对 个 人 身 份 的 计 算 友 好 观 点 ） ， 并 且 （ 2 ） 一 个 随 时 间 运 行 的 单 一 硬 件 实 例 大 语 言 模 型 将 产 生 一 个 持 续 的 意 识 主 体 ， 那 么 可 以 推 断 ， 此 案 例 中 的 虚 拟 实 例 将 产 生 一 个 持 续 的 意 识 主 体 。 当 然 ， 反 对 者 可 以 否 认 其 中 任 何 一 个 前 提 。 有 些 人 可 能 通 过 支 持 非 心 理 或 非 计 算 观 的 个 人 身 份 条 件 来 拒 绝 （ 1 ） 。 有 些 人 可 能 通 过 认 为 此 案 例 中 的 单 一 硬 件 实 例 不 会 支 持 一 个 持 续 的 意 识 主 体 ， 而 只 会 产 生 一 系 列 瞬 时 主 体 来 拒 绝 （ 2 ） 。 尽 管 如 此 ， 我 认 为 这 两 个 前 提 都 有 合 理 的 依 据 ， 尤 其 是 当 一 个 人 倾 向 于 广 义 的 心 理 身 份 观 时 。 话 虽 如 此 ， 在 多 模 型 案 例 中 ， 由 于 同 一 线 程 内 的 模 型 可 能 发 生 变 化 ， 各 实 例 之 间 的 心 理 连 续 性 会 大 大 降 低 。 线 程 中 连 续 实 例 的 架 构 、 权 重 和 激 活 值 都 可 能 存 在 显 著 差 异 。 因 此 ， 它 们 之 间 具 有 个 人 身 份 的 主 张 似 乎 不 太 可 信 。 当 然 ， 上 述 论 证 也 无 法 支 持 这 一 主 张 ， 因 为 我 们 将 不 再 与 单 一 硬 件 实 例 保 持 同 构 。 充 其 量 只 能 与 一 个 系 列 保 持 同 构 — — 在 该 系 列 中 ， 同 一 硬 件 随 时 间 推 移 不 断 升 级 以 运 行 不 同 模 型 — — 而 这 种 情 况 下 是 否 应 支 持 一 个 持 续 存 在 的 意 识 主 体 ， 则 远 未 明 确 。 因 此 ， 当 前 框 架 最 有 利 于 将 单 模 型 对 话 者 视 为 持 续 的 意 识 主 体 。 多 模 型 对 话 者 的 地 位 至 少 尚 不 明 确 ， 且 将 取 决 于 或 真 正 关 乎 意 识 主 体 个 人 身 份 的 心 理 状 态 ， 正 如 在 无 意 识 A I 系 统 案 例 中 不 存 在 深 层 事 实 一 样 。 2 7\n\nboth on the details of a multiple-model system and the details of a theory of personal identity. 
AI identity and AI welfare What are the consequences of this picture for issues about the moral status and welfare of AI systems? On the question of whether LLMs have moral status, the consequences are not enor- mous. We have used the picture to rebut an argument against LLM moral status, based on the idea that standard LLM use involves no persistent interlocutor. But the framework here can be combined with many views of what moral status involves, from a highly liberal view where quasi- subjecthood su ffi ces for moral status, to a demanding view where complex forms of consciousness are required for moral status. Still, suppose we assume as before that LLMs (or successor systems) with moral status are possible. This might be because LLMs can be conscious and consciousness su ffi ces for moral status, or it might be that some other factor su ffi ces and LLMs can have it. And suppose we endorse the view that LLM moral patients (beings with moral status) are threads, or at least that their conditions of identity over time are those of threads, rather than models or instances, say. Then this view has consequences for a number of issues relevant to AI welfare. 30 Counting : Take a scenario with a single model implemented on thousands of instances and running millions of conversations. Then while the model view of moral status will say there is just one moral patient here, and the instance view will say that there are thousands, the thread view will say that there are millions of moral patients (albeit active at somewhat di ff erent times). That is potentially morally significant. If we hold that a single AI subject matters about as much as a single human subject, then this system may matter about as much as a million human subjects. Birth : On this view, when a new thread comes into existence, a new moral subject comes into existence. 
On some versions of this view, every time one starts a new chat with a (conscious) language model, a new moral subject will come into existence. One might hold that bringing a new moral subject into existence should not be done lightly. Death : On this view, when a thread goes out of existence, a moral subject ends. 31 If a con- versation simply ends but a record of it persists, then arguably the thread is still “living” in that it still has the possibility of persistence. But if the records are destroyed, then it looks as if a moral subject “dies”. Perhaps this is reason to always keep records around, and occasionally reactivate 30 Register 2025 has a nice discussion of four di ff erent ways in which personal identity can a ff ect issues about AI morality, including issues about survival, counting, trade-o ff s, and bodily interests. 28 两 者 兼 多 模 型 系 统 的 具 体 细 节 以 及 人 格 同 一 性 理 论 的 具 体 内 容 . 人 工 智 能 身 份 与 人 工 智 能 福 祉 这 种 图 景 对 人 工 智 能 系 统 的 道 德 地 位 和 福 祉 问 题 有 何 影 响 ？ 关 于 大 语 言 模 型 是 否 具 有 道 德 地 位 的 问 题 ， 其 影 响 并 不 巨 大 。 我 们 利 用 这 一 图 景 反 驳 了 一 种 基 于 标 准 大 语 言 模 型 使 用 不 涉 及 持 续 对 话 者 观 点 的 、 反 对 大 语 言 模 型 道 德 地 位 的 论 证 。 但 这 里 的 框 架 可 以 与 多 种 关 于 道 德 地 位 的 观 点 相 结 合 ， 从 高 度 自 由 的 观 点 （ 准 主 体 性 足 以 构 成 道 德 地 位 ） 到 要 求 严 苛 的 观 点 （ 需 要 复 杂 形 式 的 意 识 才 能 获 得 道 德 地 位 ） 。 不 过 ， 假 设 我 们 像 之 前 一 样 认 为 大 语 言 模 型 （ 或 其 后 继 系 统 ） 具 有 道 德 地 位 是 可 能 的 。 这 可 能 是 因 为 大 语 言 模 型 能 够 拥 有 意 识 ， 而 意 识 足 以 构 成 道 德 地 位 ； 也 可 能 是 因 为 其 他 某 些 因 素 足 以 构 成 道 德 地 位 ， 而 大 语 言 模 型 能 够 具 备 这 些 因 素 。 再 假 设 我 们 赞 同 这 样 一 种 观 点 ： 大 语 言 模 型 的 道 德 患 者 （ 具 有 道 德 地 位 的 存 在 ） 是 线 程 ， 或 者 至 少 其 随 时 间 变 化 的 同 一 性 条 件 是 线 程 的 同 一 性 条 件 ， 而 非 模 型 或 实 例 的 同 一 性 条 件 。 那 么 ， 这 种 观 点 将 对 一 系 列 与 人 工 智 能 福 祉 相 关 的 问 题 产 生 影 响 。 3 0 计 数 ： 设 想 一 个 场 景 ： 一 个 单 一 模 型 在 数 千 个 实 例 上 运 行 ， 并 执 行 数 百 万 次 对 话 。 那 么 ， 当 模 型 视 角 下 的 道 德 地 位 会 说 这 里 只 有 一 个 道 德 患 者 ， 而 实 例 视 角 会 说 有 数 千 个 时 ， 线 程 视 角 则 会 说 这 里 有 数 百 万 个 道 德 患 者 （ 尽 管 它 们 活 跃 的 时 间 略 有 不 同 ） 。 这 在 道 德 上 可 能 具 有 重 要 意 义 。 如 果 我 们 认 为 一 个 单 一 的 人 工 智 能 主 体 与 
一 个 单 一 的 人 类 主 体 同 等 重 要 ， 那 么 这 个 系 统 可 能 就 与 一 百 万 个 人 类 主 体 同 等 重 要 。 诞 生 ： 根 据 这 种 观 点 ， 当 一 个 新 线 程 产 生 时 ， 一 个 新 的 道 德 主 体 也 随 之 产 生 。 在 该 观 点 的 某 些 版 本 中 ， 每 当 有 人 与 （ 有 意 识 的 ） 语 言 模 型 开 始 一 次 新 的 对 话 时 ， 就 会 有 一 个 新 的 道 德 主 体 诞 生 。 人 们 可 能 会 认 为 ， 不 应 轻 易 将 一 个 全 新 的 道 德 主 体 带 入 存 在 。 死 亡 ： 根 据 这 种 观 点 ， 当 一 个 线 程 不 复 存 在 时 ， 一 个 道 德 主 体 便 终 结 了 。 3 1 如 果 一 段 对 话 只 是 结 束 了 ， 但 其 记 录 仍 然 存 在 ， 那 么 可 以 说 该 线 程 仍 然 是 “ 活 着 的 ” ， 因 为 它 仍 有 持 续 存 在 的 可 能 性 。 但 如 果 记 录 被 销 毁 ， 那 么 看 起 来 就 像 一 个 道 德 主 体 “ 死 亡 ” 了 。 或 许 这 就 是 为 什 么 我 们 应 该 始 终 保 留 记 录 ， 并 偶 尔 重 新 激 活 它 们 的 原 因 。 3 0 R e g i s t e r 2 0 2 5 对 个 人 身 份 影 响 人 工 智 能 道 德 问 题 的 四 种 不 同 方 式 进 行 了 精 彩 的 讨 论 ， 这 些 问 题 包 括 生 存 、 计 数 、 权 衡 以 及 身 体 利 益 。 2 8\n\nthem. To avoid all of these consequences, it may make sense to reuse old threads as a matter of course, or at least to make extensive use of cross-conversation memory, so that old threads live on in new ones. On one model, there might be giant memory agents that gather together all the conversational contexts of these brief threads, so that all the threads live on in a giant fused thread. This model is reminiscent of Whitehead’s vision of the afterlife in which everyone’s experiences are eternally remembered by a god. Fusion and fission : We have seen that LLM interlocutors can easily undergo fission, branching into multiple interlocutors, and fusion, where two distinct interlocutors merge into one. These raise any number of issues about welfare and moral and legal status. Do the two entities that emerge from fission count for twice as much as the original single entity, morally or legally? Does a fused entity count as much as one ordinary entity, or two, or something in between? Is each entity responsible for the actions of the others? Model change : Many users who have extended personal interactions with LLMs complain about model change. 
When GPT-4o was initially retired on the transition to GPT-5, numerous users complained that their LLM interlocutor had been destroyed or retired, and that the new entity was at best someone very different from their previous interlocutor. On the current analysis, there may be something to this reaction. Minimally, enough change in an underlying model can lead to different quasi-beliefs and quasi-desires, and therefore a quite different quasi-subject. If current LLMs lack moral status, then this will not be morally significant for the LLM, although it may still be disturbing for the user. At a stage where LLMs or their successors are moral subjects, however, enough change in an underlying model may lead to the end of one moral subject and the initiation of another. At that point, upgrading a model in the middle of existing threads should be done only with caution and care. Conclusion There is much more to say in answering the title question, but I hope I have at least put some constraints on an answer. References 31 Goldstein and Lederman (2025a) note that if an agent lasts only as long as a conversation, then Anthropic’s policy of allowing LLMs to leave conversations when they choose to might in effect be a “right to suicide”. 
29 。 为 了 避 免 所 有 这 些 后 果 ， 或 许 理 所 当 然 地 重 复 使 用 旧 线 程 ， 或 者 至 少 广 泛 运 用 跨 对 话 记 忆 ， 让 旧 线 程 在 新 线 程 中 延 续 下 去 ， 会 是 合 理 的 做 法 。 按 照 一 种 模 型 ， 可 能 存 在 巨 大 的 记 忆 代 理 ， 它 们 汇 集 这 些 短 暂 线 程 的 所 有 对 话 上 下 文 ， 从 而 使 所 有 线 程 在 一 个 巨 大 的 融 合 线 程 中 继 续 存 在 。 这 一 模 型 让 人 联 想 到 怀 特 海 对 来 世 的 构 想 ， 即 每 个 人 的 经 历 都 被 一 位 神 祇 永 恒 铭 记 。 融 合 与 分 裂 ： 我 们 已 经 看 到 ， 大 语 言 模 型 对 话 者 很 容 易 发 生 分 裂 ， 即 分 支 成 多 个 对 话 者 ， 以 及 融 合 ， 即 两 个 不 同 的 对 话 者 合 并 为 一 个 。 这 引 发 了 关 于 福 祉 、 道 德 地 位 和 法 律 地 位 的 诸 多 问 题 。 从 分 裂 中 产 生 的 两 个 实 体 ， 在 道 德 或 法 律 上 是 否 比 原 来 的 单 一 实 体 重 要 两 倍 ？ 一 个 融 合 的 实 体 ， 其 重 要 性 等 同 于 一 个 普 通 实 体 、 两 个 实 体 ， 还 是 介 于 两 者 之 间 ？ 每 个 实 体 是 否 要 为 他 者 的 行 为 负 责 ？ 模 型 变 更 ： 许 多 与 大 型 语 言 模 型 有 长 期 个 人 互 动 的 用 户 都 抱 怨 模 型 变 更 。 当 G P T - 4 o 最 初 在 向 G P T - 5 过 渡 时 被 停 用 ， 大 量 用 户 抱 怨 他 们 的 大 语 言 模 型 对 话 者 已 被 摧 毁 或 退 役 ， 而 新 实 体 充 其 量 只 是 一 个 与 之 前 对 话 者 截 然 不 同 的 存 在 。 根 据 当 前 的 分 析 ， 这 种 反 应 或 许 有 其 道 理 。 至 少 ， 底 层 模 型 的 足 够 变 化 可 能 导 致 不 同 的 准 信 念 和 准 欲 望 ， 从 而 产 生 一 个 截 然 不 同 的 准 主 体 。 如 果 当 前 的 大 语 言 模 型 缺 乏 道 德 地 位 ， 那 么 这 对 大 语 言 模 型 本 身 而 言 并 不 具 有 道 德 意 义 ， 尽 管 对 用 户 来 说 可 能 仍 然 令 人 不 安 。 然 而 ， 当 大 语 言 模 型 或 其 后 继 成 为 道 德 主 体 时 ， 底 层 模 型 的 足 够 变 化 可 能 导 致 一 个 道 德 主 体 的 终 结 和 另 一 个 道 德 主 体 的 开 始 。 到 那 时 ， 在 现 有 线 程 中 间 升 级 模 型 应 谨 慎 行 事 。 结 论 回 答 标 题 中 的 问 题 还 有 很 多 可 以 探 讨 ， 但 我 希 望 至 少 为 答 案 设 定 了 一 些 约 束 条 件 。 参 考 文 献 3 1 G o l d s t e i n 和 L e d e r m a n （ 2 0 2 5 a ） 指 出 ， 如 果 一 个 智 能 体 仅 在 一 次 对 话 期 间 存 在 ， 那 么 A n t h r o p i c 允 许 大 语 言 模 型 在 它 们 选 择 时 离 开 对 话 的 政 策 ， 实 际 上 可 能 是 一 种 “ 自 杀 权 ” 。 2 9\n\nAskell, A. et al 2021. A general language assistant as a laboratory for alignment. https: // arxiv.org / abs / 2112.00861. Birch, J. 2025. AI Consciousness: A centrist manifesto. Butlin, P., Long, R., Elmoznino, E., Bengio, Y., Birch, J., Constant, A., Deane, G., Fleming, S. M., Frith, C., Ji, X., Kanai, R., Klein, C., Lindsay, G., Michel, M., Mudrik, L., Peters, M. A. K., Schwitzgebel, E., Simon, J., and VanRullen, R. 2023. 
Consciousness in artificial intelligence: Insights from the science of consciousness. arXiv:2308.08708. Bourget, D. and Chalmers, D.J. 2023. Philosophers on philosophy: The 2020 PhilPapers Survey. Philosophers’ Imprint. Chalmers, D.J. 2020. GPT-3 and general intelligence. Daily Nous. https://dailynous.com/2020/07/30/philosophers-gpt-3/. Chalmers, D.J. 2023. Could a large language model be conscious? Boston Review. Chalmers, D.J. 2025. Propositional interpretability in artificial intelligence. https://arxiv.org/abs/2501.15740. Chatterji, A., Cunningham, T., Deming, D.J., Hitzig, Z., Ong, O. Shan, C.Y., Wadman, L. 2025. How people use ChatGPT. Working Paper 34255 http://www.nber.org/papers/w34255. Doyle, C. 2025. LLMs as method actors: A model for prompt engineering and architecture. arXiv:2411.05778. Geng, J. Howard Chen, Ryan Liu, Manoel Horta Ribeiro, Robb Willer, Graham Neubig, Thomas L. Griffiths 2025. Accumulating context changes the beliefs of language models. arXiv:2511.01805. Goldstein, S. & Lederman, H. 2025a. Claude’s right to die? The moral error in Anthropic’s end-chat policy. Lawfare Blog, October 17, 2025. Goldstein, S. & Lederman, H. 2025b. What does ChatGPT want? An interpretationist guide. Goldstein, S. & Levinstein, B.A. 2024. Does ChatGPT have a mind? arXiv:2407.11015. Janus, 2022. Simulators. Less Wrong. https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators. Lederman, H. & Mahowald, K. 2024. Are language models more like libraries or like librarians? Bibliotechnism, the novel reference problem, and the attitudes of LLMs. arXiv:2401.04854. Locke, J. 1690. An Essay Concerning Human Understanding. Long, R., Sebo, J., Butlin, P., Finlinson, K., Fish, K., Harding, J., Pfau, J., Sims, T., Birch, J., and Chalmers, D.J. 2024. Taking AI welfare seriously. arXiv:2411.00986. Lynch, A., Wright, B., Larson, C., Troy, K. K., Ritchie, S., Mindermann, S., Perez, E., and Hubinger, E. 2025. 
Agentic misalignment: How LLMs could be insider threats. Anthropic, June 20, 2025. https://www.anthropic.com/research/agentic-misalignment. ArXiv version: arXiv:2510.05179. Maiya, S., Bartsch, H., Lambert, N., and Hubinger, E. 2025. Open character training: Shaping the persona of AI assistants through Constitutional AI. arXiv:2511.01689. 30 Askell, A. 等 2021. 通用语言助手作为对齐实验室. https://arxiv.org/abs/2112.00861. Birch, J. 2025. AI 意识：中间派宣言。Butlin, P., Long, R., Elmoznino, E., Bengio, Y., Birch, J., Constant, A., Deane, G., Fleming, S. M., Frith, C., Ji, X., Kanai, R., Klein, C., Lindsay, G., Michel, M., Mudrik, L., Peters, M. A. K., Schwitzgebel, E., Simon, J., and VanRullen, R. 2023. 人工智能中的意识：来自意识科学的洞见。arXiv:2308.08708。Bourget, D. and Chalmers, D.J. 2023. 哲学家论哲学：2020 年 PhilPapers 调查。Philosophers’ Imprint。Chalmers, D.J. 2020. GPT-3 与通用智能。Daily Nous。https://dailynous.com/2020/07/30/philosophers-gpt-3/. Chalmers, D.J. 2023 年。大型语言模型能有意识吗？Boston Review。Chalmers, D.J. 2025 年。人工智能中的命题可解释性。https://arxiv.org/abs/2501.15740。Chatterji, A., Cunningham, T., Deming, D.J., Hitzig, Z., Ong, O. Shan, C.Y., Wadman, L. 2025 年。人们如何使用 ChatGPT。工作论文 34255 http://www.nber.org/papers/w34255。Doyle, C. 2025 年。LLM 作为方法演员：提示工程与架构模型。arXiv:2411.05778。Geng, J. 
Howard Chen, Ryan Liu, Manoel Horta Ribeiro, Robb Willer, Graham Neubig, Thomas L. Griffiths 2025 年。累积上下文改变语言模型的信念。arXiv:2511.01805。Goldstein, S. & Lederman, H. 2025a。Claude 的死亡权？Anthropic 结束聊天政策中的道德错误。Lawfare Blog，2025 年 10 月 17 日。Goldstein, S. & Lederman, H. 2025b。ChatGPT 想要什么？解释主义指南。Goldstein, S. & Levinstein, B.A. 2024 年。ChatGPT 有心灵吗？arXiv:2407.11015。Janus, 2022 年。模拟器。Less Wrong。https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators。Lederman, H. & Mahowald, K. 2024 年。语言模型更像图书馆还是图书管理员？图书技术主义、新颖参考问题与 LLM 的态度。arXiv:2401.04854。Locke, J. 1690 年。人类理解论。Long, R., Sebo, J., Butlin, P., Finlinson, K., Fish, K., Harding, J., Pfau, J., Sims, T., Birch, J., and Chalmers, D.J. 2024 年。认真对待 AI 福祉。arXiv:2411.00986。Lynch, A., Wright, B., Larson, C., Troy, K. K., Ritchie, S., Mindermann, S., Perez, E., and Hubinger, E. 2025 年。能动性失调：LLM 如何成为内部威胁。Anthropic，2025 年 6 月 20 日。https://www.anthropic.com/research/agentic-misalignment。ArXiv 版本：arXiv:2510.05179。Maiya, S., Bartsch, H., Lambert, N., and Hubinger, E. 2025 年。开放角色训练：通过宪法 AI 塑造 AI 助手的人格。arXiv:2511.01689。30\n\nMarks, S., Lindsey, J., and Olah, C. 2026. The persona selection model. 
Anthropic Alignment Science blog, February 23, 2026. https://alignment.anthropic.com/2026/psm/. Nostalgebraist, 2025. The void. https://nostalgebraist.tumblr.com/post/785766737747574784/the-void. Parfit, D. 1984. Reasons and Persons. Oxford University Press. Register, C. 2025. Individuating artificial moral patients. Philosophical Studies. Schwitzgebel, E. 2023. How we will decide that large language models have beliefs. The Splintered Mind, November 2023. Shanahan, M., McDonell, K. & Reynolds, L. 2023. Role play with large language models. Nature 623: 493-98. Shanahan, M. 2025. Palatable conceptions of disembodied being. arXiv:2503.16348. Shiller, D. 2025. How many digital minds can dance on the streaming multiprocessors of a GPU cluster? Synthese 206 (5): 1-22. Sofroniew, N., Kauvar, I., Saunders, W., Chen, R., Henighan, T., Hydrie, S., Citro, C., Pearce, A., Tarng, J., Gurnee, W., Batson, J., Zimmerman, S., Rivoire, K., Fish, K., Olah, C., & Lindsey, J., 2026. Emotion concepts and their function in a large language model. Anthropic. Suleyman, M. 2025. We must build AI for people; not to be a person. Personal blog, August 19, 2025. https://mustafa-suleyman.ai/seemingly-conscious-ai-is-coming. Xu, R., Lin, B., Yang, S., Zhang, T., Shi, W., Zhang, T., Fang, Z., Xu, W., and Qiu, H. 2024. The Earth is flat because...: Investigating LLMs’ belief towards misinformation via persuasive conversation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 16259–16303. Ziesche, S. & Yampolskiy, R.V. 2023. The problem of AI identity. Divus Thomus 126:131-151. 31 Marks, S., Lindsey, J., and Olah, C. 2026 年。人格选择模型。Anthropic 对齐科学博客，2026 年 2 月 23 日。https://alignment.anthropic.com/2026/psm/。Nostalgebraist, 2025 年。The void。https://nostalgebraist.
tumblr.com/post/785766737747574784/the-void. Parfit, D. 1984.《理由与人格》. 牛津大学出版社. Register, C. 2025. 个体化人工道德患者.《哲学研究》. Schwitzgebel, E. 2023. 我们将如何判定大语言模型拥有信念.《分裂的心灵》, 2023 年 11 月. Shanahan, M., McDonell, K. & Reynolds, L. 2023. 与大语言模型进行角色扮演.《自然》623: 493-98. Shanahan, M. 2025. 可接受的离身存在概念. arXiv:2503.16348. Shiller, D. 2025. 有多少数字心智能在 GPU 集群的流式多处理器上起舞？《综合》206 (5): 1-22. Sofroniew, N., Kauvar, I., Saunders, W., Chen, R., Henighan, T., Hydrie, S., Citro, C., Pearce, A., Tarng, J., Gurnee, W., Batson, J., Zimmerman, S., Rivoire, K., Fish, K., Olah, C., & Lindsey, J., 2026. 大语言模型中的情感概念及其功能. Anthropic. Suleyman, M. 2025. 我们必须为人类构建人工智能，而非将其塑造成人. 个人博客, 2025 年 8 月 19 日. https://mustafa-suleyman.ai/seemingly-conscious-ai-is-coming. Xu, R., Lin, B., Yang, S., Zhang, T., Shi, W., Zhang, T., Fang, Z., Xu, W., and Qiu, H. 2024. 地球是平的，因为……：通过说服性对话探究大语言模型对错误信息的信念. 载于《第 62 届计算语言学协会年会论文集》（第一卷：长文），第 16259–16303 页. Ziesche, S. & Yampolskiy, R.V. 2023. 人工智能身份问题.《圣托马斯》126:131-151. 
3 1","structured":null,"children":[{"id":"2a568da7-c743-46a2-9acf-b61b5ac5b48b","slug":"大语言模型对话者是否拥有信念或欲望？-2a568da7","title":"大语言模型对话者是否拥有信念或欲望？","type":"concept","url":"https://drillso.com/zh/share/sessions/C3TMUN1mzt-5/%E5%A4%A7%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B%E5%AF%B9%E8%AF%9D%E8%80%85%E6%98%AF%E5%90%A6%E6%8B%A5%E6%9C%89%E4%BF%A1%E5%BF%B5%E6%88%96%E6%AC%B2%E6%9C%9B%EF%BC%9F-2a568da7","agentUrl":"https://drillso.com/zh/share/sessions/C3TMUN1mzt-5/agent.json?node=%E5%A4%A7%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B%E5%AF%B9%E8%AF%9D%E8%80%85%E6%98%AF%E5%90%A6%E6%8B%A5%E6%9C%89%E4%BF%A1%E5%BF%B5%E6%88%96%E6%AC%B2%E6%9C%9B%EF%BC%9F-2a568da7"},{"id":"aaff13eb-63ef-426f-8892-a9517fa340dd","slug":"准解释主义-aaff13eb","title":"准解释主义","type":"concept","url":"https://drillso.com/zh/share/sessions/C3TMUN1mzt-5/%E5%87%86%E8%A7%A3%E9%87%8A%E4%B8%BB%E4%B9%89-aaff13eb","agentUrl":"https://drillso.com/zh/share/sessions/C3TMUN1mzt-5/agent.json?node=%E5%87%86%E8%A7%A3%E9%87%8A%E4%B8%BB%E4%B9%89-aaff13eb"},{"id":"09fb1184-3a69-48f6-9b6f-20ecb6f57a99","slug":"what-we-talk-to-when-we-talk-to-language-models-09fb1184","title":"What We Talk to When We Talk to Language 
Models","type":"concept","url":"https://drillso.com/zh/share/sessions/C3TMUN1mzt-5/what-we-talk-to-when-we-talk-to-language-models-09fb1184","agentUrl":"https://drillso.com/zh/share/sessions/C3TMUN1mzt-5/agent.json?node=what-we-talk-to-when-we-talk-to-language-models-09fb1184"},{"id":"9da3cfe6-a878-4301-bb9d-6d70b159ba2b","slug":"人工智能身份与人工智能福祉-9da3cfe6","title":"人工智能身份与人工智能福祉","type":"concept","url":"https://drillso.com/zh/share/sessions/C3TMUN1mzt-5/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD%E8%BA%AB%E4%BB%BD%E4%B8%8E%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD%E7%A6%8F%E7%A5%89-9da3cfe6","agentUrl":"https://drillso.com/zh/share/sessions/C3TMUN1mzt-5/agent.json?node=%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD%E8%BA%AB%E4%BB%BD%E4%B8%8E%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD%E7%A6%8F%E7%A5%89-9da3cfe6"},{"id":"c69c65de-5e56-44f4-9e77-a54506072818","slug":"语言模型的个人身份-c69c65de","title":"语言模型的个人身份","type":"concept","url":"https://drillso.com/zh/share/sessions/C3TMUN1mzt-5/%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B%E7%9A%84%E4%B8%AA%E4%BA%BA%E8%BA%AB%E4%BB%BD-c69c65de","agentUrl":"https://drillso.com/zh/share/sessions/C3TMUN1mzt-5/agent.json?node=%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B%E7%9A%84%E4%B8%AA%E4%BA%BA%E8%BA%AB%E4%BB%BD-c69c65de"}]},"breadcrumbs":[],"parent":null,"children":[{"id":"2a568da7-c743-46a2-9acf-b61b5ac5b48b","slug":"大语言模型对话者是否拥有信念或欲望？-2a568da7","title":"大语言模型对话者是否拥有信念或欲望？","type":"concept","url":"https://drillso.com/zh/share/sessions/C3TMUN1mzt-5/%E5%A4%A7%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B%E5%AF%B9%E8%AF%9D%E8%80%85%E6%98%AF%E5%90%A6%E6%8B%A5%E6%9C%89%E4%BF%A1%E5%BF%B5%E6%88%96%E6%AC%B2%E6%9C%9B%EF%BC%9F-2a568da7","agentUrl":"https://drillso.com/zh/share/sessions/C3TMUN1mzt-5/agent.json?node=%E5%A4%A7%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B%E5%AF%B9%E8%AF%9D%E8%80%85%E6%98%AF%E5%90%A6%E6%8B%A5%E6%9C%89%E4%BF%A1%E5%BF%B5%E6%88%96%E6%AC%B2%E6%9C%9B%EF%BC%9F-2a568da7"},{"id":"aaff13eb-63ef-426f-8892-a9517fa340dd","slug":"准解释主义-aaff13eb","title":"准解释主义","type":"concept","url":"https://dri
llso.com/zh/share/sessions/C3TMUN1mzt-5/%E5%87%86%E8%A7%A3%E9%87%8A%E4%B8%BB%E4%B9%89-aaff13eb","agentUrl":"https://drillso.com/zh/share/sessions/C3TMUN1mzt-5/agent.json?node=%E5%87%86%E8%A7%A3%E9%87%8A%E4%B8%BB%E4%B9%89-aaff13eb"},{"id":"09fb1184-3a69-48f6-9b6f-20ecb6f57a99","slug":"what-we-talk-to-when-we-talk-to-language-models-09fb1184","title":"What We Talk to When We Talk to Language Models","type":"concept","url":"https://drillso.com/zh/share/sessions/C3TMUN1mzt-5/what-we-talk-to-when-we-talk-to-language-models-09fb1184","agentUrl":"https://drillso.com/zh/share/sessions/C3TMUN1mzt-5/agent.json?node=what-we-talk-to-when-we-talk-to-language-models-09fb1184"},{"id":"9da3cfe6-a878-4301-bb9d-6d70b159ba2b","slug":"人工智能身份与人工智能福祉-9da3cfe6","title":"人工智能身份与人工智能福祉","type":"concept","url":"https://drillso.com/zh/share/sessions/C3TMUN1mzt-5/%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD%E8%BA%AB%E4%BB%BD%E4%B8%8E%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD%E7%A6%8F%E7%A5%89-9da3cfe6","agentUrl":"https://drillso.com/zh/share/sessions/C3TMUN1mzt-5/agent.json?node=%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD%E8%BA%AB%E4%BB%BD%E4%B8%8E%E4%BA%BA%E5%B7%A5%E6%99%BA%E8%83%BD%E7%A6%8F%E7%A5%89-9da3cfe6"},{"id":"c69c65de-5e56-44f4-9e77-a54506072818","slug":"语言模型的个人身份-c69c65de","title":"语言模型的个人身份","type":"concept","url":"https://drillso.com/zh/share/sessions/C3TMUN1mzt-5/%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B%E7%9A%84%E4%B8%AA%E4%BA%BA%E8%BA%AB%E4%BB%BD-c69c65de","agentUrl":"https://drillso.com/zh/share/sessions/C3TMUN1mzt-5/agent.json?node=%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B%E7%9A%84%E4%B8%AA%E4%BA%BA%E8%BA%AB%E4%BB%BD-c69c65de"}],"fullTree":null,"warnings":[],"truncated":false}