Deprecate reflect’s synthesis, push reflection into recall + skills
Date: 2026-06-17
Status: Design approved, ready for implementation plan
Repos touched: backend/, xysq-skills/, python_sdk/
Problem
memory_reflect calls Hindsight’s gpt-oss harmony model to synthesize prose
answers. That model intermittently leaks raw scaffolding tokens
(<|channel|>, to=functions, <|call|>, or a bare {"answer": ...} JSON
blob) instead of prose. Observed live on 2026-06-17 (the decisions skill
returned scaffolding).
There is a partial guard already: HindsightProvider.reflect() in
backend/memory/hindsight.py detects poisoned text via
_is_poisoned_reflect_text() and raises, so the service returns
{"status": "error"} and skills fall back to recall. But the guard does not
catch every leak mode, and it is a patch over a deeper issue: we are running a
synthesis model in the request path whose only consumers are themselves LLMs
that re-articulate the result.
Audit finding (decisive)
Every call site that consumes reflect is an LLM that re-synthesizes the
result. There are zero direct-render consumers - no UI component, REST
client, dashboard card, cron job, or CLI displays a reflect answer to a human
without an LLM in the loop. Consumers found:
- MCP tool memory_reflect -> calling agent (Claude Code, native chat, skills)
- REST /memories/reflect and /api/sdk/memory/reflect -> LLM/SDK callers
- python_sdk memory.synthesize() -> embedded into an LLM system prompt by SynthesizeStrategy / BothStrategy
- backend/chat/wiring.py reflect() -> fire-and-forget, result discarded (unused)
Principle
The MCP layer returns facts; the downstream LLM does the synthesis. We stop running the harmony model in the request path. The “reflection” quality moves into skill and docstring instructions that the calling agent applies, ported from Hindsight’s own reflect recipe.
Design
1. Backend: re-point reflect at the recall path
- services.memory.reflect() stops calling provider.reflect() (the harmony model). It calls the same retrieval path recall uses and returns facts in a recall-shaped payload.
- provider.reflect() and the _is_poisoned_reflect_text() guard in hindsight.py are left in place but unused by the tool. Mark deprecated in a comment. Deleting the provider method is a separate later pass - it is still referenced by backend/evals/ and the unused chat/wiring.py reflect_background. (Decision: re-point, keep provider.)
- The memory_reflect MCP tool keeps its name and signature so no caller breaks. Docstring changes from “Hindsight synthesizes; you consume prose” to “returns facts; synthesize them yourself using your skill’s recipe.”
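A minimal sketch of the re-pointed service function, assuming a provider that exposes recall() and reflect(). FakeProvider, the payload keys, and the parameter names are hypothetical stand-ins, not the actual backend code.

```python
from __future__ import annotations

from typing import Any


class FakeProvider:
    """Hypothetical stand-in for HindsightProvider, for illustration only."""

    def __init__(self) -> None:
        self.reflect_calls = 0

    def recall(self, query: str, **kwargs: Any) -> list[dict[str, Any]]:
        # Pretend retrieval: return fact-shaped dicts.
        return [
            {"id": "f1", "text": "Chose Postgres over MySQL", "source": "personal"},
        ]

    def reflect(self, query: str, **kwargs: Any) -> str:
        # DEPRECATED: harmony synthesis. Left in place, unused by the tool.
        self.reflect_calls += 1
        return "synthesized prose"


def reflect(provider: FakeProvider, query: str, **kwargs: Any) -> dict[str, Any]:
    """Recall-backed reflect: returns facts and never calls provider.reflect()."""
    facts = provider.recall(query, **kwargs)
    return {"status": "ok", "facts": facts, "count": len(facts)}


provider = FakeProvider()
result = reflect(provider, "what did we decide about the database?")
```

The call counter on the fake provider is one way to verify that the harmony path is never reached from the request path.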
2. Skills: port Hindsight’s synthesis recipe
Hindsight’s reflect is an agentic loop with four portable ingredients (hindsight-docs developer/reflect.md). Each becomes a skill instruction the
downstream agent already knows how to follow:
| Hindsight ingredient | Where it lands in our skills |
|---|---|
| Hierarchical retrieval (mental models -> observations -> raw facts) | The skill’s recall call chooses types=["observation"] for “what’s true now”, raw episodic for history. Made explicit per skill. |
| Mission / identity (per use-case prose) | Each skill gets a one-paragraph mission line: who is reasoning, what they care about. |
| Disposition (skepticism/literalism/empathy 1-5) | Baked into the mission as one or two adjectives - no numeric trait system. blockers/decisions lean skeptical+literal+direct; recap/prep lean neutral. |
| Grounding + validated citation | Shared rule in _shared/recall-recipe.md: synthesize ONLY from returned facts; cite the id/document_id of facts actually used; if recall returns nothing, say so - never invent. |
The 5 skills (recap, decisions, actionables, blockers, prep) and
_shared/recall-recipe.md are updated. The obsolete “if reflect fails, fall
back to recall” contract in recall-recipe.md is removed - there is no
harmony failure mode left, because reflect is now recall.
Example - decisions:
Mission: You are surfacing decisions the user made. Be precise and literal: report the choice and its stated rationale, nothing inferred. If a decision has no clear rationale in the facts, present it as “rationale unknown” rather than inventing one.
Scope: (see 3a)
Retrieve: memory_recall with tags: ["memory_kind:decision"], types: ["observation"] (so superseded decisions resolve to the current one).
Synthesize: [shared grounding + citation rules]
Format: [existing decisions format]
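The grounding and citation rule can also be checked mechanically. A sketch, assuming each recall fact carries an id and optionally a document_id, and that the agent's answer lists the ids it cited. validate_citations is a hypothetical helper, not something the skills or backend define.

```python
from __future__ import annotations


def validate_citations(facts: list[dict], cited_ids: list[str]) -> list[str]:
    """Return cited ids that match no returned fact (should be empty)."""
    known = set()
    for fact in facts:
        for key in ("id", "document_id"):
            if fact.get(key):
                known.add(fact[key])
    return [cid for cid in cited_ids if cid not in known]


facts = [
    {"id": "f1", "document_id": "d9", "text": "Ship v2 on Friday"},
    {"id": "f2", "text": "Rationale: customer deadline"},
]
# "f7" was never returned by recall, so it is flagged as an invented citation.
unknown = validate_citations(facts, ["f1", "d9", "f7"])
```

An empty recall result makes every citation invalid, which matches the rule that the agent should say nothing was found rather than invent.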
3a. Scope: parameter, not a hardcoded default
Skills currently hardcode personal_only: true. Since reflect becomes
recall-backed, it inherits recall’s fan-out (personal + recall-enabled teams) -
strictly more capable than the old single-vault reflect. The recipe changes
from a constant to an explicit scope step:
Scope: Default to the user’s personal vault. If the request names a team (or the active context is a team), pass that team’s scope instead of personal_only. If genuinely ambiguous and permitted, recall across personal + recall-enabled teams and label each fact by its source (personal / team:<id>) in the output.
The source field is already returned on every recall fact, so the agent can
attribute team vs personal. Constraint: this must respect existing scope
guards - native chat stays personal-only, and the org iron wall hard-blocks
cross-scope at the backend. The recipe instructs; the backend authorizes.
Recipe wording: “if permitted.”
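A sketch of the “recipe instructs; backend authorizes” split. The function names, the scope dict shape, and the choice to downgrade a personal-only caller rather than reject it are all assumptions for illustration, not the existing guard code.

```python
from __future__ import annotations


def resolve_scope(requested_team: str | None, ambiguous: bool = False) -> dict:
    """Recipe side: pick the scope the agent asks for (a request, not a grant)."""
    if requested_team:
        return {"team_id": requested_team}
    if ambiguous:
        return {"fan_out": True}  # personal + recall-enabled teams
    return {"personal_only": True}


def authorize_scope(
    scope: dict, allowed_teams: set[str], personal_only_caller: bool
) -> dict:
    """Backend side: has the final word on what the caller may read."""
    if personal_only_caller:  # e.g. native chat stays personal-only
        return {"personal_only": True}
    team = scope.get("team_id")
    if team is not None and team not in allowed_teams:
        raise PermissionError(f"scope not permitted: team:{team}")
    return scope


granted = authorize_scope(resolve_scope("eng"), {"eng"}, personal_only_caller=False)
```

Whatever the recipe requests, only the value returned by the backend check is used for retrieval.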
3b. python_sdk: facts digest in answer
synthesize() returns recall facts joined into a readable block in the
SynthesizeResult.answer field, citations from fact ids, confidence
derived from result count/score. No new LLM call in the SDK. Every
synthesize() consumer is itself an LLM that re-articulates, so a facts digest
is exactly as useful as the old prose and cannot leak scaffolding.
SynthesizeStrategy and BothStrategy continue to embed answer into the
system prompt - now grounded facts instead of harmony prose. (Decision: facts
digest, not client-side synthesis, not a breaking deprecation.)
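A sketch of how the facts digest could be assembled. The three SynthesizeResult fields follow the text above; the fact keys (id, text, score), the digest layout, and the mean-score confidence heuristic are assumptions, since the design only says confidence is derived from result count/score.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SynthesizeResult:
    # Field names follow the design text; the real SDK class may carry more.
    answer: str
    citations: list[str] = field(default_factory=list)
    confidence: float = 0.0


def facts_to_result(facts: list[dict]) -> SynthesizeResult:
    """Join recall facts into a readable digest. No LLM call is involved."""
    if not facts:
        return SynthesizeResult(answer="No relevant memories found.")
    lines = [f"- {f['text']} [{f['id']}]" for f in facts]
    scores = [f.get("score", 0.0) for f in facts]
    # Hypothetical heuristic: mean score, clamped to [0, 1].
    confidence = max(0.0, min(1.0, sum(scores) / len(scores)))
    return SynthesizeResult(
        answer="\n".join(lines),
        citations=[f["id"] for f in facts],
        confidence=confidence,
    )


result = facts_to_result([
    {"id": "f1", "text": "Chose Postgres", "score": 0.5},
    {"id": "f2", "text": "Deadline is Friday", "score": 1.0},
])
```

Because the digest is built by string joining alone, nothing in this path can emit model scaffolding.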
3c. response_schema: accept-but-ignore, deprecate
Keep the response_schema param on the tool/route so no caller breaks. Stop
producing server-side structured_output; the downstream agent builds the
structure itself (it conforms to a schema better than the harmony model did).
Document the param as deprecated; remove next version.
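One way to implement accept-but-ignore is a deprecation warning on use. This is a sketch with a hypothetical handler signature and payload; the real tool takes more parameters and runs the recall-backed lookup where the placeholder is.

```python
from __future__ import annotations

import warnings
from typing import Any


def memory_reflect(
    query: str, response_schema: dict[str, Any] | None = None
) -> dict[str, Any]:
    """Hypothetical tool handler: accepts response_schema but ignores it."""
    if response_schema is not None:
        warnings.warn(
            "response_schema is deprecated and ignored; build the structure "
            "from the returned facts instead.",
            DeprecationWarning,
            stacklevel=2,
        )
    facts: list[dict[str, Any]] = []  # placeholder for the recall-backed lookup
    # No structured_output key is produced any more.
    return {"status": "ok", "facts": facts}
```

Existing callers that still pass a schema keep working and get a signal to migrate before the parameter is removed.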
What we explicitly do NOT do
- Do not delete provider.reflect() or the harmony guard this pass (later cleanup).
- Do not add numeric disposition traits to skills - adjectives in the mission only.
- Do not make a new LLM call inside the SDK.
- Do not break the SDK synthesize() shape or the response_schema signature.
- Do not let the skill recipe override backend scope authorization.
Success criteria
- memory_reflect (MCP), /memories/reflect, /api/sdk/memory/reflect, and python_sdk synthesize() all return without ever invoking the harmony model - verified by no call to provider.reflect() in the request path.
- The scaffolding-leak class (<|channel|>, to=functions, <|call|>, bare {"answer"}) is impossible from these paths because no harmony prose is generated. A test asserts reflect output contains only fact-shaped data.
- The 5 skills synthesize grounded, cited answers from recall facts; the “fall back to recall” contract is gone from recall-recipe.md.
- Skills default to personal scope but can target a team vault when named and permitted; backend scope guards still enforce.
- No existing caller breaks: tool/route signatures and SynthesizeResult shape are unchanged.
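The fact-shape assertion from the second criterion could look roughly like this. The marker strings come from the Problem section; the allowed key set and the payload layout are assumptions about what a recall fact carries.

```python
from __future__ import annotations

# Markers named in the Problem section.
SCAFFOLDING_MARKERS = ("<|channel|>", "to=functions", "<|call|>")
# Hypothetical set of keys a recall fact may carry.
ALLOWED_FACT_KEYS = {"id", "document_id", "text", "source", "score", "type"}


def assert_fact_shaped(payload: dict) -> None:
    """Raise AssertionError if the payload carries scaffolding or odd keys."""
    for fact in payload.get("facts", []):
        assert set(fact) <= ALLOWED_FACT_KEYS, f"unexpected keys: {sorted(fact)}"
        text = str(fact.get("text", ""))
        assert not text.lstrip().startswith('{"answer"'), "bare answer blob"
        for marker in SCAFFOLDING_MARKERS:
            assert marker not in text, f"scaffolding leaked: {marker}"


assert_fact_shaped(
    {"status": "ok", "facts": [{"id": "f1", "text": "Chose Postgres", "source": "personal"}]}
)
```

A stored memory that legitimately quotes one of these markers would trip the check, so a real test would likely run against fixture data known not to contain them.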