Deprecate reflect synthesis, push reflection into recall + skills — Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Make memory_reflect return recall-shaped facts instead of calling Hindsight’s harmony synthesis model, and move synthesis into skill instructions, so the scaffolding-leak class (<|channel|>, to=functions, <|call|>) becomes impossible.
Architecture: services.memory.reflect() is re-pointed at the recall retrieval path and returns a facts payload plus a backward-compatible answer/confidence/citations envelope built from those facts (no LLM call). The harmony provider method is left in place but unused. Skills (5 claude-code recall skills + their shared recipe + 9 core variants) are reworded to call memory_recall and synthesize the result themselves. The python_sdk synthesize() keeps its shape but its answer becomes a facts digest. response_schema is accepted-but-ignored and documented deprecated.
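The "envelope built from those facts (no LLM call)" idea can be sketched as below. This is a minimal illustration, not the real service code: the function name `build_reflect_envelope`, the fact keys `id`, `text` and `document_id`, and the count-based confidence rule are all assumptions made for the example.

```python
from typing import Any


def build_reflect_envelope(facts: list[dict[str, Any]], max_digest: int = 10) -> dict[str, Any]:
    """Build an answer/confidence/citations envelope from recalled facts, with no LLM call."""
    # Digest: one bullet per fact text, capped so the answer stays short.
    # (Assumption: each fact dict carries a "text" key.)
    lines = [f"- {f['text']}" for f in facts[:max_digest] if f.get("text")]
    answer = "\n".join(lines) if lines else "No matching memories found."

    # Hypothetical confidence rule: derived only from how many facts came back.
    if not facts:
        confidence = "low"
    elif len(facts) < 3:
        confidence = "medium"
    else:
        confidence = "high"

    # Citations are derived from the same facts, so they can never point
    # at something the payload does not contain.
    citations = [
        {"id": f["id"], "document_id": f.get("document_id")}
        for f in facts
        if "id" in f
    ]
    return {"answer": answer, "confidence": confidence, "citations": citations, "facts": facts}
```

Because every field is computed from the recalled facts, there is no model output in the response for scaffolding tokens to leak through.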
Tech Stack: Python (FastAPI, fastmcp), pytest, Markdown skill files, Pydantic (python_sdk).
Environment: Every backend/ and python_sdk/ command MUST be prefixed with source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && (pre-commit hooks call kiwiskil; bare git/pytest fail). Run only explicitly-named test files (never tree-wide pytest — it has OOM-rebooted the machine).
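One way to make the environment rule hard to forget is a tiny command builder that always adds the conda prefix and refuses to run without explicitly named test files. The helper name `pytest_command` is hypothetical; only the prefix string and the "named files only" rule come from this plan.

```python
import shlex

# Prefix required by this plan for every backend/ and python_sdk/ command.
CONDA_PREFIX = "source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq"


def pytest_command(subdir: str, test_files: list[str]) -> str:
    """Return the full shell command for running explicitly named test files."""
    if not test_files:
        # An empty list would mean tree-wide pytest, which the plan forbids.
        raise ValueError("name at least one test file; tree-wide pytest is not allowed")
    for path in test_files:
        # Strip any ::test_name selector before checking the file part.
        file_part = path.split("::", 1)[0]
        if not file_part.endswith(".py"):
            raise ValueError(f"not an explicit test file: {path!r}")
    files = " ".join(shlex.quote(p) for p in test_files)
    return f"{CONDA_PREFIX} && cd {shlex.quote(subdir)} && python -m pytest {files} -v"
```

For example, `pytest_command("backend", ["tests/skill/test_reflect_returns_facts.py"])` yields the same command line used in the steps of this plan.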
Ground truth verified against current main (2026-06-18)
These facts were confirmed in code before writing; do not “fix” them away:
- MCP `memory_recall` (`backend/tools/memory.py:498-624`) HAS `tags`, `tags_match`, `personal_only`, `team_id`, `team_ids`, `query_timestamp`. It does NOT have `occurred_after`/`occurred_before`. The `claude-code` skills correctly target this tool, so their `tags: ["memory_kind:decision"]` usage STAYS.
- Chat `recall` (native ADK tool, different surface) has `occurred_after`/`occurred_before` and no `tags`. The `chat/` skill variants target THAT tool and are already recall-only and reflect-free: out of scope, do not touch.
- `services.memory.reflect()` lives at `backend/services/memory.py:701-784`.
- `services.memory.recall()` (the retrieval helper to reuse) lives at `backend/services/memory.py:608-698` and returns `list[dict]`.
- MCP `memory_reflect` tool: `backend/tools/memory.py:626-708`.
- REST reflect routes: `backend/api/routes/memories.py:762-794` (calls `provider.reflect` directly, so it must change) and `backend/api/routes/sdk.py:198-213` (calls `services.reflect`).
- `python_sdk` `synthesize()`: `python_sdk/xysq/memory.py:108-125`. `SynthesizeResult` type: `python_sdk/xysq/types.py:22-27`; fields are `answer, query, confidence, sources, wiki_context_used`. There is NO `citations` field on `SynthesizeResult`, so map fact ids into `sources`.
- Skills with the harmony-fallback block to delete (6 files): `_shared/recall-recipe.md`, `recap|decisions|blockers|actionables|prep/claude-code/SKILL.md`.
- Skills saying “Use `memory_reflect`”: `recap`, `decisions`, `blockers` (claude-code) + all 9 `core/*`. (`actionables`, `prep` already use `memory_recall`; their only reflect mention is the fallback block + a “fall back to untagged reflect” line.)
- catalog versions now: actionables=4, blockers=3, decisions=3, prep=3, recap=3, core=1.
File structure
`backend/` (one git repo)
- Modify: `backend/services/memory.py:701-784` - `reflect()` re-pointed at recall path.
- Modify: `backend/tools/memory.py:626-708` - `memory_reflect` docstring.
- Modify: `backend/api/routes/memories.py:762-794` - REST `/reflect` calls `services.reflect`, not `provider.reflect`.
- Create: `backend/tests/skill/test_reflect_returns_facts.py` - new behavior tests.

`xysq-skills/`
- Modify: `_shared/recall-recipe.md` - delete harmony-fallback block.
- Modify: `recap|decisions|blockers/claude-code/SKILL.md` - swap reflect→recall in “How to recall” + delete fallback block + add mission line.
- Modify: `actionables|prep/claude-code/SKILL.md` - delete fallback block + drop “fall back to untagged reflect” line + add mission line.
- Modify: `core/{claude-code,cursor,codex,chatgpt,claude-desktop,copilot-cli,gemini-cli,windsurf,generic}/SKILL.md` - reword reflect guidance to “reflect returns facts.”
- Modify: `catalog.json` - bump touched skill versions.

`python_sdk/`
- Modify: `python_sdk/xysq/memory.py:108-125` - `synthesize()` docstring (deprecate `response_schema`); no behavior change needed (it still posts to the same route, which now returns the new envelope).
PHASE 1 — Backend (the real behavior change)
Task 1: Re-point services.memory.reflect() at the recall path
Files:
- Modify: `backend/services/memory.py:701-784`
- Test: `backend/tests/skill/test_reflect_returns_facts.py` (create)

- Step 1: Write the failing test
Create `backend/tests/skill/test_reflect_returns_facts.py`:
- Step 2: Run test to verify it fails
source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd backend && python -m pytest tests/skill/test_reflect_returns_facts.py -v
Expected: FAIL — current reflect() calls provider.reflect (trip-wire fires) and returns no facts key.
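The trip-wire idea mentioned in the expected failure can be illustrated with a self-contained sketch. It is not the plan's actual test file: `TripWireProvider`, `reflect_via_recall`, the async call style, and the fact keys are hypothetical stand-ins. The real test would presumably patch the provider used by `services.memory.reflect()` instead.

```python
import asyncio


class TripWireProvider:
    """Stand-in provider: any call to reflect() fails the test immediately."""

    async def reflect(self, *args, **kwargs):
        # The trip-wire: the rewritten service must never reach this method.
        raise AssertionError("provider.reflect must not be called")

    async def recall(self, query: str) -> list[dict]:
        # Canned retrieval result standing in for the real recall path.
        return [{"id": "f1", "text": "We chose Postgres."}]


async def reflect_via_recall(provider, query: str) -> dict:
    """Hypothetical stand-in for the rewritten service function."""
    facts = await provider.recall(query)
    answer = "\n".join(f"- {f['text']}" for f in facts)
    return {"answer": answer, "facts": facts}


def test_reflect_returns_facts_without_provider_reflect():
    result = asyncio.run(reflect_via_recall(TripWireProvider(), "which database?"))
    # New behavior: a facts key is present and populated from recall.
    assert result["facts"][0]["id"] == "f1"
    assert result["answer"] == "- We chose Postgres."
```

Against the current `reflect()`, a test of this shape fails for both reasons named above: the trip-wire raises, and there is no `facts` key to assert on.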
- Step 3: Rewrite `reflect()` to use the recall path
Replace `reflect()` at `backend/services/memory.py:701-784` with this. Keep the signature identical (callers unchanged):
- Step 4: Run test to verify it passes
source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd backend && python -m pytest tests/skill/test_reflect_returns_facts.py -v
Expected: PASS (both tests).
- Step 5: Confirm the old harmony-guard test still passes (provider untouched)
source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd backend && python -m pytest tests/skill/test_reflect_harmony_guard.py -v
Expected: PASS — we did not touch hindsight.py; the guard is now dead-but-correct.
- Step 6: Commit
Task 2: REST /reflect route must call services.reflect, not provider.reflect
The main REST route (memories.py:788) bypasses the service and calls provider.reflect directly — it would still hit the harmony model. Re-point it.
Files:
- Modify: `backend/api/routes/memories.py:762-794`
- Test: `backend/tests/skill/test_reflect_returns_facts.py` (add a route-level test)

- Step 1: Write the failing test
Add to `backend/tests/skill/test_reflect_returns_facts.py`:
- Step 2: Run test to verify it fails
source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd backend && python -m pytest tests/skill/test_reflect_returns_facts.py::test_rest_reflect_route_uses_service -v
Expected: FAIL — memories.py does not import a module-level reflect (it calls provider.reflect).
- Step 3: Rewrite the route body
Follow the pattern already used by `retain_memory` in this same file (verified at `api/routes/memories.py:657-688`): build `VaultContext` directly, resolve team scope via `resolve_team_vault`, and create the session via `get_session_store().find_or_create_api_key_session(...)`. Replace the route body at lines 762-794:
This mirrors `retain_memory` exactly (same imports, same `VaultContext` + `resolve_team_vault` + `find_or_create_api_key_session` calls), so no new helper is invented and `provider.reflect` / the old `team_bank(...)` direct-bank path are dropped. Leave the `team_bank` / `get_team_store` imports in this route alone unless YOUR change orphaned them. Check with `grep -n "team_bank\|get_team_store" api/routes/memories.py` and remove only an import your edit orphaned.
- Step 4: Run the wiring test
source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd backend && python -m pytest tests/skill/test_reflect_returns_facts.py -v
Expected: PASS.
- Step 5: Import-sanity the route module
source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd backend && python -c "import api.routes.memories"
Expected: no ImportError.
- Step 6: Commit
Task 3: Update the memory_reflect MCP tool docstring
No behavior change (it already calls services.reflect). Only the docstring lies now — it says “Hindsight resolves contradictions at synthesis time” and “present it to the user directly.”
Files:
- Modify: `backend/tools/memory.py:635-677`

- Step 1: Replace the docstring
Replace the existing docstring (the text between the `"""` markers) with:
- Step 2: Import-sanity
source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd backend && python -c "import tools.memory"
Expected: no error.
- Step 3: Commit
PHASE 2 — Skills (xysq-skills repo)
All skill edits are Markdown. After edits, bump the touched skill’s `version` in `catalog.json` (and the `<!-- version: N -->` footer in each edited SKILL.md, where present). No test runner; verification is grep-based.
Task 4: Strip the harmony-fallback block from the shared recipe + 5 claude-code skills
Files:
- Modify: `xysq-skills/_shared/recall-recipe.md`
- Modify: `xysq-skills/{recap,decisions,blockers,actionables,prep}/claude-code/SKILL.md`

- Step 1: Delete the fallback block from the master recipe
In `xysq-skills/_shared/recall-recipe.md`, delete the entire section (heading + body):
- Step 2: Delete the same block from each of the 5 claude-code skills
Files: `recap/claude-code/SKILL.md`, `decisions/claude-code/SKILL.md`, `blockers/claude-code/SKILL.md`, `actionables/claude-code/SKILL.md`, `prep/claude-code/SKILL.md`. Delete the same `## If reflect fails, fall back to recall` section (heading through the “optimization on top of it.” line) from each.
- Step 3: Verify the block is gone everywhere
cd xysq-skills && grep -rln "If reflect fails" .
Expected: NO output (empty).
- Step 4: Commit
Task 5: Swap reflect→recall + add mission lines in recap/decisions/blockers
These three say “Use `mcp__xysq__memory_reflect`”. Point them at `memory_recall` and add a one-line mission. `decisions` is shown in full; apply the analogous change to `recap` and `blockers`.
Files:
- Modify: `xysq-skills/decisions/claude-code/SKILL.md`
- Modify: `xysq-skills/recap/claude-code/SKILL.md`
- Modify: `xysq-skills/blockers/claude-code/SKILL.md`

- Step 1: decisions: rewrite the “How to recall” section
In `xysq-skills/decisions/claude-code/SKILL.md`, replace:
- Step 2: recap: rewrite the “How to recall” section
In `xysq-skills/recap/claude-code/SKILL.md`, replace:
- Step 3: blockers: rewrite the “How to recall” section
In `xysq-skills/blockers/claude-code/SKILL.md`, replace the block that begins with the `## How to recall` heading followed by “Use `mcp__xysq__memory_reflect` with `tags: ["memory_kind:blocker"]` and `personal_only: true`”. Read the current block first (`grep -n -A12 "## How to recall" xysq-skills/blockers/claude-code/SKILL.md`), then replace it with:
- Step 4: Verify no reflect calls remain in these three
cd xysq-skills && grep -rn "memory_reflect" recap/claude-code/SKILL.md decisions/claude-code/SKILL.md blockers/claude-code/SKILL.md
Expected: NO output.
- Step 5: Add mission lines to actionables + prep, drop their “untagged reflect” line
`actionables` and `prep` already call `memory_recall`. Two small edits each:
(a) add a `## Mission` line just above `## How to recall`; (b) change any “fall back to untagged reflect” wording to “untagged recall”.
For `actionables/claude-code/SKILL.md`, add above `## How to recall`:
For `prep/claude-code/SKILL.md`, add above `## How to recall`:
Then search both files for `reflect` and change “untagged reflect” → “untagged recall” if present:
grep -n "reflect" xysq-skills/actionables/claude-code/SKILL.md xysq-skills/prep/claude-code/SKILL.md
- Step 6: Final reflect-free check across all claude-code recall skills
cd xysq-skills && grep -rn "reflect" {recap,decisions,blockers,actionables,prep}/claude-code/SKILL.md
Expected: NO output.
- Step 7: Bump versions + footers
In `catalog.json`, increment `version` for recap, decisions, blockers, actionables, prep by 1 (→ recap 4, decisions 4, blockers 4, actionables 5, prep 4). Update each edited SKILL.md’s `<!-- version: N -->` footer to match.
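The footer half of this step is mechanical enough to script. The sketch below rewrites a `<!-- version: N -->` footer in a SKILL.md string; the helper names are hypothetical, and it deliberately leaves `catalog.json` alone because that file's structure is not described in this plan.

```python
import re
from typing import Optional

# Matches the footer format named in this plan, tolerating extra whitespace.
FOOTER_RE = re.compile(r"<!--\s*version:\s*(\d+)\s*-->")


def footer_version(markdown: str) -> Optional[int]:
    """Return the version in the first footer found, or None if there is none."""
    match = FOOTER_RE.search(markdown)
    return int(match.group(1)) if match else None


def set_footer_version(markdown: str, new_version: int) -> str:
    """Rewrite the first version footer; text without a footer is returned unchanged."""
    return FOOTER_RE.sub(f"<!-- version: {new_version} -->", markdown, count=1)


# Target versions taken from this step.
TARGET_VERSIONS = {"recap": 4, "decisions": 4, "blockers": 4, "actionables": 5, "prep": 4}
```

A file with no footer passes through untouched, which matches the "where present" qualifier used elsewhere in this plan.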
- Step 8: Commit
Task 6: Reword reflect guidance in all 9 core variants
`core/*` actively instructs calling reflect and describes it as a synthesizer that “cites its sources.” Verified against current main: the variants split into two families that share near-identical reflect sentences (different line numbers):
- Tool-call family (8): claude-code, cursor, codex, claude-desktop, copilot-cli, gemini-cli, windsurf, generic. These describe `memory_reflect(...)` as a tool.
- REST family (1): chatgpt. It describes `/api/sdk/memory/reflect` as an endpoint and uses bare `reflect`, not `memory_reflect`.

Files:
- Modify: `xysq-skills/core/{claude-code,cursor,codex,claude-desktop,copilot-cli,gemini-cli,windsurf,generic}/SKILL.md` (tool-call family)
- Modify: `xysq-skills/core/chatgpt/SKILL.md` (REST family)

- Step 1: Tool-call family: apply these exact string replacements in all 8 variants
Replacement 1 (e.g. `core/cursor/SKILL.md:5`):
FIND: Use reflect for synthesis questions, recall for raw fact retrieval.
REPLACE: Use memory_reflect to gather facts across memory then synthesize yourself; use memory_recall for raw fact retrieval.
Replacement 2 — the “Call ONLY when” paragraph:
FIND: Call `memory_reflect` ONLY when the user's question itself requires synthesis across memory - "what do I prefer about X", "summarise my stance on Y", "compare my past decisions on Z". Not as warmup.
REPLACE: Call `memory_reflect` when a question needs facts gathered across memory and you will synthesize the answer - "what do I prefer about X", "summarise my stance on Y", "compare my past decisions on Z". It returns facts (plus a convenience digest), not a finished answer. Not as warmup.
Replacement 3 — the reference-block heading:
FIND: ### memory_reflect - ask a question, get a direct answer
REPLACE: ### memory_reflect - gather facts to synthesize
Replacement 4 — the “already cites its sources” line:
FIND: Do NOT call memory_recall after memory_reflect for the same question - reflect already cites its sources.
REPLACE: Do NOT call memory_recall after memory_reflect for the same question - both return facts; memory_reflect adds a digest plus observation-resolution, memory_recall gives raw history.
Replacement 5 — the recall-section cross-reference:
FIND: For "what's true now / what does the user prefer" - call `memory_reflect` instead, or pass `types=["observation"]` to recall to get conflict-resolved facts.
REPLACE: For "what's true now / what does the user prefer" - call `memory_reflect` (it returns observation-resolved facts to synthesize), or pass `types=["observation"]` to recall directly.
Replacement 6 — the get_document section reference:
FIND: Use after `memory_reflect` returns citations and the user asks for the source. Returns the document's `original_text` + tags + metadata.
REPLACE: Use after `memory_reflect` returns citations (each carries a document_id) and the user asks for the source. Returns the document's `original_text` + tags + metadata.
Replacement 7 — the “what do you remember” troubleshooting line (clarify reflect is not a synthesizer):
FIND: **User asks "what do you remember about X?":** Use `memory_recall(query="X", budget="mid")` and present the results. Do NOT use `memory_reflect` here - the user wants the raw list, not a synthesis.
REPLACE: **User asks "what do you remember about X?":** Use `memory_recall(query="X", budget="mid")` and present the results. Either tool returns facts; recall is the direct path for a raw list.
NOTE on the memory_reflect( code-block (e.g. claude-code:193, cursor:146): read the 8-15 lines under the ### memory_reflect heading in each variant and ensure no prose there says the result is a finished answer / “present directly” / “complete response.” If such a sentence exists, replace it with: Read the returned facts and synthesize the answer yourself; the answer field is a convenience digest of those facts. Locate per variant with grep -n -A15 "### memory_reflect" core/<variant>/SKILL.md.
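Applying the same FIND/REPLACE pairs to eight files by hand invites silent misses, so a small helper that reports unmatched FIND strings may be useful. This is a sketch under assumptions: the function name is hypothetical, and it treats each FIND as an exact substring, exactly as the step describes.

```python
from pathlib import Path


def apply_replacements(path: Path, pairs: list[tuple[str, str]]) -> list[str]:
    """Apply exact-string (find, replace) pairs to one file.

    Returns the find strings that were NOT present, so a variant whose
    wording drifted is reported instead of being silently skipped.
    """
    text = path.read_text(encoding="utf-8")
    missed = []
    for find, replace in pairs:
        if find in text:
            text = text.replace(find, replace)
        else:
            missed.append(find)
    path.write_text(text, encoding="utf-8")
    return missed


# Two of the pairs from this step, copied verbatim.
PAIRS = [
    (
        "### memory_reflect - ask a question, get a direct answer",
        "### memory_reflect - gather facts to synthesize",
    ),
    (
        "Use reflect for synthesis questions, recall for raw fact retrieval.",
        "Use memory_reflect to gather facts across memory then synthesize yourself; "
        "use memory_recall for raw fact retrieval.",
    ),
]
```

Running it once per variant and printing any non-empty `missed` list shows which files need the manual, per-variant read described in the NOTE.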
- Step 2: REST family: apply these exact replacements in `core/chatgpt/SKILL.md`
Replacement C1 (this variant uses bare `reflect`):
FIND: Call reflect ONLY when the user's question itself requires synthesis (e.g. "what do I prefer about X", "summarise my stance on Y"). Not as warmup.
REPLACE: Call reflect when a question needs facts gathered across memory and you will synthesize the answer (e.g. "what do I prefer about X", "summarise my stance on Y") - it returns facts plus a digest, not a finished answer. Not as warmup.
Replacement C2 — the capability table row:
FIND: | Get synthesised answer | POST | /api/sdk/memory/reflect|POST /api/memories/reflect |
REPLACE: | Gather facts to synthesize | POST | /api/sdk/memory/reflect|POST /api/memories/reflect |
Replacement C3 — the recall cross-reference:
FIND: For "what does the user prefer / what's true now" - use `/reflect` (it resolves contradictions). Pass `types=["observation"]` to recall for conflict-resolved facts only; omit `types` for raw history.
REPLACE: For "what does the user prefer / what's true now" - use `/reflect` (it returns observation-resolved facts to synthesize). Pass `types=["observation"]` to recall for conflict-resolved facts only; omit `types` for raw history.
Replacement C4 — the “what do you remember” line:
FIND: **User asks "what do you remember about X?":** Call recall and return the raw list. Do NOT use reflect here - the user wants source material, not a synthesis.
REPLACE: **User asks "what do you remember about X?":** Call recall and return the raw list - it is the direct path for source material.
(The chatgpt ### POST /api/sdk/memory/reflect section body, line ~107: read it with grep -n -A12 "### POST /api/sdk/memory/reflect" core/chatgpt/SKILL.md and, if it describes the response as a finished answer, change that sentence to “returns facts plus a digest for you to synthesize.”)
- Step 3: Verify no variant still claims reflect produces a finished synthesis
cd xysq-skills && grep -rn -i "ask a question, get a direct answer\|already cites its sources\|get synthesised answer\|requires synthesis across memory" core/*/SKILL.md
Expected: NO output (all four stale phrasings gone).
Then: cd xysq-skills && grep -rln "memory_reflect\|reflect" core/*/SKILL.md
Expected: variants still mention reflect (the tool exists) — confirm by eyeballing 2-3 that remaining mentions are the KEPT lines (Organise, session-start warning, confidence=low) plus the reworded ones, none claiming a finished answer.
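The first grep in this step can also be expressed as a small Python check, which is handy if the verification is ever folded into a script. The phrase list is copied from the grep pattern above; the function name and the tuple layout of the result are illustrative choices.

```python
from pathlib import Path

# The four stale phrasings from the grep above, lowercased for a
# case-insensitive match (mirroring grep -i).
STALE_PHRASES = [
    "ask a question, get a direct answer",
    "already cites its sources",
    "get synthesised answer",
    "requires synthesis across memory",
]


def find_stale_phrases(root: Path, pattern: str = "core/*/SKILL.md") -> list[tuple[Path, int, str]]:
    """Return (relative path, line number, phrase) for every stale phrase still present."""
    hits = []
    for path in sorted(root.glob(pattern)):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            lowered = line.lower()
            for phrase in STALE_PHRASES:
                if phrase in lowered:
                    hits.append((path.relative_to(root), lineno, phrase))
    return hits
```

An empty list from `find_stale_phrases(Path("xysq-skills"))` corresponds to the "NO output" expectation of the grep.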
- Step 4: Bump core version
In `catalog.json`, bump `core` version 1 → 2. Update any `<!-- version: N -->` footers present in the core variants to match.
- Step 5: Commit
PHASE 3 — python_sdk
Task 7: Document synthesize() deprecation of response_schema; verify shape
No behavior change is required: synthesize() POSTs to /api/sdk/memory/reflect, which now returns {answer, confidence, citations, facts}. SynthesizeResult validates {answer, query, confidence, sources, wiki_context_used} and ignores extra keys by default, so answer (the digest) + confidence still populate. Confirm, and update the docstring.
Files:
- Modify: `python_sdk/xysq/memory.py:108-125`
- Test: `python_sdk/tests/` (add a parse test if a test dir exists; otherwise inline assert)

- Step 1: Confirm `SynthesizeResult` tolerates the new envelope
source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd python_sdk && python -c "
from xysq.types import SynthesizeResult
r = SynthesizeResult.model_validate({'answer':'- a\n- b','confidence':'medium','citations':[{'id':'f1'}],'facts':[{'id':'f1'}]})
print('answer:', repr(r.answer)); print('confidence:', r.confidence)
assert r.answer.startswith('- a') and r.confidence == 'medium'
print('OK')
"
Expected: prints OK. (If it raises on extra keys, the model has extra='forbid' — in that case add model_config = ConfigDict(extra='ignore') to SynthesizeResult and re-run.)
- Step 2: Update the `synthesize()` docstring
In `python_sdk/xysq/memory.py`, replace the `synthesize()` docstring line `"""Synthesise an answer from memories."""` with:
- Step 3: Commit
PHASE 4 — Verification
Task 8: End-to-end grounding checks
- Step 1: Backend tests for the changed paths only
source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd backend && python -m pytest tests/skill/test_reflect_returns_facts.py tests/skill/test_reflect_harmony_guard.py -v
Expected: all PASS.
- Step 2: Grep proof — no skill teaches reflect-as-synthesis
cd xysq-skills && grep -rln "If reflect fails" . ; echo "---" ; grep -rn "memory_reflect" {recap,decisions,blockers,actionables,prep}/claude-code/SKILL.md
Expected: both empty.
- Step 3: Grep proof — provider.reflect is no longer in any request path
source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd backend && grep -rn "provider.reflect\|\.reflect(" services/memory.py api/routes/memories.py api/routes/sdk.py tools/memory.py
Expected: NO hit in services/routes/tools calls `provider.reflect`. Any remaining `.reflect(` hits should be routes calling `services.reflect` (services/memory.py now calls recall). The provider definition in hindsight.py is left untouched as dead code and is not among the files searched.
- Step 4: Push branches (ask user first)
xysq-skills is ahead 7 + these commits; backend and python_sdk have new commits; docs has the spec + plan. Confirm branch strategy with the user (these are on main locally — they may want feature branches).
Self-review notes
- Spec coverage: reflect→facts (Task 1), REST route fix (Task 2, a gap the spec implied via “no harmony in path”; the main REST route bypassed the service), docstring (Task 3), shared recipe + 5 skills + mission/scope (Tasks 4-5), 9 core variants (Task 6), SDK contract via facts digest in `answer` (Task 7), response_schema accept-but-ignore (Tasks 1, 3, 7). All success criteria mapped.
- Spec correction captured: the `claude-code` skills legitimately keep `tags` (MCP `memory_recall` HAS a `tags` param, verified). The `chat/` variants are a separate tool surface, already recall-only, and explicitly out of scope.
- No provider deletion this pass (matches the locked decision).
- SDK reality: `SynthesizeResult` has `sources`, not `citations`. Task 7 relies on extra-key tolerance rather than inventing a field.