
Deprecate reflect synthesis, push reflection into recall + skills — Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Make memory_reflect return recall-shaped facts instead of calling Hindsight’s harmony synthesis model, and move synthesis into skill instructions, so the scaffolding-leak class (<|channel|>, to=functions, <|call|>) becomes impossible.
Architecture: services.memory.reflect() is re-pointed at the recall retrieval path and returns a facts payload plus a backward-compatible answer/confidence/citations envelope built from those facts (no LLM call). The harmony provider method is left in place but unused. Skills (5 claude-code recall skills + their shared recipe + 9 core variants) are reworded to call memory_recall and synthesize the result themselves. The python_sdk synthesize() keeps its shape but its answer becomes a facts digest. response_schema is accepted-but-ignored and documented as deprecated.
Tech Stack: Python (FastAPI, fastmcp), pytest, Markdown skill files, Pydantic (python_sdk).
Environment: Every backend/ and python_sdk/ command MUST be prefixed with source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && (pre-commit hooks call kiwiskil; bare git/pytest fail). Run only explicitly-named test files (never tree-wide pytest; it has OOM-rebooted the machine).

Ground truth verified against current main (2026-06-18)

These facts were confirmed in code before writing; do not “fix” them away:
  • MCP memory_recall (backend/tools/memory.py:498-624) HAS tags, tags_match, personal_only, team_id, team_ids, query_timestamp. It does NOT have occurred_after/occurred_before. The claude-code skills correctly target this tool, so their tags: ["memory_kind:decision"] usage STAYS.
  • Chat recall (native ADK tool, different surface) has occurred_after/occurred_before and no tags. The chat/ skill variants target THAT tool and are already recall-only and reflect-free — out of scope, do not touch.
  • services.memory.reflect() lives at backend/services/memory.py:701-784.
  • services.memory.recall() (the retrieval helper to reuse) lives at backend/services/memory.py:608-698 and returns list[dict].
  • MCP memory_reflect tool: backend/tools/memory.py:626-708.
  • REST reflect routes: backend/api/routes/memories.py:762-794 (calls provider.reflect directly — must change) and backend/api/routes/sdk.py:198-213 (calls services.reflect).
  • python_sdk synthesize(): python_sdk/xysq/memory.py:108-125. SynthesizeResult type: python_sdk/xysq/types.py:22-27 — fields are answer, query, confidence, sources, wiki_context_used. There is NO citations field on SynthesizeResult — map fact ids into sources.
  • Skills with the harmony-fallback block to delete (6 files): _shared/recall-recipe.md, recap|decisions|blockers|actionables|prep/claude-code/SKILL.md.
  • Skills saying “Use memory_reflect”: recap, decisions, blockers (claude-code) + all 9 core/*. (actionables, prep already use memory_recall; their only reflect mention is the fallback block + a “fall back to untagged reflect” line.)
  • catalog versions now: actionables=4, blockers=3, decisions=3, prep=3, recap=3, core=1.
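The SynthesizeResult bullet above (no citations field, so fact ids go into sources) implies a small mapping on the SDK side. A minimal sketch of that mapping, under stated assumptions: the dataclass is a stand-in for the real Pydantic model in python_sdk/xysq/types.py, the field types (sources as a list of id strings, wiki_context_used as a bool) are guesses, and result_from_envelope is a hypothetical helper rather than an existing SDK function.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SynthesizeResultSketch:
    """Stand-in for SynthesizeResult; field names are from the plan, types are assumed."""
    answer: str
    query: str
    confidence: str
    sources: list[str] = field(default_factory=list)
    wiki_context_used: bool = False


def result_from_envelope(query: str, envelope: dict[str, Any]) -> SynthesizeResultSketch:
    """Map the reflect envelope onto the result shape, putting fact ids into sources."""
    # Citations without an id are skipped, since there is nothing to reference.
    ids = [c["id"] for c in envelope.get("citations", []) if c.get("id")]
    return SynthesizeResultSketch(
        answer=envelope.get("answer", ""),
        query=query,
        confidence=envelope.get("confidence", "low"),
        sources=ids,
    )


if __name__ == "__main__":
    env = {"answer": "- Picked Kafka.", "confidence": "low",
           "citations": [{"id": "f1"}], "facts": []}
    print(result_from_envelope("event store", env).sources)  # ['f1']
```

The design choice here is to drop the rest of each citation and keep only the id, which matches the "map fact ids into sources" instruction without changing the result type.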

File structure

backend/ (one git repo)
  • Modify: backend/services/memory.py:701-784, reflect() re-pointed at recall path.
  • Modify: backend/tools/memory.py:626-708, memory_reflect docstring.
  • Modify: backend/api/routes/memories.py:762-794 — REST /reflect calls services.reflect, not provider.reflect.
  • Create: backend/tests/skill/test_reflect_returns_facts.py — new behavior tests.
xysq-skills/ (one git repo, currently ahead 7 unpushed)
  • Modify: _shared/recall-recipe.md — delete harmony-fallback block.
  • Modify: recap|decisions|blockers/claude-code/SKILL.md — swap reflect→recall in “How to recall” + delete fallback block + add mission line.
  • Modify: actionables|prep/claude-code/SKILL.md — delete fallback block + drop “fall back to untagged reflect” line + add mission line.
  • Modify: core/{claude-code,cursor,codex,chatgpt,claude-desktop,copilot-cli,gemini-cli,windsurf,generic}/SKILL.md — reword reflect guidance to “reflect returns facts.”
  • Modify: catalog.json — bump touched skill versions.
python_sdk/ (one git repo)
  • Modify: python_sdk/xysq/memory.py:108-125, synthesize() docstring (deprecate response_schema); no behavior change needed (it still posts to the same route, which now returns the new envelope).

PHASE 1 — Backend (the real behavior change)

Task 1: Re-point services.memory.reflect() at the recall path

Files:
  • Modify: backend/services/memory.py:701-784
  • Test: backend/tests/skill/test_reflect_returns_facts.py (create)
  • Step 1: Write the failing test
Create backend/tests/skill/test_reflect_returns_facts.py:
"""reflect() must return recall facts in a back-compatible envelope, never
call the harmony synthesis model, and never emit scaffolding tokens."""
import asyncio
import types

import pytest

import services.memory as memsvc


class _Session:
    device_code = "dev-1"
    agent_name = "test-agent"


class _VaultCtx:
    bank_id = "bank-1"
    actor_id = "user-1"
    team_id = None


@pytest.fixture
def stub_recall(monkeypatch):
    """Stub services.memory.recall to return two fact rows; fail if
    provider.reflect (the harmony model) is ever called."""
    facts = [
        {"id": "f1", "text": "Picked Kafka for the event store.",
         "type": "experience", "occurred_at": "2026-06-01T00:00:00Z",
         "document_id": "d1", "source": "personal", "score": 0.9},
        {"id": "f2", "text": "Earlier chose Postgres.",
         "type": "experience", "occurred_at": "2026-05-01T00:00:00Z",
         "document_id": "d2", "source": "personal", "score": 0.7},
    ]

    async def fake_recall(vault_ctx, session, **kwargs):
        return facts

    monkeypatch.setattr(memsvc, "recall", fake_recall)

    # Trip-wire: if reflect ever reaches the provider, fail loudly.
    def _boom(*a, **k):
        raise AssertionError("provider.reflect must not be called")

    prov = memsvc.get_memory_provider()
    monkeypatch.setattr(prov, "reflect", _boom, raising=False)
    # background loggers are fire-and-forget; stub to no-op
    monkeypatch.setattr(memsvc, "get_dashboard_store", lambda: types.SimpleNamespace(log_agent_event=lambda **k: None))
    monkeypatch.setattr(memsvc, "get_session_store", lambda: types.SimpleNamespace(update_last_active=lambda *a, **k: None))
    return facts


def test_reflect_returns_facts_envelope(stub_recall):
    out = asyncio.run(memsvc.reflect(
        vault_ctx=_VaultCtx(), session=_Session(), query="event store decision",
    ))
    # Back-compat envelope present
    assert set(out) >= {"answer", "confidence", "citations", "facts"}
    # facts are the raw recall rows
    assert [f["id"] for f in out["facts"]] == ["f1", "f2"]
    # citations carry fact ids
    assert {c["id"] for c in out["citations"]} == {"f1", "f2"}
    # answer is a plain-text digest, never scaffolding
    for marker in ("<|channel|>", "to=functions", "<|call|>", "<|start|>"):
        assert marker not in out["answer"]
    # confidence derived from count (2 facts -> medium)
    assert out["confidence"] == "medium"


def test_reflect_empty_recall_low_confidence(stub_recall, monkeypatch):
    async def empty_recall(vault_ctx, session, **kwargs):
        return []
    monkeypatch.setattr(memsvc, "recall", empty_recall)
    out = asyncio.run(memsvc.reflect(
        vault_ctx=_VaultCtx(), session=_Session(), query="nothing here",
    ))
    assert out["facts"] == []
    assert out["citations"] == []
    assert out["confidence"] == "low"
    assert out["answer"] == ""
  • Step 2: Run test to verify it fails
Run: source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd backend && python -m pytest tests/skill/test_reflect_returns_facts.py -v Expected: FAIL — current reflect() calls provider.reflect (trip-wire fires) and returns no facts key.
  • Step 3: Rewrite reflect() to use the recall path
Replace the body of reflect() at backend/services/memory.py:701-784 with this. Keep the signature identical (callers unchanged):
async def reflect(
    vault_ctx: VaultContext,
    session: DeviceSession,
    *,
    query: str,
    budget: str = "mid",
    response_schema: dict[str, Any] | None = None,
    tags: list[str] | None = None,
) -> dict[str, Any]:
    """
    Return memory facts for the caller to synthesize, in a back-compatible
    envelope. As of the reflect-to-recall change (2026-06-18) this NO LONGER
    calls the provider's harmony synthesis model — that model intermittently
    leaked raw scaffolding tokens, and every consumer is an LLM that
    re-articulates anyway. We retrieve via the same path as ``recall`` and
    wrap the facts so existing ``answer``/``confidence``/``citations`` readers
    keep working.

    ``response_schema`` is accepted but ignored (deprecated): the downstream
    agent produces structured output itself.
    """
    # reflect is a READ — allowed even if write quota is reached.
    facts = await recall(
        vault_ctx=vault_ctx,
        session=session,
        query=query,
        budget=budget,
        tags=tags,
        # "what is true now" lean: reflect historically resolved contradictions,
        # so prefer conflict-resolved observations, same as the old behavior.
        types=["observation"],
    )

    # recall() may return an error dict on failure; pass it straight through.
    if isinstance(facts, dict):
        return facts

    citations: list[dict[str, Any]] = []
    digest_lines: list[str] = []
    for f in facts:
        fid = f.get("id")
        if fid:
            citations.append({
                "id":             fid,
                "type":           f.get("type"),
                "context":        f.get("context"),
                "occurred_start": f.get("occurred_at"),
                "occurred_end":   f.get("occurred_at"),
                "document_id":    f.get("document_id"),
            })
        text = (f.get("text") or "").strip()
        if text:
            digest_lines.append(f"- {text}")

    n = len(citations)
    confidence = "high" if n >= 5 else ("medium" if n >= 2 else "low")

    return {
        "answer": "\n".join(digest_lines),
        "confidence": confidence,
        "citations": citations,
        "facts": facts,
    }
  • Step 4: Run test to verify it passes
Run: source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd backend && python -m pytest tests/skill/test_reflect_returns_facts.py -v Expected: PASS (both tests).
  • Step 5: Confirm the old harmony-guard test still passes (provider untouched)
Run: source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd backend && python -m pytest tests/skill/test_reflect_harmony_guard.py -v Expected: PASS — we did not touch hindsight.py; the guard is now dead-but-correct.
  • Step 6: Commit
source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd backend && \
git add services/memory.py tests/skill/test_reflect_returns_facts.py && \
git commit -m "feat: reflect() returns recall facts, not harmony synthesis

Re-point services.memory.reflect at the recall retrieval path and wrap the
facts in a back-compatible answer/confidence/citations envelope plus a new
facts[] field. No harmony model in the path -> no scaffolding leak.
provider.reflect + the harmony guard are left in place, now unused.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
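
As a boundary check on the confidence rule in Step 3 above, the thresholds can be restated as a standalone function. This is a sketch that mirrors the plan's logic, not the backend code itself; envelope_confidence is a hypothetical name. It makes one detail explicit: the count is taken over facts that carry an id (the ones that become citations), so id-less rows appear in facts but do not raise confidence.

```python
from typing import Any


def envelope_confidence(facts: list[dict[str, Any]]) -> str:
    """Mirror the plan's thresholds: count only facts that carry an id."""
    n = sum(1 for f in facts if f.get("id"))
    if n >= 5:
        return "high"
    return "medium" if n >= 2 else "low"


if __name__ == "__main__":
    with_ids = [{"id": f"f{i}"} for i in range(5)]
    print(envelope_confidence(with_ids))       # high
    print(envelope_confidence(with_ids[:4]))   # medium
    print(envelope_confidence(with_ids[:1]))   # low
    # A row without an id does not count toward confidence.
    print(envelope_confidence([{"id": "f1"}, {"text": "no id"}]))  # low
```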

Task 2: REST /reflect route must call services.reflect, not provider.reflect

The main REST route (memories.py:788) bypasses the service and calls provider.reflect directly, so it would still hit the harmony model. Re-point it.
Files:
  • Modify: backend/api/routes/memories.py:762-794
  • Test: backend/tests/skill/test_reflect_returns_facts.py (add a route-level test)
  • Step 1: Write the failing test
Append to backend/tests/skill/test_reflect_returns_facts.py:
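def test_rest_reflect_route_uses_service(monkeypatch):
    """The /reflect REST route must delegate to services.reflect (facts path),
    not provider.reflect (harmony)."""
    import api.routes.memories as memroute
    import dependencies

    called = {}

    async def fake_service_reflect(**kwargs):
        called["query"] = kwargs.get("query")
        return {"answer": "", "confidence": "low", "citations": [], "facts": []}

    def boom_provider(*a, **k):
        raise AssertionError("route must not call provider.reflect")

    # The route imports reflect from services.memory inside the function body
    # (see Step 3), so the stub must live on the service module. Patching
    # memroute with raising=False would make a hasattr() check pass vacuously.
    monkeypatch.setattr(memsvc, "reflect", fake_service_reflect)
    prov = memsvc.get_memory_provider()
    monkeypatch.setattr(prov, "reflect", boom_provider, raising=False)
    monkeypatch.setattr(dependencies, "get_session_store", lambda: types.SimpleNamespace(
        find_or_create_api_key_session=lambda *a, **k: _Session()))

    # Duck-typed stand-ins for ReflectRequest / UserClaims; only the attributes
    # the route reads are provided.
    body = types.SimpleNamespace(query="q", budget="mid", response_schema=None, team_id=None)
    user = types.SimpleNamespace(user_id="user-1")
    out = asyncio.run(memroute.reflect_memories(body=body, user=user))
    assert called == {"query": "q"}
    assert out["facts"] == []
  • Step 2: Run test to verify it fails
Run: source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd backend && python -m pytest tests/skill/test_reflect_returns_facts.py::test_rest_reflect_route_uses_service -v Expected: FAIL, because the current route calls provider.reflect directly and never reaches the service stub.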
  • Step 3: Rewrite the route body
Reuse the EXACT vault/session pattern already used by retain_memory in this same file (verified at api/routes/memories.py:657-688): build VaultContext directly, resolve team scope via resolve_team_vault, and create the session via get_session_store().find_or_create_api_key_session(...). Replace the route body at lines 762-794:
@router.post("/reflect")
async def reflect_memories(
    body: ReflectRequest,
    user: UserClaims = Depends(require_auth0_user),
) -> dict[str, Any]:
    """
    Return grounded memory facts from the user's personal vault (or a team
    vault) for the caller to synthesize.

    Equivalent to the MCP ``memory_reflect`` tool. As of 2026-06-18 this returns
    facts plus a back-compatible answer/confidence/citations envelope — it does
    not run server-side synthesis. Pass ``team_id`` to scope to a team vault; the
    caller must be a member with at least 'ro' role.
    """
    from auth.deps import VaultContext
    from dependencies import get_session_store
    from services.memory import reflect

    if not body.query.strip():
        raise HTTPException(status_code=422, detail="query must not be empty.")

    vault_ctx = VaultContext(
        bank_id=user.user_id,
        actor_id=user.user_id,
        team_id=None,
        role="personal",
    )
    if body.team_id:
        from api.deps.agent_auth import resolve_team_vault
        vault_ctx = await resolve_team_vault(vault_ctx, body.team_id, min_role="ro")

    session = get_session_store().find_or_create_api_key_session(
        vault_ctx.actor_id, agent_name="dashboard", client_id=None
    )
    return await reflect(
        vault_ctx=vault_ctx,
        session=session,
        query=body.query,
        budget=body.budget,
        response_schema=body.response_schema,
        tags=None,
    )
This mirrors retain_memory exactly (same imports, same VaultContext + resolve_team_vault + find_or_create_api_key_session calls), so no new helper is invented and provider.reflect / the old team_bank(...) direct-bank path are dropped. The unused team_bank / get_team_store imports in this route, if now orphaned, should be left alone unless YOUR change made them unused — check with grep -n "team_bank\|get_team_store" api/routes/memories.py and only remove an import your edit orphaned.
  • Step 4: Run the wiring test
Run: source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd backend && python -m pytest tests/skill/test_reflect_returns_facts.py -v Expected: PASS.
  • Step 5: Import-sanity the route module
Run: source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd backend && python -c "import api.routes.memories" Expected: no ImportError.
  • Step 6: Commit
source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd backend && \
git add api/routes/memories.py tests/skill/test_reflect_returns_facts.py && \
git commit -m "fix: REST /reflect delegates to services.reflect (facts path)

The route called provider.reflect directly, bypassing the new facts path and
still hitting the harmony model. Route through services.reflect so it returns
the back-compatible facts envelope.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
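
One testing consequence of the Step 3 route body: because it imports reflect inside the function, the name is resolved on services.memory each time the route runs, so a stub takes effect only if it is installed on that module. The sketch below demonstrates this Python behavior in isolation. The module name svc_sketch and the function route are hypothetical stand-ins, not project code.

```python
import sys
import types

# Hypothetical stand-in for services.memory, registered so "import" can find it.
svc = types.ModuleType("svc_sketch")
svc.reflect = lambda: "real"
sys.modules["svc_sketch"] = svc


def route() -> str:
    # Function-local import: the attribute is looked up on the module at call time.
    from svc_sketch import reflect
    return reflect()


if __name__ == "__main__":
    print(route())                    # real
    # Equivalent in effect to monkeypatch.setattr(svc, "reflect", ...).
    svc.reflect = lambda: "stubbed"
    print(route())                    # stubbed
```

Had the import been at module level in the route file, the stub would instead need to replace the name bound in the route module.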

Task 3: Update the memory_reflect MCP tool docstring

No behavior change (it already calls services.reflect). Only the docstring is now wrong: it says “Hindsight resolves contradictions at synthesis time” and “present it to the user directly.”
Files:
  • Modify: backend/tools/memory.py:635-677
  • Step 1: Replace the docstring
Replace the docstring block (lines 635-677, between the """ markers) with:
        """
        Return memory facts relevant to a question, for you to synthesize.

        As of 2026-06-18 this returns the same kind of facts as memory_recall,
        wrapped in a convenience envelope — it does NOT pre-synthesize an answer.
        Read the ``facts`` list and write the answer yourself.

        Returns:
            facts:      List of memory rows (id, text, type, occurred_at,
                        document_id, source) — the ground truth to reason over.
            answer:     A plain-text digest of the facts (one bullet per fact).
                        Convenience only; prefer synthesizing from ``facts``.
            confidence: "high" | "medium" | "low" — based on how many facts matched.
            citations:  `{id, type, context, occurred_start, occurred_end,
                        document_id}` per fact. Pass `document_id` into
                        memory_get_document for the verbatim source.

        ``response_schema`` is accepted but ignored (deprecated): produce any
        structured output yourself.

        VAULT ROUTING:
        - Omit team_id → reads from YOUR PERSONAL vault.
        - Provide team_id → reads from THAT TEAM's vault (requires ro+).
        - 403 means not authorized; it does not reveal whether the team exists.

        Args:
            query:           Question to answer using stored memories.
            budget:          Retrieval depth: "low", "mid" (default), "high".
            response_schema: Deprecated, ignored.
            tags:            Optional tag filter, e.g. ``["memory_kind:decision"]``.
            team_id:         Optional team UUID. Reads team vault if provided.
        """
  • Step 2: Import-sanity
Run: source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd backend && python -c "import tools.memory" Expected: no error.
  • Step 3: Commit
source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd backend && \
git add tools/memory.py && \
git commit -m "docs: memory_reflect docstring reflects facts-not-synthesis behavior

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
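
The new docstring tells the caller to read facts and write the answer itself. One way a caller might arrange those rows before synthesizing is sketched below. facts_to_context is a hypothetical helper (not part of the backend or the SDK), and it assumes all occurred_at values share one ISO-8601 format so that they sort chronologically as plain strings.

```python
from typing import Any


def facts_to_context(facts: list[dict[str, Any]]) -> str:
    """Format facts newest-first, each tagged with its id so an answer can cite it."""
    # Same-format ISO-8601 strings sort chronologically as text; undated rows go last.
    ordered = sorted(facts, key=lambda f: f.get("occurred_at") or "", reverse=True)
    lines = []
    for f in ordered:
        text = (f.get("text") or "").strip()
        if text:
            when = f.get("occurred_at") or "undated"
            lines.append(f"[{f.get('id') or '?'}] {when}: {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(facts_to_context([
        {"id": "f2", "text": "Earlier chose Postgres.", "occurred_at": "2026-05-01T00:00:00Z"},
        {"id": "f1", "text": "Picked Kafka for the event store.", "occurred_at": "2026-06-01T00:00:00Z"},
    ]))
```

Putting the newest fact first makes a superseded choice (Postgres) read as history rather than as the current state.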

PHASE 2 — Skills (xysq-skills repo)

All skill edits are Markdown. After edits, bump the touched skill’s version in catalog.json (and the <!-- version: N --> footer in each edited SKILL.md, where present). No test runner; verification is grep-based.
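
The footer bump described above is mechanical, so it can be scripted rather than edited by hand. A minimal sketch, assuming the footer is spelled exactly `<!-- version: N -->` with a decimal N; bump_footer is a hypothetical helper, and the layout of catalog.json is deliberately not assumed here.

```python
import re

FOOTER = re.compile(r"<!-- version: (\d+) -->")


def bump_footer(markdown: str) -> str:
    """Increment N in the first '<!-- version: N -->' footer; text without one is unchanged."""
    return FOOTER.sub(
        lambda m: f"<!-- version: {int(m.group(1)) + 1} -->",
        markdown,
        count=1,
    )


if __name__ == "__main__":
    print(bump_footer("body\n<!-- version: 3 -->"))
```

Files with no footer pass through untouched, which matches the "where present" wording above.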

Task 4: Strip the harmony-fallback block from the shared recipe + 5 claude-code skills

Files:
  • Modify: xysq-skills/_shared/recall-recipe.md
  • Modify: xysq-skills/{recap,decisions,blockers,actionables,prep}/claude-code/SKILL.md
  • Step 1: Delete the fallback block from the master recipe
In xysq-skills/_shared/recall-recipe.md, delete the entire section (heading + body):
## If reflect fails, fall back to recall
`memory_reflect` can return a malformed or empty result - an `answer` that is
empty, that is low `confidence` with no `citations`, or that contains raw model
scaffolding (e.g. text with `<|channel|>`, `to=functions`, `<|call|>`, or other
token markers instead of prose). Treat ANY of these as a failed reflect.
When reflect fails: do NOT present its output and do NOT invent an answer.
Re-run the same query with `memory_recall` (it returns raw facts reliably),
then synthesize the result yourself following the output contract below.
Recall is the dependable floor; reflect is an optimization on top of it.
  • Step 2: Delete the same block from each of the 5 claude-code skills
The identical block is embedded (copy-pasted) in each of: recap/claude-code/SKILL.md, decisions/claude-code/SKILL.md, blockers/claude-code/SKILL.md, actionables/claude-code/SKILL.md, prep/claude-code/SKILL.md. Delete the same ## If reflect fails, fall back to recall section (heading through the “optimization on top of it.” line) from each.
  • Step 3: Verify the block is gone everywhere
Run: cd xysq-skills && grep -rln "If reflect fails" . Expected: NO output (empty).
  • Step 4: Commit
cd xysq-skills && \
git add _shared/recall-recipe.md recap/claude-code/SKILL.md decisions/claude-code/SKILL.md blockers/claude-code/SKILL.md actionables/claude-code/SKILL.md prep/claude-code/SKILL.md && \
git commit -m "refactor(skills): drop harmony-fallback block — reflect returns facts now

The 'if reflect fails, fall back to recall' contract is obsolete: reflect is
recall-backed and cannot emit scaffolding. Removed from the shared recipe and
all 5 claude-code recall skills.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
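
The grep check in Step 3 of this task can also be expressed in Python, for example where it needs to run as part of a script. This is a sketch, not an existing project tool: files_containing is a hypothetical helper, and unlike the grep command it only looks at files matching *.md.

```python
from pathlib import Path


def files_containing(root: Path, needle: str, pattern: str = "*.md") -> list[Path]:
    """Return files under root matching pattern whose text contains needle (like grep -rl)."""
    hits = []
    for path in sorted(root.rglob(pattern)):
        # errors="replace" keeps the scan going if a file is not valid UTF-8.
        if path.is_file() and needle in path.read_text(encoding="utf-8", errors="replace"):
            hits.append(path)
    return hits
```

Run from the xysq-skills directory, `files_containing(Path("."), "If reflect fails")` should return an empty list once the block has been removed everywhere.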

Task 5: Swap reflect→recall + add mission lines in recap/decisions/blockers

These three say “Use mcp__xysq__memory_reflect”. Point them at memory_recall and add a one-line mission. decisions and recap are shown in full; apply the analogous change to blockers.
Files:
  • Modify: xysq-skills/decisions/claude-code/SKILL.md
  • Modify: xysq-skills/recap/claude-code/SKILL.md
  • Modify: xysq-skills/blockers/claude-code/SKILL.md
  • Step 1: decisions — rewrite the “How to recall” section
In xysq-skills/decisions/claude-code/SKILL.md, replace:
## How to recall
Use `mcp__xysq__memory_reflect` with `tags: ["memory_kind:decision"]` and
`personal_only: true`.

- Query: the user's stated topic or "decisions and choices" for the window.
- Set `query_timestamp` to today's ISO date for relative windows; post-filter
  by `occurred_start` to stay inside the stated period.
- If the tag-filtered result is thin, fall back to untagged reflect with the
  same query and filter manually for decision-shaped content.
with:
## Mission
You are surfacing decisions the user made. Be precise and literal: report the
choice and its stated rationale, nothing inferred. If a decision has no clear
rationale in the facts, present it as "rationale unknown" rather than inventing
one.

## How to recall
Use `mcp__xysq__memory_recall` with `tags: ["memory_kind:decision"]` and
`types: ["observation"]` (so a superseded decision resolves to the current one).

- Scope: default to the user's personal vault (`personal_only: true`). If the
  request names a team, pass that team's `team_id` instead; if it's genuinely
  ambiguous and permitted, omit `personal_only` to fan out across personal +
  recall-enabled teams and label each item by its `source`.
- Query: the user's stated topic or "decisions and choices" for the window.
- Set `query_timestamp` to today's ISO date for relative windows; post-filter
  by each row's `occurred_at` to stay inside the stated period.
- If the tag-filtered result is thin, re-run untagged recall with the same query
  and filter manually for decision-shaped content.
  • Step 2: recap — rewrite the “How to recall” section
In xysq-skills/recap/claude-code/SKILL.md, replace:
## How to recall
Use `mcp__xysq__memory_reflect` with `personal_only: true`.

- Default query: the user's stated topic or "recent activity" for the window.
- For "why" context, also reflect with `tags: ["memory_kind:decision"]`.
- For "what happened" detail, also reflect with `tags: ["memory_kind:event"]`.
- Set `query_timestamp` to today's ISO date and raise `budget` to over-fetch when
  a relative window ("this week", "yesterday") is given - then post-filter by
  `occurred_start` from results.
with:
## Mission
You are a neutral narrator summarizing what happened in a period. Cover the
ground evenly; do not over-weight any single thread. Stay grounded in the facts.

## How to recall
Use `mcp__xysq__memory_recall` with `personal_only: true` by default.

- Scope: if the request names a team, pass that `team_id` instead of
  `personal_only`; if ambiguous and permitted, omit `personal_only` to fan out
  across personal + recall-enabled teams and label items by `source`.
- Default query: the user's stated topic or "recent activity" for the window.
- For "why" context, also recall with `tags: ["memory_kind:decision"]`.
- For "what happened" detail, also recall with `tags: ["memory_kind:event"]`.
- Set `query_timestamp` to today's ISO date and raise `budget` to over-fetch when
  a relative window ("this week", "yesterday") is given - then post-filter by
  each row's `occurred_at`.
  • Step 3: blockers — rewrite the “How to recall” section
In xysq-skills/blockers/claude-code/SKILL.md, replace the block that begins with the heading ## How to recall followed by the line Use `mcp__xysq__memory_reflect` with `tags: ["memory_kind:blocker"]` and `personal_only: true`. Read the current block first (grep -n -A12 "## How to recall" xysq-skills/blockers/claude-code/SKILL.md), then replace it with:
## Mission
You are listing what is stuck. Be direct: for each item name exactly what or who
it is waiting on. Lean skeptical — only call something a blocker if the facts
show it cannot proceed, and drop anything the facts show was resolved.

## How to recall
Use `mcp__xysq__memory_recall` with `tags: ["memory_kind:blocker"]` and
`personal_only: true` by default.

- Scope: if the request names a team, pass that `team_id` instead of
  `personal_only`; if ambiguous and permitted, omit `personal_only` to fan out
  across personal + recall-enabled teams and label items by `source`.
- Query: the user's stated topic or "blocked items waiting on external dependency".
- Do NOT restrict by time window - a blocker may have been logged weeks ago and
  still be active.
- If a recalled memory signals the blocker was resolved (e.g., a follow-up note
  says "unblocked", "approved", "done"), do NOT list it as active.
- If the tag-filtered result is thin, re-run untagged recall with a query shaped
  around stuck/waiting/on-hold wording.
  • Step 4: Verify no reflect calls remain in these three
Run: cd xysq-skills && grep -rn "memory_reflect" recap/claude-code/SKILL.md decisions/claude-code/SKILL.md blockers/claude-code/SKILL.md Expected: NO output.
  • Step 5: Add mission lines to actionables + prep, drop their “untagged reflect” line
actionables and prep already call memory_recall. Two small edits each: (a) add a ## Mission line just above ## How to recall; (b) change any “fall back to untagged reflect” wording to “untagged recall”. For actionables/claude-code/SKILL.md, add above ## How to recall:
## Mission
You are surfacing the user's open, self-owned work — tasks they still need to
do. Forward-looking only; surface long-pending items prominently. Stay grounded.
For prep/claude-code/SKILL.md, add above ## How to recall:
## Mission
You are assembling context for an upcoming meeting or call. Pull broadly across
the topic regardless of age; favor recent but keep essential background. Neutral.
Then in both, grep for reflect and change “untagged reflect” → “untagged recall” if present: grep -n "reflect" xysq-skills/actionables/claude-code/SKILL.md xysq-skills/prep/claude-code/SKILL.md
  • Step 6: Final reflect-free check across all claude-code recall skills
Run: cd xysq-skills && grep -rn "reflect" {recap,decisions,blockers,actionables,prep}/claude-code/SKILL.md Expected: NO output.
  • Step 7: Bump versions + footers
In catalog.json, increment version for recap, decisions, blockers, actionables, prep by 1 (→ recap 4, decisions 4, blockers 4, actionables 5, prep 4). Update each edited SKILL.md’s <!-- version: N --> footer to match.
  • Step 8: Commit
cd xysq-skills && \
git add recap/claude-code/SKILL.md decisions/claude-code/SKILL.md blockers/claude-code/SKILL.md actionables/claude-code/SKILL.md prep/claude-code/SKILL.md catalog.json && \
git commit -m "feat(skills): claude-code recall skills call recall + carry a mission

recap/decisions/blockers now call memory_recall (not reflect); all 5 get a
mission line (identity + disposition) ported from Hindsight's reflect recipe,
plus an explicit scope step (personal default, team-overridable). Versions bumped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"
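
The reworded skills ask the agent to over-fetch and then post-filter by each row's occurred_at. The filtering step can be sketched as follows. Both function names are hypothetical, and the sketch assumes occurred_at is an ISO-8601 string (a trailing Z meaning UTC, as in the Task 1 test data) and that the window is half-open, neither of which the plan specifies.

```python
from datetime import datetime, timezone
from typing import Any


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 timestamp; a trailing 'Z' is read as UTC."""
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Naive timestamps are assumed to be UTC so comparisons stay valid.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def within_window(
    facts: list[dict[str, Any]], start: datetime, end: datetime
) -> list[dict[str, Any]]:
    """Keep facts whose occurred_at lies in [start, end); undated rows are dropped."""
    kept = []
    for f in facts:
        ts = f.get("occurred_at")
        if ts and start <= parse_ts(ts) < end:
            kept.append(f)
    return kept


if __name__ == "__main__":
    june = (datetime(2026, 6, 1, tzinfo=timezone.utc),
            datetime(2026, 7, 1, tzinfo=timezone.utc))
    rows = [{"id": "f1", "occurred_at": "2026-06-01T00:00:00Z"},
            {"id": "f2", "occurred_at": "2026-05-01T00:00:00Z"}]
    print([f["id"] for f in within_window(rows, *june)])  # ['f1']
```

The start and end arguments must be timezone-aware, otherwise Python refuses the comparison.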

Task 6: Reword reflect guidance in all 9 core variants

core/* actively instructs calling reflect and describes it as a synthesizer that “cites its sources.” Verified against current main: the variants split into two families that share near-identical reflect sentences (different line numbers):
  • Tool-call family (8): claude-code, cursor, codex, claude-desktop, copilot-cli, gemini-cli, windsurf, generic — describe memory_reflect(...) as a tool.
  • REST family (1): chatgpt — describes /api/sdk/memory/reflect as an endpoint; uses bare reflect, not memory_reflect.
Because the sentences are near-identical strings, do EXACT find-replace by the source strings below (not by line number). A string that isn’t present in a given variant is simply skipped.
Lines about Organise surfacing content, “Do NOT fire a generic session-start reflect”, “Skip both recall and reflect”, and the confidence=“low” troubleshooting line are all KEPT; they remain true.
Files:
  • Modify: xysq-skills/core/{claude-code,cursor,codex,claude-desktop,copilot-cli,gemini-cli,windsurf,generic}/SKILL.md (tool-call family)
  • Modify: xysq-skills/core/chatgpt/SKILL.md (REST family)
  • Step 1: Tool-call family — apply these exact string replacements in all 8 variants
Replacement 1 (the cursor/codex/etc. intro line; present in some variants only, e.g. core/cursor/SKILL.md:5):
FIND: Use reflect for synthesis questions, recall for raw fact retrieval.
REPLACE: Use memory_reflect to gather facts across memory then synthesize yourself; use memory_recall for raw fact retrieval.
Replacement 2 (the “Call ONLY when” paragraph):
FIND: Call `memory_reflect` ONLY when the user's question itself requires synthesis across memory - "what do I prefer about X", "summarise my stance on Y", "compare my past decisions on Z". Not as warmup.
REPLACE: Call `memory_reflect` when a question needs facts gathered across memory and you will synthesize the answer - "what do I prefer about X", "summarise my stance on Y", "compare my past decisions on Z". It returns facts (plus a convenience digest), not a finished answer. Not as warmup.
Replacement 3 (the reference-block heading):
FIND: ### memory_reflect - ask a question, get a direct answer
REPLACE: ### memory_reflect - gather facts to synthesize
Replacement 4 (the “already cites its sources” line):
FIND: Do NOT call memory_recall after memory_reflect for the same question - reflect already cites its sources.
REPLACE: Do NOT call memory_recall after memory_reflect for the same question - both return facts; memory_reflect adds a digest plus observation-resolution, memory_recall gives raw history.
Replacement 5 (the recall-section cross-reference):
FIND: For "what's true now / what does the user prefer" - call `memory_reflect` instead, or pass `types=["observation"]` to recall to get conflict-resolved facts.
REPLACE: For "what's true now / what does the user prefer" - call `memory_reflect` (it returns observation-resolved facts to synthesize), or pass `types=["observation"]` to recall directly.
Replacement 6 (the get_document section reference):
FIND: Use after `memory_reflect` returns citations and the user asks for the source. Returns the document's `original_text` + tags + metadata.
REPLACE: Use after `memory_reflect` returns citations (each carries a document_id) and the user asks for the source. Returns the document's `original_text` + tags + metadata.
Replacement 7 (the “what do you remember” troubleshooting line; clarifies that reflect is not a synthesizer):
FIND: **User asks "what do you remember about X?":** Use `memory_recall(query="X", budget="mid")` and present the results. Do NOT use `memory_reflect` here - the user wants the raw list, not a synthesis.
REPLACE: **User asks "what do you remember about X?":** Use `memory_recall(query="X", budget="mid")` and present the results. Either tool returns facts; recall is the direct path for a raw list.
NOTE on the memory_reflect( code-block (e.g. claude-code:193, cursor:146): read the 8-15 lines under the ### memory_reflect heading in each variant and ensure no prose there says the result is a finished answer / “present directly” / “complete response.” If such a sentence exists, replace it with: Read the returned facts and synthesize the answer yourself; the answer field is a convenience digest of those facts. Locate per variant with grep -n -A15 "### memory_reflect" core/<variant>/SKILL.md.
  • Step 2: REST family — apply these exact replacements in core/chatgpt/SKILL.md
Replacement C1 - the “Call ONLY when” paragraph (bare reflect):
FIND: Call reflect ONLY when the user's question itself requires synthesis (e.g. "what do I prefer about X", "summarise my stance on Y"). Not as warmup.
REPLACE: Call reflect when a question needs facts gathered across memory and you will synthesize the answer (e.g. "what do I prefer about X", "summarise my stance on Y") - it returns facts plus a digest, not a finished answer. Not as warmup.

Replacement C2 - the capability table row:
FIND: | Get synthesised answer | POST | /api/sdk/memory/reflect|POST /api/memories/reflect |
REPLACE: | Gather facts to synthesize | POST | /api/sdk/memory/reflect|POST /api/memories/reflect |

Replacement C3 - the recall cross-reference:
FIND: For "what does the user prefer / what's true now" - use `/reflect` (it resolves contradictions). Pass `types=["observation"]` to recall for conflict-resolved facts only; omit `types` for raw history.
REPLACE: For "what does the user prefer / what's true now" - use `/reflect` (it returns observation-resolved facts to synthesize). Pass `types=["observation"]` to recall for conflict-resolved facts only; omit `types` for raw history.

Replacement C4 - the “what do you remember” line:
FIND: **User asks "what do you remember about X?":** Call recall and return the raw list. Do NOT use reflect here - the user wants source material, not a synthesis.
REPLACE: **User asks "what do you remember about X?":** Call recall and return the raw list - it is the direct path for source material.

(The chatgpt ### POST /api/sdk/memory/reflect section body, line ~107: read it with grep -n -A12 "### POST /api/sdk/memory/reflect" core/chatgpt/SKILL.md and, if it describes the response as a finished answer, change that sentence to “returns facts plus a digest for you to synthesize.”)
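The FIND/REPLACE pairs in Steps 1 and 2 are exact-string substitutions, and some FIND strings exist in only some variants. A minimal sketch of applying them while reporting which ones were absent might look like the following. The `Replacement` class and `apply_replacements` helper are hypothetical names introduced here for illustration, not part of the repo, and the sample text is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replacement:
    label: str    # e.g. "Replacement 3"
    find: str     # exact text expected in the variant
    replace: str  # text that takes its place


def apply_replacements(text: str, replacements: list[Replacement]) -> tuple[str, list[str]]:
    """Apply each exact-match replacement once; return new text and labels not found."""
    missing: list[str] = []
    for rep in replacements:
        if rep.find in text:
            text = text.replace(rep.find, rep.replace, 1)
        else:
            # Some FIND strings exist in only some variants, so a miss is
            # reported for review rather than treated as an error.
            missing.append(rep.label)
    return text, missing


# Hypothetical stand-in for one variant's SKILL.md content.
sample = (
    "### memory_reflect - ask a question, get a direct answer\n"
    "Body text left unchanged.\n"
)
plan = [
    Replacement(
        "Replacement 3",
        "### memory_reflect - ask a question, get a direct answer",
        "### memory_reflect - gather facts to synthesize",
    ),
    Replacement(
        "Replacement 1",
        "Use reflect for synthesis questions, recall for raw fact retrieval.",
        "Use memory_reflect to gather facts across memory then synthesize yourself; "
        "use memory_recall for raw fact retrieval.",
    ),
]
updated, missing = apply_replacements(sample, plan)
print(updated)
print("not found:", missing)
```

Reporting misses instead of raising keeps the "present in some variants only" case visible without blocking the run; a miss on a FIND that should exist everywhere still needs a manual look.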
  • Step 3: Verify no variant still claims reflect produces a finished synthesis
Run: cd xysq-skills && grep -rn -i "ask a question, get a direct answer\|already cites its sources\|get synthesised answer\|requires synthesis across memory" core/*/SKILL.md
Expected: NO output (all four stale phrasings gone).
Then: cd xysq-skills && grep -rln "memory_reflect\|reflect" core/*/SKILL.md
Expected: variants still mention reflect (the tool exists). Confirm by eyeballing 2-3 that the remaining mentions are the KEPT lines (Organise, session-start warning, confidence=low) plus the reworded ones, none claiming a finished answer.
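The first grep in this step can also be expressed as a small Python check, which may be handy if the verification is later folded into a test. This is a sketch under assumptions: `find_stale_phrases` is a hypothetical helper, and the file contents below are invented samples rather than real variant text.

```python
STALE_PHRASES = (
    "ask a question, get a direct answer",
    "already cites its sources",
    "get synthesised answer",
    "requires synthesis across memory",
)


def find_stale_phrases(files: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (path, line_number, phrase) for every case-insensitive hit."""
    hits = []
    for path, text in sorted(files.items()):
        for line_number, line in enumerate(text.splitlines(), start=1):
            lowered = line.lower()  # mirrors grep -i
            for phrase in STALE_PHRASES:
                if phrase in lowered:
                    hits.append((path, line_number, phrase))
    return hits


# Invented sample contents keyed by path; real use would read core/*/SKILL.md.
files = {
    "core/cursor/SKILL.md": "### memory_reflect - gather facts to synthesize\n",
    "core/chatgpt/SKILL.md": "intro\n| Get synthesised answer | POST |\n",
}
hits = find_stale_phrases(files)
print(hits)
```

An empty list corresponds to the "NO output" expectation above.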
  • Step 4: Bump core version
In catalog.json, bump core version 1 → 2. Update any <!-- version: N --> footers present in the core variants to match.
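For the footer half of this step, a regex substitution is one way to keep every variant in sync. The sketch below assumes the footer is written as `<!-- version: N -->` with flexible whitespace; `set_version_footer` is a hypothetical helper. It deliberately leaves catalog.json alone, since that file's layout is not shown in this plan.

```python
import re

# Assumed footer shape: <!-- version: N --> with optional extra whitespace.
FOOTER = re.compile(r"<!--\s*version:\s*\d+\s*-->")


def set_version_footer(text: str, version: int) -> tuple[str, int]:
    """Rewrite every version footer to the given version; return text and count."""
    return FOOTER.subn(f"<!-- version: {version} -->", text)


new_text, count = set_version_footer("Skill body\n<!-- version: 1 -->\n", 2)
print(new_text, count)
```

A count of 0 means the variant has no footer, which matches the "any footers present" wording: nothing to update there.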
  • Step 5: Commit
cd xysq-skills && \
git add core/ catalog.json && \
git commit -m "docs(skills): core variants describe reflect as facts-not-synthesis

All 9 platform variants reworded via exact string replacements: reflect returns
facts (+ a digest) for the agent to synthesize, no longer 'a direct answer that
cites its sources'. 8 tool-call variants + the chatgpt REST variant.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"

PHASE 3 — python_sdk

Task 7: Document synthesize() deprecation of response_schema; verify shape

No behavior change is required: synthesize() POSTs to /api/sdk/memory/reflect, which now returns {answer, confidence, citations, facts}. SynthesizeResult validates {answer, query, confidence, sources, wiki_context_used} and ignores extra keys by default, so answer (the digest) + confidence still populate. Confirm this, then update the docstring.
Files:
  • Modify: python_sdk/xysq/memory.py:108-125
  • Test: python_sdk/tests/ (add a parse test if a test dir exists; otherwise inline assert)
  • Step 1: Confirm SynthesizeResult tolerates the new envelope
Run:
source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd python_sdk && python -c "
from xysq.types import SynthesizeResult
r = SynthesizeResult.model_validate({'answer':'- a\n- b','confidence':'medium','citations':[{'id':'f1'}],'facts':[{'id':'f1'}]})
print('answer:', repr(r.answer)); print('confidence:', r.confidence)
assert r.answer.startswith('- a') and r.confidence == 'medium'
print('OK')
"
Expected: prints OK. (If it raises on extra keys, the model has extra='forbid'. In that case add model_config = ConfigDict(extra='ignore') to SynthesizeResult and re-run.)
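To see why extra keys are expected to be harmless, here is a self-contained sketch of Pydantic v2's default behavior (extra keys are ignored unless the model opts into `extra='forbid'`). `SynthesizeResultSketch` is a stand-in: its field names come from this plan, but the types and defaults are assumptions, not the real xysq.types.SynthesizeResult.

```python
from pydantic import BaseModel


class SynthesizeResultSketch(BaseModel):
    # Stand-in only: field names come from the plan, while the types and
    # defaults are assumptions, not the real xysq.types.SynthesizeResult.
    answer: str
    confidence: str
    query: str = ""
    sources: list[dict] = []
    wiki_context_used: bool = False


envelope = {
    "answer": "- a\n- b",
    "confidence": "medium",
    "citations": [{"id": "f1"}],
    "facts": [{"id": "f1"}],
}
result = SynthesizeResultSketch.model_validate(envelope)
dumped = result.model_dump()
print(dumped)
```

In this sketch the `citations` and `facts` keys are dropped on validation and `sources` keeps its default, which is the same extra-key tolerance Task 7 relies on. If the real model declares `query` or `sources` as required, the command above would fail for that reason rather than because of extra keys.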
  • Step 2: Update the synthesize() docstring
In python_sdk/xysq/memory.py, replace the synthesize() docstring line """Synthesise an answer from memories.""" with:
        """Retrieve memory facts to synthesize from.

        As of xysq backend 2026-06-18 this returns facts plus a convenience
        digest in ``answer`` — it no longer runs server-side synthesis. Your
        agent should synthesize from the facts. ``response_schema`` is accepted
        but ignored (deprecated); produce structured output yourself.
        """
  • Step 3: Commit
source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd python_sdk && \
git add xysq/memory.py && \
git commit -m "docs(sdk): synthesize() returns facts+digest; response_schema deprecated

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>"

PHASE 4 — Verification

Task 8: End-to-end grounding checks

  • Step 1: Backend tests for the changed paths only
Run: source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd backend && python -m pytest tests/skill/test_reflect_returns_facts.py tests/skill/test_reflect_harmony_guard.py -v
Expected: all PASS.
  • Step 2: Grep proof — no skill teaches reflect-as-synthesis
Run: cd xysq-skills && grep -rln "If reflect fails" . ; echo "---" ; grep -rn "memory_reflect" {recap,decisions,blockers,actionables,prep}/claude-code/SKILL.md
Expected: both greps print nothing, so only the --- separator appears.
  • Step 3: Grep proof — provider.reflect is no longer in any request path
Run: source ~/miniconda3/etc/profile.d/conda.sh && conda activate xysq && cd backend && grep -rn "provider.reflect\|\.reflect(" services/memory.py api/routes/memories.py api/routes/sdk.py tools/memory.py
Expected: NO hit in services/routes/tools calls provider.reflect. Hits on services.reflect( in the routes are expected, because the \.reflect( pattern also matches them (services/memory.py now calls recall, routes call services.reflect). The definition/dead code in hindsight.py is outside the searched files and stays untouched.
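Because the grep pattern also matches services.reflect( calls, it can help to separate the two kinds of hit before judging the result. The following is a rough sketch: `split_hits` is a hypothetical helper, and the sample lines are invented to show the shape of grep -rn output, not copied from the codebase.

```python
import re

PROVIDER_CALL = re.compile(r"provider\.reflect\s*\(")
ANY_REFLECT_CALL = re.compile(r"\.reflect\s*\(")


def split_hits(lines: list[str]) -> tuple[list[str], list[str]]:
    """Split source lines into provider.reflect calls and other .reflect( calls."""
    provider, other = [], []
    for line in lines:
        if PROVIDER_CALL.search(line):
            provider.append(line)
        elif ANY_REFLECT_CALL.search(line):
            other.append(line)
    return provider, other


# Invented sample lines in grep -rn style.
sample_hits = [
    "api/routes/sdk.py:88:    result = await services.reflect(query)",
    "api/routes/memories.py:140:    return await services.reflect(body.query)",
]
provider_hits, other_hits = split_hits(sample_hits)
print(len(provider_hits), len(other_hits))
```

The check passes when the first list is empty; entries in the second list are the expected route-to-service calls.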
  • Step 4: Push branches (ask user first)
Do NOT push without the user’s go-ahead. When approved: xysq-skills is ahead by 7 commits plus these commits; backend and python_sdk have new commits; docs has the spec + plan. Confirm branch strategy with the user (these are on main locally, so they may want feature branches).

Self-review notes

  • Spec coverage: reflect→facts (Task 1), REST route fix (Task 2, a gap the spec implied via “no harmony in path” — the main REST route bypassed the service), docstring (Task 3), shared recipe + 5 skills + mission/scope (Tasks 4-5), 9 core variants (Task 6), SDK contract via facts digest in answer (Task 7), response_schema accept-but-ignore (Tasks 1, 3, 7). All success criteria mapped.
  • Spec correction captured: the claude-code skills legitimately keep tags (MCP memory_recall HAS a tags param — verified). The chat/ variants are a separate tool surface, already recall-only — explicitly out of scope.
  • No provider deletion this pass (matches the locked decision).
  • SDK reality: SynthesizeResult has sources, not citations — Task 7 relies on extra-key tolerance rather than inventing a field.