Memory Storage Model

This is the canonical mental model for memories in xysq. If you’re touching anything in backend/memory/, backend/services/memory.py, backend/api/routes/memories.py, or backend/chat/wiring.py, read this first.

The two stores

xysq stores user memories in two places, by design:
  • Postgres (memories table) — durable source of truth for what the user wrote, what they tagged it with, what state it’s in (pending/processing/completed/failed). The vault list endpoint reads from here.
  • Hindsight (per-bank document store) — the AI memory provider. Stores the original document text plus N atomic memory units the LLM extracted from it. Recall ranks units; mutations operate on documents.
Every Postgres memories row corresponds to exactly one Hindsight document, identified by the row’s document_id column. We send document_id at retain time; Hindsight indexes by it; we use it for all subsequent operations.

Document vs memory unit

ONE USER NOTE  ──→  ONE memories row  ──→  ONE Hindsight document
                                            ├── memory unit
                                            ├── memory unit         ← extracted by Hindsight LLM
                                            └── memory unit
The user’s mental model is the document (their note). Hindsight extracts atomic facts from it as units, internally, for retrieval ranking. App-layer code never operates on individual units — that’s an internal Hindsight concept. We tag, edit, and delete documents.

Bank routing

The Hindsight bank for a given user item (memory or knowledge source) is deterministic from the row state. Privacy tags take precedence over the item’s default bank.
Personal items:
Tags include pii?   Tags include confidential?   Item is knowledge_sources?   Bank(s)
yes                 no                           any                          {user_id}:private
no                  yes                          any                          {user_id}:confidential
yes                 yes                          any                          {user_id}:private AND {user_id}:confidential (fan out)
no                  no                           yes                          {user_id}:wiki (knowledge default)
no                  no                           no                           {user_id} (memory default)
Team items have no privacy splits today: pii/confidential tags on team rows are ignored for routing. They remain as informational tags, but the bank is still team_main / team_wiki.
Item is knowledge_sources?   Bank(s)
yes                          team:{owner_id}:wiki
no                           team:{owner_id}
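The routing tables above can be read as a small pure function. The following is a minimal sketch, not the actual implementation: the Row dataclass, its vault_type and owner_id fields, and the convention that default_bank is either "main" or "wiki" are assumptions made for illustration. Only the function name and the keyword-only default_bank parameter come from this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Row:
    """Hypothetical stand-in for a memories / knowledge_sources row."""
    user_id: str
    tags: Set[str] = field(default_factory=set)
    vault_type: str = "personal"      # assumed values: "personal" or "team"
    owner_id: Optional[str] = None    # assumed: set on team rows


def resolve_banks_for_row(row: Row, *, default_bank: str) -> List[str]:
    """Return the bank(s) for a row; default_bank is "main" or "wiki" (assumed)."""
    # The main bank has no suffix; the wiki bank is suffixed with ":wiki".
    suffix = "" if default_bank == "main" else f":{default_bank}"

    if row.vault_type == "team":
        # Team rows: privacy tags are informational and never change the bank.
        return [f"team:{row.owner_id}{suffix}"]

    # Personal rows: privacy tags take precedence over the default bank,
    # and a row carrying both tags fans out to both privacy banks.
    banks = []
    if "pii" in row.tags:
        banks.append(f"{row.user_id}:private")
    if "confidential" in row.tags:
        banks.append(f"{row.user_id}:confidential")
    return banks or [f"{row.user_id}{suffix}"]
```

Under these assumptions, a personal knowledge source tagged pii resolves to the :private bank only, while the same row on a team vault stays in the team wiki bank.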
Bidirectional migration is real. Adding pii to a knowledge source moves it from :wiki to :private. Removing pii from a memory moves it from :private back to main. The update_memory_tags and update_knowledge_tags route handlers must compute the new banks under the new tags, PATCH the document in every bank it stays in, retain it into newly targeted banks, and delete it from banks that are no longer targeted. This is the same fan-out logic on both sides: a memory and a knowledge source are routed by the same rule, just with different defaults when no privacy tag is present.
Never iterate banks at mutation time. If a row is in either table, its bank(s) are computed by resolve_banks_for_row(row, *, default_bank), which gives a single deterministic answer. If you need to operate on an item and the answer is “I don’t know what bank”, that’s a data inconsistency bug, not a “try them all” situation.
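The tag-change step (PATCH what stays, retain what is new, delete what is gone) amounts to a set difference between the banks computed under the old tags and the banks computed under the new tags. A minimal sketch follows; BankPlan and plan_bank_migration are hypothetical names, and the real handlers may structure this differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BankPlan:
    patch: List[str]    # banks the document stays in: update tags in place
    retain: List[str]   # newly targeted banks: retain the document there
    delete: List[str]   # banks no longer targeted: delete the document there


def plan_bank_migration(old_banks: List[str], new_banks: List[str]) -> BankPlan:
    """Diff the old and new bank sets into the three mutation groups."""
    old, new = set(old_banks), set(new_banks)
    return BankPlan(
        patch=sorted(old & new),
        retain=sorted(new - old),
        delete=sorted(old - new),
    )
```

For example, adding pii to a knowledge source gives old_banks=["u1:wiki"] and new_banks=["u1:private"], so the plan retains into u1:private, deletes from u1:wiki, and patches nothing. Both inputs come from the resolver, so no bank is ever guessed or iterated.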

What memories.hindsight_id was, and why it’s gone

The column tried to store the per-unit Hindsight memory id so we could call per-unit endpoints (PATCH /memories/{memory_id}/tags). But:
  1. Hindsight’s RetainResponse has no per-unit id field, so the column was always NULL.
  2. Even if we could populate it, a “memory” has N units — operating on one updated tags on 1/N facts and silently left the others stale.
The right unit of operation at the app layer is the document. The column is dropped.

What the webhook does

The retain.completed webhook from Hindsight fires once per document after extraction finishes. The handler matches the memories row by document_id and flips status from processing → completed. That’s it. It does not write a hindsight_id, because there’s nothing useful to write.
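A minimal sketch of that handler logic, using an in-memory dict in place of the memories table. The payload key "document_id", the handler name, and the choice to ignore rows that are not currently in processing are assumptions of this sketch, not confirmed details of the real webhook.

```python
from typing import Dict


def handle_retain_completed(payload: dict, memories: Dict[str, dict]) -> bool:
    """Flip the matching row from processing to completed.

    `memories` maps document_id -> row and stands in for the Postgres table.
    Returns True only when a row was actually updated.
    """
    row = memories.get(payload.get("document_id"))
    if row is None or row["status"] != "processing":
        # Unknown document, or the row is not waiting on extraction.
        return False
    row["status"] = "completed"  # the only write: no hindsight_id is stored
    return True
```

Guarding on the current status keeps a repeated delivery of the same webhook from touching a row a second time.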

What asset uploads do (and don’t do)

When a user uploads a file via Organise:
  1. assets table row created (file metadata, GCS uri, extraction status)
  2. Hindsight retain to {user_id}:wiki bank (where the extracted content lives for recall)
  3. knowledge_sources table row of type='document' (so the file appears in the Knowledge Base scope of the unified vault)
What we explicitly DO NOT do anymore: write a placeholder memories row. Asset uploads are knowledge content; they belong in knowledge_sources. The pre-2026-05-11 dual-surfacing pattern (asset → memories row + knowledge_sources row + Hindsight wiki) created rows whose vault_type='personal' claimed they were main-vault memories but whose actual content was in the wiki bank — a categorical lie. That pattern is removed; existing leaked rows are cleaned up by the cleanup_asset_memories.py one-shot.

Operational invariants

These should hold after the refactor lands. Add tests / asserts that catch violations:
  1. Every memories row with status='completed' has a corresponding Hindsight document at the bank computed by resolve_banks_for_row(row, default_bank='main'). (Holds modulo the orphan cleanup; the mark_orphan_memories_failed.py one-shot promotes violations to status='failed' so the UI surfaces them.)
  2. No memories row has role='asset' in tags or metadata after the cleanup runs. Asset uploads only live in knowledge_sources.
  3. memories.hindsight_id column does not exist (after Phase 5 migration).
  4. Every Hindsight document in a user’s bank has either a memories row (with matching document_id) or a knowledge_sources row. Backfilled by backfill_ghost_documents.py; ongoing invariant maintained because retain creates the Postgres row before calling Hindsight.
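One way to turn invariants 1, 2, and 4 into a test helper is a pure function over snapshots of both stores. This is a sketch under stated assumptions: the row dict shape, the metadata["role"] lookup (the tag encoding of role='asset' is not specified here, so it is not checked), and the find_violations name are all hypothetical. banks_for stands in for a call to resolve_banks_for_row with the appropriate default bank.

```python
from typing import Callable, Dict, Iterable, List, Set


def find_violations(
    memories: Iterable[dict],
    knowledge_doc_ids: Set[str],
    hindsight_docs: Dict[str, Set[str]],   # bank -> document_ids present in it
    banks_for: Callable[[dict], List[str]],
) -> List[str]:
    """Return human-readable violations of invariants 1, 2 and 4."""
    violations = []
    memory_doc_ids = set()
    for row in memories:
        doc_id = row["document_id"]
        memory_doc_ids.add(doc_id)
        # Invariant 1: a completed row has a document in each computed bank.
        if row["status"] == "completed":
            for bank in banks_for(row):
                if doc_id not in hindsight_docs.get(bank, set()):
                    violations.append(f"invariant 1: {doc_id} missing from {bank}")
        # Invariant 2: no asset-upload rows remain in memories.
        if row.get("metadata", {}).get("role") == "asset":
            violations.append(f"invariant 2: {doc_id} is an asset row")
    # Invariant 4: every Hindsight document is backed by a Postgres row.
    known = memory_doc_ids | knowledge_doc_ids
    for bank, doc_ids in hindsight_docs.items():
        for doc_id in sorted(doc_ids - known):
            violations.append(f"invariant 4: {doc_id} in {bank} has no row")
    return violations
```

A test can then assert that the returned list is empty for a fixture that represents a healthy state. Invariant 3 is a schema check and is better asserted directly against the database.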