Files go straight to the ingestion pipeline (parse → chunk → entity extraction → graph + vector). Ingestion is synchronous: results appear inline below once it completes. Supported formats: PDF, DOCX, PPTX, XLSX, CSV, HTML, Markdown, AsciiDoc, WebVTT, images, audio, plain text, and source code (.py, .ts, .tsx; pair with the code_structure ontology to build a class/call graph).

Or upload an entire code folder (recurses into subdirectories; filtered to .py, .ts, .tsx for the code_structure ontology):

Open Airflow

Airflow runs on Cloud Composer behind Google sign-in, which blocks embedding it inline. Click below to open it in a new tab — you’ll authenticate once with the same Google account that has access to this GCP project.


Source

Type a path or click Browse… to navigate the mounted folder tree. The browser is sandboxed under the configured INGEST_BROWSE_ROOT (default /data/ingest).
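As a rough illustration of what "sandboxed under INGEST_BROWSE_ROOT" can mean, the sketch below resolves a typed path against the root and rejects anything that escapes it. The function name `resolve_in_sandbox` is hypothetical, and treating every typed path as relative to the root is an assumption; the actual browser may behave differently.

```python
import os
from pathlib import Path

# Default taken from the help text above; the environment variable overrides it.
BROWSE_ROOT = os.environ.get("INGEST_BROWSE_ROOT", "/data/ingest")


def resolve_in_sandbox(user_path: str, root: str = BROWSE_ROOT) -> Path:
    """Resolve user_path under root and refuse paths that leave the root.

    Hypothetical helper: the typed path is always interpreted relative to
    the root, so a leading "/" is stripped (an assumption of this sketch).
    """
    root_path = Path(root).resolve()
    candidate = (root_path / user_path.lstrip("/")).resolve()
    # ".." segments are collapsed by resolve(), so an escape shows up here.
    if not candidate.is_relative_to(root_path):  # Python 3.9+
        raise ValueError(f"{user_path!r} is outside {root_path}")
    return candidate
```

With the default root, `resolve_in_sandbox("docs/a.md")` stays inside the tree, while `resolve_in_sandbox("../../etc/passwd")` raises `ValueError`.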

Leave empty to use the default text-based formats (.txt .md .markdown .rst .yaml .yml .json .html .htm .csv .tsv .log .cypher .cql).

Live event log

    PDF, Word, HTML, Markdown, or plain text. Max 10 files, 25 MB each.


    Jobs


    Graph connectedness

    How fragmented is the entity graph? Updated daily by a background job — this view never re-runs the algorithm.


    History


    Name | Connector | Schedule | Status | Last run | Last run stats | Actions

    Leave empty to search across all documents. When provided, every key/value must match (AND).
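The AND semantics of the metadata filter can be sketched as follows. This is only an illustration: `matches_filter` is a hypothetical name, and exact equality on values is an assumption (the real filter may normalise or coerce values).

```python
def matches_filter(doc_metadata: dict, wanted: dict) -> bool:
    """Return True when every key/value in `wanted` is present in the document.

    An empty filter matches every document, which mirrors "leave empty to
    search across all documents".
    """
    return all(
        key in doc_metadata and doc_metadata[key] == value
        for key, value in wanted.items()
    )


doc = {"team": "platform", "year": 2024}
print(matches_filter(doc, {}))                                # empty filter
print(matches_filter(doc, {"team": "platform"}))              # one key matches
print(matches_filter(doc, {"team": "platform", "year": 2023}))  # one key differs
```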

    Retrieval runs across the selected layer (or all layers). Without an LLM key, the answer is a deterministic template that surfaces the top retrieved chunk.

    Include extra fields (uncheck for faster response)

    Total candidates fed to the merger = K × number of layers. The merger then dedupes and returns 3–10 bullets.
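To make the K × layers arithmetic concrete, here is a small sketch that takes the top K hits from each layer and then removes duplicates. The layer names and chunk ids are made up, and the deduplication by exact id is an assumption; how the merger condenses the result into 3–10 bullets is not shown.

```python
from typing import Dict, List


def gather_candidates(hits_by_layer: Dict[str, List[str]], k: int) -> List[str]:
    """Collect the top k hits from every layer (at most k x number of layers)."""
    candidates: List[str] = []
    for hits in hits_by_layer.values():
        candidates.extend(hits[:k])
    return candidates


def dedupe(candidates: List[str]) -> List[str]:
    """Drop repeated ids while keeping first-seen order."""
    return list(dict.fromkeys(candidates))


# Hypothetical retrieval results: three layers, ranked chunk ids per layer.
hits_by_layer = {
    "layer_a": ["c1", "c2", "c3", "c4"],
    "layer_b": ["c2", "c5", "c6", "c7"],
    "layer_c": ["c1", "c8", "c9", "c10"],
}

candidates = gather_candidates(hits_by_layer, k=3)
unique = dedupe(candidates)
print(len(candidates), len(unique))  # 9 candidates (3 x 3), 7 after dedupe
```

Raising K therefore grows the merger input linearly with the number of layers, which is worth keeping in mind when all layers are selected.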

    Built-ins apply to any ontology. User-saved queries appear here when the matching ontology is selected.

    Read-only — write statements (CREATE / DELETE / MERGE / SET) are rejected by Neo4j. Use parameters by referencing $name; values come from the params dict (advanced — usually empty).
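As an illustration of how `$name` references pair with the params dict, the sketch below lists parameters a query mentions that the dict does not supply. `missing_params` is a hypothetical helper, not part of the application, and the regex is deliberately naive: it would also match a `$word` inside a string literal.

```python
import re
from typing import Dict, List

# Matches $name style parameter references in a Cypher query string.
PARAM_RE = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def missing_params(query: str, params: Dict[str, object]) -> List[str]:
    """Return parameter names referenced in the query but absent from params."""
    referenced = set(PARAM_RE.findall(query))
    return sorted(referenced - set(params))


query = "MATCH (n:Entity) WHERE n.name = $name RETURN n LIMIT $limit"
params = {"name": "Airflow"}
print(missing_params(query, params))  # the query still needs a value for "limit"
```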

    Type a name and click Save. If the name matches the currently-selected saved query, it’ll be updated; otherwise a new query is created for the selected ontology.

    Graph view

    Nodes are coloured by their non-Entity label (Framework, Library, Person, etc.). Drag nodes to rearrange them. Hover over a node to see its properties.

    Tabular result

    Upload CSVs

    Drop one or more files. You can mix node and relationship CSVs in a single upload — nodes are imported first so relationships can resolve cross-file references.

    Stamped on rows that don't carry an ontology column.

    Used when a node row has no type column.

    Stamped on every node/edge as a doc_id so a future per-document delete can wipe this import. Auto-generated when blank.

    CSV format reference

    Nodes CSV — header row must include name (or one of: label, title). Recognised columns:

    • node_id / id — stable id used to wire relationships within this upload. Defaults to name.
    • name — required.
    • type / kind / node_type — Neo4j label (e.g. Framework, Person). Defaults to the value above.
    • ontology — defaults to the value above.
    • description — accumulated into n.descriptions[] and surfaced as n.description.
    • Any additional columns are written as node properties.

    Relationships CSV — must include both a source and target id column. Recognised columns:

    • source_node_id / source_id / from — required.
    • target_node_id / target_id / to — required.
    • type / rel_type / relationship — required, e.g. USES, WORKS_AT.
    • ontology — defaults to the value above.
    • Any additional columns are written as relationship properties.
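The two CSV shapes above can be sketched with Python's `csv` module. The sample rows, the `load` function, the `Entity` fallback type, and the choice to skip relationships whose endpoints are unknown are all assumptions made for illustration; only the column aliases come from the reference above.

```python
import csv
import io

NODES_CSV = """node_id,name,type,description
n1,Airflow,Framework,Workflow scheduler
n2,Neo4j,Library,Graph database
"""

RELS_CSV = """source_node_id,target_node_id,type
n1,n2,USES
n1,n9,USES
"""


def first_present(row, *names):
    """Return the first non-empty value among the alias columns, else None."""
    for name in names:
        value = row.get(name)
        if value:
            return value
    return None


def load(nodes_text, rels_text, default_type="Entity"):
    """Read nodes first so relationship rows can resolve their ids."""
    nodes = {}
    for row in csv.DictReader(io.StringIO(nodes_text)):
        name = first_present(row, "name", "label", "title")
        if not name:
            continue  # a node row without a name cannot be imported
        node_id = first_present(row, "node_id", "id") or name
        node_type = first_present(row, "type", "kind", "node_type") or default_type
        nodes[node_id] = {"name": name, "type": node_type}

    edges = []
    for row in csv.DictReader(io.StringIO(rels_text)):
        src = first_present(row, "source_node_id", "source_id", "from")
        dst = first_present(row, "target_node_id", "target_id", "to")
        rel = first_present(row, "type", "rel_type", "relationship")
        # Assumption: rows pointing at unknown node ids are skipped.
        if rel and src in nodes and dst in nodes:
            edges.append((src, rel, dst))
    return nodes, edges


nodes, edges = load(NODES_CSV, RELS_CSV)
print(nodes)
print(edges)
```

Here the second relationship row refers to `n9`, which no node row defines, so the sketch drops it.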

    Identifiers (node labels and relationship types) must match [A-Za-z][A-Za-z0-9_]*. Spaces and dashes are converted to underscores; rows with unsafe labels are skipped.
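The identifier rule can be sketched as a small normaliser: convert spaces and dashes to underscores, then accept the result only if it matches the pattern. `safe_identifier` is a hypothetical name, and returning `None` to signal "skip this row" is a choice made for this sketch.

```python
import re
from typing import Optional

# Pattern quoted in the rule above.
IDENT_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def safe_identifier(raw: str) -> Optional[str]:
    """Normalise a label or relationship type; None means the row is skipped."""
    candidate = re.sub(r"[ -]", "_", raw)
    return candidate if IDENT_RE.fullmatch(candidate) else None


for raw in ["WORKS AT", "data-store", "9lives", "Person;DROP"]:
    print(repr(raw), "->", safe_identifier(raw))
```

Validating against a strict pattern matters because labels and relationship types cannot be passed as Cypher parameters, so they end up in the query text itself.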

    ⚠️ Destructive operation

    Clicking the button below will:

    • Delete the cake-vectors Elasticsearch index (every chunk + embedding).
    • Delete the cake-chunks Elasticsearch index (the keyword index).
    • Run MATCH (n) DETACH DELETE n in Neo4j (every node + every relationship).

    After reset, the indices will be auto-recreated on the next ingest with the correct mapping. Ontologies and layer config are not affected — only your ingested data is wiped.

    + Create new schedule

    Filled in automatically from the preset; edit only when Custom cron is selected.

    Active schedules


    + Add new model

    Lowercased on save. Longest match wins.

    Catalog

    + Create new tenant

    8 characters from [A-HJ-KM-NP-TV-Z2-9] (excludes 0, 1, I, L, O, U). Leave blank to let the system pick.
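A minimal sketch of generating and checking such an identifier, assuming uniform random selection from the stated character set. The names `new_tenant_code` and `is_valid_tenant_code` are hypothetical, and whether lowercase input is accepted is not stated, so the check below is strict.

```python
import re
import secrets

# Expansion of [A-HJ-KM-NP-TV-Z2-9]: 22 letters plus 8 digits.
ALPHABET = "ABCDEFGHJKMNPQRSTVWXYZ23456789"
CODE_RE = re.compile(r"[A-HJ-KM-NP-TV-Z2-9]{8}")


def new_tenant_code(length: int = 8) -> str:
    """Pick `length` characters at random from the unambiguous alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def is_valid_tenant_code(code: str) -> bool:
    """True when the code is exactly 8 characters from the allowed set."""
    return CODE_RE.fullmatch(code) is not None


code = new_tenant_code()
print(code, is_valid_tenant_code(code))
```

Leaving out 0, 1, I, L, O, and U avoids characters that are easy to confuse when a code is read aloud or typed by hand.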

    All tenants

    + Create new user

    All users

    Connector catalog (source · data_connectors)

    Available source connector types for ingestion (same JSON as gateway GET /v1/catalog/connectors). Requires AIRFLOW_GATEWAY_BASE_URL.

    Store catalog (destination · stores)

    Available destination store types for artifacts (same JSON as gateway GET /v1/catalog/stores). Requires AIRFLOW_GATEWAY_BASE_URL.


    1. Allowed models

    Only models ticked here appear in the per-operation dropdowns below.

    2. Per-operation model

    Pick the model each operation should use. Operations with no binding fall back to the global default.

    + Add new prompt

    Used as a programmatic key. Must be unique.

    Start a new run

    Running as · tenant · backends layer + context

    Top-level shape: { "metadata": {...}, "questions": [{ id, question, expected_answer, required_facts: [...], category?, difficulty? }] }
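For illustration, a question set in that shape and a light structural check might look like the sketch below. The sample values, the metadata keys, and the `validate_question_set` helper are invented; the uploader's actual validation rules are not documented here.

```python
REQUIRED_KEYS = ("id", "question", "expected_answer", "required_facts")

# Invented example content; only the key names come from the shape above.
example = {
    "metadata": {"name": "smoke-test"},
    "questions": [
        {
            "id": "q1",
            "question": "Which scheduler runs the ingestion DAGs?",
            "expected_answer": "Airflow",
            "required_facts": ["Airflow"],
            "category": "infrastructure",  # optional
            "difficulty": "easy",          # optional
        }
    ],
}


def validate_question_set(payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means it looks fine."""
    errors = []
    if not isinstance(payload.get("metadata"), dict):
        errors.append("metadata must be an object")
    questions = payload.get("questions")
    if not isinstance(questions, list):
        errors.append("questions must be a list")
        return errors
    for i, q in enumerate(questions):
        if not isinstance(q, dict):
            errors.append(f"questions[{i}] must be an object")
            continue
        for key in REQUIRED_KEYS:
            if key not in q:
                errors.append(f"questions[{i}] is missing {key}")
        if "required_facts" in q and not isinstance(q["required_facts"], list):
            errors.append(f"questions[{i}].required_facts must be a list")
    return errors


print(validate_question_set(example))
```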

    Past runs


    Past questions

    + Add new query

    Use {ontology} as a placeholder if you want the query to substitute the currently-selected ontology at fill time (built-in pattern). Leave it out for verbatim queries.
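A minimal sketch of the placeholder behaviour, assuming plain text substitution. `fill_query` is a hypothetical name, and the example query (including the `ontology` property) is illustrative. `str.replace` is used instead of `str.format` because Cypher map literals contain braces of their own, which `format` would try to interpret.

```python
def fill_query(template: str, ontology: str) -> str:
    """Substitute the selected ontology for every {ontology} placeholder."""
    return template.replace("{ontology}", ontology)


templated = "MATCH (n {ontology: '{ontology}'}) RETURN n LIMIT 25"
verbatim = "MATCH (n:Person) RETURN n LIMIT 25"

print(fill_query(templated, "code_structure"))
print(fill_query(verbatim, "code_structure"))  # unchanged: no placeholder present
```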

    All saved queries
