Cake API · Knowledge Graph Console

Ontology

Layer

Upload files · Paste text

Files are sent directly to the ingestion pipeline (parse → chunk → entity extraction → graph + vector). Synchronous: results appear inline below when complete. Supported: PDF, DOCX, PPTX, XLSX, CSV, HTML, Markdown, AsciiDoc, WebVTT, images, audio, plain text, and source code (.py / .ts / .tsx — pair with the code_structure ontology to build a class/call graph).

Select files

Or upload an entire code folder (recurses into subdirectories; filtered to .py, .ts, .tsx for the code_structure ontology):

Metadata (optional, JSON)

Open Airflow

Airflow runs on Cloud Composer behind Google sign-in, which blocks embedding it inline. Click below to open it in a new tab — you’ll authenticate once with the same Google account that has access to this GCP project.

Open Airflow

Source

Server folder (mounted path) · External bucket

Connector

Extractor config (MinIO) · S3-compatible

Bucket

Endpoint URL

Access key

Secret key

Path-style addressing

Execution mode

Run id (optional)

Advanced runtime JSON (pre-filled default)

Ontology

Layer

Airflow run status

Folder (in-container path)

Type a path or click Browse… to navigate the mounted folder tree. The browser is sandboxed under the configured INGEST_BROWSE_ROOT (default /data/ingest).
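The sandboxing described above can be sketched as a path check. This is a minimal illustration, not the console's actual code: the function name `resolve_in_sandbox` is hypothetical, and only the root `/data/ingest` comes from the help text.

```python
from pathlib import Path

# Default root quoted in the help text; a deployment may configure another one.
BROWSE_ROOT = Path("/data/ingest")


def resolve_in_sandbox(user_path: str, root: Path = BROWSE_ROOT) -> Path:
    """Resolve user_path and reject anything that ends up outside root.

    Hypothetical helper: relative input is joined onto root, absolute input
    is taken as-is, and ".." segments are collapsed before the check.
    """
    root = root.resolve()
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"{user_path!r} is outside {root}")
    return candidate
```

Resolving before comparing is the important step: a plain string-prefix check would accept inputs such as `/data/ingest/../secrets`.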

Ontology

Layer

Extensions (comma-separated, leading dot optional)

Leave empty to use the default text-based formats (.txt .md .markdown .rst .yaml .yml .json .html .htm .csv .tsv .log .cypher .cql).
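One way to read the Extensions field is sketched here, assuming the input is split on commas, trimmed, and given a leading dot when it is missing. The function name `parse_extensions` and the lowercasing step are assumptions; the default list is the one quoted above.

```python
DEFAULT_EXTENSIONS = (
    ".txt", ".md", ".markdown", ".rst", ".yaml", ".yml", ".json",
    ".html", ".htm", ".csv", ".tsv", ".log", ".cypher", ".cql",
)


def parse_extensions(raw: str) -> tuple:
    """Turn 'md, .TXT' into ('.md', '.txt'); empty input yields the defaults."""
    items = [part.strip().lower() for part in raw.split(",")]
    # Drop empty entries (for example a trailing comma), then add one leading dot.
    extensions = tuple("." + item.lstrip(".") for item in items if item.strip("."))
    return extensions or DEFAULT_EXTENSIONS
```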

Extra metadata (JSON, optional)

Recursive

Max files

Max file size (MiB)

Live event log

Sample documents

PDF, Word, HTML, Markdown, or plain text. Max 10 files, 25 MB each.

Job label (optional)

Jobs

Status

Ontology

Layer

Graph connectedness

How fragmented is the entity graph? Updated daily by a background job — this view never re-runs the algorithm.

Ontology

Label (optional)

Max weak types

History

Name	Connector	Schedule	Status	Last run	Last Run Stats	Actions

Query

Top K

Layer

Metadata filter (optional, JSON)

Leave empty to search across all documents. When provided, every key/value must match (AND).
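The AND semantics of the metadata filter can be illustrated with a small predicate. This is a sketch only: `matches_filter` is a hypothetical name, exact equality is an assumption, and the real filter is presumably applied inside the search backend rather than in Python.

```python
from typing import Optional


def matches_filter(doc_metadata: dict, metadata_filter: Optional[dict]) -> bool:
    """Return True when every key/value in the filter is present in the document."""
    if not metadata_filter:
        # An empty filter searches across all documents.
        return True
    return all(
        key in doc_metadata and doc_metadata[key] == value
        for key, value in metadata_filter.items()
    )
```

For example, a filter of `{"team": "search", "year": 2024}` would match only documents carrying both values, not documents that carry just one of them.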

Question

Top K chunks

Layer

Retrieval runs across the selected layer (or all layers). Without an LLM key, the answer is a deterministic template that surfaces the top retrieved chunk.

Include extra fields (uncheck for a faster response): intent, review, surrounding context

Query

Top K per layer

Total candidates fed to the merger = K × number of layers. The merger then dedupes and returns 3–10 bullets.
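As a worked instance of the formula above, assuming every layer returns its full K results (a layer with fewer matches would contribute fewer candidates):

```python
def merger_candidates(top_k_per_layer: int, layer_count: int) -> int:
    """Upper bound on candidates handed to the merger: K x number of layers."""
    return top_k_per_layer * layer_count


# With K = 5 and 4 layers, the merger sees at most 20 candidates
# before it dedupes them down to 3-10 bullets.
print(merger_candidates(5, 4))
```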

Ontology

Saved query

Built-ins apply to any ontology. User-saved queries appear here when the matching ontology is selected.

Cypher

Read-only — write statements (CREATE / DELETE / MERGE / SET) are rejected by Neo4j. Use parameters by referencing $name; values come from the params dict (advanced — usually empty).

Type a name and click Save. If the name matches the currently-selected saved query, it’ll be updated; otherwise a new query is created for the selected ontology.

Graph view

Nodes coloured by their non-Entity label (Framework / Library / Person / etc). Drag nodes to rearrange. Hover for properties.

Tabular result

Upload CSVs

Drop one or more files. You can mix node and relationship CSVs in a single upload — nodes are imported first so relationships can resolve cross-file references.

Default ontology

Stamped on rows that don't carry an ontology column.

Default node label

Used when a node row has no type column.

Import session id (optional)

Stamped on every node/edge as a doc_id so a future per-document delete can wipe this import. Auto-generated when blank.

CSV files

CSV format reference

Nodes CSV: the header row must include name (or one of its aliases: label, title). Recognised columns:

node_id / id: stable id used to wire relationships within this upload. Defaults to name.
name: required (label or title is accepted in its place).
type / kind / node_type: Neo4j label (e.g. Framework, Person). Defaults to the Default node label set above.
ontology: defaults to the Default ontology set above.
description: accumulated into n.descriptions[] and surfaced as n.description.
Any additional columns are written as node properties.

Relationships CSV: the header row must include both a source and a target id column. Recognised columns:

source_node_id / source_id / from: required.
target_node_id / target_id / to: required.
type / rel_type / relationship: required, e.g. USES, WORKS_AT.
ontology: defaults to the Default ontology set above.
Any additional columns are written as relationship properties.
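The column rules above suggest how a mixed upload could be sorted into nodes-first order. The sketch below is one possible reading, not the importer's actual logic: the function names are hypothetical, and matching headers case-insensitively is an assumption.

```python
import csv
import io

SOURCE_COLUMNS = {"source_node_id", "source_id", "from"}
TARGET_COLUMNS = {"target_node_id", "target_id", "to"}
NODE_NAME_COLUMNS = {"name", "label", "title"}


def classify_csv(text: str) -> str:
    """Classify a CSV as 'nodes', 'relationships', or 'unknown' from its header."""
    header = next(csv.reader(io.StringIO(text)), [])
    columns = {column.strip().lower() for column in header}
    # Relationships are checked first because both kinds may carry a type column.
    if columns & SOURCE_COLUMNS and columns & TARGET_COLUMNS:
        return "relationships"
    if columns & NODE_NAME_COLUMNS:
        return "nodes"
    return "unknown"


def import_order(files: dict) -> list:
    """Order file names so node CSVs come before relationship CSVs."""
    rank = {"nodes": 0, "relationships": 1, "unknown": 2}
    # sorted() is stable, so files of the same kind keep their upload order.
    return sorted(files, key=lambda name: rank[classify_csv(files[name])])
```

Given an upload containing `edges.csv` (header `from,to,type`) and `nodes.csv` (header `name,type`), `import_order` would list `nodes.csv` first.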

Identifiers (node labels and relationship types) must match [A-Za-z][A-Za-z0-9_]*. Spaces and dashes are converted to underscores; rows with unsafe labels are skipped.
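The identifier rule can be expressed directly with the quoted pattern. A minimal sketch, assuming surrounding whitespace is trimmed before conversion; `safe_identifier` is a hypothetical name, and returning `None` stands in for "skip this row".

```python
import re
from typing import Optional

# Pattern quoted in the format reference.
IDENTIFIER_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def safe_identifier(raw: str) -> Optional[str]:
    """Convert spaces and dashes to underscores; return None for unsafe labels."""
    candidate = re.sub(r"[ \-]", "_", raw.strip())
    return candidate if IDENTIFIER_RE.fullmatch(candidate) else None
```

Under this reading, `works at` becomes `works_at` and `Open-Source` becomes `Open_Source`, while `9lives` or `a;b` would cause the row to be skipped.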

⚠️ Destructive operation

Clicking the button below will:

Delete the cake-vectors Elasticsearch index (every chunk + embedding).
Delete the cake-chunks Elasticsearch index (the keyword index).
Run MATCH (n) DETACH DELETE n in Neo4j (every node + every relationship).

After reset, the indices will be auto-recreated on the next ingest with the correct mapping. Ontologies and layer config are not affected — only your ingested data is wiped.

I understand this is irreversible

+ Create new schedule

Name

Type

Ontology

Email recipient (optional)

Frequency

Cron expression (min hour day month dow — UTC)

Filled in automatically from the preset; edit only when Custom cron is selected.

Active schedules

Window:

+ Add new model

Model prefix

Lowercased on save. Longest match wins.

Display name (optional)

Provider (optional)

Input $/1M tokens

Output $/1M tokens

Notes (optional)
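The "longest match wins" rule and the per-1M-token prices combine into a cost lookup along these lines. This is a sketch under assumptions: the catalog entries and prices are invented placeholders, the function names are hypothetical, and lowercasing the model name at lookup time mirrors the lowercasing of prefixes on save.

```python
from typing import Optional

# Hypothetical catalog: prefix -> (input $/1M tokens, output $/1M tokens).
CATALOG = {
    "acme": (1.0, 2.0),
    "acme-mini": (0.5, 1.0),
}


def find_price(model: str, catalog: dict = CATALOG) -> Optional[tuple]:
    """Return the prices of the longest catalog prefix that matches the model."""
    name = model.lower()
    matches = [prefix for prefix in catalog if name.startswith(prefix)]
    return catalog[max(matches, key=len)] if matches else None


def cost_usd(model: str, input_tokens: int, output_tokens: int,
             catalog: dict = CATALOG) -> Optional[float]:
    """Estimate the cost in dollars, or None when no prefix matches."""
    price = find_price(model, catalog)
    if price is None:
        return None
    input_rate, output_rate = price
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
```

With this placeholder catalog, a model named `acme-mini-2024` resolves to the `acme-mini` row even though `acme` also matches, because the longer prefix wins.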

Catalog

+ Create new tenant

Name

Tenant code (optional)

8 chars from [A-HJ-KM-NP-TV-Z2-9] — no 0,1,I,L,O,U. Leave blank to let the system pick.
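The tenant code rule translates into a short validator and generator. A sketch only: how the system actually picks a code when the field is blank is not stated, so the random choice here is an assumption, and both function names are hypothetical.

```python
import re
import secrets

# The character class quoted above, expanded: no 0, 1, I, L, O, U.
ALPHABET = "ABCDEFGHJKMNPQRSTVWXYZ23456789"
TENANT_CODE_RE = re.compile(r"[A-HJ-KM-NP-TV-Z2-9]{8}")


def is_valid_tenant_code(code: str) -> bool:
    """Check that the code is exactly 8 characters from the allowed set."""
    return TENANT_CODE_RE.fullmatch(code) is not None


def generate_tenant_code() -> str:
    """Pick 8 random characters from the allowed alphabet (assumed behaviour)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(8))
```

Leaving out easily confused characters such as 0/O and 1/I/L makes codes safer to read aloud or retype.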

Description (optional)

All tenants

+ Create new user

Full name

Initial password

System admin

All users

Connector catalog (source · data_connectors)

Available source connector types for ingestion (same JSON as gateway GET /v1/catalog/connectors). Requires AIRFLOW_GATEWAY_BASE_URL.

Connector type

Store catalog (destination · stores)

Available destination store types for artifacts (same JSON as gateway GET /v1/catalog/stores). Requires AIRFLOW_GATEWAY_BASE_URL.

Store type

1. Allowed models

Only models ticked here appear in the per-operation dropdowns below.

2. Per-operation model

Pick the model each operation should use. Operations with no binding fall back to the global default.
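The fallback rule can be pictured as a dictionary lookup with a default. The operation and model names below are invented for illustration, and `resolve_model` is a hypothetical helper rather than part of the console.

```python
def resolve_model(operation: str, bindings: dict, global_default: str) -> str:
    """Use the model bound to the operation, else the global default."""
    # A missing key and an empty binding are both treated as "no binding".
    return bindings.get(operation) or global_default


# Hypothetical bindings: only one operation has an explicit model.
bindings = {"entity_extraction": "model-a"}
print(resolve_model("entity_extraction", bindings, "model-default"))
print(resolve_model("answer_generation", bindings, "model-default"))
```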

+ Add new prompt

Name

Used as a programmatic key. Must be unique.

Evals

Upload a test set and benchmark search quality.

Each case runs through both search backends and is scored automatically against expected answers — results are captured side-by-side so you can compare head-to-head.

Runs execute under the active sidebar tenant + the logged-in user. Concurrency is bounded by EVAL_CONCURRENCY (default 3) cases in flight; within a case the two searches fire in parallel. Each search half opens its own Langfuse trace if Langfuse is enabled.
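The concurrency model described above (a bounded number of cases in flight, two searches in parallel within each case) can be sketched with asyncio. The `search` coroutine is a stand-in, the function names are hypothetical, and the backend names `layer` and `context` are taken from the run banner; Langfuse tracing is not modelled.

```python
import asyncio
import os

# Same default as the help text; read from the environment when set.
EVAL_CONCURRENCY = int(os.environ.get("EVAL_CONCURRENCY", "3"))


async def search(backend: str, question: str) -> str:
    """Stand-in for a real search call against one backend."""
    await asyncio.sleep(0)
    return f"{backend}:{question}"


async def run_case(question: str, limiter: asyncio.Semaphore) -> dict:
    """Run both searches for one case while holding a concurrency slot."""
    async with limiter:
        layer, context = await asyncio.gather(
            search("layer", question),
            search("context", question),
        )
    return {"question": question, "layer": layer, "context": context}


async def run_eval(questions: list, concurrency: int = EVAL_CONCURRENCY) -> list:
    """Run all cases, with at most `concurrency` of them in flight at once."""
    limiter = asyncio.Semaphore(concurrency)
    return await asyncio.gather(*(run_case(q, limiter) for q in questions))
```

Calling `asyncio.run(run_eval(["q1", "q2", "q3", "q4"]))` returns one result per case, in input order, each holding both search outputs side by side.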

Start a new run

Running as … · tenant … · backends layer + context

Test JSON

Top-level shape: { "metadata": {...}, "questions": [{ id, question, expected_answer, required_facts: [...], category?, difficulty? }] }
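A test set in this shape, together with a rough pre-upload check, might look like the following. The example question and metadata keys are placeholders, and `validate_test_set` is a hypothetical helper: the console's own validation rules are not documented here.

```python
import json

REQUIRED_KEYS = ("id", "question", "expected_answer", "required_facts")


def validate_test_set(raw: str) -> list:
    """Return a list of problems found in the test JSON (empty when it looks fine)."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        return ["top level must be an object"]
    problems = []
    if not isinstance(data.get("metadata"), dict):
        problems.append("metadata must be an object")
    questions = data.get("questions")
    if not isinstance(questions, list):
        problems.append("questions must be a list")
        return problems
    for index, case in enumerate(questions):
        if not isinstance(case, dict):
            problems.append(f"questions[{index}] must be an object")
            continue
        for key in REQUIRED_KEYS:
            if key not in case:
                problems.append(f"questions[{index}] is missing {key}")
        if not isinstance(case.get("required_facts", []), list):
            problems.append(f"questions[{index}].required_facts must be a list")
    return problems


# Placeholder content; category and difficulty are the optional fields.
EXAMPLE = json.dumps({
    "metadata": {"name": "smoke-test"},
    "questions": [{
        "id": "q1",
        "question": "(question text)",
        "expected_answer": "(expected answer text)",
        "required_facts": ["(fact the answer must contain)"],
        "category": "lookup",
        "difficulty": "easy",
    }],
})
```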

Run label (optional)

Past runs

🔭

Coming soon

⚙️

Coming soon

Question

Past questions

Active only

+ Add new query

Name

Ontology (blank = applies everywhere)

Cypher

Use {ontology} as a placeholder if you want the query to substitute the currently-selected ontology at fill time (built-in pattern). Leave it out for verbatim queries.
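The placeholder substitution can be sketched as a plain string replacement. The query text and the `ontology` node property are illustrative assumptions, and `fill_query` is a hypothetical name for whatever the console does at fill time.

```python
def fill_query(cypher: str, ontology: str) -> str:
    """Substitute the selected ontology for every {ontology} placeholder."""
    # str.replace is used instead of str.format because Cypher map literals
    # contain their own braces, which str.format would try to interpret.
    return cypher.replace("{ontology}", ontology)


template = "MATCH (n:Entity {ontology: '{ontology}'}) RETURN n LIMIT 25"
print(fill_query(template, "code_structure"))
```

A query without the placeholder passes through unchanged, which matches the "verbatim" behaviour described above.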

Description (optional)

All saved queries

Job

Graph preview

Proposed YAML

Recommendation

Add source

Runs

Active run

Answer

Retrieval trace
