Dashboard
Ingest
Add new content to the knowledge base.
Upload files or paste text — the platform breaks it down, identifies the key concepts and relationships, and makes it ready for search and retrieval.
Files are sent directly to the ingestion pipeline (parse → chunk → entity extraction → graph + vector). Synchronous: results appear inline below when complete. Supported: PDF, DOCX, PPTX, XLSX, CSV, HTML, Markdown, AsciiDoc, WebVTT, images, audio, plain text, and source code (.py / .ts / .tsx — pair with the code_structure ontology to build a class/call graph).
Or upload an entire code folder (recurses into subdirectories; filtered to .py, .ts, .tsx for the code_structure ontology):
Bulk ingest from folder
Bulk import from a folder.
Point to a folder and process every supported file in one go — progress streams live below.
Source
Extractor config (MinIO, S3-compatible)
Extractor config (Azure Blob)
Extractor config (Google Drive)
Extractor config (Jira Export)
Extractor config (Confluence)
Extractor config (Jira)
Extractor config (qMetry)
Extractor config (Azure DevOps)
Coming soon — use MinIO for now.
Airflow run status
Type a path or click Browse… to navigate the mounted folder tree. The browser is sandboxed under the configured INGEST_BROWSE_ROOT (default /data/ingest).
Leave empty to use the default text-based formats (.txt .md .markdown .rst .yaml .yml .json .html .htm .csv .tsv .log .cypher .cql).
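The two hints above describe a sandboxed folder path and an extension filter. Here is a minimal Python sketch of both. It assumes the browser resolves paths against INGEST_BROWSE_ROOT and rejects anything that escapes it, and that extension matching is case-insensitive. Both are assumptions, and `resolve_in_sandbox` and `select_files` are hypothetical helper names, not the platform's code.

```python
from pathlib import Path
from typing import Optional, Set

# Default filter, copied from the hint above.
DEFAULT_EXTENSIONS = frozenset({
    ".txt", ".md", ".markdown", ".rst", ".yaml", ".yml", ".json",
    ".html", ".htm", ".csv", ".tsv", ".log", ".cypher", ".cql",
})


def resolve_in_sandbox(browse_root: str, user_path: str) -> Path:
    """Resolve user_path under browse_root; reject anything that escapes it."""
    root = Path(browse_root).resolve()
    candidate = (root / user_path).resolve()  # collapses "..", follows symlinks
    if not candidate.is_relative_to(root):    # Python 3.9+
        raise PermissionError(f"{user_path!r} is outside {root}")
    return candidate


def select_files(folder: Path, extensions: Optional[Set[str]] = None) -> list:
    """Recurse into folder and keep files whose suffix is allowed (case-insensitive)."""
    allowed = {e.lower() for e in extensions} if extensions else DEFAULT_EXTENSIONS
    return sorted(
        p for p in folder.rglob("*") if p.is_file() and p.suffix.lower() in allowed
    )
```

For example, `select_files(resolve_in_sandbox("/data/ingest", "projects/docs"))` would list every default-format file under that subfolder. An absolute path such as `/etc`, or a `../` escape, raises before any file is touched.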
Live event log
Ontology discovery
Discover a starting ontology from sample documents.
Upload a few representative documents and the platform proposes a starter set of concepts and relationships you can review and save.
PDF, Word, HTML, Markdown, or plain text. Max 10 files, 25 MB each.
Jobs
Job
Graph preview
Proposed YAML
Document library
All ingested documents, sorted by most recent.
Drill into any document to see its concepts, relationships, and processing status.
Quality reports
Spot quality issues across your knowledge base.
Surfaces thin concepts, orphans, hubs without depth, lonely nodes, and documents where nothing was extracted.
Graph connectedness
How fragmented is the entity graph? Updated daily by a background job — this view never re-runs the algorithm.
Ontology scorecard
Health scorecard for each ontology.
See how much of every defined concept and relationship has real data behind it, plus connectivity and diversity scores.
Recommendations
AI-assisted suggestions for improving your ontologies.
Generation runs in the background — you can leave this tab and come back. Every run is saved to the history below so you can revisit, compare, or print past recommendations.
History
Recommendation
Sources
Connect external data sources.
Register integrations to systems where your content lives, store credentials securely, and run them on-demand or on a schedule.
| Name | Connector | Schedule | Status | Last run | Last run stats | Actions |
|---|---|---|---|---|---|---|
No sources registered for this tenant. Click + Add source to create one.
Add source
Applies when you use Save & Run or when the schedule runs once after save. Skips time-based delta listing, clears cached pipeline state for this connector, and sets the run to non-incremental so every file is fetched and processed again.
Runs
Similarity search
Find content that matches a query by meaning.
Each result includes the matched text and the concepts associated with it.
Leave empty to search across all documents. When provided, every key/value must match (AND).
Context search
ⓘ Layer Search helps you find the right answer. Context Search equips you with the full picture.
Ask a question and get a grounded answer.
The platform retrieves relevant content, expands it with related concepts, and answers using only the evidence it found.
Retrieval runs across the selected layer (or all layers). Without an LLM key, the answer is a deterministic template that surfaces the top retrieved chunk.
Layer search
Search across all knowledge layers and merge the results.
Each fact surfaces once — under the most specific layer that contains it (project > enterprise > domain > general).
Total candidates fed to the merger = K (results per layer) × number of layers. The merger then deduplicates them and returns 3–10 bullets.
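To make the precedence rule concrete: with K = 5 and four layers, the merger receives 20 candidates. The sketch below is a hypothetical illustration, not the platform's merger. It assumes a fact counts as a duplicate when its case- and whitespace-normalised text matches, whereas the real merger may compare meaning rather than exact text. It also assumes facts are emitted layer by layer in retrieval order. It only enforces the upper cap of 10 and cannot guarantee 3 bullets when fewer unique facts arrive.

```python
# Most specific layer first (project > enterprise > domain > general).
LAYER_PRIORITY = ["project", "enterprise", "domain", "general"]


def merge_layers(results_by_layer: dict, max_bullets: int = 10) -> list:
    """Return (fact, layer) pairs, each fact kept once under its most specific layer."""
    seen = set()
    merged = []
    for layer in LAYER_PRIORITY:
        for fact in results_by_layer.get(layer, []):
            # Hypothetical dedupe key: case- and whitespace-normalised text.
            key = " ".join(fact.lower().split())
            if key not in seen:
                seen.add(key)
                merged.append((fact, layer))
    return merged[:max_bullets]
```

Because layers are walked from most to least specific, a fact found in both `project` and `general` is kept only under `project`.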
Ontology
Browse the concept and relationship schemas.
Pick an ontology to see its schema graph plus the full entity and relationship lists.
Graph
Run read-only queries against the knowledge graph and explore the result.
Picking an ontology pre-fills a starter query scoped to it — edit it freely before running.
Built-ins apply to any ontology. User-saved queries appear here when the matching ontology is selected.
Read-only — write statements (CREATE / DELETE / MERGE / SET) are rejected by Neo4j. Use parameters by referencing $name; values come from the params dict (advanced — usually empty).
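Parameters are referenced as $name in the query text and supplied separately in the params dict, so values are never spliced into the Cypher string. The hypothetical helper below (not part of the platform) checks that every referenced parameter has a value before you run a query. Its regex is a simplification that ignores backtick-quoted and numeric parameter names.

```python
import re

# Simplified: matches $name-style parameters only (not backtick-quoted or numeric ones).
PARAM_RE = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def missing_params(query: str, params: dict) -> set:
    """Names referenced as $name in the query that have no entry in params."""
    return set(PARAM_RE.findall(query)) - set(params)


# Example: a read-only query and the params dict that would accompany it.
query = "MATCH (n:Entity {name: $name}) RETURN n LIMIT $limit"
params = {"name": "Neo4j", "limit": 10}
```

With the full `params` dict nothing is missing. Dropping `limit` would surface `{"limit"}` before the query ever reaches Neo4j.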
Type a name and click Save. If the name matches the currently-selected saved query, it’ll be updated; otherwise a new query is created for the selected ontology.
Graph view
Nodes are coloured by their non-Entity label (Framework, Library, Person, etc.). Drag nodes to rearrange them; hover over a node to see its properties.
Tabular result
Import graph
Load nodes and relationships directly from CSV files. Bypasses the live ingestion pipeline — no chunking, no embeddings, no LLM extraction; only the graph is written.
Each file's header row decides whether it's a nodes or relationships CSV. Imports are tenant-scoped to the active tenant.
Upload CSVs
Drop one or more files. You can mix node and relationship CSVs in a single upload — nodes are imported first so relationships can resolve cross-file references.
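Header-based classification and nodes-first ordering can be sketched as follows. This is an illustration built from the column names in the CSV format reference. It assumes header names are matched case-insensitively after trimming whitespace. `classify_header` and `order_uploads` are hypothetical names, not the importer's code.

```python
import csv
import io

NODE_NAME_COLS = {"name", "label", "title"}
REL_SOURCE_COLS = {"source_node_id", "source_id", "from"}
REL_TARGET_COLS = {"target_node_id", "target_id", "to"}


def classify_header(header: list) -> str:
    """Decide from the header row alone whether a CSV holds nodes or relationships."""
    cols = {c.strip().lower() for c in header}
    if cols & REL_SOURCE_COLS and cols & REL_TARGET_COLS:
        return "relationships"
    if cols & NODE_NAME_COLS:
        return "nodes"
    raise ValueError(f"unrecognised header: {header}")


def order_uploads(files: dict) -> list:
    """Return (filename, kind) pairs with every nodes CSV ahead of every relationships CSV."""
    kinds = {}
    for fname, text in files.items():
        header = next(csv.reader(io.StringIO(text)))
        kinds[fname] = classify_header(header)
    # sorted() is stable: nodes (key False) first, upload order kept within each group.
    return sorted(kinds.items(), key=lambda item: item[1] != "nodes")
```

Importing nodes first means a relationship row in one file can point at a `node_id` defined in another file of the same upload.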
Stamped on rows that don't carry an ontology column.
Used when a node row has no type column.
Stamped on every node/edge as a doc_id so a future per-document delete can wipe this import. Auto-generated when blank.
CSV format reference
Nodes CSV — header row must include name (or one of: label, title). Recognised columns:
- `node_id` / `id`: stable id used to wire relationships within this upload. Defaults to `name`.
- `name`: required.
- `type` / `kind` / `node_type`: Neo4j label (e.g. Framework, Person). Defaults to the value above.
- `ontology`: defaults to the value above.
- `description`: accumulated into `n.descriptions[]` and surfaced as `n.description`.
- Any additional columns are written as node properties.
Relationships CSV — must include both a source and target id column. Recognised columns:
- `source_node_id` / `source_id` / `from`: required.
- `target_node_id` / `target_id` / `to`: required.
- `type` / `rel_type` / `relationship`: required, e.g. USES, WORKS_AT.
- `ontology`: defaults to the value above.
- Any additional columns are written as relationship properties.
Identifiers (node labels and relationship types) must match [A-Za-z][A-Za-z0-9_]*. Spaces and dashes are converted to underscores; rows with unsafe labels are skipped.
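The identifier rule above can be expressed as a small normalise-then-validate step. This is a sketch that assumes surrounding whitespace is trimmed first (an assumption). A rejected identifier is signalled with None so the caller can skip the row. `sanitise_identifier` is a hypothetical name.

```python
import re
from typing import Optional

SAFE_IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def sanitise_identifier(raw: str) -> Optional[str]:
    """Convert spaces and dashes to underscores, then accept only safe identifiers."""
    candidate = raw.strip().replace(" ", "_").replace("-", "_")
    return candidate if SAFE_IDENT.fullmatch(candidate) else None
```

`"WORKS-AT"` becomes `WORKS_AT`, while `"3D Model"` (leading digit) and `"Person!"` (punctuation) are rejected, so their rows would be skipped.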
Statistics
Operational metrics across the platform.
Refresh to recompute.
Reset data
Permanently delete all ingested content.
This cannot be undone. Re-ingesting your sources is the only way to restore content.
⚠️ Destructive operation
Clicking the button below will:
- Delete the `cake-vectors` Elasticsearch index (every chunk + embedding).
- Delete the `cake-chunks` Elasticsearch index (the keyword index).
- Run `MATCH (n) DETACH DELETE n` in Neo4j (every node + every relationship).
After reset, the indices will be auto-recreated on the next ingest with the correct mapping. Ontologies and layer config are not affected — only your ingested data is wiped.
Scheduling
Periodic background jobs.
Each scheduled run is recorded, and a notification goes to the configured recipient when the job finishes.
+ Create new schedule
Filled in automatically from the preset; edit only when Custom cron is selected.
Active schedules
Token & cost
Aggregated usage and estimated cost across every workload.
Costs are estimates derived from per-model rates in the Catalog — edit a rate there to refresh the math.
LLM Catalog
Per-model rate table that backs the cost dashboard.
Costs are computed from input and output token rates; model lookup uses longest-prefix matching against the live model id.
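Rate lookup and the cost calculation can be sketched as follows. The sketch assumes rates are stored per one million tokens and that catalog keys are already lowercased, per the "Lowercased on save" hint. The unit, the `Rate` type, and the `example-model` names are hypothetical. An id with no matching prefix returns None rather than a guessed price.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rate:
    input_per_1m: float   # assumed unit: cost per 1,000,000 input tokens
    output_per_1m: float  # assumed unit: cost per 1,000,000 output tokens


def lookup_rate(catalog: dict, model_id: str) -> Optional[Rate]:
    """Longest-prefix match of the live model id against (lowercased) catalog keys."""
    model_id = model_id.lower()
    matches = [prefix for prefix in catalog if model_id.startswith(prefix)]
    return catalog[max(matches, key=len)] if matches else None


def estimate_cost(catalog: dict, model_id: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    """Estimated cost of one call, or None when no catalog entry matches."""
    rate = lookup_rate(catalog, model_id)
    if rate is None:
        return None
    return (input_tokens * rate.input_per_1m + output_tokens * rate.output_per_1m) / 1_000_000


# Illustrative catalog: both keys are prefixes of "example-model-mini-2024".
catalog = {"example-model": Rate(1.0, 4.0), "example-model-mini": Rate(0.5, 2.0)}
```

Because the longest prefix wins, a dated id such as `Example-Model-Mini-2024` is priced at the `example-model-mini` rate rather than the shorter `example-model` one.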
+ Add new model
Lowercased on save. Longest match wins.
Catalog
Tenants
Each tenant is a customer or workspace.
Content is scoped to one tenant; ontologies, schedules, and rate catalogs are shared. The active tenant in the sidebar is sent on every request.
+ Create new tenant
8 chars from [A-HJ-KM-NP-TV-Z2-9] — no 0,1,I,L,O,U. Leave blank to let the system pick.
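The character class is 30 symbols: the 26 uppercase letters minus I, L, O, and U, plus the digits 2 to 9. Here is a sketch of generating and validating such an id with Python's `secrets` module. This is not necessarily how the platform picks ids, and whether it accepts lowercase input is not stated.

```python
import secrets
import string

# [A-HJ-KM-NP-TV-Z2-9] spelled out: uppercase letters except I, L, O, U, then digits 2-9.
TENANT_ALPHABET = "".join(c for c in string.ascii_uppercase if c not in "ILOU") + "23456789"


def new_tenant_id(length: int = 8) -> str:
    """Random id drawn uniformly from the 30-symbol alphabet."""
    return "".join(secrets.choice(TENANT_ALPHABET) for _ in range(length))


def is_valid_tenant_id(value: str) -> bool:
    """True if value is exactly 8 characters, all from the allowed alphabet."""
    return len(value) == 8 and all(c in TENANT_ALPHABET for c in value)
```

Dropping lookalikes such as 0/O and 1/I/L keeps ids unambiguous when they are read aloud or copied by hand. Eight symbols from 30 give 30^8 (about 6.6 × 10^11) possible ids.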
All tenants
Security
Manage users and the tenants they can access.
Each user is identified by email. Inactive users keep their audit trail but cannot sign in.
+ Create new user
All users
Settings
Runtime configuration for the platform.
Edit values inline — most changes take effect immediately.
Connector catalog (source · data_connectors)
Available source connector types for ingestion (same JSON as gateway GET /v1/catalog/connectors). Requires AIRFLOW_GATEWAY_BASE_URL.
Store catalog (destination · stores)
Available destination store types for artifacts (same JSON as gateway GET /v1/catalog/stores). Requires AIRFLOW_GATEWAY_BASE_URL.
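Both catalog panels mirror a gateway endpoint, so you can also fetch the same JSON directly. Here is a minimal sketch using the standard library. It assumes AIRFLOW_GATEWAY_BASE_URL is available as an environment variable and that the endpoints need no extra auth headers (both assumptions). The response shape is not documented here, so the sketch returns the parsed JSON as-is. `catalog_url` and `fetch_catalog` are hypothetical names.

```python
import json
import os
import urllib.request


def catalog_url(base_url: str, kind: str) -> str:
    """Build the gateway catalog URL; kind is 'connectors' or 'stores'."""
    if kind not in ("connectors", "stores"):
        raise ValueError(f"unknown catalog kind: {kind!r}")
    return f"{base_url.rstrip('/')}/v1/catalog/{kind}"


def fetch_catalog(kind: str, timeout: float = 10.0):
    """GET the catalog JSON from the gateway (assumes no auth is required)."""
    base = os.environ["AIRFLOW_GATEWAY_BASE_URL"]
    with urllib.request.urlopen(catalog_url(base, kind), timeout=timeout) as resp:
        return json.load(resp)
```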
Labels
Customise the words the UI uses for shared concepts.
Each row maps a key (used by the UI markup as data-label="…") to a display name. Edits are saved immediately but only take effect after the next API restart — the running process serves labels from a startup cache.
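The restart requirement follows from reading labels once at startup. Here is a hypothetical illustration of that pattern. The class and its fall-back-to-key behaviour are assumptions, not the platform's code.

```python
class LabelService:
    """Hypothetical label lookup that serves a snapshot taken at startup."""

    def __init__(self, store: dict):
        self._store = store        # backing store: edits are persisted here at once
        self._cache = dict(store)  # snapshot served until the process restarts

    def label(self, key: str) -> str:
        # Unknown keys fall back to the key itself (assumed behaviour).
        return self._cache.get(key, key)

    def save(self, key: str, display_name: str) -> None:
        # Persisted immediately, but the running cache is deliberately untouched.
        self._store[key] = display_name
```

After `save("tenant", "Workspace")`, the running service still answers with the old name. Only a freshly constructed service, which corresponds to a restarted process, sees the edit.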
LLM Settings
Configure which LLMs the platform may use and which model handles each operation.
Section 1 toggles the model allowlist on cake.llm_catalog. Section 2 binds each LLM call site to one of the allowed models. Bindings take effect on the next request.
1. Allowed models
Only models ticked here appear in the per-operation dropdowns below.
2. Per-operation model
Pick the model each operation should use. Operations with no binding fall back to the global default.
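The two sections combine into a simple resolution rule: bindings may only name allowlisted models, and an unbound operation uses the global default. Here is a sketch with hypothetical operation and model names. This page does not say what the platform does if a bound model is later removed from the allowlist, so the sketch does not model that case.

```python
class ModelBindings:
    """Hypothetical per-operation model bindings constrained by an allowlist."""

    def __init__(self, allowed, global_default: str):
        self.allowed = set(allowed)
        self.global_default = global_default
        self._bindings = {}

    def bind(self, operation: str, model: str) -> None:
        # Mirrors the UI: only allowlisted models can be picked for an operation.
        if model not in self.allowed:
            raise ValueError(f"{model!r} is not in the allowlist")
        self._bindings[operation] = model

    def resolve(self, operation: str) -> str:
        # Operations with no binding fall back to the global default.
        return self._bindings.get(operation, self.global_default)
```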
Prompts
Library of named prompt templates.
Each prompt has a unique name that code uses to look it up, plus a category for grouping (search, ingestion, chat, system, or anything else you need).
+ Add new prompt
Used as a programmatic key. Must be unique.
Audit logs
Append-only record of meaningful user and system actions.
Read-only. Each row carries the actor, tenant, action category, status, request id, and a structured details payload.
Evals
Upload a test set and benchmark search quality.
Each case runs through both search backends and is scored automatically against expected answers — results are captured side-by-side so you can compare head-to-head.
Runs execute under the active sidebar tenant + the logged-in user. Concurrency is bounded by EVAL_CONCURRENCY (default 3) cases in flight; within a case the two searches fire in parallel. Each search half opens its own Langfuse trace if Langfuse is enabled.
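The concurrency bound maps naturally onto an asyncio semaphore around each case, with `asyncio.gather` firing the two searches inside it. This sketch assumes both backends are exposed as async callables that take the question text. `search_layer`, `search_context`, and the result dict shape are hypothetical. Scoring, tenant and user context, and Langfuse tracing are left out.

```python
import asyncio
import os

# EVAL_CONCURRENCY caps cases in flight; 3 is the documented default.
EVAL_CONCURRENCY = int(os.environ.get("EVAL_CONCURRENCY", "3"))


async def run_case(case, search_layer, search_context, sem):
    """Run one case: hold a semaphore slot, fire both searches in parallel."""
    async with sem:
        layer_result, context_result = await asyncio.gather(
            search_layer(case["question"]),
            search_context(case["question"]),
        )
    return {"id": case["id"], "layer": layer_result, "context": context_result}


async def run_eval(cases, search_layer, search_context, concurrency=EVAL_CONCURRENCY):
    """Run every case with at most `concurrency` cases in flight at once."""
    sem = asyncio.Semaphore(concurrency)
    return await asyncio.gather(
        *(run_case(c, search_layer, search_context, sem) for c in cases)
    )
```

With the default of 3, at most six searches are outstanding at any moment (three cases, two searches each). `gather` returns results in the same order as the input cases.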
Start a new run
Running as … · tenant … · backends layer + context
Top-level shape: { "metadata": {...}, "questions": [{ id, question, expected_answer, required_facts: [...], category?, difficulty? }] }
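You can check a test set against this shape before uploading it. The validator below is a hypothetical pre-flight check. It assumes `metadata` is required and that `category` and `difficulty` are optional, as the `?` suffix suggests. The sample question is illustrative only.

```python
import json

REQUIRED_KEYS = ("id", "question", "expected_answer", "required_facts")


def validate_test_set(raw: str) -> dict:
    """Parse an eval test set and check it against the documented top-level shape."""
    data = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("metadata"), dict):
        raise ValueError("top level must be an object with a 'metadata' object")
    questions = data.get("questions")
    if not isinstance(questions, list):
        raise ValueError("'questions' must be a list")
    for i, q in enumerate(questions):
        if not isinstance(q, dict):
            raise ValueError(f"question {i} must be an object")
        missing = [k for k in REQUIRED_KEYS if k not in q]
        if missing:
            raise ValueError(f"question {i} is missing {missing}")
        if not isinstance(q["required_facts"], list):
            raise ValueError(f"question {i}: 'required_facts' must be a list")
        # category and difficulty are optional, so they are not checked.
    return data


sample = json.dumps({
    "metadata": {"name": "smoke-test"},
    "questions": [{
        "id": "q1",
        "question": "Which Elasticsearch index holds chunk embeddings?",
        "expected_answer": "cake-vectors",
        "required_facts": ["cake-vectors"],
        "category": "infrastructure",
    }],
})
```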
Past runs
Deep search
Iterative, agent-driven research across your content.
Plans sub-questions, runs them in parallel, refines on what it finds, and stitches a synthesised report with citations.
Coming soon
Retrieval config
Tune how the platform picks data sources for each kind of question.
Set per-intent limits and choose which sources are eligible per tenant — no redeploy required.
Coming soon
Data Retrieval
Customise CAKE to pull from any data source — each one plugs in as a retriever in the catalog below.
CAKE's intelligent retrieval engine understands the question, decides which sources to query, runs them in parallel, and assembles a single grounded answer. Ideal for agentic platforms: instead of wiring every agent to every backend, point them at CAKE and let it gather what they need.
Past questions
Retrievers
Catalog of data sources the platform can choose from.
One row per source — built-in, external tool, or custom adapter. Edits take effect on the next request.
Phase 0: read-only. Add new retrievers via direct SQL insert against cake.retriever_registry for now; admin CRUD lands in Phase 1.
MCP Servers
External services the platform can dispatch tool calls to.
Configure them using the standard JSON format every compatible client uses. Both local and remote servers are supported.
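For illustration, here is a config in the widely used `mcpServers` shape. Local (stdio) servers declare a `command` with `args` and `env`, and remote servers declare a `url`. The server names, the remote URL, and the exact keys this platform accepts for remote servers are assumptions, so check them against the form before pasting. The filesystem entry uses the public `@modelcontextprotocol/server-filesystem` package purely as an example.

```python
import json

# Illustrative config in the common "mcpServers" shape; names and URL are hypothetical.
config = {
    "mcpServers": {
        "filesystem": {
            # Local server: launched as a subprocess and spoken to over stdio.
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "/data/ingest"],
            "env": {},
        },
        "remote-search": {
            # Remote server: reached over HTTP at this (hypothetical) URL.
            "url": "https://mcp.example.com/mcp",
        },
    }
}


def server_kind(entry: dict) -> str:
    """Classify an entry as local (has a command) or remote (has a url)."""
    if "command" in entry:
        return "local"
    if "url" in entry:
        return "remote"
    raise ValueError("entry needs either 'command' or 'url'")


config_json = json.dumps(config, indent=2)  # text to paste into the form
```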
Graph queries
Saved query templates that back the Graph panel’s dropdown.
Built-in templates apply across every ontology; user-saved queries are stored against one ontology. Built-ins can be edited but not deleted.
+ Add new query
Use {ontology} as a placeholder if you want the query to substitute the currently-selected ontology at fill time (built-in pattern). Leave it out for verbatim queries.
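Placeholder substitution can be sketched as a plain string replace. Using `str.replace` instead of `str.format` matters for Cypher, because map literals such as `{name: $name}` use braces that `format()` would misread. The `ontology` node property in the example and the name check are assumptions added for illustration, since the name is pasted into the query text verbatim. `fill_template` is a hypothetical name.

```python
import re


def fill_template(query: str, ontology: str) -> str:
    """Substitute {ontology}; queries without the placeholder pass through unchanged."""
    # Hypothetical guard: the name is inserted verbatim, so allow only simple identifiers.
    if not re.fullmatch(r"[A-Za-z0-9_]+", ontology):
        raise ValueError(f"unsafe ontology name: {ontology!r}")
    # str.replace, not str.format: Cypher map literals also use braces.
    return query.replace("{ontology}", ontology)


template = "MATCH (n:Entity {ontology: '{ontology}'}) RETURN n LIMIT 25"
```

Filling `template` with `code_structure` yields `MATCH (n:Entity {ontology: 'code_structure'}) RETURN n LIMIT 25`. A verbatim query without the placeholder is returned untouched.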
All saved queries
API Docs
Interactive Swagger UI for the FastAPI backend. Use the “Try it out” controls inside each endpoint to call the API live.
MCP (Model Context Protocol)
Expose retrieval as native tools to compatible clients.
Auth is via the same API keys you mint from the sidebar.