SureIndex & SureIntel — Document Intelligence
SureCentric provides a two-component document intelligence pipeline for AI-powered search and retrieval across the entire SureClinical document archive:
| Component | Powered by | Role |
|---|---|---|
| SureIndex | CocoIndex | Continuous ingestion pipeline — reads SureDrive documents, chunks, embeds, and writes a live vector index |
| SureIntel | Custom RAG service + SureLLM | Query service — takes a user question, retrieves relevant chunks from the SureIndex vector store, returns a grounded answer |
Together they give SureCentric users the ability to ask natural-language questions about any clinical document in the archive and receive accurate, sourced answers — with full audit trail attribution.
Architecture
SureIndex — Ingestion Pipeline
SureIndex is a stateful, incremental ingestion pipeline built on CocoIndex (Python + Rust core). It watches the SureDrive Nuxeo archive and continuously keeps the vector index fresh.
Why CocoIndex
| Need | CocoIndex Feature |
|---|---|
| Only re-index changed documents | Incremental processing — skips unchanged chunks |
| Know which doc produced which chunk | Data lineage — built-in provenance chain |
| Handle regulatory document types (PDF, Excel, CSV) | Native PDF, Office, and structured data parsers |
| Performance at scale | Rust-core, parallel chunking, async embedding |
| Regulatory traceability | Lineage maps answer → chunk → source document → Nuxeo ID |
Source connectors
SureIndex ingests from:
| Source | Method | Content Type |
|---|---|---|
| SureDrive / Nuxeo | Nuxeo REST API + webhook trigger | PDFs, DOCX, XLSX, CSV |
| SureCentric DuckDB | Direct DuckDB reads | Structured clinical tables |
| External files | DUCKDB_PATH file watcher | Parquet, JSON-LD, CSV |
| REST APIs (via SureConnect) | OpenAPI connector | JSON responses |
Pipeline steps
# Conceptual SureIndex flow (CocoIndex Python API)
@cocoindex.flow
def surecentric_index(source: NuxeoSource):
docs = source.fetch_documents(content_types=["PDF", "XLSX", "CSV"])
chunks = docs.chunk(strategy="semantic", max_tokens=512)
embedded = chunks.embed(model=SureLLM.embedding_model)
embedded.export(target=DuckDBVSSTarget(path=DUCKDB_PATH, table="sure_index"))
embedded.lineage(target=AuditLineageTarget()) # regulatory provenance
Monitoring UI — CocoInsight
CocoIndex ships with a built-in pipeline monitoring UI called CocoInsight (port 7861). This is embeddable in the SureCentric Desktop as a panel iframe.
SureIntel — RAG Query Service
SureIntel is the user-facing query layer. It takes a natural-language question, retrieves the most relevant chunks from SureIndex's vector store, and calls SureLLM to generate a grounded, cited answer.
Query flow
User: "What is the primary endpoint definition in the APEX-01 protocol?"
│
├── POST /api/intel/query { question, project_id, user_id }
│
├── 1. Vector search: top-5 chunks from sure_index WHERE project_id = ?
│
├── 2. Construct prompt:
│ System: "Answer using only the provided clinical documents."
│ Context: [chunk 1 text + source citation] ... [chunk 5 text + citation]
│ User: question
│
├── 3. SureLLM call → grounded completion
│
└── 4. Response:
answer: "The primary endpoint is..."
citations: [
{ doc: "APEX-01-v3.pdf", page: 12, nuxeo_id: "abc123" },
{ doc: "APEX-01-statistical-plan.xlsx", sheet: "Endpoints" }
]
Audit trail
Every SureIntel query generates an INTEL_QUERY audit event in audit_trail:
| Field | Value |
|---|---|
event_type | INTEL_QUERY |
actor_type | user or agent |
resource_type | document_index |
resource_id | project_id |
after_state | { question, answer_length, chunks_retrieved, model } |
This satisfies FDA 21 CFR Part 11 §11.10(e) — every AI-assisted query against clinical data is permanently attributed, timestamped, and tamper-evident.
REST API
POST /api/intel/query
Body: { question: string, project_id: string, top_k?: number }
Returns: { answer: string, citations: Citation[], query_id: string }
GET /api/intel/history?project_id=
Returns: paginated list of past queries for a project
POST /api/intel/reindex
Triggers a full SureIndex re-ingestion for a project
GET /api/intel/index-status?project_id=
Returns: { doc_count, chunk_count, last_indexed_at, pending_changes }
Desktop UI — Electron Integration
Both SureIndex and SureIntel have iframeable UIs that integrate into the SureCentric Desktop Project Canvas using the same pattern as Superset dashboards.
SureIntel Chat Panel
The SureIntel chat interface runs as a lightweight web app (port 7860) and is embedded in the Desktop as a panel iframe in the Project Desktop canvas:
Project Desktop canvas
├── Dashboards → Superset embed (port 8088)
├── Schema Builder→ Vue app iframe (port 5173)
├── Reports → Superset embed (port 8088)
├── Knowledge → SureIntel Chat (port 7860) ← NEW
└── Index Monitor → CocoInsight (port 7861) ← NEW (admin/power users)
SureIntel Chat iframe design considerations:
| Consideration | Approach |
|---|---|
| Auth context | Parent Electron window passes project context + actor JWT as URL params or postMessage |
| CSP | Add http://localhost:7860 http://localhost:7861 to frame-src |
| Scoping | SureIntel queries are scoped to project_id from the active project — users only see docs they have access to |
| Citations | Source document links in the chat UI open the corresponding Nuxeo document in the SC panel |
| No iframe needed | SureIntel could also be a native renderer panel using an IPC call to CARD API → /api/intel/query — gives more control over UI, avoids CSP issues |
SureIndex Monitor Panel (Admin)
The CocoInsight monitoring UI (a CocoIndex built-in) shows:
- Which documents are indexed, pending, or failed
- Data lineage visualization (source → chunks → embeddings)
- Re-index triggers
This is only exposed to users with an admin or data-manager role.
Regulatory Provenance
The data lineage from CocoIndex is particularly valuable in regulated environments:
Audit Query: "What document supported the answer about APEX-01 primary endpoint?"
answer ← chunk_id: 7f3a ← doc: APEX-01-v3.pdf (page 12)
↑
nuxeo_id: "abc123"
↑
SureDrive archive entry
↑
audit_trail: INTEL_QUERY event ([email protected], 2026-04-12T10:05:00Z)
This produces a complete, regulatory-grade chain of custody from the user's question to the exact source paragraph in the archived clinical document.
Deployment
Add to docker-compose.yml:
sureindex:
image: python:3.12-slim
command: python -m sureindex.pipeline --watch
environment:
NUXEO_URL: http://nuxeo:8080
DUCKDB_PATH: /data/clinical.duckdb
SURELLM_URL: http://surellm:11434
COCOINSIGHT_PORT: 7861
volumes:
- clinical-data:/data
ports:
- "7861:7861" # CocoInsight monitoring UI
sureintel:
image: node:20-alpine
command: node dist/sureintel/server.js
environment:
DUCKDB_PATH: /data/clinical.duckdb
SURELLM_URL: http://surellm:11434
CARD_API_URL: http://card-api:3099
volumes:
- clinical-data:/data
ports:
- "7860:7860" # SureIntel Chat UI
Electron config.js additions:
SUREINTEL_URL: process.env.SUREINTEL_URL || "http://localhost:7860",
SUREINDEX_UI_URL: process.env.SUREINDEX_UI_URL || "http://localhost:7861",
CSP addition:
'frame-src http://localhost:8080 http://localhost:8088 http://localhost:5173 http://localhost:7860 http://localhost:7861'