Skip to main content

SureIndex & SureIntel — Document Intelligence

SureCentric provides a two-component document intelligence pipeline for AI-powered search and retrieval across the entire SureClinical document archive:

ComponentPowered byRole
SureIndexCocoIndexContinuous ingestion pipeline — reads SureDrive documents, chunks, embeds, and writes a live vector index
SureIntelCustom RAG service + SureLLMQuery service — takes a user question, retrieves relevant chunks from the SureIndex vector store, returns a grounded answer

Together they give SureCentric users the ability to ask natural-language questions about any clinical document in the archive and receive accurate, sourced answers — with full audit trail attribution.


Architecture


SureIndex — Ingestion Pipeline

SureIndex is a stateful, incremental ingestion pipeline built on CocoIndex (Python + Rust core). It watches the SureDrive Nuxeo archive and continuously keeps the vector index fresh.

Why CocoIndex

NeedCocoIndex Feature
Only re-index changed documentsIncremental processing — skips unchanged chunks
Know which doc produced which chunkData lineage — built-in provenance chain
Handle regulatory document types (PDF, Excel, CSV)Native PDF, Office, and structured data parsers
Performance at scaleRust-core, parallel chunking, async embedding
Regulatory traceabilityLineage maps answer → chunk → source document → Nuxeo ID

Source connectors

SureIndex ingests from:

SourceMethodContent Type
SureDrive / NuxeoNuxeo REST API + webhook triggerPDFs, DOCX, XLSX, CSV
SureCentric DuckDBDirect DuckDB readsStructured clinical tables
External filesDUCKDB_PATH file watcherParquet, JSON-LD, CSV
REST APIs (via SureConnect)OpenAPI connectorJSON responses

Pipeline steps

# Conceptual SureIndex flow (CocoIndex Python API)
@cocoindex.flow
def surecentric_index(source: NuxeoSource):
docs = source.fetch_documents(content_types=["PDF", "XLSX", "CSV"])
chunks = docs.chunk(strategy="semantic", max_tokens=512)
embedded = chunks.embed(model=SureLLM.embedding_model)
embedded.export(target=DuckDBVSSTarget(path=DUCKDB_PATH, table="sure_index"))
embedded.lineage(target=AuditLineageTarget()) # regulatory provenance

Monitoring UI — CocoInsight

CocoIndex ships with a built-in pipeline monitoring UI called CocoInsight (port 7861). This is embeddable in the SureCentric Desktop as a panel iframe.


SureIntel — RAG Query Service

SureIntel is the user-facing query layer. It takes a natural-language question, retrieves the most relevant chunks from SureIndex's vector store, and calls SureLLM to generate a grounded, cited answer.

Query flow

User: "What is the primary endpoint definition in the APEX-01 protocol?"

├── POST /api/intel/query { question, project_id, user_id }

├── 1. Vector search: top-5 chunks from sure_index WHERE project_id = ?

├── 2. Construct prompt:
│ System: "Answer using only the provided clinical documents."
│ Context: [chunk 1 text + source citation] ... [chunk 5 text + citation]
│ User: question

├── 3. SureLLM call → grounded completion

└── 4. Response:
answer: "The primary endpoint is..."
citations: [
{ doc: "APEX-01-v3.pdf", page: 12, nuxeo_id: "abc123" },
{ doc: "APEX-01-statistical-plan.xlsx", sheet: "Endpoints" }
]

Audit trail

Every SureIntel query generates an INTEL_QUERY audit event in audit_trail:

FieldValue
event_typeINTEL_QUERY
actor_typeuser or agent
resource_typedocument_index
resource_idproject_id
after_state{ question, answer_length, chunks_retrieved, model }

This satisfies FDA 21 CFR Part 11 §11.10(e) — every AI-assisted query against clinical data is permanently attributed, timestamped, and tamper-evident.

REST API

POST /api/intel/query
Body: { question: string, project_id: string, top_k?: number }
Returns: { answer: string, citations: Citation[], query_id: string }

GET /api/intel/history?project_id=
Returns: paginated list of past queries for a project

POST /api/intel/reindex
Triggers a full SureIndex re-ingestion for a project

GET /api/intel/index-status?project_id=
Returns: { doc_count, chunk_count, last_indexed_at, pending_changes }

Desktop UI — Electron Integration

Both SureIndex and SureIntel have iframeable UIs that integrate into the SureCentric Desktop Project Canvas using the same pattern as Superset dashboards.

SureIntel Chat Panel

The SureIntel chat interface runs as a lightweight web app (port 7860) and is embedded in the Desktop as a panel iframe in the Project Desktop canvas:

Project Desktop canvas
├── Dashboards → Superset embed (port 8088)
├── Schema Builder→ Vue app iframe (port 5173)
├── Reports → Superset embed (port 8088)
├── Knowledge → SureIntel Chat (port 7860) ← NEW
└── Index Monitor → CocoInsight (port 7861) ← NEW (admin/power users)

SureIntel Chat iframe design considerations:

ConsiderationApproach
Auth contextParent Electron window passes project context + actor JWT as URL params or postMessage
CSPAdd http://localhost:7860 http://localhost:7861 to frame-src
ScopingSureIntel queries are scoped to project_id from the active project — users only see docs they have access to
CitationsSource document links in the chat UI open the corresponding Nuxeo document in the SC panel
No iframe neededSureIntel could also be a native renderer panel using an IPC call to CARD API → /api/intel/query — gives more control over UI, avoids CSP issues

SureIndex Monitor Panel (Admin)

The CocoInsight monitoring UI (a CocoIndex built-in) shows:

  • Which documents are indexed, pending, or failed
  • Data lineage visualization (source → chunks → embeddings)
  • Re-index triggers

This is only exposed to users with an admin or data-manager role.


Regulatory Provenance

The data lineage from CocoIndex is particularly valuable in regulated environments:

Audit Query: "What document supported the answer about APEX-01 primary endpoint?"

answer ← chunk_id: 7f3a ← doc: APEX-01-v3.pdf (page 12)

nuxeo_id: "abc123"

SureDrive archive entry

audit_trail: INTEL_QUERY event ([email protected], 2026-04-12T10:05:00Z)

This produces a complete, regulatory-grade chain of custody from the user's question to the exact source paragraph in the archived clinical document.


Deployment

Add to docker-compose.yml:

sureindex:
image: python:3.12-slim
command: python -m sureindex.pipeline --watch
environment:
NUXEO_URL: http://nuxeo:8080
DUCKDB_PATH: /data/clinical.duckdb
SURELLM_URL: http://surellm:11434
COCOINSIGHT_PORT: 7861
volumes:
- clinical-data:/data
ports:
- "7861:7861" # CocoInsight monitoring UI

sureintel:
image: node:20-alpine
command: node dist/sureintel/server.js
environment:
DUCKDB_PATH: /data/clinical.duckdb
SURELLM_URL: http://surellm:11434
CARD_API_URL: http://card-api:3099
volumes:
- clinical-data:/data
ports:
- "7860:7860" # SureIntel Chat UI

Electron config.js additions:

SUREINTEL_URL:    process.env.SUREINTEL_URL    || "http://localhost:7860",
SUREINDEX_UI_URL: process.env.SUREINDEX_UI_URL || "http://localhost:7861",

CSP addition:

'frame-src http://localhost:8080 http://localhost:8088 http://localhost:5173 http://localhost:7860 http://localhost:7861'