Web Service API

This page documents the Spring Boot service entry point that exposes document-processing, OCR, redaction, watermarking, and AI-assisted classification endpoints.

Scope

The service module is a direct HTTP API. It is separate from the desktop client and web client, and it focuses on file transformation plus text and classification utilities.

Primary Entry Point

The application is defined in WebServiceApplication.

That class is both:

  • the Spring Boot application entry point
  • the REST controller for the service endpoints

The main() method loads the iText license key, sets the server port to 8081, and starts the application.

Exposed Endpoints

The current controller surface includes:

  • POST /redact for PDF redaction using a multipart file and a redaction payload
  • POST /search for locating matching text ranges in a PDF
  • POST /watermark/add for applying a watermark to a PDF
  • POST /watermark/remove for removing a watermark
  • POST /watermark/get for listing detected watermarks
  • POST /ocr/get for returning OCR output as JSON or plain data depending on the engine
  • POST /ocr/add for adding an OCR layer to uploaded files
  • POST /ocr/text for returning OCR text output
  • POST /ai/classify/etmf for eTMF content-type prediction
  • POST /ai/document/details for combined document AI output
  • GET /demo for the demo landing page
  • POST /demo/ai/classify for demo classification text output
  • POST /demo/ner/stanford for Stanford NER output
  • POST /demo/ner/opennlp for OpenNLP-style entity extraction output
  • POST /demo/ocr/add for demo OCR PDF generation
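As an illustration of how a caller might address one of the multipart endpoints above, the following sketch builds (but does not send) a POST /watermark/get request using only the JDK. The part name "file", the file name, the localhost base URL, and the class and method names are assumptions made for this example; the actual parameter names expected by WebServiceApplication are not stated on this page.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical client-side helper; not part of the service itself.
class MultipartSketch {

    /** Encodes one file part as a complete multipart/form-data body. */
    static byte[] encodeFilePart(String boundary, String partName,
                                 String fileName, byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String header = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + partName
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/pdf\r\n\r\n";
        out.writeBytes(header.getBytes(StandardCharsets.UTF_8));
        out.writeBytes(content);
        // Closing delimiter marks the end of the multipart body.
        out.writeBytes(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    /** Builds a POST request carrying the encoded body. */
    static HttpRequest buildRequest(String baseUrl, String path,
                                    String boundary, byte[] body) {
        return HttpRequest.newBuilder(URI.create(baseUrl + path))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }

    public static void main(String[] args) {
        String boundary = "----sketch-" + UUID.randomUUID();
        // Placeholder bytes standing in for a real PDF.
        byte[] fakePdf = "%PDF-1.7 placeholder".getBytes(StandardCharsets.UTF_8);
        byte[] body = encodeFilePart(boundary, "file", "sample.pdf", fakePdf);
        // Port 8081 is the documented default; the host is an assumption.
        HttpRequest request =
                buildRequest("http://localhost:8081", "/watermark/get", boundary, body);
        System.out.println(request.method() + " " + request.uri()
                + " (" + body.length + " bytes)");
        // Sending would use java.net.http.HttpClient#send with a BodyHandler.
    }
}
```

The sketch does not escape quotes in file names and handles a single part only; endpoints such as POST /redact that also take a payload would need an additional part.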

Supporting Services

The controller delegates work to the service and utility layer, including:

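  • OcrUtils for OCR engine selection across the documentAi, ocr, and tesseract modes
  • VisionApiUtils for Google Vision OCR and OCR-layer generation over PDF and image inputs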
  • ItextUtils for PDF cleanup, redaction, and watermark handling
  • ClassificationUtils for PDF text extraction and classification helpers
  • VertexAiUtils for eTMF classification and document AI output
  • TextUtils for named-entity extraction
  • DocumentAiDetails as the combined AI response object

OCR Behavior

OcrUtils supports three engine modes:

  • documentAi for document AI processing
  • ocr for Google Vision OCR
  • tesseract for local HOCR-oriented processing

VisionApiUtils bridges Google Vision OCR and OCR-layer generation over PDF and image inputs.
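The three engine mode names can be modeled as a small lookup, shown here as a minimal sketch. The enum, its constant names, and the parse helper are hypothetical and are not taken from OcrUtils; only the three mode strings come from this page. Whether the service matches mode names case-sensitively is not documented, so the exact-match behavior below is an assumption.

```java
import java.util.Optional;

// Hypothetical illustration of mapping mode strings to engines.
class OcrEngineSketch {

    enum Engine {
        DOCUMENT_AI("documentAi"), // document AI processing
        VISION("ocr"),             // Google Vision OCR
        TESSERACT("tesseract");    // local HOCR-oriented processing

        private final String mode;

        Engine(String mode) {
            this.mode = mode;
        }

        String mode() {
            return mode;
        }
    }

    /** Returns the engine for a mode string, or empty if it is unknown or null. */
    static Optional<Engine> parse(String value) {
        if (value == null) {
            return Optional.empty();
        }
        for (Engine engine : Engine.values()) {
            // Exact, case-sensitive comparison (an assumption, see lead-in).
            if (engine.mode().equals(value.trim())) {
                return Optional.of(engine);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String mode : new String[] {"documentAi", "ocr", "tesseract", "unknown"}) {
            System.out.println(mode + " -> " + parse(mode));
        }
    }
}
```

Returning an empty Optional for unrecognized values lets a caller decide whether to reject the request or fall back to a default engine.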

Implementation Notes

  • Most endpoints operate on multipart file uploads and return generated files or structured JSON directly.
  • The service writes temporary files during processing rather than streaming transformations in place.
  • The AI endpoints combine OCR text, entity extraction, and content-type predictions to support downstream document triage.
  • The demo endpoints are intentionally separate from the main API surface and are useful as reference behavior for the helpers.
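The temporary-file pattern mentioned in the notes above might look roughly like the following sketch. It is not the service's actual code: the class, the Transform interface, and the file prefixes are hypothetical, and a plain file copy stands in for a real PDF transformation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration of upload -> temp file -> transform -> response bytes.
class TempFileSketch {

    /** Stand-in for a file-to-file operation such as redaction or watermarking. */
    interface Transform {
        void apply(Path input, Path output) throws IOException;
    }

    static byte[] processWithTempFiles(byte[] upload, Transform transform) {
        Path input = null;
        Path output = null;
        try {
            // Both files land in the JVM's default temp directory (java.io.tmpdir).
            input = Files.createTempFile("upload-", ".pdf");
            output = Files.createTempFile("result-", ".pdf");
            Files.write(input, upload);
            transform.apply(input, output);
            return Files.readAllBytes(output);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Always clean up so uploads do not accumulate on disk.
            deleteQuietly(input);
            deleteQuietly(output);
        }
    }

    private static void deleteQuietly(Path path) {
        if (path == null) {
            return;
        }
        try {
            Files.deleteIfExists(path);
        } catch (IOException ignored) {
            // Best-effort cleanup; a real service might log this.
        }
    }

    public static void main(String[] args) {
        byte[] upload = "placeholder bytes".getBytes();
        // The copy below replaces the empty output file created above.
        byte[] result = processWithTempFiles(upload,
                (in, out) -> Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING));
        System.out.println("Processed " + result.length + " bytes");
    }
}
```

Deleting both files in a finally block keeps the working directory from filling up when a transformation fails partway through.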

Authentication and Deployment Context

Invocation model

WebServiceApplication is an internal utility microservice. It is not an end-user-facing API and does not expose an authentication layer. The service is expected to be co-deployed with the main SureClinical web application, isolated behind the application server's network boundary.

Property          Value
Default port      8081 (set in main() via server.port)
Auth mechanism    None; network-level access control only
Expected callers  The SureClinical web application server (internal service-to-service calls)
Public exposure   Should not be directly accessible from the internet

Security constraints

  • All endpoints accept multipart file uploads. No session token, API key, or OAuth credential is validated.
  • The iText license key is loaded on startup from a classpath or file-system resource and must be present for PDF operations to succeed.
  • Temporary files are written during processing. The working directory must be writable and on a local, trusted volume.
  • Demo endpoints (/demo/*) are included in the same application context. These expose representative request/response behavior and should be disabled or network-restricted in production deployments.

Hardening checklist

  • Restrict port 8081 to localhost or a private network interface in production.
  • Ensure the directory used for temporary file writes is not web-accessible.
  • Remove or firewall the /demo/* routes in hardened deployments.
  • Audit multipart upload size limits. WebServiceApplication does not document a spring.servlet.multipart.max-file-size setting, so the Spring Boot defaults (1MB per file, 10MB per request) apply unless overridden.
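Several checklist items map onto standard Spring Boot properties. The sketch below collects one possible set of values in a plain map of the kind accepted by SpringApplication.setDefaultProperties. The class name and the 25MB limits are illustrative assumptions, not values taken from WebServiceApplication, and binding to 127.0.0.1 only suits deployments where the caller runs on the same host.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the property keys are standard Spring Boot keys,
// the values are examples only.
class HardeningDefaultsSketch {

    static Map<String, Object> hardenedDefaults() {
        Map<String, Object> props = new LinkedHashMap<>();
        // Documented default port for this service.
        props.put("server.port", "8081");
        // Bind to loopback so the port is not reachable from other hosts.
        props.put("server.address", "127.0.0.1");
        // Explicit upload limits instead of relying on framework defaults.
        props.put("spring.servlet.multipart.max-file-size", "25MB");
        props.put("spring.servlet.multipart.max-request-size", "25MB");
        return props;
    }

    public static void main(String[] args) {
        // In a Spring Boot application, a map like this could be passed to
        // SpringApplication#setDefaultProperties before calling run().
        hardenedDefaults().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The same keys can be set in application.properties or as environment overrides; values supplied there take precedence over defaults registered in code.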