How KeXtract™ works

AI models

The document analysis process is designed to convert text and tables from PDFs and complex images into highly reliable structured data.

Discover how KeXtract™ works


The pipeline consists of two main phases:

Visual and structural preprocessing, performed by custom models trained by Kedos Srl, followed by semantic extraction, handled by a suitably trained LLM. Because the LLM only works on content the visual models have already isolated, this hybrid approach minimises parsing errors and drastically reduces the risk of hallucinations.
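The two-phase split can be pictured as a simple composition of two functions. The sketch below is illustrative only: `Region`, `run_pipeline`, `toy_preprocess` and `toy_extract` are hypothetical names, and the toy stand-ins replace the real layout model and LLM so the code runs on its own. It shows the key property described above: the extraction step only ever receives the reconstructed text, never the original document.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Region:
    """A region kept by the layout model (hypothetical structure)."""
    kind: str     # "text" or "table"
    content: str


def run_pipeline(
    document: str,
    preprocess: Callable[[str], List[Region]],
    extract: Callable[[str], dict],
) -> dict:
    # Phase 1: visual/structural preprocessing returns only the relevant regions.
    regions = preprocess(document)
    # The LLM never sees the original document, only the reconstructed text.
    reconstructed = "\n\n".join(r.content for r in regions)
    # Phase 2: semantic extraction on the reduced input.
    return extract(reconstructed)


# Toy stand-ins so the sketch runs without any real model.
def toy_preprocess(doc: str) -> List[Region]:
    # Pretend every line tagged "[LOGO]" is a visual artefact and drop it.
    return [Region("text", line) for line in doc.splitlines()
            if not line.startswith("[LOGO]")]


def toy_extract(text: str) -> dict:
    # A real system would prompt an LLM here; this just echoes the regions.
    return {"regions": text.split("\n\n")}


result = run_pipeline("[LOGO] ACME\nInvoice 42\nTotal: 100.00",
                      toy_preprocess, toy_extract)
print(result)  # {'regions': ['Invoice 42', 'Total: 100.00']}
```

Passing the two phases in as functions keeps them independent, so either the visual model or the LLM could be swapped without touching the other.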

STEP 1

Layout analysis and visual parsing (custom model)

Objective: understand the document’s visual structure before passing text to an LLM.
What the custom model does: primarily identifies text blocks and tables and separates them from any visual artefacts.
Advantages: the LLM does not receive the original document, but a reconstructed version containing only the relevant text regions. This reduces token consumption, focuses the model's attention on relevant content, and significantly cuts the risk of hallucinations.
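A minimal sketch of the filtering and reconstruction idea, assuming the layout model returns one labelled bounding box per detected block. The `Block` structure, the label names (`"logo"`, `"stamp"`, and so on) and the `reconstruct` function are hypothetical, not KeXtract's actual output format. Only text and table regions are kept, and a word count serves as a rough stand-in for token savings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """One labelled box from a layout model (hypothetical output format)."""
    label: str    # e.g. "text", "table", "logo", "stamp"
    x: float      # left edge of the bounding box
    y: float      # top edge of the bounding box
    content: str  # text recognised inside the box


KEEP = {"text", "table"}  # only these labels are forwarded to the LLM


def reconstruct(blocks: List[Block]) -> str:
    """Drop visual artefacts and rebuild the text in a simple reading order."""
    kept = [b for b in blocks if b.label in KEEP]
    # Naive reading order: top to bottom, then left to right.
    kept.sort(key=lambda b: (b.y, b.x))
    return "\n".join(b.content for b in kept)


page = [
    Block("logo", 10, 5, "ACME S.p.A. logo"),
    Block("table", 10, 200, "Qty | Price\n2 | 10.00"),
    Block("text", 10, 80, "Invoice no. 42"),
    Block("stamp", 300, 250, "PAID"),
]

text = reconstruct(page)
# Crude proxy for token savings: whitespace-separated words before and after.
words_before = sum(len(b.content.split()) for b in page)
words_after = len(text.split())
print(text)
print(words_before, "->", words_after)  # 13 -> 9
```

The sort by `(y, x)` is deliberately simple; multi-column pages would need a smarter reading-order step, which is exactly the kind of work a trained layout model is meant to handle.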


STEP 2

Selective LLM input and prompt engineering

Contextual prompting: the LLM receives a purpose-built prompt that includes operating instructions (e.g. ‘extract the fields defined in schema X’), examples of the desired output, format constraints (such as ISO dates, or numeric values returned as numbers rather than strings), and validation rules.
Schema-driven output: the LLM is instructed to produce only JSON that complies with the provided schema.
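The ingredients of a contextual prompt can be assembled programmatically. The sketch below assumes a hypothetical invoice schema (`invoice_number`, `issue_date`, `total`) and a hypothetical `build_prompt` helper; the wording of the instructions is illustrative, not KeXtract's actual prompt. It combines operating instructions, the schema, one worked example, format constraints and validation rules.

```python
import json

# Hypothetical schema for an invoice; field names are illustrative only.
SCHEMA = {
    "type": "object",
    "required": ["invoice_number", "issue_date", "total"],
    "properties": {
        "invoice_number": {"type": "string"},
        "issue_date": {"type": "string", "format": "date"},
        "total": {"type": "number"},
    },
}

# One worked example showing the desired output format.
EXAMPLE_INPUT = "Invoice no. 7, issued 3 March 2024, total EUR 1.250,00"
EXAMPLE_OUTPUT = {"invoice_number": "7", "issue_date": "2024-03-03", "total": 1250.0}


def build_prompt(document_text: str) -> str:
    """Combine instructions, schema, example, format constraints and rules."""
    sections = [
        "Extract the fields defined in the JSON schema below.",
        "Schema:\n" + json.dumps(SCHEMA, indent=2),
        "Example input:\n" + EXAMPLE_INPUT,
        "Example output:\n" + json.dumps(EXAMPLE_OUTPUT),
        "Format constraints: dates as ISO 8601 (YYYY-MM-DD); "
        "amounts as JSON numbers, not strings.",
        "Validation rules: include every required field; "
        "do not invent values that are not in the document.",
        "Respond with JSON only, with no text before or after it.",
        "Document:\n" + document_text,
    ]
    return "\n\n".join(sections)


prompt = build_prompt("Invoice no. 42, issued 5 May 2024, total EUR 99,90")
print(prompt)
```

The example pair shows the model a European-formatted amount (`1.250,00`) converted to a plain JSON number and a written date converted to ISO form, which is the behaviour the format constraints ask for.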
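Instructing the LLM to emit schema-compliant JSON is usually paired with a check on the application side before the output is accepted. The sketch below is a deliberately small, standard-library-only validator for a hypothetical invoice schema; it understands only `required`, `string`/`number` types and an ISO `date` format. A production system would more likely use a full JSON Schema validator such as the `jsonschema` Python package.

```python
import json
from datetime import date
from typing import List

# Hypothetical schema, reduced to the keywords this checker understands.
SCHEMA = {
    "required": ["invoice_number", "issue_date", "total"],
    "properties": {
        "invoice_number": {"type": "string"},
        "issue_date": {"type": "string", "format": "date"},
        "total": {"type": "number"},
    },
}

TYPES = {"string": str, "number": (int, float)}


def validate(raw: str, schema: dict) -> List[str]:
    """Return a list of problems; an empty list means the output is accepted."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    errors = [f"missing field: {k}" for k in schema["required"] if k not in data]
    for key, value in data.items():
        spec = schema["properties"].get(key)
        if spec is None:
            errors.append(f"unexpected field: {key}")
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, TYPES[spec["type"]]) or isinstance(value, bool):
            errors.append(f"{key}: expected {spec['type']}")
        elif spec.get("format") == "date":
            try:
                date.fromisoformat(value)
            except ValueError:
                errors.append(f"{key}: not an ISO date")
    return errors


good = '{"invoice_number": "42", "issue_date": "2024-05-05", "total": 99.9}'
bad = '{"invoice_number": "42", "issue_date": "05/05/2024", "total": "99,90"}'
print(validate(good, SCHEMA))  # []
print(validate(bad, SCHEMA))   # ['issue_date: not an ISO date', 'total: expected number']
```

Returning a list of problems rather than raising on the first one makes it easy to reject the response, or feed the full error list back into a retry prompt, whichever policy the system adopts.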

Compliance and security of visual models

EU and AI Act compliance: the visual models used are designed and evaluated for use in the European Union and comply with the security and transparency requirements of the AI Act. This includes risk assessments, mitigation measures for sensitive scenarios, and technical documentation for audits.
Privacy by design: the pre-processing stage and the rest of the pipeline comply with the data-processing confidentiality policy. Documents are processed in temporary storage, references to processed documents are removed within the guaranteed timeframes, and the returned output is limited to the data that is actually needed.
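Two of the privacy-by-design principles above, temporary storage and output limited to the necessary data, can be sketched with the Python standard library. This is an assumption-laden illustration, not KeXtract's implementation: `process_privately`, `ALLOWED_FIELDS` and `toy_extract` are hypothetical, and the sketch deletes the working files as soon as extraction finishes, whereas the actual retention timeframes are set by the confidentiality policy.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical whitelist: the only fields the caller is entitled to receive.
ALLOWED_FIELDS = {"invoice_number", "issue_date", "total"}


def process_privately(document_bytes: bytes,
                      extract: Callable[[Path], dict]) -> dict:
    """Run extraction in temporary storage and return only whitelisted fields."""
    # The directory and its contents are deleted when the block exits,
    # including when extract() raises an exception.
    with tempfile.TemporaryDirectory() as workdir:
        path = Path(workdir) / "input.pdf"
        path.write_bytes(document_bytes)
        result = extract(path)
    # Data minimisation: drop anything that was not requested.
    return {k: v for k, v in result.items() if k in ALLOWED_FIELDS}


seen = {}


def toy_extract(path: Path) -> dict:
    # Stand-in for the real pipeline; it also returns a field we must not leak.
    seen["path"] = path
    return {"invoice_number": "42", "total": 99.9, "customer_iban": "not-for-output"}


output = process_privately(b"%PDF-1.7 fake content", toy_extract)
print(output)                 # {'invoice_number': '42', 'total': 99.9}
print(seen["path"].exists())  # False
```

Using a context manager ties cleanup to the scope of the processing step, so a failure halfway through extraction does not leave a copy of the document behind.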

Start working with KeXtract™

Request access to KeXtract™


Concrete benefits for your business:

Reduction of hallucinations

Thanks to pre-processing that filters visual noise and reconstructs the document.

Greater accuracy on complex tables and layouts

Thanks to pre-processing models trained specifically for document handling, and to extraction-oriented operating instructions (prompts) for the LLM.

Operational efficiency

Less human intervention for corrections, higher throughput, and optimised costs.

Request a demo of KeXtract™

Do you have any questions? Write to us at info@kextract.it