How Kextract works

AI models

The document analysis process is designed to convert text and tables from PDFs and complex images into highly reliable structured data.

The pipeline consists of two main phases:

  • Visual and structural preprocessing, performed by custom models trained by Kedos Srl.
  • Semantic extraction, handled by a suitably trained LLM.

This hybrid approach minimises parsing errors and drastically reduces the risk of hallucinations.

STEP 1

Layout analysis and visual parsing (custom model)

  • Objective: understand the document’s visual structure before passing text to an LLM.
  • What the custom model does: primarily identifies text blocks and tables and separates them from any visual artefacts.
  • Advantages: the LLM does not receive the original document, but a reconstructed version containing only the relevant text regions. This reduces token consumption, focuses the model's attention on relevant content, and significantly cuts the risk of hallucinations.
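
The filtering described in this step can be illustrated with a minimal sketch. It assumes the layout model returns a list of labelled regions with bounding-box coordinates; the `Region` type, the label names, and the `rebuild_for_llm` function are hypothetical and are not Kextract's actual interface.

```python
from dataclasses import dataclass

# Hypothetical output of a layout model: one labelled region per detected block.
@dataclass
class Region:
    label: str    # assumed label set, e.g. "text", "table", "logo", "stamp"
    x: float      # left edge of the bounding box
    y: float      # top edge of the bounding box
    content: str  # text recognised inside the region

KEEP = {"text", "table"}  # assumption: only these labels are passed to the LLM

def rebuild_for_llm(regions: list[Region]) -> str:
    """Keep relevant regions and join them top-to-bottom, then left-to-right."""
    relevant = [r for r in regions if r.label in KEEP]
    relevant.sort(key=lambda r: (r.y, r.x))
    return "\n\n".join(r.content for r in relevant)

regions = [
    Region("logo", 10, 5, "ACME"),
    Region("table", 10, 200, "Item | Qty\nBolt | 4"),
    Region("text", 10, 80, "Invoice no. 2024-001"),
    Region("stamp", 300, 400, "PAID"),
]
reconstructed = rebuild_for_llm(regions)
print(reconstructed)
```

In this toy input the logo and the stamp are dropped, so the text handed to the LLM is shorter than the original page and contains only the regions worth extracting from.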

STEP 2

Selective LLM input and prompt engineering

  • Contextual prompting: the LLM receives a specially constructed prompt that includes operating instructions (e.g. ‘extract the fields defined in schema X’), examples of the desired output, format constraints (such as dates in ISO 8601 format and numeric values returned as numbers rather than strings), and validation rules.
  • Schema-driven output: the LLM is instructed to produce only JSON that complies with the provided schema.
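
As a rough illustration of the two points above, the sketch below assembles a prompt and checks a reply against a schema. Everything in it is an assumption made for the example: the schema format (field name mapped to a kind), the `build_prompt` and `validate_output` helpers, and the hard-coded reply that stands in for a real LLM call.

```python
import json
from datetime import date

# Hypothetical schema: field name -> expected kind.
SCHEMA = {"invoice_number": "string", "issue_date": "iso_date", "total": "number"}

def build_prompt(document_text: str, schema: dict, example: dict) -> str:
    """Assemble instructions, format constraints, an example output and the document."""
    return "\n".join([
        "Extract the fields defined in the schema below from the document.",
        "Return only a JSON object that complies with the schema.",
        "Dates must be ISO 8601 (YYYY-MM-DD); numbers must be JSON numbers.",
        "Schema: " + json.dumps(schema),
        "Example output: " + json.dumps(example),
        "Document:",
        document_text,
    ])

def validate_output(raw: str, schema: dict) -> dict:
    """Parse the reply and reject anything that does not match the schema."""
    data = json.loads(raw)  # raises ValueError if the reply is not valid JSON
    if not isinstance(data, dict) or set(data) != set(schema):
        raise ValueError("fields do not match the schema")
    for field, kind in schema.items():
        value = data[field]
        if kind == "string" and not isinstance(value, str):
            raise ValueError(f"{field}: expected a string")
        if kind == "number" and (isinstance(value, bool)
                                 or not isinstance(value, (int, float))):
            raise ValueError(f"{field}: expected a number")
        if kind == "iso_date":
            if not isinstance(value, str):
                raise ValueError(f"{field}: expected an ISO date string")
            date.fromisoformat(value)  # raises ValueError on a malformed date
    return data

example = {"invoice_number": "A-17", "issue_date": "2024-01-31", "total": 120.5}
prompt = build_prompt("Invoice no. 2024-001, issued 5 March 2024, total 99.90",
                      SCHEMA, example)
# Stand-in for the model's reply; no real LLM is called in this sketch.
reply = '{"invoice_number": "2024-001", "issue_date": "2024-03-05", "total": 99.9}'
record = validate_output(reply, SCHEMA)
```

Validating the reply in code, rather than trusting the instruction alone, means that a date such as `05/03/2024` or a total returned as a string is rejected before it reaches downstream systems.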

Compliance and security of visual models

  • EU and AI Act compliance: the visual models used are designed and evaluated for use in the European Union and comply with the security and transparency requirements of the AI Act. This includes risk assessments, mitigation measures for sensitive scenarios, and technical documentation for audits.
  • Privacy by design: preprocessing and the rest of the pipeline comply with the confidentiality policy for data processing. Documents are processed in temporary storage, references to them are removed within the guaranteed timeframes, and the returned output is limited to the necessary data.
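
One way to picture the temporary-storage and data-minimisation points is the sketch below. It is an assumption-laden illustration, not a description of Kextract's infrastructure: the `process_in_temp_storage` function, the faked extraction result, and the immediate deletion on exit are all hypothetical, and actual retention timeframes are defined by the service's policy.

```python
import tempfile
from pathlib import Path

def process_in_temp_storage(pdf_bytes: bytes, wanted_fields: set[str]) -> dict:
    """Hypothetical handler: the working copy exists only inside a temporary directory."""
    with tempfile.TemporaryDirectory() as workdir:
        working_copy = Path(workdir) / "input.pdf"
        working_copy.write_bytes(pdf_bytes)
        # Placeholder for the real pipeline: a faked extraction result.
        extracted = {
            "invoice_number": "2024-001",
            "total": 99.9,
            "internal_path": str(working_copy),
        }
    # The directory and the working copy are deleted when the `with` block exits.
    # Only the requested fields are returned to the caller.
    return {k: v for k, v in extracted.items() if k in wanted_fields}

result = process_in_temp_storage(b"%PDF-1.4", {"invoice_number", "total"})
print(result)
```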

Start working with Kextract

Do you have any questions? Write to us at info@kextract.it