Case Study

AI-Powered Data Extraction for Leasing Documents

Softwise.AI designed and delivered an automated data extraction system for Europejski Fundusz Leasingowy (EFL), one of Poland's leading leasing companies and part of the Credit Agricole Group. Built on Azure and large language models, the solution replaces a manual monthly process of reviewing around 100 leasing documents with an end-to-end pipeline that turns a raw batch into a complete, standardized report in minutes - while meeting the strict banking, RODO, and group-level compliance requirements.

CLIENT

EFL

YEAR

2026

INDUSTIRES

Financial Services Leasing

DELIVERABLES

AI Data Extraction Pipeline, Azure Architecture, LLM Prompt Engineering, Production Operationalization

The problem

EFL operates in a highly competitive and regulated leasing environment, where ongoing market monitoring requires regular analysis of competitor and partner documentation.

Until now, that monitoring meant a manual monthly process: reviewing around 100 PDF documents by hand, copying out the key fields, and assembling the results into a report. The work consumed many hours of expert time, limited how often it could be done, and was inherently prone to human error.

EFL wanted to remove the manual bottleneck without compromising on the standards required by its banking-group governance: alignment with internal AI, cloud, and security committees, plus RODO obligations including anonymization of personal data. Softwise.AI was asked to design and deliver a production-grade system that would automatically extract structured financial data from leasing documents, hand back standardized outputs, and pass the organisation’s full compliance review.

{Key Challenges}
Why template-based OCR failed across 40+ document formats?

The project had to solve a set of interlocking problems that ruled out a simple off-the-shelf or template-based approach:

Manual, slow competitive monitoring

Monthly analysis of around 100 documents was performed by hand. The process was time-consuming, error-prone, and impossible to scale or run more frequently

Heterogeneous, messy input data

Documents from more than 40 sources arrived in very different formats, layouts, and terminologies - often as scanned or photographed pages - making traditional template-based OCR brittle and expensive to maintain

Strict security and compliance requirements

As part of a banking group, the solution had to satisfy internal AI, cloud, and security committees as well as RODO, including anonymization of personal data and controlled access to cloud resources

Production-grade robustness from day one

The system had to go well beyond prototype: controlled retries, quality monitoring, predictable processing times, and clear operational procedures for both business users and IT.

{Process}
How we built the data extraction pipeline in 5 steps?

Softwise.AI delivered the engagement as a focused build, designed to move from a real-world problem to a working production pipeline without losing momentum on compliance approvals.

1. Scoping and data understanding

Working with EFL's analysts, the team reviewed real document samples, identified the fields that actually mattered for monthly reporting, and decided where to focus extraction effort - including a conscious decision to concentrate on first-page content, where the vast majority of relevant information sits.

2. Architecture and security alignment

The pipeline was designed on Azure with EFL's IT, security, and AI governance bodies involved from the start. Approval processes were run in parallel rather than sequentially, and a local anonymization solution was chosen specifically to support audits and pass internal review.

3. LLM extraction and prompt tuning

Rather than rigid templates, the team adopted LLM-based extraction with prompts tuned to leasing terminology. Prompts were iteratively refined against real documents to handle widely varying layouts and language.

4. Robustness engineering

A retry mechanism was added to automatically reprocess problematic files up to four times, with the best result selected by confidence score. Parallelization (up to 10 files at once) and Kafka-based orchestration brought processing time down by 89%.

5. Operationalization

The system was wired into a simple, business-friendly workflow: upload to an Azure Blob inbox, trigger by email, receive results by email. Documentation, monitoring, and operational procedures were delivered alongside the pipeline.

Solution

Softwise.AI delivered a production-ready extraction and analysis pipeline for leasing documents, combining Azure cloud services, large language models, and domain-specific validation logic.

A business workflow built around the analyst

The user uploads documents to a dedicated Azure Blob inbox, then sends a trigger email from an authorized address to start the batch. When processing completes, results are emailed back as JSON files (a short version with extractions and a full version with confidence scores and metadata) plus an Excel file formatted for quick human verification. A monthly batch of raw PDFs becomes a complete report in minutes, with no manual copy-paste or scripting.

Architecture diagram of the document-processing system, showing Azure Cloud Services (AI Foundry and Blob Storage) and Core Application Services (FastStream, API, Kafka and PostgreSQL) connected through an EFL trigger.

LLM-based extraction tuned to the domain

OCR and model inference together extract every required field using prompts engineered for leasing terminology. Mandatory-field checks and confidence scoring run on every output, and the LLM approach handles widely varying real-world layouts that template-based OCR could not.

Designed for predictable production runs

Processing is deliberately batch-oriented and optimized for monthly cycles rather than real-time use. Under the hood, a retry mechanism reprocesses problematic files up to four times and picks the best result by confidence score, while parallel workers keep throughput high.

Compliance and security by design

Local anonymization, controlled cloud access, and full alignment with EFL's AI, cloud, and security committees were not bolted on at the end - they shaped the architecture from the first design session.

{Results}
94% extraction accuracy and 100 files processed in 10 minutes

The pipeline was evaluated against agreed targets and exceeded each of them:

Extraction accuracy - Target 85%, achieved 94% on key data fields across test documents.
Processing time for a 100-file batch - Target under 15 minutes, achieved 10 minutes.
Rejected file rate - Target under 15%, achieved 10.5%.
System availability - Target above 95%, achieved 99%.
Scale headroom - 188 PDFs processed in 13 minutes (around 40 seconds per file with ten parallel workers); the pipeline remains stable well above expected monthly load.

For EFL, the system converts a slow, manual monthly chore into a fast, reliable, and audit-ready capability

{Business impact}
98.75% less manual work for EFL analysts

Manual work eliminated. Specialist time spent on monthly extraction is reduced by an estimated 98.75%, freeing analysts for higher-value work.
Faster market insights. Monthly reports can now be produced in minutes, enabling more frequent and timely reviews of market conditions.
Standardized, analysis-ready data. Every document is transformed into a consistent JSON format, making downstream analytics and integration with the data warehouse straightforward.
Scalability by design. The pipeline can handle at least three times the planned volume without infrastructure changes - room to expand scope without re-engineering.
Higher confidence in decisions. 94% extraction accuracy combined with per-field confidence scores gives EFL a solid factual base for pricing and strategic decisions.

{Key recommendations}
5 rules for automating document-heavy work under compliance

The project also surfaced a set of principles that apply to any organization automating document-heavy processes under regulatory constraint:

Choose LLMs over rigid templates for documents with wildly varying layouts. Prompt tuning scales far better than rule maintenance.
Engineer for production from day one. Retries, confidence-based selection, monitoring, and clear operational procedures matter more than peak accuracy on a clean sample.
Run compliance in parallel, not sequentially. Bringing IT, security, and AI governance in at the start - rather than at the end - keeps group projects on schedule.
Pick batch over real-time when the business case is batch. Optimizing for predictable monthly runs gave EFL faster, cheaper, and more reliable processing than a real-time architecture would have.
Draw a clean boundary on edge cases. Excluding a small number of extremely exotic document types - where the cost of handling outweighs the business value - kept the system economically viable.