Uber recently described a GenAI-powered invoice processing system that reduced manual effort by 2x, cut handling time by 70%, and delivered 25–30% cost savings. By leveraging GPT-4 and a modular platform called TextSense, Uber improved data accuracy by 90%, enabling globally scalable, efficient, and highly automated financial operations.
The company replaced its legacy Robotic Process Automation (RPA) and Rule-Based Systems (RBS) with Generative AI (GenAI) to address growing complexity and inefficiencies. Uber engineers state that traditional tools "need more adaptability and intelligence to handle the diverse and dynamic nature of invoice formats," making them ill-suited for Uber's scale. At the same time, the GenAI solution offered the flexibility to "adapt to new and diverse invoice formats without requiring manual rule-setting," enabling greater automation and resilience as Uber's operations scaled globally.
RPA-based invoice submission process (source)
To power this transformation, Uber built TextSense, a modular, scalable document processing platform that is the backbone of its GenAI invoicing solution. TextSense "abstracts all the above processes and serves as a versatile utility for extracting text from various document types, not just invoices." Designed with configurability at its core, it integrates Optical Character Recognition (OCR), Large Language Model (LLM) based extraction, and post-processing through reusable components, enabling rapid onboarding of new formats with "simple configuration changes" rather than code rewrites.
GenAI-powered document processing pipeline (source)
This architecture allows Uber to flexibly scale document processing across global use cases—spanning over 25 languages, various formats, and even handwritten or scanned documents—while maintaining accuracy and operational efficiency. As the team explains, "Uber works with thousands of suppliers, each using varying invoice templates and formats," and their solution needed to "handle low-resolution scans and handwritten texts" while delivering consistent, structured output.
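The staged design described above (OCR, LLM-based extraction, post-processing, all driven by configuration) can be illustrated with a minimal sketch. Nothing here is taken from TextSense itself: the stage names, the `run_pipeline` function, the config layout, and the stand-in OCR and LLM functions are all hypothetical, chosen only to show how a new document type might be onboarded by changing a config rather than code.

```python
from typing import Callable, Dict

# Hypothetical stage signature: each stage takes the pipeline state and returns a new one.
Stage = Callable[[dict], dict]


def fake_ocr(state: dict) -> dict:
    # Stand-in for a real OCR engine; the "document" here is already plain text.
    return {**state, "text": state["document"].strip()}


def fake_llm_extract(state: dict) -> dict:
    # Stand-in for an LLM call: pick "key: value" lines for the configured fields.
    wanted = state["config"]["fields"]
    found = {}
    for line in state["text"].splitlines():
        key, sep, value = line.partition(":")
        name = key.strip().lower()
        if sep and name in wanted:
            found[name] = value.strip()
    return {**state, "fields": found}


def post_process(state: dict) -> dict:
    # Example of business logic: normalize the total into a number.
    fields = dict(state["fields"])
    if "total" in fields:
        fields["total"] = float(fields["total"].replace(",", ""))
    return {**state, "fields": fields}


# Registry of reusable components, looked up by name from the config.
STAGES: Dict[str, Stage] = {
    "ocr": fake_ocr,
    "extract": fake_llm_extract,
    "post_process": post_process,
}


def run_pipeline(document: str, config: dict) -> dict:
    state = {"document": document, "config": config}
    for name in config["stages"]:
        state = STAGES[name](state)
    return state["fields"]


# Onboarding another document type would mean adding another config like this one.
INVOICE_CONFIG = {
    "stages": ["ocr", "extract", "post_process"],
    "fields": ["invoice_number", "total"],
}

sample = "Invoice_Number: INV-001\nTotal: 1,250.50\nNotes: thanks"
result = run_pipeline(sample, INVOICE_CONFIG)
print(result)
```

In this sketch, supporting a new format amounts to supplying a different list of stages and fields, which mirrors the "simple configuration changes" idea in spirit only.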
Uber's approach combines GenAI with Human-in-the-Loop (HITL) review to ensure high accuracy and maintain human oversight where needed. A purpose-built UI enables operators to compare the extracted data and the original PDF side by side, accelerating validation through intuitive design. This interface "enables the user to review all the details with simple eye movements compared to hand movements, fast-tracking the review process," while "multiple alerts and soft warning messages" help catch inconsistencies without overwhelming the user.
HITL operator UI (source)
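As an illustration of how soft warnings might be produced for a reviewer, the sketch below runs a few consistency checks on extracted data and returns messages instead of rejecting the invoice. The function name `review_warnings`, the field names, and the specific checks are assumptions made for this example; Uber's article does not describe its actual rules.

```python
from decimal import Decimal
from typing import List


def review_warnings(invoice: dict, tolerance: Decimal = Decimal("0.01")) -> List[str]:
    """Return non-blocking warnings for a human reviewer (hypothetical checks)."""
    warnings = []

    # Flag required header fields that the extraction step left empty.
    for name in ("invoice_number", "total"):
        if not invoice.get(name):
            warnings.append(f"missing field: {name}")

    # Compare the header total with the sum of the line-item amounts.
    line_sum = sum(
        (Decimal(item["amount"]) for item in invoice.get("line_items", [])),
        Decimal("0"),
    )
    total = invoice.get("total")
    if total and abs(Decimal(total) - line_sum) > tolerance:
        warnings.append(f"line items sum to {line_sum}, header total is {total}")

    return warnings


extracted = {
    "invoice_number": "INV-001",
    "total": "100.00",
    "line_items": [{"amount": "60.00"}, {"amount": "30.00"}],
}
print(review_warnings(extracted))
```

Amounts are kept as `Decimal` rather than `float` so that the comparison is not affected by binary rounding, a common choice for monetary values.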
In evaluating different language models for invoice extraction, Uber compared fine-tuned open-source models like Flan T5 and LLaMA 2 against proprietary solutions. While open-source models performed well on header-level fields, they struggled with line-item consistency, often showing a 25–30% drop in accuracy beyond the first line. GPT-4, on the other hand, delivered superior accuracy across both header and line-level fields with minimal tuning. Uber's engineers explain:
Even though GenAI wasn't adept at detecting our existing invoice data patterns, it was very good at predicting what was available in the documents. So, for our data pipeline, we predicted all the details required from an invoice and built a post-processing layer to apply any business logic before showing it to the user for HITL review.
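To make the line-level comparison in the evaluation above concrete, the following sketch computes exact-match accuracy per line-item position, which is one way a drop "beyond the first line" could be measured. The metric definition, the function name, and the toy data are assumptions for illustration and are not Uber's evaluation method or results.

```python
from collections import defaultdict
from typing import Dict, List


def line_accuracy_by_position(
    predicted: List[List[dict]], expected: List[List[dict]]
) -> Dict[int, float]:
    """Fraction of exactly matching line items at each position (0 = first line)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for pred_lines, gold_lines in zip(predicted, expected):
        for pos, gold in enumerate(gold_lines):
            totals[pos] += 1
            # A missing or different predicted line counts as a miss.
            if pos < len(pred_lines) and pred_lines[pos] == gold:
                hits[pos] += 1
    return {pos: hits[pos] / totals[pos] for pos in sorted(totals)}


# Toy data only: two invoices with two line items each.
expected = [
    [{"desc": "rides", "amount": "10.00"}, {"desc": "fees", "amount": "2.00"}],
    [{"desc": "meals", "amount": "25.00"}, {"desc": "tips", "amount": "5.00"}],
]
predicted = [
    [{"desc": "rides", "amount": "10.00"}, {"desc": "fees", "amount": "2.00"}],
    [{"desc": "meals", "amount": "25.00"}, {"desc": "tips", "amount": "50.00"}],
]
accuracy = line_accuracy_by_position(predicted, expected)
print(accuracy)
```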
The models referenced in Uber's article are now considered somewhat dated, as newer models with multimodal capabilities, like GPT-4o, Claude 3.7, and Llama 4, are available today. Uber engineers do not specify when they conducted the model evaluations. InfoQ reached out to Uber for clarification regarding the timeline and whether newer models have since been evaluated, but no response was received at the time of publication.