Extractive vs. Generative AI: Why the Difference is Important for Intelligent Document Processing
Since the launch of ChatGPT in late 2022, artificial intelligence has been at the forefront of conversation across industries, and it has opened up significant opportunities in intelligent document processing (IDP). The term predates the recent AI wave: it describes the intelligent extraction and processing of data from structured and unstructured documents, and AI and ML have always been part of it (hence ‘intelligent’).
This new wave has brought a slew of new terms into the mainstream, well beyond the narrower field of IDP, with the most commonly referenced being Generative AI (GenAI) and LLMs (large language models). Today, however, we are focusing on an emerging term, ‘Extractive AI’, and on why distinguishing between the two is crucial to leveraging AI’s full potential in IDP, so that businesses can not only streamline their operations but also be confident in how they use it.
Generative AI is in some ways easily summed up by a response from one of the main vendors in the space, OpenAI. As their lawyers commented in an ongoing lawsuit,
“By its very nature, AI-generated content is probabilistic and not always factual, and there is near universal consensus that responsible use of AI includes fact-checking prompted outputs before using or sharing them.”
It’s important to remember that the ultimate goal of Generative AI is to provide an answer - and that doesn’t mean that the answer it provides is correct.
Extractive artificial intelligence, on the other hand, focuses on pulling specific, relevant information from various content, acting much more like a sophisticated filter.
Both Generative and Extractive AI work from prompts: plain-text input provided in order to generate a result.
How Generative and Extractive AI Can Work Together in IDP
When it comes to intelligent document processing, these differences mean that there are scenarios in which one may be more appropriate than the other, and others in which it may be beneficial to use both.
Let’s take the example of a new mortgage customer application, a complex yet high-value task for both lenders and applicants, which typically involves handling numerous documents such as IDs, bank statements, credit reports, rental payment history, deeds, land titles, property appraisals, sale agreements and so on. Both Extractive AI and Generative AI can play a role in optimizing this process.
Extractive AI is highly suited to tasks where specific information needs to be found, structured and validated. This is particularly valuable in longer documents, where the data is locked inside and processing it manually would be slow. In our mortgage example, Extractive AI may be used in the early stages of processing to read the documents, extract key information and deliver it as structured data that fits neatly into loan application systems. While we can expect Extractive AI to produce data with high accuracy, other technologies should be applied to arrive at the 100% accuracy and confidence needed for a decision to be made. These technologies are core IDP capabilities and include human-in-the-loop (HITL) review, database validations and AI service validations.
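To make the extract, structure and validate flow more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a real IDP product: the regular expressions stand in for an Extractive AI service, the confidence scores are hard-coded, and the names `Field`, `extract_fields` and `route_fields`, the field labels and the 0.9 threshold are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Field:
    name: str
    value: Optional[str]   # None when nothing was found in the document
    confidence: float      # 0.0 to 1.0; hard-coded here, model-supplied in practice


# Hypothetical field labels for a mortgage application document.
PATTERNS = {
    "applicant_name": r"Applicant:\s*(.+)",
    "annual_income": r"Annual income:\s*\$?([\d,]+)",
    "account_number": r"Account number:\s*(\d{8,12})",
}


def extract_fields(text: str) -> List[Field]:
    """Toy stand-in for an extractive service: pull named values out of text."""
    fields = []
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            fields.append(Field(name, match.group(1).strip(), 0.95))
        else:
            fields.append(Field(name, None, 0.0))
    return fields


def route_fields(
    fields: List[Field], threshold: float = 0.9
) -> Tuple[List[Field], List[Field]]:
    """Split fields into auto-accepted ones and ones sent to human review."""
    accepted, needs_review = [], []
    for field in fields:
        if field.value is not None and field.confidence >= threshold:
            accepted.append(field)
        else:
            needs_review.append(field)
    return accepted, needs_review


if __name__ == "__main__":
    sample = "Applicant: Jane Doe\nAnnual income: $85,000\n"
    accepted, needs_review = route_fields(extract_fields(sample))
    print("auto-accepted:", [f.name for f in accepted])
    print("human review:", [f.name for f in needs_review])
```

In this sketch the missing account number is not guessed; it is routed to a reviewer, which is the human-in-the-loop step described above. Database or AI service validations would slot in as additional checks before a field is accepted.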
Generative AI, on the other hand, is approximate by design. It is built to create or generate new content, based on underlying patterns in data. In our mortgage processing example, this can complement Extractive AI by facilitating further analysis through chat and by creating personalized communications. Again using our loan example, Extractive AI would be used to extract the information and summarize it for review by a knowledge worker, and Generative AI to create a client letter or recommendation.
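One way to picture the hand-off between the two is a prompt that carries only reviewed, extracted values into the generative step. The sketch below is an assumption about how such a hand-off could look, not a description of any vendor's method: `build_letter_prompt` and the field names are hypothetical, and no model is actually called.

```python
from typing import Dict


def build_letter_prompt(verified: Dict[str, str]) -> str:
    """Build a plain-text prompt for a client letter from verified fields only."""
    # Sort keys so the prompt is stable from one run to the next.
    facts = "\n".join(f"- {key}: {value}" for key, value in sorted(verified.items()))
    return (
        "Write a short, friendly letter to a mortgage applicant.\n"
        "Use only the verified facts below; do not add figures or dates.\n"
        f"Verified facts:\n{facts}\n"
    )


if __name__ == "__main__":
    prompt = build_letter_prompt(
        {"loan_amount": "250,000", "applicant_name": "Jane Doe"}
    )
    print(prompt)
```

The resulting text would be sent to whichever generative model the organization uses. Because generated content is probabilistic, the letter would still be checked before it goes out.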
For an overview of the differences between Extractive and Generative AI, check out our comparison matrix.
This powerful combination not only increases automation rates but also delivers a level of personalization and insight that was previously unattainable.
In summary, the main differences between Extractive and Generative AI lie in their primary functions and outputs. Extractive AI identifies and pulls specific information from existing content; its primary advantage in document processing is that it structures data, creates efficiency and drives accuracy. Generative AI focuses on creating new content, with an understanding of complex contexts and the ability to adapt to various scenarios, which offers the opportunity to improve both the user and the customer experience.
Ultimately, both offer benefits to organizations, but the key lies in orchestrating these services and tools well, so that automated processes are truly robust and deliver superior efficiency and customer experiences.