‘Know Your Customer’ (KYC) compliance refers to a legally mandated, globally developed set of guidelines ensuring that banks, financial institutions (FIs), and other related enterprises perform due diligence on potential customers. The purpose is to verify a customer’s identity and to assess the legitimacy of, and risk involved in, developing and maintaining a business relationship.
By verifying customer identities according to the strict rules-based system of the Bank Secrecy Act of 1970, KYC compliance rules reduce instances of fraud and money laundering. Businesses comply with KYC to meet local business standards, mitigate the risk of fines and penalties, and avoid reputational damage.
Failing to comply with KYC and anti-money laundering (AML) rules carries stiff penalties. Lengthy audits, multi-million-dollar fines, and bans on conducting business in specific countries or regions are all possible consequences.
To help business leaders navigate the hazardous, turbulent waters of KYC compliance, this article examines the common compliance difficulties of banks, FIs, and other enterprises. To provide insight into potential solutions, we present several use cases, covering the business problems, the AI used, the workflow changes, and the resulting business outcomes.
These use cases include:
- Automated cross-referencing and application processing: Reducing processing time and costs in expediting KYC customer data verification.
- 10-K data extraction: Accelerating labeling workflows to shorten customer onboarding and reduce labor costs and time spent.
- Identifying customer relationships: Analyzing transactions between banking customers to decrease the time and cost to perform a KYC review.
The Business Costs of Know Your Customer Compliance
As the true business costs of KYC are diverse and extensive, it is important for enterprise leaders to understand both the monetary and opportunity costs of traditional, mostly manual, KYC compliance processes. The business costs of KYC compliance can be classified into direct and indirect costs; the latter being the result of inefficient processes due to antiquated technologies.
Per a report on IT operations from financial services research and advisory firm Celent, FIs spent approximately $37.1 billion on AML-KYC compliance functions. Elsewhere in Celent research, KYC compliance is described as among the riskiest and most inefficient of all banking operations due to a lack of quality data and automation.
The primary KYC compliance challenge for most banks is usually one of the following:
- Combining outdated technology with disparate internal and external data sources.
- The absence or severe lack of high-quality data and automation.
The results are similar: data that are difficult, if not impossible, to locate, integrate, analyze, or use. The pain is especially acute during a client’s “KYC check.”
A typical corporate KYC check process involves gathering, identifying, and validating various company or individual data. These data include client ID, wealth/net worth, funding sources, corporate subsidiaries, and more. Verifiers must cross-reference these data across sources to ensure customer truthfulness and information accuracy.
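To make the cross-referencing step concrete, here is a minimal, hypothetical sketch of field-by-field verification between an applicant’s submission and an internal reference record. The field names and the normalization rule are illustrative assumptions, not any institution’s actual checks.

```python
# Illustrative sketch (not a real institution's implementation): compare a
# KYC applicant's self-reported data against an internal reference record.

def normalize(value: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't flag."""
    return " ".join(value.lower().split())

def cross_reference(application: dict, internal_record: dict) -> list:
    """Return the list of fields where the two sources disagree."""
    mismatches = []
    for field in ("name", "date_of_birth", "address", "funding_source"):
        app_val = normalize(application.get(field, ""))
        ref_val = normalize(internal_record.get(field, ""))
        if app_val != ref_val:
            mismatches.append(field)
    return mismatches

application = {"name": "Acme Ltd ", "date_of_birth": "1990-01-01",
               "address": "1 Main St", "funding_source": "Revenue"}
record = {"name": "acme ltd", "date_of_birth": "1990-01-01",
          "address": "1 Main St", "funding_source": "revenue"}
print(cross_reference(application, record))  # → []  (all fields agree)
```

In practice each field would be checked against multiple sources (ID registries, sanctions lists, corporate registries), which is what makes the manual version of this process so labor-intensive.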
The absence of high-quality data and automation means:
- (a) Increased security vulnerability
- (b) Cumbersome, time-consuming assessments of corporate subsidiaries, shareholders, and structures.
The byproduct of effect (a) is that skilled fraudsters can exploit vulnerabilities by constructing networks of front firms and corporate structures. The growing complexity often allows the fraudster to ultimately avoid detection.
The byproduct of (b) shows up on the balance sheet. The volume of work that KYC/AML compliance mandates often translates to high expenses. The additional labor costs and technology spend are particularly significant.
Besides the significant expenses, opportunity costs for banks and FIs exist. These include:
- Lost customers
- Reduced productivity
- Low-value-adding work
- Stunted business growth.
The last is of particular concern. Prospective customers in competitive environments may lose patience during a lengthy onboarding process and take their business elsewhere. As such, vendors that offer KYC automation solutions often list “customer satisfaction” as a key benefit of their respective solutions.
AI and machine learning can augment or automate KYC compliance processes, possibly reducing some of the aforementioned direct and indirect costs.
We begin by discussing how one fintech uses a combination of machine learning components to integrate data silos, extract form data, and cross-verify data across various internal and external data stores.
Use Case #1: Automated Cross-Referencing and Application Processing
Datametica is a software company that offers automation, cloud, machine learning, and data warehouse migration solutions.
According to the case study, the client is a fintech firm that issues KYC acknowledgment letters. The firm’s clients transfer KYC-required customer information and assorted documentation. The firm then cross-references this data with its in-house records. The client eagerly awaits the acknowledgment letter, as the business transaction cannot proceed until it arrives. There is a lot at stake for both the receiving and the issuing firms.
Before implementing the solution, the client’s workflow involved several manual processes. This included the manual acquisition of the application as well as the identity and proof of address documentation.
In the Datametica webinar below, presenters explain the company’s views on the essentials of a KYC compliance solution based on available technological capabilities. The relevant section begins at the 34:31 mark and lasts approximately one minute:
The manual process carried over to identifying, matching, and verifying application details across internal data stores and sources before issuing the acknowledgment letter. The case study report describes the process as time-consuming and costly.
Complicating the problem was an inability to scale to meet customer requirements. Due to an influx of KYC requests and the difficulty of scaling manual processes, Datametica claims the fintech firm could not satisfy the terms and conditions of its service-level agreements (SLAs).
On the extraction and processing side, the case study report also claims the fintech’s leaders were concerned about the potential for human error in several manual functions, including:
- Processing backend data
- Data extraction
- Correlating customer data across sources and documents (e.g., application data versus some KYC document or database).
The case study further indicates the presence of a data bottleneck, making it difficult for the fintech firm to accommodate different application types from a single distribution point. A key influencing variable in this bottleneck was the 150+ data providers, each with their own KYC applications, documents, and supporting formats.
To overcome these challenges, Datametica states that they automated the end-to-end reception and verification of KYC applications and associated data via a machine learning model equipped with deep learning capabilities. The company claims that the solution can extract data from any assortment of digital KYC applications and forms using a single CVL client.
The case study reports integrating the following solutions, inputs, and outputs:
- An OCR deep learning image-processing model built on a custom computer vision and OCR codebase: to extract applicant information from printed forms and KYC documents
- An integrated data pipeline: a central data repository for easier cross-referencing of application information against KYC documents and databases
- An image processing pipeline: to retrain the model on tagged supporting documents
- A validation and classification model: to identify new data points in KYC forms and verify them against the client’s metadata.
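Based on the components listed above, the overall pipeline shape (extraction feeding a central repository, followed by validation) can be sketched as follows. The `extract_fields` stand-in and `Repository` class are assumptions for illustration, not Datametica’s implementation.

```python
# Hypothetical sketch of the reported pipeline shape: an extraction step
# feeding a central repository, then validation against stored reference data.

def extract_fields(document_text: str) -> dict:
    """Stand-in for the OCR/CV extraction step: parse 'key: value' lines."""
    fields = {}
    for line in document_text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip().lower()] = value.strip().lower()
    return fields

class Repository:
    """Toy central repository holding reference KYC records keyed by client ID."""
    def __init__(self):
        self.records = {}

    def store(self, client_id: str, record: dict) -> None:
        self.records[client_id] = record

    def validate(self, client_id: str, extracted: dict) -> dict:
        """Return extracted fields that disagree with the stored reference."""
        reference = self.records.get(client_id, {})
        return {k: v for k, v in extracted.items()
                if reference.get(k) not in (None, v)}

repo = Repository()
repo.store("C-001", {"name": "jane doe", "country": "us"})
doc = "Name: Jane Doe\nCountry: UK"
print(repo.validate("C-001", extract_fields(doc)))  # → {'country': 'uk'}
```

The real system would replace the parser with trained computer vision and OCR models and the toy dictionary with production data stores, but the flow of extract, centralize, and cross-verify is the same.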
The case study does not reveal the specifics of before and after workflow changes. However, we may safely conclude the following workflow modifications (assuming the information and reported results within the case study are accurate):
- Potentially significantly faster and easier access to data sources, data extraction, and cross-verification
- Potentially significantly less manual tagging and labeling of data
- Potentially a significantly higher automation-to-human throughput ratio
Datametica reports the fintech client was able to achieve the following results using their solution:
- 75% reduction in operational costs from reduced manual processes, model implementation, and automated file classification
- 66% faster KYC application processing
- 85% accuracy in the automated verification process
- Easier scalability with less effort
Use Case #2: 10-K Data Extraction for Customer Verification
Snorkel AI is a software company that produces solutions focused on accelerating AI applications for its clients via a patented automated data labeling method. The company has coined the term “programmatic labeling” to describe this method.
Snorkel’s client was reportedly a top-3 US bank, though no further details are given.
Before implementing Snorkel’s solution, the bank manually extracted data from 10-K forms. The length of 10-K reports — up to 300 pages — made the manual mining of these data time-consuming and onerous. The bank reported that this method lengthened the onboarding process, costing time and money.
Manual extraction and labeling of training data is often a slow process that requires a large team of data scientists and domain experts. Labor costs and time consumption are two of the more common complaints of business leaders here.
Snorkel AI offers a platform called Snorkel Flow, which the company claims can help businesses accelerate labeling using machine learning.
The platform uses what Snorkel dubs “programmatic labeling,” defined as “noisy, programmatic rules and heuristics that assign labels to unlabeled training data.” These attributes describe weak supervision machine learning, which appears to be at the center of the Snorkel Flow value proposition.
The Snorkel Flow value proposition. (Source: Snorkel AI)
To understand some of the content within this case study, it is necessary to quickly define supervised machine learning and why it is sometimes not an ideal solution from a business perspective.
In short, supervised machine learning requires mapping input data to outputs via manually labeled examples. For the enterprise, this process is expensive in both time and money: it is slow and requires a team of data scientists and, often, domain experts.
The banking client reported the following quantitative operational problems with its KYC functions:
- Labor costs: 300-500 KYC analysts necessary to manually extract data
- Time spent vs. volume: 30-90 minutes spent manually reviewing a single 10-K report, with 10,000+ reports analyzed every year
An automated extraction solution centered around Snorkel Flow was constructed. To meet client requirements, Snorkel worked with the bank to custom-build its solution.
A key reason Snorkel recommended the Flow solution was the software’s programmatic labeling ability. Snorkel states that programmatic labeling improves on traditional methods by using labeling functions to enable large-scale labeling instead of one-by-one tagging, expediting the process.
The end-user workflow appears to be as follows:
- Data integration: The client integrates the platform with its data stores using APIs.
- Writing labeling functions: Users create labeling functions in this phase to represent different weak supervision sources, such as patterns, heuristics, outside knowledge bases, and other organizational resources.
- Modeling relationships: User-provided labeling functions are combined and weighted to develop a generative model that estimates their accuracies and correlations.
- Model training: The model is trained using a set of probabilistic labels generated by the software.
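To illustrate the weak-supervision idea behind these steps, here is a toy sketch in which heuristic labeling functions vote on unlabeled text, and a simple majority vote stands in for the generative label model. The labeling functions and the aggregation rule are illustrative assumptions, not Snorkel Flow’s internals.

```python
# Simplified illustration of programmatic labeling in the weak-supervision
# style described above (not Snorkel Flow's actual implementation).
# Each labeling function votes POSITIVE, NEGATIVE, or ABSTAIN on an
# unlabeled snippet; votes are aggregated into a training label.

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

def lf_mentions_total_assets(text: str) -> int:
    """Heuristic: sentences naming 'total assets' likely carry the attribute."""
    return POSITIVE if "total assets" in text.lower() else ABSTAIN

def lf_boilerplate(text: str) -> int:
    """Heuristic: forward-looking-statement boilerplate is not a data point."""
    return NEGATIVE if "forward-looking" in text.lower() else ABSTAIN

def majority_label(text: str, lfs) -> int:
    """Aggregate non-abstaining votes by majority — a crude stand-in for the
    generative model that weights functions by estimated accuracy."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return POSITIVE if sum(votes) > len(votes) / 2 else NEGATIVE

lfs = [lf_mentions_total_assets, lf_boilerplate]
print(majority_label("Total assets were $2.1B at year end.", lfs))  # → 1
```

Writing a handful of such functions labels thousands of snippets at once, which is why this approach scales where one-by-one manual tagging does not.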
The case study does not provide specifics on which method was used to train the model. However, we can make some safe assumptions given the bank’s size.
The model was likely custom-trained (not using one of the five model frameworks or AutoML), given its status as a “top-3” bank. We may also deduce that its asset holdings, intellectual property, and data science resources are significant enough to demand a more resource-intensive, technically-rigorous solution.
Concerning the input and output data, we know from a Snorkel-sponsored webinar that the input data consisted of a dataset of unstructured, multi-format 10-K reports. The software extracts this unstructured data using programmatic labeling. The output is a database comprising the key attributes of the customer. The above-cited webinar reports the following output data:
- Company name
- Nature of business
- Key senior managers
- Total assets
- Other attributes (15-20)
The business outcomes as reported by Snorkel:
- 89+% model accuracy
- 10,000 labor hours saved per year, equivalent to $500,000
Use Case #3: Identification and Tracking of Beneficial Owners
Quantexa is a London-based software company that produces decision intelligence software for banks and other enterprises.
The company produces a solution called Contextual Decision Intelligence (CDI) that it claims enables businesses to improve decision-making by mapping and displaying contextual relationships between data using machine learning.
A Product Owner at ABN-AMRO, Paul Westrate, discussed the use case in a video call with an analyst from Celent.
In the call, Westrate discusses the business reasons behind the partnership with Quantexa. He lists the time-consuming nature of financial crime investigations, high operational costs, and evolving compliance requirements as the three main drivers behind the partnership and the sought-after solution.
More specifically, ABN-AMRO leaders sought an automated solution that could:
- Reduce the labor required in manual data gathering and analysis
- Reduce the time spent distinguishing legitimate activity from genuinely suspicious activity through automation
- Combine internal and external data sources, group companies into hierarchies, and gain insight into their relationships.
Quantexa lists several components of the platform, including the core platform, underlying platform capabilities, and the underlying technology (see below). All of these provide insight into what the workflow may look like for the bank’s end-user.
Visualization of the components and capabilities of Quantexa’s Contextual Decision Intelligence Platform (Source: Celent)
The bank’s end-user first connects their machine to the platform via server or cloud and selects their internal and external data via an API. Among the data points within internal sources are:
- Customer/Company data
- Account information
- Transaction details
- Alerts and cases
Among the data points within external data are:
- Company structures
- Ultimate Beneficial Owner (UBO) data
Following data integration, the entity resolution engine creates a single view of the integrated data. An existing data schema is then used to infer, configure, parse, and standardize potential linking attributes.
Network generation then links entities (i.e. customers/companies) into networks that may demonstrate some connection.
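As a rough illustration of this step, the sketch below links toy entities that share a standardized attribute (such as an address or a beneficial owner) and then groups them into connected components, each a candidate network for investigators. The records, attributes, and linking rule are hypothetical, not Quantexa’s algorithm.

```python
# Illustrative sketch of network generation: link entities sharing a
# standardized attribute, then group them into connected components.
from collections import defaultdict

entities = {
    "E1": {"address": "1 main st", "ubo": "j. smith"},
    "E2": {"address": "1 main st", "ubo": "k. jones"},
    "E3": {"address": "9 oak ave", "ubo": "k. jones"},
    "E4": {"address": "5 elm rd",  "ubo": "m. chan"},
}

# Index entities by each (attribute, value) pair they carry.
by_value = defaultdict(set)
for eid, attrs in entities.items():
    for key, value in attrs.items():
        by_value[(key, value)].add(eid)

# Any two entities sharing an attribute value become neighbors.
adjacency = defaultdict(set)
for group in by_value.values():
    for a in group:
        adjacency[a] |= group - {a}

def components(nodes, adjacency):
    """Depth-first traversal to collect connected components."""
    seen, result = set(), []
    for node in nodes:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            cur = stack.pop()
            if cur in comp:
                continue
            comp.add(cur)
            stack.extend(adjacency[cur] - comp)
        seen |= comp
        result.append(comp)
    return result

# E1-E2 share an address, E2-E3 share a UBO, E4 stands alone:
# two networks, {E1, E2, E3} and {E4}.
print(components(entities, adjacency))
```

A production entity resolution engine adds fuzzy matching, confidence scoring, and schema standardization on top of this basic link-and-group idea.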
The output is a GUI of identified networks and highlighted risk areas for investigators. The display includes the most relevant connections, entities, and data links between ABN-AMRO customers. These data may include party and counterparty names, relationships, and transactions.
From this output, the end-user can then prepare analytic models and perform data exploration and visualization. This output reportedly helped the bank to understand, recognize, and counteract risks and threats and potentially enable more informed, accurate, and consistent investigations and decision-making.
Unfortunately, there are no publicly reported quantitative benefits realized by ABN-AMRO. However, the following qualitative results were reported by a case study and the above-cited webinar, respectively:
- Pinpointing and tracking disclosed and undisclosed beneficial owners and their associations
- Promoting effective customer risk evaluations
- Streamlining customer due diligence processes
Mr. Westrate also listed a couple of other benefits realized from the solution in the above-cited webinar:
- Reduction in time spent gathering and understanding data and information
- Improvement in the overall client experience