Simple Applications of NLP in Banking Compliance – with Jan Veldsink of Rabobank

An operational compliance framework is critical for banks and other financial institutions (FIs). However, it is a tremendous challenge to constantly and uniformly comply with regulations, guidelines, and laws, given the complexity and immensity of legislation.

Consider the 2010 Dodd-Frank Act, a comprehensive banking reform measure that created the Consumer Financial Protection Bureau, among other entities and frameworks. This single law contains more than 1,500 provisions and 400 rule mandates that banks and FIs must consider.

The typical compliance efforts used to – and to some extent, still do – involve costly, labor-intensive processes. However, NLP technology is helping automate some of the more tedious and repetitive workflows. The technology is primarily used to expedite routine workflows such as information gathering, analysis, and reporting.

To get an insider view of NLP in banking compliance, we connected with Jan Veldsink, Head of AI and machine learning at Rabobank.

Rabobank is a multinational banking and financial services company based in the Netherlands. The bank has a presence in 38 countries and holds over EUR 600 billion in assets. The bank reported a net profit of EUR 3.7 billion in 2021. Rabobank reports 4,900 employees.

In a wide-ranging 15-minute conversation, Jan and Emerj CEO Daniel Faggella, two AI veterans, pull apart two distinct topics:

  • Simple, near-term use cases of NLP in compliance
  • Where NLP can add more value to banking in the future

Listen to the full episode below, skim our interview takeaways, or read the full transcript below.

Guest: Jan Veldsink, Head of AI for Compliance, Rabobank

Expertise: Applied AI, machine learning R&D, AI compliance, Cybersecurity

Brief Recognition: Jan spent the last two-plus decades in multiple high-level AI roles, including the past twelve years in his present role as Head of AI at Rabobank. Jan also teaches data security and cybercrime at Nyenrode Business University’s EMBA program, where he has taught business classes for the last 22 years.

Key Insights

  • Start with the documents: When starting with NLP, look at your document archive. NLP technology can find much valuable information in these documents, including potentially actionable customer data.
  • Use simple NLP compliance tasks to add value: More straightforward (and readily outsourced) NLP techniques such as text mining and topic extraction can help discover potentially harmful trends, such as new phishing methods and other harmful email content.
  • NLP can help find both questions and answers: When starting out, you often will not know what initial question(s) you want to ask of that data to solve a particular problem. NLP can help you extract data and reframe the question in a way recognizable by the language contained within a set of documents.

Interview Transcript

Dan Faggella: So Jan, we’ll start by asking just where you see NLP having promise, having value in banking today? Obviously, it’s just one branch of AI, but where do you see NLP resonating with business problems?

Jan Veldsink: That’s a good question, Dan. It’s a broad sense. Of course, we start it all off with data. What we have come to conclude is that we have a huge [opportunity with documents]. For example, at Rabobank, we have about 500 million digitally-scanned documents lying around. In that vast amount of documents lies what I think is gold for the bank and compliance.

Dan Faggella: Yeah.

Jan Veldsink: If you look at the 500 million documents, you can let a Google search algorithm or box run through it and find your terms – but I’m not interested in terms as a compliance man. We’re interested in “Okay, what do those documents tell me? What are they about?” 

Yeah, of course, we put them into a document management system, and somebody metadata-ed those documents, okay, wrong. 

Somebody did that, but it didn’t know my question. I think the image, or the projection in the sense of knowing your customer due diligence, is all hidden in those texts. Beforehand, we don’t know what question we want to ask. So the data needs to tell me how to interpret my question and how to find things in that vast amount of documents that can help me get a clearer picture of my customer, customer groups, or whatever. 

So I think, in that sense, that’s a very agnostic way of looking at language because we are all used to the sentiment analysis and the “bag of words” approaches. They are limited because they only give me what I put in a bag of words or my sentiments, but I want to take it a step further and say, “Okay, these are the documents. These are my customers. Do they behave in a way we expect them to? And how do the NLP and the raw texts contribute to the image of the customer?”

Dan Faggella: Yeah, that’s, I mean, clearly, as you had said, that’s going to add more depth, more value than saying, “Oh, are these emails angry? Or are they friendly?” You know, there’s more than this, the singular sorts of things of that kind. Clearly, it poses more challenges to get that kind of value out of documents. 

Like you’ve said, it would seem as though we really need to structure the questions we’re asking. We really need to know what we’re looking for. What does it look like to get that level of insight that’s so much deeper than the surface layer NLP approaches?

Jan Veldsink: Yeah, that’s also a good question because we do not have that now, and so that’s something we are headed towards. We have some students from the University working for us on this question.

And that’s not an easy cookie, and it’s all about context. Because, as you know, AI and machine learning now is very bad in changing context. We know we put up an AI or machine learning in a context, and in that context, it will perform happily, and we can do that. The same with recommendation engine stuff like the old robots. 

They all function within a certain context, and they perform well. And for text it is the same: can I give such an algorithm a context? And that’s from within this context, and that could be a regulation or something like that. Something happens in the outside world, and we think, “Okay, how many customers have information in their documents that can relate to this topic?” 

So there’s a new money laundering scheme – Russian laundering scheme, Panama papers, or whatever. Okay, give my engine, my text engine, these contexts. And can it find, based upon that context, related items and related issues in the texts we have on customers?

Dan Faggella: So clearly, again, part of this is where you want to be heading. If you could summarize or nutshell [where] today – and maybe a lot of it is the light-level AI stuff, NLP stuff – there is still some value to be gleaned. 

You could say today, maybe just in your domain, the compliance domain, where NLP is being used, like a couple of little snippet examples. It could be your bank, or it could be a general insight about where it’s being used. What would you say, in kind of layman’s terms?

Jan Veldsink: Okay, well, what we do now with text analysis, we do, let’s say, for emails, we get a bunch of phishing emails from customers. Is this a phishing email versus Rabobank mail? No, no, it wasn’t. And we do topic extraction. 

Okay, what are the general topics of these emails? And that’s again agnostic because we do not use a bag of words as your set. Okay. Tell me the general topic of these emails. That helps us in finding out: is there a new trend in phishing going on?

Dan Faggella: That’s interesting. Okay. So, more emails might say, “Reset your password,” or phishing like, “There’s been this big legislative thing,” and you need to pick up on that.

Jan Veldsink: Yes, yes. Our assumption is that certain groups do phishing emails. And we now can see, okay, we had just launched a new product. That’s one. Of course, we have legislation, [so] we need to screen all transactions that go abroad for terrorism financing, U.S. sanctions, etcetera. 

So we screen all the transactions and names of customers against those lists. That’s also a form of natural language processing because we see, okay, in this text is the name of a person that is on this list. That’s also something we do that’s very compliance oriented. And that goes for descriptions or remarks in payments, as well as names, places, harbors, vessels, or whatever.
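
[As an illustration of the list screening Jan describes, and not Rabobank's actual system, the following is a minimal Python sketch that checks a free-text payment remark against a watch list. The watch-list entries, the `normalize` and `screen` helpers, and the sample remark are all hypothetical. The sketch only finds whole-phrase matches after normalizing case, accents, and punctuation, so it does not tolerate spelling errors.]

```python
import re
import unicodedata

# Hypothetical watch list for illustration only; real sanctions lists are
# published by regulators and are far larger.
WATCH_LIST = ["Ivan Petrov", "MV Northern Star", "Port of Example"]


def normalize(text):
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", stripped.lower()).strip()


def screen(remark, watch_list=WATCH_LIST):
    """Return the watch-list entries that occur as whole phrases in the remark."""
    padded = f" {normalize(remark)} "
    return [entry for entry in watch_list if f" {normalize(entry)} " in padded]


hits = screen("Payment for cargo on MV Northern-Star, attn. IVÁN PETROV")
print(hits)
```

[Padding both the remark and the entry with spaces keeps a short entry from matching inside a longer, unrelated word.]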

Dan Faggella: And that’s just kind of entity-oriented work there.

Jan Veldsink: Yes, entity extraction. A matching of names. It looks simple. Okay, as a human, I can see my name. If it’s spelled wrong, you can see how it might be wrong, but the machine just says, “I don’t know.” It’s spelled wrong.

Dan Faggella: It needs to be able to match entities. You know, sometimes it’s listed as IBM, sometimes it’s IBM Corp. Sometimes they might spell out the whole old name, International Business Machines or something. We would need a system that maybe can take into account, you know, different types of spellings.

Jan Veldsink: Yeah, that’s the current state of affairs. That’s what we do. I wrote an algorithm that just agnostically looks at two texts and says, okay, they might be related. And the words in those texts might be in different orders, and there might be spelling errors in there, but still, I think those texts are related. So that’s the thing we do now.
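
[Jan does not describe how his algorithm works. As one possible illustration of the idea, the sketch below scores two texts as related even when the words are reordered or misspelled. It uses Python's standard `difflib.SequenceMatcher`; the `relatedness` function, its scoring rule, and the sample names are assumptions made for this example.]

```python
import re
from difflib import SequenceMatcher


def tokens(text):
    """Split a text into lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def best_ratio(token, candidates):
    """Similarity (0..1) between a token and its closest candidate."""
    return max(
        (SequenceMatcher(None, token, c).ratio() for c in candidates),
        default=0.0,
    )


def relatedness(a, b):
    """Average best fuzzy match of the shorter text's tokens in the longer text.

    Word order is ignored, and small spelling errors only lower the score
    slightly instead of breaking the match.
    """
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    short, long_ = (ta, tb) if len(ta) <= len(tb) else (tb, ta)
    return sum(best_ratio(t, long_) for t in short) / len(short)


# Hypothetical names: same person with reordered words and a typo, then an unrelated name.
print(relatedness("Johannes van der Berg", "Berg, Johanes van der"))
print(relatedness("Acme Shipping Ltd", "Berg, Johanes van der"))
```

[In practice, a threshold on the score would decide which pairs go to a human reviewer; where to set it is a trade-off between missed matches and false alarms.]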

Dan Faggella: Got it. So just to dive into both of those quick little examples and we’ll talk a bit about the future and what you’re excited to have NLP be able to do moving forward, maybe a couple of like, you know, visions of where you’d be excited to see things go. 

But just to touch on both of these [use cases] on the entity side, even if it’s very straight and narrow NLP, but there’s still business value to it. Can we scan, whether it’s our documents, the news, some kind of Reuters ticker, or whatever the case may be, for known entities?

Jan Veldsink: Yeah, we do that.

Dan Faggella: But bankers might ask themselves, “Is there value we would gain by scanning those documents to be able to find information about entities that maybe we do business with, or people that maybe we do business with?” 

Can we have some listen-to notifier of that so we can skim through those and look for compliance issues or things we should know about those customers? So there’s NLP low-hanging fruit there.

Jan Veldsink: Yes.

Dan Faggella: Certainly. The other [use case] is that we cluster the kinds of terms and topics and summarize those from individual messages. We know this could be done at a document level, but you were talking about it being done at a phishing email level, where there’s some way of sorting through those. 

Now, do those have to be labeled in some way? In other words, does a human being have to look at those phishing emails – let’s say 10, 20, or 200 emails – so that the machine knows to put it in that folder moving forward?

Jan Veldsink: No, no. We just made [the algorithm] agnostic. The topic extraction algorithm is just agnostic. Again, an agnostic, unsupervised learning method that just said, “Okay, give me statistically what’s in those emails and tell me the most important topics.” 
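
[To make the idea of unsupervised topic extraction concrete, here is a minimal Python sketch that needs no labeled emails. It ranks terms by TF-IDF weight summed over the whole corpus, which is a much simpler stand-in for a full topic model and is not the algorithm Rabobank uses. The stop-word list, the `top_terms` function, and the sample emails are assumptions made for this example.]

```python
import math
import re
from collections import Counter

# Small hand-picked stop-word list, for illustration only.
STOPWORDS = {
    "a", "an", "and", "the", "to", "your", "you", "is", "has", "been",
    "for", "of", "our", "we", "please", "now", "here", "in", "on", "at",
}


def tokenize(text):
    """Lower-case words with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_terms(docs, k=3):
    """Return the k terms with the highest TF-IDF weight summed over all docs."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Number of documents each term appears in.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = Counter()
    for toks in tokenized:
        if not toks:
            continue
        for term, count in Counter(toks).items():
            tf = count / len(toks)
            # Smoothed inverse document frequency.
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
            scores[term] += tf * idf
    return [term for term, _ in scores.most_common(k)]


# Hypothetical reported emails.
emails = [
    "Reset your password now: your password has expired",
    "Password reset required, click here to reset",
    "Your parcel is waiting, confirm delivery address",
    "Reset the password for your account today",
]
print(top_terms(emails, k=2))
```

[Running the same ranking on each week's reported emails and comparing the top terms over time is one simple way to spot a new phishing theme.]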

I researched YouTube videos and transcripts of them in relation to an article written for De Correspondent, an online magazine in the Netherlands. We just did topic extraction on four gigs of transcribed YouTube videos to find some topics. 

We didn’t give them the topics we wanted or what we were looking for, but [the model] gave [feedback on] certain regions of topics, so we did some clustering on those topics.

Dan Faggella: I was going to say because you obviously wouldn’t want a human to have to look at endless blocks of different topics, they would, you could distill the topics, but then there would have to be a bucketing, and you’re saying both of those processes are NLP-able. Maybe a little bit of guidance along the way. But that NLP can sort of do both of those.

Jan Veldsink: Yep.

Dan Faggella: Cool. So maybe we can wrap up with a little quick idea. Maybe there’s a way to nutshell what you see as far as where NLP could go, the aspects about NLP that you’d be excited to see moving forward, or that you think maybe we’ll get to.

Jan Veldsink: Okay, I think of conversational agents. I always refer to what I do as a human. I read books, I read newspapers, I read online magazines, etcetera. And it all comes back to my mind somewhere. 

And I envision that there will be conversational agents that take in all that information and create, in the background, such a context for communication with customers, with regulators, or with employees in a bank that is richer than just the “yes, no” question or the simple chatbot functionality we have now. 

You have to give it the context and let the documents in your organization create the context for the communication and the interaction you have with employees and customers.

Dan Faggella: So would you suspect that, in the coming two to three years, we’ll see a notable uptick in that? Or do you think it might be even longer until we can see the needle move, so to speak? Because I know there are so many challenges to make that happen.

Jan Veldsink: Yeah, there’s a lot of judgment with how this will go forward. And it all starts with having the idea that this could be useful for your business. And then I think there is a bunch of technologies already available that could assist and help you in this. But we’re not there yet.

Dan Faggella: Yeah, fingers crossed. A lot of bankers are excited to get there eventually. Cool. Well, that’s topic two, Jan. I appreciate you being able to be with us here for a second interview on AI and banking. So thank you so much for being with us.

Jan Veldsink: Thank you, Dan.

  1. “Our Impact in 2021.” Rabobank, Rabobank Communications & Corporate Affairs,
