WeBank, the first digital bank established in China, is developing a new approach to artificial intelligence called federated learning as regulators bolster privacy and security rules.
The bank, founded in 2014 by Tencent as an all-digital financial institution, has piloted the technology in China with a national electronic invoice (fapiao) center, and in April it completed its first federated learning model, for credit rating.
“The invoice information is secret,” said Chen Tianjian, deputy general manager at WeBank’s A.I. department. “Invoice centers were willing to work with WeBank because they would remain the only owner and controller of their data.”
This new model draws on WeBank’s own data as well as the encrypted invoice data, which stays on the invoice center’s servers. The co-developed model is restricted to measuring the credit risk of small and micro-enterprises.
So far, Chen says, the model has halved the number of defaults among WeBank’s loans to these customers.
Decentralized data
Artificial intelligence relies on lots of data. Without enough data, a model can’t be trained effectively. And without that, you can’t glean insights or rely on the software to improve.
So technology companies or those using A.I., like banks, are hungry for data, particularly relevant data from targeted users (like customers).
But this demand is running smack into protective walls for privacy and security. Both regulators and tech companies are promoting measures to ensure individuals retain control over their personal data. This is leading to a series of measures: requirements for consent, prohibitions against selling data, and bans on storing it outside certain countries’ borders.
The risk is that data becomes as siloed as today’s banking departments – and therefore it becomes very cumbersome and expensive to properly train a computer model.
Federated data
WeBank’s A.I. team is working with a Hong Kong-based startup called Clustar on “federated learning”. It is designed to integrate data scattered throughout different departments, companies, or jurisdictions. It takes data that is inert and makes it useful.
Federated learning requires companies and institutions to collaborate. But they can’t share or transfer data, which remains distributed and encrypted.
Take the example of WeBank’s pilot efforts to manage credit risk.
Traditional banks have little visibility over customers who are applying for loans from multiple institutions. Yet such information is vital to scoring a borrower’s risk.
Using federated learning, multiple banks jointly develop a model based on sub-models in each bank’s individual environment.
Let’s say a customer applies for a loan from Bank A. The bank’s credit officer has no way of knowing whether the applicant is already borrowing from other banks. But the co-developed algorithm could produce a general credit score suggesting this customer is risky. The algorithm never holds the customer’s full credit record: each bank owns part of the record and therefore answers only part of the question, and all related information is encrypted. Then it’s up to Bank A to either offer her a loan at a higher interest rate, or decline her altogether.
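The idea of several banks building one model from sub-models trained in their own environments can be sketched in a few lines. The article does not say which algorithm WeBank uses, so the sketch below swaps in federated averaging, the simplest textbook variant, and leaves out encryption entirely. All names (`local_update`, `federated_average`, `make_bank_data`) and the synthetic data are hypothetical.

```python
import random

def local_update(weights, data, lr=0.5, epochs=5):
    """Run a few gradient-descent passes on one bank's private data.

    Only the updated weights are returned; the raw records never leave.
    """
    w, b = weights
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return (w, b)

def federated_average(updates, sizes):
    """Combine sub-models, weighting each bank by its number of records."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)

def make_bank_data(n, seed):
    """Hypothetical private dataset: y = 2x + 1 plus a little noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 1) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, 0.01)) for x in xs]

def train(banks, rounds=200):
    """Each round: every bank trains locally, then only weights are pooled."""
    model = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(model, data) for data in banks]
        model = federated_average(updates, [len(d) for d in banks])
    return model

# Three banks, each holding data the others never see.
banks = [make_bank_data(50, seed) for seed in (1, 2, 3)]
model = train(banks)
```

After training, `model` should sit close to the slope of 2 and intercept of 1 used to generate the made-up data, even though no bank ever saw another bank's records. A production system would also encrypt the exchanged weights, which is where the computing cost discussed later comes from.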
“Federated learning is a distributed and encrypted system, which naturally protects the customer’s privacy,” said Chen.
So one use case for federated learning is to link information to a user identity from among the universe of similar institutions – in this case, banks. Every time a given identity (a person) interacts with the banks, information is added to that identity. Other use cases can help resolve differences in data for the same group of customers: for example, a bank and an e-commerce company may have the same customer base but a different set of data.
Computing challenges
Some of the concepts behind federated learning are similar to distributed ledger technology, a.k.a. blockchain. Both technologies seek solutions at the level of the marketplace, rather than within single organizations. Both seek to enable transactions while preserving encryption, privacy, and security.
The challenge for federated learning is that making calculations using data that remains encrypted is incredibly difficult. Blockchain doesn’t require that kind of calculation. It only creates and validates information on a shared ledger: it’s just infrastructure. Federated learning is the prospect of adding brains to otherwise unconnected points of data, while leaving security intact.
Chen says encryption increases calculation volumes by hundreds of times. If A.I. training takes 10 hours in an unencrypted model, then an encrypted training session would require at least 100 hours, and maybe 1,000.
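To see what calculating on encrypted data can look like, and why it is expensive, here is a toy sketch. The article does not name the encryption scheme involved, so this assumes the Paillier cryptosystem, a well-known scheme in which multiplying two ciphertexts yields an encryption of the sum of the two plaintexts. The primes are deliberately tiny and the figures are made up; nothing here is secure.

```python
import math
import random

def keygen(p=293, q=433):
    """Toy key generation. p and q are tiny demo primes, NOT secure."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    mu = pow(lam, -1, n)  # valid shortcut because g = n + 1 (Python 3.8+)
    return (n, g), (lam, mu)

def encrypt(pub, m):
    """Encrypt an integer 0 <= m < n under the public key."""
    n, g = pub
    n2 = n * n
    while True:
        r = random.randrange(1, n)  # demo only: not a cryptographic RNG
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    """Recover the plaintext from a ciphertext with the private key."""
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    """Add two hidden values by multiplying their ciphertexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)

pub, priv = keygen()
c1 = encrypt(pub, 1200)  # hypothetical invoice total held by one party
c2 = encrypt(pub, 3400)  # hypothetical invoice total held by another
total = decrypt(pub, priv, add_encrypted(pub, c1, c2))
```

Here `total` equals 4600, yet whoever performed the addition only ever saw ciphertexts. Note that even a single addition became modular arithmetic on numbers as large as n squared, and with realistic key sizes those numbers run to thousands of bits.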
In WeBank’s collaboration on electronic invoice data, the credit risk model took up to four months to become useful. But invoices involve small data sets, and a risk control model does not need to be updated quickly: it is normally updated every six months, according to Chen.
“There is no problem to spend a hundred hours modeling in this case,” Chen said. “But the current computing power will limit the implementation in many other areas that iterate quickly or that are data-intensive.”
However, WeBank faces technical challenges in extending its federated A.I. models to more complicated use cases. That’s where Clustar comes in.
Clustar’s role
Clustar is an A.I. infrastructure startup launched in 2018 by Chen Kai, a professor at Hong Kong University of Science and Technology (HKUST). It has received $10 million in funding from investors such as Sequoia Capital and Stone VC. Clustar reached a valuation of $80 million last month.
(Universities in Hong Kong have become an important force in commercializing technology in recent years. SenseTime was founded in 2014 by professors at the Chinese University of Hong Kong and has since become a unicorn in the A.I. space. Da Jiang, or DJI, the world leader in camera drones, was founded in an HKUST dormitory by Frank Wang, a student at the time, backed by his professor Li Zexiang.)
Earlier this year, Clustar did a proof of concept with WeBank to speed up the computing process for federated learning.
“We’ve chosen Clustar because there are very few companies in the market that can at the same time provide solutions for both network transmission optimization and computational optimization,” said Chen Tianjian.
He explained the two solutions that Clustar brought.
FPGA and RDMA
First, a typical computer handles calculations 64 bits at a time, but the encryption creates super-large numbers of, for example, 1,024 bits. A computer therefore has to break such large calculations into bite-size pieces, that is, into chunks of 64 bits. “It will become super lengthy,” said Zhang Junxue, executive vice president at Clustar.
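The chunking Zhang describes can be made concrete with a small simulation. The sketch below splits two 1,024-bit numbers into 64-bit pieces and multiplies them the schoolbook way, counting how many 64-bit multiplications are needed. It is an illustration only: the function names are hypothetical, Python's own integers are already arbitrary-precision, and real big-number libraries use faster methods.

```python
LIMB_BITS = 64
MASK = (1 << LIMB_BITS) - 1

def to_limbs(value, total_bits=1024):
    """Split a non-negative integer into 64-bit chunks, least significant first."""
    return [(value >> (i * LIMB_BITS)) & MASK
            for i in range(total_bits // LIMB_BITS)]

def from_limbs(limbs):
    """Reassemble an integer from its 64-bit chunks."""
    return sum(limb << (i * LIMB_BITS) for i, limb in enumerate(limbs))

def multiply(a_limbs, b_limbs):
    """Schoolbook multiplication on chunks.

    Returns the product's chunks and the number of 64-bit multiplications used.
    """
    result = [0] * (len(a_limbs) + len(b_limbs))
    word_multiplies = 0
    for i, a in enumerate(a_limbs):
        carry = 0
        for j, b in enumerate(b_limbs):
            total = result[i + j] + a * b + carry
            result[i + j] = total & MASK   # keep the low 64 bits in place
            carry = total >> LIMB_BITS     # pass the high bits to the next chunk
            word_multiplies += 1
        result[i + len(b_limbs)] = carry
    return result, word_multiplies

x = (1 << 1023) + 12345
y = (1 << 1000) + 67890
product_limbs, count = multiply(to_limbs(x), to_limbs(y))
```

Each 1,024-bit number becomes 16 chunks, so one multiplication turns into 16 × 16 = 256 small multiplications plus the carry handling, and `from_limbs(product_limbs)` equals `x * y`. Encrypted training repeats operations like this an enormous number of times.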
The startup has brought in hardware called an FPGA, or field-programmable gate array, to solve this. An FPGA is a chip whose circuitry can be reconfigured after manufacture, so it can be set up with arithmetic units wider than a standard processor’s, 1,024 bits for example. That lets it handle large calculations quickly.
In the PoC with WeBank, it sped up the process by four to five times, according to WeBank’s Chen.
The second solution is on the transmission side. Chen Kai, founder of Clustar, told DigFin that because of delays in data transmission, even if there are 10 computers calculating together, the actual computing power might only equate to that of two computers.
The solution is called RDMA (remote direct memory access). It removes the intermediate steps (copying the result from the computing card to the network card, copying it again to another computer’s network card, and so on) and writes the result directly into another computer’s memory.
“It may be copied five to six times,” WeBank’s Chen said. “Because the transmission is very slow, most of the time, we might be waiting for the transfer of the result, instead of modeling.”
He said that Clustar’s RDMA solution can also speed up the process by four to five times – raising the prospect that federated A.I. can be applied to far more complex products and markets.