UPST AI/ML vs LendingPoint AI/ML

I have been doing additional research on another fintech competitor to Upstart: LendingPoint.

I believe they are a good comparison, as they also claim to be using AI/ML for underwriting. They are just a bit more complicated to analyze due to loan renewals, which is the primary reason I didn’t include them in my other post: https://discussion.fool.com/upst39s-outperformance-in-personal-l…
I believe we can try to compare loan performance as they and Upstart have similar average weighted FICOs across their securitized trusts according to KBRA data.
Their latest issuance is LP 2021-A https://www.kbra.com/documents/report/51130/lendingpoint-202…

The privately held LendingPoint uses proprietary credit scoring models to enable borrowers to obtain loans. They say they “leverage big data, machine learning and best-in-class algorithms to look beyond traditional credit when measuring the willingness and ability to repay debt. In determining the willingness and ability to pay, LendingPoint only puts a 5% weight on FICO.”

Sound familiar? AI, big data and machine learning buzzwords galore - ubiquitous among all fintechs these days.

A closer look at KBRA’s report shows "LendingPoint’s scoring model assigns a coefficient to each variable and adds them together to arrive at a cumulative probability of default. These probabilities of defaults are converted into a LendingPoint loan grade which vary for every grade and term. LendingPoint’s proprietary scoring model has undergone numerous updates since it was first implemented in 2016. The Company does not retire model versions and is currently using three versions of the model to determine the final loan grade. All applications must pass all three models with the newest model having the highest weighting. Each version builds off the previous version and adds additional minimum eligibility criteria and other variables. The most recent version of the model uses over 40 variables and 18 automatic disqualifiers."
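To make KBRA's description concrete, here is a minimal sketch of what a "coefficient per variable, summed into a probability of default" model could look like, with three model versions that must all pass and a blend that weights the newest version most. Everything specific here is an assumption for illustration: the logistic link, the variable names, the coefficients, the weights, the cutoff and the single disqualifier are hypothetical and are not LendingPoint's actual model.

```python
import math

def probability_of_default(coefficients, applicant, intercept=0.0):
    """Sum coefficient * variable, then map the sum to a 0-1 probability.

    The logistic link is an assumption; KBRA only says the coefficients
    are added together to arrive at a probability of default.
    """
    linear_score = intercept + sum(
        weight * applicant[name] for name, weight in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_score))

def blended_pd(applicant, models, disqualifiers, max_pd):
    """Return a weighted probability of default, or None if declined."""
    # Automatic disqualifiers decline the application outright.
    if any(rule(applicant) for rule in disqualifiers):
        return None
    pds = [
        probability_of_default(m["coefficients"], applicant, m["intercept"])
        for m in models
    ]
    # The application must pass every model version.
    if any(pd > max_pd for pd in pds):
        return None
    total_weight = sum(m["weight"] for m in models)
    return sum(m["weight"] * pd for m, pd in zip(models, pds)) / total_weight

# Hypothetical model versions, oldest first; the newest has the highest weight.
models = [
    {"weight": 1.0, "intercept": -2.0,
     "coefficients": {"debt_to_income": 1.5}},
    {"weight": 2.0, "intercept": -2.2,
     "coefficients": {"debt_to_income": 1.2, "recent_inquiries": 0.1}},
    {"weight": 3.0, "intercept": -2.5,
     "coefficients": {"debt_to_income": 1.0, "recent_inquiries": 0.15}},
]
disqualifiers = [lambda a: a["debt_to_income"] > 0.6]  # hypothetical rule
applicant = {"debt_to_income": 0.3, "recent_inquiries": 2}

print(blended_pd(applicant, models, disqualifiers, max_pd=0.30))
```

The point of the sketch is only that a model of this shape is a fixed equation per version: each variable contributes independently through its coefficient, and interactions between variables have to be added by hand.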

We can also see on LendingPoint’s careers page they are currently looking for three “data scientists”.

Meanwhile for Upstart’s AI/ML, from their S1: "Our models incorporate more than 1,600 variables, which are analogous to the columns in a spreadsheet. They have been trained by more than 9 million repayment events (as of 2020), analogous to rows of data in a spreadsheet. Interpreting these almost 15 billion cells of data are increasingly sophisticated machine learning algorithms that enable a more predictive model. These elements of our model are co-dependent; the use of hundreds or thousands of variables is impractical without sophisticated machine learning algorithms to tease out the interactions between them. And sophisticated machine learning depends on large volumes of training data. Over time, we have been able to deploy and blend more sophisticated modeling techniques, leading to a more accurate system. This co-dependency presents a challenge to others who may aim to short-circuit the development of a competitive model. While incumbent lenders may have vast quantities of historical repayment data, their training data lacks the hundreds of columns, or variables, that power our model."

On Upstart’s career page they are currently looking for eleven more machine learning engineers/researchers/software engineers.

It’s unknown how large the AI/ML team is at LendingPoint, but we know there are at least 21 people on Upstart’s team (estimating based on a photo from https://www.youtube.com/watch?v=o1SE9tOD0w4).
I believe it’s fair to say Upstart’s AI/ML team is more robust.

So how will LendingPoint’s 40-variable models stack up against Upstart’s 1,600-variable models? Is 1,600 really just ‘overkill’ with marginal gains quickly diminishing past a few dozen variables?

Let’s look at the data. But first, additional background:
Founded in July 2014, LendingPoint issued its first direct to consumer (DTC) loan in Q1 2015.
Their DTC loans are categorized as either newly originated loans or renewal loans. Renewal loans are granted to existing customers in good standing and allow the company to offer a more competitive interest rate and/or term to current customers who have paid down 22-25% of their existing loan. All renewals are re-underwritten and rescored, since time has passed since the original loan was originated. The proceeds of the renewal loan are used to pay off the initial loan, with any excess being distributed to the borrower.

LP 2021-A has 76.9% and 23.1% of new and renewal loans, respectively. (Their 2020 REV-1 had 34.7% renewal loans).

LP 2021-A had a weighted average FICO, weighted average APR and interest rate of 671, 23.66% and 20.35%, respectively.

Meanwhile, UPST 2021-3 had a weighted average FICO, weighted average APR and interest rate of 669, 22.09% and 18.61%, respectively. https://www.kbra.com/documents/report/51522/upstart-securiti…

Please note however that having a significant chunk of loans being ‘renewals’ for LendingPoint means that:

  1. Overall FICO scores are artificially dragged down, which makes the entire loan pool appear “more subprime” than in reality (the company admitted this in the report: “the lower weighted average FICO on renewal loans is because a borrower’s FICO score tends to drop after LendingPoint completes a hard inquiry with the credit bureau and funds the initial loan; it may take approximately 18 to 24 months for a borrowers’ FICO to recover.”)

  2. Loan APRs are lower for these renewals, which LendingPoint can offer them ‘safely’ because they already know these borrowers are much lower risk - as they paid 25% of their first loan without problems! This drags the overall loan pool’s weighted average APR lower than it would be if it were comprised of only new loans.

  3. Loss rates for an overall pool of LendingPoint loans will always be lower than if they were comprised of only new loans.

  4. Upstart is therefore awarding lower APR/interest rates to a ‘more subprime’ loan pool versus LendingPoint’s new loans.

Unfortunately, I could not find cumulative net loss (CNL) data (which takes into account any recoveries of defaulted loan balances by collections; typically, 8-10%) in the LendingPoint reports for ‘new’ loans specifically - I can only find CNL data given for the entirety of trusts (which mixes the new and renewals together). I can only find Cumulative Gross Loss data (CGL data which does not take into account recovered loan balances) that is actually separated into ‘new’ vs renewal.

LendingPoint’s CGL data: https://i.imgur.com/T3wuBkE.png

Upstart’s CNL data for its trusts: https://i.imgur.com/LWwErwS.png

We can’t directly compare LendingPoint versus Upstart with the above (even though we know their loan pool average FICO scores are very similar), but at least we can roughly eyeball that Upstart has a significant outperformance.

Even the worst performing 2018 UPST trust at about 12% CNL is doing much better than LendingPoint’s 2018 ‘new’ loans at 17% CGL!
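As a rough sanity check on mixing gross and net figures, here is a small sketch that converts the 17% CGL into an approximate CNL. It assumes the "typically 8-10%" recovery figure means 8-10% of the charged-off balance is eventually recovered; that reading, and the helper name `approx_net_loss`, are my assumptions rather than anything stated in the KBRA reports. The 17% and 12% inputs are the eyeballed chart values quoted above.

```python
def approx_net_loss(gross_loss_rate, recovery_rate):
    """Approximate cumulative net loss from cumulative gross loss.

    Assumes recoveries are a fixed fraction of the charged-off balance.
    """
    return gross_loss_rate * (1.0 - recovery_rate)

lendingpoint_2018_new_cgl = 0.17  # eyeballed from the CGL chart
upstart_2018_worst_cnl = 0.12     # eyeballed from the CNL chart

for recovery_rate in (0.08, 0.10):
    estimated_cnl = approx_net_loss(lendingpoint_2018_new_cgl, recovery_rate)
    gap = estimated_cnl - upstart_2018_worst_cnl
    print(
        f"recovery {recovery_rate:.0%}: "
        f"estimated LendingPoint CNL {estimated_cnl:.1%}, "
        f"gap vs Upstart {gap:.1%}"
    )
```

Under that assumption the adjustment only moves 17% gross down to somewhere above 15% net, so the gap to Upstart's worst 2018 trust narrows but does not close. The pools still differ in seasoning and composition, so this stays an eyeball comparison.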

Of note though, we do see LendingPoint has significantly improved its loss rates over time (indicating their models do work) from about 23% CGL for 2016 new loans, down to about 17% CGL for 2018 new loans.

Now, according to KBRA’s report, LendingPoint originated a “new quarterly high of approximately $355.6 million in Q1 2021, a 33% increase from Q1 2020.” At an average loan balance at origination of about $10,500, we can estimate they did roughly 33,900 loans in Q1 2021.

For comparison, Upstart transacted 169,750 loans totaling $1.729 billion in Q1 2021, a 101.6% increase from Q1 2020 and about 5 times more than LendingPoint.
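For anyone who wants to check the arithmetic behind the volume comparison, here is a short sketch using only the figures quoted above. The variable names are mine, and the $10,500 average balance is the approximate figure used in the estimate, so the LendingPoint loan count is itself an estimate.

```python
# Figures quoted above (Q1 2021).
lendingpoint_volume = 355.6e6      # dollars originated
lendingpoint_avg_balance = 10_500  # approximate average balance at origination
upstart_loans = 169_750
upstart_volume = 1.729e9           # dollars transacted

# Estimated LendingPoint loan count = volume / average balance.
lendingpoint_loans = lendingpoint_volume / lendingpoint_avg_balance

print(f"Estimated LendingPoint loans: {lendingpoint_loans:,.0f}")
print(f"Upstart vs LendingPoint by loan count: {upstart_loans / lendingpoint_loans:.1f}x")
print(f"Upstart vs LendingPoint by dollars:    {upstart_volume / lendingpoint_volume:.1f}x")
print(f"Implied Upstart average loan size: ${upstart_volume / upstart_loans:,.0f}")
```

The multiple comes out close to 5x whether you measure by loan count or by dollars, because the two companies' average loan sizes are similar.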

So, Upstart has been growing WAY WAY faster than LendingPoint, despite both companies beginning work on unsecured lending around the same time in 2015 (Recall that although Upstart was founded in 2012, they started on income share agreements but then pivoted in May 2014 to personal loans).

Also incredibly, Upstart has been in the lead despite smaller amounts of funding ($144.1M total before IPO, versus $325M from LendingPoint’s wealthy founders and a network of over 30 family and business-related high net worth investors).

I strongly believe we can attribute Upstart’s better loan performance and faster growth to its higher quality machine learning team/models, resulting in the positive feedback loop of “more loans = more data = faster improvement = faster loans = faster data”.
(Also - keep in mind - Upstart has the ONLY “No Action Letter” from CFPB for any lender utilizing AI/ML. This is a huge regulatory ‘plus’ that LendingPoint doesn’t have.)

I think the biggest conclusion to take away from this is not just “Upstart is better than yet another fintech competitor”.

It’s that Upstart has developed a sizeable AI/ML “first mover advantage”. Like Upstart said in its S1: “This co-dependency presents a challenge to others who may aim to short-circuit the development of a competitive model. While incumbent lenders may have vast quantities of historical repayment data, their training data lacks the hundreds of columns, or variables, that power our model.”
Even though other institutions may already have big data, they are still going to be several years behind. They need to ‘start from scratch’ because the number and depth of variables have proven to be very important - they can’t just magically conjure up the ‘columns’ for each ‘row’ of data collected from their troves of existing data. They’ll need to start from the drawing board, like everybody else.

156 Likes

My background: In 2014/15 I developed a credit scoring model for one of the bigger lenders in the US - 10M loans over 4 years with an average ticket of $500, 90% subprime population.

The end is what triggered me:
“While incumbent lenders may have vast quantities of historical repayment data, their training data lacks the hundreds of columns, or variables, that power our model.”

It is no problem to prepare a dataset that has thousands of attributes. In my development sample, there were 4,000 attributes (columns). Even from a traditional credit report you can derive north of 2,000 attributes (check out Experian Premier Attributes). LexisNexis had 400+ attributes at that time (https://risk.lexisnexis.com/products/riskview) and there are other vendors with non-traditional datasets.

Even for a lender who never used such attributes, data providers are able to append the data retrospectively. You DO NOT have to collect the data over the years (meaning you can ask LexisNexis to append credit risk attributes to, let’s say, your Jan '18 cohort and they will pull the relevant data from the archive).

You absolutely can create hundreds or thousands more attributes by aggregating data put together from all the vendors or by deriving other attributes from transactional history, behavioral history, web browsing history, cell phone data (not in US :), geolocation data (also not in US). When this group was invested in AYX, they acquired a company called Feature Labs - a company that can automatically generate features from underlying data (https://www.featurelabs.com/open/). Having the data is not the issue, and lack of AI/ML techniques is not the issue either.

Why are companies not using all of these variables and alternate data sources? I’ve tried and failed, and I believe others had a similar problem. The issue is COMPLIANCE. With all of these alternate data points, you have trouble proving that your model does not cause disparate impact (meaning you are not unintentionally affecting protected classes). The first version of my model included attributes like “highest education”, “number of criminal records”, “number of memberships in professional associations”, … None of these passed the compliance team, and I had to exclude these very predictive variables from the model due to their fear of causing disparate treatment.

Upstart quotes the following as the risk in their application (https://www.consumerfinance.gov/documents/9368/cfpb_upstart-…):
“There is a potential risk that Upstart’s underwriting model will deny protected class
applicants at rates that are higher than non-protected class applicants, and that there will
be an insufficient business justification to support such higher denial rates, or that
Upstart’s legitimate business needs could be reasonably achieved by alternatives less
disparate in their impact.”

It is very difficult to control AI for disparate impact/treatment. Although not from lending, Amazon’s AI for hiring did exhibit disparate treatment: https://www.reuters.com/article/us-amazon-com-jobs-automatio… With modern ML/AI techniques you typically have only limited insight into how each variable is impacting the model, which makes it difficult to evaluate the model for disparate impact.
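To give a feel for what "evaluating a model for disparate impact" can involve, here is a minimal sketch of one common screening metric, the adverse impact ratio: the approval rate of a protected group divided by the approval rate of a reference group. This is not Upstart's test or the test my compliance team used; the approval counts are made up, and the 0.8 flag level is the "four-fifths" rule of thumb borrowed from US employment selection guidelines, used here only as an assumed illustrative threshold and not as a lending-law standard.

```python
def approval_rate(decisions):
    """Share of applications approved; decisions are 1 (approve) or 0 (deny)."""
    return sum(decisions) / len(decisions)

def adverse_impact_ratio(protected_decisions, reference_decisions):
    """Protected-group approval rate divided by reference-group approval rate."""
    return approval_rate(protected_decisions) / approval_rate(reference_decisions)

# Hypothetical model decisions for two groups of 100 applicants each.
protected_group = [1] * 60 + [0] * 40   # 60% approved
reference_group = [1] * 80 + [0] * 20   # 80% approved

ratio = adverse_impact_ratio(protected_group, reference_group)
needs_review = ratio < 0.8  # assumed rule-of-thumb flag level

print(f"adverse impact ratio: {ratio:.2f}, flag for review: {needs_review}")
```

A flag like this is only the start of the conversation. The hard part with complex models is the follow-up: explaining which variables drive the gap and whether a less discriminatory alternative would serve the same business need.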

I think that it is absolutely cool that they were able to get the no action letter. In the longer term I see these two possible outcomes:
(a) CFPB will determine that these alternate data points are causing disparate impact and Upstart will need to adjust the model (and lose some predictive power as a result)
(b) CFPB will determine that these data points are ok to be used (I wish) and in that case eventually some other big players will start using these in their scoring models

It does not mean that UPST doesn’t have a great advantage. Even if (b) happens, it will still take a couple of years before most of the scoring models are changed (you would not believe how rigid the compliance structures are), and some banks (especially local ones) may never go that route and prefer to rely on a 3rd party like UPST to source the leads and provide scoring uplift.

I am long UPST, like the business.

166 Likes

tominvest83,

Wow, that is an incredibly helpful insight. As I have zero background in anything tech or finance, I greatly appreciate your real world experience in this arena.

So it seems it is actually the “regulatory moat”, or “compliance moat”, if you will, that keeps Upstart ahead of anybody else attempting AI/ML to underwrite loans?

That CFPB No action letter from 2017 and subsequent renewal in 2020 really is a tremendous deal. Nobody else has one, and I suspect the NAL is a driving factor to why banks and credit unions have been comfortable enough to sign up with Upstart. This had been emphasized in videos by SVP biz dev Jeff Keltner where he was interviewing bank execs - all the banking folks stated compliance was a major factor as to why they chose Upstart.

I did also post previously about partisan organizations creating unfair negative attention to Upstart: https://discussion.fool.com/dividends20-to-add-to-your-points-up…

(Despite the false claims by SBPC, the CFPB renewed Upstart’s NAL just a couple months later which I believe speaks to how baseless they were)

I had highlighted that Upstart has been very proactive in countering these unfairly negative accusations by working with NAACP and SBPC to adjust their models as desired.

From the initial report by Relman Colfax (April 2021): https://www.relmanlaw.com/media/news/1089_Upstart%20Initial%…

“In responses to those conversations, as well as the congressional inquiry, Upstart made certain changes to how its underwriting model utilized educational data. Most notably, it abolished the use of average incoming SAT and ACT scores to group education institutions in its underwriting model. While Upstart’s model continues to incorporate information about the educational institution attended, it switched to grouping schools based on average post-graduation income.

Upstart also established a “normalization” process for “Minority Serving Institutions” (“MSIs”)—which Upstart defines as schools where 80 percent or more of the student body are members of the same racial demographic group. Under that process, Upstart normalized MSIs as a group to have equal graduate incomes as non-MSIs by calculating and using the distance, as a percentage, between a school’s graduate incomes and its respective school group average (i.e., MSIs, non-MSIs). This process results in MSIs and non-MSIs being on average equal. Put another way, above average MSIs (in terms of graduate income) are treated above average overall by as much as they are above the MSI average. Any decisioning, including tranching, is then performed on this normalized information.
These changes are in place now and the fair lending testing conducted pursuant to this Monitorship will be of Upstart’s platform with these changes incorporated. Upstart emphasizes that it voluntarily adopted these changes and that none of Upstart’s fair lending tests—which are reported to the CFPB—have identified unlawful bias against any protected class, including any racial group.”
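The "normalization" described in the report is easier to follow with numbers. Here is a minimal sketch of my reading of it: each school's graduate income is expressed as a percentage distance from the average of its own group (MSI or non-MSI), so both groups end up with the same average. The school names and incomes are invented, and the function is my own illustration of the description, not Upstart's code.

```python
from statistics import mean

def normalize_by_group(schools):
    """Express each school's graduate income as a fractional distance
    from the average income of its own group (e.g. MSI vs non-MSI)."""
    group_means = {
        group: mean(s["income"] for s in schools if s["group"] == group)
        for group in {s["group"] for s in schools}
    }
    return {
        s["name"]: (s["income"] - group_means[s["group"]]) / group_means[s["group"]]
        for s in schools
    }

# Hypothetical schools and average graduate incomes.
schools = [
    {"name": "MSI A", "group": "MSI", "income": 40_000},
    {"name": "MSI B", "group": "MSI", "income": 60_000},
    {"name": "Non-MSI C", "group": "non-MSI", "income": 60_000},
    {"name": "Non-MSI D", "group": "non-MSI", "income": 90_000},
]

for name, distance in normalize_by_group(schools).items():
    print(f"{name}: {distance:+.0%} vs its group average")
```

In this toy example "MSI B" and "Non-MSI D" both come out at 20% above their group average, so they are treated the same after normalization even though their raw graduate incomes differ by $30,000. That matches the report's "above average MSIs are treated above average overall by as much as they are above the MSI average".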

The law firm monitoring this will release their next report in October which I will be closely following.

49 Likes

“So it seems it is actually the “regulatory moat”, or “compliance moat”, if you will, that keeps Upstart ahead of anybody else attempting AI/ML to underwrite loans?”

What’s ironic to me about compliance and regulatory factors being the “drivers of success” or “moat” for UPST, rather than the AI/ML or the thousands of attributes about the loan applicant that initially appeared to be their secret sauce, is how similar this is to Cloudflare (NET). Its moat initially appeared to be speed, low latency and lower cost, when in fact the CEO and CFO have stated that the biggest driving force for NET is compliance and regulation around where (location) the data originates and where compute and storage take place.

Fantastic insights gentlemen. Thank you for sharing your expertise, research and for connecting the dots.
sjo

24 Likes

Great post and great points about fighting bias. In the name of checks-and-balances I want to point out:

“Even from traditional credit report you can derive north of 2,000 attributes. … you DO NOT have to collect the data over the years…”

This is speculation. The attributes, or inputs, of the model Upstart has developed are proprietary. A couple of novel inputs can make a massive difference. Even if the inputs are public-access there is still a lot of room for secret sauce in the model(s) used to create the results. We just don’t know.

9 Likes

“The attributes, or inputs, of the model Upstart has developed is proprietary. A couple of novel inputs can make a massive difference.”

I’d say this differently…

Proprietary rights to the original data belong to the sources UPST buys it from. What happens to that data afterwards makes it UPST’s proprietary product, including: (a) cleansing the data; (b) filling in missing values by deriving or imputing information from other inputs; (c) adding other derived data judged as useful from UPST’s prior mining work.

As a side note, the more UPST learns from its data, the more it can eliminate the effort involved in purchasing and processing data which turns out not to be of interest/influence. This is the “efficient frontier” of data which Gu mentioned in at least one of his interviews.

14 Likes

Three quick comments (with explanations to follow), IMHO:

  • comparing Upstart to LendingPoint is a bit of an apples and oranges exercise
  • True AI/ML based models actually reduce the chance of compliance issues
  • UpStart is not only buying data, but collecting their own

UPST has built a multi-year lead in the area of AI-based lending, which provides the industry with lower default rates and a higher return. This lead is enabling them to grow faster than others in similar businesses (numbers well documented by others). This growth allows them to further invest in expanding and refining the AI/ML capabilities to maintain a lead. That is why I went long UPST earlier this year.

UpStart vs LendingPoint = Apples & Oranges
Jonwayne’s writeup says it all:
LendingPoint - ”The most recent version of the model uses over 40 variables and 18 automatic disqualifiers. “
LendingPoint is using traditional lending models with basic statistical analysis to determine criteria for lending. In this approach you start with prior known or assumed correlations to build models to calculate the quality of the loan (and deny if a disqualifier is hit) - basically building really long equations and ranking formulas. The data scientists they hire are analyzing the data to refine the equations, looking for further improvements. I don’t think there really is much new that they are doing.

Upstart - ”Our models incorporate more than 1,600 variables … trained by more than 9 million repayment events”
Upstart is using massive (and increasing) amounts of data to continually train the models – basically they are scouring the data looking for linkages that the human mind would never find (and never even think of looking for). They aren’t inserting the preconceived notions of which variables are most important, they are using Machine Learning to figure out the linkages and the weights – that is the training.

So regarding the (hopefully rhetorical) question - ”Is 1600 really just ‘overkill’ with marginal gains quickly diminishing past a few dozen variables?”
If using the “old school” methods, certainly it is overkill.
But in the new age of ML techniques, data is king, and the more variables and data the ML can scour, the higher the likelihood that it will be able to identify predictive data points that would seem irrelevant to a loan officer.

Look at what Smart Finance (app YongQianBao) and other companies have successfully done in China for short-term small lending in the under-banked communities. They don’t just use obvious metrics like how much money is in your account, they use obscure data points such as the level of charge in your phone battery, how fast you type in your birth date, etc. to determine the user’s credit-worthiness. These data points are not blatantly obvious but the AI/ML methods have identified a linkage. In order to find these predictive data points you need lots and lots of data to scour. They have that because they have access to all the data on the mobile device - which is used for much more in China than here in the US.

Upstart obviously doesn’t reveal the data points they have collected, but I would wager there are some things in that list that you wouldn’t think about. Maybe the ML models have identified a repayment timing that is optimal – maybe those who wait until the last day to repay and those who pay earlier are less optimal than those that pay 3 days before the due date? Who knows? The ML model knows.

That leads to why true AI/ML solutions can potentially reduce the chance of compliance issues.
By looking elsewhere besides the traditional data, they can identify predictive data points with little to no correlation to protected classes. Using the Smart Finance real-life example, I think a compliance officer would be hard pressed to claim a linkage between battery levels or typing speed and a protected class of people. And where Upstart was questioned, such as on SAT/ACT scores, they changed the model and had no issues.

Finally, UpStart is not only buying data, but collecting their own. This is pretty obvious from the S-1 and other filings. Yes, they acquire data, but much of the power of their models resides in the data they have. The more unique the data, the harder it will be to replicate.

The 3 keys to successful AI/ML algorithms are (1) big data sets, (2) lots of computing power, and (3) the work of quality AI algorithm engineers. Applying deep learning to problem sets requires all three, but data is the key.

BTW, that is why I initially held off on investing in UPST, the data they have access to can’t be to the same level as in China - whose population does so much more of their daily lives on mobile devices and the companies there collect more information. I was looking at my view of the technology and the access to data in the US as well as the specific application and should have been focused on the company data that Saul keeps reminding us to focus on.

Am I an expert in AI/ML? No, but back in the late early ‘90’s I wrote a couple SW solutions and presented several conference papers on using statistical analysis and design of experiment techniques to find optimal solutions. I mostly applied this to semiconductor circuit design in order to minimize the number of circuit simulations needed to find the optimal solution which met the key criteria for the circuit. Computation levels at the time limited me to ~20 variables and 6 criteria – enough for small circuits but not much more. Sort of a very early precursor to what AI/ML algorithms are being used for today in the industry.

In my mind I equate:

  • the work that I did (child’s play by today’s standards) to what LendingPoint is doing and
  • today’s AI/ML solutions in the optimization areas (today’s standards) to UpStart

BornGiantsfan

  • Now long UPST (and added a tiny bit more on Monday)
46 Likes

Thanks to tominvest for his awesome insights on AI/ML in bank lending. I have developed models for credit decisions and marketing. And like tominvest, I believe UPST’s advantage is that it doesn’t get much pushback from compliance because it is too small to get regulators’ attention (the kind of attention that big banks get, where the Fed and OCC set up their own offices at HQ and can dial in to executives’ meetings on credit policies). UPST simply isn’t big enough to get the same attention yet. But it will, eventually. I have only seen banks growing their legal and compliance departments, and never the other way around (or risk getting an MRA from the Fed).

I also have a few thoughts on some of the comments that have been used to argue UPST’s moat. For example:

  1. 1600 variables is more than what competitors can use.
    No. Every decent-size bank can and will create a model that starts with several thousand variables, utilizing bureau attributes (like the premier attributes that tominvest mentioned), internal data (if the customer already has accounts/loans with us), and 3rd party data. The model will eventually spit out a few dozen variables, or several clusters, totaling perhaps a few hundred variables at the end. With the modern computers we all have, I can use 16k variables if I wanted to.

  2. UPST has some magic variables that make its model unique.
    If you actually have a model with 1,600 variables, then an additional input of unique variables, no matter how special, will yield minimal incremental value. Warren Buffett once said you don’t make money on your 7th best idea. The same is true for AI/ML models. You don’t get much incremental value from your 1,600th variable.

  3. More variables in the model means better models.
    More data, not more input variables, leads to better models. And if we believe easy interpretability is also required for a “good” model (see below), then more variables would lead to a worse model.

  4. AI/ML will solve disparate impacts and fair lending concerns.
    Current academic and regulatory consensus is that many AI/ML models exacerbate unfairness and create distrust among consumers. There is a strong push to create easily interpretable machine learning models that can be explained to the regulators and even to the consumers.
    https://arxiv.org/pdf/1811.10154.pdf

  5. Competitors only target consumers with a credit score above 700 and UPST is able to target consumers with lower credit scores due to its AI/ML model.
    Major banks have to target customers with good credit scores because: (1) stress tests punish banks for lending to consumers with low credit scores; (2) major banks must restrict their loan concentration in subprime consumers, no matter how good these consumers are as predicted by AI/ML models.

UPST has found a niche spot in near prime and/or subprime consumers. It leverages AI/ML models that most major banks are reluctant to use. It has been a smart player in the lending industry. But major banks are not dumb, they have plenty of PhDs to develop models that can use more than 1600 variables. But their play box is restricted by regulators and consumer watchdogs. I believe this is for a good reason after we saw what happened during 2008 when problems within 1 big financial institution can destabilize the whole economy. One day, UPST will grow big enough to run into the same problem. Until then, it will be a great ride.

FG
(No position due to, you guessed it, legal and compliance reasons)

37 Likes

Am I an expert in AI/ML? No, but back in the late early ‘90’s I wrote a couple SW solutions and presented several conference papers on using statistical analysis and design of experiment techniques to find optimal solutions… … Sort of an very early precursor to what AI/ML algorithms are being used for today in the industry.

In my mind I equate:
- the work that I did (child’s play by today’s standards) to what LendingPoint is doing, and
- today’s AI/ML solutions in the optimization areas (today’s standards) to UpStart

BornGiantsfan
- Now long UPST (and added a tiny bit more on Monday)

Thanks BornGiantsfan. Nice to get a personal perspective like that.
Saul

9 Likes

Thanks very much for your enlightening post. I’m not a tech guy and your post corrected several beliefs I had about Upstart. Your industry insight and analysis is greatly appreciated. Long UPST, Luremaster

Hi Tominvest83,

This is excellent insight, thank you!

In your opinion, was there any particular reason UPST was able to adhere to compliance while many others were not able to?

Is this purely a “first mover” issue and a company culture issue (lack of legacy processes)?

Someone else mentioned scale in this thread - I think that makes sense, but doesn’t exactly answer the fundamental question.

-Purplemist