UPST: they lend like we invest!

I was waiting to hear the magic words of “we developed our own machine learning algorithms”.
He did not say them which was disappointing.

Dividends20,

https://ir.upstart.com/events/event-details/bank-america-glo…
This webcast had been posted before. At timestamp 28:02 the CEO states ‘data science is a term that’s thrown about…but people really capable of building models of the type that we use is more rare than you think’. Perhaps that seems to suggest they have indeed built their own machine learning algorithms?

Of note, their S1 had also stated “As of September 30, 2020, we had 2 patent applications in the United States related to our proprietary risk model.”
I could find this filing from 5/9/2019 “U.S. patent application number 15/806161 was filed with the patent office on 2019-05-09 for augmenting incomplete training datasets for use in a machine learning system. The applicant listed for this patent is Upstart Network, Inc… Invention is credited to Paul Gu, Brandon Ray Kam, Viraj Navkal, Grant Schneider, Alec M. Zimmer.” https://uspto.report/patent/app/20190138941

I admit I have no background in machine learning or computers whatsoever. Are the ‘algorithms’ that you refer to different from ‘proprietary risk models’?

12 Likes

There are three things. a) Data b) Algorithm c) Model

This patent is to fix these incomplete datasets. No dataset is perfect. It always has gaps and anomalies. Fixing this is much harder than it sounds as data from multiple lenders and borrowers pours in. As you can imagine, incorrect data sets will teach the machine to learn wrong things.

This is however not a patent for a new algorithm.

In general, there are a bunch of complicated algorithms that are prebuilt and are now part of open-source libraries for anyone to use.

In the interview, the CEO uses b) and c) interchangeably, so it is not clear which one he is referring to. They could also be using the predefined algorithms and tuning them for their use.

A quick look at the LinkedIn profiles of Upstart employees shows some research scientists with PhDs in statistics, so that is positive.

https://www.linkedin.com/in/grantwschneider/
https://www.linkedin.com/in/ehorel/

4 Likes

Hi Dividends,

I’m really happy to see you’re breaking this down a bit. I recently listened to a podcast with the CEO of Snowflake here https://www.snowflake.com/podcast/interview-with-frank-sloot… where right off the bat he is talking about a) Data. He made the point that current models using the ‘available algorithms’ produce inaccurate predictions because, until Snowflake became ‘better able’ to combine the data (Data Sharing in the Data Cloud), there just hasn’t been enough data to make accurate predictions.

On this thread the comment regarding the ‘eleventh bank benefiting from the first ten’ was of course addressing this point. I’m writing this now because I’m thinking that maybe I’m not the only one who is still trying to get his head around this, and I wanted to publicly thank you for your contribution.

I’m thinking that with all the prior work on developing algorithms and AI models, and now with Data Sharing available, we are going to see an explosion of companies similar to Upstart, likely not in as big a market as Upstart, nor with many of the other benefits Upstart appears to have. But what are you thinking regarding the now available Data Sharing (available on Snowflake) and how it may be reducing the benefits of lead time (first-mover advantage) in the various cloud categories/verticals?

Best

Jason

6 Likes

I think your point is a very interesting observation:
“as they move to secured lending (car loans) there is presumably less risk now that there is collateral. Also, auto lending is a much more specific case than personal loans (less difference between individual loans). Although Upstart’s competitors are (I assume) not using AI they likely have found efficient ways of lending through brute force over the years. I still believe that Upstart has shown enough of an advantage that I am not concerned. I will be watching their expansion into auto loans very carefully and I expect that their advantage may be less pronounced than personal loans. Although a moderate advantage in a much bigger market is still very valuable.”

I watched a podcast with Paul Gu, UPST co-founder, where he said that for the auto loan market their data shows their AI would save an average of $1600 on every auto loan, and that there are 25 million auto loans a year (I may have this number wrong but the $1600 is correct). By my calculations that would be roughly $30 a month in savings for a typical 60-month loan, which translates into nearly a 1% reduction in the loan rate.
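For anyone who wants to check the arithmetic above, here is a rough sketch. The $1600 saving and the 60-month term come from the post; the loan size and starting APR are hypothetical numbers picked only for illustration, and the sketch assumes the $1600 shows up entirely as lower monthly payments.

```python
def monthly_payment(principal, annual_rate, months):
    """Payment on a standard fixed-rate amortizing loan."""
    r = annual_rate / 12
    if r == 0:
        return principal / months
    return principal * r / (1 - (1 + r) ** -months)


def implied_rate(principal, target_payment, months, lo=0.0, hi=1.0):
    """Bisect for the annual rate whose payment equals target_payment."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if monthly_payment(principal, mid, months) > target_payment:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


total_saving = 1600   # per-loan saving quoted in the post
months = 60           # "typical 60-month loan" from the post
principal = 30_000    # hypothetical loan size (assumption)
base_rate = 0.06      # hypothetical starting APR (assumption)

saving_per_month = total_saving / months
base_payment = monthly_payment(principal, base_rate, months)
new_rate = implied_rate(principal, base_payment - saving_per_month, months)

print(f"monthly saving: ${saving_per_month:.2f}")
print(f"implied APR reduction: {(base_rate - new_rate) * 100:.2f} points")
```

The monthly figure is just $1600 / 60, a bit under $30. The implied rate reduction moves a lot with the assumed loan size and starting rate, so the "nearly 1%" is best read as a rough estimate.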

Having just recently purchased a vehicle I can tell you all the sales folks wanted to talk about was the monthly payment so that is clearly important to most buyers.

I think there is probably only a small subset of purchasers who will shop for a car loan rather than just taking the path of least resistance with the dealer’s loan process, but if they can get the buyer to their loan option, it seems $30 a month or 1% less would be a differentiator.

4 Likes

These US patent filings disturb me. A patent only gives protection if it is granted. To file one, you must describe how your IP works in the specification. Therefore, if it is not granted, anyone skilled in the art can duplicate it. I have long been an advocate of keeping IP implemented in software as trade secrets.

With respect to the 2 U.S. patent applications, one can be viewed by the public. This patent has 24 claims that have been allowed and issue fee paid. This suggests the patent will be issued within a few months.

1 Like

I highly, highly recommend watching this video interviewing a Product Manager at Upstart.
https://www.youtube.com/watch?v=PoziQ30UY8k

Dividends20, can you watch around 54:00 timestamp to tell us if what was said implies anything truly unique to their ML/AI?

Some rough timestamps I would like to highlight:
6:40: company values QUALITY of ideas, not flashiness of the ideas, and focused on SPEED

27:00 Maintain a pulse of the company. Reviewing “100 metrics” in 30 min; the product manager brings his Amazon experience to UPST for SPEED and EXECUTION.

32:00 discussion on staying close to customers; behavioral researcher on staff helps understand customers

37:00 focus on customer, customer, customer. Inherited from his Amazon experience. You’d get “kicked out” of the room if you pitch an idea based on “this will make the business this much money” rather than “this will help the customer in this way”

44:00 FICO scores correlate to risk only in a linear straight line; It’s super easy to estimate risk for high 700-800s scores. But there’s a very wide dispersion in the 600-700 range. FICO can’t tell the differences there. Some super low FICO end up with same risk as 800s.

54:00 Grant Schneider, VP machine learning and head of Upstart Columbus speaks to add color to the machine learning.

16 Likes

https://www.youtube.com/watch?v=o1SE9tOD0w4

I also watched this presentation by a research scientist at Upstart.

I have some rough timestamps to highlight below, but there were lots of statistical terminology that I don’t know/understand so you can see I have no timestamps for a large portion of the video.
Hopefully others can chime in on any important information.

13:30 machine learning was at the core of Upstart at the very beginning, even in 2012 when it was initially founded for income share agreements and not personal lending

18:55 machine learning projects are at play for more than just underwriting, marketing, or fraud/verification decisions. They even have ML projects for finding those late paying borrowers; finding those who would be the most likely to pay after receiving a personal call

21:00 scikit-learn, R, Python tech stack… Don’t know what these are.

26:00 CFPB study on Upstart’s underwriting model showed it approves 27% more borrowers than traditional model and yields 16% lower average APR for approved loans. "This is 2 years old and we have made large improvements since then (we can’t give you exact numbers of course)"

40:00 outperformance during covid

47:00 “how do we generate an AAN from a complex non linear ML prediction system? So this is really one of the areas as well as in fairness that we do what I would classify as academic research…we dig into the literature…we haven’t published papers yet”

7 Likes

https://uspto.report/patent/app/20190138941

My understanding from reading this is - they look at ‘failed data’ and identify at what point in the timeline it ‘failed’, they create a distribution of it, and feed that to the incomplete data to generate information about a likely ‘failure’ for that incomplete data.

My interpretation of this is - they look at the loans that defaulted in the past, and try to figure out when/why/parameters that led to loan defaults. And then apply the distribution of many such defaults to the new loan application data, and figure out the likelihood of defaults in these new applications.
This is completely my interpretation and I could be wrong.
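To make that interpretation concrete, here is a toy sketch. It only illustrates the idea as I described it above and is not taken from the patent; the function names and the sample data are made up. It builds a distribution of default timing from finished loans, then uses it to estimate how likely a still-running loan is to default later.

```python
from collections import Counter


def default_time_distribution(completed_loans, term):
    """Share of completed loans that defaulted in each month of the term.

    completed_loans: list with the default month (int), or None if repaid.
    """
    n = len(completed_loans)
    counts = Counter(m for m in completed_loans if m is not None)
    return [counts.get(month, 0) / n for month in range(1, term + 1)]


def prob_default_given_survived(dist, months_observed):
    """P(default later | no default during the first months_observed months)."""
    already = sum(dist[:months_observed])
    later = sum(dist[months_observed:])
    surviving = 1.0 - already
    return later / surviving if surviving > 0 else 0.0


# Hypothetical history: 10 finished 12-month loans, 3 of which defaulted.
completed = [3, 6, 6, None, None, None, None, None, None, None]
dist = default_time_distribution(completed, term=12)

# An "incomplete" loan that has been paying for 4 months so far.
print(round(prob_default_given_survived(dist, 4), 3))
```

A loan that is only partway through its term has no final outcome yet, so it cannot be used as a training label directly. Filling in a likely outcome from the historical timing of defaults is one way such records could still be used, if my reading is right.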

1 Like

History of customer wins

11/06/2019: First National
01/14/2021: Oriental Bank
02/03/2021: Kemba Financial Credit Union
02/17/2021: Midwest BankCentre
03/03/2021: Apple Bank
04/15/2021: First Financial
04/28/2021: Drummond Community Bank
05/19/2021: Customers Bank

These are translating into $116m of quarterly revenue.
There are over 10,000 banks and credit unions.

“Network effect” is building.
If Upstart’s models work with better efficacy than humans,
I can see banks and credit unions not requiring as many people in underwriting business.
It would be a no brainer for the smaller banks to sign into this automated way.

15 Likes

My understanding from reading this is - they look at ‘failed data’ and identify at what point in the timeline it ‘failed’, they create a distribution of it, and feed that to the incomplete data to generate information about a likely ‘failure’ for that incomplete data.

I used to work in a data mining group and will be reviewing the UPST videos others have posted to learn more about the company, but can comment on the general topic of data mining with a general example.

Suppose a company wants to cross-sell a product. A large dataset is constructed from current customer information and then goes through a ‘cleansing’ where some records are tossed out for various issues (e.g., default accounts). Next, additional data is added from whatever sources are purchased (like the USPS or property tax records). Depending on the product being sold, a “householding” step groups individuals to avoid sending multiple product ads to the same household. The last step attempts to fill in missing values by deriving or imputing their values based on similarly situated customers. (For example, my age may not be known, but might be imputed from the length of time at my residence, credit history, etc.) The data is now ready for use in data mining.

From this original set of data, extract a subset of data consisting of customers who already have that product. A data mining tool then crawls through all the data to find ‘clusters’ of customer types in this n-dimensional space of data. The result is a scoring algorithm which can identify how close an individual customer record comes to being in/near one of those clusters. (The closer you are to a cluster, the higher the likelihood of your desiring the product.) The algorithm is then back-tested against the original large set of data to see how well the prediction works. Ideally, the algorithm will pick up a sizable portion of customers who own the product.

If everything is judged satisfactory, you repeat the data prep steps for your entire customer set of data and apply the algorithm. Customers are then sifted into the various clusters and marketing campaigns are developed tailored for each cluster. (This is why you and your neighbor may each receive ads from a company like GEICO but you’ll each see different ads for the same product as the appeal will be based on characteristics of your cluster.)
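For readers who think better in code, here is a very stripped-down sketch of the steps above. Everything in it is a simplifying assumption: the field names and records are invented, imputation uses a plain average instead of "similarly situated customers", and a single cluster center stands in for what a real data mining tool would find.

```python
import math


def impute_missing(records, field):
    """Fill missing values of `field` with the mean of the known values."""
    known = [r[field] for r in records if r[field] is not None]
    mean = sum(known) / len(known)
    return [dict(r, **{field: mean if r[field] is None else r[field]})
            for r in records]


def centroid(records, fields):
    """Center of a group of records in the chosen feature space."""
    return [sum(r[f] for r in records) / len(records) for f in fields]


def score(record, center, fields):
    """Higher score means the record sits closer to the cluster center."""
    dist = math.sqrt(sum((record[f] - c) ** 2 for f, c in zip(fields, center)))
    return 1.0 / (1.0 + dist)


# Hypothetical customer records; None marks a missing value.
customers = [
    {"age": 40, "years_at_address": 10, "owns_product": True},
    {"age": 50, "years_at_address": 12, "owns_product": True},
    {"age": None, "years_at_address": 11, "owns_product": False},
    {"age": 22, "years_at_address": 1, "owns_product": False},
]
fields = ["age", "years_at_address"]

cleaned = impute_missing(customers, "age")
owners = [c for c in cleaned if c["owns_product"]]
center = centroid(owners, fields)
scores = [score(c, center, fields) for c in cleaned]

for c, s in zip(cleaned, scores):
    print(c["owns_product"], round(s, 3))
```

In this toy data the third customer, whose age had to be imputed, scores closer to the owners' cluster than the fourth, so the third would be the better cross-sell target. A real pipeline would also scale the features and back-test the scores, as described above.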

I hope this helps & I haven’t instead muddied the waters.

19 Likes

very impressive interview. And not just the content that just got thoroughly discussed - I liked the way he responded to questions, and when asked to make a bold prediction at the end,
a) took a lot more time to think about it than you usually witness in interviews
b) decided he didn’t have a good answer, so didn’t give one.
To me, that adds a lot of credibility to everything else he said.

4 Likes

very impressive interview. And not just the content that just got thoroughly discussed - I liked the way he responded to questions, and when asked to make a bold prediction at the end,
a) took a lot more time to think about it than you usually witness in interviews
b) decided he didn’t have a good answer, so didn’t give one.
To me, that adds a lot of credibility to everything else he said.

I read that this way. He wanted badly to predict that Upstart would hit $1B in revenue this year. That certainly would have been a “bold” prediction, as current estimate is $600 million. However he correctly decided that he had no right to blindside the CEO and CFO with “guidance” like that. It wasn’t his place. So he wisely bowed out and said he preferred to not give a bold prediction. Absolutely the right thing to do!

Best,

Saul

38 Likes

Has anyone visited Upstart’s website lately?

https://www.upstart.com/for-banks/credit-decision-api/

“ Use your existing application process with Upstart’s AI-powered Credit Decision API.
Our model supports real-time decisions, APRs, and AANs for your auto loan, personal loan, or private student loan program.”

Whoa. Are they expanding or at least data collecting for student loans now too? Is anyone aware of any press release? I thought they only just started expanding into auto only from personal loans.

https://info.upstart.com/lending-pandemic-report

Can anyone get a copy of the COVID report? It says page not found when I try. (That being said, I have a feeling the KBRA data I posted previously in this forum probably says the same about their outperformance.)

“ One Year Later:
AI Underwriting & Portfolio Performance Through COVID
How consumer loans originated before the pandemic performed through a year of economic uncertainty.
Impairment across AI-based loans increased 40% less than the industry as a whole.
An AI model’s “Risk Tier” was 6 times more effective than credit score bands at separating the risk of payment impairment.
Fewer borrowers with AI enabled loans required a hardship program, and more of these borrowers began promptly making on-time payments.
An AI model’s separated risk tiers translated into significantly lower payment impairment rates for bank and credit union partners.”

https://www.upstart.com/for-banks/podcasts/

I have been listening to their podcasts as they get released (the acclaimed Paul Gu interview was episode 5) but am behind by several. Will try to post later any highlights I discover for each episode.

22 Likes

Hi Saul,

I am looking at the analysts’ average estimate for next FY (’22) revenue, and it is sitting at ~$798m with the high estimate at $839m. The average is just ~33% above UPST’s guidance of $600m, and there are some thoughts as to whether they will do around that number this FY. The $798m for FY22 seems extremely off… 160%+ growth to ~33%? The analysts seem asleep at the wheel unless there is something we are missing. Anyway, as this revises higher in the coming quarters it should provide a significant boost to the share price. Seems like a significant opportunity here.
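A quick check of the growth math, using only the figures quoted above. The prior-year revenue is not stated in the post, so it is backed out from the "160%+" growth figure and is only an implied number.

```python
def growth(previous, current):
    """Year-over-year growth as a fraction."""
    return current / previous - 1


guidance_fy21 = 600   # $m, company guidance quoted in the post
estimate_fy22 = 798   # $m, average analyst estimate quoted in the post

fy22_growth = growth(guidance_fy21, estimate_fy22)
print(f"implied FY22 growth: {fy22_growth:.0%}")

# Prior-year revenue implied by 160% growth to $600m (derived, not quoted).
implied_fy20 = guidance_fy21 / (1 + 1.60)
print(f"implied FY20 revenue: ${implied_fy20:.0f}m")
```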

Long UPST
Bnh

2 Likes

I listened to all available podcasts and youtube interviews with UPST management (CEO Dave Girouard, cofounder Paul Gu, senior VP business dev Jeff Keltner, product manager Alex Rouse, data scientist John Lewis, VP auto lending Val Gui). The biggest takeaway from watching all of these: UPST has extremely competent management, and they really know how to execute execute execute.

I further encourage everyone to watch the two interviews below. I highlighted some timestamps:

https://youtu.be/kRLzQ5b6en8
Ep11: interview with Sam Sidhu, CEO elect of Customers Bank

This whole interview is a must-watch. (The audience watching are potential bank partners.)

19:20 we had slowed down to 20-30M a month originations in middle of last year (due to covid), ended the year 50-75M a month, and now we’re glad to be doing 100M per month with Upstart

22:00 Upstart’s AI/ML for verification drives the verification and KYC costs down which makes lending much more economical and profitable. The instant approval simultaneously drives demand/conversion rates (increases value proposition to bank partners)

24:40 “Why do you need 1600 variables? Aren’t FICO and one or two other variables ‘good enough’?”
“We found you need to get over 100 variables just to get to half of the explanatory power of our model…every little variable by itself is not super important even if the credit score itself is removed, many of the 1600 variables are saying similar things as they’re related…but it’s the ones that are related that are actually saying slightly different things and understanding how that reflects a difference in creditworthiness that gives you uplifts. To take advantage of this you need a combination of three things that no one really has done: rows of data, the 1600 columns of variables, and the algorithms. We know the truly creditworthy subprime borrower is there, when you realize when 20% of the pool defaults that sounds awful until you realize then that means 80% paid you back! It’s about finding the 80%. With $10 billion originated we now have the hundreds of thousands of rows for the 1600 columns of variables. Now you can’t use your old logistic regression that goes to a scorecard and prints on a 5-page PDF to take advantage. You need extremely sophisticated algorithms…and we even found some high credit score borrowers actually represent high degrees of risk…the sum of all those variables adding a little something together ends up making a really tremendous difference in understanding the creditworthiness of a specific consumer.”

29:00 Upstart increases value for bank partners via frictionless lending to a new customer, which brings new cross-selling opportunities for the bank (opportunities for repeat lending/opening bank deposits in the future)

32:40 in depth talk about upstart’s covid outperformance

44:35 we kept fraud below 30bps across history of entire platform (0.30%)

47:20 extremely important talk about Upstart’s effective fair lending. This is what banks want to see for compliance

53:10 Upstart’s NPS of 80+ is massive versus traditional banks below 20.

https://youtu.be/nDmbkf5eUYM
Ep14, interview with Val Gui, VP Automotive Lending at Upstart

22:00 when we started auto refi the conversion rate wasn’t where we thought it would be, but once we worked with borrowers and got feedback on removing unanticipated friction, our instant approval rate increased to 50% (This is going from zero customers to 6 months later)

31:50 Upstart will be creating and leading the market for auto refi (auto refi is currently less than 5% of the entire auto lending market); the demand is there, but the market has not yet been created to meet that demand

37:18 what’s a bold prediction you have for the future? **Oh man…there’s one but I don’t know if I can actually say it because it’s about Upstart…** I don’t know if as a public company I can say this and not get in trouble [just like with Paul Gu’s interview!]

53 Likes

…when you realize when 20% of the pool defaults that sounds awful until you realize then that means 80% paid you back!

I think this quote summarizes UPST’s current advantage very well. It’s also the exact opposite approach of many traditional models. While most banks avoid these borrowers for fear of the 20% defaults, UPST has embraced the challenge of finding the 80% most likely to pay the loans back even at the higher interest rates common for this part of the lending pool. The borrower also benefits from 1) being approved at all and 2) receiving rates that might be high but still lower than alternatives like say credit cards or payday lenders.

If UPST’s model truly identifies those most likely to pay in this underserved segment - and signs point in that direction so far - there is plenty of runway for the company to keep growing at these very strong rates.

11 Likes

“ Use your existing application process with Upstart’s AI-powered Credit Decision API.
Our model supports real-time decisions, APRs, and AANs for your auto loan, personal loan, or private student loan program.”

Wow, jonwayne, nice find! This not only means that they have expanded into student loans too, but that they are already using NXTsoft’s APP, that they just announced last week, to integrate with banks (as I read it anyway). Boy, these guys move FAST!

Saul

34 Likes

This private student loan program may be less tangible than we might like it to be. Upstart has a blog page on student loans, but the two case stories are people who used the personal lending product to pay off credit card debt, while they had significant student loan debt to pay off as well. But Upstart still categorized these two case studies as student loans.

Private student loan lending is a big market, certainly Upstart could enter into it, but I don’t know if it is anything beyond the indirect issue raised in their blogs.

https://www.upstart.com/blog/tag/student-loans

Tinker

39 Likes

One of my larger takeaways was that they plan to expand into as many verticals as they can, as fast as they can, all at once. They are a tech company first that is in the loan space because that is where they feel they can make the biggest difference. This is going to be fun to watch!

2 Likes

Response from IR regarding student lending:

Hi Eric,

You are correct, our focus is personal lending and we are moving into auto lending. We are not underwriting direct student loans. Thanks for your question and please let me know if there is anything else I can help with.

Regards,
Jason Schmidt
VP, Investor Relations
Upstart

35 Likes