Quality data is a prerequisite for building useful machine learning and AI models. By quality data I mean data that is representative, accurate, and abundant. Even once you have quality raw data, which is itself a big hurdle, there is still plenty of work to do: tailoring it for machine learning, then training and testing models against it to obtain useful results.
On the recent Q2 conference call, Snowflake’s Slootman stated some of these ideas well in response to the current AI excitement. I’m glad to see that he recognizes these important facts, which all enterprises, whatever their AI goals, need to understand and act on.
Slootman on the Q2 fiscal 2024 conference call:
“Generative AI is at the forefront of customer conversations. However, enterprises are also realizing that they cannot have an AI strategy without a data strategy to base it on. We have a head start in this race as the epicenter of highly curated, optimized, and trusted enterprise data.”
“…we were actually saying that having highly organized, optimized, trusted, sanctioned data is incredibly important for deploying large language models.”
“If you think you can just, you know, drop a model on top of a data lake and just, you know, see what happens, that’s not going to end well. And then, that’s what people are realizing.
So, [enterprises] really got to get super serious, you know, about their foundations, you know, before we – if you don’t have a good foundation, there’s not much you can build on top of that. There’s tons of governance issues involved as well. You know, we spent, you know, literally decades, you know, as an industry, you know, making data highly governance. In other words, who can have access to what.
So, that now needs to translate into the world of large language models as well. So, there’s tons of questions that are coming up that are really important for the enablement of language models and AI generally. So, being extremely organized on your data is going to become, you know, a premium thing. And we’re obviously – that’s – you know, we’ve been on that, but it’s become more important as a function of this.”
“a lot of these want to do more in the area of AI. But first, they need to get their data into Snowflake, and it’s going to be a journey for these people. It’s not going to happen overnight, AI, for our customers.”
“And hence, the emphasis on getting your data house in order because you just cannot unleash, you know, large language model and hope for the best because of all the issues that we’ve mentioned before around governance and just understanding of what kind of data we are generating in the process.”