Thoughts on AYX, Data Analysis, & the Cloud

Hi everyone! Haven’t posted much outside of a few stats analyses based on the Oomph factor, but wanted to add my two cents here regarding data analysis, the cloud, and AYX.

For background, I have a PhD in social science and work as a quantitative researcher at a Fortune 100 company. Most of my day is spent performing quantitative analysis for the company.

Where AYX fits:
I’ve never used AYX and no one on my team uses AYX. Almost all of us use R, Stata, or Python. That being said, we are not typical of the people who do data analysis. In fact, we represent a small portion of the people doing data analysis for the company. Most of my team has PhDs, and R, Stata, or Python is the standard.

I spent about an hour this morning actually playing with the AYX demo and watching some videos. It is very clear to me who this product is oriented towards. It is 100% aimed at people who have done data analysis using Excel in the past. While I thought it was pretty neat, I would likely never use it, nor would people on my team. That does not mean it is not valuable; it just means it is geared towards certain roles.

For one, it is only available on Windows. R and python (discussed more below) are platform agnostic, meaning I can send the code to anyone on any machine (Windows, Linux, Mac) and they can run it. To me this suggests the market for AYX was never people like me who value this feature and usability. This also suggests that AYX is not suitable for anything in production (this is more of DataDog’s wheelhouse).

Why AYX is so much better than Excel is pretty simple. For one, Excel has hard limits on the number of rows and columns. I did a quick search and it looks like it’s around 1,000,000 rows and 16,000 columns. That may sound large, but it’s rather small in today’s data-rich world. We’ve all tried to open a large Excel file and it is terrible. Second, what AYX does much better than Excel is organize and record an analysis. A huge issue with Excel is that there is no good way to reproduce an analysis. The series of clicks that one must use is not recorded and stored by default. This is problematic for two reasons: 1) If new data comes in, say a new dataset, I would have to re-click everything, opening up countless opportunities for error. Sure, there are macros and ways around this, but they are not part of the standard operating procedure in Excel. AYX builds this into the way you construct an analysis, which can drastically reduce the number of errors and issues. 2) It is really hard to share these things, and AYX makes that easy. I can send over an AYX file and someone else with AYX can run the analysis. Team members can check others’ work, reproduce analyses, and ensure that errors are limited.
So to me, it looks like AYX is geared towards that type of worker, whether their title is data scientist, data analyst, business insights, marketing researcher, etc. This is a huge market! There are more people at most companies who do quantitative analyses in Excel than in R!
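
To make the reproducibility point concrete, here is a minimal Python sketch of a "recorded" analysis: the steps live in one function, so rerunning them on a new dataset is a single call rather than a series of clicks. The column names, the sample data, and the function name `summarize_sales` are all made up for illustration; this is not how AYX itself works internally. The row limit used for comparison is Excel's worksheet maximum of 1,048,576 rows.

```python
import csv
import io
from statistics import mean

# Excel's worksheet row limit, used only to flag datasets that would not fit.
EXCEL_MAX_ROWS = 1_048_576


def summarize_sales(csv_text):
    """Run the same recorded steps on any dataset with 'region' and 'sales' columns."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    by_region = {}
    for row in rows:
        by_region.setdefault(row["region"], []).append(float(row["sales"]))
    return {
        "n_rows": len(rows),
        # +1 accounts for the header row a spreadsheet would also need.
        "fits_in_excel": len(rows) + 1 <= EXCEL_MAX_ROWS,
        "mean_sales": {region: mean(values) for region, values in sorted(by_region.items())},
    }


# Hypothetical data: one file, then a new file that arrives later.
january = "region,sales\nEast,100\nWest,200\nEast,300\n"
february = "region,sales\nEast,150\nWest,250\n"

print(summarize_sales(january))
print(summarize_sales(february))
```

When the February file arrives, nothing has to be re-clicked: the same function runs on it, and a colleague who receives the script can rerun and check both results.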

One anecdote: I know a manager in accounting at KPMG, and they are pretty avid users of AYX. While they still do most work in Excel, there is a growing trend towards AYX. Again, the target is Excel users, and with those users it looks like AYX is crushing it.

Python and R:
At the core, Python and R are programming languages. They are both open-source and free for users. They are also platform agnostic. Because they are programming languages, users have to write code and run scripts in order to obtain insights from data. For the same reason, they are also very powerful, as they give users control over every part of the code. Yes, there are many packages that people use, but everything in a package can be done by hand-writing code. In fact, all the packages used are really just pre-written code.

draj asked in another discussion, “Are the statistical packages available in Python adequate to address any new statistical modeling problem that needs to be addressed. Or is the universe so well defined that they have it covered for all practical purposes.”

I would say there are packages in R and Python that address 98% of statistical modeling. However, these are programming languages and there doesn’t exist a statistical modeling procedure that could not be conducted in R or Python. It may take a while to write and forever to run, but they can handle everything.
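
As a small illustration of the claim that packages are really just pre-written code, here is a sketch of simple linear regression written by hand in Python using only the standard library. The function name `ols_fit` and the data are hypothetical; a statistics package would wrap the same arithmetic behind a single call.

```python
from statistics import mean


def ols_fit(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares, written by hand."""
    x_bar, y_bar = mean(xs), mean(ys)
    # slope = covariance(x, y) / variance(x); the shared 1/(n-1) factors cancel.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

intercept, slope = ols_fit(xs, ys)
print(intercept, slope)
```

A more exotic procedure would take more lines and might run slowly, but it would be built the same way, from ordinary code.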

Cloud:
I think some people have some pretty glaring misunderstandings of what the cloud is, which makes comments like, “we use MSFT Azure for analysis” really problematic. The cloud is simply a set of computers in a centralized location (data center). These computers are more powerful than your personal computer, but they are just computers. In general cloud computers are set up to do certain things and run certain programs. Some computers are set up to host websites and the traffic that comes to the website (servers). Others are loaded with software programs that do certain things, like running analyses (SQL, Python, R). A user can set up a cloud computer (AWS, Azure, etc.) to do whatever they want, including running programs specifically for data analysis, but the cloud is not doing that per se.

Companies set up cloud computing for two reasons: safety and performance. Cloud computers are much safer and much higher performance than personal computers, allowing people to work with sensitive information and have the computing power to conduct analyses. They are also more secure because important data does not exist on a user’s personal computer, which can get lost, stolen, etc.

I don’t know whether or how AYX works on the cloud. From the looks of it, it operates locally on a user’s computer. This means it is limited in some way,

109 Likes

BMA,

AYX has a stated goal of building a bridge between data scientists and data analysts.

The reason enterprises like it is that they can use low/no-code solutions and multiply their capabilities versus hunting for avid R Studio geeks such as us.

I very much get it.

Honestly, leading a team, it makes hiring and projects much easier if you don’t have the bottleneck of a barrier to learning code.

It’s a fundamental shift where the data scientist of the future isn’t a programmer who knows how to grab and transform and analyze the data…likely with a heavy IT background and a light operations/business background…but instead the future is a person with significant business understanding of how they use data to make decisions today (i.e. “I look at this system then I go to this system then finally I pull up this pdf and based on all that I do this”) and empowering that person with the tools to connect the dots.

In other words, which is more scalable for unlocking business intelligence:

A bunch of person As
80% computer science knowledge + 20% business understanding

Or a bunch of person Bs
20% computer application knowledge + 80% business understanding, but with tools leveraged to build out the analysis of person A?

You have a treasured skill set and Alteryx is trying to build towards you, if not hire you, but what they really want is to empower others in your organization to scale your value across the company.

Just a Foolish Data Geek in the Energy Industry

52 Likes

These computers are more powerful than your personal computer, but they are just computers. In general cloud computers are set up to do certain things and run certain programs. Some computers are set up to host websites and the traffic that comes to the website (servers). Others are loaded with software programs that do certain things, like running analyses (SQL, Python, R). A user can set up a cloud computer (AWS, Azure, etc.) to do whatever they want, including running programs specifically for data analysis, but the cloud is not doing that per se.

This is correct, but misleading. While there are boxes in a data center that one might think of as computers, each box is just a piece of compute. A server, which in the old days was a computer, may be part or all of a box, or it may be scattered across many boxes in the server center. It also may be all of the above in any given hour, depending on the load on the server.

This is what makes virtualization so powerful and economic. Moreover, if your server needs to pass data, and a lot of it, to another server and it is in the same server farm, it might be able to pass that data through a virtual router and never leave the physical boxes. This can dramatically increase throughput.

Virtualization and cloud computing are no longer just renting a box in a server farm; they are much more economical than that and much more robust.

In fact, your server can exist on two different continents at the same time and, depending on the task assigned, might even work on a task on two different continents at the same time.

Cheers
Qazulight

16 Likes

the future is a person with significant business understanding of how they use data to make decisions today (ie “I look at this system then I go to this system then finally I pull up this pdf and based on all that I do this”) and empowering that person with the tools to connect the dots.

I think fundamentally, there will always be two camps of people and we will never convince each other. One camp, like myself, will always believe that data scientists’ main job is to build models to make predictions and to take people out of the decision-making process. We therefore need control and flexibility to build, test, and train our models so as to make the best predictions possible. As such, the ability to code is a prerequisite for becoming a good data scientist.

The other camp will always see data scientists as expensive people who can “make sense” of the data. They will always see business knowledge and the ability to extract, manipulate, summarize, and visualize the data as the main responsibilities of data scientists. For people who hold this view, coding presents a barrier to those who want to become good data scientists.

If you are worried about AYX’s competitiveness in the world of predictive modeling and AI, you need not worry, because only 1% of those who use data in their work need to build predictive models. So arguing about whether AYX is better for predictive modeling than other tools is irrelevant when there is still a huge unmet demand for a good tool among people who need to make use of their data but don’t need to build any models.

19 Likes

What I see is that any data scientist worth his britches needs to work with the less data-skilled line worker in order to create and refine models that work in the real world.

I run my own shop with customers from CEOs to barely above minimum wage. Everything I do flows from top to bottom and back again and economics requires that what I learn at the bottom be learned at the top as well.

Alteryx has a product that enables this. You can use R and Python and yet still send your work to the marketing analyst using Alteryx. I don’t know any other product that enables this.

Democratization of practical knowledge is the force that has moved history forward, and data will not get locked at the top. Alteryx has a platform that enables those at the top to work as they please while keeping those at the bottom in the loop, and enabling those at the bottom to tell those at the top what they are missing on a practical basis.

If not, let us know of another product that enables this.

Thanks.

Tinker

36 Likes

You can use R and Python and yet still send your work to the marketing analyst using Alteryx. I don’t know any other product that enables this.

There is a tool for that - it is called Python.

Python is a programming language. AYX is built with a programming language (my guess is C++, but I don’t know). Therefore, if you can program, you can replicate anything that AYX can do. Obviously, your results will not be as smooth, nice, beautiful, or easy to use as AYX. But to accomplish something as simple as sending my work to someone else takes zero effort with Python.

4 Likes

A very interesting thread! As investors, I think we should not get bogged down by the technical details of programming languages vs. applications like Excel and AYX but should concentrate on their market potential. For reference, I don’t have a PhD but I have been writing code and using Excel for decades. Simply put, where Excel can solve my problem, I will not write code. Where Excel gets too clunky I will write code, but often I use Excel to model the solution before writing that code.

FoolsGrad wrote:

Python is a programming language. AYX is built by a programming language (my guess is C++, but I don’t know). Therefore, if you can do programming, then you can replicate anything that AYX can do.

My current language of choice is PHP because now I write code exclusively for the web. I searched for libraries of financial functions and most of the ones I found were not in PHP. At some point Microsoft open sourced phpExcel. I was very familiar with Excel functions so this was a perfect fit for me. No need to replicate Excel. You can slice and dice any way you want but that is NOT the point. The point for investors is: how big is this market? How easy is it to use this product? Complexity is the enemy of adoption.

My WAG is that the market for AYX is two or three orders of magnitude larger than the market for R, Stata, and python combined.

Denny Schlesinger

BTW, I did play around with Python but Python would not play nice with PHP (at least, I could not get it to play nice with PHP) so I gave up Python.

38 Likes

Hi everyone! Haven’t posted much… but wanted to add my two-cents here regarding data analysis, the cloud, and AYX.

Hi, BWA, you certainly started an interesting and very useful thread. Combined with muji’s addition to the older thread, it has been a great discussion.

Thanks so much for sharing, and thanks to everyone else who has contributed.

Saul

13 Likes

If you own AYX, I highly recommend listening to the Needham conference transcript. One comment toward the end really stuck out to me. CEO Stoecker was talking about culture and how they are really focused on it, because it is the only thing they can see stopping them.

They clearly see a large TAM with AYX as the leader, and don’t see many competitive threats right now.

CrowdStrike owners should listen to that transcript as well. CRWD is very confident about their competitive position.

To listen to the transcripts, go to the investor relations page of each company for a link.

Jim

17 Likes

Great discussion.

I’m a Data Architect working in the Enterprise Analytics group of a mid-to-large sized health system. I spend about a third of my time building data marts and adding new data sources to our enterprise data warehouse, a third building dashboard visualizations, and a third building machine learning models for predictive analytics.

While my team does not use AYX, a great number of downstream business analysts find it absolutely indispensable. We are often approached by a team wanting to do a study or analysis on a certain patient population, and the data in the EDW gets them about 95% of the way there. Oftentimes, the team also wants to include data that is stored outside of the electronic medical record in registries specific to the condition or disease they want to study. These registries could be SQL based, document db, Access, even Excel. It has not been, is not, and never will be appropriate to import these smaller registries into the EDW, but an AYX user can easily join, blend, and study an extract from the EDW with their own data. THAT is the holy-grail type of analysis that was so time and talent intensive in the past and is now available to almost anyone with the desire to do it.
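
For readers who think in code, the "join and blend" step described above looks roughly like the Python sketch below, using only the standard library. Everything in it is a stand-in: an in-memory SQLite table plays the EDW extract, a small CSV string plays a registry kept outside the warehouse, and the table, column names, and values are invented. An AYX user would do the equivalent by wiring up tools visually rather than writing this.

```python
import csv
import io
import sqlite3

# Stand-in for an extract from the enterprise data warehouse (hypothetical schema).
edw = sqlite3.connect(":memory:")
edw.execute("CREATE TABLE encounters (patient_id TEXT, a1c REAL)")
edw.executemany(
    "INSERT INTO encounters VALUES (?, ?)",
    [("P1", 7.2), ("P2", 6.1), ("P3", 8.4)],
)

# Stand-in for a small registry kept outside the warehouse, e.g. exported as CSV.
registry_csv = "patient_id,enrolled_in_program\nP1,yes\nP3,no\n"
registry = {row["patient_id"]: row for row in csv.DictReader(io.StringIO(registry_csv))}

# Blend: keep warehouse rows that also appear in the registry (an inner join).
blended = [
    {
        "patient_id": pid,
        "a1c": a1c,
        "enrolled_in_program": registry[pid]["enrolled_in_program"],
    }
    for pid, a1c in edw.execute(
        "SELECT patient_id, a1c FROM encounters ORDER BY patient_id"
    )
    if pid in registry
]

for row in blended:
    print(row)
```

The registry never has to be loaded into the warehouse; the two sources only meet in the analyst's own workflow.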

We see more and more teams using AYX every day in our org and the analysts seem to be spreading the word.

89 Likes

Good posts.

to take people out of the decision-making process.

Or simply to give another way to look at information. Humans have many biases and can handle only a few inputs. Increasing inputs past a limited number leads to more confidence but not better results. OTOH computers know only what you feed them: GIGO. So some tasks lend themselves to machine decisions and some do not. But having additional input from a good AI is bound to help; I wish I had one to help with my portfolio planning.

And then there is the ability to code. I am not a programmer, but I expect skill levels at programming vary a lot, and there will never be that many superstars available. There will be lots more who can do a little simple Python if needed. It sounds like the users of Alteryx will not be the PhDs from good universities, but those will always be in the minority inside companies. And small companies probably can’t afford many of them.

9 Likes