Hi everyone! I haven’t posted much beyond a few stats analyses based on the Oomph factor, but I wanted to add my two cents here on data analysis, the cloud, and AYX.
For background, I have a PhD in social science and work as a quantitative researcher at a Fortune 100 company. Most of my day is spent performing quantitative analysis for the company.
Where AYX fits:
I’ve never used AYX, and no one on my team uses it. Almost all of us use R, Stata, or Python. That said, we are not typical of the people doing data analysis; in fact, we are a small fraction of the people doing data analysis at the company. Most of my team has PhDs, and coding in R, Stata, or Python is the standard.
I spent about an hour this morning playing with the AYX demo and watching some videos. It is very clear to me who this product is aimed at: people who have done data analysis in Excel. While I thought it was pretty neat, I would likely never use it, and neither would the people on my team. That does not mean it is not valuable; it just means it is geared toward certain roles.
For one, it is only available on Windows. R and Python (discussed more below) are platform-agnostic, meaning I can send my code to anyone on any machine (Windows, Linux, Mac) and they can run it. To me, this suggests the market for AYX was never people like me who value that kind of portability and usability. It also suggests that AYX is not suited to anything in production (that is more Datadog’s wheelhouse).
Why AYX is so much better than Excel is pretty simple. First, Excel has hard limits on worksheet size: 1,048,576 rows and 16,384 columns. That may sound large, but it’s rather small in today’s data-rich world. We’ve all tried opening a large Excel file, and it is terrible.

Second, AYX does a much better job than Excel of organizing and recording an analysis. A huge issue with Excel is that there is no good way to reproduce an analysis: the series of clicks used to produce it is not recorded and stored by default. This is problematic for two reasons:
1) If new data comes in, say a new dataset, I would have to re-click everything, which opens up countless opportunities for error. Sure, there are macros and other workarounds, but they are not the default way of working in Excel and are not part of standard operating procedure. In AYX, recording the steps is simply how you build an analysis, which can drastically reduce errors and issues.
2) It is really hard to share an Excel analysis, and AYX makes that easy. I can send over an AYX file, and someone else with AYX can rerun the analysis. Team members can check each other’s work, reproduce analyses, and keep errors to a minimum.
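For comparison, here is what that “recorded analysis” idea looks like in a script. This is a minimal Python sketch, not anything from AYX: the steps live in one function, so when a new dataset arrives you rerun one call instead of re-clicking. The column names (`region`, `sales`) and the cleaning rules are hypothetical, chosen only for illustration.

```python
import csv
import io
from collections import defaultdict


def run_analysis(csv_text):
    """Recorded analysis steps: parse, clean, and total sales by region.

    The column names ("region", "sales") are hypothetical placeholders.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Step 1: skip rows with a missing sales value rather than
        # silently treating them as zero.
        if not row["sales"].strip():
            continue
        # Step 2: normalize region labels so "east" and "East " match.
        region = row["region"].strip().title()
        # Step 3: aggregate sales per region.
        totals[region] += float(row["sales"])
    return dict(sorted(totals.items()))


january = """region,sales
East,100
west,250
East ,50
West,
"""

february = """region,sales
East,120
West,80
north,40
"""

# Same steps, new data: one call each, no manual re-clicking.
print(run_analysis(january))   # {'East': 150.0, 'West': 250.0}
print(run_analysis(february))  # {'East': 120.0, 'North': 40.0, 'West': 80.0}
```

Because every step is written down, a teammate can read the function to check the work, and rerunning it on next month’s file applies exactly the same rules.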
So to me, AYX looks geared toward that type of worker, whether their title is data scientist, data analyst, business insights, marketing researcher, etc. This is a huge market! At most companies, more people do quantitative analysis in Excel than in R!
One anecdote: I know a manager in accounting at KPMG, and their group are pretty avid users of AYX. While they still do most of their work in Excel, there is a growing trend toward AYX. Again, the target is Excel users, and with those users, AYX looks like it is crushing it.
Python and R:
At the core, Python and R are programming languages. Both are open-source, free to use, and platform-agnostic. Because they are programming languages, users have to write code and run scripts to get insights from data, but that is also what makes them so powerful: users control every part of the analysis. Yes, there are many packages that people use, but anything a package does could be done by writing the code by hand. In fact, packages are really just pre-written code.
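As an illustration of “packages are just pre-written code,” here is a minimal sketch of simple (one-predictor) ordinary least squares written by hand in Python. It is not how any particular package implements regression internally; it just computes the textbook closed-form estimates directly.

```python
from statistics import fmean


def ols_by_hand(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Closed-form solution for a single predictor:
        slope     = sum((xi - x_bar) * (yi - y_bar)) / sum((xi - x_bar) ** 2)
        intercept = y_bar - slope * x_bar
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance; the slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = ols_by_hand([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```

On Python 3.10 or later, the standard library’s `statistics.linear_regression(x, y)` returns the same slope and intercept, which is the point: the library function is a packaged version of code you could write yourself.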
draj asked in another discussion, “Are the statistical packages available in Python adequate to address any new statistical modeling problem that needs to be addressed. Or is the universe so well defined that they have it covered for all practical purposes.”
I would say there are packages in R and Python that cover 98% of statistical modeling. More to the point, these are programming languages, so there is no statistical modeling procedure that could not be implemented in R or Python. It may take a while to write and forever to run, but they can handle everything.
Cloud:
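To illustrate writing a procedure from scratch without a dedicated package, here is a minimal sketch of a percentile bootstrap confidence interval for a mean in plain Python. The number of resamples, the seed, and the percentile method itself are choices made for this example; other bootstrap interval methods exist.

```python
import random
from statistics import fmean


def bootstrap_ci_mean(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean.

    Resamples the data with replacement, computes the mean of each
    resample, and returns the alpha/2 and 1 - alpha/2 percentiles of
    those bootstrap means (nearest-rank on the sorted means).
    """
    if not data:
        raise ValueError("data must be non-empty")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    rng = random.Random(seed)  # seeded so the result is reproducible
    n = len(data)
    boot_means = sorted(fmean(rng.choices(data, k=n)) for _ in range(n_resamples))
    # Map the two quantiles to valid indices in [0, n_resamples - 1].
    lo_idx = round((alpha / 2) * (n_resamples - 1))
    hi_idx = round((1 - alpha / 2) * (n_resamples - 1))
    return boot_means[lo_idx], boot_means[hi_idx]


print(bootstrap_ci_mean([2, 4, 4, 5, 7, 9, 10, 12]))
```

It is slow compared with a closed-form interval, but that is the trade-off described above: it may take forever to run, yet it can be written with nothing more than the language itself.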
I think some people have pretty glaring misunderstandings of what the cloud is, which makes comments like “we use MSFT Azure for analysis” really problematic. The cloud is simply a set of computers in a centralized location (a data center). These computers are more powerful than your personal computer, but they are just computers. In general, cloud computers are set up to do certain things and run certain programs. Some are set up to host websites and handle the traffic that comes to them (servers). Others are loaded with software for specific tasks, like running analyses (SQL, Python, R). A user can set up a cloud computer (AWS, Azure, etc.) to do whatever they want, including running programs specifically for data analysis, but the cloud itself is not doing the analysis per se.
Companies set up cloud computing for two reasons: security and performance. Cloud computers are much safer and much higher-performing than personal computers, which lets people work with sensitive information and gives them the computing power to conduct analyses. They are more secure in part because important data does not live on a user’s personal computer, which can get lost, stolen, etc.
I don’t know whether or how AYX works in the cloud. From the looks of it, it operates locally on a user’s computer. This means it is limited in some way,