Talend Rough Draft

Here is a rough draft of an article I’ve been working on profiling Talend. Hope this sheds a little more light on the company. I’d love to hear what you guys think.

Very best,


Talend: Big Data Disruptor?

Big data is all the rage. In fact, 82% of organizations are in the process of adopting, or are planning to adopt, big data technologies. Dealing with millions of data points stored in multiple locations and in different formats is extremely difficult, but with the help of Talend (NASDAQ: TLND), companies can go “from zero to big data without coding in under 10 minutes.” Let’s see if this big data player is worthy of your investment dollars.

Big Introduction
The proliferation of data in the past few years has created bottlenecks for companies trying to make the best use of it. Today, data primarily comes from three places: on-site databases, the cloud (just huge warehouses of servers that allow companies like Amazon (NASDAQ: AMZN) to sell extra computing power and storage), and machines such as cars or internet-connected devices. With data coming from all these places, it becomes difficult to aggregate and analyze.

This is where data integrators like Talend come into play. The company offers an open source software platform that lets businesses move data where it needs to go in order to make the best decisions possible. Talend is not just another data integrator, though; it is on the cutting edge of the big data and cloud movements. Legacy data integration providers like Informatica and IBM (NYSE: IBM) have not transitioned well to the big data and cloud spaces. Talend’s big data and cloud businesses, on the other hand, have grown more than 100% YoY in each of the past 9 quarters. This is why Talend’s revenue growth is actually accelerating. In 2015, revenue grew 21% to $76 million, and in 2016 revenue soared 39% to $106 million. When you see a company’s revenue growth accelerate, find out why.
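For readers who want to check the acceleration claim, here is a quick back-of-the-envelope sketch. It uses only the revenue figures quoted above; the implied 2014 revenue is derived from them and is not a reported number.

```python
# Revenue figures (in $ millions) as quoted in the article.
revenue = {2015: 76.0, 2016: 106.0}
growth_2015 = 0.21  # stated 2015 growth rate

# Back out the implied 2014 revenue from the stated 2015 growth (derived, not reported).
implied_2014 = revenue[2015] / (1 + growth_2015)

# Recompute 2016 growth directly from the two revenue figures.
growth_2016 = revenue[2016] / revenue[2015] - 1

# Acceleration is the change in the growth rate, in percentage points.
acceleration_pts = (growth_2016 - growth_2015) * 100

print(f"Implied 2014 revenue: ${implied_2014:.1f}M")
print(f"2016 growth: {growth_2016:.1%}")
print(f"Acceleration: {acceleration_pts:.1f} percentage points")
```

The recomputed 2016 growth rate lands close to the 39% quoted, so the two revenue figures and the stated growth rates are consistent with each other.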

Big Differentiation
Talend has a few differentiators from legacy data integration players. To be clear, Talend is not a top dog in the traditional data integration space, but it is quickly stealing market share in the big data and cloud integration portions of the overall market. These two segments, as stated, have grown over 100% YoY in each of the last 9 quarters versus the industry’s 22% growth for big data and cloud integration. Three main reasons explain why Talend is doing so well in this space.

First, Talend is made to run on Hadoop (just think of Hadoop as a way to store and analyze an insane amount of data). Talend ran a study and boasts that it can run at least 5x faster than the current big data analysis software. This makes a huge difference when petabytes of data (1 petabyte is 1,000 terabytes) need to be analyzed.

Second, Talend is open source software, so thousands of developers have enhanced the platform. Talend has millions of downloads of its Open Studio, the open source version of its products. This will lower sales and marketing costs in the future because developers come across the open source products and try them out, then upgrade to paid versions. Talend’s dollar-based expansion rate, a proxy for upsells, has stayed above 120% for the last 12 quarters, hitting 125% in the latest. Plus, Talend is compatible with over 1,000 different data sources (e.g., applications like salesforce.com or different databases). Because it is open source, there is a real network effect that strengthens the ecosystem and integration capabilities.
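To make the expansion-rate figure concrete, here is a minimal sketch of one common way such a metric is computed: revenue today from the customers you had a year ago, divided by what those same customers paid a year ago. Talend’s exact definition may differ, and the customer names and dollar amounts below are hypothetical, picked only so the result matches the 125% mentioned above.

```python
def dollar_based_expansion_rate(prior, current):
    """Current revenue from the prior-period cohort divided by that cohort's prior revenue.

    Customers who churned count as 0 today; customers added since then are excluded.
    """
    prior_total = sum(prior.values())
    current_total = sum(current.get(customer, 0.0) for customer in prior)
    return current_total / prior_total

# Hypothetical annual subscription revenue per customer, in $ thousands.
prior = {"A": 100.0, "B": 50.0, "C": 50.0}
current = {"A": 150.0, "B": 60.0, "C": 40.0, "D": 80.0}  # D is new, so it is ignored

rate = dollar_based_expansion_rate(prior, current)
print(f"Dollar-based expansion rate: {rate:.0%}")
```

A rate above 100% means the existing customer base is spending more than it did a year earlier, even after downgrades like customer C in this example.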

Third, Talend’s business model is very different from that of typical providers. Talend charges based on the number of developers using the software rather than the amount of data running through the system. This is not a trivial distinction. Companies have complained that they should not be charged more simply because, naturally, more data points flow through the same integration job over time. Talend agrees, and this has resulted in a 60% increase in the number of customers spending over $100,000. Talend now boasts 260 of these clients out of a total of over 1,600.
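A rough sketch of why the pricing distinction matters as data volumes grow. The prices and the two helper functions below are hypothetical and chosen only for illustration; they are not Talend’s or any competitor’s actual pricing.

```python
# Hypothetical prices, for illustration only (not actual vendor pricing).
PRICE_PER_DEVELOPER = 10_000  # annual cost per developer seat, in dollars
PRICE_PER_TERABYTE = 500      # annual cost per terabyte processed, in dollars

def seat_based_cost(developers: int) -> int:
    """Cost depends only on how many developers use the tool."""
    return developers * PRICE_PER_DEVELOPER

def volume_based_cost(terabytes: int) -> int:
    """Cost scales with the amount of data flowing through the system."""
    return terabytes * PRICE_PER_TERABYTE

developers = 5
for terabytes in (10, 100, 1_000):
    print(
        f"{terabytes:>5} TB | per-developer: ${seat_based_cost(developers):>9,}"
        f" | per-volume: ${volume_based_cost(terabytes):>9,}"
    )
```

Under these made-up numbers the per-developer bill stays flat while the per-volume bill grows a hundredfold, which is the complaint the article describes.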

Big Disruptor
The data integration space is always changing and innovation is rampant. Talend is not a shoe-in for grabbing more market share since it still needs to stay cutting edge. However, the company has proven to be a formidable foe to legacy data integrators. Talend looks, specifically, to be a disruptor in the big data and cloud segments of this industry. As revenue accelerates, you might want to hop on this data train. It is just picking up speed.


Not related to this article, but I see that you picked Baozun (BZUN) in Caps, TMFish. Do you know of any well-written articles or posts on Baozun you could point me to?

Being “the Shopify of China” perks my ears up a bit in putting BZUN on my watchlist.

Thanks for a great summary. Some questions:

Do you believe the study that Talend commissioned is valid?

What is the TAM for Hadoop-based data integration? How do we know that Talend won’t saturate the market within a year or two and then growth will abruptly stop?

What is the competition doing to improve their own Hadoop-based offerings?


Hi Ryan,

A well-written and clear article.

One question: Is there some reason they are so successful besides using Hadoop (some secret sauce)? What’s stopping someone else from using Hadoop to compete with them? There must be a reason.

One suggestion: In your disruptor conclusion, you might consider using a quote from the CEO like “Our win rates remain ridiculously high!”

One tiny correction: It’s “shoo-in” not “shoe-in”.

Best, and have a great summer working at the Fool.



Great article. Thank you for the sneak peek. You teased the audience by referencing TLND’s competition. Naming their competition and stating if they are publicly traded would enhance the value of your already excellent article.


Well, I’m pretty sure Saul is not going to like it. He’s not into Chinese stocks, last I read.
I actually bought a small position.
We will see how it works out…
Seeking Alpha has a write-up about it being a possible multibagger.
This thread is about Talend, so please kindly start a new thread about BZUN in the future.

Thanks Saul! So in terms of Hadoop, Talend’s code runs natively on it, meaning the code was made specifically for the parallel computing that Hadoop offers. This is why it can run so much faster. Most other providers have scrambled together big data operations and have configured their existing code to run on Hadoop. My understanding is that if another data integrator wanted to show speeds comparable to Talend’s, they would have to revamp their whole Hadoop code base. This seems implausible and is part of the reason why Talend has gained so much traction in this space.


So in terms of Hadoop, Talend’s code runs natively on it, meaning the code was made specifically for the parallel computing that Hadoop offers. This is why it can run so much faster.

Doesn’t Talend support Spark as well?
If so, isn’t that where the speed gains are from?

Great questions, Smorgasbord. The TAM for Hadoop-based integration is a bit tricky to uncover. My estimate, based on reading some reports, is $5-7 billion by 2022. Management said the big data and cloud portions are growing at a 22% CAGR, so it seems to be a growing market, which makes me think market saturation is not very likely. Plus, I believe more and more businesses will start leveraging this technology once they see the value. We don’t know anything for sure, but just looking at the numbers and the trends, they seem to be in Talend’s favor. The competition is difficult to decode as well. Informatica was taken private, so it is hard to tell what it is doing, but I find it hard to believe PE owners are doing what is best for the business in the long run. IBM is suffering all around. Oracle and SAP are lagging as well. I know Informatica is trying to run on Hadoop, but as the study showed (admittedly slightly Talend-biased), Talend currently has the upper hand. Thanks for the questions, all very important.
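For anyone who wants to sanity-check the TAM numbers, here is a small sketch of what a 22% CAGR compounds to, and what starting market size a 5-7 billion figure for 2022 would imply. The base year of 2017 is my assumption (it is not stated above), and the figures are treated as billions of dollars.

```python
CAGR = 0.22         # growth rate cited by management
BASE_YEAR = 2017    # assumption: the starting year is not given in the thread
TARGET_YEAR = 2022
years = TARGET_YEAR - BASE_YEAR

def implied_base(target_size: float, rate: float, n: int) -> float:
    """Market size n years earlier that compounds to target_size at the given rate."""
    return target_size / (1 + rate) ** n

# How much a market multiplies over the period at the cited CAGR.
growth_multiple = (1 + CAGR) ** years
print(f"Growth multiple over {years} years: {growth_multiple:.2f}x")

for target in (5.0, 7.0):  # estimated 2022 market size, in $ billions
    base = implied_base(target, CAGR, years)
    print(f"${target:.0f}B in {TARGET_YEAR} implies about ${base:.2f}B in {BASE_YEAR}")
```

Changing `BASE_YEAR` shifts the implied starting size, so treat the output as a rough consistency check rather than a market estimate.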


Exactly. Spark is the fastest version of MapReduce, the analysis portion of Hadoop. Since Talend runs natively on Hadoop, it can easily support Spark.

…developers come across the open source products and try them out, then upgrade to paid versions.

This may be true in some smallish shops, but if it’s a small shop, they probably don’t have that much “big data” to deal with.

In a large shop, an IT developer who downloads a tool like this and starts using it (or brings it to work from home because downloads are blocked at the shop) would most likely be fired, no questions asked. Where I worked (very big IT shop) there was a rigorous procedure for bringing in open source products. Any new tool had to be isolated and thoroughly tested before it would be released. And release was closely controlled and only ramped up slowly if no problems were found.

It was understood that a piece of software behaving badly could shut down the entire operation; if some of that bad behaviour made it into production, it could literally shut down the business. Even tried-and-true vendor products were tested and controlled; open source was just considered much more dangerous.

I’m not sure how Talend brings new customers into their fold, but I really doubt that it’s by developers bringing it into the shop and playing around with it (when, in their spare time? What spare time?) and then convincing management to go buy it because it’s really cool.

Oracle and IBM are not doing well because both of them have tailored their data integration products in order to optimise sourcing and loading their own DBMS products. They’ve always viewed data integration as a means of selling more DBMS. With an open source target like Hadoop and sources from everywhere (including a lot of “unstructured data” - note, I hate that phrase because it’s so inappropriate - as well as streaming real time data) the tools from IBM and Oracle basically suck.

Spark is the fastest version of MapReduce, the analysis portion of Hadoop.

Spark and MapReduce are both execution engines; one is not a faster version of the other. Your data is set up differently when using Spark than when using MapReduce. MapReduce jobs are typically set up for batch operation, whereas Spark is set up for streaming (although Storm proponents would argue that it’s really just a bunch of small batch jobs, not a true record-by-record stream, but we’re getting into the weeds here). And we shouldn’t forget that someone who wants to migrate their data to Spark has some work to do first.
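To illustrate the batch versus small-batch versus record-by-record distinction, here is a toy sketch in plain Python. It does not use the Spark, Storm, or MapReduce APIs at all; the `micro_batches` helper and the sample records are made up purely to show the three processing styles on the same data.

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Yield lists of up to batch_size records from an iterable stream."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

records = range(1, 11)  # stand-in for incoming data points

# Batch style: wait for all the data, then compute one result.
batch_total = sum(records)

# Micro-batch style: update the result once per small batch.
running = 0
micro_totals = []
for batch in micro_batches(records, 3):
    running += sum(batch)
    micro_totals.append(running)

# Record-by-record style: update the result on every single record.
running = 0
record_totals = []
for record in records:
    running += record
    record_totals.append(running)

print(batch_total, micro_totals, len(record_totals))
```

All three end at the same total; what differs is how often an up-to-date answer is available along the way, which is the point of the argument between the streaming camps.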

Since Talend runs natively on Hadoop, it can easily support Spark.

I’m sure Talend supports Spark, but I’m equally sure that didn’t come for free just because they were running natively on Hadoop - at least if they wanted to get the speed that Spark promises and not just use some backwards compatibility mode.

I do believe that Talend’s code generation has been updated to take advantage of Spark.

Here is Talend’s own response to the criticism leveled against their performance test versus Informatica: https://www.talend.com/blog/2016/01/14/talends-benchmark-aga…

• The benchmark used a “two year old version of Informatica”. This is mostly true.

• The benchmark compares Informatica using MapReduce to Talend using Spark. True. Informatica’s latest available version at the time only supported Hive (which runs on top of MapReduce), so we used that.

This kind of goes hand in hand with them using an old version of Informatica (like I pointed out earlier, they found a small window where they had updated but Informatica’s new version was a month or so away), and not re-running it since.

Now, all that said, IF you have your data set up in Spark instead of Hadoop, or if you’re willing to make that transformation, then I will believe that it’s easier/faster in Talend to generate Spark-compatible processing than it is in Informatica to run on their Spark-compatible engine. This follows from Talend’s architectural choice of generating code rather than generating a file for a proprietary engine to consume. A good thing for Talend.

But, I don’t believe it’s correct to state that Talend had an easy time of supporting Spark because it ran natively on Hadoop, nor that Spark is just a faster version of MapReduce. I think it’s that Talend’s code generation model enabled them to make it easy for their customers to run on Spark.

The Big Data world is still moving quickly, with new ways of working and new tools coming on board all the time. Even though I’m technically inclined, this isn’t my area of expertise at all and so I’m cautious about making claims unless I’ve not only seen them, but understood what went into the preparation.


Oh, and here’s Informatica’s response to Talend’s performance claims: https://blogs.informatica.com/2015/12/22/top-three-reasons-l…

From these tests it is quite apparent that Informatica Blaze runs on an average 11 to 20 times faster than MapReduce (even more benefits and also 2 to 3 times faster than Spark).

It’s a dirty war out there…