Introducing Confluent (CFLT)

nitinkhanna · September 19, 2021, 12:01pm

Confluent (CFLT) and “data in motion”

This is my first post on TMF and on this board. I have been a lurker for about 10 months and have benefited hugely from the philosophy and the content of this board.

I was a software engineer for 22 years. I retired last year and am now a full-time student pursuing a master’s in clinical counseling. In the last few years of my career, I was heavily involved in moving existing applications to the cloud and writing cloud-native applications.

I have seen Confluent mentioned on this board a few times. It has great numbers, but I think many members don’t understand what Confluent does, what ‘data in motion’ is, how important it is for companies, and how Confluent differs from database companies like Oracle and MongoDB.

Here is my attempt to explain what they do, and how important they are to companies that run in the cloud.

Let’s take the example of a retailer like ‘Pier 1 Imports’. Their website used to be one giant ‘monolith’ application. Everything an e-commerce website needs, like UI, Orders, Shopping-cart, Customers, Products, Inventory and all the business logic, was packed into that one application.
What’s the problem with this architecture?
Scalability: If for Black Friday I want to increase resources for UI and Products by five times to handle the traffic, I have to increase resources for the entire application by five times. That means the entire cloud bill goes up roughly five times.
Development time: Since a huge team of developers is required to manage this application, a lot of time is wasted on communication and coordination. Developers end up stepping on each other’s toes.
Time to market: Rolling out a new feature or a bug-fix is hard because everything ships together. Whenever one feature is ready, some other feature is still under construction, and the release has to wait.
Technological restrictions: If some feature or part of the application could be written better in a specific language, it’s tough luck! Since it’s one application, everything generally has to be written in one language.

Cloud-native architecture, or microservices architecture, addresses all of these issues. It does so by splitting every feature into its own service. So in the case of ‘Pier 1 Imports’, UI, Orders, Shopping-cart, Products etc. will each have their own service and their own database. This addresses the issues above, since an independent service can be written in its own technology/language, can have its own small developer team that releases on its own schedule, and can be scaled independently.
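To put rough numbers on the Black Friday scaling point, here is a small sketch. The hourly costs are made up purely for illustration (every part costs 10 units at normal load); they are not Pier 1 or cloud-vendor figures.

```python
# Hypothetical hourly cost of each part of the site at normal load.
# These numbers are invented for illustration only.
BASELINE_COST = {
    "ui": 10,
    "products": 10,
    "orders": 10,
    "shopping_cart": 10,
    "customers": 10,
    "inventory": 10,
}


def monolith_cost(scale_factor):
    """A monolith scales as one unit, so every part is multiplied."""
    return sum(BASELINE_COST.values()) * scale_factor


def microservices_cost(scale_factors):
    """Each service scales on its own; services not listed stay at 1x."""
    return sum(
        cost * scale_factors.get(name, 1)
        for name, cost in BASELINE_COST.items()
    )


# Black Friday: only UI and Products need five times the resources.
black_friday_monolith = monolith_cost(5)
black_friday_services = microservices_cost({"ui": 5, "products": 5})

print("monolith:", black_friday_monolith)
print("microservices:", black_friday_services)
```

With these assumed numbers the monolith bill is 300 units while the microservices bill is 140, because only two of the six services were scaled up.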

But do you see a new set of problems with this architecture?
With this new architecture, there are a lot of moving parts! When it was one application, it was easy for Orders to talk to the Packaging, Inventory and Analytics modules after each order was placed. Now service-to-service communication is more complex, because each service exists separately.
There are two main ways services communicate with one another in the new cloud-native architecture:

API: This is when two services talk to each other directly. It is perfect when a real-time response is needed, but the problem is that if one service is down, it can create havoc in the system. For example, suppose the Analytics service is down and Order wants to let Analytics know that it sold two tables. Order will get stuck waiting on Analytics, which could mean the order won’t go through. Future orders might not go through either, since Order is still waiting on Analytics to respond.
Publisher/Subscriber Messaging Queue Mechanism: This is an ‘almost-real-time’ model rather than a strictly real-time one. In this mechanism, a Publisher is an entity that has important information to release, like the Order service when an order is placed, or the Inventory service when it wants to publish an event because a product’s inventory has gone below a critical mark. A Subscriber is an entity that is interested in a particular event; for example, the Inventory service would be interested in every order that is placed. The Publisher does not wait for Subscribers. If a Subscriber goes down, upon restarting it knows what it read last and picks up from there. This broadcasting, subscribing, mailbox, messages, and queueing is done beautifully, easily, reliably and with extensive scalability by an open source project named Kafka.
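The "it knows what it read last" part is the key idea, so here is a toy sketch of it. This is not the Kafka client API; the `Topic` class and its `publish` and `poll` methods are hypothetical names for an in-memory model. Real Kafka adds partitions, replication, consumer groups and durable storage on top of the same basic idea of an append-only log plus a per-subscriber read position.

```python
class Topic:
    """Toy stand-in for a Kafka topic: an append-only list of messages."""

    def __init__(self):
        self.messages = []
        # subscriber name -> index of the next message it has not read yet
        self.offsets = {}

    def publish(self, message):
        # The publisher only appends; it never waits for any subscriber.
        self.messages.append(message)

    def poll(self, subscriber):
        # Return everything this subscriber has not seen, then advance
        # its offset so the next poll starts where this one ended.
        start = self.offsets.get(subscriber, 0)
        unread = self.messages[start:]
        self.offsets[subscriber] = len(self.messages)
        return unread


orders = Topic()

orders.publish("order-1: 2 tables")
seen_by_analytics = orders.poll("analytics")  # Analytics reads order-1

# Analytics goes down here. Order keeps publishing without getting stuck.
orders.publish("order-2: 1 lamp")
orders.publish("order-3: 4 chairs")

# Analytics restarts and resumes right after the last message it read.
after_restart = orders.poll("analytics")

# Inventory is a separate subscriber with its own offset, so it sees all three.
inventory_view = orders.poll("inventory")

print(seen_by_analytics)
print(after_restart)
print(inventory_view)
```

Compare this with the direct API case: there, Order would have been blocked while Analytics was down. Here the messages simply wait in the log until Analytics comes back.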

Let’s see this in an example: If a customer on the Pier 1 website adds 2 tables to the shopping cart, the Shopping-cart service broadcasts that message. The Inventory service has subscribed to the message, so upon getting it, it puts 2 tables on hold in inventory. When the customer hits the buy button, the Order service broadcasts its own message, and the Inventory service moves those 2 tables from ‘hold’ to ‘sold’ and might publish a message of its own about inventory depletion for the tables. The Packaging service and Transportation service have also subscribed to the table-sale message, so that they can run their algorithms. There would be scores of messages created and spread around for each order. This queueing mechanism becomes the spinal cord of the system, and its importance, necessity and usage keep growing by the day. I have used alternative queueing mechanisms like MSMQ (by Microsoft), RabbitMQ, JMS etc., but in my experience none of them comes close to Kafka.
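The two-tables flow above can be sketched in a few lines. Everything here is hypothetical and for illustration only: the `Bus` class, the topic names, the starting stock of 10 tables and the low-stock threshold. The toy bus also calls subscribers immediately and in-process, whereas Kafka stores the messages and lets each service read them at its own pace.

```python
from collections import defaultdict


class Bus:
    """Toy in-process message bus: topic name -> list of subscriber callbacks."""

    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.handlers[topic].append(handler)

    def publish(self, topic, event):
        # Deliver the event to every subscriber of this topic.
        for handler in list(self.handlers[topic]):
            handler(event)


bus = Bus()
inventory = {"table": {"available": 10, "hold": 0, "sold": 0}}  # assumed stock
packaging_jobs = []
low_stock_alerts = []
LOW_STOCK_THRESHOLD = 8  # hypothetical critical mark


def hold_stock(event):
    # Inventory service: item added to a cart, so put it on hold.
    item = inventory[event["product"]]
    item["available"] -= event["qty"]
    item["hold"] += event["qty"]


def mark_sold(event):
    # Inventory service: order placed, so move stock from 'hold' to 'sold'
    # and publish its own event if stock is at or below the critical mark.
    item = inventory[event["product"]]
    item["hold"] -= event["qty"]
    item["sold"] += event["qty"]
    if item["available"] <= LOW_STOCK_THRESHOLD:
        bus.publish(
            "inventory.low",
            {"product": event["product"], "available": item["available"]},
        )


bus.subscribe("cart.item_added", hold_stock)
bus.subscribe("order.placed", mark_sold)
bus.subscribe("order.placed", packaging_jobs.append)  # Packaging service
bus.subscribe("inventory.low", low_stock_alerts.append)

# Customer adds 2 tables to the cart, then hits the buy button.
bus.publish("cart.item_added", {"product": "table", "qty": 2})
bus.publish("order.placed", {"product": "table", "qty": 2})

print(inventory)
print(packaging_jobs)
print(low_stock_alerts)
```

Notice that the Shopping-cart and Order sides only publish. They do not know or care which services are listening, which is why new subscribers (say, Transportation) can be added without touching them.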

Confluent was founded by the original creators of Kafka. Confluent offers Kafka as a managed cloud service and has added a bunch of bells and whistles that make development easier and more manageable.

I hope this was helpful.

Nitin (Long CFLT 15%)