Approaching Zero-ETL with FOSS

Leaving aside complexities like the Enterprise Integration Patterns, we can consider most integrations as a form of advanced ETL: Extract, Transform, and Load. We extract data from a data store or service. Then we transform it from an input to an output format. And finally we push, or load, that transformed data into some output channel. It is the ease of connecting to the input and output channels that makes ETL call for a proper integration framework.
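
As a sketch of the idea in Apache Camel's YAML DSL (the table, query, topic, and broker address below are illustrative placeholders, not a real setup), such a pipeline can be expressed as a single route:

```yaml
# Minimal ETL sketch in Camel YAML DSL; all endpoint details are placeholders.
- route:
    from:
      # Extract: periodically poll unprocessed rows from a relational table
      uri: "sql:SELECT * FROM orders WHERE processed = false"
      steps:
        # Transform: turn each row into JSON
        - marshal:
            json: {}
        # Load: push the transformed record to a message broker
        - to: "kafka:processed-orders?brokers=localhost:9092"
```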

Complex integrations will combine these three steps differently. But the outcome is always to move information from one place to another, connecting different systems, where the information may be a full dataset or just a triggered event.

I already tackled the issue of choosing the right integration tool from an engineer’s perspective and what variables to take into account. But when we are talking about data science and data analysis, there is a requirement that sits on top of all the previous ones: the accessibility and ease of use of the tool.

Zero-ETL

Also called No-Code ETL or Zero-Code ETL, Zero-ETL is the next-generation integration paradigm in which users can perform ETL seamlessly. In the traditional way of performing ETL, you needed a developer to write the implementation, even if the code used a simplified language like the ones offered by Apache Camel. And then you needed someone to deploy it.

While writing a workflow in a simplified language is a much easier task than writing it from scratch in Python or Java, it still requires certain skills, plus maintenance work afterwards.

Zero-ETL helps you focus on a Domain Driven Design approach. We switch the focus to the conceptual data you are going to use, not the technicalities. Data will be moved from one system to the next without worrying about intermediate steps like transformation and cleaning. You will connect to those systems without really caring what technology lies behind them.

Zero-ETL is defined differently depending on who you ask. In summary, Zero-ETL differs from classic ETL in that the interactions between the different data warehouses and data lakes are transparent to the user. You can mix data coming from different sources without really worrying about where the data is coming from and what format it has.

The wonders of Data Mesh

Some cloud platforms like Amazon and Google are selling basic Zero-ETL capabilities that transfer data in near real time from one of their services to another, sometimes also transforming the data transparently, like offering a NoSQL-to-SQL translation.

Let’s forget for a moment that this has already been possible with the proper tools in place for a long time, and focus on how they all conveniently forget that there is a whole universe outside their offerings; how they are trying to force a dependency on their platforms. They are purposely ignoring the hybrid cloud, which is what most of us are working with: a cloud composed of several providers, different services, and protocols. A cloud in which a transparent, seamless integration would require providers to speak the same language. Data Mesh across providers is a reality, and the Zero-ETL offered by most cloud platforms does not cover that use case properly.

If you are a Data Driven organization, you already know that Data Science has gone through several trends in the past decades. We used to have a lot of buzz about big data. Crowdsourcing. Interoperability. At some point we started talking about data warehouses, data lakes, data ponds, water gardens,…

Is it possible to achieve a Zero-ETL in a hybrid cloud world?

Data Mesh across providers

Once we change the perspective of how we view data, we can make it closer to a product. It makes sense to store each data domain in its own data store and provider, the one that suits it best. Then we need to worry about how we are going to build our applications and services using that data. This paradigm usually comes with Event Driven architectures and data streams.

Sometimes we will have duplicated data in different formats and schemas, to offer it in shapes more suitable for each domain. We have to carefully consider how and when to synchronize the different data stores to avoid inconsistencies.

With this change of perspective, there is also a switch in how we approach data mesh. Instead of seeing data as something that can be ingested into our software, we now also have data being served over common protocols. Protocols that need to be easily discoverable. Instead of centralized pipelines that distribute data changes, we now publish events as streams.

Staying ahead of the curve in Zero-ETL

There are many ways to perform Zero-ETL without tying yourself to a specific provider. The easiest way is by using Free and Open Source Software in your software stack.

We will probably need to combine several clusters, but we will still want to use our hybrid cloud as one single cloud. We can trust Skupper to do that work for us, as shown in the following video.
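
As a rough sketch of what that involves (the kubectl contexts and file paths here are made up), linking two clusters into one Skupper network from the command line looks something like this:

```sh
# On the first cluster: create a Skupper site and a token
# that other sites can use to link to it.
kubectl config use-context on-prem
skupper init
skupper token create ~/on-prem.token

# On the second cluster: create a site and link it to the
# first one using that token.
kubectl config use-context public-cloud
skupper init
skupper link create ~/on-prem.token
```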

Distributed event streaming platforms like Kafka can offer us a decentralized solution for connecting different services and data. Shifting to federated data storage with event-driven streams of changes requires careful synchronization, which Kafka can help with.
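
For instance (the topic, brokers, and target endpoint are hypothetical), a Camel route in YAML DSL can subscribe to a topic of change events and propagate each event to another data store:

```yaml
# Sketch: consume change events from Kafka and apply them to another store.
# The topic, brokers, and MongoDB connection details are hypothetical.
- route:
    from:
      uri: "kafka:customer-changes?brokers=broker-1:9092"
      steps:
        # Each record is saved into the target collection
        - to: "mongodb:myMongoClient?database=crm&collection=customers&operation=save"
```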

Camel K will help us seamlessly deploy and manage the integrations in a Kubernetes-like cluster. Installed as an operator, it will deploy and monitor the middleware integrations needed for your specific use cases and make them serverless when possible.
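
A sketch of that workflow with the `kamel` CLI (the file name is a placeholder):

```sh
# Install the Camel K operator into the current namespace
kamel install

# Run an integration; the operator builds, deploys, and monitors it
kamel run my-etl-route.yaml

# Follow the logs of the running integration
kamel log my-etl-route
```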

The user interface

And last but not least, you need some low code or no code editor to build your ETL. That’s what Kaoto is for. Kaoto is a multi-DSL flow editor that allows you to create and deploy integrations in a visual way. Kaoto supports Apache Camel and works seamlessly with Camel K to deploy the generated workflows.

Building no-code Zero-ETL with Kaoto: a five-step flow in the visual editor.

You can test Kaoto using the official demo instance. This instance does not allow deployment, but it lets users see how flow design works.

With all these pieces of software, you can build a strong stack for your data science and data analysis without generating any kind of dependency on particular providers.

No Code Integrations

Integration is mostly about being everywhere and nowhere at once: interoperability without a clunky user interface or spaghetti code. Can we get no code or low code integrations?

In this article we will explore how to do no code and low code integrations based on Apache Camel.

“Every paradigm including data flow, programming by example, and programming through analogies brings its own set of affordances and obstacles.”

Alexander Repenning – DOI reference number: 10.18293/VLSS2017-010

We are going to use Kaoto, which just made its 1.0.0 release. In this release, the Kaoto team has focused on the no-code graphical canvas to make sure the user experience is as smooth as possible.

Apache Camel

As we have discussed previously on this blog, Apache Camel is an integration framework. This means Camel will help you orchestrate and compose your services in a decoupled way.

Camel has its own Domain Specific Language (DSL); Camel translates a simple route written in that DSL into code you can run.
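
For instance, here is a minimal route in the YAML flavour of the DSL; the timer name, period, and message are arbitrary:

```yaml
# A minimal Camel route: a timer fires every five seconds and logs a message.
- route:
    from:
      uri: "timer:hello"
      parameters:
        period: "5000"
      steps:
        - setBody:
            constant: "Hello from Camel"
        - to: "log:info"
```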

Simplified diagram of Apache Camel converting DSL into integrations

Camel Quarkus

Whether we decide to write our Camel routes in Java, JavaScript, Groovy, YAML, XML,… in Camel they all get translated into Java code for deployment. While this is usually not a problem by itself, it is true that Java sometimes has a bigger memory footprint than would be needed.

But we can overcome this with the Quarkus build of Apache Camel. Camel Quarkus looks exactly the same from the user’s or developer’s perspective, but it brings all the developer joy of native builds and deployments to the integration world. This is done with the Quarkus extensions for Apache Camel.
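
As a sketch, assuming a Maven project generated with the Camel Quarkus extensions, the developer workflow looks something like this:

```sh
# Development mode with live reload, running on the JVM
./mvnw quarkus:dev

# Package a regular JVM build
./mvnw package

# Build a native executable (requires GraalVM, or a container-based build)
./mvnw package -Dnative
```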

Simplified diagram of Apache Camel Quarkus: the Quarkus version of Camel lets you deploy using both native and JVM runtimes

Camel K and Kamelets

But Camel offers much more than just route integrations. We also have Kamelets, which aim to simplify our integration definitions. These Kamelets, or Camel route snippets, act as meta-steps in an integration, hiding complexity behind otherwise pretty simple orchestrations.
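
A sketch of what that looks like: this hypothetical KameletBinding connects the timer-source Kamelet to a Kafka topic without exposing any of the route logic hidden inside the Kamelet (the name, topic, and brokers are made up):

```yaml
# Sketch of a KameletBinding: wire a Kamelet source to a sink
# without writing the underlying Camel route yourself.
apiVersion: camel.apache.org/v1alpha1
kind: KameletBinding
metadata:
  name: timer-to-kafka
spec:
  source:
    ref:
      kind: Kamelet
      apiVersion: camel.apache.org/v1alpha1
      name: timer-source
    properties:
      message: "Hello from a Kamelet"
  sink:
    uri: "kafka:my-topic?brokers=broker-1:9092"
```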

We also have Camel K to help with our integration management. It is a lightweight integration framework that runs natively on Kubernetes, specifically designed for serverless and microservice architectures.

Simplified diagram of how Camel K wraps Apache Camel to provide Kubernetes-native deployments

No Code Integrations with Kaoto

The issue of making integration frameworks accessible and easy to use is not new. There have been many different approaches to the same problem.

This is why we decided to try a new approach and create Kaoto as a way of building no code and low code integrations.

The obvious first step for us, as the diagram below shows, was to put some kind of visual editor at the beginning of the previously described workflows, one that would allow people to integrate without writing a single line of code.

Simplified diagram of how a low code/no code editor like Kaoto generates the source code graphically

The primitives or building blocks of Kaoto are usually steps in the Apache Camel integration DSL: Kamelets, Camel Components, or Enterprise Integration Patterns.
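
To make that concrete, here is a hedged sketch of a route combining all three kinds of primitives: a Kamelet as the source, the choice Enterprise Integration Pattern, and the log Camel component as sinks (the message and predicate are made up):

```yaml
# Sketch combining the three kinds of Kaoto building blocks:
# a Kamelet source, an EIP (choice), and Camel components as sinks.
- route:
    from:
      uri: "kamelet:timer-source"
      parameters:
        message: "ping"
      steps:
        - choice:
            when:
              - simple: "${body} == 'ping'"
                steps:
                  - to: "log:pings"
            otherwise:
              steps:
                - to: "log:everything-else"
```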

We wanted Kaoto to have a simple user interface that provides building blocks with drag-and-drop manipulation. But we also wanted users to be able to manipulate the source code, to be as transparent as possible about what they are building.

“Well, this is all fine and well, but the problem with visual programming languages is that you can’t have more than 50 visual primitives on the screen at the same time. How are you going to write an operating system?”

Peter Deutsch

What does Low Code look like?

Low code allows the user to view and interact with the source code that will be deployed. At the same time, the user can focus on the features being implemented without really knowing how to write the source code. The source code may act as an adjacent add-on, or as a guide that helps new users get familiar with the concepts.

Good low code editors will also show some kind of visual aid to help understand what the code is implementing. In our case, Kaoto has a visual workflow showing how the components are connected to each other.

Example of low code integrations.

In the above example, the user starts by writing part of the code in an empty template. Once the user stops writing, Kaoto fills the gaps to make it valid source code. The user can also drag and drop on the graphical side to build the integration, while the source code gets updated with the changes.

What does No Code look like?

In a no code solution, there is no need to interact with, or even see, the source code at any point. The user can focus on bringing integration capabilities to their architecture without worrying about implementation details.

How to build an integration with no code

You can see in this video how the user builds the integration and deploys it just by using the graphical space: dragging and dropping steps, selecting steps from a catalog, setting up the configuration properties in HTML forms, and clicking the deploy button.

You can see full examples of how to use Kaoto in the workshop section, which you can test using any of the quickstart options.

Bungee jumping into Quarkus: blindfolded but happy

A year ago, together with a couple of friends, I started a new project based on Quarkus to create a visual editor for integrations called Kaoto.

As the person responsible for the backend side, I obviously chose Java for it. Coming from the Java 8 world with shy traces of Java 11, I decided to jump directly to Quarkus on Java 17 (unstable at the time) with Reactive, and to explore the serverless possibilities while keeping the over-engineering and over-fanciness of new features as reasonable as possible.

In this article I will discuss the good and the bad of this experience. I am not a Quarkus developer; I am a developer who used Quarkus. And like any average developer starting with a new technology, I obviously skipped the documentation and just bungee jumped into the framework, blindfolded and without safety nets.

Continue reading “Bungee jumping into Quarkus: blindfolded but happy”