Approaching Zero-ETL with FOSS

Leaving aside complexities like the Enterprise Integration Patterns, we can consider most integrations as a form of advanced ETL: Extract, Transform, and Load. We extract data from a data store or service. Then we transform it from an input format to an output format. And finally we push, or load, that transformed data into some output channel. It is the need to connect easily with the input and output channels that makes ETL call for a proper integration framework.

Complex integrations will combine these three steps differently. But the outcome is always to move information from one place to another, connecting different systems. That information may be a full dataset or just a triggered event.
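As a rough illustration of the three steps, here is a minimal sketch in plain Python. The record fields (`first`, `last`, `email`) and the in-memory list used as the output channel are assumptions made up for this example; a real integration would read from and write to actual services.

```python
import json


def extract(raw_lines):
    """Extract: parse each non-empty input line as a JSON record."""
    return [json.loads(line) for line in raw_lines if line.strip()]


def transform(records):
    """Transform: map the input shape to the output shape."""
    return [
        {"full_name": f"{r['first']} {r['last']}", "email": r["email"].lower()}
        for r in records
    ]


def load(rows, sink):
    """Load: push the transformed rows into the output channel (a list here)."""
    sink.extend(rows)
    return len(rows)


# Hypothetical input data standing in for a data store or service.
source = [
    '{"first": "Ada", "last": "Lovelace", "email": "ADA@example.org"}',
    '{"first": "Alan", "last": "Turing", "email": "alan@example.org"}',
]
sink = []
loaded = load(transform(extract(source)), sink)
```

More complex integrations would chain or repeat these functions in a different order, but each piece stays this small.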

I already tackled the issue of choosing the right integration tool from an engineer’s perspective and what variables to take into account. But when we are talking about data science and data analysis, there is a requirement that sits on top of all the previous ones: the accessibility and ease of use of the tool.

Zero-ETL

Also called No-Code-ETL or Zero-Code-ETL, Zero-ETL is the next-generation integration paradigm in which users can perform ETL seamlessly. In the traditional way of performing ETL, you needed a developer to write the implementation, even if the code was in a simplified language like the ones offered by Apache Camel. And then you needed someone to deploy it.

While writing a workflow using a simplified language is a much easier task than having to write it from scratch in Python or Java, it requires certain skills and maintenance work afterwards.

Zero-ETL helps you focus on a Domain Driven Design approach. We switch the focus to the conceptual data you are going to use, and not the technicalities. Data will be moved from one system to the next without worrying about intermediate steps like transformation and cleaning. You will connect to those systems without really caring what technology lies behind them.

Zero-ETL is defined differently depending on who you ask. In summary, Zero-ETL differs from classic ETL in that the interaction between the different data warehouses and data lakes is transparent to the user. You can mix data coming from different sources without really worrying about where the data is coming from and what format it has.
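To make the idea of format transparency more concrete, here is a small sketch, not tied to any Zero-ETL product. The `READERS` table, the `read_records` helper, and the `id`/`name` fields are all hypothetical names chosen for this example.

```python
import csv
import io
import json


def read_csv(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def read_json(text):
    """Parse a JSON array of objects into a list of dictionaries."""
    return json.loads(text)


# Hypothetical registry: one reader per supported source format.
READERS = {"csv": read_csv, "json": read_json}


def read_records(sources):
    """Yield records in one common shape, hiding the format of each source."""
    for fmt, payload in sources:
        for record in READERS[fmt](payload):
            yield {"id": str(record["id"]), "name": record["name"]}


sources = [
    ("csv", "id,name\n1,Ada\n2,Alan\n"),
    ("json", '[{"id": 3, "name": "Grace"}]'),
]
records = list(read_records(sources))
```

The consumer of `records` never needs to know which entries came from CSV and which from JSON, which is the user-facing promise described above.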

The wonders of Data Mesh

Some cloud platforms like Amazon and Google are selling basic Zero-ETL capabilities that transfer data in near real time from one of their services to another, sometimes also transforming the data transparently, for example by offering a NoSQL-to-SQL translation.

Let’s forget for a moment that this has already been possible for a long time with the proper tools in place. And let’s focus on how they all conveniently forget that there is a whole universe outside their offerings; how they are trying to force a dependency on their platforms. They are purposely ignoring the hybrid cloud, which is what most of us are working with. A cloud composed of several providers, different services, and protocols. A cloud in which a transparent, seamless integration would require providers speaking the same language. Data Mesh across providers is a reality. And the Zero-ETL provided by most cloud platforms does not cover that use case properly.

If you are a Data Driven organization, you already know that Data Science has gone through several trends in the past decades. We used to have a lot of fuss about big data. Crowd sourcing. Interoperability. At some point we started talking about data warehouses, data lakes, data ponds, water gardens,…

Is it possible to achieve a Zero-ETL in a hybrid cloud world?

Data Mesh across providers

Once we change the perspective of how we view data, we can make it closer to a product. It makes sense to store each data domain in its own data store and provider, the one that suits it best. Then, we will need to worry about how we are going to build our applications and services using that data. This paradigm usually comes with Event Driven architectures and data streams.

Sometimes we will have duplicated data in different formats and schemas to offer it in shapes more suitable for each domain. We have to carefully consider how and when to synchronize the different data stores to avoid inconsistencies.

With this change of perspective, there is also a switch in how we approach data mesh. Instead of seeing data as something that can be ingested into our software, now we also have data being served over common protocols. Protocols that need to be easily discoverable. Instead of centralized pipelines that distribute data changes, we now publish events as streams.
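The shift from centralized pipelines to published streams can be sketched with a tiny in-memory example. This is only an illustration of the publish/subscribe idea: the `EventStream` class and its methods are hypothetical and are not the API of Kafka or any other platform.

```python
from collections import defaultdict


class EventStream:
    """Hypothetical in-memory stand-in for an event streaming platform."""

    def __init__(self):
        self._log = defaultdict(list)        # topic -> ordered list of events
        self._consumers = defaultdict(list)  # topic -> subscribed callbacks

    def publish(self, topic, event):
        """Append the event to the topic log and notify current consumers."""
        self._log[topic].append(event)
        for consume in self._consumers[topic]:
            consume(event)

    def subscribe(self, topic, consume):
        """Replay the existing log to the new consumer, then keep it updated."""
        for event in self._log[topic]:
            consume(event)
        self._consumers[topic].append(consume)


stream = EventStream()
orders_by_id = {}  # one domain keeps orders indexed by id
amounts = []       # another domain only cares about the amounts


def index_order(event):
    orders_by_id[event["id"]] = event


def track_amount(event):
    amounts.append(event["amount"])


stream.subscribe("orders", index_order)
stream.publish("orders", {"id": 1, "amount": 30})
stream.subscribe("orders", track_amount)  # late consumer gets the replay
stream.publish("orders", {"id": 2, "amount": 12})
```

Each domain keeps its own copy of the data in the shape it prefers, and the ordered log is what keeps those copies in sync.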

Staying ahead of the curve in Zero-ETL

There are many ways to perform Zero-ETL without tying yourself to a specific provider. The easiest way is by using Free and Open Source Software in your software stack.

We will probably need to combine several clusters. But we will still want to use our hybrid cloud as one single cloud. We can trust Skupper to do the work for us, as shown in the following video.

Distributed event streaming platforms like Kafka can offer us a decentralized solution for connecting different services and data. Shifting to federated data storage with event-driven streams of changes requires careful synchronization that Kafka can help with.

Camel K will help us seamlessly deploy and manage the integrations in a Kubernetes-like cluster. Installed as an operator, it will deploy and monitor the middleware integrations needed for your specific use cases and make them serverless, if possible.

The user interface

And last but not least, you need some low code or no code editor to build your ETL. That’s what Kaoto is for. Kaoto is a multi-DSL flow editor. It allows you to create and deploy integrations in a visual way. Kaoto supports Apache Camel and works seamlessly with Camel K to deploy the workflows generated.

Five-step flow built with Kaoto, a no-code editor that enables Zero-ETL.
Building no code Zero-ETL with Kaoto

You can test Kaoto using the official demo instance. This instance does not allow deployment, but lets users see how the design of flows works.

With all these pieces of software, you can build a strong stack for your data science and data analysis without generating any kind of dependency on particular providers.

No Code Integrations

Integration is mostly about being everywhere and nowhere at once: interoperability without a clunky user interface or spaghetti code. Can we get no code or low code integrations?

In this article we will explore how to do no code and low code integrations based on Apache Camel.

“Every paradigm including data flow, programming by example, and programming through analogies brings its own set of affordances and obstacles.”

Alexander Repenning – DOI reference number: 10.18293/VLSS2017-010

We are going to use Kaoto, which just made its 1.0.0 release. In this release, the Kaoto team has focused on the no-code graphical canvas to make sure the user experience is as smooth as possible.

Apache Camel

As we have discussed previously on this blog, Apache Camel is an integration framework. This means Camel will help you orchestrate and compose your services in a decoupled way.

Camel has its own Domain Specific Language (DSL) that translates a simple Camel route into code you can run.

Simplified diagram on how Apache Camel works
Simplified Diagram of Apache Camel converting DSL to integrations
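The idea of a DSL being turned into something you can run can be sketched in a few lines of plain Python. This is not Apache Camel's DSL or API; the `route` structure, the step names (`filter`, `uppercase`, `prefix`), and `compile_route` are all hypothetical and only show the general mechanism of translating a declarative description into executable steps.

```python
# Hypothetical step catalog: each entry builds a function over a list of messages.
STEP_BUILDERS = {
    "filter": lambda arg: (lambda msgs: [m for m in msgs if arg in m]),
    "uppercase": lambda arg: (lambda msgs: [m.upper() for m in msgs]),
    "prefix": lambda arg: (lambda msgs: [arg + m for m in msgs]),
}


def compile_route(route):
    """Translate a declarative route description into a runnable function."""
    steps = []
    for step in route["steps"]:
        # Each step is a one-entry mapping of step name to its argument.
        name, arg = next(iter(step.items()))
        steps.append(STEP_BUILDERS[name](arg))

    def run(messages):
        for step in steps:
            messages = step(messages)
        return messages

    return run


# The "DSL": a plain data structure describing what to do, not how.
route = {
    "steps": [
        {"filter": "order"},
        {"uppercase": None},
        {"prefix": "[out] "},
    ]
}

run = compile_route(route)
result = run(["order 1", "ping", "order 2"])
```

Because the route is just data, the same description could be written by hand or generated by a visual editor.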

Camel Quarkus

Whether we decide to write our Camel routes in Java, JavaScript, Groovy, YAML, XML,… in Camel they all get translated to Java code for deployment. While this is usually not a problem by itself, it is true that Java sometimes uses a bigger memory footprint than needed.

But we can overcome this with the Quarkus build for Apache Camel. Camel Quarkus looks exactly the same from the user or developer perspective. But it brings all the developer joy of native builds and deployments to the integration world. This is done with the Quarkus extensions on Apache Camel.

Simplified diagram on how Apache Camel Quarkus works
The Quarkus version of Camel allows you to deploy both using native and Quarkus runtimes

Camel K and Kamelets

But Camel offers much more than just route integrations. We also have Kamelets that aim to simplify our integration definitions. These kamelets, or camel route snippets, act as meta-steps in an integration, hiding complexities in otherwise pretty simple orchestrations.

We also have Camel K to help in our integration management. It is a lightweight integration framework that runs natively on Kubernetes. It has been specifically designed for serverless and microservice architectures.

Simplified diagram on how Apache Camel K works
Simplified diagram on how Camel K wraps Apache Camel to provide k-native deployments.

No Code Integrations with Kaoto

The issue of making integration frameworks accessible and easy to use is not new. There have been many different approaches to the same problem.

This is where we decided to try a new approach and create Kaoto as a way of creating no code and low code integrations.

The obvious first step on this diagram for us was some kind of Visual Editor at the beginning of the previously defined workflows, that would allow people to integrate without writing a single line of code.

Simplified diagram on how Kaoto works providing no code and low code integrations.
Simplified diagram on how a low code/no code editor works generating the source code graphically.

The primitives or building blocks of Kaoto are usually steps in the Apache Camel integration DSL: Kamelets, Camel Components, or Enterprise Integration Patterns.

We wanted Kaoto to have a simple user interface that provides building blocks with drag-and-drop manipulation. But we also wanted users to be able to manipulate the source code, to be as transparent as possible about what they are building.

“Well, this is all fine and well, but the problem with visual programming languages is that you can’t have more than 50 visual primitives on the screen at the same time. How are you going to write an operating system?”

Peter Deutsch

What does Low Code look like?

Low code allows the user to view and interact with the source code to deploy. At the same time, the user will be able to focus on the features being implemented without really knowing how to write the source code. The source code may look like an adjacent add-on or a guide to help new users get familiar with concepts.

Good low code editors will also show some kind of visual aid to understand what the code is implementing. In our case, Kaoto has a visual workflow showing how the components get connected to each other.

Example of low code integrations.

In the above example, the user starts by writing part of the code on an empty template. Once the user stops writing, Kaoto fills the gaps in the code to make it valid source code. Also, the user can drag and drop on the graphical side to build the integration, while the source code gets updated with the changes.

What does No Code look like?

In a no code solution, there is no need to interact with or even see the source code at any point. The user can focus on bringing integration capabilities to their architecture without worrying about implementation details.

How to build an integration with no code

You can see in this video how the user can build the integration and deploy it just by using the graphical space: dragging and dropping steps, selecting steps from a catalog, setting up the configuration properties in HTML forms, and clicking the deploy button.

You can find full examples of how to use Kaoto in the workshop section, which you can test using any of the quickstart options.

In the FOSS trenches

In this article I will discuss the current state of the art of Free, Libre and Open Source Software; what the challenges for its sustainability are; and how we can overcome them. Is the FOSS model outdated? What can we do to improve it?

After almost 20 years of being involved in the Free and Open Source Software (FOSS) community, and having gone through different associations and foundations, I would like to give my perspective on its sustainability. I have seen how companies move closer to and further from FOSS as they evolve, and how different FOSS entities have overcome challenges.

This is not a light matter and the contents of this article are not only opinionated, but a mere scratch on the surface. My intention here is merely to try to open a debate I feel is stagnant.

Where does the software come from?

Let’s cover the basics before digging into the topic. Bear with me, it will take only a few paragraphs to become more interesting.

Why does a software project, whether a FOSS or restricted licensed project, survive in the long run? The answer is simple: because there are people investing time and/or resources in it.

If those contributors only invest in the project in their free time, the bus factor (or lottery factor) risk is very high: that project is highly dependent on very specific individuals. If this (usually small) group of people contributing to the project decides to move on to another project, to stop contributing, or they just can’t afford to contribute anymore, that project may disappear, affecting who knows how many other software projects.

We could try to rely on public resources and the good will of people to contribute in their free time to the common good which is FOSS. But let’s be clear: we live in a capitalist society. This means that whatever is not sustained by capital will have a difficult time surviving in the long run unless it is covered by some basic human right. And even then.

Whether we like it or not, we live in a capitalist world in which if you don’t have enough capital to back you up, you risk disappearing.

Sustainability is key to our stack

When building software, we are not only interested in working on a sustainable project. We need to ensure the sustainability of the whole ecosystem around our project (dependencies, frameworks, libraries, operating systems, drivers,…). If one of those layers in our software stack fails, our entire software stack may fail. We don’t want to have to quickly replace some dependency because the random person that was maintaining it suddenly stopped.

Mandatory XKCD explanation on FOSS projects.

Our ideal paradigm would be an ecosystem in which all the software projects we depend on are maintained by a diverse, funded group of people, where there is widespread interest in that project continuing. We don’t want to have a Heartbleed situation.

Which leads us to the first problem for project sustainability: whoever invests more time or resources in a project is who decides how the project evolves.

Where does the sustainability come from?

Most of the software we use has one or more companies behind it, maintaining it. There are roughly four models in which a software company can get the resources to invest back in a project.

Two-column diagram (left column: investment; right column: profit):
Software Development ----sell---> customer
Software Development ----sell---> customers
Software Development ----I have the knowledge---> Consultancy ---offer support---> customers
Consultancy ---offer support---> customers
Simplified diagram of software economics

Custom software development: The company develops a software specifically tailored for a single customer. They invest once and sell it once. This sale can be either giving an executable, giving the source code, offering it as a service,… An example would be a webpage built from scratch.

Develop once, sell many times: The company develops a more generic software which can be sold to multiple customers. One common example could be an application like Adobe Photoshop.

Develop once, sell services: The company develops software that is customizable and leaves room for different consultancy formulas, from selling technical support to adding extensions. This opens up the possibility of selling to even more customers that need a more specific solution. For example, SAP.

Externalized development: The company doesn’t develop any software at all. They build their business around software created by a third party and focus on selling consultancy around that software. This company does not have intimate knowledge of the software they sell because they are not involved in its development. An example could be any of the companies (not Microsoft) that sell you support for your Windows.

How does FOSS fit here?

Free and Open Source Software (FOSS) got into this paradigm with a lot of idealism and utopian dreams about how software could be made: Let’s all work together contributing for the greater good, sharing experiences, source code, freedom,…

Rainbow colored kitten with a unicorn and butterfly wings.
The promise of the FOSS utopia

And all those promises of freedom kind of broke the way of selling software. Suddenly you were able to see the source code of others, modify it, redistribute it, and run it yourself.

Now you can share costs, as shown in the following figure. You don’t have to start from scratch to develop software. There is a consortium of entities, companies, or individuals involved in the development of said software.

Two-column diagram (left column: investment; right column: profit):
Software Development ----sell---> customers
Software Development ----I have the knowledge---> Consultancy ---offer support---> customers
Consultancy ---offer support---> customers
Simplified diagram of software economics with FOSS

The business model of selling custom software almost disappears. Right now almost all software in the market is based on or depends on FOSS.

Now, when you sell services around a piece of software, even if you are not involved in its development, you can still learn how that software works. You can have the expertise of the developer team without having to invest in development itself. The kind of services you can sell around third-party software has improved.

FOSS as the ultimate way of profiting

This selling services model is what makes FOSS so attractive to companies. The cost of starting a new business is dramatically reduced. You can base your business on something someone else maintains, without the risk of that third party disappearing and sinking your company. Because the source code is out there and you will never lose it.

And this is the business model that gets abused by malicious actors.

How do you do, fellow FOSS developers?

As costs are now shared, companies need to invest less to get the same profit, which attracts more business around FOSS. Slowly, people contributing to FOSS are no longer the idealists that want to contribute to the greater good. Now there are people that want to make a profit but don’t invest in FOSS. Because, why would they? The model works anyway. Their business works anyway. They can fill their pockets with money without having to invest in the software itself.

Are all non-contributors so bad?

Sometimes those non-contributors are an indirect force of good.

Imagine that you have a company in Europe, where all your customers are. And you maintain the software in collaboration with other companies spread around America and Europe.

If a company appears in, let’s say, Australia, looking for customers there, is it so bad, even if they don’t contribute? They are not “stealing” customers; those are customers you would never have found anyway. They are expanding the market and reaching new niches. Maybe they didn’t even say hello on your mailing lists. But your business will continue unaffected.

We can go further and say that, even if they don’t contribute directly, maybe their customers will contribute. Writing articles about your software, creating tutorials, helping with translations, reporting bugs,… Or maybe, as there is now a group of users of your software there, another company will start selling it and contribute back.

What is the problem then?

The main problem comes when the non-contributor is someone that can hurt your project sustainability. Maybe a big company, say for example Amazon, starts selling your product.

How do you do, fellow companies that make a lot of money around this software

As they are a big company, they can offer your product at prices below the cost of offering it, to starve competitors. The big company has the muscle to make a team of consultants appear out of nothing, with the expertise of having studied the available source code, offering a wide range of services without having to invest in the project itself.

This is a big problem, because the companies that are contributing to the project now have fewer customers, meaning they can invest less in the sustainability of the project itself. They can’t invest in new features as before, so the project becomes stagnant. There may be minor bugs accumulating because no one has the bandwidth to fix them. The project’s survival is at risk.

And the worst thing is: this big company doesn’t care if your project dies. They can fork it and continue with their legion of developers. Or they can just move to another similar project. No loss on their side, all gains.

The problem lies within

The problem is not that one company in particular does this. Today it may be Amazon, yesterday it was Apple, tomorrow it may be Google. Who knows. If I sit down with Jeff Bezos and I tell him: “Look, Jeff, pretty face, come here. Do you realize what you are doing? That you are killing this project by making it non sustainable? That you are hurting the common good? Please stop.” He wouldn’t care. Because it is profitable. And legal.

I too despised Elon Musk before it was mainstream

The problem is that in our society, in our economy, this kind of behavior is not only allowed, but incentivized. Exploiting a resource until you break it is profitable. And I may be able to convince Jeff to stop doing it. But there will be other companies that will take its place. Because it is profitable. What would be their motivation to stop doing it?

They don’t understand why we do FOSS. The only thing they see is that we are a naive gratis workforce that is doing all the investment for them to get profit.

What are we doing about it?

This is nothing new. We have been dealing with this for decades already. And there are several approaches to try to solve this. I am going to mention the ones I have found more common.

Post Open Source Software

This became very famous when GitHub started. Many of the new repositories didn’t have any kind of license attached at the time. Now there is a wizard on project creation to make you choose a license. But at the time, the source code uploaded had no explicit license. People wrongly believed that it was FOSS because it was published.

Some people even published the software without a license on purpose, as a way to rebel against people ignoring or working around FOSS licenses.

Long story short: this is a legal nightmare that comes from the misconception that there are no default licenses and laws associated with your work. Depending on the country where that code was created and the country from which you are trying to download and use it, the laws and restrictions over that software are different. Sometimes even contradictory. The code may be visible to everyone, but that’s the only certain thing about it.

A less extreme approach is the simpler licenses like beerware, in which you can use the software however you want with the only condition that, if you like it and meet the developer in person, you have to buy them a beer. Or the “Do what the fuck you want with the software”, which is literally that: do what the fuck you want with the software. A public domain in disguise.

Ethical Software

Some people tried to remove neutrality from software licenses to force the users to do good. But, well, how do you define good? If you are too detailed, that may not be very useful. But if you are too broad, it may be difficult to enforce.

Some licenses just copy the Human Rights Declaration as clauses. Which is something that any lawyer still can work around, but at least is an attempt. If you want to use that software, you must behave.

Open Core

This is a middle ground between FOSS and restricted licensed software. You make the core of your software FOSS, so anyone can use it. Meanwhile you, as a company (or a consortium of companies), offer a lot of restricted licensed extensions that no one else has, which is what keeps your users hooked. This allows for some community growth around your project while preventing other companies from just selling the same thing you sell without contributing.

But what prevents a big company, again, from taking that free core and, with its army of developers, building a lot of extensions that make your own seem like toys? So while you focus on developing the core and your extensions, that company just focuses on their extensions.

More strict licenses

The GPL license forces any vendor to redistribute the source code if they modify and redistribute (sell) the software. If you use a framework, library, or any kind of dependency that is GPL, you have to publish your software as GPL too.

This forces companies to share any kind of customization or enhancements that they make to the base software. They are forced to share and contribute somehow. This can be worked around too, because you are not forced to open a merge request to the original project; you can just fork it. But at least the source code will be in the open somewhere and someone can take it back to the original project.

The problem is that in a cloud world you no longer sell or redistribute the software. You offer it as a service. This means that you can have a GPL software modified and not make the code available because you are not distributing it, it stays on your server (or that’s how some people claim it works). In a cloud world, GPL is just not enough.

That’s why AGPL was created: if you offer something as a service, you have to make the source code available too. And not only your source code, but all the stack. Why? Because you can always wrap the source code around some other software that enhances it without having to modify the software itself.

And this scared a lot of people away. Whether this is a legitimate fear or not could fill another set of articles by itself.

Commons Clause

The software has a regular FOSS license with an extra clause: if you want to sell this software, you have to add something to it. Some feature, some customization. But just don’t sell it as is. If you want to use it, you are completely free to do it. But selling? No, if you are profiting you have to develop something.

This drives away companies that are not interested in developing around the software, while you attract companies that are interested in developing and contributing. On the other hand, you are also driving away smaller companies and startups that may be contributing later, but need to build some business only based on services before they can afford to invest.

Depending on how this commons clause is worded, it can be a very good or a very bad idea. It may have exceptions to allow smaller companies to participate, but then we are again adding ambiguity that lawyers can work around.

Customized licenses

I don’t like them much, because they require a full team of lawyers to make sure you are not making a mistake. If a license like Apache took years and an army of lawyers to make it strong and reliable, why do you think your small team of company lawyers can do it better?

There is a wide variety of customized licenses.

Sometimes they explicitly restrict that only companies from a certain consortium can sell or use the software. If you want to sell it, contact us first and discuss the fee.

The main drawback here is that you are scaring away other companies that may be interested in contributing. You are limiting how your community grows. You are introducing a risk to potential business. Will companies trust the consortium of companies enough to build a business around it? What if my company gets too big and the rest of the consortium wants to push me away or make me pay more fees? What if I become too big and I want to push the rest of the companies away?

Some other licenses just release the source code with some delay. The code you see in the repository or the code you can freely use may be two, three, four months old. This gives an advantage to the consortium of companies with access to the latest code over the companies that just take what is in the open.

But again, you are hurting potential contributors. What is the incentive to fix a bug or develop a new feature if the code I am working on may be too old and have conflicts with the latest source code? What if I am working on a bug fix that is already solved?

A change of paradigm

In the end, all these solutions are just a patch over the main issue here: our society is organized in a way that taking advantage of the common good is profitable.

Can we change this paradigm? Can we de-incentivize the capital profit and focus on sustainability instead?

Don’t change the license, change the paradigm

There are attempts to change to an Economy for the Common Good, de-prioritizing capital and trying to focus on the common good. You may make a lot of money by polluting a river with your chemical plant, but the consequences should make it unappealing to you. Or maybe we just need the Fully Automated Luxury Communism?

Technology and the Virtues

I want to finish the article with some hope. I read a book called Technology and the Virtues, by Shannon Vallor. She explores the need to cultivate a technomoral virtue, based both on historical philosophy and current technology trends.

This may not be the final solution, but I like the approach of trying to change society from inside as the way to stop malicious actors. It is a long book, but as a very rough summary, there is a list of virtues suggested as a force to change society. Some of them may sound already familiar to you.

Honesty

Respect the truth. Avoid information overload and fake news. Be transparent and respect other people’s privacy. Don’t mislead your customers when you tell them what you are selling. Don’t hide important information.

If vendors were really honest about their restricted licenses, very few people would use anything that was exploiting the common good and making software less sustainable.

Self-Control

Focus on what is really important and don’t consume like crazy. Maybe it is time to stop buying from companies that hurt the common good. Or stop using software that uses dark patterns or hurts some groups of people. Leave social networks that allow violence against groups of people. Or, as a developer, refrain from working at companies that exploit resources or pollute.

Humility

We don’t know everything and we should be cautious. That doesn’t mean stopping innovation, but thinking ahead about how our decisions impact others. There are unforeseen consequences, like AI algorithms that become racist or collecting too much data that is later breached.

I am sure the person who invented blockchain didn’t imagine how many scams would flourish around it and how much pollution mining those coins would make.

Justice

Don’t do things just because you can, but because it is the right thing to do.

You were so busy wondering if you could do it, you never stopped to ask if you should.

By the way, the moral of Jurassic Park is not this one, but “pay your developers right”.

Courage

The common good is the priority, even if that affects you negatively. This is a hard one to practice in our current world because you are sacrificing something you want in pursuit of something that may or may not benefit others. If you are the only one making the sacrifice, it is usually very discouraging. We need everyone to do their part.

Empathy

Put yourself in someone else’s shoes. Share feelings. We are humans, let’s acknowledge we all have emotions and we all do things that can hurt others, even if we don’t want to.

Care

Be emotionally vulnerable. We are all dependent on our environment and the people around us. And that goes both ways: they depend on us. You should care about the people around you the same way they should care about you. We have to help each other to avoid hurting one another and to evolve to a better place.

Civility

Contribute to society, stop the evil actors, care about the common good. If you are planning on building something that is hurting your environment, don’t.

Flexibility

We are all together in this. Be good with the good actors but firm with the bad actors.

Perspective

We are diverse, so use everyone’s perspectives when making decisions. This is a worldwide effort, so it should be adapted to everyone in the world. Consider that not all cultures have the same aspirations or the same definitions of good and bad. We need to act globally but not impose our culture over others.

Magnanimity

Practice humane leadership. Aspire to utopia without falling into a messiah complex or a white savior complex.

Technomoral Wisdom

Practice these virtues always. Theory without practice is useless. And the more you practice them, the easier it will become.

In theory, if we apply these virtues regularly, not only to ourselves but to the people around us, this will also change the paradigm around us and will make exploiting resources and destroying the common good undesirable for everyone. Could this be our best chance?
