Leaving aside complexities like the Enterprise Integration Patterns, we can consider most integrations as a form of advanced ETL: Extract, Transform, and Load. We extract data from a data store or service. Then we transform it from an input to an output format. And finally we push or load that transformed data into some output channel. It is the easiness to connect with the input and output channels what makes the ETL need a proper integration framework.
Complex integrations will combine these three steps differently. But the outcome is always to move information from one place to another, connecting different systems. Where the information may be a full dataset or just a triggered event.
I already tackled the issue of choosing the right integration tool from an engineer’s perspective and what variables to take into account. But when we are talking about data science and data analysis, there is a requirement that goes on top of all of the previous: the accessibility and easiness of usage of the tool.
Continuar leyendo “Approaching Zero-ETL with FOSS”