Release 1.2.0 | Apache InLong

Apache InLong is a one-stop integration framework for massive data that provides automatic, secure and reliable data transmission capabilities. InLong supports both batch and stream data processing, which makes it well suited to building data analysis, modeling and other real-time applications on streaming data.

1.2.0 Features Overview

The just-released 1.2.0-incubating version closes more than 410 issues and contains more than 30 features and 190 optimizations. The main changes are as follows:

Enhance management and control capabilities

Dashboard and Manager add cluster management capabilities
Dashboard optimizes the flow creation process
Manager supports plug-in extension of MQ

Extended collection node

Support data collection from Pulsar
Support data collection with MongoDB-CDC
Support data collection with MySQL-CDC
Support data collection with Oracle-CDC
Support data collection with PostgreSQL-CDC
Support data collection with SQLServer-CDC

Extended write node

Support writing data to Kafka
Support writing data to HBase
Support writing data to PostgreSQL
Support writing data to Oracle
Support writing data to MySQL
Support writing data to TDSQL-PostgreSQL
Support writing data to Greenplum
Support writing data to SQLServer

Support data conversion

Support String Split
Support String Regular Replace
Support String Regular Replace First Matched Value
Support Data Filter
Support Data Distinct
Support Regular Join
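To make the conversion names above more concrete, here is a minimal Python sketch of what split, regular-expression replace, replace-first, filter and distinct typically mean on a small batch of records. It only illustrates the semantics and is not InLong Sort's API: the sample records, field names and helper functions are all hypothetical, and Regular Join is not shown.

```python
import re

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"id": 1, "tags": "a,b,c", "phone": "138-0000-1111"},
    {"id": 2, "tags": "d,e", "phone": "139-2222-3333"},
    {"id": 2, "tags": "d,e", "phone": "139-2222-3333"},  # exact duplicate
    {"id": 3, "tags": "", "phone": "n/a"},
]


def string_split(value, separator):
    """String Split: break one string field into a list of parts."""
    return value.split(separator) if value else []


def regex_replace(value, pattern, replacement):
    """String Regular Replace: replace every match of the pattern."""
    return re.sub(pattern, replacement, value)


def regex_replace_first(value, pattern, replacement):
    """String Regular Replace First Matched Value: replace only the first match."""
    return re.sub(pattern, replacement, value, count=1)


def data_filter(rows, predicate):
    """Data Filter: keep only the rows for which the predicate is true."""
    return [row for row in rows if predicate(row)]


def distinct(rows):
    """Data Distinct: drop rows that repeat an earlier row exactly."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Drop rows without tags, then remove exact duplicates.
cleaned = distinct(data_filter(records, lambda row: row["tags"] != ""))
split_tags = [string_split(row["tags"], ",") for row in cleaned]

# Mask every four-digit group versus only the first one.
masked_all = regex_replace("138-0000-1111", r"\d{4}", "****")
masked_first = regex_replace_first("138-0000-1111", r"\d{4}", "****")
```

In this sketch the filter runs before the distinct step so that duplicates are only compared among rows that survive filtering; a real pipeline may order these steps differently.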

Enhanced system monitoring function

Support the reporting and management of data link heartbeat

Other optimizations

Support the delivery of DataProxy multi-cluster configurations
Optimize GitHub Actions checks and pipelines

1.2.0 Features Details

Support multi-cluster management

Manager adds a cluster management function that supports multi-cluster configuration, removing the limitation that only one set of clusters could be defined through configuration files. Users can create different types of clusters on Dashboard as needed.

The multi-cluster feature was mainly designed and implemented by @healchow, @luchunliang and @leezng. Thanks to these three contributors.

Enhanced collection of file data and MySQL Binlog

Version 1.2.0 supports collecting complete file data, and also supports collecting data starting from a specified Binlog position in MySQL. This work was done by @Greedyu.

Support whole database migration

Sort supports migrating the data of an entire database. This feature was contributed by @EMsnap.

Supports writing data in Canal format

This version supports writing data in Canal format to Kafka. This feature was contributed by @thexiay.
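As a rough illustration of what a Canal-format change record looks like, the Python sketch below builds one as JSON. The field names (`data`, `old`, `type`, `database`, `table`, `pkNames`, `isDdl`, `es`, `ts`) follow the commonly documented Canal JSON layout; the exact set of fields that InLong writes to Kafka is not stated in these notes, so treat this as an assumption. The helper `to_canal_json` and the sample values are hypothetical.

```python
import json
import time


def to_canal_json(database, table, op_type, rows,
                  old_rows=None, pk_names=None, ts_ms=None):
    """Build a Canal-style JSON change record (illustrative helper, not an InLong API)."""
    now_ms = int(time.time() * 1000) if ts_ms is None else ts_ms
    message = {
        "data": rows,         # row images after the change
        "old": old_rows,      # previous values of changed columns (updates only)
        "type": op_type,      # INSERT, UPDATE or DELETE
        "database": database,
        "table": table,
        "pkNames": pk_names or [],
        "isDdl": False,
        "es": now_ms,         # time the change happened in the source database
        "ts": now_ms,         # time the record was produced
    }
    return json.dumps(message, sort_keys=True)


# Hypothetical update on a made-up table.
example = to_canal_json(
    "shop", "orders", "UPDATE",
    rows=[{"id": "1", "status": "paid"}],
    old_rows=[{"status": "new"}],
    pk_names=["id"],
    ts_ms=1000,
)
decoded = json.loads(example)
```

A consumer reading such messages from a Kafka topic can use `type` to decide how to apply each record and `old` to see which columns changed.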

Optimize the HTTP request method in Manager Client

This version optimizes how Manager Client executes HTTP requests and adds unit tests for the Client, which reduces both code duplication and maintenance costs. This feature was contributed by new contributor @leosanqing.

Supports running SQL scripts

Sort supports running SQL scripts; see INLONG-4405 for details. Thanks to @gong for contributing this feature.

Support the reporting and management of data link heartbeat

This version supports heartbeat reporting and management for data groups, data flows and the underlying components. This is the prerequisite for managing the state of each link in the system in later versions.

This feature was primarily designed and contributed by @baomingyu, @healchow and @kipshi.
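The general idea of heartbeat-based status tracking can be sketched in a few lines of Python. This is a generic illustration under assumed names, not InLong's implementation: the `HeartbeatRegistry` class, the component names, the status strings and the 30-second timeout are all hypothetical.

```python
import time


class HeartbeatRegistry:
    """Track the last heartbeat per component and derive a coarse status."""

    def __init__(self, timeout_seconds=30.0):
        self.timeout_seconds = timeout_seconds
        self._last_seen = {}

    def report(self, component, now=None):
        """Record a heartbeat; `now` can be injected to keep examples deterministic."""
        self._last_seen[component] = time.time() if now is None else now

    def status(self, component, now=None):
        """Return UNKNOWN, ALIVE or TIMEOUT for a component."""
        now = time.time() if now is None else now
        last = self._last_seen.get(component)
        if last is None:
            return "UNKNOWN"
        return "ALIVE" if now - last <= self.timeout_seconds else "TIMEOUT"


# Hypothetical components reporting at fixed timestamps (in seconds).
registry = HeartbeatRegistry(timeout_seconds=30)
registry.report("dataproxy-1", now=100.0)
registry.report("sort-1", now=100.0)
registry.report("dataproxy-1", now=125.0)  # reports again; sort-1 goes silent

statuses = {
    name: registry.status(name, now=140.0)
    for name in ("dataproxy-1", "sort-1", "agent-1")
}
```

Once every component reports this way, a manager can build state management on top of the derived statuses, which matches the role the notes describe for heartbeats.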

Manager supports the creation of resources in multiple flow directions

In version 1.2.0, Manager can create the following storage resources:

Create Topic for Kafka (contributed by @woofyzhao)
Create databases and tables for Iceberg (contributed by @woofyzhao)
Create namespaces and tables for HBase (contributed by @woofyzhao)
Create databases and tables for ClickHouse (contributed by @lucaspeng12138)
Create indices for Elasticsearch (contributed by @lucaspeng12138)
Create databases and tables for PostgreSQL (contributed by @baomingyu)

Sort supports lightweight architecture

Sort has been substantially refactored and improved in version 1.2.0. By introducing Flink-CDC, it supports a variety of Extract and Load nodes, as well as data transformation (i.e., Transform).

This feature contains many sub-features. The main developers are @baomingyu, @EMsnap, @GanfengTan, @gong, @lucaspeng12138, @LvJiancheng, @kipshi, @thexiay, @woofyzhao and @yunqingmoswu. Thank you all for your contributions.

For more information, please refer to: Analysis of InLong Sort ETL Solution.

Other features and bug fixes

For related content, please refer to the Release Notes, which details the features, enhancements and bug fixes of this release.

Apache InLong follow-up planning

In subsequent versions, we will add more data sources and storage systems to cover more usage scenarios, and gradually improve the usability and robustness of the system, including:

Heartbeat report of each component
Status management of data flow
Full link audit support for writing to ClickHouse
Support for more types of collection nodes and storage nodes
