Apache InLong is a one-stop integration framework for massive data that provides automatic, secure and reliable data transmission capabilities. InLong supports both batch and stream data processing at the same time, which offers great power to build data analysis, modeling and other real-time applications based on streaming data.
1.2.0 Features Overview
The just-released 1.2.0-incubating version closes about 410+ issues, contains 30+ features and 190+ optimizations. Mainly include the following:
Enhance management and control capabilities
- Dashboard and Manager add cluster management capabilities
- Dashboard optimizes the flow creation process
- Manager supports plug-in extension of MQ
Extended collection node
- Support for collecting data in Pulsar
- Support data collection in MongoDB-CDC
- Support data collection in MySQL-CDC
- Support data collection in Oracle-CDC
- Support data collection in PostgreSQL-CDC
- Support data collection in SQLServer-CDC
Extended write node
- Support for writing data to Kafka
- Support for writing data to HBase
- Support for writing data to PostgreSQL
- Support for writing data to Oracle
- Supports writing data to MySQL
- Support writing data to TDSQL-PostgreSQL
- Support for writing data to Greenplum
- Supports writing data to SQLServer
Support data conversion
- Support String Split
- Support String Regular Replace
- Support String Regular Replace First Matched Value
- Support Data Filter
- Support Data Distinct
- Support Regular Join
Enhanced system monitoring function
- Support the reporting and management of data link heartbeat
Other optimizations
- Supports the delivery of DataProxy multi-cluster configurations
- GitHub Action check, pipeline optimization
1.2.0 Features Details
Support multi-cluster management
Manager adds cluster management function, supports multi-cluster configuration, and solves the limitation that only one set of clusters can be defined through configuration files. Users can create different types of clusters on Dashboard as needed.
The multi-cluster feature is mainly designed and implemented by @healchow, @luchunliang, @leezng, thanks to three contributors.
Enhanced collection of file data and MySQL Binlog
Version 1.2.0 supports collecting complete file data, and also supports collecting data from the specified Binlog location in MySQL. This part of the work was done by @Greedyu.
Support whole database migration
Sort supports migration of data across the entire database, contributed by @EMsnap.
Supports writing data in Canal format
Support for writing data in Canal format to Kafka, contributed by @thexiay.
Optimize the HTTP request method in Manager Client
Optimized the way of executing HTTP requests in Manager Client, and added unit tests for Client, which reduces maintenance costs while reducing duplication of code. This feature was contributed by new contributor @leosanqing.
Supports running SQL scripts
Sort supports running SQL scripts, see INLONG-4405, thanks to @gong for contributing this feature.
Support the reporting and management of data link heartbeat
This version supports the heartbeat reporting and management of data grouping, data flow and underlying components, which is the premise of the state management of each link of the subsequent system.
This feature was primarily designed and contributed by @baomingyu, @healchow and @kipshi.
Manager supports the creation of resources in multiple flow directions
In version 1.2.0, Manager added the creation of some storage resources:
- Create Topic for Kafka (contributed by @woofyzhao)
- Create databases and tables for Iceberg (contributed by @woofyzhao)
- Create namespaces and tables for HBase (contributed by @woofyzhao)
- Create databases and tables for ClickHouse (contributed by @lucaspeng12138)
- Create indices for Elasticsearch (contributed by @lucaspeng12138)
- Create databases and tables for PostgreSQL (contributed by @baomingyu)
Sort supports lightweight architecture
Version 1.2.0 of Sort has done a lot of refactoring and improvements. By introducing Flink-CDC, it supports a variety of Extract and Load nodes, and also supports data transformation (ie Transform).
This feature contains many sub-features. The main developers are: @baomingyu, @EMsnap, @GanfengTan, @gong, @lucaspeng12138, @LvJiancheng, @kipshi, @thexiay, @woofyzhao, @yunqingmoswu, thank you all for your contributions.
For more information, please refer to: Analysis of InLong Sort ETL Solution.
Other features and bug fixes
For related content, please refer to the Release Notes, which details the features, enhancements and bug fixes of this release.
Apache InLong follow-up planning
In subsequent versions, we will expand more data sources and storages to cover more usage scenarios, and gradually improve the usability and robustness of the system, including:
- Heartbeat report of each component
- Status management of data flow
- Full link audit support for writing to ClickHouse
- Expand more types of acquisition nodes and storage nodes