
Release 1.2.0


Apache InLong is a one-stop integration framework for massive data that provides automatic, secure, and reliable data transmission capabilities. InLong supports both batch and stream data processing, which makes it well suited for building data analysis, modeling, and other real-time applications on top of streaming data.

1.2.0 Features Overview

The newly released 1.2.0-incubating version closes 410+ issues and contains 30+ features and 190+ optimizations. The main changes include:

Enhanced management and control capabilities

  • Dashboard and Manager add cluster management capabilities
  • Dashboard optimizes the flow creation process
  • Manager supports extending MQ types through plug-ins

Extended collection nodes

  • Support for collecting data from Pulsar
  • Support for collecting data via MongoDB-CDC
  • Support for collecting data via MySQL-CDC
  • Support for collecting data via Oracle-CDC
  • Support for collecting data via PostgreSQL-CDC
  • Support for collecting data via SQLServer-CDC

Extended write nodes

  • Support for writing data to Kafka
  • Support for writing data to HBase
  • Support for writing data to PostgreSQL
  • Support for writing data to Oracle
  • Support for writing data to MySQL
  • Support for writing data to TDSQL-PostgreSQL
  • Support for writing data to Greenplum
  • Support for writing data to SQLServer

Support data transformation

  • Support String Split
  • Support String Regular Replace
  • Support String Regular Replace First Matched Value
  • Support Data Filter
  • Support Data Distinct
  • Support Regular Join
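To make the semantics of these transformations concrete, here is a minimal Python sketch over rows represented as dicts. It is not InLong Sort's implementation (Sort expresses these as Transform nodes running on Flink); the function names and the distinct policy (keep the first row seen for each key) are assumptions made for illustration.

```python
from __future__ import annotations

import re
from typing import Callable, Iterable, Iterator

Row = dict


def string_split(value: str, sep: str) -> list[str]:
    # String Split: break one field into parts on a literal separator
    return value.split(sep)


def regex_replace(value: str, pattern: str, repl: str) -> str:
    # String Regular Replace: replace every match of the pattern
    return re.sub(pattern, repl, value)


def regex_replace_first(value: str, pattern: str, repl: str) -> str:
    # String Regular Replace First Matched Value: replace only the first match
    return re.sub(pattern, repl, value, count=1)


def data_filter(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    # Data Filter: keep only the rows that satisfy the predicate
    return (r for r in rows if predicate(r))


def data_distinct(rows: Iterable[Row], key_fields: tuple[str, ...]) -> Iterator[Row]:
    # Data Distinct (assumed policy): emit the first row per key, drop later duplicates
    seen = set()
    for r in rows:
        key = tuple(r[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            yield r


def regular_join(left: Iterable[Row], right: Iterable[Row], key: str) -> Iterator[Row]:
    # Regular Join: inner equi-join on a single key field (hash join)
    index: dict = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    for l in left:
        for r in index.get(l[key], []):
            yield {**l, **r}
```

For example, `regex_replace_first("2022-06-01", "-", "/")` rewrites only the first dash, while `regex_replace` rewrites both.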

Enhanced system monitoring

  • Support for reporting and managing data link heartbeats

Other optimizations

  • Support for delivering DataProxy multi-cluster configurations
  • Optimized GitHub Actions checks and pipelines

1.2.0 Features Details

Support multi-cluster management

Manager adds cluster management and supports multi-cluster configuration, removing the limitation that only one set of clusters could be defined through configuration files. Users can now create clusters of different types on the Dashboard as needed.

The multi-cluster feature was mainly designed and implemented by @healchow, @luchunliang, and @leezng. Thanks to all three contributors.
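To illustrate the difference between a single cluster fixed in a configuration file and managed multi-cluster configuration, the sketch below keeps clusters in a registry keyed by type and name, so several clusters of the same type can coexist. The `Cluster` and `ClusterRegistry` classes and the type strings are hypothetical and do not mirror the Manager's actual data model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    type: str  # hypothetical type label, e.g. "PULSAR", "KAFKA", "DATAPROXY"
    url: str


class ClusterRegistry:
    """Hypothetical in-memory registry; a real service would persist clusters."""

    def __init__(self) -> None:
        self._clusters: dict[tuple[str, str], Cluster] = {}

    def create(self, cluster: Cluster) -> None:
        key = (cluster.type, cluster.name)
        if key in self._clusters:
            raise ValueError(f"cluster {cluster.name!r} of type {cluster.type} already exists")
        self._clusters[key] = cluster

    def list_by_type(self, cluster_type: str) -> list[Cluster]:
        # Any number of clusters of the same type can be registered side by side
        return [c for (t, _), c in self._clusters.items() if t == cluster_type]
```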

Enhanced collection of file data and MySQL Binlog

Version 1.2.0 supports collecting complete file data, as well as collecting MySQL data starting from a specified Binlog position. This work was done by @Greedyu.
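Starting from a specified Binlog position relies on positions being ordered: a MySQL binlog position is a file name such as `mysql-bin.000003` plus a byte offset, and positions compare first by the file's numeric suffix and then by offset. The sketch below keeps only events at or after a start position; `BinlogPosition` and `events_from` are hypothetical helpers, and a real CDC connector typically asks the server to begin streaming at that position rather than filtering on the client.

```python
from __future__ import annotations

from typing import Iterable, Iterator, NamedTuple


class BinlogPosition(NamedTuple):
    file: str    # e.g. "mysql-bin.000003"
    offset: int  # byte offset within that file

    def sort_key(self) -> tuple[int, int]:
        # Binlog files share a base name and carry an increasing numeric suffix
        _, _, seq = self.file.rpartition(".")
        return int(seq), self.offset


def events_from(
    events: Iterable[tuple[BinlogPosition, str]], start: BinlogPosition
) -> Iterator[tuple[BinlogPosition, str]]:
    """Yield (position, event) pairs located at or after the start position."""
    start_key = start.sort_key()
    for pos, event in events:
        if pos.sort_key() >= start_key:
            yield pos, event
```

Comparing the numeric suffix rather than the raw file name keeps the ordering correct even if suffixes were not zero-padded to the same width.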

Support whole database migration

Sort supports migrating the data of an entire database. This feature was contributed by @EMsnap.
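Whole-database migration generally means one job captures changes from every table and routes each record to a matching target table, instead of defining one job per table. The sketch below shows only that routing step; the record shape (`database`, `table`, `data` keys) and the `target_db.table` naming rule are assumptions for illustration, not Sort's configuration.

```python
from __future__ import annotations

from collections import defaultdict


def route_by_table(records: list[dict], target_db: str) -> dict[str, list[dict]]:
    """Group change records by source table and map each group to a target table name.

    Each record is assumed to carry hypothetical 'database', 'table' and 'data' keys.
    """
    routed: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        target = f"{target_db}.{rec['table']}"
        routed[target].append(rec["data"])
    return dict(routed)
```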

Supports writing data in Canal format

Data can now be written to Kafka in Canal format. This feature was contributed by @thexiay.
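A Canal JSON message wraps row changes in an envelope with fields such as `data`, `old`, `type`, `database`, `table`, `es`, `ts`, `isDdl`, and `pkNames`. The sketch below builds such an envelope; it covers only a subset of the fields Canal emits (for example, it omits `sqlType` and `mysqlType`), and the `to_canal_json` function is a hypothetical helper rather than InLong's serializer.

```python
from __future__ import annotations

import json
import time


def to_canal_json(database: str, table: str, op_type: str, rows: list[dict],
                  old_rows: list[dict] | None = None, pk_names: list[str] | None = None,
                  event_ts_ms: int | None = None) -> str:
    """Serialize row changes into a Canal-style JSON envelope (subset of fields)."""
    if op_type not in ("INSERT", "UPDATE", "DELETE"):
        raise ValueError(f"unsupported operation type: {op_type}")
    now_ms = int(time.time() * 1000)
    message = {
        "data": rows,         # row images; for DELETE, the deleted rows
        "old": old_rows,      # for UPDATE: previous values of the changed columns
        "database": database,
        "table": table,
        "type": op_type,
        "isDdl": False,
        "pkNames": pk_names,
        "es": event_ts_ms if event_ts_ms is not None else now_ms,  # event time
        "ts": now_ms,         # time this message was produced
    }
    return json.dumps(message)
```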

Optimized HTTP requests in Manager Client

The way Manager Client executes HTTP requests has been optimized, and unit tests have been added for the Client, which reduces duplicated code and lowers maintenance costs. This feature was contributed by new contributor @leosanqing.
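One common way to remove duplicated request code is to send every call through a single helper that issues the request, checks the status, and unwraps a shared response envelope. The sketch below assumes a JSON envelope with `success`, `errMsg`, and `data` fields and injects the transport as a function so the client can be unit-tested without a server. The `ManagerClient` class, the envelope shape, and the endpoint paths are assumptions, not the actual Manager Client API.

```python
import json
from typing import Any, Callable, Optional, Tuple

# A transport takes (method, url, body) and returns (status_code, response_text)
Transport = Callable[[str, str, Optional[str]], Tuple[int, str]]


class ManagerClient:
    """Hypothetical client: every endpoint goes through _request, so error handling lives in one place."""

    def __init__(self, base_url: str, transport: Transport) -> None:
        self.base_url = base_url.rstrip("/")
        self.transport = transport

    def _request(self, method: str, path: str, payload: Any = None) -> Any:
        body = json.dumps(payload) if payload is not None else None
        status, text = self.transport(method, f"{self.base_url}{path}", body)
        if status != 200:
            raise RuntimeError(f"{method} {path} failed with HTTP {status}")
        envelope = json.loads(text)
        if not envelope.get("success"):
            raise RuntimeError(f"{method} {path} failed: {envelope.get('errMsg')}")
        return envelope.get("data")

    # Endpoint methods shrink to one-liners (paths are illustrative only)
    def get_group(self, group_id: str) -> Any:
        return self._request("GET", f"/group/get/{group_id}")

    def save_group(self, group: dict) -> Any:
        return self._request("POST", "/group/save", group)
```

Because the transport is a plain function, a test can pass a fake that returns canned responses, which is the kind of unit testing the refactoring makes easier.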

Supports running SQL scripts

Sort supports running SQL scripts (see INLONG-4405). Thanks to @gong for contributing this feature.
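Running a SQL script first requires splitting it into individual statements. A naive split on `;` breaks on semicolons inside string literals, so the sketch below tracks quoting and skips `--` line comments. It is a simplified illustration, not Sort's parser; it does not handle block comments or dialect-specific escape rules.

```python
from __future__ import annotations


def split_sql_statements(script: str) -> list[str]:
    """Split a SQL script on semicolons that are outside quotes and line comments."""
    statements, current = [], []
    quote = None  # the active quote character (' or "), if any
    i, n = 0, len(script)
    while i < n:
        ch = script[i]
        if quote:
            current.append(ch)
            if ch == quote:
                # A doubled quote ('') is an escaped quote inside the literal
                if i + 1 < n and script[i + 1] == quote:
                    current.append(script[i + 1])
                    i += 1
                else:
                    quote = None
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif script.startswith("--", i):
            # Skip a line comment up to, but not including, the newline
            end = script.find("\n", i)
            i = n if end == -1 else end
            continue
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements
```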

Support heartbeat reporting and management

This version supports heartbeat reporting and management for data groups, data streams, and underlying components, which is the prerequisite for managing the state of each link in the system in subsequent versions.

This feature was primarily designed and contributed by @baomingyu, @healchow and @kipshi.
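Heartbeat management usually comes down to recording the last report time of each component and treating the component as lost once no heartbeat arrives within a timeout. The sketch below shows that idea; the `HeartbeatManager` class, the component keys, the status strings, and the 60-second default timeout are hypothetical and do not reflect the Manager's actual heartbeat protocol.

```python
from __future__ import annotations

import time
from typing import Callable, Optional


class HeartbeatManager:
    """Hypothetical tracker of the last heartbeat received from each component."""

    def __init__(self, timeout_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so tests can control time
        self._last_seen: dict[str, float] = {}
        self._status: dict[str, str] = {}

    def report(self, component: str, status: str = "RUNNING") -> None:
        # Called whenever a component (agent, DataProxy, Sort job, ...) sends a heartbeat
        self._last_seen[component] = self.clock()
        self._status[component] = status

    def status_of(self, component: str) -> Optional[str]:
        last = self._last_seen.get(component)
        if last is None:
            return None  # the component has never reported
        if self.clock() - last > self.timeout_s:
            return "LOST"  # no heartbeat within the timeout
        return self._status[component]
```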

Manager supports creating resources for multiple data destinations

In version 1.2.0, Manager added support for creating the following storage resources:

  • Create topics for Kafka (contributed by @woofyzhao)
  • Create databases and tables for Iceberg (contributed by @woofyzhao)
  • Create namespaces and tables for HBase (contributed by @woofyzhao)
  • Create databases and tables for ClickHouse (contributed by @lucaspeng12138)
  • Create indices for Elasticsearch (contributed by @lucaspeng12138)
  • Create databases and tables for PostgreSQL (contributed by @baomingyu)
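Creating a table for a sink typically means turning the stream's field definitions into DDL. The sketch below generates a PostgreSQL `CREATE TABLE IF NOT EXISTS` statement and covers only the table part, not database creation. The `(name, type)` field format and the small type mapping are assumptions for illustration, not the Manager's actual resource logic.

```python
from __future__ import annotations

# Hypothetical mapping from stream field types to PostgreSQL column types
PG_TYPES = {
    "string": "TEXT",
    "int": "INTEGER",
    "long": "BIGINT",
    "double": "DOUBLE PRECISION",
    "boolean": "BOOLEAN",
    "timestamp": "TIMESTAMP",
}


def quote_ident(name: str) -> str:
    # Double-quote the identifier and escape embedded quotes, as PostgreSQL requires
    return '"' + name.replace('"', '""') + '"'


def create_table_ddl(schema: str, table: str, fields: list[tuple[str, str]],
                     primary_key: list[str] | None = None) -> str:
    columns = [f"{quote_ident(name)} {PG_TYPES[ftype]}" for name, ftype in fields]
    if primary_key:
        columns.append("PRIMARY KEY (" + ", ".join(quote_ident(k) for k in primary_key) + ")")
    return (
        f"CREATE TABLE IF NOT EXISTS {quote_ident(schema)}.{quote_ident(table)} (\n  "
        + ",\n  ".join(columns)
        + "\n)"
    )
```

Using `IF NOT EXISTS` keeps the operation idempotent, so retrying resource creation does not fail when the table already exists.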

Sort supports a lightweight architecture

Sort has undergone extensive refactoring and improvement in version 1.2.0. By introducing Flink-CDC, it supports a variety of Extract and Load nodes, as well as data transformation (i.e. Transform).
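Conceptually, this model chains an Extract node, zero or more Transform nodes, and a Load node. The sketch below mimics that shape with plain Python iterators; `run_pipeline` and the node functions are hypothetical stand-ins, whereas the real nodes are built on Flink and Flink-CDC connectors.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator

Row = dict
Extract = Callable[[], Iterable[Row]]
Transform = Callable[[Iterable[Row]], Iterator[Row]]
Load = Callable[[Iterable[Row]], None]


def run_pipeline(extract: Extract, transforms: list[Transform], load: Load) -> None:
    """Wire Extract -> Transform* -> Load; rows stream through lazily."""
    rows: Iterable[Row] = extract()
    for transform in transforms:
        rows = transform(rows)
    load(rows)


# Hypothetical nodes for illustration
def extract_orders() -> Iterable[Row]:
    return [
        {"id": 1, "city": "shenzhen", "amount": 30},
        {"id": 2, "city": "hangzhou", "amount": 5},
        {"id": 3, "city": "beijing", "amount": 12},
    ]


def keep_large(rows: Iterable[Row]) -> Iterator[Row]:
    return (r for r in rows if r["amount"] >= 10)


def normalize_city(rows: Iterable[Row]) -> Iterator[Row]:
    return ({**r, "city": r["city"].upper()} for r in rows)


sink: list[Row] = []
run_pipeline(extract_orders, [keep_large, normalize_city], sink.extend)
```

Keeping each node a small function with a uniform signature is what lets new Extract, Transform, or Load nodes be added without changing the pipeline wiring.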

This feature includes many sub-features. The main developers are @baomingyu, @EMsnap, @GanfengTan, @gong, @lucaspeng12138, @LvJiancheng, @kipshi, @thexiay, @woofyzhao, and @yunqingmoswu. Thank you all for your contributions.

For more information, please refer to: Analysis of InLong Sort ETL Solution.

Other features and bug fixes

For the complete list, please refer to the Release Notes, which detail the features, enhancements, and bug fixes in this release.

Apache InLong follow-up plans

In subsequent versions, we will support more data sources and storage systems to cover more usage scenarios, and gradually improve the usability and robustness of the system. Planned work includes:

  • Heartbeat reporting for each component
  • Status management of data streams
  • Full-link audit support for writing to ClickHouse
  • More types of collection nodes and storage nodes