Apache InLong is a one-stop integration framework for massive data that provides automatic, secure and reliable data transmission capabilities. InLong offers great power to build data analysis, modeling and other real-time applications based on streaming data.
1.4.0 version overview
Apache InLong recently released version 1.4.0, which closed 364+ issues and includes 16+ features and 120+ optimizations. The main additions are real-time synchronization of an entire database to Apache Doris and to Apache Iceberg, HTTP reporting in the standard architecture, and new collection nodes such as MongoDB in the standard architecture. This release also completes a number of other features, including:
Agent module
- Refactored sink-sending metrics
- Audit reports now include data size
- Support Redis, MQTT, SQLServer, Oracle, and MongoDB data sources
- Enhanced Kubernetes environment file collection capability
DataProxy module
- Heartbeat reporting adds service status and supports authentication
- Added a proxy-send mode for sending data
- Optimized data-link instrumentation metrics
TubeMQ module
- Added a consumer group control API for client-side load balancing
- Fixed multiple bugs in the C++ SDK
Manager module
- Data stream Group and Stream support extended parameters
- Client supports updating and deleting data streams by key
- Refactored how Sort cluster configuration information is obtained
- Optimized state management
- Client supports cluster addition, deletion, modification, and query
- Cluster nodes report new protocol types
- The cache layer supports Kafka
Sort module
- Support debezium-json format
- Kafka data nodes support topic dynamic awareness
- Connectors such as Hive, HBase, and Iceberg support metric state recovery
- Elasticsearch 6/7 and JDBC connectors added metric state
- Iceberg sink supports schema evolution: it can create tables automatically and detect newly added fields
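The field-detection idea in the last item can be illustrated with a minimal Python sketch. This is not InLong's or Iceberg's implementation: the function names `detect_added_fields` and `evolve_schema`, and the dict-based schema representation, are hypothetical and exist only to show how fields in an incoming record can be compared against a known table schema.

```python
def detect_added_fields(known_schema, record):
    """Return fields present in the record but missing from the known schema.

    known_schema: dict of column name -> type name (hypothetical representation).
    record: dict of field name -> value.
    """
    added = {}
    for name, value in record.items():
        if name not in known_schema:
            # Infer a simple type name from the Python value (illustrative only).
            added[name] = type(value).__name__
    return added


def evolve_schema(known_schema, record):
    """Return a new schema that also contains any newly seen fields."""
    evolved = dict(known_schema)  # copy, so the caller's schema is not mutated
    evolved.update(detect_added_fields(known_schema, record))
    return evolved


# Hypothetical table schema and an incoming record carrying one extra field.
schema = {"id": "int", "name": "str"}
record = {"id": 1, "name": "a", "email": "a@example.com"}
new_schema = evolve_schema(schema, record)
```

A real sink would additionally map source types to the target table's type system and apply the change to the table itself; the sketch covers only the detection step.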
Dashboard module
- Unified the type definitions for data sources and data streams
- Added Agent type for cluster management
- Added data node management
- Support selection of Kafka message type
Other
- docker-compose deployment includes a built-in Flink environment
- Fixed multiple aarch64 image bugs
- Fixed multiple dependency security vulnerabilities
1.4.0 version feature introduction
Agent adds a variety of data sources
In version 1.4.0, Agent supports data sources such as Redis, MQTT, SQLServer, Oracle, and MongoDB. This brings the collection capabilities of the standard architecture and the lightweight architecture roughly into line and gives users more choices in massive-data scenarios. The back-end support was mainly completed by @iamsee123 and @haibo-duan, and the front-end part by @bluewang.
Improve component metrics
In version 1.4.0, metrics in the Agent, DataProxy, and Sort modules were optimized and improved. The changes include refactoring the metrics the Agent sends, adding the data Group/Stream dimension to metrics, and fixing inaccurate Prometheus Listener metrics. Thanks to @Keylchen, @pocozh, and others for their contributions.
Optimize Docker-compose deployment
InLong has many service components, so the barrier to deployment has always been high. Version 1.4.0 improves the compatibility of docker-compose deployment and builds in an Apache Flink environment to help developers start creating tasks quickly. Thanks to @dockerzhang for optimizing this part.
Optimize heartbeat management
Version 1.4.0 makes many optimizations to the heartbeat of service components: the data protocol is now included in heartbeat reports, Agent and DataProxy components register automatically when they report, Manager gains a heartbeat management API, and multiple heartbeat status bugs are fixed. Thanks to @gosonzhang, @GanfengTan, @pocozh, @lucaspeng12138 and @haifxu for their contributions.
Support real-time synchronization of the entire database
In version 1.4.0, InLong began to support real-time synchronization of an entire database in response to the needs of community users. So far, real-time synchronization of an entire database to Iceberg, Kafka, and Doris has been implemented. The community will share the details of whole-database synchronization in the near future. Thanks to @thesumery, @EMsnap, and @yunqingmoswu for their contributions.
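As a rough illustration of what synchronizing a whole database involves, the Python sketch below groups Debezium-style JSON change events by their source database and table, so that each table's changes could be written to its own target. It is not how InLong Sort implements the feature. The function name `route_change_events` and the sample events are hypothetical; the only thing taken from the Debezium event envelope is the set of fields `before`, `after`, `source`, and `op`.

```python
import json
from collections import defaultdict


def route_change_events(lines):
    """Group change events by (database, table).

    lines: iterable of JSON strings, each a Debezium-style change event.
    Returns a dict mapping (db, table) -> list of (op, row) tuples.
    """
    routed = defaultdict(list)
    for line in lines:
        event = json.loads(line)
        source = event.get("source", {})
        key = (source.get("db"), source.get("table"))
        # Deletes carry the row image in "before"; other ops carry it in "after".
        row = event["before"] if event["op"] == "d" else event["after"]
        routed[key].append((event["op"], row))
    return dict(routed)


# Hypothetical events from two tables of the same database.
sample_events = [
    json.dumps({"before": None, "after": {"id": 1},
                "source": {"db": "shop", "table": "orders"}, "op": "c"}),
    json.dumps({"before": {"id": 2}, "after": None,
                "source": {"db": "shop", "table": "users"}, "op": "d"}),
    json.dumps({"before": {"id": 1}, "after": {"id": 1, "total": 5},
                "source": {"db": "shop", "table": "orders"}, "op": "u"}),
]
routed = route_change_events(sample_events)
```

Keying on the source table keeps the events of each table in their original order, which matters when an update or delete follows an insert of the same row.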
For more details on the 1.4.0 release, please refer to the release notes, which detail the features, enhancements, and bug fixes for this release.
Follow-up planning
In the next version, the community will continue to add whole-database synchronization scenarios, improve task metrics, increase system stability, and run stress tests on the standard and lightweight architectures.