Skip to main content

Release 2.1.0

· 6 min read

Apache InLong has recently released version 2.1.0, which has closed over 120 issue, including more than 4 major features and over 110 optimizations. The main accomplishments include Dashboard supports batch operation of nodes, Manager supports multiple scheduling engines, Agent supports COS data sources, Sort supports archiving dirty data through the InLong SDK. Simultaneously optimize the user experience of Apache InLong operations and maintenance. In Apache InLong 2.1.0 version, a large number of other features have also been completed.

About Apache InLong

As the industry's first one-stop, all-scenario massive data integration framework, Apache InLong provides automated, secure, reliable, and high-performance data transmission capabilities, enabling businesses to quickly build stream-based data analysis, modeling, and applications. Currently, InLong is widely used in various industries including advertising, payment, social networking, gaming, and artificial intelligence, serving thousands of businesses, with high-performance scenarios processing over hundreds of billions of records per day and highly reliable scenarios handling over tens of trillions of records per day.

The core keywords for InLong's project positioning are "one-stop," "all-scenario," and "massive data." For "one-stop," we aim to shield technical details, provide complete data integration and supporting services, and achieve out-of-the-box usability; for "all-scenario," we aim to offer comprehensive solutions covering common data integration scenarios in the big data field; for "massive data," we hope to leverage architectural advantages such as layered data links, fully extensible components, and built-in multi-cluster management to stably support even larger data volumes based on hundreds of billions of records per day.

2.1.0 Overview

Apache InLong has recently released version 2.1.0, which has closed over 120 issue, including more than 4 major features and over 110 optimizations. The main accomplishments include

  • Dashboard supports batch operation of nodes
  • Manager supports multiple scheduling engines
  • Agent supports COS data sources
  • Sort supports archiving dirty data through the InLong SDK.

Simultaneously optimize the user experience of Apache InLong operations and maintenance. In Apache InLong 2.1.0 version, a large number of other features have also been completed.

Dashboard Module

  • Support COS data source
  • Support batch operation of agents: restart, upgrade
  • Support exporting audit data as CSV files
  • Support sorting of audit data and comparison of differences
  • Support queries for all types of indicators
  • Support data preview field segmentation

Manager Module

  • Support COS data source
  • Support managing multiple scheduling engines: AirFlow、DolphinScheduler
  • Support dirty data management and querying
  • Support querying heartbeat information based on IP
  • Limit one IP to only belong to one cluster
  • Provide an API for querying of dirty data archiving

Agent Module

  • Support COS data source
  • Support quick startup and shutdown
  • Support starting multiple instances
  • Support data supplementation in chronological order
  • Optimize the logic of the Installer process guardian for Agent
  • Support supplementary recording based on local data time

Sort Module

  • Added Elasticsearch connector based on Flink 1.18
  • Support KV separation on Kafka Sink
  • Support audit data reporting
  • Tube Connector source supports dirty data archiving

SDK Module

  • Transform SDK adds 7 new functions
  • Add Dirty Data Archiving SDK

Audit Module

  • Audit Proxy increases metric reporting
  • Audit Store adds metric reporting
  • Audit Service increases metric reporting
  • Add asynchronous flush audit data interface

TubeMQ Module

  • Write the consumption offset information to a local file
  • Optimize the load balancing logic of the Go version SDK

Others

  • Pipeline supports parallel build
  • Support Manager to configure volumes

2.1.0 Feature Introduction

Dashboard supports batch operation of agents

This feature is mainly used for the operation of Inlong Agent: mainly for upgrading and restarting:

  • After finding the cluster in cluster management, select multiple nodes to operate on and click on batch operation.

    2.1.0-dashboard-select.png

  • Select the operation type and fill in the required parameters for the corresponding operation, then click OK.

    2.1.0-dashboard-operate.png

This feature optimizes the operation and maintenance experience of Inlong: interface based operation eliminates the need to operate DB and increases the cohesion of Inlong:

  • Visual Agent version upgrade, which can be upgraded in batches and at regular intervals to control upgrade risks.
  • During agent fault recovery, this function can be used to quickly restart.

Thanks to @wohainilaodou for their contributions to this feature. For more details, please refer to INLONG-11187

Manager supports multiple scheduling engines

Previously, for offline data synchronization, Inlong only supported Quartz scheduling engine. This version has added two third-party engines: DolphinScheduler and AirFlow.

AirFlow engine

  • In order to facilitate the maintenance and expansion of AirFlow interface support in the future, AirflowApi interface and BaseAirflowApi abstract class have been designed, and subsequent extensions only need to be based on this foundation.
  • Implement a unified request class AirflowServerClient for the interface.
  • Add two interceptors in OkHttpClient: AirflowAuthInterceptor for unified authorization of interfaces; LoggingInterceptor is used for logging.

Thanks to @Zkplo for their contributions to this feature. For more details, please refer to INLONG-11400

DolphinScheduler engine

-Add the DolphinScheduler package to org.apache.inlong.manager.schedule -Add client and engine for DS, as well as util for operating open APIs for DS -Add pojo class for DS interaction

Thanks to @emptyOVO for their contributions to this feature. For more details, please refer to INLONG-11401

Agent supports COS data source

  • Create a new COS type node and fill in the corresponding bucket name, credential ID, credential key, and region.

    2.1.0-agent-node.png

  • Create a new COS type data source, select the corresponding node, IP, and file path.

    2.1.0-agent-type.png

    2.1.0-agent-param.png

This feature supports direct data collection from COS storage, and businesses do not need to download COS files locally for data collection. Thanks to @justinwwhuang for their contributions to this feature. For more details, please refer to INLONG-11187

Sort supports archiving dirty data through the InLong SDK.

Added the ability to report dirty data to specified GroupId and StreamId through the InLong SDK. Users can choose to export dirty data or consume it independently from Pulsar.

2.1.0-sort-dirty.png

The following configuration needs to be added to the Connector:

'dirty.side-output.inlong-sdk.inlong-auth-key' = 'your auth key',

'dirty.side-output.inlong-sdk.inlong-auth-id' = 'your auth id',

'dirty.side-output.enable' = 'true',

'dirty.side-output.inlong-sdk.inlong-group-id' = 'target_inlong_group_id',
'dirty.side-output.inlong-sdk.inlong-stream-id' = 'target_inlong_stream_id',

'dirty.side-output.labels' = 'groupId=xx&streamId=xx&serverType=tube&dataflowId=xx',

'dirty.side-output.inlong-sdk.inlong-manager-addr' = 'xxx',

'dirty.side-output.connector' = 'inlong-sdk',

'dirty.ignore' = 'true',`

Thanks to @vernedeng and @fuweng11 for their contributions to this feature. For more details, please refer to INLONG-11481 and INLONG-11508

Future Plans

In version 2.1.0, we have enriched and improved our operational capabilities. Welcome everyone to use it. If you have more scenarios and requirements, or encounter any problems during use, please feel free to raise issues and PR. In future versions, the InLong community will continue to:

  • Support more data source collection capabilities

  • Enrich Flink 1.15, 1.18 Connector

  • Continuously enhance Transform capabilities.

  • Provide real-time synchronization support for more data sources and targets.

  • Optimize SDK capabilities and user experience

  • Optimize Dashboard experience

We also look forward to more developers interested in InLong to contribute and help drive the project's development!