
Deployment

Get the module archive

The module archive is generated in the directory inlong-sort-standalone/sort-standalone-dist/target/. The archive file is named apache-inlong-sort-standalone-${project.version}-bin.tar.gz.

Start the inlong-sort-standalone application

After compilation completes and the tar.gz package is generated, extract the archive and run the start script to launch the inlong-sort-standalone application.

Example:

./bin/sort-start.sh

Configuration of conf/common.properties

| Parameter | Required | DefaultValue | Remark |
| --- | --- | --- | --- |
| clusterId | Y | NA | ID of the inlong-sort-standalone cluster |
| sortSource.type | N | org.apache.inlong.sort.standalone.source.readapi.ReadApiSource | Source class name |
| sortChannel.type | N | org.apache.inlong.sort.standalone.channel.BufferQueueChannel | Channel class name |
| sortSink.type | N | org.apache.inlong.sort.standalone.sink.hive.HiveSink | Sink class name. Different distribution types use different Sink classes |
| sortClusterConfig.type | N | org.apache.inlong.sort.standalone.config.loader.ClassResourceSortClusterConfigLoader | Class name of the distribution cluster configuration loader. ClassResourceSortClusterConfigLoader reads the distribution cluster configuration from the SortClusterConfig.conf resource file on the classpath |
| sortClusterConfig.managerPath | N | NA | Parameter of the cluster configuration loader class org.apache.inlong.sort.standalone.config.loader.ManagerSortClusterConfigLoader; specifies the URL path of the Inlong Manager |
| eventFormatHandler | N | org.apache.inlong.sort.standalone.sink.hive.DefaultEventFormatHandler | Class name of the format converter applied before data is distributed to Hive |
| maxThreads | N | 10 | Number of Sink threads |
| reloadInterval | N | 60000 | Interval for reloading configuration data, in milliseconds |
| processInterval | N | 100 | Interval for processing data, in milliseconds |
| metricDomains | N | Sort | Domain name of the metrics |
| metricDomains.Sort.domainListeners | N | org.apache.inlong.sort.standalone.metrics.prometheus.PrometheusMetricListener | Class names of the metric listeners, separated by spaces |
| prometheusHttpPort | N | 8080 | HTTP server port of the Prometheus simple client |
| metricDomains.Sort.snapshotInterval | N | 60000 | Interval for taking metric data snapshots, in milliseconds |
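
To make the Required and DefaultValue columns concrete, the following is a minimal Python sketch that merges a common.properties text with the documented defaults and rejects a file without clusterId. It is an illustration only, not the loader used by inlong-sort-standalone; the helper names (parse_properties, effective_config) and the sample values are hypothetical.

```python
# Documented defaults from the conf/common.properties table; clusterId has none.
DEFAULTS = {
    "sortSource.type": "org.apache.inlong.sort.standalone.source.readapi.ReadApiSource",
    "sortChannel.type": "org.apache.inlong.sort.standalone.channel.BufferQueueChannel",
    "sortSink.type": "org.apache.inlong.sort.standalone.sink.hive.HiveSink",
    "sortClusterConfig.type": "org.apache.inlong.sort.standalone.config.loader.ClassResourceSortClusterConfigLoader",
    "eventFormatHandler": "org.apache.inlong.sort.standalone.sink.hive.DefaultEventFormatHandler",
    "maxThreads": "10",
    "reloadInterval": "60000",
    "processInterval": "100",
    "metricDomains": "Sort",
    "metricDomains.Sort.domainListeners": "org.apache.inlong.sort.standalone.metrics.prometheus.PrometheusMetricListener",
    "prometheusHttpPort": "8080",
    "metricDomains.Sort.snapshotInterval": "60000",
}
REQUIRED = ("clusterId",)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def effective_config(text):
    """Return the defaults overridden by the file; fail if a required key is absent."""
    props = parse_properties(text)
    missing = [key for key in REQUIRED if key not in props]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    return {**DEFAULTS, **props}


# Hypothetical file content, for illustration only.
SAMPLE = """
# conf/common.properties
clusterId=sort-cluster-demo
maxThreads=20
"""

config = effective_config(SAMPLE)
print(config["clusterId"], config["maxThreads"], config["reloadInterval"])
```

Only clusterId and maxThreads are set in the sample, so every other parameter falls back to its documented default.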

SortClusterConfig Configuration

  • The configuration can be read from the SortClusterConfig.conf resource file on the classpath, but this mode does not support real-time updates

  • The configuration can be fetched from the HTTP interface of the Inlong Manager

    | Parameter | Required | DefaultValue | Remark |
    | --- | --- | --- | --- |
    | clusterName | Y | NA | Uniquely identifies an inlong-sort-standalone cluster |
    | sortTasks | Y | NA | List of distribution tasks |

SortTaskConfig Configuration

| Parameter | Required | DefaultValue | Remark |
| --- | --- | --- | --- |
| name | Y | NA | Distribution task name |
| type | Y | NA | Distribution task type, such as HIVE("hive"), TUBE("tube"), KAFKA("kafka"), PULSAR("pulsar"), ElasticSearch("ElasticSearch"), UNKNOWN("n") |
| idParams | Y | NA | Inlong data stream parameter list |
| sinkParams | Y | NA | Distribution task parameters |
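
As a rough illustration of the required fields in the SortClusterConfig and SortTaskConfig tables, the sketch below checks a parsed cluster configuration and reports any missing key. The function name find_missing and the sample values are hypothetical, and the type value "PULSAR" is assumed by analogy with the "HIVE" value used in the Hive example on this page.

```python
import json

CLUSTER_REQUIRED = ("clusterName", "sortTasks")
TASK_REQUIRED = ("name", "type", "idParams", "sinkParams")


def find_missing(cluster):
    """Return a list of problems; an empty list means every required key exists."""
    problems = [f"cluster: missing {key}" for key in CLUSTER_REQUIRED if key not in cluster]
    for index, task in enumerate(cluster.get("sortTasks", [])):
        # Fall back to the list position when the task has no name.
        label = task.get("name", f"sortTasks[{index}]")
        problems += [f"{label}: missing {key}" for key in TASK_REQUIRED if key not in task]
    return problems


# Hypothetical configuration: the second task lacks sinkParams on purpose.
SAMPLE = json.loads("""
{
  "clusterName": "demo-cluster",
  "sortTasks": [
    {"name": "task-a", "type": "HIVE", "idParams": [], "sinkParams": {}},
    {"name": "task-b", "type": "PULSAR", "idParams": []}
  ]
}
""")

problems = find_missing(SAMPLE)
print(problems)  # ['task-b: missing sinkParams']
```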

Hive Distributed Tasks IdParams

| Parameter | Required | DefaultValue | Remark |
| --- | --- | --- | --- |
| inlongGroupId | Y | NA | inlongGroupId |
| inlongStreamId | Y | NA | inlongStreamId |
| separator | Y | NA | Delimiter |
| partitionIntervalMs | N | 3600000 | Partition interval, in milliseconds |
| idRootPath | Y | NA | HDFS root directory of the Inlong data stream |
| partitionSubPath | Y | NA | Partition subdirectory of the Inlong data stream |
| hiveTableName | Y | NA | Hive table name of the Inlong data stream |
| partitionFieldName | N | dt | Partition field name of the Inlong data stream |
| partitionFieldPattern | Y | NA | Format of the partition field value of the Inlong data stream, such as {yyyyMMdd}, {yyyyMMddHH}, {yyyyMMddHHmm} |
| msgTimeFieldPattern | Y | NA | Format of the message generation time field value, as a Java time format |
| maxPartitionOpenDelayHour | N | 8 | Maximum delay for opening a partition, in hours |
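
The page does not describe how the sink expands partitionSubPath, so the following Python sketch rests on an assumption: each {pattern} placeholder is a Java-style date pattern that is filled in from the message time, and the message time is parsed with msgTimeFieldPattern. The helper names and the sample timestamp are hypothetical; only the pattern letters that appear on this page (yyyy, MM, dd, HH, mm, ss) are handled.

```python
import re
from datetime import datetime

# Java-style pattern letters used on this page, mapped to strftime directives.
JAVA_TO_STRFTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S"}
TOKEN_RE = re.compile("yyyy|MM|dd|HH|mm|ss")


def java_pattern_to_strftime(pattern):
    """Translate the supported tokens in one pass; other characters pass through."""
    return TOKEN_RE.sub(lambda match: JAVA_TO_STRFTIME[match.group(0)], pattern)


def expand_sub_path(sub_path, when):
    """Replace each {pattern} placeholder with `when` formatted by that pattern."""
    return re.sub(
        r"\{([^{}]+)\}",
        lambda match: when.strftime(java_pattern_to_strftime(match.group(1))),
        sub_path,
    )


# Parse a hypothetical message time using the msgTimeFieldPattern from the example.
msg_time = datetime.strptime(
    "2022-03-01 14:05:09", java_pattern_to_strftime("yyyy-MM-dd HH:mm:ss")
)
path = "/user/hive/warehouse/t_inlong_v1_0fc00000046" + expand_sub_path(
    "/{yyyyMMdd}/{yyyyMMddHH}", msg_time
)
print(path)  # /user/hive/warehouse/t_inlong_v1_0fc00000046/20220301/2022030114
```

Under this assumption, idRootPath plus the expanded partitionSubPath gives one directory per day with one subdirectory per hour, which matches the default partitionIntervalMs of 3600000.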

Hive Distributed Tasks SinkParams

| Parameter | Required | DefaultValue | Remark |
| --- | --- | --- | --- |
| hdfsPath | Y | NA | HDFS NameNode |
| maxFileOpenDelayMinute | N | 5 | Maximum write time of a single HDFS file, in minutes |
| tokenOvertimeMinute | N | 60 | Maximum time to create a token for a partition of a single Inlong data stream, in minutes |
| maxOutputFileSizeGb | N | 2 | Maximum size of a single HDFS file, in GB |
| hiveJdbcUrl | Y | NA | Hive JDBC URL |
| hiveDatabase | Y | NA | Hive database |
| hiveUsername | Y | NA | Hive username |
| hivePassword | Y | NA | Hive password |

Pulsar Distributed Tasks IdParams

| Parameter | Required | DefaultValue | Remark |
| --- | --- | --- | --- |
| inlongGroupId | Y | NA | inlongGroupId |
| inlongStreamId | Y | NA | inlongStreamId |
| topic | Y | NA | Pulsar Topic |

Pulsar Distributed Tasks SinkParams

| Parameter | Required | DefaultValue | Remark |
| --- | --- | --- | --- |
| serviceUrl | Y | NA | Pulsar service URL |
| authentication | Y | NA | Pulsar cluster authentication |
| enableBatching | N | true | enableBatching |
| batchingMaxBytes | N | 5242880 | batchingMaxBytes |
| batchingMaxMessages | N | 3000 | batchingMaxMessages |
| batchingMaxPublishDelay | N | 1 | batchingMaxPublishDelay |
| maxPendingMessages | N | 1000 | maxPendingMessages |
| maxPendingMessagesAcrossPartitions | N | 50000 | maxPendingMessagesAcrossPartitions |
| sendTimeout | N | 0 | sendTimeout |
| compressionType | N | NONE | compressionType |
| blockIfQueueFull | N | true | blockIfQueueFull |
| roundRobinRouterBatchingPartitionSwitchFrequency | N | 10 | roundRobinRouterBatchingPartitionSwitchFrequency |
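
This page gives no Pulsar configuration example, so here is a hedged sketch of how a Pulsar task entry could be assembled from the two tables above. Several things are assumptions rather than documented facts: the type value "PULSAR" (by analogy with "HIVE" in the Hive example), string-typed sinkParams values (as in the Hive example), the helper name pulsar_task, and all sample values such as the service URL, token placeholder and topic.

```python
import json

# Defaults copied from the Pulsar SinkParams table, kept as strings because the
# Hive example on this page passes sinkParams values as strings (assumption).
PULSAR_SINK_DEFAULTS = {
    "enableBatching": "true",
    "batchingMaxBytes": "5242880",
    "batchingMaxMessages": "3000",
    "batchingMaxPublishDelay": "1",
    "maxPendingMessages": "1000",
    "maxPendingMessagesAcrossPartitions": "50000",
    "sendTimeout": "0",
    "compressionType": "NONE",
    "blockIfQueueFull": "true",
    "roundRobinRouterBatchingPartitionSwitchFrequency": "10",
}
ID_REQUIRED = ("inlongGroupId", "inlongStreamId", "topic")


def pulsar_task(name, service_url, authentication, id_params, overrides=None):
    """Build one sortTasks entry; required idParams keys are checked up front."""
    for entry in id_params:
        missing = [key for key in ID_REQUIRED if key not in entry]
        if missing:
            raise ValueError(f"idParams entry is missing {missing}")
    sink_params = {"serviceUrl": service_url, "authentication": authentication}
    sink_params.update(PULSAR_SINK_DEFAULTS)
    sink_params.update(overrides or {})
    return {"name": name, "type": "PULSAR", "idParams": id_params, "sinkParams": sink_params}


# Placeholder values for illustration only.
task = pulsar_task(
    name="sid_pulsar_demo",
    service_url="pulsar://127.0.0.1:6650",
    authentication="<token>",
    id_params=[{"inlongGroupId": "demo_group", "inlongStreamId": "", "topic": "demo_topic"}],
    overrides={"batchingMaxMessages": "500"},
)
print(json.dumps(task, indent=2))
```

Only the two required sink parameters and one override are supplied here; the remaining sinkParams take the defaults listed in the table.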

Hive Configuration Example

{
  "data": {
    "clusterName": "hivev3-sz-sz1",
    "sortTasks": [
      {
        "idParams": [
          {
            "inlongGroupId": "0fc00000046",
            "inlongStreamId": "",
            "separator": "|",
            "partitionIntervalMs": 3600000,
            "idRootPath": "/user/hive/warehouse/t_inlong_v1_0fc00000046",
            "partitionSubPath": "/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName": "t_inlong_v1_0fc00000046",
            "partitionFieldName": "dt",
            "partitionFieldPattern": "yyyyMMddHH",
            "msgTimeFieldPattern": "yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour": 8
          },
          {
            "inlongGroupId": "03600000045",
            "inlongStreamId": "",
            "separator": "|",
            "partitionIntervalMs": 3600000,
            "idRootPath": "/user/hive/warehouse/t_inlong_v1_03600000045",
            "partitionSubPath": "/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName": "t_inlong_v1_03600000045",
            "partitionFieldName": "dt",
            "partitionFieldPattern": "yyyyMMddHH",
            "msgTimeFieldPattern": "yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour": 8
          },
          {
            "inlongGroupId": "05100054990",
            "inlongStreamId": "",
            "separator": "|",
            "partitionIntervalMs": 3600000,
            "idRootPath": "/user/hive/warehouse/t_inlong_v1_05100054990",
            "partitionSubPath": "/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName": "t_inlong_v1_05100054990",
            "partitionFieldName": "dt",
            "partitionFieldPattern": "yyyyMMddHH",
            "msgTimeFieldPattern": "yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour": 8
          },
          {
            "inlongGroupId": "09c00014434",
            "inlongStreamId": "",
            "separator": "|",
            "partitionIntervalMs": 3600000,
            "idRootPath": "/user/hive/warehouse/t_inlong_v1_09c00014434",
            "partitionSubPath": "/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName": "t_inlong_v1_09c00014434",
            "partitionFieldName": "dt",
            "partitionFieldPattern": "yyyyMMddHH",
            "msgTimeFieldPattern": "yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour": 8
          },
          {
            "inlongGroupId": "0c900035509",
            "inlongStreamId": "",
            "separator": "|",
            "partitionIntervalMs": 3600000,
            "idRootPath": "/user/hive/warehouse/t_inlong_v1_0c900035509",
            "partitionSubPath": "/{yyyyMMdd}/{yyyyMMddHH}",
            "hiveTableName": "t_inlong_v1_0c900035509",
            "partitionFieldName": "dt",
            "partitionFieldPattern": "yyyyMMddHH",
            "msgTimeFieldPattern": "yyyy-MM-dd HH:mm:ss",
            "maxPartitionOpenDelayHour": 8
          }
        ],
        "name": "sid_hive_inlong6th_v3",
        "sinkParams": {
          "hdfsPath": "hdfs://127.0.0.1:9000",
          "maxFileOpenDelayMinute": "5",
          "tokenOvertimeMinute": "60",
          "maxOutputFileSizeGb": "2",
          "hiveJdbcUrl": "jdbc:hive2://127.0.0.2:10000",
          "hiveDatabase": "default",
          "hiveUsername": "hive",
          "hivePassword": "hive"
        },
        "type": "HIVE"
      }
    ]
  },
  "errCode": 0,
  "md5": "md5",
  "result": true
}