首页 Hadoop协同框架-Flume
文章
取消

Hadoop协同框架-Flume

Flume结构

  • Source : 用户配置采集数据的方式(Http、LocalFileSystem、Tcp)
  • Channel ——中间件
    • Memory Channel:临时存放到内存
    • FIle Channel :临时存放到本地磁盘
  • Sink :将数据存放目的地(HDFS、本地文件系统、Logger、Http)

常用配置

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
# 每个组件的名称
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# netcat监控方式、监控的ip:localhost、端口:44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# sink 的方式 logger
a1.sinks.k1.type = logger

# 写入到内存、
a1.channels.c1.type = memory

# 绑定source和sink到channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Source

Exec source

用于监控Linux命令

1
2
3
4
5
6
7
a1.sources = r1
a1.channels = c1
# 指定类型、命令
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/secure

a1.sources.r1.channels = c1

Exec Source详细参数

Spooling Directory Source

用于监控文件,比Exec监控更加可靠

1
2
3
4
5
6
7
a1.channels = ch-1
a1.sources = src-1

fs.sources.r3.type=spooldir
fs.sources.r3.spoolDir=/opt/modules/apache-flume-1.6.0-bin/flume_template
fs.sources.r3.fileHeader=true
fs.sources.r3.ignorePattern=^(.)*\\.out$ # 过滤out结尾的文件

Spooling Directory Source 详细参数

Channel

Memory Channel

中间文件存放在内存中

1
2
3
4
5
6
a1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 10000
a1.channels.c1.byteCapacityBufferPercentage = 20
a1.channels.c1.byteCapacity = 800000

Memory Channel 详细参数

File Channel

中间文件存放在文件中

1
2
3
4
a1.channels = c1
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /mnt/flume/checkpoint
a1.channels.c1.dataDirs = /mnt/flume/data

File Channel 详细参数

Sink

Logger Sink

在INFO级别记录文件,通常用于调试

1
2
3
4
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

Logger Sink 详细参数

HDFS Sink

记录文件写入到HDFS中

1
2
3
4
5
6
7
8
9
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute

详细参数

本文由作者按照 CC BY 4.0 进行授权