Wednesday, November 12, 2014

Flume and rolling into a big file

It took me a while to figure out how to stop Flume from rolling lots of small files and make it roll into one big file instead (e.g. 128 MB). I read two books on Flume and neither of them showed how to accomplish that! Even the Flume docs don't show how to do it.
I started Googling and had to put the pieces together, and here is the working config for Flume to roll at the file size you want. This config is for a JMS queue. The magic is "JmsAgent.sinks.HDFS.hdfs.minBlockReplicas = 1"! Without it, Flume can roll a file early whenever it sees the HDFS block as under-replicated, no matter what the roll settings say.

JmsAgent.sources = JmsSrc
JmsAgent.channels = MemChannel
JmsAgent.sinks = HDFS

JmsAgent.sources.JmsSrc.type = jms
JmsAgent.sources.JmsSrc.initialContextFactory = ...
JmsAgent.sources.JmsSrc.connectionFactory = ...
JmsAgent.sources.JmsSrc.providerURL = ...
JmsAgent.sources.JmsSrc.destinationName = ...
# default batchSize = 100
JmsAgent.sources.JmsSrc.batchSize = 500
JmsAgent.sources.JmsSrc.destinationType = QUEUE


JmsAgent.sinks.HDFS.type = hdfs
JmsAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
JmsAgent.sinks.HDFS.hdfs.path = hdfs://host/path/%Y-%m-%d
JmsAgent.sinks.HDFS.hdfs.filePrefix = jms_sample
JmsAgent.sinks.HDFS.hdfs.fileType = DataStream
JmsAgent.sinks.HDFS.hdfs.writeFormat = Text
JmsAgent.sinks.HDFS.hdfs.batchSize = 10000
# 256 MB = 268435456
#JmsAgent.sinks.HDFS.hdfs.rollSize = 268435456
# 128 MB = 134217728
JmsAgent.sinks.HDFS.hdfs.rollSize = 134217728
# 0 disables rolling by event count
JmsAgent.sinks.HDFS.hdfs.rollCount = 0
# 0 disables time-based rolling (the default rollInterval is 30 seconds)
JmsAgent.sinks.HDFS.hdfs.rollInterval = 0
JmsAgent.sinks.HDFS.hdfs.idleTimeout = 3600
JmsAgent.sinks.HDFS.hdfs.minBlockReplicas = 1

JmsAgent.channels.MemChannel.type = memory
JmsAgent.channels.MemChannel.capacity = 11000
JmsAgent.channels.MemChannel.transactionCapacity = 10000

JmsAgent.sources.JmsSrc.channels = MemChannel
JmsAgent.sinks.HDFS.channel = MemChannel
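
The rollSize value is given in bytes, so the numbers in the config are just megabytes multiplied out. A minimal sketch of that arithmetic, in case you want a different size (the helper name mb_to_bytes is made up for illustration and is not part of Flume):

```python
def mb_to_bytes(megabytes: int) -> int:
    """Convert megabytes (1 MB = 1024 * 1024 bytes) to bytes for hdfs.rollSize."""
    return megabytes * 1024 * 1024


if __name__ == "__main__":
    # Print ready-to-paste rollSize values for the two sizes used in the config.
    for mb in (128, 256):
        print(f"# {mb} MB")
        print(f"JmsAgent.sinks.HDFS.hdfs.rollSize = {mb_to_bytes(mb)}")
```

Keep in mind that rollSize only takes full effect when rollCount and rollInterval are both 0, as in the config above.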

Have fun with Flume!

Typesafe Activator behind proxy

As I started playing with Scala, I needed to run Typesafe Activator. I tried to run activator ui and got an error. It turned out that Activator needs to download dependencies from Maven repos, and that was blocked by the firewall.
Here is how to configure the proxy for Activator: https://typesafe.com/activator/docs
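
As a rough sketch of what that configuration looks like: Activator runs on the JVM, so the standard Java proxy system properties apply. The host and port below are placeholders, and where exactly these options go (to my recollection, a per-user Activator config file with one option per line) is an assumption that should be checked against the docs linked above.

```
-Dhttp.proxyHost=proxy.example.com
-Dhttp.proxyPort=8080
-Dhttps.proxyHost=proxy.example.com
-Dhttps.proxyPort=8080
```

The JVM also has an http.nonProxyHosts property for addresses that should bypass the proxy, which may matter here because the Activator UI is served from localhost.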
Now I can have some fun with Scala and Activator!