It took me a while to figure out how to stop Flume from rolling lots of small files and make it roll one big file instead (e.g. 128 MB). I read two books on Flume and neither of them showed how to accomplish that! Even the Flume docs don't show how to do it.
I started Googling and had to put the pieces together. Here's the working config that makes Flume roll files at the size you want. This config is for a JMS queue.
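The magic is "JmsAgent.sinks.HDFS.hdfs.minBlockReplicas = 1"! Without it, the HDFS sink can roll a file early whenever it sees the file's blocks as under-replicated, no matter what the roll settings say.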
JmsAgent.sources = JmsSrc
JmsAgent.channels = MemChannel
JmsAgent.sinks = HDFS
JmsAgent.sources.JmsSrc.type = jms
JmsAgent.sources.JmsSrc.initialContextFactory = ...
JmsAgent.sources.JmsSrc.connectionFactory = ...
JmsAgent.sources.JmsSrc.providerURL = ...
JmsAgent.sources.JmsSrc.destinationName = ...
# default batchSize is 100
JmsAgent.sources.JmsSrc.batchSize = 500
JmsAgent.sources.JmsSrc.destinationType = QUEUE
JmsAgent.sinks.HDFS.type = hdfs
JmsAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
JmsAgent.sinks.HDFS.hdfs.path = hdfs://host/path/%Y-%m-%d
JmsAgent.sinks.HDFS.hdfs.filePrefix = jms_sample
JmsAgent.sinks.HDFS.hdfs.fileType = DataStream
JmsAgent.sinks.HDFS.hdfs.writeFormat = Text
JmsAgent.sinks.HDFS.hdfs.batchSize = 10000
# 256 MB = 268435456 bytes
#JmsAgent.sinks.HDFS.hdfs.rollSize = 268435456
# 128 MB = 134217728 bytes
JmsAgent.sinks.HDFS.hdfs.rollSize = 134217728
JmsAgent.sinks.HDFS.hdfs.rollCount = 0
# 0 disables time-based rolling (the default rollInterval is 30 seconds)
JmsAgent.sinks.HDFS.hdfs.rollInterval = 0
JmsAgent.sinks.HDFS.hdfs.idleTimeout = 3600
JmsAgent.sinks.HDFS.hdfs.minBlockReplicas = 1
JmsAgent.channels.MemChannel.type = memory
JmsAgent.channels.MemChannel.capacity = 11000
JmsAgent.channels.MemChannel.transactionCapacity = 10000
JmsAgent.sources.JmsSrc.channels = MemChannel
JmsAgent.sinks.HDFS.channel = MemChannel
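
If you want a different file size, here is a minimal sketch in Python for working out the rollSize value in bytes and for checking that the batch sizes fit the memory channel. The function names (mb_to_bytes, channel_can_hold) are my own helpers for illustration, not part of Flume, and the check only covers the sizing rule as I understand it: each batch has to fit in one channel transaction, and a transaction has to fit in the channel.

```python
def mb_to_bytes(megabytes: int) -> int:
    """Convert megabytes to bytes, taking 1 MB as 1024 * 1024 bytes."""
    return megabytes * 1024 * 1024


def channel_can_hold(source_batch: int, sink_batch: int,
                     transaction_capacity: int, capacity: int) -> bool:
    """Hypothetical sanity check: True if both batch sizes fit in one
    channel transaction and one transaction fits in the channel."""
    return (source_batch <= transaction_capacity
            and sink_batch <= transaction_capacity
            and transaction_capacity <= capacity)


if __name__ == "__main__":
    # Values for hdfs.rollSize
    print("128 MB =", mb_to_bytes(128))
    print("256 MB =", mb_to_bytes(256))
    # Numbers taken from the config above: source batchSize, sink batchSize,
    # channel transactionCapacity, channel capacity
    print("channel sizing ok:", channel_can_hold(500, 10000, 10000, 11000))
```

The two byte values match the rollSize lines in the config, and the sizing check passes for the numbers used there.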
Have fun with Flume!