Friday 22 September 2017

1) About Apache Flume

Apache Flume:

Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store.


Note:
As the standard tool for streaming log and event data into Hadoop, Flume is a critical component for building end-to-end streaming workloads.




Flume Architecture:
  • A Flume deployment can run any number of agents.
  • An agent is the container for a Flume data flow; it must have a source, a channel, and a sink (see the configuration sketch below).
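
A Flume agent is defined in a plain Java-properties configuration file. The sketch below is a minimal, illustrative example (the agent name agent1 and the component names src1, ch1, sink1 are placeholders) that declares one source, one channel, and one sink and wires them together:

    # name the components of agent1
    agent1.sources = src1
    agent1.channels = ch1
    agent1.sinks = sink1

    # source: reads text lines from a TCP port (netcat source)
    agent1.sources.src1.type = netcat
    agent1.sources.src1.bind = localhost
    agent1.sources.src1.port = 44444

    # channel: buffers events in memory until the sink consumes them
    agent1.channels.ch1.type = memory

    # sink: writes events to the log (handy for testing)
    agent1.sinks.sink1.type = logger

    # wiring: a source can feed several channels, a sink drains exactly one
    agent1.sources.src1.channels = ch1
    agent1.sinks.sink1.channel = ch1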


Components of Apache Flume:

i) Flume source: A Flume source consumes events delivered to it by an external source, such as a web server, a social media feed, or email messages.
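
For example, a source that tails a web server's access log could be configured with the standard exec source. This is only a sketch; the agent/component names and the log path are placeholders:

    agent1.sources.src1.type = exec
    agent1.sources.src1.command = tail -F /var/log/httpd/access_log
    agent1.sources.src1.channels = ch1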

ii) Flume channel:
As soon as a Flume source receives an event, it stores it in one or more channels. The channel is a passive store that keeps the event until it is consumed by a Flume sink.


Flume Channel ==> Passive store for Flume source events

Example:
A file channel stores the source events in a file on the local file system.
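
Such a file channel might be configured as below (checkpointDir and dataDirs are standard file-channel properties; the directories shown are placeholders):

    agent1.channels.ch1.type = file
    agent1.channels.ch1.checkpointDir = /var/flume/checkpoint
    agent1.channels.ch1.dataDirs = /var/flume/data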

iii) Flume sink:

A Flume sink reads events from a channel, writes them to an external repository such as HDFS (via the Flume HDFS sink), removes them from the channel, and then waits for the next event from the source.

In this way the source and sink are chained together asynchronously, with the channel decoupling them.
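
For instance, an HDFS sink that drains the channel into Hadoop could look like the sketch below (the HDFS path is a placeholder):

    agent1.sinks.sink1.type = hdfs
    agent1.sinks.sink1.hdfs.path = /user/flume/events
    agent1.sinks.sink1.hdfs.fileType = DataStream
    agent1.sinks.sink1.channel = ch1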

Example Architecture:
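
As one possible end-to-end flow (all names and paths below are illustrative), a single agent can tail a web-server log with an exec source, buffer the events in a durable file channel, and write them to HDFS with the HDFS sink:

    webagent.sources = weblog
    webagent.channels = filech
    webagent.sinks = hdfssink

    webagent.sources.weblog.type = exec
    webagent.sources.weblog.command = tail -F /var/log/httpd/access_log
    webagent.sources.weblog.channels = filech

    webagent.channels.filech.type = file
    webagent.channels.filech.checkpointDir = /var/flume/checkpoint
    webagent.channels.filech.dataDirs = /var/flume/data

    webagent.sinks.hdfssink.type = hdfs
    webagent.sinks.hdfssink.hdfs.path = /user/flume/weblogs
    webagent.sinks.hdfssink.hdfs.fileType = DataStream
    webagent.sinks.hdfssink.channel = filech

Such an agent is started with the standard launcher, for example:

    flume-ng agent --conf conf --conf-file web-agent.conf --name webagent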



Features of Apache Flume:
  • Reliability:
Events are staged in a channel on each agent. An event is removed from a channel only after it has been stored in the channel of the next agent or in the terminal repository.
  • Recoverability:
Events are staged in the channel, which manages recovery from failure. Flume supports a durable file channel backed by the local file system, as well as an in-memory channel (see the sketch below).
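
The channel type is what determines this behaviour. A sketch of the two alternatives (only one would be configured for a given channel; names and directories are placeholders):

    # durable file channel: events survive an agent restart
    agent1.channels.ch1.type = file
    agent1.channels.ch1.checkpointDir = /var/flume/checkpoint
    agent1.channels.ch1.dataDirs = /var/flume/data

    # memory channel (alternative): faster, but events are lost if the agent process dies
    agent1.channels.ch1.type = memory
    agent1.channels.ch1.capacity = 10000
    agent1.channels.ch1.transactionCapacity = 1000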







