Tuesday, 1 August 2017

2) File storage on HDFS:

*Each file is split into blocks
      – Each block is usually 64MB or 128MB

*Blocks of data are distributed across many machines at load time
      – Different blocks from the same file are stored on different machines
      – This allows blocks to be processed in parallel, which makes data processing efficient

*Blocks are replicated across multiple machines, known as DataNodes
      – The default replication factor is three, meaning that each block exists on three different machines

*A master node called the NameNode keeps track of which blocks make up a file and where those blocks are located
      – This information is known as the metadata
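
The splitting and replication described above can be sketched in a few lines of Python. This is only an illustration, not how HDFS is implemented: the function names, the DataNode names (dn1 to dn4) and the round-robin placement are assumptions made for the example, and real HDFS uses a rack-aware placement policy instead.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128MB, one of the two common block sizes
REPLICATION = 3                 # default replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block for a file of file_size bytes."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block holds only the leftover bytes
    return sizes


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign every block to `replication` distinct DataNodes.

    Simple round-robin, for illustration only (hypothetical policy).
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            datanodes[(block_id + i) % len(datanodes)]
            for i in range(replication)
        ]
    return placement


# A 300MB file on a four-node cluster (made-up numbers)
sizes = split_into_blocks(300 * 1024 * 1024)
placement = place_replicas(len(sizes), ["dn1", "dn2", "dn3", "dn4"])
for block_id, nodes in placement.items():
    print(block_id, sizes[block_id], nodes)
```

With these made-up numbers, the file becomes two full 128MB blocks plus one smaller final block, and each block ends up on three different DataNodes.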

Example:
i) The NameNode holds metadata for the two files (Foo.txt and Bar.txt)

ii) The DataNodes hold the actual blocks
      – Each block is 64MB or 128MB in size
      – Each block is replicated three times on the cluster
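
As a rough picture of this example, the NameNode's metadata can be thought of as two lookup tables: one from file name to blocks, and one from block to DataNodes. The sketch below is a toy model only; the block IDs, the number of blocks per file and the DataNode names are invented for illustration and do not come from a real cluster.

```python
# Toy NameNode metadata (all IDs and node names are hypothetical)

# File name -> ordered list of the blocks that make up the file
file_to_blocks = {
    "Foo.txt": ["blk_1", "blk_2"],
    "Bar.txt": ["blk_3"],
}

# Block ID -> DataNodes holding a replica (three replicas per block)
block_to_datanodes = {
    "blk_1": ["dn1", "dn2", "dn3"],
    "blk_2": ["dn2", "dn3", "dn4"],
    "blk_3": ["dn1", "dn3", "dn4"],
}


def locate(filename):
    """Return (block ID, DataNodes) pairs for a file, in block order."""
    return [(blk, block_to_datanodes[blk]) for blk in file_to_blocks[filename]]


for blk, nodes in locate("Foo.txt"):
    print(blk, "is stored on", nodes)
```

Note that the tables hold only names and locations. The block contents themselves live on the DataNodes, which is the division of work the example describes.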
