Share my learning's: 2) File storage on HDFS

Tuesday, 1 August 2017

File storage on HDFS:
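*Each file is split into blocks
– Each block is usually 64MB or 128MB (64MB was the default in Hadoop 1.x; 128MB is the default from Hadoop 2.x onward, and the size is configurable)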
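To make the splitting concrete, here is a minimal Python sketch of the arithmetic. The helper name `split_into_blocks` is hypothetical (it is not part of any Hadoop API) and the 300MB file is an invented example. The file size is divided by the block size, and any leftover bytes go into a final, smaller block.

```python
MB = 1024 * 1024

def split_into_blocks(file_size: int, block_size: int = 128 * MB) -> list[int]:
    """Hypothetical helper: size in bytes of each block a file would occupy."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block holds only the leftover bytes
    return sizes

# A hypothetical 300MB file with 128MB and with 64MB blocks
print([s // MB for s in split_into_blocks(300 * MB)])           # [128, 128, 44]
print([s // MB for s in split_into_blocks(300 * MB, 64 * MB)])  # [64, 64, 64, 64, 44]
```

A block only takes up as much disk space as the data in it, so the 44MB last block does not waste the remaining space of a full block.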

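*Blocks of data are distributed across many machines at load time (when the file is written into HDFS)
– Different blocks from the same file are generally stored on different machines
– This lets the blocks of a file be processed in parallel, each on a machine that stores it, which makes data processing efficient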
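A simplified sketch of spreading a file's blocks over machines. It assumes plain round-robin assignment and made-up block and machine names; HDFS's real placement also takes racks and the writing node into account, so this only illustrates why different blocks end up on different machines and can be worked on in parallel.

```python
from itertools import cycle

def distribute_blocks(block_ids, machines):
    """Assign each block to a machine in round-robin order (simplified, not HDFS's policy)."""
    machine_iter = cycle(machines)
    return {block: next(machine_iter) for block in block_ids}

def blocks_per_machine(placement):
    """Group blocks by machine: the work each machine can do on its local data."""
    work = {}
    for block, machine in placement.items():
        work.setdefault(machine, []).append(block)
    return work

placement = distribute_blocks(["blk_1", "blk_2", "blk_3"], ["node1", "node2", "node3", "node4"])
print(placement)                      # {'blk_1': 'node1', 'blk_2': 'node2', 'blk_3': 'node3'}
print(blocks_per_machine(placement))  # {'node1': ['blk_1'], 'node2': ['blk_2'], 'node3': ['blk_3']}
```

Each of the three machines can process its own block at the same time, instead of one machine reading the whole file.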

*Each block is replicated across multiple machines, known as DataNodes

– The default replication factor is three, meaning that each block exists on three different machines
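The replication rule can be sketched the same way. `place_replicas` and `raw_storage` are hypothetical helpers, and picking DataNodes at random is a simplification, since the default HDFS policy is rack-aware. The sketch shows that each block gets three distinct DataNodes and that a replication factor of three triples the raw disk space a file uses.

```python
import random

REPLICATION_FACTOR = 3  # HDFS default replication factor

def place_replicas(block_ids, datanodes, replication=REPLICATION_FACTOR, seed=None):
    """Pick `replication` distinct DataNodes for each block (random, not rack-aware)."""
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication factor")
    rng = random.Random(seed)
    # random.sample never returns the same DataNode twice for one block
    return {block: rng.sample(datanodes, replication) for block in block_ids}

def raw_storage(file_size, replication=REPLICATION_FACTOR):
    """Disk space used across the cluster once every block is replicated."""
    return file_size * replication

replicas = place_replicas(["blk_1", "blk_2"], ["dn1", "dn2", "dn3", "dn4", "dn5"], seed=7)
for block, nodes in replicas.items():
    print(block, nodes)  # three different DataNodes per block
print(raw_storage(300))  # 900: a 300MB file occupies 900MB of raw disk space
```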

*A master node called the NameNode keeps track of which blocks make up each file and where those blocks are located
– This information is known as the metadata
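A toy model of the NameNode's metadata, assuming two plain dictionaries: one from file to block IDs and one from block ID to DataNodes. The class and its methods are illustrative only, not HDFS's internal classes, and the block IDs and DataNode names are made up. A client reading a file asks the NameNode for this kind of block list and then reads the blocks from the DataNodes.

```python
from dataclasses import dataclass, field

@dataclass
class NameNodeMetadata:
    """Toy model of NameNode metadata (illustrative only, not HDFS's real classes)."""
    # file path -> ordered list of the block IDs that make up the file
    file_to_blocks: dict[str, list[str]] = field(default_factory=dict)
    # block ID -> DataNodes holding a replica of that block
    block_locations: dict[str, list[str]] = field(default_factory=dict)

    def add_file(self, path, blocks):
        """Register a file; `blocks` maps block ID -> DataNodes, in file order."""
        self.file_to_blocks[path] = list(blocks)
        self.block_locations.update(blocks)

    def locate(self, path):
        """Return (block ID, DataNodes) pairs in file order, as a reading client needs."""
        return [(block, self.block_locations[block]) for block in self.file_to_blocks[path]]

meta = NameNodeMetadata()
meta.add_file("Foo.txt", {"blk_1": ["dn1", "dn2", "dn3"], "blk_2": ["dn2", "dn3", "dn4"]})
print(meta.locate("Foo.txt"))
# [('blk_1', ['dn1', 'dn2', 'dn3']), ('blk_2', ['dn2', 'dn3', 'dn4'])]
```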

Example:
i) The NameNode holds the metadata for the two files (Foo.txt and Bar.txt)

ii) The DataNodes hold the actual blocks
– Each block is 64MB or 128MB in size (the last block of a file may be smaller)
– Each block is replicated three times across the cluster
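The example can also be written down as data and checked. The block IDs, the five DataNode names and the number of blocks per file below are invented for illustration, since the post does not give them. The sketch confirms that every block has three replicas on distinct DataNodes and counts how many block replicas each DataNode holds.

```python
# Hypothetical metadata for Foo.txt and Bar.txt: block IDs and DataNode names are made up
metadata = {
    "Foo.txt": {"blk_1": ["dn1", "dn2", "dn4"], "blk_2": ["dn2", "dn3", "dn5"], "blk_3": ["dn1", "dn3", "dn5"]},
    "Bar.txt": {"blk_4": ["dn2", "dn4", "dn5"], "blk_5": ["dn1", "dn3", "dn4"]},
}

def check_replication(metadata, replication=3):
    """True if every block has exactly `replication` replicas on distinct DataNodes."""
    return all(
        len(nodes) == replication and len(set(nodes)) == replication
        for blocks in metadata.values()
        for nodes in blocks.values()
    )

def blocks_on_each_datanode(metadata):
    """Count the block replicas stored on each DataNode."""
    counts = {}
    for blocks in metadata.values():
        for nodes in blocks.values():
            for node in nodes:
                counts[node] = counts.get(node, 0) + 1
    return dict(sorted(counts.items()))

print(check_replication(metadata))        # True
print(blocks_on_each_datanode(metadata))  # {'dn1': 3, 'dn2': 3, 'dn3': 3, 'dn4': 3, 'dn5': 3}
```

Five blocks with three replicas each give fifteen stored block copies in total.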
