File storage on HDFS:
*Files are split into blocks
– Each block is usually 64MB or 128MB
*Blocks of data are distributed across many machines at load time
– Different blocks from the same file are stored on different machines
– This enables efficient data processing
*Blocks are replicated across multiple machines, known as DataNodes
– The default replication factor is three, meaning that each block exists on three different machines
*A master node called the NameNode keeps track of which blocks make up a file and where those blocks are located
– This information is known as the metadata
Example:
i) The NameNode holds metadata for the two files (Foo.txt and Bar.txt)
ii) The DataNodes hold the actual blocks
– Each block is 64MB or 128MB in size
– Each block is replicated three times across the cluster
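The behavior described above can be sketched as a small simulation. This is an illustrative model, not the Hadoop API: the DataNode names, file sizes, and round-robin placement are invented for the example (real HDFS uses rack-aware replica placement), but the block-splitting arithmetic and the NameNode's file-to-block-to-location metadata mapping follow the description above.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, a common HDFS default
REPLICATION = 3                 # default replication factor

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes occupies.
    All blocks are full-size except possibly the last one."""
    full, remainder = divmod(file_size, block_size)
    return [block_size] * full + ([remainder] if remainder else [])

def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes.
    Round-robin here for simplicity; real HDFS is rack-aware."""
    placements = []
    for b in range(num_blocks):
        nodes = [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        placements.append(nodes)
    return placements

# Hypothetical cluster of five DataNodes
datanodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]

# "NameNode" metadata: file name -> list of (block size, replica locations),
# mirroring the Foo.txt / Bar.txt example (sizes are made up)
namenode = {}
for name, size in [("Foo.txt", 300 * 1024 * 1024), ("Bar.txt", 200 * 1024 * 1024)]:
    blocks = split_into_blocks(size)
    namenode[name] = list(zip(blocks, place_replicas(len(blocks), datanodes)))

for name, blocks in namenode.items():
    # Print each block's size in MB and the three DataNodes holding it
    print(name, [(size // (1024 * 1024), nodes) for size, nodes in blocks])
```

Running this shows, for example, that a 300MB Foo.txt becomes two 128MB blocks plus a 44MB tail block, and that each block's three replicas land on three different DataNodes.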