How Hadoop works

incoggeek
Dec 31, 2022


• The client contacts the NameNode; the NameNode saves the metadata and returns the available DataNodes where the client should store its data. The client then approaches those DataNodes and writes the data there, and the data is replicated (3 copies by default) to guard against data loss. Acknowledgments are passed back among the DataNodes and finally to the client.
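
As an illustration, here is a minimal Java sketch of this write path using the HDFS client API. The NameNode address and file path are hypothetical; substitute your cluster's values.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; the client asks this NameNode
        // where to place the blocks.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        // 3 replicas is already the HDFS default; set explicitly for clarity.
        conf.set("dfs.replication", "3");
        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/user/demo/input.txt"))) {
            // The data is pipelined to the chosen DataNodes, which replicate
            // it and send acknowledgments back toward the client.
            out.writeBytes("hello hadoop\n");
        }
    }
}
```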

• All DataNodes are slaves of the NameNode, so each one sends a block report telling the NameNode its current status. If a DataNode dies, the NameNode updates its records and chooses another DataNode to hold the lost replicas.
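
One way to see the NameNode's view built from those block reports is to ask it for DataNode statistics. A sketch using the HDFS Java API (DistributedFileSystem.getDataNodeStats); the NameNode address is again hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class DataNodeReport {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // hypothetical address
        try (FileSystem fs = FileSystem.get(conf)) {
            DistributedFileSystem dfs = (DistributedFileSystem) fs;
            // The NameNode's current view of each DataNode, assembled from
            // the heartbeats and block reports described above.
            for (DatanodeInfo dn : dfs.getDataNodeStats()) {
                System.out.println(dn.getHostName()
                        + " capacity=" + dn.getCapacity()
                        + " remaining=" + dn.getRemaining());
            }
        }
    }
}
```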

• The client writes a program (SQL, Java, Python, etc.) and asks the JobTracker to run it on the data. The JobTracker takes the request and asks the NameNode for the metadata, checking whether the requested data is available; the NameNode provides its locations. The JobTracker then finds the nearest node holding the data (the same data may be available on other nodes as well) and submits the task to that node's TaskTracker. The TaskTracker runs the program on the DataNode; this step is called the map (see the mapper sketch after the next two bullets). The same is done on the other DataNodes when the required data is available on more than one of them.

• The file the client submits is called the input file.
• The chunks of that file stored on different nodes are known as input splits.
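
To make the map step concrete, here is a sketch of the classic WordCount mapper written with Hadoop's MapReduce Java API. Each mapper instance is handed one input split, ideally on a DataNode that already holds that split, so the computation moves to the data rather than the other way around.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// One mapper instance processes one input split, line by line.
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE); // emit (word, 1) for the reducer
        }
    }
}
```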

• All TaskTrackers are slaves of the JobTracker, so every 3 seconds each one reports its current status and slot availability; this heartbeat also tells the JobTracker it is alive. If the heartbeats stop, the JobTracker waits up to 10 minutes (the default expiry interval), and if nothing changes it concludes the TaskTracker is either dead or running very slowly and reassigns its tasks.
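
That expiry window is configurable. A minimal sketch, assuming the classic Hadoop 1.x property name mapred.tasktracker.expiry.interval (verify the name against your distribution's documentation before relying on it):

```java
import org.apache.hadoop.mapred.JobConf;

public class ExpiryConfig {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Assumed Hadoop 1.x property: how long (in milliseconds) the
        // JobTracker waits without a heartbeat before declaring a
        // TaskTracker lost. 600,000 ms is the 10-minute default.
        conf.setLong("mapred.tasktracker.expiry.interval", 600000L);
        System.out.println(conf.getLong("mapred.tasktracker.expiry.interval", -1L));
    }
}
```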

• When all the mappers have completed, a reducer runs on one of the working DataNodes. It combines all the mapper outputs and produces the final result, which may be written to that same DataNode or to another one. That DataNode then sends a block report to the NameNode recording that the client's job output has been written (here directed to TestOutput). The client learns from the NameNode where the output is stored and reads it directly from the DataNode that holds it.
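
To round out the example, here is a sketch of the matching WordCount reducer and a minimal job driver. The input and output paths are hypothetical; the output directory (named TestOutput here, echoing the article) must not exist before the job runs.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// The reducer pulls the mapper outputs, combines them per word,
// and writes the final result back to HDFS.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable c : counts) {
            sum += c.get();
        }
        context.write(word, new IntWritable(sum)); // final (word, total) pair
    }
}

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/user/demo/input.txt"));
        // Hypothetical output directory; must not already exist.
        FileOutputFormat.setOutputPath(job, new Path("/user/demo/TestOutput"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```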
