You need to learn MapReduce 2

incoggeek
3 min read · Dec 31, 2022
picture credit: cloudera

Why YARN came into existence

• The NameNode in Hadoop 1.x is a single point of failure: each cluster has only one NameNode, so if that machine is unavailable, the whole cluster is unavailable.

• Only MapReduce jobs are supported.
• MapReduce also acts as the resource negotiator, so it carries the burden of managing resources on top of processing jobs.

MapReduce 2 (YARN)

• The earlier version of Hadoop, which consisted only of MapReduce, did not have a separate resource manager, so MapReduce had to handle both resource management and data processing itself.

• This is where YARN comes into the picture. YARN was developed to take over the responsibilities of:

1. Resource management
2. Job scheduling

Components of YARN

1. Resource Manager: (replaces the JobTracker)

It is the master daemon and manages resources across the whole cluster. It keeps track of the resources available on every node in the cluster.

It has 2 parts:
• Application Manager: It accepts or rejects an application when the client submits it to the Resource Manager. It launches the Application Master on one of the slave nodes and monitors the Application Master's progress.

• Scheduler: It allocates resources to applications so that their jobs can run. It only schedules; it does not track what happens to a job afterwards.

Let's say we have a Spark job that we want to run. When we submit the job from the client machine, it goes to the Resource Manager, which hands it to the Scheduler. The Scheduler identifies the places in the cluster where the job can be executed and allocates resources to the application. It does not monitor the job's progress, and it takes no action if the job fails because of an error.
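To make the Scheduler's role concrete, here is a minimal sketch in Python. It is a toy model, not YARN's actual code: the `Node` class, the `allocate` function, and the first-fit rule are all assumptions made for illustration. Real YARN schedulers (FIFO, Capacity, Fair) are pluggable and much more involved.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical view of the free resources on one cluster node.
    name: str
    free_memory_mb: int
    free_vcores: int


def allocate(nodes, memory_mb, vcores):
    """Return the name of the first node that can host the request, or None.

    Like the Scheduler described above, this only hands out resources.
    It does not watch the job afterwards.
    """
    for node in nodes:
        if node.free_memory_mb >= memory_mb and node.free_vcores >= vcores:
            # Reserve the resources so later requests see the reduced capacity.
            node.free_memory_mb -= memory_mb
            node.free_vcores -= vcores
            return node.name
    return None


nodes = [Node("node-1", 2048, 2), Node("node-2", 8192, 4)]
first = allocate(nodes, 4096, 2)   # node-1 is too small, so node-2 is chosen
second = allocate(nodes, 4096, 2)  # node-2 still has exactly enough left
third = allocate(nodes, 4096, 2)   # nothing fits any more, so None
print(first, second, third)
```

Notice that `allocate` returns as soon as it has reserved the resources. Whether the work placed there succeeds is somebody else's problem, which is exactly the split between the Scheduler and the Application Master.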

2. Node Manager: (replaces the TaskTracker)

It is the slave daemon; it monitors and manages the containers on its node. It tracks the status of the node and reports it to the Resource Manager. The containers and the Application Master run on the nodes it manages.

Two things run under it:

• Containers: A container is a bundle of resources such as CPU and memory on a single node. Jobs/applications run inside containers, and all of their processes are executed there. For MapReduce, each task runs in a JVM (Java Virtual Machine) inside a container.

• Application Master: It is responsible for the execution of a single job/application. The Resource Manager launches it in a container on one of the Node Managers, and each application has exactly one Application Master. The Application Master requests containers from the Resource Manager. Once they are granted, it asks the Node Manager to start the job/application tasks in them, and it keeps reporting status to the Resource Manager at regular intervals.
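The relationship between one Application Master and its containers can be sketched in a few lines of Python. This is an illustrative model only: `Container`, `ApplicationMaster`, and `progress` are hypothetical names, not part of the YARN API, and the "progress" here is simply the fraction of finished containers.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    # A container is modelled as a bundle of resources on one node.
    container_id: int
    memory_mb: int
    vcores: int
    done: bool = False


@dataclass
class ApplicationMaster:
    # Exactly one Application Master exists per application.
    app_id: str
    containers: list = field(default_factory=list)

    def add_container(self, container):
        """Record a container that was granted to this application."""
        self.containers.append(container)

    def progress(self):
        """Fraction of containers whose task has finished (0.0 if none)."""
        if not self.containers:
            return 0.0
        finished = sum(1 for c in self.containers if c.done)
        return finished / len(self.containers)


am = ApplicationMaster("app-1")
am.add_container(Container(1, 1024, 1))
am.add_container(Container(2, 1024, 1))
am.containers[0].done = True   # pretend the first task has completed
report = am.progress()         # the kind of status an AM would report periodically
print(report)
```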

Steps involved in executing jobs with YARN:

• The client submits a job/application to the Resource Manager.
• The Application Manager in the Resource Manager obtains a container from the Scheduler, based on available capacity, and launches the Application Master in it on a Node Manager.
• The Application Master requests the containers it needs from the Resource Manager and communicates with the Node Managers to launch them.
• Within those containers, the job/application runs.
• The Application Master sends status updates to the Resource Manager to report on the job/application execution.
• Once the job/application has completed, its Application Master deregisters from the Resource Manager.
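The steps above can be walked through with a small simulation. Again, this is only a sketch under simplifying assumptions: the `ResourceManager` class, `run_job`, and the log messages are made up for illustration, everything runs in one Python process, and only three application states are modelled (`ACCEPTED`, `RUNNING`, `FINISHED`).

```python
class ResourceManager:
    """Toy stand-in for the Resource Manager; it only tracks application state."""

    def __init__(self):
        self.applications = {}  # app_id -> state
        self.log = []           # human-readable trace of what happened

    def submit(self, app_id):
        self.applications[app_id] = "ACCEPTED"
        self.log.append(f"{app_id}: submitted by client and accepted")

    def launch_application_master(self, app_id, node):
        self.applications[app_id] = "RUNNING"
        self.log.append(f"{app_id}: Application Master launched on {node}")

    def unregister(self, app_id):
        self.applications[app_id] = "FINISHED"
        self.log.append(f"{app_id}: Application Master deregistered")


def run_job(rm, app_id, node, tasks):
    """Run each task in its own (simulated) container and return the results."""
    rm.submit(app_id)
    rm.launch_application_master(app_id, node)
    results = []
    for index, task in enumerate(tasks, start=1):
        rm.log.append(f"{app_id}: container {index} launched")
        results.append(task())  # the work that would run inside the container
    rm.unregister(app_id)
    return results


rm = ResourceManager()
results = run_job(rm, "app-1", "node-1", [lambda: 1 + 1, lambda: 2 * 3])
for line in rm.log:
    print(line)
print(results, rm.applications["app-1"])
```

Printing `rm.log` gives one line per step, in the same order as the list above, which makes it easy to see who talks to whom and when.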
