What is the relation between job and task in Hadoop?

In Hadoop, a job is divided into multiple smaller units of work known as tasks.

In Hadoop, the terms “job” and “task” refer to different components of the overall data processing framework.

  1. Job:
    • A job in Hadoop typically represents a complete computation that needs to be performed on a dataset.
    • It is the unit of work that a user wants to be performed. This work may involve processing and analyzing data stored in the Hadoop Distributed File System (HDFS) using a series of tasks.
    • A job is submitted to the Hadoop cluster for execution.
  2. Task:
    • A task, on the other hand, is a unit of work that is part of a larger job.
    • Jobs in Hadoop are divided into smaller tasks, which are then distributed across the nodes in the Hadoop cluster for parallel processing.
    • There are two main types of tasks in Hadoop: Map tasks and Reduce tasks. Map tasks process input splits and emit intermediate key/value pairs, while Reduce tasks aggregate those pairs and produce the final output (see the sketch after this list).
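
As a minimal sketch of how this looks in practice, the Java driver below uses the standard MapReduce API: the Job object represents the whole computation, while the Mapper and Reducer classes define the logic each map task and reduce task runs. The class names, the choice of two reduce tasks, and the input/output paths passed as arguments are illustrative assumptions, not fixed requirements.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountJob {

  // Each map task runs this Mapper over one input split of the HDFS data.
  public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws java.io.IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (!token.isEmpty()) {
          word.set(token);
          context.write(word, ONE);   // emit intermediate key/value pair
        }
      }
    }
  }

  // Each reduce task runs this Reducer over one partition of the map output.
  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws java.io.IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum)); // final aggregated output
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count"); // the job: the complete computation
    job.setJarByClass(WordCountJob.class);
    job.setMapperClass(TokenMapper.class);         // logic executed by each map task
    job.setReducerClass(SumReducer.class);         // logic executed by each reduce task
    job.setNumReduceTasks(2);                      // example value: two reduce tasks
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // assumed HDFS input path
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // assumed HDFS output path
    System.exit(job.waitForCompletion(true) ? 0 : 1);       // submit the job to the cluster
  }
}
```

When this job is submitted, the framework creates one map task per input split and the configured number of reduce tasks, and schedules them across the cluster nodes, which is exactly the job-to-task breakdown described above.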

In summary, a job is the higher-level computation or analysis that needs to be done, while tasks are the individual units of work that contribute to the completion of the job. Tasks are distributed across the cluster nodes to take advantage of parallel processing capabilities, which is a key feature of the Hadoop framework.