What do you understand by storage and compute nodes?

Storage node: A storage node is the machine on which the file system resides and where the data to be processed is stored.

Compute node: A compute node is the machine on which the actual business logic is executed.

In Hadoop, storage and compute nodes play distinct roles in a distributed computing environment.

  1. Storage Node:
    • A storage node in Hadoop is responsible for storing and managing data. A cluster contains many such nodes, which together provide the storage capacity for large amounts of data.
    • In the Hadoop Distributed File System (HDFS), the primary storage system in Hadoop, storage nodes run the DataNode daemon and store data in a distributed, fault-tolerant manner. Each file is divided into fixed-size blocks (128 MB by default in recent Hadoop versions), and each block is replicated across multiple storage nodes (three replicas by default) to ensure data durability and availability.
    • Storage nodes are primarily concerned with storing and serving data; they are not themselves responsible for computation.
  2. Compute Node:
    • A compute node, on the other hand, is responsible for executing computation tasks or processing data. These nodes are where the actual data processing and analysis take place.
    • In a Hadoop cluster, compute nodes run processing tasks, such as the map and reduce tasks of a MapReduce job or tasks from other distributed computing frameworks, to analyze and process the data stored on the storage nodes.
    • Compute nodes are equipped with processing power and memory to perform computations on the distributed data.
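To make the storage-node side concrete, here is a minimal single-process sketch of how a file might be split into fixed-size blocks and each block assigned to several nodes. The function name `place_blocks` and the round-robin assignment are illustrative assumptions, not HDFS's actual placement policy (the real NameNode also considers rack topology and free space):

```python
import itertools

def place_blocks(file_size, block_size, nodes, replication=3):
    """Toy model of HDFS-style block placement: split a file of
    `file_size` bytes into fixed-size blocks and assign each block
    to `replication` distinct storage nodes, round-robin."""
    num_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    ring = itertools.cycle(nodes)
    for block_id in range(num_blocks):
        replicas = []
        while len(replicas) < replication:
            node = next(ring)
            if node not in replicas:  # replicas of one block go to distinct nodes
                replicas.append(node)
        placement[block_id] = replicas
    return placement

# A 300 MB file with 128 MB blocks occupies 3 blocks (the last one partial),
# and each block lives on 3 of the 4 storage nodes.
plan = place_blocks(300 * 2**20, 128 * 2**20, ["node1", "node2", "node3", "node4"])
```

Because each block has three replicas on distinct nodes, the loss of any single storage node leaves every block still readable, which is the durability property described above.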

In summary, storage nodes focus on storing and managing data in a distributed, fault-tolerant manner, while compute nodes handle the processing and analysis of that data. This division of responsibilities allows for scalability, fault tolerance, and efficient data processing in large-scale distributed environments like Hadoop. In practice, Hadoop typically co-locates the two roles on the same physical machines, so the scheduler can run tasks close to the data they read (data locality) rather than shipping data across the network.