Storage Node: A storage node is the machine in the cluster where the file system resides and where the data to be processed is stored.
Compute Node: A compute node is the machine in the cluster where your actual business logic (the processing) is executed.
In Hadoop, storage and compute nodes play distinct roles in a distributed computing environment.
- Storage Node:
- A storage node in Hadoop is responsible for storing and managing data. A cluster typically contains many such machines, which together provide the storage capacity for large amounts of data.
- In the Hadoop Distributed File System (HDFS), which is the primary storage system in Hadoop, storage nodes store data in a distributed and fault-tolerant manner. Each file is divided into fixed-size blocks (128 MB by default), and each block is replicated across multiple storage nodes (a replication factor of 3 by default) to ensure data durability and availability.
- Storage nodes are primarily concerned with data storage and retrieval and are not heavily involved in the computation tasks.
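To make the block-and-replica idea above concrete, here is a minimal single-machine sketch, not HDFS itself: it splits a file into fixed-size blocks and assigns each block to several distinct nodes. The node names and the round-robin placement are illustrative assumptions; only the 128 MB block size and replication factor of 3 are real HDFS defaults.

```python
# Sketch of HDFS-style block splitting and replica placement (illustrative only).
BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size: 128 MB
REPLICATION = 3                  # HDFS default replication factor

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of `file_size` bytes occupies."""
    full, remainder = divmod(file_size, block_size)
    return [block_size] * full + ([remainder] if remainder else [])

def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes (round-robin sketch)."""
    return {
        b: [nodes[(b + i) % len(nodes)] for i in range(replication)]
        for b in range(num_blocks)
    }

blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
print(len(blocks))  # 3 blocks: 128 MB + 128 MB + 44 MB
print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

With three copies of every block spread over different machines, the file survives the loss of any single storage node, which is the durability property described above.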
- Compute Node:
- A compute node, on the other hand, is responsible for executing computation tasks or processing data. These nodes are where the actual data processing and analysis take place.
- In a Hadoop cluster, compute nodes run various tasks, such as MapReduce jobs or tasks from other distributed computing frameworks, to analyze and process the data stored on the storage nodes.
- Compute nodes are equipped with processing power and memory to perform computations on the distributed data.
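The kind of work a compute node performs can be sketched with the classic MapReduce word count, run here on a single machine rather than a real cluster: the map phase emits key/value pairs, a shuffle groups them by key, and the reduce phase aggregates each group. The function names are illustrative, not Hadoop APIs.

```python
# Single-machine sketch of the MapReduce model a compute node executes.
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in the input line."""
    for word in line.split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: aggregate the grouped values for each key."""
    return {key: sum(values) for key, values in grouped.items()}

lines = ["big data", "big compute"]
pairs = [kv for line in lines for kv in map_phase(line)]
print(reduce_phase(shuffle(pairs)))  # {'big': 2, 'data': 1, 'compute': 1}
```

In a real cluster, map tasks run on many compute nodes in parallel, ideally on the same machines that store the input blocks, and the shuffle moves intermediate pairs across the network to the reducers.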
In summary, storage nodes focus on storing and managing data in a distributed and fault-tolerant manner, while compute nodes handle the processing and analysis of that data. This division of roles enables scalability, fault tolerance, and efficient data processing in large-scale distributed computing environments like Hadoop.