What is NameNode in Hadoop?

February 19, 2024 | September 16, 2020 by priya

The NameNode is the node where Hadoop stores all file location information for HDFS (Hadoop Distributed File System). It is the centerpiece of an HDFS file system: it keeps the record of all the files in the file system and tracks the file data across the cluster or …
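As a rough illustration of the bookkeeping described above, here is a minimal Python sketch of a metadata table that maps files to blocks and blocks to the nodes holding them. The class `ToyNameNode`, its methods, and the block and host names are all hypothetical; this is a toy model under those assumptions, not the real NameNode or any Hadoop API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Toy in-memory metadata table (hypothetical, not the real NameNode)."""

    # file path -> ordered list of block IDs
    file_blocks: dict = field(default_factory=dict)
    # block ID -> set of node names holding a replica of that block
    block_locations: dict = field(default_factory=dict)

    def add_file(self, path, blocks):
        """Record a file; `blocks` maps each block ID to its replica hosts."""
        self.file_blocks[path] = list(blocks)
        for block_id, hosts in blocks.items():
            self.block_locations[block_id] = set(hosts)

    def locate(self, path):
        """Return (block ID, sorted hosts) pairs in file order."""
        return [
            (block_id, sorted(self.block_locations[block_id]))
            for block_id in self.file_blocks[path]
        ]


namenode = ToyNameNode()
namenode.add_file(
    "/logs/app.log",
    {"blk_1": ["dn1", "dn2", "dn3"], "blk_2": ["dn2", "dn3", "dn4"]},
)
located = namenode.locate("/logs/app.log")
```

Note that the table holds only metadata: the file contents themselves live on the other nodes, and a client would use a lookup like `locate` to find out where to read them.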

What is shuffling in MapReduce?

February 19, 2024 | September 16, 2020 by priya

Shuffling is the process that sorts the map outputs and transfers them to the reducers as input. In Hadoop MapReduce, it redistributes and exchanges data between the map tasks and the reduce tasks. It occurs after the map phase and before the reduce phase in a …
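The idea can be sketched in plain Python: each key is assigned to one reducer, values for the same key are gathered from every map task, and each reducer's input is sorted by key. The functions `partition` and `shuffle` and the sample data are hypothetical, and `zlib.crc32` is only a stand-in for a hash-based partitioner; this is a single-process simulation, not Hadoop's implementation.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Pick a reducer for a key with a deterministic hash (stand-in partitioner)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Group values by key per reducer and sort each reducer's input by key."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for task_output in map_outputs:  # one list of (key, value) pairs per map task
        for key, value in task_output:
            partitions[partition(key, num_reducers)][key].append(value)
    return [sorted(p.items()) for p in partitions]


# Output of two hypothetical map tasks.
map_outputs = [
    [("apple", 1), ("pear", 1), ("apple", 1)],
    [("pear", 1), ("fig", 1)],
]
reducer_inputs = shuffle(map_outputs, num_reducers=2)
```

Because the partition depends only on the key, all values for a given key end up at the same reducer regardless of which map task produced them.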

What is “map” and what is “reducer” in Hadoop?

February 19, 2024 | September 16, 2020 by priya

Map: In Hadoop, map is the first phase of a MapReduce job. A mapper reads data from an input location and outputs key-value pairs according to the input type. Reducer: In Hadoop, a reducer collects the output generated by the mappers, processes it, and creates a final output of its own. In Hadoop, “map” …
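A word count is the usual way to illustrate the two roles. The sketch below is plain Python, not the Hadoop API: `mapper`, `reducer`, and the input lines are hypothetical, and sorting plus `itertools.groupby` stands in for the grouping that the framework would perform between the two phases.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(key, values):
    """Combine all values seen for one key into a single output pair."""
    return key, sum(values)


lines = ["Hadoop stores data", "Hadoop processes data"]

# Map phase: every line produces key-value pairs; sort so equal keys are adjacent.
pairs = sorted(pair for line in lines for pair in mapper(line))

# Reduce phase: call the reducer once per distinct key.
counts = dict(
    reducer(key, (value for _, value in group))
    for key, group in groupby(pairs, key=itemgetter(0))
)
print(counts)
```

The mapper looks at one record at a time and knows nothing about other records, while the reducer sees every value for a key at once; that split is what lets the map work run in parallel.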

What is a Map/Reduce job in Hadoop?

February 19, 2024 | September 16, 2020 by priya

A Map/Reduce job follows a programming paradigm that allows massive scalability across thousands of servers. MapReduce refers to two separate and distinct tasks that Hadoop performs. In the first step, the map job takes a set of data and converts it into another set of data, and in the second step, Reduce …

Define TaskTracker

February 19, 2024 | September 16, 2020 by priya

A TaskTracker is a node in the cluster that accepts tasks such as Map, Reduce, and Shuffle operations from a JobTracker. In Hadoop, the term “TaskTracker” refers to a component of the MapReduce processing engine, not of the Hadoop Distributed File System (HDFS). However, it’s important to note that Hadoop …