What is “map” and what is “reducer” in Hadoop?

Map: In Hadoop, the map is the first phase of a MapReduce job. A mapper reads records from an input location (typically HDFS) and emits key-value pairs according to the configured input format.

Reducer: In Hadoop, a reducer collects the output generated by the mappers, processes the values grouped under each key, and produces the final output of the job.
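These two roles can be sketched in plain Python. This is a minimal illustration of the model, not Hadoop's Java API: the mapper emits (word, 1) pairs for a word count, and the reducer sums the values collected for one key.

```python
def mapper(line):
    """Emit a (key, value) pair for every word in an input line."""
    for word in line.split():
        yield word.lower(), 1

def reducer(key, values):
    """Aggregate all values collected for a single key."""
    return key, sum(values)

pairs = list(mapper("the quick brown fox"))
print(pairs)                   # [('the', 1), ('quick', 1), ('brown', 1), ('fox', 1)]
print(reducer("the", [1, 1]))  # ('the', 2)
```

In a real Hadoop job these would be the `map()` and `reduce()` methods of `Mapper` and `Reducer` subclasses written in Java; the framework handles reading input splits and delivering grouped values between them.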

In Hadoop, “map” and “reduce” refer to the two main phases of processing in the MapReduce programming model, which is a core component of Hadoop.

  1. Map Phase:
    • In the map phase, data is divided into smaller chunks and processed in parallel.
    • Each chunk of data is processed by a “mapper” function, which applies a specified operation or transformation to the data.
    • The output of the mapper is a set of intermediate key-value pairs; the key determines how values are grouped (and which reducer receives them), and the value carries the result of the map operation.
  2. Reduce Phase:
    • In the reduce phase, the output of the map phase is aggregated and processed further.
    • The intermediate data is shuffled and sorted so that all values sharing the same key are grouped together and routed to the same reducer.
    • Each group of key-value pairs is processed by a “reducer” function, which performs another specified operation or aggregation on the data.
    • The final output of the reduce phase is typically the result of the overall computation.
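The two phases above, including the shuffle step between them, can be simulated end to end in a few lines of plain Python (again a sketch of the model, not Hadoop itself):

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit (word, 1) for every word in a line."""
    for word in line.split():
        yield word.lower(), 1

def shuffle(mapped_pairs):
    """Shuffle step: group all values by key and sort by key,
    as the framework does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reducer(key, values):
    """Reduce phase: aggregate the grouped values for one key."""
    return key, sum(values)

lines = ["the quick brown fox", "the lazy dog and the fox"]
mapped = [pair for line in lines for pair in mapper(line)]
result = dict(reducer(key, values) for key, values in shuffle(mapped))
print(result["the"])  # 3
print(result["fox"])  # 2
```

In Hadoop each of these steps runs distributed: mappers process input splits in parallel across the cluster, the framework performs the shuffle and sort over the network, and reducers each receive a disjoint subset of the keys.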

In summary, the “map” phase is responsible for processing and transforming the input data into intermediate key-value pairs, and the “reduce” phase takes these intermediate results, groups them by key, and performs further processing to produce the final output. The MapReduce model allows for distributed and parallel processing of large datasets across a Hadoop cluster.