These are the most common input formats defined in Hadoop:
- TextInputFormat
- KeyValueInputFormat
- SequenceFileInputFormat
TextInputFormat is a by default input format.
In Hadoop, InputFormats define the way in which Hadoop processes input data. Some of the most common InputFormats defined in Hadoop are:
- TextInputFormat: This is the default input format. It treats each line of the input file as a separate record and assigns a key to the entire line and the value to the content of the line.
- KeyValueTextInputFormat: Similar to TextInputFormat, but it interprets the key and value as separate fields separated by a delimiter (default is tab).
- SequenceFileInputFormat: Used for reading files in the Hadoop SequenceFile format, which is a binary file format.
- FileInputFormat: This is an abstract class that provides a generic implementation for handling input files. It is the base class for all other file-based input formats.
- CombineFileInputFormat: This is an extension of FileInputFormat that allows the processing of multiple small files as a single input split to improve efficiency.
- NLineInputFormat: Used for reading files where each input split is a fixed number of lines.
These InputFormats are used to define how Hadoop should read and process input data during MapReduce jobs. The choice of the InputFormat depends on the nature and structure of your input data.