Configure Hadoop On Windows 7



Info: A newer installation guide for Hadoop 3.2.1 is available. I recommend installing that version, as it has a number of new features.

Building Hadoop Core for Windows

2.1. Choose target OS version

The Hadoop developers used Windows Server 2008 and Windows Server 2008 R2 during development and testing. Windows Vista and Windows 7 are also likely to work because of the Win32 API similarities with the respective server SKUs.

Configure

Pseudo-Distributed Operation

Hadoop can also be run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.

Configuration

Use the following:

etc/hadoop/core-site.xml:
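
A minimal sketch of this file, written from a Unix-like shell (an assumption; on Windows 7 that would mean Cygwin or a similar environment). The property `fs.defaultFS` and the value `hdfs://localhost:9000` follow the Apache single-node setup guide for Hadoop 2.x. `CONF_DIR` is a variable introduced only for this sketch, and the command overwrites any existing core-site.xml.

```shell
# Directory holding the Hadoop configuration files, relative to the
# installation root. CONF_DIR is a name introduced for this sketch.
CONF_DIR="${CONF_DIR:-etc/hadoop}"
mkdir -p "$CONF_DIR"

# fs.defaultFS tells clients and daemons where the NameNode listens.
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF
```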

etc/hadoop/hdfs-site.xml:
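
A minimal sketch of this file, under the same assumptions (Unix-like shell, `CONF_DIR` introduced only for illustration, existing file overwritten). A replication factor of 1 is the value the Apache single-node guide uses, since a single node has only one DataNode to hold each block.

```shell
# Directory holding the Hadoop configuration files, relative to the
# installation root. CONF_DIR is a name introduced for this sketch.
CONF_DIR="${CONF_DIR:-etc/hadoop}"
mkdir -p "$CONF_DIR"

# dfs.replication=1: store each block once, because there is one DataNode.
cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
EOF
```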

Set up passphraseless ssh

Now check that you can ssh to the localhost without a passphrase:

If you cannot ssh to localhost without a passphrase, execute the following commands:
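
The check and the fallback described above can be sketched as two small shell functions. This assumes the OpenSSH client tools (`ssh`, `ssh-keygen`) are on the PATH; the function names and the default key path `~/.ssh/id_rsa` are choices made for this sketch, not something the original text prescribes.

```shell
# Succeeds only if key-based login to localhost works.
# BatchMode makes ssh fail instead of prompting for a password.
# A host key that has never been accepted also makes this check fail.
can_ssh_localhost() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true 2>/dev/null
}

# Creates a key with an empty passphrase (unless one already exists) and
# authorizes it for logins to the current account.
# $1 is the private key path; the default is an assumption of this sketch.
authorize_local_key() {
  local key dir
  key="${1:-$HOME/.ssh/id_rsa}"
  dir="$(dirname "$key")"
  mkdir -p "$dir" && chmod 0700 "$dir"
  [ -f "$key" ] || ssh-keygen -q -t rsa -P '' -f "$key"
  cat "$key.pub" >> "$dir/authorized_keys"
  chmod 0600 "$dir/authorized_keys"
}
```

A typical use would be `can_ssh_localhost || authorize_local_key`, followed by a plain `ssh localhost` to accept the host key and confirm that no passphrase is requested.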

Execution

The following instructions run a MapReduce job locally. If you want to execute a job on YARN, see YARN on a Single Node.

  1. Format the filesystem:

  2. Start NameNode daemon and DataNode daemon:

    The Hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).

  3. Browse the web interface for the NameNode; by default it is available at:

    • NameNode - http://localhost:50070/ (Hadoop 2.x default; Hadoop 3.x uses port 9870)
  4. Make the HDFS directories required to execute MapReduce jobs:

  5. Copy the input files into the distributed filesystem:

  6. Run some of the examples provided:

  7. Examine the output files:

    Copy the output files from the distributed filesystem to the local filesystem and examine them:

    or

    View the output files on the distributed filesystem:

  8. When you’re done, stop the daemons with:
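
The eight steps above can be sketched as one shell script, to be run from the Hadoop installation directory. The commands follow the Apache single-node guide for Hadoop 2.x; the `run` helper, the `EXECUTE` switch and the `HDFS_USER` variable are introduced only for this sketch. By default the script just prints each command, and it has no error handling, so a failed step does not stop the later ones. On a native Windows command prompt the distribution should offer `.cmd` counterparts of these scripts (for example `sbin\start-dfs.cmd`), which this sketch does not cover.

```shell
# Print each command; execute it only when EXECUTE=1 is set.
# (EXECUTE is a safety switch of this sketch, not a Hadoop feature.)
run() {
  echo "+ $*"
  if [ "${EXECUTE:-0}" = "1" ]; then "$@"; fi
}

hdfs_example() {
  local user="${HDFS_USER:-$(id -un)}"   # stands in for <username>

  run bin/hdfs namenode -format            # 1. format the filesystem
  run sbin/start-dfs.sh                    # 2. start NameNode and DataNode
  # 3. the NameNode web interface can now be opened in a browser
  run bin/hdfs dfs -mkdir -p "/user/$user" # 4. create the HDFS home directory
  run bin/hdfs dfs -put etc/hadoop input   # 5. copy the input files
  # 6. run the bundled grep example; the glob stands in for the
  #    version-specific jar name
  run bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
      grep input output 'dfs[a-z.]+'
  # 7. view the output on HDFS
  #    (alternative: bin/hdfs dfs -get output output && cat output/*)
  run bin/hdfs dfs -cat 'output/*'
  run sbin/stop-dfs.sh                     # 8. stop the daemons
}

hdfs_example
```

Running the script with `EXECUTE=1` set in the environment executes the commands instead of only listing them.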

YARN on a Single Node

You can run a MapReduce job on YARN in pseudo-distributed mode by setting a few parameters and additionally running the ResourceManager and NodeManager daemons.

The following instructions assume that steps 1 to 4 of the instructions above have already been executed.

  1. Configure parameters as follows:

    etc/hadoop/mapred-site.xml:

    etc/hadoop/yarn-site.xml:

  2. Start ResourceManager daemon and NodeManager daemon:

  3. Browse the web interface for the ResourceManager; by default it is available at:

    • ResourceManager - http://localhost:8088/
  4. Run a MapReduce job.

  5. When you’re done, stop the daemons with:
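
The configuration in step 1 can be sketched as follows, again from a Unix-like shell in the Hadoop installation directory. The property names and values (`mapreduce.framework.name` set to `yarn`, `yarn.nodemanager.aux-services` set to `mapreduce_shuffle`) follow the Apache single-node guide for Hadoop 2.x; `CONF_DIR` is a variable introduced only for this sketch, and existing files are overwritten. The start and stop commands for steps 2 and 5 appear as comments so that the block changes nothing beyond the two files.

```shell
# Directory holding the Hadoop configuration files, relative to the
# installation root. CONF_DIR is a name introduced for this sketch.
CONF_DIR="${CONF_DIR:-etc/hadoop}"
mkdir -p "$CONF_DIR"

# Submit MapReduce jobs to YARN instead of running them locally.
cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
EOF

# Enable the shuffle service that MapReduce needs in the NodeManager.
cat > "$CONF_DIR/yarn-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
EOF

# Step 2, start ResourceManager and NodeManager:  sbin/start-yarn.sh
# Step 5, stop them when the job has finished:    sbin/stop-yarn.sh
```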