Installing HDFS on Windows 10

Apache Hadoop is an open-source framework for distributed storage and processing of large data sets. The Hadoop Distributed File System (HDFS) is Hadoop's storage layer: a distributed file system designed to store large data sets reliably across multiple machines.

In this tutorial, we will go through the step-by-step process of installing HDFS on Windows 10.

Prerequisites

Before starting the installation process, make sure that you have the following prerequisites on your Windows 10 machine:

  • Java JDK 8 or later, with the JAVA_HOME environment variable pointing to the JDK installation directory (Hadoop runs on the JVM and will not start without it).
  • A tool that can extract .tar.gz archives, such as 7-Zip or the tar command included in recent Windows 10 builds.
  • Hadoop's Windows native binaries (winutils.exe and hadoop.dll), which are not included in the stock Apache release; they are typically obtained separately and placed in the Hadoop bin directory.
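To confirm the Java prerequisite, you can run the following in a Command Prompt (the version reported will depend on your installation):

    java -version
    echo %JAVA_HOME%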

Step 1: Download Hadoop

First, download the binary release of the latest stable Hadoop version from the official website (http://hadoop.apache.org/) and extract it to a directory such as C:\hadoop (the path used in the rest of this tutorial).
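As an example, if the downloaded archive is hadoop-3.3.6.tar.gz (the file name depends on the version you chose) and it sits in your Downloads folder, you could extract it from an elevated Command Prompt with the tar command included in recent Windows 10 builds:

    cd %USERPROFILE%\Downloads
    :: Extract to C:\hadoop-3.3.6, then shorten the path to the one used in this tutorial
    tar -xzf hadoop-3.3.6.tar.gz -C C:\
    ren C:\hadoop-3.3.6 hadoop

If tar reports errors (for example about symbolic links inside the archive), extracting with 7-Zip is a common alternative.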

Step 2: Set up Environment Variables

After downloading Hadoop, you need to set up Windows environment variables so that the Hadoop commands can be run from any Command Prompt. You can do this through the System Properties dialog as described below, or from the command line as sketched after the list.

  1. Open the Control Panel and search for System.

  2. Click on Edit the system environment variables.

  3. In the System Properties window, click on the Environment Variables button.

  4. Under System Variables, click on the New button and enter the following values:

    • Variable Name: HADOOP_HOME
    • Variable Value: the path where Hadoop is installed on your machine (e.g. C:\hadoop)
  5. Edit the Path variable and add the following values:

    • %HADOOP_HOME%\bin
    • %HADOOP_HOME%\sbin
  6. Click OK to save the changes.
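Alternatively, the same variables can be set from an elevated Command Prompt. This is a minimal sketch assuming Hadoop was extracted to C:\hadoop; set only affects the current session, while setx persists the value for new sessions:

    :: Persist HADOOP_HOME for new Command Prompt sessions
    setx HADOOP_HOME "C:\hadoop"

    :: Make the variables visible in the current session as well
    set HADOOP_HOME=C:\hadoop
    set PATH=%PATH%;%HADOOP_HOME%\bin;%HADOOP_HOME%\sbin

    :: Verify that the hadoop command resolves (requires JAVA_HOME to be set)
    hadoop version

setx is deliberately not used for Path here because it can truncate long Path values; adding the bin and sbin entries through the dialog described above is safer.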

Step 3: Configure core-site.xml

  1. Navigate to the Hadoop configuration directory (e.g. C:\hadoop\etc\hadoop).

  2. Locate the core-site.xml file (Hadoop releases ship it with an empty <configuration> section; create the file if it is missing).

  3. Open the core-site.xml file in a text editor and add the following properties inside the <configuration> tag:

    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>


    This sets the default file system to HDFS on the local machine, listening on port 9000. (fs.defaultFS is the current name of the older, deprecated fs.default.name property.)

  4. Save and close the core-site.xml file.

Step 4: Configure hdfs-site.xml

  1. In the same directory, locate the hdfs-site.xml file (like core-site.xml, it ships with an empty <configuration> section).

  2. Open the hdfs-site.xml file in a text editor and add the following properties inside the <configuration> tag:

    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///C:/hadoop/hdfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///C:/hadoop/hdfs/datanode</value>
    </property>


    This sets the replication factor to 1 (appropriate for a single-node setup) and configures the local directories where the NameNode and DataNode store their data. The values are Windows paths written as file: URIs; adjust them if you extracted Hadoop somewhere other than C:\hadoop. A sketch for creating these directories follows this list.

  3. Save and close the hdfs-site.xml file.
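HDFS will usually create these directories itself (the NameNode directory when it is formatted in Step 5, the DataNode directory on first start), but creating them up front makes permission problems easier to spot. A minimal sketch, assuming the paths used in the configuration above:

    mkdir C:\hadoop\hdfs\namenode
    mkdir C:\hadoop\hdfs\datanode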

Step 5: Format NameNode

  1. Open Command Prompt as an administrator.

  2. Navigate to the Hadoop bin directory (e.g. C:\hadoop\bin).

  3. Run the following command to format the NameNode:

    hdfs namenode -format
    

    This initializes the file system metadata and creates the NameNode directory. Formatting only needs to be done once; re-running it wipes any existing HDFS metadata.

Step 6: Start HDFS

  1. In the same Command Prompt window, run the following command to start the NameNode and DataNode services:

    start-dfs.cmd
    

    This will start a single-node HDFS cluster; the NameNode and DataNode typically open in their own command windows.

  2. To check the status of the HDFS cluster, you can run the following command:

    jps
    

    This will list the running Java processes on your machine; you should see NameNode and DataNode among them (jps is part of the JDK).
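As a quick smoke test, you can create a directory in HDFS and list the root of the file system; these commands assume the services started without errors:

    hdfs dfs -mkdir -p /user/%USERNAME%
    hdfs dfs -ls /

You can also browse the file system through the NameNode web UI, which listens on http://localhost:9870 in Hadoop 3.x (port 50070 in the older 2.x line).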

Conclusion

Congratulations! You have successfully installed HDFS on Windows 10. You are now ready to start storing and processing large data sets with Hadoop.
