Tutorial: How to Install HDFS on the Latest MX Linux

Introduction

HDFS (Hadoop Distributed File System) is a distributed file system for storing and managing large amounts of data across multiple servers. In this tutorial, we will install Hadoop 3.3.0 on the latest MX Linux release and run HDFS on a single machine.

Prerequisites

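Before we begin, ensure that you have the following: an MX Linux system, a user account with sudo privileges, an internet connection, and an SSH server running locally with passwordless (key-based) login to localhost, which the start-dfs.sh and stop-dfs.sh scripts use to launch and stop the daemons.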

Step-by-Step Guide

1. Download Hadoop

The first step is to download Hadoop. The Apache Hadoop website lists all releases; this tutorial uses version 3.3.0, which remains available from the Apache archive:

wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
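To fetch a different release, the download URL can be built from the version number. This is a minimal sketch that assumes the archive keeps the hadoop-<version>/hadoop-<version>.tar.gz layout used for 3.3.0; the function name hadoop_url is hypothetical.

```shell
# Hypothetical helper: print the archive.apache.org URL of a Hadoop release tarball.
# Assumes the hadoop-<version>/hadoop-<version>.tar.gz directory layout.
hadoop_url() {
  local version=$1
  printf 'https://archive.apache.org/dist/hadoop/common/hadoop-%s/hadoop-%s.tar.gz\n' \
    "$version" "$version"
}

# Usage: download a tarball and its published checksum file.
# wget "$(hadoop_url 3.3.0)"
# wget "$(hadoop_url 3.3.0).sha512"
```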

2. Verify Download

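Verify that the download is intact by comparing its SHA-512 checksum with the one Apache publishes alongside the release:

wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz.sha512
sha512sum hadoop-3.3.0.tar.gz
cat hadoop-3.3.0.tar.gz.sha512

The hash printed by sha512sum must match the one in the .sha512 file. If it does not, download the archive again.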
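Comparing two 128-character hashes by eye is error-prone. A minimal sketch of an automated comparison follows; the function name verify_sha512 is hypothetical, and it assumes the checksum file contains the digest as one run of 128 hex digits (optionally split by whitespace), which covers the common "SHA512 (file) = hash" and "hash  file" layouts.

```shell
# Hypothetical helper: compare a tarball's SHA-512 digest with a published checksum file.
# Whitespace is stripped first so a digest printed in space-separated groups is still
# found; the first run of 128 hex digits is taken as the expected value.
verify_sha512() {
  local tarball=$1 sumfile=$2 expected actual
  expected=$(tr -d '[:space:]' < "$sumfile" | grep -oiE '[0-9a-f]{128}' | head -n 1 | tr 'A-F' 'a-f')
  # Reading from stdin avoids sha512sum escaping unusual file names in its output.
  actual=$(sha512sum < "$tarball" | cut -d ' ' -f 1)
  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "OK: $tarball matches $sumfile"
  else
    echo "FAILED: checksum mismatch for $tarball" >&2
    return 1
  fi
}

# Usage:
# verify_sha512 hadoop-3.3.0.tar.gz hadoop-3.3.0.tar.gz.sha512
```

A non-zero return status means the archive should not be extracted.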

3. Extract Hadoop

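After the download completes, extract the archive and move it to /usr/local/hadoop, the location HADOOP_HOME points to in the next step. Then make your user the owner of that directory so Hadoop can write its logs there:

tar -zxvf hadoop-3.3.0.tar.gz
sudo mv hadoop-3.3.0 /usr/local/hadoop
sudo chown -R "$USER":"$USER" /usr/local/hadoop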

4. Configure Environment Variables

Set up the Hadoop environment variables. Open ~/.bashrc in a text editor and add the following lines at the end of the file:

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Then reload the file so the variables take effect in the current shell:

source ~/.bashrc
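If you script this step, appending the lines blindly would duplicate them on every run. A minimal sketch of an idempotent version follows; the function name add_hadoop_env and the marker comment are hypothetical.

```shell
# Hypothetical helper: append the Hadoop variables from this step to a startup file
# (default ~/.bashrc) exactly once. A marker comment makes repeated runs harmless.
add_hadoop_env() {
  local rcfile=${1:-$HOME/.bashrc}
  local marker='# >>> hadoop environment >>>'
  if grep -qF -- "$marker" "$rcfile" 2> /dev/null; then
    echo "Hadoop variables already present in $rcfile"
    return 0
  fi
  # The quoted 'EOF' keeps $PATH and $HADOOP_HOME literal, so they expand when
  # the startup file is sourced, not when this function runs.
  cat >> "$rcfile" << 'EOF'
# >>> hadoop environment >>>
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# <<< hadoop environment <<<
EOF
}

# Usage:
# add_hadoop_env && source ~/.bashrc
```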

5. Install Java

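Hadoop runs on Java, so install OpenJDK:

sudo apt-get update
sudo apt-get install openjdk-11-jdk

Hadoop 3.3 supports Java 8 and Java 11 at runtime. If your MX Linux release no longer offers openjdk-11-jdk, run apt-cache search openjdk to see which versions are available.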

6. Verify Java

Verify that Java is installed by checking the version.

java -version
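The version line differs between the old "1.8.0_x" numbering and the newer "11.0.x" numbering, which makes it awkward to check in a script. A minimal sketch that extracts the major version from `java -version` output follows; the function name java_major_version is hypothetical, and it assumes the usual `version "..."` line printed by OpenJDK.

```shell
# Hypothetical helper: read `java -version` output on stdin and print the major version.
# Handles the legacy "1.8.0_382" scheme (-> 8) and the modern "11.0.20" scheme (-> 11).
java_major_version() {
  local v
  v=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case $v in
    1.*) v=${v#1.}; echo "${v%%.*}" ;;   # drop the leading "1." then cut at the next dot
    *)   echo "${v%%[.+-]*}" ;;          # cut at the first ".", "+" or "-"
  esac
}

# Usage (java -version writes to stderr, hence 2>&1):
# case $(java -version 2>&1 | java_major_version) in
#   8|11) echo "Java version supported by Hadoop 3.3" ;;
#   *)    echo "Java version not listed as supported by Hadoop 3.3" ;;
# esac
```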

7. Configure Hadoop

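Navigate to the Hadoop configuration directory:

cd /usr/local/hadoop/etc/hadoop

The start scripts launch the daemons over SSH, and those sessions may not load the variables from ~/.bashrc, so also set JAVA_HOME in hadoop-env.sh. This command derives the JDK directory from the installed java binary and appends it to the file:

echo "export JAVA_HOME=$(dirname "$(dirname "$(readlink -f "$(command -v java)")")")" >> hadoop-env.sh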

8. Configure HDFS

Open the hdfs-site.xml file to configure the HDFS settings.

nano hdfs-site.xml

Add the following lines inside the configuration tag:

<property>
   <name>dfs.replication</name>
   <value>1</value>
</property>
<property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>

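Save and close the file. Then create the storage directories referenced above and make your user their owner, because regular users cannot write under /usr/local:

sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode /usr/local/hadoop_store/hdfs/datanode
sudo chown -R "$USER":"$USER" /usr/local/hadoop_store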
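Instead of editing the XML by hand, the whole file can be generated, including the XML declaration and the configuration wrapper that the snippet above is placed inside. This is a sketch with a hypothetical function name, write_hdfs_site; note that it overwrites any existing hdfs-site.xml in the target directory.

```shell
# Hypothetical helper: write a complete hdfs-site.xml (XML declaration, <configuration>
# wrapper and the three properties from this step) into a configuration directory.
# Warning: it overwrites any existing hdfs-site.xml in that directory.
write_hdfs_site() {
  local confdir=$1 store=${2:-/usr/local/hadoop_store/hdfs}
  if [ ! -d "$confdir" ]; then
    echo "not a directory: $confdir" >&2
    return 1
  fi
  # Unquoted EOF: $store is expanded into the storage paths below.
  cat > "$confdir/hdfs-site.xml" << EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:$store/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:$store/datanode</value>
  </property>
</configuration>
EOF
}

# Usage:
# write_hdfs_site /usr/local/hadoop/etc/hadoop
```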

9. Configure Core

Open the core-site.xml file to set fs.defaultFS, the default file system URI that clients use to reach the NameNode.

nano core-site.xml

Add the following lines inside the configuration tag:

<property>
   <name>fs.defaultFS</name>
   <value>hdfs://localhost:9000</value>
</property>

Save and close the file.
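To double-check what a configuration file actually contains, a small reader can print a single property's value. This is a rough sketch with a hypothetical function name, get_hadoop_prop; it is plain text processing rather than an XML parser and assumes the one-name-one-value property blocks used in this tutorial. Once Hadoop is on your PATH, hdfs getconf -confKey fs.defaultFS is the more reliable check.

```shell
# Hypothetical helper: print the <value> of one property from a Hadoop *-site.xml file.
# Not an XML parser: it assumes each property is a <property> block whose <name> and
# <value> tags contain no extra whitespace.
get_hadoop_prop() {
  local file=$1 name=$2
  # Join all lines, then break after each </property> so every property is on one line.
  tr -d '\n' < "$file" \
    | sed 's#</property>#&\n#g' \
    | grep -F -- "<name>$name</name>" \
    | sed -n 's#.*<value>\([^<]*\)</value>.*#\1#p' \
    | head -n 1
}

# Usage:
# get_hadoop_prop /usr/local/hadoop/etc/hadoop/core-site.xml fs.defaultFS
```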

10. Configure MapReduce

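Open the mapred-site.xml file to configure the MapReduce settings. Hadoop 3.x ships this file directly, so there is no mapred-site.xml.template to copy. This step and the next are only needed if you plan to run MapReduce jobs on YARN; HDFS itself does not use these settings.

nano mapred-site.xml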

Add the following lines inside the configuration tag:

<property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
</property>

Save and close the file.
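If you reuse these steps with an older Hadoop 2.x release, which shipped only mapred-site.xml.template, a guard can create the file only when it is missing. This is a sketch with a hypothetical function name, ensure_mapred_site.

```shell
# Hypothetical helper: make sure mapred-site.xml exists in a configuration directory.
# Hadoop 3.x ships the file itself; Hadoop 2.x shipped only mapred-site.xml.template.
ensure_mapred_site() {
  local confdir=$1
  if [ -f "$confdir/mapred-site.xml" ]; then
    echo "mapred-site.xml already exists"
  elif [ -f "$confdir/mapred-site.xml.template" ]; then
    cp "$confdir/mapred-site.xml.template" "$confdir/mapred-site.xml"
    echo "created mapred-site.xml from template"
  else
    echo "no mapred-site.xml or template found in $confdir" >&2
    return 1
  fi
}

# Usage:
# ensure_mapred_site /usr/local/hadoop/etc/hadoop && nano /usr/local/hadoop/etc/hadoop/mapred-site.xml
```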

11. Configure YARN

Open the yarn-site.xml file to configure the YARN settings.

nano yarn-site.xml

Add the following lines inside the configuration tag:

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>

Save and close the file.

12. Format HDFS

Before starting HDFS for the first time, format the NameNode. Do this only once: formatting again erases the existing HDFS metadata.

hdfs namenode -format
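Because reformatting destroys the file system metadata, a setup script should skip this command on later runs. A formatted NameNode storage directory contains a current/VERSION file, so a minimal guard can test for it; the function name namenode_is_formatted is hypothetical.

```shell
# Hypothetical guard: succeeds if a NameNode storage directory has already been
# formatted, indicated by the presence of current/VERSION inside it.
namenode_is_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Usage, with the dfs.namenode.name.dir path from hdfs-site.xml:
# nn_dir=/usr/local/hadoop_store/hdfs/namenode
# if namenode_is_formatted "$nn_dir"; then
#   echo "NameNode already formatted; skipping format"
# else
#   hdfs namenode -format
# fi
```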

13. Start HDFS

Start HDFS with the following command:

start-dfs.sh

Verify that the HDFS daemons are running by listing the Java processes:

jps

You should see output similar to the following (the process IDs will differ):

4955 DataNode
5126 NameNode
5205 SecondaryNameNode
5388 Jps
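Checking the jps output by eye is fine interactively, but a script needs an exact test. A minimal sketch follows; the function name check_hdfs_daemons is hypothetical, and it assumes the "PID ClassName" line format shown above.

```shell
# Hypothetical helper: read `jps` output on stdin and report missing HDFS daemons.
# Returns 0 only if NameNode, DataNode and SecondaryNameNode are all listed.
check_hdfs_daemons() {
  local out missing=0 d
  out=$(cat)
  for d in NameNode DataNode SecondaryNameNode; do
    # Compare the whole second field, so "NameNode" does not match "SecondaryNameNode".
    if ! printf '%s\n' "$out" | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }'; then
      echo "missing: $d" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# jps | check_hdfs_daemons && echo "HDFS daemons are running"
```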

14. Stop HDFS

You can stop HDFS with the following command:

stop-dfs.sh

Conclusion

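Congratulations, you have successfully installed HDFS on MX Linux. This single-node setup, with a replication factor of 1, is suited to learning and testing; storing data across multiple servers requires adding DataNodes on other machines and raising dfs.replication accordingly.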
