How to Install HDFS on Linux Mint

In this tutorial, you will learn how to install HDFS (Hadoop Distributed File System) on Linux Mint. HDFS is a distributed file system designed to store large data sets reliably and efficiently in a cluster. It is part of the Apache Hadoop project and is used by many big data applications to process and analyze large datasets.

Prerequisites

Before we begin, ensure that you have a Linux Mint machine with sudo privileges, a working internet connection, and an SSH server installed (the Hadoop start scripts log in to localhost over SSH).

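Because start-dfs.sh connects to localhost over SSH, passwordless SSH login to your own machine must work before Step 6. A minimal setup, assuming the openssh-server package is installed:

```shell
# Generate a key pair (skip if ~/.ssh/id_rsa already exists) and
# authorize it for logins to this machine.
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Verify: this should log in and exit without prompting for a password.
ssh localhost exit
```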
Step 1: Install Java

Hadoop requires Java to be installed on the system. You can check whether Java is already installed on your system by running the following command:

$ java -version

If Java is not installed, run the following command to install Java on your system:

$ sudo apt-get update
$ sudo apt-get install default-jdk

Verify that Java is installed by running the command:

$ java -version

Step 2: Download Hadoop

Download a Hadoop release archive; this example fetches Hadoop 3.3.1 from an Apache mirror (current releases are listed on the official Apache Hadoop downloads page):

$ wget https://ftp.jaist.ac.jp/pub/apache/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz

Extract the downloaded archive file by running the following command:

$ tar -xzvf hadoop-3.3.1.tar.gz

Move the extracted directory to the /usr/local directory:

$ sudo mv hadoop-3.3.1 /usr/local/hadoop

Step 3: Configure Environment Variables

Hadoop requires some environment variables to be set up in order to run properly. These variables include HADOOP_HOME, HADOOP_INSTALL, HADOOP_MAPRED_HOME, and HADOOP_COMMON_HOME.

To configure these variables, open the ~/.bashrc file and add the following lines at the end of the file:

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Save and close the file, then run the following command to apply the changes:

$ source ~/.bashrc

Step 4: Configure Hadoop

Hadoop must be configured before it can be used. The configuration files live in the /usr/local/hadoop/etc/hadoop directory; the two that matter for a single-node HDFS setup are core-site.xml and hdfs-site.xml.
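Before editing the XML files, note that Hadoop reads JAVA_HOME from etc/hadoop/hadoop-env.sh, and the HDFS daemons will refuse to start if it is unset there. The path below is the symlink the default-jdk package creates on Debian-based systems such as Linux Mint; verify yours with readlink -f "$(which java)" and adjust if needed:

```shell
# Append JAVA_HOME to Hadoop's environment script.
# /usr/lib/jvm/default-java is where Mint's default-jdk package links the JDK;
# change this path if your JDK lives elsewhere.
echo 'export JAVA_HOME=/usr/lib/jvm/default-java' | sudo tee -a /usr/local/hadoop/etc/hadoop/hadoop-env.sh
```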

Open the core-site.xml file in your favorite text editor and add the following configuration, which tells clients where the NameNode listens (fs.defaultFS is the current name for the deprecated fs.default.name property):

<configuration>
   <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9000</value>
   </property>
</configuration>

Save and close the file.

Open the hdfs-site.xml file in your favorite text editor and add the following configuration, which sets the replication factor to 1 (appropriate for a single-node setup) and the on-disk storage locations for the NameNode and DataNode:

<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
   <property>
      <name>dfs.namenode.name.dir</name>
      <value>/usr/local/hadoop/hdfs/namenode</value>
   </property>
   <property>
      <name>dfs.datanode.data.dir</name>
      <value>/usr/local/hadoop/hdfs/datanode</value>
   </property>
</configuration>
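The two storage paths above must exist and be writable by the user running Hadoop before the NameNode is formatted. A sketch, assuming you run Hadoop as your current user:

```shell
# Create the NameNode and DataNode storage directories and hand
# ownership to the current user so the daemons can write to them.
sudo mkdir -p /usr/local/hadoop/hdfs/namenode /usr/local/hadoop/hdfs/datanode
sudo chown -R "$USER":"$USER" /usr/local/hadoop/hdfs
```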

Save and close the file.

Step 5: Format HDFS NameNode

To start using HDFS, format the HDFS NameNode with the following command (do this only once; reformatting erases all existing HDFS metadata):

$ hdfs namenode -format

Step 6: Start Hadoop Services

Start the HDFS and YARN services by running the following commands:

$ start-dfs.sh
$ start-yarn.sh
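Once the scripts finish, you can confirm the daemons are running with jps and exercise HDFS with a quick smoke test (the /user path below is just a conventional example):

```shell
# List running JVMs; expect NameNode, DataNode, SecondaryNameNode,
# ResourceManager and NodeManager among them.
jps

# Smoke test: create a home directory, upload a file, and read it back.
hdfs dfs -mkdir -p /user/"$USER"
echo "hello hdfs" > /tmp/sample.txt
hdfs dfs -put /tmp/sample.txt /user/"$USER"/
hdfs dfs -cat /user/"$USER"/sample.txt
```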

To stop the services, run the following commands:

$ stop-dfs.sh
$ stop-yarn.sh

Conclusion

Congratulations! You have successfully installed HDFS on Linux Mint. You can now use HDFS to store and process large datasets in your cluster.
