In this tutorial, you will learn how to install HDFS (Hadoop Distributed File System) on Linux Mint. HDFS is a distributed file system designed to store large data sets reliably and efficiently in a cluster. It is part of the Apache Hadoop project and is used by many big data applications to process and analyze large datasets.
Before we begin, ensure that Java is installed, because Hadoop requires it. You can check whether Java is already installed on your system by running the following command:
$ java -version
If Java is not installed, run the following commands to install it:
$ sudo apt-get update
$ sudo apt-get install default-jdk
Verify that Java is installed by running the command:
$ java -version
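Which Java release default-jdk installs depends on the Linux Mint version, and the Hadoop 3.3 line is documented as supporting Java 8 and Java 11 at runtime, so it can be worth checking the major version and not only whether Java exists. The sketch below is one way to do that. The helper name java_major_version is hypothetical, and the parsing assumes the usual output format of java -version.

```shell
# Hypothetical helper: reduce a Java version string to its major number.
# "1.8.0_292" -> 8 (legacy 1.x scheme), "11.0.11" -> 11.
java_major_version() {
    local v="$1"
    v="${v#1.}"            # drop the legacy "1." prefix if present
    echo "${v%%[._-]*}"    # keep everything before the first '.', '_' or '-'
}

if command -v java >/dev/null 2>&1; then
    # java -version writes to stderr; the version is the first quoted string.
    raw=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
    echo "Detected Java major version: $(java_major_version "$raw")"
else
    echo "java was not found on PATH"
fi
```

If the reported version is newer than 11, installing an older JDK package alongside it is one option to consider.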
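Download the Hadoop distribution from an Apache mirror. The URL below points to the JAIST mirror; if it no longer carries release 3.3.1, the same file should be available from the Apache release archive: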
$ wget https://ftp.jaist.ac.jp/pub/apache/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
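Because the archive comes from a mirror, it is reasonable to compare its SHA-512 digest with the one Apache publishes for the release before extracting it. The following is a minimal sketch: verify_sha512 is a hypothetical helper, and EXPECTED_SHA512 is an assumed variable that you would fill with the digest copied from the release's checksum file on the Apache site (it is deliberately not reproduced here).

```shell
# Hypothetical helper: succeed only if the SHA-512 digest of file $1 equals $2.
verify_sha512() {
    local file="$1" expected="$2" actual
    actual=$(sha512sum "$file" | awk '{print $1}')
    # Normalise case so upper- and lower-case hex digests compare equal.
    expected=$(printf '%s' "$expected" | tr 'A-F' 'a-f')
    [ "$actual" = "$expected" ]
}

# EXPECTED_SHA512 is an assumption: set it to the published digest first.
archive=hadoop-3.3.1.tar.gz
if [ -f "$archive" ] && [ -n "${EXPECTED_SHA512:-}" ]; then
    if verify_sha512 "$archive" "$EXPECTED_SHA512"; then
        echo "checksum OK"
    else
        echo "checksum mismatch, do not extract $archive"
    fi
fi
```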
Extract the downloaded archive file by running the following command:
$ tar -xzvf hadoop-3.3.1.tar.gz
Move the extracted directory to the /usr/local directory:
$ sudo mv hadoop-3.3.1 /usr/local/hadoop
Hadoop requires some environment variables to be set up in order to run properly. These variables include HADOOP_HOME, HADOOP_INSTALL, HADOOP_MAPRED_HOME, and HADOOP_COMMON_HOME.
To configure these variables, open the ~/.bashrc file and add the following lines at the end of the file:
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Save and close the file, then run the following command to apply the changes:
$ source ~/.bashrc
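To confirm that the shell actually picked up these settings, a quick check along the following lines can help. It is only a sketch: path_contains is a hypothetical helper, and /usr/local/hadoop is used as a fallback only because that is where this tutorial placed Hadoop.

```shell
# Hypothetical helper: succeed if directory $1 is an entry of the colon-separated PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

hadoop_home="${HADOOP_HOME:-/usr/local/hadoop}"
echo "HADOOP_HOME=${HADOOP_HOME:-<not set>}"
for dir in "$hadoop_home/bin" "$hadoop_home/sbin"; do
    if path_contains "$dir"; then
        echo "ok:      $dir is on PATH"
    else
        echo "missing: $dir is not on PATH"
    fi
done
```

If both directories are reported as present, running hadoop version should print the installed release.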
Hadoop needs to be configured before being used. There are several configuration files located in the /usr/local/hadoop/etc/hadoop directory that need to be edited.
Open the core-site.xml file using your favorite text editor and add the following configuration:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Save and close the file.
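A typo in this file tends to surface only later, when the services fail to start, so it can be worth reading the value back right away. The sketch below does that with awk. conf_value is a hypothetical helper, and it assumes the simple layout used above, where each name element appears before its value element. The loop tries both fs.defaultFS, the current property name, and fs.default.name, its deprecated alias.

```shell
# Hypothetical helper: print the <value> of property $2 in Hadoop XML file $1.
# Assumes each <name> element appears before its <value>, as in this tutorial.
conf_value() {
    awk -v want="$2" '
        /<name>/  { n = $0; gsub(/.*<name>|<\/name>.*/, "", n) }
        /<value>/ { v = $0; gsub(/.*<value>|<\/value>.*/, "", v)
                    if (n == want) { print v; exit } }
    ' "$1"
}

conf=/usr/local/hadoop/etc/hadoop/core-site.xml
if [ -f "$conf" ]; then
    for key in fs.defaultFS fs.default.name; do
        val=$(conf_value "$conf" "$key")
        if [ -n "$val" ]; then
            echo "$key = $val"
        fi
    done
fi
```

Once Hadoop is on the PATH, hdfs getconf -confKey fs.defaultFS reports the value Hadoop itself resolves, which is the more authoritative check.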
Open the hdfs-site.xml file using your favorite text editor and add the following configuration:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/hdfs/datanode</value>
</property>
</configuration>
Save and close the file.
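The two storage paths configured above do not exist in a fresh installation. Hadoop normally creates them when needed, but creating them up front makes permission problems visible early. A small sketch, where make_hdfs_dirs is a hypothetical helper and the base path mirrors the values configured above:

```shell
# Hypothetical helper: create the NameNode and DataNode directories under base
# path $1 and report whether the current user can write to them.
make_hdfs_dirs() {
    local base="$1" d
    for d in namenode datanode; do
        mkdir -p "$base/$d" || return 1
        [ -w "$base/$d" ] || { echo "not writable: $base/$d"; return 1; }
        echo "ready: $base/$d"
    done
}

# Only act if the installation directory from this tutorial exists and is writable.
if [ -d /usr/local/hadoop ] && [ -w /usr/local/hadoop ]; then
    make_hdfs_dirs /usr/local/hadoop/hdfs
fi
```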
To start using HDFS, you need to format the HDFS NameNode using the following command:
$ hdfs namenode -format
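Formatting is meant to be done once, on a new installation: running it again on a NameNode that already holds data discards the existing file system metadata. In a script, a guard like the one below can prevent an accidental reformat. It is a sketch that relies on a formatted NameNode directory containing a current/VERSION file. namenode_is_formatted is a hypothetical helper, and the path is the dfs.namenode.name.dir value used in this tutorial.

```shell
# Hypothetical helper: succeed if NameNode directory $1 looks already formatted.
namenode_is_formatted() {
    [ -f "$1/current/VERSION" ]
}

name_dir=/usr/local/hadoop/hdfs/namenode
if namenode_is_formatted "$name_dir"; then
    echo "NameNode at $name_dir is already formatted; skipping format"
elif command -v hdfs >/dev/null 2>&1; then
    hdfs namenode -format
else
    echo "hdfs command not found; check HADOOP_HOME and PATH"
fi
```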
Start the Hadoop services by running the following commands:
$ start-dfs.sh
$ start-yarn.sh
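After the scripts return, it is useful to confirm that the daemons actually stayed up. The JDK's jps tool lists running Java processes; the sketch below compares its output with the five daemons a single-node setup like this one would normally run. missing_daemons is a hypothetical helper.

```shell
# Hypothetical helper: given jps output in $1, print each expected daemon
# that does not appear as a process name (second column).
missing_daemons() {
    local jps_output="$1" d
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Exact match on column 2, so "NameNode" does not match "SecondaryNameNode".
        printf '%s\n' "$jps_output" |
            awk -v d="$d" '$2 == d { found = 1 } END { exit !found }' ||
            echo "$d"
    done
}

if command -v jps >/dev/null 2>&1; then
    missing=$(missing_daemons "$(jps)")
    if [ -z "$missing" ]; then
        echo "all expected daemons are running"
    else
        echo "not running:"
        echo "$missing"
    fi
fi
```

If a daemon is missing, the log files under $HADOOP_HOME/logs and the NameNode web interface, which Hadoop 3 serves on port 9870 by default, are the next places to look.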
To stop the services, run the following commands:
$ stop-dfs.sh
$ stop-yarn.sh
Congratulations! You have successfully installed HDFS on Linux Mint. You can now use HDFS to store and process large datasets in your cluster.
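As a first use of the new file system, the following sketch creates a home directory in HDFS, uploads a small file, and reads it back. It assumes the Hadoop services are currently running. hdfs_smoke_test is a hypothetical helper, and /user/<name> is the conventional location for a user's HDFS home directory, not something this tutorial configured.

```shell
# Hypothetical helper: round-trip a small file through HDFS for user $1
# (defaults to the current login name).
hdfs_smoke_test() {
    local user="${1:-$(id -un)}" tmp rc
    tmp=$(mktemp) || return 1
    echo "hello hdfs" > "$tmp"
    hdfs dfs -mkdir -p "/user/$user" &&
        hdfs dfs -put -f "$tmp" "/user/$user/hello.txt" &&
        hdfs dfs -cat "/user/$user/hello.txt"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Run only when the hdfs command is available.
if command -v hdfs >/dev/null 2>&1; then
    hdfs_smoke_test
fi
```

The uploaded file can afterwards be listed with hdfs dfs -ls on the same directory.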