How to Install HDFS on Ubuntu Server

In this tutorial, we will guide you through the steps of installing Hadoop Distributed File System (HDFS) on Ubuntu Server.

Note: This tutorial assumes that Java is already installed on your Ubuntu Server, because HDFS cannot run without it. If you haven't installed Java yet, follow a Java installation tutorial first.

Step 1: Download Hadoop

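Visit the official Apache Hadoop website at http://hadoop.apache.org/ and download the latest stable release of Hadoop. This tutorial uses version 3.3.0 as its example; if you pick a different release, adjust the version number in every command that follows. The downloads.apache.org mirror only carries current releases, so older versions such as 3.3.0 are fetched from the Apache archive:

wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz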
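
Before extracting, it is worth checking that the archive arrived intact. The sketch below compares the file's SHA-512 digest with a value you supply; Apache publishes a .sha512 file next to each release archive. The function name verify_sha512 is a hypothetical helper, not part of Hadoop, and the expected digest is something you must copy from the Apache site yourself.

```shell
# verify_sha512 FILE EXPECTED_HEX
# Succeeds only if the SHA-512 digest of FILE equals EXPECTED_HEX.
# (Hypothetical helper for illustration; not shipped with Hadoop.)
verify_sha512() {
    local file="$1" expected="$2" actual
    # sha512sum prints "<digest>  <filename>"; keep only the digest
    actual="$(sha512sum "$file" | awk '{print $1}')"
    # strip whitespace and lower-case the supplied digest so that
    # grouped or upper-case published values still compare equal
    expected="$(printf '%s' "$expected" | tr -d '[:space:]' | tr 'A-F' 'a-f')"
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Usage (replace the placeholder with the digest published by Apache):
#   verify_sha512 hadoop-3.3.0.tar.gz "<published digest>" && echo "checksum OK"
```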

Step 2: Extract Hadoop

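After downloading the Hadoop package, extract it. The command below unpacks it into a hadoop-3.3.0 directory inside the current working directory, so change to the directory of your choice first.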

tar -xvf hadoop-3.3.0.tar.gz
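
The configuration lines later in this tutorial reference ${HADOOP_HOME}, which nothing so far has defined. One way to define it, sketched here under the assumption that you want the variable set for the current shell session only, is a small helper that exports HADOOP_HOME and appends the bin and sbin directories to PATH. The function name use_hadoop is hypothetical.

```shell
# use_hadoop DIR
# Exports HADOOP_HOME=DIR and adds DIR/bin and DIR/sbin to PATH once.
# (Hypothetical helper for illustration; not shipped with Hadoop.)
use_hadoop() {
    HADOOP_HOME="$1"
    export HADOOP_HOME
    case ":$PATH:" in
        *":$HADOOP_HOME/bin:"*)
            ;;  # already on PATH, nothing to do
        *)
            PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
            export PATH
            ;;
    esac
}

# Usage, run from the directory where the archive was extracted:
#   use_hadoop "$PWD/hadoop-3.3.0"
```

To make the setting permanent you would add equivalent export lines to your shell profile; the tutorial's commands keep working either way because they use relative paths.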

Step 3: Edit Configuration

Before we proceed, we need to modify some configuration files in Hadoop.

Open the hadoop-env.sh file located in the etc/hadoop directory.

nano hadoop-3.3.0/etc/hadoop/hadoop-env.sh

Find the following line and uncomment it by removing the leading "#" to set the Java home directory:

# export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

Replace the path with your actual Java home directory.
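
If you are unsure what your Java home directory is, it can usually be derived from the location of the java binary. The sketch below assumes the common Ubuntu layout in which the binary sits at JAVA_HOME/bin/java, or at JAVA_HOME/jre/bin/java for Java 8 packages. The function name java_home_from_binary is hypothetical.

```shell
# java_home_from_binary PATH_TO_JAVA
# Prints the Java home directory implied by the resolved java binary path.
# (Hypothetical helper; assumes the usual .../bin/java layout.)
java_home_from_binary() {
    local bin="$1" home
    home="${bin%/bin/java}"   # drop the trailing /bin/java
    home="${home%/jre}"       # Java 8 packages keep the binary under jre/
    printf '%s\n' "$home"
}

# Usage: resolve the symlink chain behind "java", then derive the home:
#   java_home_from_binary "$(readlink -f "$(command -v java)")"
```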

Next, open the core-site.xml file in the same etc/hadoop directory and add the following lines inside the <configuration> element:

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
</property>
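
For reference, the complete file can also be generated in one step. This sketch writes a minimal core-site.xml containing only the property above, wrapped in the <configuration> element Hadoop expects. It overwrites the target file, so it assumes you have no other settings there that you want to keep. The function name write_core_site is hypothetical.

```shell
# write_core_site TARGET
# Overwrites TARGET with a minimal core-site.xml that sets fs.defaultFS.
# (Hypothetical helper; any existing content of TARGET is lost.)
write_core_site() {
    local target="$1"
    cat > "$target" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF
}

# Usage:
#   write_core_site hadoop-3.3.0/etc/hadoop/core-site.xml
```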

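Next, make sure a mapred-env.sh file exists in the etc/hadoop directory. Hadoop 3.x archives normally ship this file already, in which case you can skip the rename. If your copy contains only mapred-env.sh.template, rename it:

mv hadoop-3.3.0/etc/hadoop/mapred-env.sh.template hadoop-3.3.0/etc/hadoop/mapred-env.sh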

Open the mapred-env.sh file and uncomment the following lines, or add them if your copy of the file does not contain them:

# export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# export HADOOP_MAPRED_HOME=${HADOOP_HOME}
# export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec

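Replace the JAVA_HOME path with your actual Java home directory. The other two lines depend on HADOOP_HOME, so make sure that variable points to your hadoop-3.3.0 directory.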
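
If you prefer not to edit the files by hand, the uncommenting can be scripted. The sketch below removes the leading "#" from an "export VAR=" line and leaves every other comment untouched. It assumes GNU sed, which is what Ubuntu provides, and the function name uncomment_export is hypothetical.

```shell
# uncomment_export FILE VAR
# Turns "# export VAR=..." into "export VAR=..." in FILE, editing in place.
# (Hypothetical helper; relies on GNU sed's -i option.)
uncomment_export() {
    local file="$1" var="$2"
    # Match only lines that start with "#", optional spaces, then "export VAR="
    sed -i "s/^#[[:space:]]*export[[:space:]]\{1,\}${var}=/export ${var}=/" "$file"
}

# Usage:
#   uncomment_export hadoop-3.3.0/etc/hadoop/mapred-env.sh JAVA_HOME
```

This only activates a line that is already present; it does not change the path on that line, so you still need to check the value afterwards.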

Step 4: Format HDFS

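HDFS requires formatting before it can be used. Formatting initializes the NameNode's metadata directory and discards any HDFS metadata already stored there, so run it only once on a new installation: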

hadoop-3.3.0/bin/hdfs namenode -format
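
Because formatting is destructive, a guard against running it twice can be useful. A formatted NameNode directory contains a current/VERSION file, so the sketch below tests for that file. The function name namenode_is_formatted is hypothetical, and the path in the usage example assumes the default storage location, which applies only when hadoop.tmp.dir and dfs.namenode.name.dir have not been changed.

```shell
# namenode_is_formatted NAME_DIR
# Succeeds if NAME_DIR already holds NameNode metadata (current/VERSION exists).
# (Hypothetical helper for illustration; not shipped with Hadoop.)
namenode_is_formatted() {
    local name_dir="$1"
    [ -f "$name_dir/current/VERSION" ]
}

# Usage, assuming the default location /tmp/hadoop-$USER/dfs/name:
#   if namenode_is_formatted "/tmp/hadoop-$USER/dfs/name"; then
#       echo "NameNode already formatted; skipping"
#   else
#       hadoop-3.3.0/bin/hdfs namenode -format
#   fi
```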

Step 5: Start HDFS

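To start HDFS, run the following command. The script launches the daemons over SSH, so passwordless SSH to localhost must already work for the current user: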

hadoop-3.3.0/sbin/start-dfs.sh

To check the status of HDFS, run the following command:

hadoop-3.3.0/bin/hdfs dfsadmin -report

This should print the configured and used capacity of the file system, followed by the status of each live DataNode.
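
A second way to confirm that everything started is to list the running Java processes with jps, which ships with the JDK. On a single-node setup started by start-dfs.sh you would expect NameNode, DataNode, and SecondaryNameNode. The sketch below reads jps output on standard input and prints the name of any of those daemons that is missing. The function name missing_hdfs_daemons is hypothetical.

```shell
# missing_hdfs_daemons
# Reads "jps" output on stdin and prints each expected HDFS daemon
# that does not appear in it. Prints nothing when all three are running.
# (Hypothetical helper for illustration; not shipped with Hadoop.)
missing_hdfs_daemons() {
    local listing daemon
    listing="$(cat)"
    for daemon in NameNode DataNode SecondaryNameNode; do
        # jps prints "<pid> <class name>"; compare the name as a whole field
        # so that "SecondaryNameNode" is not mistaken for "NameNode"
        printf '%s\n' "$listing" \
            | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' \
            || printf '%s\n' "$daemon"
    done
}

# Usage:
#   jps | missing_hdfs_daemons
```

If a daemon is reported as missing, the log files under the logs directory of your Hadoop installation are the first place to look.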

Conclusion

Congratulations, you have successfully installed HDFS on your Ubuntu Server. You now have a single-node HDFS instance for storing large data files; adding more nodes later lets HDFS distribute that data across a cluster. We hope you found this tutorial helpful. If you have any questions or suggestions, please feel free to leave a comment below.