# Installing HDFS on Debian

HDFS (Hadoop Distributed File System) is the storage layer of Apache Hadoop, providing scalable, fault-tolerant storage for big data applications. In this tutorial, we will install HDFS on a current Debian release.

## Prerequisites

You will need a Debian system with sudo privileges and a working internet connection. HDFS requires Java 8 or later.

## Step 1: Install Java

If Java is not already installed on your system, update the package index and install the default JDK:

sudo apt-get update
sudo apt-get install default-jdk

Verify the Java installation using the following command:

java -version
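The configuration step later in this tutorial needs the JDK's install path (JAVA_HOME). One way to find it is to resolve the real path of the `java` binary and strip the trailing `/bin/java`. The path below is a hypothetical example; run `readlink -f "$(which java)"` to get the actual path on your machine:

```shell
# Hypothetical path; on your machine run: readlink -f "$(which java)"
JAVA_BIN="/usr/lib/jvm/java-17-openjdk-amd64/bin/java"

# Strip the trailing /bin/java with shell parameter expansion to get JAVA_HOME
JAVA_HOME="${JAVA_BIN%/bin/java}"
echo "$JAVA_HOME"   # prints /usr/lib/jvm/java-17-openjdk-amd64
```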

## Step 2: Download Hadoop

HDFS ships as part of the Hadoop distribution. Download it from the Apache download server:

wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
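It is good practice to verify the archive against Apache's published checksum before extracting it. This sketch assumes the matching `.sha512` file is published alongside the tarball on the download server:

```shell
# Fetch the published SHA-512 checksum and verify the tarball against it
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz.sha512
sha512sum -c hadoop-3.3.1.tar.gz.sha512   # should report: hadoop-3.3.1.tar.gz: OK
```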

Extract the downloaded file:

tar -xvf hadoop-3.3.1.tar.gz

## Step 3: Configure HDFS

Before starting HDFS, a few configuration changes are needed. Navigate to the Hadoop configuration directory and open the hadoop-env.sh file:

cd hadoop-3.3.1/etc/hadoop/
nano hadoop-env.sh

Add the following lines at the bottom of the file. Adjust JAVA_HOME to match the JDK actually installed on your system (on current Debian, default-jdk provides a newer Java than 8; check the path with readlink -f "$(which java)"). Running the daemons as root works for a quick test but is discouraged; substitute a dedicated user if you prefer:

export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root

Save and close the file.
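hadoop-env.sh alone is not enough for HDFS to start: the NameNode address and the block replication factor are normally set in core-site.xml and hdfs-site.xml in the same directory. A minimal single-node sketch looks like this (hdfs://localhost:9000 and a replication factor of 1 are conventional single-node values; adjust as needed):

```xml
<!-- core-site.xml: tell clients and daemons where the NameNode listens -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

```xml
<!-- hdfs-site.xml: a single node can only hold one replica of each block -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```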

## Step 4: Start HDFS

Now we can start HDFS. Navigate to the Hadoop bin directory and format the NameNode (this initializes the HDFS metadata and only needs to be done once):

cd ~/hadoop-3.3.1/bin
./hdfs namenode -format
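The start-dfs.sh script used next connects to each node over SSH, even in a single-node setup, so passwordless SSH to localhost must work first. A minimal sketch, assuming openssh-server is installed and no key pair exists yet at ~/.ssh/id_rsa:

```shell
# Generate a passphrase-less key pair (skip if ~/.ssh/id_rsa already exists)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Authorize the key for logins to this machine and lock down permissions
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Confirm login works without a password prompt
ssh localhost echo ok
```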

Start the HDFS daemons (note that the start scripts live in sbin, not bin):

../sbin/start-dfs.sh

Verify that the daemons are running:

jps

You should see output similar to the following (the process IDs will differ, and a SecondaryNameNode may also be listed):

2676 Jps
2470 NameNode
2565 DataNode
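As a quick smoke test that the filesystem is actually usable, create a home directory in HDFS and list the root (run from the bin directory as above; the paths are illustrative):

```shell
# Create a home directory for the current user inside HDFS
./hdfs dfs -mkdir -p /user/"$(whoami)"

# List the HDFS root; the new directory should appear
./hdfs dfs -ls /
```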

Congratulations! You have successfully installed HDFS on Debian.

## Conclusion

In this tutorial, we installed Java, downloaded and configured Hadoop, and started the HDFS daemons on Debian. HDFS is now ready to be used by big data applications.
