HDFS (Hadoop Distributed File System) is a distributed file system that provides scalable, reliable storage for big data applications. In this tutorial, we will learn how to install HDFS on the latest Debian release.
## Step 1: Install Java
Before installing HDFS, we need Java. Hadoop 3.3 runs on Java 8 or 11, so install one of those if Java is not already present (on recent Debian releases `default-jdk` may pull in a newer JDK; `openjdk-11-jdk` is a safe alternative):

```
sudo apt-get update
sudo apt-get install default-jdk
```
Verify the Java installation using the following command:

```
java -version
```
## Step 2: Download HDFS
Visit the Apache Hadoop website to download a Hadoop release. This tutorial uses 3.3.1; substitute the current version number if needed:

```
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
```
Extract the downloaded file:

```
tar -xvf hadoop-3.3.1.tar.gz
```
## Step 3: Configure HDFS
Before we start HDFS, we need to make some configuration changes. Navigate to the Hadoop configuration directory and open the hadoop-env.sh file:

```
cd hadoop-3.3.1/etc/hadoop/
sudo nano hadoop-env.sh
```
Add the following lines to the bottom of the file. Adjust `JAVA_HOME` to match the JDK you installed; on Debian, `/usr/lib/jvm/default-java` is a symlink to the default JDK, which matches the `default-jdk` package installed above. Note that `start-dfs.sh` also requires `HDFS_SECONDARYNAMENODE_USER` to be set. Running the daemons as root is acceptable for a quick test, but use a dedicated hadoop user in production:

```
export JAVA_HOME=/usr/lib/jvm/default-java
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
```

Save and close the file.
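The hadoop-env.sh edits above cover the environment, but HDFS also needs to know where the NameNode listens before the NameNode can be formatted. A minimal single-node core-site.xml (in the same etc/hadoop/ directory; `hdfs://localhost:9000` is a conventional choice for a local setup, not a requirement) might look like:

```xml
<configuration>
  <!-- Default filesystem URI: clients and DataNodes contact the NameNode here -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

On a single machine you will also want `dfs.replication` set to `1` in hdfs-site.xml, since there is only one DataNode available to hold each block.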
## Step 4: Start HDFS
Now we can start HDFS. Navigate to the Hadoop bin directory and format the NameNode:

```
cd ~/hadoop-3.3.1/bin
./hdfs namenode -format
```
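One more prerequisite: the start script in the next step launches each daemon over ssh, even on a single node, so the current user needs passwordless SSH access to localhost. A quick sketch of that setup, assuming an OpenSSH server is installed and running:

```shell
# Ensure the .ssh directory exists with correct permissions
mkdir -p ~/.ssh && chmod 700 ~/.ssh
# Generate a key pair only if one does not already exist (empty passphrase)
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# Authorize the key for logins to this machine
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```

You can confirm it works with `ssh localhost true` before proceeding; it should return without prompting for a password.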
Start the HDFS daemons. Note that start-dfs.sh lives in sbin/, not bin/:

```
cd ~/hadoop-3.3.1/sbin
./start-dfs.sh
```
Verify that the daemons are running:

```
jps
```

You should see output similar to the following (the process IDs will differ, and a SecondaryNameNode process is normally listed as well):

```
2676 Jps
2470 NameNode
2565 DataNode
```
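To confirm the cluster actually accepts data, you can exercise it with a few basic filesystem commands against the running daemons. The `/user/test` path below is just an illustrative choice:

```
cd ~/hadoop-3.3.1
./bin/hdfs dfs -mkdir -p /user/test
./bin/hdfs dfs -put etc/hadoop/core-site.xml /user/test/
./bin/hdfs dfs -ls /user/test
```

The final listing should show the uploaded core-site.xml stored in HDFS.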
Congratulations! You have successfully installed HDFS on the latest Debian release.
In this tutorial, we downloaded and configured Hadoop, then formatted the NameNode and started the HDFS daemons. HDFS is now ready to be used for big data applications.