In this tutorial, we will guide you through the steps of installing Hadoop Distributed File System (HDFS) on Ubuntu Server.
Note: This tutorial assumes that you have installed Java on your Ubuntu Server, which is required to run HDFS. If you haven't installed Java yet, please refer to another tutorial for Java installation.
Visit the official Apache Hadoop website at http://hadoop.apache.org/ and download a stable release of Hadoop. This tutorial uses Hadoop 3.3.0, which you can fetch with wget:
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
After downloading the Hadoop package, extract it to a directory of your choice.
tar -xvf hadoop-3.3.0.tar.gz
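The remaining steps invoke Hadoop through relative paths. As an optional convenience, you can export HADOOP_HOME and extend your PATH so the commands work from any directory. The sketch below assumes you extracted the tarball into your home directory; adjust the path if you extracted it elsewhere.

```shell
# Optional: point HADOOP_HOME at the extracted directory and add its
# bin/ and sbin/ directories to PATH. Assumes extraction into $HOME;
# adjust the path to wherever you ran tar.
export HADOOP_HOME="$HOME/hadoop-3.3.0"
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```

To make this permanent, append the same two lines to your ~/.bashrc.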
Before we proceed, we need to modify some configuration files in Hadoop.
Open the hadoop-env.sh file located in the etc/hadoop directory.
nano hadoop-3.3.0/etc/hadoop/hadoop-env.sh
Uncomment the following line to set the Java home directory:
# export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
Replace the path with your actual Java home directory.
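If you are unsure what your Java home directory is, resolving the java binary's symlink usually reveals it. The path below is an example for OpenJDK 8, not necessarily what your system will show; the suffix-stripping demonstrates how to derive JAVA_HOME from the resolved path.

```shell
# On a live system you would resolve the real java binary with:
#   readlink -f "$(command -v java)"
# For OpenJDK 8 this typically prints a path like the example below.
JAVA_BIN=/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java  # example resolved path
# Strip the trailing jre/bin/java to obtain the Java home directory:
JAVA_HOME_DIR=${JAVA_BIN%/jre/bin/java}
echo "$JAVA_HOME_DIR"  # /usr/lib/jvm/java-8-openjdk-amd64
```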
Next, open the core-site.xml file (also in the etc/hadoop directory) and add the following property inside the existing <configuration> element:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>
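For context, a minimal core-site.xml with this property in place looks like the following. The XML header and the <configuration> element should already be present in the shipped file; only the <property> block is new.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```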
Next, rename the mapred-env.sh.template file, in the same etc/hadoop directory, to mapred-env.sh.
mv hadoop-3.3.0/etc/hadoop/mapred-env.sh.template hadoop-3.3.0/etc/hadoop/mapred-env.sh
Open the mapred-env.sh file and uncomment the following lines:
# export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# export HADOOP_MAPRED_HOME=${HADOOP_HOME}
# export HADOOP_LIBEXEC_DIR=${HADOOP_HOME}/libexec
Replace the path with your actual Java home directory.
The HDFS NameNode must be formatted before first use. Run the following command to format it:
hadoop-3.3.0/bin/hdfs namenode -format
To start HDFS, run the following command:
hadoop-3.3.0/sbin/start-dfs.sh
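Note that start-dfs.sh launches the HDFS daemons over SSH, even on a single machine, so it will fail if you cannot log in to localhost without a password. A minimal sketch of setting that up, assuming an OpenSSH client is installed:

```shell
# Generate an SSH key pair if one does not already exist, then authorize it
# for passwordless logins to this machine.
mkdir -p ~/.ssh
if [ ! -f ~/.ssh/id_rsa ]; then
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
fi
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```

You can verify the setup with `ssh localhost`, which should open a shell without prompting for a password.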
To check the status of HDFS, run the following command:
hadoop-3.3.0/bin/hdfs dfsadmin -report
This should provide you with a report of the HDFS nodes and their status.
Congratulations, you have successfully installed HDFS on your Ubuntu Server. You can now start using HDFS to store and distribute large data files across multiple nodes. We hope you found this tutorial helpful. If you have any questions or suggestions, please feel free to leave a comment below.