HDFS (Hadoop Distributed File System) is a distributed file system that stores and manages large amounts of data across multiple servers. In this tutorial, we will learn how to install HDFS on the latest release of MX Linux.
Before we begin, ensure that you have a user account with sudo privileges and a working internet connection.
The first step is to download Hadoop to your MX Linux system. This tutorial uses Hadoop 3.3.0 from the Apache Hadoop website. If the mirror below no longer hosts that release, older versions remain available on the Apache archive site.
wget http://mirrors.advancedhosters.com/apache/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
Verify the integrity of the download. Compute its SHA-512 checksum and compare it with the value in the .sha512 checksum file published alongside the release on the Apache Hadoop website.
sha512sum hadoop-3.3.0.tar.gz
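Comparing two 128-character hex strings by eye is error-prone. Below is a minimal sketch that checks for an exact, full-length match instead. `verify_sha512` is a hypothetical helper, not part of Hadoop or coreutils. It assumes you paste the digest from the release's .sha512 file as the second argument.

```bash
# Hypothetical helper (not part of Hadoop or coreutils): succeed only if the
# file's SHA-512 digest equals the expected digest exactly.
verify_sha512() {
  local file=$1 expected actual
  # Drop spaces/newlines and lowercase the expected digest, since published
  # checksum files may group or uppercase the hex digits.
  expected=$(printf '%s' "$2" | tr -d ' \n' | tr 'A-F' 'a-f')
  # sha512sum prints "<digest>  <file>"; keep only the digest field.
  actual=$(sha512sum "$file" | awk '{print $1}')
  if [ -n "$expected" ] && [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file" >&2
    return 1
  fi
}

# Example: paste the digest from the release's .sha512 file.
#   verify_sha512 hadoop-3.3.0.tar.gz "<digest from the .sha512 file>"
```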
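After the download is complete, unpack the archive. Then link the extracted directory to /usr/local/hadoop, the location that HADOOP_HOME points to in the next step. Keep the extracted directory where it is, since the link refers to it.
tar -zxvf hadoop-3.3.0.tar.gz
sudo ln -s "$PWD/hadoop-3.3.0" /usr/local/hadoop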
Set up the Hadoop environment variables. Open the ~/.bashrc file in your home directory and add the following lines at the end of the file:
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Save the file, then reload it so the variables take effect in the current shell:
source ~/.bashrc
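Editing ~/.bashrc by hand works, but repeating the step leaves duplicate lines. As a minimal sketch, a hypothetical `add_hadoop_env` function can append the same block only once. It detects an earlier run with a marker comment. The marker is an assumption of this sketch, not a Hadoop convention.

```bash
# Hypothetical helper: append the Hadoop variables to a shell startup file
# (default ~/.bashrc) only once. The marker comment is this sketch's own
# convention for detecting an earlier run; Hadoop does not use it.
add_hadoop_env() {
  local rc=${1:-$HOME/.bashrc}
  local marker='# >>> hadoop env >>>'
  # -F: fixed-string match; nothing to do if the marker is already there.
  if [ -f "$rc" ] && grep -qF "$marker" "$rc"; then
    return 0
  fi
  # The quoted 'EOF' keeps $HADOOP_HOME and $PATH literal in the file, so
  # they are expanded when the file is sourced, not now.
  cat >> "$rc" <<'EOF'
# >>> hadoop env >>>
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
}
```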
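Hadoop requires Java, so install OpenJDK on your MX Linux system. Hadoop 3.3 supports Java 8 and Java 11 at runtime, and this tutorial uses OpenJDK 11. Installing packages requires root privileges. If your release's repositories do not offer openjdk-11-jdk, apt-cache search openjdk lists the versions that are available.
sudo apt-get update
sudo apt-get install openjdk-11-jdk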
Verify that Java is installed by checking the version.
java -version
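Hadoop's start scripts launch the daemons over SSH and may not inherit JAVA_HOME from your login shell. For this reason, the Apache single-node guide sets JAVA_HOME in etc/hadoop/hadoop-env.sh. Below is a minimal sketch that derives the value automatically. It assumes the `java` launcher found on PATH belongs to the JDK you just installed. `detect_java_home` is a hypothetical helper that follows the symlink chain to the JDK root.

```bash
# Hypothetical helper: print the JDK root for the "java" found on PATH by
# following its symlink chain, e.g. on Debian-based systems
# /usr/bin/java -> /etc/alternatives/java -> /usr/lib/jvm/<jdk>/bin/java.
detect_java_home() {
  local java_bin
  java_bin=$(command -v java) || return 1
  # readlink -f (GNU coreutils) resolves every symlink to the real binary.
  java_bin=$(readlink -f "$java_bin") || return 1
  # Strip the trailing /bin/java to get the JDK directory.
  dirname "$(dirname "$java_bin")"
}

# Example, run from the directory that contains hadoop-3.3.0:
#   echo "export JAVA_HOME=$(detect_java_home)" >> hadoop-3.3.0/etc/hadoop/hadoop-env.sh
```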
Navigate to the Hadoop configuration directory.
cd hadoop-3.3.0/etc/hadoop
Open the hdfs-site.xml file to configure the HDFS settings.
nano hdfs-site.xml
Add the following lines inside the configuration tag:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
Save and close the file.
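The NameNode and DataNode directories above live under /usr/local, which is normally writable only by root. As a result, hdfs namenode -format and the DataNode may fail with permission errors when run as a regular user. Below is a minimal sketch that creates both directories and hands them to the account that will run HDFS. `make_hdfs_dirs` is a hypothetical helper.

```bash
# Hypothetical helper: create the NameNode and DataNode directories from
# hdfs-site.xml under BASE and hand them to OWNER (default: current user).
make_hdfs_dirs() {
  local base=${1:-/usr/local/hadoop_store} owner=${2:-$(id -un)}
  mkdir -p "$base/hdfs/namenode" "$base/hdfs/datanode" &&
    chown -R "$owner" "$base"
}

# sudo cannot call a shell function, so on a real system run the
# equivalent commands directly:
#   sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode /usr/local/hadoop_store/hdfs/datanode
#   sudo chown -R "$USER" /usr/local/hadoop_store
```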
Open the core-site.xml file to configure the core settings.
nano core-site.xml
Add the following lines inside the configuration tag:
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
Save and close the file.
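Once core-site.xml and hdfs-site.xml are saved, you can confirm that Hadoop actually picks up the values. The command hdfs getconf -confKey prints the effective value of a configuration key. The wrapper below, `check_conf`, is a hypothetical helper. It assumes $HADOOP_HOME/bin is on your PATH.

```bash
# Hypothetical helper: compare the value Hadoop actually reads for KEY
# (via "hdfs getconf -confKey KEY") with the value you expect.
check_conf() {
  local key=$1 expected=$2 actual
  actual=$(hdfs getconf -confKey "$key" 2>/dev/null)
  if [ "$actual" = "$expected" ]; then
    echo "ok: $key=$actual"
  else
    echo "unexpected: $key='$actual' (expected '$expected')" >&2
    return 1
  fi
}

# Example:
#   check_conf fs.defaultFS hdfs://localhost:9000
#   check_conf dfs.replication 1
```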
Open the mapred-site.xml file to configure the MapReduce settings. In Hadoop 3.x this file already exists in the configuration directory, so no template needs to be copied.
nano mapred-site.xml
Add the following lines inside the configuration tag:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
Save and close the file.
Open the yarn-site.xml file to configure the YARN settings.
nano yarn-site.xml
Add the following lines inside the configuration tag:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
Save and close the file.
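You may prefer to generate the four *-site.xml files non-interactively instead of editing them in nano. A minimal sketch of that approach is a small function that writes a complete file from name=value pairs. `write_site_xml` is hypothetical. It overwrites the target file, including the Apache license comment Hadoop ships in it. It also does not escape XML special characters, so use it only for plain values like the ones in this tutorial.

```bash
# Hypothetical helper: write a complete Hadoop *-site.xml file from
# name=value arguments. Overwrites FILE; values are not XML-escaped.
write_site_xml() {
  local file=$1 pair name value
  shift
  {
    echo '<?xml version="1.0" encoding="UTF-8"?>'
    echo '<configuration>'
    for pair in "$@"; do
      # Split at the first "=": the name never contains "=", the value may.
      name=${pair%%=*}
      value=${pair#*=}
      printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
        "$name" "$value"
    done
    echo '</configuration>'
  } > "$file"
}

# Example, run from the etc/hadoop directory:
#   write_site_xml core-site.xml fs.defaultFS=hdfs://localhost:9000
#   write_site_xml hdfs-site.xml dfs.replication=1 \
#     dfs.namenode.name.dir=file:/usr/local/hadoop_store/hdfs/namenode \
#     dfs.datanode.data.dir=file:/usr/local/hadoop_store/hdfs/datanode
#   write_site_xml mapred-site.xml mapreduce.framework.name=yarn
#   write_site_xml yarn-site.xml yarn.nodemanager.aux-services=mapreduce_shuffle
```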
Before starting HDFS for the first time, format the NameNode. Do this only once, because formatting erases any existing HDFS metadata.
hdfs namenode -format
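start-dfs.sh launches each daemon over SSH, even on a single machine. For this reason, the Apache single-node setup guide asks for passphraseless SSH to localhost. Below is a sketch of those steps wrapped in a hypothetical `setup_local_ssh` function. It assumes ssh-keygen is installed and an SSH server is running locally.

```bash
# Passphraseless SSH to localhost, following the steps in the Apache Hadoop
# single-node setup guide. setup_local_ssh is a hypothetical wrapper; it is
# safe to run more than once.
setup_local_ssh() {
  local dir="$HOME/.ssh"
  mkdir -p "$dir" && chmod 700 "$dir" || return 1
  # Generate an RSA key with an empty passphrase only if none exists yet.
  if [ ! -f "$dir/id_rsa" ]; then
    ssh-keygen -t rsa -P '' -f "$dir/id_rsa" || return 1
  fi
  # Authorize the public key, skipping it if it is already listed.
  touch "$dir/authorized_keys"
  if ! grep -qxF "$(cat "$dir/id_rsa.pub")" "$dir/authorized_keys"; then
    cat "$dir/id_rsa.pub" >> "$dir/authorized_keys"
  fi
  chmod 0600 "$dir/authorized_keys"
}

# Afterwards, "ssh localhost" should log in without prompting for a password.
```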
Start HDFS with the following command:
start-dfs.sh
Verify that HDFS is running.
jps
You should see output similar to the following. Process IDs will differ, and jps also lists its own Jps process.
4955 DataNode
5126 NameNode
5205 SecondaryNameNode
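Checking jps output by eye is fine once, but a scripted check is handy if you restart HDFS often. `check_hdfs_daemons` is a hypothetical helper. It reads jps output on standard input and reports which of the three HDFS daemons are missing.

```bash
# Hypothetical helper: read jps output on stdin and report any of the three
# HDFS daemons that are missing. Returns non-zero if any is absent.
check_hdfs_daemons() {
  local out missing=0 d
  out=$(cat)
  for d in NameNode DataNode SecondaryNameNode; do
    # jps prints "<pid> <MainClassName>"; compare the whole second field so
    # that "NameNode" does not match "SecondaryNameNode".
    if ! printf '%s\n' "$out" | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }'; then
      echo "missing: $d" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   jps | check_hdfs_daemons && echo "HDFS daemons are running"
```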
You can stop HDFS with the following command:
stop-dfs.sh
Congratulations, you have successfully installed HDFS on MX Linux. This setup runs all HDFS daemons on a single machine with a replication factor of 1. To store data across a cluster, you would add DataNodes on other machines and raise dfs.replication accordingly.