In this tutorial, we will walk you through installing HDFS, the distributed file system from the Apache Hadoop project, on the latest version of Fedora CoreOS. We assume you already have a Fedora CoreOS instance up and running.
Before installing HDFS, it is essential to ensure you have Java installed on your system.
Update the package cache by running the following command:
$ sudo dnf makecache
Install the OpenJDK package by running the following command:
$ sudo dnf install java-1.8.0-openjdk-headless
Verify that Java has been installed by running the following command:
$ java -version
If Java has been installed correctly, you should see the version information displayed on your screen.
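If you want to script this check (for example as part of a provisioning step), a small sketch like the following works; note that java writes its version banner to stderr, not stdout:

```shell
# Check whether a java binary is on PATH and print its version line.
# 'java -version' writes to stderr, hence the 2>&1 redirect.
if command -v java >/dev/null 2>&1; then
  java -version 2>&1 | head -n 1
else
  echo "java not found on PATH"
fi
```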
Open your web browser and navigate to the official Hadoop releases page at https://hadoop.apache.org/releases.html, then download the binary tarball for the version you want. This tutorial uses hadoop-3.3.1.
Once the download is complete, switch to the directory where you have downloaded the Hadoop package. In this case, we assume that the package has been downloaded to your home directory.
$ cd ~
Extract the Hadoop package by running the following command:
$ tar xvf hadoop-3.3.1.tar.gz
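Before extracting, it is good practice to verify the tarball's integrity; the Apache download pages publish a .sha512 checksum file alongside each release. The sketch below demonstrates the check with a stand-in file so it is safe to run anywhere — substitute the real hadoop-3.3.1.tar.gz and its published checksum file:

```shell
# Stand-in demonstration of sha512 verification; with the real release you
# would download the published hadoop-3.3.1.tar.gz.sha512 file instead of
# generating one locally.
echo "demo contents" > demo.tar.gz
sha512sum demo.tar.gz > demo.tar.gz.sha512
sha512sum -c demo.tar.gz.sha512    # prints "demo.tar.gz: OK" when intact
rm -f demo.tar.gz demo.tar.gz.sha512
```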
Move the extracted Hadoop directory to the /opt directory:
$ sudo mv hadoop-3.3.1 /opt/hadoop
Before configuring HDFS, it is important to set the JAVA_HOME environment variable. Edit the /etc/environment file and add the following line at the end:
JAVA_HOME=/usr/lib/jvm/java-1.8.0/
Load the variable into your current shell by running the following command (new login sessions will pick it up automatically):
$ source /etc/environment
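Note that /etc/environment is normally read by PAM at login rather than by the shell, so sourcing it sets the variable in your current session but does not export it to child processes. To make sure the Hadoop scripts can see it, you can export it explicitly, using the path configured above:

```shell
# Export JAVA_HOME so child processes (such as the Hadoop scripts) inherit it.
export JAVA_HOME=/usr/lib/jvm/java-1.8.0/
echo "$JAVA_HOME"
```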
Open the core-site.xml file located in the /opt/hadoop/etc/hadoop directory using your preferred text editor:
$ sudo vi /opt/hadoop/etc/hadoop/core-site.xml
Add the following configuration settings within the <configuration> tags:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>
Save and exit the file.
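If you prefer a non-interactive approach, the same file can be written with a heredoc. The sketch below writes to a temporary directory so it is safe to try; on the real system, point CONF_DIR at /opt/hadoop/etc/hadoop (using sudo tee instead of cat if you lack write permission):

```shell
# CONF_DIR is a stand-in; use /opt/hadoop/etc/hadoop on a real install.
CONF_DIR=$(mktemp -d)
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
grep -q 'hdfs://localhost:9000' "$CONF_DIR/core-site.xml" && echo "core-site.xml written"
```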
Open the hdfs-site.xml file using the following command:
$ sudo vi /opt/hadoop/etc/hadoop/hdfs-site.xml
Add the following configuration settings within the <configuration> tags:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/opt/data/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/opt/data/hdfs/datanode</value>
</property>
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>localhost:9868</value>
</property>
Save and exit the file.
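The storage directories referenced in hdfs-site.xml do not exist yet, and the NameNode format or DataNode startup can fail without them. Create them before proceeding. The DATA_BASE variable below is a stand-in so the sketch is safe to run; on the real system create the directories under /opt/data/hdfs with sudo and chown them to the user that will run Hadoop:

```shell
# DATA_BASE is a stand-in; on a real install you would run:
#   sudo mkdir -p /opt/data/hdfs/namenode /opt/data/hdfs/datanode
#   sudo chown -R "$USER":"$USER" /opt/data/hdfs
DATA_BASE="${DATA_BASE:-$HOME/hdfs-demo}"
mkdir -p "$DATA_BASE/namenode" "$DATA_BASE/datanode"
ls "$DATA_BASE"
```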
Before starting HDFS, it is important to format the file system. Run the following command to format HDFS:
$ /opt/hadoop/bin/hdfs namenode -format
If formatting succeeds, you will see a message like the following on your screen:
Storage directory /opt/data/hdfs/namenode has been successfully formatted.
To start HDFS, run the following command (the start-dfs.sh script launches the daemons over ssh, so passwordless ssh to localhost must be configured):
$ /opt/hadoop/sbin/start-dfs.sh
If HDFS starts successfully, you will see output similar to the following on your screen:
Starting namenodes on [localhost]
localhost: starting namenode, logging to /opt/hadoop/logs/hadoop-<your-username>-namenode-<your-hostname>.log
localhost: starting datanode, logging to /opt/hadoop/logs/hadoop-<your-username>-datanode-<your-hostname>.log
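You can also confirm from the command line that the daemons are up. The jps tool lists running JVMs, but it ships with the JDK development package (java-1.8.0-openjdk-devel on Fedora), so it may be missing if only the headless runtime installed earlier is present; the sketch below falls back to ps in that case:

```shell
# List Java processes; a healthy single-node HDFS shows NameNode, DataNode
# and SecondaryNameNode entries.
if command -v jps >/dev/null 2>&1; then
  jps
else
  ps -ef | grep '[N]ameNode\|[D]ataNode' || echo "no HDFS processes found"
fi
```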
To verify that HDFS is running, navigate to the following URL using your web browser:
http://localhost:9870/
If HDFS is running successfully, you will see the Hadoop Web UI displayed on your screen.
To verify that the HDFS file system is accessible from the command line, run the following command:
$ /opt/hadoop/bin/hdfs dfs -ls /
If HDFS is working correctly, you will see the contents of the root directory displayed on your screen (empty on a fresh installation).
Congratulations! You have successfully installed and configured HDFS on your Fedora CoreOS instance. You can now begin storing and managing your data using HDFS.