How to Install HDFS on Alpine Linux

In this tutorial, we will walk through installing the Hadoop Distributed File System (HDFS), the storage layer of Apache Hadoop (http://hadoop.apache.org/), on Alpine Linux.

Prerequisites

Before getting started, make sure you have the following: a machine running Alpine Linux, root (or sudo) access for installing packages, and a working internet connection for downloading Hadoop.

Step 1: Install Java

Hadoop runs on the Java Virtual Machine, so we will start by installing Java. OpenJDK 8 is available in Alpine's community repository; install the runtime with:

apk add openjdk8-jre
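
Note also that Hadoop's control scripts (such as start-dfs.sh, used later in this tutorial) are written for bash, which Alpine does not include by default (it ships BusyBox ash). Install it alongside Java:

apk add bash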

After the installation, you can verify that Java is installed correctly by running the following command:

java -version

Step 2: Download and Install Hadoop

Now, we will download and install Hadoop from Apache's official website. To download Hadoop, navigate to the directory where you want to store the package and run the following command:

wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
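
Older Hadoop releases are moved off the main download site over time; if the URL above returns a 404, the same file is available under https://archive.apache.org/dist/hadoop/common/. It is also a good idea to verify the download against the published SHA-512 checksum:

wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz.sha512
sha512sum hadoop-3.3.1.tar.gz

Compare the digest printed by sha512sum with the one in the .sha512 file; they should match exactly.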

Once the download is complete, extract the package using the following command:

tar -xzf hadoop-3.3.1.tar.gz

This will extract the files to a new directory named hadoop-3.3.1 in the current directory.
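
You can confirm the layout with a quick listing; among other things you should see the bin/, etc/, and sbin/ directories used in the rest of this tutorial:

ls hadoop-3.3.1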

Step 3: Configure Hadoop

Now, we will configure Hadoop to run on our system. Open the hadoop-env.sh file that ships in the hadoop-3.3.1/etc/hadoop/ directory:

nano hadoop-3.3.1/etc/hadoop/hadoop-env.sh

Add the following lines to the file:

export JAVA_HOME=/usr/lib/jvm/java-1.8-openjdk
export HADOOP_HOME=/path/to/your/hadoop-3.3.1
export PATH=$PATH:/path/to/your/hadoop-3.3.1/bin

Save and exit the file. Replace /path/to/your with the directory where you actually extracted Hadoop; the JAVA_HOME shown above is where Alpine's openjdk8-jre package installs Java. Note that hadoop-env.sh only affects the environment of the Hadoop scripts themselves; if you also want the hadoop and hdfs commands on your interactive shell's PATH, add the same export lines to your shell profile.
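
At this point you can check that Hadoop finds Java correctly; if JAVA_HOME is wrong, the following command fails with an error instead of printing the version banner:

hadoop-3.3.1/bin/hadoop version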

Next, we need to configure HDFS itself. Edit the core-site.xml file in the hadoop-3.3.1/etc/hadoop/ directory:

nano hadoop-3.3.1/etc/hadoop/core-site.xml

Add the following lines to the file. The fs.defaultFS property tells Hadoop clients which filesystem to use by default; here it points at the local NameNode's RPC endpoint on port 9000:

<configuration>
   <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9000</value>
   </property>
</configuration>

Save and exit the file.
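
To confirm that Hadoop picks up the setting, you can read it back with the getconf tool:

hadoop-3.3.1/bin/hdfs getconf -confKey fs.defaultFS

This should print hdfs://localhost:9000.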

Now, we need to configure HDFS storage. Edit the hdfs-site.xml file in the hadoop-3.3.1/etc/hadoop/ directory:

nano hadoop-3.3.1/etc/hadoop/hdfs-site.xml

Add the following lines to the file. A replication factor of 1 is appropriate for a single-node setup, and dfs.datanode.data.dir controls where the DataNode stores its blocks on the local disk. We also set dfs.namenode.name.dir so the NameNode's metadata is kept alongside it rather than in the default location under /tmp; as before, replace /path/to/your with your actual Hadoop directory:

<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
   <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:///path/to/your/hadoop-3.3.1/hdfs/namenode</value>
   </property>
   <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:///path/to/your/hadoop-3.3.1/hdfs/datanode</value>
   </property>
</configuration>

Save and exit the file.

Step 4: Start HDFS
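
Before starting HDFS for the first time, format the NameNode. This initializes the metadata directory configured above and only needs to be done once:

hadoop-3.3.1/bin/hdfs namenode -format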

We are now ready to start HDFS on our system. To start HDFS, run the following command:

hadoop-3.3.1/sbin/start-dfs.sh

This will start the NameNode, DataNode, and SecondaryNameNode daemons.
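
A few caveats on Alpine: start-dfs.sh uses ssh to launch each daemon, so it expects a running sshd and passwordless SSH to localhost, and under Hadoop 3 it refuses to run as root unless HDFS_NAMENODE_USER, HDFS_DATANODE_USER, and HDFS_SECONDARYNAMENODE_USER are set in the environment. On a single node you can sidestep ssh entirely and start the daemons directly:

hadoop-3.3.1/bin/hdfs --daemon start namenode
hadoop-3.3.1/bin/hdfs --daemon start datanode

Once the daemons are up, the NameNode web interface is available at http://localhost:9870.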

Step 5: Verify HDFS

We can now verify that HDFS is running correctly on our system. To do this, run the following command:

hadoop-3.3.1/bin/hdfs dfs -mkdir /test

This will create a new directory named /test in the HDFS file system. To list the contents of the directory, run the following command:

hadoop-3.3.1/bin/hdfs dfs -ls /

This should list the /test directory you just created (a freshly formatted filesystem contains nothing else until applications create their own directories).
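
As a final check, round-trip a small file through HDFS:

echo "hello hdfs" > hello.txt
hadoop-3.3.1/bin/hdfs dfs -put hello.txt /test/
hadoop-3.3.1/bin/hdfs dfs -cat /test/hello.txt

The cat command should print the file's contents back from HDFS.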

Congratulations! You have successfully installed Hadoop Distributed File System (HDFS) on Alpine Linux.

If you want to self-host in an easy, hands-free way, need an external IP address, or simply want your data in your own hands, give IPv6.rs a try!