In this tutorial, we will walk through the steps to install the Hadoop Distributed File System (HDFS), available from http://hadoop.apache.org/, on Alpine Linux.
Before getting started, make sure you have an Alpine Linux system with root (or sudo) access and a working internet connection.
Hadoop requires Java to operate, so we will start by installing Java on our system. To install Java, run the following command:
apk add openjdk8-jre
After the installation, you can verify that Java is installed correctly by running the following command:
java -version
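If you later need the exact installation path for the JAVA_HOME setting below, you can resolve it from the java binary itself. The example output is an assumption for the Alpine openjdk8-jre package and may differ on your system:

# Resolve the real location of the java binary
readlink -f $(which java)
# Assumed example output: /usr/lib/jvm/java-1.8-openjdk/jre/bin/java
# JAVA_HOME is the path with the trailing /bin/java (and /jre, if present) removed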
Now, we will download and install Hadoop from Apache's official website. To download Hadoop, navigate to the directory where you want to store the package and run the following command:
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
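Optionally, verify the integrity of the download before extracting it. Apache publishes a .sha512 checksum file alongside each release; compare the two digests manually if the file formats differ:

# Fetch the published checksum and compute the local one, then compare by eye
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz.sha512
cat hadoop-3.3.1.tar.gz.sha512
sha512sum hadoop-3.3.1.tar.gz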
Once the download is complete, extract the package using the following command:
tar -xzf hadoop-3.3.1.tar.gz
This will extract the files to a new directory named hadoop-3.3.1 in the current directory.
Now, we will configure Hadoop to run on our system. Open the hadoop-env.sh file (it is already present in the distribution) in the hadoop-3.3.1/etc/hadoop/ directory:
nano hadoop-3.3.1/etc/hadoop/hadoop-env.sh
Add the following lines to the file, replacing /path/to/your with the directory where you extracted Hadoop. If the JAVA_HOME path below does not exist on your system, use the path reported by the Java check above:
export JAVA_HOME=/usr/lib/jvm/java-1.8-openjdk
export HADOOP_HOME=/path/to/your/hadoop-3.3.1
export PATH=$PATH:/path/to/your/hadoop-3.3.1/bin
Save and exit the file.
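To confirm that Hadoop can find your Java installation, you can run the bundled version command; it launches a small Java process and fails immediately if JAVA_HOME is wrong:

hadoop-3.3.1/bin/hadoop version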
Next, we need to configure HDFS itself. Edit the core-site.xml
file in the hadoop-3.3.1/etc/hadoop/
directory:
nano hadoop-3.3.1/etc/hadoop/core-site.xml
Add the following lines to the file (fs.defaultFS is the current name of this property; the older fs.default.name still works but is deprecated):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Save and exit the file.
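As a quick sanity check, you can ask Hadoop to print the value it actually reads for this property; if the property name were misspelled, the default file:/// value would be printed instead of hdfs://localhost:9000:

hadoop-3.3.1/bin/hdfs getconf -confKey fs.defaultFS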
Now, we need to configure the HDFS data nodes. Edit the hdfs-site.xml
file in the hadoop-3.3.1/etc/hadoop/
directory:
nano hadoop-3.3.1/etc/hadoop/hdfs-site.xml
Add the following lines to the file, again replacing /path/to/your with your Hadoop directory (note that the DataNode storage property is dfs.datanode.data.dir):
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///path/to/your/hadoop-3.3.1/hdfs/datanode</value>
  </property>
</configuration>
Save and exit the file.
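Two preparation steps are easy to miss on Alpine. First, Hadoop's start scripts are written in bash and use ssh to launch the daemons, neither of which is part of a minimal Alpine install; the package names below are assumptions based on the standard Alpine repositories. Second, the NameNode storage must be formatted once before the very first start (with this minimal configuration, NameNode metadata defaults to a directory under /tmp):

# Install bash and OpenSSH (assumed package names) and start the ssh daemon
apk add bash openssh
rc-service sshd start

# Allow passwordless ssh to localhost for the start scripts
mkdir -p ~/.ssh
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519
cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys

# Format the NameNode storage (only once, before the first start)
hadoop-3.3.1/bin/hdfs namenode -format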
We are now ready to start HDFS on our system. To start HDFS, run the following command:
hadoop-3.3.1/sbin/start-dfs.sh
This will start the HDFS NameNode, DataNode, and SecondaryNameNode daemons.
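You can also confirm that the NameNode is up by probing its web interface, which listens on port 9870 by default in Hadoop 3.x:

wget -q -O - http://localhost:9870 | head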
We can now verify that HDFS is running correctly on our system. To do this, run the following command:
hadoop-3.3.1/bin/hdfs dfs -mkdir /test
This will create a new directory named /test
in the HDFS file system. To list the contents of the directory, run the following command:
hadoop-3.3.1/bin/hdfs dfs -ls /
This should list the /test
directory along with other system directories.
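As a further check, you can round-trip a small file through HDFS; /etc/hostname is used here purely as a convenient example file:

hadoop-3.3.1/bin/hdfs dfs -put /etc/hostname /test/
hadoop-3.3.1/bin/hdfs dfs -cat /test/hostname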
Congratulations! You have successfully installed Hadoop Distributed File System (HDFS) on Alpine Linux.