How to Install HDFS on Manjaro

In this tutorial, we will go through the steps of installing HDFS on Manjaro. HDFS (the Hadoop Distributed File System) is the storage layer of the Apache Hadoop project and is designed for storing very large datasets reliably across machines. Follow the instructions below to get started.

Prerequisites

Before we begin, ensure that you have the following prerequisites.

  1. A Java runtime. Hadoop 3.3 supports Java 8 and Java 11.
  2. A running SSH daemon with passwordless SSH to localhost, which the Hadoop start and stop scripts rely on.
  3. A user account with sudo privileges.
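On Manjaro, these prerequisites can be set up roughly as follows. This is a sketch: the package names (jdk11-openjdk, openssh) and the sshd service name come from the Arch/Manjaro repositories, and the key-generation step assumes you do not already have an SSH key.

```shell
# Install Java 11 and OpenSSH (Arch/Manjaro package names; --needed skips
# packages that are already installed).
sudo pacman -S --needed jdk11-openjdk openssh

# Start the SSH daemon, which the Hadoop start/stop scripts connect through.
sudo systemctl enable --now sshd

# Set up passwordless SSH to localhost (skip key generation if you already
# have a key in ~/.ssh).
mkdir -p ~/.ssh && chmod 700 ~/.ssh
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```

You can confirm the passwordless setup with ssh localhost, which should log you in without prompting.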

Installing Hadoop

Follow the steps below to install Hadoop on your Manjaro system.

  1. Download the latest stable release of Hadoop from the Apache Hadoop website; this tutorial uses version 3.3.1. You can download it using the following command in your terminal. If the link below no longer resolves, older releases are kept at archive.apache.org/dist/hadoop/common/.

    wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
    
  2. Extract the downloaded archive using the following command.

    tar -xzf hadoop-3.3.1.tar.gz
    
  3. Move the extracted directory to /opt/ using the following command.

    sudo mv hadoop-3.3.1 /opt/
    
  4. Set the HADOOP_HOME environment variable and add Hadoop's bin and sbin directories to your PATH (the hdfs, start-dfs.sh, and stop-dfs.sh commands used later live there) by adding the following lines to your .bashrc file.

    export HADOOP_HOME=/opt/hadoop-3.3.1
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    

    You can open .bashrc using the following command.

    nano ~/.bashrc
    
  5. Refresh your environment variables using the following command.

    source ~/.bashrc
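Steps 4 and 5 edit ~/.bashrc by hand. If you script them instead, it is worth guarding against appending duplicate lines on re-runs. A minimal sketch, using a throwaway file in place of your real ~/.bashrc:

```shell
# Sketch: append the Hadoop variables to a bashrc-style file only if an
# identical line is not already present, so re-running is a no-op.
BASHRC="$(mktemp)"                    # stand-in for ~/.bashrc in this example
HADOOP_LINE='export HADOOP_HOME=/opt/hadoop-3.3.1'
PATH_LINE='export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin'

add_once() {
  # grep -qxF matches the whole line as a literal string, so the $ characters
  # in the PATH line are not treated as patterns or expanded.
  grep -qxF "$1" "$BASHRC" || echo "$1" >> "$BASHRC"
}

add_once "$HADOOP_LINE"
add_once "$PATH_LINE"
add_once "$HADOOP_LINE"               # duplicate call: nothing is appended

cat "$BASHRC"
```

Point BASHRC at ~/.bashrc to use this for real.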
    

Configuring Hadoop

Follow the steps below to configure Hadoop.

  1. Create the directories where HDFS will store the NameNode and DataNode data using the following commands.

    mkdir -p /opt/hadoop-3.3.1/data/hdfs/namenode
    mkdir -p /opt/hadoop-3.3.1/data/hdfs/datanode
    
  2. Edit the hadoop-env.sh file using the following command.

    nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
    

    Add the following line at the end of the file and save it. The path below is where Manjaro's jdk11-openjdk package installs Java; run archlinux-java status or list /usr/lib/jvm to find the correct path for your system (Debian-style paths such as /usr/lib/jvm/java-8-openjdk-amd64 do not exist on Manjaro).

    export JAVA_HOME=/usr/lib/jvm/java-11-openjdk
    
  3. Edit the core-site.xml file using the following command.

    nano $HADOOP_HOME/etc/hadoop/core-site.xml
    

    Add the following lines between the configuration tags and save the file.

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9000</value>
    </property>
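Instead of editing the file in nano, the same change can be applied non-interactively with a heredoc. A sketch, writing to a scratch directory: CONF_DIR stands in for $HADOOP_HOME/etc/hadoop, and note that this overwrites the whole file, which is fine for the stock core-site.xml, whose configuration element starts out empty.

```shell
# Scratch directory standing in for $HADOOP_HOME/etc/hadoop.
CONF_DIR="$(mktemp -d)"

# Write core-site.xml in one shot; the quoted 'EOF' keeps the contents literal.
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

# Show the property we just set.
grep fs.defaultFS "$CONF_DIR/core-site.xml"
```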
    
  4. Edit the hdfs-site.xml file using the following command.

    nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
    

    Add the following lines between the configuration tags and save the file.

    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:/opt/hadoop-3.3.1/data/hdfs/namenode</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:/opt/hadoop-3.3.1/data/hdfs/datanode</value>
    </property>
    
  5. Edit the mapred-site.xml file using the following command.

    nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
    

    Add the following lines between the configuration tags and save the file.

    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
    <property>
      <name>mapreduce.application.classpath</name>
      <value>/opt/hadoop-3.3.1/share/hadoop/mapreduce/*:/opt/hadoop-3.3.1/share/hadoop/mapreduce/lib/*</value>
    </property>
    
  6. Edit the yarn-site.xml file using the following command.

    nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
    

    Add the following lines between the configuration tags and save the file.

    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
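The three remaining files can be generated the same way. A sketch with a small helper that wraps its input in a configuration element; CONF_DIR again stands in for $HADOOP_HOME/etc/hadoop, and as above each file is overwritten rather than merged.

```shell
# Scratch directory standing in for $HADOOP_HOME/etc/hadoop.
CONF_DIR="$(mktemp -d)"

# write_conf FILE: wrap stdin in a <configuration> element and write it out.
write_conf() {
  { echo '<?xml version="1.0"?>'
    echo '<configuration>'
    cat
    echo '</configuration>'; } > "$CONF_DIR/$1"
}

write_conf hdfs-site.xml <<'EOF'
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/hadoop-3.3.1/data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/hadoop-3.3.1/data/hdfs/datanode</value>
  </property>
EOF

write_conf mapred-site.xml <<'EOF'
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>/opt/hadoop-3.3.1/share/hadoop/mapreduce/*:/opt/hadoop-3.3.1/share/hadoop/mapreduce/lib/*</value>
  </property>
EOF

write_conf yarn-site.xml <<'EOF'
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
EOF

ls "$CONF_DIR"
```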
    

Formatting the NameNode

Before starting HDFS for the first time, you need to format the NameNode. Formatting initializes the metadata directory you configured above and should only be done once; reformatting an existing installation erases the HDFS metadata. Follow the instructions below to format the NameNode.

  1. Run the following command in your terminal.

    hdfs namenode -format
    

Starting and Stopping HDFS

Follow the steps below to start and stop HDFS.

  1. To start HDFS, run the following command in your terminal.

    start-dfs.sh
    
  2. To stop HDFS, run the following command in your terminal.

    stop-dfs.sh
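Once start-dfs.sh returns, a few quick checks confirm that the daemons are actually up. These commands need the running cluster from the steps above, so treat them as a sketch:

```shell
# The Java process listing should include NameNode, DataNode, and
# SecondaryNameNode after a successful start.
jps

# Summary of cluster capacity and live DataNodes.
hdfs dfsadmin -report

# Create your HDFS home directory and list the root to confirm writes work.
hdfs dfs -mkdir -p /user/$USER
hdfs dfs -ls /
```

The NameNode also serves a web UI, reachable at http://localhost:9870 in Hadoop 3.x.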
    

Congratulations! You have successfully installed and configured HDFS on your Manjaro system. You can now start using it to store and work with large datasets.
