Learn Apache Hive installation on Ubuntu Linux (14.04 LTS)

Apache Hive is data warehouse software built on top of Hadoop. It supports data analysis, data summarization, and the management of large datasets. This tutorial, shared by Hadoop development and integration consultants, covers the basics of Apache Hive and how to install it on Ubuntu Linux. It lists the prerequisites first and then walks through the installation step by step.



What is Apache Hive?


  • Apache Hive is data warehouse software built on top of Hadoop.
  • Hive facilitates querying, data analysis, data summarization, and managing large datasets residing in distributed storage (usually HDFS; it is also compatible with the Amazon S3 filesystem).
  • Hive also provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL.
  • HiveQL also allows traditional map/reduce programmers to plug in their custom mappers and reducers.


Conventions that I followed in this article:

commands

configuration file names

Lines to be inserted in files.


Basic Prerequisites

  • Java
  • Hadoop

This tutorial has been tested using the following environment.

OS      :  Ubuntu Linux (14.04 LTS), 64-bit
Java    :  Oracle Java 1.7.0_75
Hadoop  :  Hadoop 0.20.1
Hive    :  Hive 0.9.0
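Before going further, it can help to confirm that both prerequisites are actually reachable from the shell. The following is a minimal sketch, assuming the java and hadoop launchers are expected on PATH; the helper name check_tool is hypothetical and not part of Hive or Hadoop.

```shell
#!/bin/sh
# Report whether a required command is available on PATH.
# Returns 0 when found, 1 otherwise.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: NOT found on PATH" >&2
        return 1
    fi
}

# Check the two prerequisites named above.
missing=0
for tool in java hadoop; do
    check_tool "$tool" || missing=1
done

if [ "$missing" -eq 0 ]; then
    echo "All prerequisites found."
else
    echo "Install the missing tools before continuing." >&2
fi
```

If both tools are found, running java -version and hadoop version shows whether the installed versions match the tested environment.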


Note:
By default, Hive uses an embedded Apache Derby database to store its metastore. If you want to use another RDBMS for the metastore, you have to install that particular RDBMS on your system, or you should have the URL of the RDBMS behind your metastore server (in the case of a central metastore server).


To install Hive, follow the steps below.


Step 1: Download Apache Hive & Extract it.


  • Download Hive from here
  • Change into the directory where Hive was downloaded. By default this is ~/Downloads

$ cd Downloads/

  • Extract the tar file using the following command

$ tar -xzvf hive-0.9.0-bin.tar.gz


  • Create a directory in /usr/local using the following command (you can also use another location of your choice)

$ sudo mkdir /usr/local/hive


  • Move the extracted hive-0.9.0-bin folder to the newly created directory using the following command (sudo is needed because /usr/local/hive was created as root)

$ sudo mv hive-0.9.0-bin /usr/local/hive
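The extract and move steps can also be combined, since tar can unpack directly into a target directory with -C. The following is a minimal sketch under that assumption; the function name extract_hive is hypothetical, and the paths are the ones used in this tutorial.

```shell
#!/bin/sh
# Extract a Hive tarball straight into a target directory,
# creating the directory first if it does not exist.
# Usage: extract_hive <tarball> <target-dir>
extract_hive() {
    tarball=$1
    target=$2
    # Refuse to continue if the download is not where we expect it.
    [ -f "$tarball" ] || { echo "missing tarball: $tarball" >&2; return 1; }
    mkdir -p "$target" && tar -xzf "$tarball" -C "$target"
}
```

For this tutorial the call would be extract_hive ~/Downloads/hive-0.9.0-bin.tar.gz /usr/local/hive, run from a script started with sudo because /usr/local is owned by root.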


Step 2: Edit ".bashrc" file to update environment variables for user.


$ gedit ~/.bashrc


Add the following lines to the file, preferably at the end.


export HADOOP_HOME=/usr/local/hadoop/hadoop-0.20.1

export PATH=$PATH:$HADOOP_HOME/bin


export HIVE_HOME=/usr/local/hive/hive-0.9.0-bin

export PATH=$PATH:$HIVE_HOME/bin
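If you prefer to script this edit instead of using an editor, the lines can be appended from the shell. The following is a minimal sketch; the helper name append_once is hypothetical, and it adds a line only when that exact line is not already in the file, so running it twice does not duplicate the exports.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
# Usage: append_once <line> <file>
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -x: match the whole line, -F: treat the text literally (no regex),
    # so the "$" characters in the export lines are compared as-is.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example, append_once 'export HIVE_HOME=/usr/local/hive/hive-0.9.0-bin' ~/.bashrc adds the Hive line once. The single quotes matter for the PATH lines, because they keep $PATH and $HIVE_HOME from being expanded before they are written. Open a new terminal (or run source ~/.bashrc) and check the result with echo $HIVE_HOME.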




Step 3: Create Hive directories within HDFS.


Start Hadoop if it is not already running, and make sure that HDFS is not in safe mode.
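The safe mode state can be checked from the command line with hadoop dfsadmin -safemode get. The following is a minimal sketch, assuming the command reports its state with the text "Safe mode is OFF" or "Safe mode is ON"; the helper name safemode_is_off is hypothetical.

```shell
#!/bin/sh
# Succeeds only when the given status text reports that safe mode is off.
safemode_is_off() {
    case "$1" in
        *"Safe mode is OFF"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Only query HDFS when the hadoop launcher is actually available.
if command -v hadoop >/dev/null 2>&1; then
    status=$(hadoop dfsadmin -safemode get 2>&1)
    if safemode_is_off "$status"; then
        echo "HDFS is out of safe mode; the directories can be created."
    else
        echo "HDFS is not ready: $status" >&2
    fi
fi
```

If HDFS has just started, hadoop dfsadmin -safemode wait blocks until safe mode ends.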

$ hadoop fs -mkdir /user/hive/warehouse


The "warehouse" directory is where Hive stores its tables and their data.


$ hadoop fs -mkdir /tmp

The "tmp" directory is where Hive stores the intermediate results of processing.


Step 4: Set Permissions for read/write on those folders.


$ hadoop fs -chmod g+w /user/hive/warehouse

$ hadoop fs -chmod g+w /tmp


Step 5: Set Hadoop path in Hive
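Hive's launcher script locates Hadoop through the HADOOP_HOME environment variable, which was already exported in Step 2; some setups also add the same export line to the bin/hive-config.sh file in the Hive directory. The following is a minimal sketch that only verifies the variable points at a usable Hadoop installation. The helper name looks_like_hadoop_home is hypothetical.

```shell
#!/bin/sh
# Succeeds when the given directory looks like a Hadoop installation,
# i.e. it is non-empty and contains an executable bin/hadoop launcher.
looks_like_hadoop_home() {
    [ -n "$1" ] && [ -x "$1/bin/hadoop" ]
}

if looks_like_hadoop_home "${HADOOP_HOME:-}"; then
    echo "HADOOP_HOME is usable: $HADOOP_HOME"
else
    echo "HADOOP_HOME is unset or has no bin/hadoop: '${HADOOP_HOME:-}'" >&2
fi
```

With the values from Step 2, HADOOP_HOME should be /usr/local/hadoop/hadoop-0.20.1.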



Now that all the configuration is complete, you can launch the Hive console with the following command,

$ hive



Hive is now running, and you can run HiveQL commands in its shell.
You can leave the Hive shell by entering exit; (including the semicolon).
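Hive can also run a single statement without opening the interactive shell, which is a quick way to confirm that the installation works. The following is a minimal sketch using the -e option (execute a quoted statement) and the -S option (silent mode) of the hive command. The function name run_hiveql and the HIVE_BIN override are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Run one HiveQL statement non-interactively and print its result.
# HIVE_BIN is a hypothetical override for the launcher path; it
# defaults to the "hive" command found on PATH.
run_hiveql() {
    "${HIVE_BIN:-hive}" -S -e "$1"
}

# Only run the smoke test when the launcher is actually available.
if command -v "${HIVE_BIN:-hive}" >/dev/null 2>&1; then
    run_hiveql 'SHOW TABLES;'
fi
```

On a fresh installation SHOW TABLES; returns no rows, so an empty result without an error message is the expected outcome.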


Note: Though Hive is compatible with most Hadoop versions, you should verify the compatibility of your Hadoop and Hive versions before installing Hive.


Optional Step:

You can also change your Hive site settings or metastore server location by editing the "hive-site.xml" file in the "conf" folder of your Hive installation.

For my installation I used MySQL as the metastore database; my "hive-site.xml" file looks like this:


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<!-- Hive Configuration can either be stored in this file or in the hadoop configuration files -->
<!-- that are implied by Hadoop setup variables. -->
<!-- Aside from Hadoop setup variables - this file is provided as a convenience so that Hive -->
<!-- users do not have to edit hadoop configuration files (that may be managed as a centralized -->
<!-- resource). -->

<!-- Hive Execution Parameters -->

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://127.0.0.1/metastore?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
</property>

<property>
  <name>hive.hwi.war.file</name>
  <value>/usr/local/hive/hive-0.9.0/lib/hive-hwi-0.9.0.jar</value>
  <description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>

<property>
  <name>datanucleus.fixedDatastore</name>
  <value>true</value>
</property>

<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://127.0.0.1:9083</value>
  <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>

</configuration>
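The driver class com.mysql.jdbc.Driver in this configuration comes from the MySQL Connector/J jar, which Hive has to find on its classpath; a common approach is to copy the jar into the lib folder of the Hive installation. The following is a minimal sketch that checks for it, assuming the jar file name starts with mysql-connector-java and that HIVE_HOME is set as in Step 2. The helper name has_mysql_driver is hypothetical.

```shell
#!/bin/sh
# Succeeds when at least one MySQL Connector/J jar is present in the
# given directory. The file name pattern is an assumption.
has_mysql_driver() {
    for jar in "$1"/mysql-connector-java*.jar; do
        # If the glob matched nothing, $jar is the literal pattern
        # and the existence test fails.
        [ -e "$jar" ] && return 0
    done
    return 1
}

# Fall back to the install path used in this tutorial if HIVE_HOME is unset.
HIVE_LIB="${HIVE_HOME:-/usr/local/hive/hive-0.9.0-bin}/lib"
if has_mysql_driver "$HIVE_LIB"; then
    echo "MySQL JDBC driver found in $HIVE_LIB"
else
    echo "No MySQL JDBC driver in $HIVE_LIB; copy the Connector/J jar there." >&2
fi
```

Because hive.metastore.uris points at thrift://127.0.0.1:9083, a metastore service also has to be listening on that port before the Hive shell can connect; hive --service metastore starts one.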


We hope this gives you a clear idea of how to install Hive on Ubuntu Linux. If you have questions about the software, you can ask Hadoop consultants, and you can stay in touch with us for more information on Hadoop development.