Learn Apache Hive installation on Ubuntu Linux (14.04 LTS)
Apache Hive is data warehouse software built on top of Hadoop. It supports data analysis, data summarization, and the management of large datasets. In this tutorial, hadoop development & integration consultants explain the basics of Apache Hive and how to install it on Ubuntu Linux, including the prerequisites you need before installation. Follow the steps in this article to install Hive and take a first tour of it.

What is Apache Hive?
- Apache Hive is data warehouse software built on top of Hadoop.
- Hive facilitates querying, data analysis, data summarization and managing large datasets residing in distributed storage (usually HDFS; it is also compatible with the Amazon S3 filesystem).
- Hive also provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL.
- HiveQL also allows traditional map/reduce programmers to plug in their custom mappers and reducers.
Conventions that I followed in this article:
commands
configuration file names
Lines to be inserted in files.
Basic Prerequisites
- Java
- Hadoop
This tutorial has been tested using the following environment.
OS : Ubuntu Linux(14.04 LTS) – 64bit
Java : Oracle Java 1.7.0_75
Hadoop : Hadoop - 0.20.1
Hive : Hive – 0.9.0
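Before continuing, it can help to confirm that the prerequisites are reachable from your terminal. The following is a minimal sketch; check_tool is a helper name made up for this example, and it only reports whether a command is on the PATH, not whether its version matches the environment listed above.

```shell
# check_tool NAME: report whether the command NAME can be found on the PATH.
# (check_tool is a hypothetical helper written for this sketch.)
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: NOT found"
        return 1
    fi
}

# The two prerequisites of this tutorial. "|| true" keeps the script going
# so that both results are always printed.
check_tool java   || true
check_tool hadoop || true
```

If both are found, running "java -version" and "hadoop version" shows the installed versions.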
Note:
By default, Hive uses an embedded Apache Derby database to store the metastore. If you want to use another RDBMS for the metastore, you must either install that particular RDBMS on your system or have the URL of the RDBMS used by your metastore server (in the case of a central metastore server).
In order to install Hive, please follow the below steps.
Step 1: Download Apache Hive & Extract it.
- Download Hive from here
- Change into the directory where Hive was downloaded. By default this is ~/Downloads
$ cd ~/Downloads/
- Extract the tar file using the following command
$ tar -xzvf hive-0.9.0-bin.tar.gz
- Create a directory in /usr/local using the following command (you can also use a location of your choice).
$ sudo mkdir /usr/local/hive
- Move the extracted hive-0.9.0-bin folder to the newly created directory using the following command (sudo is needed because the directory was created with sudo)
$ sudo mv hive-0.9.0-bin /usr/local/hive
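The extract and move commands above can also be combined by extracting straight into the target directory. This is only a sketch: unpack_hive is a function name invented for this example, and both of its arguments are placeholders that you should adapt to your own paths.

```shell
# unpack_hive TARBALL DEST: extract a Hive release archive into DEST.
# (unpack_hive is a hypothetical helper written for this sketch.)
unpack_hive() {
    tarball=$1
    dest=$2
    if [ ! -f "$tarball" ]; then
        echo "archive not found: $tarball" >&2
        return 1
    fi
    # Create DEST if needed, then extract the archive inside it.
    mkdir -p "$dest" && tar -xzf "$tarball" -C "$dest"
}

# Example call (needs write access to the destination, e.g. from a root shell):
# unpack_hive ~/Downloads/hive-0.9.0-bin.tar.gz /usr/local/hive
```

With the paths from this tutorial, the result is the same /usr/local/hive/hive-0.9.0-bin folder that the mv command produces.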
Step 2: Edit ".bashrc" file to update environment variables for user.
$ gedit ~/.bashrc
Add the following lines to the file, preferably at the end.
export HADOOP_HOME=/usr/local/hadoop/hadoop-0.20.1
export PATH=$PATH:$HADOOP_HOME/bin
export HIVE_HOME=/usr/local/hive/hive-0.9.0-bin
export PATH=$PATH:$HIVE_HOME/bin
My ".bashrc" file looks like this: (screenshot not available)
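After saving ".bashrc", the new variables only apply to terminals opened afterwards, unless you reload the file. The sketch below checks whether the two bin directories ended up on the PATH; path_has is a helper name made up for this example.

```shell
# In the terminal where you edited the file, first reload it with:
#   source ~/.bashrc

# path_has DIR: succeed if DIR is one of the entries in $PATH.
# (path_has is a hypothetical helper written for this sketch.)
path_has() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Check the two directories added in this step. An "unset" prefix in the
# output means the corresponding variable is not defined in this shell.
for dir in "${HADOOP_HOME:-unset}/bin" "${HIVE_HOME:-unset}/bin"; do
    if path_has "$dir"; then
        echo "on PATH: $dir"
    else
        echo "not on PATH: $dir"
    fi
done
```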

Step 3: Create Hive directories within HDFS.
Start Hadoop if it is not already running, and make sure that it is not in safe mode.
$ hadoop fs -mkdir /user/hive/warehouse
The directory "warehouse" is where Hive stores its tables and the data related to them.
$ hadoop fs -mkdir /tmp
The temporary directory "tmp" is where Hive stores the intermediate results of processing.
Step 4: Set Permissions for read/write on those folders.
$ hadoop fs -chmod g+w /user/hive/warehouse
$ hadoop fs -chmod g+w /tmp
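Steps 3 and 4 can be sketched as one small script that also reports the safe mode state before it writes to HDFS. This is an illustration, not the author's script: HADOOP_CMD and make_hive_dir are names invented here, and HADOOP_CMD simply defaults to the hadoop command used above.

```shell
# HADOOP_CMD is a hypothetical override used only by this sketch;
# it defaults to the real hadoop command.
HADOOP_CMD=${HADOOP_CMD:-hadoop}

# make_hive_dir DIR: create DIR in HDFS and make it group-writable.
# mkdir may complain if DIR already exists; chmod is applied either way.
make_hive_dir() {
    "$HADOOP_CMD" fs -mkdir "$1"
    "$HADOOP_CMD" fs -chmod g+w "$1"
}

if command -v "$HADOOP_CMD" >/dev/null 2>&1; then
    # Print whether the NameNode is currently in safe mode.
    "$HADOOP_CMD" dfsadmin -safemode get
    make_hive_dir /user/hive/warehouse
    make_hive_dir /tmp
else
    echo "$HADOOP_CMD not found on PATH; revisit Step 2"
fi
```

If the safe mode report says it is ON, wait until Hadoop leaves safe mode before creating the directories.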
Step 5: Set Hadoop path in Hive
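This step names the goal but not the commands. One common approach, assumed here rather than taken from the author, is to export HADOOP_HOME in the hive-config.sh script in Hive's bin folder. The sketch below appends the line only if the file does not already export it; add_hadoop_home is a helper name made up for this example, and the paths are the ones used in Step 2.

```shell
# add_hadoop_home FILE DIR: append "export HADOOP_HOME=DIR" to FILE
# unless FILE already exports HADOOP_HOME.
# (add_hadoop_home is a hypothetical helper written for this sketch.)
add_hadoop_home() {
    if grep -q '^export HADOOP_HOME=' "$1"; then
        echo "HADOOP_HOME already set in $1"
    else
        echo "export HADOOP_HOME=$2" >> "$1"
    fi
}

# Assumed file location; adjust it to your installation.
config="${HIVE_HOME:-/usr/local/hive/hive-0.9.0-bin}/bin/hive-config.sh"
if [ -w "$config" ]; then
    add_hadoop_home "$config" /usr/local/hadoop/hadoop-0.20.1
else
    echo "cannot write $config (missing file or insufficient permissions)"
fi
```

If HADOOP_HOME is already exported in ".bashrc" as in Step 2, Hive may pick it up from the environment without this change.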

Now that all the configuration is complete, you can launch the Hive console with the following command:
$ hive

Now that Hive is running, you can run Hive shell commands.
You can leave the Hive shell by using the exit; command.
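Besides the interactive shell, the hive command can run a single statement passed with its -e option, which is a quick way to confirm that the installation responds. The following is a minimal sketch; HIVE_CMD and run_hiveql are names invented for this example, and SHOW TABLES is used only because it works on an empty installation.

```shell
# HIVE_CMD is a hypothetical override used only by this sketch;
# it defaults to the real hive command.
HIVE_CMD=${HIVE_CMD:-hive}

# run_hiveql STATEMENT: run one HiveQL statement without opening
# the interactive shell.
run_hiveql() {
    "$HIVE_CMD" -e "$1"
}

if command -v "$HIVE_CMD" >/dev/null 2>&1; then
    run_hiveql 'SHOW TABLES;'
else
    echo "$HIVE_CMD not found on PATH; revisit Step 2"
fi
```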
Note:
Though Hive is compatible with most Hadoop versions, you should cross-verify the compatibility of your Hadoop and Hive versions before installing Hive.
Optional Step:
You can also change your Hive site settings or metastore server location by editing the "hive-site.xml" file in the "/conf" folder of your Hive installation.
For my installation, I used MySQL as the metastore server. My "hive-site.xml" file looks like this:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Hive Configuration can either be stored in this file or in the hadoop configuration files -->
<!-- that are implied by Hadoop setup variables. -->
<!-- Aside from Hadoop setup variables - this file is provided as a convenience so that Hive -->
<!-- users do not have to edit hadoop configuration files (that may be managed as a centralized -->
<!-- resource). -->
<!-- Hive Execution Parameters -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://127.0.0.1/metastore?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
</property>
<property>
<name>hive.hwi.war.file</name>
<value>/usr/local/hive/hive-0.9.0/lib/hive-hwi-0.9.0.jar</value>
<description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://127.0.0.1:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
</configuration>
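A configuration like the one above names com.mysql.jdbc.Driver as the driver class, so the MySQL JDBC driver jar has to be available to Hive, typically in the lib folder of the installation. The sketch below checks for it; has_mysql_driver is a helper name made up for this example, and the jar name pattern is an assumption based on how the connector is usually named.

```shell
# has_mysql_driver LIBDIR: succeed if LIBDIR holds a MySQL connector jar.
# (has_mysql_driver is a hypothetical helper; the file name pattern
# is an assumption, not something stated in this tutorial.)
has_mysql_driver() {
    for jar in "$1"/mysql-connector-java*.jar; do
        [ -f "$jar" ] && return 0
    done
    return 1
}

# Assumed lib location; adjust it to your installation.
lib="${HIVE_HOME:-/usr/local/hive/hive-0.9.0-bin}/lib"
if has_mysql_driver "$lib"; then
    echo "MySQL JDBC driver found in $lib"
else
    echo "no MySQL JDBC driver in $lib; copy the connector jar there"
fi
```

Because hive.metastore.uris points at thrift://127.0.0.1:9083, a metastore service also has to be listening on that port; it can be started with "hive --service metastore".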
We hope this gives you a clear idea of how to install Hive on Ubuntu Linux. You can approach hadoop consultants with any doubts or questions, and stay in touch with us for more information about hadoop development. With these steps, you should be able to install Hive easily and start learning it.