Learn Apache Hive installation on Ubuntu Linux (14.04 LTS)
Apache Hive is data warehouse software built on top of Hadoop. It supports functions such as data analysis, management of large datasets, and data summarization. In this tutorial, Hadoop development and integration consultants explain the basics of Apache Hive and how to install it on Ubuntu Linux. They list the basic prerequisites for the installation, and you can install Hive step by step by following the guidelines in this article.
What is Apache Hive?
- Apache Hive is data warehouse software built on top of Hadoop.
- Hive facilitates querying, data analysis, data summarization, and management of large datasets residing in distributed storage (usually HDFS; Hive is also compatible with the Amazon S3 filesystem).
- Hive also provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL.
- HiveQL also allows traditional map/reduce programmers to plug in their custom mappers and reducers.
This tutorial has been tested in the following environment:
OS : Ubuntu Linux (14.04 LTS), 64-bit
Java : Oracle Java 1.7.0_75
Hadoop : Hadoop 0.20.1
Hive : Hive 0.9.0
By default, Hive stores its metastore in an embedded Apache Derby database. If you want to use another RDBMS for the metastore, you must either install that RDBMS on your system or have the URL of the RDBMS that hosts your metastore (in the case of a central metastore server).
To install Hive, follow the steps below.
Step 1: Download Apache Hive and extract it.
- Download the Hive 0.9.0 binary release (hive-0.9.0-bin.tar.gz) from the Apache Hive download page.
- Change into the directory where Hive was downloaded. By default, this is ~/Downloads.
$ cd ~/Downloads/
- Extract the tar file using the following command:
$ tar -xzvf hive-0.9.0-bin.tar.gz
- Create a directory in /usr/local (or any other location you prefer) using the following command:
$ sudo mkdir /usr/local/hive
- Move the extracted hive-0.9.0-bin folder into the newly created directory using the following command. sudo is required because /usr/local/hive was created by root.
$ sudo mv hive-0.9.0-bin /usr/local/hive
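The download-and-extract steps above can be wrapped in a small reusable shell function. This is a minimal sketch, not part of the original tutorial. The function name `install_hive` is hypothetical. It assumes the tarball contains a single top-level folder (as hive-0.9.0-bin.tar.gz does), and it falls back to `sudo` only when the install prefix is not writable by the current user.

```shell
# install_hive TARBALL [PREFIX]   (hypothetical helper, sketch only)
# Extracts TARBALL into a scratch directory, then moves its single top-level
# folder (e.g. hive-0.9.0-bin) into PREFIX (default /usr/local/hive).
# Prints the final install path on success.
install_hive() {
  local tarball="$1"
  local prefix="${2:-/usr/local/hive}"
  local scratch top sudo_cmd=""

  scratch="$(mktemp -d)"
  tar -xzf "$tarball" -C "$scratch" || return 1

  # Expect exactly one top-level directory inside the archive.
  set -- "$scratch"/*
  [ "$#" -eq 1 ] && [ -d "$1" ] || {
    echo "expected one top-level directory in $tarball" >&2
    return 1
  }
  top="$(basename "$1")"

  # Create the prefix; use sudo for root-owned locations such as /usr/local.
  if ! mkdir -p "$prefix" 2>/dev/null || [ ! -w "$prefix" ]; then
    sudo_cmd="sudo"
    sudo mkdir -p "$prefix"
  fi

  # Because PREFIX already exists, mv places the folder inside it.
  $sudo_cmd mv "$scratch/$top" "$prefix/"
  rmdir "$scratch"
  echo "$prefix/$top"
}
```

For example, `install_hive ~/Downloads/hive-0.9.0-bin.tar.gz` should print `/usr/local/hive/hive-0.9.0-bin`. That matches the layout produced by the manual `mkdir` and `mv` commands.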
Step 2: Edit the ".bashrc" file to update the user's environment variables.
$ gedit ~/.bashrc
Add the following lines to the file, preferably at the end.
My ".bashrc" file looks like this:
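A typical set of Hive additions to ~/.bashrc exports HIVE_HOME and appends $HIVE_HOME/bin to PATH, so the `hive` command can be found. The sketch below assumes HIVE_HOME is /usr/local/hive/hive-0.9.0-bin, which is where the Step 1 `mv` places the extracted folder. The helper name `add_hive_env` is hypothetical. It appends the lines only once, so re-running it does not duplicate them.

```shell
# add_hive_env [RCFILE] [HIVE_HOME]   (hypothetical helper, sketch only)
# Appends HIVE_HOME and PATH exports to RCFILE (default ~/.bashrc),
# unless a HIVE_HOME export is already present.
add_hive_env() {
  local rcfile="${1:-$HOME/.bashrc}"
  local hive_home="${2:-/usr/local/hive/hive-0.9.0-bin}"

  if grep -q '^export HIVE_HOME=' "$rcfile" 2>/dev/null; then
    echo "HIVE_HOME already set in $rcfile; leaving it unchanged" >&2
    return 0
  fi

  # $hive_home is expanded now; \$PATH and \$HIVE_HOME are written literally
  # so the shell expands them each time .bashrc is loaded.
  cat >> "$rcfile" <<EOF

# Apache Hive
export HIVE_HOME=$hive_home
export PATH=\$PATH:\$HIVE_HOME/bin
EOF
}
```

After running `add_hive_env`, reload the file with `source ~/.bashrc` (or open a new terminal) so the variables take effect.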
Step 3: Create Hive directories within HDFS.
Now start Hadoop if it is not already running, and make sure that HDFS is up and not in safe mode.
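One way to check the "running and not in safe mode" condition is to ask the NameNode for its safe-mode status. This sketch uses the Hadoop 0.20/1.x `hadoop dfsadmin -safemode get` form, which prints "Safe mode is ON" or "Safe mode is OFF". Newer Hadoop releases spell it `hdfs dfsadmin`. The function name `hdfs_ready` is hypothetical.

```shell
# hdfs_ready   (hypothetical helper, sketch only)
# Succeeds when the NameNode answers and reports that safe mode is OFF.
hdfs_ready() {
  local status
  status="$(hadoop dfsadmin -safemode get 2>/dev/null)" || {
    echo "HDFS is not reachable; start Hadoop first (e.g. start-all.sh)" >&2
    return 1
  }
  case "$status" in
    *"Safe mode is OFF"*) return 0 ;;
    *) echo "NameNode is still in safe mode: $status" >&2; return 1 ;;
  esac
}
```

If the NameNode has just started, `hadoop dfsadmin -safemode wait` blocks until safe mode is turned off, which can be used instead of polling.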
$ hadoop fs -mkdir /user/hive/warehouse
The "warehouse" directory is where Hive stores its tables and table data.
$ hadoop fs -mkdir /tmp
The temporary directory "tmp" is where Hive stores intermediate results of processing.
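The two `mkdir` commands above can be made safe to re-run by skipping directories that already exist. This is a minimal sketch with a hypothetical function name, `make_hive_dirs`. It relies on `hadoop fs -test -d` and `hadoop fs -mkdir`. On Hadoop 0.20/1.x, `fs -mkdir` creates missing parent directories; on Hadoop 2.x and later, `-mkdir -p` is needed instead.

```shell
# make_hive_dirs   (hypothetical helper, sketch only)
# Creates the HDFS directories Hive uses, skipping any that already exist.
make_hive_dirs() {
  local dir
  for dir in /tmp /user/hive/warehouse; do
    if hadoop fs -test -d "$dir" 2>/dev/null; then
      echo "exists: $dir"
    else
      hadoop fs -mkdir "$dir" || return 1
      echo "created: $dir"
    fi
  done
}
```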
Step 4: Give the group write permission on those folders.
$ hadoop fs -chmod g+w /user/hive/warehouse
$ hadoop fs -chmod g+w /tmp
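To confirm the permission change, list the parent directories afterwards. The permission string of each entry (for example `drwxrwxr-x`) should then show `w` in the group position. This sketch wraps both `chmod` calls and the listings in a hypothetical function, `open_hive_dirs`.

```shell
# open_hive_dirs   (hypothetical helper, sketch only)
# Grants group write on Hive's HDFS directories, then lists the parents
# so the new permission strings can be checked.
open_hive_dirs() {
  local dir
  for dir in /tmp /user/hive/warehouse; do
    hadoop fs -chmod g+w "$dir" || return 1
  done
  # Listing a parent directory shows the permissions of its children.
  hadoop fs -ls /
  hadoop fs -ls /user/hive
}
```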
Step 5: Set Hadoop path in Hive
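One common way to tell Hive where Hadoop lives is to export HADOOP_HOME in Hive's `bin/hive-config.sh`, which the `hive` launcher script sources. The sketch below assumes Hadoop is installed in /usr/local/hadoop; that is a hypothetical path, so adjust it to your installation. The helper name `set_hadoop_home` is also hypothetical. It refuses to write the setting if `bin/hadoop` is not found under the given directory.

```shell
# set_hadoop_home [CONFIG_SH] [HADOOP_DIR]   (hypothetical helper, sketch only)
# Appends "export HADOOP_HOME=..." to Hive's bin/hive-config.sh.
# Default paths are assumptions, not taken from the original tutorial.
set_hadoop_home() {
  local config_sh="${1:-$HIVE_HOME/bin/hive-config.sh}"
  local hadoop_dir="${2:-/usr/local/hadoop}"

  # Catch typos early: the directory must contain an executable bin/hadoop.
  if [ ! -x "$hadoop_dir/bin/hadoop" ]; then
    echo "no executable $hadoop_dir/bin/hadoop" >&2
    return 1
  fi
  printf '\nexport HADOOP_HOME=%s\n' "$hadoop_dir" >> "$config_sh"
}
```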
Now that all the configuration is complete, you can launch the Hive console.
You can launch the Hive console with the following command:
$ hive
Once Hive is running, you can run HiveQL commands.
You can exit the Hive shell with the exit; command.
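Besides the interactive console, `hive -e` runs HiveQL statements non-interactively. That makes it handy as a quick smoke test of a fresh installation. In this sketch, the table name `smoke_test` and the function name `hive_smoke_test` are hypothetical. The table is created and then dropped again.

```shell
# hive_smoke_test   (hypothetical helper, sketch only)
# Creates, inspects, and drops a throwaway table using "hive -e".
hive_smoke_test() {
  hive -e "
    CREATE TABLE IF NOT EXISTS smoke_test (id INT, name STRING);
    SHOW TABLES;
    DESCRIBE smoke_test;
    DROP TABLE smoke_test;
  "
}
```

If this fails with a metastore error, check the metastore settings first. With the default embedded Derby metastore, also check that no other Hive session is running from the same directory.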
Hive is compatible with most Hadoop versions, but you should cross-verify the compatibility of your Hadoop and Hive versions before installing.
You can also change your Hive settings or metastore server location by editing the hive-site.xml file in the "conf" folder of your Hive installation.
In my installation, I used MySQL as the metastore server. My hive-site.xml file looks like this:
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- Hive Configuration can either be stored in this file or in the hadoop configuration files -->
<!-- that are implied by Hadoop setup variables. -->
<!-- Aside from Hadoop setup variables - this file is provided as a convenience so that Hive -->
<!-- users do not have to edit hadoop configuration files (that may be managed as a centralized -->
<!-- resource). -->
<!-- Hive Execution Parameters -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>...</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>...</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>hive.hwi.war.file</name>
  <value>...</value>
  <description>This is the WAR file with the jsp content for Hive Web Interface</description>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>...</value>
  <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
</configuration>
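A minimal hive-site.xml for a MySQL-backed metastore can also be generated with a short script. The sketch below writes the four JDBC connection properties. The host, database name, user, and password arguments are placeholders. `com.mysql.jdbc.Driver` is the MySQL Connector/J 5.x driver class, and the Connector/J jar must also be copied into $HIVE_HOME/lib (this script does not do that). The helper name `write_hive_site` is hypothetical. Argument values are written without XML escaping, so they must not contain `&` or `<`.

```shell
# write_hive_site CONF_DIR DB_HOST DB_NAME DB_USER DB_PASS
# (hypothetical helper, sketch only; all values are placeholders)
# Writes CONF_DIR/hive-site.xml pointing the Hive metastore at MySQL.
write_hive_site() {
  local conf_dir="$1" host="$2" db="$3" user="$4" pass="$5"
  mkdir -p "$conf_dir"
  cat > "$conf_dir/hive-site.xml" <<EOF
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://$host/$db?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>$user</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>$pass</value>
  </property>
</configuration>
EOF
}
```

For example, `write_hive_site "$HIVE_HOME/conf" localhost metastore hiveuser hivepass` would write the file into the conf folder mentioned above. Settings such as hive.metastore.uris or hive.hwi.war.file can be added in the same `<property>` form if you need them.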
I hope you now have a clear idea of how to install Hive on Ubuntu Linux. If you have questions about the software, you can contact Hadoop consultants to clear your doubts. For more information about Hadoop development, stay in touch with us! I hope you can install Hive easily and enjoy learning it.