Hadoop Tutorial: how to create example tables in HBase

Hue brings another new app for making Apache Hadoop easier to use: HBase Browser. Apache HBase is the main keyvalue datastore for Hadoop. This post is the first episode describing the new user experience brought by the app. We will start by describing how to create some sample tables with various HBase schemas.

If you want to see the HBase Browser demo, jump episode 2!


Tutorial

When building the new HBase Browser, we wanted to test the app against various HBase tables. It happened to be difficult to find some ready to play with schema and data to load. Hence, base on the most common uses cases, we created our our own HBase schemas and decided to share them in order to help anybody wanting to start with HBase.


This how-to describes how to create a very simple table that counts the daily number of votes for certain candidates and get you warmed-up. Then part 2 focuses on creating a HBase table  with lot of columns and part 3 about inserting and visualizing binary data.



Setup

The HBase Browser application is tailored for quickly browsing huge tables and accessing any content. You can also create new tables, add data, modify existing cells and filter data with the autocompleting search bar.


The first step is to install HBase in your Hadoop cluster. We recommend to use the CDH packages. HBase Browser requires the Thrift 1 service to be started.


Then, grab the app from a special tarball release of Hue or get the latest and slickest version from the nightly ‘hue’ package. CDH 4.4 (target date early September) will bring a stable v1. After the installation, if HBase master is not running on the same host as Hue, have the app pointing to it by updating the hue.ini and restarting Hue.


Then go to http://127.0.0.1:8888/hbase/ to check that all is setup correctly! We show in the video how to create a table and add some columns in just a few clicks. In the next steps, we are showing how to create and populate a real life example table.


The sample data and scripts are published on github. In a terminal, use git to retrieve the repository:

cd /tmp
git clone https://github.com/romainr/hadoop-tutorials-examples.git
cd hbase-tables


Analytics table

The goals of this data is to show the search and smart layout of HBase Browser.


This table contains more than 1000 columns of text. The idea is to have counters for 3 Web domains of 3 countries for each hour of the day. The data is then aggregated by day and for all the countries.


image

Schema of the table


How to create the HBase table and insert some data:


  1. Generate column names and data with create_schemas.py. Run it with ./create_schemas.py

  2. Upload the date data /tmp/hbase-analytics.tsv to HDFS with File Browser

  3. In HBase Browser create a ‘analytics’ table with 3 column families ‘hour’, ‘day’, ‘total

  4. Load the data into the analytics table with the HBase bulk import command.


It will trigger a MapReduce job and display the progress of the import.

That’s it! Go open the analytics table in HBase Browser!


Binary table

This second tables focus on big data cells, various formats, demonstrating the preview and editing of data within HBase Browser.

We are re-using the app API for inserting into HBase some cells of various content types, e.g. text, json, pictures, binary…


  1. First create a table ‘events’ with a column family ‘doc’.

  2. Then cd in the root of Hue

  3. cd /usr/share/hue

  4. /opt/cloudera/parcels/CDH-4.X/share/hue (if using parcels)

And start the Hue shell build/env/bin/hue shell and type the content of locad_binary.py:


Load the HBase API and insert some text data:


from hbase.api import HbaseApi

HbaseApi().putRow('Cluster', 'events', 'hue-20130801', {'doc:txt': 'Hue is awesome!'})
HbaseApi().putRow('Cluster', 'events', 'hue-20130801', {'doc:json': '{"user": "hue", "coolness": "extra"}'})
HbaseApi().putRow('Cluster', 'events', 'hue-20130802', {'doc:version': 'I like HBase'})
HbaseApi().putRow('Cluster', 'events', 'hue-20130802', {'doc:version': 'I LOVE HBase'})


Then insert a picture, and HTML page and a PDF:

root='/tmp/hadoop-tutorials-examples'

HbaseApi().putRow('Cluster', 'events', 'hue-20130801', {'doc:img': open(root + '/hbase-tables/data/hue-logo.png', "rb").read()})
HbaseApi().putRow('Cluster', 'events', 'hue-20130801', {'doc:html': open(root + '/hbase-tables/data/gethue.com.html', "rb").read()})
HbaseApi().putRow('Cluster', 'events', 'hue-20130801', {'doc:pdf': open(root + '/hbase-tables/data/gethue.pdf', "rb").read()})


Notice that the column names do not matter for the type detection. The go look at the events table and play around!


Conclusion

These two schemas and data enable the user to easily get started with HBase. This first version of HBase Browser brings a new way to quickly explore and search for some rows and columns. New versions will support bulk loads and upload in order to completely free the user from the command line.


The new HBase Browser app will be demo-ed on these two tables in the upcoming blog posts, so stay tuned!


This article was originally posted 1 year ago.

Tags: hbase video tutorial


comments powered by Disqus

Blog Archive

Browse archive

Blog Tags

loading...