Hadoop tutorial: High Availability of Hue

Very few projects within the Hadoop umbrella have as much end user visibility as Hue. Thus, it is useful to add a degree of fault tolerance to deployments. This blog post describes how to achieve a higher level of availability (HA) by placing several Hue instances behind a load balancer.

Tutorial

This tutorial demonstrates how to set up high availability by:

  1. Installing Hue 2.3 on two nodes in a three-node RedHat 5 cluster.

  2. Managing all Hue instances via Cloudera Manager 4.7.

  3. Load balancing using HAProxy 1.4. In practice, any load balancer with sticky sessions should work.



Installing Hue

Hue should be installed on two of the three nodes. To have Cloudera Manager automatically install Hue, follow the “Parcel Install via Cloudera Manager” section. To install manually, follow the “Package Install” section.

Parcel Install via Cloudera Manager

For more information on Parcels, see Managing Parcels.

  1. From Cloudera Manager, click on “Hosts” in the menu. Then, go to the “Parcels” section.

  2. Find the latest CDH parcel and click “Download”.

  3. Once the parcel has finished downloading, click “Distribute”.

  4. Once the parcel has finished distributing, click “Activate”.

Package Install

  1. Download the yum repository RPM.

  2. Install the yum repository using “sudo yum --nogpgcheck localinstall cloudera-cdh-4-0.x86_64.rpm”. For more information, see Installing CDH4.

  3. Install Hue on each node by running “sudo yum install hue”. For more information on installing Hue, see the CDH documentation.

Managing Hue through Cloudera Manager

Cloudera Manager provides management of the Hue servers on each node. Add two Hue services using the directions below. For more information on managing services, see the Cloudera Manager documentation.

  1. Go to “Services -> All Services” in the menu.

  2. Click “Actions -> Add a Service”.

  3. Select “Hue” and follow the steps on the screen. NOTE: Choose a different host for each Hue service.

  4. Ensure that the “Jobsub Examples and Templates Directory” configuration points to different directories in HDFS for each Hue service. It can be changed by going to Services -> <hue service>. In the menu, go to Configuration -> View and Edit. Then, click on “Hue Server”. “Jobsub Examples and Templates Directory” should be at the bottom of the page.



Image 1: Cloudera Manager handling two Hue services.


HAProxy Installation/Configuration

  1. Download and unzip the binary distribution of HAProxy 1.4 on the node that doesn’t have Hue installed.

  2. Add the following HAProxy configuration to /tmp/hahue.conf:

global
    daemon
    nbproc 1
    maxconn 100000
    log 127.0.0.1 local6 debug

defaults
    option http-server-close
    mode http
    timeout http-request 5s
    timeout connect 5s
    timeout server 10s
    timeout client 10s

listen Hue 0.0.0.0:80
    log global
    mode http
    stats enable
    balance source
    server hue1 servera.cloudera.com:8888 cookie ServerA check inter 2000 fall 3
    server hue2 serverb.cloudera.com:8888 cookie ServerB check inter 2000 fall 3

  3. Start HAProxy:

haproxy -f /tmp/hahue.conf
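
Once HAProxy is running, it can be useful to confirm that each Hue instance answers directly and through the load balancer. The following is a minimal sketch rather than part of the original setup: the helper names (http_status, probe, main) are hypothetical, the host names are the ones used in this tutorial, and treating any status below 500 as "up" is an assumption.

```python
import urllib.error
import urllib.request

# Host names and ports are the ones used in this tutorial; adjust as needed.
ENDPOINTS = [
    "http://servera.cloudera.com:8888/",  # hue1, contacted directly
    "http://serverb.cloudera.com:8888/",  # hue2, contacted directly
    "http://serverc.cloudera.com/",       # through HAProxy on port 80
]


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx or 5xx).
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar problems.
        return None


def probe(urls, fetch=http_status):
    """Map each URL to True if it answered with a status below 500."""
    results = {}
    for url in urls:
        status = fetch(url)
        results[url] = status is not None and status < 500
    return results


def main():
    """Print one UP/DOWN line per endpoint from this tutorial."""
    for url, ok in probe(ENDPOINTS).items():
        print(("UP   " if ok else "DOWN ") + url)
```

Calling main() from a machine that can resolve the three host names prints one line per endpoint. The fetch parameter exists only so the logic can be exercised without network access.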


The key configuration options are balance and server in the listen section. When balance is set to source, HAProxy hashes the client's IP address to select a server, so a client reaches the same server on every request as long as the set of running servers does not change. This stickiness is necessary because Hue stores session information in process memory. If the server a client is communicating with goes down, its requests are automatically sent to another active server. Each server line defines one server to be used for load balancing and takes the form:

server <name> <address>[:port] [settings ...]


In the configuration above, the server “hue1” is available at “servera.cloudera.com:8888” and “hue2” is available at “serverb.cloudera.com:8888”. Both servers have health checks every two seconds and are declared down after three failed health checks. In this example, HAProxy is configured to bind to “0.0.0.0:80”. Thus, Hue should now be available at “http://serverc.cloudera.com”.
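
To make the interaction between source balancing and health checks more concrete, here is a rough Python sketch of the idea. It is not HAProxy's implementation: HealthState and pick_server are hypothetical names, crc32 stands in for HAProxy's own hash function, and the rise value of 2 is an assumption because the configuration above only sets fall.

```python
import zlib


class HealthState:
    """Tracks consecutive health check results for one backend server."""

    def __init__(self, fall=3, rise=2):
        self.fall = fall  # failed checks in a row before marking the server down
        self.rise = rise  # passed checks in a row before marking it up again (assumed)
        self.up = True
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, check_passed):
        """Record one check result and return whether the server is up."""
        if check_passed == self.up:
            self._streak = 0  # result agrees with the current state
            return self.up
        self._streak += 1
        threshold = self.fall if self.up else self.rise
        if self._streak >= threshold:
            self.up = not self.up
            self._streak = 0
        return self.up


def pick_server(client_ip, servers, health):
    """Choose a server for client_ip among the servers currently up."""
    running = [name for name in servers if health[name].up]
    if not running:
        return None
    # Stand-in hash: the same client IP always maps to the same index
    # as long as the list of running servers is unchanged.
    index = zlib.crc32(client_ip.encode("ascii")) % len(running)
    return running[index]
```

With servers ["hue1", "hue2"], a given client IP keeps mapping to the same server until that server fails three checks in a row, at which point the client is routed to the remaining one.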

 

Conclusion

Hue can be load balanced easily as long as each client is consistently directed to the same server (i.e., sticky sessions). Load balancing can also improve performance, but the primary goal is high availability. In addition, multiple Hue instances can be easily managed through Cloudera Manager. For true high availability, Hue also needs to be configured to use a highly available MySQL, PostgreSQL, or Oracle database.


Coming up, there will be a blog post on JobTracker HA with Hue. Have any suggestions? Feel free to tell us what you think through hue-user.


This article was originally posted 11 months ago.

Tags: video tutorial

