Hadoop Tutorial: Schedule your Hadoop jobs intuitively with the new Oozie crontab!

Hue is taking advantage of a new way to specify the frequency of a coordinator in Oozie (OOZIE-1306). Here is how to put it into practice:

The crontab syntax requires Oozie 4. To keep the previous frequency drop-down from Oozie 3, the feature can be disabled in hue.ini:


  [oozie]
    # Use Cron format for defining the frequency of a Coordinator instead of
    # the old frequency number/unit.
    ## enable_cron_scheduling=true
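For reference, the coordinator frequency then uses the standard five-field cron syntax (minute, hour, day of month, month, day of week). A couple of illustrative values:

      # Every 20 minutes
      */20 * * * *

      # Every day at 2:30 PM
      30 14 * * *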


As usual feel free to comment on the hue-user list or @gethue!

This article was originally posted 7 months ago.

Tags: oozie tutorial video

Using Hadoop MR2 and YARN with an alternative Job Browser interface

Since version 3, Hue defaults to YARN.

First, configuring Hue with MR2 is a bit simpler than with MR1, as Hue no longer needs the Job Tracker plugin: YARN provides a REST API. YARN is also going to provide an equivalent of JobTracker HA with YARN-149.

Here is how to configure the clusters in hue.ini. If you are using a pseudo-distributed cluster, it will work by default; if not, you will just need to change all the localhost entries to the hostnames of the ResourceManager and History Server:


  [hadoop]

    # Configuration for YARN (MR2)
    # ------------------------------------------------------------------------
    [[yarn_clusters]]

      [[[default]]]
        # Enter the host on which you are running the ResourceManager
        resourcemanager_host=localhost

        # The port where the ResourceManager IPC listens on
        resourcemanager_port=8032

        # Whether to submit jobs to this cluster
        submit_to=True

        # URL of the ResourceManager API
        resourcemanager_api_url=http://localhost:8088

        # URL of the ProxyServer API
        proxy_api_url=http://localhost:8088

        # URL of the HistoryServer API
        history_server_api_url=http://localhost:19888

    # Configuration for MapReduce (MR1)
    # ------------------------------------------------------------------------
    [[mapred_clusters]]

      [[[default]]]
        # Whether to submit jobs to this cluster
        submit_to=False

And that’s it! You can now look at jobs in Job Browser, get logs and submit jobs to Yarn!
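The REST API that Job Browser reads from returns plain JSON. As a rough sketch of what consuming it looks like, here is a trimmed, made-up payload in the shape of the ResourceManager's /ws/v1/cluster/apps response (real responses carry many more fields):

```python
import json

# Hypothetical, trimmed sample of a /ws/v1/cluster/apps response
sample = """
{"apps": {"app": [
  {"id": "application_1390000000000_0001", "state": "FINISHED", "finalStatus": "SUCCEEDED"},
  {"id": "application_1390000000000_0002", "state": "RUNNING",  "finalStatus": "UNDEFINED"}
]}}
"""

# Extract the list of applications and keep only the running ones
apps = json.loads(sample)["apps"]["app"]
running = [a["id"] for a in apps if a["state"] == "RUNNING"]
print(running)  # ['application_1390000000000_0002']
```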

As usual feel free to comment on the hue-user list or @gethue!

This article was originally posted 9 months ago.

Tags: video tutorial yarn Job Browser

A new Spark Web UI: Spark App

Hi Spark Makers!

A Hue Spark application was recently created. It lets users execute and monitor Spark jobs directly from their browser and be more productive.

We previously released the app with an Oozie submission backend, but switched to the Spark Job Server (SPARK-818) contributed by Ooyala and Evan’s team at the last Spark Summit. This new server enables real interactivity with Spark and is closer to the community.


We hope to work with the community and have support for Python, Java, direct script submission without compiling/uploading and other improvements in the future!

As usual feel free to comment on the hue-user list or @gethue! For questions directly related to the Job Server, participate on the pull request, SPARK-818 or the Spark user list!

Get Started!

Currently only Scala jobs are supported: programs need to implement this trait and be packaged into a jar. Here is a WordCount example. To learn more about the Spark Job Server, check its README.
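Under the hood, a jar is first uploaded to the Job Server over REST, then jobs are started and polled by id. A minimal sketch of how those calls are assembled, assuming a local setup (endpoints as described in the Job Server README; the host, port and app name here are illustrative):

```python
SERVER = "http://localhost:8090"  # assumed local Job Server address

def upload_jar_url(app_name):
    # The jar is POSTed to /jars/<appName>, e.g.:
    #   curl --data-binary @job.jar http://localhost:8090/jars/wordcount
    return "%s/jars/%s" % (SERVER, app_name)

def run_job_url(app_name, class_path):
    # A job is started with POST /jobs?appName=<app>&classPath=<class>
    return "%s/jobs?appName=%s&classPath=%s" % (SERVER, app_name, class_path)

print(upload_jar_url("wordcount"))
print(run_job_url("wordcount", "spark.jobserver.WordCountExample"))
```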


We assume you have Scala installed on your system.

Get Spark Job Server

It currently lives on GitHub, on this branch:

git clone https://github.com/ooyala/incubator-spark.git spark-server
cd spark-server
git checkout -b jobserver-preview-2013-12 origin/jobserver-preview-2013-12

Then build and start it from SBT:

sbt/sbt
project jobserver
re-start

Get Hue

Currently only on github (will be in CDH5b2):


If Hue and the Spark Job Server are not on the same machine, update this hue.ini property in desktop/conf/pseudo-distributed.ini:

  [spark]
    # URL of the Spark Job Server.
    ## server_url=http://localhost:8090/

Get a Spark example to run

Then follow this walk-through and create the example jar that is used in the video demo.

This article was originally posted 9 months ago.

Tags: video tutorial spark

Hadoop Tutorial: Language assistant in Pig Editor with Navigator

In Hue 3.5, a new assistant was added to the Pig Editor: Navigator.

Similarly to the Hive and Impala Editors, functions and Pig statements are made directly available from within the editor:

Navigator is:

  • Sorted by category

  • Searchable

  • Auto-completable (as well as HDFS paths and Metastore tables)

So now, get started with Apache Pig!

This article was originally posted 10 months ago.

Tags: pig video tutorial

Hadoop Tutorial: Submit any Oozie jobs directly from HDFS

With HUE-1476, users can submit Oozie jobs directly from HDFS. Just upload your configuration or browse an existing workspace and select a workflow, coordinator or bundle. A submit button will appear and let you execute the job in one click!

File Browser supports:

  • Parameters from workflow.xml, coordinator.xml, bundle.xml

  • Parameters from job.properties

Oozie Dashboard supports:

  • Dynamic progress and log report

  • One click MapReduce log access

  • Stop, Pause, Rerun buttons

Here is the workflow tutorial used in the video demo.

Of course, the Oozie Editor is still recommended if you want to avoid any XML :)

This article was originally posted 10 months ago.

Tags: oozie video tutorial HDFS

Hadoop Tutorial: Create Hive tables with headers and load quoted CSV data

Hue makes it easy to create Hive tables.

With HUE-1746, Hue guesses the column names and types (int, string, float…) directly by looking at your data. If your data starts with a header row, it will automatically be used and skipped while creating the table.

Quoted CSV fields are also compatible thanks to HUE-1747.

Here is the data file used:


This is the SerDe for reading quoted CSV:


And the command to switch the SerDe used by the table:

ALTER TABLE banks SET SERDE 'com.bizo.hive.serde.csv.CSVSerde'
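To see why a CSV-aware SerDe matters, compare a naive comma split with a proper CSV parse on a quoted row (the record below is made up, in the spirit of the demo data):

```python
import csv
import io

line = 'Bank of America,"Charlotte, NC",100.5\n'

naive = line.strip().split(",")               # breaks on the embedded comma
parsed = next(csv.reader(io.StringIO(line)))  # honors the quotes

print(naive)   # 4 fields: the quoted value was split in two
print(parsed)  # ['Bank of America', 'Charlotte, NC', '100.5']
```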

Now go analyze the data with the Hive, Impala or Pig editors!

This article was originally posted 10 months ago.

Tags: hive metastore video tutorial

New ZooKeeper Browser app!

Hello animal lovers! In Hue 3, a new application was added to make Apache ZooKeeper easier to use: ZooKeeper Browser.

The app is not totally new: it is a rebase, from Hue 1 to Hue 3, of the ZooKeeper UI made by Andrei during his Google Summer of Code three years ago.

The main two features are:

  • Listing of the ZooKeeper cluster stats and clients

  • Browsing and editing of the ZNode hierarchy

ZooKeeper Browser requires the ZooKeeper REST service to be running on the same host as ZooKeeper itself. Here is how to set it up:

First get and build ZooKeeper:

git clone https://github.com/apache/zookeeper
cd zookeeper
ant
Buildfile: /home/hue/Development/zookeeper/build.xml

    [mkdir] Created dir: /home/hue/Development/zookeeper/build/classes
    [mkdir] Created dir: /home/hue/Development/zookeeper/build/lib
    [mkdir] Created dir: /home/hue/Development/zookeeper/build/package/lib
    [mkdir] Created dir: /home/hue/Development/zookeeper/build/test/lib

Then start the REST service:

cd src/contrib/rest
nohup ant run &

If ZooKeeper and the REST service are not on the same machine as Hue, please update the Hue settings and specify the correct hostnames and ports:



  [zookeeper]

    [[clusters]]

      [[[default]]]
        # Zookeeper ensemble. Comma separated list of Host/Port.
        # e.g. localhost:2181,localhost:2182,localhost:2183
        ## host_ports=localhost:2181

        # The URL of the REST contrib service
        ## rest_url=http://localhost:9998

And that’s it, head over to ZooKeeper Browser!

As usual feel free to comment on the hue-user list or @gethue!

This article was originally posted 11 months ago.

Tags: zookeeper video tutorial

DBQuery App: MySQL, PostgreSQL, Oracle and SQLite Query Editors

In Thailand, a brand new application that enables viewing data in MySQL, PostgreSQL, Oracle and SQLite was committed.

Inspired by the Beeswax application, it allows you to query a relational database and view the results in a table.

This article was originally posted 11 months ago.

Tags: video tutorial dbquery

Integrate external Web applications in any language

Completed in Thailand, HUE-826 brings a new way to integrate external Web applications into Hue. Java apps or already existing websites can now be shown as a Hue app with little effort.

For example, let’s integrate Tableau:

To create a new app:

build/env/bin/hue create_proxy_app my_hue http://gethue.com
tools/app_reg/app_reg.py --install my_hue --relative-paths

If you want to update the URL later, change it in the ini:


As usual feel free to comment on the hue-user list or @gethue!

This article was originally posted 11 months ago.

Tags: sdk video tutorial

Tutorial: Live demo of Search on Hadoop

Hue ships with some demo collections/indexes for the Search application.

To make the demo work by default, Hue uses a predefined Solr response. Hue displays a warning in this case, as the page is not updated when typing a query:

In order to query a live dataset, you need to index some data. On the Hue machine, go to:

cd apps/search/examples/bin

Then create the Solr collections:


In case Solr is not on the same machine, add this parameter in the script:
--solr http://localhost:8983/solr

Then index some example data with:


Similarly, if Solr is on a different machine, update the URL:

And that’s it! The above warning message will disappear and you will be able to query live Solr indexes!

This article was originally posted 1 year ago.

Tags: search tutorial
