Hadoop Tutorial: Schedule your Hadoop jobs intuitively with the new Oozie crontab!

Hue now takes advantage of a new way to specify the frequency of a coordinator in Oozie (OOZIE-1306). Here is how to put it into practice:
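For example, with the cron syntax a coordinator that runs every day at 2 AM can be declared like this minimal sketch (the name, dates, schema version and ${workflowPath} parameter are placeholders, not from the post):

<coordinator-app name="daily-report" frequency="0 2 * * *"
                 start="2013-01-01T00:00Z" end="2014-01-01T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <app-path>${workflowPath}</app-path>
    </workflow>
  </action>
</coordinator-app>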


The crontab syntax requires Oozie 4. To keep the previous frequency drop-down from Oozie 3, the feature can be disabled in hue.ini:


[oozie]
  # Use Cron format for defining the frequency of a Coordinator instead of the old frequency number/unit.
  enable_cron_scheduling=false

As usual feel free to comment on the hue-user list or @gethue!


This article was originally posted 4 months ago.

Tags: oozie tutorial video


Using Hadoop MR2 and YARN with an alternative Job Browser interface

Since version 3, Hue defaults to YARN.


First, configuring Hue with MR2 is a bit simpler than with MR1, as Hue no longer needs the JobTracker plugin: YARN provides a REST API. YARN is also going to provide an equivalent of JobTracker HA with YARN-149.
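For instance, the list of applications can be fetched straight from the ResourceManager REST API with a plain HTTP call (default port 8088 assumed):

curl http://localhost:8088/ws/v1/cluster/apps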

Here is how to configure the clusters in hue.ini. If you are using a pseudo-distributed cluster, it will work by default; if not, you will just need to change each localhost to the hostnames of the Resource Manager and History Server:


[hadoop]
  ...

  # Configuration for YARN (MR2)
  # ------------------------------------------------------------------------
  [[yarn_clusters]]

    [[[default]]]
      # Enter the host on which you are running the ResourceManager
      resourcemanager_host=localhost

      # The port where the ResourceManager IPC listens on
      resourcemanager_port=8032

      # Whether to submit jobs to this cluster
      submit_to=True

      # URL of the ResourceManager API
      resourcemanager_api_url=http://localhost:8088

      # URL of the ProxyServer API
      proxy_api_url=http://localhost:8088

      # URL of the HistoryServer API
      history_server_api_url=http://localhost:19888

  # Configuration for MapReduce (MR1)
  # ------------------------------------------------------------------------
  [[mapred_clusters]]

    [[[default]]]
   
      # Whether to submit jobs to this cluster
      submit_to=False



And that’s it! You can now look at jobs in Job Browser, get logs and submit jobs to YARN!


As usual feel free to comment on the hue-user list or @gethue!


This article was originally posted 6 months ago.

Tags: video tutorial yarn Job Browser


A new Spark Web UI: Spark App

Hi Spark Makers!


A Hue Spark application was recently created. It lets users execute and monitor Spark jobs directly from their browser and be more productive.


We previously released the app with an Oozie submission backend, but switched to the Spark Job Server (SPARK-818), contributed by Ooyala and Evan’s team at the last Spark Summit. This new server enables real interactivity with Spark and is closer to the community.

 

We hope to work with the community and add support for Python, Java, direct script submission without compiling/uploading, and other improvements in the future!


As usual feel free to comment on the hue-user list or @gethue! For questions directly related to the Job Server, participate on the pull request, SPARK-818 or the Spark user list!


Get Started!

Currently only Scala jobs are supported, and programs need to implement this trait and be packaged into a jar. Here is a WordCount example. To learn more about Spark Job Server, check its README.
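As a sketch of what such a job looks like, here is a word count in the spirit of the Job Server's WordCountExample (the spark.jobserver package and type names follow the Job Server sources and may differ in this preview branch):

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobInvalid, SparkJobValid, SparkJobValidation}

import scala.util.Try

object WordCountExample extends SparkJob {
  // Fail fast if the expected configuration parameter is missing.
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    Try(config.getString("input.string"))
      .map(_ => SparkJobValid)
      .getOrElse(SparkJobInvalid("No input.string config param"))

  // Count the occurrences of each word in the input string.
  override def runJob(sc: SparkContext, config: Config): Any =
    sc.parallelize(config.getString("input.string").split(" ").toSeq)
      .countByValue()
}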


Requirements

We assume you have Scala installed on your system.


Get Spark Job Server

Currently on GitHub, on this branch:

git clone https://github.com/ooyala/incubator-spark.git spark-server
cd spark-server
git checkout -b jobserver-preview-2013-12 origin/jobserver-preview-2013-12


Then start the server with sbt (the last two commands are typed at the sbt prompt):

sbt/sbt
project jobserver
re-start


Get Hue

Currently only on GitHub (it will be in CDH5b2):

https://github.com/cloudera/hue#getting-started


If Hue and the Spark Job Server are not on the same machine, update the hue.ini property in desktop/conf/pseudo-distributed.ini:

[spark]
  # URL of the Spark Job Server.
  server_url=http://localhost:8090/

Get a Spark example to run

Then follow this walk-through and create the example jar that is used in the video demo.
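Once the example jar is built, it can be exercised over the Job Server's REST API; the calls below follow the Job Server README (the 'test' app name is arbitrary and the jar name depends on your build):

# Upload the jar under the application name 'test'
curl --data-binary @job.jar localhost:8090/jars/test

# Run the word count with an ad-hoc input string
curl -d "input.string = a b c a b see" 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample'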


This article was originally posted 7 months ago.

Tags: video tutorial spark


Hadoop Tutorial: Language assistant in Pig Editor with Navigator

In Hue 3.5, a new assistant was added to the Pig Editor: Navigator.


As in the Hive and Impala editors, functions and Pig statements are made available directly from within the editor:



Navigator is:

  • Sorted by category

  • Searchable

  • Integrated with auto-completion (as are HDFS paths and Metastore tables)


So now, get started with Apache Pig!


This article was originally posted 7 months ago.

Tags: pig video tutorial


Hadoop Tutorial: Submit any Oozie jobs directly from HDFS

With HUE-1476, users can submit Oozie jobs directly from HDFS. Just upload your configuration or browse an existing workspace and select a workflow, coordinator or bundle. A submit button will appear and let you execute the job in one click!

File Browser supports:

  • Parameters from workflow.xml, coordinator.xml, bundle.xml

  • Parameters from job.properties
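For reference, such a job.properties can be as minimal as this sketch (the hostnames, ports and application path are hypothetical and depend on your cluster):

nameNode=hdfs://localhost:8020
jobTracker=localhost:8032
oozie.wf.application.path=${nameNode}/user/hue/workflows/my-workflow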

Oozie Dashboard supports:

  • Dynamic progress and log report

  • One click MapReduce log access

  • Stop, Pause, Rerun buttons


Here is the workflow tutorial used in the video demo.


Of course, the Oozie Editor is still recommended if you want to avoid any XML :)


This article was originally posted 8 months ago.

Tags: oozie video tutorial HDFS


Hadoop Tutorial: Create Hive tables with headers and load quoted CSV data

Hue makes it easy to create Hive tables.

With HUE-1746, Hue guesses the column names and types (int, string, float…) directly by looking at your data. If your data starts with a header, it will automatically be used and skipped while creating the table.

Quoted CSV fields are also compatible thanks to HUE-1747.



Here is the data file used:

http://www.fdic.gov/bank/individual/failed/banklist.html


This is the SerDe for reading quoted CSV:

https://github.com/ogrodnek/csv-serde


And the command to switch the SerDe used by the table:

ALTER TABLE banks SET SERDE 'com.bizo.hive.serde.csv.CSVSerde'
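If you are creating the table by hand instead, the SerDe can be set at creation time; a sketch based on the csv-serde README (the column list is illustrative and the jar path depends on your install):

ADD JAR /path/to/csv-serde.jar;

CREATE TABLE banks (name STRING, city STRING, state STRING)
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
STORED AS TEXTFILE;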



Now go analyze the data with the Hive, Impala or Pig editors!


This article was originally posted 8 months ago.

Tags: hive metastore video tutorial


New ZooKeeper Browser app!

Hello animal lovers! In Hue 3, a new application was added to make Apache ZooKeeper easier to use: ZooKeeper Browser.


The app is not totally new: it is the ZooKeeper UI that Andrei built during his Google Summer of Code three years ago, rebased from Hue 1 onto Hue 3.



The two main features are:

  • Listing of the ZooKeeper cluster stats and clients

  • Browsing and editing of the ZNode hierarchy


ZooKeeper Browser requires the ZooKeeper REST service to be running on the same host as ZooKeeper itself. Here is how to set it up:


First get and build ZooKeeper:

git clone https://github.com/apache/zookeeper
cd zookeeper
ant
Buildfile: /home/hue/Development/zookeeper/build.xml

init:
    [mkdir] Created dir: /home/hue/Development/zookeeper/build/classes
    [mkdir] Created dir: /home/hue/Development/zookeeper/build/lib
    [mkdir] Created dir: /home/hue/Development/zookeeper/build/package/lib
    [mkdir] Created dir: /home/hue/Development/zookeeper/build/test/lib

…

Then start the REST service:

cd src/contrib/rest
nohup ant run &


If ZooKeeper and the REST service are not on the same machine as Hue, please update the Hue settings and specify the correct hostnames and ports:


[zookeeper]

  [[clusters]]

    [[[default]]]
      # Zookeeper ensemble. Comma separated list of Host/Port.
      # e.g. localhost:2181,localhost:2182,localhost:2183
      ## host_ports=localhost:2181

      # The URL of the REST contrib service
      ## rest_url=http://localhost:9998
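To sanity-check that the REST service is up, you can query it directly; the /znodes/v1 path below follows the contrib service's API (adjust host and port to match your setup):

curl http://localhost:9998/znodes/v1/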


And that’s it, jump into ZooKeeper Browser!


As usual feel free to comment on the hue-user list or @gethue!



This article was originally posted 8 months ago.

Tags: zookeeper video tutorial


DBQuery App: MySQL, PostgreSQL, Oracle and SQLite Query Editors

A brand new application that enables viewing data in MySQL, PostgreSQL, Oracle and SQLite was committed in Thailand.


Inspired by the Beeswax application, it lets you query a relational database and view the results in a table.
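The databases to browse are registered in hue.ini; a minimal sketch for MySQL, assuming the [librdbms] section used by the app (host, user and password are placeholders):

[librdbms]
  [[databases]]
    [[[mysql]]]
      nice_name="MySQL DB"
      engine=mysql
      host=localhost
      port=3306
      user=hue
      password=secret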


This article was originally posted 8 months ago.

Tags: video tutorial dbquery


Integrate external Web applications in any language

Completed in Thailand, HUE-826 brings a new way to integrate external Web applications into Hue. Java apps or already existing websites can now be shown as a Hue app with little effort.


For example, let’s integrate Tableau:



To create a new app:

build/env/bin/hue create_proxy_app my_hue http://gethue.com
tools/app_reg/app_reg.py --install my_hue --relative-paths


If you want to update the URL later, change it in the ini:

[my_hue]
url=http://gethue.com


As usual feel free to comment on the hue-user list or @gethue!


This article was originally posted 8 months ago.

Tags: sdk video tutorial


Tutorial: Live demo of Search on Hadoop

Hue ships with some demo collections/indexes for the Search application.

To make the demo work by default, Hue uses a predefined Solr response. Hue displays a warning in this case, as the page is not updated when typing a query:

In order to query a live dataset, you need to index some data. On the Hue machine:

cd $HUE_HOME
cd apps/search/examples/bin

Then create the Solr collections:

./create_collections.sh

In case Solr is not on the same machine, add this parameter in the script:
--solr http://localhost:8983/solr

Then index some example data with:

./post.sh

Likewise, if Solr is on a different machine, update the URL:
URL=http://localhost:8983/solr
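To double-check that the collections were created and loaded, you can ask Solr for its core status directly (default port 8983 assumed):

curl "http://localhost:8983/solr/admin/cores?action=STATUS"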

And that’s it! The warning message above will disappear and you will be able to query live Solr indexes!


This article was originally posted 1 year ago.

Tags: search tutorial

