Hue is taking advantage of a new way to specify the frequency of a coordinator in Oozie (OOZIE-1306). Here is how to put it into practice:
Cron-based scheduling requires Oozie 4. To keep using the previous Frequency drop-down from Oozie 3, the feature can be disabled in hue.ini:
[oozie]
# Use Cron format for defining the frequency of a Coordinator instead of the old frequency number/unit.
enable_cron_scheduling=false
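With cron scheduling enabled, a coordinator frequency is a standard 5-field cron expression (minute, hour, day of month, month, day of week), e.g. `0 2 * * *` for every day at 2 AM. As a rough illustration of the format (this is just a sketch, not Oozie's actual parser):

```python
# Minimal sanity check of a 5-field cron expression like those Oozie 4
# coordinators accept. Illustrative only -- Oozie's parser is the authority.
def looks_like_cron(expr):
    fields = expr.split()
    if len(fields) != 5:
        return False
    # Allowed numeric ranges: minute, hour, day of month, month, day of week
    ranges = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]
    for field, (low, high) in zip(fields, ranges):
        for part in field.split(","):
            part = part.split("/")[0]  # strip a step suffix, e.g. */15
            if part == "*":
                continue
            bounds = part.split("-")   # a single value or a range a-b
            if not all(p.isdigit() and low <= int(p) <= high for p in bounds):
                return False
    return True

print(looks_like_cron("0 2 * * *"))     # every day at 02:00
print(looks_like_cron("every day"))     # not a cron expression
```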
First, configuring Hue with MR2 is a bit simpler than with MR1, as Hue no longer needs the Job Tracker plugin: YARN provides a REST API. YARN is also going to provide an equivalent of Job Tracker HA with YARN-149.
Here is how to configure the clusters in hue.ini. If you are using a pseudo-distributed cluster, the defaults will work out of the box. If not, just replace each localhost with the hostnames of the Resource Manager and History Server:
[hadoop]
  ...
  # Configuration for YARN (MR2)
  # ------------------------------------------------------------------------
  [[yarn_clusters]]
    [[[default]]]
      # Enter the host on which you are running the ResourceManager
      resourcemanager_host=localhost
      # The port where the ResourceManager IPC listens on
      resourcemanager_port=8032
      # Whether to submit jobs to this cluster
      submit_to=True
      # URL of the ResourceManager API
      resourcemanager_api_url=http://localhost:8088
      # URL of the ProxyServer API
      proxy_api_url=http://localhost:8088
      # URL of the HistoryServer API
      history_server_api_url=http://localhost:19888

  # Configuration for MapReduce (MR1)
  # ------------------------------------------------------------------------
  [[mapred_clusters]]
    [[[default]]]
      # Whether to submit jobs to this cluster
      submit_to=False
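Hue talks to these endpoints over plain HTTP. For instance, the ResourceManager API (resourcemanager_api_url + `/ws/v1/cluster/apps`) returns JSON describing the cluster's applications, which is what Job Browser is built on. A sketch with an abbreviated, illustrative payload:

```python
import json

# Abbreviated sample of what the ResourceManager's /ws/v1/cluster/apps
# endpoint returns (payload trimmed for illustration).
sample = '''
{"apps": {"app": [
  {"id": "application_1389385875630_0001", "name": "wordcount",
   "state": "FINISHED", "finalStatus": "SUCCEEDED"}
]}}
'''

apps = json.loads(sample)["apps"]["app"]
for app in apps:
    print(app["id"], app["state"], app["finalStatus"])
```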
And that’s it! You can now look at jobs in Job Browser, get logs and submit jobs to Yarn!
Hi Spark Makers!
We previously released the app with an Oozie submission backend, but switched to the Spark Job Server (SPARK-818) contributed by Evan's team at Ooyala at the last Spark Summit. This new server enables real interactivity with Spark and is closer to the community.
We hope to work with the community and have support for Python, Java, direct script submission without compiling/uploading and other improvements in the future!
We assume you have Scala installed on your system.
Currently on github on this branch:
git clone https://github.com/ooyala/incubator-spark.git spark-server
cd spark-server
git checkout -b jobserver-preview-2013-12 origin/jobserver-preview-2013-12
sbt/sbt
project jobserver
re-start
Currently only on github (will be in CDH5b2):
If Hue and Spark Job Server are not on the same machine update the hue.ini property in desktop/conf/pseudo-distributed.ini:
[spark]
  # URL of the Spark Job Server.
  server_url=http://localhost:8090/
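Hue drives the Spark Job Server through its small REST API: jars are uploaded with a POST to /jars and a run is a POST to /jobs with the application and class passed as query parameters. A sketch of how such a submission URL is built (the appName and classPath values are illustrative):

```python
from urllib.parse import urlencode, urljoin

# Build the job submission URL against the server_url configured above.
# "test" and the WordCountExample class path are illustrative values.
server_url = "http://localhost:8090/"
params = {"appName": "test", "classPath": "spark.jobserver.WordCountExample"}
submit_url = urljoin(server_url, "jobs") + "?" + urlencode(params)
print(submit_url)
# A client would POST the job's input to this URL, then poll
# GET /jobs/<job_id> for the result.
```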
Then follow this walk-through and create the example jar that is used in the video demo.
In Hue 3.5, a new assistant was added to the Pig Editor: Navigator.
Sorted by category
Auto-completable (as well as HDFS paths and Metastore tables)
So now, get started with Apache Pig!
With HUE-1476, users can submit Oozie jobs directly from HDFS. Just upload your configuration or browse an existing workspace and select a workflow, coordinator or bundle. A submit button will appear and lets you execute the job in one click!
File Browser supports:
Parameters from workflow.xml, coordinator.xml, bundle.xml
Parameters from job.properties
Oozie Dashboard supports:
Dynamic progress and log report
One click MapReduce log access
Stop, Pause, Rerun buttons
Here is the workflow tutorial used in the video demo.
Of course, the Oozie Editor is still recommended if you want to avoid any XML :)
Hue makes it easy to create Hive tables.
With HUE-1746, Hue guesses the column names and types (int, string, float…) directly by looking at your data. If your data starts with a header row, it will automatically be used for naming the columns and skipped when creating the table.
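The idea behind the type guessing can be sketched in a few lines: look at each column's sample values and pick the narrowest type that fits all of them. This is an illustration of the principle, not Hue's actual implementation:

```python
import csv
import io

def guess_type(values):
    """Pick the narrowest of int/float/string fitting every sample value."""
    def is_int(v):
        try:
            int(v)
            return True
        except ValueError:
            return False
    def is_float(v):
        try:
            float(v)
            return True
        except ValueError:
            return False
    if all(is_int(v) for v in values):
        return "int"
    if all(is_float(v) for v in values):
        return "float"
    return "string"

# Tiny illustrative sample with a header row
sample = "name,year,rate\nfoo,2013,1.5\nbar,2014,2.0\n"
rows = list(csv.reader(io.StringIO(sample)))
header, data = rows[0], rows[1:]
types = [guess_type(col) for col in zip(*data)]
print(list(zip(header, types)))
# [('name', 'string'), ('year', 'int'), ('rate', 'float')]
```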
Quoted CSV fields are also compatible thanks to HUE-1747.
Here is the data file used:
This is the SerDe for reading quoted CSV:
And the command to switch the SerDe used by the table:
ALTER TABLE banks SET SERDE 'com.bizo.hive.serde.csv.CSVSerde'
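Why a CSV SerDe matters: a naive split on commas breaks as soon as a quoted field contains one. Python's csv module handles quoting the same way the SerDe does for Hive, as this sketch shows (the data row is made up for illustration):

```python
import csv
import io

# An illustrative row with a comma inside a quoted field
line = '1,"Bank of Foo, Inc.",100.5\n'

naive = line.strip().split(",")            # breaks the quoted field apart
proper = next(csv.reader(io.StringIO(line)))  # quoting-aware parse

print(naive)   # ['1', '"Bank of Foo', ' Inc."', '100.5']
print(proper)  # ['1', 'Bank of Foo, Inc.', '100.5']
```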
The app is not totally new: it is the ZooKeeper UI made by Andrei during his Google Summer of Code three years ago, rebased from Hue 1 to Hue 3.
The main two features are:
Listing of the ZooKeeper cluster stats and clients
Browsing and editing of the ZNode hierarchy
ZooKeeper Browser requires the ZooKeeper REST service to be running on the same host as ZooKeeper itself. Here is how to set it up:
First get and build ZooKeeper:
git clone https://github.com/apache/zookeeper
cd zookeeper
ant

Buildfile: /home/hue/Development/zookeeper/build.xml

init:
    [mkdir] Created dir: /home/hue/Development/zookeeper/build/classes
    [mkdir] Created dir: /home/hue/Development/zookeeper/build/lib
    [mkdir] Created dir: /home/hue/Development/zookeeper/build/package/lib
    [mkdir] Created dir: /home/hue/Development/zookeeper/build/test/lib
…
Then start the REST service:
cd src/contrib/rest
nohup ant run &
If ZooKeeper and the REST service are not on the same machine as Hue, please update the Hue settings and specify the correct hostnames and ports:
[zookeeper]
  [[clusters]]
    [[[default]]]
      # Zookeeper ensemble. Comma separated list of Host/Port.
      # e.g. localhost:2181,localhost:2182,localhost:2183
      ## host_ports=localhost:2181
      # The URL of the REST contrib service
      ## rest_url=http://localhost:9998
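Under the hood, ZooKeeper Browser maps znode reads to simple GET calls against the REST contrib service's /znodes/v1 namespace. A sketch of the URL it would hit to list a znode's children (the znode path here is illustrative):

```python
from urllib.parse import urljoin, quote

# Build a children-listing URL against the rest_url configured above.
# "/hbase/rs" is just an example znode path.
rest_url = "http://localhost:9998"
znode = "/hbase/rs"
url = urljoin(rest_url, "/znodes/v1" + quote(znode)) + "?view=children"
print(url)  # http://localhost:9998/znodes/v1/hbase/rs?view=children
```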
And that’s it, jump up to ZooKeeper Browser!
From Thailand, a brand new application that enables viewing data in MySQL, PostgreSQL, Oracle and SQLite has been committed.

Inspired by the Beeswax application, it lets you query a relational database and view the results in a table.
For example, let’s integrate Tableau:
To create a new app:
build/env/bin/hue create_proxy_app my_hue http://gethue.com
tools/app_reg/app_reg.py --install my_hue --relative-paths
If you want to update the url later, change it in the ini:
Hue ships with some demo collections/indexes for the Search application.
To make the demo work by default, Hue uses a predefined Solr response. Hue displays a warning in this case, as the page is not updated when typing a query:
In order to query a live dataset, you need to index some data. On the Hue machine:
Then create the Solr collections:
In case Solr is not on the same machine, add this parameter in the script:
Then index some example data with:
Likewise, if Solr is on a different machine, update the URL:
And that’s it! The warning message above will disappear and you will be able to query live Solr indexes!
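Once real data is indexed, the Search app issues standard queries against the collection's /select handler. A sketch of such a query URL (the collection name and query are illustrative):

```python
from urllib.parse import urlencode

# Build a Solr select query; "twitter_demo" and the q value are examples.
base = "http://localhost:8983/solr/twitter_demo/select"
params = {"q": "user_location:Paris", "wt": "json", "rows": 10}
query_url = base + "?" + urlencode(params)
print(query_url)
```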