Hadoop Tutorial: Hive Query editor with HiveServer2 and Sentry

Hue provides a Web interface for submitting Hive queries. Hue originally shipped its own server, called Beeswax, to service Hive queries. Apache HiveServer2, a more sophisticated and robust service, is supported as of Hue 2.5.


Beeswax Hive Editor

Thanks to the HiveServer2 integration, Hue benefits from Sentry (see the How to configure Sentry video). In addition to the added security, Hue’s interface becomes more consistent. For example, a user without permissions on a database or table won’t see it in the query editor or in the Metastore app.


HiveServer2 also provides performant access to the Metastore.


On top of this, the Beeswax Hive UI is a Web editor designed to increase productivity:

  • Syntax highlighting and auto completion

  • Submit several queries and check their progress later

  • UDF integration

  • Multiple query execution

  • Select and send a fraction of a query

  • Download or save the query results

  • Navigate through the metadata



Hue 2.x

We recommend using the latest version of Hue (2.5). Point Hue to HiveServer2 by updating the [beeswax] section of hue.ini:


[beeswax]
  beeswax_server_host=<FQDN of Beeswax server>
  server_interface=hiveserver2
  beeswax_server_port=10000
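
Before restarting Hue with these settings, it can help to confirm that the configured host and port are reachable from the Hue machine. The following is a minimal sketch, not part of Hue: the function name can_connect and the host name in the usage example are hypothetical, and only the port 10000 comes from the configuration above. A successful TCP connection only shows that something is listening, not that it is HiveServer2.

```python
import socket


def can_connect(host, port=10000, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    This only checks reachability; it does not speak the HiveServer2
    Thrift protocol or authenticate.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


if __name__ == "__main__":
    # Hypothetical host name; replace it with the FQDN used in hue.ini.
    host = "hiveserver2.example.com"
    print(host, "reachable:", can_connect(host, 10000))
```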


Hue 3.x

Hue 3 no longer bundles Beeswaxd and is configured by default to use HiveServer2. If HiveServer2 is not on the same machine as Hue, update hue.ini with:


[beeswax]
 hive_server_host=<FQDN of HiveServer2>

Other Hive-specific settings (e.g. security, impersonation) are read from a local /etc/hive/conf/hive-site.xml. We recommend keeping this file in exact sync with the original Hive one (or putting Hue and Hive on the same machine).
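
One way to check that the two copies of hive-site.xml have not drifted apart is to compare their properties. The sketch below is an illustration under assumptions, not a Hue tool: the function names load_properties and diff_hive_site are hypothetical, and the sample XML strings stand in for the contents of the two files.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Return {name: value} for every <property> in a hive-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_hive_site(hue_side_xml, hive_side_xml):
    """Return {name: (hue_value, hive_value)} for properties that differ.

    A value of None means the property is absent on that side.
    """
    hue_props = load_properties(hue_side_xml)
    hive_props = load_properties(hive_side_xml)
    differences = {}
    for name in sorted(set(hue_props) | set(hive_props)):
        if hue_props.get(name) != hive_props.get(name):
            differences[name] = (hue_props.get(name), hive_props.get(name))
    return differences


if __name__ == "__main__":
    # Sample contents; in practice, read both files from disk instead.
    hue_copy = (
        "<configuration><property><name>hive.server2.enable.doAs</name>"
        "<value>true</value></property></configuration>"
    )
    hive_copy = (
        "<configuration><property><name>hive.server2.enable.doAs</name>"
        "<value>false</value></property></configuration>"
    )
    for name, (hue_value, hive_value) in diff_hive_site(hue_copy, hive_copy).items():
        print(name, "Hue side:", hue_value, "Hive side:", hive_value)
```

An empty result means both files define the same properties with the same values.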


Note: If you are using Hive 0.12 or later, Hue needs the HUE-1561 fix (or use Hue 3.0 or later).


With Sentry: Hue 2.x or 3.x

Hue will automatically work with a HiveServer2 configured with Sentry.

Note that HiveServer2 impersonation (described below) should be turned off when Sentry is used. Permissions of the impersonated user (e.g. ‘bob’) will be used instead of those of the ‘hue’ user. The warehouse directory also needs to be owned by hive:hive with 770 permissions, so that only super users in the hive group can read and write it.

HiveServer2 needs to use strong authentication such as Kerberos or LDAP for Sentry to work.


Troubleshooting without Sentry

org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=hive, access=WRITE, inode="/user/test/data":test:supergroup:drwxr-xr-x


By default, HiveServer2 now owns the Hive warehouse (default ‘/user/hive/warehouse’), meaning the data files need to belong to the ‘hive’ user. If you get this error when creating a table, either make the data directory (here /user/test/data) writable by everybody, or revert HiveServer2 to the old Beeswax behavior by authorizing ‘hive’ to impersonate the user.

To enable impersonation, add ‘hive’ as a Hadoop proxy user and edit your hive-site.xml:

 <property>
   <name>hive.server2.enable.doAs</name>
   <value>true</value>
 </property>

Then restart HiveServer2:

sudo service hive-server2 restart
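
When diagnosing a Permission denied error like the one at the top of this section, the useful details are the user that was denied, the access requested, and the owner, group and mode of the path. The sketch below pulls those fields out of the message with a regular expression. The function name parse_permission_denied is hypothetical, and the pattern is an assumption based only on the message format shown in this article.

```python
import re

# Pattern derived from the error text quoted in this article; other Hadoop
# versions may format the message differently.
DENIED_PATTERN = re.compile(
    r'Permission denied: user=(?P<user>[^,]+), access=(?P<access>\w+), '
    r'inode="(?P<path>[^"]+)":(?P<owner>[^:]+):(?P<group>[^:]+):'
    r'(?P<mode>[-a-zA-Z]{10})'
)


def parse_permission_denied(message):
    """Return the fields of a 'Permission denied' message as a dict, or None."""
    match = DENIED_PATTERN.search(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    error = (
        "org.apache.hadoop.security.AccessControlException "
        "Permission denied: user=hive, access=WRITE, "
        'inode="/user/test/data":test:supergroup:drwxr-xr-x'
    )
    info = parse_permission_denied(error)
    print(info["user"], "needs", info["access"], "on", info["path"],
          "owned by", info["owner"] + ":" + info["group"], "mode", info["mode"])
```

In the example, the ‘hive’ user asks for WRITE on a directory owned by ‘test’ whose mode only grants write access to its owner, which is why the statement fails.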



Another common error when using YARN is:

Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.

This means that the HADOOP_MAPRED_HOME environment variable is not set to the MapReduce installation directory:

export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

HADOOP_HOME could also be wrong.


TTransportException('Could not start SASL: Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found',)

A SASL library that Hue needs is missing from your system.

HiveServer2 supports 3 authentication modes, specified by the ‘hive.server2.authentication’ property in hive-site.xml:

  • NOSASL

  • NONE (default)

  • KERBEROS


Only NOSASL does not require SASL, so either switch to it or install the missing packages.

Hue picks the value from its local /etc/hive/conf/hive-site.xml, so make sure that file is synced with the original hive-site.xml (manually or via the CM Beeswax safety valve).
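
To see which mode the local hive-site.xml selects, and therefore whether the SASL packages are needed, the property can be read directly from the file. This is a minimal sketch, assuming the three modes and the NONE default listed above; the function names authentication_mode and requires_sasl are hypothetical and do not reflect how Hue itself reads the file.

```python
import xml.etree.ElementTree as ET

# Default taken from the list of modes in this article.
DEFAULT_AUTHENTICATION = "NONE"


def authentication_mode(hive_site_xml):
    """Return the value of hive.server2.authentication, or the default."""
    root = ET.fromstring(hive_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == "hive.server2.authentication":
            value = (prop.findtext("value") or "").strip().upper()
            return value or DEFAULT_AUTHENTICATION
    return DEFAULT_AUTHENTICATION


def requires_sasl(hive_site_xml):
    """Only NOSASL works without a SASL library on the Hue side."""
    return authentication_mode(hive_site_xml) != "NOSASL"


if __name__ == "__main__":
    # Sample contents; in practice, read /etc/hive/conf/hive-site.xml.
    sample = (
        "<configuration><property><name>hive.server2.authentication</name>"
        "<value>NOSASL</value></property></configuration>"
    )
    print(authentication_mode(sample), "requires SASL:", requires_sasl(sample))
```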


Error while compiling statement: FAILED: RuntimeException org.apache.hadoop.security.AccessControlException: Permission denied: user=admin, access=WRITE, inode="/tmp/hive-hive":hive:hdfs:drwxr-xr-x at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:234) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:214) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:158)

The Hive HDFS workspace ‘/tmp/hive-hive’ needs to have its permissions set to 1777.
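
To relate octal modes such as 1777 to the ls-style strings that appear in the error messages, Python's standard stat module can render one as the other. This is only an illustration of what the numbers mean; the helper name describe_directory_mode is hypothetical, and the snippet does not change anything on HDFS.

```python
import stat


def describe_directory_mode(octal_mode):
    """Render an octal directory mode (e.g. 0o1777) in `ls -l` style."""
    return stat.filemode(stat.S_IFDIR | octal_mode)


if __name__ == "__main__":
    # 755 is the mode reported in the error, 1777 the one recommended here,
    # and 770 the warehouse mode mentioned in the Sentry section.
    for mode in (0o755, 0o1777, 0o770):
        print(oct(mode), describe_directory_mode(mode))
```

The trailing ‘t’ in the 1777 rendering is the sticky bit: every user can write to the directory, but only a file's owner can delete or rename it.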

Troubleshooting with Sentry

AuthorizationException: User 'hue/test.com' does not have privileges to execute 'CREATE' on: default.sample_08

The user ‘hue’ is not configured in Sentry and does not have the CREATE table permission.

Conclusion

Hue provides a great environment for executing Hive queries in a friendly UI. Beeswaxd was a great service but has been deprecated in favor of HiveServer2. HiveServer2 offers more stability and security.


As a side note, if you are looking for even faster SQL queries, we encourage you to test the Impala Editor!


If you have questions or feedback, feel free to contact the Hue community on hue-user or @gethue.com!


This article was originally posted 6 months ago.

Tags: hive tutorial video

