In the previous installment of the demo series about Hue — the open source Web UI that makes Apache Hadoop easier to use — you learned how to analyze data with Hue using Apache Hive via Hue’s Beeswax and Catalog applications. In this installment, we’ll focus on using the new editor for Apache Pig in Hue 2.3.
Complementing the editors for Hive and Cloudera Impala, the Pig editor provides a great starting point for exploration and real-time interaction with Hadoop. This new application lets you edit and run Pig scripts interactively in an editor tailored for a great user experience. Features include:
Here’s a short video demoing its capabilities and ease of use:
Here is the Pig script used and explained in the demo. It loads the Yelp business file that was converted in the previous demo, counts the businesses in each city, and prints the 25 cities with the most businesses:
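-- Load the cleaned Yelp business file. There is no USING clause, so Pig
-- falls back to its default loader, PigStorage, which expects tab-delimited fields.
business = LOAD '/user/hive/warehouse/business/yelp_academic_dataset_business_clean.json' AS (
    business_id: CHARARRAY,
    categories: CHARARRAY,
    city: CHARARRAY,
    full_address: CHARARRAY,
    latitude: FLOAT,
    longitude: FLOAT,
    name: CHARARRAY,
    neighborhoods: CHARARRAY,
    open: BOOLEAN,
    review_count: INT,
    stars: FLOAT,
    state: CHARARRAY,
    type: CHARARRAY
);

-- Group the businesses by city and count the rows in each group.
business_group = GROUP business BY city;
business_by_city = FOREACH business_group GENERATE group, COUNT(business) AS ct;

-- Sort the cities by count, largest first, and keep the first 25.
top = ORDER business_by_city BY ct DESC;
top_25 = LIMIT top 25;

DUMP top_25;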
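If Pig Latin is new to you, it may help to see the same dataflow in plain Python. The sketch below is only an illustration of what the GROUP, COUNT, ORDER and LIMIT steps compute. The sample rows and the `top_cities` helper are made up for this example and are not part of Hue, Pig, or the Yelp dataset.

```python
from collections import Counter

# Hypothetical sample rows (name, city); the real file has many more columns.
businesses = [
    ("Cafe A", "Phoenix"),
    ("Diner B", "Phoenix"),
    ("Bar C", "Tempe"),
    ("Grill D", "Phoenix"),
    ("Deli E", "Tempe"),
    ("Bistro F", "Mesa"),
]


def top_cities(rows, n=25):
    """Return the n cities with the most businesses as (city, count) pairs."""
    # Equivalent of GROUP business BY city followed by COUNT(business).
    counts = Counter(city for _name, city in rows)
    # Equivalent of ORDER ... BY ct DESC followed by LIMIT n.
    return counts.most_common(n)


if __name__ == "__main__":
    for city, ct in top_cities(businesses, n=2):
        print(city, ct)
```

On a cluster, Pig runs these same steps as distributed jobs over the full file, whereas this sketch keeps everything in memory.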