Working with Apache Hive
Working with Apache Hive Course Details:
Hive is the de facto standard for data warehousing on Hadoop. This course starts with Hive setup and operations and continues into advanced Hive usage. It also covers performance and execution engines, and ends with a practical workshop.
No classes are currently scheduled for this course.
Call (919) 283-1674 to get a class scheduled online or in your area!
Hive Basics
- Defining Hive Tables
- SQL Queries over Structured Data
- Filtering / Search
- Aggregations / Ordering
- Partitions
- Joins
- Text Analytics (Semi-Structured Data)
Hive Advanced
- Transformation, Aggregation
- Working with Dates, Timestamps, and Arrays
- Converting Strings to Date, Time, and Numbers
- Creating New Attributes, Mathematical Calculations, and Windowing Functions
- Using Character and String Functions
- Binning and Smoothing
- Processing JSON Data
- Execution Engines (Tez, MapReduce, and Spark)
Impala (for Cloudera track)
- Architecture
- Impala joins and other SQL specifics
Bonus Project
- Students work in teams to complete this end-to-end workshop
- Set up a data warehouse with Hive
- Query and analyze data with Hive and Spark
*Please Note: The course outline is subject to change without notice. The exact course outline will be provided at the time of registration.
Join an engaging hands-on learning environment, where you’ll learn:
- Hive basics and features
- How to process, transform, and manage data
- Processing and performance management
- How to set up a data warehouse with Hive
- Data query and analysis
This course is 50% hands-on labs and 50% lecture, with engaging instruction, demos, group discussions, and project work.
Before attending this course, you should:
- Be familiar with SQL
- Be able to navigate the Linux command line
- Have basic knowledge of Linux command-line editors (vi/nano)
Data Scientists, Software Engineers, Developers, and Administrators