Introduction to SAS and Hadoop
Course Details:
In this course, you will learn how to use SAS programming methods to read, write, and manipulate Hadoop data. You will learn about Base SAS methods, including reading and writing raw data with the DATA step, managing the Hadoop Distributed File System (HDFS), and executing MapReduce and Pig code from SAS via the HADOOP procedure. The course also covers the SAS/ACCESS Interface to Hadoop, which provides LIBNAME access and SQL pass-through techniques for reading and writing Hive and Cloudera Impala table structures. Finally, you will receive a brief overview of additional SAS and Hadoop technologies, including DS2, high-performance analytics, SAS LASR Analytic Server, and SAS In-Memory Statistics, as well as the computing infrastructure and data access methods that support them.
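For readers new to the programming model behind the MapReduce jobs that the HADOOP procedure can submit, here is a minimal, single-process Python sketch of the map, shuffle, and reduce phases, using a word count. It is a conceptual illustration only and does not use any SAS or Hadoop API: the functions `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical names, and a real Hadoop job distributes these phases across the nodes of a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every whitespace-separated word."""
    pairs: List[Tuple[str, int]] = []
    for line in lines:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key, as Hadoop does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Hypothetical driver chaining the three phases in one process."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["SAS reads Hadoop data", "Hadoop stores data in HDFS"]
    print(word_count(sample))
```

In a real cluster the map and reduce functions run in parallel on many nodes and the shuffle moves data over the network; the sketch keeps all three in memory so the data flow is easy to follow.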
1. Introduction
- What is Hadoop?
- How SAS interfaces with Hadoop
2. Accessing HDFS and Invoking Hadoop Applications from SAS
- Overview of methods available in Base SAS for interacting with Hadoop
- Reading and writing Hadoop files using Base SAS methods
- Executing MapReduce code
- Executing Pig code using PROC HADOOP
3. Using the SQL Pass-Through Facility
- Understanding the SQL procedure pass-through facility
- Connecting to a Hadoop Hive database
- Learning methods to query Hive tables
- Investigating Hadoop Hive metadata
- Creating SQL procedure pass-through queries
- Creating and loading Hive tables with SQL pass-through EXECUTE statements
- Handling Hive STRING data types
4. Using the SAS/ACCESS LIBNAME Engine
- Using the LIBNAME statement for Hadoop
- Using data set options
- Creating views
- Combining tables
- Benefits of the LIBNAME method
- Using PROC HDMD to access delimited data, XML data, and other non-Hive formats
- Performance considerations for the SAS/ACCESS LIBNAME statement
- Copying data from a SAS library to a Hive library
5. Partitioning and Clustering Hive Tables
- Identifying partitioning, clustering, and indexing methods in Hive
- How partitioning and clustering can increase query performance
- Creating and loading partitioned and clustered Hive tables
6. Overview of SAS In-Memory Analytics and the Code Accelerator for Hadoop
- Using high-performance procedures and the SASHDAT library engine
- Creating a SAS LASR Analytic Server session
- Using the SASIOLA engine
- Executing DS2 threads in the Hadoop cluster to summarize data
- Using PROC HDMD to access HDFS files
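The point in section 5 that partitioning and clustering can increase query performance can be illustrated with a small Python simulation. This is a conceptual sketch, not Hive code, and it rests on stated assumptions: a hypothetical sales table partitioned by `year` (Hive stores each partition value in its own directory) and clustered (bucketed) on `customer_id` into a fixed number of bucket files. A query that filters on the partition column only has to read the matching partition, and a lookup on the clustering column only has to read the one bucket that can contain the key. The bucketing rule here (integer key modulo the bucket count) is an assumption for illustration; Hive's actual hash depends on the column type.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, int, float]  # (year, customer_id, amount): hypothetical schema
NUM_BUCKETS = 4  # assumed number of buckets per partition


def bucket_for(customer_id: int) -> int:
    """Assumed bucketing rule: integer key modulo the number of buckets."""
    return customer_id % NUM_BUCKETS


def build_table(rows: List[Row]) -> Dict[int, Dict[int, List[Row]]]:
    """Lay rows out as partition (year) -> bucket -> rows, mimicking directories and files."""
    table: Dict[int, Dict[int, List[Row]]] = defaultdict(lambda: defaultdict(list))
    for row in rows:
        year, customer_id, _ = row
        table[year][bucket_for(customer_id)].append(row)
    return table


def full_scan(rows: List[Row], year: int, customer_id: int) -> Tuple[List[Row], int]:
    """Unpartitioned table: every row must be read to answer the query."""
    matches = [r for r in rows if r[0] == year and r[1] == customer_id]
    return matches, len(rows)


def pruned_scan(
    table: Dict[int, Dict[int, List[Row]]], year: int, customer_id: int
) -> Tuple[List[Row], int]:
    """Read only the matching partition and the single bucket that can hold the key."""
    bucket_rows = table.get(year, {}).get(bucket_for(customer_id), [])
    matches = [r for r in bucket_rows if r[1] == customer_id]
    return matches, len(bucket_rows)


if __name__ == "__main__":
    rows = [(year, cid, float(cid)) for year in (2013, 2014, 2015) for cid in range(100)]
    table = build_table(rows)
    full_matches, full_read = full_scan(rows, 2015, 42)
    pruned_matches, pruned_read = pruned_scan(table, 2015, 42)
    print(full_matches == pruned_matches, full_read, pruned_read)  # prints: True 300 25
```

Both scans return the same row, but the pruned scan reads 25 rows instead of 300, because it skips the other two partitions and three of the four buckets in the remaining partition.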
*Please note: The course outline is subject to change without notice. The exact course outline will be provided at the time of registration.
Exercises or hands-on workshops are included with most SAS courses.
Who should attend: SAS programmers who need to access data in Hadoop from within SAS.