Data Mining Techniques: Theory and Practice
Course Details:
In this course, you will learn a data mining methodology that is a superset of the SAS SEMMA methodology around which SAS Enterprise Miner is organized. You will also learn about a wide range of data mining algorithms, gaining both theoretical knowledge and practical skills. In this class, you will work through all the steps of a data mining project, beginning with problem definition and data selection and continuing through data exploration, data transformation, sampling, partitioning, modeling, and assessment.
Call (919) 283-1674 to get a class scheduled online or in your area!
1. Introduction to Data Mining
- What is data mining?
- Directed and undirected data mining
- Models
- Profiling and prediction
2. Data Mining Methodology
- Why have a methodology?
- How data miners can inadvertently learn things that are not true
- Translating business problems into data mining problems
- The importance of model stability
- Finding the right input variables
- Sampling to create balanced model sets
- Partitioning to create training, validation, and test sets
- Data preparation
- Model assessment
3. Data Exploration
- Developing intuition about data
- Data structure
- Data types
- Data values
- Exploring distributions
- Summary statistics
- Histograms
- Using SAS Enterprise Miner for data exploration
4. Regression Models
- The null hypothesis
- Statistical significance
- Confidence bounds
- Variance and standard deviation
- Standardized values
- Correlation
- Linear regression
- Logistic regression
- Using SAS Enterprise Miner to build regression models
5. Decision Trees
- Decision trees as data exploration and classification tools
- Decision trees for modeling and scoring
- Decision trees for variable selection
- Alternate representations of decision trees
- Algorithms used to build decision trees
- Splitting criteria
- Recognizing instability and overfitting in decision tree models
- Capturing interactions between variables
- Using SAS Enterprise Miner to build decision trees
6. Neural Networks
- Origins of neural networks
- Neural networks compared with regression
- Algorithms used to train neural networks
- Data preparation requirements for neural networks
- Picking appropriate inputs for neural networks
- Creating neural network models using SAS Enterprise Miner
7. Memory-Based Reasoning
- Similarity and distance
- Distance metrics appropriate for different kinds of data
- The role of the training set in memory-based reasoning (MBR)
- Combining the votes of several neighbors
- Other K-nearest neighbor techniques
- Collaborative filtering
- Using the SAS Enterprise Miner MBR node
8. Clustering
- More on similarity and distance
- The k-means algorithm
- Divisive clustering
- Agglomerative clustering
- Data preparation for clustering
- Interpreting clusters
- Finding clusters with SAS Enterprise Miner
9. Survival Analysis
- Origins of survival analysis
- How business data is different from clinical data
- Hazards and hazard charts
- Retention curves and survival curves
- Calculating survival from retention
- Calculating hazards empirically
- Parametric hazard models
- Censoring
- Competing risks
- Survival-based forecasting
- Using SAS code in SAS Enterprise Miner to create survival curves
10. Association Rules
- Market basket analysis
- Association rules
- Sequential pattern analysis
- Using SAS Enterprise Miner to discover associations in retail data
11. Link Analysis
- Background on graph theory
- Sphere of influence
- Using link analysis to generate derived variables
- Graph-coloring algorithm
- Kleinberg's algorithm
12. Genetic Algorithms
- Optimization techniques and problems (SAS/OR software)
- Other algorithms
- Linear programming problems
- Genetic algorithms
*Please Note: Course Outline is subject to change without notice. Exact course outline will be provided at time of registration.
Exercises or hands-on workshops are included with most SAS courses.
Who Should Attend:
- Business analysts and their managers
- Statisticians