Data Science Overview | Technologies, Tools, and Roles in the Data-Driven Enterprise
Course Details:
This foundation-level course introduces the multi-disciplinary Data Science team to the many evolving and related terms of the field, with a focus on Big Data, Data Science, Predictive Analytics, Artificial Intelligence, Data Mining, and Data Warehousing. You’ll also explore the current state of the art and science, the major components of a modern data science infrastructure, and team roles and responsibilities, and you’ll set realistic expectations for the possible outcomes of your investment.
This course provides a high-level view of current data-science-related technologies, concepts, strategies, skillsets, initiatives, and supporting tools in common enterprise business practice. Its goal is to provide you with a baseline understanding of core concepts.
Call (919) 283-1674 to get a class scheduled online or in your area!
Foundations
- Grids and Virtualization
- Service-Oriented Architecture
- Enterprise Service Bus
- Enterprise Message Bus
- The Cloud
The Hadoop Ecosystem
- HDFS: Hadoop Distributed File System
- Resource Negotiators: YARN, Mesos, and Spark; ZooKeeper
- Hadoop Map/Reduce
- Spark
- Hadoop Ecosystem Distributions: Cloudera, Hortonworks, Open Source
Big Data, NOSQL, and ETL
- Big Data vs. RDBMS
- NOSQL: Not Only SQL
- Relational Databases: Oracle, MariaDB, Db2, SQL Server, PostgreSQL
- Key/Value Databases: JBoss Infinispan, Terracotta, Dynamo, Voldemort
- Columnar Databases: Cassandra, HBase, BigTable
- Document Databases: MongoDB, CouchDB/Couchbase
- Graph Databases: Giraph, Neo4j, GraphX
- Apache Hive
- Common Data Formats
- Leveraging SQL and SQL variants
ETL: Extract, Transform, Load
- Data Ingestion, Transformation, and Loading
- Exporting Data
- Sqoop, Flume, Informatica, and other tools
Enterprise Integration Patterns and Message Busses
- Enterprise Integration Patterns: Apache Camel and Spring Integration
- Enterprise Message Busses: Apache Kafka, ActiveMQ, and other tools
Developing in Hadoop Ecosystem
- Languages: R, Python, Java, Scala, Pig, and BPMN
- Libraries and Frameworks
- Development, Testing, and Deployment
Artificial Intelligence and Business Systems
- Artificial Intelligence: Myths, Legends, and Reality
- The Math
- Statistics
- Probability
- Clustering Algorithms, Mahout, MLlib, scikit-learn, and MADlib
- Business Rule Systems: Drools, JRules, Pegasus
The Team
- Agile Data Science
- NOSQL Data Architects and Administrators
- Developers
- Grid Administrators
- Business and Data Analysts
- Management
- Evolving your Team
- Growing your Infrastructure
*Please Note: Course Outline is subject to change without notice. Exact course outline will be provided at time of registration.
Join an engaging learning environment, where you’ll explore:
- Foundations: Grids & Virtualization; SOA, ESB/EMB and the Cloud
- The Hadoop Ecosystem: HDFS, Resource Negotiators, MapReduce, Spark, and Distributions
- Big Data, NOSQL, and ETL
- ETL: Extract, Transform, Load
- Handling Data and a Survey of Useful tools
- Enterprise Integration Patterns and Message Busses
- Developing in Hadoop Ecosystem: R, Python, Java, Scala, Pig, and BPMN
- Artificial Intelligence and Business Systems
- Who’s on the Team? Roles and Functions in Data Science
- Growing your Infrastructure
This is a seminar-style course that combines engaging expert lectures, pertinent skills, tool demonstrations, and group discussions.
Attendees should have:
- Exposure to Enterprise Information Technology
- Familiarity with Relational Databases
This course is intended for Business Analysts, Data Analysts, Data Architects, Database Administrators, Network Administrators (Grid), Developers, Technical Managers, and anyone else in the data science realm who needs a baseline understanding of the core areas of modern Data Science technologies, practices, and tools.