Big Data Solution – Spark Development and Data Analysis


(More information and how to enroll)

Course overview

Data scientists, engineers, and analysts build information platforms that provide deep insight and answer previously unimaginable questions. Spark and Hadoop are transforming how data scientists, engineers, and analysts work by allowing interactive and integrative data analysis at scale.

You will learn how Spark and Hadoop enable data scientists, engineers, and analysts to help companies reduce costs, increase profits, improve products, retain customers, and identify new opportunities.

You will learn what data scientists, engineers, and analysts do, the problems they solve, and the tools and techniques they use. Through in-class simulations, participants apply data analysis methods to real-world challenges in different industries and, ultimately, prepare for big data application development and big data analyst roles in the field.

Outline

Part I Fundamentals

Module 1 - Spark Introduction and Basic Programming

Introduction to Spark

What is Spark?

A brief History of Spark

Programming with RDDs

Module 2 - Advanced Spark Programming

Spark Storage - Loading and saving data

Advanced Spark Programming

Standalone applications

Module 3 - Spark SQL

Linking with Spark SQL

Using Spark SQL in Applications

JDBC/ODBC server

User-Defined Functions

Spark SQL Performance

Module 4 - Spark Streaming

Architecture and abstraction

Input/output operations

Streaming UI

Performance Considerations

Module 5 - Tuning and Debugging Spark

Configuring Spark

Key Performance considerations

Module 6 - Running on Cluster

Runtime Architecture

Cluster Manager

Part II Applications

Module 7 - Machine Learning

Designing a Machine learning system

Building a Recommendation Engine with Spark

MLlib Decision Trees

Module 8 - Prediction with Decision Trees

Decision tree

Training Examples

Preparing the data

A First Decision tree

Tuning Decision Trees

Making Predictions

Conclusions

Module 9 - Anomaly Detection with K-means Clustering

Anomaly Detection

K-means clustering

A First Take on Clustering

Choosing k

Visualization

Feature Normalisation

Clustering in action

Module 10 - Exploring Property Location Data

Loading data

Variables to explore

Exploring property value

Exploring lot size

Exploring costs

Exploring the year a property was built

Exploring rent and income

Module 11 - Estimating Financial Risk through Monte Carlo Simulation

Building the model

Getting the data

Preprocessing

Determining the factor weights

Visualizing the results

Evaluating results

Module 12 - Interactive Data Analysis with Zeppelin

Appendix Scala Programming Essentials


Victoria Education Centre - Hotline: 416-665-1888
Toronto: 250 Consumers Road, Suite 901, Toronto, Ontario, Canada M2J 4V6
Mississauga: Unit 129, 1140 Burnhamthorpe Road West, Mississauga, Ontario L5C 4E6
Copyright © 2009-2017 Victoria Toronto Training Center. All rights reserved.
