Spark Development and Data Analysis
(More materials and details on how to participate)
Course overview
Data scientists, engineers, and analysts build information platforms that provide deep insight and answer previously unimaginable questions. Spark and Hadoop are transforming how they work by allowing interactive and iterative data analysis at scale.
You will learn how Spark and Hadoop enable data scientists, engineers, and analysts to help companies reduce costs, increase profits, improve products, retain customers, and identify new opportunities.
You will learn what data scientists, engineers, and analysts do, the problems they solve, and the tools and techniques they use. Through in-class simulations, participants apply data analysis methods to real-world challenges in different industries and, ultimately, prepare for big data application development and big data analyst roles in the field.
Outline
Part I Fundamentals
Module 1 - Spark Introduction and Basic Programming
          Introduction to Spark
          What is Spark?
          A Brief History of Spark
          Programming with RDDs
Module 2 - Advanced Spark Programming
          Spark Storage - Loading and Saving Data
          Advanced Spark Programming
          Standalone Applications
Module 3 - Spark SQL
          Linking with Spark SQL
          Using Spark SQL in Applications
          JDBC/ODBC Server
          User-Defined Functions
          Spark SQL Performance
Module 4 - Spark Streaming
          Architecture and abstraction
          Input/output operations
          Streaming UI
          Performance Considerations
Module 5 - Tuning and Debugging Spark
          Configuring Spark
          Key Performance Considerations
Module 6 - Running on a Cluster
          Runtime Architecture
          Cluster Manager
Part II Applications
Module 7 - Machine Learning
          Designing a Machine Learning System
          Building a Recommendation Engine with Spark
          MLlib Decision Trees
Module 8 - Prediction with Decision Trees
          Decision Trees
          Training Examples
          Preparing the Data
          A First Decision Tree
          Tuning Decision Trees
          Making Predictions
          Conclusions
Module 9 - Anomaly Detection with K-means Clustering
          Anomaly Detection
          K-means Clustering
          A First Take on Clustering
          Choosing k
          Visualization
          Feature Normalisation
          Clustering in Action
Module 10 - Exploring Property Location Data
          Loading data
          Variables to explore
          Exploring property value
          Exploring lot size
          Exploring costs
          Exploring the year a property was built
          Exploring rent and income
Module 11 - Estimating Financial Risk through Monte Carlo Simulation
          Building the model
          Getting the data
          Preprocessing
          Determining the factor weights
          Visualizing the results
          Evaluating results
Module 12 - Interactive Data Analysis with Zeppelin
Appendix - Scala Programming Essentials

