Big Data Solution – Hadoop Development
Introduction to Big Data
All about Data!
Data Storage and Analysis
Comparison with Other Systems
Relational Database Management System
Grid Computing
Volunteer Computing
A Brief History of Hadoop
Compatibility
Installing Single-Node Hadoop
Prerequisites
Installation
Configuration
Standalone Mode
Pseudo-Distributed Mode
Configuring SSH
Formatting the HDFS Filesystem
Starting and stopping MapReduce
Fully Distributed Mode
Creating Eclipse Plugin for Hadoop-2.x.0
Contents
Download and install Eclipse
Install git
Download source code for Hadoop Plugin for Eclipse from git
Compile and create jar
Install the plugin in Eclipse
Developing a MapReduce Application
The Configuration API
Combining Resources
Variable Expansion
Setting Up the Development Environment
Managing Configuration
GenericOptionsParser, Tool, and ToolRunner
Writing a Unit Test with MRUnit
Mapper
Reducer
Running Locally on Test Data
Running a Job in a Local Job Runner
Testing the Driver
Running on a Cluster
Packaging a Job
Launching a Job
The MapReduce Web UI
Retrieving the Results
Debugging a Job
Hadoop Logs
Remote Debugging
Tuning a Job
Profiling Tasks
MapReduce Workflows
Decomposing a Problem into MapReduce Jobs
JobControl
Apache Oozie
MapReduce Features
Counters
Built-in Counters
User-Defined Java Counters
User-Defined Streaming Counters
Sorting
Preparation
Partial Sort
Total Sort
Secondary Sort
Joins
Map-Side Joins
Reduce-Side Joins
Side Data Distribution
Using the Job Configuration
Distributed Cache
MapReduce Library Classes
Setting Up a Hadoop Cluster
Cluster Specification
Network Topology
Cluster Setup and Installation
Installing Java
Creating a Hadoop User
Installing Hadoop
Testing the Installation
SSH Configuration
Hadoop Configuration
Configuration Management
Environment Settings
Important Hadoop Daemon Properties
Hadoop Daemon Addresses and Ports
Other Hadoop Properties
User Account Creation
YARN Configuration
Important YARN Daemon Properties
YARN Daemon Addresses and Ports
Security
Kerberos and Hadoop
Delegation Tokens
Other Security Enhancements
Benchmarking a Hadoop Cluster
Hadoop Benchmarks
User Jobs
Hadoop in the Cloud
Apache Whirr
Administering Hadoop
HDFS
Persistent Data Structures
Safe Mode
Audit Logging
Tools
Monitoring
Logging
Metrics
Java Management Extensions
Maintenance
Routine Administration Procedures
Commissioning and Decommissioning Nodes
Upgrades
Pig
Installing and Running Pig
Execution Types
Running Pig Programs
Grunt
Pig Latin Editors
An Example
Generating Examples
Comparison with Databases
Pig Latin
Structure
Statements
Expressions
Types
Schemas
Functions
Macros
User-Defined Functions
A Filter UDF
An Eval UDF
A Load UDF
Data Processing Operators
Loading and Storing Data
Filtering Data
Grouping and Joining Data
Sorting Data
Combining and Splitting Data
Pig in Practice
Parallelism
Parameter Substitution
Hive
Installing Hive
The Hive Shell
An Example
Running Hive
Configuring Hive
Hive Services
The Metastore
Comparison with Traditional Databases
Schema on Read Versus Schema on Write
Updates, Transactions, and Indexes
HiveQL
Data Types
Operators and Functions
Tables
Managed Tables and External Tables
Partitions and Buckets
Storage Formats
Importing Data
Altering Tables
Dropping Tables
Querying Data
Sorting and Aggregating
MapReduce Scripts
Joins
Subqueries
Views
User-Defined Functions
Writing a UDF
Writing a UDAF
HBase
HBasics
Backdrop
Concepts
Whirlwind Tour of the Data Model
Implementation
Installation
Test Drive
Clients
Java
Avro, REST, and Thrift
Example
Schemas
Loading Data
Web Queries
HBase Versus RDBMS
Successful Service
HBase
Use Case: HBase at Streamy.com
Praxis
Versions
HDFS
UI
Metrics
Schema Design
Counters
Bulk Load
R and Hadoop
Introduction to the R Language
Introduction to the RHadoop Big Data Solution
RHadoop
RHadoop data analysis
RHadoop machine learning
Python and Hadoop
Python Programming
Python and Hadoop
Hadoop - mrjob development
Spark
Introduction to Spark
PySpark
Machine Learning
Advanced Administration and monitoring
Multiple nodes
Add nodes
Decommission nodes
Recovering from Namenode failure
Monitoring cluster health using Ganglia - Pure Monitoring
Install Ambari - Management and Monitoring
Install Hue - Emphasis on using and managing the Hadoop environment
Cloudera Hadoop Certification
CCAH - Hadoop Administrator
CCDH – Hadoop Developer
Case Studies
Hadoop Usage at Last.fm
Last.fm: The Social Music Revolution
Hadoop at Last.fm
Generating Charts with Hadoop
The Track Statistics Program
Summary
(More materials and details on how to attend)