Duration: 14h 7m | Video: h264, 1280x720 | Audio: AAC, 44100 Hz, 2 Ch | 21 GB
Genre: eLearning | Language: English | Beginner - Intermediate
The perfect (and fast) way to get started with Hadoop and Spark
Hadoop and Spark Fundamentals LiveLessons provides 14+ hours of video introduction to the Apache Hadoop Big Data ecosystem. The tutorial includes background information and explains the core components of Hadoop, including the Hadoop Distributed File System (HDFS), MapReduce, the YARN resource manager, and YARN frameworks. It then demonstrates how to use Hadoop at several levels, including the native Java interface, the C++ Pipes interface, and the universal streaming program interface. Examples show how to run benchmarks and use high-level tools, including the Apache Pig scripting language, the Apache Hive “SQL-like” interface, Apache Flume for streaming input, Apache Sqoop for import and export of relational data, and Apache Oozie for Hadoop workflow management. There is also comprehensive coverage of Spark, PySpark, and the Zeppelin web-GUI. Steps for easily installing a working Hadoop/Spark system on a desktop/laptop and on a local stand-alone cluster using the powerful Ambari GUI are included as well. All software used in these LiveLessons is open source and freely available for your use and experimentation. A bonus lesson includes a quick primer on the Linux command line as used with Hadoop and Spark.
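To give a sense of the streaming program interface mentioned above, here is a minimal sketch of a word-count mapper and reducer in Python. This is an illustration only, not code from the lessons; the file names mapper.py and reducer.py are assumptions.

    #!/usr/bin/env python3
    # mapper.py - read lines from standard input, emit "word<TAB>1" pairs
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print("%s\t%d" % (word, 1))

    #!/usr/bin/env python3
    # reducer.py - input arrives sorted by key, so counts can be summed per word
    import sys

    current_word = None
    current_count = 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t", 1)
        if word == current_word:
            current_count += int(count)
        else:
            # A new key means the previous word's count is complete.
            if current_word is not None:
                print("%s\t%d" % (current_word, current_count))
            current_word = word
            current_count = int(count)
    if current_word is not None:
        print("%s\t%d" % (current_word, current_count))

Because streaming programs simply read standard input and write standard output, they can be tested on the Linux command line without Hadoop at all (for example, cat input.txt | ./mapper.py | sort | ./reducer.py) before being submitted to a cluster with the hadoop-streaming JAR.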
About the Instructor
Douglas Eadline, PhD, began his career as a practitioner and a chronicler of the Linux cluster HPC revolution and now documents big data analytics. Starting with the first Beowulf Cluster how-to document, Doug has written hundreds of articles, white papers, and instructional documents covering High Performance Computing (HPC) and Data Analytics. Prior to starting and editing the popular ClusterMonkey.net website in 2005, he served as editor-in-chief for ClusterWorld Magazine, and was senior HPC editor for Linux Magazine. Currently, he is a writer and consultant to the HPC/Data Analytics industry and leader of the Limulus Personal Cluster Project. He is author of Hadoop Fundamentals LiveLessons and Apache Hadoop YARN Fundamentals LiveLessons videos from Pearson, and book coauthor of Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2 and Practical Data Science with Hadoop and Spark: Designing and Building Effective Analytics at Scale. He is also the sole author of Hadoop 2 Quick-Start Guide: Learn the Essentials of Big Data Computing in the Apache Hadoop 2 Ecosystem.
Skill Level
Beginner
Intermediate
Learn How To
Understand Hadoop design and key components
Understand how the MapReduce process works in Hadoop
Understand the relationship of Spark and Hadoop
Understand key aspects of the YARN design and YARN frameworks
Use, administer, and program HDFS
Run and administer Hadoop/Spark programs
Write basic MapReduce/Spark programs (a short PySpark sketch follows this list)
Install Hadoop/Spark on a laptop/desktop
Run Apache Pig, Hive, Flume, Sqoop, Oozie, and Spark applications
Perform basic data ingest with Hive and Spark
Use the Zeppelin web-GUI for Spark/Hive programming
Install and administer Hadoop with the Apache Ambari GUI tool
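As a taste of the basic Spark programming covered above, here is a minimal PySpark word count. It is a sketch assuming a local PySpark installation; the input path input.txt is a placeholder.

    from pyspark.sql import SparkSession

    # Build (or reuse) a local Spark session; "local[*]" uses all available cores.
    spark = SparkSession.builder.master("local[*]").appName("wordcount").getOrCreate()

    # Read a text file into an RDD of lines (the path is a placeholder).
    lines = spark.sparkContext.textFile("input.txt")

    # Classic map/reduce pipeline: split lines, pair each word with 1, sum by key.
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))

    for word, count in counts.collect():
        print(word, count)

    spark.stop()

The same script can be run locally with spark-submit or pasted into the pyspark shell, and it scales to a YARN cluster by changing only the master setting.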
Who Should Take This Course
Users, developers, and administrators interested in learning the fundamental aspects and operations of the open source Hadoop and Spark ecosystems
Course Requirements
Basic understanding of programming and development
A working knowledge of Linux systems and tools
Familiarity with Bash, Python, Java, and C++
Release date: 2018-02-24