Hadoop and Spark Fundamentals


Duration: 14h 7m | Video: h264, 1280x720 | Audio: AAC, 44100 Hz, 2 Ch | 21 GB

Genre: eLearning | Language: English | Beginner - Intermediate

The perfect (and fast) way to get started with Hadoop and Spark

Hadoop and Spark Fundamentals LiveLessons provides more than 14 hours of video introduction to the Apache Hadoop big data ecosystem. The tutorial includes background information and explains the core components of Hadoop, including the Hadoop Distributed File System (HDFS), MapReduce, the YARN resource manager, and YARN frameworks. It also demonstrates how to use Hadoop at several levels, including the native Java interface, C++ pipes, and the universal streaming program interface.

Examples show how to use benchmarks and high-level tools, including the Apache Pig scripting language, the Apache Hive "SQL-like" interface, Apache Flume for streaming input, Apache Sqoop for import and export of relational data, and Apache Oozie for Hadoop workflow management. There is also comprehensive coverage of Spark, PySpark, and the Zeppelin web GUI.

The steps for easily installing a working Hadoop/Spark system on a desktop/laptop, and on a local stand-alone cluster using the powerful Ambari GUI, are also included. All software used in these LiveLessons is open source and freely available for your use and experimentation. A bonus lesson includes a quick primer on the Linux command line as used with Hadoop and Spark.
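To give a flavor of the Spark material, the short PySpark sketch below counts words in a text file, the classic first MapReduce/Spark exercise. The application name and the HDFS input path are illustrative assumptions, not code from the lessons.

    from pyspark.sql import SparkSession

    # Start (or reuse) a local Spark session; the app name is arbitrary.
    spark = SparkSession.builder.appName("WordCount").getOrCreate()

    # Read a plain-text file into an RDD of lines (hypothetical path).
    lines = spark.sparkContext.textFile("hdfs:///user/demo/input.txt")

    counts = (lines.flatMap(lambda line: line.split())  # line -> words
                   .map(lambda word: (word, 1))         # word -> (word, 1)
                   .reduceByKey(lambda a, b: a + b))    # sum counts per word

    # Show the ten most frequent words.
    for word, count in counts.takeOrdered(10, key=lambda kv: -kv[1]):
        print(word, count)

    spark.stop()

The chain of flatMap, map, and reduceByKey mirrors the map and reduce phases described above, which is why word count is the customary bridge between Hadoop MapReduce and Spark.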

About the Instructor

Douglas Eadline, PhD, began his career as a practitioner and chronicler of the Linux cluster HPC revolution and now documents big data analytics. Starting with the first Beowulf Cluster how-to document, Doug has written hundreds of articles, white papers, and instructional documents covering High Performance Computing (HPC) and data analytics. Before starting and editing the popular ClusterMonkey.net website in 2005, he served as editor-in-chief of ClusterWorld Magazine and as senior HPC editor for Linux Magazine. He is currently a writer and consultant to the HPC/data analytics industry and leader of the Limulus Personal Cluster Project. He is the author of the Hadoop Fundamentals LiveLessons and Apache Hadoop YARN Fundamentals LiveLessons videos from Pearson, coauthor of the books Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2 and Practical Data Science with Hadoop and Spark: Designing and Building Effective Analytics at Scale, and sole author of Hadoop 2 Quick-Start Guide: Learn the Essentials of Big Data Computing in the Apache Hadoop 2 Ecosystem.

Skill Level

Beginner

Intermediate

Learn How To

Understand Hadoop design and key components

Understand how the MapReduce process works in Hadoop

Understand the relationship between Spark and Hadoop

Understand key aspects of the new YARN design and frameworks

Use, administer, and program HDFS

Run and administer Hadoop/Spark programs

Write basic MapReduce/Spark programs (a short Python streaming sketch follows this list)

Install Hadoop/Spark on a laptop/desktop

Run Apache Pig, Hive, Flume, Sqoop, Oozie, and Spark applications

Perform basic data ingest with Hive and Spark

Use the Zeppelin web GUI for Spark/Hive programming

Install and administer Hadoop with the Apache Ambari GUI tool
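For the streaming interface mentioned above, a mapper and a reducer can be ordinary Python scripts that read stdin and write stdout. The minimal word-count pair below is a sketch; the file names mapper.py and reducer.py are illustrative assumptions, not the course's own code. First, saved as mapper.py:

    #!/usr/bin/env python3
    # Hadoop Streaming mapper: emit "word<TAB>1" for every word on stdin.
    import sys

    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

And saved as reducer.py:

    #!/usr/bin/env python3
    # Hadoop Streaming reducer: the framework delivers mapper output sorted
    # by key, so identical words arrive together; sum the 1s for each word.
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        word, _, count = line.rstrip("\n").partition("\t")
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print(f"{current_word}\t{current_count}")
            current_word, current_count = word, int(count)

    if current_word is not None:
        print(f"{current_word}\t{current_count}")

Because the scripts use only stdin and stdout, the whole flow can be tested locally with a shell pipeline such as cat input.txt | ./mapper.py | sort | ./reducer.py before the pair is submitted to a cluster with the hadoop-streaming JAR.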

Who Should Take This Course

Users, developers, and administrators interested in learning the fundamental aspects and operations of the open source Hadoop and Spark ecosystems

Course Requirements

Basic understanding of programming and development

A working knowledge of Linux systems and tools

Familiarity with Bash, Python, Java, and C++


Release date: 2018-02-24