$125 | Duration: 2h 26m | Video: h264, 1920x1080 | Audio: AAC, 48kHz, 2 Ch | 550 MB
Genre: eLearning | Language: English | November 30, 2018
Apache Spark has been around for quite some time, but do you really know how to get the most out of it? This course opens up new possibilities: you will explore many aspects of Spark, including some you may never have heard of.
In this course you'll learn practical, proven techniques for improving specific aspects of programming and administering Apache Spark.
The course is organized into 7 sections, each addressing a different aspect of Spark. Across them, you will work through 5 specific techniques, with clear, hands-on instructions for carrying out common Apache Spark tasks.
The techniques are demonstrated using practical examples and best practices.
By the end of this course, you will have learned some exciting tips, best practices, and techniques with Apache Spark.
You will be able to perform tasks and extract the data you need from your databases faster and more easily.
All the code and supporting files for this course are available on GitHub at
Style and Approach
This step-by-step, fast-paced guide takes a practical approach to teaching techniques you can use to optimize your testing time, speed, and results. It will take your skills to the next level and get you up and running with Spark.
Table of Contents
TRANSFORMATIONS AND ACTIONS
IMMUTABLE DESIGN
AVOID SHUFFLE AND REDUCE OPERATIONAL EXPENSES
SAVING DATA IN THE CORRECT FORMAT
WORKING WITH SPARK KEY/VALUE API
TESTING APACHE SPARK JOBS
LEVERAGING SPARK GRAPHX API
What You Will Learn
Compose Spark jobs from actions and transformations
Create highly concurrent Spark programs by leveraging immutability
Ways to avoid the most expensive operation in the Spark API: the shuffle
How to save data for further processing by choosing the proper format for Spark to write it in
Parallelize keyed data and learn how to use Spark's Key/Value API
Redesign your jobs to use reduceByKey instead of groupBy
Create robust processing pipelines by testing Apache Spark jobs
Solve recurring problems by leveraging the GraphX API
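To illustrate why the list above recommends reduceByKey over groupBy, here is a minimal pure-Python simulation. It does not use Spark's API. It is a sketch that assumes each inner list stands in for one partition of (key, value) pairs and that "shuffled records" counts the pairs that would cross the network. The grouping approach moves every record before aggregating. The reduceByKey approach first combines values inside each partition (a map-side combine) and ships only one partial result per key per partition. In real PySpark, the reduceByKey side corresponds to something like `rdd.reduceByKey(operator.add)`. The function names and sample data below are hypothetical.

```python
"""Pure-Python simulation (NOT the Spark API) of why reduceByKey usually
shuffles less data than grouping first and reducing afterwards.

Assumption: each inner list stands in for one partition of (key, value) pairs;
"shuffled" counts the records that would cross the network.
"""
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, int]


def group_then_reduce(partitions: List[List[Pair]]) -> Tuple[Dict[str, int], int]:
    """groupBy/groupByKey style: every record is shuffled, then summed per key."""
    shuffled = 0
    groups: Dict[str, List[int]] = defaultdict(list)
    for partition in partitions:
        for key, value in partition:
            groups[key].append(value)  # every single record crosses the "network"
            shuffled += 1
    return {key: sum(values) for key, values in groups.items()}, shuffled


def reduce_by_key(partitions: List[List[Pair]]) -> Tuple[Dict[str, int], int]:
    """reduceByKey style: pre-aggregate inside each partition (map-side combine),
    then shuffle only one partial sum per key per partition."""
    shuffled = 0
    totals: Dict[str, int] = defaultdict(int)
    for partition in partitions:
        local: Dict[str, int] = defaultdict(int)
        for key, value in partition:
            local[key] += value  # local combine, nothing shuffled yet
        for key, partial in local.items():
            totals[key] += partial  # only the partial sums are shuffled
            shuffled += 1
    return dict(totals), shuffled


if __name__ == "__main__":
    # Hypothetical word-count data split across two partitions.
    data = [
        [("spark", 1), ("spark", 1), ("graphx", 1)],
        [("spark", 1), ("graphx", 1), ("graphx", 1)],
    ]
    print(group_then_reduce(data))  # ({'spark': 3, 'graphx': 3}, 6)
    print(reduce_by_key(data))      # ({'spark': 3, 'graphx': 3}, 4)
```

Both approaches produce the same totals, but the reduceByKey style moves 4 records instead of 6. With real data, where each key repeats many times per partition, this gap is usually much larger.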
Release date: 2018-12-03