Apache Spark Data Analytics Best Practices & Troubleshooting


Last updated 4/2019
MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz
Language: English | Size: 4.98 GB | Duration: 9h 52m

Perform analytics on real-time data by discovering techniques to test and parallelize Spark jobs & solve common problems

What you'll learn
Implement high-velocity streaming and data processing use cases while working with the streaming API.
Dive into MLlib, the machine learning library in Spark, with its highly scalable algorithms.
Create machine learning pipelines to combine multiple algorithms in a single workflow.
Create highly concurrent Spark programs by leveraging immutability.
Re-design your jobs to use reduceByKey instead of groupBy.
Create robust processing pipelines by testing Apache Spark jobs.
Solve repeated problems by leveraging the GraphX API.
Solve long-running computation problems by leveraging lazy evaluation in Spark.
Avoid memory leaks by understanding the internal memory management of Apache Spark.
Troubleshoot real-time pipelines written in Spark Streaming, and choose between the DataFrame and Dataset APIs for joins.
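To make the reduceByKey point in the list above concrete, here is a minimal sketch in plain Python. It is not Spark code and not taken from the course. It only models the idea that combining values inside each partition before the shuffle moves fewer records than grouping first. The function names, the `partitions` input, and the "shuffled record" counter are all assumptions made for illustration.

```python
from collections import defaultdict


def group_then_sum(partitions):
    """Model of a groupBy followed by a sum: every record crosses the shuffle."""
    shuffled = [pair for part in partitions for pair in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    totals = {key: sum(values) for key, values in grouped.items()}
    return totals, len(shuffled)


def reduce_by_key(partitions, func):
    """Model of reduceByKey: combine inside each partition, then merge."""
    shuffled = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        # At most one record per key leaves each partition.
        shuffled.extend(local.items())
    totals = {}
    for key, value in shuffled:
        totals[key] = func(totals[key], value) if key in totals else value
    return totals, len(shuffled)


# Hypothetical input: two partitions of (word, count) pairs.
partitions = [
    [("spark", 1), ("spark", 1), ("kafka", 1)],
    [("spark", 1), ("kafka", 1), ("kafka", 1)],
]

grouped_totals, grouped_shuffled = group_then_sum(partitions)
reduced_totals, reduced_shuffled = reduce_by_key(partitions, lambda a, b: a + b)

print(grouped_totals == reduced_totals)    # same result either way
print(grouped_shuffled, reduced_shuffled)  # records moved in each model
```

Both models produce the same totals. The second one moves fewer records because each partition sends only one partial result per key, which is the motivation usually given for preferring reduceByKey over a group-then-aggregate design.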
Requirements
Basic knowledge of Spark programming and the fundamentals of Apache Spark, along with some basic understanding of real-time data processing, is necessary.
Some familiarity with Scala would be helpful.
Description
If you struggle to analyze real-time data, build real-world stream processing in Spark, or keep hitting common pitfalls in your Spark code, and you are looking for best practices that help you write better, faster, and more efficient code for analyzing large amounts of data, then this learning series is perfect for you!

With this well-thought-out Learning Path, you will begin by learning the fundamentals of Apache Spark, including Resilient Distributed Datasets (RDDs), HDFS, and YARN, and how to create an effective Spark application and execute it on a Hadoop cluster. Then you will learn to analyze data using machine learning techniques and graphs. Moving further, you will focus on tips and tricks to improve particular aspects of programming and administration in Apache Spark, and speed up your Spark jobs by reducing shuffles. Finally, you will learn quick and simple solutions to troubleshoot development issues, along with debugging techniques for Apache Spark.

Contents and Overview

This training program includes 4 complete courses, carefully chosen to give you the most comprehensive training possible.

The first course, Apache Spark Fundamentals, begins with Apache Spark programming fundamentals such as Resilient Distributed Datasets (RDDs) and shows which operations can be used to perform a transformation or an action on an RDD. We'll show you how to load and save data from various data sources, such as different types of files, NoSQL and RDBMS databases, etc. We'll also explain advanced Spark programming concepts such as managing key-value pairs, accumulators, etc. Finally, you'll discover how to create an effective Spark application and execute it on a Hadoop cluster to gain insights from the data and make informed business decisions. By the end of this course, you will be well-versed in the fundamentals of Apache Spark and able to implement them.

The second course, Advanced Analytics and Real-Time Data Processing in Apache Spark, teaches you how to implement high-velocity streaming operations for data processing in order to perform efficient analytics on your real-time data. You'll learn about Spark Streaming and create real-world stream processing that addresses the problems that need to be solved. You'll analyze data using machine learning techniques and find out about the tools available in the MLlib toolkit. You'll find out how to leverage graphs to solve real-world problems. Toward the end of the course, you'll also see some useful machine learning algorithms with the help of Spark MLlib and integrate Spark with R. We'll also make sure you're confident and prepared for graph processing, as you'll learn more about the GraphX API. By the end, you'll be well-versed in the aspects of real-time analytics and able to implement them with Apache Spark.

The third course, Apache Spark: Tips, Tricks, & Techniques, teaches you to implement practical and proven techniques to improve particular aspects of programming and administration in Apache Spark. You will explore 7 sections that address different aspects of Spark via 5 specific techniques, with clear instructions on how to carry out different Apache Spark tasks with hands-on experience. The techniques are demonstrated using practical examples and best practices. By the end of this course, you will have learned some exciting tips, best practices, and techniques with Apache Spark. You will be able to perform tasks and get the best data out of your databases much faster and with ease.

The fourth course, Troubleshooting Apache Spark, opens up new possibilities and covers many aspects of Apache Spark, some you may know and some you probably never knew existed. If learning Spark and performing tasks on it takes you a lot of time, you cannot leverage Apache Spark's full capabilities and features, and you hit roadblocks in your development journey. Common problems and bugs keep you from optimizing your development process, and you end up looking for techniques that can save you from pitfalls and common errors during development. With this course, you'll learn to implement practical, proven, and well-researched techniques to improve particular aspects of Apache Spark. To do that, you need to understand the common problems Spark developers face, collate them, and build simple solutions for them; one way to find such issues is to look at Stack Overflow questions. This is a high-quality troubleshooting course that highlights issues developers face at different stages of application development and provides simple, practical solutions to them. Beyond supplying solutions to these problems and challenges, it also focuses on discovering new possibilities with Apache Spark. By the end of this course, you will have solved your Spark problems without any hassle.

About the Authors:

Nishant Garg has over 16 years of software architecture and development experience in various technologies, such as Java Enterprise Edition, SOA, Spring, Hadoop, Hive, Flume, Sqoop, Oozie, Spark, YARN, Impala, Kafka, Storm, Solr/Lucene, NoSQL databases (such as HBase, Cassandra, and MongoDB), and MPP databases (such as Greenplum). He received his MS in software systems from the Birla Institute of Technology and Science, Pilani, India, and is currently working as a senior technical architect for the Big Data R&D Labs with Impetus Infotech Pvt. Ltd. Previously, Nishant has enjoyed working with some of the most recognizable names in IT services and financial industries, employing full software life cycle methodologies such as Agile and SCRUM. Nishant has also undertaken many speaking engagements on big data technologies and is the author of Learning Apache Kafka and HBase Essentials, Packt Publishing.

Tomasz Lelek is a Software Engineer and Co-Founder of InitLearn. He mostly programs in Java and Scala. He dedicates his time and effort to getting better at everything. He is currently diving into Big Data technologies. Tomasz is very passionate about everything associated with software development. He has been a speaker at a few conferences in Poland (Confitura and JDD) and at the Krakow Scala User Group. He has also conducted a live coding session at Geecon Conference and was a speaker at an international event in Dhaka. He is very enthusiastic and loves to share his knowledge.

Amazon Keywords: Data processing, data modeling, data analysis, data analytics, graphical processing, data frame operations, R algorithm.
Overview
Section 1: Apache Spark Fundamentals
Lecture 1 Course Overview
Lecture 2 Spark Introduction
Lecture 3 Spark Components
Lecture 4 Getting Started
Lecture 5 Introduction to Hadoop
Lecture 6 Hadoop Processes and Components
Lecture 7 HDFS and YARN
Lecture 8 Map Reduce
Lecture 9 Introduction to Scala
Lecture 10 Scala Programming Fundamentals
Lecture 11 Objects in Scala
Lecture 12 Collections
Lecture 13 Spark Execution
Lecture 14 Understanding RDD
Lecture 15 RDD Operations
Lecture 16 Loading and Saving Data in Spark
Lecture 17 Managing Key-Value Pairs
Lecture 18 Accumulators
Lecture 19 Writing a Spark Application
Section 2: Advanced Analytics and Real-Time Data Processing in Apache Spark
Lecture 20 The Course Overview
Lecture 21 Introducing Spark Streaming
Lecture 22 Streaming Context
Lecture 23 Processing Streaming Data
Lecture 24 Use Cases
Lecture 25 Spark Streaming Word Count Hands-On
Lecture 26 Spark Streaming - Understanding Master URL
Lecture 27 Integrating Spark Streaming with Apache Kafka
Lecture 28 mapWithState Operation
Lecture 29 Transform and Window Operation
Lecture 30 Join and Output Operations
Lecture 31 Output Operations - Saving Results to Kafka Sink
Lecture 32 Handling Time in High Velocity Streams
Lecture 33 Connecting External Systems That Work in At Least Once Guarantee - Deduplication
Lecture 34 Building Streaming Application - Handling Events That Are Not in Order
Lecture 35 Filtering Bots from Stream of Page View Events
Lecture 36 Introducing Machine Learning with Spark
Lecture 37 Feature Extraction and Transformation
Lecture 38 Transforming Text into Vector of Numbers - ML Bag-of-Words Technique
Lecture 39 Logistic Regression
Lecture 40 Model Evaluation
Lecture 41 Clustering
Lecture 42 Gaussian Mixture Model
Lecture 43 Principal Component Analysis and Distributing the Singular Value Decomposition
Lecture 44 Collaborative Filtering - Building Recommendation Engine
Lecture 45 Introducing Spark GraphX - How to Represent a Graph?
Lecture 46 Limitations of Graph-Parallel System - Why Spark GraphX?
Lecture 47 Importing GraphX
Lecture 48 Create a Graph Using GraphX and Property Graph
Lecture 49 List of Operators
Lecture 50 Perform Graph Operations Using GraphX
Lecture 51 Triplet View
Lecture 52 Perform Subgraph Operations
Lecture 53 Neighbourhood Aggregations - Collecting Neighbours
Lecture 54 Counting Degree of Vertex
Lecture 55 Caching and Uncaching
Lecture 56 GraphBuilder
Lecture 57 Vertex and Edge RDD
Lecture 58 Structural Operators - Connected Components
Lecture 59 Introduction to SparkR and How It's Used?
Lecture 60 Setting Up from RStudio
Lecture 61 Creating Spark DataFrames from Data Sources
Lecture 62 SparkDataFrames Operations - Grouping, Aggregation
Lecture 63 Run a Given Function on a Large Dataset Using dapply or dapplyCollect
Lecture 64 Running Large Dataset by Input Column(s) and Using gapply or gapplyCollect
Lecture 65 Run Local R Functions Distributed Using spark.lapply
Lecture 66 Running SQL Queries from SparkR
Lecture 67 PageRank Using Spark GraphX
Lecture 68 Sending Real-Time Notification When User Wants to Buy a Product on E-Commerce Site
Section 3: Apache Spark: Tips, Tricks, & Techniques
Lecture 69 The Course Overview
Lecture 70 Using Spark Transformations to Defer Computations to a Later Time
Lecture 71 Avoiding Transformations
Lecture 72 Using reduce and reduceByKey to Calculate Results
Lecture 73 Performing Actions That Trigger Computations
Lecture 74 Reusing the Same RDD for Different Actions
Lecture 75 Delve into Spark RDDs Parent/Child Chain
Lecture 76 Using RDD in an Immutable Way
Lecture 77 Using DataFrame Operations to Transform It
Lecture 78 Immutability in the Highly Concurrent Environment
Lecture 79 Using Dataset API in an Immutable Way
Lecture 80 Detecting a Shuffle in a Processing
Lecture 81 Testing Operations That Cause Shuffle in Apache Spark
Lecture 82 Changing Design of Jobs with Wide Dependencies
Lecture 83 Using keyBy() Operations to Reduce Shuffle
Lecture 84 Using Custom Partitioner to Reduce Shuffle
Lecture 85 Saving Data in Plain Text
Lecture 86 Leveraging JSON as a Data Format
Lecture 87 Tabular Formats - CSV
Lecture 88 Using Avro with Spark
Lecture 89 Columnar Formats - Parquet
Lecture 90 Available Transformations on Key/Value Pairs
Lecture 91 Using aggregateByKey Instead of groupBy()
Lecture 92 Actions on Key/Value Pairs
Lecture 93 Available Partitioners on Key/Value Data
Lecture 94 Implementing Custom Partitioner
Lecture 95 Separating Logic from Spark Engine - Unit Testing
Lecture 96 Integration Testing Using SparkSession
Lecture 97 Mocking Data Sources Using Partial Functions
Lecture 98 Using ScalaCheck for Property-Based Testing
Lecture 99 Testing in Different Versions of Spark
Lecture 100 Creating Graph from Datasource
Lecture 101 Using Vertex API
Lecture 102 Using Edge API
Lecture 103 Calculate Degree of Vertex
Lecture 104 Calculate Page Rank
Section 4: Troubleshooting Apache Spark
Lecture 105 The Course Overview
Lecture 106 Eager Computations: Lazy Evaluation
Lecture 107 Caching Values: In-Memory Persistence
Lecture 108 Unexpected API Behavior: Picking the Proper RDD API
Lecture 109 Wide Dependencies: Using Narrow Dependencies
Lecture 110 Making Computations Parallel: Using Partitions
Lecture 111 Defining Robust Custom Functions: Understanding User-Defined Functions
Lecture 112 Logical Plans Hiding the Truth: Examining the Physical Plans
Lecture 113 Slow Interpreted Lambdas: Code Generation Spark Optimization
Lecture 114 Avoid Wrong Join Strategies: Using a Join Type Based on Data Volume
Lecture 115 Slow Joins: Choosing an Execution Plan for Join
Lecture 116 Distributed Joins Problem: DataFrame API
Lecture 117 TypeSafe Joins Problem: The Newest DataSet API
Lecture 118 Minimizing Object Creation: Reusing Existing Objects
Lecture 119 Iterating Transformations - The mapPartitions() Method
Lecture 120 Slow Spark Application Start: Reducing Setup Overhead
Lecture 121 Performing Unnecessary Recomputation: Reusing RDDs
Lecture 122 Repeating the Same Code in Stream Pipeline: Using Sources and Sinks
Lecture 123 Long Latency of Jobs: Understanding Batch Internals
Lecture 124 Fault Tolerance: Using Data Checkpointing
Lecture 125 Maintaining Batch and Streaming: Using Structured Streaming Pros
This course is for data scientists, big data technology developers and analysts, and Apache Spark developers who want to learn the fundamentals of Apache Spark and improve their Apache Spark skills with amazing tricks and techniques.


Download link

rapidgator.net:

Code: https://rapidgator.net/file/a0f567b853006b4659b9205c918192c4/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part1.rar.html
https://rapidgator.net/file/4c20eb70490a83d1d1c510bd953eb6fa/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part2.rar.html
https://rapidgator.net/file/89a2bd9ba580458915ada05cec65842b/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part3.rar.html
https://rapidgator.net/file/efb5d7da77c5e6d3e4d262a39bd37176/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part4.rar.html
https://rapidgator.net/file/2593172d276e9b71bcf2ab72ed99a62b/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part5.rar.html
https://rapidgator.net/file/4eb5848d9f2ac12f58d3ce17fa7ba9cb/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part6.rar.html

uploadgig.com:

Code: https://uploadgig.com/file/download/a9429e04ebD7a1dc/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part1.rar
https://uploadgig.com/file/download/b8f41ff4e6af6137/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part2.rar
https://uploadgig.com/file/download/69b29f6179CBE811/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part3.rar
https://uploadgig.com/file/download/C9E5ceDfA178f8f8/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part4.rar
https://uploadgig.com/file/download/A4e2dda5E891864d/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part5.rar
https://uploadgig.com/file/download/88fdef00c22b81ff/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part6.rar

nitroflare.com:

Code: https://nitroflare.com/view/81ED5E86A22FAA5/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part1.rar
https://nitroflare.com/view/26E3DFFBEF43D60/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part2.rar
https://nitroflare.com/view/933959B8B79FF07/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part3.rar
https://nitroflare.com/view/5C15B2907A4318A/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part4.rar
https://nitroflare.com/view/5A55B84C3815D81/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part5.rar
https://nitroflare.com/view/C5F891FF30E4C7E/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part6.rar

1dl.net:

Code: https://1dl.net/ylif0f0a8zcr/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part1.rar
https://1dl.net/gx1izodqz86q/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part2.rar
https://1dl.net/f18puiz4kbqi/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part3.rar
https://1dl.net/tji3iu5b9nyp/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part4.rar
https://1dl.net/givztz0eo12j/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part5.rar
https://1dl.net/85p4v4x334xn/edyya.Apache.Spark.Data.Analytics.Best.Practices..Troubleshooting.part6.rar