Big Data Analytics Using Spark

Yoav Freund, UCSanDiegoX

Learn how to analyze large datasets using Jupyter notebooks, MapReduce and Spark as a platform.

In data science, data is called “big” if it cannot fit into the memory of a single standard laptop or workstation.

Analyzing big datasets requires a cluster of tens, hundreds or thousands of computers. Using such clusters effectively requires distributed file systems, such as the Hadoop Distributed File System (HDFS), and corresponding computational models, such as Hadoop MapReduce and Spark.
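To make the MapReduce model concrete, here is a minimal single-process sketch of its three phases (map, shuffle, reduce) applied to word counting. It is plain Python written for illustration only, not Hadoop's or Spark's API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical. On a real cluster, the framework runs the map and reduce phases in parallel across machines and performs the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle: group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines):
    # Hypothetical driver that chains the three phases in one process.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data needs big clusters", "Spark runs on clusters"]
    print(word_count(docs))
```

In PySpark, the same pipeline is commonly expressed on an RDD with `flatMap`, `map` and `reduceByKey`, and Spark handles the shuffle between stages.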

In this course, part of the Data Science MicroMasters program, you will learn what the bottlenecks in massively parallel computation are and how to use Spark to minimize them.

You will learn how to perform supervised and unsupervised machine learning on massive datasets using Spark's Machine Learning Library (MLlib).

In this course, as in the other ones in this MicroMasters program, you will gain hands-on experience using PySpark within the Jupyter notebooks environment.

What you will learn

  • Programming Spark using PySpark
  • Identifying the computational tradeoffs in a Spark application
  • Performing data loading and cleaning using Spark and Parquet
  • Modeling data through statistical and machine learning methods

Dates:
  • 3 September 2019
Course properties:
  • Language: English (GB)

More on this topic:
Cloud Computing Applications
Learn how to use the cloud and write programs for data analytics. Learn about...
Data Manipulation at Scale: Systems and Algorithms
Data analysis has replaced data acquisition as the bottleneck to evidence-based...
Introduction to Big Data Analytics
A new, improved version of the Big Data Specialization will become...
Implementing Real-Time Analysis with Hadoop in Azure HDInsight
Learn how to use Hadoop technologies like HBase, Storm, and Spark in Microsoft...
Big Data Analytics with Apache Spark and Python
Learn to use Apache Spark to store and analyze data in real time.
More from 'Mathematics, Statistics and Data Analysis':
Capstone Exam in Statistics and Data Science
Solidify and demonstrate your knowledge and abilities in probability, data analysis...
AGRIMONITOR: Agricultural Policy in the Caribbean
Learn the effects of agricultural policy in the Caribbean and Latin America...
Network Science
The course is an interdisciplinary course, focused on the emerging science of...
Introduction to Linear Models and Matrix Algebra
Learn to use R programming to apply linear models to analyze data in life sciences...
High-Dimensional Data Analysis
A focus on several techniques that are widely used in the analysis of high-dimensional...
More from 'edX':
Capstone Exam in Statistics and Data Science
Solidify and demonstrate your knowledge and abilities in probability, data analysis...
What Works in Education: Evidence-Based Education Policies
Learn what works in education and how to identify, analyze and implement evidence...
Risk Management in Development Projects
Learn to preemptively manage positive and negative events that may affect the...
Global History Lab
Learn the span of world history from 1300 to the present. In this global history...
Leading Change: Go Beyond Gamification with Gameful Learning
Learn the tools to support gameful learning environments that foster personalized...

© 2013-2019