Big Data Analytics Using Spark

Yoav Freund, UCSanDiegoX

Learn how to analyze large datasets using Jupyter notebooks, MapReduce and Spark as a platform.

In data science, data is called “big” if it cannot fit into the memory of a single standard laptop or workstation.

The analysis of big datasets requires a cluster of tens, hundreds or thousands of computers. Using such clusters effectively requires distributed file systems, such as the Hadoop Distributed File System (HDFS), and corresponding computational models, such as Hadoop, MapReduce and Spark.
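As a rough illustration of the MapReduce computational model mentioned above, here is a minimal single-machine sketch in plain Python. It is not course material and does not use Hadoop or Spark; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample data are hypothetical, and each inner list stands in for a partition of a file that a real cluster would store on a different machine.

```python
from collections import defaultdict


def map_phase(partition):
    """Emit a (word, 1) pair for every word in one partition of lines."""
    return [(word.lower(), 1) for line in partition for word in line.split()]


def shuffle_phase(mapped_partitions):
    """Group values by key across all partitions.

    On a real cluster this step moves data between machines,
    which is typically where much of the cost lies.
    """
    groups = defaultdict(list)
    for pairs in mapped_partitions:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the grouped values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(partitions):
    """Run map, shuffle and reduce in sequence over all partitions."""
    mapped = [map_phase(partition) for partition in partitions]
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical sample data: two partitions of text lines.
partitions = [
    ["spark runs on a cluster", "hadoop runs on a cluster"],
    ["spark keeps data in memory"],
]
counts = word_count(partitions)
print(counts["spark"], counts["cluster"], counts["memory"])
```

In this sketch the map step works on each partition independently, so it could run in parallel, while the shuffle step needs data from every partition. That difference is one reason data movement is often treated as a bottleneck in parallel computation.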

In this course, part of the Data Science MicroMasters program, you will learn what the bottlenecks are in massively parallel computation and how to use Spark to minimize them.

You will learn how to perform supervised and unsupervised machine learning on massive datasets using the Machine Learning Library (MLlib).

In this course, as in the others in this MicroMasters program, you will gain hands-on experience using PySpark within the Jupyter notebook environment.

What you will learn

  • Programming Spark using PySpark
  • Identifying the computational tradeoffs in a Spark application
  • Performing data loading and cleaning using Spark and Parquet
  • Modeling data through statistical and machine learning methods

Dates:
  • 14 May 2019
Course properties:
  • Language: English

