[edX] Scalable Machine Learning #1 - 2015-06-22@Berkeley
Last edited by yhfyhf on 2015-6-23 12:21
CS 190.1x Scalable Machine Learning is the follow-up course to CS 100.1x Introduction to Big Data with Apache Spark. It is sponsored by Databricks and mainly uses Apache Spark and Python to implement ML algorithms.
Course link: https://www.edx.org/course/scala ... -berkeleyx-cs190-1x
The thread for CS 100.1x Introduction to Big Data with Apache Spark is here: http://www.1point3acres.com/bbs/thread-135600-1-1.html
Machine learning aims to extract knowledge from data and enables a wide range of applications. With datasets rapidly growing in size and complexity, learning techniques are fast becoming a core component of large-scale data processing pipelines. This course introduces the underlying statistical and algorithmic principles required to develop scalable real-world machine learning pipelines. We present an integrated view of data processing by highlighting the various components of these pipelines, including feature extraction, supervised learning, model evaluation, and exploratory data analysis. Students will gain hands-on experience applying these principles by using Apache Spark to implement several scalable learning pipelines.
Programming background; comfort with mathematical and algorithmic reasoning; familiarity with basic machine learning concepts; exposure to algorithms, probability, linear algebra and calculus; experience with Python (or the ability to learn it quickly). All exercises will use PySpark, but previous experience with Spark or distributed computing is NOT required. You should take this Python quiz before the course and take this Python mini-course if you need to learn Python or refresh your Python knowledge. This self-assessment document provides online resources that review additional relevant background material.
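Week 0 has already started. It begins with setting up the environment; if you took CS 100.1x, you can simply reuse the virtual machine from that course.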