# Advanced Topics in Machine Learning: Kernel Methods

## Arthur Gretton (with Zoltan Szabo, Kacper Chwialkowski), University College London

All lecture locations are listed on p. 4 of the first set of slides.

Course announcements will be posted on the mailing list.

This page contains slides and detailed notes for the kernel part of the course. The assignment can also be found here (at the bottom of the page). The slides will be updated as the course progresses, since I revise them to answer questions raised in class. I'll put the date of the last update next to each document, so be sure to get the latest version. Let me know if you find errors.

There are sets of practice exercises and solutions further down the page (after the slides).

See David Silver's page for the reinforcement learning part of the course.

## Slides and notes

Lectures 1, 2, and 3 slides and notes, last modified 26 Jan 2016
• Definition of a kernel, how it relates to a feature space
• Combining kernels to make new kernels
• The reproducing kernel Hilbert space
• Applications: difference in means, kernel PCA, kernel ridge regression
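As a small illustration of the first topic (how a kernel relates to a feature space), the sketch below checks numerically that a kernel evaluation equals an inner product of explicit features. It is not taken from the course slides: the choice of the homogeneous degree-2 polynomial kernel on R^2 and the names `poly2_kernel`, `poly2_features`, and `inner` are assumptions made only for this example.

```python
import math

def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel on R^2: k(x, y) = (x . y)^2."""
    dot = x[0] * y[0] + x[1] * y[1]
    return dot ** 2

def poly2_features(x):
    """Explicit feature map phi: R^2 -> R^3 with <phi(x), phi(y)> = (x . y)^2."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)

def inner(u, v):
    """Standard Euclidean inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))

# Two arbitrary points (illustrative values only).
x = (1.0, 2.0)
y = (3.0, -1.0)

k_direct = poly2_kernel(x, y)                                # kernel evaluated directly
k_features = inner(poly2_features(x), poly2_features(y))     # via the feature space

# The two values agree up to floating-point rounding.
print(k_direct, k_features)
```

Expanding (x1 y1 + x2 y2)^2 gives x1^2 y1^2 + 2 x1 x2 y1 y2 + x2^2 y2^2, which is exactly the inner product of the two feature vectors, so the kernel can be evaluated without ever forming the features.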

Lectures 4, 5, 6, and 7 slides and notes, last modified 23 Feb 2016

• Distance between means in RKHS, integral probability metrics, the maximum mean discrepancy (MMD), two-sample tests
• Choice of kernels for distinguishing distributions, characteristic kernels
• Covariance operator in RKHS: proof of existence, definition of norms (including HSIC, the Hilbert-Schmidt independence criterion)
• Application of HSIC to independence testing
• Application of HSIC to feature selection and taxonomy discovery
• Introduction to independent component analysis, kernel ICA
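To make the distance between means in an RKHS concrete, here is a minimal sketch of the biased empirical estimate of the squared MMD between two one-dimensional samples. It is an illustration rather than the course's own code: the Gaussian kernel, the bandwidth `sigma`, the sample values, and the function names `gaussian_kernel`, `mean_kernel`, and `mmd2_biased` are all assumptions chosen for the example.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel on the real line with bandwidth sigma (assumed choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))

def mean_kernel(xs, ys, sigma):
    """Average of k(a, b) over all pairs a in xs, b in ys."""
    total = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys)
    return total / (len(xs) * len(ys))

def mmd2_biased(xs, ys, sigma=1.0):
    """Biased estimate of MMD^2: squared RKHS distance between the
    empirical mean embeddings of xs and ys."""
    return (mean_kernel(xs, xs, sigma)
            + mean_kernel(ys, ys, sigma)
            - 2.0 * mean_kernel(xs, ys, sigma))

# Illustrative samples: the first two are close together, the third is shifted.
sample_p = [0.0, 0.1, -0.2, 0.3]
sample_q = [0.1, -0.1, 0.2, 0.0]
sample_r = [2.0, 2.1, 1.8, 2.2]

mmd_close = mmd2_biased(sample_p, sample_q)
mmd_far = mmd2_biased(sample_p, sample_r)
print(mmd_close, mmd_far)
```

The estimate is zero when both samples are identical and grows as the samples separate. A two-sample test would additionally need a threshold, for example from a permutation procedure, which this sketch does not implement.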

• Introduction to convex optimization
• The representer theorem
• Large margin classification, support vector machines for classification

Lecture 9 slides, lecture 10 slides, and notes, last modified 20 Mar 2013

• Metric, normed, and unitary spaces, Cauchy sequences and completion, Banach and Hilbert spaces
• Bounded linear operators and the Riesz Theorem
• Equivalent notions of an RKHS: existence of reproducing kernel, boundedness of the evaluation operator
• Positive definiteness of reproducing kernels, the Moore-Aronszajn Theorem
• Mercer's Theorem for representing kernels

• Loss and risk, estimation and approximation error, a new interpretation of MMD
• Why use an RKHS: comparison with other function classes (Lipschitz and bounded Lipschitz)
• Characteristic kernels and universal kernels

## Assignment

The assignment (first part due on Thursday, March 24th 2016). You will need this extract on incomplete Cholesky decomposition (scanned from Shawe-Taylor and Cristianini, Kernel Methods for Pattern Analysis). Last modified 05 Jan 2016.

## Practice exercises and solutions

The exercises are taken from exams in previous years, with minor modifications. Worked solutions are provided. Last modified 18 Oct 2015.
• Set 1
• Set 2