NYU Course on Deep Learning (Spring 2014)

Yann LeCun, New York University

This is a graduate course on deep learning, one of the hottest topics in machine learning and AI at the moment.

In the last two or three years, deep learning has revolutionized speech recognition and image recognition. It is widely deployed by companies such as Google, Facebook, Microsoft, IBM, Baidu, and Apple for audio/speech, image, video, and natural language processing.


  • Spring 2014 instructor: Yann LeCun, 715 Broadway, Room 1220, 212-998-3283, yann [ a t ] cs.nyu.edu
  • Teaching Assistant: Liu Hao, haoliu [ at ] nyu.edu
  • Classes: Mondays 5:10 to 7:00 PM. Location: Cantor, room 101
  • Lab Sessions: Wednesdays 5:10 to 6:00 PM. Location: Warren Weaver Hall, room 109.
  • Office Hours for Prof. LeCun: Wednesdays 3:00-5:00 and 6:00-7:00 PM. Please send an email to Prof. LeCun prior to an office hour visit.

Course Description

The course covers a wide variety of topics in deep learning, feature learning, and neural computation. It covers the mathematical methods and theoretical aspects as well as algorithmic and practical issues. Deep learning is at the core of many recent advances in AI, particularly in audio, image, video, and language analysis and understanding.

Who Can Take This Course?

This course is primarily designed for students in the Data Science programs, but any student who is familiar with the basics of machine learning can take it.

The only formal prerequisite is to have successfully completed “Intro to Data Science” or any basic course on machine learning. Familiarity with computer programming is assumed. The course relies heavily on mathematical tools such as linear algebra, probability and statistics, multivariate calculus, and function optimization. The basic mathematical concepts will be introduced when needed, but students will be expected to assimilate a non-trivial amount of mathematical material in a fairly short time.

Familiarity with basic ML/stats concepts such as multinomial linear regression, logistic regression, K-means clustering, Principal Components Analysis, and simple regularization is assumed.


The topics studied in the course include:

  • learning representations of data.
  • the energy-based view of model estimation.
  • basis function expansion
  • supervised learning in multilayer architectures. Backpropagation
  • optimization issues in deep learning
  • heterogeneous learning systems, modular approach to learning.
  • convolutional nets
  • applications to image recognition
  • structured prediction, factor graphs and deep architectures
  • applications to speech recognition
  • learning embeddings, metric learning
  • recurrent nets: learning dynamical systems
  • recursive nets: algebra on representations
  • the basics of unsupervised learning
  • the energy-based view of unsupervised learning
  • energy-shaping methods for unsupervised learning
  • decoder-only models: K-means, sparse coding, convolutional sparse coding
  • encoder-only models: ICA, Product of Experts, Field of Experts.
  • the encoder-decoder architecture
  • Sparse Auto-encoders,
  • Denoising, Contracting, and Saturating auto-encoders
  • Restricted Boltzmann Machines. Contrastive Divergence.
  • learning invariant features: group sparsity
  • feature factorization
  • scattering transform
  • software implementation issues. GPU implementations.
  • parallelizing deep learning
  • theoretical questions
  • open questions

On-Line Material from Other Sources

  • A quick overview of some of the material contained in the course is available from my ICML 2013 tutorial on Deep Learning:
  • Q&A about deep learning (Spring 2013 course on large-scale ML)
  • 2012 IPAM Summer School deep learning and representation learning
  • 2014 International Conference on Learning Representations (ICLR 2014)

Week 1

2014-01-27 Lecture

* Intro to Deep Learning

2014-01-29 Lab

* Roy Lowrance's tutorial on Lua

Week 2

2014-02-03 Lecture

* Modular Learning, Neural Nets and Backprop

  • Slides: PDF | DjVu
  • Topics: Backprop, modular models
  • Reading Material:
    • Gradient-Based Learning Applied to Document Recognition (Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, 1998): pages 1-5 (part I) PDF | DjVu
    • Additional readings (tentative): ICML 2013 Deep Learning tutorial, pp. 34-53
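The labs for this lecture use Torch/Lua, but the modular fprop/bprop view it covers can be illustrated with a short NumPy sketch (my own illustration, not course material): each module implements a forward pass and a backward pass, and a network is just a composition of modules.

```python
import numpy as np

# Each module implements fprop (compute output from input) and bprop
# (propagate the gradient back and accumulate parameter gradients).
# A network is a list of modules; backprop runs them in reverse order.

class Linear:
    def __init__(self, n_in, n_out, rng):
        self.W = 0.1 * rng.standard_normal((n_out, n_in))
        self.b = np.zeros(n_out)

    def fprop(self, x):
        self.x = x                            # cache the input for bprop
        return self.W @ x + self.b

    def bprop(self, grad_out):
        self.dW = np.outer(grad_out, self.x)  # dLoss/dW
        self.db = grad_out                    # dLoss/db
        return self.W.T @ grad_out            # dLoss/dInput

class Tanh:
    def fprop(self, x):
        self.y = np.tanh(x)
        return self.y

    def bprop(self, grad_out):
        return grad_out * (1.0 - self.y ** 2)  # chain rule through tanh

rng = np.random.default_rng(0)
net = [Linear(3, 4, rng), Tanh(), Linear(4, 2, rng)]

out = rng.standard_normal(3)
for m in net:                 # forward pass
    out = m.fprop(out)
grad = np.ones(2)             # pretend dLoss/dOutput is all ones
for m in reversed(net):       # backward pass, in reverse order
    grad = m.bprop(grad)
```

The same pattern extends to any of the modules discussed in class (branch, switch, convolution, etc.), as long as each one can map an output gradient to an input gradient.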
2014-02-05 Lab

* Clement Farabet's tutorial on the Torch ML library

Week 3

2014-02-10 Lecture

* Mixture of experts, recurrent nets, intro to ConvNets

  • Slides: PDF | DjVu
  • Topics: discussion of some modules (sum/branch, switch, log-sum); RBF nets; MAP/MLE loss; parameter-space transforms; convolutional module
  • Reading Material:
    • Gradient-Based Learning Applied to Document Recognition (Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, 1998): pages 5-16 (part II and III) PDF | DjVu
2014-02-12 Lab

* Unscheduled

Week 4

2014-02-17 Lecture
2014-02-19 Lab

* Unscheduled

Week 5

2014-02-24 Lecture

Guest lecture by Rob Fergus on Conv nets

  • Topics:
  • Reading Material:
    • Yann LeCun CVPR talk on scene understanding
    • Sermanet et al. ICLR 2014: “OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks” arXiv
    • LeCun. ECCV 2012 “Learning Invariant Feature Hierarchies”: PDF, DjVu
    • Sermanet et al. ICPR 2012 “Convolutional Neural Networks Applied to House Numbers Digit Classification”: PDF, DjVu
    • Farabet et al. PAMI 2013 “Learning Hierarchical Features for Scene Labeling”: PDF, DjVu
    • Sermanet et al. CVPR 2013 “Pedestrian Detection with Unsupervised Multi-Stage Feature Learning”: PDF, DjVu
2014-02-26 Lab

* Unscheduled

Week 6

2014-03-03 Lecture

* Energy-Based Models for Supervised Learning

  • Slides: PDF | DjVu
  • Topics: energy for inference, objective for learning, loss functionals
  • Reading Material:
    • Yann LeCun, Sumit Chopra, Raia Hadsell, Marc'Aurelio Ranzato and Fu-Jie Huang: A Tutorial on Energy-Based Learning, in Bakir, G. and Hofman, T. and Schölkopf, B. and Smola, A. and Taskar, B. (Eds), Predicting Structured Data, MIT Press, 2006 PDF, DjVu
  • Other On-Line Material:
2014-03-05 Lab

* Optimization Tricks for Deep Learning and Computer Vision

  • Topics: aspect ratio, randomization, mean/std normalization, channel decorrelation
  • Reading Material:
    • Y. LeCun, L. Bottou, G. Orr and K. Muller: Efficient BackProp, in Orr, G. and Muller K. (Eds), Neural Networks: Tricks of the trade, Springer, 1998. PDF | DjVu
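The mean/std normalization trick from "Efficient BackProp" can be sketched in a few lines of NumPy (my own illustration, not course code): shift each input feature to zero mean and rescale it to unit variance so the error surface is better conditioned for gradient descent.

```python
import numpy as np

# Center each input feature to zero mean and scale it to unit variance.
# "Efficient BackProp" recommends this because gradient descent converges
# faster when the error surface is not stretched along some axes.

rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, scale=3.0, size=(100, 4))  # toy inputs, poorly scaled

mean = X.mean(axis=0)
std = X.std(axis=0)
X_norm = (X - mean) / std   # zero mean, unit variance, per feature
```

In practice the statistics are computed on the training set only and then reused unchanged on validation and test data.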

Week 7

2014-03-10 Lecture

* Energy-Based Models for Unsupervised Learning

  • Slides: PDF | DjVu
  • Topics: why learning an energy function is hard, and different strategies for it: PCA; negative log-likelihood (often intractable); contrastive divergence; shaping the energy surface only around data points; denoising auto-encoders; sparse coding
  • Reading Material:
2014-03-12 Lab

* Optimization for Deep Learning

  • Slides: PDF | DjVu
  • Topics: importance of normalization (avoiding stretched error-surface ellipses); Newton's algorithm and Hessian-estimation methods; one-hidden-layer neural nets
  • Notes from Lab session: PDF | DjVu
  • Notes from the Blackboard: 1 and 2
  • Video (2013 part 1)
  • Video (2013 part 2)
  • Reading Material:
    • Y. LeCun, L. Bottou, G. Orr and K. Muller: Efficient BackProp, in Orr, G. and Muller K. (Eds), Neural Networks: Tricks of the trade, Springer, 1998. PDF | DjVu

Spring Break 03-17 to 03-23

Week 8

2014-03-24 Lecture
2014-03-26 Lab

* Metric Learning and Optimization / DrLIM

  • Topics: NCA; DrLIM
  • Reading Material:
    • Not relevant

Week 9

2014-03-31 Lecture

* Latent Factor Graphs

  • Topics: : Latent Variable Models, Probabilistic LVM, Loss Function, Example handwriting recognition
  • Reading Material:
    • Slides (possibly these): PDF | DjVu
    • This is covered in the energy learning tutorial
    • Video (2013)
2014-04-02 Lab

* Unscheduled

Week 10

2014-04-07 Lecture

* Restricted Boltzmann Machines

2014-04-09 Lab

* Optimization for Deep Learning?

Week 11

2014-04-14 Lecture

* Guest Lecture by Antoine Bordes on NLP

2014-04-16 Lab

* Unscheduled

Week 12

2014-04-21 Lecture

* Energy-Based Models for Unsupervised Learning

2014-04-23 Lab

* Recurrent Networks Lab

Week 13

2014-04-28 Lecture

* Speech Recognition / Structured Prediction

  • Topics: FFT/DFT, Time Delay Conv Nets, Acoustic Modeling
  • Reading Material:
    • Not relevant
2014-04-30 Lab

* Discussion of Project Topics

Week 14

2014-05-05 Lecture

* Backpropagation, History of Deep Learning

  • Topics: Lagrange derivation of back propagation, development of neural networks and deep learning since the 1940s
  • Reading Material:
    • Not relevant
2014-05-07 Lab

* Sparse Coding

  • Reading Material:
    • Not relevant

Week 15

* Final Exam Period May 12 to May 19

  • Final Project May 16
  • If you are not graduating and need an extension talk to the TA Liu Hao: haoliu [ at ] nyu.edu
  • Final Exam May 19

Final Exam Topics

  • the reasons for deep learning.
  • fprop/bprop: given the fprop function for a module, write the bprop.
  • modules you should know about:
    • linear, point-wise non-linearity, max,
    • Y branch, square distance, log-softmax
  • loss functions: least square, cross-entropy, hinge
  • energy-based supervised learning: energy/inference - objective function/learning
  • loss functionals: energy loss, negative log likelihood, perceptron, hinge
  • metric learning, siamese nets
  • DrLIM, WSABIE criteria
  • network architectures:
    • shared weights and other weight space transformations
    • recurrent nets: basic algorithm for backprop-through-time
  • mixture of experts
  • convolutional nets:
    • architecture, usage, for image and speech recognition and detection of objects in images
  • optimization:
    • SGD
    • tricks to make learning efficient: data normalization and such.
    • computing 2nd derivatives (diagonal terms)
  • deep learning + structured prediction
  • inference through energy minimization and marginalization
  • latent variables E(X,Y,Z) → F(X,Y)
  • learning using a loss functional
  • applications to sequence processing (e.g. Speech and handwriting recognition)
  • applications:
    • speech and audio (temporal convnets)
    • image (spatial convnets)
    • text (see Jason Weston and Antoine Bordes' lectures)
  • unsupervised learning:
    • basic idea of energy-based unsupervised learning
    • the 7 methods to make the energy low on/near the samples and high everywhere else
    • sparse coding and sparse auto-encoders
    • group sparsity
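For review, the three loss functions listed above can be sketched as follows (my own NumPy illustration, not course-provided code):

```python
import numpy as np

def least_square(pred, target):
    # squared error between prediction and target
    return 0.5 * np.sum((pred - target) ** 2)

def cross_entropy(scores, y):
    # negative log-likelihood of class y under a (stabilized) softmax
    z = scores - scores.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[y]

def hinge(scores, y, margin=1.0):
    # multiclass hinge: penalize wrong classes whose score comes within
    # `margin` of the correct class's score
    margins = scores - scores[y] + margin
    margins[y] = 0.0
    return np.maximum(margins, 0.0).sum()

s = np.array([2.0, 1.0, -1.0])
print(least_square(s, np.array([2.0, 0.0, -1.0])))  # 0.5 * 1^2 = 0.5
print(cross_entropy(s, 0))  # small: class 0 already wins by a margin
print(hinge(s, 0))          # 0.0: every other score is >= 1 below s[0]
```

Note how the hinge loss is exactly zero once the margin is satisfied, while the cross-entropy keeps pushing the correct score up, however confident the model already is.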