Pennsylvania State University

Machine Learning

Study Guide

Please note that the precise schedule is subject to change. The lecture slides (and lecture notes, if any) are updated after each lecture.


Lecture 1. Learning from data: motivations, representative applications

Why should machines learn? What is machine learning? Machine learning in context: artificial intelligence, statistics, data science. Types of machine learning. Course overview and course mechanics. Quick overview of the Python-based tools for machine learning. Getting started with Google Colab.

Required Readings

Recommended Materials

Optional Materials


Lecture 2. Simple Examples of Machine Learning for Classification and Function Approximation

Nearest Neighbor Classifier. Measures of similarity/distance for different types of data. Variants of K-nearest neighbor method. Applications. Advantages and limitations. Making nearest neighbor classification work on large data sets. Nearest neighbor in high dimensions.
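
To make the core idea concrete, here is a minimal numpy sketch of a k-nearest-neighbor classifier (the toy data and function names are illustrative):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to x
    nearest = np.argsort(dists)[:k]               # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]              # majority label

# Toy data: two classes in 2D
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.2, 0.1])))   # -> 0
```

The brute-force distance computation above is O(n) per query, which is why large data sets call for index structures (e.g., k-d trees) or approximate nearest neighbor methods.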

Required Readings

Recommended Readings

Optional Materials


Lecture 3. Elements of optimization

Linear regression for function approximation. Least Mean Squared (LMS) Error Criterion. Minimization of Error Functions. Basics of quadratic optimization. Review of Relevant Mathematics (Limits, Continuity and Differentiability of Functions, Local Minima and Maxima, Derivatives, Partial Derivatives, Taylor Series Approximation, Multi-Variate Taylor Series Approximation).
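
As a concrete illustration of minimizing the LMS error, a short numpy sketch of gradient descent on a toy regression problem (the learning rate and iteration count are illustrative):

```python
import numpy as np

# Toy regression data: y ≈ 1 + 2x plus noise
rng = np.random.default_rng(0)
X = np.c_[np.ones(50), rng.uniform(-1, 1, 50)]    # bias column + one feature
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.1, 50)

# Gradient descent on the quadratic loss J(w) = (1/2n) ||Xw - y||^2
w = np.zeros(2)
lr = 0.1
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(y)   # gradient of J at w
    w -= lr * grad
print(w)   # close to [1, 2]
```

Because J is quadratic and convex in w, gradient descent with a small enough step size converges to the unique global minimum, which is the theme of the quadratic optimization material above.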

Required Readings

Recommended Readings


Lecture 4. Linear Regression

Required Readings

Recommended Readings


Lecture 5: Evaluating predictive models

Evaluation of classifiers. Accuracy, Sensitivity, Specificity, Correlation Coefficient. Tradeoffs between sensitivity and specificity. When does a classifier outperform another? ROC curves.
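
A small sketch of the basic performance measures on binary labels (toy predictions; an ROC curve is obtained by sweeping a decision threshold and plotting the resulting (1 - specificity, sensitivity) pairs):

```python
import numpy as np

def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity (TPR), and specificity (TNR) for 0/1 labels."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return ((tp + tn) / len(y_true),   # accuracy
            tp / (tp + fn),            # sensitivity: fraction of positives caught
            tn / (tn + fp))            # specificity: fraction of negatives caught

y_true = np.array([1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0])
print(confusion_metrics(y_true, y_pred))   # (0.667, 0.667, 0.667)
```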

Required Readings

Recommended Readings


Lecture 6: Evaluating predictive models (Continued)

Estimating performance measures; confidence interval calculation for estimates; cross-validation based estimates of model performance; leave-one-out and bootstrap estimates of performance; comparing two models; comparing two learning algorithms.
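
A minimal sketch of k-fold cross-validation (the `train_and_predict` hook is a hypothetical stand-in for any learner):

```python
import numpy as np

def kfold_accuracy(X, y, train_and_predict, k=5, seed=0):
    """Estimate accuracy by k-fold cross-validation.
    train_and_predict(X_tr, y_tr, X_te) is any learner returning labels."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        y_hat = train_and_predict(X[train], y[train], X[test])
        scores.append(np.mean(y_hat == y[test]))
    return np.mean(scores), np.std(scores)   # mean and spread across folds
```

The per-fold spread feeds into the confidence-interval calculations discussed above; leave-one-out is the special case k = n, and bootstrap estimates resample the data with replacement instead of partitioning it.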

Required Readings

Recommended Readings


Lecture 7: Linear Classifiers

Introduction to Linear Classifiers. Threshold logic unit (perceptron). Connection with Logic and Geometry. Weight space and pattern space representations of perceptrons. Linear separability. Perceptron Learning algorithm and its variants. Convergence properties of perceptron algorithm. Winner-Take-All Networks.
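
A minimal sketch of the perceptron learning algorithm (labels in {-1, +1}; the toy data below are linearly separable, so the loop terminates):

```python
import numpy as np

def perceptron_train(X, y, epochs=100):
    """Rosenblatt's perceptron rule; y must be in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:   # misclassified (or on the boundary)
                w += yi * xi             # nudge the hyperplane toward xi
                b += yi
                errors += 1
        if errors == 0:                  # converged: every point is correct
            break
    return w, b

X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
print(perceptron_train(X, y))
```

The convergence theorem guarantees a finite number of updates whenever the data are linearly separable; on non-separable data the loop never settles, which motivates the loss-based variants in the following lectures.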

Required Readings

Recommended Readings


Lecture 8: Linear Classifiers (Continued)

Perceptron Learning algorithm and its variants. Convergence properties of perceptron algorithm. Winner-Take-All Networks.

Required Readings

Recommended Readings


Lecture 9: Linear Classifiers (Continued)

Alternative loss functions for perceptrons and multi-class perceptrons. Gradient-based minimization of loss.

Required Readings

Recommended Readings


Lecture 10: Maximum Margin Classifiers

Perceptrons revisited. Is there an optimal hyperplane? Maximum margin classifiers. Empirical error and error bounds for linear classifiers. Minimizing error by maximizing the margin. Finding the margin-maximizing separating hyperplane as an optimization problem. Algorithm for finding the margin-maximizing hyperplane. Soft-margin stochastic gradient algorithm. Interpretation of the maximum margin classifier and support vectors.
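
A Pegasos-style sketch of the soft-margin stochastic (sub)gradient algorithm (the regularization constant, step schedule, and epoch count are illustrative):

```python
import numpy as np

def svm_sgd(X, y, lam=0.01, epochs=50, seed=0):
    """Stochastic subgradient descent on the soft-margin objective
    (lam/2)||w||^2 + mean_i max(0, 1 - y_i (w . x_i)), with y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w, t = np.zeros(X.shape[1]), 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)              # decaying step size
            if y[i] * (w @ X[i]) < 1:          # inside the margin: hinge is active
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:                              # outside the margin: only shrink w
                w = (1 - eta * lam) * w
    return w
```

Training points that end up on or inside the margin are the support vectors; they alone determine the final hyperplane.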

Required Readings

Recommended Readings


Lecture 11: Review

Review: loss functions, deriving weight update rules, gradient-based minimization of loss functions, additional in-class exercises.

Required Readings

  • Required readings assigned for lectures 1-10.

Recommended Readings

  • Recommended readings assigned for lectures 1-10.


Lecture 12: Kernel Machines

How can classifiers handle non-linear decision boundaries? The classic method: feature engineering. The modern tool: the kernel trick. Dual representation of linear classifiers. Why are kernels useful?
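
A tiny numeric illustration of why kernels are useful: the degree-2 polynomial kernel computes an inner product in a higher-dimensional feature space without ever constructing the features.

```python
import numpy as np

def phi(x):
    """Explicit feature map whose inner product equals (x . z)^2 in 2D."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print((x @ z) ** 2)       # kernel evaluation in the input space: 16.0
print(phi(x) @ phi(z))    # the same value via the explicit 3D map: 16.0
```

In the dual representation a linear classifier touches the data only through such inner products, so replacing them with a kernel yields a non-linear decision boundary at essentially no extra cost.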

Required Readings

  • Lecture Slides, Vasant Honavar. Note: the lecture slides give a much simpler, yet rigorous, treatment of kernel machines than the one available in the textbook.

Recommended Readings


Lectures 13-14: Kernel Machines

How to 'kernelize' any linear model using the kernel trick? Properties of Kernel Functions. Examples of Kernel Functions. Constructing Kernel Functions. Distinguishing good kernels from bad ones. Examples of kernel machines: kernel perceptron, kernel SVM, kernel regression. Applications of kernel machines: text classification, image classification, etc.
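
A sketch of one of the kernel machines listed above, the kernel perceptron, on XOR-like data that no linear classifier can separate (the kernel choice and gamma are illustrative):

```python
import numpy as np

def rbf(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def kernel_perceptron(X, y, kernel=rbf, epochs=20):
    """Dual perceptron: f(x) = sign(sum_i alpha_i y_i k(x_i, x)), y in {-1, +1}."""
    K = np.array([[kernel(a, b) for b in X] for a in X])   # Gram matrix
    alpha = np.zeros(len(y))
    for _ in range(epochs):
        for i in range(len(y)):
            if y[i] * np.sum(alpha * y * K[:, i]) <= 0:    # misclassified
                alpha[i] += 1                              # "add" example i
    return alpha

X = np.array([[0.0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, 1, 1, -1])                     # XOR labels
alpha = kernel_perceptron(X, y)
f = lambda x: np.sign(sum(a * yi * rbf(xi, x) for a, yi, xi in zip(alpha, y, X)))
print([int(f(x)) for x in X])                    # -> [-1, 1, 1, -1]
```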

Required Readings

  • Lecture Slides, Vasant Honavar. Note: the lecture slides give a much simpler, yet rigorous, treatment of kernel machines than the one available in the textbook.

Recommended Readings


Midterm Review

Midterm Exam


Lecture 15: Probabilistic models for Machine Learning

Brief review of probability: Sample spaces. Random variables. Events. Probabilities of compound events. Joint probability. Marginalization. Conditional Probability. Bayes Rule. Product rule.

Probabilistic generative models. Decision-theoretic models of classification. Bayes optimal classifier. How the Bayes optimal classifier trades off between prior beliefs and the information supplied by the data. Minimum risk classifier.
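
A small numeric illustration of that trade-off (the diagnostic-test numbers are made up for illustration):

```python
# Bayes rule: posterior is proportional to likelihood times prior.
prior = {"disease": 0.01, "healthy": 0.99}           # prior beliefs
likelihood_pos = {"disease": 0.90, "healthy": 0.05}  # P(test = + | class)

evidence = sum(likelihood_pos[c] * prior[c] for c in prior)   # P(test = +)
posterior = {c: likelihood_pos[c] * prior[c] / evidence for c in prior}
print(posterior)   # disease ≈ 0.154
```

Even after a positive test the Bayes optimal classifier still outputs "healthy": the strong prior outweighs the moderately informative observation, exactly the trade-off described above.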

Required Readings

Recommended Readings


Lecture 16: Probabilistic models for Machine Learning

Probabilistic generative models. Naive Bayes classifier. Applications of Naive Bayes classifiers: tabular, sequence, text, and image data classification. Multivariate, multinomial, and Gaussian Naive Bayes. Avoiding overfitting: robust probability estimates.
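
A minimal sketch of a multinomial Naive Bayes classifier with Laplace smoothing, the kind of robust probability estimate mentioned above (toy counts stand in for word frequencies):

```python
import numpy as np

def train_multinomial_nb(X, y, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing.
    X: count matrix (documents x vocabulary), y: class labels."""
    classes = np.unique(y)
    log_prior = {c: np.log(np.mean(y == c)) for c in classes}
    log_lik = {}
    for c in classes:
        counts = X[y == c].sum(axis=0) + alpha       # smoothing avoids zeros
        log_lik[c] = np.log(counts / counts.sum())   # log P(word | class)
    return classes, log_prior, log_lik

def predict_nb(x, classes, log_prior, log_lik):
    scores = {c: log_prior[c] + x @ log_lik[c] for c in classes}
    return max(scores, key=scores.get)

# Toy "documents" over a 3-word vocabulary, two classes
X = np.array([[3, 0, 1], [2, 0, 2], [0, 3, 1], [0, 4, 0]])
y = np.array([0, 0, 1, 1])
print(predict_nb(np.array([1, 0, 1]), *train_multinomial_nb(X, y)))   # -> 0
```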

Required Readings

Recommended Readings


Lecture 17: From Generative to Discriminative Models (Naive Bayes to Logistic Regression)

Generative versus discriminative models for classification. The Naive Bayes classifier as a generative model. Logistic regression as the discriminative counterpart of Naive Bayes. The relationship between generative models and linear classifiers. Additional examples of generative models. Generative models from the exponential family of distributions. Transforming generative models of the exponential family into discriminative models. Estimating the parameters of discriminative models.
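
A minimal sketch of logistic regression trained by gradient descent on the negative log-likelihood (the learning rate and epoch count are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression(X, y, lr=0.5, epochs=1000):
    """Fits the discriminative model P(y=1 | x) = sigmoid(w.x + b) directly,
    rather than modeling P(x | y) as a generative model would."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        w -= lr * X.T @ (p - y) / len(y)   # gradient of the NLL w.r.t. w
        b -= lr * np.mean(p - y)
    return w, b

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
w, b = logistic_regression(X, y)
print(sigmoid(X @ w + b).round(2))   # predicted probabilities increase with x
```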

Required Readings

Recommended Readings


Lecture 18: Decision Trees

Modeling dependence between attributes. The decision tree classifier. Introduction to information theory. Information, entropy, mutual information, and related concepts. Algorithm for learning decision tree classifiers from data. Overfitting and methods to avoid it -- dealing with small sample sizes; pre-pruning and post-pruning.
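
A small sketch of the information-theoretic split criterion (entropy and the mutual information between the labels and a candidate attribute):

```python
import numpy as np

def entropy(y):
    """Shannon entropy (in bits) of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(y, x):
    """Mutual information between labels y and categorical attribute x:
    the expected entropy reduction from splitting on x."""
    gain = entropy(y)
    for v, n in zip(*np.unique(x, return_counts=True)):
        gain -= (n / len(y)) * entropy(y[x == v])
    return gain

y  = np.array(["yes", "yes", "no", "no"])
x1 = np.array(["a", "a", "b", "b"])   # perfectly predictive attribute
x2 = np.array(["a", "b", "a", "b"])   # uninformative attribute
print(information_gain(y, x1), information_gain(y, x2))   # 1.0 0.0
```

A decision tree learner greedily picks the attribute with the highest gain at each node and recurses on the resulting subsets.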

Required Readings

Recommended Readings


Lecture 19: Decision Trees (contd.) and Random Forest - An Ensemble of Decision Trees

Pitfalls of entropy as a splitting criterion for multi-valued splits. Alternative splitting strategies -- two-way versus multi-way splits. Alternative split criteria: Gini impurity, entropy, etc. Cost-sensitive decision tree induction -- incorporating attribute measurement costs and misclassification costs into decision tree induction. Dealing with categorical, numeric, and ordinal attributes. Dealing with missing attribute values during tree induction and instance classification.

Why a forest is better than a single tree. The random forest algorithm for classification. Why random forests work.
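
Two sketches to anchor this lecture: the Gini impurity split criterion, and bagging, the bootstrap-and-vote mechanism at the heart of random forests. The `train_tree` hook is a hypothetical stand-in for any decision tree learner; a full random forest additionally samples a random subset of attributes at each split.

```python
import numpy as np

def gini(y):
    """Gini impurity: the chance two random draws from y disagree in label."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def bagged_predict(X, y, x_new, train_tree, n_trees=100, seed=0):
    """Train each tree on a bootstrap sample and take the majority vote.
    train_tree(X, y) is any tree learner returning a predict function."""
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), size=len(y))   # sample with replacement
        votes.append(train_tree(X[idx], y[idx])(x_new))
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]

print(gini(np.array([0, 0, 1, 1])))   # 0.5: a maximally impure binary node
```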

Required Readings

Recommended Readings



Review

Lecture 20: Multi-layer Neural Networks and Generalized Delta Rule

Introduction to multi-layer neural networks for classification and regression. The universal function approximation theorem. The generalized delta rule (GDR), i.e., the backpropagation learning algorithm.
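
A compact numpy sketch of the generalized delta rule on the XOR problem (layer sizes, learning rate, and epoch count are illustrative, and convergence depends on the random initialization):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0.0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0.0], [1], [1], [0]])                 # XOR targets

W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)       # input -> hidden
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)       # hidden -> output
sigmoid = lambda z: 1 / (1 + np.exp(-z))

lr = 2.0
for _ in range(10000):
    h = sigmoid(X @ W1 + b1)                  # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)       # delta at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)        # delta backpropagated to hidden
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)   # weight updates
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)
print(out.round(2).ravel())                   # typically close to [0, 1, 1, 0]
```

XOR is the classic function no single-layer perceptron can represent; one hidden layer suffices, as the universal function approximation theorem predicts for sufficiently wide networks.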

Required Readings

Recommended Readings


Lecture 21: Multi-layer Neural Networks and Generalized Delta Rule (Contd.)

The generalized delta rule (backpropagation algorithm) in practice: avoiding overfitting, choosing neuron activation functions, choosing the learning rate, choosing initial weights, speeding up learning, improving generalization, circumventing local minima. Variations -- radial basis function networks. Learning non-linear functions by searching the space of network topologies as well as weights.

Required Readings

Recommended Readings


Lectures 22-23: Deep Learning

Motivations behind deep learning. Deep representation learning. Autoencoders. Stacked autoencoders. Convolutional neural networks.

Required Readings



Lecture 24: Review and Wrap-up