Machine Learning: Study Guide
Please note that the precise schedule is subject to change. The lecture slides (and lecture notes, if any) are updated after the lecture.
Lecture 1. Learning from data: motivations, representative applications
Why should machines learn? What is machine learning? Machine learning in context: artificial intelligence, statistics, data science. Types of machine learning. Course overview and course mechanics. Quick overview of the Python-based tools for machine learning. Getting started with Google Colab.
Required Readings
Recommended Materials
Optional Materials
Lecture 2. Simple Examples of Machine Learning for classification and function approximation
Nearest Neighbor Classifier. Measures of similarity/distance for different types of data. Variants of K-nearest neighbor method. Applications. Advantages and limitations. Making nearest neighbor classification work on large data sets. Nearest neighbor in high dimensions.
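To make the first topic concrete, here is a minimal nearest-neighbor classifier in plain Python (the toy data, function names, and choice of Euclidean distance are illustrative, not part of the assigned materials):

```python
from collections import Counter
import math

def euclidean(a, b):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(X_train, y_train, x, k=3, dist=euclidean):
    # Find the k training points closest to x and take a majority vote
    neighbors = sorted(zip(X_train, y_train), key=lambda p: dist(p[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy 2-D data set: two well-separated clusters
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
y = ["a", "a", "a", "b", "b", "b"]
```

Swapping in a different `dist` function is how the "measures of similarity/distance for different types of data" topic plays out in code.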
Required Readings
Recommended Readings
Optional Materials
- Beyer, K., Goldstein, J., Ramakrishnan, R., and Shaft, U. (1999). When is Nearest Neighbor Meaningful? In: Beeri, C. and Buneman, P. (eds) Database Theory — ICDT'99. Lecture Notes in Computer Science, vol. 1540. Springer, Berlin, Heidelberg.
- Slaney, M. and Casey, M. (2008). Locality-Sensitive Hashing for Finding Nearest Neighbors. IEEE Signal Processing Magazine, 25(2), pp. 128-131.
Lecture 3. Elements of optimization
Linear regression for function approximation.
Least Mean Squared (LMS) Error Criterion. Minimization of Error Functions. Basics of quadratic optimization. Review of Relevant Mathematics (Limits, Continuity and Differentiability of Functions, Local Minima and Maxima, Derivatives, Partial Derivatives, Taylor Series Approximation, Multi-Variate Taylor Series Approximation).
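The LMS criterion can be illustrated with a short gradient-descent sketch (the toy data and hyperparameters below are our own choices):

```python
def lms_fit(xs, ys, lr=0.05, epochs=5000):
    # Fit y ~ w*x + b by gradient descent on the mean squared error
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Partial derivatives of (1/n) * sum((w*x + b - y)^2)
        dw = (2.0 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        db = (2.0 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * dw
        b -= lr * db
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # generated from y = 2x + 1
w, b = lms_fit(xs, ys)
```

Because the error surface is quadratic, gradient descent with a small enough step size converges to the unique minimum; that is the "basics of quadratic optimization" point above.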
Required Readings
Recommended Readings
Lecture 4. Linear Regression
Required Readings
Recommended Readings
Lecture 5: Evaluating predictive models
Evaluation of classifiers. Accuracy, Sensitivity, Specificity, Correlation Coefficient. Tradeoffs between sensitivity and specificity. When does one classifier outperform another? ROC curves.
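A small sketch of these measures (the scores and labels below are invented for illustration):

```python
def sensitivity(tp, fn):
    # True positive rate: fraction of actual positives detected
    return tp / (tp + fn)

def specificity(tn, fp):
    # True negative rate: fraction of actual negatives correctly rejected
    return tn / (tn + fp)

def roc_points(scores, labels):
    # One (FPR, TPR) point per distinct threshold, from strictest to loosest;
    # labels are 0/1, scores are the classifier's confidence for the positive class
    P = sum(labels)
    N = len(labels) - P
    pts = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
        pts.append((fp / N, tp / P))
    return pts

# A classifier that ranks both positives above both negatives
pts = roc_points([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```

Sweeping the threshold traces the sensitivity/specificity tradeoff; a perfect ranking passes through the ideal corner (FPR 0, TPR 1).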
Required Readings
Recommended Readings
Lecture 6: Evaluating predictive models (Continued)
Estimating performance measures; confidence interval calculation for estimates; cross-validation based estimates of model performance; leave-one-out and bootstrap estimates of performance; comparing two models; comparing two learning algorithms.
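A sketch of k-fold estimation plus a normal-approximation confidence interval (helper names are ours; the `train_and_score` callback stands in for any learner):

```python
import math
import random

def kfold_accuracy(X, y, train_and_score, k=5, seed=0):
    # Shuffle indices, split into k folds, train on k-1 folds, test on the rest;
    # train_and_score(train_idx, test_idx) must return an accuracy in [0, 1]
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accs = []
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        accs.append(train_and_score(train, test))
    return accs

def normal_ci(p_hat, n, z=1.96):
    # 95% normal-approximation confidence interval for an accuracy
    # estimated from n test examples
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se
```

Setting `k = len(X)` turns this into leave-one-out; resampling the data with replacement instead of folding gives the bootstrap estimate.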
Required Readings
Recommended Readings
Lecture 7: Linear Classifiers
Introduction to Linear Classifiers. Threshold logic unit (perceptron). Connection with Logic and Geometry. Weight space and pattern space representations of perceptrons.
Linear separability. Perceptron Learning algorithm and its variants. Convergence properties of perceptron algorithm. Winner-Take-All Networks.
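The perceptron learning algorithm in a few lines (trained here on the AND function, which is linearly separable, so the convergence theorem applies):

```python
def perceptron_train(X, y, epochs=20):
    # Labels are +1/-1; the bias is folded in as the last weight
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        errors = 0
        for x, t in zip(X, y):
            xa = list(x) + [1.0]                       # augmented input
            out = 1 if sum(wi * xi for wi, xi in zip(w, xa)) > 0 else -1
            if out != t:                               # mistake-driven update
                w = [wi + t * xi for wi, xi in zip(w, xa)]
                errors += 1
        if errors == 0:                                # converged (separable data)
            break
    return w

def perceptron_predict(w, x):
    xa = list(x) + [1.0]
    return 1 if sum(wi * xi for wi, xi in zip(w, xa)) > 0 else -1

# The AND function is linearly separable
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, -1, -1, 1]
w = perceptron_train(X, y)
```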
Required Readings
Recommended Readings
Lecture 8: Linear Classifiers (Continued).
Perceptron Learning algorithm and its variants. Convergence properties of perceptron algorithm. Winner-Take-All Networks.
Required Readings
Recommended Readings
Lecture 9: Linear Classifiers (Continued).
Alternative loss functions for perceptrons and multi-class perceptrons. Gradient-based minimization of loss.
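One possible sketch of the multi-class perceptron (the mistake-driven update below is the stochastic subgradient of a multi-class perceptron loss; the data and names are illustrative):

```python
def multiclass_perceptron(X, y, n_classes, epochs=50):
    # One weight vector (with a trailing bias weight) per class; predict by argmax
    d = len(X[0]) + 1
    W = [[0.0] * d for _ in range(n_classes)]
    for _ in range(epochs):
        mistakes = 0
        for x, t in zip(X, y):
            xa = list(x) + [1.0]
            scores = [sum(wi * xi for wi, xi in zip(w, xa)) for w in W]
            pred = max(range(n_classes), key=lambda c: scores[c])
            if pred != t:
                # Move the true class toward x and the predicted class away from it
                W[t] = [wi + xi for wi, xi in zip(W[t], xa)]
                W[pred] = [wi - xi for wi, xi in zip(W[pred], xa)]
                mistakes += 1
        if mistakes == 0:
            break
    return W

def mc_predict(W, x):
    xa = list(x) + [1.0]
    scores = [sum(wi * xi for wi, xi in zip(w, xa)) for w in W]
    return max(range(len(W)), key=lambda c: scores[c])

X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0), (10.0, 1.0)]
y = [0, 0, 1, 1, 2, 2]
W = multiclass_perceptron(X, y, 3)
```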
Required Readings
Recommended Readings
Lecture 10: Maximum Margin Classifiers
Perceptrons revisited. Is there an optimal hyperplane? Maximum Margin Classifiers. Empirical error and error bounds for linear classifiers. Minimizing error by maximizing margin. Finding the margin-maximizing separating hyperplane as an optimization problem. Algorithm for finding the margin-maximizing hyperplane. Soft margin stochastic gradient algorithm. Interpretation of the maximum margin classifier and support vectors.
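A simplified soft-margin sketch: fixed-step subgradient descent on the hinge loss plus an L2 regularizer (toy data and hyperparameters are our own, and this is a plain deterministic variant rather than the exact stochastic algorithm covered in lecture):

```python
def svm_sgd(X, y, lam=0.01, lr=0.1, epochs=500):
    # Subgradient descent on the soft-margin objective:
    #   (lam/2)*||w||^2 + mean(max(0, 1 - y_i*(w.x_i + b)))
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, yi in zip(X, y):
            margin = yi * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin: hinge subgradient plus regularizer
                w = [wi - lr * (lam * wi - yi * xi) for wi, xi in zip(w, x)]
                b += lr * yi
            else:
                # Outside the margin: only the regularizer acts
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def svm_predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

X = [(0.0, 0.0), (0.0, 1.0), (2.0, 2.0), (2.0, 3.0)]
y = [-1, -1, 1, 1]
w, b = svm_sgd(X, y)
```

Points that end up inside or on the margin are the ones that drive the updates; those are the support vectors of the final classifier.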
Required Readings
Recommended Readings
Lecture 11: Review
Review: loss functions, deriving weight update rules, gradient-based minimization of loss functions, additional in-class exercises.
Required Readings
- Required readings assigned for lectures 1-10.
Recommended Readings
- Recommended readings assigned for lectures 1-10.
Lecture 12: Kernel Machines
How to train classifiers for handling non-linear decision boundaries?
Classic approach: feature engineering. Modern approach: the kernel trick. Dual representation of linear classifiers. Why are kernels useful?
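Why kernels are useful can be checked numerically: a degree-2 polynomial kernel, evaluated entirely in the original 2-D input space, computes exactly the dot product that an explicit quadratic feature map would compute in 6 dimensions (the feature-map scaling below is one standard choice):

```python
import math

def poly_kernel(x, z):
    # (x.z + 1)^2: evaluated entirely in the original input space
    return (sum(a * b for a, b in zip(x, z)) + 1.0) ** 2

def phi(x):
    # Explicit degree-2 feature map for 2-D input matching the kernel above
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]

x, z = (1.0, 2.0), (3.0, 0.5)
lhs = poly_kernel(x, z)                               # kernel in 2-D
rhs = sum(a * b for a, b in zip(phi(x), phi(z)))      # dot product in 6-D
```

The kernel never constructs the 6-D vectors, which is the whole point: the dual representation needs only these inner products.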
Required Readings
- Lecture Slides. Vasant Honavar.
Note: the lecture slides give a much simpler, yet rigorous treatment of kernel machines than the one available in the textbook.
Recommended Readings
Lectures 13-14: Kernel Machines
How to 'kernelize' any linear model using the kernel trick? Properties of Kernel Functions. Examples of Kernel Functions. Constructing Kernel Functions. Distinguishing good kernels from bad ones. Examples of kernel machines - kernel perceptron, kernel SVM, kernel regression. Applications of Kernel Machines: text classification, image classification, etc.
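One of the examples above, the kernel perceptron, sketched in the dual representation with an RBF kernel (the XOR-style labeling is chosen because no linear perceptron can separate it):

```python
import math

def rbf(x, z, gamma=1.0):
    # Gaussian (RBF) kernel
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def kernel_perceptron(X, y, kernel, epochs=20):
    # Dual representation: the weight vector is an implicit sum of
    # alpha_i * y_i * phi(x_i); only the mistake counts alpha are stored
    alpha = [0] * len(X)
    for _ in range(epochs):
        mistakes = 0
        for j, (xj, yj) in enumerate(zip(X, y)):
            s = sum(a * yi * kernel(xi, xj) for a, xi, yi in zip(alpha, X, y))
            if yj * s <= 0:
                alpha[j] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

def kp_predict(alpha, X, y, kernel, x):
    s = sum(a * yi * kernel(xi, x) for a, xi, yi in zip(alpha, X, y))
    return 1 if s > 0 else -1

# XOR labeling: impossible for a linear perceptron, easy with an RBF kernel
X = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
y = [1, 1, -1, -1]
alpha = kernel_perceptron(X, y, rbf)
```

Swapping `rbf` for any other valid kernel "kernelizes" the same code, which is the point of the dual form.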
Required Readings
- Lecture Slides. Vasant Honavar. Note: the lecture slides give a much simpler, yet rigorous treatment of kernel machines than the one available in the textbook.
Recommended Readings
Midterm Review
Midterm Exam
Lecture 15: Probabilistic models for Machine Learning
Brief review of probability: Sample spaces. Random variables. Events. Probabilities of compound events. Joint probability. Marginalization. Conditional Probability. Bayes Rule. Product rule.
Probabilistic generative models. Decision theoretic models of classification.
Bayes optimal classifier. How Bayes Optimal Classifier optimally trades off between prior beliefs and information supplied by data. Minimum risk classifier.
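Bayes rule and the prior-versus-data tradeoff in a few lines (the diagnostic-test numbers are invented for illustration):

```python
def posterior(prior, likelihood):
    # Bayes rule: P(c | x) is proportional to P(x | c) * P(c)
    joint = {c: prior[c] * likelihood[c] for c in prior}
    z = sum(joint.values())
    return {c: p / z for c, p in joint.items()}

def bayes_optimal_class(prior, likelihood):
    # The Bayes optimal (0-1 loss) decision: pick the maximum-posterior class
    post = posterior(prior, likelihood)
    return max(post, key=post.get)

# A rare class with a strong likelihood: the prior can still dominate
prior = {"disease": 0.01, "healthy": 0.99}
likelihood = {"disease": 0.95, "healthy": 0.05}   # P(positive test | class)
post = posterior(prior, likelihood)
```

Despite a positive test, the posterior for "disease" is only about 0.16, so the Bayes optimal decision is "healthy": prior beliefs and the evidence supplied by data are traded off exactly as described above.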
Required Readings
Recommended Readings
Lecture 16: Probabilistic models for Machine Learning
Probabilistic generative models.
Naive Bayes Classifier. Applications of Naive Bayes Classifiers: tabular, sequence, text, and image data classification. Multivariate, multinomial, and Gaussian Naive Bayes. Avoiding overfitting: robust probability estimates.
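A multinomial Naive Bayes sketch with Laplace smoothing as the robust probability estimate (the toy documents and names are ours):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    # Multinomial Naive Bayes: class priors plus per-class word counts
    vocab = {w for d in docs for w in d}
    priors = {c: n / len(labels) for c, n in Counter(labels).items()}
    word_counts = defaultdict(Counter)
    for d, c in zip(docs, labels):
        word_counts[c].update(d)
    return vocab, priors, word_counts

def predict_nb(model, doc, alpha=1.0):
    vocab, priors, word_counts = model
    scores = {}
    for c in priors:
        total = sum(word_counts[c].values())
        s = math.log(priors[c])
        for w in doc:
            # Laplace (add-alpha) smoothing avoids zero-probability estimates
            s += math.log((word_counts[c][w] + alpha) / (total + alpha * len(vocab)))
        scores[c] = s
    return max(scores, key=scores.get)

docs = [["spam", "offer", "buy"], ["buy", "now", "offer"],
        ["meeting", "notes"], ["project", "meeting"]]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels)
```

Working in log space avoids floating-point underflow when documents are long.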
Required Readings
Recommended Readings
Lecture 17: From Generative to Discriminative Models (Naive Bayes to Logistic Regression)
Generative versus Discriminative Models for Classification. Naive Bayes classifier as a generative model. Logistic Regression as the discriminative counterpart of Naive Bayes. Relationship between generative models and linear classifiers. Additional examples of generative models. Generative models from the exponential family of distributions. Transforming generative models of the exponential family into discriminative models. Estimating parameters of discriminative models. Generative models versus discriminative models for classification.
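Logistic regression, the discriminative counterpart discussed above, fit by gradient descent on the negative log-likelihood (a 1-D toy sketch; the learning rate and epoch count are arbitrary choices):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logreg_fit(X, y, lr=0.5, epochs=2000):
    # Labels are 0/1; gradient descent on the negative log-likelihood
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for x, t in zip(X, y):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - t                # gradient of the log loss wrt the logit
            gw = [gi + err * xi for gi, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * gi / n for wi, gi in zip(w, gw)]
        b -= lr * gb / n
    return w, b

X = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [0, 0, 1, 1]
w, b = logreg_fit(X, y)
```

Unlike Naive Bayes, nothing here models P(x | c); the parameters of P(c | x) are estimated directly, which is the generative-versus-discriminative distinction above.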
Required Readings
Recommended Readings
- Rubinstein, D. and Hastie, T. (1997). Discriminative vs Informative Learning. In: Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
- Ng, A. and Jordan, M. (2002). On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes. In: Advances in Neural Information Processing Systems (NIPS).
- Bouchard, G. and Triggs, B. (2004). The Tradeoff Between Generative and Discriminative Classifiers. In: Proceedings of Computational Statistics (COMPSTAT 2004).
- Raina, R., Shen, Y., Ng, A., and McCallum, A. (2003). Classification with Hybrid Generative/Discriminative Models. In: Advances in Neural Information Processing Systems (NIPS 2003).
- Ulusoy, I. and Bishop, C. M. (2005). Generative Versus Discriminative Models for Object Recognition. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), San Diego.
Lecture 18: Decision Trees
Modeling dependence between attributes. The decision tree classifier. Introduction to information theory. Information, entropy, mutual information, and related concepts. Algorithm for learning decision tree classifiers from data. Overfitting and methods to avoid overfitting -- dealing with small sample sizes; pre-pruning and post-pruning.
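Entropy and information gain, the split criterion used by the tree-learning algorithm above, in a few lines:

```python
import math
from collections import Counter

def entropy(labels):
    # H(Y) = -sum p * log2(p) over the class proportions
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, split):
    # Expected reduction in entropy from partitioning the examples
    # by an attribute; split[i] is the attribute value of example i
    n = len(labels)
    groups = {}
    for l, s in zip(labels, split):
        groups.setdefault(s, []).append(l)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

An attribute that perfectly separates the classes has gain equal to the full entropy; an attribute independent of the class has gain zero.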
Required Readings
Recommended Readings
- Shannon, C. E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, vol. 27, pp. 379-423, 623-656.
- Domingos, P. (1999). The Role of Occam's Razor in Knowledge Discovery. Data Mining and Knowledge Discovery, vol. 3, no. 4, pp. 409-425.
- Zhang, J. and Honavar, V. (2003). Learning Decision Tree Classifiers from Attribute Value Taxonomies and Partially Specified Data. In: Proceedings of the International Conference on Machine Learning (ICML-03). Washington, DC. pp. 880-887.
- Silvescu, A. and Honavar, V. (2001). Temporal Boolean Network Models of Genetic Networks and Their Inference from Gene Expression Time Series. Complex Systems, vol. 13, no. 1, pp. 54-.
- Kang, D-K., Silvescu, A., and Honavar, V. (2006). RNBL-MN: A Recursive Naive Bayes Learner for Sequence Classification. In: Proceedings of the Tenth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006). Lecture Notes in Computer Science. Berlin: Springer-Verlag. pp. 45-54.
Lecture 19: Decision Trees (contd) and Random Forest - An Ensemble of Decision Trees
Pitfalls of entropy as a splitting criterion for multi-valued splits. Alternative splitting strategies -- two-way versus multi-way splits; Alternative split criteria: Gini impurity, Entropy, etc. Cost-sensitive decision tree induction -- incorporating attribute measurement costs and misclassification costs into decision tree induction.
Dealing with categorical, numeric, and ordinal attributes. Dealing with missing attribute values during tree induction and instance classification.
Why a forest is better than a single tree.
The random forest algorithm for classification. Why random forest works.
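The bagging half of random forests can be sketched as bootstrap sampling plus majority voting. For brevity the base learner below is a 1-nearest-neighbor closure; a real random forest would instead grow a decision tree on a random feature subset at each node:

```python
import random
from collections import Counter

def bootstrap_sample(X, y, rng):
    # Draw n examples with replacement; each base model sees a different sample
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]
    return [X[i] for i in idx], [y[i] for i in idx]

def train_bagged(X, y, fit_base, n_models=25, seed=0):
    rng = random.Random(seed)
    return [fit_base(*bootstrap_sample(X, y, rng)) for _ in range(n_models)]

def bagged_predict(models, x):
    # Majority vote across the ensemble
    return Counter(m(x) for m in models).most_common(1)[0][0]

# Stand-in base learner: a 1-nearest-neighbor closure
def fit_1nn(Xb, yb):
    def predict(x):
        d = lambda p: sum((a - b) ** 2 for a, b in zip(p, x))
        return min(zip(Xb, yb), key=lambda e: d(e[0]))[1]
    return predict

X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
y = [0, 0, 1, 1]
models = train_bagged(X, y, fit_1nn)
```

Averaging many high-variance base models trained on different bootstrap samples is why the forest outperforms a single tree.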
Required Readings
Recommended Readings
- Fayyad, U. and Irani, K. B. (1992). On the Handling of Continuous-Valued Attributes in Decision Tree Generation. Machine Learning, vol. 8, pp. 87-102.
- Codrington, C. W. and Brodley, C. E. On the Qualitative Behavior of Impurity-Based Splitting Rules: The Minima-Free Property. Tech. Rep. 97-05. Dept. of Computer Science, Cornell University.
- Atramentov, A., Leiva, H., and Honavar, V. (2003). A Multi-Relational Decision Tree Learning Algorithm - Implementation and Experiments. In: Proceedings of the Thirteenth International Conference on Inductive Logic Programming. Berlin: Springer-Verlag. Lecture Notes in Computer Science, vol. 2835, pp. 38-56.
- Breiman, L. (2001). Random Forests. Machine Learning, 45(1), pp. 5-32.
Review
Lecture 20: Multi-layer Neural Networks and Generalized Delta Rule
Introduction to multi-layer neural networks for classification and regression.
Universal function approximation theorem.
Generalized delta rule (GDR) (the backpropagation learning algorithm).
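The generalized delta rule on the classic XOR problem, which is not linearly separable and therefore needs the hidden layer (4 hidden sigmoid units; the initialization, learning rate, and epoch count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: not linearly separable, so a hidden layer is required
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 4 sigmoid units
W1 = rng.normal(0.0, 1.0, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0.0, 1.0, (4, 1)); b2 = np.zeros(1)

lr = 1.0
losses = []
for _ in range(5000):
    H = sigmoid(X @ W1 + b1)                      # forward pass, hidden layer
    Y = sigmoid(H @ W2 + b2)                      # forward pass, output layer
    losses.append(float(((Y - T) ** 2).mean()))
    delta_out = (Y - T) * Y * (1 - Y)             # output deltas (squared error)
    delta_hid = (delta_out @ W2.T) * H * (1 - H)  # backpropagated hidden deltas
    W2 -= lr * H.T @ delta_out; b2 -= lr * delta_out.sum(axis=0)
    W1 -= lr * X.T @ delta_hid; b1 -= lr * delta_hid.sum(axis=0)
```

The hidden deltas are the output deltas pushed back through the weights and scaled by the local derivative, which is the whole content of the generalized delta rule.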
Required Readings
Recommended Readings
Lecture 21: Multi-layer Neural Networks and Generalized Delta Rule (Contd.)
Generalized delta rule (backpropagation algorithm) in practice - avoiding overfitting, choosing neuron activation functions, choosing learning rate, choosing initial weights, speeding up learning, improving generalization, circumventing local minima.
Variations -- radial basis function networks. Learning non-linear functions by searching the space of network topologies as well as weights.
Required Readings
Recommended Readings
- Rumelhart, D., Hinton, G., and Williams, R. (1986). Learning Internal Representations by Error Propagation. In: Parallel Distributed Processing, Vol. 1. Rumelhart, D., McClelland, J. and the PDP Research Group. MIT Press.
- Bishop, C. (1991). Improving the Generalization Properties of Radial Basis Function Neural Networks. Neural Computation, 3(4), pp. 579-588.
Lectures 22-23: Deep Learning
Motivations behind deep learning. Deep representation learning. Autoencoders. Stacked autoencoders. Convolutional neural networks.
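A minimal autoencoder sketch: data lying near a 1-D subspace of R^3 is compressed through a single-unit linear bottleneck and reconstructed by gradient descent (all sizes, rates, and names are illustrative; real autoencoders use non-linear layers and deeper stacks):

```python
import numpy as np

rng = np.random.default_rng(1)

# Data lying near a 1-D subspace of R^3, so a 1-unit bottleneck suffices
t = rng.normal(size=(200, 1))
X = t @ np.array([[1.0, 2.0, -1.0]]) + 0.01 * rng.normal(size=(200, 3))

W_enc = rng.normal(0.0, 0.1, (3, 1))   # encoder: 3 -> 1
W_dec = rng.normal(0.0, 0.1, (1, 3))   # decoder: 1 -> 3

lr = 0.01
losses = []
for _ in range(500):
    Z = X @ W_enc                      # code (the learned representation)
    Xhat = Z @ W_dec                   # reconstruction
    E = Xhat - X
    losses.append(float((E ** 2).mean()))
    # Gradients of the squared reconstruction error
    # (up to a constant factor absorbed into lr)
    g_dec = (Z.T @ E) / len(X)
    g_enc = (X.T @ (E @ W_dec.T)) / len(X)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
```

The code `Z` is the learned low-dimensional representation; stacking several such encoders and training layer by layer gives the stacked autoencoder mentioned above.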
Required Readings
Lecture 24: Review and Wrap-up