Machine Learning: Study Guide
Please note that the precise schedule is subject to change. The lecture slides (and lecture notes, if any) are updated after the lecture.
Lecture 1. Learning from data: motivations, representative applications
Why should machines learn? What is machine learning? Machine learning in context: artificial intelligence, statistics, data science. Types of machine learning. Course overview and course mechanics. Quick overview of the Python-based tools for machine learning. Getting started with Google Colab.
Required Readings
Recommended Materials
Optional Materials
Lecture 2. Simple Examples of Machine Learning for classification and function approximation
Nearest Neighbor Classifier. Measures of similarity/distance for different types of data. Variants of K-nearest neighbor method. Applications. Advantages and limitations. Making nearest neighbor classification work on large data sets. Nearest neighbor in high dimensions.
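To make the first topic concrete, here is a minimal nearest-neighbor classifier in plain Python (the toy data, function names, and choice of Euclidean distance are illustrative, not part of the assigned materials):

```python
from collections import Counter
import math

def euclidean(a, b):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(X_train, y_train, x, k=3, dist=euclidean):
    # Find the k training points closest to x and take a majority vote
    neighbors = sorted(zip(X_train, y_train), key=lambda p: dist(p[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy 2-D data set: two well-separated clusters
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
y = ["a", "a", "a", "b", "b", "b"]
```

Swapping in a different `dist` function is how the "measures of similarity/distance for different types of data" topic plays out in code.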
Required Readings
Recommended Readings
Optional Materials
- Beyer, K., Goldstein, J., Ramakrishnan, R., and Shaft, U. (1999). When is Nearest Neighbor Meaningful? In: Beeri, C. and Buneman, P. (eds) Database Theory — ICDT'99. Lecture Notes in Computer Science, vol. 1540. Springer, Berlin, Heidelberg.
- Slaney, M. and Casey, M. (2008). Locality-Sensitive Hashing for Finding Nearest Neighbors. IEEE Signal Processing Magazine, 25(2), pp. 128-131.
Lecture 3. Elements of optimization
Linear regression for function approximation.
Least Mean Squared (LMS) Error Criterion. Minimization of Error Functions. Basics of quadratic optimization. Review of Relevant Mathematics (Limits, Continuity and Differentiability of Functions, Local Minima and Maxima, Derivatives, Partial Derivatives, Taylor Series Approximation, Multi-Variate Taylor Series Approximation).
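The LMS criterion can be illustrated with a short gradient-descent sketch (the toy data and hyperparameters below are our own choices):

```python
def lms_fit(xs, ys, lr=0.05, epochs=5000):
    # Fit y ~ w*x + b by gradient descent on the mean squared error
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Partial derivatives of (1/n) * sum((w*x + b - y)^2)
        dw = (2.0 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        db = (2.0 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * dw
        b -= lr * db
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # generated from y = 2x + 1
w, b = lms_fit(xs, ys)
```

Because the error surface is quadratic, gradient descent with a small enough step size converges to the unique minimum; that is the "basics of quadratic optimization" point above.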
Required Readings
Recommended Readings
Lecture 4. Linear Regression
Required Readings
Recommended Readings
Lecture 5: Evaluating predictive models
Evaluation of classifiers. Accuracy, Sensitivity, Specificity, Correlation Coefficient. Tradeoffs between sensitivity and specificity. When does one classifier outperform another? ROC curves.
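A small sketch of these measures (the scores and labels below are invented for illustration):

```python
def sensitivity(tp, fn):
    # True positive rate: fraction of actual positives detected
    return tp / (tp + fn)

def specificity(tn, fp):
    # True negative rate: fraction of actual negatives correctly rejected
    return tn / (tn + fp)

def roc_points(scores, labels):
    # One (FPR, TPR) point per distinct threshold, from strictest to loosest;
    # labels are 0/1, scores are the classifier's confidence for the positive class
    P = sum(labels)
    N = len(labels) - P
    pts = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
        pts.append((fp / N, tp / P))
    return pts

# A classifier that ranks both positives above both negatives
pts = roc_points([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
```

Sweeping the threshold traces the sensitivity/specificity tradeoff; a perfect ranking passes through the ideal corner (FPR 0, TPR 1).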
Required Readings
Recommended Readings
Lecture 6: Evaluating predictive models (Continued)
Estimating performance measures; confidence interval calculation for estimates; cross-validation based estimates of model performance; leave-one-out and bootstrap estimates of performance; comparing two models; comparing two learning algorithms.
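A sketch of k-fold estimation plus a normal-approximation confidence interval (helper names are ours; the `train_and_score` callback stands in for any learner):

```python
import math
import random

def kfold_accuracy(X, y, train_and_score, k=5, seed=0):
    # Shuffle indices, split into k folds, train on k-1 folds, test on the rest;
    # train_and_score(train_idx, test_idx) must return an accuracy in [0, 1]
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accs = []
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        accs.append(train_and_score(train, test))
    return accs

def normal_ci(p_hat, n, z=1.96):
    # 95% normal-approximation confidence interval for an accuracy
    # estimated from n test examples
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se
```

Setting `k = len(X)` turns this into leave-one-out; resampling the data with replacement instead of folding gives the bootstrap estimate.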
Required Readings
Recommended Readings
Lecture 7: Linear Classifiers
Introduction to Linear Classifiers. Threshold logic unit (perceptron). Connection with Logic and Geometry. Weight space and pattern space representations of perceptrons.
Linear separability. Perceptron Learning algorithm and its variants. Convergence properties of perceptron algorithm. Winner-Take-All Networks.
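The perceptron learning algorithm in a few lines (trained here on the AND function, which is linearly separable, so the convergence theorem applies):

```python
def perceptron_train(X, y, epochs=20):
    # Labels are +1/-1; the bias is folded in as the last weight
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        errors = 0
        for x, t in zip(X, y):
            xa = list(x) + [1.0]                       # augmented input
            out = 1 if sum(wi * xi for wi, xi in zip(w, xa)) > 0 else -1
            if out != t:                               # mistake-driven update
                w = [wi + t * xi for wi, xi in zip(w, xa)]
                errors += 1
        if errors == 0:                                # converged (separable data)
            break
    return w

def perceptron_predict(w, x):
    xa = list(x) + [1.0]
    return 1 if sum(wi * xi for wi, xi in zip(w, xa)) > 0 else -1

# The AND function is linearly separable
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, -1, -1, 1]
w = perceptron_train(X, y)
```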
Required Readings
Recommended Readings
Lecture 8: Linear Classifiers (Continued).
Perceptron Learning algorithm and its variants. Convergence properties of perceptron algorithm. Winner-Take-All Networks.
Required Readings
Recommended Readings
Lecture 9: Linear Classifiers (Continued).
Alternative loss functions for perceptrons and multi-class perceptrons. Gradient-based minimization of loss.
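One possible sketch of the multi-class perceptron (the mistake-driven update below is the stochastic subgradient of a multi-class perceptron loss; the data and names are illustrative):

```python
def multiclass_perceptron(X, y, n_classes, epochs=50):
    # One weight vector (with a trailing bias weight) per class; predict by argmax
    d = len(X[0]) + 1
    W = [[0.0] * d for _ in range(n_classes)]
    for _ in range(epochs):
        mistakes = 0
        for x, t in zip(X, y):
            xa = list(x) + [1.0]
            scores = [sum(wi * xi for wi, xi in zip(w, xa)) for w in W]
            pred = max(range(n_classes), key=lambda c: scores[c])
            if pred != t:
                # Move the true class toward x and the predicted class away from it
                W[t] = [wi + xi for wi, xi in zip(W[t], xa)]
                W[pred] = [wi - xi for wi, xi in zip(W[pred], xa)]
                mistakes += 1
        if mistakes == 0:
            break
    return W

def mc_predict(W, x):
    xa = list(x) + [1.0]
    scores = [sum(wi * xi for wi, xi in zip(w, xa)) for w in W]
    return max(range(len(W)), key=lambda c: scores[c])

X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0), (10.0, 1.0)]
y = [0, 0, 1, 1, 2, 2]
W = multiclass_perceptron(X, y, 3)
```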
Required Readings
Recommended Readings
Lecture 10: Maximum Margin Classifiers
Perceptrons revisited. Is there an optimal hyperplane? Maximum Margin Classifiers. Empirical error and error bounds for linear classifiers. Minimizing error by maximizing margin. Finding the margin-maximizing separating hyperplane as an optimization problem. Algorithm for finding the margin-maximizing hyperplane. Soft margin stochastic gradient algorithm. Interpretation of the maximum margin classifier and support vectors.
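A simplified soft-margin sketch: fixed-step subgradient descent on the hinge loss plus an L2 regularizer (toy data and hyperparameters are our own, and this is a plain deterministic variant rather than the exact stochastic algorithm covered in lecture):

```python
def svm_sgd(X, y, lam=0.01, lr=0.1, epochs=500):
    # Subgradient descent on the soft-margin objective:
    #   (lam/2)*||w||^2 + mean(max(0, 1 - y_i*(w.x_i + b)))
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, yi in zip(X, y):
            margin = yi * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin: hinge subgradient plus regularizer
                w = [wi - lr * (lam * wi - yi * xi) for wi, xi in zip(w, x)]
                b += lr * yi
            else:
                # Outside the margin: only the regularizer acts
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def svm_predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

X = [(0.0, 0.0), (0.0, 1.0), (2.0, 2.0), (2.0, 3.0)]
y = [-1, -1, 1, 1]
w, b = svm_sgd(X, y)
```

Points that end up inside or on the margin are the ones that drive the updates; those are the support vectors of the final classifier.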
Required Readings
Recommended Readings
Lecture 11: Review
Review: loss functions, deriving weight update rules, gradient-based minimization of loss functions, additional in-class exercises.
Required Readings
- Required readings assigned for lectures 1-10.
Recommended Readings
- Recommended readings assigned for lectures 1-10.
Lecture 12: Kernel Machines
How to train classifiers for handling non-linear decision boundaries?
Classic approach: feature engineering. Modern approach: the kernel trick. Dual representation of linear classifiers. Why are kernels useful?
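Why kernels are useful can be checked numerically: a degree-2 polynomial kernel, evaluated entirely in the original 2-D input space, computes exactly the dot product that an explicit quadratic feature map would compute in 6 dimensions (the feature-map scaling below is one standard choice):

```python
import math

def poly_kernel(x, z):
    # (x.z + 1)^2: evaluated entirely in the original input space
    return (sum(a * b for a, b in zip(x, z)) + 1.0) ** 2

def phi(x):
    # Explicit degree-2 feature map for 2-D input matching the kernel above
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]

x, z = (1.0, 2.0), (3.0, 0.5)
lhs = poly_kernel(x, z)                               # kernel in 2-D
rhs = sum(a * b for a, b in zip(phi(x), phi(z)))      # dot product in 6-D
```

The kernel never constructs the 6-D vectors, which is the whole point: the dual representation needs only these inner products.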
Required Readings
- Lecture Slides. Vasant Honavar.
Note: the lecture slides give a much simpler, yet rigorous treatment of kernel machines than the one available in the textbook.
Recommended Readings
Lectures 13-14: Kernel Machines
How to 'kernelize' any linear model using the kernel trick? Properties of Kernel Functions. Examples of Kernel Functions. Constructing Kernel Functions. Distinguishing good kernels from bad ones. Examples of kernel machines - kernel perceptron, kernel SVM, kernel regression. Applications of Kernel Machines: text classification, image classification, etc.
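One of the examples above, the kernel perceptron, sketched in the dual representation with an RBF kernel (the XOR-style labeling is chosen because no linear perceptron can separate it):

```python
import math

def rbf(x, z, gamma=1.0):
    # Gaussian (RBF) kernel
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def kernel_perceptron(X, y, kernel, epochs=20):
    # Dual representation: the weight vector is an implicit sum of
    # alpha_i * y_i * phi(x_i); only the mistake counts alpha are stored
    alpha = [0] * len(X)
    for _ in range(epochs):
        mistakes = 0
        for j, (xj, yj) in enumerate(zip(X, y)):
            s = sum(a * yi * kernel(xi, xj) for a, xi, yi in zip(alpha, X, y))
            if yj * s <= 0:
                alpha[j] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

def kp_predict(alpha, X, y, kernel, x):
    s = sum(a * yi * kernel(xi, x) for a, xi, yi in zip(alpha, X, y))
    return 1 if s > 0 else -1

# XOR labeling: impossible for a linear perceptron, easy with an RBF kernel
X = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
y = [1, 1, -1, -1]
alpha = kernel_perceptron(X, y, rbf)
```

Swapping `rbf` for any other valid kernel "kernelizes" the same code, which is the point of the dual form.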
Required Readings
- Lecture Slides. Vasant Honavar. Note: the lecture slides give a much simpler, yet rigorous treatment of kernel machines than the one available in the textbook.
Recommended Readings
Midterm Review
Midterm Exam
Lecture 15: Probabilistic models for Machine Learning
Brief review of probability: Sample spaces. Random variables. Events. Probabilities of compound events. Joint probability. Marginalization. Conditional Probability. Bayes Rule. Product rule.
Probabilistic generative models. Decision theoretic models of classification.
Bayes optimal classifier. How Bayes Optimal Classifier optimally trades off between prior beliefs and information supplied by data. Minimum risk classifier.
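Bayes rule and the prior-versus-data tradeoff in a few lines (the diagnostic-test numbers are invented for illustration):

```python
def posterior(prior, likelihood):
    # Bayes rule: P(c | x) is proportional to P(x | c) * P(c)
    joint = {c: prior[c] * likelihood[c] for c in prior}
    z = sum(joint.values())
    return {c: p / z for c, p in joint.items()}

def bayes_optimal_class(prior, likelihood):
    # The Bayes optimal (0-1 loss) decision: pick the maximum-posterior class
    post = posterior(prior, likelihood)
    return max(post, key=post.get)

# A rare class with a strong likelihood: the prior can still dominate
prior = {"disease": 0.01, "healthy": 0.99}
likelihood = {"disease": 0.95, "healthy": 0.05}   # P(positive test | class)
post = posterior(prior, likelihood)
```

Despite a positive test, the posterior for "disease" is only about 0.16, so the Bayes optimal decision is "healthy": prior beliefs and the evidence supplied by data are traded off exactly as described above.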
Required Readings
Recommended Readings
Lecture 16: Probabilistic models for Machine Learning
Probabilistic generative models.
Naive Bayes Classifier. Applications of Naive Bayes Classifiers: tabular, sequence, text, and image data classification. Multivariate, multinomial, and Gaussian Naive Bayes. Avoiding overfitting: robust probability estimates.
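A multinomial Naive Bayes sketch with Laplace smoothing as the robust probability estimate (the toy documents and names are ours):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    # Multinomial Naive Bayes: class priors plus per-class word counts
    vocab = {w for d in docs for w in d}
    priors = {c: n / len(labels) for c, n in Counter(labels).items()}
    word_counts = defaultdict(Counter)
    for d, c in zip(docs, labels):
        word_counts[c].update(d)
    return vocab, priors, word_counts

def predict_nb(model, doc, alpha=1.0):
    vocab, priors, word_counts = model
    scores = {}
    for c in priors:
        total = sum(word_counts[c].values())
        s = math.log(priors[c])
        for w in doc:
            # Laplace (add-alpha) smoothing avoids zero-probability estimates
            s += math.log((word_counts[c][w] + alpha) / (total + alpha * len(vocab)))
        scores[c] = s
    return max(scores, key=scores.get)

docs = [["spam", "offer", "buy"], ["buy", "now", "offer"],
        ["meeting", "notes"], ["project", "meeting"]]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels)
```

Working in log space avoids floating-point underflow when documents are long.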
Required Readings
Recommended Readings
Lecture 17: From Generative to Discriminative Models (Naive Bayes to Logistic Regression)
Generative versus Discriminative Models for Classification. Naive Bayes classifier as a generative model. Logistic Regression as the discriminative counterpart of Naive Bayes. Relationship between generative models and linear classifiers. Additional examples of generative models. Generative models from the exponential family of distributions. Transforming generative models of the exponential family into discriminative models. Estimating parameters of discriminative models. Generative models versus discriminative models for classification.
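Logistic regression, the discriminative counterpart discussed above, fit by gradient descent on the negative log-likelihood (a 1-D toy sketch; the learning rate and epoch count are arbitrary choices):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logreg_fit(X, y, lr=0.5, epochs=2000):
    # Labels are 0/1; gradient descent on the negative log-likelihood
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for x, t in zip(X, y):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - t                # gradient of the log loss wrt the logit
            gw = [gi + err * xi for gi, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * gi / n for wi, gi in zip(w, gw)]
        b -= lr * gb / n
    return w, b

X = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [0, 0, 1, 1]
w, b = logreg_fit(X, y)
```

Unlike Naive Bayes, nothing here models P(x | c); the parameters of P(c | x) are estimated directly, which is the generative-versus-discriminative distinction above.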
Required Readings
Recommended Readings
- Rubinstein, D. and Hastie, T. (1997). Discriminative vs Informative Learning. In: Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
- Ng, A. and Jordan, M. (2002). On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes. In: Advances in Neural Information Processing Systems (NIPS).
- Bouchard, G. and Triggs, B. (2004). The Tradeoff Between Generative and Discriminative Classifiers. In: Proceedings of Computational Statistics (COMPSTAT 2004).
- Raina, R., Shen, Y., Ng, A., and McCallum, A. (2003). Classification with Hybrid Generative/Discriminative Models. In: Advances in Neural Information Processing Systems (NIPS 2003).
- Ulusoy, I. and Bishop, C. M. (2005). Generative Versus Discriminative Models for Object Recognition. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), San Diego.
Lecture 18: Decision Trees
Modeling dependence between attributes. The decision tree classifier. Introduction to information theory. Information, entropy, mutual information, and related concepts. Algorithm for learning decision tree classifiers from data. Overfitting and methods to avoid overfitting -- dealing with small sample sizes; pre-pruning and post-pruning.
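Entropy and information gain, the split criterion used by the tree-learning algorithm above, in a few lines:

```python
import math
from collections import Counter

def entropy(labels):
    # H(Y) = -sum p * log2(p) over the class proportions
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, split):
    # Expected reduction in entropy from partitioning the examples
    # by an attribute; split[i] is the attribute value of example i
    n = len(labels)
    groups = {}
    for l, s in zip(labels, split):
        groups.setdefault(s, []).append(l)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

An attribute that perfectly separates the classes has gain equal to the full entropy; an attribute independent of the class has gain zero.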
Required Readings
Recommended Readings
- Shannon, C. E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, vol. 27, pp. 379-423, 623-656.
- Domingos, P. (1999). The Role of Occam's Razor in Knowledge Discovery. Data Mining and Knowledge Discovery, vol. 3, no. 4, pp. 409-425.
- Zhang, J. and Honavar, V. (2003). Learning Decision Tree Classifiers from Attribute Value Taxonomies and Partially Specified Data. In: Proceedings of the International Conference on Machine Learning (ICML-03). Washington, DC. pp. 880-887.
- Silvescu, A. and Honavar, V. (2001). Temporal Boolean Network Models of Genetic Networks and Their Inference from Gene Expression Time Series. Complex Systems, vol. 13, no. 1, pp. 54-.
- Kang, D-K., Silvescu, A., and Honavar, V. (2006). RNBL-MN: A Recursive Naive Bayes Learner for Sequence Classification. In: Proceedings of the Tenth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006). Lecture Notes in Computer Science. Berlin: Springer-Verlag. pp. 45-54.
Lecture 19: Decision Trees (contd) and Random Forest - An Ensemble of Decision Trees
Pitfalls of entropy as a splitting criterion for multi-valued splits. Alternative splitting strategies -- two-way versus multi-way splits; Alternative split criteria: Gini impurity, Entropy, etc. Cost-sensitive decision tree induction -- incorporating attribute measurement costs and misclassification costs into decision tree induction.
Dealing with categorical, numeric, and ordinal attributes. Dealing with missing attribute values during tree induction and instance classification.
Why a forest is better than a single tree.
The random forest algorithm for classification. Why random forest works.
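The bagging half of random forests can be sketched as bootstrap sampling plus majority voting. For brevity the base learner below is a 1-nearest-neighbor closure; a real random forest would instead grow a decision tree on a random feature subset at each node:

```python
import random
from collections import Counter

def bootstrap_sample(X, y, rng):
    # Draw n examples with replacement; each base model sees a different sample
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]
    return [X[i] for i in idx], [y[i] for i in idx]

def train_bagged(X, y, fit_base, n_models=25, seed=0):
    rng = random.Random(seed)
    return [fit_base(*bootstrap_sample(X, y, rng)) for _ in range(n_models)]

def bagged_predict(models, x):
    # Majority vote across the ensemble
    return Counter(m(x) for m in models).most_common(1)[0][0]

# Stand-in base learner: a 1-nearest-neighbor closure
def fit_1nn(Xb, yb):
    def predict(x):
        d = lambda p: sum((a - b) ** 2 for a, b in zip(p, x))
        return min(zip(Xb, yb), key=lambda e: d(e[0]))[1]
    return predict

X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
y = [0, 0, 1, 1]
models = train_bagged(X, y, fit_1nn)
```

Averaging many high-variance base models trained on different bootstrap samples is why the forest outperforms a single tree.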
Required Readings
Recommended Readings
- Fayyad, U. and Irani, K. B. (1992). On the Handling of Continuous-Valued Attributes in Decision Tree Generation. Machine Learning, vol. 8, pp. 87-102.
- Codrington, C. W. and Brodley, C. E. On the Qualitative Behavior of Impurity-Based Splitting Rules: The Minima-Free Property. Tech. Rep. 97-05. Dept. of Computer Science, Cornell University.
- Atramentov, A., Leiva, H., and Honavar, V. (2003). A Multi-Relational Decision Tree Learning Algorithm - Implementation and Experiments. In: Proceedings of the Thirteenth International Conference on Inductive Logic Programming. Berlin: Springer-Verlag. Lecture Notes in Computer Science, vol. 2835, pp. 38-56.
- Breiman, L. (2001). Random Forests. Machine Learning, 45(1), pp. 5-32.
Review
Lecture 20: Multi-layer Neural Networks and Generalized Delta Rule
Introduction to multi-layer neural networks for classification and regression.
Universal function approximation theorem.
Generalized delta rule (GDR) (the backpropagation learning algorithm).
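The generalized delta rule on the classic XOR problem, which is not linearly separable and therefore needs the hidden layer (4 hidden sigmoid units; the initialization, learning rate, and epoch count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: not linearly separable, so a hidden layer is required
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 4 sigmoid units
W1 = rng.normal(0.0, 1.0, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0.0, 1.0, (4, 1)); b2 = np.zeros(1)

lr = 1.0
losses = []
for _ in range(5000):
    H = sigmoid(X @ W1 + b1)                      # forward pass, hidden layer
    Y = sigmoid(H @ W2 + b2)                      # forward pass, output layer
    losses.append(float(((Y - T) ** 2).mean()))
    delta_out = (Y - T) * Y * (1 - Y)             # output deltas (squared error)
    delta_hid = (delta_out @ W2.T) * H * (1 - H)  # backpropagated hidden deltas
    W2 -= lr * H.T @ delta_out; b2 -= lr * delta_out.sum(axis=0)
    W1 -= lr * X.T @ delta_hid; b1 -= lr * delta_hid.sum(axis=0)
```

The hidden deltas are the output deltas pushed back through the weights and scaled by the local derivative, which is the whole content of the generalized delta rule.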
Required Readings
Recommended Readings
Lecture 21: Multi-layer Neural Networks and Generalized Delta Rule (Contd.)
Generalized delta rule (backpropagation algorithm) in practice - avoiding overfitting, choosing neuron activation functions, choosing learning rate, choosing initial weights, speeding up learning, improving generalization, circumventing local minima.
Variations -- radial basis function networks. Learning non-linear functions by searching the space of network topologies as well as weights.
Required Readings
Recommended Readings
- Rumelhart, D., Hinton, G., and Williams, R. (1986). Learning Internal Representations by Error Propagation. In: Parallel Distributed Processing, Vol. 1. Rumelhart, D., McClelland, J. and the PDP Research Group. MIT Press.
- Bishop, C. (1991). Improving the Generalization Properties of Radial Basis Function Neural Networks. Neural Computation, 3(4), pp. 579-588.
Lectures 22-23: Deep Learning
Motivations behind deep learning. Deep representation learning. Autoencoders. Stacked autoencoders. Convolutional neural networks.
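A minimal autoencoder sketch: data lying near a 1-D subspace of R^3 is compressed through a single-unit linear bottleneck and reconstructed by gradient descent (all sizes, rates, and names are illustrative; real autoencoders use non-linear layers and deeper stacks):

```python
import numpy as np

rng = np.random.default_rng(1)

# Data lying near a 1-D subspace of R^3, so a 1-unit bottleneck suffices
t = rng.normal(size=(200, 1))
X = t @ np.array([[1.0, 2.0, -1.0]]) + 0.01 * rng.normal(size=(200, 3))

W_enc = rng.normal(0.0, 0.1, (3, 1))   # encoder: 3 -> 1
W_dec = rng.normal(0.0, 0.1, (1, 3))   # decoder: 1 -> 3

lr = 0.01
losses = []
for _ in range(500):
    Z = X @ W_enc                      # code (the learned representation)
    Xhat = Z @ W_dec                   # reconstruction
    E = Xhat - X
    losses.append(float((E ** 2).mean()))
    # Gradients of the squared reconstruction error
    # (up to a constant factor absorbed into lr)
    g_dec = (Z.T @ E) / len(X)
    g_enc = (X.T @ (E @ W_dec.T)) / len(X)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc
```

The code `Z` is the learned low-dimensional representation; stacking several such encoders and training layer by layer gives the stacked autoencoder mentioned above.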
Required Readings
Lecture 24: Review and Wrap-up