Pennsylvania State University

Machine Learning

Course Description

Course Staff

The Fall 2022 offering of the Machine Learning course for the Data Sciences major is taught by Professor Vasant Honavar.

Course Schedule

Lectures: Tue, Thu 10:35am - 11:50am, 202E Westgate Building

Office Hours:

  Instructor: Dr. Vasant Honavar: Wed, Fri 4:00pm - 5:00pm, E335 Westgate Building
  or Zoom (link provided on Canvas).

  Teaching Assistant: Ms. Sahar Hanifi: Tue, Thu 4:00pm - 5:00pm,
  or Zoom (link provided on Canvas).

Course Prerequisites

The prerequisites for the course include knowledge of programming in Python, data structures, discrete mathematics, calculus, basic probability theory, and elementary statistics. In addition, students are expected to have the writing and presentation skills necessary for preparing written reports and presentations based on laboratory assignments and projects.

The laboratory assignments will require competency in reading and writing programs in Python. Students are expected to acquire familiarity with the scikit-learn, NumPy, SciPy, and pandas packages.

If you are not sure whether you have the necessary background, please talk to the instructor.

Course Overview

The course aims to introduce students to the principles of machine learning, representative machine learning algorithms, and their applications in the data sciences. Topics to be covered include principled approaches to classification, clustering, and function approximation; feature selection and dimensionality reduction; performance assessment of predictive models; and the relative strengths and weaknesses of alternative algorithms.

Representative machine learning approaches to be covered include supervised machine learning methods for classification, including probabilistic generative models (e.g., Naïve Bayes) and their discriminative counterparts (e.g., logistic regression), linear classifiers, tree-based models (decision trees, random forests), nearest-neighbor methods, kernel machines (support vector machines), and neural networks and deep neural networks; clustering algorithms (K-means, hierarchical, spectral); function approximation (e.g., neural networks); and representation learning (deep learning).

Laboratory assignments will provide students with hands-on experience applying the algorithms to problems from several domains. Projects will focus on the development, evaluation, and comparison of machine learning solutions on data sets drawn from real-world applications. Problem sets will focus on understanding the conceptual, mathematical, statistical, and algorithmic underpinnings of machine learning.

Upon completion of this course, students will be able to:

  • Demonstrate broad understanding of the principles of machine learning and of representative machine learning algorithms and their applications in data sciences.
  • Implement, adapt, and apply representative machine learning algorithms using a high-level programming language, e.g., Python, to perform real-world clustering, classification, and regression tasks.
  • Identify, formulate and solve exploratory data analysis and predictive modeling problems that arise in practical applications.
  • Demonstrate an understanding of the strengths and weaknesses of alternative machine learning algorithms (relative to the characteristics of the application domain).
  • Adapt or combine some of the key elements of existing machine learning algorithms to design new algorithms as needed.
  • Rigorously evaluate or compare alternative machine learning algorithms on particular problems.
  • Apply best practices in responsible data science (covering issues such as reproducibility and bias).
  • Effectively communicate the results of machine learning projects to technical and non-technical audiences.

Target Audience

This course is targeted primarily at undergraduate students in Data Sciences and related disciplines, e.g., Electrical Engineering and Computer Science, Information Sciences and Technology, and Statistics. It is also likely to be useful to graduate students interested in applying modern machine learning methods to gain useful insights from data in Physical Sciences, Life Sciences, Health Sciences, Social Sciences, Cognitive and Brain Sciences, Learning Sciences, Material Sciences, Environmental Sciences, Agricultural Sciences, Business, Public Policy, and other disciplines. The instructor welcomes students with a broad range of disciplinary backgrounds and interests.