Pennsylvania State University

Data Science for Researchers and Scholars

Course Description

Course Staff

The Fall 2023 offering of Data Science for Researchers and Scholars is taught by Professor Vasant Honavar.

Course Schedule

Lectures: Tue, Thu 10:35am - 11:50am, 206E Westgate Building

Office Hours:

  Instructor: Dr. Vasant Honavar: Tue, Thu 4:00pm - 5:00pm, E335 Westgate Building
  or Zoom (link provided on Canvas).

  Teaching Assistant: Zhimeng Guo: Mon, Wed 4:00pm - 5:00pm,
  or Zoom (link provided on Canvas).

Course Description

Rationale: Progress in many fields, including sciences and humanities, is increasingly enabled by our ability to acquire, share, integrate and analyze disparate types of data. Advances in machine learning, coupled with large data sets, are leading to breakthroughs in many sciences. Consequently, there is a need for researchers, scholars, and practitioners, regardless of their disciplinary background and interests, to become proficient in applying modern data science methods and tools to gain useful insights from data.

Course Objectives: This course aims to introduce students from a broad range of disciplinary backgrounds to modern data science methods and tools for formulating and answering research questions in their respective fields using large and complex data sets. Topics to be covered include descriptive, predictive, and causal analyses to answer research questions from data. Laboratory assignments will provide students with hands-on experience in applying and evaluating common data science methods. A term project will give students the opportunity to apply the methods learned in the course to identify and use relevant data to answer research questions within their respective areas of study.

Upon completion of this course, students will be able to:
  • Demonstrate a broad understanding of the principles and practice of data science
  • Assess the feasibility of answering chosen research questions using available data and methods
  • Demonstrate understanding of the strengths and weaknesses of different data science methods within specific application settings
  • Validate analyses
  • Ensure reproducibility of analyses
  • Responsibly handle sensitive data
  • Assess data and algorithmic bias
  • Critically evaluate research and scholarly studies that rely on data science methods and tools
  • Effectively communicate the results of analyses to technical and non-technical audiences

Intended Audience: The course is designed to be accessible to graduate students from a wide range of disciplinary backgrounds and interests, including informatics, life sciences, behavioral, cognitive and brain sciences, social sciences and public policy, biomedical and health sciences, agricultural sciences, learning sciences, and even the humanities. The course is not intended for students with strong prior exposure to computational and data sciences (including computer science and statistics) or engineering disciplines. While no prerequisites other than graduate standing at Penn State will be enforced, to fully benefit from the course, students should be familiar with, or be willing to learn, the relevant concepts in probability, statistics, multivariate differential calculus, and the basics of programming in a modern high-level programming language, e.g., Python. The instructor will review the relevant topics and provide self-study resources as needed.

The laboratory assignments will require some basic proficiency in reading and writing programs in Python. Students are expected to acquire familiarity with the scikit-learn, NumPy, SciPy, and pandas packages.
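As a rough gauge of the level of Python involved, the following is a minimal sketch that touches each of the four packages named above. It is an illustration only and is not taken from the course's laboratory assignments. The data are synthetic, and the variable names (`hours`, `score`) are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.linear_model import LinearRegression

# Synthetic data (hypothetical, for illustration only):
# hours studied versus exam score, with random noise added.
rng = np.random.default_rng(seed=0)
hours = rng.uniform(0, 10, size=50)
score = 50 + 4 * hours + rng.normal(0, 5, size=50)
df = pd.DataFrame({"hours": hours, "score": score})

# Descriptive analysis: summary statistics with pandas.
summary = df.describe()

# Association between the two variables: Pearson correlation with SciPy.
r, p_value = stats.pearsonr(df["hours"], df["score"])

# Predictive analysis: fit a simple linear model with scikit-learn.
model = LinearRegression().fit(df[["hours"]], df["score"])
slope = model.coef_[0]

print(summary.loc[["mean", "std"]])
print(f"Pearson r = {r:.2f} (p = {p_value:.3g})")
print(f"Estimated slope = {slope:.2f} points per hour")
```

Students who can read this snippet and roughly predict what each step does likely have enough programming background to begin; those who cannot should plan to use the self-study resources early in the semester.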

If you are not sure whether the course is appropriate for you, please talk to the instructor.