Pennsylvania State University

Vasant Honavar: Research Interests


Honavar's research and teaching interests cut across Computer Science, Information Science, Cognitive Science, and Bioinformatics.

His research is driven by fundamental scientific questions or important practical problems such as the following:

  • How can we build useful predictive models from large, distributed, autonomous data sources (Big Data Analytics)?
  • How can we elicit causal relations from multiple disparate observational and experimental studies?
  • How can we build predictive models from semantically disparate data?
  • How can we integrate data, hypothesis, and knowledge-based inference, predictive modeling, experimentation, simulation, and hypothesis testing into an exploratory apparatus for scientific discovery (Discovery Informatics)?
  • How can we extract useful knowledge from richly structured data (sequences, images, graphs, etc.)?
  • How can we efficiently represent and reason about preferences?
  • How can we query and reason with federated data and knowledge bases?
  • How can we answer queries from knowledge bases that contain secrets without revealing secrets?
  • How can we discover the relationships between macromolecular sequence, structure, expression, interaction and macromolecular function?
  • How can we construct, compare, and analyze multi-scale, predictive models of molecular networks involved in cellular development, differentiation, immune response, and biological function?
  • How can we discover the relationships between brain structure, activity, function, and behavior?
  • How can we support the design, assembly and execution of complex web services using autonomously developed components?
  • What are the information requirements and algorithmic basis of learning in specific scenarios?
  • What are the information requirements and algorithmic basis of inter-agent communication, multi-agent interaction, coordination, and organization?
  • How is information encoded, stored, retrieved, decoded, and used in macromolecular, neural, and cognitive systems?
  • How can we build robust intelligent agents that incorporate multiple facets of intelligence?

Current Research Interests

  • Artificial Intelligence: Logical, probabilistic, causal, and decision-theoretic knowledge representation and inference; neural architectures for knowledge representation and inference; computational models of perception and action; intelligent agents and multi-agent systems.
  • Machine Learning, Data Mining, and Big Data Analytics: Statistical, information theoretic, linguistic and structural approaches to machine learning; Learning and refinement of Bayesian networks, causal networks, decision networks, neural networks, support vector machines, kernel classifiers, multi-relational models, language models (n-grams, grammars, automata); Learning classifiers from attribute value taxonomies and partially specified data; Learning attribute value taxonomies from data; Learning classifiers from sequential and spatial data; Learning relationships from multi-modal data (e.g., text, images); Learning classifiers from distributed data, multiple instance data, multiple instance, multiple class data, networked data, multi-relational data, linked open data (RDF), and semantically heterogeneous data; Incremental learning, ensemble methods, multi-agent learning, curriculum-based learning; selected topics in computational learning theory.
  • Bioinformatics, Computational Molecular Biology, and Computational Systems Biology: Data-driven discovery of macromolecular sequence-structure-function-interaction-expression relationships, identification of sequence and structural correlates of protein-protein, protein-RNA, and protein-DNA interactions, protein sub-cellular localization, automated protein structure and function annotation, modeling and inference of genetic regulatory networks from gene expression (micro-array, proteomics) data, modeling and inference of signal transduction and metabolic pathways, comparative analysis of biological networks (network alignment), integrative analysis of molecular interaction networks and macro-molecular interfaces.
  • Discovery Informatics: Computational models of scientific discovery; Discovery informatics infrastructure to integrate data, hypothesis, and knowledge-based inference, predictive modeling, experimentation, simulation, and hypothesis testing to provide an orderly formal framework and exploratory apparatus for science; Applications in computational systems biology and health informatics.
  • Knowledge Representation and Semantic Web: Probabilistic, grammatical, network-based, relational, logical, epistemic knowledge representation; knowledge-based, network-based, and probabilistic approaches to information integration; description logics, federated databases – statistical queries against federated databases, knowledge bases – federated reasoning, selective knowledge sharing, services – service composition, substitution, and adaptation; epistemic description logics; secrecy-preserving query answering, representing and reasoning about qualitative preferences, representing and reasoning about causality.
  • Applied Informatics: Applications of artificial intelligence, machine learning, and big data analytics to problems in bioinformatics, health informatics, medical informatics, neuroinformatics, geo-informatics, environmental informatics, chemo-informatics, security informatics, social informatics, and e-science.
  • Other Topics of Interest: Biological Computation – Evolutionary, Cellular and Neural Computation, Complex Adaptive Systems, Sensory systems and behavior evolution, Language evolution, Mimetic evolution; Computational Semiotics – Origins and use of signs, emergence of semantics; Computational organization theory; Computational Neuroscience; Computational models of creativity, Computational models of discovery.

Honavar's recent research contributions have spanned Computer Science (especially Machine Learning and Data Mining, Knowledge Representation and Inference, and Causal Inference) and Bioinformatics and Computational Biology (especially the analysis and prediction of biomolecular (protein-protein, protein-DNA, and protein-RNA) interfaces and the comparative analysis of biomolecular interaction networks). Some of his most recent work has focused on (i) Scalable algorithms for building predictive models from large, distributed, semantically disparate data (big data), including, more recently, linked open data; (ii) Algorithms for constructing predictive models from sequence, image, text, multi-relational, and graph-structured data; (iii) New approaches to selective sharing of knowledge across autonomous knowledge bases (including knowledge base federation and secrecy-preserving query answering); (iv) Theoretically sound yet practically useful approaches to functional and non-functional specification-driven composition of complex services from components; (v) Expressive languages for representing, and model checking approaches to reasoning with, qualitative preferences; (vi) Algorithms for eliciting causal effects from disparate sources of observational and experimental data; (vii) Scalable algorithms and software for comparative analyses of large bio-molecular networks; and (viii) Machine learning approaches to analysis and prediction of macromolecular interactions and interfaces (including, in particular, the first algorithm for partner-specific prediction of protein-protein interface sites and state-of-the-art sequence-based protein-RNA interface predictors) that have resulted in several widely used web servers for analysis and prediction of protein-protein, protein-DNA, and protein-RNA interactions and interfaces, and of B-cell and T-cell epitopes.

Research in progress focuses on topics in (1) Discovery Informatics, especially novel approaches to integrating data, hypothesis, and knowledge-based inference, predictive modeling, experimentation, simulation, and hypothesis testing in large-scale scientific discovery; (2) Scalable machine learning approaches to predictive modeling from very large, richly structured data sets (including sequences, graphs, relational and RDF data); (3) Elucidation of causal relationships from disparate experimental and observational studies; (4) Analysis and prediction of macromolecular interactions and elucidation of complex biological pathways, e.g., those involved in immune response, development, and disease; and (5) Applications in brain and behavioral informatics, health informatics, social informatics, educational informatics, and security informatics.