Publications

24 Results

Disparate data fusion for protein phosphorylation prediction

Annals of Operations Research

Gray, Genetha A.; Williams, Pamela J.; Brown, W.M.; Faulon, Jean-Loup M.; Sale, Kenneth L.

New challenges in knowledge extraction include interpreting and classifying data sets while simultaneously considering related information to confirm results or identify false positives. We discuss a data fusion algorithmic framework targeted at this problem. It includes separate base classifiers for each data type and a fusion method for combining the individual classifiers. The fusion method is an extension of current ensemble classification techniques and has the advantage of allowing data to remain in heterogeneous databases. In this paper, we focus on the applicability of such a framework to the protein phosphorylation prediction problem. © Springer Science+Business Media, LLC 2008.
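
As an illustration only, the idea of keeping one base classifier per data type and combining their outputs can be sketched as follows. The abstract does not state the fusion rule, so the weighted average used here is an assumption, and the classifier names, weights, and toy scoring functions are all hypothetical.

```python
from typing import Callable, Mapping

# A base classifier maps one record of its own data type to P(positive).
BaseClassifier = Callable[[object], float]


def fuse(classifiers: Mapping[str, BaseClassifier],
         weights: Mapping[str, float],
         record: Mapping[str, object]) -> float:
    """Weighted average of the base classifiers' probabilities.

    Only data types present in `record` contribute, so each source can
    live in its own database and may be missing for some samples.
    """
    total, weight_sum = 0.0, 0.0
    for name, clf in classifiers.items():
        if name not in record:
            continue
        total += weights[name] * clf(record[name])
        weight_sum += weights[name]
    if weight_sum == 0.0:
        raise ValueError("no data type available for this record")
    return total / weight_sum


# Hypothetical toy base classifiers, one per data type.
def sequence_clf(seq):
    return 0.9 if "S" in seq else 0.2


def structure_clf(accessibility):
    return min(1.0, max(0.0, accessibility))


classifiers = {"sequence": sequence_clf, "structure": structure_clf}
weights = {"sequence": 2.0, "structure": 1.0}
score = fuse(classifiers, weights, {"sequence": "RRASV", "structure": 0.6})
sequence_only = fuse(classifiers, weights, {"sequence": "RRASV"})
```

Because `fuse` skips data types that a record lacks, a sample with sequence data but no structural data is still scored from the sequence classifier alone.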

Genome scale enzyme-metabolite and drug-target interaction predictions using the signature molecular descriptor

Bioinformatics

Faulon, Jean-Loup M.; Misra, Milind; Martin, Shawn; Sale, Kenneth L.; Sapra, Rajat

Motivation: Identifying protein enzymatic or pharmacological activities is an important area of research in biology and chemistry. Biological and chemical databases are increasingly being populated with linkages between protein sequences and chemical structures. There is now sufficient information to apply machine-learning techniques to predict interactions between chemicals and proteins at a genome scale. Current machine-learning techniques use as input either protein sequences and structures or chemical information. We propose here a method to infer protein-chemical interactions using heterogeneous input consisting of both protein sequence and chemical information. Results: Our method relies on expressing proteins and chemicals with a common cheminformatics representation. We demonstrate our approach by predicting whether proteins can catalyze reactions not present in training sets. We also predict whether a given drug can bind a target, in the absence of prior binding information for that drug and target. Such predictions cannot be made with current machine-learning techniques requiring binding information for individual reactions or individual targets. © 2007 The Author(s).

Understanding virulence mechanisms in M. tuberculosis infection via a circuit-based simulation framework

Proceedings of the 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS'08 - "Personalized Healthcare through Technology"

May, Elebeoba E.; Leitao, Andrei; Faulon, Jean-Loup M.; Joo, Jaewook J.; Misra, Milind; Oprea, Tudor I.

Tuberculosis (TB), caused by the bacterium Mycobacterium tuberculosis (Mtb), is a growing international health crisis. Mtb is able to persist in host tissues in a nonreplicating persistent (NRP) or latent state. This presents a challenge in the treatment of TB. Latent TB can reactivate in 10% of individuals with normal immune systems, and at a higher rate in those with compromised immune systems. A quantitative understanding of latency-associated virulence mechanisms may help researchers develop more effective methods to battle the spread and reduce TB-associated fatalities. Leveraging BioXyce's ability to simulate whole-cell and multi-cellular systems, we are developing a circuit-based framework to investigate the impact of pathogenicity-associated pathways on the latency/reactivation phase of tuberculosis infection. We discuss efforts to simulate metabolic pathways that potentially impact the ability of Mtb to persist within host immune cells. We demonstrate how simulation studies can provide insight regarding the efficacy of potential anti-TB agents on biological networks critical to Mtb pathogenicity using a systems chemical biology approach. © 2008 IEEE.

Use of a Designed Peptide Array To Infer Dissociation Trends for Nontryptic Peptides in Quadrupole Ion Trap and Quadrupole Time-of-Flight Mass Spectrometry

Analytical Chemistry

Gaucher, Sara P.; Faulon, Jean-Loup M.

Boolean dynamics of genetic regulatory networks inferred from microarray time series data

Bioinformatics

Zhang, Zhaoduo Z.; Martino, Anthony M.; Faulon, Jean-Loup M.

Methods available for the inference of genetic regulatory networks strive to produce a single network, usually by optimizing some quantity to fit the experimental observations. In this paper we investigate the possibility that multiple networks can be inferred, all resulting in similar dynamics. This idea is motivated by theoretical work which suggests that biological networks are robust and adaptable to change, and that the overall behavior of a genetic regulatory network might be captured in terms of dynamical basins of attraction. We have developed and implemented a method for inferring genetic regulatory networks from time series microarray data. Our method first clusters and discretizes the gene expression data using k-means and support vector regression. We then enumerate Boolean activation–inhibition networks to match the discretized data. Finally, the dynamics of the Boolean networks are examined. We have tested our method on two immunology microarray datasets: an IL-2-stimulated T cell response dataset and an LPS-stimulated macrophage response dataset. In both cases, we discovered that many networks matched the data, and that most of these networks had similar dynamics.
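
The enumeration step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the synchronous threshold update rule, the brute-force search over all edge signs, and the two-gene toy series are assumptions made for illustration, and the clustering and discretization steps (k-means, support vector regression) are skipped by starting from already-discretized states.

```python
from itertools import product


def step(state, net):
    """Synchronous update; net[i][j] in {-1, 0, +1} is the effect of gene j on gene i."""
    nxt = []
    for i, row in enumerate(net):
        drive = sum(w * s for w, s in zip(row, state))
        # Positive drive switches the gene on, negative off, zero keeps its state.
        nxt.append(1 if drive > 0 else 0 if drive < 0 else state[i])
    return tuple(nxt)


def matching_networks(series):
    """Yield every activation-inhibition network that reproduces the series."""
    n = len(series[0])
    for flat in product((-1, 0, 1), repeat=n * n):
        net = [flat[i * n:(i + 1) * n] for i in range(n)]
        if all(step(a, net) == b for a, b in zip(series, series[1:])):
            yield net


def attractor(state, net):
    """Iterate until a state repeats and return the cycle that was reached."""
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state, net)
    return tuple(seen[seen.index(state):])


# Hypothetical discretized time series for two genes.
series = [(1, 0), (1, 1), (0, 1)]
nets = list(matching_networks(series))
attractors = {attractor(series[0], net) for net in nets}
```

In this toy case three networks reproduce the series and they end in two distinct fixed points, a small-scale version of comparing many matching networks by their dynamics. Exhaustive enumeration grows as 3^(n*n), so it is only practical here because n is tiny.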

Prediction of β-strand packing interactions using the signature product

Journal of Molecular Modeling

Brown, W.M.; Martin, Shawn; Chabarek, Joseph P.; Strauss, Charlie; Faulon, Jean-Loup M.

The prediction of β-sheet topology requires the consideration of long-range interactions between β-strands that are not necessarily consecutive in sequence. Since these interactions are difficult to simulate using ab initio methods, we propose a supplementary method able to assign β-sheet topology using only sequence information. We envision using the results of our method to reduce the three-dimensional search space of ab initio methods. Our method is based on the signature molecular descriptor, which has been used previously to predict protein-protein interactions successfully, and to develop quantitative structure-activity relationships for small organic drugs and peptide inhibitors. Here, we show how the signature descriptor can be used in a Support Vector Machine to predict whether or not two β-strands will pack adjacently within a protein. We then show how these predictions can be used to order β-strands within β-sheets. Using the entire PDB database with ten-fold cross-validation, we have achieved 74.0% accuracy in packing prediction and 75.6% accuracy in the prediction of edge strands. For the case of β-strand ordering, we are able to predict the correct ordering accurately for 51.3% of the β-sheets. Furthermore, using a simple confidence metric, we can determine those sheets for which accurate predictions can be obtained. For the top 25% highest confidence predictions, we are able to achieve 95.7% accuracy in β-strand ordering. © Springer-Verlag 2005.

Developing algorithms for predicting protein-protein interactions of homology modeled proteins

Roe, Diana C.; Sale, Kenneth L.; Faulon, Jean-Loup M.

The goal of this project was to examine the protein-protein docking problem, especially as it relates to homology-based structures, identify the key bottlenecks in current software tools, and evaluate and prototype new algorithms that may be developed to address these bottlenecks. This report describes the current challenges in the protein-protein docking problem: correctly predicting the binding site for the protein-protein interaction and correctly placing the sidechains. Two different and complementary approaches are taken that can help with the protein-protein docking problem. The first approach is to predict interaction sites prior to docking, and uses bioinformatics studies of protein-protein interactions to predict these interaction sites. The second approach is to improve validation of predicted complexes after docking, and uses an improved scoring function for evaluating proposed docked poses, incorporating a solvation term. This scoring function demonstrates significant improvement over current state-of-the-art functions. Initial studies on both these approaches are promising, and argue for full development of these algorithms.

Reverse engineering biological networks: applications in immune responses to bio-toxins

Faulon, Jean-Loup M.; Zhang, Zhaoduo Z.; Martino, Anthony M.; Timlin, Jerilyn A.; Haaland, David M.; Davidson, George S.; May, Elebeoba E.; Slepoy, Alexander S.

Our aim is to determine the network of events, or the regulatory network, that defines an immune response to a bio-toxin. As a model system, we are studying the T cell regulatory network triggered through tyrosine kinase receptor activation using a combination of pathway stimulation and time-series microarray experiments. Our approach is composed of five steps: (1) microarray experiments and data error analysis, (2) data clustering, (3) data smoothing and discretization, (4) network reverse engineering, and (5) network dynamics analysis and fingerprint identification. The technological outcome of this study is a suite of experimental protocols and computational tools that reverse engineer regulatory networks given gene expression data. The practical biological outcome of this work is an immune response fingerprint in terms of gene expression levels. Inferring regulatory networks from microarray data is a new field of investigation that is no more than five years old. To the best of our knowledge, this work is the first attempt that integrates experiments, error analyses, data clustering, inference, and network analysis to solve a practical problem. Our systematic approach of counting, enumeration, and sampling networks matching experimental data is new to the field of network reverse engineering. The resulting mathematical analyses and computational tools lead to new results on their own and should be useful to others who analyze and infer networks.

Model-building codes for membrane proteins

Brown, William M.; Faulon, Jean-Loup M.; Gray, Genetha A.; Hunt, Thomas W.; Schoeniger, Joseph S.; Slepoy, Alexander S.; Young, Malin M.

We have developed a novel approach to modeling the transmembrane spanning helical bundles of integral membrane proteins using only a sparse set of distance constraints, such as those derived from MS3-D, dipolar-EPR and FRET experiments. Algorithms have been written for searching the conformational space of membrane protein folds matching the set of distance constraints, which provides initial structures for local conformational searches. Local conformation search is achieved by optimizing these candidates against a custom penalty function that incorporates both measures derived from statistical analysis of solved membrane protein structures and distance constraints obtained from experiments. This results in refined helical bundles to which the interhelical loops and amino acid side-chains are added. Using a set of only 27 distance constraints extracted from the literature, our methods successfully recover the structure of dark-adapted rhodopsin to within 3.2 Å of the crystal structure.

Reverse engineering chemical structures from molecular descriptors: How many solutions?

Journal of Computer-Aided Molecular Design

Faulon, Jean-Loup M.; Brown, W.M.; Martin, Shawn

Physical, chemical and biological properties are the ultimate information of interest for chemical compounds. Molecular descriptors that map structural information to activities and properties are obvious candidates for information sharing. In this paper, we consider the feasibility of using molecular descriptors to safely exchange chemical information in such a way that the original chemical structures cannot be reverse engineered. To investigate the safety of sharing such descriptors, we compute the degeneracy (the number of structures matching a descriptor value) of several 2D descriptors, and use various methods to search for and reverse engineer structures. We examine degeneracy in the entire chemical space, taking descriptor values from the alkane isomer series and the PubChem database. We further use a stochastic search to retrieve structures matching specific topological index values. Finally, we investigate the safety of exchanging fragmental descriptors using deterministic enumeration. © Springer 2005.
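
To make the notion of degeneracy concrete, the sketch below counts how many structures share a descriptor value. The listing does not say which 2D descriptors the paper evaluates; the Wiener index (the sum of shortest-path distances over all atom pairs) is used here only as a familiar topological index, and the three heptane isomers are a hand-picked toy set.

```python
from collections import Counter, deque


def wiener_index(adjacency):
    """Sum of shortest-path lengths over all atom pairs of a hydrogen-suppressed graph."""
    total = 0
    for source in adjacency:
        # Breadth-first search gives shortest path lengths from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # each pair was counted from both ends


def graph(bonds):
    """Build an undirected adjacency map from a list of carbon-carbon bonds."""
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


chain5 = [(1, 2), (2, 3), (3, 4), (4, 5)]
structures = {
    "n-heptane": graph(chain5 + [(5, 6), (6, 7)]),
    "2,4-dimethylpentane": graph(chain5 + [(2, 6), (4, 7)]),
    "3-ethylpentane": graph(chain5 + [(3, 6), (6, 7)]),
}
values = {name: wiener_index(g) for name, g in structures.items()}
degeneracy = Counter(values.values())  # descriptor value -> number of structures
```

Here 2,4-dimethylpentane and 3-ethylpentane share a Wiener index of 48, so that value has degeneracy 2 within this set, while n-heptane's value of 56 identifies it uniquely.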

Optimal bundling of transmembrane helices using sparse distance constraints

Protein Science

Sale, Ken; Faulon, Jean-Loup M.; Gray, Genetha A.; Schoeniger, Joseph S.; Young, Malin M.

We present a two-step approach to modeling the transmembrane spanning helical bundles of integral membrane proteins using only sparse distance constraints, such as those derived from chemical cross-linking, dipolar EPR and FRET experiments. In Step 1, using an algorithm we developed, the conformational space of membrane protein folds matching a set of distance constraints is explored to provide initial structures for local conformational searches. In Step 2, these structures are refined against a custom penalty function that incorporates both measures derived from statistical analysis of solved membrane protein structures and distance constraints obtained from experiments. We begin by describing the statistical analysis of the solved membrane protein structures from which the theoretical portion of the penalty function was derived. We then describe the penalty function, and, using a set of six test cases, demonstrate that it is capable of distinguishing helical bundles that are close to the native bundle from those that are far from the native bundle. Finally, using a set of only 27 distance constraints extracted from the literature, we show that our method successfully recovers the structure of dark-adapted rhodopsin to within 3.2 Å of the crystal structure.
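
The experimental half of such a penalty function can be sketched as a sum of squared violations of upper-bound distance constraints. This is an assumed functional form, not the paper's: the statistical terms derived from solved membrane protein structures are omitted, and the site labels, coordinates, and bounds below are hypothetical.

```python
import math


def constraint_penalty(coords, constraints):
    """Sum of squared violations of upper-bound distance constraints.

    coords: dict mapping a site label to an (x, y, z) tuple in angstroms.
    constraints: iterable of (site_a, site_b, max_distance) tuples.
    """
    penalty = 0.0
    for a, b, max_dist in constraints:
        d = math.dist(coords[a], coords[b])
        # Only distances exceeding the bound are penalized.
        penalty += max(0.0, d - max_dist) ** 2
    return penalty


# Hypothetical sites on three helices and two cross-link style constraints.
coords = {
    "H1:K10": (0.0, 0.0, 0.0),
    "H2:K45": (3.0, 4.0, 0.0),
    "H3:C80": (0.0, 0.0, 12.0),
}
constraints = [("H1:K10", "H2:K45", 6.0), ("H1:K10", "H3:C80", 10.0)]
penalty = constraint_penalty(coords, constraints)
```

The first constraint is satisfied (5 Å against a 6 Å bound) and contributes nothing; the second is violated by 2 Å and contributes 4.0. A refinement step would move the helices to lower this value together with the statistical terms.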

A deterministic algorithm for constrained enumeration of transmembrane protein folds

Faulon, Jean-Loup M.; Sale, Kenneth L.; Schoeniger, Joseph S.; Young, Malin M.

A deterministic algorithm for enumeration of transmembrane protein folds is presented. Using a set of sparse pairwise atomic distance constraints (such as those obtained from chemical cross-linking, FRET, or dipolar EPR experiments), the algorithm performs an exhaustive search of secondary structure element packing conformations distributed throughout the entire conformational space. The end result is a set of distinct protein conformations, which can be scored and refined as part of a process designed for computational elucidation of transmembrane protein structures.

Inferring genetic networks from microarray data

Davidson, George S.; May, Elebeoba E.; Faulon, Jean-Loup M.

In theory, it should be possible to infer realistic genetic networks from time series microarray data. In practice, however, network discovery has proved problematic. The three major challenges are: (1) inferring the network; (2) estimating the stability of the inferred network; and (3) making the network visually accessible to the user. Here we describe a method, tested on publicly available time series microarray data, which addresses these concerns. The inference of genetic networks from genome-wide experimental data is an important biological problem which has received much attention. Approaches to this problem have typically included application of clustering algorithms [6]; the use of Boolean networks [12, 1, 10]; the use of Bayesian networks [8, 11]; and the use of continuous models [21, 14, 19]. Overviews of the problem and general approaches to network inference can be found in [4, 3]. Our approach to network inference is similar to earlier methods in that we use both clustering and Boolean network inference. However, we have attempted to extend the process to better serve the end-user, the biologist. In particular, we have incorporated a system to assess the reliability of our network, and we have developed tools which allow interactive visualization of the proposed network.

3-D structural modeling of humic acids through experimental characterization, computer assisted structure elucidation and atomistic simulations. 1. Chelsea soil humic acid

Environmental Science and Technology

Diallo, Mamadou S.; Simpson, Andre; Gassman, Paul; Faulon, Jean-Loup M.; Johnson, James H.; Goddard, William A.; Hatcher, Patrick G.

This paper describes an integrated experimental and computational framework for developing 3-D structural models for humic acids (HAs). This approach combines experimental characterization, computer assisted structure elucidation (CASE), and atomistic simulations to generate all 3-D structural models or a representative sample of these models consistent with the analytical data and bulk thermodynamic/structural properties of HAs. To illustrate this methodology, structural data derived from elemental analysis, diffuse reflectance FT-IR spectroscopy, 1-D/2-D 1H and 13C solution NMR spectroscopy, and electrospray ionization quadrupole time-of-flight mass spectrometry (ESI QqTOF MS) are employed as input to the CASE program SIGNATURE to generate all 3-D structural models for Chelsea soil humic acid (HA). These models are subsequently used as starting 3-D structures to carry out constant temperature-constant pressure molecular dynamics simulations to estimate their bulk densities and Hildebrand solubility parameters. Surprisingly, only a few model isomers are found to exhibit molecular compositions and bulk thermodynamic properties consistent with the experimental data. The simulated 13C NMR spectrum of an equimolar mixture of these model isomers compares favorably with the measured spectrum of Chelsea soil HA.

Carbon sequestration in Synechococcus Sp.: from molecular machines to hierarchical modeling

Proposed for publication in OMICS: A Journal of Integrative Biology, Vol. 6, No. 4, 2002.

Heffelfinger, Grant S.; Faulon, Jean-Loup M.; Frink, Laura J.; Haaland, David M.; Hart, William E.; Lane, Todd L.; Plimpton, Steven J.; Roe, Diana C.; Timlin, Jerilyn A.; Martino, Anthony M.; Rintoul, Mark D.; Davidson, George S.

The U.S. Department of Energy recently announced the first five grants for the Genomes to Life (GTL) Program. The goal of this program is to "achieve the most far-reaching of all biological goals: a fundamental, comprehensive, and systematic understanding of life." While more information about the program can be found at the GTL website (www.doegenomestolife.org), this paper provides an overview of one of the five GTL projects funded, "Carbon Sequestration in Synechococcus Sp.: From Molecular Machines to Hierarchical Modeling." This project is a combined experimental and computational effort emphasizing developing, prototyping, and applying new computational tools and methods to elucidate the biochemical mechanisms of the carbon sequestration of Synechococcus Sp., an abundant marine cyanobacterium known to play an important role in the global carbon cycle. Understanding, predicting, and perhaps manipulating carbon fixation in the oceans has long been a major focus of biological oceanography and has more recently been of interest to a broader audience of scientists and policy makers. It is clear that the oceanic sinks and sources of CO2 are important terms in the global environmental response to anthropogenic atmospheric inputs of CO2 and that oceanic microorganisms play a key role in this response. However, the relationship between this global phenomenon and the biochemical mechanisms of carbon fixation in these microorganisms is poorly understood. The project includes five subprojects: an experimental investigation, three computational biology efforts, and a fifth which deals with addressing computational infrastructure challenges of relevance to this project and the Genomes to Life program as a whole.
Our experimental effort is designed to provide biology and data to drive the computational efforts and includes significant investment in developing new experimental methods for uncovering protein partners, characterizing protein complexes, and identifying new binding domains. We will also develop and apply new data measurement and statistical methods for analyzing microarray experiments. Our computational efforts include coupling molecular simulation methods with knowledge discovery from diverse biological data sets for high-throughput discovery and characterization of protein-protein complexes, and developing a set of novel capabilities for inference of regulatory pathways in microbial genomes across multiple sources of information through the integration of computational and experimental technologies. These capabilities will be applied to Synechococcus regulatory pathways to characterize their interaction map and identify component proteins in these pathways. We will also investigate methods for combining experimental and computational results with visualization and natural language tools to accelerate discovery of regulatory pathways. Furthermore, given that the ultimate goal of this effort is to develop a systems-level understanding of how the Synechococcus genome affects carbon fixation at the global scale, we will develop and apply a set of tools for capturing the carbon fixation behavior of complex of Synechococcus at different levels of resolution. Finally, because the explosion of data being produced by high-throughput experiments requires data analysis and models which are more computationally complex, more heterogeneous, and require coupling to ever increasing amounts of experimentally obtained data in varying formats, we have also established a companion computational infrastructure to support this effort as well as the Genomes to Life program as a whole.
