Proceedings of the 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS'08 - "Personalized Healthcare through Technology"
Observed peptide gas-phase fragmentation patterns are a complex function of many variables. To probe this phenomenon systematically, an array of 40 peptides was synthesized for study. The array of sequences was designed to hold certain variables constant (peptide length) and to randomize or balance others (amino acid composition and position). A high-quality tandem mass spectrometry (MS/MS) data set was acquired for each peptide for all observed charge states on two types of MS instruments, a quadrupole time-of-flight and a quadrupole ion trap. The data were analyzed as a function of total charge state and number of mobile protons. Previously known dissociation trends were observed, validating our approach. In addition, the general influence of basic amino acids on dissociation could be determined because, in contrast to the more widely studied tryptic peptides, the amino acids H, K, and R were positionally distributed. Interestingly, our results suggest that cleavage at all basic amino acids is suppressed when a mobile proton is available. Cleavage at H becomes favored only under conditions where a partially mobile proton is present, a caveat to the previously reported trend of enhanced cleavage at H. Finally, all acquired data were used as a benchmark to determine how well these sequences would have been identified in a database search using a common algorithm, Mascot.
Methods available for the inference of genetic regulatory networks strive to produce a single network, usually by optimizing some quantity to fit the experimental observations. In this paper we investigate the possibility that multiple networks can be inferred, all resulting in similar dynamics. This idea is motivated by theoretical work suggesting that biological networks are robust and adaptable to change, and that the overall behavior of a genetic regulatory network might be captured in terms of dynamical basins of attraction. We have developed and implemented a method for inferring genetic regulatory networks from time series microarray data. Our method first clusters and discretizes the gene expression data using k-means and support vector regression. We then enumerate Boolean activation–inhibition networks that match the discretized data. Finally, the dynamics of the Boolean networks are examined. We have tested our method on two immunology microarray datasets: an IL-2-stimulated T cell response dataset and an LPS-stimulated macrophage response dataset. In both cases, we discovered that many networks matched the data and that most of these networks had similar dynamics.
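The enumeration step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the published implementation: it uses a toy binary time series, restricts each gene to a single signed regulator, and applies a simple threshold update rule.

```python
from itertools import product

# Toy discretized expression time series (gene -> values over time),
# values in {0, 1}. The data and the update rule are illustrative only.
series = {
    "g1": [1, 0, 0, 1],
    "g2": [0, 1, 0, 0],
    "g3": [0, 0, 1, 0],
}
genes = list(series)
T = len(series["g1"])

def update(state, rules, gene):
    """Threshold rule: gene turns on iff its active activators outnumber
    its active inhibitors (+1 = activation, -1 = inhibition)."""
    score = sum(sign for reg, sign in rules[gene] if state[reg])
    return 1 if score > 0 else 0

def consistent(rules):
    """Check that one-step updates reproduce every observed transition."""
    for t in range(T - 1):
        state = {g: series[g][t] for g in genes}
        if any(update(state, rules, g) != series[g][t + 1] for g in genes):
            return False
    return True

# Enumerate, for each gene, every choice of one signed regulator and
# keep the networks whose dynamics match the discretized data.
choices = [(reg, sign) for reg in genes for sign in (1, -1)]
matching = [
    dict(zip(genes, combo))
    for combo in product(choices, repeat=len(genes))
    if consistent({g: [rc] for g, rc in zip(genes, combo)})
]
print(len(matching), "matching network(s)")
```

Even in this tiny example, the search is exhaustive over all regulator assignments, so it directly reveals whether the data pin down a unique network or admit many.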
The goal of this project was to examine the protein-protein docking problem, especially as it relates to homology-based structures; identify the key bottlenecks in current software tools; and evaluate and prototype new algorithms that may relieve these bottlenecks. This report describes the current challenges in the protein-protein docking problem: correctly predicting the binding site for the protein-protein interaction and correctly placing the sidechains. Two different and complementary approaches are taken to help with the protein-protein docking problem. The first approach is to predict interaction sites prior to docking, using bioinformatics studies of protein-protein interactions to predict these interaction sites. The second approach is to improve validation of predicted complexes after docking, using an improved scoring function that incorporates a solvation term to evaluate proposed docked poses. This scoring function demonstrates significant improvement over current state-of-the-art functions. Initial studies of both approaches are promising and argue for full development of these algorithms.
Our aim is to determine the network of events, or the regulatory network, that defines an immune response to a bio-toxin. As a model system, we are studying the T cell regulatory network triggered through tyrosine kinase receptor activation, using a combination of pathway stimulation and time-series microarray experiments. Our approach is composed of five steps: (1) microarray experiments and data error analysis, (2) data clustering, (3) data smoothing and discretization, (4) network reverse engineering, and (5) network dynamics analysis and fingerprint identification. The technological outcome of this study is a suite of experimental protocols and computational tools that reverse engineer regulatory networks from gene expression data. The practical biological outcome of this work is an immune response fingerprint in terms of gene expression levels. Inferring regulatory networks from microarray data is a new field of investigation that is no more than five years old. To the best of our knowledge, this work is the first attempt to integrate experiments, error analyses, data clustering, inference, and network analysis to solve a practical problem. Our systematic approach of counting, enumerating, and sampling networks that match experimental data is new to the field of network reverse engineering. The resulting mathematical analyses and computational tools lead to new results in their own right and should be useful to others who analyze and infer networks.
We have developed a novel approach to modeling the transmembrane-spanning helical bundles of integral membrane proteins using only a sparse set of distance constraints, such as those derived from MS3-D, dipolar-EPR, and FRET experiments. Algorithms have been written for searching the conformational space of membrane protein folds matching the set of distance constraints, which provides initial structures for local conformational searches. The local conformational search is achieved by optimizing these candidates against a custom penalty function that incorporates both measures derived from statistical analysis of solved membrane protein structures and distance constraints obtained from experiments. This results in refined helical bundles to which the interhelical loops and amino acid side chains are added. Using a set of only 27 distance constraints extracted from the literature, our methods successfully recover the structure of dark-adapted rhodopsin to within 3.2 Å of the crystal structure.
We present a two-step approach to modeling the transmembrane-spanning helical bundles of integral membrane proteins using only sparse distance constraints, such as those derived from chemical cross-linking, dipolar EPR, and FRET experiments. In Step 1, using an algorithm we developed, the conformational space of membrane protein folds matching a set of distance constraints is explored to provide initial structures for local conformational searches. In Step 2, these structures are refined against a custom penalty function that incorporates both measures derived from statistical analysis of solved membrane protein structures and distance constraints obtained from experiments. We begin by describing the statistical analysis of the solved membrane protein structures from which the theoretical portion of the penalty function was derived. We then describe the penalty function and, using a set of six test cases, demonstrate that it is capable of distinguishing helical bundles that are close to the native bundle from those that are far from it. Finally, using a set of only 27 distance constraints extracted from the literature, we show that our method successfully recovers the structure of dark-adapted rhodopsin to within 3.2 Å of the crystal structure.
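The general shape of a two-term penalty function of this kind can be sketched as follows. The ideal packing distance, the weight, and the flat-bottomed constraint term are illustrative assumptions, not the function described above.

```python
import math

# Hypothetical two-term penalty (lower is better): a knowledge-based term
# favoring typical interhelical packing distances, plus a term penalizing
# violated experimental distance constraints. All numbers are invented.
IDEAL_PACKING = 9.5       # assumed mean interhelical axis distance (angstroms)
CONSTRAINT_WEIGHT = 10.0  # assumed weight of the experimental term

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def penalty(helix_axes, atoms, constraints):
    """helix_axes: list of (x, y, z) helix-axis midpoints
    atoms:        dict of atom label -> (x, y, z)
    constraints:  list of (label_a, label_b, max_distance) from experiments
    """
    # Knowledge-based term: squared deviation from the ideal packing
    # distance for every helix pair.
    stat = sum(
        (dist(helix_axes[i], helix_axes[j]) - IDEAL_PACKING) ** 2
        for i in range(len(helix_axes))
        for j in range(i + 1, len(helix_axes))
    )
    # Experimental term: flat-bottomed, so satisfied constraints cost zero.
    exp = sum(
        max(0.0, dist(atoms[a], atoms[b]) - cutoff) ** 2
        for a, b, cutoff in constraints
    )
    return stat + CONSTRAINT_WEIGHT * exp
```

A bundle that satisfies every cross-link and packs at near-ideal spacing scores zero; pulling a cross-linked atom pair beyond its cutoff adds a steeply weighted quadratic cost, which is what lets the function separate near-native from far-from-native bundles.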
A deterministic algorithm for enumeration of transmembrane protein folds is presented. Using a set of sparse pairwise atomic distance constraints (such as those obtained from chemical cross-linking, FRET, or dipolar EPR experiments), the algorithm performs an exhaustive search of secondary structure element packing conformations distributed throughout the entire conformational space. The end result is a set of distinct protein conformations, which can be scored and refined as part of a process designed for computational elucidation of transmembrane protein structures.
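A deterministic, exhaustive search of this kind can be illustrated with a toy sketch in which each helix is placed on a small 2-D grid and arrangements violating sparse pairwise bounds are discarded. The grid spacing and the distance bounds are invented; a real search would also discretize helix rotations and tilts.

```python
from itertools import product

# Toy deterministic enumeration: place each of three helices at one of a
# small set of 2-D grid positions and keep every arrangement satisfying
# sparse pairwise distance bounds (illustrative stand-ins for constraints
# from cross-linking, FRET, or dipolar EPR experiments).
POSITIONS = [(x * 10.0, y * 10.0) for x in range(3) for y in range(3)]
CONSTRAINTS = [(0, 1, 8.0, 12.0), (1, 2, 8.0, 12.0)]  # (i, j, dmin, dmax)

def dist(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

def enumerate_folds(n_helices=3):
    folds = []
    for placement in product(POSITIONS, repeat=n_helices):
        if len(set(placement)) < n_helices:   # no two helices coincide
            continue
        if all(lo <= dist(placement[i], placement[j]) <= hi
               for i, j, lo, hi in CONSTRAINTS):
            folds.append(placement)
    return folds

folds = enumerate_folds()
```

Because the search is exhaustive over the discretized space, the output is the complete set of distinct constraint-satisfying conformations, each of which could then be scored and refined downstream.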
In theory, it should be possible to infer realistic genetic networks from time series microarray data. In practice, however, network discovery has proved problematic. The three major challenges are: (1) inferring the network; (2) estimating the stability of the inferred network; and (3) making the network visually accessible to the user. Here we describe a method, tested on publicly available time series microarray data, that addresses all three concerns. The inference of genetic networks from genome-wide experimental data is an important biological problem that has received much attention. Approaches to this problem have typically included the application of clustering algorithms [6]; the use of Boolean networks [12, 1, 10]; the use of Bayesian networks [8, 11]; and the use of continuous models [21, 14, 19]. Overviews of the problem and general approaches to network inference can be found in [4, 3]. Our approach to network inference is similar to earlier methods in that we use both clustering and Boolean network inference. However, we have attempted to extend the process to better serve the end user, the biologist. In particular, we have incorporated a system to assess the reliability of our networks, and we have developed tools that allow interactive visualization of the proposed network.
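The discretization that precedes Boolean inference can be illustrated with a minimal 1-D two-means binarization of a single gene's expression profile. This is a simplified stand-in under stated assumptions; the actual pipeline clusters all genes jointly and may use a different discretization rule.

```python
# Minimal 1-D two-means discretization of one gene's expression profile:
# iterate Lloyd's algorithm with k = 2, then map each value to the index
# of its nearer centroid (0 = low/off, 1 = high/on).
def binarize(values, iters=20):
    lo, hi = min(values), max(values)
    for _ in range(iters):
        # Assign each point to the nearer centroid, then recompute means.
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not low or not high:
            break
        lo, hi = sum(low) / len(low), sum(high) / len(high)
    return [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]

profile = [0.1, 0.2, 1.9, 2.1, 0.15, 2.0]
states = binarize(profile)
```

The resulting 0/1 time series is what the Boolean inference step consumes; a stability assessment can then ask how often the same network is recovered when the discretization thresholds are perturbed.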
Diallo, Mamadou S.; Simpson, Andre; Gassman, Paul; Faulon, Jean-Loup M.; Johnson, James H.; Goddard, William A.; Hatcher, Patrick G.
This paper describes an integrated experimental and computational framework for developing 3-D structural models for humic acids (HAs). This approach combines experimental characterization, computer assisted structure elucidation (CASE), and atomistic simulations to generate all 3-D structural models or a representative sample of these models consistent with the analytical data and bulk thermodynamic/structural properties of HAs. To illustrate this methodology, structural data derived from elemental analysis, diffuse reflectance FT-IR spectroscopy, 1-D/2-D 1H and 13C solution NMR spectroscopy, and electrospray ionization quadrupole time-of-flight mass spectrometry (ESI QqTOF MS) are employed as input to the CASE program SIGNATURE to generate all 3-D structural models for Chelsea soil humic acid (HA). These models are subsequently used as starting 3-D structures to carry out constant temperature-constant pressure molecular dynamics simulations to estimate their bulk densities and Hildebrand solubility parameters. Surprisingly, only a few model isomers are found to exhibit molecular compositions and bulk thermodynamic properties consistent with the experimental data. The simulated 13C NMR spectrum of an equimolar mixture of these model isomers compares favorably with the measured spectrum of Chelsea soil HA.
The U.S. Department of Energy recently announced the first five grants for the Genomes to Life (GTL) Program. The goal of this program is to "achieve the most far-reaching of all biological goals: a fundamental, comprehensive, and systematic understanding of life." While more information about the program can be found at the GTL website (www.doegenomestolife.org), this paper provides an overview of one of the five funded GTL projects, "Carbon Sequestration in Synechococcus Sp.: From Molecular Machines to Hierarchical Modeling." This project is a combined experimental and computational effort that emphasizes developing, prototyping, and applying new computational tools and methods to elucidate the biochemical mechanisms of carbon sequestration in Synechococcus Sp., an abundant marine cyanobacterium known to play an important role in the global carbon cycle. Understanding, predicting, and perhaps manipulating carbon fixation in the oceans has long been a major focus of biological oceanography and has more recently been of interest to a broader audience of scientists and policy makers. It is clear that the oceanic sinks and sources of CO2 are important terms in the global environmental response to anthropogenic atmospheric inputs of CO2 and that oceanic microorganisms play a key role in this response. However, the relationship between this global phenomenon and the biochemical mechanisms of carbon fixation in these microorganisms is poorly understood. The project includes five subprojects: an experimental investigation, three computational biology efforts, and a fifth that addresses computational infrastructure challenges relevant to this project and to the Genomes to Life program as a whole.
Our experimental effort is designed to provide the biology and data that drive the computational efforts, and includes significant investment in developing new experimental methods for uncovering protein partners, characterizing protein complexes, and identifying new binding domains. We will also develop and apply new data measurement and statistical methods for analyzing microarray experiments. Our computational efforts include coupling molecular simulation methods with knowledge discovery from diverse biological data sets for high-throughput discovery and characterization of protein-protein complexes, and developing a set of novel capabilities for inferring regulatory pathways in microbial genomes from multiple sources of information through the integration of computational and experimental technologies. These capabilities will be applied to Synechococcus regulatory pathways to characterize their interaction map and to identify the component proteins in these pathways. We will also investigate methods for combining experimental and computational results with visualization and natural language tools to accelerate the discovery of regulatory pathways. Furthermore, given that the ultimate goal of this effort is to develop a systems-level understanding of how the Synechococcus genome affects carbon fixation at the global scale, we will develop and apply a set of tools for capturing the carbon fixation behavior of Synechococcus at different levels of resolution. Finally, because the explosion of data produced by high-throughput experiments demands analyses and models that are more computationally complex, more heterogeneous, and coupled to ever-increasing amounts of experimental data in varying formats, we have also established a companion computational infrastructure to support this effort, as well as the Genomes to Life program as a whole.