Publications

Results 26–30 of 30

An error-correcting code framework for genetic sequence analysis

Journal of the Franklin Institute

May, Elebeoba E.; Vouk, Mladen A.; Bitzer, Donald L.; Rosnick, David I.

A fundamental challenge for engineering communication systems is the problem of transmitting information from the source to the receiver over a noisy channel. This same problem exists in a biological system. How can information required for the proper functioning of a cell, an organism, or a species be transmitted in an error-introducing environment? Source codes (compression codes) and channel codes (error-correcting codes) address this problem in engineering communication systems. The ability to extend these information theory concepts to study information transmission in biological systems can contribute to the general understanding of biological communication mechanisms and extend the field of coding theory into the biological domain. In this work, we review and compare existing coding theoretic methods for modeling genetic systems. We introduce a new error-correcting code framework for understanding translation initiation at the cellular level and present research results for Escherichia coli K-12. By studying translation initiation, we hope to gain insight into potential error-correcting aspects of genomic sequences and systems. Published by Elsevier Ltd. on behalf of The Franklin Institute.


Detection and reconstruction of error control codes for engineered and biological regulatory systems

May, Elebeoba E.; Johnston, Anna M.; Hart, William E.; Watson, Jean-Paul W.; Pryor, Richard J.; Rintoul, Mark D.

A fundamental challenge for all communication systems, engineered or living, is the problem of achieving efficient, secure, and error-free communication over noisy channels. Information theoretic principles have been used to develop effective coding theory algorithms to successfully transmit information in engineering systems. Living systems also successfully transmit biological information through genetic processes such as replication, transcription, and translation, where the genome of an organism is the contents of the transmission. Decoding of received bit streams is fairly straightforward when the channel encoding algorithms are efficient and known. If the encoding scheme is unknown or part of the data is missing or intercepted, how would one design a viable decoder for the received transmission? For such systems, blind reconstruction of the encoding/decoding system would be a vital step in recovering the original message. Communication engineers may not frequently encounter this situation, but for computational biologists and biotechnologists this is an immediate challenge. The goal of this work is to develop methods for detecting and reconstructing the encoder/decoder system for engineered and biological data. Building on Sandia's strengths in discrete mathematics, algorithms, and communication theory, we use linear programming and will use evolutionary computing techniques to construct efficient algorithms for modeling the coding system for minimally errored engineered data streams and genomic regulatory DNA and RNA sequences. The objective for the initial phase of this project is to construct solid parallels between biological literature and fundamental elements of communication theory. In this light, the milestones for FY2003 were focused on defining genetic channel characteristics and providing an initial approximation for key parameters, including coding rate, memory length, and minimum distance values.
A secondary objective addressed the question of determining similar parameters for a received, noisy, error-control encoded data set. In addition to these goals, we initiated exploration of algorithmic approaches to determine whether a data set could be approximated with an error-control code and performed initial investigations into optimization-based methodologies for extracting the encoding algorithm given the coding rate of an encoded noise-free and noisy data stream.
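
As a rough illustration of two of the parameters named in this abstract, the sketch below computes the coding rate and the minimum Hamming distance for a small set of candidate codewords. It is not the method used in the report: the function names and the toy repetition code are assumptions chosen for illustration, and memory length (a property of convolutional codes) is not covered.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def minimum_distance(codewords):
    """Smallest pairwise Hamming distance over a set of codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))


def code_rate(k, n):
    """Rate of an (n, k) block code: information symbols per transmitted symbol."""
    return k / n


# Toy example (an assumption, not data from the report): the (3, 1) repetition code.
toy_codewords = ["000", "111"]
toy_d_min = minimum_distance(toy_codewords)
toy_rate = code_rate(1, 3)
```

A code with minimum distance d can detect up to d - 1 symbol errors per block, which is why this value is a natural first parameter to estimate for an unknown encoding.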


Towards a biological coding theory discipline

Proposed for publication in New Thesis.

May, Elebeoba E.

How can information required for the proper functioning of a cell, an organism, or a species be transmitted in an error-introducing environment? Clearly, similar to engineering communication systems, biological systems must incorporate error control in their information transmission processes. If genetic information in the DNA sequence is encoded in a manner similar to error control encoding, the received sequence, the messenger RNA (mRNA), can be analyzed using coding theory principles. This work explores potential parallels between engineering communication systems and the central dogma of genetics and presents a coding theory approach to modeling the process of protein translation initiation. The messenger RNA is viewed as a noisy encoded sequence and the ribosome as an error control decoder. Decoding models based on chemical and biological characteristics of the ribosome and the ribosome binding site of the mRNA are developed, and results of applying the models to Escherichia coli K-12 are presented.


Optimal generators for a systematic block code model of prokaryotic translation initiation

May, Elebeoba E.

The decoding of received error control encoded bit streams is fairly straightforward when the channel encoding algorithms are efficient and known. But if the encoding scheme is unknown or part of the data is missing, how would one design a viable decoder for the received transmission? Communication engineers may not frequently encounter this situation, but for computational biologists this is an immediate challenge as we attempt to decipher and understand the vast amount of sequence data produced by genome sequencing projects. Assuming the systematic parity check block code model of protein translation initiation, this work presents an approach for determining the generator matrix given a set of potential codewords. The resulting generators and corresponding parity matrices are applied to valid and invalid Escherichia coli K-12 MG1655 messenger RNA leader sequences. The generators constructed using strict subsets of the 16S ribosomal RNA performed better than those constructed using the block code model in earlier works.
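
For readers unfamiliar with the terminology, the following minimal sketch shows what a systematic parity-check block code looks like in the binary case. The generator has the form G = [I | P] and the parity-check matrix the form H = [P^T | I], so a word is a valid codeword exactly when its syndrome is all zeros. The matrix `HAMMING_P` is the textbook (7, 4) Hamming code, used here purely as an assumption for illustration; it is not one of the generators derived in this work, and the paper's mapping of RNA bases to code symbols is not reproduced.

```python
# Parity part P of a systematic generator G = [I_k | P] over GF(2).
# Illustrative assumption: the standard (7, 4) Hamming code.
HAMMING_P = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]


def encode(message, P):
    """Systematic encoding: the message bits followed by parity bits (message * P mod 2)."""
    k, r = len(P), len(P[0])
    if len(message) != k:
        raise ValueError("message length must equal the number of rows of P")
    parity = [sum(message[i] * P[i][j] for i in range(k)) % 2 for j in range(r)]
    return list(message) + parity


def syndrome(word, P):
    """Syndrome H * word^T mod 2 with H = [P^T | I]; all zeros means a valid codeword."""
    k, r = len(P), len(P[0])
    if len(word) != k + r:
        raise ValueError("word length must equal k + r")
    msg, par = word[:k], word[k:]
    return [(sum(msg[i] * P[i][j] for i in range(k)) + par[j]) % 2 for j in range(r)]


codeword = encode([1, 0, 1, 1], HAMMING_P)
valid_syndrome = syndrome(codeword, HAMMING_P)

# Flip one bit to imitate a noisy received sequence.
corrupted = codeword.copy()
corrupted[2] ^= 1
error_syndrome = syndrome(corrupted, HAMMING_P)
```

Under this kind of model, classifying a sequence as valid or invalid amounts to checking whether its syndrome is zero, which is the role the abstract assigns to the parity matrices applied to leader sequences.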
