PostGenome Bio Data Management: Modeling and Applications

Authors: Jake Chen (Editor), Amandeep S. Sidhu
ISBN-13: 9781596932586, ISBN-10: 1596932589
Format: Hardcover
Publisher: Artech House, Incorporated
Date Published: October 2007
Edition: (Non-applicable)

Author Biography: Jake Chen

Book Synopsis

This cutting-edge resource helps professionals and researchers master the next generation of biological database modeling techniques. The book shows how to represent complex biological concepts and their evolving data structures. This highly accessible volume presents innovative biological database modeling concepts, methods, and software tools, integrated with case studies from genomics, functional genomics, proteomics, and drug discovery projects. Breaking new ground at the intersection of high-throughput biology, bioinformatics, and data management, this essential resource offers comprehensive coverage of: Public Biological Databases for -Omics Studies in Medicine; Modeling Biomedical Data; Fundamentals of Gene Ontology; Protein Ontology; Information Quality Management Challenges for High-Throughput Data; Data Management for Fungal Genomics; Microarray Data Management; Data Management in Expression-Based Proteomics; Model-Driven Drug Discovery; and Information Management and Interaction in High-Throughput Screening for Drug Discovery.

Preface     xiii
Acknowledgments     xvii
Introduction to Data Modeling     1
Generic Modern Markup Languages     1
Modeling Complex Data Structures     3
Data Modeling with General Markup Languages     3
Ontologies: Enriching Data with Text     4
Hyperlinks for Semantic Modeling     5
Evolving Subject Indexes     6
Languages     6
Views     7
Modeling Biological Data     7
References     8
Public Biological Databases for -Omics Studies in Medicine     9
Introduction     9
Public Databases in Medicine     10
Application of Public Bioinformatics Database in Medicine     11
Application of Genomic Database     11
Application of Proteomic Database     16
Application of the Metabolomics Database     18
Application of Pharmacogenomics Database     19
Application of Systomics Database     21
References     21
Modeling Biomedical Data     25
Introduction     25
Biological Concepts and EER Modeling     27
Sequence Ordering Concept     27
Input/Output Concept     29
Molecular Spatial Relationship Concept     30
Formal Definitions for EER Extensions     31
Ordered Relationships     31
Process Relationships     33
Molecular Spatial Relationships     34
Summary of New EER Notation     35
Semantic Data Models of the Molecular Biological System     35
The DNA/Gene Model     36
The Protein 3D Structure Model     36
The Molecular Interaction and Pathway Model     40
EER-to-Relational Mapping     41
Ordered Relationship Mapping     41
Process Relationship Mapping     42
Molecular Spatial Relationship Mapping     43
Introduction to Multilevel Modeling and Data Source Integration     45
Multilevel Concepts and EER Modeling     46
Conclusion     48
References     49
Fundamentals of Gene Ontology     51
Introduction to Gene Ontology     51
Construction of an Ontology     52
General Evolution of GO Structures and General Annotation Strategy of Assigning GO Terms to Genes     56
General Evolution of GO Structures     56
General Annotation Strategy of Assigning GO Terms to Genes     57
Applications of Gene Ontology in Biological and Medical Science     57
Application of Gene Ontology in Biological Science     57
Application of Gene Ontology in Medical Science     58
References     60
Protein Ontology     63
Introduction     63
What Is Protein Annotation?     64
Underlying Issues with Protein Annotation     64
Other Biomedical Ontologies     65
Protein Data Frameworks     66
Critical Analysis of Protein Data Frameworks     68
Developing Protein Ontology     68
Protein Ontology Framework     69
The ProteinOntology Concept     70
Generic Concepts in Protein Ontology     70
The ProteinComplex Concept     71
Entry Concept     71
Structure Concept     72
StructuralDomains Concept     72
FunctionalDomains Concept     73
ChemicalBonds Concept     74
Constraints Concept     74
Comparison with Protein Annotation Frameworks     75
Protein Ontology Instance Store     76
Strengths and Limitations of Protein Ontology     77
Summary     78
References     78
Information Quality Management Challenges for High-Throughput Data     81
Motivation     81
The Experimental Context     84
Transcriptomics     86
Qualitative Proteomics     88
A Survey of Quality Issues     89
Variability and Experimental Design     89
Analysis of Quality Issues and Techniques     91
Specificity of Techniques and Generality of Dimensions     93
Beyond Data Generation: Annotation and Presentation     94
Current Approaches to Quality     96
Modeling, Collection, and Use of Provenance Metadata     96
Creating Controlled Vocabularies and Ontologies     97
Conclusions     98
Acknowledgments     98
References     98
Data Management for Fungal Genomics: An Experience Report     103
Introduction     103
Materials Tracking Database     109
Annotation Database     110
Microarray Database     111
Target Curation Database     111
Discussion     112
Issue of Data and Metadata Capture     113
Conclusion     116
Acknowledgments     116
References     116
Microarray Data Management: An Enterprise Information Approach     119
Introduction     119
Microarray Data Standardization     122
Gene Ontologies     123
Microarray Ontologies     125
Minimum Information About a Microarray Experiment     125
Database Management Systems     126
Relational Data Model     127
Object-Oriented Data Model     128
Object-Relational Data Model     131
Microarray Data Storage and Exchange     131
Microarray Repository     133
Microarray Data Warehouses and Datamarts     133
Microarray Data Federations     134
Enterprise Microarray Databases and M-KM     135
Challenges and Considerations     136
Conclusions     138
Acknowledgments     138
References     139
Data Management in Expression-Based Proteomics     143
Background     143
Proteomics Data Management Approaches     147
Data Standards in Mass Spectrometry Based Proteomics Studies     149
Public Repositories for Mass Spectrometry Data     152
Proteomics Data Management Tools     154
Expression Proteomics in the Context of Systems Biology Studies     155
Protein Annotation Databases     159
Conclusions     159
References     160
Model-Driven Drug Discovery: Principles and Practices     163
Introduction     163
Model Abstraction     165
Evolution of Models     166
Target Identification     168
Sequence-to-Function Models     170
Sequence Alignments and Phylogenetic Trees     170
Structure-to-Function Models     172
Systems-Based Approaches     173
Target Validation     176
Lead Identification     177
Target Structure-Based Design     177
Ligand-Based Models     179
Lead to Drug Phase     182
Predicting Drug-Likeness     182
ADMET Properties     182
Future Perspectives     183
Acknowledgments     184
References     184
Information Management and Interaction in High-Throughput Screening for Drug Discovery     189
Introduction     189
Prior Research     191
Overview of Antimalarial Drug Discovery     192
Overview of the Proposed Solution and System Architecture     193
HTS Data Processing     194
Introduction to HTS     194
Example of HTS for Antimalarial Drug Screening     195
Data Modeling     199
The Database Design     202
User Interface     204
Conclusions     206
Acknowledgments     207
References     207
Selected Bibliography     208
About the Authors     209
Index     217