Authors: Jake Chen (Editor), Amandeep S. Sidhu
ISBN-13: 9781596932586, ISBN-10: 1596932589
Format: Hardcover
Publisher: Artech House, Incorporated
Date Published: October 2007
Edition: (Non-applicable)
Book Synopsis
This cutting-edge resource helps professionals and researchers master the next generation of biological database modeling techniques. The book shows how to represent complex biological concepts and their evolving data structures. This highly accessible volume presents innovative biological database modeling concepts, methods, and software tools that are integrated with case studies in genomics, functional genomics, proteomics, and drug discovery projects. Breaking new ground at the intersection of high-throughput biology, bioinformatics, and data management, this essential resource offers comprehensive coverage of: Public Biological Databases for -Omics Studies in Medicine; Modeling Biomedical Data; Fundamentals of Gene Ontology; Protein Ontology; Information Quality Management Challenges for High-Throughput Data; Data Management for Fungal Genomics; Microarray Data Management; Data Management in Expression-Based Proteomics; Model-Driven Drug Discovery; Information Management and Interaction in High-Throughput Screening for Drug Discovery.
Table of Contents
Preface
Acknowledgments
Introduction to Data Modeling
Generic Modern Markup Languages
Modeling Complex Data Structures
Data Modeling with General Markup Languages
Ontologies: Enriching Data with Text
Hyperlinks for Semantic Modeling
Evolving Subject Indexes
Languages
Views
Modeling Biological Data
References
Public Biological Databases for -Omics Studies in Medicine
Introduction
Public Databases in Medicine
Application of Public Bioinformatics Database in Medicine
Application of Genomic Database
Application of Proteomic Database
Application of the Metabolomics Database
Application of Pharmacogenomics Database
Application of Systomics Database
References
Modeling Biomedical Data
Introduction
Biological Concepts and EER Modeling
Sequence Ordering Concept
Input/Output Concept
Molecular Spatial Relationship Concept
Formal Definitions for EER Extensions
Ordered Relationships
Process Relationships
Molecular Spatial Relationships
Summary of New EER Notation
Semantic Data Models of the Molecular Biological System
The DNA/Gene Model
The Protein 3D Structure Model
The Molecular Interaction and Pathway Model
EER-to-Relational Mapping
Ordered Relationship Mapping
Process Relationship Mapping
Molecular Spatial Relationship Mapping
Introduction to Multilevel Modeling and Data Source Integration
Multilevel Concepts and EER Modeling
Conclusion
References
Fundamentals of Gene Ontology
Introduction to Gene Ontology
Construction of an Ontology
General Evolution of GO Structures and General Annotation Strategy of Assigning GO Terms to Genes
General Evolution of GO Structures
General Annotation Strategy of Assigning GO Terms to Genes
Applications of Gene Ontology in Biological and Medical Science
Application of Gene Ontology in Biological Science
Application of Gene Ontology in Medical Science
References
Protein Ontology
Introduction
What Is Protein Annotation?
Underlying Issues with Protein Annotation
Other Biomedical Ontologies
Protein Data Frameworks
Critical Analysis of Protein Data Frameworks
Developing Protein Ontology
Protein Ontology Framework
The ProteinOntology Concept
Generic Concepts in Protein Ontology
The ProteinComplex Concept
Entry Concept
Structure Concept
StructuralDomains Concept
FunctionalDomains Concept
ChemicalBonds Concept
Constraints Concept
Comparison with Protein Annotation Frameworks
Protein Ontology Instance Store
Strengths and Limitations of Protein Ontology
Summary
References
Information Quality Management Challenges for High-Throughput Data
Motivation
The Experimental Context
Transcriptomics
Qualitative Proteomics
A Survey of Quality Issues
Variability and Experimental Design
Analysis of Quality Issues and Techniques
Specificity of Techniques and Generality of Dimensions
Beyond Data Generation: Annotation and Presentation
Current Approaches to Quality
Modeling, Collection, and Use of Provenance Metadata
Creating Controlled Vocabularies and Ontologies
Conclusions
Acknowledgments
References
Data Management for Fungal Genomics: An Experience Report
Introduction
Materials Tracking Database
Annotation Database
Microarray Database
Target Curation Database
Discussion
Issue of Data and Metadata Capture
Conclusion
Acknowledgments
References
Microarray Data Management: An Enterprise Information Approach
Introduction
Microarray Data Standardization
Gene Ontologies
Microarray Ontologies
Minimum Information About a Microarray Experiment
Database Management Systems
Relational Data Model
Object-Oriented Data Model
Object-Relational Data Model
Microarray Data Storage and Exchange
Microarray Repository
Microarray Data Warehouses and Datamarts
Microarray Data Federations
Enterprise Microarray Databases and M-KM
Challenges and Considerations
Conclusions
Acknowledgments
References
Data Management in Expression-Based Proteomics
Background
Proteomics Data Management Approaches
Data Standards in Mass Spectrometry Based Proteomics Studies
Public Repositories for Mass Spectrometry Data
Proteomics Data Management Tools
Expression Proteomics in the Context of Systems Biology Studies
Protein Annotation Databases
Conclusions
References
Model-Driven Drug Discovery: Principles and Practices
Introduction
Model Abstraction
Evolution of Models
Target Identification
Sequence-to-Function Models
Sequence Alignments and Phylogenetic Trees
Structure-to-Function Models
Systems-Based Approaches
Target Validation
Lead Identification
Target Structure-Based Design
Ligand-Based Models
Lead to Drug Phase
Predicting Drug-Likeness
ADMET Properties
Future Perspectives
Acknowledgments
References
Information Management and Interaction in High-Throughput Screening for Drug Discovery
Introduction
Prior Research
Overview of Antimalarial Drug Discovery
Overview of the Proposed Solution and System Architecture
HTS Data Processing
Introduction to HTS
Example of HTS for Antimalarial Drug Screening
Data Modeling
The Database Design
User Interface
Conclusions
Acknowledgments
References
Selected Bibliography
About the Authors
Index
Subjects