Authors: Laxmi Parida
ISBN-13: 9781584885498, ISBN-10: 1584885491
Format: Hardcover
Publisher: Taylor & Francis, Inc.
Date Published: July 2007
Edition: (Non-applicable)
Book Synopsis
The computational methods of bioinformatics are being used more and more to process the large volume of current biological data. Promoting an understanding of the underlying biology that produces this data, Pattern Discovery in Bioinformatics: Theory and Algorithms provides the tools to study regularities in biological data.
Taking a systematic approach to pattern discovery, the book supplies sound mathematical definitions and efficient algorithms to explain vital information about biological data. It explores various data patterns, including strings, clusters, permutations, topology, partial orders, and boolean expressions. Each of these classes captures a different form of regularity in the data, providing possible answers to a wide range of questions. The book also reviews basic statistics, including probability, information theory, and the central limit theorem.
This self-contained book provides a solid foundation in computational methods, enabling the solution of difficult biological questions.
Table of Contents
Introduction 1
Ubiquity of Patterns 1
Motivations from Biology 2
The Need for Rigor 2
Who is a Reader of this Book? 3
About this book 4
The Fundamentals 7
Basic Algorithmics 9
Introduction 9
Graphs 9
Tree Problem 1: Minimum Spanning Tree 14
Prim's algorithm 17
Tree Problem 2: Steiner Tree 21
Tree Problem 3: Minimum Mutation Labeling 22
Fitch's algorithm 23
Storing & Retrieving Elements 27
Asymptotic Functions 30
Recurrence Equations 32
Counting binary trees 34
Enumerating unrooted trees (Prufer sequence) 36
NP-Complete Class of Problems 40
Exercises 41
Basic Statistics 47
Introduction 47
Basic Probability 48
Probability space foundations 48
Multiple events (Bayes' theorem) 50
Inclusion-exclusion principle 51
Discrete probability space 54
Algebra of random variables 57
Expectations 58
Discrete probability distribution (binomial, Poisson) 60
Continuous probability distribution (normal) 64
Continuous probability space ([ohm] is R) 66
The Bare Truth about Inferential Statistics 69
Probability distribution invariants 70
Samples & summary statistics 72
The central limit theorem 77
Statistical significance (p-value) 80
Summary 82
Exercises 82
What Are Patterns? 89
Introduction 89
Common Thread 90
Pattern Duality 90
Operators on p 92
Irredundant Patterns 92
Special case: maximality 93
Transitivity of redundancy 95
Uniqueness property 95
Case studies 96
Constrained Patterns 99
When is a Pattern Specification Nontrivial? 99
Classes of Patterns 100
Exercises 103
Patterns on Linear Strings 111
Modeling the Stream of Life 113
Introduction 113
Modeling a Biopolymer 113
Repeats in DNA 114
Directionality of biopolymers 115
Modeling a random permutation 117
Modeling a random string 119
Bernoulli Scheme 120
Markov Chain 121
Stationary distribution 123
Computing probabilities 127
Hidden Markov Model (HMM) 128
The decoding problem (Viterbi algorithm) 130
Comparison of the Schemes 133
Conclusion 133
Exercises 134
String Pattern Specifications 139
Introduction 139
Notation 140
Solid Patterns 142
Maximality 144
Counting the maximal patterns 144
Rigid Patterns 149
Maximal rigid patterns 150
Enumerating maximal rigid patterns 152
Density-constrained patterns 156
Quorum-constrained patterns 157
Large-|[Sigma]| input 158
Irredundant patterns 160
Extensible Patterns 164
Maximal extensible patterns 165
Generalizations 165
Homologous sets 165
Sequence on reals 167
Exercises 170
Algorithms & Pattern Statistics 183
Introduction 183
Discovery Algorithm 183
Pattern Statistics 191
Rigid Patterns 191
Extensible Patterns 193
Nondegenerate extensible patterns 194
Degenerate extensible patterns 196
Correction factor for the dot character 197
Measure of Surprise 198
z-score 199
x-square ratio 199
Interplay of combinatorics & statistics 200
Applications 201
Exercises 203
Motif Learning 213
Introduction: Local Multiple Alignment 213
Probabilistic Model: Motif Profile 214
The Learning Problem 215
Importance Measure 216
Statistical significance 216
Information content 219
Algorithms to Learn a Motif Profile 220
An Expectation Maximization Framework 222
The initial estimate [rho subscript o] 222
Estimating z given [rho] 223
Estimating [rho] given z 224
A Gibbs Sampling Strategy 227
Estimating [rho] given an alignment 227
Estimating background probabilities given Z 228
Estimating Z given [rho] 228
Interpreting the Motif Profile in Terms of p 229
Exercises 230
The Subtle Motif 235
Introduction: Consensus Motif 235
Combinatorial Model: Subtle Motif 236
Distance between Motifs 238
Statistics of Subtle Motifs 240
Performance Score 245
Enumeration Schemes 246
Neighbor enumeration (exact) 246
Submotif enumeration (inexact) 249
A Combinatorial Algorithm 252
A Probabilistic Algorithm 255
A Modular Solution 257
Conclusion 259
Exercises 260
Patterns on Meta-Data 263
Permutation Patterns 265
Introduction 265
Notation 266
How Many Permutation Patterns? 267
Maximality 268
P[subscript = 1]: Linear notation & PQ trees 269
P[subscript > i]: Linear notation? 271
Parikh Mapping-based Algorithm 273
Tagging technique 275
Time complexity analysis 275
Intervals 278
The naive algorithm 280
The Uno-Yagiura RC algorithm 281
Intervals to PQ Trees 294
Irreducible intervals 295
Encoding intervals as a PQ tree 297
Applications 307
Case study I: Human and rat 308
Case study II: E. Coli K-12 and B. Subtilis 309
Conclusion 311
Exercises 312
Permutation Pattern Probabilities 323
Introduction 323
Unstructured Permutations 323
Multinomial coefficients 325
Patterns with multiplicities 328
Structured Permutations 329
P-arrangement 330
An incremental method 331
An upper bound on P-arrangements 336
A lower bound on P-arrangements 341
Estimating the number of frontiers 342
Combinatorics to probabilities 345
Exercises 346
Topological Motifs 355
Introduction 355
Graph notation 355
What Are Topological Motifs? 356
Combinatorics in topologies 357
Input with self-isomorphisms 358
The Topological Motif 359
Maximality 367
Compact Topological Motifs 369
Occurrence-isomorphisms 369
Vertex indistinguishability 372
Compact list 373
Compact vertex, edge & motif 373
Maximal compact lists 374
Conjugates of compact lists 374
Characteristics of compact lists 378
Maximal operations on compact lists 380
Maximal subsets of location lists 381
Binary relations on compact lists 384
Compact motifs from compact lists 384
The Discovery Method 392
The algorithm 393
Related Classical Problems 399
Applications 400
Conclusion 402
Exercises 402
Set-Theoretic Algorithmic Tools 417
Introduction 417
Some Basic Properties of Finite Sets 418
Partial Order Graph G(S, E) of Sets 419
Reduced partial order graph 420
Straddle graph 421
Boolean Closure of Sets 423
Intersection closure 423
Union closure 424
Consecutive (Linear) Arrangement of Set Members 426
PQ trees 426
Straddling sets 429
Maximal Set Intersection Problem (maxSIP) 434
Ordered enumeration trie 435
Depth first traversal of the trie 436
Minimal Set Intersection Problem (minSIP) 447
Algorithm 447
Minimal from maximal sets 448
Multi-Sets 450
Ordered enumeration trie of multi-sets 451
Enumeration algorithm 453
Adapting the Enumeration Scheme 455
Exercises 458
Expression & Partial Order Motifs 469
Introduction 469
Motivation 470
Extracting (Monotone CNF) Boolean Expressions 471
Extracting biclusters 475
Extracting patterns in microarrays 478
Extracting Partial Orders 480
Partial orders 480
Partial order construction problem 481
Excess in partial orders 483
Statistics of Partial Orders 485
Computing Cex(B) 489
Redescriptions 493
Application: Partial Order of Expressions 494
Summary 495
Exercises 496
References 503
Index 515
Subjects