Introduction To Data Technologies For Science (New Edition)

Authors: Paul Murrell
ISBN-13: 9781420065176, ISBN-10: 1420065173
Format: Hardcover
Publisher: Taylor & Francis, Inc.
Date Published: March 2009
Edition: New Edition

Author Biography: Paul Murrell

Paul Murrell is a Senior Lecturer in the Department of Statistics at the University of Auckland, New Zealand. Author of the bestselling R Graphics (2006), he is also part of the development team for the R and Omegahat statistical computing projects. Dr. Murrell’s research interests include computational and graphical statistics.

Book Synopsis

Providing key information on how to work with research data, Introduction to Data Technologies presents ideas and techniques for performing critical, behind-the-scenes tasks that take up so much time and effort yet typically receive little attention in formal education. With a focus on computational tools, the book shows readers how to improve their awareness of what tasks can be achieved and describes the correct approach to perform these tasks.

Practical examples demonstrate the most important points
The author first discusses how to write computer code using HTML as a concrete example. He then covers a variety of data storage topics, including different file formats, XML, and the structure and design issues of relational databases. After illustrating how to extract data from a relational database using SQL, the book presents tools and techniques for searching, sorting, tabulating, and manipulating data. It also introduces some very basic programming concepts as well as the R language for statistical computing. Each of these topics has supporting chapters that offer reference material on HTML, CSS, XML, DTD, SQL, R, and regular expressions.

One-stop shop of introductory computing information
Written by a member of the R Development Core Team, this resource shows readers how to apply data technologies to tasks within a research setting. Collecting material otherwise scattered across many books and the web, it explores how to publish information via the web, how to access information stored in different formats, and how to write small programs to automate simple, repetitive tasks.

Table of Contents

List of Figures xv

List of Tables xvii

Preface xix

1 Introduction 1

1.1 Case study: Point Nemo 1

2 Writing Computer Code 9

2.1 Case study: Point Nemo (continued) 11

2.2 Syntax 13

2.2.1 HTML syntax 13

2.2.2 Escape sequences 17

2.3 Semantics 18

2.3.1 HTML semantics 19

2.4 Writing code 21

2.4.1 Text editors 21

2.4.2 Important features of a text editor 21

2.4.3 Layout of code 22

2.4.4 Indenting code 24

2.4.5 Long lines of code 25

2.4.6 Whitespace 26

2.4.7 Documenting code 26

2.4.8 HTML comments 28

2.5 Checking code 29

2.5.1 Checking HTML code 29

2.5.2 Reading error information 30

2.5.3 Reading documentation 32

2.6 Running code 32

2.6.1 Running HTML code 33

2.6.2 Debugging code 33

2.7 The DRY principle 35

2.7.1 Cascading Style Sheets 36

2.8 Further reading 41

3 HTML Reference 43

3.1 HTML syntax 43

3.1.1 HTML comments 44

3.1.2 HTML entities 45

3.2 HTML semantics 45

3.2.1 Common HTML elements 46

3.2.2 Common HTML attributes 51

3.3 Further reading 51

4 CSS Reference 53

4.1 CSS syntax 53

4.2 CSS semantics 54

4.2.1 CSS selectors 54

4.2.2 CSS properties 56

4.3 Linking CSS to HTML 59

4.4 CSS tips 60

4.5 Further reading 61

5 Data Storage 63

5.1 Case study: YBC 7289 64

5.2 Plain text formats 69

5.2.1 Computer memory 71

5.2.2 Files and formats 71

5.2.3 Case study: Point Nemo (continued) 72

5.2.4 Advantages and disadvantages 73

5.2.5 CSV files 76

5.2.6 Line endings 76

5.2.7 Text encodings 78

5.2.8 Case study: The Data Expo 80

5.3 Binary formats 83

5.3.1 More on computer memory 84

5.3.2 Case study: Point Nemo (continued) 86

5.3.3 NetCDF 87

5.3.4 PDF documents 90

5.3.5 Other types of data 91

5.4 Spreadsheets 94

5.4.1 Spreadsheet formats 94

5.4.2 Spreadsheet software 95

5.4.3 Case study: Over the limit 96

5.5 XML 99

5.5.1 XML syntax 102

5.5.2 XML design 105

5.5.3 XML schema 110

5.5.4 Case study: Point Nemo (continued) 110

5.5.5 Advantages and disadvantages 114

5.6 Databases 118

5.6.1 The database data model 119

5.6.2 Database notation 121

5.6.3 Database design 122

5.6.4 Flashback: The DRY principle 132

5.6.5 Case study: The Data Expo (continued) 133

5.6.6 Advantages and disadvantages 138

5.6.7 Flashback: Database design and XML design 139

5.6.8 Case study: The Data Expo (continued) 139

5.6.9 Database software 141

5.7 Further reading 142

6 XML Reference 145

6.1 XML syntax 145

6.2 Document Type Definitions 147

6.2.1 Element declarations 148

6.2.2 Attribute declarations 149

6.2.3 Including a DTD 150

6.2.4 An example 151

6.3 Further reading 152

7 Data Queries 153

7.1 Case study: The Data Expo (continued) 154

7.2 Querying databases 158

7.2.1 SQL syntax 159

7.2.2 Case study: The Data Expo (continued) 159

7.2.3 Collations 165

7.2.4 Querying several tables: Joins 166

7.2.5 Case study: Commonwealth swimming 166

7.2.6 Cross joins 169

7.2.7 Inner joins 170

7.2.8 Case study: The Data Expo (continued) 171

7.2.9 Subqueries 175

7.2.10 Outer joins 176

7.2.11 Case study: Commonwealth swimming (continued) 176

7.2.12 Self joins 179

7.2.13 Case study: The Data Expo (continued) 179

7.2.14 Running SQL code 180

7.3 Querying XML 182

7.3.1 XPath syntax 182

7.3.2 Case study: Point Nemo (continued) 182

7.4 Further reading 185

8 SQL Reference 187

8.1 SQL syntax 187

8.2 SQL queries 187

8.2.1 Selecting columns 188

8.2.2 Specifying tables: The FROM clause 189

8.2.3 Selecting rows: The WHERE clause 190

8.2.4 Sorting results: The ORDER BY clause 192

8.2.5 Aggregating results: The GROUP BY clause 192

8.2.6 Subqueries 193

8.3 Other SQL commands 194

8.3.1 Defining tables 194

8.3.2 Populating tables 195

8.3.3 Modifying data 197

8.3.4 Deleting data 197

8.4 Further reading 197

9 Data Processing 199

9.1 Case study: The Population Clock 204

9.2 The R environment 214

9.2.1 The command line 214

9.2.2 The workspace 217

9.2.3 Packages 218

9.3 The R language 219

9.3.1 Expressions 219

9.3.2 Constant values 219

9.3.3 Arithmetic 220

9.3.4 Conditions 221

9.3.5 Function calls 222

9.3.6 Symbols and assignment 224

9.3.7 Keywords 226

9.3.8 Flashback: Writing for an audience 227

9.3.9 Naming variables 227

9.4 Data types and data structures 229

9.4.1 Case study: Counting candy 232

9.4.2 Vectors 234

9.4.3 Factors 237

9.4.4 Data frames 237

9.4.5 Lists 239

9.4.6 Matrices and arrays 241

9.4.7 Flashback: Numbers in computer memory 242

9.5 Subsetting 243

9.5.1 Assigning to a subset 250

9.5.2 Subsetting factors 251

9.6 More on data structures 252

9.6.1 The recycling rule 252

9.6.2 Type coercion 253

9.6.3 Attributes 256

9.6.4 Classes 259

9.6.5 Dates 261

9.6.6 Formulas 262

9.6.7 Exploring objects 263

9.6.8 Generic functions 264

9.7 Data import/export 266

9.7.1 The working directory 267

9.7.2 Specifying files 267

9.7.3 Text formats 268

9.7.4 Case study: Point Nemo (continued) 269

9.7.5 Binary formats 275

9.7.6 Spreadsheets 278

9.7.7 XML 280

9.7.8 Databases 284

9.7.9 Case study: The Data Expo (continued) 285

9.8 Data manipulation 287

9.8.1 Case study: New Zealand schools 287

9.8.2 Transformations 289

9.8.3 Sorting 293

9.8.4 Tables of counts 295

9.8.5 Aggregation 297

9.8.6 Case study: NCEA 302

9.8.7 The "apply" functions 304

9.8.8 Merging 309

9.8.9 Flashback: Database joins 312

9.8.10 Splitting 312

9.8.11 Reshaping 314

9.8.12 Case study: Utilities 318

9.9 Text processing 326

9.9.1 Case study: The longest placename 326

9.9.2 Regular expressions 333

9.9.3 Case study: Rusty wheat 335

9.10 Data display 343

9.10.1 Case study: Point Nemo (continued) 343

9.10.2 Converting to text 345

9.10.3 Results for reports 348

9.11 Programming 351

9.11.1 Case study: The Data Expo (continued) 352

9.11.2 Control flow 354

9.11.3 Writing functions 356

9.11.4 Flashback: Writing functions, writing code, and the DRY principle 359

9.11.5 Flashback: Debugging 360

9.12 Other software 361

10 R Reference 365

10.1 R syntax 365

10.1.1 Constants 365

10.1.2 Arithmetic operators 366

10.1.3 Logical operators 366

10.1.4 Function calls 366

10.1.5 Symbols and assignment 367

10.1.6 Loops 367

10.1.7 Conditional expressions 368

10.2 Data types and data structures 368

10.3 Functions 369

10.3.1 Session management 370

10.3.2 Generating vectors 370

10.3.3 Numeric functions 371

10.3.4 Comparisons 372

10.3.5 Type coercion 373

10.3.6 Exploring data structures 373

10.3.7 Subsetting 374

10.3.8 Data import/export 375

10.3.9 Transformations 378

10.3.10 Sorting 379

10.3.11 Tables of counts 379

10.3.12 Aggregation 380

10.3.13 The "apply" functions 380

10.3.14 Merging 381

10.3.15 Splitting 382

10.3.16 Reshaping 382

10.3.17 Text processing 384

10.3.18 Data display 385

10.3.19 Debugging 386

10.4 Getting help 386

10.5 Packages 388

10.6 Searching for functions 389

10.7 Further reading 390

11 Regular Expressions Reference 391

11.1 Literals 391

11.2 Metacharacters 392

11.2.1 Character sets 392

11.2.2 Anchors 393

11.2.3 Alternation 394

11.2.4 Repetitions 395

11.2.5 Grouping 396

11.2.6 Backreferences 396

11.3 Further reading 397

12 Conclusion 399

Attributions 401

Bibliography 403

Index 407
