
Applied Data Mining for Business and Industry (2nd Edition)


Authors: Paolo Giudici, Silvia Figini
ISBN-13: 9780470058879, ISBN-10: 0470058870
Format: Paperback
Publisher: Wiley, John & Sons, Incorporated
Date Published: June 2009
Edition: 2nd Edition


Author Biography: Paolo Giudici

Paolo Giudici – Department of Economics and Quantitative Methods, University of Pavia. A lecturer in data mining, business statistics, data analysis and risk management, Professor Giudici is also the director of the data mining laboratory. He is the author of around 80 publications, the coordinator of 2 national research grants on data mining, and the local coordinator of a European integrated project on the topic. He was the sole author of the first edition of this book, which has been translated into both Italian and Chinese. He is also one of the Editors of Wiley's Series in Computational Statistics.

Silvia Figini – Ms Figini has worked for 2 years at the Competence centre for data mining analysis and business intelligence at SAS Milan. She is currently completing a PhD in statistics and already has a collection of publications to her name.

Book Synopsis

The increasing availability of data in our current, information-overloaded society has led to the need for valid tools for its modelling and analysis. Data mining and applied statistical methods are the appropriate tools to extract knowledge from such data. This book provides an accessible introduction to data mining methods in a consistent and application-oriented statistical framework, using case studies drawn from real industry projects and highlighting the use of data mining methods in a variety of business applications.

  • Introduces data mining methods and applications.

  • Covers classical and Bayesian multivariate statistical methodology as well as machine learning and computational data mining methods.

  • Includes many recent developments such as association and sequence rules, graphical Markov models, lifetime value modelling, credit risk, operational risk and web mining.

  • Features detailed case studies based on applied projects within industry.

  • Incorporates discussion of data mining software, with case studies analysed using R.

  • Is accessible to anyone with a basic knowledge of statistics or data analysis.

  • Includes an extensive bibliography and pointers to further reading within the text.

Applied Data Mining for Business and Industry, 2nd edition is aimed at advanced undergraduate and graduate students of data mining, applied statistics, database management, computer science and economics. The case studies will provide guidance to professionals working in industry on projects involving large volumes of data, such as customer relationship management, web design, risk management, marketing, economics and finance.

Table of Contents

1 Introduction

Part I Methodology

2 Organisation of the data

2.1 Statistical units and statistical variables

2.2 Data matrices and their transformations

2.3 Complex data structures

2.4 Summary

3 Summary statistics

3.1 Univariate exploratory analysis

3.1.1 Measures of location

3.1.2 Measures of variability

3.1.3 Measures of heterogeneity

3.1.4 Measures of concentration

3.1.5 Measures of asymmetry

3.1.6 Measures of kurtosis

3.2 Bivariate exploratory analysis of quantitative data

3.3 Multivariate exploratory analysis of quantitative data

3.4 Multivariate exploratory analysis of qualitative data

3.4.1 Independence and association

3.4.2 Distance measures

3.4.3 Dependency measures

3.4.4 Model-based measures

3.5 Reduction of dimensionality

3.5.1 Interpretation of the principal components

3.6 Further reading

4 Model specification

4.1 Measures of distance

4.1.1 Euclidean distance

4.1.2 Similarity measures

4.1.3 Multidimensional scaling

4.2 Cluster analysis

4.2.1 Hierarchical methods

4.2.2 Evaluation of hierarchical methods

4.2.3 Non-hierarchical methods

4.3 Linear regression

4.3.1 Bivariate linear regression

4.3.2 Properties of the residuals

4.3.3 Goodness of fit

4.3.4 Multiple linear regression

4.4 Logistic regression

4.4.1 Interpretation of logistic regression

4.4.2 Discriminant analysis

4.5 Tree models

4.5.1 Division criteria

4.5.2 Pruning

4.6 Neural networks

4.6.1 Architecture of a neural network

4.6.2 The multilayer perceptron

4.6.3 Kohonen networks

4.7 Nearest-neighbour models

4.8 Local models

4.8.1 Association rules

4.8.2 Retrieval by content

4.9 Uncertainty measures and inference

4.9.1 Probability

4.9.2 Statistical models

4.9.3 Statistical inference

4.10 Non-parametric modelling

4.11 The normal linear model

4.11.1 Main inferential results

4.12 Generalised linear models

4.12.1 The exponential family

4.12.2 Definition of generalised linear models

4.12.3 The logistic regression model

4.13 Log-linear models

4.13.1 Construction of a log-linear model

4.13.2 Interpretation of a log-linear model

4.13.3 Graphical log-linear models

4.13.4 Log-linear model comparison

4.14 Graphical models

4.14.1 Symmetric graphical models

4.14.2 Recursive graphical models

4.14.3 Graphical models and neural networks

4.15 Survival analysis models

4.16 Further reading

5 Model evaluation

5.1 Criteria based on statistical tests

5.1.1 Distance between statistical models

5.1.2 Discrepancy of a statistical model

5.1.3 Kullback-Leibler discrepancy

5.2 Criteria based on scoring functions

5.3 Bayesian criteria

5.4 Computational criteria

5.5 Criteria based on loss functions

5.6 Further reading

Part II Business case studies

6 Describing website visitors

6.1 Objectives of the analysis

6.2 Description of the data

6.3 Exploratory analysis

6.4 Model building

6.4.1 Cluster analysis

6.4.2 Kohonen networks

6.5 Model comparison

6.6 Summary report

7 Market basket analysis

7.1 Objectives of the analysis

7.2 Description of the data

7.3 Exploratory data analysis

7.4 Model building

7.4.1 Log-linear models

7.4.2 Association rules

7.5 Model comparison

7.6 Summary report

8 Describing customer satisfaction

8.1 Objectives of the analysis

8.2 Description of the data

8.3 Exploratory data analysis

8.4 Model building

8.5 Summary

9 Predicting credit risk of small businesses

9.1 Objectives of the analysis

9.2 Description of the data

9.3 Exploratory data analysis

9.4 Model building

9.5 Model comparison

9.6 Summary report

10 Predicting e-learning student performance

10.1 Objectives of the analysis

10.2 Description of the data

10.3 Exploratory data analysis

10.4 Model specification

10.5 Model comparison

10.6 Summary report

11 Predicting customer lifetime value

11.1 Objectives of the analysis

11.2 Description of the data

11.3 Exploratory data analysis

11.4 Model specification

11.5 Model comparison

11.6 Summary report

12 Operational risk management

12.1 Context and objectives of the analysis

12.2 Exploratory data analysis

12.3 Model building

12.4 Model comparison

12.5 Summary conclusions

References

Index

Subjects