Data Mining can be defined, in a broad sense, as the search for, collection of, and analysis of data.
Large amounts of data can be collected from various sources, such as databases or websites. The information found in this data can take the form of correlations or patterns, and, owing to innovation and technological progress, extracting it has become increasingly simple.
At the company or industry level, data has become an extremely important resource. Extracting useful information from unorganized data sources is increasingly common, thanks to current Data Mining techniques, which transform raw data into valuable information.
This book, Data Mining in the R Environment. Theory and Applications, is structured in 11 chapters and is designed as a useful guide for both researchers and practitioners who wish to become familiar with some of the more or less analytical aspects of the field of Data Mining.
The first chapter of this book is a brief introduction to text analysis (Text Mining) and its implementation in the R programming environment. Data preprocessing, an extremely important stage of any Text Mining analysis, is detailed in the second chapter. Chapters 3 and 4 present cluster analysis and sentiment analysis, respectively, in the context of Text Mining. The fifth chapter covers regression trees and classification trees. Chapter 6 presents some popular and extremely useful Data Mining methods, namely the ensemble methods bagging, boosting, and stacking. The seventh chapter compares two of these ensemble methods, bagging and boosting. Ensemble models are also addressed in chapters 8 and 9, in the context of classification and regression, respectively; the bagging and AdaBoost procedures are detailed for both cases. The penultimate chapter details the construction of models such as C5.0, Stochastic Gradient Boosting, Bagged CART, and Random Forest in the R environment, while chapter 11 presents a series of applications of ensemble methods, based on techniques such as principal component analysis (PCA), Artificial Neural Networks, decision trees, Random Forest, and SVM.
Consequently, the book Data Mining in the R Environment. Theory and Applications is recommended both to those who study Data Analysis, Data Mining, or Artificial Neural Networks (ANN), and to researchers, PhD students, and anyone interested in the various aspects of data exploration.