Manipulating and Pre-Processing Data with R: Statistics with R Series

Nirmal Kumar, N. K. Sinha


Data are collected from various sources and come in different structures. Before these data are integrated for modeling or other purposes, the first logical step is to manipulate and preprocess them, for example by identifying missing data, detecting outliers, determining the probability distribution, and transforming and standardizing the data. These preprocessing techniques aim to improve the quality and accuracy of data interpretation and modeling. In many cases, however, these steps are skipped because of unawareness or because expensive commercial software is unavailable. This book is designed to give the user a guided tour of the R platform, an open-source software environment, for the manipulation of tabular data through 10 chapters. The first three chapters introduce the R environment, the procedures for importing data in different formats into R and exporting them from it, and ways of understanding the structure of the data. Dedicated chapters cover data manipulation and preprocessing steps such as testing normality, identifying outliers, identifying heteroscedasticity, and data normalization, along with various statistical tests. One chapter also briefly discusses the manipulation and preprocessing of text data. This book will foster an understanding of basic R programming and of data manipulation and preprocessing procedures among graduate and postgraduate students, teachers, and research scientists working in the field of data analytics and modeling.

 Download and Installation of R
 Data-reading and Writing
 Data-Overview in R
 Data Manipulations
 Descriptive Statistics
 Data Distribution
 Detection of Outliers
 Data-Transformation
 Homoscedasticity and Heteroscedasticity
 Text Data Pre-processing