Must Know Packages For a Successful Data Scientist

Packages For Data Manipulation

xlsx: reads and writes Excel files
foreign: reads and writes SAS and SPSS files
XML: reads and writes XML files
JSON packages (such as rjson or jsonlite): read and write JSON files
moments: computes skewness and kurtosis
httr: a set of useful tools for working with HTTP connections
ggplot2: data visualization
lubridate: works with dates, date-times, and time spans, for example converting dd/mm/yy to yy/mm/dd
dplyr: a fast, consistent tool for manipulating data frames in R
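
The date conversion mentioned for lubridate can be sketched as follows. This is a minimal illustration, assuming lubridate is installed; the sample dates and variable names are made up for the example.

```r
# Assumes the lubridate package is installed: install.packages("lubridate")
library(lubridate)

# Hypothetical input: dates written as dd/mm/yy
x <- c("25/12/21", "01/02/22")

d <- dmy(x)                      # parse day/month/year strings into Date objects
out <- format(d, "%y/%m/%d")     # write them back out as yy/mm/dd
print(out)

# Time span between the two dates, in days
span <- as.numeric(d[2] - d[1])
print(span)
```

Parsing into a Date first and formatting afterwards keeps the two steps independent, so the same parsed values can be reused for date arithmetic.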

Packages for Imputation

HotDeckImputation: resolves missing data using hot deck imputation

yaImpute: performs nearest neighbour-based imputation using one or more alternative approaches to process multivariate data

mvnmle: finds the maximum likelihood estimate of the mean vector and variance-covariance matrix for multivariate normal data with missing values

mice: multiple imputation using Fully Conditional Specification (FCS), implemented by the MICE algorithm

lattice: a powerful, high-level data visualization system with an emphasis on multivariate data. It is sufficient for typical graphics needs and flexible enough to handle most non-standard requirements.

Packages for K-means

plyr: breaks a big problem down into pieces, operates on each piece, and then puts all the pieces back together

animation: provides functions for animations in statistics, covering probability theory, mathematical, multivariate, nonparametric and computational statistics, sampling surveys, linear models, time series, data mining and machine learning

kselection: selection of the number of clusters k for k-means clustering

doParallel: provides a parallel backend for the %dopar% operator (from the foreach package) using the parallel package

cluster: finding groups in data (cluster analysis)
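
As a rough sketch of how these pieces fit together, the example below runs k-means for several values of k and compares them by average silhouette width from the cluster package. The built-in iris data, the range of k, and the silhouette criterion are assumptions chosen for illustration; this is not the method used by kselection.

```r
# Assumes the 'cluster' package is available (it is a recommended package
# that ships with standard R distributions).
library(cluster)

set.seed(42)                      # k-means starts from random centers
x <- scale(iris[, 1:4])           # standardize the four numeric columns
d <- dist(x)                      # pairwise Euclidean distances

# Fit k-means for k = 2..6 and record the average silhouette width for each k
ks <- 2:6
avg_sil <- sapply(ks, function(k) {
  fit <- kmeans(x, centers = k, nstart = 25)
  sil <- silhouette(fit$cluster, d)
  mean(sil[, "sil_width"])
})

best_k <- ks[which.max(avg_sil)]  # k with the highest average silhouette
print(data.frame(k = ks, avg_silhouette = round(avg_sil, 3)))
print(best_k)
```

A higher average silhouette width suggests better separated clusters, but it is only one of several criteria for choosing k.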

Packages for KNN

class: various functions for classification, including k-nearest neighbour, learning vector quantization and self-organizing maps

gmodels: various R programming tools for model fitting
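
A minimal k-nearest-neighbour sketch with the class package might look like this. The iris data, the 100/50 train/test split, and k = 5 are assumptions made for the example. The confusion matrix uses base R's table(); CrossTable() from gmodels would give a more detailed version.

```r
# Assumes the 'class' package is available (a recommended package that
# ships with standard R distributions).
library(class)

set.seed(1)
idx <- sample(nrow(iris), 100)    # 100 rows for training, the other 50 for testing

train_x <- iris[idx, 1:4]
test_x  <- iris[-idx, 1:4]
train_y <- iris$Species[idx]
test_y  <- iris$Species[-idx]

# Classify each test row by majority vote among its 5 nearest training rows
pred <- knn(train = train_x, test = test_x, cl = train_y, k = 5)

# Confusion matrix and overall accuracy
print(table(predicted = pred, actual = test_y))
accuracy <- mean(pred == test_y)
print(accuracy)
```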

Packages for Linear Regression

lattice: a powerful, high-level data visualization system with an emphasis on multivariate data. It is sufficient for typical graphics needs and flexible enough to handle most non-standard requirements.

car: functions and datasets to accompany "An R Companion to Applied Regression"
cor2pcor (from the corpcor package): computes partial correlations from a correlation matrix
MASS: functions and datasets to support "Modern Applied Statistics with S"
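
The sketch below fits a linear regression on the Boston housing data from MASS and then computes one partial correlation by hand. The choice of data set and of the predictors lstat and rm is an assumption for illustration. The partial correlation is computed from regression residuals in base R rather than with cor2pcor.

```r
# Assumes the 'MASS' package is available (a recommended package that
# ships with standard R distributions).
library(MASS)                     # provides the Boston housing data set

# Median home value explained by % lower-status population and rooms per dwelling
fit <- lm(medv ~ lstat + rm, data = Boston)
print(summary(fit)$coefficients)

# Partial correlation of medv and rm, controlling for lstat:
# correlate the residuals of two auxiliary regressions on lstat
r_medv <- resid(lm(medv ~ lstat, data = Boston))
r_rm   <- resid(lm(rm ~ lstat, data = Boston))
partial_cor <- cor(r_medv, r_rm)
print(partial_cor)
```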

Packages for Naive Bayes

e1071: functions for latent class analysis, fuzzy clustering, short-time Fourier transform, support vector machines, shortest path computation, bagged clustering, and the naive Bayes classifier

gmodels: various R programming tools for model fitting

Packages for Text Mining

rJava: low-level interface to the Java VM, similar to .C/.Call. It allows creating objects, calling methods and accessing fields.

tm: a framework for text mining applications within R

SnowballC: collapses words to a common root (stemming) to aid comparison of vocabulary. It currently supports Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish and Turkish.

wordcloud: draws word clouds, a visual display of word frequencies

RWeka: an R interface to Weka, a collection of machine learning algorithms for data mining tasks written in Java, containing tools for data pre-processing, visualization, association rules, classification, regression and clustering

igraph: routines for simple graph and network analysis. It handles large graphs and provides functions for generating random and regular graphs, graph visualization, and centrality methods.

qdap: automates many of the tasks associated with quantitative discourse analysis of transcripts, and provides parsing tools for preparing transcript data

maptpx: posterior maximization for topic models (LDA) in text analysis

Packages for SVM and Neural Networks

kernlab: kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. kernlab includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.

neuralnet: trains neural networks using backpropagation or resilient backpropagation, and allows flexible settings through a custom choice of error and activation function

Packages for Twitter

twitteR: provides an interface to the Twitter web API

base64enc: provides tools for handling base64 encoding. It is more flexible than the orphaned base64 package.

httpuv: provides protocol support for handling HTTP and WebSocket requests directly from R. It is a building block for other packages.
 

 

 
