Thursday, April 5, 2012

Decision Trees in R

Classification and Regression Trees (CART)
R packages: tree and rpart
Sample code:
library("tree")
tree.result <- tree(Y~., data=mydata)
summary(tree.result)
plot(tree.result); text(tree.result)
library("rpart")
fit <- rpart(Y~., data=mydata, method="class", control=rpart.control(minsplit=100, cp=0.001)) # method belongs to rpart(), not rpart.control()
printcp(fit) # display the results
plotcp(fit) # visualize cross-validation results
summary(fit) # detailed summary of splits
rsq.rpart(fit) # two plots: R-squared and relative error vs. number of splits (intended for regression trees, method="anova")
summary(predict(fit, mydata, type="class"))
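
The cross-validation table shown by printcp() is usually what you use to prune the tree. Here is a minimal sketch of that step. It assumes the built-in iris data as a stand-in for mydata (with Species playing the role of Y) and a smaller minsplit, since iris has only 150 rows; these choices are illustrative, not part of the original example.

```r
library(rpart)

set.seed(1)  # the cross-validation inside rpart() is random

# Classification tree on iris (assumed stand-in for mydata)
fit <- rpart(Species ~ ., data = iris, method = "class",
             control = rpart.control(minsplit = 20, cp = 0.001))

# fit$cptable holds the same numbers that printcp(fit) displays
cp.table <- fit$cptable
best.cp <- cp.table[which.min(cp.table[, "xerror"]), "CP"]

# Prune back to the subtree with the lowest cross-validated error
pruned <- prune(fit, cp = best.cp)

# Accuracy on the training data (optimistic, no hold-out set here)
pred <- predict(pruned, iris, type = "class")
accuracy <- mean(pred == iris$Species)
print(accuracy)
```

Picking the cp with the smallest xerror is one common rule; the "one standard error" rule, which takes the simplest tree within one xstd of that minimum, is another.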

Bagging with Regression Trees (BRT)
R package: ipred
Sample code:
library(ipred)
bag.result <- bagging(Y~., data=mydata, nbagg=30)
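
To see how well the bagged trees do, ipred can report an out-of-bag error estimate when you set coob=TRUE. A small sketch, again assuming the built-in iris data in place of mydata (so this is bagged classification trees rather than regression trees):

```r
library(ipred)

set.seed(1)  # bootstrap samples are random

# 30 bagged trees, with the out-of-bag error estimate switched on
bag.fit <- bagging(Species ~ ., data = iris, nbagg = 30, coob = TRUE)

# Out-of-bag misclassification error (root mean squared error for regression)
oob.error <- bag.fit$err
print(oob.error)

# Predicted classes, aggregated over the trees by majority vote
pred <- predict(bag.fit, newdata = iris)
print(table(observed = iris$Species, predicted = pred))
```

The out-of-bag error is computed from observations each tree did not see, so it is a fairer estimate than the table on the training data printed at the end.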

Random Forest (RF)
R package: randomForest
library("randomForest")
rf <- randomForest(Y~., data=training.data, ntree=100, mtry=40) # mtry must not exceed the number of predictors
pred <- predict(rf, test.data)
cmatrix <- table(observed=test.data[, "Y"], predicted=pred) # Confusion matrix
# pred <- predict(rf, test.data, type="prob") # class probabilities instead of labels
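
The snippet above assumes training.data and test.data already exist. Here is a self-contained sketch of the whole round trip, using the built-in iris data with a random 100/50 split. The split sizes, the seed and mtry=2 are my assumptions for illustration (iris has only 4 predictors, so mtry=40 would not be valid there).

```r
library(randomForest)

set.seed(1)  # the split and the forest are both random

# Random 100/50 train/test split of iris (assumed stand-in data)
idx <- sample(nrow(iris), 100)
training.data <- iris[idx, ]
test.data <- iris[-idx, ]

rf <- randomForest(Species ~ ., data = training.data, ntree = 100, mtry = 2)

# Confusion matrix and overall accuracy on the held-out rows
pred <- predict(rf, test.data)
cmatrix <- table(observed = test.data[, "Species"], predicted = pred)
accuracy <- sum(diag(cmatrix)) / sum(cmatrix)
print(cmatrix)
print(accuracy)

# Variable importance (mean decrease in Gini index)
print(importance(rf))
```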

Random Fields in R

Package 'RandomFields'
http://cran.r-project.org/web/packages/RandomFields/RandomFields.pdf
Package 'CompRandFld'
http://cran.r-project.org/web/packages/CompRandFld/CompRandFld.pdf

A Trick to Better Search R in Google

Include [R] (with the square brackets) instead of a bare R as one of the keywords.

In addition, from within R, the sos package can search the help pages of contributed packages:
library("sos")
findFn("string")
Other helpful websites:

Great News for Lazy Bones Who Are Addicted to R

One package: Rcmdr

Now cluster analysis is just a few mouse clicks away...