
Failed to do data mining using R in PDI

Question asked by Pooja saxena on Mar 7, 2018
Latest reply on Mar 15, 2018 by Pooja saxena

Hello,

I installed the EE version of Pentaho and PDI. I want to do a data mining task using R and PDI, so I attached the R environment to PDI by following the wiki page "R script executor - Pentaho Data Integration - Pentaho Wiki".

System Information

1. macOS 10.13

2. Pentaho 8.0

Observations/Comment

1. I could not find the 'jri.dll' library that the wiki page mentions (a .dll is the Windows form of the library). Instead, I added the macOS equivalent, 'libjri.jnilib', to

    /Pentaho/design-tools/data-integration/libswt/osx64/

  I found this library at '/Library/Frameworks/R.framework/Versions/3.4/Resources/library/rJava/jri/libjri.jnilib'

2. I set the following environment variables:

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_152.jdk/Contents/Home/jre
export R_HOME=/Library/Frameworks/R.framework/Resources/
export R_LIBS_USER=/Library/Frameworks/R.framework/Versions/3.4/Resources/library
export PATH=/Library/Frameworks/R.framework/Resources/R:$PATH

3. I followed the "Supervised Learning Demonstration" example from the same wiki page, which loads the randomForest library. I completed every step successfully using the iris dataset and R (randomForest).
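
Before digging into the R script, the setup from steps 1 and 2 can be sanity-checked from a terminal. Below is a minimal sketch of a hypothetical shell function, `check_r_setup` (it is not part of PDI or R), that only inspects the filesystem. It reports whether `JAVA_HOME`, `R_HOME` and `R_LIBS_USER` name existing directories, whether every `PATH` entry is a directory (the shell only looks for commands inside directories, so an entry pointing at a file such as a launcher script contributes nothing), and whether a JRI library file is present in the directory passed as its argument. The file names it looks for (`libjri.jnilib`, `libjri.so`, `jri.dll`) are assumptions about per-platform naming, and it assumes each variable holds a single directory.

```shell
# check_r_setup: hypothetical helper, not part of PDI or R.
# Only inspects the filesystem; it does not start R or PDI.
# Assumes JAVA_HOME, R_HOME and R_LIBS_USER each hold a single directory.
# Usage: check_r_setup <directory PDI loads the JRI library from>
check_r_setup() (
  lib_dir=$1
  problems=0

  # Each variable should be set and name an existing directory.
  for var in JAVA_HOME R_HOME R_LIBS_USER; do
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "WARN: $var is not set"
      problems=$((problems + 1))
    elif [ ! -d "$val" ]; then
      echo "WARN: $var=$val is not a directory"
      problems=$((problems + 1))
    else
      echo "OK:   $var=$val"
    fi
  done

  # Commands are only looked up inside PATH directories, so flag any
  # entry that is a file or does not exist. Globbing is disabled while
  # splitting; the function body runs in a subshell, so nothing leaks.
  set -f
  IFS=:
  for entry in $PATH; do
    if [ -n "$entry" ] && [ ! -d "$entry" ]; then
      echo "WARN: PATH entry $entry is not a directory"
      problems=$((problems + 1))
    fi
  done
  unset IFS

  # Look for the JRI native library under its assumed per-platform names.
  found=
  for name in libjri.jnilib libjri.so jri.dll; do
    if [ -f "$lib_dir/$name" ]; then
      found=$lib_dir/$name
      break
    fi
  done
  if [ -n "$found" ]; then
    echo "OK:   $found"
  else
    echo "WARN: no libjri.* or jri.dll in $lib_dir"
    problems=$((problems + 1))
  fi

  # Exit status is 0 only when nothing was flagged.
  [ "$problems" -eq 0 ]
)
```

For the setup above it would be run as `check_r_setup /Pentaho/design-tools/data-integration/libswt/osx64`. Any `WARN:` line points at a path worth double-checking, and the exit status is non-zero if anything was flagged.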

Error/Question

1. Now I am building a new transformation that runs a function from the caret library in R on a large dataset. To my surprise, the R script fails to execute. The script is:

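### Libraries
library(caret)  # caret must be installed in a library path of the R session PDI uses

nr <- nrow(df)

# Label every row "score" by default
df$dataset <- rep("score", nr)

# Pick roughly 70% of the row indices and label those rows "train"
cdp2 <- createDataPartition(seq_len(nr), times = 1, p = 0.7, list = FALSE)
df$dataset[cdp2] <- "train"

### Return df
df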

The error message is as follows:

2018/03/07 22:04:01 - R Script Executor.0 - ERROR (version 8.0.0.0-28, build 8.0.0.0-28 from 2017-11-05 07.27.50 by buildguy) : Unexpected error
2018/03/07 22:04:01 - R Script Executor.0 - ERROR (version 8.0.0.0-28, build 8.0.0.0-28 from 2017-11-05 07.27.50 by buildguy) : org.pentaho.di.core.exception.KettleException:
2018/03/07 22:04:01 - R Script Executor.0 - There doesnt seem to be any output from the script! Check the script - if all looks OK then this could be due to the random input data that is generated for testing the script/determining output meta data. If necessary, you can manually define the output fields that the script produces.
2018/03/07 22:04:01 - R Script Executor.0 -
2018/03/07 22:04:01 - R Script Executor.0 - at org.pentaho.r.b.a(SourceFile:382)
2018/03/07 22:04:01 - R Script Executor.0 - at org.pentaho.di.trans.steps.rscriptexecutor.a.a(SourceFile:552)
2018/03/07 22:04:01 - R Script Executor.0 - at org.pentaho.di.trans.steps.rscriptexecutor.a.a(SourceFile:358)
2018/03/07 22:04:01 - R Script Executor.0 - at org.pentaho.di.trans.steps.rscriptexecutor.a.a(SourceFile:331)
2018/03/07 22:04:01 - R Script Executor.0 - at org.pentaho.di.trans.steps.rscriptexecutor.a.processRow(SourceFile:246)
2018/03/07 22:04:01 - R Script Executor.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2018/03/07 22:04:01 - R Script Executor.0 - at java.lang.Thread.run(Thread.java:745)
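
The log message itself names two possibilities: a problem in the script, or the random input data PDI generates to test the script and determine the output fields. Since the randomForest example worked, one cheap thing to rule out first is that caret is simply not installed in a library directory the embedded R searches. Below is a minimal sketch of a hypothetical shell helper, `find_r_package` (not part of PDI or R), that looks for `<library>/<package>/DESCRIPTION`, the metadata file every installed R package has. It assumes the relevant library directories are `R_LIBS_USER` and `$R_HOME/library`. It does not load anything, so a caret whose dependencies are missing would still be reported as found; running `Rscript -e '.libPaths(); library(caret)'` from the same shell actually loads the package and its dependencies.

```shell
# find_r_package: hypothetical helper, not part of PDI or R.
# For each package name, print the first directory in a colon-separated
# library list that contains <pkg>/DESCRIPTION. Filesystem heuristic only:
# it does not load the package or check its dependencies.
# Usage: find_r_package "<lib1>:<lib2>" pkg [pkg ...]
find_r_package() (
  lib_paths=$1
  shift
  missing=0
  set -f   # no globbing while splitting the library list (subshell, no leak)
  IFS=:
  for pkg in "$@"; do
    hit=
    for dir in $lib_paths; do
      if [ -n "$dir" ] && [ -f "$dir/$pkg/DESCRIPTION" ]; then
        hit=$dir
        break
      fi
    done
    if [ -n "$hit" ]; then
      echo "FOUND:   $pkg in $hit"
    else
      echo "MISSING: $pkg"
      missing=$((missing + 1))
    fi
  done
  # Exit status is 0 only when every package was found.
  [ "$missing" -eq 0 ]
)
```

With the variables from step 2 exported, `find_r_package "$R_LIBS_USER:$R_HOME/library" caret randomForest` compares the failing package with the one that works.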

I would very much appreciate your help in understanding the root cause of this error.


Thank you,

Pooja
