Pre-processing self-paced reading data in R
- This topic has 4 replies, 3 voices, and was last updated 3 years, 4 months ago by apspj.
July 27, 2020 at 10:24 pm #5873 rosa303 (Participant)
Hi Jeremy,
I’ve written a script for a self-paced reading experiment, and I want to make sure I can analyze the output before I recruit participants. Right now I’m struggling with writing an R script that can process my results file. Is there an existing R script or template I can adapt to pre-process PCIbex results for a self-paced reading study? I’ve read the documentation for “data analysis in R,” but I’m still having difficulty, for example, removing outliers based on criteria like:
- remove participants with an accuracy rate below 75% on the comprehension Qs for the experimental items
- remove items with incorrect responses to comprehension Qs
- remove RTs exceeding a threshold of 3000 ms

This might be more related to R coding than to PCIbex, but I wasn’t sure where else to look for help, and it’d be really helpful if you could refer me to any sources. Thank you!
– Rosa
July 28, 2020 at 11:33 am #5875 Jeremy (Keymaster)
Hi Rosa,
EDIT: well, I read your message too fast and didn’t realize you were asking about self-paced reading specifically. I’d be happy to adapt the example in this message to self-paced reading trials if it helps.
For this example, I’ll be working from an extremely minimal trial structure:
newTrial( "experimental" ,
    newScale("comprehensionanswer", "Yes", "No")
        .print()
        .wait()
        .log()
)
.log("id", GetURLParameter("id"))
.log("correct", "Yes")
.log("itemnumber" , 1 )
I’m assuming all experimental trials are labeled experimental and that itemnumber uniquely identifies your trials. Let’s first load the results in a table:
results <- read.pcibex( "results_comprehension" )
We’ll be comparing Value and correct a lot, so we’ll de-factorize those columns:
results$Value <- as.character(results$Value)
results$correct <- as.character(results$correct)
Now let’s load dplyr and do our magic:
library("dplyr")
results <- results %>%
  group_by(id) %>%
  mutate(accuracy = mean(
    Value[Label=="experimental" & Parameter=="Choice"] ==
      correct[Label=="experimental" & Parameter=="Choice"]
  )) %>%
  group_by(id, itemnumber) %>%
  mutate(RT = EventTime[Parameter=="Choice"] -
              EventTime[Parameter=="_Trial_" & Value=="Start"])
The first mutate compares Value against correct for the rows of the experimental trials where Parameter is “Choice” (= rows reporting which option was selected on the scale) and outputs the mean for each participant (see group_by(id)).

The second mutate simply subtracts the EventTime corresponding to the start of the trial from the EventTime corresponding to the choice on the scale, for each trial for each participant (see group_by(id,itemnumber)).

Now that we have added the accuracy column, which reports the proportion of correct answers to the experimental trials for each participant, and the RT column, which reports how long they took to make a decision for each trial, we can proceed to the filtering:
results_filtered <- results %>% filter(Label=="experimental" & accuracy>=3/4 & Value==correct & RT<=3000)
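To try the steps above before collecting any data, here is a minimal sketch that runs them on a hand-made table instead of a real results file. The participant ids (p1, p2) and all values are made up for illustration; only the column names are taken from the example above, and a real table returned by read.pcibex will contain more columns and rows.

```r
library(dplyr)

# Toy table (hypothetical data): two participants, two items each,
# with one "_Trial_"/"Start" row and one "Choice" row per trial.
results <- data.frame(
  id         = rep(c("p1", "p2"), each = 4),
  Label      = "experimental",
  itemnumber = rep(c(1, 1, 2, 2), times = 2),
  Parameter  = rep(c("_Trial_", "Choice"), times = 4),
  Value      = c("Start", "Yes", "Start", "No", "Start", "No", "Start", "No"),
  correct    = rep(c("Yes", "Yes", "No", "No"), times = 2),
  EventTime  = c(1000, 2500, 5000, 9000, 1000, 2000, 4000, 5500),
  stringsAsFactors = FALSE
)

results <- results %>%
  # Per-participant accuracy on the experimental trials
  group_by(id) %>%
  mutate(accuracy = mean(
    Value[Label == "experimental" & Parameter == "Choice"] ==
      correct[Label == "experimental" & Parameter == "Choice"]
  )) %>%
  # Per-trial response time: choice timestamp minus trial start timestamp
  group_by(id, itemnumber) %>%
  mutate(RT = EventTime[Parameter == "Choice"] -
              EventTime[Parameter == "_Trial_" & Value == "Start"])

# Keep correct answers from accurate participants with RTs up to 3000 ms
results_filtered <- results %>%
  filter(Label == "experimental" & accuracy >= 3/4 & Value == correct & RT <= 3000)

print(results_filtered)
```

In this toy table, p2 answers one of two items correctly (accuracy 0.5) and is dropped entirely, and p1's second item is dropped because its RT is 4000 ms, so only p1's answer to item 1 remains.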
Let me know if you have questions
Jeremy
August 24, 2021 at 9:14 am #7184 apspj (Participant)
Hi, Jeremy.
I am trying to clean some demo-results from this demo-experiment: https://farm.pcibex.net/r/dTlroq/
However, when I clean the results file using the function read.pcibex, some columns get misplaced. It is as if there were an additional column label and all of the column labels were pushed to the right. For instance, the column “Value” displays the event time, and the column “Event Time” that follows “Value” displays the type of verb that I logged in the script, which is supposed to be in the following column, “verb”. I tried to paste a screenshot of the table here, but it does not work.

Thanks a lot,
Ana.
2 min later EDIT: I guess I found the mistake. I created columns with .log in my script only for the experimental items and not for the fillers, so, in the end, the columns of the filler rows are pushed towards the columns “verbo” and “número”, which are supposed to be empty. The next question is: how do I avoid this? I don’t need such information for the fillers, just for the experimental items.

Thanks!
August 24, 2021 at 10:54 am #7186 Jeremy (Keymaster)
Hi Ana,
There is no one-size-fits-all answer to your question. What read.pcibex does is find the rows with the most columns and use those for the whole table.

It’s pretty easy to subset a data frame in R and delete a column, then rename the remaining ones:
df <- data.frame(colInts=c(1,2,3), colLetters=c(NA,NA,NA), colExtra=c('a','b','c'), colBools=c(TRUE,FALSE,TRUE), colRats=c(1/1,1/2,1/3))
cols <- names(df)
df[,2] <- NULL
cols <- cols[c(seq(1,2),seq(4,5))]
names(df) <- cols
The value of df after the first line:

  colInts colLetters colExtra colBools   colRats
1       1         NA        a     TRUE 1.0000000
2       2         NA        b    FALSE 0.5000000
3       3         NA        c     TRUE 0.3333333

The value of df after the last line:

  colInts colLetters colBools   colRats
1       1          a     TRUE 1.0000000
2       2          b    FALSE 0.5000000
3       3          c     TRUE 0.3333333
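As a possible variation, the same clean-up can be written by column name rather than by position, which may be easier to adapt when the number of logged columns changes. This is only a sketch on the same toy data frame; the column names are the ones from the example, not names from an actual results file.

```r
# Same toy data frame as in the example above
df <- data.frame(colInts = c(1, 2, 3),
                 colLetters = c(NA, NA, NA),
                 colExtra = c("a", "b", "c"),
                 colBools = c(TRUE, FALSE, TRUE),
                 colRats = c(1/1, 1/2, 1/3))

# Drop the empty column by name instead of by index
df$colLetters <- NULL

# Rename the column that held the shifted values
names(df)[names(df) == "colExtra"] <- "colLetters"

print(df)
```

The resulting data frame has the four columns colInts, colLetters, colBools and colRats, with the letters now stored under colLetters.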
Jeremy
August 24, 2021 at 10:59 am #7187 apspj (Participant)
Hi, Jeremy.
I thought it could be some problem with the script. But, yes, I have to play with it a bit in R.
Thanks a lot,
Ana.