Repetitive records BUT different results
Tagged: Data Collection, data repetition
- This topic has 6 replies, 2 voices, and was last updated 1 week, 1 day ago by Jeremy.
May 26, 2023 at 8:05 am #10630
It’s me again. I’m cleaning the results as a data quality check. I found that, for some participants, the records were duplicated (there should be 216 lines, but some people have 432, although this is not very frequent), and the tricky thing is that some of the results differ. I would assume that the participant actually did the experiment only once and that this has something to do with data recording, because it’s highly unlikely that someone would spend exactly the same time on every trial when doing the experiment twice.
Below is part of the erroneous results. RT is in milliseconds.
ID    group  score  entry      RT      Label    word_id
fr51  A      1      bateau     94.15   block_1  22
fr51  A      1      bateau     94.15   block_1  22
fr51  A      6      amour      270.6   block_1  8
fr51  A      4      amour      270.6   block_1  8
fr51  A      1      haine      139.6   block_3  119
fr51  A      7      haine      139.6   block_3  119
fr51  A      1      larme      122.55  block_3  131
fr51  A      1      larme      122.55  block_3  131
fr51  A      7      peur       206.8   block_4  165
fr51  A      7      peur       206.8   block_4  165
fr51  A      7      regret     175.3   block_4  180
fr51  A      7      regret     175.3   block_4  180
fr51  A      7      tristesse  135.35  block_4  206
fr51  A      7      tristesse  135.35  block_4  206
Do you know the possible reason for this? How can I tell which records are the “real” ones? And how can I keep only the correct records when cleaning the results in R?
Here’s the link for my experiment:
Chloris
May 30, 2023 at 8:18 am #10636
The table you include in your message presents the data after it has been transformed, so unless you provide the script or a detailed explanation of how you obtained that table, I can hardly understand how it maps to the raw results of your experiment.
That being said, judging from your table, it looks like you may have summarized the data in groups as defined by the ID column above, i.e. treated the results as if all the lines referencing “fr51” corresponded to the same submission. That, however, is not the case: I find five submissions in the database that reference “fr51”. Four of those five submissions report the same MD5 hash, indicating that they were taken on the same device using the same browser and the same connection; the remaining submission has a different MD5 hash. I don’t know whether that is unexpected given your collection method.
Jeremy
May 31, 2023 at 3:48 am #10644
Thanks a lot for your reply! My experiment asked participants to rate French words according to certain criteria. I got the table by selecting relevant columns and then selecting certain words (rows) in R:
# select and rename relevant columns
tidied_val <- results_val %>%
  filter(Parameter == "Choice" | Value == "Start") %>%
  select(ID, group, entry, Label, Parameter, PennElementName, Value, EventTime, word_id) %>%
  group_by(group, ID, entry) %>%
  mutate(RT = (mean(EventTime[Parameter=="Choice"] - EventTime[Value=="Start"]))/10) %>%
  ungroup() %>%
  filter(Parameter == "Choice") %>%
  select(ID, group, Value, entry, RT, Label, word_id) %>%
  rename(score = Value)

# quality check (valence)
quality_val <- tidied_val %>%
  filter(entry == "amour"| entry == "espoir"| entry == "haine"| entry == "peur"|
         entry == "tristesse"| entry == "fer"| entry == "sandwich"| entry == "cauchemar"|
         entry =="douleur"| entry =="rire"| entry =="passion"| entry =="jalousie"|
         entry == "racisme"| entry == "anxiété"| entry == "dent"| entry == "plaisir"|
         entry == "souffrance"| entry == "colère"| entry == "désagréable"| entry == "bonheur"|
         entry == "richesse"| entry == "poire"| entry == "moyen"| entry == "horreur"|
         entry == "mensonge"| entry == "confort"| entry == "victoire"| entry == "diable"|
         entry == "chagrin"| entry == "vertu" | entry == "appareil") %>%
  group_by(ID)

readr::write_excel_csv(quality_val, "quality_val.csv")
The ID variable was acquired by asking participants to enter it themselves. I wondered whether it might be two different participants, one of whom accidentally entered the wrong ID. But why would the two records have the same response time in that case?
May I ask what you were referring to by “MD5 hash”? I’m wondering how to tell which four records belong to that single participant…
Chloris
May 31, 2023 at 5:26 am #10646
The first columns of each line of the results file are described in the IBEX manual: the first column is the reception time of the submission and the second column is the MD5 hash; using the two together reliably identifies the rows that come from the same submission.
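For illustration, here is a minimal sketch of how one might list the IDs that were entered for more than one submission. It assumes the results were read into R with the column names Results.reception.time and MD5.hash.of.participant.s.IP.address; the data frame name results_toy and every value in it are invented.

```r
library(dplyr)

# Toy stand-in for the raw results (all values invented for illustration).
results_toy <- data.frame(
  Results.reception.time = c(1685000001, 1685000001, 1685003600, 1685003600, 1685007200),
  MD5.hash.of.participant.s.IP.address = c("hashA", "hashA", "hashA", "hashA", "hashB"),
  ID = c("fr51", "fr51", "fr51", "fr51", "fr24")
)

# One row per submission: reception time + MD5 hash together identify a submission.
submissions <- results_toy %>%
  distinct(ID, Results.reception.time, MD5.hash.of.participant.s.IP.address)

# IDs that appear in more than one submission.
repeated_ids <- submissions %>%
  count(ID, name = "n_submissions") %>%
  filter(n_submissions > 1)

print(repeated_ids)
```

On the real results file, the same two steps would be applied to the full data frame instead of results_toy.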
The reason why you get the same RT for different scores is that your code calculates RTs across all the scores:
group_by(group, ID, entry)
groups the data by group, ID and entry, but not by MD5+ReceptionTime, so the groups might contain more than one choice (in case of multiple submissions being associated with the same ID). In those cases where your groups contain several choices,
mutate(RT = (mean(EventTime[Parameter=="Choice"] - EventTime[Value=="Start"]))/10)
will calculate one RT per group spanning multiple choices. Even though you’re adding a column that contains a single RT value per group, your table at that point still contains multiple choices per group, so when you do
filter(Parameter == "Choice") and rename(score = Value)
later on, you end up with multiple scores (for those groups that contain multiple ones).
I don’t know why you got multiple submissions with the same ID: it could be that one participant took the experiment several times, or that they shared their ID with other participants who could have taken the experiment on the same browser+device in some cases. It looks like that didn’t just happen with fr51; you should double-check your results file yourself.
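To make the effect concrete, here is a small reproduction on invented data: the ID fr51 is attached to two submissions, and grouping by ID and entry pools them. The Parameter value "_Trial_" on the Start rows and all times and scores are placeholders chosen for the example, and the division by 10 is left out for readability.

```r
library(dplyr)

# Toy data: the same ID was entered for two separate submissions (values invented).
toy <- data.frame(
  Results.reception.time = rep(c(1001, 2002), each = 2),
  ID        = "fr51",
  entry     = "amour",
  Parameter = rep(c("_Trial_", "Choice"), times = 2),  # placeholder Parameter for the Start rows
  Value     = c("Start", "6", "Start", "4"),
  EventTime = c(0, 2000, 10000, 13000)
)

# Grouping without the submission keys: both submissions fall into one group,
# so mean() averages the two differences (2000 and 3000) into a single RT.
pooled <- toy %>%
  group_by(ID, entry) %>%
  mutate(RT = mean(EventTime[Parameter == "Choice"] - EventTime[Value == "Start"])) %>%
  ungroup() %>%
  filter(Parameter == "Choice") %>%
  select(ID, entry, score = Value, RT)

print(pooled)
```

Both rows of pooled keep their own score but share one averaged RT, which is the pattern seen in the table from the first post.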
Jeremy
May 31, 2023 at 7:54 am #10647
The same problem also occurred in my other experiment:
It happened to fr24. Still not very frequent (only once), but perplexing.
May 31, 2023 at 8:07 am #10648
Thanks for providing the latest tutorial! I referred to this tutorial https://www.linguisticsociety.org/sites/default/files/PCIbex_Tutorial%5B2%5D.pdf when writing my R code…
Could you suggest a more unequivocal way of calculating the RTs?
Chloris
May 31, 2023 at 10:51 am #10651
The PDF you link to is indeed outdated. Please follow the latest tutorial instead.
As far as I can tell, however, your issue does not come from PCIbex, or from the way you calculate RTs per se. It comes from the fact that some values (fr51, fr24, …) were entered on several occasions, for multiple submissions. As I mentioned in my previous message, and as suggested by the IBEX manual, you could group by MD5+ReceptionTime instead of, or in addition to, ID (which turned out to not uniquely identify submissions, as we found out):
group_by(group, Results.reception.time, MD5.hash.of.participant.s.IP.address, ID, entry)
But at the end of the day, if you initially assumed that ID should uniquely identify submissions, you should probably try to figure out why that turns out not to be the case.
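As a rough sketch of that grouping, applied to invented data in which fr51 is attached to two submissions: adding the reception time and the MD5 hash to group_by() gives each submission its own RT. The last step, keeping only the earliest submission per ID, is just one possible cleaning rule and is an assumption made for this example; whether it is appropriate depends on why the ID was reused. The Parameter value "_Trial_" and all values are placeholders, and the division by 10 is left out.

```r
library(dplyr)

# Toy data: the same ID was entered for two separate submissions (values invented).
toy <- data.frame(
  Results.reception.time = rep(c(1001, 2002), each = 2),
  MD5.hash.of.participant.s.IP.address = "hashA",
  ID        = "fr51",
  entry     = "amour",
  Parameter = rep(c("_Trial_", "Choice"), times = 2),  # placeholder Parameter for the Start rows
  Value     = c("Start", "6", "Start", "4"),
  EventTime = c(0, 2000, 10000, 13000)
)

# Group by submission (reception time + MD5 hash) in addition to ID and entry,
# so each group holds exactly one Start row and one Choice row.
per_submission <- toy %>%
  group_by(Results.reception.time, MD5.hash.of.participant.s.IP.address, ID, entry) %>%
  mutate(RT = mean(EventTime[Parameter == "Choice"] - EventTime[Value == "Start"])) %>%
  ungroup() %>%
  filter(Parameter == "Choice") %>%
  select(Results.reception.time, ID, entry, score = Value, RT)

# Assumed cleaning rule (hypothetical): keep only the earliest submission per ID.
first_only <- per_submission %>%
  group_by(ID) %>%
  filter(Results.reception.time == min(Results.reception.time)) %>%
  ungroup()

print(per_submission)
print(first_only)
```

Here per_submission has one row per submission, each with its own RT, and first_only retains the row from the earlier reception time.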