Repetitive records BUT different results


  • #10630
    Chloris
    Participant

    Hi Jeremy,

    It’s me again. I’m cleaning the results as a data quality check. However, I found that, for some participants, the records were duplicated (there should be 216 lines, but for some people there are 432, although this is not very frequent), and the tricky thing is that some of the results differ. I would assume that the participant actually did the experiment only once and that this has something to do with data recording, because it’s highly unlikely that she/he spent exactly the same time on every trial when doing them twice.

    Below is part of the erroneous results. RT is in milliseconds.

    ID	group	score	entry	RT	Label	word_id
    
    fr51	A	1	bateau	94.15	block_1	22
    fr51	A	1	bateau	94.15	block_1	22
    fr51	A	6	amour	270.6	block_1	8
    fr51	A	4	amour	270.6	block_1	8
    fr51	A	1	haine	139.6	block_3	119
    fr51	A	7	haine	139.6	block_3	119
    fr51	A	1	larme	122.55	block_3	131
    fr51	A	1	larme	122.55	block_3	131
    fr51	A	7	peur	206.8	block_4	165
    fr51	A	7	peur	206.8	block_4	165
    fr51	A	7	regret	175.3	block_4	180
    fr51	A	7	regret	175.3	block_4	180
    fr51	A	7	tristesse	135.35	block_4	206
    fr51	A	7	tristesse	135.35	block_4	206

    Do you know the possible reason for this? How do I tell which records were actually the “real” ones? How can I keep only the correct records when cleaning the results in R?

    Here’s the link for my experiment:
    https://farm.pcibex.net/r/BcwcIV/

    Many thanks,
    Chloris

    #10636
    Jeremy
    Keymaster

    Hi Chloris,

    The table you include in your message presents the data after it has been transformed, so unless you provide the script or a detailed explanation of how you obtained that table, I can hardly understand how it maps to the raw results of your experiment.

    That being said, judging from your table, it looks like you may have summarized the data in groups defined by the ID column above, i.e. treated the results as if all the lines referencing “fr51” corresponded to the same submission. That, however, is not the case: I find five submissions in the database that reference “fr51”. Four of those five submissions report the same MD5 hash, indicating that they were taken on the same device using the same browser and the same connection; the remaining submission has a different MD5 hash. I don’t know whether that is unexpected given your collection method.

    Jeremy

    #10644
    Chloris
    Participant

    Hi Jeremy,

    Thanks a lot for your reply! My experiment asked participants to rate French words according to certain criteria. I got the table by selecting relevant columns and then selecting certain words (rows) in R:

    # select and rename relevant columns
    tidied_val <- results_val %>%
      filter(Parameter == "Choice" | Value == "Start") %>%
      select(ID, group, entry, Label, Parameter, PennElementName, Value, EventTime, word_id) %>%
      group_by(group, ID, entry) %>%
      mutate(RT = (mean(EventTime[Parameter=="Choice"] - EventTime[Value=="Start"]))/10)  %>%
      ungroup() %>%
      filter(Parameter == "Choice")  %>%
      select(ID, group, Value, entry, RT, Label, word_id) %>%
      rename(score = Value)
    
    # quality check (valence)
    quality_val <- tidied_val %>%
      filter(entry == "amour"| 
               entry == "espoir"| 
               entry == "haine"| 
               entry == "peur"| 
               entry == "tristesse"| 
               entry == "fer"| 
               entry == "sandwich"| 
               
               entry == "cauchemar"|
               entry =="douleur"| 
               entry =="rire"|
               entry =="passion"|
               entry =="jalousie"|
               entry == "racisme"| 
               entry == "anxiété"| 
               entry == "dent"| 
               
               entry == "plaisir"| 
               entry == "souffrance"| 
               entry == "colère"| 
               entry == "désagréable"| 
               entry == "bonheur"|
               entry == "richesse"| 
               entry == "poire"| 
               entry == "moyen"| 
             
               entry == "horreur"| 
               entry == "mensonge"| 
               entry == "confort"| 
               entry == "victoire"|
               entry == "diable"|
               entry == "chagrin"|
               entry == "vertu" |
               entry == "appareil") %>%
      group_by(ID)
    
    
    readr::write_excel_csv(quality_val, "quality_val.csv") 

    The ID variable was acquired by asking participants to enter it themselves. I wondered whether it might be two different participants, one of whom accidentally entered the wrong ID. But why would the two records have the same response time in that case?

    May I ask what you meant by “MD5 hash”? I’m wondering how to distinguish which four records belonged to that single participant…

    Best wishes,
    Chloris

    #10646
    Jeremy
    Keymaster

    Hi Chloris,

    The first columns of each line of the results file are described in the IBEX manual: the first column is the reception time of the submission and the second column is the MD5 hash. Using the two together reliably identifies the rows that come from the same submission.
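
    As a rough illustration of that check, the sketch below counts how many distinct submissions each self-reported ID appears in. The column names Results.reception.time and MD5.hash.of.participant.s.IP.address are the ones used later in this thread; the data frame itself is made-up toy data standing in for results_val, not the actual results.

```r
library(dplyr)

# Toy stand-in for results_val (invented values; only the column names
# come from the thread).
results_val <- data.frame(
  Results.reception.time = c(1001, 1001, 2002, 2002, 3003, 3003),
  MD5.hash.of.participant.s.IP.address = c("aaa", "aaa", "aaa", "aaa", "bbb", "bbb"),
  ID = c("fr51", "fr51", "fr51", "fr51", "fr07", "fr07")
)

# Reduce to one row per submission (reception time + MD5 hash),
# then count how many submissions used each ID.
submissions_per_id <- results_val %>%
  distinct(Results.reception.time, MD5.hash.of.participant.s.IP.address, ID) %>%
  count(ID, name = "n_submissions")

# IDs that were entered for more than one submission.
duplicated_ids <- submissions_per_id %>%
  filter(n_submissions > 1)

print(duplicated_ids)
```

    Any ID listed in duplicated_ids was entered for more than one submission and is worth inspecting by hand.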

    The reason why you get the same RT for different scores is that your code calculates RTs across all the scores: group_by(group, ID, entry) groups the data by group, ID and entry, but not by MD5+ReceptionTime, so a group can contain more than one choice when multiple submissions are associated with the same ID. In those cases, mutate(RT = (mean(EventTime[Parameter=="Choice"] - EventTime[Value=="Start"]))/10) calculates one RT per group, averaged over the multiple choices. Even though you’re adding a column that contains a single RT value per group, your table at that point still contains multiple choices per group, so when you apply filter(Parameter == "Choice") and rename(score = Value) later on, you end up with multiple scores that share the same RT (for those groups that contain multiple choices).

    I don’t know why you got multiple submissions with the same ID: it could be that one participant took the experiment several times, or that they shared their ID with other participants, who in some cases may have taken the experiment on the same browser and device. It looks like this didn’t happen only with fr51; you should double-check your results file yourself.
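
    A small made-up example may make this concrete. The toy data below has two submissions that both entered "fr51" for the word "amour"; the Parameter value "_Trial_" on the Start rows and all the times are assumptions for illustration only, and the division by 10 from the original code is left out.

```r
library(dplyr)

# Two submissions (reception times 1001 and 2002) sharing one ID and entry.
toy <- data.frame(
  Results.reception.time = c(1001, 1001, 2002, 2002),
  ID        = "fr51",
  entry     = "amour",
  Parameter = c("_Trial_", "Choice", "_Trial_", "Choice"),
  Value     = c("Start", "6", "Start", "4"),
  EventTime = c(0, 2000, 50000, 53000)
)

# Grouping without the submission key pools both choices into one group,
# so mean() averages the two RTs (2000 and 3000) into a single value.
pooled <- toy %>%
  group_by(ID, entry) %>%
  mutate(RT = mean(EventTime[Parameter == "Choice"] - EventTime[Value == "Start"])) %>%
  ungroup() %>%
  filter(Parameter == "Choice") %>%
  select(ID, score = Value, entry, RT)

print(pooled)
```

    Both remaining rows carry the averaged RT of 2500 while keeping their own scores (6 and 4), which is the same pattern as the "amour" rows in the table from the first post.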

    Jeremy

    #10647
    Chloris
    Participant

    The same problem also occurred in my other experiment:
    https://farm.pcibex.net/r/Iwjgkg/

    It happened with fr24. It is still not very frequent (only once), but it is perplexing.

    #10648
    Chloris
    Participant

    Thanks for providing the latest tutorial! I referred to this tutorial https://www.linguisticsociety.org/sites/default/files/PCIbex_Tutorial%5B2%5D.pdf when writing my R code…

    Could you suggest a more unequivocal way of calculating the RTs?

    Many thanks,
    Chloris

    #10651
    Jeremy
    Keymaster

    The PDF you link to is indeed outdated. Please follow the latest tutorial instead.

    As far as I can tell, however, your issue does not come from PCIbex, or from the way you calculate RTs per se. It comes from the fact that some values (fr51, fr24, …) were entered on several occasions, for multiple submissions. As I mentioned in my previous message, and as suggested by the IBEX manual, you could group by MD5+ReceptionTime instead of, or in addition to, ID (which, as we found out, does not uniquely identify submissions): group_by(group, Results.reception.time, MD5.hash.of.participant.s.IP.address, ID, entry). But at the end of the day, if you initially assumed that ID should uniquely identify submissions, you should probably try to figure out why that turns out not to be the case.
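
    For reference, here is a minimal sketch of the tidying step with that grouping applied. It runs on invented toy data in place of the real results_val, assumes the Start rows have Parameter "_Trial_", and leaves out the division by 10 from the original code; the submission columns are also kept in the output so that rows sharing an ID can still be told apart.

```r
library(dplyr)

# Toy stand-in for results_val: two submissions that both entered "fr51".
results_val <- data.frame(
  Results.reception.time = rep(c(1001, 2002), each = 2),
  MD5.hash.of.participant.s.IP.address = "aaa",
  ID = "fr51", group = "A", entry = "amour", Label = "block_1", word_id = 8,
  Parameter = rep(c("_Trial_", "Choice"), times = 2),
  Value     = c("Start", "6", "Start", "4"),
  EventTime = c(0, 2000, 50000, 53000)
)

tidied_val <- results_val %>%
  filter(Parameter == "Choice" | Value == "Start") %>%
  # Group by submission (reception time + MD5 hash) in addition to ID,
  # so each group holds exactly one Start and one Choice per entry.
  group_by(group, Results.reception.time,
           MD5.hash.of.participant.s.IP.address, ID, entry) %>%
  mutate(RT = mean(EventTime[Parameter == "Choice"] - EventTime[Value == "Start"])) %>%
  ungroup() %>%
  filter(Parameter == "Choice") %>%
  select(Results.reception.time, MD5.hash.of.participant.s.IP.address,
         ID, group, score = Value, entry, RT, Label, word_id)

print(tidied_val)
```

    With this grouping each choice gets its own RT (2000 and 3000 in the toy data). Deciding which of the submissions to keep for a given ID remains a separate question that the code does not settle.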

    Jeremy
