Auto-translator issue – PennController for IBEX

Tagged: mturk, translate

This topic has 5 replies, 2 voices, and was last updated 5 years, 7 months ago by Jeremy.

September 30, 2019 at 3:35 pm #4239

leaena
Participant

Hi Jeremy,

We (alas! alack!) discovered that one of our participants was using an automatic Google Translate browser extension on our experiment. Obviously this is a problem for us, so we are now brainstorming ways to get around it. I wanted to ask you as well, since you would be the person who could do anything about it on the PennController side.

One option is to take screenshots of each sentence and then display them as images rather than as strings. Not entirely ideal, but doable.

Martin had another idea which was to automatically insert a white character where the space should be, rendering the string untranslatable, or else replace all of the spaces with another character and have PennController recognize that as a space and white it out. I’m not sure how easy that would be but I thought I’d float the idea.

Anything you can tell me about PennController’s ability to detect, stop or otherwise thwart such an extension would be much appreciated.

Thank you!
Leo
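
For illustration, Martin's second idea (replacing every space with a filler character that is then "whited out") could be sketched as below. This is only a rough sketch: `maskSpaces`, the filler character and the colour are hypothetical choices, it assumes a white page background and a plain-text sentence, and whether it actually defeats a given translation extension is untested.

```javascript
// Hypothetical helper (not part of PennController): swaps each space for a
// filler character that is hidden by giving it the background colour.
// Assumes the page background is white and the sentence contains no HTML.
function maskSpaces(sentence, filler = "x", color = "white") {
  const hidden = `<span style="color:${color}">${filler}</span>`;
  return sentence.split(" ").join(hidden);
}

const masked = maskSpaces("The cat sleeps");
```

The result is an HTML string, so it would have to be printed as HTML rather than as literal text. The filler character remains selectable and would show up if the participant copies the sentence.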

September 30, 2019 at 5:22 pm #4241

Jeremy
Keymaster

Hi Leo!

I’m curious, how did you discover that your participant used an automatic translation extension?

I think all the solutions you list are quite ingenious. If you don’t mind showing text as images, as your mention of screenshots seems to suggest, then using the HTML canvas could be a good idea (not to be confused with PennController’s Canvas element). Note that this would mean a constant printing size on the screen, irrespective of the participant’s window size or resolution. You can print text onto the canvas using fillText, and extensions should be blind to it.

I guess you could also use the intrusive character method, but how to best implement it would depend on how the extension works.

I’m not sure there would be a unique, ideal solution to implement as a feature of PennController for future releases. But really, I don’t know how prevalent this problem is. I don’t recall noticing participants using an automatic translation tool (though I could have been fooled over and over again). Hopefully any such participant eventually gets filtered out by my exclusion criteria when it comes to data analysis. I guess getting a sense of how serious a problem it is would help you see whether it’s worth spending the time and effort coming up with and implementing a preventive solution.

Let me know if you need help with the canvas method and I can try to work something out for you.

Jeremy

October 1, 2019 at 5:30 pm #4243

leaena
Participant

Hi Jeremy,

He was not one of the people who were caught by our attention-check exclusion criteria. We discovered that he was using a translator because the code at the end of our experiment was an English word, and the code got translated too, so he was told that his code was invalid. He then emailed me screenshots showing the code in Spanish to prove that he had completed the experiment. He also told me that the sentences had shown in English, but when I tested the experiment with the same Google Translate extension I saw in his screenshots, they were in Spanish. I do worry that this has been happening before without us catching it, since his performance on the task was actually above average. It is kind of a shame that we had to exclude him anyway.

I am interested in trying out the HTML canvas method, since that seems like it could be integrated with the PC template. But I’m not exactly sure how to get started – it seems like the PC HTML element is mostly for forms which are already created. Can I put “raw” HTML into an HTML element in PennController such that I can fill it with item.Sentence?

Thank you very much for your help!

Leo

October 1, 2019 at 6:14 pm #4244

Jeremy
Keymaster

Hi Leo,

I see, it’s actually impressive that participants can perform above average using a translator! Good to know. I guess it would then be worth adding a note asking participants not to use such tools when taking the experiment (some participants might be well-intentioned and not realize that it could be detrimental to data quality).

There is no straightforward implementation of the HTML canvas method, as canvases require you to draw their content with JavaScript code. So you would need to add a canvas element onto the page somehow and then fill it using some JavaScript code. You can pass HTML to basically any string of a PennController element that ends up being printed, for example Text elements. And you can use newFunction to create (and then call) a JavaScript function. So here is a basic example (assuming you are inside a Template function, using row.Sentence to refer to the Sentence cell):
```
newText("canvasText", '<canvas id="myCanvas"></canvas>')
  .print()
,
newFunction(()=>{
  // Available width, taken from the container of the Text element printed above
  let width = $(".PennController-canvasText-container").width();
  let cvs = document.getElementById("myCanvas");
  let ctx = cvs.getContext("2d");
  ctx.font = '36px serif';
  let words = row.Sentence.split(' '), lines = [], line = [];
  // Greedy wrapping: add words to the current line until it overflows
  while (words.length){
    line.push(words.shift());
    // Only wrap when the line holds more than one word, so that a single
    // over-long word cannot cause an endless loop
    if (line.length > 1 && ctx.measureText(line.join(' ')).width >= width){
      words = [line.pop(), ...words];
      lines.push(line);
      line = [];
    }
  }
  if (line.length)
    lines.push(line);
  cvs.width = width;
  cvs.height = 40 * lines.length;
  ctx.font = '36px serif';  // Resizing the canvas resets the context, so set the font again
  for (let n = 0; n < lines.length; n++)
    ctx.fillText(lines[n].join(' '), 1, 38*(n+1));
}).call()
```
As you can see it's clearly not as elegant and concise as newText("blablabla").print(): because you're printing the text as an image, you have to manually calculate sizes and print each line, something browsers automatically do for you when the text is printed normally onto the page. This code is also very basic: it calculates sizes and positions based on what the page looks like at the exact moment when the Function is called, so it won't automatically adjust to any resizing after that. But at least it should be functional and should prevent automatic translation by extensions (again, the text is just an image, so the extension would need a neural network or something to read it).
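
The line-wrapping step in the code above can also be pulled out into a small pure function, which makes it easier to try out without a browser. This is only a sketch: `wrapWords` and its `measure` parameter are hypothetical names, not part of PennController. In the canvas setting, `measure` would be something like `s => ctx.measureText(s).width`.

```javascript
// Hypothetical helper, not part of PennController.
// `measure` is any function that returns the rendered width of a string.
function wrapWords(sentence, maxWidth, measure) {
  const lines = [];
  let line = [];
  for (const word of sentence.split(" ")) {
    line.push(word);
    // On overflow, move the last word to a fresh line, unless it is alone:
    // a single over-long word is kept on its own line.
    if (line.length > 1 && measure(line.join(" ")) > maxWidth) {
      line.pop();
      lines.push(line.join(" "));
      line = [word];
    }
  }
  if (line.length) lines.push(line.join(" "));
  return lines;
}

// Usage with a stand-in measure: one unit of width per character.
const demo = wrapWords("the quick brown fox", 10, s => s.length);
```

Each returned string would then be drawn with one fillText call per line. Re-running the function from a window resize handler would be one way to handle resizing, at the cost of clearing and redrawing the canvas each time.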

Let me know if you have questions.

Jeremy

November 26, 2019 at 1:43 pm #4646

leaena
Participant

Hi Jeremy,

I thought I should update you on this in case you are interested. We eventually came up with a Python script that inserts HTML in between each letter and either changes the color to slightly-off-black-but-undetectably-so, or inserts formatting that doesn’t show up (e.g. <span>). This breaks the auto-translation plugin. It doesn’t prevent copying and pasting or image recognition, but we’ve deemed it good enough for our purposes for the time being.

Best,
Leo

November 26, 2019 at 2:17 pm #4647

Jeremy
Keymaster

Hi Leo,

Thank you, it’s good to know that the method you describe is effective against auto-translation! And since it is, maybe your Python script can be ported to JavaScript, e.g. like this:
```
function insertHtmlTags(s) {
  if (typeof(s) != "string" || s.length == 0) return s;
  // Insert an empty, invisible tag inside each pair of consecutive characters
  return s.replace(/(.)(.)/g, '$1<span></span>$2');
}

PennController(
    newText( insertHtmlTags("Hello world!") )
        .print()
    ,
    newButton("OK")
        .print()
        .wait()
)
```
Does this also break the auto-translation plugin?

Jeremy
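
One detail about the snippet above: the pattern /(.)(.)/g consumes characters two at a time, so a replacement built on it only touches every other gap between characters. If the plugin turns out to need a tag in every gap, a variant along the following lines might work. It is a sketch only: `insertBetweenAll` is a hypothetical name, the empty `<span></span>` default is an assumption based on Leo's description, and the input is assumed to be plain text without any markup of its own.

```javascript
// Hypothetical variant: puts `tag` into every gap between two characters.
// Assumes the input is plain text with no HTML markup or entities of its own.
function insertBetweenAll(s, tag = "<span></span>") {
  if (typeof s !== "string" || s.length < 2) return s;
  // Array.from splits by code point, so surrogate pairs (e.g. emoji) stay intact.
  return Array.from(s).join(tag);
}

const tagged = insertBetweenAll("abc");
```

It can be dropped in wherever insertHtmlTags is called. Whether it also breaks the auto-translation plugin would need to be checked against the plugin itself.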