How to assess interpreting

Use new evidence of learning to replace old (Photo credit: dkuropatwa)

This is not my first and surely not my last post on assessment. If you’re looking for the other posts just type “assessment” in the search box to the right. Last Friday (March 15, 2013) I gave a talk on process and expertise research in the Nordic countries at the conference “Le Nord en français” at the University of Mons (one of my alma maters, actually). I also presented the results of my PhD project. All this in 20 minutes, so you can imagine I didn’t have the time to be very thorough.

One of the questions that came up was how I actually went about doing my assessment, and why I chose this particular methodology and not others. As I didn’t really get round to going through my assessments thoroughly, I thought I’d try to do it here. Thanks for the discussion Cédéric, if you stop by and read the post, don’t hesitate to comment or ask more questions.

When I set out to investigate interpreters at different levels of experience I understood quite early that I had to evaluate or assess their product one way or the other. I did not want to assess them based only on my own judgment. I preferred to have “independent/objective” judges, as I was afraid I would be biased both as an interpreter myself and as a colleague to several of my informants. So, fairly early on I decided to use groups of assessors rather than assess them myself.

1. Choosing an instrument

Next, I had to choose the instrument for assessment. A popular method for assessing interpreting, both in research and otherwise, is the componential approach. Components typically cover fluency, correctness (terminology, grammar, syntax), sense consistency (with the original), logical cohesion, intonation, accent, style and more (or less). Assessors evaluate each component in order to arrive at a complete evaluation of the interpreting.

There were several reasons why I did not want to use this componential approach. First, different researchers had pointed out potential problems with this type of assessment. Heike Lamberger-Felber found in her PhD that it was very difficult to get consistent results from a componential assessment. But while the ratings of the different components varied a lot, the assessors’ rankings of the different interpreters were almost in agreement. Ángela Collados Aís and her ECIS research team have published several reports on assessment, pointing out that although the assessors in their different studies all agree on the relative importance of the different components (e.g. fidelity to the original is the most important), other components (e.g. native accent) affect how the most important ones are rated. So a foreign accent would give a lower score for fidelity, even though the interpretings were identical word for word.

Another important aspect for me was that I wanted to use people without personal experience of interpreting as assessors. The reason was that the Swedish interpreting community is so small that it would be almost inevitable for interpreter-assessors to recognize interpreter-informants.

2. Carroll’s scales

So, I started looking at other types of assessment and soon found a type of Likert scale used by Linda Anderson as early as the late 1970s. She used two scales created by John Carroll in 1966 to assess machine translation. John Carroll specialized in language testing and was a big critic of the discrete point theory. The discrete point theory claims that from certain features in a language learner’s production you can predict the learner’s proficiency in that language (ring a bell? If not, reread the section above). When Carroll developed his instrument for translation he said that a translation can be perfectly true to the original but incomprehensible, or perfectly comprehensible but completely untrue to the original. Therefore he developed two scales, one for intelligibility (comprehensible or not) and the other for informativeness (different from the original or not). The translations were assessed using both scales. Linda Anderson then applied them as they were to her data collected from conference interpreters. She did not dwell much on using the scales, but seemed to fear that they were too blunt.

The scales had not really been used since then, but I found them appealing and wanted to test them. One issue was that the scales had served as the basis for the scales in the US court interpreter accreditation test (FCICE), and this test had been heavily criticized for its accuracy (or lack thereof). Andrew Clifford has investigated those tests and argues that there may not be any significant difference between the different test constructs. I do not argue against Clifford’s conclusions, on the contrary, but I think the problem lies in how the court accreditation test was developed and is used, rather than in the original scales.

More than one researcher (but far from all) has sniggered at me for using scales that old, which clearly did not create a spin-off in the interpreting research world. If they weren’t used again it must be because they weren’t good, right? But since I’m a stubborn risk-taker I decided to go ahead. What could be more fun than to dance with the devil? (Yes, I am being ironic in case you wonder…)

3. Tiselius’ adaptation (sounds grand talking about myself in third person right?!)

The scales had to be adapted of course. They were created for translation and I was going to use them for interpreting. Furthermore, there were nine scale steps, some of them difficult to discern from one another. I wanted clear differences between the scale steps, and no middle step, no number five where everything generally OK could be put. Therefore I changed the definitions from written to spoken language and from English to Swedish. I also reduced the steps from nine to six, merging a few that were very similar.

Now only using the scales remained …  When it came to using the scales I had to decide whether to use sound files or transcripts. After all, interpreting is the spoken word, and should it be assessed on the basis of written words? And if I wanted to use non-interpreters as assessors then I would have to justify that. Presumably, interpreters, especially those who have jury training, would be better than non-interpreters at evaluating interpreting.

4. Interpreters or non-interpreters?

I had both interpreters and non-interpreters rate the first batch of interpretings (on transcripts as I did not want the interpreters to recognize their peers). It turned out that in raw figures the interpreters were slightly more severe, but the scores from the two groups correlated and the difference was not significant. These results indicated that I could use either interpreters or non-interpreters.

5. Sound-files or transcripts?

I designed a study where the intelligibility part of the interpretings was assessed by non-interpreters from both sound files and transcripts. One group assessed transcripts (with normalized orthography and punctuation) and the other sound files. The sound files got slightly worse scores than the transcripts, but again the difference was not significant and all the scores correlated. So in this respect I could use either sound files or transcripts.
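To make the comparison concrete, here is a minimal sketch in Python of how two sets of scores for the same interpretings can be compared. The numbers are invented for illustration and are not my data, and the variable names are my own. The sketch only shows the correlation and the difference in means; the significance test is left out.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical mean intelligibility scores per interpreting
# (1 = lowest, 6 = highest), one list per group of raters.
transcript_scores = [4.5, 3.0, 5.5, 2.5, 4.0]
sound_file_scores = [4.0, 3.0, 5.0, 2.0, 4.0]

# Do the two groups rank the interpretings in a similar way?
r = pearson(transcript_scores, sound_file_scores)

# How much lower (or higher) does one group score on average?
mean_diff = mean(transcript_scores) - mean(sound_file_scores)

print(f"correlation: {r:.2f}, mean difference: {mean_diff:.2f}")
```

A high correlation together with a small mean difference is the pattern described above: one group is slightly more severe, but both groups order the interpretings in much the same way.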

I ended up going for transcripts. This decision mostly came from the insight that Collados Aís provided on how deceitful the voice is when it comes to assessment of product. Pitch, intonation, accent, security and so forth affect the impression of the quality of the product. Clearly, this aspect is important for the assessment of the interpreting, but since the aim in this study was to assess only the skill of transferring an entire message from one language into another, it seemed wise to exclude it. There were simply too many confounding variables.

6. The assessment

The assessment units ended up looking like this:

Intelligibility

First the raters saw only the interpretation and they rated that according to the scale from completely unintelligible to completely intelligible, from 1 (lowest) to 6 (highest). They also had a sheet with the full explanation of each step of the scale next to them when rating. If you’re curious I left a copy of the sheet in English here.

Informativeness

Then the raters unfolded the sheet of paper and the European Parliament’s official translation showed up at the bottom. Then they rated the informativeness of the interpreting, i.e. the difference between the original and the interpretation. This time the scale went from no difference compared to the original to completely different compared to the original. Now the scale is inverted, so 1 is the best score and 6 the worst. You may wonder why the scale is inverted this time; I decided to stick with Carroll’s original proposal where a low score is equal to little difference. The zero on the scale means that the interpreters added information not present in the original. This typically happens when something implicit is made explicit or when additional information or a hedge is given.

7. Did it work?

The results I got in my cross-sectional material were very promising, with clear differences where I would expect them, i.e. between non-interpreter subjects and interpreter subjects, and between novice interpreters and experienced interpreters. The inter-rater variability, that is the variability of the scores between the different raters, was also low. So far, I’m not sure about the results for my longitudinal material. I did not see differences where I expected them. This may be due to a failing instrument (i.e. my scales) or to smaller differences between the interpreting products than I expected. To be continued…
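For readers who wonder what low inter-rater variability looks like in practice, here is a small sketch in Python. It uses one simple measure, the standard deviation of the raters’ scores for each interpreting. This is an assumption made for illustration, not necessarily the statistic used in the study, and the ratings and rater names are made up.

```python
from statistics import mean, pstdev

# Hypothetical ratings (1 = lowest, 6 = highest): one list per rater,
# one position per interpreting.
ratings = {
    "rater_a": [5, 3, 6, 2],
    "rater_b": [5, 4, 6, 2],
    "rater_c": [4, 3, 5, 2],
}

# Regroup the scores so that each tuple holds all raters' scores
# for one interpreting.
scores_per_interpreting = list(zip(*ratings.values()))

# Spread of the raters' scores for each interpreting
# (0 means that all raters gave the same score).
spread = [pstdev(scores) for scores in scores_per_interpreting]

# One overall figure: the average spread across interpretings.
mean_spread = mean(spread)

print([round(s, 2) for s in spread], round(mean_spread, 2))
```

On a six-step scale, an average spread well below one step means the raters rarely disagree by more than a single step.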

Now, there are a few more things to try out with my scales. Obviously, an interpreter trainer would not start transcribing their students’ interpretings and divide them into assessment files before assessing or grading them. But, presumably, the scales could work in a live evaluation as well. I have not yet had an opportunity to test them, but I’m looking forward to that, and I will of course keep you posted.

References

Anderson, L. 1979. Simultaneous Interpretation: Contextual and Translation Aspects. Unpublished Master’s Thesis. Department of Psychology, Concordia University, Montreal, Canada.

Carroll, John B. 1966. “An Experiment in Evaluating the Quality of Translations.” Mechanical Translation and Computational Linguistics 9 (3-4): 55-66.

Collados Aís, Á., Iglesias Fernández, E., Pradas Macías, E. M., & Stévaux, E. 2011. Qualitätsparameter beim Simultandolmetschen: Interdisziplinäre Perspektiven. Tübingen: Narr Verlag.

Clifford, Andrew. 2005. “Putting the Exam to the Test: Psychometric Validation and Interpreter Certification.” Interpreting 7 (1): 97-131.

Lamberger-Felber, H. 1997. Zur Subjektivität der Evaluierung von Ausgangstexten beim Simultandolmetschen. In N. Grbic & M. Wolf (Eds.), Text – Kultur – Kommunikation. Translation als Forschungsaufgabe (pp. 231–248). Tübingen: Stauffenburg Verlag.

Tiselius, E. 2009. “Revisiting Carroll’s Scales.” In Testing and Assessment in Translation and Interpreting Studies. C. Angelelli and H. Jacobson (eds.). 95-121. ATA Monograph Series. Amsterdam: Benjamins.

Self assessment

Although I often like to picture my students as readers when I blog, this post is in particular for you, dear students. The idea came after a very pleasant lunch with an aspiring interpreter. We shared ideas and experiences. Personally, I was probably very close to a perfect personification of the benevolent granny: “I remember when I was…” Anyhow, I realized that my future colleague could use a few hints on self assessment and out-of-classroom practice. I have touched upon practice and learning consecutive earlier. But this post is particularly aimed at giving tips on practice and self assessment.

If you are going to improve and grow as an interpreter, practice and self-evaluation are essential. You have to listen to yourself critically, identify areas that can be improved and work on them. Here’s my own step-by-step guide to how to do it. This guide assumes you have gotten basic notions of interpreting and what interpreting teachers are looking for. I will give you ideas on how to correct yourself, but you can probably not follow this guide as a DIY interpreting school. I should also say that there are a million ways to practice and assess yourself. These hints are just a few of my personal ideas that have worked well for me and for my students (or so they tell me). They are a mix of tips I got myself and things that I found out worked for me.

One – Equipment
Get yourself a good mp3 recorder, small in size but big in memory, with a good mike and good recording quality. Always carry your recorder (charged or with extra batteries) with you.

Two – When and how much?
Take every opportunity to practice. If you’re lucky enough to get into a dummy booth, just take out your mp3 recorder and interpret away. But don’t forget to switch the recorder on. If you find yourself in a situation where you can take consecutive notes, then do. Maybe you will have the opportunity to interpret just a little later from your notes. And by all means have your friends, girlfriends or boyfriends, and family give you speeches. And practice often! Every day in short units. But don’t overdo it either, your brain needs some rest as well.

Three – Get the original
Ideally you would want to get the original speech to compare to your own interpreting. There are several ways to do this: a) Ask a friend to read speeches to you that you take off the Internet. The Internet is such a wealth of speeches; for instance, most governments and organizations post speeches from their front figures on the web for the press to use. But remember that if your friend reads it, s/he has to adapt the speed. Speeches that are read out can be literally impossible to interpret. b) Use the Internet and listen to uploaded speeches, news and interviews that you can interpret either simultaneously or consecutively (think YouTube). c) News flashes on the radio. They last 3 minutes every hour or half-hour and, unless something big happens, they tend to be the same several times in a row. You can interpret one and then listen to the next one and compare your notes and interpreting (again, remember it’s fast).

Four – Assess
The most painful part of this exercise is to listen to yourself. The first thing here is to get used to listening to your own voice. Most people are not used to listening to themselves and find it difficult. You just have to get over it, just like ballet dancers have to get over looking at themselves in the mirror. Then you have to get used to listening critically, and now we are getting to the really crucial point about self assessment.
1) Listen to the overall presentation. One of my friends once complimented another colleague by saying, “You sound like a skilled storyteller reading from a book.” This is what you want it to sound like. No “ahms” or “uhms”, no excessive use of “ands and buts”, no extra sounds. If you’re not producing real words you close your mouth, full stop. And speaking of full stops: finish your sentences! You don’t want to leave your listeners wondering what’s coming next. You can break up the speaker’s sentence into several shorter ones, but make sure to finish them. Also listen to how you come across when it comes to intonation. Do you sound sure of what you say or unsure? Do you give a trustworthy impression or not? Do you take your listeners by the hand and guide them through the presentation?
2) Now you have to listen to what you actually convey. Do you interpret what the speaker says or something else? You listen for terminology of course, but also for nuances. Do you interpret what the speaker says or are you perhaps changing the message slightly? This is NOT about using all the words and the same words. I guess we already agree that a word-for-word interpreting is not the ideal here. You want to say exactly what the speaker says, but in your language and your own words.

Five – Keep a log
Keep a log book of your evaluation. Doesn’t have to be very detailed, but you want to keep a record of what type of speeches (e.g. general politics, easy, 10 min, French>English), your goal (e.g. interpret without interruption for 10 minutes/use a political register/avoid using “and” in the beginning of the sentences) and how you succeeded.

Six – Ask for feedback
Ask your fellow students to help you, ask your family to listen to you, and, if you have the possibility, ask a professional interpreter.

Seven – Set goals for your improvement
Based on your assessment you set goals for the next exercise. Tangible goals such as: “I’m going to interpret without interruption for five minutes” or “I’m not going to use any extra sounds this time” or “I will use the new vocabulary (words X, Y and Z) or the new set phrases I’ve learned”.

And a final word, you start with easy texts and as you feel more confident you add difficulty. If you are aiming for a conference interpreter test you will want to be able to interpret effortlessly in consecutive for more than six minutes and in simultaneous mode for 20 minutes.

And remember the old story about the tourist in New York who was lost and unknowingly asked Arthur Rubinstein “How do you get to Carnegie Hall?”. Rubinstein answered: “Practice, practice, practice.”

Good luck and Go for it!

Bad interpreters or bad system?

The Swedish Tolkprojektet (interpreting project) has been working since 2008 to shed light on the situation of community interpreting in Sweden. They presented their conclusions and rounded off their project at a conference in Stockholm at the end of August. Their conclusions got quite a lot of press in Sweden, especially since they said that too many unqualified interpreters are used in court trials and hospitals. The news even made it into the Facebook and Twitter discussions. You can read articles in Swedish here, here and here. Read the conclusions here (in Swedish).

This is not news. For quite some time, qualified and certified interpreters in Sweden have been struggling to get different Swedish authorities to understand that they need to raise their demands on interpreters’ qualifications. The Swedish system for recruiting community interpreters got a severe blow in the early nineties when interpreting was moved out of the municipal agencies in order to be exposed to competition. Instead of municipal interpreting agencies, users of interpretation now had to deal with private agencies with a strong desire for gain. The interpreters were still the same people but now procured through different private agencies. Since agencies desired to raise their own income (private companies usually do, nothing wrong in that) and users of interpretation (hospitals, police, courts etc.) were unhappy to pay more for the service, agencies started recruiting less qualified interpreters in order to lower the cost of interpreting fees.

The final blow came with the EU directive on public procurement. Interpretation services were administered by purchasing staff who were also responsible for procuring paper, chairs, pens and so forth. Needless to say, a ruthless race to the bottom began. Quality was nothing, low fees everything. Of course, agencies committed to always sending a certified interpreter if one was available, but since it was more expensive for the agency to send a certified interpreter, it rarely happened. In fact, interpreters reported that once they got their certification, their assignments went down. Another horrible tale about the agencies that I heard during this period was that interpreters who were favored by the interpreting agency were also given assignments to top up their month (i.e. so they could almost survive on interpreting). The top-up assignments were not necessarily in the interpreter’s working languages; they only had to be in languages that he or she most likely mastered.

At that time (after the EU directive) I met with several procurement officers in my role as regional representative for AIIC trying to convince them to stress (and pay for) quality in their procurement, and they all had the same message: If the quality of the service delivered was poor, then the users would complain, the procurer had then broken his contract and would have to adapt, and worst case for the next round of call for tenders the situation would be solved.

Now, the problem with that argument is that:
1) users of interpretation rarely complain, because a) they are immigrants with little power and little knowledge of how to complain or b) they are stressed professionals (MDs, lawyers, social officers etc.) who just deal with the situation as well as they can.
2) the conclusion that most Swedish users of interpretation draw when interpreting breaks down is often “interpreting doesn’t work” rather than “the interpreter was bad”. This is due to little experience with and exposure to interpretation.

People tend to just live with it and do the best they can. A few years ago some journalists and media started to discover the alarming situation and there were some articles, but the debate never really took off. I believe this is mostly because, again, the big group of individual users of community interpretation is a weak group with no strong public voice.

Now, it should be said that a lot of work has since then been done in Sweden to improve community interpreters’ competence and to certify as many interpreters as possible. There are also ongoing discussions about the agencies and their role in interpreting quality. Buyers of interpreting services have also increased their demands on the service delivered. But we are far from a well-functioning, stable situation, and for at least 10 of the past 20 years regression rather than development has been the term to describe the interpreting industry in Sweden.

And thanks to Tolkprojektet the spotlight is now on the absolutely strict demands that we need to put on courts, hospitals and police (society in short), on interpreting agencies and on interpreters, to make sure we provide good, secure interpretation for people in need of it. And of course also on making sure that professional interpreters have a decent chance to survive on what they do for a living.

Update:
Read this post about outsourcing in the UK. And The liaison interpreter’s post about being “bad”.