I guess this autumn has been a lot about CPD (continuing professional development) for me. I took a course on working into English B in August (held by Zoë Hewetson and Christine Adams; I can really recommend it) and then the SCIC (the interpreting service of the European Commission) training-for-trainers course in September (it's offered by SCIC to trainers at its partner universities, and it is definitely worth the effort). And since August I have also been taking a MOOC (Massive Open Online Course) on "Surviving your PhD" by the famous @ThesisWhisperer, Inger Mewburn of the Australian National University. Now, I have already survived my PhD, but the course was also pitched at future and present supervisors, so I thought it might be a good idea – that, plus the fact that I really like the Thesis Whisperer, and it would give me the chance to take my first MOOC.
We were out walking today and talked about the best thing about 2013. For my part it was easy to answer – finishing and defending my PhD! Yes, on November 26 I finally defended my PhD, proof here. But 2013 has been a particularly eventful year for me. I have touched upon most of it in my previous post, so I won't dwell on it any more; suffice it to say that I really wish I had had some more time for blogging (no new posts and 500+ unread posts in my Feedly…) and tweeting these past months. When I met with Michelle (@InterpDiaries) – in April, I think it was – we both agreed that it isn't ideas that are lacking when it comes to blogging, if only there were some more hours in a day.
During autumn I've been busy teaching two introductory courses on interpreting, one in Bergen (TOLKHF) and one in Stockholm (ToÖ I). At TÖI (the Institute for Interpreting and Translation Studies), I've also been busy launching our Facebook page and our Twitter account (@TOI_SU). On the course development side, I've been busy revamping the course Interpreting II, and I'm very excited to see how it turns out, as it starts on January 20th. Traditionally, we have always given Interpreting II with Swedish and only one other language. This time it will be Swedish and four other languages, which means a completely new approach to both teaching and learning interpreting. I have integrated Lionel's (@Lioneltokyo) approach to starting consecutive training, where you have students interpret from notes before they actually start to learn note-taking. Next term, for Interpreting III, we will also take on simultaneous interpreting for public service interpreters. I'm still thinking about this, since the techniques in public service settings are not the same as in conference settings. There is also very little course literature on simultaneous interpreting for PSI, so we'll see where that takes me. If you have any suggestions, please comment!
As my PhD project drew to an end this year, I have of course also thought about what to do next, and I have one or two threads I would like to pursue. My most recent research interest is young children and interpreting, or child language brokering as it has been coined (Professor Harris writes a lot about it on his blog). I applied for funding for a series of workshops on children and interpreting in the Nordic countries; we did not get it this time, but I will apply again and hope to be successful eventually. I will also try to get project money for a project simply mapping the practice and the ethics around it in the Nordic countries. I would, of course, also like to continue to follow and interview my fellow interpreters. I hope they will let me continue to record and study them; I have such wonderful material that I would like to build on. And finally, I hope to start up a project with my friend Emilia Iglesias Fernández of Granada. We have been in touch again, and I hope it will lead to something. All of this will of course not come to an end (or maybe not even to a start) during next year, but I'm positive some things will.
This autumn I have also been the extremely proud guest blogger on Rainy London's blog. I get tired just reading about my own week, but it was a very fun culmination of an extremely busy spring. And I was very flattered to be mentioned by Jonathan Downie (@jonathandownie) on the Lingua Greca blog.
As I write this I am sitting in front of my bedroom window at our house in the countryside; it is pitch black outside, and in the distance I can see the lights from our neighbours'. My projects now are like those lights, gleaming in the distance. I hope I will find my way there and catch up with them. Any New Year's resolutions? No, not really – they only give me a guilty conscience, as I'm really not good at keeping them – but if any: being better at staying in touch.
I wish you all the best for the New Year, and I hope that you have plenty of projects that you will be able to carry through too. Thank you, all my friends (both on and off the Internet), for staying in touch and being so supportive this year. I hope we will continue our discussions in 2014.
This is not my first and surely not my last post on assessment. If you’re looking for the other posts just type “assessment” in the search box to the right. Last Friday (March 15, 2013) I gave a talk on process and expertise research in the Nordic countries at the conference “Le Nord en français” at the University of Mons (one of my alma maters, actually). I also presented the results of my PhD project. All this in 20 minutes, so you can imagine I didn’t have the time to be very thorough.
One of the questions that came up was how I actually went about doing my assessment, and why I chose this particular methodology and not others. As I didn't really get around to going through my assessments thoroughly, I thought I'd try to do it here. Thanks for the discussion, Cédéric – if you stop by and read the post, don't hesitate to comment or ask more questions.
When I set out to investigate interpreters at different levels of experience, I understood quite early that I had to evaluate or assess their product one way or another. I did not want to assess them based only on my own judgment; I preferred to have "independent/objective" judges, as I was afraid I would be biased both as an interpreter myself and as a colleague of several of my informants. So, fairly early on, I decided to use groups of assessors rather than assess the interpretings myself.
1. Choosing an instrument
Next, I had to choose the instrument for assessment. A popular method for assessing interpreting, both in research and otherwise, is a componential approach. Components typically cover fluency, correctness (terminology, grammar, syntax), sense consistency (with the original), logical cohesion, intonation, accent, style and more (or less). Assessors evaluate each component in order to arrive at a complete evaluation of the interpreting. There were several reasons why I did not want to use this componential approach. First, different researchers had pointed out potential problems with this type of assessment. Heike Lamberger-Felber found in her PhD that it was very difficult to get consistent results from a componential assessment: while the ratings of the different components varied a lot, the assessors' rankings of the different interpreters were almost in agreement. Angela Collados-Aís and her ECIS research team have published several reports on assessment, pointing out that although the assessors in their different studies all agree on the relative importance of different components (e.g. that fidelity to the original is the most important), other components (e.g. native accent) affect how the most important ones are rated. So a foreign accent would yield a lower score for fidelity, even though the interpretings were word-for-word identical. Another important consideration was that I wanted to use people without personal experience as interpreters as assessors. The reason was that the Swedish interpreting community is so small that it would be almost inevitable for interpreter-assessors to recognize interpreter-informants.
2. Carroll’s scales
So I started looking at other types of assessment and soon found a type of Likert scale used by Linda Anderson as early as the late 1970s. She used two scales created by John Carroll in 1966 to assess machine translation. Carroll specialized in language testing and was a big critic of discrete point theory, which claims that from certain features in a language learner's production you can predict the learner's proficiency in that language (ring a bell? If not, reread the paragraph above). When Carroll developed his instrument for translation, he observed that a translation can be perfectly true to the original but incomprehensible, or perfectly comprehensible but completely untrue to the original. He therefore developed two scales: one for intelligibility (comprehensible or not) and one for informativeness (different from the original or not). The translations were assessed on both scales. Linda Anderson then applied them, unchanged, to her data collected from conference interpreters. She did not dwell much on using the scales, but seemed to fear that they were too blunt.
The scales had not really been used since then, but I found them appealing and wanted to test them. One issue was that they had served as the basis for the scales of the US court interpreter accreditation test (FCICE), and that test had been heavily criticized for its accuracy (or lack thereof). Andrew Clifford has investigated those tests and argues that there may not be any significant difference between the different test constructs. I do not argue against Clifford's conclusions – on the contrary – but I think the problem lies in how the court accreditation test was developed and is used, rather than in the original scales.
More than one researcher (but far from all) has sniggered at me for using scales that old, scales which clearly did not create a spin-off in the interpreting research world. If they weren't used again, it must be because they weren't good, right? But since I'm a stubborn risk-taker, I decided to go ahead. What could be more fun than dancing with the devil? (Yes, I am being ironic, in case you wonder…)
3. Tiselius' adaptation (it sounds grand talking about myself in the third person, right?!)
The scales of course had to be adapted. They were created for translation, and I was going to use them for interpreting. Furthermore, there were nine scale steps, some of them difficult to tell apart. I wanted clear differences between the scale steps, and no middle step – no number five where everything that was generally OK could be parked. I therefore changed the definitions from written to spoken language and from English to Swedish, and I reduced the steps from nine to six, merging a few that were very similar.
Now only using the scales remained… Here I had to decide whether to work from sound files or transcripts. After all, interpreting is the spoken word – should it be assessed on the basis of written words? And if I wanted to use non-interpreters as assessors, I would have to justify that, since presumably interpreters, especially those with jury training, would be better than non-interpreters at evaluating interpreting.
4. Interpreters or non-interpreters?
I had both interpreters and non-interpreters rate the first batch of interpretings (from transcripts, as I did not want the interpreters to recognize their peers). It turned out that, in raw figures, the interpreters were slightly more severe, but the scores from the two groups correlated and the difference was not significant. These results indicated that I could use either interpreters or non-interpreters.
5. Sound-files or transcripts?
I designed a study where the intelligibility part of the interpretings was assessed by non-interpreters from both sound files and transcripts. One group assessed transcripts (with normalized orthography and punctuation) and the other sound files. The sound files got slightly worse scores than the transcripts, but again the difference was not significant and all the scores correlated. So in this respect, too, I could use either sound files or transcripts.
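The kind of check described in the last two sections (do the two groups' scores correlate, and how large is the raw difference?) can be sketched roughly as follows. This is only an illustration with invented scores, not the study's actual data or analysis, and a full analysis would of course also include a significance test (e.g. a paired t-test):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented mean intelligibility scores per interpreting (1 = worst, 6 = best),
# one list per rater group, both groups rating the same eight interpretings.
transcript_scores = [5.2, 4.8, 3.9, 5.5, 4.1, 3.2, 5.0, 4.4]
soundfile_scores = [5.0, 4.5, 3.6, 5.3, 4.0, 3.0, 4.7, 4.3]

r = pearson_r(transcript_scores, soundfile_scores)
mean_diff = sum(t - s for t, s in zip(transcript_scores, soundfile_scores)) / 8

print(f"correlation r = {r:.2f}, mean difference = {mean_diff:.2f} scale steps")
```

If, as in my data, the correlation is high and the raw difference small (and non-significant), either source can be used for the ratings.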
I ended up going for transcripts. This decision came mostly from the insight Collados Aís provided on how deceptive the voice is when it comes to assessing the product. Pitch, intonation, accent, perceived confidence and so forth affect the impression of the quality of the product. Clearly, these aspects are important for the assessment of interpreting, but since the aim of this study was to assess only the skill of transferring an entire message in one language into another, it seemed wise to exclude them – too many confounding variables.
6. The assessment
The assessment units ended up looking like this:
First, the raters saw only the interpretation, and they rated it on a scale from completely unintelligible to completely intelligible, from 1 (lowest) to 6 (highest). They also had a sheet with the full explanation of each step of the scale next to them while rating. If you're curious, I have left a copy of the sheet in English here.
Then the raters unfolded the sheet of paper, and the European Parliament's official translation appeared at the bottom. They then rated the informativeness of the interpreting, i.e. the difference between the original and the interpretation, this time from no difference compared to the original to completely different from the original. Note that this scale is inverted, so 1 is the best score and 6 the worst. You may wonder why; I decided to stick with Carroll's original proposal, where a low score equals little difference. A zero on the scale means that the interpreter added information not present in the original. This typically happens when something implicit is made explicit, or when additional information or a hedge is added.
7. Did it work?
The results from my cross-sectional material were very promising: clear differences where I would expect them, i.e. between non-interpreter subjects and interpreter subjects, and between novice interpreters and experienced interpreters. The inter-rater variability – that is, the variability of the scores between the different raters – was also low. So far, I'm not sure about the results for my longitudinal material; I did not see differences where I expected them. This may be due to a failing instrument (i.e. my scales) or to less difference between the interpreting products than I expected. To be continued…
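As a rough illustration of what low inter-rater variability means here: one crude measure is simply the spread of the different raters' scores for each interpreting, averaged over all interpretings. The ratings below are invented for illustration; a real analysis would use a proper agreement coefficient (e.g. an intraclass correlation):

```python
from statistics import mean, stdev

# Invented ratings: rows are interpretings, columns are four raters'
# scores on the 1-6 intelligibility scale.
ratings = [
    [5, 5, 6, 5],
    [3, 4, 3, 3],
    [6, 5, 6, 6],
    [2, 2, 3, 2],
]

# Spread (sample standard deviation) across raters per interpreting,
# averaged over all interpretings.
per_item_spread = [stdev(row) for row in ratings]
print(f"mean inter-rater spread: {mean(per_item_spread):.2f} scale steps")
# prints: mean inter-rater spread: 0.50 scale steps
```

A small average spread like this (well under one scale step) is the sort of pattern behind the statement that the inter-rater variability was low.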
Now, there are a few more things to try out with my scales. Obviously, an interpreter trainer would not start transcribing their students' interpretings and dividing them into assessment files before assessing or grading them. But presumably the scales could work in a live evaluation as well. I have not yet had an opportunity to test that, but I'm looking forward to it, and I will of course keep you posted.
Anderson, Linda. 1979. Simultaneous Interpretation: Contextual and Translation Aspects. Unpublished Master's thesis, Department of Psychology, Concordia University, Montreal, Canada.
Carroll, John B. 1966. "An Experiment in Evaluating the Quality of Translations." Mechanical Translation and Computational Linguistics 9 (3-4): 55-66.
Collados Aís, Á., Iglesias Fernández, E., Pradas Macías, E. M., & Stévaux, E. 2011. Qualitätsparameter beim Simultandolmetschen: Interdisziplinäre Perspektiven. Tübingen: Narr Verlag.
Clifford, Andrew. 2005. “Putting the Exam to the Test: Psychometric Validation and Interpreter Certification.” Interpreting 7 (1): 97-131.
Lamberger-Felber, H. 1997. "Zur Subjektivität der Evaluierung von Ausgangstexten beim Simultandolmetschen." In N. Grbic & M. Wolf (Eds.), Text – Kultur – Kommunikation. Translation als Forschungsaufgabe (pp. 231–248). Tübingen: Stauffenburg Verlag.
Tiselius, E. 2009. “Revisiting Carroll’s Scales.” In Testing and Assessment in Translation and Interpreting Studies. C. Angelelli and H. Jacobson (eds.). 95-121. ATA Monograph Series. Amsterdam: Benjamins.
Jérôme, one of the 2interpreters, Michelle (Interpreter Diaries) and I have been involved in a discussion on how to evaluate interpreter exams – a really tricky business, as any of you who have sat on an exam jury will know. Jérôme published a really interesting reflection on final exams, and Michelle and I responded; you can read the post here.
We have now arrived at the even trickier subject of quality in interpreting, and this is where I felt I needed to write a post, not just continue the comments. Clearly, what exam jurors are after is some type of high-quality interpreting; this is supposedly also what accreditation jurors or peer assessors are looking for. But what is it?
Michelle mentions two early studies: one by Hildegund Bühler (a questionnaire study with interpreters as respondents) and one by Ingrid Kurz (a questionnaire study with interpreting users as respondents). These have recently been followed up by Cornelia Zwischenberger in a more recent study with interpreters as respondents. While we are talking about questionnaire studies, it should also be mentioned that AIIC commissioned a study by Peter Moser on user expectations, and that SCIC regularly surveys its users' expectations. Bühler and Kurz more or less conclude that an interpreting is good when it serves its purpose, and that different contexts have different requirements (I'm summing up very roughly here).
As both Michelle and Jérôme point out in their comments, there is a flood of articles on quality, and many studies have been made in the area, but I'm not sure we have actually come up with anything more conclusive than Bühler and Kurz did. However, I would like to draw your attention to something that I have found most interesting in research on quality. Barbara Moser-Mercer was also mentioned in the comments; she published an article in 2009 in which she challenges the use of surveys for determining quality. This seems very much inspired by the work that has been done in Spain by Angela Collados-Aís and her research team ECIS in Granada. Unfortunately, she publishes only in Spanish and German, so I had to go there to understand what she does, but it was worth every bit of it – extremely interesting research. I also have to compliment them on how I was received as a guest: Emilia Iglesias-Fernandez made me feel like royalty, and all the other researchers in the unit were extremely welcoming and accommodating. But here's the interesting thing:
For the past 10 years they have been researching how users of interpreting perceive and understand the categories most commonly used in surveys to assess interpreting. Since Bühler, these categories have typically been: native accent, pleasant voice, fluency of delivery, logical cohesion, sense consistency, completeness, correct grammar, correct terminology and appropriate style. If I remember correctly, Peter Moser's study, for instance, showed that experienced users of interpreting reported that they cared more about correct terminology and fluency than about pleasant voice or native accent.
In their experiments they have been tweaking interpreted speeches so that exactly the same speech would be delivered with or without a native accent, with or without intonation, at high or low speed, and so forth. Different user groups first rated how important the different categories were, and were then asked to rate different speeches, each tweaked for a certain feature. When you do that, it turns out that the same speech gets a higher quality score with a native accent (i.e. it is judged as using more correct terminology or more correct grammar) than with a non-native accent. And the same goes for intonation, speed and so forth.
So it seems – to put it very strongly – that features that are not rated as important (such as accent) affect how users perceive the features that are (such as correct terminology).
In interpreting research there is of course also a lot of error analysis going on, and many studies base their evaluation of the interpretings on error analysis. One problem with that is exactly the one Jérôme points out: maybe the interpretation actually got better because of something the researcher/assessor perceived as an error. Omissions are a typical category where this is difficult to judge. I have also just gotten results with my holistic scales where an interpreter whom I perceived as "much better" (only a gut feeling) got much worse scores. When I started analyzing my results, it appeared that one reason could very well be that that interpreter omitted more, so that, compared with the source text, there are more "holes" or "faults" or whatever you would like to call them.
When it comes to exams, Jérôme claims that not much research has been done on exams and exam assessment. I have not checked, but my impression is that Jérôme is right: I cannot remember reading about quality assessment of examinees. I know that entrance exams and aptitude tests have been studied, but final exams… Please enlighten me.
Another thing that Jérôme points out, which is really a pet subject of mine but on which there seems to be very little consensus, at least in the environments where I have been, is the training of exam jurors and peer reviewers. Now, I don't mean to say that there are no courses in how to be an interpreting exam juror – of course there are. But what I mean (and Jérôme too, I think) is that people evaluating interpreting do not get together and discuss what they believe is good interpreting. You could, for instance, organize a training event before an exam where jurors get together, discuss the criteria and how they understand them, and also listen to examples and discuss them. I'm sure this happens somewhere, but I have not come across it so far.
What’s your take on this? Have I left out any important studies or perspectives? Do you have any other suggestions?
Bühler, Hildegund. 1986. "Linguistic (semantic) and extra-linguistic (pragmatic) criteria for the evaluation of conference interpretation and interpreters." Multilingua 5 (4): 231-235.
Collados Aís, Angela. 1998. La evaluación de la calidad en interpretación simultánea: La importancia de la comunicación no verbal. Granada: Editorial Comares.
Kurz, Ingrid. 1993. “Conference interpretation: Expectations of different user groups”. The Interpreters’ Newsletter 5: 13–21. (http://www.openstarts.units.it/dspace/handle/10077/4908)
Moser, Peter. 1995. “Survey on expectations of users of conference interpretation”. (http://aiic.net/community/attachments/ViewAttachment.cfm/a525p736-918.pdf?&filename=a525p736-918.pdf&page_id=736)
Moser-Mercer, Barbara. 2009. "Construct-ing Quality." In Gyde Hansen, Andrew Chesterman & Heidrun Gerzymisch-Arbogast (eds.), Efforts and Models in Interpreting and Translation Research: A Tribute to Daniel Gile, 143-156. Amsterdam & Philadelphia: John Benjamins.
Zwischenberger, Cornelia. 2011. Qualität und Rollenbilder beim simultanen Konferenzdolmetschen. PhD thesis, University of Vienna.
The 16th conference DG Interpretation – Universités was held on March 15 and 16. Unfortunately, I could not follow the proceedings, but there was a lot going on via Twitter; thanks to @GlendonTranslate, both days have been archived here and here. Matt Haldimann also wrote two blog posts on it over at 2interpreters. In one of them he discussed Brian Fox's presentation, where one of the issues raised was that stress is an important factor behind candidates not passing the EU accreditation test.
I'd like to follow up on Matt's post with my own experience of the comfort zone in interpreter training. First of all, the European institutions' problem that students graduating from interpreting school do not pass the accreditation tests is not a new one. I'm not sure you actually CAN pass an EU accreditation test immediately after interpreting school. I'm not saying that to discourage anyone – just compare with graduates from any other training. You don't graduate from a political science program and start as a senior ministry official; ministries usually have internships, training programs and so forth. You don't graduate from law school and become a lawyer immediately. Medical doctors are required to be interns before they practice. The institutions have started running training programs for prospective interpreters, which is great, but of course schools should prepare interpreting students as well.
Traditionally, interpreter training is very tough. I don't remember much of a comfort zone from my own training, and if you ask any interpreter they will tell you horrible stories about austere teachers taking students apart. Students sometimes feel that they are thrown into the water, and those who swim survive. Many of these feelings stem from the fact that you are trying to learn a very complex skill that is closely linked to your personality, your voice and your language, so clearly it is hard.
As a teacher, I would not describe my own practice as throwing students into the water and seeing who comes up. In fact, I work very hard to be a coaching, positive teacher. Yet I know that my students also seem to struggle, just as I did.
Matt suggests building on trust and working with other skills such as public speaking; he mentions his own experiences of improvisational theater, and, last but not least, mock conferences. I think these are great ideas, and they also point to something we may need to refine even more – modular learning. I know that several schools work with modules, the most obvious being of course that first you work with memory exercises, then with note-taking, then with consecutive and so forth. But modules can also be broken down further, into, for instance: interpreting figures, conveying sadness, interpreting names, conveying anger and so forth. And they can of course also be used to train interpreting under stress, interpreting with text, interpreting at an exam and so forth. Nor does everything have to be dealt with in interpreting class; managing stress, voice coaching and public speaking can, together with contemporary social and environmental studies, terminology, study technique and so forth, be handled in separate lectures. The social side of interpreting is also often sadly forgotten – we should teach students how to deal with clients, how to behave in the booth, how to establish themselves on the market and so forth.
But – and here comes the big but – many interpreting schools have interpreting classes of only two or four hours per week. And classes can be huge. If you are the only interpreting teacher for 30 students four hours per week, it is very likely that 90% of your students will never make it as interpreters. Maybe 80% of them just took the course because they heard there was not much reading required. So you teach them how to teach themselves to master the skill, and those who want to and take it seriously will hopefully benefit from that and use the time appropriately. So, in order to give our students all this support we need: more teaching hours, smaller groups (if groups are big), access to other teachers who can work with the interpreting students alongside us, and maybe even access to specialists who could work with the students on an individual basis (voice coaching, stress).
I have two good bets: teacher training and more money. How does that sound?
This is a post that I have translated from Anne-Birgitta's tolkeblogg and publish with her permission. My apologies in advance to Anne-Birgitta and other Norwegian speakers if I have misunderstood or mistranslated something (if so, please let me know – I need this caveat since neither English nor Norwegian is my mother tongue). I wanted to share it on my blog because I think it's a very good illustration of what can and does happen in interpreter-mediated events, and of why we need to train interpreters and work on interpreting ethics and standards.
The term 'interpreter-mediated illusory communication' (tolkemediert skinnkommunikasjon) is defined here as two parallel dialogues with different contents, where the interpreter is the only one who understands what is actually being said, as in the example below from an interview with an angry Palestinian who considers himself a victim of racism:
1. Police: So the police is lying about this?
2. Interpreter: Are you saying that the police is lying?
3. Suspect: He is a liar, yes, his mother is a liar, his father is a liar (raises voice)
4. Interpreter: Yes
5. Suspect: Tell him his father is a liar, his mother is a liar, the racist pig
6. Interpreter: (laughing out loud)
7. Suspect: His mother and his father are liars
8. Police: What’s he saying now?
9. Interpreter: Yes, the police is lying and mother and father also lying (laughs so much that the phrase is almost inaudible)
10. Suspect: Tell him that racism is like AIDS, the disease AIDS, racism is in his blood
11. Police: What does he say about AIDS?
12. Interpreter: (laughs)
13. Suspect: Tell him that he has the racist disease, like AIDS
14. Interpreter: They all have it, the police is sick (laughs)
In the example we see that the interpreter does not render what the suspect says, and that the conversation sounds quite different in Arabic and in Norwegian. The example is taken from a tape recording of a police interrogation and is described in: Andenæs, Kristian et al. 2000. Kommunikasjon og rettssikkerhet. Utlendingers og språklige minoriteters møte med politi og domstoler. Oslo: Unipub.
Ranking academic journals may seem totally irrelevant and even a bit ridiculous – after all, who is to decide whether one peer-reviewed journal with a solid editorial board, renowned professors as editors, and regular publications by important scholars is better than another? Nevertheless, ranking is an important part of academia and of many academic debates. Important bodies such as the European Science Foundation or the Norsk Samfunnsvitenskaplig Datatjeneste (NSD) rank journals, book series and editors. Now, this may of course be good for you: if the journal you publish in or edit gets a high ranking, then good for you and your CV.
Furthermore, in the academic world (at least in Sweden, but I suppose the practice is not uncommon elsewhere), you get points for your publications, and the ranking is used to evaluate your research. If an article is published in a high-ranking journal, you get more points than if it's published in a low-ranking one. These points are then the basis for promotions, funding and so forth.
There is only one MAJOR problem – the system is heavily biased. Naturally, not every publication can be ranked at the top level; if you rank too many publications at the highest level, your index is not worth anything. There can only be one winner, otherwise the gold medal is quickly devalued. So the ranking indices have a limited number of journals that can receive the highest ranking. This means that fields with representatives on the ranking bodies get their journals on the list, while other fields – regardless of size, impact or the quality of their publications – are kindly requested to wait outside.
In 2008 the translation scholar on the ERIH board (the ESF ranking) resigned, and in the subsequent rating exercise translation journals mysteriously disappeared. The publication channel of the Norwegian NSD, which is very important in the Nordic countries, downgraded the only two translation journals that had the highest ranking, following a merger of the bodies that were allowed to rank journals. Translation Studies disappeared from these bodies, and mysteriously enough, so did the translation journals. One can of course ask whether the journals that were downgraded had simply lost in quality and therefore deserved it. The truth is that nothing changed, apart from the presence of translation scholars.
For me as a translation scholar, this means that if I want to earn points and get a good evaluation of my research, in order to be competitive in my academic career, I have to publish in journals or with editors outside my own field. And to get published in the top-ranking journals of other fields, I would of course send my best articles. Now, what does that do to the publishers and journals in my own field?
In October the NSD is performing its biennial ranking exercise again. Needless to say, many of us translation scholars have approached the ranking bodies with demands to at least put the two downgraded journals back at the level they had.
I continue to believe that good, solid work will pay off regardless of where it's published and regardless of ranking. But it annoys me that ranking is so random and yet has so much influence. So, does the ranking system work for you? If so, do you have any tips?