I guess that autumn has been a lot about CPD (continuing professional development) for me. I took a course in working into English B in August (held by Zoë Hewetson and Christine Adams, I can really recommend it) and then the SCIC (interpreting services of the European Commission) training for trainers course in September (it’s offered by SCIC to trainers at its partner universities, and it is definitely worth the effort). And since August I am also taking a MOOC (Massive Open Online Course) on “Surviving your PhD” by the famous @ThesisWhisperer Inger Mewburn of Australian National University. Now, I have already survived my PhD, but the course was also pitched at future and present supervisors, so I thought it might be a good idea, that plus the fact that I really like the Thesis Whisperer and I would get the possibility of taking my first MOOC. Continue reading
This is not my first and surely not my last post on assessment. If you’re looking for the other posts just type “assessment” in the search box to the right. Last Friday (March 15, 2013) I gave a talk on process and expertise research in the Nordic countries at the conference “Le Nord en français” at the University of Mons (one of my alma maters, actually). I also presented the results of my PhD project. All this in 20 minutes, so you can imagine I didn’t have the time to be very thorough.
One of the questions that came up was how I actually went about doing my assessment, and why I choose this particular methodology and not others. As I didn’t really get round to go through my assessments thoroughly, I thought I’d try to do it here. Thanks for the discussion Cédéric, if you stop by and read the post, don’t hesitate to comment or ask more questions.
When I set out to investigate interpreters at different levels of experience I understood quite early that I had to evaluate or assess their product one way or the other. I did not want to assess them based only on my own judgment. I preferred to have “independent/objective” judges, as I was afraid I would be biased both as an interpreter myself and as a colleague to several of my informants. So, fairly early on I decided to use groups of assessors rather than asses myself.
1. Choosing an instrument
Next, I had to choose the instrument for assessment. A popular method for assessing interpreting both in research and otherwise is to use a componential approach. Components typically cover fluency, correctness (terminology, grammar, syntax), sense consistency (with original), logical cohesion, intonation, accent, style and more (or less). Assessors evaluate each component in order to get a complete evaluation of the interpreting. There were several reasons why I did not want to use this componential approach. First, different researchers had pointed out potential problems when using this type of assessment. Heike Lamberger-Felber found in her PhD that it was very difficult to get consistent results from a componential assessment. But, while the rating of the different components varied a lot, the assessors’ rankings of the different interpreters were almost in agreement. Angela Collados-Aís and her ECIS research team have published several reports on assessment, pointing out that although the assessors in their different studies all agree on the level of importance of different components (e.g. fidelity to the original is the most important), other components (e.g. native accent) affect how the most important ones are rated. So a foreign accent would give a lower score for fidelity, although the interpretings word wise were identical. Another important aspect for me was that I wanted to use people without personal experience as an interpreter to be assessors. The reason behind it was that the Swedish interpreting community is so little that it would be almost inevitable for interpreter-assessors to recognize interpreter-informants.
2. Carroll’s scales
So, I started looking at other types of assessment and soon found a type of Lickert-scale used by Linda Anderson already in the late 1970’s. She used two scales created by John Carroll in 1966 to assess machine translation. John Carroll LINK specialized in language testing and he was a big critic of the discreet point theory. The discrete point theory claims that from certain features in a language learner’s production you can predict the learner’s proficiency in that language (rings a bell? if not – reread the paragraph above). When Carroll developed his instrument for translation he said that a translation can be perfectly true to the original but incomprehensible or perfectly comprehensible but completely untrue to the original. Therefore he developed two scales one for intelligibility (comprehensible or not) and the other for informativeness (different from the original or not). The translations were assessed using both scales. Linda Anderson then applied them as they were to her data collected from conference interpreters. She did not dwell much on using the scales, but seemed to fear that they were too blunt.
The scales had not really been used since then, but I found them appealing and wanted to test. One issue was that the scales had served as basis for creating the scales for the US court interpreter accreditation test (FCICE) and this test had been very criticized for its accuracy (or lack thereof). Andrew Clifford has investigated those tests and argues that there may not be any significant difference between the different test constructs. I do not argue against Clifford’s conclusions, on the contrary, but I think the problem lies in how the court accreditation test was developed and is used, rather than a problem with the original scales.
More than one researcher (but far from all) have sniggered at me for using scales that old, which clearly did not create a spin-off in the interpreting research world. If they weren’t used again it must be because they weren’t good, right? But since I’m a stubborn risk-taker I decided to go ahead. What more fun than to dance with the devil? (Yes, I am being ironic in case you wonder…)
3. Tiselius’ adaptation (sounds grand talking about myself in third person right?!)
The scales had to be adapted of course. They were created for translation and I was going to use them for interpreting. Furthermore, there were nine scale steps, some of them difficult to discern from one another. I wanted clear differences between the scale steps, and no middle step, no number five where everything generally OK could be put. Therefore I changed the definitions from written to spoken language and from English to Swedish. I also reduced the steps from nine to six, merging a few that were very similar.
Now only using the scales remained … When it came to using the scales I had to decide whether to use sound files or transcripts. After all, interpreting is the spoken word, and should it be assessed on the basis of written words? And if I wanted to use non-interpreters as assessors then I would have to justify that. Presumably, interpreters, especially those who have jury training, would be better than non-interpreters at evaluating interpreting.
4. Interpreters or non-interpreters?
I had both interpreters and non-interpreters rate the first batch of interpretings (on transcripts as I did not want the interpreters to recognize their peers). It turned out that in raw figures the interpreters were slightly more severe, but the scores from the two groups correlated and the difference was not significant. These results indicated that I could use either interpreters or non-interpreters.
5. Sound-files or transcripts?
I designed a study where the intelligibility part of the interpretings was assessed by non-interpreters from both sound-files and transcripts. One group assessed transcripts (with normalized orthography and punctuation) and the other sound files. The sound files got slightly worse scores than the transcripts, but again the difference was not significant and all the scores correlated. So from this respect I could use either sound-files or transcripts.
I ended up going for transcripts. This decision mostly came from the insight that Collados Aís provided on how deceitful the voice is when it comes to assessment of product. Pitch, intonation, accent, security and so forth affects the impression of the quality of the product. Clearly, this aspect is important for the assessment of the interpreting, but with the aim in this study to assess only the skill to transfer an entire message in one language into another it seemed wise to exclude it, too many confounding variables.
6. The assessment
The assessment units ended up looking like this:
First the raters saw only the interpretation and they rated that according to the scale from completely unintelligible to completely intelligible, from 1 (lowest) to 6 (highest). They also had a sheet with the full explanation of each step of the scale next to them when rating. If you’re curious I left a copy of the sheet in English here.
Then the raters unfolded the sheet of paper and the European parliament’s official translation showed up at the bottom. Then they rated the informativeness of the interpreting, i.e. the difference between the original and the interpretation. This time from no difference compared to the original to completely different compared to the original. Now the scale is inverted so 1 is the best score and 6 the worst. You may wonder why the scale is inverted this time; I decided to stick with Carroll’s original proposal where a low score is equal to little difference. The zero on the scale means that the interpreters added information not present in the original. This typically happens when something implicit is explicitated or when an additional information or hedge is given.
7. Did it work?
The results I got in my cross-sectional material were very promising, clear differences where I would expect them, i.e. between non-interpreter subjects and interpreter subjects, and between novice interpreters and experienced interpreters. The inter-rater variability, that is the variability of the scores between the different raters, was also low. So far, I’m not sure about the results for my longitudinal material. I did not see differences where I expected them. This may be due to a failing instrument (i.e. my scales) or less difference of the interpreting products than what I expected. To be continued…
Now, there are a few more things to try out with my scales. Obviously, an interpreter trainer would not start transcribing their students’ interpretings and divide them into assessment files before assessing or grading them. But, presumably, the scales could work in a live evaluation as well. I have not yet had an opportunity to test them, but I’m looking forward to that, and I will of course keep you posted.
Anderson, L. 1979. Simultaneous Interpretation: Contextual and Translation Aspects. Unpublished Master’s Thesis. Department of Psychology, ConcordiaUniversity, Montreal, Canada
Carroll, John, B. 1966. “An Experiment in Evaluating the Quality of Translations.” Mechanical Translations and Computational Linguistics 9 (3-4): 55-66.
Collados Aís, Á., Iglesias Fernández, E. P. M. E. M., & Stévaux, E. 2011. Qualitätsparameter beim Simultandolmetschen: Interdisziplinäre Perspektiven. Tübingen: Narr Verlag.
Clifford, Andrew. 2005. “Putting the Exam to the Test: Psychometric Validation and Interpreter Certification.” Interpreting 7 (1): 97-131.
Lamberger-Felber, H. 1997. Zur Subjektivität der Evaluierung von Ausgangstexten beim Simultandolmetschen. In N. Grbic & M. Wolf (Eds.), Text – Kultur – Kommunikation. Translation als Forshungsaufgabe (pp. 231–248). Tübingen: Stauffenburg Verlag.
Tiselius, E. 2009. “Revisiting Carroll’s Scales.” In Testing and Assessment in Translation and Interpreting Studies. C. Angelelli and H. Jacobson (eds.). 95-121. ATA Monograph Series. Amsterdam: Benjamins.
Erving Goffman was an antropologist and sociologist who studied social interaction. Among other things, he proposed a model to analyse the distribution of responsibility between interlocutors. Cecilia Wadensjö (1998) uses this model to analyse the role of the interpreter in an interpreter mediated event. An interlocutor has a given role in a communicative context. The roles can be symetric or assymetric depending on the situation. Participants can either be assigned different roles depending on the context or they can take up different roles. The participation framework (Goffman, 1981) gives different participants different status. Anyone who hears an utterance can take on a participant status, but depending on the siutation you can have different production formats. The formats can be those of the animator (the person who conveys either his or her own words or of somebody elses) or the author (somebody who compiles fact or information and makes an utterance but without necessarily being the one who guarantees the correctness of the information in that utterance) and finally the principal (the actor who is fully responsible for an utterance [the fact, the information behind and so forth], you can be the principal both of an utterance regarding your own feelings or something very formal such as the application of a particular law). In order to fully understand the interpreter’s role in the communication Wadensjö adds three reception formats: the reporter (who just reports verbatim what has been said), the recapitulator (who reacapitulates what has been said but in an active listening and understanding act, not just verbatim repeating) and the respondent (who listens in order to respond, to take the communication furter). The interpreter’s role in the communicative context vary, but has to be seen in the light of the reception formats. The interpreter is an animator and sometimes a principal, but the interpreter is first and foremost a recapitulator (hopefully, since we all agree by know that a word-for-word translation is rarely successful) who sometimes step into the role of responder. The interpreter responds and becomes the principal in utterances such as ”Could you please repeat that” or ”The interpreter would like to ask a question”, i.e. situations when the interpreter goes out of his/her role of conveying somebody else’s message and goes into the role of transmitting a message of his or her own.
There an interesting article in New York Times right know. It asks the question if lanugage affects the way you think. The author, Guy Deuthscher, takes his starting point in theSapir Whorf hypothesis (or rather the Whorf, since this is one of the early articles by Whorf he’s referring to). Deutscher claims that time and common sense has proved the concept wrong. He quotes Roman Jakobson who pointed out a crucial fact about differences between languages. Jakobson claimed that:
Languages differ essentially in what they must convey and not in what they may convey.
And on this quote Deutscher reflects further on whether you languages shapes you brain or not. And to me he seems convinced that this is not the case. You CAN describe anything in another langugage, even if the other language lacks that terminology, Deutscher says. And in most cases this is true of course.
But I’m curious that he not once, discusses the findings of Dan Everett, the linguist who mapped Pirahã, a language in the Amazonian jungle. Everett says that since he got to know Pirahã he has started to doubt that lanugages do not shape the way you think. Since the Pirahã language is so fundamentally different from other languages and certain concepts are very difficult to explain to a Pirahã. One feature is that Pirahã do not tag past as other languages do and things in the past is therefore very hard to grasp. If you don’t know for instance a historical person or know somebody who knows that person, then there is no proof that that person actually existed for a Pirahã and therefore no reason to believe such a person ever existed.
So maybe Sapir and Whorf weren’t entirely wrong after all, or…
I’m trying to conquer Communication Theory for the second time round. Teaching a topic is always better than just studying it if you want to really conquer it. I find communication theory very relevant for interpreting which might be the reason for why it’s often taught to first year interpreting students. The only problem, just as for rhetoric is that when you are a first year interpreting student you don’t necessarily understand how useful it is. But here we are anyway.
The lecture is quite heavy, a lot of information to take in in a fairly short time, and mostly theory. And of course people get tired listening attentively for 45 minutes, more than you actually CAN do if I remember my teacher training correctly. I don’t have much to remedy this, but this year I tried to lighten it up at little bit by putting in photos of all the theoreticians I refer to. I don’t know if it really changed anything, but at least I had a great time looking them up. Haven’t you always wanted to know what Ferdinand de Saussure looked like? I teach from a book by Jan Svennevig Språklig Samhandling and the post is only my interpretation of his book in particular and of communication theory in general.
I would say that an interpreter has a very well developed communiative competence. The definition of communicative competence being that we create meaning in utterances in many different communication situations. Dell Hymes was the first one to coin communicative competence in 1966 with the meaning that a person knows how words and structures works in different communicative situations. Communicative competence was a sort of counter balance to Chomsky’s theory of generative grammar. Chomsky is indeed interested in the competence of the language user, but only as a formal competence not how it is used in a communicative situation. The (in)famous theory of generative grammar defined as something all human beings are born with and without which we would not be able to produce language. The generative grammar would also be the reason for language universals. Features that are common to all or most of the world’s languages. The most interesting dispute of generative grammar, I believe, comes from Dan Everett in his book Don’t sleep there are snakes. In his book he gives an account of his life as a linguist and missionary (at first at least) in the Amazonian jungle. He lived with the Pirahã people and gradually learns their language and culture. He claims that there are no language universals to be found in the Pirahã language. Their language is quite simply totally different.
An old received idea on interpreters is that they are invisible transmitters of meaning or message. This is the kind of interpreter I was brought up to be. You leave your feelings outside the booth or the meeting room, you just transmit the message. As I learned the profession I have also come to question this statement more and more. Is it possible for any human being to be perfectly neutral in any situation?
By this I don’t not mean that I, in my role as an interpreter should go in and give personal comments on the message, but what I mean is that I do believe that just the way I am transmitting something affects my neutrality, my choice of tone, voice, tense. Where I chose to start or stop interpret, in a smaller setting, where I cut in to deliver my interpretation.
There is a Swedish researcher, Cecilia Wadensjö, who wrote a book called “Interpreting as Interaction”. She calls interpreting a pas de deux for three. Claudia Angelelli is an American researcher who wrote “Revisiting the Interpreter’s Role”. In the end of that book she has a letter from one of the respondents in the survey she made who says that interpreting is everything but to “just translate what he says”.
You could also ask yourself if your client wants a perfectly neutral interpreter. In some settings my struggle to be neutral can be seen by my client as a strong hint that I am part of the establishment too, rather than the go-between.
I don’t have any answers to this of course but I find the issue more and more fascinating.