In the last few weeks I have been thinking long and hard about the assessment of the productive skills (speaking and writing), dissatisfied as I am with the proficiency measurement schemes currently in use in many UK schools, which are either stuck in the former system (National Curriculum Levels) or strongly influenced by it (i.e. mainly tense-driven).
However, finding a more effective, valid and cost-effective alternative for use in British secondary schools is no easy task. The biggest obstacles I have found in the process relate to the following questions, which have been buzzing in my head for the last few weeks and the answers to which are crucial to the development of any effective approach to the assessment of proficiency in the productive skills:
- What is meant by ‘fluency’ and how do we measure it?
- How do we measure accuracy?
- What do we mean by ‘complexity’ of language? How can complexity be measured?
- How do we assess vocabulary richness and/or range? How wide should L2 learners’ vocabulary range be at different proficiency stages?
- What does it mean to ‘acquire’ a specific grammar structure or lexical item?
- When can one say that a specific vocabulary or grammar item has been fully acquired?
- What linguistic competences should teachers prioritize in the assessment of learner proficiency? Or should they all be weighted in the same way?
- What task-types should be used to assess learners’ speaking and writing proficiency?
- How often should we assess speaking and writing?
- Should we assess autonomous learning strategy use? If so, how?
All of the above questions refer to constructs commonly used in the multi-trait scales usually adopted by researchers, language education providers and examination boards to assess L2 performance and proficiency. In this post, for reasons of space, I will only concern myself with the first three questions, reserving the rest for future posts. The issues they refer to are usually acronymized by scholars as CAF (Complexity, Accuracy, Fluency), but I find the acronym FAC (Fluency, Accuracy, Complexity) much more memorable… Thus I will deviate from mainstream Applied Linguistics on this account.
2. The issues
2.1 What do we mean by ‘fluency’ in speaking and writing? And how do we measure it?
Fluency has been defined as ‘the production of language in real time without undue pausing or hesitation’ (Ellis and Barkhuizen 2005: 139) or, in the words of Lennon (1990), as ‘an impression on the listeners that the psycholinguistic processes of speech planning and speech production are functioning easily and automatically’. Although many, including teachers, use the term ‘fluency’ as synonymous with overall oral proficiency, researchers see it more as a temporal phenomenon (i.e. how effortlessly and ‘fast’ language is produced). In L2 research, fluency is considered a different construct from comprehensibility, although from a teacher’s point of view it is obviously desirable that fluent speech be intelligible.
The complexity of the concept of ‘fluency’ stems mainly from its being a multidimensional construct. Fluency is in fact conceptualized as:
- Break-down fluency – which relates to how often speakers pause;
- Repair fluency – which relates to how often speakers repeat words and self-correct;
- Speed fluency – which refers to the rate of speaker delivery.
Researchers have come up with various measures of fluency. The most commonly adopted are:
- Speech rate: total number of syllables divided by total time taken to execute the oral task in hand;
- Mean length of run: average number of syllables produced in utterances between short pauses;
- Phonation/time ratio: time spent speaking divided by the total time taken to execute the oral task;
- Articulation rate (rate of sound production): total number of syllables divided by the time taken to produce them;
- Average length of pauses.
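These temporal measures are straightforward to compute once an oral sample has been timed and its syllables counted. Here is a minimal sketch in Python; the function and variable names are my own, for illustration only, and the mean-length-of-run calculation is a rough approximation (syllables divided by the number of runs between pauses):

```python
def fluency_metrics(syllables, speaking_time, total_time, pauses):
    """Compute common temporal fluency measures from a timed oral sample.

    syllables     -- total syllables produced during the task
    speaking_time -- seconds actually spent phonating (speaking)
    total_time    -- total seconds taken to complete the task
    pauses        -- list of individual pause lengths, in seconds
    """
    runs = len(pauses) + 1  # stretches of speech between pauses
    return {
        # syllables per second over the whole task, pauses included
        "speech_rate": syllables / total_time,
        # syllables per second while actually speaking
        "articulation_rate": syllables / speaking_time,
        # proportion of task time spent phonating
        "phonation_time_ratio": speaking_time / total_time,
        # rough average syllables per run between pauses
        "mean_length_of_run": syllables / runs,
        "mean_pause_length": sum(pauses) / len(pauses) if pauses else 0.0,
    }

m = fluency_metrics(syllables=180, speaking_time=50.0, total_time=60.0,
                    pauses=[0.8, 1.2, 0.5, 1.5])
print(round(m["speech_rate"], 2))         # 3.0 syllables/second
print(round(m["articulation_rate"], 2))   # 3.6 syllables/second
print(round(m["phonation_time_ratio"], 2))  # 0.83
```

In classroom practice, of course, nobody expects teachers to count syllables by hand; the point of the sketch is simply to show how the five measures relate to one another.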
A seminal study by Towell et al. (1996) investigated university students of French. The subjects were tested at three points in time: (1) at the beginning of their first year; (2) in their second year; and (3) after returning from their year abroad in France. The researchers found that improvements in fluency occurred mainly in terms of speaking rate and mean length of run – the latter being the best indicator of development in fluency. Improvements in fluency were also evidenced by an increase in articulation rate (rate of sound production), but not in a major way. In their investigation, Towell et al. found that assessing fluency based on pauses is not always a valid procedure, because a learner might pause for any of the following reasons:
- The demands posed by a specific task;
- Difficulty in knowing what to say;
- An individual’s personal characteristics;
- Difficulty in putting into words an idea already in the brain;
- Getting the right balance between length of utterance and the linguistic structure of the utterance.
Hence, the practice of rating students’ fluency based on pauses may not be as valid as many teachers often assume. As Lambert puts it: “although speed and pausing measures might provide an indication of automaticity and efficiency in the speech production process with respect to specific forms, their fluctuation is subject to too many variables to reflect development directly.”
When it comes to writing, fluency is much more difficult to define. As Bruton and Kirby (1987) observe,
Written fluency is not easily explained, apparently, even when researchers rely on simple, traditional measures such as composing rate. Yet, when any of these researchers referred to the term fluency, they did so as though the term were already widely understood and not in need of any further explication.
In reviewing the existing literature I was amazed by how much disagreement there is amongst researchers on how to assess writing fluency, which raises the question: if it is such a subjective construct, on whose definition nobody agrees, how can the raters appointed by examination boards be relied on to do an objective job?
There are several approaches to assessing writing fluency. The most commonly used in research is composition rate, i.e. how many words are written per minute. So, for instance, in order to assess the development of fluency a teacher may give his/her class a prompt, stop them after a few minutes and, after giving guidelines on how to carry out the word count, ask the students to count the words in their output. This can be done at different moments in time, within a given unit of work or throughout the academic year, in order to map out the development of writing fluency.
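Composition rate is trivially easy to calculate. A minimal sketch (the function name and the sample sentence are mine, purely for illustration; a naive whitespace split stands in for whatever word-counting guidelines the teacher gives):

```python
def composition_rate(text, minutes):
    """Writing fluency as composition rate: words written per minute."""
    return len(text.split()) / minutes

# A hypothetical 15-word sample produced in a two-minute timed burst.
sample = ("Je vais au cinéma avec mes amis ce soir "
          "parce que nous aimons les films")
print(composition_rate(sample, minutes=2))  # 7.5 words per minute
```

Tracked across the year, the same calculation over repeated timed bursts gives a simple, if crude, progress curve for each student.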
Oral fluency is a hugely important dimension of proficiency as it assesses the extent to which speaking skills have been automatized. A highly fluent learner is one who can speak spontaneously and effortlessly, with hardly any hesitation, backtracking and self-correcting.
Assessing fluency, as I have just discussed, is very problematic, as there is no international consensus on what constitutes best practice. The Common European Framework of Reference for Languages, which is adopted by many academic and professional institutions around the world, provides some useful – but not flawless – guidelines (http://www.coe.int/t/dg4/education/elp/elp-reg/Source/Key_reference/Overview_CEFRscales_EN.pdf). MFL departments could adapt them to suit their learning context, mindful of the main points put across in the previous paragraphs.
The most important implications for teachers are:
- Although we do not have to be as rigorous and pedantic as researchers, when assessing our students’ fluency we may want to be mindful of the finding (confirmed by several studies) that more fluent speakers produce longer utterances between short pauses (greater mean length of run);
- However, we should also be mindful of Towell et al.’s (1996) finding that some individuals pause because of issues not related to fluency but rather to anxiety, working memory limitations or other personal traits. It is important in this respect to get to know our students and to have repeated oral interactions with them, so as to become better acquainted with their modus operandi during oral tasks;
- In the absence of international consensus on how fluency should be measured, MFL departments may want to decide whether and to what extent frequency of self-repair, pauses and speed should be used in the assessment of their learners’ fluency;
- If the GCSE or A-level examination adopted by their school does include degree of fluency as an evaluative criterion – as Edexcel, for instance, does – then it is imperative for teachers to ask which operationalization of fluency is applied in the evaluation of candidates’ output, so as to train students accordingly in preparation for the oral and written exams;
- Although comprehensibility is a separate construct from fluency in research, teachers will want their students to speak and write at a speed as close as possible to native speakers’ while also producing intelligible language. Hence, assessment criteria should combine both constructs;
- Mini-assessments of writing fluency of the kind outlined above (the teacher giving a prompt and students writing under timed conditions) should be conducted regularly, two or three times a term, to map out students’ progress whilst training them to produce language in real operating conditions. If this kind of assessment starts at KS3 or even KS2 (with able groups and ‘easier’ topics), it may have a positive washback effect on learner performance at GCSE and A-level.
3. Accuracy

Intuitively, accuracy would seem the easiest construct to use in assessing language proficiency, but this is not necessarily so. Two common approaches to measuring accuracy involve: (1) calculating the ratio of errors in a text/discourse to the number of units of production (e.g. words, clauses, sentences, T-units); or (2) working out the proportion of error-free units of production. Neither is without problems, because neither tells us much about the type of errors made, which may be crucial in determining the proficiency development of a learner. Imagine Learner 1, who has made ten errors with very advanced structures, and Learner 2, who has made ten errors with very basic structures without attempting any of the advanced structures Learner 1 has made mistakes with. To evaluate these two learners’ levels of accuracy as equivalent would be unfair.
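The two measures can be sketched in a few lines of Python. The function name, the clause-level representation and the sample clauses are mine, for illustration; note how the sketch reproduces exactly the blind spot just discussed, since an error count carries no information about error type or about the difficulty of the structure attempted:

```python
def accuracy_measures(clauses):
    """Two common accuracy measures over (clause_text, error_count) pairs:
    (1) errors per 100 words; (2) proportion of error-free clauses."""
    total_words = sum(len(text.split()) for text, _ in clauses)
    total_errors = sum(errors for _, errors in clauses)
    error_free = sum(1 for _, errors in clauses if errors == 0)
    return {
        "errors_per_100_words": 100 * total_errors / total_words,
        "error_free_clause_ratio": error_free / len(clauses),
    }

# Hypothetical sample: three clauses, one containing an agreement error.
sample = [("je vais au marché", 0),
          ("il ont mangé", 1),       # agreement error: 'il ont'
          ("nous sommes allés", 0)]
m = accuracy_measures(sample)
print(m["errors_per_100_words"])    # 10.0
print(round(m["error_free_clause_ratio"], 2))  # 0.67
```

An error with the subjunctive and an error with a basic article each add exactly one to `error_count`; that is the unfairness the Learner 1 / Learner 2 comparison above points at.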
Moreover, this system may penalize learners who take risks in their output with highly challenging structures. So, for instance, an advanced student who tries out a lot of difficult structures (e.g. if-clauses, subjunctives or complex verbal subordination) may score lower than someone of equivalent proficiency who ‘plays it safe’ and avoids taking risks. Would that be a fair way of assessing task performance/proficiency? Also, pedagogically speaking, this approach would be counter-productive, as it encourages avoidance behavior rather than risk-taking, possibly the most powerful learning strategy of all.
Some scholars propose that errors should be graded in terms of gravity. So, errors that impede comprehension should be considered as more serious than errors which do not. But in terms of accuracy, errors are errors, regardless of their nature. We are dealing with two different constructs here, comprehensibility of output and accuracy of output.
Another problem with using accuracy as a measure of proficiency development is that learner output is compared with native-like norms. However, this does not tell us much about the learner’s interlanguage development; only about the degree of accuracy with which s/he handles specific language items.
Lambert (2014) reports another important issue, pointed out by Bard et al. (1996):
In making grammaticality judgments, raters do not only respond to the grammaticality of sentences, but to other factors which include the estimated frequency with which the structure has been heard, the degree to which an utterance conforms to a prescriptive norm, and the degree to which the structure makes sense to the rater semantically or pragmatically. Such acceptability factors are difficult to separate from grammaticality even for experienced raters.
I am not ashamed to say that I have experienced this myself on several occasions as a rater of GCSE Italian oral exams. And to this day, I find it difficult not to let these three sources of bias skew my judgment.
3.1 Initial implications for teachers and assessment
Grammatical, lexical, phonological and orthographic accuracy are important aspects of proficiency, included in all the examination assessment scales. MFL departments ought to decide collegially whether accuracy should play a role in assessment that is equally important as, or more or less important than, fluency/intelligibility and communication.
Also, once it has been decided what constitute the more complex and the easier structures amongst those the curriculum purports to teach for productive use, teachers may want to focus in assessment mostly or solely on the accuracy of those structures – as this may have a positive washback effect on learning.
MFL teams may also want to discuss to what extent accuracy should be assessed in terms of the number of mistakes, the types of mistakes, or both – and whether mistakes with more complex, normally late-acquired structures should be penalized, considering that such an assessment approach might encourage avoidance behavior.
4. Complexity

Complexity is the most difficult construct to define and to use in assessing proficiency, because it can refer to different aspects of performance and communication (e.g. lexical, interactional, grammatical, syntactic). For instance, are lexical and syntactic complexity two aspects of the same dimension of performance or two different areas altogether? Some researchers (e.g. Skehan) believe they are distinct, and I tend to agree. So, how should a student’s oral or written performance exhibiting complex use of vocabulary but less complex use of grammar structures or syntax be rated? Should evaluative scales then include two complexity traits, one for vocabulary and one for grammar/syntax? I think so.
Another problem pertains to what we take ‘complex’ to actually mean. Does complex mean…
- ‘the number of criteria to be applied in order to arrive at the correct form’, as Hulstijn and De Graaff (1994) posit? In other words, how many steps the application of the underlying rule involves (e.g. the perfect tense in French or Italian with verbs requiring the auxiliary ‘to be’)?
- variety? Meaning that, in the presence of various alternatives, choosing the appropriate one flexibly and accurately across different contexts would be an index of high proficiency (this is especially the case with lexis)?
- cognitively demanding or challenging? Or
- acquired late in the acquisition process (which is not always easy to determine)?
All of the above dimensions of complexity pose serious challenges in their conceptualization and objective application to proficiency measurement.
Standard ways of operationalizing language complexity in L2 research have also focused on syntactic complexity, and especially on verbal subordination. In other words, researchers have analyzed L2 learner output by dividing the total number of finite and non-finite clauses by the number of sentential units of analysis, such as terminal units (T-units), communication units, and so on. One of the problems with this is that the resulting number only tells us that one learner has used more verbal subordination than another; it does not differentiate between types of subordination – so, if a learner uses fewer but more complex subordinate clauses than another, s/he will still be rated as using less complex language.
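The calculation itself is trivial, which is part of its appeal and part of its weakness. A minimal sketch (the function name and the two hypothetical learners are mine, for illustration; each list entry is the clause count for one T-unit):

```python
def subordination_ratio(t_units):
    """Clauses per T-unit: total finite and non-finite clauses
    divided by the number of T-units."""
    return sum(t_units) / len(t_units)

# Learner A: four T-units, mostly single-clause, one with an embedded clause.
# Learner B: two T-units, each with multiple (possibly very simple) embeddings.
print(subordination_ratio([1, 1, 2, 1]))  # 1.25 clauses per T-unit
print(subordination_ratio([3, 2]))        # 2.5 clauses per T-unit
```

Learner B scores twice as high, yet the ratio says nothing about whether B’s subordinate clauses were challenging relative clauses or formulaic ‘parce que’ add-ons; that is exactly the limitation discussed above.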
4.1 Implications for teachers
Complexity is a very desirable quality of learner output and a marker of progress in proficiency, especially when it goes hand in hand with high levels of fluency. However, in the absence of consensus as to what is complex and what is not, MFL departments may want to decide collegially which of the criteria suggested above (e.g. variety, cognitive challenge, number of steps required to arrive at the correct form, lateness of acquisition) they find most suitable for their learning context, curricular goals and constraints.
Also, they may want to consider splitting this construct into two strands, vocabulary complexity and grammatical complexity.
Finally, verbal subordination should be treated as a marker of complexity and emphasized with our learners. However, especially with more advanced learners (e.g. AS and A2), it may be useful to agree on what constitutes more advanced and less advanced subordination.
In addition, since complexity of language does appear as an evaluative criterion in A-level examination assessment scales, teachers may want to query with the examination boards what complexity stands for and demand a detailed list of which grammar structures are considered as more or less complex.
Fluency, Accuracy and Complexity are very important constructs, central to all approaches to the assessment of the two productive macro-skills, speaking and writing. In the absence of international consensus on how to define and measure them, MFL departments must come together and discuss assessment philosophies, procedures and strategies to ensure that learner proficiency evaluation is as fair and valid as possible and matches the learning context they operate in. In taking such decisions, the washback effect on learning has to be considered.
Since I have dealt with only three of the ten issues outlined at the beginning of this post, the picture is far from complete. What is clear is that there are no established norms as yet, unless one decides to adopt in toto an existing assessment framework such as the CEFR’s (http://www.coe.int/t/dg4/education/elp/elp-reg/Source/Key_reference/Overview_CEFRscales_EN.pdf). This means that MFL departments have the opportunity to set their own norms based on an informed understanding – to which I hope this post has contributed – of the FAC constructs and of the other crucial dimensions of L2 performance and proficiency assessment that I will deal with in future posts.