Analysis of Language Testing - Essay Example

Running Head: LANGUAGE TESTING

Language Testing

Introduction

The needs of the modern world have led to an increasing emphasis on oral skills in the teaching of foreign languages. However, though oral skills now figure largely in language teaching programmes at all levels, they are still generally regarded as much more difficult to test than written skills. There are obviously a large number of possible methods of testing oral skills, some of them testing specific areas, e.g. pronunciation, and some of them involving the use of recording equipment. The most common way of testing general oral proficiency, however, is by means of a direct oral test in the form of an interview, in which the examinee has to interact in some way with the examiner or examiners. This form of direct oral testing is known variously as "oral interview", "oral test", or "oral examination". I will refer to it here as "oral examination", as I am dealing mainly with examinations in a university context.

The literature on language testing has identified a number of unsolved problems with oral examinations. Much of the discussion has centred on the issues of validity and reliability, but problems in the practical administration of oral examinations have also received comment.

Reliability, Validity, and Practical Problems Associated with Oral Examinations

The fundamental problems with oral examinations are those of reliability (i.e. the consistency with which different examiners mark the same test, or with which the same examiner marks a test on different occasions) and validity (i.e. whether or not an oral test assesses what it sets out to assess). The reliability of oral examinations has been seen as a serious problem right from the start of research on this topic. Spelberg et al. (2002) report very low correlations, averaging only .41, between the marks of different examiners, although Taguchi (2005) points out that "the nine examiners who marked sixteen candidates [...] in this study did not have marking schemes, were given no training, were unstandardized and were given no criteria for judging candidates' ability", so the discrepancies in their judgements are perhaps not such a surprise. Spelberg (2000) describes the usual ways of testing oral ability as "impressions from memory or haphazard interviews" and writes that "the vast majority of cases [...] are not reliably separated into levels of speaking ability by this approach, because of the complexity of the language and non-language factors involved". Michael (2001) states that for tests based on free conversation "the problems of sampling and reliable scoring are almost insoluble, unless a great deal of time and many standardized expert testers are available". Liying (2004) claims that oral tests "are impressions of the tester about the student's speaking ability rather than accurate objective measures of speaking proficiency". Taguchi (2005) finds that "the fact that 2 examiners are required to rate the OIT [Oral Interview Test] indicates lack of confidence in the rating by one". Bachman (2005) reports that an investigation into oral examination results showed that "different candidates were failed by different examiners while being awarded a Pass or even a Distinction by the majority of the other examiners" and finds that the sex of the candidate was a major factor in the results. Richard (2004) reports a "remarkable between-rater variance" in a study of oral examinations in the German Abitur (Matriculation Examination).
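The reliability coefficients quoted in this discussion are inter-rater correlations: correlations between two examiners' marks for the same candidates, with values near 1 indicating close agreement. The following is a minimal illustrative sketch only; the scores and the small Python helper below are invented for this purpose and are not drawn from any of the studies cited.

from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of marks."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical marks out of 20 awarded by two examiners to the same eight candidates.
examiner_a = [14, 11, 17, 9, 15, 12, 18, 10]
examiner_b = [13, 12, 16, 8, 15, 11, 17, 12]

print(round(pearson(examiner_a, examiner_b), 2))  # a value near 1.0 indicates high agreement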
However, recent detailed studies of the reliability of oral language tests have produced rather more positive results. Much of the research has been on the American Foreign Service Institute (FSI) Oral Interview Test. Reporting on this, Liying (2005) states that "the problems inherent in the system do not include reliability among raters of the same performance", although it is not possible to ascertain to what extent different sets of interviewers succeed in eliciting similar performances from candidates. In a remark in the discussion of this paper, Jones reports that a cross-agency study in three different languages showed inter-rater reliability to be very good (Liying 2005). Brown (2005) studied agreement among raters of the FSI Oral Interview and found a high level of agreement: in all cases correlations between ratings exceeded .82, with an average correlation of .91. Taguchi (2005) reports that "more direct measures of oral language proficiency may be as reliable as less direct but more structured standardized tests". Spelberg (2002) sums up the research on the Oral Interview, stating that "research with the FSI oral testing procedure [...] has indicated that trained judges can render highly reliable evaluations of samples of speech acquired in oral interview settings". Research on other oral tests has produced similarly positive results. In a study of a GCE O level oral test in Italian, Taguchi (2005) reports a very high mark/re-mark coefficient of .91. Dana (2004) found that while the factors speech style and topic significantly affected students' scores in oral tests, the occasion and the interviewer did not. Dana (2004) also found that inter-rater reliability varies according to the type of test: for the oral interview it was .91, for a reporting test .81, for role play .76 and for group discussion .73. Shohamy sums up her views very clearly: "research has repeatedly shown that rater reliability is relatively high". It thus seems fair to say that, while there is no reason for complacency about problems of reliability, the earlier suspicions were probably exaggerated and that with careful preparation and administration quite high reliability ratings can be achieved in the direct testing of oral skills.

Validity (i.e. the degree to which a test assesses what it claims to assess) is usually considered less of a problem, presumably because the face validity of an oral examination is very high. As McNamara (2005) points out, it is difficult to imagine any other way of measuring oral proficiency with a higher level of face validity. Taguchi (2005) also sees face validity as high, but mentions problems with what he calls "content-of-sample validity", by which he means that the sample of language elicited in the test may be quite limited; for instance, examinees may not have the opportunity to ask questions, make requests, etc. Dana (2004) stresses the desirability of further studies to determine the content and construct validity of oral examinations. Taguchi (2005) suggests validating oral examinations by means of a comparison with extended interviews, in which an adequate sampling of situations and language is guaranteed, but detailed studies of this nature have to my knowledge not yet been undertaken. It is interesting to note that surrogate tests of oral proficiency (see below) are sometimes validated by comparison with the results of oral examinations (e.g. Dana 2004, Brown 2004), but until oral examinations themselves have been adequately validated, for instance along the lines suggested by Taguchi, there is a great deal of uncertainty about such a procedure.

Oral examinations have also been criticised for practical reasons. They are said to be laborious, time-consuming, costly and difficult to administer (e.g. Taguchi 2005; Liying 2004; McNamara 2005; Brown 2002), although evidence for these claims in the form of detailed studies has not been presented.

Surrogate Tests of Oral Skills

Dissatisfaction with oral examinations has led to the search for alternative ways of testing oral skills by means of "objective" tests, which are simple to administer and very quick to mark. Such tests work well for receptive skills, and the testing of listening comprehension by means of a recording and a number of objective (e.g. multiple choice) questions is now well established and highly reliable. More recently, ways of examining productive skills by means of pencil-and-paper tests, especially cloze tests, have been studied, e.g. by Liying (2004), Richard (2004) and Brown (2005). These authors all found relatively high correlations between cloze tests and more direct measures of speaking proficiency. Compared to objective tests like multiple choice or cloze, oral examinations are of course time-consuming to administer, but not all oral, or indeed written, skills can be tested by straightforward objective tests. In writing, essay-type questions are very time-consuming to mark, and also pose problems of rater reliability very similar to those of oral examinations. And yet essay-type questions are considered indispensable in judging the written skills of advanced learners.

Whether a short surrogate test or a full, relatively costly direct oral examination is administered will depend crucially on the purpose of the test. Tests can be used for the purposes of prediction, assessment and diagnosis, i.e. for predicting students' future performance by assigning them to groups of varying levels (placement tests), for assessing proficiency or achievement when determining marks at end-of-year or end-of-course examinations, and for diagnosing students' individual weaknesses so that steps may be taken to overcome them. Quick, inexpensive surrogate tests would seem best suited to predictive purposes, as the results of a placement test can be checked against the performance of the students in the groups to which they are allocated, and it is a relatively simple matter for students to be reallocated to more suitable groups if this proves necessary. For assessing proficiency or achievement, however, surrogate tests appear much less suitable. The process of determining marks, especially degree classes at the end of a long course of study, is far too important to be decided on the basis of cost or convenience. What is needed here is the most accurate feasible test. For the purposes of diagnosis, finally, it is hard to see how surrogate written tests could ever be developed into a useful tool. At present, and for the foreseeable future, the only way to diagnose weaknesses in the overall oral use of language is to test it directly.

The effect that tests and examinations have on students should not be forgotten in this discussion. One important effect is to motivate the student to become as proficient as possible in the skills that are to be tested.
Richard (2004) quite rightly points out that if students' oral skills are tested by some indirect means such as a cloze test, the effect may well be to motivate students and teachers alike to give more attention to practising cloze tests and to neglect the practice and teaching of speaking.

A Reassessment of the Practical Problems of Oral Examinations

It thus seems that there are certain purposes for which we have no adequate alternative to a direct oral examination if we wish to assess oral proficiency, and that there are also pedagogical (motivational) reasons for favouring direct testing. With these points in mind, it seems appropriate to re-examine some of the practical aspects of oral examinations. It is only in comparison with "objective" tests that oral examinations appear particularly time-consuming. When compared with essay-type questions, the time involved in administering and marking oral examinations does not appear so excessive. Most oral examinations are short, 15-20 minutes, and as the time needed for preparation is hardly longer than for setting a written examination, and marking time is generally much shorter, it is hard to see why oral examinations should be considered especially time-consuming.

It is generally thought desirable to have two assessors present at oral examinations. This does not necessarily imply a lack of confidence in the rating by one, as Taguchi (2005) claims; rather, it is important so that the two distinct roles of interviewer and assessor do not overtax a single examiner. As Liying (2005) points out, "the examiner testing alone is likely to lose both his skills as an interviewer and his perceptiveness as an observer to a degree that cannot be justified on the grounds of economy". Even though this does add considerably to the cost of administering the test in terms of examiners' time, it means that much of the marking can be carried out while the examination is in progress without loss of reliability, especially when using an analytic marking sheet with clearly defined categories, which in turn means that almost no additional marking time is needed. When comparing the time involved in administering oral and written examinations, it should not be forgotten that some written examinations, especially essays, are also commonly double-marked because of the number of factors involved and doubts about the reliability of marks. These reasons are, of course, very similar to those given in favour of having more than one assessor present at oral examinations. These types of examination are similar in that they involve communicative as well as more strictly linguistic elements, and marks are based on a combination of these elements. It is this factor, as much as any special features of the oral medium, which makes it impossible to replace certain types of examination by objective tests and which also makes it difficult to reduce the time needed to administer such examinations.

Implicit in much of the discussion about the time involved in testing oral skills directly (and indeed about the reliability of oral examinations) is the assumption that the sample produced by the examinee in a typical short oral examination is extremely limited and provides an unsatisfactory basis for assessment. This point is made explicitly, e.g. by Michael (2001) and by Richard (2004): "Examiners are discontented with the limited information a short oral provides".
The only way to enlarge the sample is to have a longer examination, which would make problems with examiners' time more acute. However, one factor which must be taken into account here is the great speed of oral production in comparison to writing.

Conclusion

In terms of quantity, it is thus clearly not justified to single out oral examinations for doubts on whether the sample of language produced by the candidate provides an adequate basis for assessment. The question naturally arises as to how the quality of the oral sample compares to that of the written sample. It is not possible to give precise figures in a comparison of quality, but it is clear that the written sample will show greater structural complexity and variety than the oral sample. This, of course, has nothing to do with the testing situation, but is rather a general difference between spoken and written language. In writing there is a greater need to express connections between sentences clearly, e.g. by the use of subordinating conjunctions, whereas in speech the connections are frequently obvious from the situation. There is also more time for the writer to formulate carefully and pay attention to questions of style (e.g. by avoiding repetitions and varying both vocabulary and syntactic structures), things which are much more difficult in spoken language because of the speed of production. It would therefore be unrealistic to expect the same level of style and syntactic complexity in the oral as in the written sample. What we are testing in an oral examination is the ability of candidates to handle spoken language, and we should not demand that they demonstrate the same structural range in speech as in writing. Other aspects of language, e.g. the vocabulary of a special topic, may actually be easier to test in oral than in written examinations because of the interactive situation, i.e. examiners have the opportunity to ask specific questions and home in on areas in which it seems as though the candidate might have special strengths or weaknesses. However, even on its own, a short oral examination provides a basis of assessment which is smaller, but not dramatically smaller, than that provided by much longer written examinations.

As we have also seen that the time involved in administering a short oral examination is by no means prohibitive in comparison with some types of examination, e.g. essay-type questions, this study indicates that there is no real need to avoid the direct testing of oral skills when the accuracy of the test is the most important consideration (as it is for the purposes of assessment and diagnosis). The use of direct oral examinations for these purposes is supported by motivational considerations. On the other hand, for the purposes of placement tests, where there is a chance to revise the decision on the basis of a student's later performance, approximate measures such as those provided by cloze tests will probably suffice in most cases. Once it is realised, however, that the basis for assessment of oral skills is rather similar to that of written skills, one of the main reasons for undervaluing oral skills in examinations will have disappeared. Confidence in the basis of assessment is one of the factors (along with developments in oral examination techniques and further studies of the validity of oral examinations) which will help to overcome the reluctance to award marks for oral skills in proportion to their importance in the language learning process.

References

Bachman, L.F. (2005) Building and supporting a case for test use. Language Assessment Quarterly 2: 1–34.
Brown, A. (2005) Interviewer variability in language proficiency interviews. Frankfurt: Peter Lang.
Brown, J.D. and Hudson, T. (2002) Criterion-referenced language testing. New York, NY: Cambridge University Press.
Dana R. Ferris and John S. Hedgcock (2004) Teaching ESL Composition: Purpose, Process, and Practice. Lawrence Erlbaum Associates.
Liying Cheng, Yoshinori Watanabe and Andy Curtis (2004) Washback in Language Testing: Research Contexts and Methods. Lawrence Erlbaum Associates.
McNamara, T.F. (2005) 21st century shibboleth: language tests, identity and intergroup conflict. Language Policy 4.4: 1–20.
Michael Byram (2001) Routledge Encyclopedia of Language Teaching and Learning. Routledge.
Richard P. Phelps (2004) Defending Standardized Testing. Lawrence Erlbaum Associates.
Spelberg, H., de Boer, P. and van den Bos, K. (2000) Item type comparisons of language comprehension tests. Language Testing 17: 311–22.
Taguchi, N. (2005) Comprehending implied meaning in English as a foreign language. Modern Language Journal 89.4: 543–62.