VALIDITY, RELIABILITY AND PRACTICALITY OF THE FIRST CERTIFICATION IN ENGLISH (FCE) AND THE BUSINESS LANGUAGE TESTING SERVICE (BULATS)
DOI:
https://doi.org/10.24256/ideas.v6i2.520
Keywords:
evaluation, validity, reliability, practicality, first certification in English, business language testing service
Abstract
This paper begins with the test specifications of two tests: the First Certification in English (FCE) and the Business Language Testing Service (BULATS). It then evaluates the usefulness of each test in terms of reliability, (construct) validity, backwash, and practicality (Bachman & Palmer, 1996; see Kunnan, 2004 for a slightly different perspective). The specifications are examined at the outset because a test should be evaluated against its purpose and the construct that it is trying to measure (Luoma, 2004). The evaluation starts with test (score) reliability, since a test cannot be considered valid if it is not reliable (Brown, 1996; but see Moss, 1994 for cases in which a test could be valid without reliability). Throughout this paper, the term “test(ing)” is used more or less synonymously with “assess(ment)” and “measure(ment)”, because Bachman and Palmer point out that in the field of language testing these terms have been very broadly defined “as the process of collecting information” to make decisions (2010, p. 20). (See Bachman, 1990; Cohen & Swerdlik, 2010; Douglas, 2010 for the distinctions, e.g., a test is a tool for assessment.)
References
Alderson, J. C., & Wall, D. (1993). Does washback exist? Applied Linguistics, 14(1), 115–129.
Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation. Cambridge: Cambridge University Press.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice: Designing and developing useful language tests. Oxford: Oxford University Press.
Bachman, L. F., & Palmer, A. S. (2010). Language assessment in practice. Oxford: Oxford University Press.
Bonk, W. J., & Ockey, G. J. (2003). A many-facet Rasch analysis of the second language group oral discussion task. Language Testing, 20(1), 89–110.
Brown, J. D. (1990). Short-cut estimators of criterion-referenced test consistency. Language Testing, 7(1), 77-97.
Brown, J. D. (1995). The elements of language curriculum: A systematic approach to program design. Boston: Heinle & Heinle.
Brown, J. D. (1996). Testing in language programs. Upper Saddle River, NJ: Prentice Hall Regents.
Brown, J. D. (2005). Testing in language programs: A comprehensive guide to English language assessment. New York, NY: McGraw Hill College.
Brown, J. D., & Hudson, T. (2002). Criterion-referenced language testing. Cambridge: Cambridge University Press.
Carr, N. T. (2011). Designing and analyzing language tests. Oxford: Oxford University Press.
Chambers, L., & Ingham, K. (2011). The BULATS online speaking test. Research Notes, 43(1), 21–25.
Chapelle, C. (1998). Construct definition and validity inquiry in SLA research. In L. Bachman & A. Cohen (Eds.), Interfaces between second language acquisition and language testing research (pp. 32–70). Cambridge: Cambridge University Press.
Cohen, R. J., & Swerdlik, M. E. (2010). Psychological testing and assessment: An introduction to tests and measurement (7th ed.). New York: McGraw-Hill.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302.
Davies, A. (1990). Principles of language testing. Oxford: Blackwell.
Davies, A. (2003). Three heresies of language testing research. Language Testing, 20(4), 355–368.
Davies, A., Brown, A., Elder, C., & Hill, K. (1999). Dictionary of language testing. Cambridge: Cambridge University Press.
Douglas, D. (2010). Understanding language testing. New York: Routledge.
Ennis, R. H. (1999). Test reliability: a practical exemplification of ordinary language philosophy. In R. Curren (ed.), Philosophy of education. Urbana, IL: The Philosophy of Education Society, 242–48.
Fulcher, G. (2003). Testing second language speaking. London, UK: Pearson-Longman.
Galaczi, E. D. (2008). Peer–peer interaction in a speaking test: The case of the First Certificate in English examination. Language Assessment Quarterly, 5(2), 89-119.
Green, A. (2014). Exploring language assessment and testing: language in action. New York: Routledge.
Hackett, E. (2002). Revising the BULATS standard test. Research Notes, 8(1), 7–10.
Hughes, A. (2003). Testing for language teachers. Cambridge: Cambridge University Press.
Jones, N. (2000). BULATS: A case study comparing computer-based and paper-and-pencil tests. Research Notes, 3, 10–13.
Kunnan, A. J. (2004). Test fairness. In M. Milanovic and C. Weir (Eds.), European language testing in a global context. Cambridge: Cambridge University Press.
Lado, R. (1961). Language testing. London: Longman.
Luoma, S. (2004). Assessing speaking. Cambridge: Cambridge University Press.
McNamara, T. (1996). Measuring Second Language Performance. London: Longman.
McNamara, T. F. & Roever, C. (2006). Language testing: The social dimension. Oxford: Blackwell.
Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (pp. 13–103). New York: Macmillan.
Moss, P. A. (1994). Can there be validity without reliability? Educational Researcher, 23(2), 5–12.
Newman, I., Newman, C., Brown, R., & McNeely, S. (2006). Conceptual statistics for beginners (3rd ed.). Lanham, MD: University Press of America.
Orr, M. (2002). The FCE speaking test: Using rater reports to help interpret test scores. System, 30(2), 143-154.
O'Sullivan, B., Weir, C. J., & Saville, N. (2002). Using observation checklists to validate speaking-test tasks. Language Testing, 19(1), 33–56.
Shepard, L. A. (1993). Evaluating test validity. Review of Research in Education, 19(1), 405–450.
Thorndike, R. L., & Hagen, E. P. (1977). Measurement and evaluation in psychology and education. New York: Wiley.
UCLES (University of Cambridge Local Examinations Syndicate). (2015). Cambridge English First: Handbook for Teachers. Cambridge: University of Cambridge Local Examinations Syndicate.
UCLES (University of Cambridge Local Examinations Syndicate). (2011). BULATS Business Language Testing Service: Information for Candidates. Cambridge: University of Cambridge Local Examinations Syndicate.
UCLES (University of Cambridge Local Examinations Syndicate). (2013). Principles of good practice: Quality management and validation in language assessment: Validity, reliability, impact, practicality. Cambridge: University of Cambridge Local Examinations Syndicate.
Wall, D. (1997). Impact and washback in language testing. In C. Clapham & D. Corson (Eds.), Encyclopedia of language and education (pp. 291–302).
Weigle, S. C. (2002). Assessing writing. Cambridge: Cambridge University Press.
Weir, C. J. (2005). Language testing and validation: An evidence-based approach. Houndmills, Hampshire: Palgrave Macmillan.
Wiliam, D. (1993). Validity, dependability and reliability in national curriculum assessment. The Curriculum Journal, 4(3), 335-350.
License
Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under an Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see The Effect of Open Access).