A shortcut in language testing: Predicting the score for paper-based TOEFL based on one sub-score

Samsul Anwar(1), Faisal Mustafa(2*)

(1) Universitas Syiah Kuala
(2) Universitas Syiah Kuala
(*) Corresponding Author

DOI: https://doi.org/10.26858/ijole.v5i3.16200


Using standardized tests such as the paper-based TOEFL, which has three subtests, for classroom assessment is restricted by the length of the test, which usually exceeds the class duration. It is therefore useful to be able to predict the overall score from a single subtest. The current study aimed to calculate prediction coefficients that enable teachers to predict paper-based TOEFL scores by administering only one subtest. The data used to build the prediction models were 2,030 scores from the Institutional TOEFL, i.e., the paper-based TOEFL without the writing subtest. The prediction coefficients were calculated using linear regression analysis. The results show that the listening comprehension sub-score predicts the TOEFL score more accurately (MSE of 520) than the other sub-scores (MSE of 553 and 587). The intercept was 373.07 for listening comprehension, 357.14 for structure & written expression, and 364.19 for reading comprehension; the corresponding slopes were 4.07, 5.96, and 4.63. A listening test should therefore be used when an accurate prediction of the overall TOEFL score is needed.
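
As an illustration of how the reported coefficients might be applied, the following minimal Python sketch computes a predicted total score as intercept + slope × sub-score. The intercepts and slopes are taken from the abstract; the function name, the dictionary keys, and the example input are hypothetical. The abstract does not state the scale of the sub-score the models expect, so the input is assumed to be on the same scale used in the study.

```python
# Intercept and slope for each subtest, as reported in the abstract.
COEFFICIENTS = {
    "listening": (373.07, 4.07),
    "structure": (357.14, 5.96),  # structure & written expression
    "reading": (364.19, 4.63),
}


def predict_total(subtest: str, sub_score: float) -> float:
    """Predict the overall score from one sub-score with a simple linear model.

    Assumption: sub_score is on the same scale as the sub-scores used to fit
    the models in the study (the abstract does not specify that scale).
    """
    intercept, slope = COEFFICIENTS[subtest]
    return intercept + slope * sub_score


if __name__ == "__main__":
    # Hypothetical sub-score, chosen only to show the call.
    print(round(predict_total("listening", 30), 2))
```

Because the reported MSE for the listening model is 520, its root mean squared error is about 22.8 points, which gives a rough sense of the typical prediction error to expect.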


paper-based TOEFL; score prediction; linear regression model; language testing; correlational study




License URL: https://creativecommons.org/





This work is licensed under a Creative Commons Attribution 4.0 International License.