Ask and distract: Data-driven methods for the automatic generation of multiple-choice reading comprehension questions from Swedish texts

Abstract: Multiple-choice questions (MCQs) are widely used for summative assessment in many different subjects. Tasks in this format are particularly appealing because they can be graded swiftly and automatically. However, the process of creating MCQs is far from swift or automatic, and it requires considerable expertise both in the specific subject and in test construction.

This thesis explores methods for automatic MCQ generation for assessing the reading comprehension abilities of second-language learners of Swedish. We lay the foundations for MCQ generation research for Swedish by collecting two datasets of reading comprehension MCQs, and by designing and developing methods for generating whole MCQs or their parts. An important contribution is a set of methods, designed and applied in practice, for the automatic and human evaluation of the generated MCQs.

The best currently available method (as of June 2023) for generating MCQs for assessing reading comprehension in Swedish is ChatGPT, although only around 60% of its generated MCQs were judged acceptable. However, ChatGPT is neither open-source nor free. The best open-source and free-to-use method is the fine-tuned version of SweCTRL-Mini, a foundational model developed as a part of this thesis. Nevertheless, all explored methods are still far from being useful in practice, but the reported results provide a good starting point for future research.