Licence to drive: the importance of reliability for the validity of the Swedish driving licence test

Abstract:

Background: The Swedish driving licence test is a criterion-referenced test resulting in a pass or fail. It currently consists of two parts: a theory test with 65 multiple-choice items and a practical driving test in which at least 25 minutes are spent driving in traffic. It is a high-stakes test in the sense that the results are used to determine whether the test-taker should be allowed to drive a car without supervision. As the only other requirements for obtaining a licence are a few hours of hazard education (and a short introduction if you intend to drive with a lay instructor), it is important that the test result, in terms of pass or fail, is reliable and valid. If this is not the case, it could have detrimental effects on traffic safety. Examining all relevant aspects is beyond the scope of this licentiate thesis, so I have focused on reliability.

Methods: Reliability was examined for both the theory test and the practical driving test. As these are very different types of tests, the types of reliability examined also differed. To examine the inter-rater reliability of the driving test, 83 examiners were each accompanied by one of five selected supervising examiners for a day of tests. In total, 535 tests were conducted with two examiners assessing the same performance. At the end of the day, the examiners compared notes and tried to determine the reason for any inconsistencies. Both examiners and students also filled in questionnaires with questions about background and preparation. To study decision consistency and decision accuracy in the theory test, three test versions (a total of around 12,000 tests) were examined using methods devised by Subkoviak (Subkoviak, 1976, 1988) and Hanson & Brennan (Brennan, 2004; Hanson & Brennan, 1990).

Results: The results from two research studies concerning reliability were presented.
Study I focused on inter-rater reliability in the driving test; in 93 per cent of cases the examiners made the same assessment. For the tests where their opinions differed, there was no correlation with any of the background variables or other variables examined except for three, which had logical explanations and did not constitute a problem. Although there were cases where the differences were due to different stances on matters of interpretation, the most commonly suggested cause was the examiners' placement in the car (back seat vs. front seat). Although the supervising examiners gave both praise and criticism as to how the test was carried out, the study does not answer the question of whether the tests were equal in terms of composition and difficulty.

In Study II the focus was on decision consistency and decision accuracy in the theory test. Three versions of the theory test were examined and, on the whole, found to be fairly similar in terms of item difficulty and score distribution, but the mean was so close to the cut-score (i.e. the score required to pass) that the pass rate differed somewhat between versions. Agreement coefficients were around .80 for all test versions (between .79 and .82 depending on method). Classification accuracy indicated a .87 probability of a correct classification.

Conclusion: It is important to examine the reliability and validity of the driving licence test, since a misclassification can have serious consequences in terms of traffic safety. In the studies included here, the rate of agreement between examiners is deemed satisfactory.
It would be preferable if the classification consistency and classification accuracy, as estimated by the methods used, were higher for the theory test, given its importance.

While reliability in terms of agreement between raters/examiners, or consistency and accuracy of classification, is routinely examined in other contexts, such as large-scale educational testing, this is not often done for driving licence tests. At the same time, the methods used here can be transferred to contexts where such properties are generally not examined. Collecting information about test-takers and examiners, as in Study I, can provide evidence concerning possible bias.

Examining to what extent decisions are consistent is one important aspect of collecting evidence that test results can be used to draw conclusions about driver competence. Still, regardless of outcome, validation is a process that never ends. There is always reason to examine various aspects and make further improvements. There are also many other relevant aspects to examine. A prerequisite for the validity of the score interpretation of a criterion-referenced test like this one is that the cut-score is appropriate and the content relevant. This should therefore be the subject of further research as the validation process continues.
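As an illustration of the kind of agreement measures discussed in this abstract, the following minimal Python sketch computes the raw proportion of agreement and Cohen's kappa from a 2x2 table of pass/fail decisions, for example two examiners assessing the same drive. It is not the procedure used in the thesis: the function name `agreement_from_table` and all counts are hypothetical, and the Subkoviak and Hanson & Brennan methods cited above estimate consistency from a single test administration, which this sketch does not attempt.

```python
def agreement_from_table(pass_pass, pass_fail, fail_pass, fail_fail):
    """Return (raw agreement p0, Cohen's kappa) for two pass/fail decisions.

    pass_fail counts cases where rater A said pass and rater B said fail;
    fail_pass is the reverse. All arguments are non-negative counts.
    """
    n = pass_pass + pass_fail + fail_pass + fail_fail
    if n == 0:
        raise ValueError("the table must contain at least one case")

    # Observed agreement: both raters reach the same decision.
    p0 = (pass_pass + fail_fail) / n

    # Marginal pass rates for each rater.
    pass_rate_a = (pass_pass + pass_fail) / n
    pass_rate_b = (pass_pass + fail_pass) / n

    # Agreement expected by chance if the two decisions were independent.
    pc = pass_rate_a * pass_rate_b + (1 - pass_rate_a) * (1 - pass_rate_b)

    # Kappa is undefined when chance agreement is already perfect.
    kappa = (p0 - pc) / (1 - pc) if pc < 1 else float("nan")
    return p0, kappa


# Hypothetical counts for illustration only, not data from the studies.
p0, kappa = agreement_from_table(pass_pass=40, pass_fail=5,
                                 fail_pass=5, fail_fail=50)
print(f"raw agreement = {p0:.2f}, kappa = {kappa:.2f}")
```

Raw agreement is easy to interpret but does not correct for agreement that would occur by chance, which is why a chance-corrected coefficient such as kappa is often reported alongside it.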