Quality and Accountability

Our Unwavering Commitment to Quality

Our staff are our greatest asset in upholding our commitment to excellence. Through rigorous quality systems that undergo independent checks and adhere to international standards, we ensure accountability and inspire confidence in the Manchester Exam.

Quality Management and Validation in Language Assessment

The Manchester Exam encompasses systems and processes that drive our pursuit of excellence and continuous improvement. While these systems integrate complex research and technology, our underlying philosophy is straightforward:

Validity: Are our exams an authentic measure of real-life English proficiency?

Reliability: Do our exams consistently and fairly evaluate all candidates?

Impact: Do our assessments positively influence teaching and learning?

Practicality: Do our assessments meet learners’ needs within available resources?

Quality: How do we plan, deliver, and verify excellence in all these areas?

Reliability as an Aspect of Exam Quality

Reliability and validity are the two most crucial properties of an examination. It is a general principle that, in any examination context, one must maximize both validity and reliability to yield the most beneficial results for exam users, while adhering to practical constraints.

Manchester Exam holds the view that reliability is an essential component of validity; there can be no validity without reliability. Therefore, any approach to estimating reliability must consider potential sources of evidence for the construct validity of the Exams.

Reliability, typically expressed as a value between 0 and 1, indicates the replicability of exam scores when:

The same exam is administered multiple times to the same group of people.
Two exams constructed in the same manner are given to the same group.
The same performance is evaluated independently by two different examiners.

The expectation is that candidates would receive nearly identical results on all occasions. If candidates' results are consistent across all instances, the exam is deemed reliable; the degree of score consistency is therefore a measure of the exam's reliability.
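As a simple illustration of the first scenario, the sketch below (in Python, with made-up scores) correlates the results of two administrations of the same exam for the same group of candidates; a high correlation suggests replicable results. The data and figures are purely hypothetical and are not drawn from the Manchester Exam.

```python
import numpy as np

# Hypothetical scores for the same ten candidates on two administrations
# of the same exam (the test-retest scenario). Illustrative data only.
first_sitting  = np.array([42, 55, 61, 38, 70, 49, 66, 58, 44, 73])
second_sitting = np.array([44, 53, 63, 40, 68, 51, 65, 60, 43, 71])

# The Pearson correlation between the two sets of scores is one simple
# indicator of how replicable the results are for this group.
reliability_estimate = np.corrcoef(first_sitting, second_sitting)[0, 1]
print(f"Test-retest correlation: {reliability_estimate:.2f}")
```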

Measuring the Reliability of the Manchester Exam

There are various methods to estimate the reliability of an exam. The Manchester Exam consists of two main types of components: Automated Evaluation Modules and Expert Evaluation Modules.

Automated Evaluation Modules

Automated Evaluation Modules do not require human judgment for scoring. These include the reading comprehension, listening comprehension, and use of English sections. The scores for these sub-components are calculated by summing the number of correct responses in each section. Reliability estimates for these sections are calculated using a statistic called Cronbach’s Alpha. The closer the Alpha value is to 1, the more reliable the exam.
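As an illustration of how such an estimate can be computed, the following sketch calculates Cronbach’s Alpha from a matrix of dichotomous item scores (1 = correct, 0 = incorrect). The data are hypothetical and the code is a minimal sketch, not the exam’s actual scoring software.

```python
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Cronbach's Alpha for a candidates-by-items matrix of 0/1 scores."""
    n_items = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = item_scores.sum(axis=1).var(ddof=1)  # variance of candidates' totals
    return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical responses from six candidates to five items.
responses = np.array([
    [1, 1, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1],
    [1, 1, 1, 0, 0],
])
print(f"Alpha: {cronbach_alpha(responses):.2f}")
```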

Expert Evaluation Modules

Expert Evaluation Modules include the writing and speaking sections, which are typically marked by one human grader. However, a selection of responses is also marked by a second or third grader. This sample of responses marked by multiple examiners is used to estimate reliability for writing using a statistic called Gwet’s AC2, which measures inter-rater reliability.
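For illustration only, the sketch below follows the published two-rater form of Gwet’s coefficient with quadratic weights and no missing ratings. The band scale, examiner marks, and choice of quadratic weighting are assumptions made for the example, not details taken from the Manchester Exam’s own procedures.

```python
import numpy as np

def gwet_ac2(rater_a, rater_b, categories):
    """Gwet's AC2 for two raters on an ordinal scale, using quadratic weights.
    Simplified sketch: exactly two raters and no missing ratings."""
    q = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    a = np.array([index[r] for r in rater_a])
    b = np.array([index[r] for r in rater_b])

    # Quadratic agreement weights: 1 on the diagonal, decreasing with distance.
    k, l = np.meshgrid(np.arange(q), np.arange(q), indexing="ij")
    weights = 1.0 - ((k - l) / (q - 1)) ** 2

    # Observed weighted agreement across all rated responses.
    p_a = weights[a, b].mean()

    # Chance agreement based on the average of the two raters' category proportions.
    pi = (np.bincount(a, minlength=q) + np.bincount(b, minlength=q)) / (2 * len(a))
    t_w = weights.sum()
    p_e = (t_w / (q * (q - 1))) * np.sum(pi * (1 - pi))

    return (p_a - p_e) / (1 - p_e)

# Hypothetical band scores (0-5) awarded by two examiners to ten scripts.
examiner_1 = [3, 4, 2, 5, 3, 4, 1, 3, 4, 2]
examiner_2 = [3, 4, 3, 5, 2, 4, 1, 3, 5, 2]
print(f"AC2: {gwet_ac2(examiner_1, examiner_2, categories=list(range(6))):.2f}")
```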

For speaking, the Feldt Reliability Test is applied. This method estimates reliability when the exam score is the sum of scores given by two graders or judges. It is used for the Manchester Exam speaking sections, which employ a paired format in which two Oral Examiners assess the candidates' performance.
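Several closely related “Feldt” formulations exist; the sketch below uses one widely cited two-part version (often referred to as the Angoff-Feldt coefficient) for a total score formed as the sum of two examiners’ marks. Whether this is the exact variant applied to the Manchester Exam is an assumption made for illustration, and the marks shown are hypothetical.

```python
import numpy as np

def angoff_feldt(part_1: np.ndarray, part_2: np.ndarray) -> float:
    """Angoff-Feldt reliability for a total score formed as the sum of two parts
    (here, the marks of two Oral Examiners). Sketch for illustration only."""
    total = part_1 + part_2
    cov = np.cov(part_1, part_2, ddof=1)[0, 1]
    var_total = total.var(ddof=1)
    adjustment = (part_1.var(ddof=1) - part_2.var(ddof=1)) ** 2 / var_total
    return 4 * cov / (var_total - adjustment)

# Hypothetical marks from two Oral Examiners for eight candidates.
examiner_1 = np.array([12, 15, 9, 18, 14, 11, 16, 13])
examiner_2 = np.array([13, 14, 10, 17, 15, 10, 16, 12])
print(f"Feldt-type reliability: {angoff_feldt(examiner_1, examiner_2):.2f}")
```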

All these statistics are reported on a scale from 0 to 1, like the Alpha used for Automated Evaluation Modules.

Overall Reliability of the Manchester Exam

Scores from the subsections of the Manchester Exam are reported on the final official exam score report. These sub-section scores are used to calculate a candidate’s overall exam score, which determines the candidate’s grade and, where relevant, their CEFR level. While measuring the reliability of each subsection is important, the reliability of the overall score is of paramount importance to candidates and exam users.

To calculate the reliability of the overall score, we use the Measurement Error Standard Deviation (MESD) from the sub-sections, along with the standard deviation of the overall scores. This approach ensures a comprehensive and accurate assessment of the exam’s reliability, providing valuable insights for all stakeholders.
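A simplified sketch of this calculation is given below, under the standard classical-test-theory assumption that measurement errors in different sub-sections are independent. The section weights, MESD values, and overall standard deviation are illustrative assumptions rather than the Manchester Exam’s actual figures.

```python
import numpy as np

# Hypothetical per-section figures: the MESD of each sub-section and the weight
# each contributes to the overall score. Illustrative values only.
section_mesd    = np.array([2.1, 1.8, 2.4, 2.0])    # reading, listening, writing, speaking
section_weights = np.array([0.25, 0.25, 0.25, 0.25])
overall_sd      = 12.0                               # standard deviation of overall scores

# Assuming independent measurement errors across sections, the error variance
# of the overall score is the weighted sum of the sections' error variances.
overall_error_variance = np.sum((section_weights * section_mesd) ** 2)

# Reliability is the share of overall score variance not due to measurement error.
overall_reliability = 1 - overall_error_variance / overall_sd ** 2
print(f"Overall reliability: {overall_reliability:.3f}")
```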

Measurement Error Standard Deviation (MESD)

Measurement Error Standard Deviation (MESD) is not a separate approach to estimating reliability, but rather an alternative way of reporting it. Language testing is influenced by numerous factors unrelated to the ability being measured, which contribute to what is known as 'measurement error'. The MESD expresses reliability on the scale of the exam scores themselves.

While reliability pertains to a group of exam takers, the MESD shows what reliability means for an individual's likely score. It indicates how close an exam taker’s observed score is likely to be to their 'true score', with a specified probability. For example, if a candidate receives an overall score of 320 with an MESD of 2.5, there is a high probability that their true score lies between 316 and 324. This information is invaluable for exam users in their decision-making processes.
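The MESD is related to reliability through the classical relationship MESD = SD × √(1 − reliability). The sketch below reproduces an interval like the 320 example above as an approximate 90% range; the standard deviation and reliability values used are illustrative assumptions, not the exam’s published figures.

```python
import numpy as np

def mesd(score_sd: float, reliability: float) -> float:
    """Measurement Error Standard Deviation (also known as the standard error of
    measurement): the spread of observed scores around a candidate's true score."""
    return score_sd * np.sqrt(1 - reliability)

# Illustrative figures only: these are not the Manchester Exam's actual values.
overall_sd = 12.0
overall_reliability = 0.957
error_sd = mesd(overall_sd, overall_reliability)

# An approximate 90% interval for the true score of a candidate who scored 320.
observed_score = 320
z_90 = 1.645
low, high = observed_score - z_90 * error_sd, observed_score + z_90 * error_sd
print(f"MESD = {error_sd:.2f}; true score likely between {low:.0f} and {high:.0f}")
```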