The Oxford International Standardized Assessments report test-takers’ achievement in three dimensions: subject (e.g. Science), domain (e.g. Physics) and subdomain (e.g. Forces, Motion and Pressure).

At subject and domain level, your results are reported in terms of standardized scores. At subdomain level, test-takers’ scores are assigned to a performance band.

Together, these results let you:

  • Compare the performance of each student, class, year group, and school to the typical performance achieved across all test-takers who completed the assessments.
  • Understand the level of proficiency each student demonstrated in each of the subdomains assessed.

Standardized Scores

A standardized score is a way to compare a student’s performance to the average scores attained by other students that completed the same assessments internationally.

In standardized scoring, a score of 100 represents the average (mean) score attained by all students in the Oxford International Standardized Assessments (OISA) benchmarking cohort. Scores above 100 represent better-than-average performance, and vice versa.

Based on a cohort average of 100.

  • Typical range: 70—130
    • Below 84: Below average
    • 85–94: Low average
    • 95–104: Average
    • 105–115: High average
    • 116+: Well above average

Standardized scoring ensures the OISA results allow a comparison of performance across subjects, even where tests in different subjects contain different numbers of marks, or where tests in one subject are harder than in another. The standardized score always represents performance relative to the whole benchmarking cohort who took the assessments.

To calculate results in OISA, test-takers’ marks are totaled, scaled (to ensure a normal distribution of results around the mean), and standardized (mapped to a distribution where the mean score is equated to a score of 100).

The Oxford International Standardized Assessments use a standard deviation of 15, such that 95% of test-takers in a subject or domain will attain a standardized score between 70 and 130:

Standardization is carried out separately for each subject and for the domains that comprise each subject. The subject mean is not simply an average of the means in each domain within the subject. 

Your results reports contain standardized scores for individual test-takers, and average standardized scores in each class/group, year group, and school. 

Performance bands 

At subdomain level, your results report a performance band indicating the degree of proficiency that test-takers have demonstrated for the given subdomain. The available bands are as follows: 

These bands reflect the proportion of available marks each test-taker scored within the subdomain. Because different subdomains have different totals of marks available, raw scores are not reported here. Standardized scores are likewise not reported at subdomain level, to ensure these results provide additional information beyond what is reported at subject and domain level: because performance bands reflect the proportion of available marks scored, they demonstrate the test-taker’s attainment relative to the demand of the test, rather than relative to the performance of other test-takers. For example, on a very easy test a test-taker could achieve a “High” result in many subdomains even while attaining a lower-than-100 standardized score (if most of the cohort received an ‘Advanced’ result across many subdomains). 

Benchmarking data 

The Oxford International Standardized Assessments are standardized through psychometric analysis. The standardization process draws on results from 9,645 OISA tests taken during our 2024-25 benchmarking phase. 

The results data set was screened to ensure only valid test attempts were included in standardization. The following exclusion criteria were applied: 

  • Test attempts from test-takers outside the appropriate age range  
  • Repeated test attempts  
  • Test attempts from test-takers who did not submit one or more mandatory tests within a subject  
  • Test attempts in English from test-takers who also sat tests in ESL, and vice versa  
  • Test attempts containing evidence of potential malpractice 

In some subjects, applying the above exclusion criteria resulted in a smaller standardization cohort, and we advise that caution continues to be used in interpreting results from this pilot phase.  

Confidence intervals 

The following standard error of measurement and test reliability (Cronbach’s alpha) values have been derived following the pilot phase: 

The standard error of measurement can be used to derive a confidence interval around a test-taker’s score, to quantify the statistical uncertainty in the results. An approximate 68% confidence interval is obtained by adding and subtracting the standard error of measurement from the student’s score. 

For example, if a test-taker achieved a standardized score of 125 in the Year 9 English assessment, an approximate confidence interval is 125 +/- 3.6 standardized score points, so we can say that there is a more than 2-to-1 chance the test-taker’s true score lies approximately between 121.4 and 128.6. 

Standardization will be repeated with expanded datasets as more tests are taken in future years. 

Posted in Results