This is the method used to systematically review the psychometric properties of the instrument: the COSMIN criteria, Terwee's criteria, EMPRO, or the custom methodology of the Oxford PROMS group.
This is the setting in which the psychometric properties were evaluated. This field also links to the original study in which the systematic ratings of the psychometric data (shown in the diagram to the right) were performed. The data given below are indicative only, and users should refer to the original source when reviewing measures in depth.
Internal consistency evaluates how well the individual items of an outcome measure correlate with each other. The usual quality criterion for internal consistency is Cronbach’s alpha, which can be interpreted as the average of the correlations between all possible split halves of the scale. High internal consistency (>0.8) suggests that the items of the measure are capturing similar aspects. Internal consistency is important if an outcome measure uses multiple items to monitor a single underlying concept. However, if the underlying clinical phenomenon is complex, internal consistency of the whole scale is less relevant, and it may instead be reported for each sub-scale of the instrument.
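As a rough illustration of the internal consistency definition above, the sketch below computes Cronbach's alpha from the item variances and the variance of the total score. The function name cronbach_alpha and the example responses are made up for illustration and do not come from any study referenced here.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Transpose so that each tuple holds all respondents' scores for one item.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)

# Hypothetical data: 5 respondents answering a 3-item scale.
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
    [5, 4, 5],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

With real data, alpha would normally be computed per sub-scale when the instrument covers more than one concept.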
The reliability of an outcome measure refers to whether the measure produces the same or similar results when administered in unchanged conditions. Reliability is important because a reliable measure has less measurement error, that is, less error arising from the process of measurement itself. Providing clear definitions for the scores from an outcome measure helps to make it more reliable. Fewer points on the scale can also improve reliability. Reliability can be assessed via inter-rater reliability (whether similar results are reached when different observers rate the same situation or patient) or test-retest reliability (whether similar results are reached at two distinct points in time in unchanged conditions).
The difference between a measured value of a quantity and its true value.
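To make the link between reliability and measurement error concrete, here is a minimal sketch that estimates test-retest reliability and then derives a standard error of measurement (SEM). It rests on two assumptions that are not part of the text above: a plain Pearson correlation stands in for the reliability coefficient (published reviews often use an intraclass correlation instead), and the SEM is taken as SD × sqrt(1 − reliability). The scores are invented.

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def standard_error_of_measurement(scores, reliability):
    """SEM = SD * sqrt(1 - reliability); an assumed, commonly used formula."""
    return stdev(scores) * sqrt(1 - reliability)

# Hypothetical scores for the same 5 patients at two time points,
# with no change in their condition in between.
test_scores = [10, 14, 8, 12, 16]
retest_scores = [11, 13, 9, 12, 15]

r = pearson_r(test_scores, retest_scores)
sem = standard_error_of_measurement(test_scores, r)
print(round(r, 2), round(sem, 2))
```

The closer the reliability estimate is to 1, the smaller the SEM, which is the sense in which a more reliable measure carries less measurement error.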
Validity is one of the most important aspects of an outcome measure. It refers to what a tool is measuring and whether it is measuring what it should be measuring.
Construct structural validity
Construct validity is the extent to which a measurement corresponds to the theoretical concepts or constructs that it was designed to measure. It can be assessed via statistical evaluations of the structure of the measurement, such as factor analysis. Correlations that fit the expected pattern contribute evidence of construct validity.
Construct hypothesis testing
If no other measure or gold standard exists for comparison, the measure could be linked to a theory or hypothesis in order to show construct validity.
Construct cross cultural validity
To compare results between countries or between different groups, outcome measures need to be translated into other languages by following a formal process, and the translated version must go through the same rigorous validation process as the original measure. Even though this is lengthy and costly, it is an important procedure to ensure that scores are accurate when outcome measures are used and compared.
Criterion validity refers to whether the measure correlates with another instrument that measures similar aspects. Preferably, the other instrument is the ‘gold standard’, meaning it has been validated, and is widely used and accepted in the field.
Responsiveness to change refers to whether the measure can detect clinically important changes over time that are related to the course of the disease or to an intervention, such as symptom management.
The interpretability of an outcome measure refers to whether the results (which are often a number) can be translated into something more meaningful to the patient, the family or the clinician. An interpretable tool should make it possible to answer these questions: What is severe? What is the cut-off point when the outcome measure is used for diagnosis? How many points correspond to a change in symptoms?
Floor ceiling effects
Floor and ceiling effects occur when an outcome measure cannot discriminate between scores below or above a certain level (meaning that it will not detect change beyond that level). This can be a particular problem in conditions with a very wide range of symptoms.
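One simple way to check for floor and ceiling effects is to count how many respondents score at the lowest or highest possible value. The sketch below does this for invented data. The 15% threshold is an assumption used for illustration (a commonly cited rule of thumb), not a value taken from the text above, and the function name is hypothetical.

```python
def floor_ceiling_share(scores, min_score, max_score):
    """Return the share of respondents at the lowest and highest possible score."""
    n = len(scores)
    floor = sum(1 for s in scores if s == min_score) / n
    ceiling = sum(1 for s in scores if s == max_score) / n
    return floor, ceiling

# Assumed rule of thumb: flag an effect when more than 15% sit at an extreme.
THRESHOLD = 0.15

# Hypothetical scores on a 0-10 scale.
scores = [0, 0, 0, 1, 2, 3, 4, 5, 7, 10]
floor, ceiling = floor_ceiling_share(scores, min_score=0, max_score=10)

has_floor_effect = floor > THRESHOLD
has_ceiling_effect = ceiling > THRESHOLD
print(floor, ceiling, has_floor_effect, has_ceiling_effect)
```

In this made-up sample, three of ten respondents sit at the minimum, so the measure could not register any further deterioration for them.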