Novice–Expert+

Measures for quantitative usability testing

Usability testing evaluates designs, prototypes, or finished products with users. Its methods divide into qualitative methods, such as interviews, focus groups, and experience diaries, and quantitative methods.

Quantitative methods divide into three main groups, listed below with examples of corresponding measures:

Efficiency
- Deviation from the optimal path
- Task completion time
- Time until event
Effectiveness
- Binary task completion
- Completeness
- Error rate
- Outcome quality
- Recall
- Spatial accuracy
Satisfaction
- Scales
  - Intensity
  - Likert
  - Semantic differential
- Indices: an unweighted or weighted sum or average of scales
- Coding
  - Labeling statements during an interview as favorable, neutral, or unfavorable and then counting the number of statements under each label.
  - Counting facial expressions labeled as affectively positive, neutral, or negative.
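The two computed satisfaction measures in the list above, indices and coding counts, can be sketched in a few lines. This is only an illustration: the function name `index_score`, the sample scale scores, the weights, and the interview labels are all hypothetical and not taken from any study.

```python
from collections import Counter


def index_score(scale_scores, weights=None):
    """Return the weighted average of several scale scores.

    With weights=None, every scale counts equally (an unweighted average).
    """
    if weights is None:
        weights = [1.0] * len(scale_scores)
    if len(weights) != len(scale_scores):
        raise ValueError("need one weight per scale score")
    return sum(w * s for w, s in zip(weights, scale_scores)) / sum(weights)


# Hypothetical responses on three 5-point scales.
scores = [4, 2, 3]
unweighted = index_score(scores)           # 3.0
weighted = index_score(scores, [2, 1, 1])  # 3.25

# Coding: count interview statements under each (hypothetical) label.
labels = ["favorable", "neutral", "favorable", "unfavorable"]
counts = Counter(labels)
```

Here `counts["favorable"]` is 2, and the weighted index is higher than the unweighted one because the first scale is counted twice.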

Novice–expert ratio

As a usability measure of efficiency, the novice–expert ratio method (NEM) belongs to the first group. It is relatively granular because it measures the time a novice takes to complete each step in a task and divides it by the time it takes an expert to complete the same step.

$\text{NE ratio} = \frac{\text{novice time}}{\text{expert time}}$

Typically, the novice time is an average across a group of novices so that it represents novice users in general rather than a particular user. Averaging time-on-step performance across experts is less critical because expert performance approaches human performance limits and thus has less variance.
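As a minimal sketch of this calculation, the code below averages each step's time across participants in each group and then divides the novice mean by the expert mean. The function names and the timing data are hypothetical and chosen only for illustration.

```python
from statistics import mean


def mean_step_times(participants):
    """Average each step's time across participants.

    participants: one list per participant, holding one time in
    seconds for each step of the task, in step order.
    """
    return [mean(step_times) for step_times in zip(*participants)]


def ne_ratios(novices, experts):
    """Return the NE ratio for each step: mean novice time / mean expert time."""
    novice_means = mean_step_times(novices)
    expert_means = mean_step_times(experts)
    return [n / e for n, e in zip(novice_means, expert_means)]


# Hypothetical data: two novices and two experts on a three-step task.
novices = [[4.0, 12.0, 6.0], [6.0, 18.0, 10.0]]
experts = [[2.0, 5.0, 4.0], [2.0, 5.0, 4.0]]

ratios = ne_ratios(novices, experts)
print(ratios)  # [2.5, 3.0, 2.0]
```

In this made-up data the second step has the highest ratio, so it would be the first candidate for closer inspection.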

The figure below shows the results of applying NEM to a car navigation system. The novice group consisted of 30 novices, and the expert group consisted of 4 experts. The orange line indicates the average time in seconds required by the novices to complete a step, and the blue line indicates the time required by the experts.

Both novices and experts took the most time to choose the destination shown (step 6). However, the NE ratio for this step was below average, indicating that this might not be the step most in need of improvement. The figure shows four other steps with above-average NE ratios (steps 7, 8, 12, and 13). Usability specialists identified all five steps (6, 7, 8, 12, and 13) as having usability issues.
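The comparison against the average ratio can be sketched as follows. The step times are invented for illustration and are not the car navigation data. They are chosen so that the slowest step has a below-average NE ratio, mirroring the pattern described for step 6.

```python
from statistics import mean

# Hypothetical mean times in seconds per step: (novice, expert).
step_times = {1: (4.0, 2.0), 2: (30.0, 20.0), 3: (9.0, 2.0)}

# NE ratio per step.
ratios = {step: novice / expert for step, (novice, expert) in step_times.items()}

# Flag steps whose ratio exceeds the average ratio across all steps.
average_ratio = mean(ratios.values())
flagged = [step for step, ratio in ratios.items() if ratio > average_ratio]

# The step on which novices spend the most absolute time.
slowest = max(step_times, key=lambda step: step_times[step][0])
```

With these numbers, step 2 takes novices the longest (30 seconds) but has the lowest ratio (1.5), while step 3 is the only one flagged (ratio 4.5). Absolute time and NE ratio can therefore point at different steps.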

Substituting simulated experts, including effectiveness and satisfaction

Employing real experts in a usability study can be expensive, and for a new interface, experts may not exist. In that case, novices must be recruited and trained until they become experts.

An alternative is to simulate expert performance using GOMS. MacDorman and colleagues (2011) used 337 novices and a simulated expert to evaluate a variant of the iTap interface for text entry on a cellphone. The study found that a standard measure of effect size, Hedges's ĝ, gave a marginal improvement in accuracy compared to the NE ratio.
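For readers unfamiliar with Hedges's g, the sketch below shows the standard two-group form: the difference in means divided by the pooled standard deviation, multiplied by a small-sample correction. It is an assumption that this general form is a useful illustration here. The text above does not say how the study computed the statistic against a simulated expert, and the function name and sample data are hypothetical.

```python
import math
from statistics import mean, variance


def hedges_g(group_a, group_b):
    """Standard two-group Hedges's g (bias-corrected standardized mean difference).

    Each group needs at least two values so that a sample variance exists.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    d = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
    # Common approximation of the small-sample bias correction.
    correction = 1 - 3 / (4 * (n_a + n_b) - 9)
    return d * correction


# Hypothetical step times in seconds for two small groups.
g = hedges_g([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

Unlike a plain ratio of means, this measure takes the spread of times within each group into account.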

More importantly, the study found that the number of actions required was more strongly correlated with self-reported usability than the NE ratio was. If users took more steps to complete a task, they were making more errors, which likely caused frustration. A weighted average of the number of actions required and the NE ratio predicted usability better than either measure alone.
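A weighted average of the two measures might look like the sketch below. Several things here are assumptions rather than details from the study: expressing the number of actions as a ratio of novice actions to the expert's actions so that both measures are on a comparable scale, the function name `combined_score`, and the weights. In practice, the weight would be fitted to self-reported usability data.

```python
def combined_score(ne_ratio, action_ratio, action_weight=0.5):
    """Weighted average of the NE ratio and an action-count ratio.

    action_ratio: novice actions / expert actions (an assumed normalization).
    action_weight: share of the score given to action_ratio, from 0 to 1.
    """
    if not 0.0 <= action_weight <= 1.0:
        raise ValueError("action_weight must be between 0 and 1")
    return action_weight * action_ratio + (1.0 - action_weight) * ne_ratio


# Hypothetical step: novices take 3 times as long and use twice as many actions.
equal = combined_score(ne_ratio=3.0, action_ratio=2.0)                # 2.5
action_heavy = combined_score(3.0, 2.0, action_weight=0.75)           # 2.25
```

Setting `action_weight` to 0 or 1 recovers either measure alone, which makes it easy to compare the combined score against each one.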

Conclusion

NE ratio is a measure of efficiency—the time a novice takes to perform each step in a task divided by the time an expert takes. Number of actions taken, which captures error rate, is a measure of effectiveness. Self-reported usability is a measure of satisfaction. All three factors—efficiency, effectiveness, and satisfaction—should be included in user testing to measure usability.

The main benefit of NEM and similar methods is pinpointing areas of concern: steps that novices perform much more slowly than experts. Furthermore, nonspecialists can use these methods to identify specific usability problems. Quantitative evaluations like NEM do not depend on the opinion of a usability specialist, which can simplify communication with engineers, developers, and management.

References

Kurosu, M., Urokohara, H., & Sato, D. (2002). Novice expert ratio method. Usability Professionals Association: Humanizing Design. July 8–12, 2002, Orlando, Florida, USA. https://www.ueyesdesign.co.jp/file/paper/upa2002.pdf

MacDorman, K. F., Whalen, T. J., Ho, C.-C., & Patel, H. (2011). An improved scale for measuring usability from novice and expert performance. International Journal of Human-Computer Interaction, 27(3), 1–23. doi: 10.1080/10447318.2011.540472

Urokohara, H., Tanaka, K., Furuta, K., Honda, M., & Kurosu, M. (2000). NEM: "Novice expert ratio method" A usability evaluation method to generate a new performance measure. In Proceedings of the ACM/SIGCHI Conference on Human Factors in Computing Systems (pp. 185–186). New York: ACM Press.