GOMS

Introducing user modeling with KLM, CMN, NGOMSL, and CPM GOMS

By Karl F. MacDorman on June 1, 2022

Introduction

GOMS stands for goals, operators, methods, and selection rules. It is a family of techniques for analyzing user interfaces by modeling the user's behavior in performing familiar tasks (Hochstein, 2002). Thus, GOMS simulates expert performance.

Goals

Goals are the outcomes the user wants to achieve. High-level goals like writing a research paper break down into subgoals and sub-subgoals, down to low-level goals like deleting a word. Thus, goals have a hierarchy.

Operators

Operators are the actions the user performs. They can be perceptual actions like verifying a setting, cognitive actions like recalling a menu item, or motor actions like making a hand gesture. In GOMS, operators are atomic elements that cannot be broken down further.

The time required for the user to complete an operator is its execution time. It is estimated using a constant, a function of some parameters, or a probability distribution. For example, a pointer movement could be approximated using a constant like 1.1 s or a function of the distance and target size like Fitts' law.

Methods

Methods are the procedures the user follows to achieve a goal. They specify the sequence of operators the user performs to achieve a goal or subgoal. For example, a word could be deleted by double clicking on it and pressing delete or by using the backspace key.

Selection rules

If a goal can be achieved by multiple methods, a selection rule determines which method to use. For example, if the cursor is at the end of the word to be deleted, use the backspace; otherwise, double click on it and press delete.
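A selection rule can be read as a simple conditional. The following Python sketch encodes the rule above; the function name and the operator labels are illustrative assumptions and are not part of any GOMS notation.

```python
def select_delete_word_method(cursor_at_end_of_word: bool) -> list:
    """Return the operator sequence chosen by the selection rule.

    The operator labels are hypothetical names used only for illustration.
    """
    if cursor_at_end_of_word:
        # Method 1: erase the word with the backspace key.
        return ["press-backspace-until-word-is-erased"]
    # Method 2: select the word, then delete it.
    return ["double-click-word", "press-delete"]


print(select_delete_word_method(cursor_at_end_of_word=True))
print(select_delete_word_method(cursor_at_end_of_word=False))
```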

Advantages

No expert data required

GOMS models make a priori predictions. Thus, they can be applied by user experience designers without the need to collect data from actual experts.

Comparing designs by execution times

GOMS models predict an expert user's task completion time, assuming error-free performance. Thus, in comparing designs, GOMS can be used to determine which design is more efficient.

Improving a design

GOMS also enables the designer to profile an interface to identify bottlenecks and determine whether a bottleneck can be eliminated by changing the design.

Functionality coverage

Given a list of user goals, GOMS can be used to verify whether methods exist to complete them.

Help systems

GOMS models support the design of help systems and tutorials because they explicitly represent expert user activity.

Descriptive, predictive, and prescriptive

A GOMS model describes how a user performs tasks, predicts the time the user will require, and can guide the development of interfaces, training programs, and help systems. It can also be used to teach a novice how to achieve goals.

Four versions, two forms

GOMS has four main versions, expressed in either program form or sequence form.

Program form                      Sequence form
Card, Moran, and Newell (CMN)     Keystroke-level model (KLM)
Natural GOMS Language (NGOMSL)    Cognitive-perceptual-motor (CPM)

A GOMS model in program form is similar to a computer program. It describes how to complete a class of tasks with a class instance defined by parameter settings. Its advantage is that all procedural knowledge is visible to the user; however, the model is time-consuming to develop, and it must be executed to determine the sequence of operators.

For a GOMS model in sequence form, methods list the operator sequence for completing an instance of a class of tasks. However, sequence form can only represent those aspects of methods that can be represented by operator sequences.

KLM

KLM is the simplest version of GOMS (Card et al., 1980). The execution time of a method's operator sequence is the sum of each operator's execution time. Perceptual and cognitive times are combined in a single mental preparation operator M at the start of sub-methods. Selection rules are not used.

KLM GOMS, as originally defined, had six kinds of operators with corresponding execution times in seconds:
K – press a key or button, 0.2
P – point at a target on a display by using the mouse, 1.1
H – home hands on the keyboard or mouse, 0.4
D – draw straight-line segments on a grid, 0.9 × number of segments + 0.16 × total length in cm
M – mentally prepare to perform an action or series of related actions, 1.35
R – wait for system response, varies

For keyboard typing, the value 0.2 for K assumes an average skilled typist working at 55 words per minute (wpm). To calculate K from typing speed, use K = 11/wpm.
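As a quick check of this conversion, the sketch below computes K for a few typing speeds. The function name is an assumption made for illustration; only the relation K = 11/wpm comes from the text.

```python
def keystroke_time(wpm: float) -> float:
    """Estimate the KLM K operator (seconds per keystroke) from typing speed."""
    if wpm <= 0:
        raise ValueError("typing speed must be positive")
    return 11.0 / wpm


# 55 wpm reproduces the default value of 0.2 s per keystroke.
for wpm in (40, 55, 90):
    print(f"{wpm} wpm -> K = {keystroke_time(wpm):.3f} s")
```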

An alternative method of calculating the execution time for P is Fitts' law:

execution time = a + b \log_2\left(\frac{D}{W} + 1\right),

where D is the distance to the target, W is the target's width in the direction of movement, and a and b are empirically derived constants.
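A minimal sketch of Fitts' law in its common form, a + b log2(D/W + 1), is given below. The default values of a and b are placeholders chosen only so the example runs; in practice they must be fitted to measurements for a particular device and user population.

```python
import math


def fitts_time(distance: float, width: float, a: float = 0.0, b: float = 0.1) -> float:
    """Pointing time in seconds: a + b * log2(D / W + 1).

    a (seconds) and b (seconds per bit) are placeholder constants,
    not values taken from the article.
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be non-negative and width positive")
    return a + b * math.log2(distance / width + 1)


# D = 7 and W = 1 give an index of difficulty of log2(8) = 3 bits.
print(f"{fitts_time(distance=7.0, width=1.0):.2f} s")
```

Distance and width only need to share the same unit, because the formula depends on their ratio.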

To distinguish mouse or trackpad operations from keystrokes, B may be used:
B – either press or release mouse or trackpad, 0.1
BB – click (press and release) mouse or trackpad, 0.2

It may be convenient to add motor operators for newer gesture-based interfaces:
S – two-finger scroll on a Mac, 1.1

Consider the task of playing Gabriel Fauré's Requiem in D minor (Op. 48) on YouTube. The device is a MacBook Air, and the browser is already open to the website.

Description                                       Operation         Time (s)
Touch the trackpad.                               H[trackpad]       0.4
Move the pointer to the text box labeled Search.  M, P[text box]    2.45 (1.35 + 1.1)
Click the text box.                               BB[text box]      0.2
Type faure requiem.                               M, 13K[text box]  3.95 (1.35 + 13 × 0.2)
Move the pointer to the magnifying glass.         P[button]         1.1
Click the magnifying glass.                       BB[button]        0.2
Wait for search results.                          R[website]        1.0
Move the pointer to the first result.             M, P[image]       2.45 (1.35 + 1.1)
Click the video preview.                          BB[image]         0.2
Wait for the video to appear.                     R[website]        1.0
The video starts to play automatically.           Total             12.95

According to this KLM model, it takes 12.95 s to accomplish this task.

Consider the task of replacing all instances of a four-letter word with another four-letter word in a Microsoft Word for Mac document. The device is an iMac.

Description                                  Operation       Time (s)
Reach for the mouse.                         H[mouse]        0.4
Move the pointer to the Edit menu.           M, P[menu]      2.45 (1.35 + 1.1)
Move the pointer to Find.                    P[menu item]    1.1
Move the pointer to Replace.                 P[menu item]    1.1
Click Replace.                               BB[mouse]       0.2 (0.1 + 0.1)
Return hands to the keyboard.                H[keyboard]     0.4
Type the word to be replaced (4 letters).    M, 4K[word]     2.15 (1.35 + 4 × 0.2)
Reach for the mouse.                         H[mouse]        0.4
Point to the next field.                     P[field]        1.1
Click on the field.                          BB[mouse]       0.2 (0.1 + 0.1)
Return hand to keyboard.                     H[keyboard]     0.4
Type a new word (4 letters).                 M, 4K[word]     2.15 (1.35 + 4 × 0.2)
Reach for the mouse.                         H[mouse]        0.4
Move the pointer to Replace All button.      P[replace-all]  1.1
Click button.                                BB[mouse]       0.2
                                             Total           13.75

It is necessary to prepare mentally for the Replace All command sequence. Therefore, M is added at the start of the sequence, before moving the pointer to the Edit menu. However, M is not added before moving the pointer to the Replace All button because this is just an argument in the same command sequence.

It takes 13.75 s to accomplish this task.
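Because a KLM estimate is just a sum of operator times, it is easy to script. The sketch below totals the replace-all task using the operator times given earlier; the data layout and the function name are assumptions made for illustration.

```python
# Operator times in seconds, as listed earlier in this section.
OPERATOR_TIME = {"K": 0.2, "P": 1.1, "H": 0.4, "M": 1.35, "BB": 0.2}

# The replace-all task as (operator, repeat count) pairs, in table order.
REPLACE_ALL = [
    ("H", 1),             # reach for the mouse
    ("M", 1), ("P", 1),   # prepare, then point to the Edit menu
    ("P", 1),             # point to Find
    ("P", 1),             # point to Replace
    ("BB", 1),            # click Replace
    ("H", 1),             # return hands to the keyboard
    ("M", 1), ("K", 4),   # type the word to be replaced
    ("H", 1),             # reach for the mouse
    ("P", 1),             # point to the next field
    ("BB", 1),            # click on the field
    ("H", 1),             # return hand to keyboard
    ("M", 1), ("K", 4),   # type the new word
    ("H", 1),             # reach for the mouse
    ("P", 1),             # point to the Replace All button
    ("BB", 1),            # click the button
]


def klm_time(sequence, times=OPERATOR_TIME):
    """Sum the execution times of a sequence of KLM operators."""
    return sum(times[op] * count for op, count in sequence)


print(f"{klm_time(REPLACE_ALL):.2f} s")  # prints "13.75 s"
```

Swapping in a different operator sequence for an alternative design gives a quick side-by-side comparison of predicted times.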

KLM GOMS is useful for rapidly comparing different user interface designs.

However, KLM's simplicity has disadvantages. It assumes that tasks have fixed execution times that do not depend on past events. It also does not allow for slips and mistakes. KLM is not suitable for analyzing abstract tasks with conditionals or for decomposing goals into subgoals.

KLM has been applied to CAD/CAM software, mouse-driven text editors, space operations database systems, and workstations for directory-assistance telephone operators (John & Kieras, 1996a).

CMN

CMN is the original version of GOMS developed by Card, Moran, and Newell (1983). Its hierarchical goal and method structure is represented in a program form that can include subgoals, sub-methods, and conditional statements. Mental time is placed in verify operators at the end of sub-methods.

This example models the task of moving text while editing a manuscript. Note the use of subgoals and selection rules, both of which KLM lacks.

GOAL: EDIT-MANUSCRIPT
  GOAL: EDIT-UNIT-TASK, repeat until no more unit tasks
    GOAL: ACQUIRE-UNIT-TASK
      GOAL: GET-NEXT-PAGE, if at end of manuscript page
      GOAL: GET-FROM-MANUSCRIPT
    GOAL: EXECUTE-UNIT-TASK, if a unit task was found
      GOAL: MODIFY-TEXT
        [select: GOAL: MOVE-TEXT*, if text is to be moved
          GOAL: DELETE-PHRASE, if a phrase is to be deleted
          GOAL: INSERT-WORD], if a word is to be inserted
        VERIFY-EDIT

*Expansion of the goal MOVE-TEXT.

GOAL: MOVE-TEXT
  GOAL: CUT-TEXT
    GOAL: HIGHLIGHT-TEXT
      [select**: GOAL: HIGHLIGHT-WORD
        MOVE-CURSOR-TO-WORD
        DOUBLE-CLICK-MOUSE-BUTTON
        VERIFY-HIGHLIGHT
      GOAL: HIGHLIGHT-ARBITRARY-TEXT
        MOVE-CURSOR-TO-BEGINNING              1.10
        CLICK-MOUSE-BUTTON                    0.20
        MOVE-CURSOR-TO-END                    1.10
        SHIFT-CLICK-MOUSE-BUTTON              0.48
        VERIFY-HIGHLIGHT]                     1.35
    GOAL: ISSUE-CUT-COMMAND
      MOVE-CURSOR-TO-EDIT-MENU                1.10
      PRESS-MOUSE-BUTTON                      0.10
      MOVE-CURSOR-TO-CUT-ITEM                 1.10
      VERIFY-HIGHLIGHT                        1.35
      RELEASE-MOUSE-BUTTON                    0.10
  GOAL: PASTE-TEXT
    GOAL: POSITION-CURSOR-AT-INSERTION-POINT
      MOVE-CURSOR-TO-INSERTION-POINT          1.10
      CLICK-MOUSE-BUTTON                      0.20
      VERIFY-POSITION                         1.35
    GOAL: ISSUE-PASTE-COMMAND
      MOVE-CURSOR-TO-EDIT-MENU                1.10
      PRESS-MOUSE-BUTTON                      0.10
      MOVE-MOUSE-TO-PASTE-ITEM                1.10
      VERIFY-HIGHLIGHT                        1.35
      RELEASE-MOUSE-BUTTON                    0.10
TOTAL TIME PREDICTED (S)                      14.38

CMN GOMS estimates it will take 14.38 s to move text.
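To illustrate what "program form" means, the sketch below writes the MOVE-TEXT goal as ordinary Python functions and executes it to obtain both the operator sequence and the total time. It is an assumption-laden illustration, not CMN notation: the function names are invented, the two menu commands share one helper for brevity, and the HIGHLIGHT-WORD branch is left unimplemented because no operator times are listed for it.

```python
def highlight_arbitrary_text():
    return [
        ("MOVE-CURSOR-TO-BEGINNING", 1.10),
        ("CLICK-MOUSE-BUTTON", 0.20),
        ("MOVE-CURSOR-TO-END", 1.10),
        ("SHIFT-CLICK-MOUSE-BUTTON", 0.48),
        ("VERIFY-HIGHLIGHT", 1.35),
    ]


def highlight_text(text_is_single_word):
    # Selection rule: choose a highlighting method based on the text.
    if text_is_single_word:
        raise NotImplementedError("no operator times are listed for HIGHLIGHT-WORD")
    return highlight_arbitrary_text()


def issue_menu_command(item):
    # Hypothetical shared helper; the listing spells out cut and paste separately.
    return [
        ("MOVE-CURSOR-TO-EDIT-MENU", 1.10),
        ("PRESS-MOUSE-BUTTON", 0.10),
        (f"MOVE-CURSOR-TO-{item}-ITEM", 1.10),
        ("VERIFY-HIGHLIGHT", 1.35),
        ("RELEASE-MOUSE-BUTTON", 0.10),
    ]


def position_cursor_at_insertion_point():
    return [
        ("MOVE-CURSOR-TO-INSERTION-POINT", 1.10),
        ("CLICK-MOUSE-BUTTON", 0.20),
        ("VERIFY-POSITION", 1.35),
    ]


def move_text(text_is_single_word=False):
    cut = highlight_text(text_is_single_word) + issue_menu_command("CUT")
    paste = position_cursor_at_insertion_point() + issue_menu_command("PASTE")
    return cut + paste


# Executing the model yields the operator trace and its predicted time.
trace = move_text()
total = sum(seconds for _, seconds in trace)
print(f"{len(trace)} operators, {total:.2f} s")  # prints "18 operators, 14.38 s"
```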

CMN adds subgoals and selection rules to KLM GOMS. Since a CMN model is in program form, its execution predicts the sequence of operators in addition to their total execution time. This makes a CMN model amenable to analysis.

CMN has been applied to a CAD system for ergonomic design, Sun Microsystems' web page, and word processors (John & Kieras, 1996a).

NGOMSL

NGOMSL represents GOMS models using a natural-language notation (John & Kieras, 1996b). Because NGOMSL is an extension of CMN, its models explicitly represent the goal hierarchy.

The procedure for constructing an NGOMSL model is top-down and breadth-first. It begins by expanding the user's top-level goals into methods with subgoals. The subgoals are then expanded until the sub-methods only contain sequences of keystroke-level operators.

More importantly, methods are represented using cognitive complexity theory (CCT). CCT is based on a serial-stage human information processing (HIP) architecture. Working memory activates production rules that are carried out at a fixed rate. These production rules execute internal or external operators. Internal operators are used to update information in working memory, store and access information in long-term memory (LTM), and prepare subgoals. External operators resemble the keystroke-level operators of KLM or CMN.

Because NGOMSL explicitly models internal operators acting on working memory and long-term memory as well as external operators, it can estimate task learning times. Moreover, the demands an interface places on working memory and long-term memory can be assessed.

Below is an example of NGOMSL from John and Kieras (1996b).

Method for goal: Move text
  Step 1. Accomplish goal: Cut text.
  Step 2. Accomplish goal: Paste text.
  Step 3. Return with goal accomplished.

Method for goal: Cut text
  Step 1. Accomplish goal: Highlight text.
  Step 2. Retain that the command is CUT, and
          accomplish goal: Issue a command.
  Step 3. Return with goal accomplished.

Method for goal: Paste text
  Step 1. Accomplish goal: Position cursor at insertion point.
  Step 2. Retain that the command is PASTE,
          and accomplish goal: Issue a command.
  Step 3. Return with goal accomplished.

Selection rule set for goal: Highlight text
    If text-is word, then accomplish goal: Highlight word.
    If text-is arbitrary, then accomplish goal: Highlight arbitrary text.
    Return with goal accomplished.
...
Method for goal: Highlight arbitrary text
  Step 1. Determine position of beginning of text.    1.20
  Step 2. Move cursor to beginning of text.           1.10
  Step 3. Click the mouse button.                     0.20
  Step 4. Determine position of end of text.          0.00 (already known)
  Step 5. Move cursor to end of text.                 1.10
  Step 6. Shift-click mouse button.                   0.48
  Step 7. Verify that correct text is highlighted.    1.20
  Step 8. Return with goal accomplished.

This NGOMSL model predicts that the sub-method Highlight arbitrary text will take 5.28 s.

All menu commands use the same sub-method Issue a command in this example. Because this sub-method is used by both the Cut text and Paste text sub-methods, their similarity is made explicit by NGOMSL notation. The model assumes the user has already learned which command appears on which menu, so working memory only needs to store the command's name and its menu's name.

Method for goal: Issue a command
  Step 1. Recall the command name, recall from LTM 
          its menu name, and retain the menu name.
  Step 2. Recall the menu name and move the cursor     1.10
          to it on the menu bar.
  Step 3. Press the mouse button down.                 0.10
  Step 4. Recall the command name and move the         1.10
          cursor to it.
  Step 5. Recall the command name and verify that      1.20
          it is selected.
  Step 6. Release the mouse button.                    0.10
  Step 7. Forget menu name, forget command name, 
          and return with the goal accomplished.

execution time = 1.10 + 0.10 + 1.10 + 1.20 + 0.10 = 3.6 s

The NGOMSL model predicts that the sub-method Issue a command will take 3.6 s.

The method learning time is linearly proportional to the number of statements that must be learned. It can be estimated by counting them and multiplying the number by 17 s, an empirically determined constant.

method learning time = 17 s × number of statements to be learned

17 s × 7 statements comprising Issue a command = 119 s

The LTM item learning time is the time required to learn the chunks of information required by the sub-method, such as the name of the menu Edit under which the commands Cut and Paste appear. These chunks are called LTM items because they are stored in long-term memory.

LTM item learning time = 6 s × number of LTM chunks to be learned

6 s × 2 chunks = 12 s

The pure learning time is the time required to learn how to perform the sub-method and to learn all the LTM items it uses.

pure learning time = method learning time + LTM item learning time

119 s + 12 s = 131 s

The total time required to learn the sub-method Issue a command is its total execution time plus the additional pure learning time required to learn how to perform it.

total learning time = pure learning time + execution time

131 s + 3.6 s = 134.6 s
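The learning-time arithmetic above can be collected into a few lines of Python. This is a sketch under the figures quoted in the text (17 s per statement, 6 s per LTM chunk); the function and constant names are assumptions made for illustration.

```python
SECONDS_PER_STATEMENT = 17.0  # method learning time per NGOMSL statement
SECONDS_PER_LTM_CHUNK = 6.0   # learning time per long-term memory item


def pure_learning_time(n_statements, n_ltm_chunks):
    """Method learning time plus LTM item learning time, in seconds."""
    return SECONDS_PER_STATEMENT * n_statements + SECONDS_PER_LTM_CHUNK * n_ltm_chunks


def total_learning_time(n_statements, n_ltm_chunks, execution_time):
    """Pure learning time plus one execution of the method, in seconds."""
    return pure_learning_time(n_statements, n_ltm_chunks) + execution_time


# Issue a command: 7 statements, 2 LTM chunks, and the timed steps listed above.
issue_command_step_times = [1.10, 0.10, 1.10, 1.20, 0.10]
execution = sum(issue_command_step_times)

print(f"execution time:      {execution:.1f} s")
print(f"pure learning time:  {pure_learning_time(7, 2):.1f} s")
print(f"total learning time: {total_learning_time(7, 2, execution):.1f} s")
```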

Learning transfers from task A to task B if the NGOMSL statements used in task B have already been learned for task A. Transfer effects can be calculated by deducting the number of statements in task B's methods that are the same as previously learned statements in task A's methods.

CPM

CPM stands for cognitive–perceptual–motor because it can model these three kinds of activities (John & Kieras, 1996b). However, its main strength is that it can model them in parallel. This separates CPM from KLM, CMN, and NGOMSL, which only perform operators serially. Parallel modeling enables CPM to estimate expert task execution times more accurately than other GOMS techniques. However, it also requires operators that are more granular. A single KLM, CMN, or NGOMSL operator will typically break down into several CPM operators.

CPM uses an HIP architecture called the model human processor (MHP; Card et al., 1983). MHP has an eye movement cycle of 30 ms, a cognitive cycle of 50 ms, and a perceptual cycle of a duration determined by the complexity of the stimulus (e.g., 100 ms for a binary visual signal, 290 ms for a six-letter word).

CPM could also be called the critical path method because it uses operator execution times along a critical path to estimate task execution time. CPM employs a schedule (or PERT) chart to represent operators and their dependencies. A box represents each operator, and lines represent dependencies between operators. The number directly above the box is the operator's execution time in milliseconds.

Operators are executed from left to right; thus, an operator is executed once its dependencies on the left have completed. Operators on the same row in the schedule chart use the same HIP resource and must be serialized. These resources include visual perception, cognition, the motor control of eye movements, and the motor control of the right hand.

The critical path through a schedule chart is the sequence of dependent operators with the longest total duration. It can be determined by subtotaling prior execution times.

Figure 1 shows a schedule chart that implements the goal read screen when an eye movement is required (John & Kieras, 1996b). Information cannot be perceived until the eye movement has been completed, so a line is drawn from the box eye movement to the box perceive information.

Figure 1. A CPM model implements the goal read screen if the eye is not focused on the word. Directly above each operator is its execution time. The top number is the subtotal of execution times from prior dependencies. The thick lines indicate the critical path.

The cognitive operators attend to information, initiate eye movement, and verify information each take 50 ms to complete, which sums to 150 ms. However, this is not the critical path. A longer path includes the 30-ms eye movement and 290-ms perceive information—the time required to perceive a six-letter word.

The critical path is determined by subtotaling execution times prior to an operator, always using the larger subtotal. If we consider verify information, for example, the subtotal from initiate eye movement is 100 ms (50 + 50) but the subtotal from perceive information is 420 ms (130 + 290), so 420 is used. The critical path is 420 plus the 50-ms verify information operator for a total of 470 ms. We can confirm these results by summing operator execution times along the critical path:

critical path = 50 + 50 + 30 + 290 + 50 = 470 ms
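The subtotaling procedure is a longest-path computation over the dependency graph. The sketch below applies it to the read-screen example. The dependency edges are inferred from the description in the text, since the figure itself is not reproduced here, and the function names are assumptions made for illustration.

```python
from functools import lru_cache

# Operator durations in milliseconds.
DURATION_MS = {
    "attend to information": 50,
    "initiate eye movement": 50,
    "eye movement": 30,
    "perceive information": 290,
    "verify information": 50,
}

# Each operator maps to the operators that must finish before it can start.
DEPENDS_ON = {
    "attend to information": [],
    "initiate eye movement": ["attend to information"],
    "eye movement": ["initiate eye movement"],
    "perceive information": ["eye movement"],
    "verify information": ["initiate eye movement", "perceive information"],
}


@lru_cache(maxsize=None)
def finish_time(operator):
    """Earliest finish: the largest subtotal among dependencies plus own duration."""
    start = max((finish_time(dep) for dep in DEPENDS_ON[operator]), default=0)
    return start + DURATION_MS[operator]


def critical_path(last_operator):
    """Walk back from the last operator, always following the latest-finishing dependency."""
    path = [last_operator]
    while DEPENDS_ON[path[-1]]:
        path.append(max(DEPENDS_ON[path[-1]], key=finish_time))
    return path[::-1]


print(finish_time("verify information"))  # prints 470
print(" -> ".join(critical_path("verify information")))
```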

In the next example, the goal move cursor to beginning of text is achieved by perceptual, cognitive, and motor operators, several of which are performed in parallel. Note that for other GOMS techniques, these nine operators would be represented by a single operator.

Figure 2. A CPM model for the goal move cursor to beginning of text. The critical path, indicated by the thick lines, is a sequence of three cognitive operators, a motor operator, a perceptual operator, and another cognitive operator.

Eye movements are faster than hand movements, so although the eyes start to perceive the target cursor location before the hand has started to move the cursor, they must wait for the arrival of the cursor to finish perceiving the cursor location. Thus, the operators initiate cursor movement and move cursor are on the critical path but operators eye movement and perceive information are not.

As with the earlier example, the critical path can be determined by subtotaling the execution times from prior dependencies and, if there is more than one value, selecting the larger one. This leads to a task execution time of 780 ms, which may be confirmed by summing operator execution times along the critical path and comparing them with those summed along other paths.

critical path = 50 + 50 + 50 + 480 + 100 + 50 = 780 ms

In a telephone operator application, CPM was the only GOMS technique able to determine why a new interface was slower than the original. The reason was that the new interface had changed the order of an operation so that it fell on the critical path.

Techniques compared

Purpose

KLM can only predict task execution times. CMN can generate a sequence of operators for any task instance and predict its task execution time. NGOMSL can also generate any possible operator sequence and predict its execution time. In addition, NGOMSL can predict learning times. CPM can predict task execution times for overlapping cognitive, perceptual, and motor operators.

Unobserved operations

Unobserved operations include perceiving information, verifying a position, or making an eye movement. KLM combines all these operations into a single 1.35-s mental preparation operator M that occurs at the beginning of an operator sequence. CMN instead uses a 1.35-s verify operator at the end of an operator sequence. NGOMSL uses determine position and verify, both of which are 1.20 s. It has no mental preparation operator. CPM represents unobserved operators in terms of a 30-ms eye movement cycle, a 50-ms cognitive cycle, and a variable perception cycle.

User expertise

KLM, CMN, and NGOMSL model the expert user performing operations in series. CPM models an expert user performing operations in parallel, executing tasks as rapidly as the MHP allows. Because CPM assumes this extreme level of expertise, its predicted task execution times tend to be shorter than those estimated by the other three techniques.

Expertise of the analyst

KLM is the simplest to implement, CMN and NGOMSL are somewhat comparable, and CPM is the most difficult.

Limitations

Effectiveness and satisfaction are not modeled

The usability of an interface may be evaluated on three broad dimensions: efficiency, effectiveness, and satisfaction. Efficiency includes measures like task execution time and movement accuracy. Effectiveness includes measures like completion rate or error rate. Satisfaction includes measures of the user's feelings and attitudes about the interface. GOMS focuses on the first dimension, although NGOMSL and CPM, which include an HIP architecture, could be extended to model aspects of effectiveness and satisfaction (e.g., mental workload).

Novice performance and individual and group differences are not modeled

GOMS models an expert user's error-free performance of routine tasks. Thus, GOMS cannot model the performance of novice users who commonly make errors, though NGOMSL can estimate their task learning times. Intermediate and expert users will still sometimes make errors, but GOMS does not model these varying levels of proficiency.

GOMS does not generally account for individual and group differences, though the K operator in KLM can be adjusted for typing speed. For example, age, visual acuity, and disability can all affect execution times. This is a deficiency given that universal usability is one of the aims of HCI research.

Non-goal-directed and collaborative activity are not modeled

GOMS characterizes human–computer interaction as solitary and goal directed. Its omission of group work was an impetus for the development of a fifth GOMS variation, sociotechnical GOMS (SGOMS). Goal-directedness may not apply to certain kinds of tasks and problem-solving approaches.

References

Hochstein, Lorin. (2002). GOMS. Theories in computer–human interaction. University of Maryland, College Park.

Card, Stuart, Moran, Thomas P., & Newell, Allen. (1980). The keystroke-level model for user performance time with interactive systems. Communications of the ACM, 23(7), 396–410.

Card, Stuart, Moran, Thomas P., & Newell, Allen. (1983). The psychology of human–computer interaction. Hillsdale, NJ: Lawrence Erlbaum Associates.

John, Bonnie, & Kieras, David E. (1996a). Using GOMS for user interface design and evaluation: Which technique? ACM Transactions on Computer–Human Interaction, 3(4), 287–319.

John, Bonnie, & Kieras, David E. (1996b). The GOMS family of user interface analysis techniques: Comparison and contrast. ACM Transactions on Computer–Human Interaction, 3(4), 320–351.