Testing

We’ve been working on wireframes for a banking app since the beginning of Q2. To test the wireframes last quarter, we went out into the community to run the Think Aloud Protocol. In this method of user testing, we present people we’ve just met with a prototype of the wireframes and ask them to talk out loud while they complete a task. They simply tell us what they’re doing as they’re doing it, and we use what we learn to find problems with the prototype.

Over the last few weeks, we’ve been using new methods: Cognitive Walkthrough and Heuristic Evaluation. Both methods rely on designers or experts looking at a design (in this case, the banking app wireframes) to find instances where the design does not match an expectation of usability.

Cognitive Walkthrough focuses on learnability: how will a first-time user learn the app? To determine this, I looked at a flow (a series of screens that represent a task from beginning to end) with a few classmates. The method is simple. At each step, you ask a series of questions and note any issues:

  1. Will the user try to achieve the right effect?
  2. Will the user notice that the correct action is available?
  3. Will the user associate the correct action with the effect they are trying to achieve?
  4. If the correct action is performed, will the user see that progress is being made towards their goal?

That was a lot of words to remember, so I focused on:

  1. Intent (Is it clear what the intent of this flow or screen is?)
  2. Visibility (Do I see the option I need?)
  3. Association (Do I associate the thing I need to do with the options available?)
  4. Feedback (Can I tell what’s happening? Where am I?)
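
For anyone who likes to keep their notes structured, here’s a minimal Python sketch of how those four questions could be logged per screen while walking a flow. The screen name and issues below are hypothetical illustrations, not findings from my actual wireframes.

```python
from dataclasses import dataclass, field

# The four distilled Cognitive Walkthrough questions.
QUESTIONS = ("intent", "visibility", "association", "feedback")


@dataclass
class ScreenWalkthrough:
    """Answers to the four questions for one screen in a flow."""
    screen: str
    answers: dict = field(default_factory=dict)  # question -> (passed, note)

    def ask(self, question: str, passed: bool, note: str = "") -> None:
        if question not in QUESTIONS:
            raise ValueError(f"unknown question: {question}")
        self.answers[question] = (passed, note)

    def issues(self):
        """Return the questions this screen failed, with their notes."""
        return [(q, note) for q, (ok, note) in self.answers.items() if not ok]


# Hypothetical walkthrough of one screen in the budgeting flow.
step = ScreenWalkthrough("budget-07")
step.ask("intent", True)
step.ask("visibility", False, "The set-goal action is buried in the menu")
step.ask("association", True)
step.ask("feedback", False, "No confirmation after the goal is saved")
print(step.issues())
```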

Heuristic Evaluation was the second method we learned. This time, my classmates and I looked at individual screens to see how they compared against a set of heuristics (accepted usability principles):

  1. Visibility of system status
  2. Match between system and the real world
  3. User control and freedom
  4. Consistency and standards
  5. Error prevention
  6. Recognition rather than recall
  7. Flexibility and efficiency of use
  8. Aesthetic and minimalist design
  9. Help users recognize, diagnose and recover from errors
  10. Help and documentation

Distilling these down, I recognized that they are all ways to make sure experienced users can understand and predict how the app works. Do users have control? Are the screens and actions consistent? Is the system efficient, and does it let the user work efficiently?
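
To make the checklist concrete, here’s a minimal sketch of how a single Heuristic Evaluation finding could be recorded against the ten heuristics above. The screen and note are hypothetical, for illustration only.

```python
from dataclasses import dataclass

# The ten usability heuristics, as listed above.
HEURISTICS = (
    "Visibility of system status",
    "Match between system and the real world",
    "User control and freedom",
    "Consistency and standards",
    "Error prevention",
    "Recognition rather than recall",
    "Flexibility and efficiency of use",
    "Aesthetic and minimalist design",
    "Help users recognize, diagnose and recover from errors",
    "Help and documentation",
)


@dataclass
class Finding:
    """One heuristic-evaluation finding on one screen."""
    screen: str
    heuristic: str
    note: str

    def __post_init__(self):
        if self.heuristic not in HEURISTICS:
            raise ValueError(f"not one of the ten heuristics: {self.heuristic}")


# Hypothetical finding from the budgeting flow.
finding = Finding(
    screen="budget-08",
    heuristic="Recognition rather than recall",
    note="The savings-goal button doesn't say what setting a goal actually does.",
)
```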

Through the Heuristic Evaluation, I realized that each heuristic is not equally important on every screen. In my budgeting flow, I was prioritizing standards, visibility, and minimal language at the expense of recognition and documentation. What does this thing do?

[Budgeting wireframes: budget-08, budget-07]

My last iteration relied on a curious user who wanted to click around and see what this feature did. That could be a pretty risky thing to do in a banking app. I, for one, would like to understand the ramifications of my button-clicking before I click. If I set this savings goal, am I locked out of this money? Where does it go?

I’m early in the process of reworking my budgeting feature to give users more actionable information. That means this feature will have more instruction available than other features in the app, but that makes sense: most other features rely on one-time user actions. Depositing a check, for example, only happens when you have a check to deposit. It’s easier for a user to work through or start over when they’re working on something short, and there’s nothing to monitor or keep up with once the check has cleared. Budgeting, by contrast, is a long-term interaction that requires users to actively keep up with their goals.

You may notice how different this Accounts screen looks from the one above. For one thing, it says “Accounts”; for another, it abandons the hamburger (side) menu in favor of a tab bar at the bottom. I never had a problem during Think Aloud testing with anyone using the menu. What I realized when going through these new methods was that part of the reason I wasn’t having an issue is that I was using the navigation bar at the top in a way that is entirely inconsistent with the way it actually works in apps. The hamburger menu (those three lines in the top left corner) is not always visible; it gets replaced by a back button when a user moves into a flow. In my wireframes it never disappeared, so it was easier to use simply because it was always visible. What’s easier to use than an always-visible hamburger? Always-visible actions. The new tab bar is easier because it shows users all of the actions they can take within the app.

I also became very aware of the inconsistencies in the app, and of how they were affecting the association the user would have with an action. In the screens below, it’s hard to see what the correct action is. On the right, we’ve got two different types of buttons that essentially do the same thing. If I want to create an alert about my balance, my eye might be drawn to “Balance on Checking…” rather than to the small dark box that will actually take me to the correct screen.
[Wireframe: settings-01]

New draft, based on new findings from the evaluation:
[Wireframe: settings-10]

In reflecting on my experience using all three evaluation methods, I keep coming back to a mantra of writer and humorist John Hodgman: “Specificity is the soul of narrative.” I found that watching potential users test the app was great at letting me know that there was something weird about… something on a page. User testing let me outline what wasn’t working, but it was shadowy, imprecise, and in need of a fair amount of interpretation on my end. I had to come back to the wireframes and decide what to change and why. Cognitive Walkthrough and Heuristic Evaluation both yielded far more actionable feedback. They gave me specific problems, and told me why they were problems. I was able to see why people were getting frustrated with any particular screen, and I now have more of an internal map for making and backing up design decisions.

One part of both Cognitive Walkthrough and Heuristic Evaluation felt weak to me: rating the severity and frequency of problems. I felt like I was guessing each time, and I noticed my guesses getting less severe the more screens I looked at. In the future, I would like to substantiate the severity and frequency of problems with user testing. I think a large part of the shadowy feedback I was getting from users could be reinterpreted through the lens of “importance”: how irritating or limiting is this issue, and how often does it happen?
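
One rough way to make those ratings less of a guess would be to score each problem on severity and frequency (informed by what I actually see in user testing) and fold them into a single priority number. A minimal sketch, with made-up problems and scores:

```python
def priority(severity: int, frequency: int) -> int:
    """Combine 1-4 severity and 1-4 frequency ratings into one priority score.

    Severity: how irritating or limiting the issue is (4 = blocks the task).
    Frequency: how often users run into it (4 = nearly every session).
    """
    if not (1 <= severity <= 4 and 1 <= frequency <= 4):
        raise ValueError("ratings must be between 1 and 4")
    return severity * frequency  # 1 (ignore for now) .. 16 (fix first)


# Hypothetical problems, scored from what users actually ran into.
problems = {
    "Two different alert buttons that do the same thing": priority(severity=2, frequency=4),
    "No explanation of what a savings goal locks up": priority(severity=4, frequency=2),
}
for name, score in sorted(problems.items(), key=lambda kv: -kv[1]):
    print(score, name)
```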

I’m excited to use the two approaches together. Learning these new methods, I feel like I have a ruler where before I had a stick. They can give me a baseline of predictability and usability that strengthens my work and my understanding of it, and Think Aloud testing can push my ideas into something more natural.

Evaluation PDF

System with problem areas highlighted