Meta | callieriek

Nova, an AI-powered smart assistant for data center engineers.

My role: As a product designer, I developed design principles for the system, designed the final documentation review flow and interface, and led user testing sessions.

Team: 2 designers, 2 researchers, 1 product manager

Timeline: 8 months

Client: Meta

Project Overview

Problem

Humans are a crucial part of the data center ecosystem, but their knowledge is getting lost.

This is a problem for two reasons:

Cost: Lost knowledge means it takes engineers longer to solve issues, making failures more expensive.

Scalability: Lost knowledge makes it more difficult to scale data centers to meet customer demand.

Opportunity

How can we facilitate knowledge sharing among engineers in a way that allows them to learn from each other and perform at a higher level?

Process at a glance

Solution

Meet Nova, an LLM-powered, multi-modal smart assistant.

Using Nova is like sitting next to a helpful coworker. The system can provide tutorials, answer questions, and document an engineer's work all within Meta's existing ticketing system.

Nova is...

Research

Methods

We conducted mixed methods research to uncover the challenges, constraints, and goals of data center engineers.

Some of these methods included:

Activity-based design workshops with stakeholders (leaders in the data center space) to better understand the knowledge sharing behaviors of engineers.

Interviews and directed storytelling with data center engineers to learn about pain points, sentiments, and mental models related to knowledge sharing.

Product walkthroughs of various documentation tools used at Meta to better understand the current state and identify gaps.

Site visits to other data centers to familiarize ourselves with the environment and understand situational factors such as noise, physical space, and other human factors.

Conducting a directed storytelling session with engineers.

Visiting the H5 data center in Pittsburgh.

Insights

Why is knowledge sharing so difficult?

1. Most of the valuable knowledge in data centers resides in informal systems like messages and in-person conversations. Informal systems are convenient to add to, but difficult to search and reference.

Formal systems, such as tutorials, are easier to access and search, but require a lot more work to add to. Therefore, a much smaller portion of knowledge resides in formal channels.

2. Even when knowledge is captured, it is difficult to retrieve because it is spread across many platforms, both formal and informal. As a result, captured knowledge often goes unused.

3. Engineers are actually incentivized to skip documentation due to time pressure and the high amount of friction in the formal documentation process.

Engineers told us that they will always prioritize completing tasks over documenting them.

Design opportunity

What if there was an easy-to-use, centralized platform for engineers to capture and retrieve information?

Design Rationale

Exploration

We rapidly prototyped many different ideas to determine what form the solution should take.

These included dramatically different solutions like policy interventions, conversational user interfaces, and an AI chatbot.

This exploration also forced me to reevaluate my assumptions.

For example, an AR headset initially seemed like a promising solution. However, testing revealed that users found it more distracting than helpful, and discussions with stakeholders led my team and me to conclude that the financial investment would not be worth the benefits.

Sketch of what an engineer might look like using AR glasses in a data center

A participant tries out our rudimentary AR glasses prototype

Developing Design Principles

Findings from our prototypes helped us develop a set of design principles to guide our ultimate solution.

Design Principle #1: The system must be flexible

Conducting research in analogous domains like healthcare showed us that multi-modal input methods greatly reduce the amount of friction in documentation.

Our research with users also told us there may be times when data center engineers need to use both hands for their work, or times when they would prefer to see instructions on a screen. A seamless transition between voice interface and screen was key.

Design Principle #2: The information provided must be digestible & relevant

Through user testing we learned that participants completed tasks on average 16% faster when presented with a shorter set of instructions because it was easier for them to understand. In other words, shorter content is more digestible.

Additionally, we found that presenting users with relevant, actionable information at the right time enabled them to act more quickly. It also built trust between the user and the system.

Design Principle #3: The system must have as little friction as possible

In order to reduce friction, we needed to find a way for the system to take on most of the work related to documenting.

After speaking with AI experts at Carnegie Mellon, we identified LLMs as a potential solution because of their ability to capture large amounts of information and then summarize it into distinct steps. Automatic capture and summary of information removes much of the friction from the documentation process.

We verified this assumption in user testing, where we found it was nearly 5x faster for participants to document with the help of ChatGPT than from scratch.

Prototypes

We wanted to create a system that felt like sitting next to a helpful coworker who provides tips and takes notes as you complete a task. We decided a multi-modal smart assistant integrated into Meta's ticketing system was the best way to achieve this.

The system is powered by an LLM, which synthesizes information from many different platforms into concise, actionable steps.

Because it is constantly taking in new information, the smart assistant can evolve in real time with the existing knowledge pool.

A simplified sketch of how the system sources and summarizes information

We imagined a flexible system that can accommodate engineers whether they are working at their desk or performing hands-on tasks in the data hall.

Iteration 1: The basics of a smart assistant built into the ticketing system

Iteration 2: Adding structure to documentation

After initial testing, I realized there were some gaps in the documentation.

To provide better error handling and create more thorough documentation, we introduced a final review and closing questions. This gives engineers an opportunity to refine the documentation: they must complete the review and answer a few final questions before closing the ticket.

Documentation review:

Closing questions:

Final design

Meet Nova

When engineers open a new ticket, Nova automatically surfaces relevant documentation and offers troubleshooting methods.

How Nova Supports Design Principles

Flexibility: In the final version, a dynamic user input panel indicates to users that they can choose voice or text interaction based on their environment or workflow.

Before:

After:

Digestibility & Relevancy: In addition to surfacing relevant documentation and troubleshooting methods upon opening, we also revised the current step display to be clearer and more actionable.

Before:

After:

Lack of friction: I redesigned the final documentation review flow to highlight system-recommended changes, further reducing the cognitive load required to create documentation.

In order to build trust with users, I redesigned the closing questions to offer more transparency into the length of this section of documentation.

Before:

After: Questions are presented all at once so users can see how long the section is; a progress bar offers additional transparency into the review process.

Impact & Final Thoughts

Impact

In user studies we determined that Nova can reduce task resolution time by 16%.

Less time spent searching through documentation and consulting others leads to significant time and cost savings.

Nova also brings value to users on the individual level by alleviating cognitive load, reducing fatigue, and supporting self-directed learning and personal growth.

Final Thoughts

This project truly forced me to let go of my assumptions and practice user-centered design. There were many moments when I thought I knew what the solution should be, only to hear users say something different.

One thing I feel we were not able to address was user motivation. If I were to continue working on this, the first thing I would do is conduct more user testing to understand how to better incentivize engineers to document their knowledge through Nova, and how the system could provide greater value to them at the individual level.

My team and me at Meta's headquarters in California, where we presented our work to clients, stakeholders, and users.

