
LLM UX Evaluation Rubric

At a large tech company using numerous LLMs for business solutions, our UX organization faced the challenge of evaluating these applications from a user-centered perspective. I designed a scalable evaluation rubric, collaborated across multiple teams and product areas, and socialized the framework. I also educated teams on how to apply the rubric in their respective domains.

1. So many LLMs, so little time

How do you respond to such a quick, sudden proliferation of a specific technology? And how do you do so responsibly? This journey started with multiple teams launching LLM-enabled apps without working with UX.

2. Aligning all the people

First, I began connecting across UX disciplines, including UXR, UXD, conversation design, accessibility, and product management.


This involved explaining the context, requesting their expertise and input, and gathering feedback from more than five teams.

3. Getting feedback

It was critical that we involve the key stakeholders and teams. To identify these teams, I socialized the project documents via our internal chat system, email groups, and in select standups. Individuals were assigned sections based on their specialty and given two weeks to complete them.

4. The impact we made

Finally, after gathering input and feedback, we delivered a scalable UX rubric for internal teams to use. The criteria include the unique LLM value-add, voice/tone, accuracy, and more.

5. If I could do it again...

The conceptual model of the rubric was eventually integrated into a company-wide evaluation/UAT tool. This was not the original plan; we had intended to share the "raw" spreadsheet with teams. However, during preliminary testing of the rubric, we found that the number of criteria was overwhelming.


Integrating the rubric content into the UAT tool improved its usability while maintaining the depth of our team's work. If I did it again, I would assume from the start that even the best spreadsheet is not the easiest format to understand and use.