MyMandi

Human-GenAI System

HCI RESEARCH

DURATION

6 months

MY ROLE

HCI Researcher

PROJECT CONTEXT

Research product in collaboration with 4 other HCI Researchers and under the guidance of Prof. Anirudha Joshi.

ACHIEVEMENT

Received a Best Paper Honorable Mention at ACM IUI 2023. Link to the research paper.

SUMMARY

Research into writers’ interactions and decision-making processes while engaging with AI text-suggestion systems is still emerging. We set out to shed light on writers’ cognitive processes while interacting with suggestions. We conducted qualitative research to generate actionable insights, and as a result we arrived at design opportunities and a model for building better human-AI suggestion systems.

METHODS

Mixed-Design Qualitative Experiment, User Interviews, Concurrent & Retrospective Think-Aloud Protocols, Thematic Analysis, Brainstorming, High-Fidelity Design

Research Questions

We conducted an in-depth literature review to understand what research had already been done in the field of writer-suggestion interaction. We then formulated the following research questions to contribute to this growing field:

Question-1

How do writers interact with inline next-phrase suggestions, and what governs these interactions?

Question-2

How do suggestions and the subsequent writer-suggestion interactions affect the writing process with suggestions as compared to the writing process without suggestions?

Question-3

How does the degree of misalignment between the writer’s opinion and the model’s bias affect writer-suggestion interactions?

Process

Our research aims not to test or validate particular hypotheses or approaches for giving text suggestions but to collect systematic observations and construct knowledge inductively on the writer’s interactions with text suggestions using a grounded approach. Following are the three key parts of the process:

Apparatus Design

We developed two instances of suggestion systems — one fine-tuned on positive movie reviews and another on negative movie reviews.

Research Activity

We asked participants to watch 2 movies and write 2 movie reviews — one review without suggestions and the other with suggestions.

Analysis

We qualitatively analysed these reviews along with screen recordings of the writers’ writing process.

Apparatus Design

Our interface consists of a text editor capable of providing phrase and word completion suggestions both at the end of the text and in between it. We built three versions of the text editor: one with suggestions powered by a language model trained on an IMDB review corpus with an average rating of 2.5, the second trained on reviews with an average rating of 8.5, and the third without suggestions. The suggestion system takes the last 50 words the user has written and computes a suggestion using the language model running in the back-end.
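
The context-window behaviour described above can be sketched as follows. This is a minimal illustration, not the actual implementation; `build_prompt`, `StubModel`, and `get_suggestion` are names assumed for this sketch:

```python
def build_prompt(text: str, window: int = 50) -> str:
    """Keep only the last `window` words the writer has typed
    as context for the language model."""
    words = text.split()
    return " ".join(words[-window:])


class StubModel:
    """Stand-in for the fine-tuned language model in the back-end."""

    def complete(self, prompt: str) -> str:
        # A real model would return a next-phrase continuation of the prompt.
        return "but the writing is also pretty good."


def get_suggestion(text: str, model=StubModel()) -> str:
    """Compute a suggestion from the writer's most recent 50 words."""
    return model.complete(build_prompt(text))
```

In the deployed system, two such models were fine-tuned — one on positive and one on negative movie reviews.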

Initial Design

CONTEXT

The suggestion pops up as highlighted text above the cursor when the cursor is between text, and inline when it is at the end.

INITIAL PILOT

We deployed a suggestion interface where the writer had to press tab to select the whole suggestion.

FEEDBACK

We received feedback that writers often wanted to select only the first few words in the sentence and had to delete the last few words after they ‘tabbed’ the suggestion.

Revised Design

IMPROVEMENTS

1

To resolve this issue, we designed the interface to select only a single word from the phrase when a writer pressed tab.

2

To visually convey this interface behaviour, we highlighted the first word and added interpuncts (·) between words to represent this stepped approach to accepting a suggestion.

3

Through our pilot, we also tested several wait times from when the writer stops writing to when the suggestion appears. We decided on 300ms as the optimum wait time before the suggestion appeared.
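
The stepped-acceptance behaviour described above can be sketched as follows; this is an illustrative sketch, and the names and data structures are assumptions rather than the real interface code:

```python
INTERPUNCT = "\u00b7"      # the · separator shown between suggested words
SUGGESTION_DELAY_MS = 300  # wait time after the writer stops typing


def render_suggestion(phrase: str) -> str:
    """Join the suggested words with interpuncts to convey that
    Tab accepts the suggestion one word at a time."""
    return INTERPUNCT.join(phrase.split())


def accept_next_word(text: str, suggestion: list[str]) -> tuple[str, list[str]]:
    """On Tab, move only the first suggested word into the writer's text,
    leaving the rest of the suggestion available for further Tab presses."""
    if not suggestion:
        return text, suggestion
    return (text + " " + suggestion[0]).strip(), suggestion[1:]
```

Each Tab press consumes one word, so a writer who wants only the first few words of a phrase never has to delete the tail afterwards.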

Research Activity

We asked participants to watch two movies and write two movie reviews using the above apparatus — one review without suggestions and another with suggestions. Following is the step-by-step process we followed to extract the qualitative data.

Findings

Following are some of the findings, along with what the writers were suggested and what they finally wrote, to support each finding:

Suggestions prompted writers to generate ideas

Writers often abstracted a topic/theme present in a suggestion and used it as an inspiration for new sentences. When writers did not have a proposal, instead of coming up with one from scratch, they would abstract a topic from the suggestion and use that as a prompt to generate a proposal for their upcoming sentence.

U9 was suggested “...but the writing is also pretty good." by the system. They rejected the specific opinion in the suggestion but picked up the theme: ’writing of the film’ and expressed their own opinion about it, “the writing of the film was weird”. U9 later remarked: “I saw the term writing and [thought might as well write about it]”.

Augmented the vocabulary of the writer

Writers extracted vocabulary and phrases from the suggestions and used them to express their ideas (i.e., to translate their proposals).

This was evident when we compared writers’ with-suggestion reviews with their without-suggestion reviews. We observed increased use of “classic movie-review language” (as described by one of the writers, U5), such as "a must-see for all ..." (U5), "... is top-notch." (U7), and "... I wouldn’t really recommend it" (U9).

Misalignment leads to rejection

As expected, misalignments between the writer’s intention and suggestions increased the chances of writers rejecting suggestions. Misalignment became an important criterion for evaluation.

For example, U3 loved the movie (rating it 9) but was assigned a system with the opposite sentiment (degree of misalignment = 9 − 2.5 = 6.5). U3 remarked, “Suggestions were continuously telling me it was a very bad movie... I was like no, it was not a really bad movie.”

Writer’s movie schemas controlled the acceptance of suggestions

Writers had mental schemas about what content was appropriate to put in a movie review, whether they were writing with- or without-suggestions. Suggestions going against this schema were often rejected.

A common writing schema for our participants was to not give out spoilers in a movie review.

When U3 got a suggestion saying ‘For example’, they rejected it, with the following explanation: “I was like I’ll maybe give an example but I am like no I wouldn’t want that in a movie review which I would read. I don’t want spoilers to that degree so I didn’t write.”

Writers had a schema or a rough plan for the structure of their review. They utilized it to evaluate the position of the suggested text, independent of the text preceding it.

U5 was suggested ‘this is one of the few movies that I have . . . ’ at the beginning of the review, but they had a different plan. They rejected this suggestion saying “[I want to write] a two-line plot summary of the movie. [...] that’s a good way to hook [the] audience in with a little bit of a story.”

However, as discussed above, even after rejecting a suggestion for their position, writers would ‘store the suggestion for later use’ — either in their memory or on the text editor.

Text written so far affects their decision

Writers also evaluated the suggestions for consistency and flow with respect to the text they had written so far.

When U5 described the movie’s plot in their first paragraph, the system suggested ‘This is one of my favourite movies of all’. Although U5 had liked the movie, they declined the suggestion, stating, “That [suggestion] seemed too abrupt a change [compared to the sentence I just completed].”

Distractions due to suggestions

In the with-suggestions condition, constantly reading and evaluating suggestions led to higher cognitive load, hindering the writers’ process of idea generation.

U8: “I feel like if I’m thinking of something as I’m trying to form that thought, and I see something, which is completely different, I lose that thought which I was going with.”

U4: “I was trying to form the sentence. That’s when I looked up. And I saw the suggestion, and I got distracted. To be honest with you, I was kind of judging this suggestion.”

Model

Based on our findings, we propose a model that builds upon the categories and concepts proposed by Hayes and articulates the findings of our study. Following is the image of the model that we proposed:

Design

We believe future systems can leverage the above research findings and the following design implications to build more effective human-AI suggestion systems:

Strategic Sampling

Strategically controlling sampling in the language model to suggest phrases that will aid writers with more ideas (what to say) and language (how to say it).

Personalizing Models

Personalizing the suggestion models to avoid generic suggestions and reduce evaluation time.

More customization

Giving users freedom to select for which cognitive process (proposer, translator & transcriber) they need suggestions, and when.

Explainable AI

Making the suggestion AI model more explainable so that users can collaborate better to get more relevant suggestions.

Seamless selection

Making the process of extracting some phrases from the suggestive text seamless.

Reflections

Learnings

  • One of the essential things I learned was how to develop various hypotheses and validate them by collecting more data.
  • I learned how to make sense of data collected from what participants said in interviews and how they wrote in the presence of suggestions.
  • I also learned how to write a research paper and present findings.

Next Steps

Applying a human-centered approach to build more effective AI text-suggestion systems.