The UK Government is Experimenting with GenAI Chatbots

Following a mixed bag of results, what’s next for GOV.UK and GenAI?


Published: July 16, 2024

Rhys Fisher

The UK Government Digital Service (GDS) has been experimenting with generative AI (GenAI) for its GOV.UK Chat solution … with mixed results.

As part of the overall strategy of exploring how AI can be adopted to “improve the user experience of GOV.UK,” the GDS confirmed that the implementation of a GenAI-powered chatbot was the first in “a series of phased experiments.”

The organization opted to equip its bot with OpenAI’s Large Language Models (LLMs) to see if it could understand natural language queries and provide factually accurate responses.

Following the testing period, the organization sent follow-up surveys, the results of which highlighted a generally positive response to the GenAI-powered bot.

Indeed, almost 70 percent of respondents found the bot’s replies useful, while 65 percent were satisfied with the overall experience.

In discussing the early findings of the experiment, Christine Bellamy, Director of GOV.UK, was optimistic about the potential of the tech:

“We believe that there is potential for this technology to have a major, and positive, impact on how people use GOV.UK… [and] that the government has a duty to make sure it’s used responsibly, and this duty is one that we do not take lightly.”

However, as well as surveying users, the organization also employed human experts to evaluate the accuracy and thoroughness of a sample of the tool’s answers.

These experts found that some queries went unanswered because the relevant GOV.UK webpage that the bot needed to access was too long.

The GDS also admitted to experiencing some problems with what it describes as the “known issues associated with the nascent nature of this technology,” which resulted in bot responses not reaching the high level of accuracy needed for GOV.UK customers.

In addition, the organization experienced instances of hallucination, where the system provided incorrect information as fact.

Moreover, some users underestimated or ignored the risks of inaccuracy with GOV.UK Chat, due to the “credibility and duty of care” associated with the GOV.UK brand.

In response to these shortcomings, the GDS confirmed that it is aiming to improve accuracy by enhancing the tool’s ability to search for relevant GOV.UK information, guiding users to ask clearer questions, and exploring ways to better tailor answers to users’ circumstances.

It is clear that this initial experiment is being used as a starting point for future GenAI incorporation, as the GDS explains:

“Based on the positive outcomes and insights from this work, we’re rapidly iterating this experiment to address the issues of accuracy and reliability.

“In parallel we’re exploring other ways in which AI can help the millions of people who use GOV.UK every day.”

A Cautionary Tale

Interestingly, the release of the UK Government’s findings comes only a few months after the Mayor of New York City was forced to defend the city’s “MyCity” chatbot, following a series of significant errors.

Most notably, it advised some users to break the law by answering questions such as: “Do I have to accept tenants on rental assistance?”, and: “Are buildings required to accept section 8 vouchers?”, with a definitive “no”.

In doing so, the bot implied that landlords don’t have to accept these tenants. However, in New York City, it is illegal for landlords to discriminate based on the source of income, except for small buildings where the landlord or their family resides.

While there will naturally be differences between the two solutions, GOV.UK Chat, like NYC’s “MyCity” bot, will be tasked with providing factually accurate legal information to business owners.

This is evidenced in a video on the GDS YouTube channel, where one of the examples that the GDS uses is a business owner asking the bot for information on employing staff.

While the GDS made a point of stating that it is not “moving fast and breaking things,” but is instead taking a “balanced, measured and data driven approach to this technology,” the NYC bot serves as a cautionary tale for any organization that cannot take risks when it comes to accuracy.

Indeed, the issue of GenAI and accuracy was raised in a LinkedIn post by Iqbal Javaid, Head of CX Solution Engineering EMEA at Zoom and a member of the recently announced CX All-Stars, when discussing the GDS’ findings:

“Is it too soon to go down the path of Gen AI with LLM’s for a Government where accuracy is paramount? Or should they start with a predictable chatbot journey using NLP (Natural Language Processing)?”
