Alexa, Siri, and Friends: The Voice UI and Machine Learning Landscape

Christiann MacAuley | Senior UX Strategist

July 11, 2017


If you tend to start your day with “Hey, Siri!” or “Ok, Google,” then you’re among the 55% of people who use a voice assistant frequently.

Voice-based user interfaces are rapidly evolving and changing how people use software. If you aren’t entirely clear on what Voice UI is, it is simply any technology we use to talk to a computer, a smartphone, or another device, like an Amazon Echo.

In this post I’ll share how machine learning is driving the rapid growth of Voice UI, and some opportunities that are emerging in the voice UI and natural language processing space.

What is going on with Voice UI?

A lot! People are using voice UI more and more, mostly because it’s become really fast and easy to use. Voice input is now 3 times faster than typing on a phone. A year ago, Google announced that 20% of mobile searches were made by voice, and that number is rising quickly.

And while voice UI seems convenient while you're driving a car or walking outside, 43% of voice UI users said they primarily use it at home. It seems the Amazon Echo is behind some of that trend.

[Image: Amazon Echo on a marble kitchen counter beside a bowl of cereal and blueberries]

More than 11 million Echos have been sold since late 2014. In that time, the Echo has grown from a smart voice-controlled speaker into a device that handles more complex tasks like ordering takeout or sending messages to friends, and it now has a marketplace of more than 10,000 voice apps, which Amazon calls “skills.” At this point, Echo has grown past being an audio-only device: its assistant, Alexa, is now available on smartphones, tablets, and TVs, and Amazon has introduced the Echo Show, a freestanding touchscreen version of the Echo.

Google’s competitor to the Echo, Google Home, has only been out for six months. But its underlying software, Google Assistant, now ships on all up-to-date Android phones and is available for iOS. Although Amazon got to market first and has a more mature line of hardware, Google has a great advantage in its 2 billion active Android users.

Google Assistant extends the power of a conversational voice UI on your phone in various ways. It can read what’s currently on your screen or use your camera to help you understand what you’re looking at. It also steps back from being solely a voice UI by allowing you to type instead of speak, like when you’re somewhere public or loud.

Apple and Microsoft are also in the game. Both have voice assistants on their platforms -- Siri and Cortana, respectively -- and Apple has just announced the HomePod, a direct competitor to the Echo and Google Home. Both assistants are still less powerful than what Amazon and Google have to offer, but Apple and Microsoft each have a huge presence in the market, and their voice UIs will continue to improve.

Facebook hasn’t launched much yet, but voice UI now appears in some Facebook chatbots. These chatbots are generally run by brands like Domino's and Fandango, and they enable you to do things like order pizza or buy movie tickets via Facebook Messenger. Chat features are expected to be majorly disrupted by voice UI in the next year. And a major voice UI launch is expected from Facebook at some point this year.

 

Machine Learning

I mentioned that voice input has become faster than typing -- and that improvement keeps accelerating. The error rate of Google’s speech recognition dropped from 8.5% to below 5% in less than a year. That means Google now gets at least 19 out of 20 words right when it’s listening to you talk.

 

So what’s driving this rapid improvement?  

[Image removed. Source: Google I/O Keynote, March 17, 2017]


 

It’s the transformative power of machine learning. Machine learning can seem like magic from an outsider’s perspective, but it’s a type of Artificial Intelligence (AI) that revolves around teaching computers to solve problems -- instead of explicitly programming them like we normally do.

Machine learning isn’t new, but we didn’t have computers powerful enough to solve big problems -- like understanding natural human speech -- until recently. In the same way that voice UI has become a more typical way for people to interact with computers, the magic of machine learning is becoming more commonplace in our everyday lives.

Some of these examples are more Orwellian than others. There’s the friendly machine learning we’ve been talking about, where computers have learned to speak and understand our voices. But more pervasive -- and unsettling -- in our everyday lives are uncanny recommendations on the web, like the ads we see everywhere recommending the things we were already thinking about buying, or the videos we were already thinking about watching. These recommendations are the result of our browsing behavior being analyzed and profiled to the nth degree by artificial intelligence.

In another alarming example, the vast majority of stock trading is now algorithmic, increasingly driven by machine learning systems that can analyze markets and spot trends better than humans alone.

A more benevolent example on the web is Google News. Google doesn’t have people manually finding news stories across the internet and grouping them by topic -- this categorization is powered by machine learning.

[Screenshot of Google News]

Engineers are also using machine learning for systems optimization. Learning algorithms can analyze a website’s load time and performance and automatically optimize how content is served. They can also find patterns in system logs to predict issues and outages before they occur. Machine learning is even used to improve machine learning: engineers use algorithms to test and optimize the intelligent models they produce.
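To make the log-pattern idea concrete, here’s a minimal sketch (the log counts and threshold are invented for illustration; real systems learn far richer models) of a statistical baseline that flags an unusual spike in error counts before it becomes an outage:

```python
import statistics

# Hypothetical per-minute error counts parsed from a system log.
history = [2, 3, 1, 2, 4, 2, 3, 2, 1, 3]

def is_anomalous(new_count, history, threshold=3.0):
    """Flag a window whose error count deviates sharply from the baseline."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return abs(new_count - mean) > threshold * stdev

print(is_anomalous(2, history))   # typical load -> False
print(is_anomalous(25, history))  # sudden spike -> True
```

A learned model would replace the fixed threshold with patterns inferred from historical incidents, but the shape of the problem -- spot the deviation before it cascades -- is the same.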

 

What’s next?

Voice UI and machine learning are already game changers for the way people experience digital life. Not only will people increasingly expect seamless transactions through “conversational controls,” but they will also expect software to anticipate their needs. Look far enough ahead, and a seamless, unified experience could eventually make individual websites a thing of the past.

That may not happen tomorrow, but there are good reasons to start thinking like data scientists today and learning which problem patterns fit known solutions -- the same way we know that a nail in a wall calls for a hammer, or, as app designers, that a yes-or-no question calls for a checkbox.

In machine learning, some well-understood problems map cleanly onto learning algorithms. A few examples are:

  • Analyzing labeled content to make predictions about new content, like learning from known spam to flag more spam (classification).

  • Clustering similar data together, like grouping users based on what content they viewed (clustering).

  • Discovering trends, like “people who buy toothpaste tend to buy dental floss” (association).
     
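As a toy illustration of the first pattern, a spam classifier can be sketched from nothing more than word counts. The messages below are hypothetical, and real classifiers use far richer statistical models, but the core idea -- learn from labeled examples, then predict labels for new ones -- is the same:

```python
from collections import Counter

# Tiny labeled "training" corpus -- hypothetical examples for illustration.
spam = ["win cash now", "free cash prize", "claim your free prize now"]
ham = ["lunch at noon", "meeting notes attached", "see you at lunch"]

def word_counts(messages):
    counts = Counter()
    for message in messages:
        counts.update(message.split())
    return counts

spam_counts = word_counts(spam)
ham_counts = word_counts(ham)

def classify(message):
    # Score by which class's vocabulary the message overlaps with more.
    spam_score = sum(spam_counts[w] for w in message.split())
    ham_score = sum(ham_counts[w] for w in message.split())
    return "spam" if spam_score > ham_score else "ham"

print(classify("free cash"))       # overlaps the spam vocabulary -> "spam"
print(classify("lunch meeting"))   # overlaps the ham vocabulary -> "ham"
```

Clustering and association work the same way at heart: feed the algorithm examples, and let it find the structure instead of programming the rules by hand.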

Starting to think about what kinds of problems we can solve with machine learning will help us design more innovative solutions.

Of course, we’re living in an awkward moment for these emerging technologies. They are not fully developed, but with their rapid evolution voice UI and machine learning will soon become ubiquitous. Designing next-level software means incorporating this tech. It will become baseline for software design in just a few years. So now is a good time to embrace what’s next and let machine learning empower us rather than make us obsolete. Read more about voice technology in "Why Invest in Voice."
