Speech Rec…Or Speech Wreck?
We have been implementing and tweaking speech recognition systems on a pretty regular basis for the past 10 years. Many contact centers (especially mid-sized and larger ones) use speech rec in their IVR systems, and the technology is mainstream enough that today’s customers expect it when routing themselves through our menus. It seems like a good time to take stock of where we are and where this technology may be going.
I know what you’re thinking. You saw that little “speech wreck” homophone in the title and you are making an educated guess that I am not a fan of this technology. I will plead guilty to that, but only to a point. There are elements and applications of the technology that have been quite useful in our efforts to improve customer satisfaction. Let’s start there before the “wreck” takes over.
The Essence of Speech “Rec”
The term speech rec is defined as the translation of spoken words into text, and that basic definition covers a lot of ground. In a contact center, everything from “press or say 1” to the well-marketed persona that greets you by asking you to “tell me, in your own words, why you are calling” (a true natural language approach) meets the definition. In such a large space, there are certainly some bright spots:
- Press or say: Yes, it is the most basic way to use speech rec, and it is far from sexy. It is also, without question, the most efficient way to tell callers they have a choice of input, and it is exceptionally accurate. Today’s speech rec systems rarely misinterpret numbers and yes/no commands.
- Supporting mobile: The percentage of callers contacting us from mobile phones increases daily. Being able to speak commands rather than look down at a small virtual keypad is a great advantage when callers are in the midst of multitasking.
- Skipping layers: When a natural language “tell me why you are calling” greeting elicits a response that bypasses three or more layers of a traditional IVR menu, everyone comes out a winner. Speed matters when routing calls, and faster routing increases both self-service utilization and customer satisfaction scores.
That last point about skipping layers is a big one. This is how natural language speech recognition systems can distinguish themselves from the traditional touchtone menu. When it works properly and a customer can skip over numerous menus, the promise of the technology is fulfilled.
The Essence of Speech “Wreck”
Yet, for all its promise, the average speech-recognition-enabled IVR system today is underperforming, for a variety of reasons:
- Privacy: When in a crowded room, some customers may be hesitant to speak rather than use touchtone input. This is especially true when authenticating an account with something like a Social Security number, an account number or a birthdate.
- Accuracy: The number of validations (“I think you said ‘claims,’ is that correct?”) and repeat requests (“I’m sorry, I did not understand that”) is simply way too high at most contact centers. Yes, background noise and accents play into that, but more often than not, the problem is a lack of monitoring. Speech recognition systems need regular updates to their libraries and synonyms, and that means constant monitoring and reporting on translation problems. Without this effort, the high recognition accuracy needed for success is simply impossible to achieve. This matters because one of the most effective ways to drive down customer satisfaction scores is to make customers repeat themselves.
- Directed phrase: Few contact centers implement natural language right off the bat. Typically, they use directed phrase as a starting point. A prompt in a directed phrase environment may sound something like, “For account balances, say ‘account balance’ or press 1.” This is easier to implement than natural language, and some believe it helps “train” customers to speak to your IVR. Unfortunately, all directed phrase really does is combine the disadvantage of a touchtone or “press or say” design (which requires a menu structure) with the disadvantage of natural language (with its translation errors). It’s a lose-lose proposition, and because it is so difficult to migrate from directed phrase to natural language, many contact centers never get past this design.
It is easy to point to sales figures for voice recognition systems in contact centers as proof that the technology is a success. A better barometer, though, is how customers respond when given the choice between voice and touchtone. In my work with contact centers that offer this option, customers typically choose voice 50% of the time or less. That’s high enough to suggest it is providing value, but it is nowhere near a ringing endorsement of the technology.
More Rec and Less Wreck
Speech recognition is not going away, nor should it. The “press or say” design is an improvement over traditional touchtone-only menus in any environment. Some contact centers, with proper design and monitoring, can make natural language worthwhile for customers. Deciding where you fit starts with recognizing that natural language is not right for everyone. To determine if you are a good candidate, answer these four questions:
- Do many of your customers navigate three or more menus before reaching their destination?
- Is your product or service simple enough that customers can clearly articulate their needs?
- Have you found a great vendor that will steer you in the right direction and continually improve voice rec performance?
- Are you willing to spend the time, effort and dollars needed for a smooth implementation and continual fine-tuning?
The answer you are looking for is “yes,” and you need four yeses to give natural language a go. Otherwise, take the “press or say” approach and focus your efforts on clean, efficient menu design.
Some centers, with proper design and monitoring, can make natural language worthwhile for customers. To determine if you are a good candidate, answer the four questions in the decision tree. The answer you are looking for is “yes,” and you need four yeses to give natural language a go.
Safely Reaching Your Customer Satisfaction Destination
Following the pack can work sometimes, but it is a dangerous strategy when applied to speech recognition. There is too much at stake from a customer satisfaction perspective to leave your design to someone else. This is a project you will need to immerse yourself in… or else you may find yourself in the middle of a wreck.
Jay Minnucci is Founder and President of the independent consulting firm Service Agility.
– Reprinted with permission from Contact Center Pipeline, http://www.contactcenterpipeline.com