Tempted to Use AI for Your Enterprise’s Transcription? Think Again

How many words out of 100 does Automatic Speech Recognition (ASR) software typically get wrong when confronted with non-native speakers of English?

The answer may surprise you …

It can be anywhere from 16.6 to a whopping 58.87, according to a study recently published in the academic journal Artificial Intelligence Applications and Innovations.

For the study, researchers compared the word error rate (WER) of IBM’s Speech to Text service, Google’s closed-source speech recognition technology, and the open-source framework Wit. To test the services, they had three non-native English speakers – two women (speakers A and B) and one man (speaker C) – record 20 sentences in English. Of the ASR tools, Google performed the best, with an error rate of 16.6% for speaker A, for instance. By comparison, Wit got nearly 26% of the words wrong for that same speaker, and IBM had an error rate of more than 30%. The worst result was Wit’s transcription of speaker C, which was nearly 59% incorrect.
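For readers curious about what a word error rate actually measures, here is a minimal Python sketch. It assumes the commonly used definition of WER (substituted, deleted, and inserted words divided by the number of words in the correct transcript); the study's exact scoring procedure isn't described here, so this may differ from what the researchers did. The function name and the sample sentences are hypothetical and are not taken from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level edit distance, computed one row at a time.
    # Before the loop, prev[j] is the cost of turning zero reference words
    # into the first j hypothesis words (j insertions).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: what was said vs. what the software produced.
reference = "the client wants to roll over her ira into an annuity"
hypothesis = "the client once to roll over her era into annuity"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

In this made-up example, two words are misheard and one is dropped, so three of the eleven spoken words count as errors, or roughly 27%.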

If your enterprise is considering using voice-recognition software to document client meetings or advisors’ dictated notes, this study should give you pause. Although the study looked only at non-native English speakers, anyone who has struggled with tools such as Siri and Alexa knows that artificial intelligence (AI) doesn’t always understand what you are saying, even if you are a native English speaker. That’s especially true if you have a regional accent, as I do, or if you are using less-common words, phrases, or abbreviations. After all, sometimes financial terminology can seem like a whole new language!

In earlier blog posts and in Copytalk’s May 21 special webinar with Ben Marzouk, counsel at Eversheds Sutherland, I’ve explored how good documentation can help enterprises contend with the Securities and Exchange Commission’s Regulation Best Interest (Reg BI). For instance, if Reg BI compliance issues arise, transcriptions of meetings with a client and the advisor’s own dictated notes can be crucial to proving that the advisor was indeed acting in the client’s best interest.

But what if the client is a non-native speaker or a native speaker with a strong accent, and the ASR didn’t understand much of what he or she said, resulting in a garbled transcript? A garbled transcript could lessen the value of having the documentation at all.

Because ASR may miss nuance, accents, or slang, some enterprises use a hybrid approach, having their own employees check the transcriptions for accuracy. Yet this only creates more work for professionals at the firm. Furthermore, some of the ASR software’s mistakes may slip past the reviewer, so the documentation itself would preserve those errors.

That’s why at Copytalk, we use humans – live, U.S.-facilities-based transcriptionists fluent in financial terminology – to transcribe every word. Meetings (including internal ones), conferences, impromptu conversations, and more can be easily captured. Then, within a few hours, the resulting transcriptions are delivered via email, secure download, or automatic integration into many popular customer relationship management systems (CRMs).

AI has the capacity to change the financial services industry in marvelous ways, but there are some areas where you need a human touch. Transcription is one of them.

For special enterprise pricing, email enterpriseconcierge@copytalk.com!
Contact our Sales Team at 1-866-267-9825 Option 2.