Alexa device.
Technology at the service of teaching. Photo: Finn — Pixabay.

Learning English with Alexa

A new learning experience that will favor the Spanish-speaking population in the United States.




Earlier this year, Alexa launched a new experience in Spain to help Spanish speakers learn English from scratch.

Developed in collaboration with Vaughan, the leading English learning provider in Spain, the immersive English program focuses on optimizing users' pronunciation.

Daniel Zhang, an applied scientist with Alexa AI, announced via his Amazon Science blog:

We are now expanding this offering to Mexico and the Spanish-speaking population of the United States, and we plan to add more languages in the future. This learning experience includes structured lessons in vocabulary, grammar, expression and pronunciation, with practical exercises and tests.

Program Strengths

Zhang points out that the most important thing about this new experience is its pronunciation function, which provides accurate feedback whenever a customer mispronounces a word or phrase.

“At this year's International Conference on Acoustics, Speech, and Signal Processing (ICASSP), we presented a paper describing our innovative method of mispronunciation detection,” said Zhang.

The method implemented by Alexa uses a novel recurrent neural network transducer (RNN-T) phonetic model that predicts phonemes, the smallest units of speech, from the learner's pronunciation.

In this way, the model can provide a detailed assessment of pronunciation at the word, syllable, or phoneme level: if a learner mispronounces, for example, the word "rabbit" as "rabid," the model detects the mispronounced part and compares it with the correct sequence.
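The comparison Zhang describes can be pictured as a sequence alignment between the learner's phonemes and the canonical ones. The sketch below is an illustration, not Alexa's actual model: the ARPAbet-style phoneme lists and the `find_mispronunciations` helper are assumptions, using a simple standard-library aligner in place of the RNN-T.

```python
import difflib

def find_mispronunciations(canonical, learner):
    """Align two phoneme sequences and report the spans that differ."""
    matcher = difflib.SequenceMatcher(a=canonical, b=learner)
    errors = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # (edit type, expected phonemes, produced phonemes)
            errors.append((tag, canonical[i1:i2], learner[j1:j2]))
    return errors

# ARPAbet-style phonemes for "rabbit" vs. a production that sounds
# like "rabid": only the final /t/ -> /d/ should be flagged.
print(find_mispronunciations(
    ["R", "AE", "B", "AH", "T"],
    ["R", "AE", "B", "AH", "D"],
))
```

Running this flags a single `replace` at the final phoneme, leaving the correctly pronounced prefix untouched.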

To try it out, set your device's language to Spanish and tell Alexa “I want to learn English.”

The multilingual model can be used to assess pronunciation in many languages. Photo: Heiko — Pixabay.

Knowledge Gaps

Zhang highlights what he calls two knowledge gaps that have not been addressed in previous pronunciation models.

The first gap was addressed by designing a multilingual pronunciation lexicon and creating a vast mixed phonetic dataset for the learning program, giving it the ability to distinguish similar phonemes across languages, for example, the rolled "r" in Spanish vs. the "r" in English.
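A multilingual lexicon of this kind keeps distinct symbols for phonemes that sound alike across languages, so the model can tell them apart. The toy sketch below is an assumption for illustration: the entries and IPA-style symbols are invented, not Alexa's real lexicon.

```python
# Toy multilingual pronunciation lexicon keyed by (language, word).
# Entries and IPA-style symbols are illustrative assumptions.
LEXICON = {
    ("es", "rojo"): ["r", "o", "x", "o"],  # Spanish trilled /r/
    ("en", "red"):  ["ɹ", "ɛ", "d"],       # English approximant /ɹ/
}

def phonemes(lang, word):
    """Look up a word's phoneme sequence for a given language."""
    return LEXICON[(lang, word)]

# The trill and the approximant get different symbols, so a model
# trained on the mixed dataset can distinguish a Spanish-style "r"
# carried into English from the expected English "r".
print(phonemes("es", "rojo")[0], phonemes("en", "red")[0])
```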

“The other knowledge gap is the ability to learn unique mispronunciation patterns from language learners, taking advantage of the autoregressiveness of the RNN-T model, that is, the dependence of its results on previous inputs and outputs. This context awareness means that the model can capture frequent patterns of mispronunciation from the training data,” explains Zhang.
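The autoregressive dependence Zhang describes can be pictured with a far simpler model: even a bigram counter makes each prediction depend on the previous output, which is how recurring substitution patterns get captured. The training sequences and helper names below are invented for illustration; the real system uses an RNN-T, not bigram counts.

```python
from collections import defaultdict

def train_bigrams(sequences):
    """Count phoneme bigrams, with start/end markers."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for prev, nxt in zip(["<s>"] + seq, seq + ["</s>"]):
            counts[prev][nxt] += 1
    return counts

def most_likely_next(counts, prev):
    """Predict the most frequent phoneme following `prev`."""
    return max(counts[prev], key=counts[prev].get)

# Invented learner data: two of three speakers trill the initial "r".
data = [["r", "ɛ", "d"], ["r", "ɛ", "d"], ["ɹ", "ɛ", "d"]]
counts = train_bigrams(data)
print(most_likely_next(counts, "<s>"))  # the trilled "r" dominates
```

The RNN-T extends the same idea with a learned hidden state instead of raw counts, letting it condition on arbitrarily long context.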

The Science Behind the Experience

Zhang notes that sequence-to-sequence models are usually applied to grammatical error correction; here, the direction of the task was reversed by training the model to generate mispronunciations instead of correcting pronunciation errors.

In addition, to further enrich and diversify the generated L2 phoneme sequences, the team proposed a preference-aware diversified decoding component that combines diverse beam search with a preference loss that leans toward mispronunciations similar to those made by humans.

In short, the phoneme paraphraser can generate realistic L2 phoneme sequences for speakers from a specific locale, for example, phonemes representing a native Spanish speaker speaking English.

“We are currently studying various methods to improve our pronunciation evaluation feature. One of them is the creation of a multilingual model that can be used to assess pronunciation in many languages. We are also expanding the model to diagnose more mispronunciation features, such as tone and lexical stress,” added Zhang.

