Ghif's Research Notes

AI Weekly Reading Summary (25 Mei 2019)


1. Direct speech-to-speech translation with a sequence-to-sequence model

Source: https://arxiv.org/pdf/1904.06037.pdf

Topics: speech-to-speech translation, sequence-to-sequence model, deep learning

Models that translate between spoken languages typically take an intermediate step, i.e., source speech –> source text –> translated text –> translated speech. This paper proposes Translatotron, an end-to-end speech translation model that skips the step of converting the speech into text. This implies an advantage in terms of lower inference latency since only 1 decoding step is needed, instead of 3. Translatotron is not only translating but also reproducing the speaker’s voice on the translated speech.

Model architecture of Translatotron

Model architecture of Translatotron

Translatotron has 3 main components:

  1. Attention-based sequence-to-sequence neural net: taking spectograms as input and generates spectograms in the translated language
  2. Neural vocoder: converting the output spectograms into audio wave.
  3. Pre-trained speaker encoder: preserving the characteristic of the speaker’s voice

The experiments show the effectiveness of Translatotron, trained on a large corpus of matched spoken phrases in Spanish and English, as well as phoneme transcripts. The detailed results can be found at https://google-research.github.io/lingvo-lab/translatotron/.

In general, Translatotron slightly underperforms a baseline model with the intermediate text encoding-decoding steps in translating Spanish to English languages. But it produces more realistic audio than pervious systems and breaks new ground for the end-to-end approach.

result example


2. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

Source: https://arxiv.org/pdf/1703.03400.pdf

Topics: meta learning, learning to learn

One of the hallmarks of human intelligence is to learn quickly only from a few examples and adapt seamlessly as more data become available. An attempt to manifest that characteristic into machine intelligence is through meta-learning, i.e., learning from metadata to change certain aspects of a learning algorithm or the learning method itself. The aim of meta-learing is to automate human design decisions and revisit/optimize those decisions dynamically according to experience obtained during learning.

This paper proposes Model-Agnostic Meta-Learning (MAML) algorithm that initializes the gradient-based model’s parameters through meta-learning in such a way that the model can quickly adapt to a new task using only a few datapoints and training iterations. “Model-agnostic” means that the algorithm can be directly applied to any learning algorithm trained with gradient descent – the paper shows that MAML is applicable to fully connected, convolutional, or recurrent neural nets. Furthermore, it can serve to solve a variety of tasks such as supervised classifications, regressions, and reinforcement learning.

The overall MAML algorithm is quite simple as described by the pseudo-code below.

maml algorithm maml algorithm

MAML algorithm

The intuition behind MAML is that some internal representations are general-purpose, i.e., more transferrable to other tasks than others. The MAML algorithm encourages the emergence of such representations by taking into account several tasks simultaneously in updating the model’s parameters.


3. Lessons learned turning machine learning models into real products and services

Source: https://www.oreilly.com/ideas/lessons-learned-turning-machine-learning-models-into-real-products-and-services

Topics: practical AI, business

This article highlights that there is a major difference between building machine learning models and actually deploying them into production as real, reliable, scalable, and scure products and services. The first is something that is already a common practice which has been thought in schools or data science bootcamps – any junior data scientist or developer can do it with a good set of training data and the right tools. The latter is fairly new, emerging domain, i.e., has not been well-taught in earlier phases of education, but important to guarantee successful AI applications in an organization. If it is not executed well, the end result is a bottleneck where AI models fail to be turned into revenue-generating products and services.

The author mentioned a few aspects that need careful considerations before implementing a machine learning solution: