arxiv:2202.12163

Attentive Temporal Pooling for Conformer-based Streaming Language Identification in Long-form Speech

Published on Feb 24, 2022

Authors:

Abstract

A novel language identification system using conformer layers and attentive temporal pooling achieves high accuracy with domain adaptation techniques and outperforms LSTM and transformer models.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

In this paper, we introduce a novel language identification system based on conformer layers. We propose an attentive temporal pooling mechanism that lets the model carry information across long-form audio in a recurrent form, so that inference can be performed in a streaming fashion. Additionally, we investigate two domain adaptation approaches that adapt an existing language identification model to a new domain without retraining the model parameters. We perform a comparative study of different model topologies under different model-size constraints, and find that conformer-based models significantly outperform LSTM- and transformer-based models. Our experiments also show that attentive temporal pooling and domain adaptation improve model accuracy.
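To make the idea of attentive pooling "in a recurrent form" concrete, here is a minimal NumPy sketch. It is not the paper's formulation: the abstract gives no equations, so the dot-product scoring vector `v`, the class name `StreamingAttentivePooling`, and the running-maximum rescaling are all assumptions made for illustration. The sketch only shows that a softmax-weighted average over frames can be updated one frame at a time from a small fixed-size state, and that the streamed result matches the full-sequence computation.

```python
import numpy as np


def batch_attentive_pooling(frames, v):
    """Softmax-weighted average over all frames at once (non-streaming reference)."""
    scores = frames @ v                       # one scalar score per frame
    weights = np.exp(scores - scores.max())   # subtract max for numerical stability
    weights /= weights.sum()
    return weights @ frames                   # weighted average, shape (dim,)


class StreamingAttentivePooling:
    """Hypothetical recurrent form: keeps only a running numerator and denominator."""

    def __init__(self, v):
        self.v = np.asarray(v, dtype=float)   # assumed scoring vector (learned in practice)
        self.reset()

    def reset(self):
        self.max_score = -np.inf              # running max of scores, for stability
        self.num = np.zeros_like(self.v)      # running sum of weight * frame
        self.den = 0.0                        # running sum of weights

    def update(self, frame):
        """Consume one frame and return the pooled embedding over all frames so far."""
        score = float(self.v @ frame)
        new_max = max(self.max_score, score)
        scale = np.exp(self.max_score - new_max)  # rescale old state to the new max
        weight = np.exp(score - new_max)
        self.num = self.num * scale + weight * frame
        self.den = self.den * scale + weight
        self.max_score = new_max
        return self.num / self.den


# Usage: stream synthetic frames and compare with the batch result.
rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 8))         # 50 frames of 8-dim encoder outputs (synthetic)
v = rng.standard_normal(8)

pooler = StreamingAttentivePooling(v)
for frame in frames:
    streamed = pooler.update(frame)

batch = batch_attentive_pooling(frames, v)
```

The state carried between frames has a fixed size regardless of audio length, which is the property that makes streaming inference over long-form speech possible in this kind of design.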


