Bits & Pieces

My collection of research puzzles

Demo of OWSM-V4 Medium model and OWSM-V4 CTC model

This application transcribes and translates speech into text in 151 languages. Users can upload an audio file or use their microphone to provide input. They can choose the source language, specify ...

1 min read · August 28, 2025 · OWSM v4 Demo

2025
How to Perform Long-Form ASR with CTC?

NeMo tutorial notebook: tutorials/asr/Streaming_ASR.ipynb in NVIDIA-NeMo/NeMo. NeMo is a scalable generative AI framework built for researchers and developers working on large language models, multimodal, and speech AI (automatic speech recognition and text-to-speech).

1 min read · February 01, 2025 · NVIDIA NeMo

2025
How to fine-tune pre-trained OWSM?

End-to-End Speech Processing Toolkit (espnet/espnet on GitHub).

5 min read · September 03, 2024 · ESPnet OWSM

2024