-
Demo of OWSM-V4 Medium model and OWSM-V4 CTC model
This application transcribes and translates speech into text in 151 languages. Users can upload an audio file or use their microphone to provide input. They can choose the source language, specify ...
-
How to Perform Long-Form ASR with CTC?
NeMo/tutorials/asr/Streaming_ASR.ipynb at main · NVIDIA-NeMo/NeMo. NeMo is a scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech).
-
How to fine-tune pre-trained OWSM?
End-to-End Speech Processing Toolkit (espnet/espnet on GitHub).