- Demo of the OWSM-V4 Medium and OWSM-V4 CTC models
This application converts spoken language into text. Users can upload audio or record speech, select the language, and choose the task (such as speech recognition or translation). The app will output ...
- How to Perform Long-Form ASR with CTC?
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech) - NeMo/tutorials/asr/Streaming_ASR.ipynb at main · NVIDIA-NeMo/NeMo
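A common way to run a CTC model on audio longer than it was trained on is to cut the waveform into fixed-size, overlapping windows, decode each window, and then stitch the text together. The sketch below is a minimal, hypothetical illustration of two pieces of that idea: computing the window boundaries, and greedy (best-path) CTC decoding of per-frame scores. It is not taken from the linked NeMo notebook or from ESPnet. The function names `chunk_spans` and `ctc_greedy_decode` are made up, and it assumes blank index 0 and that you already have per-frame scores from some CTC model. Merging the overlapping regions between windows is not shown.

```python
from typing import List, Sequence, Tuple

BLANK_ID = 0  # assumption: index 0 is the CTC blank token


def chunk_spans(num_samples: int, chunk: int, overlap: int) -> List[Tuple[int, int]]:
    """Split [0, num_samples) into windows of `chunk` samples overlapping by `overlap`."""
    if not 0 <= overlap < chunk:
        raise ValueError("need 0 <= overlap < chunk")
    step = chunk - overlap  # how far each new window starts after the previous one
    spans = []
    start = 0
    while True:
        end = min(start + chunk, num_samples)  # last window may be shorter
        spans.append((start, end))
        if end >= num_samples:
            break
        start += step
    return spans


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK_ID) -> List[int]:
    """Best-path CTC decoding: argmax per frame, merge repeats, then drop blanks."""
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    tokens = []
    prev = None
    for t in best:
        # A token is emitted only when it differs from the previous frame's
        # argmax and is not blank; a blank between two equal tokens keeps both.
        if t != prev and t != blank:
            tokens.append(t)
        prev = t
    return tokens


if __name__ == "__main__":
    # 60 s of 16 kHz audio, 30 s windows with 2 s overlap.
    print(chunk_spans(60 * 16000, 30 * 16000, 2 * 16000))
    # Per-frame argmax is [1, 1, 0, 1, 2]; the blank separates the two 1s.
    print(ctc_greedy_decode([[0.1, 0.9, 0.0], [0.2, 0.7, 0.1], [0.8, 0.1, 0.1],
                             [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]))
```

The overlap gives each window some acoustic context at its edges. A real pipeline would still need a rule for resolving the duplicated text in each overlapped region, for example by keeping only the tokens whose frames fall in the non-overlapping middle of each window.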
- How to fine-tune pre-trained OWSM?
End-to-End Speech Processing Toolkit (espnet/espnet on GitHub).