Muhammad Shakeel, Ph.D.

Scientist, Honda Research Institute Japan Co., Ltd.

As a Scientist at Honda Research Institute Japan Co., Ltd., my research centers on developing the next generation of automatic speech recognition (ASR) technologies. While my published work has focused on foundation models and contextual ASR, my current research is increasingly directed toward the challenges of multi-speaker ASR and speaker diarization, with the aim of building systems that robustly process real-world conversational audio.

This has involved contributing to large-scale, open speech foundation models, most notably through a collaboration with the Language Technologies Institute at Carnegie Mellon University on the Open Whisper-style Speech Model (OWSM) project. Within this initiative to create transparent alternatives to proprietary models, the focus has been on architectural innovation: enhancing the model with the E-Branchformer encoder for better performance and developing non-autoregressive variants such as OWSM-CTC, which bring significant gains in decoding speed and robustness against model hallucination.
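
The speed and robustness argument above rests on removing the autoregressive, token-by-token decoding loop. As a hedged illustration rather than the actual OWSM-CTC implementation, the sketch below shows greedy CTC decoding, where a single encoder pass followed by a frame-wise argmax, repeat collapsing, and blank removal yields the transcript; the function name and toy shapes are assumptions made for this example.

```python
# Minimal sketch (not the OWSM-CTC code) of greedy CTC decoding:
# one encoder pass, then a frame-wise argmax -- no autoregressive loop.
import torch

def ctc_greedy_decode(log_probs: torch.Tensor, blank_id: int = 0) -> list[int]:
    """log_probs: (time, vocab) frame-level log-probabilities from the encoder."""
    frame_ids = log_probs.argmax(dim=-1).tolist()      # best label per frame
    collapsed, prev = [], None
    for i in frame_ids:                                # collapse repeated labels
        if i != prev:
            collapsed.append(i)
        prev = i
    return [i for i in collapsed if i != blank_id]     # drop CTC blank symbols

# Toy example: 6 frames over a 4-symbol vocabulary (index 0 is the blank).
torch.manual_seed(0)
print(ctc_greedy_decode(torch.randn(6, 4).log_softmax(dim=-1)))
```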

This work on foundation models naturally highlights the critical need for both practical applicability and computational efficiency, which my research addresses along several interconnected lines. To improve real-world utility, investigations into contextual ASR have yielded novel methods for recognizing rare and user-specific terminology, including dynamic vocabularies, intermediate biasing losses, and a bias phrase boosted (BPB) beam search. To enhance architectural robustness and flexibility, contributions were made to unified systems such as the 4D ASR model, which integrates multiple decoder paradigms (CTC, Attention, RNN-T, and Mask-CTC) into a single, jointly trained framework. This theme of unification also extends to deployment constraints, as in work on jointly optimizing streaming and non-streaming ASR. Finally, because the utility of large models is ultimately gated by their deployability, another facet of this research has been model efficiency, demonstrated through contributions to compression techniques such as the joint distillation and pruning used in DPHuBERT.
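
To make the "single, jointly trained framework" idea concrete, here is a minimal sketch of the common recipe such unified systems build on: a shared encoder trained by interpolating per-decoder losses. The class name, the two branches shown (CTC and attention), and the weights are illustrative assumptions, not the 4D model's training code, which additionally covers RNN-T and Mask-CTC branches.

```python
# Hedged sketch of joint multi-decoder training: interpolate the losses of
# several decoder branches that share one encoder. Names and weights are
# illustrative assumptions, not the actual 4D ASR implementation.
import torch.nn as nn

class JointMultiDecoderLoss(nn.Module):
    def __init__(self, ctc_weight: float = 0.3, att_weight: float = 0.7):
        super().__init__()
        self.ctc_weight, self.att_weight = ctc_weight, att_weight
        self.ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)   # CTC branch
        self.att_loss = nn.CrossEntropyLoss(ignore_index=-1)      # attention branch

    def forward(self, ctc_log_probs, att_logits, ctc_targets, att_targets,
                input_lens, target_lens):
        # ctc_log_probs: (time, batch, vocab); att_logits: (batch, target_len, vocab)
        # ctc_targets: (batch, target_len), zero-padded; att_targets: padded with -1.
        l_ctc = self.ctc_loss(ctc_log_probs, ctc_targets, input_lens, target_lens)
        l_att = self.att_loss(att_logits.transpose(1, 2), att_targets)
        return self.ctc_weight * l_ctc + self.att_weight * l_att
```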

The pursuit of scientific advancement is a cumulative effort, built upon the foundational work of those who came before. As Sir Isaac Newton famously wrote, “If I have seen further, it is by standing on the shoulders of Giants.” This idea has been a guiding principle throughout my research career, which has been profoundly shaped by the mentorship and collaboration of distinguished researchers. In my current role at Honda Research Institute Japan, I am honored to be collaborating with Prof. Shinji Watanabe, whose pioneering work continues to shape the field of end-to-end speech recognition. This opportunity builds upon the excellent guidance I received during my academic journey: my doctoral studies were supervised by Prof. Kazuhiro Nakadai at the Tokyo Institute of Technology (now Institute of Science Tokyo); my master’s thesis was a collaborative effort guided by Prof. Satoshi Tadokoro of Tohoku University and Prof. Daniele Nardi of Sapienza University of Rome; and my foundational research experience, contributing to the ALICE experiment at CERN, was conducted under the supervision of Prof. Arshad Saleem Bhatti.

Honors and Awards

News

Aug 22, 2025 :trophy: A co-authored paper received the ISCA Best Student Paper Award at INTERSPEECH 2025
Aug 06, 2025 :scroll: A first-authored paper, UME, has been accepted at IEEE ASRU 2025
May 19, 2025 :scroll: Three co-authored papers have been accepted at INTERSPEECH 2025
Dec 04, 2024 :trophy: A co-authored paper received the Best Paper Award at IEEE SLT 2024
Aug 30, 2024 :scroll: A co-authored paper has been accepted at IEEE SLT 2024
Jun 04, 2024 :scroll: Two papers (one first-authored) have been accepted at INTERSPEECH 2024
May 16, 2024 :scroll: A co-authored paper, OWSM-CTC, has been accepted at ACL 2024 (main conference)
Feb 03, 2024 :scroll: A first-authored paper has been accepted at ICASSP 2024 Satellite Workshop
Dec 13, 2023 :scroll: A co-authored paper has been accepted at ICASSP 2024
Sep 22, 2023 :scroll: A co-authored paper has been accepted at IEEE ASRU 2023
May 17, 2023 :scroll: Three co-authored papers have been accepted at INTERSPEECH 2023
Nov 01, 2022 :man_office_worker: Joining Honda Research Institute Japan Co., Ltd. as a Scientist
Sep 20, 2022 :trophy: I have earned my Ph.D. in Systems and Control Engineering from the Tokyo Institute of Technology. This milestone marks the culmination of years of dedicated research, and I'm grateful for the support of my advisors, collaborators, and peers throughout this journey.

Selected Publications

  1. CONFERENCE ASRU Unified Model
    Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder
    Muhammad Shakeel, Yui Sudo, Yifan Peng, Chyi-Jiunn Lin, and Shinji Watanabe
    In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec 2025
  2. CONFERENCE INTERSPEECH Foundation Model
    OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning
    Yifan Peng, Muhammad Shakeel, Yui Sudo, William Chen, Jinchuan Tian, and 2 more authors
    In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH) (ISCA Best Student Paper Award), Aug 2025
  3. CONFERENCE SLT Contextualized ASR
    Contextualized Automatic Speech Recognition with Dynamic Vocabulary
    Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, and Shinji Watanabe
    In Proceedings of the IEEE Spoken Language Technology Workshop (SLT) (Best Paper Award), Dec 2024
  4. CONFERENCE INTERSPEECH Contextualized ASR
    Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss
    Muhammad Shakeel, Yui Sudo, Yifan Peng, and Shinji Watanabe
    In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2024
  5. CONFERENCE ACL Foundation Model
    OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
    Yifan Peng, Yui Sudo, Muhammad Shakeel, and Shinji Watanabe
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), Aug 2024
  6. WORKSHOP ICASSPW Unified Model
    Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation
    Muhammad Shakeel, Yui Sudo, Yifan Peng, and Shinji Watanabe
    In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), Apr 2024