Muhammad Shakeel, Ph.D.

Scientist, Honda Research Institute Japan Co., Ltd.

As a Scientist at Honda Research Institute Japan Co., Ltd., my research centers on developing the next generation of automatic speech recognition (ASR) technologies. While my published work has focused on foundation models and contextual ASR, my current research is increasingly directed toward the challenges of multi-speaker ASR and speaker diarization, with the aim of building systems that robustly process real-world conversational audio.

This has involved contributing to large-scale, open speech foundation models, most notably through a collaboration with the Language Technologies Institute at Carnegie Mellon University on the Open Whisper-style Speech Model (OWSM) project. Within this initiative to create transparent alternatives to proprietary models, the focus has been on architectural innovation: enhancing the model with the E-Branchformer encoder for better performance and developing non-autoregressive variants such as OWSM-CTC, which bring significant gains in decoding speed and robustness against model hallucination.
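
The speed and robustness argument above rests on removing the autoregressive, token-by-token decoding loop. As a hedged illustration rather than the actual OWSM-CTC implementation, the sketch below shows greedy CTC decoding, where a single encoder pass followed by a frame-wise argmax, repeat collapsing, and blank removal yields the transcript; the function name and toy shapes are assumptions made for this example.

```python
# Minimal sketch (not the OWSM-CTC code) of greedy CTC decoding:
# one encoder pass, then a frame-wise argmax -- no autoregressive loop.
import torch

def ctc_greedy_decode(log_probs: torch.Tensor, blank_id: int = 0) -> list[int]:
    """log_probs: (time, vocab) frame-level log-probabilities from the encoder."""
    frame_ids = log_probs.argmax(dim=-1).tolist()      # best label per frame
    collapsed, prev = [], None
    for i in frame_ids:                                # collapse repeated labels
        if i != prev:
            collapsed.append(i)
        prev = i
    return [i for i in collapsed if i != blank_id]     # drop CTC blank symbols

# Toy example: 6 frames over a 4-symbol vocabulary (index 0 is the blank).
torch.manual_seed(0)
print(ctc_greedy_decode(torch.randn(6, 4).log_softmax(dim=-1)))
```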

This work on foundation models naturally highlights the critical need for both practical applicability and computational efficiency, which my research addresses along several interconnected lines. To improve real-world utility, investigations into contextual ASR have yielded novel methods for recognizing rare and user-specific terminology, including dynamic vocabularies, intermediate biasing losses, and a bias phrase boosted (BPB) beam search. To enhance architectural robustness and flexibility, contributions were made to unified systems such as the 4D ASR model, which integrates multiple decoder paradigms (CTC, Attention, RNN-T, and Mask-CTC) into a single, jointly trained framework. This theme of unification also extends to deployment constraints, as in work on jointly optimizing streaming and non-streaming ASR. Finally, because the utility of large models is ultimately gated by their deployability, another facet of this research has been model efficiency, demonstrated through contributions to compression techniques such as the joint distillation and pruning used in DPHuBERT.
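
To make the "single, jointly trained framework" idea concrete, here is a minimal sketch of the common recipe such unified systems build on: a shared encoder trained by interpolating per-decoder losses. The class name, the two branches shown (CTC and attention), and the weights are illustrative assumptions, not the 4D model's training code, which additionally covers RNN-T and Mask-CTC branches.

```python
# Hedged sketch of joint multi-decoder training: interpolate the losses of
# several decoder branches that share one encoder. Names and weights are
# illustrative assumptions, not the actual 4D ASR implementation.
import torch.nn as nn

class JointMultiDecoderLoss(nn.Module):
    def __init__(self, ctc_weight: float = 0.3, att_weight: float = 0.7):
        super().__init__()
        self.ctc_weight, self.att_weight = ctc_weight, att_weight
        self.ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)   # CTC branch
        self.att_loss = nn.CrossEntropyLoss(ignore_index=-1)      # attention branch

    def forward(self, ctc_log_probs, att_logits, ctc_targets, att_targets,
                input_lens, target_lens):
        # ctc_log_probs: (time, batch, vocab); att_logits: (batch, target_len, vocab)
        # ctc_targets: (batch, target_len), zero-padded; att_targets: padded with -1.
        l_ctc = self.ctc_loss(ctc_log_probs, ctc_targets, input_lens, target_lens)
        l_att = self.att_loss(att_logits.transpose(1, 2), att_targets)
        return self.ctc_weight * l_ctc + self.att_weight * l_att
```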

The pursuit of scientific advancement is a cumulative effort, built upon the foundational work of those who came before. As Sir Isaac Newton famously wrote, “If I have seen further, it is by standing on the shoulders of Giants.” This idea has been a guiding principle throughout my research career, which has been profoundly shaped by the mentorship and collaboration of distinguished researchers. In my current role at Honda Research Institute Japan, I am honored to be collaborating with Prof. Shinji Watanabe, whose pioneering work continues to shape the field of end-to-end speech recognition. This opportunity builds upon the excellent guidance I received during my academic journey: my doctoral studies were supervised by Prof. Kazuhiro Nakadai at the Tokyo Institute of Technology (now Institute of Science Tokyo); my master’s thesis was a collaborative effort guided by Prof. Satoshi Tadokoro of Tohoku University and Prof. Daniele Nardi of Sapienza University of Rome; and my foundational research experience, contributing to the ALICE experiment at CERN, was conducted under the supervision of Prof. Arshad Saleem Bhatti.

Honors and Awards

News

Aug 22, 2025 :trophy: A co-authored paper received the ISCA Best Student Paper Award at INTERSPEECH 2025
Aug 06, 2025 :scroll: A first-authored paper, UME, has been accepted at IEEE ASRU 2025
May 19, 2025 :scroll: Three co-authored papers have been accepted at INTERSPEECH 2025
Dec 04, 2024 :trophy: A co-authored paper received the Best Paper Award at IEEE SLT 2024
Aug 30, 2024 :scroll: A co-authored paper has been accepted at IEEE SLT 2024
Jun 04, 2024 :scroll: Two papers (one first-authored) have been accepted at INTERSPEECH 2024
May 16, 2024 :scroll: A co-authored paper, OWSM-CTC, has been accepted at ACL 2024 (main conference)
Feb 03, 2024 :scroll: A first-authored paper has been accepted at ICASSP 2024 Satellite Workshop
Dec 13, 2023 :scroll: A co-authored paper has been accepted at ICASSP 2024
Sep 22, 2023 :scroll: A co-authored paper has been accepted at IEEE ASRU 2023
May 17, 2023 :scroll: Three co-authored papers have been accepted at INTERSPEECH 2023
Nov 01, 2022 :man_office_worker: Joining Honda Research Institute Japan Co., Ltd. as a Scientist
Sep 20, 2022 :trophy: I have earned my Ph.D. in Systems and Control Engineering from the Tokyo Institute of Technology. This milestone marks the culmination of years of dedicated research, and I'm grateful for the support of my advisors, collaborators, and peers throughout this journey.

Selected Publications

  1. CONFERENCE ASRU Unified Model
    Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder
    Muhammad Shakeel, Yui Sudo, Yifan Peng, Chyi-Jiunn Lin, and Shinji Watanabe
    In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec 2025
  2. CONFERENCE INTERSPEECH Foundation Model
    OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning
    Yifan Peng, Muhammad Shakeel, Yui Sudo, William Chen, Jinchuan Tian, and 2 more authors
    In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH) (ISCA Best Student Paper Award), Aug 2025
  3. CONFERENCE SLT Contextualized ASR
    Contextualized Automatic Speech Recognition with Dynamic Vocabulary
    Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, and Shinji Watanabe
    In Proceedings of the IEEE Spoken Language Technology Workshop (SLT) (Best Paper Award), Dec 2024
  4. CONFERENCE INTERSPEECH Contextualized ASR
    Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss
    Muhammad Shakeel, Yui Sudo, Yifan Peng, and Shinji Watanabe
    In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2024
  5. CONFERENCE ACL Foundation Model
    OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
    Yifan Peng, Yui Sudo, Muhammad Shakeel, and Shinji Watanabe
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), Aug 2024
  6. WORKSHOP ICASSPW Unified Model
    Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation
    Muhammad Shakeel, Yui Sudo, Yifan Peng, and Shinji Watanabe
    In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), Apr 2024