Publications | Muhammad Shakeel, Ph.D.

Please also check my research profiles for more up-to-date information.

2026

CONFERENCE ICASSP Unified Model

CALM: Joint Contextual Acoustic-Linguistic Modeling for Personalization of Multi-Speaker ASR

Muhammad Shakeel, Yosuke Fukumoto, Chikara Maeda, Chyi-Jiunn Lin, and Shinji Watanabe

In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jan 2026

arXiv PDF

2025

CONFERENCE ASRU Unified Model

Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder

Muhammad Shakeel, Yui Sudo, Yifan Peng, Chyi-Jiunn Lin, and Shinji Watanabe

In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec 2025

arXiv PDF
CONFERENCE INTERSPEECH Foundation Model

OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning

Yifan Peng, Muhammad Shakeel, Yui Sudo, William Chen, Jinchuan Tian, and 2 more authors

In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH) (ISCA Best Student Paper Award), Aug 2025

arXiv PDF Website
CONFERENCE INTERSPEECH Unified Model

Joint Target-Speaker ASR and Activity Detection

Chikara Maeda, Muhammad Shakeel, and Yui Sudo

In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2025

PDF
CONFERENCE INTERSPEECH Contextualized ASR

DYNAC: Dynamic Vocabulary based Non-Autoregressive Contextualization for Speech Recognition

Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, Chyi-Jiunn Lin, and 1 more author

In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2025

arXiv PDF
JOURNAL TASLP Unified Model

Joint Beam Search Integrating CTC, Attention, and Transducer Decoders

Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Brian Yan, Jiatong Shi, and 2 more authors

IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), Jan 2025

PDF

2024

CONFERENCE JSAI AI Challenge SIG (AIチャレンジ研究会) Contextualized ASR

Contextual Adaptation of End-to-End Speech Recognition Using Dynamic Vocabulary Expansion (動的な語彙拡張を用いたEnd-to-end音声認識の文脈適応)

Yui Sudo, Muhammad Shakeel, Yifan Peng, and Yui Sudo

JSAI Type 2 SIG Technical Reports (人工知能学会第二種研究会資料), Jan 2024

DOI PDF
CONFERENCE JSAI AI Challenge SIG (AIチャレンジ研究会) Speech Separation

Speech Separation with Auxiliary Signal-to-Artifact Ratio Loss for Improving Multi-Talker ASR

Matthew Ngai, Chikara Maeda, Muhammad Shakeel, and Yui Sudo

JSAI Type 2 SIG Technical Reports (人工知能学会第二種研究会資料), Jan 2024

DOI PDF
CONFERENCE SLT Contextualized ASR

Contextualized Automatic Speech Recognition with Dynamic Vocabulary

Yui Sudo, Yosuke Fukumoto, Muhammad Shakeel, Yifan Peng, and Shinji Watanabe

In Proceedings of the IEEE Spoken Language Technology Workshop (SLT) (Best Paper Award), Dec 2024

PDF
CONFERENCE INTERSPEECH Contextualized ASR

Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss

Muhammad Shakeel, Yui Sudo, Yifan Peng, and Shinji Watanabe

In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2024

PDF
CONFERENCE ICASSP Contextualized ASR

Contextualized Automatic Speech Recognition With Attention-Based Bias Phrase Boosted Beam Search

Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Yifan Peng, and Shinji Watanabe

In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2024

PDF
CONFERENCE ACL Foundation Model

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification

Yifan Peng, Yui Sudo, Muhammad Shakeel, and Shinji Watanabe

In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), Aug 2024

Abs arXiv PDF Code Poster Website

There has been an increasing interest in large speech models that can perform multiple tasks in a single model. Such models usually adopt an encoder-decoder or decoder-only architecture due to their popularity and good performance in many domains. However, autoregressive models can be slower during inference compared to non-autoregressive models and also have potential risks of hallucination. Though prior studies observed promising results of non-autoregressive models for certain tasks at small scales, it remains unclear if they can be scaled to speech-to-text generation in diverse languages and tasks. Inspired by the Open Whisper-style Speech Model (OWSM) project, we propose OWSM-CTC, a novel encoder-only speech foundation model based on Connectionist Temporal Classification (CTC). It is trained on 180k hours of public audio data for multilingual automatic speech recognition (ASR), speech translation (ST), and language identification (LID). Compared to encoder-decoder OWSM, our OWSM-CTC achieves competitive results on ASR and up to 24% relative improvement on ST, while it is more robust and 3 to 4 times faster for inference. OWSM-CTC also improves the long-form ASR result with 20x speed-up. We will publicly release our code, pre-trained model, and training logs to promote open science in speech foundation models.
CONFERENCE INTERSPEECH Foundation Model

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, and 7 more authors

In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Sep 2024

arXiv PDF Code Poster Website
WORKSHOP ICASSPW Unified Model

Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation

Muhammad Shakeel, Yui Sudo, Yifan Peng, and Shinji Watanabe

In IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), Apr 2024

PDF

2023

CONFERENCE ASRU Foundation Model

Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data

Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, and 11 more authors

In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Dec 2023

arXiv PDF Website
CONFERENCE INTERSPEECH Unified Model

Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training

Yui Sudo, Muhammad Shakeel, Yifan Peng, and Shinji Watanabe

In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2023

PDF
CONFERENCE INTERSPEECH Unified Model

4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders

Yui Sudo, Muhammad Shakeel, Brian Yan, Jiatong Shi, and Shinji Watanabe

In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2023

DOI PDF
CONFERENCE Unified Model

End-to-end integration of online and offline encoders using auxiliary losses for automatic speech recognition

Muhammad Shakeel, Yui Sudo, Yifan Peng, and Shinji Watanabe

In JSAI Type 2 SIG Technical Reports (人工知能学会第二種研究会資料), Nov 2023

DOI PDF
CONFERENCE INTERSPEECH Efficient Model

DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models

Yifan Peng, Yui Sudo, Muhammad Shakeel, and Shinji Watanabe

In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2023

arXiv PDF Code
CONFERENCE IEEE/SICE SII Anomaly Detection

Metric-Based Multimodal Meta-Learning for Human Movement Identification Via Footstep Recognition

Muhammad Shakeel, Katsutoshi Itoyama, Kenji Nishida, and Kazuhiro Nakadai

In 2023 IEEE/SICE International Symposium on System Integration (SII), Aug 2023

DOI

2022

JOURNAL Anomaly Detection

3D Convolution Recurrent Neural Networks for Multi-Label Earthquake Magnitude Classification

Muhammad Shakeel, Kenji Nishida, Katsutoshi Itoyama, and Kazuhiro Nakadai

Applied Sciences, Aug 2022

DOI

2021

JOURNAL Anomaly Detection

Detecting earthquakes: a novel deep learning-based approach for effective disaster response

Muhammad Shakeel, Katsutoshi Itoyama, Kenji Nishida, and Kazuhiro Nakadai

Applied Intelligence, Nov 2021

DOI
CONFERENCE IEEE/SICE SII Anomaly Detection

EMC: Earthquake Magnitudes Classification on Seismic Signals via Convolutional Recurrent Networks

Muhammad Shakeel, Katsutoshi Itoyama, Kenji Nishida, and Kazuhiro Nakadai

In 2021 IEEE/SICE International Symposium on System Integration (SII), Nov 2021

DOI
CONFERENCE IEEE/SICE SII Others

Assessment of a Beamforming Implementation Developed for Surface Sound Source Separation

Zhi Zhong, Muhammad Shakeel, Katsutoshi Itoyama, Kenji Nishida, and Kazuhiro Nakadai

In 2021 IEEE/SICE International Symposium on System Integration (SII), Nov 2021

DOI

2015

CONFERENCE IEEE/SSRR Others

Environmental sensing using millimeter wave sensor for extreme conditions

Muhammad Shakeel, Daniele Nardi, Kazunori Ohno, and Satoshi Tadokoro

In 2015 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), Nov 2015

DOI