School of Electronics and Information Technology, Sun Yat-sen University, China 2016 - present
School of Mobile Information Engineering, Sun Yat-sen University, China 2013 - 2015
ECE Department, Carnegie Mellon University, Pittsburgh, USA 2015 - present
SYSU-CMU Joint Institute of Engineering, Sun Yat-sen University, China 2013 - 2016
SYSU-CMU Shunde International Joint Research Institute, China 2013 - 2016
Technical Committee or Scientific Review Committee member
ICASSP 2014, 2015, 2016; INTERSPEECH 2014, 2015, 2016, 2017; ISCSLP 2012, 2014, 2016; Odyssey 2014, 2016; etc.
INTERSPEECH 2016, INTERSPEECH 2018
Technical Committee member, APSIPA Speech, Language, and Audio (SLA) Track
Carnegie Mellon University Pittsburgh, USA
Visiting Professor, ECE Department Sep 2013 - July 2014
University of Southern California Los Angeles, USA
Ph.D. in Electrical Engineering 2008-2013
Advisor: Shrikanth Narayanan
Institute of Acoustics, Chinese Academy of Sciences Beijing, China
Master of Engineering in Signal and Information Processing 2005-2008
Advisor: Yonghong Yan
Nanjing University Nanjing, China
Bachelor of Science in Telecommunications Engineering 2001-2005
Graduated with the highest honors
Rank: 1st of 50 students in the communication engineering major
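Pattern Recognition Spring 2017
Design and Implementation of Speech Recognition Systems Fall 2014, Spring 2015, Fall 2015, Fall 2016
Probability Theory and Mathematical Statistics Fall 2017 (Link: https://pan.baidu.com/s/1kV9nB3h Password: zqpn)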
My current research interests:
My research interests lie in the areas of multimodal signal processing, speech and language processing, machine learning, statistical modeling, affective computing, and structural health monitoring. My work aims to enhance the robustness and efficiency of multimodal human state recognition tasks, which cover a broad range of applications, notably in security, healthcare and user assistance. Multimodal human state recognition is an umbrella term for identifying or verifying various kinds of human-centered state labels from multimodal signals, whether overtly expressed or covertly present. I also apply signal processing and machine learning techniques to structural health monitoring.
Human state recognition:
1. Speech signal processing: Speech recognition, speaker verification, spoken language identification, speech paralinguistics detection, speaker diarization, speaker age and gender identification, emotion recognition, speech production, array signal processing, speech enhancement
2. Human behavior signal processing: gathering, analyzing and modeling multimodal human behavior signals, both overtly and covertly expressed (speech/language/audio/visual/physiological signal analysis and understanding)
3. Multimodal biometrics: Audio-visual joint biometrics, emerging behavior biometrics (ECG biometrics), finger biometrics, iris biometrics, multimodal fusion
4. Body sensing, processing and modeling methods in metabolic health monitoring: Multimodal physical activity recognition, energy-efficient sensing and modeling
Structural health monitoring:
1. Robust wavenumber analysis
2. Data-driven matched field processing
3. Acoustic-ultrasonic localization of damage in plates and pipes
My previous research interests:
Audio watermarking: Robust frequency domain audio watermarking, content adaptive audio watermarking in wavelet domain
Computational auditory scene analysis: co-channel speech separation
Co-PI of project “Speech techniques development for multimedia big data analysis”, funded by Science and Technology Development Foundation of Guangdong Province, 2016-2018, 1M RMB (total), 300K RMB (SYSU part).
PI of project “A study of text independent paralinguistic speech attribute recognition based on the end-to-end deep learning framework”, Natural Science Foundation of China, 2018-2021, 600K RMB.
PI of project “Robust speaker verification based on multiple phonetic level deep features and discriminative subspace modeling”, Natural Science Foundation of Guangzhou City, 2018-2020, 200K RMB.
PI of project “A study of key speech processing technologies for national security applications”, funded by SYSU, 2017-2020, 500K RMB.
PI of project “Robust human computer interaction based on speech processing and microphone array technology” funded by JRI, 2017.1~2017.12, 1M RMB.
PI of project “Speaker verification, language identification and voice conversion technology”, funded by Jinlin Tech, 2016.1-2016.8, 2.89M RMB.
PI of project “Design and implementation of speech recognition systems course”, funded by SYSU Excellent Graduate Course Foundation, 2016-2017, 60K RMB.
PI of project “A speech analysis system for higher education classroom interaction using speech processing technology”, funded by SYSU Network Center, 2016-2017, 100K RMB.
PI of NSFC project “Speaker recognition and language recognition using articulatory information in the speech production system and mid-level discriminative tokenization”, 2015.1~2017.12, 280K RMB.
PI of NSFC-Guangdong project “Robust short duration text dependent speaker verification”, 2015.1~2017.12, 100K RMB.
PI of project “Speech information retrieval targeting Chinese dialects and foreign languages”, funded by Fundamental Research Funds for the Central Universities, 2015.7~2017.6, 500K RMB.
PI of project “Multimodal human behavior signal analysis for autism kids”, funded by IBM, 2016.1-2016.12, 100K RMB.
PI of project “Text-dependent speaker verification”, funded by Baidu, 2015.6-2015.12, 196K RMB.
PI of project “Multimodal human behavior modeling through computational sensing and analysis for young children with autism spectrum disorders” funded by JRI, 2013.9~2016.9, 1M RMB.
Co-PI of project “Study and Industrialization of video big data content analysis systems for education cloud based on intelligent speech and audio processing technologies”, funded by Guangdong enterprises-universities-researches integration foundation, 2018-2020, 1M RMB (total), 300K RMB (subcontract, SYSU part).
Co-PI of project “Whole body PET/MRI imaging system”, funded by National Key Research and Development Program, 2016-2020, 10M RMB (total), 150K RMB (SYSU part).
Co-PI of innovation team project “Structure health monitoring for plates and pipes using ultrasound techniques”, funded by JRI, 2014.1~2016.12, 5M RMB (total).
Co-PI of project “Algorithm and Hardware Co-Design for Ultra Low-Power Data Processing of Electrocardiogram (ECG) Biometrics” funded by CMU-SYSU Collaborative Innovation Research Center at Carnegie Mellon University, 2014-2015, 100K USD.
Co-PI of project “Affect Analysis in Human-Human Interactions” funded by CMU-SYSU Collaborative Innovation Research Center at Carnegie Mellon University, 2015-2016, 100K USD.
- Kong-Yik Chee, Zhe Jin, Danwei Cai, Ming Li, Wun-She Yap, Yen-Lung Lai, Bok-Min Goi, “Cancellable Speech Template via Random Binary Orthogonal Matrices Projection Hashing,” submitted to Pattern Recognition, 2017.
- Zhicheng Li, Yinliang Xu, Ming Li (*), “Finite-time Stability and Stabilization of Semi-Markovian Jump Systems with Time Delay”, submitted to International Journal of Robust and Nonlinear Control, 2017.
- Wenbo Zhao, Ming Li (*), Joel B. Harley, Yuanwei Jin, José M. F. Moura, Jimmy Zhu, "Reconstruction of Lamb wave dispersion curves by sparse representation and continuity constraints", Journal of the Acoustical Society of America, 2017.
- Wenbo Liu, Ming Li (*), Li Yi, "Identifying Children with Autism Spectrum Disorder Based on Their Face Processing Abnormality: A Machine Learning Framework", Autism Research, 2016.
- Yinliang Xu, Zaiyue Yang, Wei Gu, Ming Li, and Zicong Deng, "Robust Real-Time Distributed Optimal Control Based Energy Management in a Smart Grid", IEEE Transactions on Smart Grid, 2015.
- Donna Spruijt-Metz, Cheng K.F. Wen, Gillian O’Reilly, Ming Li, Sangwon Lee, Adar Emken, Urbashi Mitra, Murali Annavaram, Gisele Ragusa, Shrikanth Narayanan, “Innovations in the Use of Interactive Technology to Support Weight Management”, Current Obesity Reports, 2015.
- Ming Li(*), Jangwon Kim, Adam Lammert, Prasanta Kumar Ghosh, Vikram Ramanarayanan, Shrikanth Narayanan, “Speaker verification based on the fusion of speech acoustics and inverted articulatory signals”, Computer Speech & Language, 2015.
- Ming Li (*) and Wenbo Liu, "Generalized I-vector Representation with Phonetic Tokenizations and Tandem Features for both Text Independent and Text Dependent Speaker Verification", Journal of Signal Processing Systems, 2015.
- Ming Li (*) and Shrikanth Narayanan, "Simplified Supervised I-vector Modeling and Sparse Representation with Application to Robust Language Recognition", Computer Speech & Language, 2014.
- Ming Li (*), Kyu J. Han and Shrikanth Narayanan, “Automatic Speaker Age and Gender Recognition using Acoustic and Prosodic Level Information Fusion”, Computer Speech & Language, 2013, vol. 27.
- Daniel Bone, Ming Li, Matthew P. Black and Shrikanth S. Narayanan. "Intoxicated Speech Detection: A Fusion Framework with Speaker-Normalized Hierarchical Functionals and GMM Supervectors." Computer Speech & Language, 2013.
- Jangwon Kim, Naveen Kumar, Andreas Tsiartas, Ming Li, Shrikanth S. Narayanan, “Automatic intelligibility classification of sentence-level pathological speech”, to appear in Computer Speech & Language.
- U. Mitra, A. Emken, S. Lee, M. Li, V. Rozgic, G. Thatte, H. Vathsangam, D. Zois, M. Annavaram, S. Narayanan, D. Spruijt-Metz, and G. Sukhatme, "KNOWME: a Case Study in Wireless Body Area Sensor Network Design", IEEE Communications Magazine 2012 50:5(116-125).
- Gautam Thatte, Ming Li, Sangwon Lee, Adar Emken, Shri Narayanan, Urbashi Mitra, Donna Spruijt-Metz and Murali Annavaram, “KNOWME: An Energy-Efficient and Multimodal Body Area Sensing System for Physical Activity Monitoring,” ACM Transactions on Embedded Computing Systems, 2012.
- Adar Emken, Ming Li, Gautam Thatte, Sangwon Lee, Murali Annavaram, Urbashi Mitra, Shrikanth Narayanan, Donna Spruijt-Metz, “Recognition of Physical Activities in Overweight Hispanic Youth using KNOWME Networks”, Journal of Physical Activity and Health, 9:3(432-441) 2012.
- Gautam Thatte, Ming Li, Sangwon Lee, Adar Emken, Murali Annavaram, Shri Narayanan, Donna Spruijt-Metz, Urbashi Mitra, “Optimal Time-Resource Allocation for Energy-Efficient Physical Activity Detection”, IEEE Transactions on Signal Processing, vol. 59, no. 4, April 2011.
- Ming Li(*), Viktor Rozgic, Gautam Thatte, Sangwon Lee, Adar Emken, Murali Annavaram, Urbashi Mitra, Donna Spruijt-Metz and Shrikanth Narayanan, "Multimodal Physical Activity Recognition by Fusing Temporal and Cepstral Information," IEEE Transactions on Neural Systems & Rehabilitation Engineering, vol 18, issue 4, August, 2010.
- Hongbin Suo, Ming Li, Ping Lu, Yonghong Yan, “Automatic language identification with discriminative language characterization based on SVM”, IEICE Transactions on Information and Systems, 2008.
- Hongbin Suo, Ming Li, Ping Lu, Yonghong Yan, "Using SVM as back-end classifier for language identification", EURASIP Journal on Audio, Speech, and Music Processing, 2008.
- Jianping Zhang, Ming Li, Hongbin Suo, Lin Yang, Qiang Fu and Yonghong Yan, “Long span prosodic features for speaker recognition”, Acta Acustica, 2010 (in Chinese).
- Ming Li, Luting Wang, Zhicheng Xu, Danwei Cai, “Mandarin Electrolaryngeal Voice Conversion with Combination of Gaussian Mixture Model and Non-negative Matrix Factorization”, APSIPA ASC 2017.
- Tianyan Zhou, Yixiang Xie, Xiaobing Zou, Ming Li(*), “An Automated Assessment Framework for Speech Abnormalities related to Autism Spectrum Disorder”, ASMMC 2017.
- Jing Pan, Ming Li(*), Zhanmei Song, Xin Li, Xiaolin Liu, Hua Yi and Manman Zhu, "An audio based piano performance evaluation method using deep neural network based acoustic modeling", Interspeech 2017.
- Weicheng Cai, Danwei Cai, Wenbo Liu, Gang Li, Ming Li(*), "Countermeasures for Automatic Speaker Verification Replay Spoofing Attack: On Data Augmentation, Feature Representation, Classification and Fusion", Interspeech 2017.
- Danwei Cai, Zhidong Ni, Wenbo Liu, Weicheng Cai, Gang Li, Ming Li(*), "End-to-End Deep Learning Framework for Speech Paralinguistics Detection Based on Perception Aware Spectrum", Interspeech 2017.
- Jinkun Chen, Ming Li(*), "Automatic Emotional Spoken Language Text Corpus Construction from Written Dialogs in Fictions", ACII 2017.
- Wenbo Liu, Xiaobin Zou, Ming Li(*), "Response to Name: A Dataset and a Multimodal Machine Learning Framework towards Autism Study", ACII 2017.
- Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, Le Song, "SphereFace: Deep Hypersphere Embedding for Face Recognition", CVPR 2017.
- Tianyan Zhou, Weicheng Cai, Huadi Zheng, Luting Wang, Xiaoyan Chen, Xiaobing Zou, Shilei Zhang, Ming Li(*), "A Pitch, Energy and Phoneme Duration Features based Speaker Diarization System for Autism Kids’ Real-Life Audio Data", ISCSLP 2016.
- Danwei Cai, Weicheng Cai, Ming Li(*), “Locality Sensitive Discriminant Analysis for Speaker Recognition”, APSIPA ASC 2016.
- Gaoyuan He, Jinkun Chen, Xuebo Liu, and Ming Li(*), “The SYSU System for CCPR 2016 Multimodal Emotion Recognition Challenge”, CCPR 2016.
- Huadi Zheng, Weicheng Cai, Tianyan Zhou, Shilei Zhang, Ming Li(*), "Text-Independent Voice Conversion Using Deep Neural Network Based Phonetic Level Features", ICPR 2016.
- Wei Fang, Jianwen Zhang, Dilin Wang, Zheng Chen, Ming Li, "Entity Disambiguation by Knowledge and Text Jointly Embedding", CoNLL 2016.
- Yandong Wen, Weiyang Liu, Meng Yang, Ming Li, "Efficient Misalignment-robust Face Recognition via Locality-constrained Representation", ICIP 2016.
- Zhun Chen, Wenbo Zhao, Yuanwei Jin, Ming Li(*), Jimmy Zhu, "A Fast Tracking Algorithm for Estimating Ultrasonic Signal Time of Flight in Drilled Shafts Using Active Shape Models", IEEE International Ultrasonics Symposium 2016.
- Zhiding Yu, Weiyang Liu, Wenbo Liu, Yingzhen Yang, Ming Li and Vijayakumar Bhagavatula, "On Order-Constrained Transitive Distance Clustering", AAAI, 2016.
- Shushan Chen, Yiming Zhou and Ming Li(*), "Automatic English Pronunciation Evaluation System", APSIPA ASC 2015.
- Shitao Weng, Shushan Chen, Lei Yu, Xuewei Wu, Weicheng Cai, Zhi Liu, Yiming Zhou and Ming Li(*), "The SYSU System for the INTERSPEECH 2015 Automatic Speaker Verification Spoofing and Countermeasures Challenge", APSIPA ASC 2015.
- Wenbo Liu, Zhiding Yu, Li Yi, Bhiksha Raj, Ming Li(*), "Efficient Autism Spectrum Disorder Diagnosis with Eye Movement: A Machine Learning Framework", ACII 2015.
- Wenbo Liu, Zhiding Yu, Bhiksha Raj and Ming Li(*), "Locality Constrained Transitive Distance Clustering on Speech Data", INTERSPEECH 2015.
- Qingyang Hong, Lin Li, Ming Li, Ling Huang, Lihong Wan and Jun Zhang, "Modified-prior PLDA and Score Calibration for Duration Mismatch Compensation in Speaker Recognition System", INTERSPEECH 2015.
- Weicheng Cai, Ming Li(*), Lin Li and Qingyang Hong, "Duration Dependent Covariance Regularization in PLDA Modeling for Speaker Verification", INTERSPEECH 2015.
- Yingxue Wang, Shenghui Zhao, Wenbo Liu, Ming Li, Jingming Kuang, “Speech bandwidth extension based on Deep Neural Networks”, INTERSPEECH 2015.
- Ming Li, “Speaker verification with the mixture of Gaussian factor analysis based representation”, ICASSP 2015.
- Wenbo Liu, Zhiding Yu and Ming Li(*), “An Iterative Framework for Unsupervised Learning in the PLDA based Speaker Verification”, ISCSLP 2014.
- Ming Li, Wenbo Liu, "Speaker verification and spoken language identification using a generalized i-vector framework with phonetic tokenization and tandem features", INTERSPEECH 2014.
- Ming Li, "Automatic recognition of speaker physical load using posterior probability based features from acoustic and phonetic tokens", INTERSPEECH 2014.
- Prashanth Gurunath Shivakumar, Ming Li, Vedant Dhandhania and Shrikanth S. Narayanan, “Simplified and Supervised I-vector Modeling for Speaker Age Regression”, ICASSP 2014.
- Ming Li, Xin Li, “Verification based ECG biometrics with cardiac irregular conditions using heartbeat level and segment level information fusion”, ICASSP 2014.
- Ming Li, Andreas Tsiartas, Maarten Van Segbroeck and Shrikanth S. Narayanan, "Speaker Verification Using Simplified and Supervised I-vector Modeling", ICASSP 2013.
- Ming Li, Adam Lammert, Jangwon Kim, Prasanta Ghosh and Shrikanth Narayanan, “Automatic Classification of Palatal and Pharyngeal Wall Morphology Patterns from Speech Acoustics and Inverted Articulatory Signals”, Workshop on Speech Production in Automatic Speech Recognition, 2013.
- Ming Li, Jangwon Kim, Prasanta Kumar Ghosh, Vikram Ramanarayanan and Shrikanth Narayanan, “Speaker verification based on fusion of acoustic and articulatory information”, Interspeech 2013.
- Andreas Tsiartas, Theodora Chaspari, Nassos Katsamanis, Prasanta Ghosh, Ming Li, Maarten Van Segbroeck, Alexandros Potamianos, Shrikanth Narayanan, "Multi-band long-term signal variability features for robust voice activity detection", Interspeech 2013.
- Kyu Jeong Han, Sriram Ganapathy, Ming Li, Mohamed K. Omar and Shrikanth S. Narayanan, "TRAP Language Identification System for RATS Phase II Evaluation", Interspeech 2013.
- Daniel Bone, Theodora Chaspari, Kartik Audhkhasi, James Gibson, Andreas Tsiartas, Maarten Van Segbroeck, Ming Li, Sungbok Lee and Shrikanth S. Narayanan, "Classifying Language-Related Developmental Disorders from Speech Cues: the Promise and the Potential Confounds", Interspeech 2013.
- Ming Li, Charley Lu, Anne Wang, Shrikanth Narayanan, "Speaker Verification using Lasso based Sparse Total Variability Supervector and Probabilistic Linear Discriminant Analysis", presented at NIST Speaker Recognition Workshop, Atlanta, 2011; published in Proceedings of APSIPA Annual Summit and Conference, Hollywood, CA, 2012.
- Ming Li, Angeliki Metallinou, Daniel Bone, Shrikanth Narayanan, "Speaker states recognition using latent factor analysis based Eigenchannel factor vector modeling", ICASSP 2012.
- Jangwon Kim, Naveen Kumar, Andreas Tsiartas, Ming Li and Shrikanth S. Narayanan, "Intelligibility classification of pathological speech using fusion of multiple high level descriptors", Interspeech 2012.
- Kartik Audhkhasi, Angeliki Metallinou, Ming Li and Shrikanth S. Narayanan, "Speaker Personality Classification Using Systems Based on Acoustic-Lexical Cues and an Optimal Tree-Structured Bayesian Network", Interspeech 2012.
- Ming Li, Xiang Zhang, Yonghong Yan and Shrikanth Narayanan, "Speaker Verification using Sparse Representations on Total Variability I-Vectors", Interspeech, Florence, Italy, 2011.
- Ming Li, Shrikanth Narayanan, “Robust talking face video verification using joint factor analysis and sparse representation on GMM mean shifted supervectors”, ICASSP, Prague, Czech Republic, 2011.
- Samuel Kim, Ming Li, Sangwon Lee, Urbashi Mitra, Adar Emken, Donna Spruijt-Metz, Murali Annavaram, Shrikanth Narayanan, "Modeling high-level descriptions of real-life physical activities using latent topic modeling of multimodal sensor signals", Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Boston, 2011.
- Daniel Bone, Matthew P. Black, Ming Li, Angeliki Metallinou, Sungbok Lee and Shrikanth Narayanan, "Intoxicated Speech Detection by Fusion of Speaker Normalized Hierarchical Features and GMM Supervectors", Interspeech, Florence, Italy, 2011.
- Ming Li, Shrikanth Narayanan, “ECG Biometrics by Fusing Temporal and Cepstral Information”, 20th International Conference on Pattern Recognition (ICPR 2010), Istanbul, Turkey.
- Ming Li, Chi-Sang Jung and Kyu Jeong Han, “Combining Five Acoustic Level methods for Automatic Speaker Age and Gender Recognition”, Interspeech, 2010.
- Gautam Thatte, Viktor Rozgic, Ming Li, Sabyasachi Ghosh, Urbashi Mitra, Shri Narayanan, Murali Annavaram, Donna Spruijt-Metz, "Optimal Time-Resource Allocation for Activity-Detection via Multimodal Sensing," in Proceedings of the Fourth International Conference on Body Area Networks (BodyNets), Los Angeles, CA, 2009.
- Gautam Thatte, Viktor Rozgic, Ming Li, Sabyasachi Ghosh, Urbashi Mitra, Shri Narayanan, Murali Annavaram and Donna Spruijt-Metz, "Optimal Allocation of Time-Resources for Multihypothesis Activity-Level Detection," in Proceedings of the 5th IEEE International Conference on Distributed Computing in Sensor Systems (DCOSS), Marina Del Rey, CA (June 2009). (Best Paper Award)
- Sangwon Lee, Murali Annavaram, Gautam Thatte, Viktor Rozgic, Ming Li, Urbashi Mitra, Shri Narayanan and Donna Spruijt-Metz, "Sensing for Obesity: KNOWME Implementation and Lessons for an Architect," in Proceedings of the Workshop on Biomedicine in Computing: Systems, Architectures, and Circuits (BiC2009), Austin, TX (June 2009).
- Gautam Thatte, Ming Li, Adar Emken, Urbashi Mitra, Shri Narayanan, Murali Annavaram and Donna Spruijt-Metz, "Energy-Efficient Multihypothesis Activity-Detection for Health-Monitoring Applications," the 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Minneapolis, 2009.
- Ming Li, Chuan Cao, Di Wang, Ping Lu, Qiang Fu, and Yonghong Yan, “Cochannel speech separation using multi-pitch estimation and model based voiced sequential grouping”, Proceedings of the International Conference on Spoken Language Processing, INTERSPEECH 2008.
- Ming Li, Hongbin Suo, Xiao Wu, Ping Lu, Yonghong Yan, “Spoken Language Identification Using Score Vector Modeling and Support Vector Machine”, INTERSPEECH 2007.
- Ming Li, Yun Lei, Xiang Zhang, Jian Liu, Yonghong Yan, “Authentication and quality monitoring based audio watermark for analog AM shortwave broadcasting”, Proceedings of IEEE International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2007.
- Hongbin Suo, Ming Li, Tantan Liu, Ping Lu, Yonghong Yan, “The Design of Backend Classifiers in PPRLM System for Language Identification”, Proceedings of International Conference on Natural Computation, ICNC 2007.
- Hongbin Suo, Ming Li, Ping Lu, Yonghong Yan, “Language identification based on parallel PRLM system”, Proceedings of Chinese National Conference on Network Security, 2007 (in Chinese).
- Ming Li, Yun Lei, Jian Liu, Yonghong Yan, "A Novel Audio Watermarking in Wavelet Domain", Proceedings of IEEE International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP 2006.