Multimedia Coding and Communications Laboratory

Thank you for visiting the web home of the Multimedia Coding & Communications Laboratory (Mc²L), Department of Electrical & Computer Engineering, Queen's University. Mc²L researchers perform analysis and modeling, and develop algorithms and realization architectures, for processing (understanding, coding, enhancing) speech, digital communications, and health-related signals and data, for applications in machine-mediated communications, networking, and interactions. For specific activities, please look over our publications.

Joining Mc²L

Mc²L members, past and present, include degree-seeking students and visiting researchers. They hail from many countries, including Brazil, Canada, China, Egypt, France, India, Iran, Korea, Lebanon, Mexico, Saudi Arabia, Spain, Thailand, Turkey, and the United States. If you are interested in pursuing a Queen's graduate degree, please follow the application procedure.

Location

Multimedia Coding and Communications Laboratory (Mc²L)
Department of Electrical and Computer Engineering
Walter Light Hall
19 Union Street
Queen's University
Kingston, ON, K7L 3N6
Canada

Software

wSTMI and STGI code for speech intelligibility prediction

Latest release of the SRMR toolbox

Sparse network coding, see also here.

Publications

A. Edraki, W.-Y. Chan, J. Jensen, & D. Fogerty. "Speaker Adaptation for Enhancement of Bone Conducted Speech." Proc. IEEE Intl. Conf. on Acoustics, Speech, & Signal Proc., 5 pages, Apr 2024.

M. Boertjes, A.S. Kashi, J.C. Cartledge, & W.-Y. Chan, "Machine Learning Model Training Framework for Nonlinear Signal-to-Noise Ratio Estimation in Heterogeneous Optical Networks," IEEE/OSA Journal of Lightwave Technology, 11 pages, Mar 2024.

R.K.G. Do, L. Elbatarny, N. Gangai, W.-Y. Chan, X. Zhu, A. Simpson. "Natural Language Processing of Oncologic Radiology Reports: Predicting Response and Progression from Free Text Impressions." Abstract, Radiological Society of North America (RSNA) Annual Meeting, Nov 2023.

A. Alghamdi, W.-Y. Chan, D. Fogerty, & J. Jensen. "Correlation Based Glimpse Proportion Index," Proc. IEEE Workshop on Applications of Signal Proc. to Audio and Acoustics, 5 pages, Oct 2023.

C. Lau, X. Zhu, & W.-Y. Chan, "Automatic Depression Severity Assessment with Deep Learning Using Parameter-Efficient Tuning," Frontiers in Psychiatry, vol. 14, article 1160291, 14 pages, Jun 2023.

I. Lopez-Espejo, A. Edraki, W.-Y. Chan, Z.-H. Tan, & J. Jensen, "On the deficiency of intelligibility metrics as proxies for subjective intelligibility," Speech Communication, vol. 150, pp. 9-22, Apr 2023.

A. Edraki, W.-Y. Chan, D. Fogerty, & J. Jensen, "Modeling the effect of linguistic predictability on speech intelligibility prediction," JASA Express Letters, 3(3): 1-8, Mar 2023.

M. Ashofteh Barabadi, X. Zhu, W.-Y. Chan, A.L. Simpson, & R.K.G. Do, "Parameter-Efficient Methods for Metastases Detection from Clinical Notes," Proc. Canadian AI (CANAI), 6 pages, Jun 2023.

A. Edraki, W.-Y. Chan, J. Jensen, & D. Fogerty, "Spectro-temporal modulation glimpsing for speech intelligibility prediction," Hearing Research, vol. 426, 10 pages, Nov 2022.

J. Sanii & W.-Y. Chan, "Explainable Machine Learning Models for Pneumonia Mortality Risk Prediction Using MIMIC-III Data," Proc. International Conf. on Soft Computing & Machine Intelligence, 6 pages, Nov 2022.

C. Lau, W.-Y. Chan, & X. Zhu, "Improving Depression Assessment With Multi-Task Learning From Speech and Text Information," Proc. 55th Asilomar Conference on Signals, Systems and Computers, 5 pages, Oct 2021.

A. Alghamdi, W.-Y. Chan, D. Fogerty, & J. Jensen, "Improved Intelligibility Prediction in the Modulation Domain," Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 5 pages, Oct 2021.

M. Chowdhury, E. Gasca Cervantes, W.-Y. Chan, & D.P. Seitz, "Use of Machine Learning and Artificial Intelligence Methods in Geriatric Mental Health Research Involving Electronic Health Record or Administrative Claims Data: A Systematic Review," Frontiers in Psychiatry, vol. 12, article 738466, 11 pages, Sep 2021.

E. Gasca Cervantes & W.-Y. Chan, "LIME-Enabled Investigation of Convolutional Neural Network Performances in Covid-19 Chest X-Ray Detection," Proc. 2021 Canadian Conference of Electrical and Computer Engineering, 6 pages, Sep 2021.

A. Edraki, W.-Y. Chan, J. Jensen, & D. Fogerty, "A Spectro-Temporal Glimpsing Index (STGI) for Speech Intelligibility Prediction," Proc. Interspeech, 5 pages, Aug 2021.

A. S. Kashi, J. C. Cartledge, & W.-Y. Chan, “Neural Network Training Framework for Nonlinear Signal-to-Noise Ratio Estimation in Heterogeneous Optical Networks,” Proc. Optical Fiber Conf., 3 pages, June 2021.

A. Edraki, W.-Y. Chan, J. Jensen, & D. Fogerty, “Speech Intelligibility Prediction Using Spectro-Temporal Modulation Analysis,” IEEE/ACM Trans. Audio, Speech, & Language Processing, vol. 29, pp. 210-225, 2021.

D. Fogerty, A. Alghamdi, & W.-Y. Chan, “The effect of simulated room acoustic parameters on the intelligibility and perceived reverberation of monosyllabic words and sentences,” J. Acoustical Society of America, 147 (5), pp. EL396-402, May 2020.

A. Alghamdi & W.-Y. Chan, “Modified ESTOI for improving speech intelligibility prediction,” Proc. IEEE Canadian Conf. Electrical & Computer Engineering, pp. 1-5, 2020.

S. Rezazadeh, F. Alajaji, & W.-Y. Chan, “Scalar Quantizer Design for Two-Way Channels,” Proc. 16th Canadian Workshop Information Theory, pp. 1-6, Sep 2019.

P. Wang, J. C. Cartledge and W.-Y. Chan, “Pre-compensation of Nonlinear Distortion of a Silicon Microring Modulator Using Back-calculation,” Proc. Intl. Conf. Numerical Simulation of Optoelectronic Devices, Ottawa, pp. 123-124, June 2019.

A. Edraki, W.-Y. Chan, J. Jensen, & D. Fogerty, “Improvement and Assessment of Spectro-Temporal Modulation Analysis for Speech Intelligibility Estimation,” Proc. Interspeech, pp. 1378-1382, May 2019.

A. Alghamdi, W.-Y. Chan, & D. Fogerty, “Using Acoustic Parameters for Intelligibility Prediction of Reverberant Speech,” Proc. European Signal Processing Conf., 5 pages, Sep 2018.

Y. Li, W.-Y. Chan, & S.D. Blostein, “On Design and Efficient Decoding of Sparse Random Linear Network Codes,” IEEE Access, vol. 5, pp. 17031-17044, Aug. 2017.

A. Bakhshali, W.-Y. Chan, A. Rezania, & J.C. Cartledge, “Detection of High Baud-Rate Signals With Pattern Dependent Distortion Using Hidden Markov Modeling,” IEEE/OSA Journal of Lightwave Technology, vol. 35, no. 13, pp. 2612-2621, Jul. 2017.

A. Alghamdi & W.-Y. Chan, “Single-ended intelligibility prediction of noisy speech based on auditory features,” Proc. Canadian Conf. on Electrical and Computer Engineering, 4 pages, May 2017.

W.-Y. Chan, T.H. Falk, & Q. Xu, “Single-Sided Speech Quality Measurement,” US Patent 9,786,300 B2, Oct 10, 2017.

A. Rezania, J.C. Cartledge, A. Bakhshali, & W.-Y. Chan, "Compensation Schemes for Transmitter and Receiver Based Pattern-Dependent Distortion," IEEE Photonics Technology Letters, DOI: 10.1109/LPT.2016.2613401.

A. Bakhshali, W.-Y. Chan, A. Rezania, & J.C. Cartledge, "Sequential MAP Detection for High Baud-Rate Systems with Pattern-Dependent Distortions," Proc. 2016 European Conf. Optical Communication (ECOC), 3 pages, Sep. 2016.

A. Abou Saleh, F. Alajaji, & W.-Y. Chan, “Source-Interference Recovery Over Broadcast Channels: Asymptotic Bounds and Analog Codes,” IEEE Trans. on Communications, vol. 64, no. 8, pp. 3406-3418, Aug. 2016.

A. Bakhshali, W.-Y. Chan, J.C. Cartledge, M. O’Sullivan, C. Laperle, A. Borowiec, & K. Roberts, “Frequency-Domain Volterra-Based Equalization Structures for Efficient Mitigation of Intra-Channel Kerr Nonlinearities,” IEEE/OSA Journal of Lightwave Technology, vol. 34, no. 8, pp. 1770-1777, Apr. 2016.

2015

A. Bakhshali, W.-Y. Chan, S.D. Blostein, & Y. Cao, “QoE optimization of video multicast with heterogeneous channels and playback requirements,” EURASIP Journal on Wireless Communications and Networking, article 2015:260, 21 pages, Dec. 2015. [Click to download paper with an erratum.]

Y. Li, S.D. Blostein, & W.-Y. Chan, “Systematic network coding for two-hop lossy transmissions,” EURASIP Journal on Advances in Signal Processing, article 2015:93, 14 pages, Nov. 2015.

A. Abou Saleh, F. Alajaji, & W.-Y. Chan, “Compressed Sensing with Non-Gaussian Noise and Partial Support Information,” IEEE Signal Processing Letters, vol. 22, no. 10, pp. 1703-1707, Oct. 2015.

A. Bakhshali, W.-Y. Chan, J.C. Cartledge, M. O’Sullivan, C. Laperle, A. Borowiec, & K. Roberts, “Volterra-based nonlinearity compensation structures with improved performance-complexity trade-offs,” Proc. 2015 European Conf. Optical Communication (ECOC), 3 pages, Sep. 2015. *** Selected by the ECOC TPC as a "most highly-ranked paper" ***

Y. Cao, S.D. Blostein, & W.-Y. Chan, “Optimization of unequal error protection rateless codes for multimedia multicasting,” Journal of Communications and Networks, vol. 17, no. 3, pp. 221-230, June 2015.

A. Abou Saleh, F. Alajaji, & W.-Y. Chan, “Analog coding for Gaussian source and state interference estimation,” Proc. IEEE 16th Intl. Workshop on Signal Processing Advances in Wireless Communications (SPAWC), pp. 600-604, June 2015.

2014

A. Abou Saleh, W.-Y. Chan, & F. Alajaji, “Source-Channel Coding for Fading Channels with Correlated Interference,” IEEE Trans. on Communications, vol. 62, no. 11, pp. 3997-4011, Nov. 2014.

A. Bakhshali, W.-Y. Chan, Y. Gao, J.C. Cartledge, M. O’Sullivan, C. Laperle, A. Borowiec, & K. Roberts, “Complexity reduction of frequency-domain Volterra-based nonlinearity post-compensation using symmetric electronic dispersion compensation,” Proc. 2014 European Conf. Optical Communication (ECOC), 3 pages, Sep. 2014.

A. Abou Saleh, W.-Y. Chan, & F. Alajaji, “Power-Constrained Low-Complexity Coding of Compressed Sensing Measurements,” Proc. IEEE 15th Intl. Workshop on Signal Processing Advances in Wireless Communications (SPAWC), pp. 439-443, June 2014.

Y. Li, W.-Y. Chan, & S.D. Blostein, “Systematic network coding for transmission over two-hop lossy links,” Proc. 27th Biennial Symposium on Communications, pp. 213-217, June 2014.

A. Abou Saleh, F. Alajaji, & W.-Y. Chan, “Distortion bounds for broadcasting a Gaussian source in the presence of interference,” Proc. 27th Biennial Symposium on Communications, pp. 96-100, June 2014.

A. Abou Saleh, F. Alajaji, & W.-Y. Chan, “Low-latency source-channel coding for fading channels with correlated interference,” IEEE Wireless Communications Letters, vol. 3, no. 2, pp. 137-140, Apr. 2014.

2013

Y. Li, S. Blostein, & W.-Y. Chan, “Large File Distribution Using Efficient Generation-based Network Coding,” Proc. 2013 IEEE Globecom Workshop on Cloud Computing Systems, Networks, & Applications, pp. 427-432, Dec. 2013.

C. Zheng & W.-Y. Chan, “Late Reverberation Suppression Using MMSE Modulation Spectral Estimation,” Proc. Interspeech 2013, 6 pages, Aug. 2013.

A. Abou Saleh, F. Alajaji, & W.-Y. Chan, “Hybrid Digital-Analog Coding for Interference Broadcast Channels,” Proc. 2013 IEEE Intl. Symp. on Information Theory, pp. 544-548, July 2013.

A. Abou Saleh, W.-Y. Chan, & F. Alajaji, “Low and High-Delay Source-Channel Coding with Bandwidth Expansion and Correlated Interference,” Proc. 13th Canadian Workshop on Information Theory (CWIT), pp. 61-65, June 2013.

2012

A. Bakhshali, W.-Y. Chan, Y. Cao, & S. Blostein, "Multi-Scalable Video Multicast for Heterogeneous Playback Requirements Using a Perceptual Utility Measure," Proc. 2012 IEEE International Workshop on Multimedia Signal Processing, 6 pages, Sep 2012.

W.-Y. Chan and T. Falk, "Machine Assessment of Speech Communication Quality," in J.D. Gibson (ed), The Mobile Communications Handbook, CRC Press, Chapter 30, pp. 587-600, 2012.

T. Falk, W.-Y. Chan, & F. Shein, "Characterization of atypical vocal source excitation, temporal dynamics and prosody for objective measurement of dysarthric word intelligibility," Speech Communication, vol. 54, no. 5, pp. 622-631, June 2012. *** EURASIP Speech Communication Best Paper Award ***

Y. Li, W.-Y. Chan, & S. Blostein, "Network Coding with Unequal Size Overlapping Generations," Proc. Netcod 2012, 6 pages, June 2012.

A. Abou Saleh, F. Alajaji, & W.-Y. Chan, "Power-Constrained Bandwidth-Reduction Source-Channel Mappings for Fading Channels," Proc. 26th Biennial Symposium on Communications, 6 pages, May 2012.

A. Bakhshali, W.-Y. Chan, Y. Cao, & S. Blostein, "Outage Probability of Rateless Codes in Memoryless Erasure Channels," Proc. 26th Biennial Symposium on Communications, 4 pages, May 2012.

A. Abou Saleh, W.-Y. Chan, & F. Alajaji, "Compressed Sensing With Nonlinear Analog Mapping in a Noisy Environment," IEEE Signal Processing Letters, vol. 19, no. 1, pp. 39-42, Jan. 2012.

2011

S. Moeller, W.-Y. Chan, N. Cote, T. Falk, A. Raake, & M. Waltermann, "Speech Quality Estimation - Models and Trends," IEEE Sig. Proc. Mag., vol. 28, no. 6, pp. 18-28, Nov. 2011.

Y. Cao, S.D. Blostein, & W.-Y. Chan, "Optimization of rateless-coded asynchronous multimedia multicast," Proc. 22nd IEEE Personal Indoor Mobile Radio Communications, 6 pages, Sep. 2011.

A. Abou Saleh, W.-Y. Chan, & F. Alajaji, "Compressed Sensing with Shannon-Kotel'nikov Mapping in the Presence of Noise," Proc. European Sig. Proc. Conf. (Eusipco), 5 pages, Sep. 2011.

R. Hummel, W.-Y. Chan, & T. Falk, "Spectral Features for Automatic Blind Intelligibility Estimation of Spastic Dysarthric Speech," Proc. Interspeech, 4 pages, Aug. 2011.

C. Zheng, T. Falk, & W.-Y. Chan, "An Assessment of the Improvement Potential of Time-Frequency Masking for Speech Dereverberation," Proc. Interspeech, 4 pages, Aug. 2011.

S. Wu, T. Falk, & W.-Y. Chan, "Automatic Speech Emotion Recognition Using Modulation Spectral Features," Speech Communication, vol. 53, no. 5, pp. 768-785, May-June 2011.

A. Abou Saleh, F. Alajaji, & W.-Y. Chan, "Hybrid Digital-Analog Source-Channel Coding with One-to-Three Bandwidth Expansion," Proc. Cdn. Wkshp. Information Theory (CWIT), 4 pages, May 2011.

T. Falk, R. Hummel, & W.-Y. Chan, "Quantifying Perturbations in Temporal Dynamics for Automated Assessment of Spastic Dysarthric Speech Intelligibility," Proc. IEEE Intl. Conf. Acoustics, Speech, & Sig. Proc., 4 pages, May 2011.

M. H. Radfar, R. M. Dansereau, W.-Y. Chan, & W. Wong, "MPTRACKER: A New Multi-Pitch Detection and Separation Algorithm for Mixed Speech Signals," Proc. IEEE Intl. Conf. Acoustics, Speech, & Sig. Proc., 4 pages, May 2011.

2010

T. Falk, C. Zheng, & W.-Y. Chan, "A Non-Intrusive Quality and Intelligibility Measure of Reverberant and Dereverberated Speech," IEEE Trans. on Audio, Speech & Language Proc., vol. 18, no. 7, pp. 1766-1774, Sep. 2010.

Y. Cao, S.D. Blostein, & W.-Y. Chan, "Unequal error protection rateless coding design for multimedia multicasting," Proc. IEEE Intl. Symp. Information Theory, pp. 2438-2442, June 2010.

W. Sheng, W.-Y. Chan, S.D. Blostein, & Y. Cao, "Asynchronous and reliable multimedia multicast with heterogeneous QoS constraints," Proc. IEEE Intl. Conf. Communications, pp. 23-27, May 2010.

W. Sheng, W.-Y. Chan, & S.D. Blostein, "Rateless code based multimedia multicasting with outage probability constraints," Proc. 25th Biennial Symp. Communications, pp. 134-138, May 2010.

T. Falk & W.-Y. Chan, "Temporal Dynamics for Blind Measurement of Room Acoustical Parameters," IEEE Trans. on Instrumentation & Measurement, vol. 59, no. 4, pp. 978-989, Apr. 2010.

M.H. Radfar, W. Wong, R.M. Dansereau, & W.-Y. Chan, "Scaled factorial hidden Markov models: A new technique for compensating gain differences in model-based single channel speech separation," Proc. IEEE Intl. Conf. Acoustics, Speech, & Sig. Proc., pp. 1918-1921, Mar. 2010.

Y. Cao, S.D. Blostein, & W.-Y. Chan, "Optimization of rateless coding for multimedia multicasting," Proc. IEEE Intl. Symp. Broadband Multimedia Systems & Broadcasting, 6 pages, Mar. 2010.

Y. Zhou & W.-Y. Chan, "Multiple Description Quantizer Design for Space-Time Orthogonal Block Coded Channels," IEEE Trans. on Communications, vol. 58, no. 1, pp. 136-145, Jan. 2010.

T. Falk & W.-Y. Chan, "Modulation Spectral Features for Robust Far-Field Speaker Identification," IEEE Trans. on Audio, Speech & Language Proc., vol. 18, no. 1, pp. 90-100, Jan. 2010.

M.H. Radfar, R.M. Dansereau, & W.-Y. Chan, "Monaural Speech Separation based on Gain Adapted Minimum Mean Square Error Estimation," Journal of Signal Processing Systems, vol. 61, no. 1, pp. 21-37, 2010.

T. H. Falk, W.-Y. Chan, E. Sejdic, and T. Chau, "Spectro-Temporal Analysis of Auscultatory Sounds," in D. Campolo (editor), New Developments in Biomedical Engineering, In-Tech Publishing, Chapter 5, pp. 93-104, Jan. 2010.

J. Li, J.D. Johnston, and W.-Y. Chan, "Perceptual Scalable Audio Compression." US Patent 7,835,904, Nov 16, 2010.

2009

T.H. Falk and W.-Y. Chan, "Performance Study of Objective Speech Quality Measurement for Modern Wireless-VoIP Communications," EURASIP Journal on Audio, Speech, and Music Proc., vol. 2009, Article ID 104382, 11 pages, 2009. doi:10.1155/2009/104382

Abstract:

Wireless-VoIP communications introduce perceptual degradations that are not present with traditional VoIP communications. This paper investigates the effects of such degradations on the performance of three state-of-the-art standard objective quality measurement algorithms - PESQ, P.563, and an "extended" E-model. The comparative study suggests that measurement performance is significantly affected by acoustic background noise type and level, as well as speech codec and packet loss concealment strategy. On our data, PESQ attains superior overall performance and P.563 and E-model attain comparable performance figures.

S. Warrington, W.-Y. Chan, and S. Sudharsanan, "Scalable High-Throughput Variable Block Size Motion Estimation Architecture," Microprocessors and Microsystems, Vol. 33, No. 4, pp. 319-325, June 2009.

Abstract:

Variable block size (VBS) motion compensated prediction (MCP) provides substantial rate-distortion performance gain over conventional fixed-block-size MCP and is a key feature of the new H.264/AVC video coding. VBS-MCP requires the encoder to perform VBS motion estimation (VBSME), a computationally complex operation. In this paper, we propose a high motion vector throughput full-search VBSME architecture. High performance is achieved by performing parallel computations for multiple pixels within a macroblock, as well as computing several candidate motion vector (MV) positions in parallel. Two implementations of the architecture are examined, a four pixel-parallel implementation, and a higher performance 16 pixel-parallel implementation. A high degree of scalability is achieved by allowing for a variable length processing element array, where more processing elements yield a higher degree of candidate MV parallelism. The proposed architecture achieves throughputs exceeding current full-search VBSME architectures.

A. Huang, T.H. Falk, W.-Y. Chan, V. Parsa, and P. Doyle, "Reference-Free Automatic Quality Assessment of Tracheoesophageal Speech," Proc. Intl. Conf. IEEE Engr. in Medicine & Biology Soc.

Abstract:

Evaluation of the quality of tracheoesophageal (TE) speech using machines instead of human experts can enhance the voice rehabilitation process for patients who have undergone total laryngectomy and voice restoration. Towards the goal of devising a reference-free TE speech quality estimation algorithm, we investigate the efficacy of speech signal features that are used in standard telephone-speech quality assessment algorithms, in conjunction with a recently introduced speech modulation spectrum measure. Tests performed on two TE speech databases demonstrate that the modulation spectral measure and a subset of features in the standard ITU-T P.563 algorithm estimate TE speech quality with better correlation (up to 0.9) than previously proposed features.

M.H. Radfar, W.-Y. Chan, R.M. Dansereau, and W. Wong, "Performance Comparison of HMM and VQ Based Single Channel Speech Separation," Proc. Interspeech.

Abstract:

In this paper, single channel speech separation (SCSS) techniques based on hidden Markov models (HMM) and vector quantization (VQ) are described and compared in terms of (a) signal-to-noise ratio (SNR) between separated and original speech signals, (b) preference of listeners, and (c) computational complexity. The SNR results show that the HMM-based technique marginally outperforms the VQ-based technique by 0.85 dB in experiments conducted on mixtures of female-female, male-male, and male-female speakers. Subjective tests show that listeners prefer HMM over VQ for 86.70% of test speech files. This improvement, however, is at the expense of a drastic increase in computational complexity when compared with the VQ-based technique.

M.H. Radfar, W. Wong, W.-Y. Chan, and R.M. Dansereau, "Gain Estimation in Model-Based Single Channel Speech Separation," Proc. IEEE Intl. Workshop on Machine Learning for Signal Proc.

Abstract:

In most current model-based single channel separation techniques, it is assumed that the recording conditions are identical in the training phase and application phase. In this paper, we consider a general case in which training data and application data have different levels of energy and a technique is proposed to estimate the sources' gains which are required for the separation process. We use the periodogram of the speech signal as the selected feature for separation such that the sources' gains are estimated in terms of normalized periodograms of the sources and the mixture. The proposed technique is compared with a state-of-the-art technique which uses AR modeling of the speech signal and maximum likelihood for estimating gain and separating the sources. Experimental results show that our technique not only outperforms this technique in terms of SNR results and gain estimation accuracy but also reduces computational complexity.

S. Wu, T.H. Falk, and W.-Y. Chan, "Automatic Recognition of Speech Emotion Using Long-Term Spectro-Temporal Features," Proc. 16th Intl. Conf. on Digital Signal Proc., July 2009.

Abstract:

This paper proposes a novel feature type for the recognition of emotion from speech. The features are derived from a long-term spectro-temporal representation of speech. They are compared to short-term spectral features as well as popular prosodic features. Experimental results with the Berlin emotional speech database show that the proposed features outperform both types of compared features. An average recognition accuracy of 88.6% is achieved by using a combined proposed & prosodic feature set for classifying 7 discrete emotions. Moreover, the proposed features are evaluated on the VAM corpus to recognize continuous emotion primitives. Estimation performance comparable to human evaluations is furnished.

2008

J.-P. Thibault, W.-Y. Chan, and S. Yousefi, "A Family of Concatenated Network Codes for Improved Performance with Generations," Journal of Communications and Networks, vol. 10, no. 4, pp. 384-395, Dec. 2008.

Abstract:

Random network coding can be viewed as a single block code applied to all source packets. To manage the concomitant high coding complexity, source packets can be partitioned into generations; block coding is then performed on each set. To reach a better performance-complexity tradeoff, we propose a novel concatenated network code which mixes generations while retaining the desirable properties of generation-based coding. Focusing on the code's erasure performance, we show that the probability of successfully decoding a generation on erasure channels can increase substantially for any erasure rate. Using both analysis (for small networks) and simulations (for larger networks), we show how the code's parameters can be tuned to extract best performance. As a result, the probability of failing to decode a generation is reduced by nearly one order of magnitude.
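
The generation-based baseline that this abstract starts from can be illustrated with a minimal sketch. It is not the concatenated code proposed in the paper: it only partitions source packets into generations and applies random linear coding over GF(2) within each one. The function names, the choice of GF(2), and the representation of packets as Python integers are all assumptions made for illustration.

```python
import random

def split_into_generations(packets, gen_size):
    """Partition the source packets into consecutive generations."""
    return [packets[i:i + gen_size] for i in range(0, len(packets), gen_size)]

def encode(generation, rng):
    """Return (coeffs, payload): one random GF(2) combination of a generation.

    Bit i of coeffs says whether source packet i is included in the mix.
    """
    coeffs = 0
    while coeffs == 0:                    # skip the useless all-zero combination
        coeffs = rng.getrandbits(len(generation))
    payload = 0
    for i, pkt in enumerate(generation):
        if (coeffs >> i) & 1:
            payload ^= pkt                # addition in GF(2) is XOR
    return coeffs, payload

def decode(coded, gen_size):
    """Gaussian elimination over GF(2).

    Returns the decoded generation, or None if the coded packets received
    so far do not yet span the whole generation.
    """
    basis = {}                            # highest set bit -> (coeffs, payload)
    for coeffs, payload in coded:
        while coeffs:
            top = coeffs.bit_length() - 1
            if top not in basis:
                basis[top] = (coeffs, payload)
                break
            coeffs ^= basis[top][0]
            payload ^= basis[top][1]
    if len(basis) < gen_size:
        return None
    solved = []
    for i in range(gen_size):             # back-substitute from the lowest pivot up
        coeffs, payload = basis[i]
        for j in range(i):
            if (coeffs >> j) & 1:
                payload ^= solved[j]
        solved.append(payload)
    return solved
```

A receiver would keep collecting coded packets for a generation and call `decode` until it returns something other than `None`. The decoding cost depends on the generation size rather than on the total number of source packets, which is the complexity benefit the abstract refers to.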

T. H. Falk and W.-Y. Chan, "Hybrid Signal-and-Link-Parametric Speech Quality Measurement for VoIP Communications," IEEE Trans. on Audio, Speech and Language Proc., Vol. 16, No. 8, pp. 1579-1589, November 2008.

Abstract:

A hybrid signal-and-link-parametric approach to speech quality measurement for voice over Internet Protocol (VoIP) communications is described. Connection parameters are used to determine a base quality representative of the transmission link. Degradation factors, computed from perceptual features extracted from the decoded speech signal, are used to quantify distortions not captured by the connection parameters. The algorithm is tested on speech degraded by acoustic noise, temporal clippings, and noise suppression artifacts, thus simulating degradations present in wireless-VoIP tandem connections. Hybrid measurement is shown to overcome the limitations of pure link parametric and pure signal based measurement methods, resulting in better measurement accuracy for modern VoIP communications. In addition, the proposed algorithm incurs modest computational overhead relative to pure link parametric measurement and attains up to 88% reduction in processing time relative to the ITU-T standard P.563 signal-based algorithm.

T. H. Falk and W.-Y. Chan, "Spectro-Temporal Features for Robust Far-Field Speaker Identification," in Proc. Interspeech, Sep. 2008.

Abstract:

Features derived from an auditory spectro-temporal representation of speech are proposed for robust far-field speaker identification. The auditory representation is obtained by first filtering the speech signal with a gammatone filterbank. A modulation filterbank is then applied to the temporal envelope of each gammatone filter output. Compared to commonly used mel-frequency cepstral coefficients (MFCC), the proposed features are shown to be more robust to mismatched conditions between enrollment and test data and are less sensitive to increasing reverberation time (RT). Experiments with simulated and recorded far-field speech show that a Gaussian mixture model based identification system, trained on the proposed features, attains an average improvement in identification accuracy of 15% relative to a system trained on MFCC. Improvements of up to 85% are attained for larger RT.

S. Wu, T. H. Falk and W.-Y. Chan, "Long-Term Spectro-Temporal Information for Improved Automatic Speech Emotion Classification," in Proc. Interspeech, Sep. 2008.

Abstract:

This paper investigates the contribution of features which convey long-term spectro-temporal (ST) information for the purpose of automatic emotional speech classification. The ST representation is obtained by means of a modulation filterbank decomposition of long-term temporal envelopes of the outputs of a gammatone filterbank. The two-dimensional discrete cosine transform is used to reduce the dimensionality of the representation; candidate features are then derived from statistics computed from the DCT coefficients. Sequential forward feature selection is used to select the most salient features. Two types of experiments are described which use the Berlin emotional speech database to test the performance of the ST features alone and in combination with prosodic features. In a multi-class experiment, simulation results with a support vector classifier show that a 44% reduction in classification error is attained once prosodic features are combined with the proposed ST features. Additionally, in a one-against-all experiment, an average increase in F-score of 33% is attained when the proposed ST features are included.

T. H. Falk and W.-Y. Chan, "A Non-Intrusive Quality Measure of Dereverberated Speech," in Proc. Intl. Workshop for Acoustic Echo & Noise Control, Sep. 2008.

*** Eberhard Haensler Best Student Paper Award ***

Abstract:

A modulation spectral signal representation is investigated for non-intrusive quality measurement of reverberant and dereverberated speech. The representation is obtained by means of an auditory-inspired filterbank analysis of temporal envelopes of the speech signal. Modulation spectral cues are used to develop an adaptive measure which is shown to correlate well with subjective ratings of overall quality, colouration, and reverberation tail effects. The performance of the proposed measure is compared to that of four state-of-the-art quality measurement algorithms. Experiments show that substantial improvement is attained, in particular for reverberant speech enhanced by a delay-and-sum beamformer.

T. H. Falk and W.-Y. Chan, "Modulation Filtering for Heart and Lung Sound Separation from Breath Sound Recordings," in Proc. Intl. Conf. IEEE Engr. in Medicine & Biology Soc., Aug. 2008.

Abstract:

Separation of heart and lung sounds from breath sound recordings is a challenging task due to the temporal and spectral overlap of the two signals. In this paper, the use of a spectro-temporal representation to improve signal separation is investigated. The representation is obtained by means of a frequency decomposition (termed modulation frequency) of temporal trajectories of short-term spectral components. Experiments described herein suggest that improved separability of heart (HS) and lung sounds (LS) is attained in the modulation frequency domain. Bandpass and bandstop modulation filters are designed to separate HS and LS signals from breath sound recordings, respectively. Visual and auditory inspection, quantitative analysis, as well as algorithm execution time are used to assess algorithm performance. Log-spectral distances below 1 dB suggest that the separated lung and heart sound signals do not contain audible artifacts.
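
The representation described in this abstract (a frequency decomposition of the temporal trajectories of short-term spectral components) can be sketched roughly as follows. This is an illustration only: the non-overlapping rectangular frames, the naive DFT, and the function names are assumptions, and the paper's actual analysis parameters and modulation filters are not reproduced here.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitudes of the DFT of x for bins 0 .. len(x)//2 (naive O(N^2) DFT)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def modulation_spectrum(signal, frame_len):
    """Return M[k][q]: magnitude at modulation bin q for acoustic bin k.

    Step 1: short-term spectra of non-overlapping frames.
    Step 2: a second DFT along time for each acoustic-frequency trajectory.
    """
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    spectrogram = [dft_magnitudes(f) for f in frames]      # [frame][acoustic bin]
    n_bins = frame_len // 2 + 1
    trajectories = [[row[k] for row in spectrogram] for k in range(n_bins)]
    return [dft_magnitudes(traj) for traj in trajectories]
```

With a frame rate of R frames per second and N frames, modulation bin q corresponds to a modulation frequency of q·R/N Hz. Sounds whose envelopes fluctuate at different rates then occupy different modulation bins, which is what makes bandpass and bandstop filtering along this axis plausible.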

J.-P. Thibault, W.-Y. Chan, and S. Yousefi, "Efficient Mixed-Generation Concatenated Network Coding," Proc. 24th Biennial Symp. on Communications, June 2008.

Abstract:

Random network coding can be viewed as a single block code applied to all source packets. To manage the concomitant high coding complexity, source packets can be partitioned into generations; block coding is then performed on each set. To reach a better performance-complexity tradeoff, we propose a novel concatenated network code which mixes generations while retaining the desirable properties of generation based coding. On erasure channels, the resulting probability of successfully decoding a generation can increase substantially; this holds for any erasure rate. We show how the code's parameters can be tuned to extract best performance.

2007

J. Shen and W.-Y. Chan, "Method and Apparatus for Encoding a Video Signal." US Patent 7,295,614, Nov 13, 2007.

J.-P. Thibault, W.-Y. Chan, and S. Yousefi, "Recursive and Non-Recursive Network Coding: Performance and Complexity," Proc. IEEE Intl. Conf. on Signal Processing & Communication, Nov. 2007.

Abstract:

While network coding promises to increase throughput, network nodes incur increased complexity as they are relied on to perform packet mixing. Previous works have proposed to manage network-level complexity by reducing the number of network coding transport nodes. Here, we study the tradeoff between transport-node complexity and achievable throughput rates. We compare two encoding schemes: recursive and non-recursive. We show that due to the peculiarities of network coding, non-recursive coding achieves considerably higher rates, for comparable computational and storage requirements. We also show that by replacing multiplication with shifting, complexity is further reduced, with negligible impact on performance.

T. H. Falk, Y. Guo, and W.-Y. Chan, "Improving Robustness of Image Quality Measurement with Degradation Classification and Machine Learning," Proc. 41st Asilomar Conf. on Signals, Systems & Computers, Nov. 2007.

Abstract:

Image quality metrics can be classified as generic or degradation specific. Degradation specific measures perform poorly under "mismatched" conditions. Generic measures, on the other hand, may compromise quality measurement accuracy while gaining robustness to variation in distortion conditions. To improve the accuracy-robustness tradeoff, we employ support vector degradation classification and machine learning tools to judiciously combine generic and degradation specific measures. To test our algorithm, composite quality metrics are optimized for five different distortion classes. Experiment results show that the proposed algorithm achieves improved performance and robustness relative to two benchmark generic quality metrics.

T. H. Falk, H. Yuan, and W.-Y. Chan, Single-Ended Quality Measurement of Noise Suppressed Speech Based on Kullback-Leibler Distances, Journal of Multimedia, Vol 2, No 5 (2007), 19-26, Sep 2007. doi:10.4304/jmm.2.5.19-26

Abstract:

In this paper, a single-ended quality measurement algorithm for noise suppressed speech is described. The proposed algorithm computes fast approximations of Kullback-Leibler distances between Gaussian mixture (GM) reference models of clean, noise corrupted, and noise suppressed speech and a GM model trained online on the test speech signal. The distances, together with a spectral flatness measure, are mapped to an estimated quality score via a support vector regressor. Experimental results show that substantial improvement in performance and complexity can be attained, relative to the current state-of-art single-ended ITU-T P.563 algorithm. Due to its modular architecture, the proposed algorithm can be easily configured to also perform signal distortion and background intrusiveness measurement, a functionality not available with current standard algorithms.
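
As background for the distance used above, the Kullback-Leibler divergence has a closed form between two single Gaussians but not between two Gaussian mixtures, which is why approximations are needed for mixture models. The following is a minimal sketch of the single-Gaussian case only; it is not the paper's fast approximation, and the function name `kl_gaussian` is a hypothetical choice for illustration.

```python
import math


def kl_gaussian(mean_p, var_p, mean_q, var_q):
    """KL divergence D(p || q) between two univariate Gaussians, in nats."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mean_p - mean_q) ** 2) / var_q
                  - 1.0)


if __name__ == "__main__":
    # Identical models give zero; the value grows as the models move apart.
    print(kl_gaussian(0.0, 1.0, 0.0, 1.0))
    print(kl_gaussian(0.0, 1.0, 1.0, 1.0))
```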

T. H. Falk, H. Yuan, and W.-Y. Chan, Spectro-Temporal Processing for Blind Estimation of Reverberation Time and Single-Ended Quality Measurement of Reverberant Speech, in Proc. Interspeech, Aug. 2007.

Abstract:

Auditory spectro-temporal representations of reverberant speech are investigated for blind estimation of reverberation time (RT) and for single-ended measurement of speech quality. The auditory representations are obtained from an eight-filter filterbank which is used to extract the modulation spectra from temporal envelopes of the speech signal. Gaussian mixture models (GMM), one for each modulation channel and trained on clean speech signals, serve as reference models of normative speech behavior. Consistency measures, computed between reverberant test signals and each GMM, are mapped to an estimated RT and to an estimated quality score. Experiments show that the proposed measures achieve superior performance relative to current "state-of-art" algorithms.

H. Yuan, T. H. Falk, and W.-Y. Chan, Degradation-Classification Assisted Single-Ended Quality Measurement of Speech, in Proc. Interspeech, Aug. 2007.

Abstract:

We propose an algorithm to classify speech degradations at network endpoints and to estimate the speech quality based on the degradation classification decision. Perceptual features from degraded speech signals are used to form statistical reference models of different degradation classes. Consistency measures, calculated between degraded speech signals and the reference models, are used to train a degradation classifier and mean opinion score (MOS) mappings. The quality of a received speech signal is estimated based on its degradation class and the MOS mapping associated with the class. Experimental results show that the proposed algorithm achieves high classification accuracy, and degradation classification improves the accuracy of the quality estimate.

T. H. Falk, S. Stadler, W. B. Kleijn, and W.-Y. Chan, Noise Suppression Based on Extending a Speech-Dominated Modulation Band, in Proc. Interspeech, Aug. 2007.

Abstract:

Previous work on bandpass modulation filtering for noise suppression has resulted in unwanted perceptual artifacts and decreased speech clarity. Artifacts are introduced mainly due to half-wave rectification, which is employed to correct for negative power spectral values resultant from the filtering process. In this paper, modulation frequency estimation (i.e., bandwidth extension) is used to improve perceptual quality. Experiments demonstrate that speech-component lowpass modulation content can be reliably estimated from bandpass modulation content of speech-plus-noise components. Subjective listening tests corroborate that improved quality is attained when the removed speech lowpass modulation content is compensated for by the estimate.

C. Hsu, N. Wang, W.-Y. Chan and P.K. Jain, Improving A Power Line Communications Standard with LDPC Codes, EURASIP Journal on Advances in Signal Processing, vol. 2007, Article ID 60839, 9 pages, 2007. doi:10.1155/2007/60839

Abstract:

We investigate a power line communications (PLC) scheme that could be used to enhance the HomePlug 1.0 standard, specifically its ROBO mode which provides modest throughput for the worst case PLC channel. The scheme is based on using a low-density parity-check (LDPC) code, in lieu of the concatenated Reed-Solomon and convolutional codes in ROBO mode. The PLC channel is modeled with multipath fading and Middleton's Class A noise. Clipping is introduced to mitigate the effect of impulsive noise. A simple and effective method is devised to estimate the variance of the clipped noise for LDPC decoding. Simulation results show that the proposed scheme outperforms the HomePlug 1.0 ROBO mode and has lower computational complexity. The proposed scheme also dispenses with the repetition of information bits in ROBO mode to gain time diversity, resulting in a 4-fold increase in physical layer throughput.

Y. Zhou and W.-Y. Chan, Multiple Description Quantizer Design for Space-time Orthogonal Block Coded Channels, in Proc. Intl. Symp. on Information Theory, June 2007.

Abstract:

We study the design of multiple description quantizers for space-time orthogonal block coded slow Rayleigh fading channels. A time-interleaver is employed at the transmitter to provide independent channel instances for the multiple descriptions, and a maximum a posteriori (MAP) decoder is employed at the receiver to jointly decode the multiple descriptions. We propose a scheme to optimize multiple description vector quantizers using an upper bound of the channel transition probability achieved by the MAP decoder. The scheme furnishes substantial performance gain.

J.-P. Thibault, S. Yousefi, and W.-Y. Chan, Throughput Performance of Generation-Based Network Coding, in Proc. Canadian Workshop on Information Theory, June 2007.

Abstract:

Using generations to implement random linear network coding garners benefits such as reduced decoding complexity. However, these benefits can come at the expense of throughput. In this paper, we seek to understand and maximize throughput for generation-based network coding (GBNC). Motivated by the application of network coding to scalable multicast, we consider schemes which result in high probability of decoding success with minimal feedback. We show that the throughput performance of GBNC is highly dependent on the choice of coding parameters and that GBNC becomes advantageous only when the number of source packets exceeds a network-dependent threshold. Results for various network topologies lead to the formulation of throughput-motivated guidelines for the adoption of GBNC.
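
To give a feel for how coding parameters affect decoding success, the sketch below computes the probability that a receiver can decode one generation after collecting a given number of coded packets. It assumes an idealized setting that may differ from the paper's network models: coefficients are drawn uniformly and independently from a finite field, and packets arrive without further recoding effects. The function name and parameters are hypothetical.

```python
def decode_success_probability(generation_size, received, field_size):
    """Probability that `received` random coded packets span the generation.

    Assumes coding coefficients are drawn uniformly and independently from a
    finite field with `field_size` elements, so the coefficient matrix is a
    uniformly random received-by-generation_size matrix over that field.
    """
    if received < generation_size:
        return 0.0  # too few packets to reach full rank
    prob = 1.0
    for i in range(generation_size):
        # Column i must fall outside the span of the previous i columns.
        prob *= 1.0 - float(field_size) ** (i - received)
    return prob


if __name__ == "__main__":
    for extra in range(4):
        p = decode_success_probability(16, 16 + extra, 2)
        print(f"GF(2), g=16, {extra} extra packets: {p:.4f}")
```

Under these assumptions, a larger field or a few extra packets per generation raises the success probability, which is one way the choice of parameters trades throughput against reliability.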

S. Warrington, S. Sudharsanan, and W.-Y. Chan, Architecture for Multiple Reference Frame Variable Block Size Motion Estimation, in Proc. Intl. Symp. on Circuits & Systems, May 2007.

Abstract:

This paper proposes a high throughput variable block size motion estimation (VBSME) architecture supporting multiple reference frames (MRF). To enable best rate-distortion performance for different video contents, the architecture allows selection between high spatial resolution motion search over a single reference frame, or MRF search at a lower spatial resolution. Through synthesis of an ASIC implementation, the architecture is shown to be suitable for high definition video resolutions and frame rates. The architecture also provides a higher overall macroblock throughput than other VBSME architectures in the literature.

H. Yuan, T. H. Falk, and W.-Y. Chan, Classification of Speech Degradations at Network Endpoints Using Psychoacoustic Features, in Proc. Canadian Conf. on Electrical and Computer Engineering, April 2007.

Abstract:

We propose a method of classifying speech degradations at network endpoints. Perceptual features are extracted from degraded speech signals and used to form statistical reference models of behaviors of different degradation types. Consistency values between degraded speech signals and the reference models are calculated and used to train a degradation classifier. The consistency values of a received degraded speech signal then serve as predictors in the trained classifier for a degradation type decision. The proposed method is tested on four commonly encountered degradation types with unseen data and the experimental results show that the method achieves high classification accuracy. The proposed method can be used to enhance applications such as speech enhancement, recognition, and quality estimation.

T. H. Falk, H. Yuan, and W.-Y. Chan, A Hybrid Signal-and-Link-Parametric Approach to Single-Ended Quality Measurement of Packetized Speech, in Proc. Intl. Conf. on Acoustics, Speech, and Signal Processing, April 2007.

Abstract:

A hybrid signal-and-link-parametric approach to single-ended quality measurement of packetized speech is proposed. Transmission link parameters are used to determine a base quality for the test signal. The base quality is adjusted by degradation factors calculated from perceptual features extracted from the test signal. The degradation factors are based on Kullback-Leibler distances between a parametric model trained online for the extracted features and reference models of normative speech behavior. The proposed method overcomes the limitations of pure link parametric and pure signal-based methods.

2006

T. H. Falk and W.-Y. Chan, Single-Ended Speech Quality Measurement Using Machine Learning Methods, IEEE Trans. on Audio, Speech and Language Proc., Special Issue on Objective Quality Assessment of Speech and Audio, Vol. 14, No. 6, pp. 1935-1947, Nov. 2006.

Abstract:

We describe a novel single-ended algorithm constructed from models of speech signals, including clean and degraded speech, and speech corrupted by multiplicative noise and temporal discontinuities. Machine learning methods are used to design the models, including Gaussian mixture models, support vector machines, and random forest classifiers. Estimates of the subjective mean opinion score (MOS) generated by the models are combined using hard or soft decisions generated by a classifier which has learned to match the input signal with the models. Test results show the algorithm outperforming ITU-T P.563, the current "state-of-art" standard single-ended algorithm. Employed in a distributed double-ended measurement configuration, the proposed algorithm is found to be more effective than P.563 in assessing the quality of noise reduction systems, and can provide a functionality not available with P.862 PESQ, the current double-ended standard algorithm.

C. Hsu, N. Wang, W.-Y. Chan and P.K. Jain, Improving HomePlug Power Line Communications with LDPC Coded OFDM, in Proc. IEEE Intl. Telecommunications Energy Conf., Sep. 2006.

Abstract:

Power line communications (PLC) has received much attention due to the wide connectivity and availability of power lines. Effective PLC must overcome the harsh and noisy environments inherent in PLC channels. HomePlug 1.0 is the current PLC standard in North America. The physical layer of HomePlug 1.0 employs orthogonal frequency division multiplexing (OFDM) as well as concatenated Reed-Solomon and convolutional coding. Aiming to obtain higher PLC throughput, we investigate the performance of OFDM with low-density parity-check (LDPC) codes and compare the proposed scheme with HomePlug 1.0 ROBO mode. In our simulations, the PLC channel is modeled by multipath fading, with Middleton's Class A noise (AWCN) model simulating the worst-case impulsive noise. We apply clipping to lessen the impact of impulsive noise. A simple but effective method is devised to estimate the variance of the clipped noise for LDPC decoding. In comparison with ROBO mode, the proposed scheme offers improved performance and lower computational complexity per decoded bit. Our scheme provides increased throughput by dispensing with ROBO mode's repetitive transmission of information to gain time diversity.

Y. Zhou and W.-Y. Chan, E-Model Based Comparison of Multiple Description Coding and Layered Coding in Packet Networks, European Transactions on Telecommunications Vol. 18, No. 7, pp. 661-668, Nov. 2007.

Abstract:

We examine the performance of multiple description coding (MDC) with and without the use of automatic repeat request (ARQ) protocols for packet network communication, in comparison with layered coding (LC). The rate distortion lower bound of MDC and LC are incorporated into an E-model based performance measure, which accounts for the additional costs of excess rates and delay incurred from using ARQ. The results show that the relative merits of the schemes depend on the values of the channel loss rates and round-trip-time (RTT). LC is superior for small RTT and unaided MDC is superior for large RTT. For moderate RTT, LC is preferred for small channel loss rates and MDC aided by ARQ is preferred for large channel loss rates.

Y. Zeng, Z. Wu, T. H. Falk, and W.-Y. Chan, Robust GMM Based Gender Classification Using Pitch and RASTA-PLP Parameters of Speech, in Proc. Intl. Conf. Machine Learning & Cybernetics, Aug. 2006.

Abstract:

A novel gender classification system is proposed based on Gaussian mixture models of speech features. Pitch and tenth order relative spectral perceptual linear prediction coefficients are used to model the characteristics of male and female speech. The proposed gender classification system is evaluated under the conditions of clean speech, noisy speech, and multiple languages. Simulation results show that the proposed gender classifier is robust to noise and independent across the test languages. The classification accuracy is above 98% for clean speech and 95% for noisy speech.

Y. Zhou and W.-Y. Chan, Low-complexity multiple description vector quantization with constrained central codebook, in Proc. Intl. Conf. on Acoustics, Speech and Signal Processing, May 2006.

Abstract:

Conventional multiple description vector quantizers (MDVQ) have high complexity, which limits their practical application. Two central-codebook-constrained MDVQ (CMDVQ) schemes are proposed to reduce the storage and search complexity. Simulation results show that for low channel loss rates, a tradeoff exists between choosing CMDVQ for its low complexity and the conventional MDVQ for its higher signal-to-noise ratio (SNR) performance. For medium to high channel loss rates, CMDVQ is preferred for its low complexity and comparable SNR performance to the conventional MDVQ.

S. Warrington, W.-Y. Chan, and S. Sudharsanan, Scalable High-Throughput Architecture for H.264/AVC Variable Block Size Motion Estimation, in Proc. Intl. Symp. on Circuits & Systems, May 2006.

Abstract:

Variable block size motion estimation (VBSME) is a key part of the new H.264/AVC video coding standard. This has increased the demand for high performance VBSME architectures. This paper proposes a VLSI architecture for high throughput VBSME. The VBS calculation is done by combining the results of sub-block calculations to form the results for larger blocks. High motion vector throughput is achieved in two proposed implementations: one performing operations on a 1x4 set of pixels per cycle, and the second performing operations on a 1x16 set of pixels per cycle. Using these approaches, the architecture is able to produce motion vector results at a higher throughput than current VBSME designs, while providing a high level of scalability through adjusting the length of the processing element array.
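
The idea of forming larger-block results from sub-block results can be illustrated in software. The sketch below assumes the sub-block results are sums of absolute differences (SADs) of the sixteen 4x4 blocks of one 16x16 macroblock for a single candidate motion vector, and sums them into the 41 partition SADs used by H.264/AVC. It is an illustration only, not the proposed VLSI architecture: the function name is hypothetical, and a hardware design would typically reuse intermediate sums rather than recompute each region from the 4x4 values.

```python
def combine_sads(sad4x4):
    """Build SADs for all H.264 partition sizes from a 4x4 grid of 4x4-block SADs.

    `sad4x4[r][c]` is the SAD of the 4x4 block at block-row r, block-column c
    of one 16x16 macroblock, all for the same candidate motion vector.
    Returns a dict keyed by (width, height) in pixels; each value lists the
    SADs of the partitions of that size in raster order.
    """
    def region_sum(r0, c0, rows, cols):
        return sum(sad4x4[r][c]
                   for r in range(r0, r0 + rows)
                   for c in range(c0, c0 + cols))

    sizes = [(4, 4), (8, 4), (4, 8), (8, 8), (16, 8), (8, 16), (16, 16)]
    result = {}
    for width, height in sizes:
        rows, cols = height // 4, width // 4  # partition size in 4x4 blocks
        result[(width, height)] = [
            region_sum(r0, c0, rows, cols)
            for r0 in range(0, 4, rows)
            for c0 in range(0, 4, cols)
        ]
    return result


if __name__ == "__main__":
    grid = [[r * 4 + c for c in range(4)] for r in range(4)]
    sads = combine_sads(grid)
    print(sum(len(v) for v in sads.values()), "partition SADs")
    print("16x16 SAD:", sads[(16, 16)][0])
```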

S. Warrington, H. Shojania, S. Sudharsanan, and W.-Y. Chan, Performance Improvement of the H.264/AVC Deblocking Filter Using SIMD Instructions, in Proc. Intl. Symp. on Circuits & Systems, May 2006.

Abstract:

The H.264/AVC standard defines an in-loop deblocking filter which is used in both the encoder and decoder. This work examines several methods for improving the performance of the H.264/AVC reference software implementation of the deblocking filter. Methods examined include general software optimization, parallelization through standard multimedia SIMD instructions, and augmenting standard SIMD instruction sets with new instructions. Using the above methods, we are able to achieve a large speedup of the deblocking filter computation.

T. H. Falk and W.-Y. Chan, Enhanced Non-Intrusive Speech Quality Measurement Using Degradation Models, in Proc. Intl. Conf. on Acoustics, Speech and Signal Processing, May 2006.

Abstract:

The speech quality estimation scheme is improved with the addition of a reference model of the behavior of speech degraded by different transmission and/or coding schemes. Moreover, via maximization of a mutual information measure, we validate the use of segmental SNR as a measure of the amount of multiplicative noise present in the test signal. These two additions result in an algorithm that is more accurate and more robust to certain distortion conditions. When tested on unseen data, the proposed algorithm outperforms the current "state-of-art" P.563 algorithm while requiring considerably lower computational complexity.

T. H. Falk, H. Shatkay, and W.-Y. Chan, Breast Cancer Prognosis via Gaussian Mixture Modeling, in Proc. Canadian Conf. on Electrical and Computer Engineering, May 2006.

Abstract:

This paper compares the performance of classification and regression trees (CART), multivariate adaptive regression splines (MARS), and a Gaussian mixture regressor (GMR) method in predicting breast cancer recurrence time in patients that have undergone cancer excision. It is shown that the GMR-based algorithm demonstrates an improved performance compared to CART and MARS. Moreover, GMR performance is comparable to that of a baseline predictor with the advantage of performing automatic feature selection and model optimization.

Y. Zhou and W.-Y. Chan, Performance of Joint Source Coding and Space-time Coding over MIMO Channels, in Proc. 23rd Biennial Symposium on Communications, June 2006.

Abstract:

We examine the performance of combining single or multiple description coding with space-time coding over multiple input multiple output (MIMO) channels. The rate-diversity tradeoff achieved by a space-time code is optimized by minimizing end-to-end distortion. Multiple description coding is found to be superior to single description coding over MIMO channels with the use of short channel codes.

T. H. Falk and W.-Y. Chan, Non-Intrusive Speech Quality Estimation Using Gaussian Mixture Models, IEEE Signal Processing Letters, Vol. 13, Issue 2, pp. 108-111, Feb. 2006.

Abstract:

An algorithm for non-intrusive speech quality estimation based on Gaussian mixture models (GMMs) is presented. GMMs are used to form an artificial reference model of the behavior of features of undegraded speech. Consistency measures between the degraded speech signal and the reference model serve as indicators of speech quality. Consistency values are mapped to an objective speech quality score using a multivariate adaptive regression splines function. When tested on unseen data, the proposed algorithm generally outperforms ITU-T standard P.563, the current "state-of-art" algorithm. The algorithm computes objective quality scores roughly twice as fast as P.563.
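
One plausible form of a consistency measure is the average log-likelihood of the test signal's feature vectors under the reference GMM. The sketch below makes that assumption, along with diagonal covariances; the paper's exact measures and features may differ, and the function names are hypothetical.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-density of feature vector x under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(v - peak) for v in component_logs))


def consistency(features, weights, means, variances):
    """Average per-frame log-likelihood of `features` under the reference GMM."""
    total = sum(gmm_log_likelihood(x, weights, means, variances) for x in features)
    return total / len(features)


if __name__ == "__main__":
    # Toy one-dimensional, two-component reference model (made-up numbers).
    w, m, v = [0.5, 0.5], [[0.0], [2.0]], [[1.0], [1.0]]
    print(consistency([[0.1], [1.9]], w, m, v))  # close to the reference
    print(consistency([[6.0], [7.0]], w, m, v))  # far from the reference
```

Features that fit the reference model yield a higher value; in the scheme described above, such values would then be mapped to a quality score by a separate regression stage.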

2005

W. Zha and W.-Y. Chan, Objective Speech Quality Measurement Using Statistical Data Mining, EURASIP Journal on Applied Signal Processing, Vol. 9, pp. 1410-1424, 2005.

Abstract:

Measuring speech quality by machines overcomes two major drawbacks of subjective listening tests, their low speed and high cost. Real-time, accurate, and economical objective measurement of speech quality opens up a wide range of applications that cannot be supported with subjective listening tests. In this paper, we propose a statistical data mining approach to design objective speech quality measurement algorithms. A large pool of perceptual distortion features is extracted from the speech signal. We examine using classification and regression trees (CART) and multivariate adaptive regression splines (MARS), separately and jointly, to select the most salient features from the pool, and to construct good estimators of subjective listening quality based on the selected features. We show designs that outperform the state-of-art objective measurement algorithm. The designed algorithms are computationally simple, making them suitable for real-time implementation. The proposed design method is scalable with the amount of learning data; thus, performance can be improved with more offline or online training.

Y. Zhou and W.-Y. Chan, Performance Comparison of Layered Coding and Multiple Description Coding in Packet Networks, in Proc. IEEE Global Communications Conf., Dec. 2005.

Abstract:

We examine the performance of multiple description coding (MDC) with and without the use of automatic retransmission request (ARQ) protocols for packet network communication. The rate-distortion lower bound of MDC and layered coding (LC) are incorporated into a performance measure that accounts for the additional costs of excess rates and delay incurred from using ARQ. Results show that unaided MDC is the best for large packet loss rates and large delay, and LC is the best for small loss and moderate delay. In between these two extremes, MDC aided by ARQ provides the best performance.

Y. Zhou and W.-Y. Chan, Multiple Description Conjugate Vector Quantizer with Side Distortion Compensation, in Proc. 39th Asilomar Conf. on Signals, Systems and Computers, Nov. 2005.

Abstract:

Conjugate vector quantizer (CVQ), a joint source channel coding scheme robust to channel bit errors, is used in various popular speech coders such as ITU-T G.729 and ISO/IEC MPEG4 audio. We propose two multiple description CVQ (MD-CVQ) schemes for combating channel erasure errors. MD-CVQ offers an advantage of moderate computational complexity and storage over conventional MD vector quantizers (MDVQs). Experiments are performed for both i.i.d. Gaussian source and speech/audio signals. Results show that for low channel loss rates, a tradeoff exists between choosing MD-CVQ for its low complexity and MDVQ for its higher signal-to-noise ratio (SNR) performance. For medium to high channel loss rates, MD-CVQ is preferred for its low complexity and comparable SNR performance to MDVQ.

Y. Zhou and W.-Y. Chan, Rate-Distortion Performance of Layered Coding and Multiple Description Coding in Packet Networks, in Proc. Canadian Workshop on Information Theory, June 2005.

Abstract:

We compare the rate-distortion performance of layered coding (LC) and multiple description coding (MDC) for packet network communication. Re-transmission is not needed for MDC, while the use of automatic retransmission request (ARQ) protocols for the base layer of LC incurs excess rates and delay. We compare the performance of LC and MDC, using rate-distortion lower bounds, and accounting for the additional costs of excess rates and delay. Results show that LC outperforms MDC for low to medium packet loss rates.

T. H. Falk and W.-Y. Chan, An Improved GMM-Based Speech Quality Predictor, in Proc. 9th European Conf. on Speech Communication and Technology, Sept. 2005.

Abstract:

A voice quality prediction method based on Gaussian mixture models (GMMs) is improved by constructing a feature selection algorithm to provide the best GMM-based prediction quality. The proposed sequential selection algorithm performs N-survivor search, allowing for trading between design complexity and performance. Simulation shows that predictors designed using the proposed algorithm outperform two benchmark selection algorithms. Performance improvements over the ITU-T P.862 PESQ standard are also attained.

T. H. Falk and W.-Y. Chan, A Sequential Forward Selection Algorithm for GMM-Based Speech Quality Estimation, in Proc. 13th European Signal Processing Conf., Sept. 2005.

Abstract:

We propose a sequential feature selection algorithm for designing Gaussian mixture model (GMM) based estimators. Feature selection is performed progressively to minimize estimation errors. The algorithm is applied to design estimators of subjective speech quality. Simulation shows that estimators designed using the proposed algorithm outperform two benchmark algorithms by as much as 39% in correlation and 24% in root-mean-squared error. Furthermore, features selected by the proposed algorithm are suitable for diagonal GMM estimators, which incur lower computational complexity.
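
The progressive selection described above follows the general pattern of sequential forward selection, sketched below in generic form. This is not the paper's algorithm in detail: `error_fn` is a hypothetical callback standing in for "train a GMM estimator on this feature subset and return its estimation error", and the stopping rule is an assumption.

```python
def sequential_forward_selection(candidates, error_fn, max_features):
    """Greedily grow a feature subset, adding the feature that most lowers error.

    `error_fn(subset)` is a hypothetical callback returning the estimation
    error (for example, validation RMSE) of an estimator built on `subset`.
    """
    selected = []
    best_error = float("inf")
    remaining = list(candidates)
    while remaining and len(selected) < max_features:
        trial_errors = [(error_fn(selected + [f]), f) for f in remaining]
        error, feature = min(trial_errors, key=lambda pair: pair[0])
        if error >= best_error:
            break  # no remaining feature improves the estimate
        selected.append(feature)
        remaining.remove(feature)
        best_error = error
    return selected, best_error


if __name__ == "__main__":
    # Toy error model: each feature removes a fixed amount of error.
    usefulness = {"a": 3.0, "b": 2.0, "c": 0.0}
    toy_error = lambda subset: 10.0 - sum(usefulness[f] for f in subset)
    print(sequential_forward_selection(["c", "b", "a"], toy_error, 3))
```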

T. H. Falk, Q. Xu, and W.-Y. Chan, Non-intrusive GMM-based Speech Quality Measurement, in Proc. Intl. Conf. on Acoustics, Speech and Signal Processing, March 2005.

*** Best Student Paper - Speech Processing Category ***

Abstract:

We propose a non-intrusive speech quality measurement algorithm based on using Gaussian-mixture probability models of features of undegraded speech signals as an artificial reference model of "clean" speech behaviour. The consistency between the features of the test speech signal and the reference model serves as an indicator of speech quality. Consistency measures are calculated and mapped to an objective speech quality score using a multivariate adaptive regression splines function. Simulation results show that the proposed method offers accurate and yet low-complexity measurement of speech quality.

R. Der, P. Kabal and W.-Y. Chan, Rate-Distortion Allocation for Time-Frequency Dependent Audio Coding, in Proc. Intl. Conf. on Acoustics, Speech and Signal Processing, March 2005.

Abstract:

A stream coding framework is presented for solving the distortion-constrained time-frequency dependent quantization problem that naturally arises when an overlapped time-frequency decomposition is used. The main contributions of this paper are (1) an efficient rate-distortion allocation algorithm for dependent quantization when the neighborhood of dependency is large; and (2) demonstration that a perceptual Excitation Distortion measure produces better coded audio quality than the conventional Noise-to-Mask Ratio measure.

2004

T. H. Falk, and W.-Y. Chan, Feature Mining for GMM-based Speech Quality Measurement, in Proc. 38th Asilomar Conf. on Signals, Systems and Computers, Nov. 2004.

Abstract:

We propose a novel approach to objective speech quality measurement using feature mining and Gaussian mixture models (GMMs). A large pool of perceptual distortion features is extracted from the speech signal and data mining techniques are used to sift out the most relevant feature variables from the pool. We examine using multivariate adaptive regression splines (MARS), classification and regression trees (CART), a hybrid CART-MARS scheme, and the sequential forward selection (SFS) algorithm for data mining. For our speech databases, the SFS algorithm provides best performance with a five-feature, three-component GMM. A reduction of 21.7% in root-mean-squared mean opinion score estimation error is obtained in comparison with ITU-T P.862 PESQ.

Y. Zhou, W.-Y. Chan, and T. H. Falk, Multiple-Channel Optimized Quantizers for Rayleigh Fading Channels, in Proc. 38th Asilomar Conf. on Signals, Systems and Computers, Nov. 2004.

Abstract:

We consider multiple description communication over Rayleigh fading channels with BPSK modulators at the transmitter and soft-decision demodulators at the receiver. The multiple-channel optimized quantizer design (MCOQD) method for multiple discrete memoryless channels is extended to multiple Rayleigh fading channels. The decision thresholds of the soft-decision demodulators are optimized to achieve minimum end-to-end distortion. Simulation results show that MCOQD provides more robust quantizers than multiple description scalar quantizers over Rayleigh fading channels, when both encoder and decoder are matched to channel statistics, and when only the decoder is matched to channel statistics.

T. H. Falk, W.-Y. Chan, and P. Kabal, Speech Quality Estimation using Gaussian Mixture Models, in Proc. 8th Intl. Conf. on Spoken Language Processing, Oct. 2004.

Abstract:

We propose a novel method to estimate the quality of coded speech signals. The joint probability distribution of the subjective mean opinion score (MOS) and perceptual distortion feature variables is modelled using a Gaussian mixture density. The feature variables are sifted from a large pool of candidate features using statistical data mining techniques. We study what combinations of features and mixture model configuration are most effective. For our speech database, a five-feature, three-component GMM furnishes approximately 18% lower root-mean-squared MOS estimation error than ITU-T P.862 PESQ, the current best standard algorithm.

Y. Zhou and W.-Y. Chan, Multiple Description Quantizer Design Using a Channel Optimized Quantizer Approach, in Proc. 38th Annual Conf. on Information Sciences and Systems, 6 pages, March 2004.

Abstract:

This paper extends the channel optimized quantization scheme of Farvardin and Vaishampayan to two parallel channels. The extended multiple-channel optimized quantizer design (MCOQD) framework is applied to discrete memoryless channels with erasures. The resultant MCOQD subsumes the multiple description scalar quantizer (MDSQ) design of Vaishampayan. While MDSQ is suited to only "on-off" channels, MCOQD accounts for both erasure and symbol errors. Performance results based on simulation show that MCOQD provides more robust quantizers than MDSQ.

T. H. Falk and W.-Y. Chan, Objective Speech Quality Assessment Using Gaussian Mixture Models, in Proc. 22nd Biennial Symposium on Communications, June 2004.

Abstract:

Objective speech quality assessment algorithms provide low-cost and online monitoring of voice calls, replacing costly and time-consuming subjective listening tests. We propose a novel approach to objective speech quality measurements using Gaussian mixture models (GMMs). A large pool of perceptual distortion features is extracted from speech files and multivariate adaptive regression splines (MARS) is used to sift out the most relevant variables from the pool. The five most salient variables are used to construct good GMM estimators of subjective listening quality. Simulation results show that this novel approach outperforms the state-of-the-art objective measurement algorithm, PESQ.

Y. Zhou and W.-Y. Chan, Multiple-Channel Optimized Quantizers for Multiple-Description Communication, in Proc. 22nd Biennial Symposium on Communications, June 2004.

Abstract:

This paper extends the channel optimized quantization scheme of Farvardin and Vaishampayan to two parallel channels. The extended multiple-channel optimized quantizer design (MCOQD) framework is applied to both discrete memoryless channels (DMC) and Rayleigh fading channels. MCOQD is first investigated over DMCs. We show that the MCOQD subsumes the multiple description scalar quantizer (MDSQ) design of Vaishampayan. While MDSQ is suited to only "on-off" channels, MCOQD accounts for both erasure and symbol errors. A complete MCOQD encoder/decoder is then applied over two Rayleigh fading channels with BPSK modulators at transmitter and soft-decision detectors at receiver. An optimal soft-decision detector is found to minimize end-to-end distortion. Performance results based on simulation show that MCOQD provides more robust quantizers than MDSQ in both DMC and Rayleigh fading channels.

W. Zha and W.-Y. Chan, A Data Mining Approach to Objective Speech Quality Measurement, in Proc. Intl. Conf. on Acoustics, Speech and Signal Processing, May 2004.

Abstract:

Existing objective speech quality measurement algorithms still fall short of the measurement accuracy that can be obtained from subjective listening tests. We propose an approach that uses statistical data mining techniques to improve the accuracy of auditory-model based quality measurement algorithms. We present the design of a novel measurement algorithm using the multivariate adaptive regression splines (MARS) method. A large set of speech distortion features is first created. MARS is used to find a small set of features that provide the best estimate ("model") of speech quality. One appeal of the approach is that the model size can scale with the amount of speech data available for learning. In our simulations, the new algorithm furnishes significant performance improvement over PESQ.

R. Der, P. Kabal and W.-Y. Chan, Bit Allocation Algorithms for Frequency and Time Spread Perceptual Coding, in Proc. Intl. Conf. on Acoustics, Speech and Signal Processing, May 2004.

Abstract:

We examine the problem of bit allocation when time spread and frequency spread perceptual distortion criteria are used. For such measures, standard incremental techniques can fail. Two algorithms are introduced for bit allocation; the first a multi-band version of the greedy algorithm, and the second an inverse greedy algorithm initialized by the bit allocation of a forward algorithm driven by a non-spread metric. Experimental results show the second algorithm outperforms the first.
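
For reference, a standard incremental (greedy) bit allocation of the kind the abstract says can fail under spread criteria might look like the sketch below. It is a textbook baseline under an assumed per-band squared-error model, not either of the two algorithms introduced in the paper. The function name and the distortion model are assumptions made for illustration.

```python
def greedy_bit_allocation(variances, total_bits):
    """Give one bit at a time to the band with the largest distortion drop.

    Assumes the textbook model D_i(b) = variance_i * 2**(-2*b), a plain
    (non-spread) squared-error metric, not a perceptual one.
    """
    bits = [0] * len(variances)

    def distortion(i, b):
        return variances[i] * 2.0 ** (-2 * b)

    for _ in range(total_bits):
        # Distortion reduction from adding one more bit to each band.
        gains = [distortion(i, bits[i]) - distortion(i, bits[i] + 1)
                 for i in range(len(variances))]
        best = max(range(len(variances)), key=gains.__getitem__)
        bits[best] += 1
    return bits
```

Under this model each band's distortion depends only on its own bit count, so the bands can be ranked independently at every step.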

V. Fong and W.-Y. Chan, Rate-Distortion Optimization of Spatial Filters for Motion Compensated Video Coding, in Proc. IEEE Canadian Conf. on Electrical and Computer Engineering, May 2004.

Abstract:

We propose a novel motion-compensated prediction scheme for improving the rate-distortion performance of motion compensated video coders. Our scheme uses a codebook of filters so that the prediction block is encoded by specifying jointly an integer motion vector and the index of a filter in the codebook. The two-dimensional spatial filter furnishes simultaneously the functions of motion compensation, pixel interpolation and noise reduction. With such a filtering framework, motion compensation can be performed at arbitrary precision and the codebook optimized for specific video data. Incorporated into an ITU-T H.264 coder, the proposed motion-compensated prediction scheme improves coder performance by increasing the reconstruction PSNR by 0.2 dB or reducing the bit rate by 5%.
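
The idea of jointly choosing an integer displacement and a filter index can be sketched in one dimension as follows. This is a simplified illustration only: it uses 1-D signals, minimizes squared error with no rate term, and the function names and the two-filter codebook in the usage note are hypothetical rather than taken from the paper.

```python
def predict(ref, start, length, taps):
    """Filter ref with odd-length FIR taps centred on each predicted sample."""
    half = len(taps) // 2
    return [sum(t * ref[start + n + k - half] for k, t in enumerate(taps))
            for n in range(length)]


def search(ref, block, offsets, filters):
    """Return (offset, filter_index, ssd) with the smallest squared error."""
    best = None
    for off in offsets:
        for idx, taps in enumerate(filters):
            pred = predict(ref, off, len(block), taps)
            ssd = sum((p - b) ** 2 for p, b in zip(pred, block))
            if best is None or ssd < best[2]:
                best = (off, idx, ssd)
    return best
```

For example, with a codebook holding an identity filter `[0, 1, 0]` and an averaging filter `[0.5, 0.5, 0]`, a block that lies halfway between reference samples is matched exactly by an integer offset paired with the averaging filter, which is how a filter index can stand in for sub-sample motion precision.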

2003

W. Zha and W.-Y. Chan, Voice Quality Assessment using Classification Trees, in Proc. 37th Asilomar Conf. on Signals, Systems and Computers, Nov. 2003.

Abstract:

Conventional listening-test based voice quality measurement is performed "offline" and is costly, and the test results vary from test to test due to a variety of factors. Signal processing based, "objective" voice quality measurement can be performed economically in real-time. Deployed online, automatic voice quality measurement provides an efficient means for monitoring voice quality, and can be integrated with network intelligence to provide end-to-end voice quality assurance. In this paper, we describe using classification trees to estimate mean opinion scores (MOS) from features extracted from the speech signal. Our experimental results have demonstrated, for a suite of MOS-labelled speech databases, consistently superior performance over ITU-T P.862 (PESQ), the state-of-the-art standard for objective voice quality estimation.

R. Der, P. Kabal and W.-Y. Chan, Towards a New Perceptual Coding Paradigm for Audio Signals, in Proc. Intl. Conf. on Acoustics, Speech and Signal Processing, 2003.

Abstract:

A new frequency domain approach to coding audio signals is introduced. The bit assignment strategy is aimed at reducing the perceived loudness difference between the original signal and the coded signal. As such it uses perceptual effects (spread excitation patterns), but does not directly invoke masking results. At low bit rates, examples coded with the new approach sound better than a more traditional bit allocation based on noise-to-mask ratio.

2002

D. Blasiak and W.-Y. Chan, Motion Filter Vector Quantization, in Proc. IEEE Intl. Conf. on Image Processing, Sept. 2002.

Abstract:

Motion-compensated prediction of video is formulated as a novel vector quantization scheme called motion filter vector quantization (MFVQ). In MFVQ, the motion vector and the pixel-intensity interpolation filter are combined into a motion filter and the entire filter is vector quantized. A codebook design algorithm is proposed for designing unit gain and entropy constrained MFVQ codebooks. The algorithm is tested under two application configurations, MFVQ with static codebook and MFVQ with forward-adaptive codebook, and is shown to furnish up to a dB of PSNR gain.

W. Jia and W.-Y. Chan, Joint Pitch and Voicing Estimation for Multiband Excitation and Sinusoidal Speech Coders, in Proc. 36th Asilomar Conf. on Signals, Systems and Computers, Nov. 2002.

Abstract:

In conventional multi-band excitation (MBE) speech encoding, pitch is estimated first from the speech signal. Using the estimated pitch, voicing decisions are made for pitch-spaced spectral bands. As the method invariably includes unvoiced components in the speech signal to estimate the pitch, the accuracy of the estimated pitch and voicing decisions is degraded. A novel pitch and voicing estimation scheme is presented, wherein the spectrum of the speech signal is segmented into voiced and unvoiced regions without knowledge of the pitch. Pitch is then estimated only from the voiced regions. Experimental results show that the new scheme improves the accuracy of the estimated pitch and voicing decisions, and offers better speech quality.

D. Blasiak, Y. Zhou and W.-Y. Chan, Motion Compensation with Motion Filters, in Proc. 21st Biennial Symposium on Communications, June 2002.

Selected Earlier Publications

Jiandong Shen and Wai-Yip Chan, "A Novel Code Excited Pel-Recursive Motion Compensation Algorithm," IEEE Signal Processing Letters, pp. 100-102, April 2001.
Jiandong Shen and Wai-Yip Chan, "Code Excited Pel-Recursive Motion Compensated Video Coding," Proc. Intl. Conf. on Image Processing, CD ROM, Sep. 2000.
Rohit Prasad and Wai-Yip Chan, "Predictive Quantization of Spectral Amplitudes for Harmonic Coders," Proc. IEEE Workshop on Speech Coding, pp. 47-49, September 2000.
Wenhui Jia and Wai-Yip Chan, "Analysis-by-synthesis voicing cut-off determination in harmonic coding," Proc. IEEE Workshop on Speech Coding, pp. 65-67, September 2000.
Jiandong Shen and Wai-Yip Chan, "Fast Rate-Distortion Optimisation Algorithm For Motion Compensated Transform Coding of Video," Electronics Letters, 36(4):305-306, Feb. 17, 2000.
Wenhui Jia and Wai-Yip Chan, "An Experimental Assessment of Personal Speech Coding," Speech Communication, 30-1, pp. 1-8, January 2000.
Jiandong Shen and Wai-Yip Chan, "Vector Quantization of Affine Motion Models," Proc. Intl. Conf. on Image Processing, CD ROM, Oct. 1999.
Ukrit Visitkitjakarn, Wai-Yip Chan, and Yongyi Yang, "Recovery of Speech Spectral Parameters using Convex Set Projection," Proc. 1999 IEEE Speech Coding Workshop, pp. 34-36, June 1999.
Dariusz Blasiak, Jiandong Shen, and Wai-Yip Chan, "Generalized scalar quantizer design using dynamic programming," IEEE Signal Processing Letters, 6(5):103-105, May 1999.
Dariusz Blasiak and Wai-Yip Chan, "Efficient Wavelet Coding of Motion Compensated Prediction Residuals," Proc. Intl. Conf. on Image Processing, CD ROM, Oct. 1998.
Jiandong Shen and Wai-Yip Chan, "A Non-Parametric Method for Fast Joint Rate-Distortion Optimization of Motion Estimation & DFD Coding," Proc. Intl. Conf. on Image Processing, CD ROM, Oct. 1998.
Hao Bi, Pattabiraman Subramanian, Jiandong Shen and Wai-Yip Chan, "Motion-compensated transform coding of video using adaptive displacement fields," Journal of Electronic Imaging, Vol. 7, No. 3, pp. 527-538, July 1998.
Wenhui Jia and Wai-Yip Chan, "Personal Speech Coding," IEEE Proc. Intl. Conf. Acoustics, Speech, & Signal Processing, Vol. I, pp. 65-68, May 1998.
Hao Bi and Wai-Yip Chan, "Rate-Distortion Optimization of Hierarchical Displacement Fields," IEEE Trans. Circuits & Systems for Video Tech., Vol. 8, No. 1, pp. 18-24, Feb. 1998.
Pattabiraman Subramanian, Dariusz Blasiak, Jiandong Shen, and Wai-Yip Chan, "Encoder Optimization in an Extended H.263 Framework", Proc. 31st Asilomar Conf. on Signals, Systems and Computers. Nov. 1997.
Pattabiraman Subramanian and Wai-Yip Chan, "Reduced-Complexity Rate-Distortion Optimization of Multiresolution Motion Field and Prediction Residual", Proc. Intl. Conf. on Image Processing, Vol. II, pp. 799-803, Oct. 1997.
Pattabiraman Subramanian, Hao Bi, and Wai-Yip Chan, "Multiresolution Displacement Fields for Motion Compensated Video Coding", Proc. 30th Asilomar Conf. on Signals, Systems and Computers. Nov. 1996.
Hao Bi and Wai-Yip Chan, "Motion Compensated Transform Coding of Video using Hierarchical Displacement Field and Global Rate-Distortion Optimization", Proc. Intl. Conf. on Image Processing, Vol. III, pp. 267-270, Sep. 1996.
Hao Bi and Wai-Yip Chan, "Rate-Constrained Hierarchical Motion Estimation Using BFOS Tree pruning", Proc. Intl. Conf. Acoustics, Speech, & Signal Processing, Vol. IV, pp. 2315-2318, May 1996.
James Loo, Wai-Yip Chan, and Peter Kabal, "Classified Nonlinear Predictive Vector Quantization of Speech Spectral Parameters," in IEEE Proc. Intl. Conf. Acoustics, Speech, & Signal Processing, Vol. II, pp. 761-764, May 1996.

Additional earlier publications on vector quantization, and speech, audio, and video coding and communications......

A. Bakhshali, W.-Y. Chan, J.C. Cartledge, M. O’Sullivan, C. Laperle, A. Borowiec, & K. Roberts, “Volterra-based nonlinearity compensation structures with improved performance-complexity trade-offs,” Proc. 2015 European Conf. Optical Communication (ECOC), 3 pages, Sep. 2015. ^^^ Selected by the TPC as a "most highly-ranked paper" ^^^

T. Falk, W.-Y. Chan, & F. Shein, "Characterization of atypical vocal source excitation, temporal dynamics and prosody for objective measurement of dysarthric word intelligibility," Speech Communication, vol. 54, no. 5, pp. 622-631, June 2012. ^^^ EURASIP Speech Communication Journal Best Paper Award ^^^

T. Falk & W.-Y. Chan, "A Non-Intrusive Quality Measure of Dereverberated Speech," Proc. Intl. Workshop for Acoustic Echo & Noise Control, 4 pages, Sep. 2008. ^^^ Eberhard Haensler Best Student Paper Award ^^^

T. Falk, Q. Xu, & W.-Y. Chan, "Non-intrusive GMM-based Speech Quality Measurement," Proc. Intl. Conf. on Acoustics, Speech & Signal Processing, 4 pages, March 2005. ^^^ Best Student Paper - Speech Processing Category ^^^

Current Members

Maede Ashofteh Barabadi: ML modeling of clinical data (co-supervised with Dr. Zhu)
Aymen Bashir: Speech
Matthew Boertjes: ML models for optical fiber communications (co-supervised with Dr. Cartledge)
Amin Edraki: Speech cognition
Jade Mezzi: Optical signal processing & ML (co-supervised with Dr. Cartledge)
Leonard Moen: ML identification of controlled substances
James Sanii: ML modeling of health data
Haolan Wang: Speech intelligibility
Wai-Yip Geoffrey Chan

Past Members

Ahmed Alghamdi
Clinton Lau (co-supervised with Dr. Zhu)
Eddie Gasca Cervantes
Aazar Kashi (co-supervised with Dr. Cartledge)
Abdalla Abdelrahman
Xiao Chu (co-supervised with Dr. Zhu)
Patrick Pan (co-supervised with Dr. Zhu)
Saeed Rezazadeh (co-supervised with Dr. Alajaji)
Peng Wang (co-supervised with Dr. Cartledge)
Patrick Fuentes Ugartemendia (co-supervised with Dr. Cartledge)
Sonya Stuhec-Leonard (co-supervised with Dr. Cartledge)
Qiaochu Yang
Ali Bakhshali
Ahmad Abou Saleh (co-supervised with Dr. Alajaji)
Yi Zang
Sai Ma
Ye Li (co-supervised with Dr. Blostein)
Shen Shen
Chenxi Zheng
Richard Hummel
Wei Sheng
Andy Huang
Hooman Alikhanian
Tiago Falk
Hossein Radfar
Siqing Wu, Qualcomm, Toronto
Qingfeng Xu, BlackBerry, Waterloo
Jean-Pierre Thibault, Elliptic Technologies, Ottawa
Yugang Zhou, Qualcomm, Toronto
Hua Yuan, Blueslice Networks, Montreal
Yingchun Guo, visiting researcher from China
Peter Chng, Nortel, Belleville, ON
Neng Wang, Nortel, Richardson, TX
Yumin Zeng, visiting researcher from China
Steve Warrington, Magnum Semiconductor, Waterloo, ON
Christine Hsu, Industry Canada, Ottawa, ON
Jong Bae Lee, visiting researcher from Korea
Vincent Fong, Nanometrics, Ottawa, ON
Wei Zha, PCTEL, Germantown, MD
Shane Bergsma
Rohit Prasad, Amazon, Cambridge, MA
Wenhui Jia, Dolby Labs, Santa Clara, CA
Darek Blasiak, The Climate Corporation, Chicago, IL
Jiandong Shen, Harmonic Inc., San Jose, CA
Raman Subramanian, Qualcomm, Santa Clara, CA
Ukrit Visitkitjakarn, Motorola, Harvard, IL
Hao Bi, Motorola, Libertyville, IL
Gurdal Oruklu, Ingenient Technologies, Rolling Meadows, IL
Wen Chen, PacketVideo, Palatine, IL
Bertrand Combaluzier, Altran Technologies, Paris, France
Julien Drouet, Motorola, Arlington Heights, IL
James Loo, Nortel Networks, Montreal, QC
David Chemla, French Atomic Agency, France
Sau-Wah Soong, STS Systems Inc., Montreal, QC
