Source Separation via Spectral Masking for Speech Recognition Systems
References
D. Kolossa, R. F. Astudillo, E. Hoffmann and R. Orglmeister, Independent Component Analysis and Time-Frequency Masking for Speech Recognition in Multitalker Conditions, EURASIP Journal on Audio, Speech, and Music Processing, 2010.
T. T. Kristjansson and B. J. Frey, Accounting for uncertainty in observations: a new paradigm for robust automatic speech recognition, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2002.
V. Stouten, H. Van Hamme and P. Wambacq, Application of minimum statistics and minima controlled recursive averaging methods to estimate a cepstral noise model for robust ASR, in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 1, 2006.
M. Van Segbroeck and H. Van Hamme, Robust speech recognition using missing data techniques in the prospect domain and fuzzy masks, in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008, pp. 4393–4396.
D. Kolossa, A. Klimas and R. Orglmeister, Separation and robust recognition of noisy, convolutive speech mixtures using time-frequency masking and missing data techniques, in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005, vol. 13, pp. 82–85.
M. Kühne, R. Togneri and S. Nordholm, Time-frequency masking: linking blind source separation and robust speech recognition, in Speech Recognition: Technologies and Applications, IN-TECH, Vienna, Austria, 2008, pp. 61–80.
S. Srinivasan and D. Wang, Transforming binary uncertainties for robust speech recognition, IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 7, pp. 2130–2140, 2007.
O. Yilmaz and S. Rickard, Blind separation of speech mixtures via time-frequency masking, IEEE Transactions on Signal Processing, vol. 52, no. 7, pp. 1830–1847, 2004.
G. J. Brown and D. L. Wang, Separation of speech by computational auditory scene analysis, in Speech Enhancement, J. Benesty, S. Makino, and J. Chen, Eds. Springer, New York, 2005, pp. 371–402.
S. Srinivasan, N. Roman and D. L. Wang, Binary and ratio time-frequency masks for robust speech recognition, Speech Communication, vol. 48, pp. 1486–1501, 2006.
S. Srinivasan, N. Roman and D. L. Wang, On binary and ratio time-frequency masks for robust speech recognition, in Proc. International Conference on Spoken Language Processing, 2004, pp. 2541–2544.
H. Sawada, S. Araki, R. Mukai and S. Makino, Blind extraction of dominant target sources using ICA and time-frequency masking, IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 6, pp. 2165–2173, 2006.
T. S. V. Souza, G. F. Rodrigues, A. C. S. Souza, J. M. Moreira and H. C. Yehia, Binary Spectral Masking for Speech Recognition Systems, in Proc. 35th International Conference on Telecommunications and Signal Processing (TSP), 2012, pp. 432–436.
E. Hoffmann, D. Kolossa and R. Orglmeister, A batch algorithm for blind source separation of acoustic signals using ICA and time-frequency masking, in Proceedings of the 7th International Conference on Independent Component Analysis and Signal Separation, 2007, pp. 480–487.
G. Hu and D. L. Wang, Speech segregation based on pitch tracking and amplitude modulation, in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2001, pp. 79–82.
A. Jourjine, S. Rickard and O. Yilmaz, Blind separation of disjoint orthogonal signals: Demixing N sources from 2 mixtures, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2000), Jun. 2000, vol. 5, pp. 2985–2988.
N. Roman, D. L. Wang and G. J. Brown, Speech segregation based on sound localization, J. Acoust. Soc. Am., vol. 114, pp. 2236–2252, 2003.
N. Roman and D. L. Wang, Binaural sound segregation for multisource reverberant environments, in Proc. IEEE ICASSP, 2004, vol. 2, pp. 373–376.
G. F. Rodrigues and H. C. Yehia, Limitations of the Spectrum Masking Technique for Blind Source Separation, Lecture Notes in Computer Science, vol. 5441, pp. 621–628, 2009.
N. Li and P. C. Loizou, Factors influencing intelligibility of ideal binary-masked speech: Implications for noise reduction, J. Acoust. Soc. Am., vol. 123, no. 3, pp. 1673–1682, 2008.
B. S. Kirei, M. D. Topa, I. Muresan, I. Homana and N. Toma, Blind Source Separation for Convolutive Mixtures with Neural Networks, Advances in Electrical and Computer Engineering, vol. 11, no. 1, pp. 63–68, 2011. Available: http://dx.doi.org/10.4316/AECE.2011.01010
A. M. Ahmad, S. Ismail and D. F. Samaon, Recurrent neural network with backpropagation through time for speech recognition, in Proceedings of the IEEE International Symposium on Communications and Information Technology, 2004, vol. 1, pp. 98–102.
R. P. Lippmann, Neural network classifiers for speech recognition, The Lincoln Laboratory Journal, vol. 1, pp. 107–128, 1988.
S. I. Amari and A. Cichocki, Adaptive Blind Signal Processing - Neural Network Approaches, Proceedings of the IEEE, vol. 86, no. 10, 1998.
D. Kobayashi, S. Kajita, K. Takeda and F. Itakura, Extracting speech features from human speech-like noise, in Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP 96), 1996, vol. 1, pp. 418–421.
DOI: http://dx.doi.org/10.11601/ijates.v1i2-3.16