Skrybot – A System for Automatic Speech Recognition of Polish Language

Lesław Pawlaczyk and Paweł Bosky
Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
Abstract
In this article we present a system for clustering and indexing automatically recognised radio and television news spoken in Polish. The aim of the system is to let users quickly navigate and search for information that is not available through standard internet search engines. The system comprises speech recognition, alignment and indexing modules. The recognition part is trained using dozens of hours of transcribed audio and millions of words representing modern Polish. The training audio and text are then converted into an acoustic model and a language model, using techniques such as Hidden Markov Models and statistical language processing. The audio is decoded and then submitted to an indexing engine, which extracts summary information about the spoken topic. The system shows significant potential in many areas, such as media monitoring, indexing of university lectures, automated telephone centres and security enhancements.
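
The abstract does not describe how the indexing engine is built, so the following is only an illustrative sketch of the general idea of making decoded audio searchable: a plain inverted index over time-stamped transcript segments. The function names (build_index, search), the segment format and the sample transcripts are all hypothetical and are not taken from Skrybot.

```python
from collections import defaultdict


def build_index(segments):
    """Map each lower-cased word to the (recording_id, start_seconds) pairs where it occurs."""
    index = defaultdict(list)
    for recording_id, start_seconds, text in segments:
        # set() so that a word repeated within one segment is recorded only once
        for word in set(text.lower().split()):
            index[word].append((recording_id, start_seconds))
    return index


def search(index, query):
    """Return the segments that contain every word of the query, in sorted order."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


# Hypothetical decoder output: (recording id, segment start in seconds, recognised text)
segments = [
    ("radio_news_01", 0.0, "premier ogłosił nowy budżet"),
    ("radio_news_01", 12.5, "prognoza pogody na jutro"),
    ("tv_news_07", 3.0, "nowy budżet państwa"),
]

index = build_index(segments)
results = search(index, "nowy budżet")
```

Because each hit carries a recording identifier and a start time, a player could jump straight to the matching point in the audio. A real system for Polish would also need to handle inflected word forms, which this sketch ignores.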

full article: http://www.springerlink.com/content/005r89701h218005/
