
 telisma overcomes the language barrier


Telisma has taken steps to make sure that all elements of the different languages and dialects are taken into account, and has already made impressive headway on the great variety of official languages spoken in India. Telisma currently covers languages spoken by 89.5% of India’s people.

Although Indian English is commonly spoken nationwide, it is mainly spoken by highly educated people, so the use of local languages is essential to social and economic development. Telisma has addressed the wide variety of accents through its speech corpus design.
 
Hindi is the most widely spoken official language, yet it is less spoken in the southern regions. The Hindi Belt itself shows great diversity in accents. There is little phonetic overlap between Hindi and English (e.g. Hindi has fewer fricatives and uses retroflex consonants), so telisma has developed a distinct phonetic set specifically for the Hindi recognizer.
 
Devanagari is the official script for Hindi and as such needs to be natively supported by the Hindi teliSpeech Language (grapheme-to-phoneme converter, or G2P). TeliSpeech is the only speech recognition engine to deal with the original Indian scripts.
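To illustrate what a grapheme-to-phoneme step does with native Devanagari input, here is a deliberately tiny sketch. It is not teliSpeech's converter: the function name `toy_g2p`, the symbol tables and the phone labels are all assumptions made for this example. It only applies the basic rule that a consonant carries an inherent vowel unless a vowel sign or a virama follows, and it ignores schwa deletion, so word-final vowels will often be wrong for real Hindi.

```python
# Hypothetical, minimal symbol tables (assumed for illustration only).
CONSONANTS = {"न": "n", "म": "m", "स": "s", "त": "t", "क": "k", "ल": "l"}
VOWEL_SIGNS = {
    "\u093e": "aa",  # DEVANAGARI VOWEL SIGN AA
    "\u093f": "i",   # DEVANAGARI VOWEL SIGN I
    "\u0940": "ii",  # DEVANAGARI VOWEL SIGN II
    "\u0947": "e",   # DEVANAGARI VOWEL SIGN E
    "\u094b": "o",   # DEVANAGARI VOWEL SIGN O
}
VIRAMA = "\u094d"     # DEVANAGARI SIGN VIRAMA, suppresses the inherent vowel
INHERENT_VOWEL = "a"


def toy_g2p(word):
    """Convert a Devanagari word to a list of phone labels (toy rules only)."""
    phones = []
    chars = list(word)
    for i, ch in enumerate(chars):
        nxt = chars[i + 1] if i + 1 < len(chars) else ""
        if ch in CONSONANTS:
            phones.append(CONSONANTS[ch])
            # Add the inherent vowel unless a vowel sign or virama follows.
            if nxt not in VOWEL_SIGNS and nxt != VIRAMA:
                phones.append(INHERENT_VOWEL)
        elif ch in VOWEL_SIGNS:
            phones.append(VOWEL_SIGNS[ch])
        elif ch == VIRAMA:
            continue
        else:
            raise ValueError(f"character not covered by the toy tables: {ch!r}")
    return phones


# The word "namaste" written in Devanagari.
print(" ".join(toy_g2p("\u0928\u092e\u0938\u094d\u0924\u0947")))
```

A production converter would need a full phonetic set, schwa-deletion rules and handling of independent vowels, conjuncts and nasalisation marks, none of which are attempted here.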

Beyond English, the Indo-Aryan and Dravidian language families, not to mention Austro-Asiatic and Sino-Tibetan, share many common sounds. Telisma has been able to leverage these phonetic commonalities to optimize language development tasks. At the same time, each regional language has sounds that are unique to it, so telisma paid particular attention to covering these during the speech corpus collection and acoustic unit design phases.
 
Beyond Devanagari for Hindi, all Indian language scripts must be natively supported by the teliSpeech Recognizer, e.g. through G2P for Tamil, Gurmukhi and Kannada. As a result, telisma has made it possible for all content to be dynamically extracted from a database in the original script and processed with no manual intervention. Similarly, telisma has made sure that ASR grammar designers can still use the Roman alphabet, provided phonetic transcriptions for the Roman-transliterated entries are made available in a lexicon. Unicode, in its UTF-8 encoding, is a universal character encoding that handles both Roman and Indian scripts (and beyond).
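The split described above, with original-script entries going to a G2P converter and Roman-transliterated entries requiring a lexicon, can be sketched as follows. This is only an illustration under stated assumptions: `scripts_in`, `route_entry` and the one-entry `LEXICON` are hypothetical names invented here, not part of teliSpeech, and the script is guessed from the first word of each character's Unicode name using Python's standard `unicodedata` module.

```python
import unicodedata

# Hypothetical lexicon of Roman-transliterated entries with phone labels.
LEXICON = {"namaste": ["n", "a", "m", "a", "s", "t", "e"]}


def scripts_in(text):
    """Return the set of script names used by the letters and marks in text."""
    found = set()
    for ch in text:
        # Only letters (L*) and combining marks (M*) identify a script.
        if unicodedata.category(ch)[0] in ("L", "M"):
            found.add(unicodedata.name(ch).split()[0])
    return found


def route_entry(entry):
    """Decide whether an entry needs a lexicon lookup or a script-specific G2P."""
    scripts = scripts_in(entry)
    if scripts == {"LATIN"}:
        key = entry.lower()
        if key not in LEXICON:
            raise KeyError(f"no phonetic transcription for Roman entry {entry!r}")
        return ("lexicon", LEXICON[key])
    if len(scripts) == 1:
        # Original-script entry: hand over to the G2P for that script.
        return ("g2p", scripts.pop())
    raise ValueError(f"mixed or unrecognised scripts in {entry!r}")


# Entries as they might come out of a UTF-8 database column.
raw = "\u0928\u092e\u0938\u094d\u0924\u0947".encode("utf-8")  # Devanagari bytes
print(route_entry(raw.decode("utf-8")))
print(route_entry("Namaste"))
```

Because both kinds of entry are ordinary Unicode strings once decoded from UTF-8, the same routine can handle Roman and Indian-script content without any manual conversion step.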



 available languages
  • Indian English

  • Hindi

  • Gujarati

  • Marathi

  • Punjabi

  • Bengali

  • Kannada

  • Telugu

  • Tamil

  • Malayalam


© TELISMA