There are 24 official languages in the European Union, and we took care of each of them. Well-developed solutions can be hard to find for languages with smaller speaker communities. Still, we developed, trained, and perfected ASR models for the languages of the Baltic states, and they are now the best speech-to-text solution for Estonian, Latvian and Lithuanian.
Let’s get to know these three unique linguistic cupcakes better. Yes, I am comparing languages to cupcakes because why not. Both cupcakes and languages make people (me) happier.
With about 1.1 million native speakers, Estonian could amaze anyone with its umlauts and uniqueness. It belongs to the Finnic branch of the Uralic language family, which makes it one of the four official languages of the European Union that are not of Indo-European origin. The Estonian ASR model, launched in 2019, has proved its efficiency and feasibility.
Another Baltic cupcake is Latvian, with 1.75 million speakers. This Eastern Baltic language belongs to the Indo-European family, shares ancient roots with Sanskrit, and is considered one of Europe’s most ancient languages. Its only surviving close relative is Lithuanian.
Last but not least is Lithuanian, with about 3 million speakers, often described as the most archaic Indo-European language still spoken. It is often seen as unnecessarily complex because it has kept many original features of its ancient Indo-European ancestor, features it shares with Latin and Sanskrit.
Both the Latvian and Lithuanian speech-to-text solutions were created in 2021. For the first time, the Ender Turing research team used automatic iterative pseudo-labeling of data to train the Latvian and Lithuanian ASR models. That helped achieve outstanding results in both the speed and quality of automatic speech recognition.
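For the curious, here is the general idea of iterative pseudo-labeling in miniature. This is only a toy sketch and not the Ender Turing training pipeline: it swaps the ASR model for a hypothetical one-dimensional nearest-centroid classifier, and the names `train`, `predict`, `iterative_pseudo_labeling`, `rounds`, and `threshold` are all made up for illustration. The loop is the point: train on labeled data, label the unlabeled data with the current model, keep only the confident guesses, retrain, and repeat.

```python
from statistics import mean


def train(examples):
    """Fit a toy model: one mean feature value (centroid) per label."""
    by_label = {}
    for x, y in examples:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def predict(model, x):
    """Return (label, confidence) for x.

    Assumes the model has at least two labels. Confidence is the relative
    margin between the nearest and second-nearest centroid, in [0, 1].
    """
    dists = sorted((abs(x - c), y) for y, c in model.items())
    (d0, y0), (d1, _) = dists[0], dists[1]
    conf = 1.0 if d0 + d1 == 0 else (d1 - d0) / (d1 + d0)
    return y0, conf


def iterative_pseudo_labeling(labeled, unlabeled, rounds=3, threshold=0.5):
    """Repeatedly relabel the unlabeled data and retrain on confident guesses."""
    model = train(labeled)
    pseudo = []
    for _ in range(rounds):
        # Regenerate pseudo-labels from scratch with the current model.
        pseudo = []
        for x in unlabeled:
            y, conf = predict(model, x)
            if conf >= threshold:  # keep only confident guesses
                pseudo.append((x, y))
        # Retrain on human labels plus the current pseudo-labels.
        model = train(labeled + pseudo)
    return model, pseudo
```

Calling `iterative_pseudo_labeling([(0.0, "a"), (10.0, "b")], [1.0, 2.0, 8.0, 9.0, 5.0])` leaves the ambiguous point `5.0` unlabeled, because its confidence never reaches the threshold. In a real speech system the "labels" would be transcripts and the confidence score would come from the recognizer itself.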
There are 21 more cupcakes to tell you about, but we will leave them for dessert. Stay tuned, and don’t forget to subscribe.