In natural language processing, we often need to translate text so that we can apply libraries that are widely used for English but are not available for other languages. One example is the sentiment analysis of tweets in Spanish using the classic VADER and TextBlob libraries in Python. In both cases, the texts have to be translated into English before applying the sentiment analysis and obtaining the polarity of the sentiment.
In most cases, it's enough to translate the text into the desired language (in this case English) using the Basic edition of the Cloud Translation API in Python and then pass the result to the VADER sentiment analysis. In the case of TextBlob, the translation can be done internally through the method “.translate(to='en')”, which performs the translation as an intermediate step but also relies on the Google Cloud Translation API in the background.
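The translate-then-score flow described above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the helper names (score_tweets, make_google_translator, make_vader_scorer) are hypothetical, and the two factory functions assume that the google-cloud-translate and vaderSentiment packages are installed and that Google Cloud credentials are already configured.

```python
from typing import Callable, Iterable, List, Tuple


def score_tweets(
    tweets: Iterable[str],
    translate: Callable[[str], str],
    polarity: Callable[[str], float],
) -> List[Tuple[str, str, float]]:
    """Translate each tweet to English, then score the English text.

    Returns (original, english, polarity) triples in input order.
    """
    results = []
    for tweet in tweets:
        english = translate(tweet)
        results.append((tweet, english, polarity(english)))
    return results


def make_google_translator(source: str = "es", target: str = "en") -> Callable[[str], str]:
    """Hypothetical helper wrapping the Basic (v2) Cloud Translation client.

    Assumes google-cloud-translate is installed and credentials are set up.
    """
    from google.cloud import translate_v2  # imported lazily: optional dependency

    client = translate_v2.Client()

    def translate(text: str) -> str:
        response = client.translate(
            text,
            source_language=source,
            target_language=target,
            format_="text",  # ask for plain text instead of HTML-escaped output
        )
        return response["translatedText"]

    return translate


def make_vader_scorer() -> Callable[[str], float]:
    """Hypothetical helper returning VADER's compound score, in [-1, 1]."""
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()

    def polarity(text: str) -> float:
        return analyzer.polarity_scores(text)["compound"]

    return polarity
```

With both packages available, a call such as score_tweets(tweets, make_google_translator(), make_vader_scorer()) would run the whole pipeline. Passing the translator in as a plain function keeps the scoring step independent of the translation service, so a different backend can be swapped in without touching the rest.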
But when we run the analysis on datasets with many more rows, the Google Cloud Translation API imposes a limit on the number of characters that can be translated for free (including spaces). That limit is around 30,000 characters per day, which would restrict us to about 107 tweets a day if each tweet uses the full 280 characters. This means that analyzing a dataset of 10,000 or 20,000 tweets would take roughly three to six months (about 94 to 187 days at 107 tweets per day)... impossible to address with this strategy! For Open Source projects, we need to perform this intermediate translation with alternatives that don't have these limitations. In this context, the Yandex Translation API emerges as an alternative whose only limitation is a maximum of 10,000 characters, or 10 KB of text, per connection... more than enough to translate one tweet in each of those connections.
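The arithmetic behind these limits can be checked with a few lines of Python. This is only an illustrative sketch: the constants are the figures quoted above, the 280-character tweet length is an assumption (every tweet at maximum length), "10kb" is assumed to mean 10 KiB of UTF-8 text, and the function names are hypothetical.

```python
import math

GOOGLE_FREE_CHARS_PER_DAY = 30_000  # daily free quota quoted in the text
MAX_TWEET_CHARS = 280               # assumption: every tweet uses the full length
YANDEX_MAX_CHARS = 10_000           # per-connection limit quoted in the text
YANDEX_MAX_BYTES = 10 * 1024        # assumption: "10kb" means 10 KiB of UTF-8


def tweets_per_day(daily_chars: int = GOOGLE_FREE_CHARS_PER_DAY,
                   tweet_chars: int = MAX_TWEET_CHARS) -> int:
    """Whole tweets that fit into the daily character quota."""
    return daily_chars // tweet_chars


def days_needed(n_tweets: int) -> int:
    """Days required to translate n_tweets under the daily quota."""
    return math.ceil(n_tweets / tweets_per_day())


def fits_in_one_request(text: str) -> bool:
    """True if text respects both the character and the byte limit."""
    return (len(text) <= YANDEX_MAX_CHARS
            and len(text.encode("utf-8")) <= YANDEX_MAX_BYTES)


print(tweets_per_day())                # 107
print(days_needed(10_000))             # 94
print(days_needed(20_000))             # 187
print(fits_in_one_request("ñ" * 280))  # True
```

Even a 280-character tweet made up entirely of two-byte characters occupies only 560 bytes, so a single tweet stays far below the per-connection limit.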