There are several methods to determine the language of a given text. In this age of artificial intelligence, we can incorporate AI capabilities into our own applications. In this tutorial, I will demonstrate how to utilize the AI Cognitive Service offered by Microsoft Azure to identify the language of a text.
The prerequisite is that you will need an Azure subscription.
Create Azure Cognitive Services Account
In the Azure portal, click "Create a resource", search for "Translator", and select "Translator Text". It is one of the Azure Cognitive Services. Its main purpose is translation, but we can also use it to identify the language of a text.
Enter a name in Name. It can be anything and does not affect development. Choose a Pricing tier; here I choose F0, which is free. The Resource group can also be anything and does not affect development either.
After the resource has been created, copy one of its keys. Either Key1 or Key2 will work.
Use Cognitive Services in .NET Core
Azure Cognitive Services provides a REST interface, so we can construct requests and parse the returned JSON strings in .NET Core, just as we would with any REST API.
TextLanguageDetector
Create a new class named TextLanguageDetector. It encapsulates the calls to Azure Cognitive Services. Define the properties Host, Route, and SubscriptionKey. SubscriptionKey is the key copied earlier from the Azure portal. Callers need to supply the key for their own Azure account, so it is passed in as a constructor parameter. Host and Route are fixed, so they can be hardcoded in the program.
public class TextLanguageDetector
{
    public string Host { get; } = "https://api.cognitive.microsofttranslator.com";
    public string Route { get; } = "/detect?api-version=3.0";
    public string SubscriptionKey { get; }

    public TextLanguageDetector(string subscriptionKey)
    {
        SubscriptionKey = subscriptionKey;
    }

    public async Task<DetectResult> DetectAsync(string text)
    {
        // ...
    }
}
The DetectAsync method accepts the text to be recognized and returns a DetectResult, a type we define ourselves. Let's take a look at the implementation of this method:
// Requires: using System.Net.Http; using System.Text; using Newtonsoft.Json;
if (string.IsNullOrWhiteSpace(text))
{
    throw new ArgumentNullException(nameof(text));
}

// The endpoint expects a JSON array of objects, each with a Text property.
object[] body = { new { Text = text } };
var requestBody = JsonConvert.SerializeObject(body);

using (var client = new HttpClient())
using (var request = new HttpRequestMessage())
{
    request.Method = HttpMethod.Post;
    request.RequestUri = new Uri(Host + Route);
    request.Content = new StringContent(requestBody, Encoding.UTF8, "application/json");
    // The subscription key authenticates the request.
    request.Headers.Add("Ocp-Apim-Subscription-Key", SubscriptionKey);

    var response = await client.SendAsync(request);
    var jsonResponse = await response.Content.ReadAsStringAsync();
    // Wrap the raw JSON, whether it describes a success or an error.
    return new DetectResult(jsonResponse);
}
This is very straightforward. The constructed body is sent to the Cognitive Service endpoint with a POST request, and its Text property carries the method's input parameter, that is, the text to be recognized. The API authenticates the request with the SubscriptionKey. The final jsonResponse is the result, which is wrapped in the DetectResult type.
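To make the shape of the request body concrete, here is a minimal sketch that prints the JSON produced for a single text. It uses System.Text.Json instead of the Json.NET call above so that it runs without an extra package; that swap, and the names DetectRequestBodyDemo and BuildBody, are my own assumptions for illustration.

```csharp
using System;
using System.Text.Json;

public static class DetectRequestBodyDemo
{
    // Builds a JSON array holding one object with a Text property,
    // mirroring the body constructed in DetectAsync.
    public static string BuildBody(string text)
    {
        object[] body = { new { Text = text } };
        return JsonSerializer.Serialize(body);
    }

    public static void Main()
    {
        // prints [{"Text":"hello"}]
        Console.WriteLine(BuildBody("hello"));
    }
}
```

Note that System.Text.Json escapes non-ASCII characters as \uXXXX sequences by default, so a Chinese input would look different in the printed string even though it is the same JSON value.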
Assuming that Simplified Chinese is recognized and no exception occurs, the JSON returned by Azure Cognitive Services will look like this:
[
  {
    "language": "zh-Hans",
    "score": 1.0,
    "isTranslationSupported": true,
    "isTransliterationSupported": true,
    "alternatives": [
      {
        "language": "ja",
        "score": 1.0,
        "isTranslationSupported": true,
        "isTransliterationSupported": true
      }
    ]
  }
]
language is the language code; zh-Hans is Simplified Chinese. score is the confidence the service assigns to the detected language, and 1.0 means it is very sure. For the text "予力地球上每一人、每一组织,成就不凡", two languages come up: Simplified Chinese and Japanese. Japanese appears only under alternatives, so the service's conclusion is Simplified Chinese. To see how language codes correspond to language names, you can try:
var cultures = CultureInfo.GetCultures(CultureTypes.AllCultures);
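As a rough illustration of reading such a response, the sketch below pulls the primary language code out of a shortened copy of the sample JSON and looks up the matching culture. It uses System.Text.Json's JsonDocument rather than Json.NET to stay dependency-free; the names DetectResponseDemo and PrimaryLanguage are hypothetical, and the culture names printed depend on the platform's globalization data.

```csharp
using System;
using System.Globalization;
using System.Text.Json;

public static class DetectResponseDemo
{
    // Returns the "language" value of the first element in the response array.
    public static string PrimaryLanguage(string json)
    {
        using (var doc = JsonDocument.Parse(json))
        {
            return doc.RootElement[0].GetProperty("language").GetString();
        }
    }

    public static void Main()
    {
        // Shortened version of the sample response shown above.
        const string sample =
            "[{\"language\":\"zh-Hans\",\"score\":1.0," +
            "\"alternatives\":[{\"language\":\"ja\",\"score\":1.0}]}]";

        var code = PrimaryLanguage(sample);
        Console.WriteLine(code);

        try
        {
            // Look up the culture whose name matches the language code.
            var culture = CultureInfo.GetCultureInfo(code);
            Console.WriteLine($"{culture.EnglishName} - {culture.NativeName}");
        }
        catch (CultureNotFoundException)
        {
            // Not every code the service returns is guaranteed to map to a .NET culture.
            Console.WriteLine("No matching culture on this system.");
        }
    }
}
```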
Constructing DetectResult
In order for our program to be more user-friendly, we will not return only JSON. I constructed the DetectResult type based on two scenarios that Azure Cognitive Services might return: success and failure:
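public class DetectResult
{
    // The JSON exactly as returned by the service.
    public string RawJson { get; set; }

    // An error response contains an "error" member.
    public bool IsSuccess => !RawJson.Contains("\"error\"");

    public string ErrorMessage
    {
        get
        {
            // A successful response has no error member to read.
            if (IsSuccess) return null;
            var obj = JsonConvert.DeserializeObject<dynamic>(RawJson);
            return obj.error.message.ToString();
        }
    }

    public DetectResult(string rawJson)
    {
        RawJson = rawJson;
    }

    // Returns the parsed results, or null if the call failed.
    public List<TextCogResult> ToCogResults()
    {
        return IsSuccess ? JsonConvert.DeserializeObject<List<TextCogResult>>(RawJson) : null;
    }
}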
RawJson stores the JSON exactly as returned by the Cognitive Service, so the caller can do more advanced custom parsing. IsSuccess indicates whether the call succeeded; if it did not, the caller can read ErrorMessage for the specific error. If it did succeed, call the ToCogResults() method to parse the result into the TextCogResult type. This method returns a list because the service responds with a JSON array; other candidate languages for the text appear in Alternatives.
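The success check above looks for the substring "error" in the raw JSON. A slightly stricter variant, sketched here under my own assumptions, inspects the structure instead: a successful response is a JSON array, while a failure is an object with an error member. The class name, method names, and the sample error payload are illustrative only and are not taken from the service's documentation.

```csharp
using System;
using System.Text.Json;

public static class ResponseShapeDemo
{
    // Treats a JSON array at the root as a successful detection response.
    public static bool LooksSuccessful(string json)
    {
        using (var doc = JsonDocument.Parse(json))
        {
            return doc.RootElement.ValueKind == JsonValueKind.Array;
        }
    }

    // Returns error.message if the root is an object shaped like an error, else null.
    public static string ErrorMessageOrNull(string json)
    {
        using (var doc = JsonDocument.Parse(json))
        {
            var root = doc.RootElement;
            if (root.ValueKind == JsonValueKind.Object
                && root.TryGetProperty("error", out var error)
                && error.ValueKind == JsonValueKind.Object
                && error.TryGetProperty("message", out var message))
            {
                return message.ToString();
            }
            return null;
        }
    }

    public static void Main()
    {
        // Illustrative payloads, not captured from a real call.
        const string ok = "[{\"language\":\"en\",\"score\":1.0}]";
        const string failed = "{\"error\":{\"code\":401000,\"message\":\"sample message\"}}";

        Console.WriteLine(LooksSuccessful(ok));
        Console.WriteLine(LooksSuccessful(failed));
        Console.WriteLine(ErrorMessageOrNull(failed));
    }
}
```

The same idea could be applied inside DetectResult with Json.NET; the point is only that checking the root element's kind does not depend on what text happens to appear in the payload.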
public class TextCogResult
{
    public string Language { get; set; }
    public float Score { get; set; }
    //public bool IsTranslationSupported { get; set; }
    //public bool IsTransliterationSupported { get; set; }
    public Alternative[] Alternatives { get; set; }
}

public class Alternative
{
    public string Language { get; set; }
    public float Score { get; set; }
    //public bool IsTranslationSupported { get; set; }
    //public bool IsTransliterationSupported { get; set; }
}
All of the above code can be encapsulated in a .NET Standard class library so that it can be used across .NET Framework, .NET Core, or Xamarin.
Application
Taking a .NET Core console application as an example, call the TextLanguageDetector and output the English name and native name of the language:
var texts = new[]
{
    "Empower every person and every organization on the planet to achieve more",
    "予力地球上每一人、每一组织,成就不凡"
};

var dt = new TextLanguageDetector("YOUR KEY");
foreach (var text in texts)
{
    var result = dt.DetectAsync(text).Result;
    if (result.IsSuccess)
    {
        var r = result.ToCogResults();
        var cultures = CultureInfo.GetCultures(CultureTypes.AllCultures);
        var ctr = cultures.FirstOrDefault(c => c.Name == r.First().Language);
        if (ctr != null) Console.WriteLine($"{ctr.EnglishName} - {ctr.NativeName}");
    }
    else
    {
        Console.WriteLine(result.ErrorMessage);
    }
}
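The loop above blocks on .Result. If the project targets C# 7.1 or later, an async Main lets the call be awaited instead. The sketch below shows only that pattern: FakeDetectAsync is a hypothetical stand-in that makes no network call and guesses the language from the first character, so it runs without an Azure key.

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncMainDemo
{
    // Hypothetical stand-in for TextLanguageDetector.DetectAsync; no network call is made.
    public static async Task<string> FakeDetectAsync(string text)
    {
        await Task.Delay(10); // simulate I/O latency
        // Crude placeholder rule: ASCII first character means "en", anything else "zh-Hans".
        return text.Length > 0 && text[0] < 128 ? "en" : "zh-Hans";
    }

    public static async Task Main()
    {
        var texts = new[]
        {
            "Empower every person and every organization on the planet to achieve more",
            "予力地球上每一人、每一组织,成就不凡"
        };

        foreach (var text in texts)
        {
            // Await the call instead of blocking the thread with .Result.
            var code = await FakeDetectAsync(text);
            Console.WriteLine(code);
        }
    }
}
```

With the real detector, the awaited call would be dt.DetectAsync(text) and the rest of the loop body would stay as shown earlier.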
Reference: https://docs.microsoft.com/en-us/azure/cognitive-services/translator/quickstart-csharp-detect