Are Foreign Language Characters in URLs Bad for SEO?
Posted by drew on Dec. 7, 2014, 5 p.m.
Being an SEO can be difficult, especially when competition is getting stiffer and old techniques simply aren’t working anymore. However, I am grateful for being an SEO that works primarily on sites written in English. Using a Latin alphabet makes SEO much easier.
I do, however, have international clients working in Chinese, Japanese, Spanish, etc. There is a lot that can be said about international SEO, like including the appropriate language tags on your page, but that’s a topic for another blog post. This entry will specifically talk about how to handle URLs for languages that use non-Latin characters.
Any SEO will tell you that, when it comes to on-site optimization, there are some things you should definitely check, including, but not limited to:
Including non-Latin characters in the content and meta data of your page is simple enough, and it seems like it would be most beneficial to reach searchers who are using those characters. But what about the URL? Will you be able to actually display non-Latin characters in a URL, or will it completely break your site?
Well, it doesn’t look like it will break your site, but as to whether it actually displays, the answer might lie in your choice of browser. For example, the Japanese Wikipedia page for soccer (サッカー) might display as <http://ja.wikipedia.org/wiki/%E3%82%B5%E3%83%83%E3%82%AB%E3%83%BC> or as <http://ja.wikipedia.org/wiki/サッカー>, depending on the browser you use.
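The two forms of that Wikipedia URL are the same address: the long one is just the UTF-8 bytes of サッカー written out as percent-encoded hex. As a minimal sketch of that conversion, here is how it could look with Python's standard urllib.parse module. The helper names encode_path and decode_path are made up for this illustration and are not part of any library.

```python
from urllib.parse import quote, unquote

BASE = "http://ja.wikipedia.org/wiki/"

def encode_path(segment: str) -> str:
    """Percent-encode a path segment as UTF-8 bytes (hypothetical helper)."""
    # quote() encodes to UTF-8 by default and emits uppercase hex digits.
    return quote(segment)

def decode_path(segment: str) -> str:
    """Turn a percent-encoded path segment back into readable text."""
    return unquote(segment)

readable_url = BASE + "サッカー"
encoded_url = BASE + encode_path("サッカー")

print(encoded_url)
# http://ja.wikipedia.org/wiki/%E3%82%B5%E3%83%83%E3%82%AB%E3%83%BC

# Decoding the encoded segment gives back the original characters.
print(BASE + decode_path("%E3%82%B5%E3%83%83%E3%82%AB%E3%83%BC"))
# http://ja.wikipedia.org/wiki/サッカー
```

Each of the four Japanese characters takes three bytes in UTF-8, which is why a four-character word turns into twelve percent-encoded groups.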
Some SEOs are wary of using these non-Latin characters in URLs because of the way they might display in certain browsers, or in certain cases when the link is copied and pasted somewhere online. While having spammy-looking links is never a good thing, I personally think the possible benefit in search results is worth the drawback.
Search results in Google show the non-Latin characters, so searchers who use these characters will see their search query bolded in results. That bolding can help the clickthrough rate of the snippet.
Something needs to be said about transliteration, which is the process of replacing accented or non-Latin characters with plain letters from the Latin alphabet. For SEOs this comes up most often in languages like French, Spanish, and German, which use the Latin alphabet with accented letters.
A good example of this is the website for the city of Düsseldorf, Germany. The domain for the city’s official site is www.duesseldorf.de. A keen eye will notice that the letter U with an umlaut has been replaced by ‘ue’ in the domain name.
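To make the difference between transliterating and simply dropping the accent concrete, here is a rough sketch in Python. The GERMAN_MAP table and both function names are my own assumptions for illustration; the mapping only covers the common German conventions (ü to ue, and so on) and is not a complete transliteration scheme for any language.

```python
import unicodedata

# Hypothetical, deliberately small mapping of German letters to their
# conventional two-letter replacements.
GERMAN_MAP = {
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}

def transliterate_german(text: str) -> str:
    """Replace umlauts and ß using the German convention (ü -> ue)."""
    # Normalize to NFC first so that 'u' + combining diaeresis becomes 'ü'.
    text = unicodedata.normalize("NFC", text)
    return "".join(GERMAN_MAP.get(ch, ch) for ch in text)

def strip_accents(text: str) -> str:
    """Drop accents entirely, leaving only the base letters (ü -> u)."""
    # NFD splits 'ü' into 'u' plus a combining mark, which is then removed.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

city = "Düsseldorf"
print(city)                        # Düsseldorf  (accented)
print(transliterate_german(city))  # Duesseldorf (transliterated)
print(strip_accents(city))         # Dusseldorf  (accent left out)
```

The transliterated output is the spelling the city uses in its domain name, while the stripped version is what many searchers type when they don't have the umlaut on their keyboard.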
Looking at the Google AdWords Keyword Tool, we can see how the different options stack up in terms of search traffic: using an accent, using the transliterated version, or simply leaving the accent out of the word.
The transliterated search trails far behind the others. If you want to match the exact query, you should look into using the umlaut in your URL. But based on what I see in the search results, Google displays transliterated results for both [dusseldorf] and [düsseldorf] searches.
All of this is further complicated by the fact that users in different locations search differently for the same thing. For instance, if you limit the Keyword Tool to only United States traffic, [dusseldorf] becomes the leader in searches over [düsseldorf].
Google has addressed this issue, saying they return results for both accented and non-accented versions of queries, though the slight differences a user sees between these SERPs come from their language preferences.
While some SEOs will be afraid of the potential for URLs to look spammy with strings of numbers, letters, and percent signs, I think the benefit of having the non-Latin characters show up bolded in the URL of your search snippet is worth that, especially when most browsers are displaying the original characters anyway.
Lastly, for languages that use the Latin alphabet with accented letters, you have the option of simply omitting accents or transliterating the text, but my recommendation would be to include the accents whenever possible, as your results will still show up for non-accented queries.