Improve Search Results With Language Detection
By ensuring the crawler can identify page languages correctly, you can improve both the search results and the search functionality.
Our software determines the primary language of a page by running the following checks:
- The web server response is checked for the Content-Language HTTP response header:
- PHP pages: Insert this code before any output is sent: <?php header("Content-Language: en"); ?>
- The page is checked for content-language META tag:
<meta http-equiv="content-language" content="en">
- The page is checked for lang inside the HTML tag:
<html lang="en">
- The page is checked for the Open Graph protocol property og:locale inside META tags.
- The page is checked for xml:lang inside the HTML tag:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
- The page is checked for alternate / hreflang inside the link tag:
<link rel="alternate" href="http://example.com/name-of-page.html" hreflang="en">
- The page URL is checked for common language/culture and country codes.
Note: This requires enabling the option Scan website | Data collection | Inspect URLs to detect language. For more info see:
- Planned: Compare the page content against word lists for each language and select the best match.
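The checks above can be sketched in Python roughly as follows. This is a minimal illustration, not the crawler's actual implementation: the names detect_language, lang_from_url and KNOWN_CODES are hypothetical, the checks are assumed to be tried in the order listed (the text does not confirm a priority), and the URL check uses a tiny example set of codes.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed priority: the order in which the checks are listed above.
ORDER = ["header", "meta", "html lang", "og:locale", "xml:lang", "hreflang"]
KNOWN_CODES = {"en", "de", "fr", "es", "da"}  # illustrative subset only


class _LangHints(HTMLParser):
    """Collects the raw language hints found in the page markup."""

    def __init__(self):
        super().__init__()
        self.hints = {}  # check name -> raw value, first occurrence wins

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "meta" and a.get("http-equiv", "").lower() == "content-language":
            self.hints.setdefault("meta", a.get("content", ""))
        elif tag == "meta" and a.get("property", "").lower() == "og:locale":
            self.hints.setdefault("og:locale", a.get("content", ""))
        elif tag == "html":
            if a.get("lang"):
                self.hints.setdefault("html lang", a["lang"])
            if a.get("xml:lang"):
                self.hints.setdefault("xml:lang", a["xml:lang"])
        elif tag == "link" and a.get("rel", "").lower() == "alternate":
            if a.get("hreflang"):
                self.hints.setdefault("hreflang", a["hreflang"])


def primary_code(value):
    """Reduce values such as 'en-US' or 'en_US' to a two-letter code."""
    m = re.match(r"\s*([A-Za-z]{2})(?![A-Za-z])", value)
    return m.group(1).lower() if m else None


def lang_from_url(url):
    """Look for a language/culture code in the subdomain or a path segment."""
    parts = urlparse(url)
    candidates = [parts.hostname.split(".")[0]] if parts.hostname else []
    candidates += [seg for seg in parts.path.split("/") if seg]
    for c in candidates:
        if re.fullmatch(r"[A-Za-z]{2}([-_][A-Za-z]{2})?", c):
            code = primary_code(c)
            if code in KNOWN_CODES:
                return code
    return None


def detect_language(url, headers, html, inspect_url=False):
    """Return (language code, name of the check that matched) or (None, None)."""
    parser = _LangHints()
    parser.feed(html)
    hints = dict(parser.hints)
    header = next(
        (v for k, v in headers.items() if k.lower() == "content-language"), None
    )
    if header:
        hints["header"] = header
    for name in ORDER:
        code = primary_code(hints.get(name, ""))
        if code:
            return code, name
    if inspect_url:  # mirrors the "Inspect URLs to detect language" option
        code = lang_from_url(url)
        if code:
            return code, "url"
    return None, None
```

For example, detect_language("http://example.com/da/page.html", {}, "<html></html>", inspect_url=True) falls through every markup check and picks up the language from the URL path, while a Content-Language header, if present, is consulted first.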
To minimize the size of the search index, it helps to strip out common words (stop words), e.g. conjunctions. This only works if the page language has been correctly recognized.
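A minimal Python sketch of why stop word removal depends on the detected language. The word lists and the index_terms function are hypothetical and far shorter than anything a real index would use.

```python
import re

# Tiny illustrative stop word lists; real lists are much longer.
STOP_WORDS = {
    "en": {"and", "or", "but", "the", "a", "of"},
    "de": {"und", "oder", "aber", "der", "die", "das"},
}


def index_terms(text, language):
    """Split text into lowercase words and drop the stop words of `language`."""
    words = re.findall(r"\w+", text.lower())
    stops = STOP_WORDS.get(language, set())
    return [w for w in words if w not in stops]
```

With the correct language, index_terms("The cat and the dog", "en") keeps only "cat" and "dog". If the same English text is misdetected as German ("de"), none of its common words are removed and the index grows accordingly.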
Options you can set to help the crawler:
- Set the option Select stop words to match the main language of your website, or select auto if the website uses multiple languages.