Language Detection and Analysing Pages
You can improve analysis of similar content by ensuring language identification is correct.
Our software determines the primary language of a page by checking the following:
- The web server response is checked for a Content-Language HTTP response header:
- PHP pages: Insert this code at the top of the page, before any output is sent: <?php header("Content-Language: en"); ?>
- The page is checked for content-language META tag:
<meta http-equiv="content-language" content="en">
- The page is checked for lang inside the HTML tag:
<html lang="en">
- The page is checked for the Open Graph Protocol property og:locale inside META tags:
<meta property="og:locale" content="en_US">
- The page is checked for xml:lang inside the HTML tag:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
- The page is checked for alternate / hreflang inside the link tag:
<link rel="alternate" href="http://example.com/name-of-page.html" hreflang="en">
- The page URL is checked for common language/culture and country codes.
Note: This requires enabling the option Scan website | Data collection | Inspect URLs to detect language. For more info see:
- Planned: Compare the page content against word lists for each language and select the best match.
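To see how the checks above fit together, here is a minimal Python sketch. It is not the software's actual implementation. It assumes the checks run in the order listed and that the first non-empty hit wins, which the text above does not state. The names detect_language, KNOWN_CODES and inspect_url are hypothetical, and the URL check only looks at path segments against a small sample of codes.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical sample of "common" codes; the real list is not documented here.
KNOWN_CODES = {"en", "de", "fr", "es", "it", "da", "nl", "pt"}

# Assumed priority of the in-page hints, following the order of the list above.
HINT_ORDER = ("meta", "html_lang", "og_locale", "xml_lang", "hreflang")


class LangHints(HTMLParser):
    """Collects the first value seen for each in-page language hint."""

    def __init__(self):
        super().__init__()
        self.hints = {}

    def handle_starttag(self, tag, attrs):
        a = {name: (value or "") for name, value in attrs}
        if tag == "html":
            if a.get("lang"):
                self.hints.setdefault("html_lang", a["lang"])
            if a.get("xml:lang"):
                self.hints.setdefault("xml_lang", a["xml:lang"])
        elif tag == "meta" and a.get("content"):
            if a.get("http-equiv", "").lower() == "content-language":
                self.hints.setdefault("meta", a["content"])
            if a.get("property", "").lower() == "og:locale":
                self.hints.setdefault("og_locale", a["content"])
        elif tag == "link" and a.get("hreflang"):
            # Simplification: the first alternate link is taken as the hint.
            if "alternate" in a.get("rel", "").lower().split():
                self.hints.setdefault("hreflang", a["hreflang"])


def primary_code(value):
    """Reduce 'en-GB', 'en_US' or 'en, fr' to the primary code 'en'."""
    first = value.split(",")[0].strip()
    return re.split(r"[-_]", first)[0].lower()


def code_from_url(url):
    """Look for a path segment such as /en/ or /en-us/ (path only)."""
    for segment in urlparse(url).path.split("/"):
        m = re.fullmatch(r"([a-z]{2})(?:[-_][a-z]{2})?", segment.lower())
        if m and m.group(1) in KNOWN_CODES:
            return m.group(1)
    return None


def detect_language(headers, html, url, inspect_url=False):
    """Return the first language code found, or None if no check matches."""
    candidates = [v for k, v in headers.items() if k.lower() == "content-language"]

    parser = LangHints()
    parser.feed(html)
    parser.close()
    candidates += [parser.hints[k] for k in HINT_ORDER if k in parser.hints]

    for value in candidates:
        code = primary_code(value)
        if code:
            return code

    # Mirrors the "Inspect URLs to detect language" option: off unless enabled.
    return code_from_url(url) if inspect_url else None
```

For example, detect_language({}, '<html lang="en">', "http://example.com/") returns "en", while a page with no hints at all returns None unless inspect_url is enabled and the URL path contains a known code.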