07 July 2008

Language filter with Faceted Search

SharePoint Search extracts the language of a document during the crawl and uses this information to calculate relevancy while generating search results.

This behavior causes problems in multilingual enterprises: SharePoint Search returns mainly documents in the user's language (more precisely, the browser language).

Patrick Tisseghem wrote a post about a hidden managed property that contains the language of a document. This managed property is called "DetectedLanguage". As Patrick mentioned, the language is stored as an integer value (9 = English, 7 = German, etc.). You can use it on your SharePoint installation out of the box (give it a try and search for "somekeyword detectedlanguage:9").
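To make the query format concrete, here is a minimal sketch in Python that builds such a search string from a language name. Only the values 9 (English) and 7 (German) come from Patrick's post; the other entries assume that the property follows the Windows primary language identifiers, which I have not verified against a crawl. The function name build_language_query is made up for this illustration.

```python
# Mapping of language names to DetectedLanguage values.
# 9 and 7 are taken from the post; 12 and 10 are the Windows primary
# language identifiers for French and Spanish and are an ASSUMPTION here.
PRIMARY_LANGUAGE_IDS = {
    "english": 9,
    "german": 7,
    "french": 12,
    "spanish": 10,
}


def build_language_query(keywords: str, language: str) -> str:
    """Append a detectedlanguage restriction to a keyword query (hypothetical helper)."""
    lang_id = PRIMARY_LANGUAGE_IDS[language.lower()]
    return f"{keywords} detectedlanguage:{lang_id}"


print(build_language_query("somekeyword", "English"))
# prints: somekeyword detectedlanguage:9
```

The resulting string can be typed into the normal SharePoint search box, exactly like the example above.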

But the real power of this managed property shows in conjunction with Faceted Search. This open source SharePoint extension allows you to use SharePoint metadata (or managed properties) as facets to refine your search. Go to the CodePlex project page and have a look at their webcasts for more information.

You can extend Faceted Search to use your own managed properties as facets, and in this way you can also use the "detectedlanguage" property. Just paste the following XML snippet into the property "Select columns for Facets" of the "Search Facets" Web Part. With this, you will be able to restrict your search results based on the language of a document.

As a side effect, you will see that SharePoint (or the underlying IFilter) makes a lot of errors while detecting the language of a document.

Here is the XML snippet for the Faceted Search Web Part configuration: