MediaWiki and Google Search Appliance (GSA)
The Google Search Appliance advertises, through the Accept-Encoding field of its HTTP request header, that it can handle gzip content. In practice, however, it does not appear to handle gzip-encoded content correctly, at least not when that content comes from MediaWiki.
The HTTP request header looks like this:

    GET
    HOST: www.xyz.com
    ACCEPT: text/html,text/plain,application/*
    FROM:
    USER-AGENT: gsa-crawler (Enterprise; … ; …)
    ACCEPT-ENCODING: gzip
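To see what the wiki returns for a request of this shape, one option is to send a similar request by hand and inspect the Content-Encoding of the response. The following is a minimal sketch using Python's standard library. The function names and the User-Agent string `gsa-crawler-test` are hypothetical and used only for illustration; it does not reproduce the real crawler's User-Agent.

```python
import urllib.request


def build_crawler_request(url, accept_encoding="gzip"):
    """Build a request whose headers resemble the crawler's request above."""
    return urllib.request.Request(
        url,
        headers={
            "Accept": "text/html,text/plain,application/*",
            # Hypothetical User-Agent for testing; not the real crawler string.
            "User-Agent": "gsa-crawler-test",
            # Pass "" to send the header with an empty value.
            "Accept-Encoding": accept_encoding,
        },
    )


def content_encoding(url, accept_encoding="gzip"):
    """Return the Content-Encoding the server chose, or None if it sent none."""
    request = build_crawler_request(url, accept_encoding)
    with urllib.request.urlopen(request) as response:
        return response.headers.get("Content-Encoding")
```

Calling `content_encoding(url)` and then `content_encoding(url, "")` against a wiki page shows whether the server changes its behaviour when the gzip option is removed.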
The solution is to remove the gzip option from Accept-Encoding, which can be done as follows:
- Go to the GSA admin interface.
- Crawl and Index->HTTP Headers
- Set field Additional HTTP Headers for Crawler to “Accept-Encoding:”
The HTTP request header now looks like this:

    GET
    HOST: www.xyz.com
    ACCEPT: text/html,text/plain,application/*
    FROM:
    USER-AGENT: gsa-crawler (Enterprise; … ; …)
    ACCEPT-ENCODING:
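Why an empty value helps: under the HTTP specification, an Accept-Encoding header that is present but empty means the client wants no content coding, whereas a missing header leaves the choice to the server. The sketch below illustrates that rule with a hypothetical helper, `gzip_acceptable`. It is a simplified reading of the specification and not necessarily how MediaWiki or its web server makes the decision.

```python
from typing import Optional


def gzip_acceptable(accept_encoding: Optional[str]) -> bool:
    """Return True if a server may answer with gzip for this header value.

    None means the header is absent; "" means it is present but empty.
    """
    if accept_encoding is None:
        # No header at all: any content coding is considered acceptable.
        return True
    wildcard = None
    for item in accept_encoding.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        name = name.strip().lower()
        quality = 1.0
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    quality = float(value.strip())
                except ValueError:
                    quality = 0.0
        if name in ("gzip", "x-gzip"):
            # An explicit gzip entry decides the matter; q=0 means "not acceptable".
            return quality > 0
        if name == "*":
            wildcard = quality > 0
    # No explicit gzip entry: fall back to the wildcard if there was one.
    return bool(wildcard)
```

With this helper, `gzip_acceptable("gzip")` is true for the original crawler header and `gzip_acceptable("")` is false for the modified one.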
Solution source: A posting in the Google Search Appliance/Google Mini group. I found that simply setting the field to “Accept-Encoding:” with an empty value worked just fine; there was no need to include “foo”.