I’ve released mediawiki-gsa-interwiki, which is based on mediawiki-gsa-engine but adds support for results from multiple local wikis by hooking into the interwiki part of the MediaWiki search classes. This is useful if you have multiple MediaWiki installations indexed by a Google Search Appliance (GSA) and want a search on the current wiki to show that wiki's results in the main list, with results from the other local wikis in a sidebar. Not a huge market, but useful all the same for those who need it. A few other subtle changes are documented at the project site.
MediaWiki and Google Search Appliance (GSA)
The Google Search Appliance advertises, via the Accept-Encoding field of its HTTP request header, that it can handle gzip-encoded content. In practice this does not appear to hold, at least for gzip-encoded content served by MediaWiki.
The HTTP request header looks like this:
GET
HOST: www.xyz.com
ACCEPT: text/html,text/plain,application/*
FROM:
USER-AGENT: gsa-crawler (Enterprise; ... ; ...)
ACCEPT-ENCODING: gzip
The solution is to remove the gzip option from Accept-Encoding, which can be done as follows:
- Go to the GSA admin interface.
- Open Crawl and Index->HTTP Headers.
- Set the field Additional HTTP Headers for Crawler to
Accept-Encoding:
The HTTP request header now looks like this:
GET
HOST: www.xyz.com
ACCEPT: text/html,text/plain,application/*
FROM:
USER-AGENT: gsa-crawler (Enterprise; ... ; ...)
ACCEPT-ENCODING:
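Why an empty value is enough: under the HTTP specification (RFC 9110, section 12.5.3), an Accept-Encoding field that is present but empty means the client does not want any content coding in the response, whereas a request with no Accept-Encoding field at all leaves the server free to pick one. The Python sketch below illustrates that rule. `accepts_gzip` is a hypothetical helper written for illustration, not part of the GSA or MediaWiki, and it covers only the common cases rather than the full header grammar.

```python
def accepts_gzip(accept_encoding):
    """Return True if a server may send gzip for this Accept-Encoding value.

    Pass None when the request has no Accept-Encoding header at all.
    """
    if accept_encoding is None:
        # No header: any content coding is considered acceptable.
        return True

    wildcard_allowed = None
    for item in accept_encoding.split(","):
        item = item.strip()
        if not item:
            continue  # an empty header value yields no items at all

        coding, _, params = item.partition(";")
        coding = coding.strip().lower()
        params = params.strip().lower()

        # Default weight is 1; "q=0" means "not acceptable".
        weight = 1.0
        if params.startswith("q="):
            try:
                weight = float(params[2:])
            except ValueError:
                weight = 0.0

        if coding in ("gzip", "x-gzip"):
            return weight > 0
        if coding == "*":
            wildcard_allowed = weight > 0

    # gzip was never named: only a "*" entry can still allow it.
    return bool(wildcard_allowed)
```

With this helper, `accepts_gzip("gzip")` is true for the original crawler header, while `accepts_gzip("")` is false for the overridden one. A value naming only an unrecognised coding also rules gzip out.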
Solution source: a posting in the Google Search Appliance/Google Mini group. I found that simply setting the field to “Accept-Encoding:” worked just fine; there was no need to include “foo”.
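To see how the wiki's web server responds to each variant of the header, a small check along these lines may help. It is a minimal sketch using only the Python standard library. The function name and the `gsa-crawler-check` user agent are made up for illustration, and it tests the web server's behaviour, not the appliance itself.

```python
import urllib.request


def response_content_encoding(url, accept_encoding, user_agent="gsa-crawler-check"):
    """Fetch url with the given Accept-Encoding value and return the
    Content-Encoding the server chose ("identity" if it sent none)."""
    request = urllib.request.Request(
        url,
        headers={
            # Supplying the header ourselves stops urllib adding its default.
            "Accept-Encoding": accept_encoding,
            "User-Agent": user_agent,
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        # urllib does not decompress bodies, so the header reflects
        # exactly what the server sent.
        return response.headers.get("Content-Encoding", "identity")
```

Calling it once with `"gzip"` and once with `""` against a wiki page URL shows whether the server compresses in the first case and sends plain content in the second, which is the behaviour the GSA setting above relies on.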