Cymen's Blog

Archive for the ‘Mediawiki’ Category

mediawiki-gsa-interwiki: Use GSA for search results in MediaWiki, including interwiki results

I’ve released mediawiki-gsa-interwiki, which is based on mediawiki-gsa-engine but adds support for results from multiple local wikis by hooking into the interwiki part of the MediaWiki search classes. This is useful for those who have multiple MediaWiki installations indexed by a Google Search Appliance (GSA) and want a search on the current wiki to return results from that wiki in the main list, with results from the other local wikis in a sidebar. Not a huge market there, but useful all the same for those who need it. There are some other subtle changes documented at the project site.

Written by Cymen

January 8th, 2010 at 2:01 pm

MediaWiki and Google Search Appliance (GSA)

The Google Search Appliance advertises, via the Accept-Encoding field of its HTTP request header, that it can handle gzip content. In practice it does not appear to handle it correctly, at least not for gzip-encoded content coming from MediaWiki.

The HTTP request header looks like this:

GET
HOST: www.xyz.com
ACCEPT: text/html,text/plain,application/*
FROM:
USER-AGENT: gsa-crawler (Enterprise; ... ; ...)
ACCEPT-ENCODING: gzip

The solution is to remove the gzip option from Accept-Encoding, which can be done as follows:

  1. Go to the GSA admin interface.
  2. Open Crawl and Index->HTTP Headers.
  3. Set the field “Additional HTTP Headers for Crawler” to “Accept-Encoding:” (the header name with an empty value).

The HTTP request header now looks like this:

GET
HOST: www.xyz.com
ACCEPT: text/html,text/plain,application/*
FROM:
USER-AGENT: gsa-crawler (Enterprise; ... ; ...)
ACCEPT-ENCODING:

Solution source: A posting in the Google Search Appliance/Google Mini group. I found that simply setting the field to “Accept-Encoding:” worked just fine; there was no need to include “foo” as a value.
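
To see the effect of that header change without touching the appliance, here is a minimal Python sketch. It is not the GSA or MediaWiki itself: GzipAwareHandler is a hypothetical stand-in server that, as an assumption about how a typical web server behaves, compresses the page only when the client's Accept-Encoding mentions gzip. The helper content_encoding (also a made-up name) sends a request with a chosen Accept-Encoding value and reports the Content-Encoding that comes back.

```python
import gzip
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class GzipAwareHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a wiki server: gzips only when asked to."""

    def do_GET(self):
        body = b"<html><body>wiki page</body></html>"
        accepted = self.headers.get("Accept-Encoding", "")
        self.send_response(200)
        if "gzip" in accepted.lower():
            # Client advertised gzip support, so compress the body.
            body = gzip.compress(body)
            self.send_header("Content-Encoding", "gzip")
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep the demo quiet: suppress the default request logging.
        pass


def content_encoding(url, accept_encoding):
    """Fetch url with the given Accept-Encoding value; return Content-Encoding."""
    request = urllib.request.Request(
        url, headers={"Accept-Encoding": accept_encoding}
    )
    # An empty ProxyHandler avoids routing the local request through a proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return response.headers.get("Content-Encoding")


def demo():
    """Compare the response for 'gzip' against an empty Accept-Encoding."""
    server = HTTPServer(("127.0.0.1", 0), GzipAwareHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/"
    try:
        return content_encoding(url, "gzip"), content_encoding(url, "")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    with_gzip, without_gzip = demo()
    print("Accept-Encoding: gzip ->", with_gzip)
    print("Accept-Encoding:      ->", without_gzip)
```

Pointing content_encoding at a real wiki URL instead of the stand-in server is one way to check whether that server stops compressing once the crawler no longer advertises gzip.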

Written by Cymen

June 30th, 2009 at 10:58 am