<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Cymen&#039;s Blog &#187; Google Search Appliance</title>
	<atom:link href="http://blog.cymen.org/category/google-search-appliance/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.cymen.org</link>
	<description></description>
	<lastBuildDate>Fri, 03 Feb 2012 17:53:55 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>MediaWiki and Google Search Appliance (GSA)</title>
		<link>http://blog.cymen.org/2009/06/30/mediawiki-and-google-search-appliance-gsa/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=mediawiki-and-google-search-appliance-gsa</link>
		<comments>http://blog.cymen.org/2009/06/30/mediawiki-and-google-search-appliance-gsa/#comments</comments>
		<pubDate>Tue, 30 Jun 2009 16:58:16 +0000</pubDate>
		<dc:creator>Cymen</dc:creator>
				<category><![CDATA[Google Search Appliance]]></category>
		<category><![CDATA[Mediawiki]]></category>

		<guid isPermaLink="false">http://blog.cymen.org/?p=78</guid>
		<description><![CDATA[The Google Search Appliance advertises via the Accept-Encoding part of the HTTP request header that it can handle gzip content. However, it does not appear to handle gzip-encoded content correctly, at least when that content comes from MediaWiki. The HTTP request header looks like this: GET HOST: www.xyz.com ACCEPT: text/html,text/plain,application/* FROM: USER-AGENT: gsa-crawler (Enterprise; ... ; [...]]]></description>
			<content:encoded><![CDATA[<p>The Google Search Appliance advertises, via the Accept-Encoding field of the HTTP request header, that it can handle gzip content. However, it does not appear to handle gzip-encoded content correctly, at least when that content comes from MediaWiki.</p>
<p>The HTTP request header looks like this:<br />
<code><br />
GET<br />
HOST: www.xyz.com<br />
ACCEPT: text/html,text/plain,application/*<br />
FROM:<br />
USER-AGENT: gsa-crawler (Enterprise; ... ; ...)<br />
ACCEPT-ENCODING: gzip<br />
</code></p>
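<p>To see how a server answers these headers without waiting for a crawl, the request can be imitated from a script. The following is a minimal sketch using Python's standard <code>urllib.request</code>; the function names are hypothetical, and the user agent string is a placeholder because the real one is elided above. It assumes that what matters to the server is the Accept-Encoding value, which may not hold for every setup.</p>

```python
import urllib.request


def build_crawler_request(url, accept_encoding="gzip"):
    """Build a GET request that imitates the crawler headers shown above."""
    headers = {
        "Accept": "text/html,text/plain,application/*",
        # Placeholder user agent (assumption): the real string is elided in the post.
        "User-Agent": "gsa-crawler-test",
        # Pass "" to send the header with no value.
        "Accept-Encoding": accept_encoding,
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def response_content_encoding(request, timeout=10):
    """Send the request and return the Content-Encoding response header, or None."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Encoding")
```

<p>Calling <code>response_content_encoding(build_crawler_request(url))</code> against a wiki page, and then again with <code>accept_encoding=""</code>, shows whether the server stops compressing once gzip is no longer advertised.</p>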
<p>The solution is to remove the gzip option from Accept-Encoding, which can be done as follows:</p>
<ol>
<li>Go to the GSA admin interface.</li>
<li>Open Crawl and Index -> HTTP Headers.</li>
<li>Set the field <strong>Additional HTTP Headers for Crawler</strong> to <code>Accept-Encoding:</code> (with no value after the colon).</li>
</ol>
<p>The HTTP request header now looks like this:<br />
<code><br />
GET<br />
HOST: www.xyz.com<br />
ACCEPT: text/html,text/plain,application/*<br />
FROM:<br />
USER-AGENT: gsa-crawler (Enterprise; ... ; ...)<br />
ACCEPT-ENCODING:<br />
</code></p>
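<p>Why an empty value helps can be sketched in code. Under the HTTP specification (RFC 7231, section 5.3.4), an Accept-Encoding header that is present but empty tells the server the client wants no content coding, whereas a missing header permits any coding. The function below is only an illustration of that rule, not the actual logic of MediaWiki or the GSA; the name <code>gzip_acceptable</code> is hypothetical, and whether a given server follows the rule exactly is an assumption.</p>

```python
def gzip_acceptable(accept_encoding):
    """Return True if a server may answer with gzip for this header value.

    Pass None when the header is absent and "" when it is sent with no value.
    """
    if accept_encoding is None:
        # Header absent: the client stated no preference, so gzip is allowed.
        return True

    weights = {}
    for item in accept_encoding.split(","):
        coding, _, param = item.strip().partition(";")
        coding = coding.strip().lower()
        if not coding:
            continue  # skip empty list members, including a fully empty header
        q = 1.0  # default quality value when none is given
        name, _, value = param.strip().partition("=")
        if name.strip().lower() == "q":
            try:
                q = float(value)
            except ValueError:
                q = 0.0  # treat an unparsable weight as "not acceptable"
        weights[coding] = q

    if "gzip" in weights:
        return weights["gzip"] > 0
    # No explicit gzip entry: fall back to the wildcard, if any.
    return weights.get("*", 0.0) > 0
```

<p>Under this rule <code>gzip_acceptable("gzip")</code> is true, while both <code>gzip_acceptable("")</code> and <code>gzip_acceptable("foo")</code> are false, so either value stops a conforming server from sending gzip.</p>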
<p>Solution source: <a href="http://groups.google.com/group/Google-Search-Appliance-Help/browse_thread/thread/123115bac1e8c5d1/632734e187d9184b?lnk=gst&#038;q=gzip#632734e187d9184b">a posting in the Google Search Appliance/Google Mini group</a>. I found that simply setting the field to &#8220;Accept-Encoding:&#8221; worked fine; there was no need to include &#8220;foo&#8221; as a value.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.cymen.org/2009/06/30/mediawiki-and-google-search-appliance-gsa/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

