<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Cymen&#039;s Blog &#187; MySQL</title>
	<atom:link href="http://blog.cymen.org/category/database/mysql/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.cymen.org</link>
	<description></description>
	<lastBuildDate>Fri, 03 Feb 2012 17:53:55 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>SQL Server &#8211; Ranking names for search results by position of query within name</title>
		<link>http://blog.cymen.org/2011/10/19/sql-server-ranking-names-for-search-results-by-position-of-search-within-name/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=sql-server-ranking-names-for-search-results-by-position-of-search-within-name</link>
		<comments>http://blog.cymen.org/2011/10/19/sql-server-ranking-names-for-search-results-by-position-of-search-within-name/#comments</comments>
		<pubDate>Wed, 19 Oct 2011 16:23:21 +0000</pubDate>
		<dc:creator>Cymen</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://blog.cymen.org/?p=235</guid>
		<description><![CDATA[SQL Server using PATINDEX() and LEFT() When searching names there are some assumptions we can make (based on first and last name being in separate columns): A match in the last name is more important than a match in the first name The position of the match within the last name is important: an earlier [...]]]></description>
			<content:encoded><![CDATA[<h2>SQL Server using PATINDEX() and LEFT()</h2>
<p>When searching names there are some assumptions we can make (based on first and last name being in separate columns):</p>
<ul>
<li>A match in the last name is more important than a match in the first name</li>
<li>The position of the match within the last name is important: an earlier match is a better match</li>
<li>The first name should still be searched</li>
<li>If present, a middle name is least important</li>
</ul>
<p>It is possible to do this with SQL Server using the following proprietary extensions:</p>
<ul>
<li><a href="http://msdn.microsoft.com/en-us/library/ms188395.aspx">PATINDEX(needle, haystack)</a>: returns the 1-based position of the first match of the pattern needle within haystack and (unfortunately for our use case) 0 if there is no match.</li>
<li><a href="http://msdn.microsoft.com/en-us/library/ms177601.aspx">LEFT(string, count)</a>: returns the leftmost count characters of string (note: the string is truncated if it is longer than count!)</li>
</ul>
<pre class="brush: sql">SELECT TOP 10 firstName + ' ' + middleName + ' ' + lastName
FROM Member
WHERE [firstName] + ' ' + [middleName] + ' ' + [lastName] LIKE @query
ORDER BY
  PATINDEX (
    @query,
    LEFT([lastName] + '                                                                                          ', 90)
    + LEFT([firstName] + '                                                                                          ', 90)
    + [middleName]
  ),
  [lastName],
  [firstName],
  [middleName]
  -- Note: the ' .... ' above is a string of spaces of length 90</pre>
<p>We are making a big assumption: none of the name fields will have a length &gt; 90. You may need to adjust this value for your use case. The reason we need to do this is that PATINDEX() returns 0 if the value is not present, so we can&#8217;t simply do a nice ORDER BY PATINDEX(@query, lastName), PATINDEX(@query, firstName), PATINDEX(@query, middleName): rows with no match in the last name would get 0 and sort first. Instead, we concatenate the name fields into one long string, padding each field to a fixed width so that the variable length of the names does not affect the rank.</p>
<h2>MySQL using LOCATE() and LEFT()</h2>
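<p>The same method should work in MySQL using the <a href="http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_locate">LOCATE()</a> and <a href="http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_left">LEFT()</a> functions. LEFT() is used the same way as in SQL Server. LOCATE(substr, str) also returns 0 when there is no match, but it takes a plain substring rather than a wildcard pattern, so any % wildcards used for LIKE would need to be stripped from the query value.</p>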
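<p>To see why the padding matters, here is a minimal sketch of the ranking logic, simulated in Python rather than SQL. The sample rows and the helper names padded_key and rank are hypothetical, and the case-insensitive match is an assumption (in the database it depends on the collation).</p>
<pre class="brush: python">WIDTH = 90  # assumed maximum length of any name field, as in the query above


def padded_key(first, middle, last):
    """Mimic LEFT(lastName + spaces, 90) + LEFT(firstName + spaces, 90) + middleName."""
    return last.ljust(WIDTH)[:WIDTH] + first.ljust(WIDTH)[:WIDTH] + middle


def rank(query, first, middle, last):
    """1-based position of query in the padded key, 0 if absent (like PATINDEX)."""
    return padded_key(first, middle, last).lower().find(query.lower()) + 1


# Hypothetical sample rows: (firstName, middleName, lastName)
members = [
    ("Anna", "Lee", "Smith"),
    ("Ann", "", "Brown"),
    ("Joe", "", "Hanna"),
    ("Mary", "", "Annan"),
]

query = "ann"
# Keep only rows that match somewhere, then order by position and names
matches = [m for m in members if rank(query, *m) != 0]
ordered = sorted(matches, key=lambda m: (rank(query, *m), m[2], m[0], m[1]))
for first, middle, last in ordered:
    print(rank(query, first, middle, last), last, first)</pre>
<p>Because every last name occupies exactly 90 characters, any match in a first name starts at position 91 or later, so last-name matches always sort ahead of first-name matches regardless of how long the names are.</p>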
]]></content:encoded>
			<wfw:commentRss>http://blog.cymen.org/2011/10/19/sql-server-ranking-names-for-search-results-by-position-of-search-within-name/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MySQL and Round Robin Database (RRD)</title>
		<link>http://blog.cymen.org/2009/06/13/mysql-and-round-robin-database-rrd/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=mysql-and-round-robin-database-rrd</link>
		<comments>http://blog.cymen.org/2009/06/13/mysql-and-round-robin-database-rrd/#comments</comments>
		<pubDate>Sun, 14 Jun 2009 00:56:35 +0000</pubDate>
		<dc:creator>Cymen</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://blog.cymen.org/?p=7</guid>
		<description><![CDATA[While looking for a MySQL RRD storage engine, I came across Round-Robin Database Storage Engine (RRD) (pdf) which describes how to setup a MySQL table to act as a RRD. The PDF appears to have been created in February of 2007 but the benchmark result at the end of 600 inserts/second says this was achieved [...]]]></description>
			<content:encoded><![CDATA[<p>While looking for a MySQL RRD storage engine, I came across <a href="http://www.shinguz.ch/MySQL/rrd.pdf">Round-Robin Database Storage Engine (RRD)</a> (pdf) which describes how to set up a MySQL table to act as an RRD. The PDF appears to have been created in February of 2007, but the benchmark result at the end of 600 inserts/second says this was achieved on a 1350 MHz AMD CPU, which suggests the article may be older.</p>
<p>I replicated the configuration and tested it on my laptop (5400 RPM disk, 2.4 GHz Intel T7700 CPU). With the MyISAM database, a brief test of about 50k inserts resulted in ~7000 inserts/second. But with max rows at 25m, the trigger functionality (the part that makes the table behave like an RRD) wasn&#8217;t really exercised.</p>
<p>To make a more interesting test, I recreated the database with max rows set to 25920 (one sample every 5 minutes for 90 days: 12 per hour * 24 hours * 90 days):</p>
<pre class="brush: sql">CREATE TABLE statistic_rrd (
    rrd_key INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
    , attribute_key INT UNSIGNED NOT NULL DEFAULT '0'
    , start_utime INT UNSIGNED NOT NULL DEFAULT '0'
    , end_utime INT UNSIGNED DEFAULT NULL
    , logging_interval INT UNSIGNED NOT NULL DEFAULT '0'
    , value BIGINT UNSIGNED NOT NULL DEFAULT '0'
    , UNIQUE KEY (attribute_key, start_utime)
    , KEY start_time (start_utime)
    ) ROW_FORMAT = FIXED
    , MAX_ROWS = 25920
;

CREATE TABLE statistic_rrd_key (
    rrd_key BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
)
;
INSERT INTO statistic_rrd_key VALUES (0);

DROP TRIGGER IF EXISTS statistic_rrd_ins;
DELIMITER $$
CREATE TRIGGER statistic_rrd_ins
BEFORE INSERT ON statistic_rrd
FOR EACH ROW
BEGIN
    SET @rrd_key = 0;
    SET @rows = 25920;
    -- No key supplied: in a BEFORE INSERT trigger an AUTO_INCREMENT column arrives as 0
    IF NEW.rrd_key = 0 THEN
        SELECT rrd_key + 1
            FROM statistic_rrd_key
            INTO @rrd_key;
        SET NEW.rrd_key = @rrd_key;
    END IF;
    IF (NEW.rrd_key % @rows) THEN
        SET NEW.rrd_key = NEW.rrd_key % @rows;
    ELSE
        SET NEW.rrd_key = @rows;
    END IF;
    UPDATE statistic_rrd_key SET rrd_key = NEW.rrd_key;
END;
$$
DELIMITER ;</pre>
<p>Results:<br />
25920 &#8211; ~ 7k inserts/second (on empty database)<br />
25920 * 5 &#8211; ~6k inserts/second<br />
25920 * 10 &#8211; ~6k inserts/second</p>
<p>The insert rate was CPU limited, with one of the two cores at 100% and the hard drive rarely being written to. The table held 25,243 rows at the end, fewer than the 25920 cap, which suggests my idea of capping at 25920 wasn&#8217;t ideal (I didn&#8217;t examine the trigger to determine exactly how it works).</p>
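<p>For reference, the key arithmetic in the trigger above can be traced with a short simulation. This is a Python sketch of the IF/ELSE logic only, not MySQL; the function name next_key and the starting counter value are assumptions made for illustration.</p>
<pre class="brush: python">ROWS = 25920  # same cap as @rows in the trigger


def next_key(stored_key, new_key=0, rows=ROWS):
    """Return the rrd_key the trigger logic would assign to the incoming row."""
    if new_key == 0:       # no key supplied: continue from the stored counter
        new_key = stored_key + 1
    if new_key % rows:     # not a multiple of rows: wrap into 1..rows-1
        return new_key % rows
    return rows            # a multiple of rows maps to the last slot


stored = 0                 # assumed starting counter; a different start only shifts the sequence
seen = []
for _ in range(ROWS + 2):  # insert two rows more than the cap
    stored = next_key(stored)  # the trigger writes the key back to statistic_rrd_key
    seen.append(stored)

print(seen[:3], seen[-3:], len(set(seen)))</pre>
<p>The keys cycle through 1 to 25920 and then start again at 1, so with REPLACE INTO (as used in insert.php) a repeated key overwrites the row that previously held it instead of growing the table.</p>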
<p>After converting both tables to InnoDB, the results were:</p>
<p>25920 &#8211; ~630 inserts/second (empty db)<br />
25920 * 5 &#8211; ~660 inserts/second</p>
<p>With the hard drive thrashing, the insert rate was clearly limited by the transaction log. This time though both cores varied at around 30% utilization.</p>
<p>Enabling some sane InnoDB performance options per <a href="http://www.mysqlperformanceblog.com/2007/11/01/innodb-performance-optimization-basics/">Innodb Performance Optimization Basics</a> at the MySQL Performance Blog:</p>
<pre>[mysqld]
innodb_buffer_pool_size = 1G
innodb_log_file_size = 256M
innodb_log_buffer_size = 4M
innodb_flush_log_at_trx_commit = 2
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_file_per_table</pre>
<p>25920 = ~ 1580 inserts/second<br />
25920 * 5 = ~ 1580 inserts/second</p>
<p>Now the disk wasn&#8217;t thrashing so much but the CPU cores were switching back and forth between 80-90% and 60-70% which suggested contention on the rrd_key. One approach would be to make that table (with just a single row) use the MEMORY engine. I made that change and:</p>
<p>25920 * 10 = ~ 1730 inserts/second</p>
<p>But the insertion code is also naive as it makes a SQL call for each insertion. A more realistic scenario is having approximately 500 inserts per call. But this doesn&#8217;t work with the trigger properly&#8230;</p>
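<p>As a rough illustration of what one batched call could look like, the following Python sketch builds a single multi-row REPLACE statement with 500 value tuples. The function name batched_replace is hypothetical, the row values are copied from insert.php, and the sketch does not address the trigger problem mentioned above.</p>
<pre class="brush: python">BATCH_SIZE = 500  # rows per statement, the figure mentioned above

# Same value tuple as the single-row statement in insert.php
ROW = "(ROUND(RAND()*100000), UNIX_TIMESTAMP(NOW()), NULL, 100, 123456789)"


def batched_replace(batch_size=BATCH_SIZE):
    """Build one multi-row REPLACE statement with batch_size value tuples."""
    header = (
        "REPLACE INTO statistic_rrd\n"
        "    (attribute_key, start_utime, end_utime, logging_interval, value)\n"
        "VALUES\n"
    )
    rows = ",\n".join("    " + ROW for _ in range(batch_size))
    return header + rows + "\n;"


sql = batched_replace()
print(sql.count("ROUND("), "rows in one statement")</pre>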
<p>According to hdparm, the write speed of </p>
<p><b>insert.php v1:</b></p>
<pre class="brush: php">&lt;?php

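try {
    $db = new PDO('mysql:dbname=rrd;host=localhost', 'username', 'password');
    // Throw on SQL errors so the catch block around the insert loop can fire
    $db-&gt;setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    echo "PDO connection object created\n";
} catch (PDOException $e) {
    echo $e-&gt;getMessage() ."\n";
    exit(1); // no connection, nothing to benchmark
}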

$sql = &lt;&lt;&lt;EOD
REPLACE INTO statistic_rrd
    (attribute_key, start_utime, end_utime, logging_interval, value)
VALUES
    (ROUND(RAND()*100000), UNIX_TIMESTAMP(NOW()), NULL, 100, 123456789)
;
EOD;

try {
    $rows_max = 25920*10;
    $start_time = microtime(true);
    $count = $rows_max;
    while ($count) {
        --$count;
        $db-&gt;exec($sql);
    }
    $end_time = microtime(true);
    $total_time = $end_time - $start_time;
    echo ($rows_max / $total_time) ." inserts/sec\n";
    echo "rows_max: $rows_max\n";
    echo "total_time: $total_time\n";
} catch (PDOException $e) {
    echo $e-&gt;getMessage() ."\n";
}

if ($db) $db = null;
?&gt;</pre>
]]></content:encoded>
			<wfw:commentRss>http://blog.cymen.org/2009/06/13/mysql-and-round-robin-database-rrd/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

