<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Random tests</title>
	<atom:link href="http://leuksman.com/log/2007/11/22/random-tests/feed/" rel="self" type="application/rss+xml" />
	<link>http://leuksman.com/log/2007/11/22/random-tests/</link>
	<description>reticula, electronica, &#38; oddities</description>
	<pubDate>Thu, 20 Nov 2008 19:49:04 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.3</generator>
		<item>
		<title>By: brion</title>
		<link>http://leuksman.com/log/2007/11/22/random-tests/#comment-5169</link>
		<dc:creator>brion</dc:creator>
		<pubDate>Mon, 26 Nov 2007 15:43:08 +0000</pubDate>
		<guid isPermaLink="false">http://leuksman.com/log/2007/11/22/random-tests/#comment-5169</guid>
		<description>Very long, and no, not very practical. Ideally we want a system that doesn't require a long, very expensive rebuild to be run periodically. :)</description>
		<content:encoded><![CDATA[<p>Very long, and no, not very practical. Ideally we want a system that doesn&#8217;t require a long, very expensive rebuild to be run periodically. <img src='http://leuksman.com/log/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: LA2</title>
		<link>http://leuksman.com/log/2007/11/22/random-tests/#comment-5138</link>
		<dc:creator>LA2</dc:creator>
		<pubDate>Sun, 25 Nov 2007 19:53:12 +0000</pubDate>
		<guid isPermaLink="false">http://leuksman.com/log/2007/11/22/random-tests/#comment-5138</guid>
		<description>The rebuild of the table column could be: value[row i] = i/N, where N is the total count of rows. How long would that update take? Is it at all practical for the larger wikis?</description>
		<content:encoded><![CDATA[<p>The rebuild of the table column could be: value[row i] = i/N, where N is the total count of rows. How long would that update take? Is it at all practical for the larger wikis?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: brion</title>
		<link>http://leuksman.com/log/2007/11/22/random-tests/#comment-5121</link>
		<dc:creator>brion</dc:creator>
		<pubDate>Sun, 25 Nov 2007 07:40:05 +0000</pubDate>
		<guid isPermaLink="false">http://leuksman.com/log/2007/11/22/random-tests/#comment-5121</guid>
		<description>The values of page_random are assigned, well, randomly. ;) The problem isn't that the *values* are bad, but that the technique will inherently give this kind of biased result.

You can visualize the problem by considering the pages not as randomly sorted points, but as randomly sized *spaces*. It's really the gaps in front of the points, not the points, which are being selected.

The gaps between the points vary from large to small, depending on how close other page points are. Pages with larger spaces in front are more likely to be selected than those with smaller spaces in front of them... this leads to a biased selection, when what we really wanted was for each page to have an equal chance of selection.

An improved algo might grab a small list of pages over a random range, big enough to reduce the variance in those gap sizes, then pick one of those pages with equal probability.</description>
		<content:encoded><![CDATA[<p>The values of page_random are assigned, well, randomly. <img src='http://leuksman.com/log/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> The problem isn&#8217;t that the *values* are bad, but that the technique will inherently give this kind of biased result.</p>
<p>You can visualize the problem by considering the pages not as randomly sorted points, but as randomly sized *spaces*. It&#8217;s really the gaps in front of the points, not the points, which are being selected.</p>
<p>The gaps between the points vary from large to small, depending on how close other page points are. Pages with larger spaces in front are more likely to be selected than those with smaller spaces in front of them&#8230; this leads to a biased selection, when what we really wanted was for each page to have an equal chance of selection.</p>
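<p>The gap effect can be checked with a small simulation. It assumes the lookup returns the first page whose page_random value is at or above a uniform random number r, wrapping to the lowest value when nothing qualifies. The function names and parameters are made up for illustration.</p>

```python
import bisect
import random


def pick_first_at_or_above(sorted_vals, r):
    """Assumed lookup: index of the first value >= r, wrapping to 0 past the end."""
    i = bisect.bisect_left(sorted_vals, r)
    return i if i != len(sorted_vals) else 0


def simulate(n_pages=10, draws=200_000, seed=42):
    """Give each page a random value, run many lookups, and compare how often
    each page is picked with the width of the gap in front of it."""
    rng = random.Random(seed)
    vals = sorted(rng.random() for _ in range(n_pages))
    counts = [0] * n_pages
    for _ in range(draws):
        counts[pick_first_at_or_above(vals, rng.random())] += 1
    # Gap in front of each page; page 0 also collects the wraparound above the top value.
    gaps = [v - (vals[k - 1] if k else 0.0) for k, v in enumerate(vals)]
    gaps[0] += 1.0 - vals[-1]
    for k in range(n_pages):
        print(f"page {k}: gap {gaps[k]:.3f}, picked {counts[k] / draws:.3f}")
    return gaps, counts


simulate()
```

<p>The "picked" frequencies track the gap widths rather than settling near 1/10 for every page.</p>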
<p>An improved algo might grab a small list of pages over a random range, big enough to reduce the variance in those gap sizes, then pick one of those pages with equal probability.</p>
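<p>One way to read that suggestion, as a hedged sketch rather than a worked-out design, follows. The code uses the same assumed lookup to find a starting page and takes it together with the next k - 1 pages, wrapping around. It then returns one of those k pages with equal probability. Each page's chance becomes the average of k consecutive gaps, so the spread should shrink roughly like 1/sqrt(k). The names and the batch size k are hypothetical.</p>

```python
import bisect
import random
import statistics


def gaps_in_front(sorted_vals):
    """Width of the gap in front of each page, with wraparound folded into page 0."""
    gaps = [v - (sorted_vals[k - 1] if k else 0.0) for k, v in enumerate(sorted_vals)]
    gaps[0] += 1.0 - sorted_vals[-1]
    return gaps


def pick_from_batch(sorted_vals, k, rng):
    """Hypothetical batch variant: find the first page at or above a random r
    (wrapping around), take it plus the next k - 1 pages, and return one of
    those k with equal probability. Assumes k is between 1 and len(sorted_vals)."""
    n = len(sorted_vals)
    start = bisect.bisect_left(sorted_vals, rng.random()) % n
    return (start + rng.randrange(k)) % n


def batch_pick_probabilities(sorted_vals, k):
    """Exact per-page probability under pick_from_batch: page j is returned
    when the batch starts at any of the k pages ending at j, each time with
    chance 1/k, so its probability is the average of those k gaps."""
    n = len(sorted_vals)
    gaps = gaps_in_front(sorted_vals)
    return [sum(gaps[(j - m) % n] for m in range(k)) / k for j in range(n)]


rng = random.Random(7)
vals = sorted(rng.random() for _ in range(1000))
for k in (1, 20):
    # Standard deviation of the per-page chances, relative to the ideal 1/N.
    spread = statistics.pstdev(batch_pick_probabilities(vals, k)) * len(vals)
    print(f"k={k}: relative spread {spread:.2f}")
```

<p>A larger k evens the odds out further, at the cost of fetching more rows per lookup. Some bias remains for any k smaller than the number of pages.</p>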
]]></content:encoded>
	</item>
	<item>
		<title>By: LA2</title>
		<link>http://leuksman.com/log/2007/11/22/random-tests/#comment-5090</link>
		<dc:creator>LA2</dc:creator>
		<pubDate>Fri, 23 Nov 2007 18:28:00 +0000</pubDate>
		<guid isPermaLink="false">http://leuksman.com/log/2007/11/22/random-tests/#comment-5090</guid>
		<description>So how are the values in the random table assigned? Wouldn't it be easy just to assign new values (rebuild the random table), and then all would be fine?</description>
		<content:encoded><![CDATA[<p>So how are the values in the random table assigned? Wouldn&#8217;t it be easy just to assign new values (rebuild the random table), and then all would be fine?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
