<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>dekstop weblog : SearchFox Not Suited for Aggregated, High-traffic Feeds? And Some Comments on Community Attention.</title>
    <link>http://dekstop.de/weblog/2005/11/searchfox_attention_arithmetics/</link>
    <description>Just read in a comment by Esteban Kozak that SearchFox RSS uses both &quot;attention and community data&quot; when determining the value of an article, which means that some of the weird effects documented earlier might be a result of other people&apos;s behavior, as opposed to my own. To recapitulate: I&apos;m ...</description>
    <dc:language>en-us</dc:language>
    <dc:rights>Copyright 2005 Martin Dittus</dc:rights>
    <lastBuildDate>Fri, 04 Nov 2005 11:44:06 GMT</lastBuildDate>
    <generator>MicroLinks 5.6 (dekstop.de)</generator>
    <managingEditor>public&#64;dekstop&#46;de</managingEditor>
    <webMaster>public&#64;dekstop&#46;de</webMaster>

    <item>
      <title>Comment on "SearchFox Not Suited for Aggregated, High-traffic Feeds? And Some Comments on Community Attention."</title>
      <link>http://dekstop.de/weblog/2005/11/searchfox_attention_arithmetics/#87</link>
      <description><![CDATA[<p>Esteban,</p>

<p>thanks for the clarifications.</p>

<p>And don't worry, I'm patient ;)</p>]]> &lt;p&gt;- <![CDATA[<a href="http://dekstop.de/" rel="nofollow">martin</a>]]>&lt;/p&gt;</description>
      <dc:creator>martin</dc:creator>
      <guid isPermaLink="true">http://dekstop.de/weblog/2005/11/searchfox_attention_arithmetics/#87</guid>
      <pubDate>Fri, 04 Nov 2005 19:44:35 GMT</pubDate>
    </item>
    <item>
      <title>Comment on "SearchFox Not Suited for Aggregated, High-traffic Feeds? And Some Comments on Community Attention."</title>
      <link>http://dekstop.de/weblog/2005/11/searchfox_attention_arithmetics/#86</link>
      <description><![CDATA[<p>Well, there is nothing weird about SearchFox's behavior. Your diagnosis of the problem is right on the money. We are working on several ways to improve the algorithm. First, we'll add an aging policy for topics. Second, we'll adjust the score calculation to include heavier weight on the total number of clicks on a topic as an extra measure of interestingness.</p>

<p>As for the attention data APIs, you'll have to be patient. But I promise we'll get there.<br />
</p>]]> &lt;p&gt;- <![CDATA[<a href="http://rss.searchfox.com" rel="nofollow">Esteban Kozak</a>]]>&lt;/p&gt;</description>
      <dc:creator>Esteban Kozak</dc:creator>
      <guid isPermaLink="true">http://dekstop.de/weblog/2005/11/searchfox_attention_arithmetics/#86</guid>
      <pubDate>Fri, 04 Nov 2005 19:36:19 GMT</pubDate>
    </item>


    <item>
      <title>SearchFox Not Suited for Aggregated, High-traffic Feeds? And Some Comments on Community Attention.</title>
      <link>http://dekstop.de/weblog/2005/11/searchfox_attention_arithmetics/</link> 
      <description><![CDATA[<p>Just read in a comment by <a href="http://rss.searchfox.com/blog.php">Esteban Kozak</a> that <a href="http://rss.searchfox.com/">SearchFox RSS</a> uses both "attention and community data" when determining the <a href="http://dekstop.de/weblog/2005/10/searchfox_topics_i_like/">value of an article</a>, which means that some of the <a href="http://dekstop.de/weblog/2005/11/update_on_searchfox_topics_i_like/">weird effects</a> documented earlier might be a result of other people's behavior, as opposed to my own.</p>

<p>To recapitulate: I'm trying to understand the algorithms behind SearchFox RSS's "Topics I Like" listing, and found that some terms are conspicuously high on the list where they don't really deserve to be (currently: "quake", "ning" -- see image below), and others that I care about more are nowhere to be found (currently: "ruby", "rails").</p>

<table class="imagetable" border="0" width="400">
<tr>
	<td valign="top" align="center">
		<img alt="topics_i_like_2005-11-04.png" src="http://dekstop.de/weblog/2005/11/searchfox_attention_arithmetics/topics_i_like_2005-11-04.png" width="175" height="221" border="0" /></td>
</tr>
<tr>
	<td align="center"><p>My current list of "topics I like" in the <a href="http://rss.searchfox.com/">SearchFox RSS reader</a>.</p></td>
</tr>
</table>

<p>But I should also note that I'm watching the rather high-traffic del.icio.us Rails feed, which of course carries far more posts than I care to read. So another explanation for the system's apparent ignorance of my interest in Rails might be the disparity between the high number of occurrences of the term in my feeds and the comparatively low number of articles I actually click on.</p>

<p>Translation: I might read more articles containing the term "Rails" than the terms "oktober" or "macorama", but there are even more articles containing the term "Rails" that I <i>don't</i> read. And I definitely read every single <a href="http://www.fscklog.com/">fscklog</a> article (where the other two terms come from).</p>

<p>Which could mean that SearchFox's algorithms are not really suited for aggregated feeds such as <a href="http://del.icio.us/rss/tag/rails">http://del.icio.us/rss/tag/rails</a> where the article-to-click ratio is noticeably lower than with a normal blog feed.</p>
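
<p>To make the arithmetic concrete, here is a minimal sketch, assuming a term's score is simply the share of articles containing it that I clicked on. That model is my guess, not SearchFox's documented algorithm, and the function name and all counts are invented for illustration.</p>

```python
# Hypothetical model: score = clicked articles / articles seen for a term.
# This is an assumption for illustration, not SearchFox's actual algorithm.

def click_ratio(clicked, seen):
    """Return the fraction of seen articles that were clicked (0.0 if none seen)."""
    return clicked / seen if seen else 0.0

# term: (articles clicked, articles seen in my feeds) -- invented numbers
counts = {
    "rails": (40, 400),    # many clicks, but from a high-traffic aggregated feed
    "macorama": (10, 10),  # few articles, every single one read
}

# Rank terms by click ratio, highest first
ranked = sorted(counts, key=lambda term: click_ratio(*counts[term]), reverse=True)
for term in ranked:
    clicked, seen = counts[term]
    print(f"{term}: {click_ratio(clicked, seen):.2f}")
```

<p>Under this assumed scoring, "macorama" outranks "rails" even though "rails" has four times as many clicks, which matches the effect described above.</p>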

<h3>My recommendations to SearchFox's developers</h3>

<p>First, please talk a bit about the algorithms involved, so that we understand how to use the system to its fullest potential and can anticipate which actions might destroy the validity of its attention arithmetics. For example, I would like to know if "quake" and "ning" are "Topics I Like" because a lot of <i>other people</i> like them, or because of something I did.</p>

<p>Second, let us help you improve your algorithms. Talk to your users so that we can help you get a better understanding of how we are actually using the system. I have the feeling that your algorithms are confused by high-traffic feeds, because I <i>do</i> click on a lot of Rails-related articles.</p>

<p>Third, let's see some of that attention data ;). Now that I know that the internal logic is also community-driven, I am really curious to know what other people read. Make a "Topics Our Users Like" list with <i>at least</i> the 100 most popular words (let's see some of that long tail!), and link each word to a search in my own feeds.</p>

<p>Ok, enough for now. Just one more thing: I don't get why people are so content with using <a href="http://www.rojo.com/">Rojo</a>. I've used it for a week or so in parallel with SearchFox RSS, and SearchFox clearly took the lead. I guess that as soon as SearchFox gets some more attention, people will realize it's the better system, and Rojo will be toast ;)</p>

<h3>Related Links</h3>
<ul class="links">
	<li><a href="http://dekstop.de/weblog/2005/10/searchfox_web_services/">SearchFox Rocks. But Where Are the Web Services?</a></li>
	<li><a href="http://dekstop.de/weblog/2005/10/searchfox_topics_i_like/">SearchFox RSS's "Topics I Like"</a></li>
	<li><a href="http://dekstop.de/weblog/2005/11/update_on_searchfox_topics_i_like/">Update on SearchFox's "Topics I Like"</a></li>
</ul>]]></description>
      <dc:creator>Martin Dittus</dc:creator>
      <category>commentary</category>
      <category>data mining</category>
      <category>recommendation engines</category>
      <category>tools</category>
      
      <guid isPermaLink="true">http://dekstop.de/weblog/2005/11/searchfox_attention_arithmetics/</guid>
      <pubDate>Fri, 04 Nov 2005 11:44:06 GMT</pubDate>
    </item>
  </channel>
</rss>

