<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>dekstop weblog : ETags Support in Aggregators</title>
    <link>http://dekstop.de/weblog/2006/06/etags_in_aggregators/</link>
    <description>Did you notice Sam Ruby&apos;s new preoccupation with ETags? When he&apos;s talking with founders about their new web services, &quot;the first thing I ask is; &apos;do you support ETags?&apos;&quot; I&apos;m so glad that he&apos;s doing that, and talking about it publicly. I&apos;ve been a web developer for a number of ...</description>
    <dc:language>en-us</dc:language>
    <dc:rights>Copyright 2006 Martin Dittus</dc:rights>
    <lastBuildDate>Thu, 22 Jun 2006 12:46:46 GMT</lastBuildDate>
    <generator>MicroLinks 5.6 (dekstop.de)</generator>
    <managingEditor>public&#64;dekstop&#46;de</managingEditor>
    <webMaster>public&#64;dekstop&#46;de</webMaster>

    <item>
      <title>Comment on "ETags Support in Aggregators"</title>
      <link>http://dekstop.de/weblog/2006/06/etags_in_aggregators/#238</link>
      <description><![CDATA[<p>What there really needs to be is an HTTP library that supports everything like this out-of-the-box. It should cache into its own set of temporary files which can be spread across all apps.</p>]]> &lt;p&gt;- <![CDATA[<a href="http://porg.es/blog" rel="nofollow">Porges</a>]]>&lt;/p&gt;</description>
      <dc:creator>Porges</dc:creator>
      <guid isPermaLink="true">http://dekstop.de/weblog/2006/06/etags_in_aggregators/#238</guid>
      <pubDate>Fri, 23 Jun 2006 00:17:37 GMT</pubDate>
    </item>
    <item>
      <title>Comment on "ETags Support in Aggregators"</title>
      <link>http://dekstop.de/weblog/2006/06/etags_in_aggregators/#237</link>
      <description><![CDATA[<p>Yeah I too found that I spoke too soon -- I'm still seeing the same requests.</p>

<p>My initial reaction after sending the mail yesterday was to make it really obvious to them and block their IP -- but after hearing of your experience I wouldn't be surprised if they don't pay attention to HTTP replies either.</p>]]> &lt;p&gt;- <![CDATA[<a href="http://dekstop.de/" rel="nofollow">Martin Dittus</a>]]>&lt;/p&gt;</description>
      <dc:creator>Martin Dittus</dc:creator>
      <guid isPermaLink="true">http://dekstop.de/weblog/2006/06/etags_in_aggregators/#237</guid>
      <pubDate>Thu, 22 Jun 2006 20:03:16 GMT</pubDate>
    </item>
    <item>
      <title>Comment on "ETags Support in Aggregators"</title>
      <link>http://dekstop.de/weblog/2006/06/etags_in_aggregators/#236</link>
      <description><![CDATA[<p>Martin,</p>

<p>Unfortunately, polite informative emails to developers don't always work. Here's the email I sent--on February 3 of this year--to the CTO of the very same company whose product has been causing you grief:</p>

<p>-- </p>

<p>Jay,</p>

<p>Here's some background info on one issue I'd like to discuss, which is support for ETags to reduce bandwidth consumption:</p>

<p>I noticed that NewsClip doesn't seem to support ETag/If-None-Match caching, which is super-easy to implement and is a HUGE bandwidth saver, so (as someone pumping out lots of the same data to NewsClip users) I'd like to request that feature.</p>

<p>There's a good discussion of it here: </p>

<p>http://www.kbcafe.com/rss/rssfeedstate.html#entitytags</p>

<p> but here's the super-simple version:</p>

<p>The only thing NewsClip needs to do is look for an HTTP header in the response from the server that looks like this:</p>

<p>ETag: "574671cf42ca4f3beee74d05c0ddff75a"</p>

<p>and store the quoted value, then include a header on subsequent requests that looks like this:</p>

<p>If-None-Match: "574671cf42ca4f3beee74d05c0ddff75a"</p>

<p>(where the long hex number is whatever was returned in the ETag.)</p>

<p>In many cases, this is a 99.9% reduction in bandwidth utilization for the feed, which is obviously good. :-)</p>

<p>Regards,<br />
Charlie</p>

<p>-- </p>

<p>I still get lots of requests from Virtual Reach Newsclip, none of which support ETag/If-None-Match.</p>
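
<p>For what it's worth, the client side described in the email above can be sketched in a few lines of Python. This is only a sketch under assumptions: <tt>fetch_feed</tt>, the dict-based <tt>cache</tt> and the <tt>opener</tt> parameter are hypothetical names made up for illustration, and I have no idea how NewsClip is actually built. The one library detail worth knowing is that <tt>urllib</tt> reports a 304 by raising <tt>HTTPError</tt>.</p>

```python
import urllib.error
import urllib.request


def fetch_feed(url, cache, opener=urllib.request.urlopen):
    """Fetch url, reusing the cached body when the server answers 304.

    cache maps url -> (etag, body); a plain dict, purely for illustration.
    opener is injectable so the function can be tried without a network.
    """
    request = urllib.request.Request(url)
    cached = cache.get(url)
    if cached is not None:
        # Send back the exact quoted value the server gave us last time.
        request.add_header("If-None-Match", cached[0])
    try:
        with opener(request) as response:
            body = response.read()
            etag = response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 304 Not Modified.
        if err.code == 304 and cached is not None:
            return cached[1]
        raise
    if etag:
        # Remember the tag and body for the next request.
        cache[url] = (etag, body)
    return body
```

<p>Calling <tt>fetch_feed(url, cache)</tt> twice with the same dict sends the stored tag on the second request, so an unchanged feed costs only a set of headers rather than a full download.</p>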

<p>-Charlie<br />
</p>]]> &lt;p&gt;- <![CDATA[<a href="http://globelogger.com" rel="nofollow">Charlie Wood</a>]]>&lt;/p&gt;</description>
      <dc:creator>Charlie Wood</dc:creator>
      <guid isPermaLink="true">http://dekstop.de/weblog/2006/06/etags_in_aggregators/#236</guid>
      <pubDate>Thu, 22 Jun 2006 19:55:57 GMT</pubDate>
    </item>


    <item>
      <title>ETags Support in Aggregators</title>
      <link>http://dekstop.de/weblog/2006/06/etags_in_aggregators/</link> 
      <description><![CDATA[<p>Did you notice Sam Ruby's new <a href="http://www.intertwingly.net/blog/2006/06/05/Elevator-Pitch">preoccupation with ETags</a>? When he's talking with founders about their new web services, "the first thing I ask is; 'do you support <a href="http://www.pocketsoap.com/weblog/stories/2002/05/0015.html">ETags</a>?'"</p>

<p>I'm so glad that he's doing that, and talking about it publicly. I've been a web developer for a number of years now, and from the beginning I knew about some basic caching issues and about the <tt>HTTP 304 (Not Modified)</tt> response -- but it took me a while to figure out that in my scripts it's my responsibility to send this response.</p>

<p>Request caching on this level is simply something most people don't think about when they develop web applications, and I'm glad that, thanks to Sam, this may change.</p>
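
<p>As a rough illustration of what sending a 304 from a script involves, here is a minimal Python sketch. The names <tt>make_etag</tt> and <tt>conditional_response</tt> are hypothetical, hashing the body with SHA-256 is just one assumed way of deriving a tag, and the request headers are assumed to arrive as a plain dict with canonical capitalisation. It is not the code running on this site.</p>

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive an ETag from the response body; HTTP requires the quotes."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def conditional_response(request_headers: dict, body: bytes):
    """Return (status, headers, payload) for a GET, honouring If-None-Match."""
    etag = make_etag(body)
    candidates = []
    for tag in request_headers.get("If-None-Match", "").split(","):
        tag = tag.strip()
        if tag.startswith("W/"):
            # If-None-Match uses weak comparison, so ignore the weak marker.
            tag = tag[2:]
        candidates.append(tag)
    if etag in candidates or "*" in candidates:
        # The client already holds this version: send headers only, no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

<p>The first request gets a 200 with the full body and an <tt>ETag</tt> header; a repeat request that echoes the tag in <tt>If-None-Match</tt> gets a 304 with an empty payload until the content changes.</p>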

<p>Yesterday at 19:00 local time I sent out this email:</p>

<blockquote>
<p>From: Martin Dittus<br/>
To: info@(domain)<br/>
Subject: Your crawler is _very_ impolite</p>

<p>I'm the owner and webmaster of the domain dekstop.de.</p>

<p>Since yesterday morning I've been getting thousands of hits by your "Virtual Reach Newsclip Collector" aggregator from sp.virtualreach.com. These were requests to virtually all comment feeds from blog articles I offer on my domain. Apparently someone imported the OPML file I offer that links to all those feeds.</p>

<p>Which is all fine and dandy.</p>

<p>But when I looked at my logfiles today I thought you guys must be kidding... You can't just request thousands of feeds per day, and then not support ETags/If-modified-since, which means every request results in a full download of the respective file! And you don't even seem to request robots.txt to allow webmasters control over such requests.</p>

<p>The result: today alone (it's 7pm local time) there were ca. 30MB traffic from my domain to sp.virtualreach.com, which means you will use up nearly 1 GB of _my_ traffic per month. For reading feeds that virtually never change.</p>

<p>Fix this ASAP, it's just not polite to waste other people's bandwidth that way.</p>

<p>And by fixing it I don't mean 'remove my site from your aggregator', but I mean that you:<br/>
1. Implement ETags/request caching<br/>
2. start to respect robots.txt files</p>

<p>Regards,<br/>
Martin Dittus</p>
</blockquote>

<p>This morning, just before 8:00, their requests to my domain stopped. There has been no reply to my email yet.</p>

<p>I wonder what their developers are doing right now.</p>

<h3>Related Articles</h3>
<ul class="links">
  <li><a href="http://dekstop.de/weblog/2006/03/added_article_feeds_with_comments/">Added Article Feeds with Comments</a></li>
  <li><a href="http://dekstop.de/weblog/2006/03/feed_readers_a_commodity/">Feed Readers Are a Commodity -- If Not Now, then Soon.</a></li>
  <li><a href="http://dekstop.de/weblog/2006/02/recursive_ruby_opml_parser/">Parsing an OPML Document Recursively With Ruby While Preserving Its Structure</a></li>
  <li><a href="http://dekstop.de/weblog/2006/01/revisiting_aggregators_pt_one/">Revisiting Aggregators Part I: User-Designed Interfaces</a></li>
  <li><a href="http://dekstop.de/weblog/2005/12/feedtools_cache_in_ruby_scripts/">Using the FeedTools Cache in Plain Ruby Scripts</a></li>
</ul>]]></description>
      <dc:creator>Martin Dittus</dc:creator>
      <category>commentary</category>
      <category>drop culture</category>
      <category>web services</category>
      
      <guid isPermaLink="true">http://dekstop.de/weblog/2006/06/etags_in_aggregators/</guid>
      <pubDate>Thu, 22 Jun 2006 12:46:46 GMT</pubDate>
    </item>
  </channel>
</rss>

