del.icio.us caching/embed script

Introduction

I wanted to collect links with del.icio.us and have them display in the sidebar on my own site. Looking at the "about" page, I saw:

You can fetch your links in a simple HTML feed; documentation is at /doc/html.

Great! So all I have to put in is include "http://del.icio.us/html/fridgemagnet";, right?

Please do not include it with an <IFRAME> as this will cause one hit to del.icio.us per one hit on your site. Likewise, please don't fetch it every time via PHP. This sort of behavior will be considered abusive.

Oh. Well, that's fair enough I suppose.

Since I was learning PHP at the time, I saw this as an opportunity to learn a bit more about file operations. I read the PHP docs and googled first, of course, and looked at a few methods that other people had used. The most popular seemed to be using PHP to call wget and caching the result (e.g. Bill, but she's a girl). I tried this but found that, er, I couldn't use wget on my server. Terrific. A workaround was necessary. Richard Eriksson uses a library called Magpie to parse the RSS feed, but I wanted to do it all myself.

So I wrote the following.

Code

<?php
// Script to grab and cache HTML at regular intervals
// Designed for use with del.icio.us but could be used for anything
// fridgemagnet, 6 October 2004

// Monday 11 October 2004 - added error document, put filenames
// in variables to clean up code.
// 2004/10/13, 9:26 PM - added $grabbed flag so that "more" doesn't
// appear if grab fails. What would be the point? If there's no grab,
// there's no page to go to.
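
// Security note: register_globals should be off. It can't be changed
// with ini_set() at runtime, so set it in php.ini or the server config
// (the setting was removed altogether in PHP 5.4).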

// Cache file
$cachefile = $_SERVER["DOCUMENT_ROOT"]."/tools/delicious.html";

// Location of error document to use if can't cache
$errordoc = $_SERVER["DOCUMENT_ROOT"]."/error/delicious_fail.html";

// Username - build URLs from this
$username = "fridgemagnet";
$pageurl = "http://del.icio.us/".$username;
$feedurl = "http://del.icio.us/html/" . $username .
              "?count=6&rssbutton=no&tags=no&bullet=";

// Minimum time between caching, in seconds
$cachetime = 30*60;

// When was it last cached? Treat a missing cache file as expired.
$age = file_exists($cachefile) ? (time() - filemtime($cachefile))
                               : $cachetime + 1;

// If it needs to be updated locally, do so.
if ($age > $cachetime) {
    // Set user agent to something descriptive
    ini_set('user_agent', 'fridgemagnet.org.uk include cache robot');
    // If can't grab feed, grab error doc instead
    // Set $grabbed to be the result of this, to add "more" link only
    // if successful.
    // Suppress errors from fopen with @ prefix
    if (!($grabbed = ($in = @fopen($feedurl, "r")))) {
        $in = fopen($errordoc, "r");
    }
    $out = fopen($cachefile, "w");
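    // Read in contents of whatever file was picked, and output to
    // cache file. Skip the copy if neither source could be opened:
    // feof() on an invalid handle never returns true, so the loop
    // would not end.
    if ($in) {
        while (!feof($in)) {
            $line = fgets($in, 1024);
            fwrite($out, $line);
        }
        fclose($in);
    }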
    // Put a "more" link at the bottom if the grab was successful
    if ($grabbed) {
        fwrite($out, "<div class=\"delPost\">");
        fwrite($out,
        "<a href=\"$pageurl\" title=\"del.icio.us page\">more...</a></div>");
    }
    fclose($out);
}

// Include the contents of whatever is in cache right now
include $cachefile;
?>

How it works

This is a simple script. You call the above using a standard PHP include directive in the page you want to embed it in. The first thing it does is check the current age of the cache file, using filemtime(). The minimum time between caches is hardcoded as $cachetime.
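
The age check can also be wrapped in a small function. This is only a sketch: cache_is_stale() is a made-up name, not part of the script above, and unlike the original check it treats a missing cache file as expired rather than letting filemtime() complain.

```php
<?php
// Hypothetical helper, not part of the original script.
// Returns true when $file is missing or older than $maxAge seconds.
function cache_is_stale($file, $maxAge)
{
    // Drop PHP's cached stat data so a file rewritten earlier in
    // this request reports its new modification time.
    clearstatcache();
    if (!file_exists($file)) {
        return true;
    }
    return (time() - filemtime($file)) > $maxAge;
}
```

With that in place, the test in the script would read if (cache_is_stale($cachefile, $cachetime)) { ... }.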

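If the cache file was last updated longer ago than that, the script refreshes it. It opens the source URL with fopen() (which needs allow_url_fopen enabled), reads the entire feed, and writes it to the local cache file. If the feed can't be opened, it copies the error document into the cache instead and leaves off the "more" link.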
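
For comparison, here is a rough sketch of the same refresh step done with file_get_contents() and a temporary file. It is not what the script above does: refresh_cache() and the ".tmp" suffix are names I made up for illustration, and on failure it leaves the old cache alone instead of writing the error document.

```php
<?php
// Hypothetical helper, not part of the original script.
// Copies $source (a URL or a local path) into $cachefile.
// Returns true on success; on failure the old cache is left alone.
function refresh_cache($source, $cachefile)
{
    // Fetching a URL this way needs allow_url_fopen, like fopen().
    $html = @file_get_contents($source);
    if ($html === false) {
        return false;
    }
    // Write to a temporary file next to the cache first...
    $tmp = $cachefile . '.tmp';
    if (file_put_contents($tmp, $html) === false) {
        return false;
    }
    // ...then swap it into place, so a visitor should not be served
    // a half-written cache file.
    return rename($tmp, $cachefile);
}
```

One thing to watch with this variant: if the fetch fails, the cache stays old, so the next page view will try again straight away. Calling touch($cachefile) on failure would keep the retries down to one per $cachetime, which is the whole point of caching here.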

After that, whether or not it just refreshed the cache, it includes the current contents of the HTML cache file, which then appear in your page.
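
One caveat with include: PHP will run any PHP code that turns up in the included file, and the cache file holds whatever the remote server sent. A more cautious way to do the last step, sketched here with a made-up function name, is to read the file as plain text and echo it.

```php
<?php
// Hypothetical helper, not part of the original script.
// Returns the cached HTML as a string without running it as PHP.
// An unreadable or missing cache file gives an empty string.
function cached_html($cachefile)
{
    $html = @file_get_contents($cachefile);
    return $html === false ? '' : $html;
}
```

The last line of the script would then be echo cached_html($cachefile); instead of the include.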

Notes

Originally written 6 October 2004.