Archive for the ‘Perl’ Category


Improving DaveyP’s HIP link tracker

2007-04-26

DaveyP came up with a nice way of logging outgoing links from 856 tags.

I’ve improved it, at least as far as our needs are concerned:

  • If the user is behind a proxy that sends the HTTP X-Forwarded-For header, we record the user’s real IP address.
  • Since we have more than one HIP server, we record the name of the HIP server.
  • Since our profiles represent individual, autonomous libraries that care most about their own users, we record the profile code.
  • Timestamps are now in ISO 8601 format. (The IP and timestamp changes are sketched below.)
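
Here’s a minimal sketch of the IP and timestamp changes. This is an illustration, not the actual hiplink code; the use of SERVER_NAME and the hard-coded profile are placeholders for however hiplink really gets those values.

#!/usr/bin/perl
# Sketch of the logging changes: X-Forwarded-For handling and ISO 8601 stamps.
use strict;
use warnings;
use POSIX qw(strftime);

# Prefer the proxy-supplied X-Forwarded-For header; the first address in
# the chain is the client's real IP. Fall back to REMOTE_ADDR otherwise.
my $ip = $ENV{HTTP_X_FORWARDED_FOR} || $ENV{REMOTE_ADDR} || 'unknown';
$ip =~ s/,.*//;    # keep only the first address in a forwarding chain

# ISO 8601 timestamp, e.g. 2007-04-26T09:15:00
my $timestamp = strftime('%Y-%m-%dT%H:%M:%S', localtime);

# Which HIP server and which profile served the click. SERVER_NAME and
# the hard-coded profile code are stand-ins.
my $hip_server = $ENV{SERVER_NAME} || 'unknown';
my $profile    = 'abc';    # hypothetical profile code

print join("\t", $timestamp, $hip_server, $profile, $ip), "\n";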

You can get what I’m audaciously calling version 1.10 here: http://www.tblc.org/~ostrowb/hiplink-1.10.pl.txt

Under the terms of DaveyP’s license, hiplink 1.10 is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.5 License.


A new plugin for Nagios

2006-09-07

There’s nothing particularly library-oriented about my latest project. On the other hand, libraries send email, and it’s a good idea to make sure nobody mistakes it for spam. That’s why I created check_dnsbl, a Nagios plugin that checks whether an IP address is listed in a DNS blocklist as a source of spam (or under various other categories).
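
The underlying trick is tiny. Here’s a stripped-down sketch of a single DNSBL lookup (just the core idea, not check_dnsbl itself): reverse the IPv4 octets, append the blocklist zone, and do an ordinary DNS lookup. Any answer at all means the address is listed.

#!/usr/bin/perl
# Sketch of one DNSBL lookup; address and zone come from the command line.
use strict;
use warnings;

my ($ip, $zone) = @ARGV;
die "usage: $0 <ipv4-address> <dnsbl-zone>\n"
    unless $ip && $ip =~ /^\d{1,3}(\.\d{1,3}){3}$/ && $zone;

# 192.0.2.1 checked against zone example-dnsbl.org becomes a lookup
# of 1.2.0.192.example-dnsbl.org.
my $query = join('.', reverse split /\./, $ip) . ".$zone";

if (gethostbyname($query)) {
    print "CRITICAL: $ip is listed in $zone\n";
    exit 2;    # Nagios exit code for CRITICAL
}
print "OK: $ip is not listed in $zone\n";
exit 0;        # Nagios exit code for OK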

Nagios is a great tool for sysadmins. It’s extensible (as check_dnsbl demonstrates) and highly configurable, but it’s also a pain in the neck to set up properly. On the other hand, when you’re running a cutting-edge Z39.50 network, it helps to know when one of your servers needs attention. And when you’ve got dozens of servers to think about (or even a dozen PCs), you’ll save time in the long run by investing in Nagios setup.

And yes, of course, the code is available under the GPL.


Our local xISBN cache is working

2006-06-09

Using the format of OCLC's xISBN service, I've built a local xISBN responder. You can feed it any ISBN you like. As with OCLC's version, if it's a valid ISBN that's in our catalog, you'll get an XML-formatted list of all related items in our catalog.

For example:

http://helpdesk.tblc.org/xisbn/0439136350

returns this:

<?xml version="1.0" encoding="UTF-8" ?>
  <idlist>
    <isbn>0439136350</isbn>
    <isbn>0439136369</isbn>
    <isbn>043965548X</isbn>
    <isbn>0786222743</isbn>
    <isbn>0807282316</isbn>
    <isbn>0807282316</isbn>
    <isbn>0807282324</isbn>
    <isbn>0807283150</isbn>
    <isbn>0807286028</isbn>
    <isbn>8478885196</isbn>
    <isbn>8478886559</isbn>
  </idlist>

This makes me ridiculously happy. Now I can learn a bit of Ajax that will query my xISBN responder and construct links to related items.
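
If you're wondering what such a responder looks like inside, here's a rough sketch of the idea. The table layout, DSN, and credentials are placeholders rather than the production values.

#!/usr/bin/perl
# Sketch of an xISBN-style responder backed by a local group-ID table.
use strict;
use warnings;
use CGI;
use DBI;

my $cgi = CGI->new;

# The ISBN arrives as extra path info, e.g. /xisbn/0439136350
my ($isbn) = ($ENV{PATH_INFO} || '') =~ m{^/(\d{9}[\dXx])$};

# isbn_groups(isbn, group_id) is the cache table described in the
# next post; the DSN and credentials here are placeholders.
my $dbh = DBI->connect('dbi:Sybase:server=SUNLINE', 'user', 'password',
                       { RaiseError => 1 });

print $cgi->header(-type => 'text/xml', -charset => 'UTF-8');
print qq{<?xml version="1.0" encoding="UTF-8" ?>\n<idlist>\n};

if ($isbn) {
    # Find the group for this ISBN, then list every ISBN in that group.
    my $isbns = $dbh->selectcol_arrayref(
        q{SELECT b.isbn
            FROM isbn_groups a
            JOIN isbn_groups b ON a.group_id = b.group_id
           WHERE a.isbn = ?
           ORDER BY b.isbn},
        undef, uc $isbn);
    print qq{  <isbn>$_</isbn>\n} for @$isbns;
}
print qq{</idlist>\n};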

Caveats:

  1. I didn't bother writing XHTML responses, so adding .html to the URL will not do anything interesting.
  2. If you ask for an ISBN that's not in our catalog, you won't get very useful results.

Still… not bad for a day's work. What do you think, sirs?


Creating a locally relevant xISBN cache

2006-05-25


OCLC has a neat service (xISBN) that will tell you which ISBNs are associated with other editions of a given work. For example, if you ask it about the ISBN of a hardcover edition, it will tell you the ISBNs for the paperback, the large print edition, the Braille edition, and so on.

I have a database that contains a slice of our catalog: it includes every ISBN in our catalog.

Yesterday I wrote a script that goes through a list of all the ISBNs in our catalog and asks the xISBN server what other ISBNs are associated with each one. Then it compares each resulting list of ISBNs to the ones already on our system. It assigns a group ID to each cluster so that I can later write a script that does a sort of localized xISBN query on the cached results. This way we can get only the ISBNs relevant to our catalog so that we can generate an Amazon-like list of references to other versions in our catalog. In fact, just for the sake of standards (and bloody-mindedness) I think I'll write an xISBN interface for our little cached slice of the world.
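
In outline, the clustering pass looks like this. The hard-coded ISBN list and the xISBN URL are placeholders for illustration; the real script reads from and writes to the database.

#!/usr/bin/perl
# Outline of the clustering pass: fetch related ISBNs, keep the local
# ones, and stamp the whole cluster with one group ID.
use strict;
use warnings;
use LWP::Simple qw(get);

# In the real script this list comes from the catalog database.
my @catalog_isbns = qw(0439136350 0807282316 0807282324);
my %in_catalog    = map { $_ => 1 } @catalog_isbns;

my %group_of;    # isbn => group ID, to be written back to the cache table
my $next_group = 1;

for my $isbn (@catalog_isbns) {
    next if $group_of{$isbn};    # already clustered via a related ISBN

    # Ask xISBN for every ISBN associated with this work, then keep
    # only the ones that are both local and not yet grouped.
    my $xml = get("http://labs.oclc.org/xisbn/$isbn") || '';
    my @local = grep { $in_catalog{$_} && !$group_of{$_} }
                $xml =~ m{<isbn>([\dXx]+)</isbn>}g;

    my $gid = $next_group++;
    $group_of{$_} = $gid for $isbn, @local;
}

printf "%s => group %d\n", $_, $group_of{$_} for sort keys %group_of;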

Our OPAC, HIP, is reasonably customizable, but I think this goes beyond what its designers planned for. So, following 's advice, I'll learn some AJAX to inject the links into the catalog's web page.

To paraphrase Zippy the Pinhead: Are we yet?

I started the script yesterday at 6pm. It's still running (186,000 records have been associated with groups so far), and OCLC hasn't threatened me with any specific harm yet, so I'm cautiously optimistic.

Oh, and since Technorati doesn't seem to like my other blog, I've decided to move all library-related blogging to this site. Here's hoping this finally shows up on the code4lib blogroll.


Stylin’

2006-05-01

With #code4lib's help, I've now got an XSLT stylesheet that lets browsers render the existing RSS feed as something resembling HTML. This is a big step for the work I'm planning, since the same technique will carry over to the rest of the XSLT work.
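
For anyone trying the same trick: the mechanism is a single xml-stylesheet processing instruction emitted before the feed. A sketch (the stylesheet path is a placeholder):

# Browsers that fetch the feed will apply the referenced XSLT and
# render the RSS as HTML.
print "Content-type: application/rss+xml\n\n";
print qq{<?xml version="1.0" encoding="UTF-8"?>\n};
print qq{<?xml-stylesheet type="text/xsl" href="/xsl/recent.xsl"?>\n};
# ...followed by the <rss> document itself.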

Huzzah! I really need to cacheize the script so I can share the URL with everyone. I've gotten a lot done today and I'm proud of it.

From an email to the stakeholders (hey guys, put those stakes down, I haven't even got my game face on):

Here's what I did today:

http://[server name deleted]/cgi-bin/recent.pl?location=fyh&output=rss

Yes, it looks very much like the old version.

What's noteworthy about this version:

  • It's actually an RSS feed, meaning that you can paste the URL into Bloglines or any other RSS aggregator. (But please don't; I need to make the script cache its results or else we'll have loads of SQL queries bogging down Sunline.)
  • It also has an XSLT stylesheet attached, which lets us skin it so that it looks just like any other TBLC page or SunCat page. (Theoretically. But then, this was all theoretical to me last week!)
  • It takes input directly from the web ("location=fyh"), strips it of all potentially dangerous code, and uses that input to affect the presentation ("at Remington College") as well as the results of the SQL query. (Sketch below.)
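
That stripping is nothing fancy; here's the whitelist idea. The parameter name is from the URL above, and the length limit is arbitrary.

use strict;
use warnings;
use CGI;

my $cgi = CGI->new;

# Whitelist the location code: word characters only, reasonable length.
# Anything else is rejected outright rather than cleaned up.
my ($location) = ($cgi->param('location') || '') =~ /^(\w{1,8})$/;
die "Bad location parameter\n" unless defined $location;

# $location is now safe to display and to bind into the SQL query
# (via a placeholder, never by string interpolation).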

I can apply this knowledge to make this much more useful, and allow users to get exactly what they want from SunCat. My to-do list for this script follows. (Some of it is notes to myself.)

  • Cache the result so we don't query Sybase every time an aggregator requests the feed. (A caching sketch follows this list.)
  • Cache images and don't display images that don't exist.
  • Find a way to pass the image URL without using the RSS 'description' attribute.
  • Fix the SQL query to remove duplicates.
  • Fix the SQL query to retrieve title, author, etc.
  • Fix the SQL query to limit to the most recent N items.
  • Add the option to turn off images. (Don't do this in recent.pl, but pass images=yes or images=no to the XSLT stylesheet.)
  • Add the option to see only checked-in items.
  • Add the option to see new items at all libraries (or at some of them? multiple location params?)
  • Add the option to restrict the search to a given author authority or subject authority.
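
About that first to-do item: a simple file-based cache would do it. A sketch, where the paths, the TTL, and build_feed() are placeholders for the real script's pieces:

#!/usr/bin/perl
# Serve the cached feed if it is fresh; otherwise rebuild and cache it.
use strict;
use warnings;

my $cache_file = '/var/cache/recent/fyh.rss';
my $ttl        = 15 * 60;    # rebuild at most every 15 minutes

print "Content-type: application/rss+xml\n\n";

if (-e $cache_file && time - (stat $cache_file)[9] < $ttl) {
    open my $fh, '<', $cache_file or die "open: $!";
    print while <$fh>;
} else {
    my $rss = build_feed();    # stand-in for the existing SQL-driven code
    open my $fh, '>', "$cache_file.tmp" or die "open: $!";
    print {$fh} $rss;
    close $fh;
    rename "$cache_file.tmp", $cache_file;    # atomic swap into place
    print $rss;
}

sub build_feed {
    # Placeholder for the real feed builder that queries Sybase.
    return qq{<?xml version="1.0"?><rss version="2.0"><channel/></rss>\n};
}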

If none of the above makes sense, you can loosely translate it as "Ben has been getting his hack on and has learned a lot that will help us do a lot of impressive stuff."

Ben



Success!

2006-05-01

My Perl script returns valid RSS for items added in the last 7 days at any given SunCat library. I need to make it cache results and use the cache, or else I can’t share the URL without getting our database slammed by broken RSS aggregators querying once per minute. But I’ve got valid RSS (the generation is sketched after the list below), which is a good start. Next parts of the project:

  1. Get a scratch PAC working so we can monkey around with it.
  2. Cache-ize the RSS script.
  3. Add more useful information; currently it’s only displaying an ISBN and an image.
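
For the record, the feed generation itself is straightforward with XML::RSS; here’s a sketch with placeholder values (the real script fills them in from the Sybase query):

#!/usr/bin/perl
# Sketch of the RSS generation; every field value below is a placeholder.
use strict;
use warnings;
use XML::RSS;

my $rss = XML::RSS->new(version => '2.0');
$rss->channel(
    title       => 'New items at Remington College',
    link        => 'http://example.org/cgi-bin/recent.pl?location=fyh',
    description => 'Items added to the catalog in the last 7 days',
);

# One add_item per row from the catalog query. Right now the script
# only shows an ISBN and an image (see item 3 above).
$rss->add_item(
    title       => 'ISBN 0439136350',
    link        => 'http://example.org/ipac/search?isbn=0439136350',
    description => '<img src="http://example.org/covers/0439136350.jpg" />',
);

print "Content-type: application/rss+xml\n\n", $rss->as_string;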
