Hate my tribe. Hate them for even asking why nobody uses library standards in the larger world, when “brain-dead inflexibility in practice” is one obvious and compelling reason, and “incomprehensibility” is another.
... $DEITY have mercy, OpenURL is a stupid spec. Great idea, and useful in spite of itself. But astoundingly stupid. Ranganathan preserve us from librarians writing specs! - Caveat Lector
OK, we're on a roll. After adding the Journal of Arachnology and Psyche to my OpenURL resolver, I've now added the American Museum of Natural History's Bulletins and Novitates.
In an act of great generosity, the AMNH has placed its publications on a freely accessible DSpace server. This is a wonderful resource provided by one of the world's premier natural history museums (and an example others should follow). It is especially valuable given that volumes of the Bulletins and Novitates published after 1999 are also hosted by BioOne (and hence have DOIs), but those versions are not free.
As blogged earlier on SemAnt, getting metadata from DSpace in an actually usable form is a real pain. I ended up writing a script to pull everything off via the OAI interface, extract metadata from the resulting XML, do a DOI look-up for post-1999 material, then dump the results into the MySQL server so my OpenURL service can find them.
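For illustration only, the "extract metadata from the resulting XML" step might look something like the Python sketch below. It is not the script described above. The sample response is invented (it is not real AMNH data), the function name `parse_list_records` is hypothetical, and the only things taken as given are the standard OAI-PMH and Dublin Core namespaces. A real harvester would fetch each page over HTTP and keep requesting with the resumption token until none is returned.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Standard OAI-PMH and Dublin Core namespaces.
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
DC_PREFIX = "{http://purl.org/dc/elements/1.1/}"

# Invented sample of a ListRecords response (assumption: not real AMNH data).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1234/5678</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A hypothetical spider revision</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:date>2005-10-05T21:50:39Z</dc:date>
          <dc:date>2005-10-05T21:50:39Z</dc:date>
          <dc:date>1990</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>hypothetical-token-100</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords page.

    Each record is a dict mapping a Dublin Core element name to the list
    of all values found, because DSpace may repeat the same element.
    """
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        fields = defaultdict(list)
        oai_id = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        fields["oai_identifier"].append(oai_id.strip())
        for el in rec.iter():
            # Keep only elements in the Dublin Core namespace.
            if el.tag.startswith(DC_PREFIX):
                name = el.tag[len(DC_PREFIX):]
                fields[name].append((el.text or "").strip())
        records.append(dict(fields))
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS).strip()
    return records, (token or None)


records, token = parse_list_records(SAMPLE)
print(records[0]["title"], records[0]["date"], token)
```

Keeping every repeated value in a list, rather than overwriting it, is deliberate: deciding which of several identical tags means what is a separate clean-up step.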
Apart from the tedium of having to find the OAI interface (why oh why do people make this harder than it needs to be?), the metadata served up by the AMNH is, um, a little ropey. They use Dublin Core, which is great, but the AMNH makes a hash of using it. Dublin Core provides quite a rich set of terms for describing a reference, and guidelines on how to use them. The AMNH uses the same tag for different things. Take date, for example, where a single record can carry three <dc:date> fields.
Now, one of these dates is the date of publication; the others are dates the metadata was uploaded (or so I suspect). So, why not use the appropriate terms? Like, for instance, <dcterms:created>. Why do I have to parse three fields, and intuit that the third one is the date of publication? Likewise, why have up to three <dc:title> fields, and why include an abbreviated citation in the title? And why, for the love of God, format that citation differently for different articles!? Why have multiple <dc:description> fields, one of which is the abstract (for which <dcterms:abstract> is available)? It's just a mess, and it's very annoying (as you can probably tell). I can see why some hate library standards.
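The date guessing complained about above could be sketched roughly like this in Python. The heuristic is an assumption, not something the AMNH metadata documents: it supposes that upload dates appear as full ISO 8601 timestamps while the publication date is a bare year or a shorter date. The function name and the sample values are invented for illustration.

```python
import re

# Assumption: repository upload dates look like full UTC timestamps.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")
# Assumption: publication dates are YYYY, YYYY-MM or YYYY-MM-DD.
PLAIN_DATE = re.compile(r"^(\d{4})(?:-\d{2}){0,2}$")


def guess_publication_year(dates):
    """Return the year of the first non-timestamp date, or None."""
    for value in dates:
        value = value.strip()
        if TIMESTAMP.match(value):
            continue  # probably an upload date, skip it
        match = PLAIN_DATE.match(value)
        if match:
            return int(match.group(1))
    return None


# Hypothetical values for the three <dc:date> fields of one record.
print(guess_publication_year(["2005-10-05T21:50:39Z", "2005-10-05T21:50:39Z", "1990"]))
```

Returning None rather than guessing when nothing matches leaves the odd records to be fixed by hand.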
Anyway, after much use of Perl regular expressions, and some last-minute finessing with Excel, I think we now have the AMNH journals available through OpenURL.
For a demo, go to David Shorthouse's list of references for spiders, pick a letter (say, P), and click on the bioGUID symbol next to a paper by Norm Platnick in American Museum Novitates.