Wednesday, February 22, 2012

Why LSIDs suck

I'll keep this short: LSIDs suck because they are so hard to set up that many LSIDs don't actually work. Because of this there seems to be no shame in publishing "fake" LSIDs (identifiers that look like LSIDs but don't resolve using the LSID protocol). Hey, it's hard work, so let's just stick them on a web page but not actually make them resolvable. Hence we have an identifier that people don't recognise (most people have no idea what an LSID is) and that nobody expects to actually work. This devalues the identifier to the point where it becomes effectively worthless.
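
To make the gap between "looks like an LSID" and "resolves as an LSID" concrete, here is a rough Python sketch. It only checks the syntax (urn:lsid:authority:namespace:object, with an optional revision) and works out the DNS SRV name that, as I understand the LSID spec, a resolver would look up first. The function names and the example.org identifier are made up for illustration, and nothing here touches the network.

```python
import re

# LSID syntax: urn:lsid:<authority>:<namespace>:<object>[:<revision>]
LSID_PATTERN = re.compile(
    r"^urn:lsid:"
    r"(?P<authority>[^:\s]+):"
    r"(?P<namespace>[^:\s]+):"
    r"(?P<object>[^:\s]+)"
    r"(?::(?P<revision>[^:\s]+))?$",
    re.IGNORECASE,
)


def parse_lsid(text):
    """Return the LSID components as a dict, or None if the syntax is wrong."""
    match = LSID_PATTERN.match(text.strip())
    return match.groupdict() if match else None


def srv_name(lsid):
    """Return the DNS SRV name a resolver would query for this LSID's authority.

    Assumption: resolution starts with an SRV lookup of _lsid._tcp.<authority>.
    """
    parts = parse_lsid(lsid)
    if parts is None:
        raise ValueError(f"not a syntactically valid LSID: {lsid!r}")
    return f"_lsid._tcp.{parts['authority']}"


# Hypothetical identifier, used only to show the shape of the data.
example = "urn:lsid:example.org:names:12345"
print(parse_lsid(example))
print(srv_name(example))
```

A "fake" LSID sails through parse_lsid. The hard part, and the part people skip, is everything after srv_name: having that DNS record exist and an LSID service answering behind it.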

Now consider URLs. If you publish a URL I expect it to work (i.e., I paste it into a web browser and I get something). If it doesn't work then I can conclude that the URL is wrong, or that you are a numpty and can't run a web site (or don't care enough about your content to keep the URL working). At no point am I going to say "gee, it's OK that this URL doesn't resolve because these things are hard work."

Now you might argue that whether your LSID resolves is an even better way for me to assess your technical ability (because it's hard work to do it right). Fair enough, but the fact that even major resources (such as Catalogue of Life) can't get them to work reliably reduces the value of this test (it's a poor predictor of the quality of the resource). Or, perhaps the LSID is a signal that you get this "globally unique identifier thing" and maybe one day will make the LSIDs work. No, it's a signal you don't care enough about identifiers to make them actually work today.

As soon as people decided it's OK to publish LSIDs that don't work, LSIDs were doomed. The most immediate way for me to determine whether you are providing useful information (resolving the identifier) is gone. And with that goes any sense that I can trust LSIDs.