Now that the Jangle spec has been released, I've been looking for small, simple projects that show off how easy it is to incorporate Jangle into your applications. My general approach in choosing proof-of-concept projects for Jangle has been to target applications that get the point across without requiring a ton of overhead on my part to either get running or keep running. This is why I chose OpenBiblio: it presents at least a representation of the sorts of data and interfaces that help people wrap their heads around how something actually works.
All along, my original milestone for Jangle 1.0 was to make sure it was useful for third-party discovery interfaces, especially VuFind and Blacklight, since anyone can download these and immediately see the benefits. However, neither of these projects scores well on the effort/impact ratio. I'm not so worried about modifying either project to work with Jangle; they're both probably pretty straightforward, and I can butcher away at the PHP and Ruby (respectively) to at least prove the point. The difficulty lies in the application both projects use to load MARC data into Solr: SolrMARC, a Java app that uses MARC4J to read the MARC records. The keywords here are "Java app": while I checked out the source and poked at it, figuring out how to parse Atom feeds and format identifiers to work with both VuFind/Blacklight and Jangle exceeded the amount of effort I was willing to invest in it.
There was another OSS Solr-based OPAC replacement project that was more approachable, though: fac-back-opac (aka Helios, aka Kobold Chieftain, aka KoChief, my preferred title). Originally introduced by Casey Durfee at Code4lib 2007 as "Open-Source Endeca in 250 Lines or Less", it was written as a way to keep a usable catalog available when the primary OPAC interface had to be taken down for maintenance. It pairs Solr with Django, the popular Python web framework (used in Google App Engine, among other places). Despite having grown well beyond 250 lines (the search template alone is longer than that), it is still a simple, small and approachable OPAC replacement that demonstrates the functionality of VuFind/Blacklight or Primo/Encore in a more self-contained, hackable package.
Getting Jangle integrated into KoChief was pretty simple. It took about three or four days in total, much of which was spent trying to reacquaint myself with Python and to figure out the architecture and conventions of Django. The first customization was to the indexing script. Out of the box, KoChief takes a filename or a URL pointing to a MARC file, parses it with PyMARC, and converts the records into tab-delimited values to load into Solr. Custom libraries can be specified from the command line, so I took the existing MARC loader and modified it to read an Atom feed of MARC records instead.
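To make the "tab-delimited values" step concrete, here is a minimal sketch of turning parsed records into tab-separated rows with a header line. Solr's CSV update handler can read this kind of file when told to use a tab separator; whether KoChief loads its data exactly this way is an assumption, and the field list (FIELDS) is hypothetical rather than KoChief's real schema.

```python
import csv
import io

# Hypothetical field list; KoChief's actual Solr schema may differ.
FIELDS = ["id", "title", "author", "year"]


def records_to_tsv(records):
    """Serialize a list of record dicts as tab-delimited rows with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(FIELDS)
    for record in records:
        # Missing fields become empty cells so every row has the same columns.
        writer.writerow([record.get(field, "") for field in FIELDS])
    return buffer.getvalue()
```

Using the csv module rather than joining strings by hand means any value that happens to contain a tab or newline gets quoted instead of silently shifting columns.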
The differences in behavior that I needed to add were:
The first two were pretty easy: parse the Atom with ElementTree and loop through the entries. Since the focus for our purposes is simply getting working code, we're bypassing any optimization; every record is passed individually to PyMARC. ElementTree also finds the <link rel="next"> element; its href attribute is passed to urllib, and we do it all over again for the next page until all the records are parsed.
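The paging loop described above can be sketched roughly as follows, using ElementTree for the Atom and urllib for the requests. This is a minimal sketch, not the actual loader: it stops at yielding each <entry> element, because how the MARC payload is embedded in a Jangle entry (and so how it gets handed to PyMARC) isn't spelled out here. The fetch parameter is a hypothetical hook so the loop can be exercised without a live server.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_page(xml_bytes):
    """Return (entries, next_href) for one page of an Atom feed."""
    feed = ET.fromstring(xml_bytes)
    entries = feed.findall(ATOM + "entry")
    next_href = None
    for link in feed.findall(ATOM + "link"):
        if link.get("rel") == "next":
            next_href = link.get("href")
            break
    return entries, next_href


def fetch_url(url):
    """Default fetcher: download one feed page with urllib."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def iter_entries(start_url, fetch=fetch_url):
    """Yield every <entry> element across all pages, following rel="next" links."""
    url = start_url
    while url:
        entries, next_href = parse_page(fetch(url))
        yield from entries  # each entry would then be passed to PyMARC individually
        # Resolve a relative href against the current page; stop when there is no next link.
        url = urljoin(url, next_href) if next_href else None
```

Because iter_entries is a generator, only one page of the feed is held in memory at a time, which matters when walking an entire catalog.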
Indexing performance isn't horrible at around 60 records per second from the Jangle demo server, but there is definitely room for improvement. Realistically, the majority of indexing after the initial record load would be incremental, so this shouldn't be much of a showstopper.
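To put the 60 records per second figure in perspective, a quick back-of-the-envelope calculation helps. The collection size below is purely hypothetical, and the estimate assumes the rate stays constant for the whole load.

```python
def indexing_time_minutes(record_count, records_per_second=60):
    """Rough wall-clock estimate for a full load at a constant indexing rate."""
    return record_count / records_per_second / 60


# A hypothetical 100,000-record collection at the ~60 records/second observed:
print(round(indexing_time_minutes(100_000), 1))  # 27.8
```

Roughly half an hour for a full load of that size is tolerable as a one-time cost, which is why incremental updates matter more than raw throughput here.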
Initially, I had planned on using MARCXML from Jangle (since it's the default), but PyMARC was throwing a lot of errors on records with diacritics. It remains to be seen whether this is a shortcoming of PyMARC's XML parsing or something wrong with the way I am creating MARCXML in Jangle; further experimentation is needed here. However, since it's just as easy to get MARC21 binary from Jangle, I simply fell back to that. No more complaints from PyMARC.
For the identifiers, I needed a way to preserve the Jangle URIs without completely ruining KoChief's URL structure. Obviously something like http://kochief.example.org/record/http%3A%2F%2Fjangle.example.org%2Fcata... is a non-starter; nobody wants to see that in their location bar. Since I wanted KoChief to be able to serve more than one Janglified ILS (for a consortial union catalog, for example), I settled on a convention like djo:openbiblio:resources:4540, where "djo" is an identifier for "demo.jangle.org". This could be shortened even further (why stop at the host name?) since it's all mapped in settings.py, but it was fine for now. KoChief was using the value of the MARC 035 field as the Solr document identifier (Gabriel Farrell is now KoChief's principal developer, and Drexel, his employer, is a III library), so I modified the 'get_record' method to use the new URI-based identifier rather than pulling it from the MARC record.
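The prefix convention can be sketched as a pair of functions that translate between a full Jangle URI and the compact, URL-friendly form. This is an illustration only: the JANGLE_SERVICES dict stands in for whatever is actually mapped in settings.py, and the URI layout (host, then openbiblio/resources/4540 as path segments) is an assumption inferred from the djo:openbiblio:resources:4540 example.

```python
from urllib.parse import urlsplit

# Hypothetical settings.py-style mapping: short prefix -> Jangle base URL.
JANGLE_SERVICES = {
    "djo": "http://demo.jangle.org",
}


def uri_to_id(uri):
    """Collapse a Jangle URI into a compact 'prefix:segment:segment' identifier."""
    parts = urlsplit(uri)
    base = f"{parts.scheme}://{parts.netloc}"
    for prefix, service in JANGLE_SERVICES.items():
        if service.rstrip("/") == base:
            segments = [s for s in parts.path.split("/") if s]
            return ":".join([prefix] + segments)
    raise ValueError(f"No prefix configured for {uri}")


def id_to_uri(record_id):
    """Expand a compact identifier back into the full Jangle URI."""
    prefix, *segments = record_id.split(":")
    return JANGLE_SERVICES[prefix].rstrip("/") + "/" + "/".join(segments)
```

Adding another Janglified ILS for a union catalog would then just mean adding another prefix to the mapping; the identifiers stay readable in the location bar and remain reversible.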
Once the data was in Solr, it was time to modify the Django app to pull availability status from Jangle. I added a method to helios/discovery/views.py that grabs the Items associated with a given Resource from Jangle and adds the results to the records dict of the Solr search results. All told, this method plus the code to call it and deal with the results came to around 75 lines.
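The shape of that view-side step might look something like the following sketch: for each Solr result, fetch the Resource's Items feed and attach a parsed list to the record dict. The fetch_items_feed callable is hypothetical (it would map the compact identifier back to a Jangle URI and request the Items), and since the exact element that carries item status in a Jangle feed isn't described here, this only pulls each entry's Atom id and title.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_items(feed_xml):
    """Pull a minimal view of each Item entry: its Atom id and title."""
    feed = ET.fromstring(feed_xml)
    return [
        {"id": entry.findtext(ATOM + "id"), "title": entry.findtext(ATOM + "title")}
        for entry in feed.findall(ATOM + "entry")
    ]


def attach_items(records, fetch_items_feed):
    """Add an 'items' list to each Solr result dict.

    fetch_items_feed is a hypothetical callable that takes a record id and
    returns the Atom feed (bytes or str) of Items for that Resource.
    """
    for record in records:
        record["items"] = parse_items(fetch_items_feed(record["id"]))
    return records
```

Passing the fetcher in as an argument keeps the HTTP call out of the parsing logic, so the view can swap in caching or a different Jangle server without touching this code.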
Once I figured out that Django's templating language pretty much forbids any sort of data processing, incorporating the availability status was pretty simple: one block (8 lines) in the search template and 13 lines in the "full record" template to show all of the holdings information.
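Because the template can't do the processing, any summarizing has to happen in Python before the data reaches it. A minimal sketch of that idea: precompute display-ready holdings figures so the template only prints them. The "status" key on each item and the "available" status value are hypothetical, not part of any documented Jangle or KoChief structure.

```python
from collections import Counter


def summarize_holdings(items):
    """Precompute display-ready holdings info so the template only has to print it."""
    counts = Counter(item.get("status", "unknown") for item in items)
    return {
        "total": len(items),
        "available": counts.get("available", 0),
        "by_status": sorted(counts.items()),
    }
```

A view could store the result on each record (for example as record["holdings"]), leaving the template with nothing more than simple lookups and a loop over by_status.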
Really, about a quarter of the lines added to integrate Jangle into KoChief are in the CSS stylesheet, and the code could definitely be made smaller by somebody who knows Python, Django and KoChief a little better. Not a bad demonstration of how simple Jangle makes getting at your data, really.