Getting Started with Writing Jangle Connectors

So you have a service that you would like to set up a Jangle API for. You look at the spec. You skim the mailing list. Perhaps you check out some of the frameworks from the Google Code site. Unfortunately, despite the Jangle community's best intentions, you probably only see this:

  1. Find a library-domain application.
  2. ???
  3. Jangle!!!

Here is an attempt to fill in #2. I will attempt to lay out the steps to build a connector in very general terms. Let's leave aside the Jangle Core for the time being. In an ideal Jangle world there would be many, many connector projects for every core project, the core being generally reusable and connectors being specifically reusable.

The important thing to keep in mind for this tutorial is to not think about Atom. When it comes to connectors, the only consideration is the Connector JSON API.

Start Philosophically

Obviously, the first thing you will need to do is identify a service to expose via Jangle. This is most likely a subset of functionality in a given application. For example, an integrated library management system wouldn't itself have a Jangle connector. Instead, there may be a connector for an OPAC service and a different connector for an acquisitions service. Perhaps the ILMS has a course reserves module; this would require yet another connector.

The general rule of thumb is that the four Jangle entities:

  • Actors
  • Collections
  • Items
  • Resources

should be consistent in meaning for the service. Just as important is that these entities are semantically consistent with other connectors that would be used for similar tasks.

For the sake of this HOWTO, we will use an example of an ILMS OPAC service.

Setting up your environment

The only real prerequisites to writing a Jangle connector are:

  1. A web server
  2. Some capacity to parse RESTful URIs and HTTP headers
  3. A programming language that can serialize output to JSON.

There are a few ways to achieve #2. Most web frameworks have some ability to handle this sort of routing: the Zend Framework for PHP; Rails for Ruby; Django for Python; etc. Of course, a web framework isn't necessary: it could all be achieved simply using a .htaccess file and URL rewriting. This is the way the current OpenBiblio PHP Connector works. Using this sort of method, a Jangle connector could be any CGI script.

Basically, expect the following patterns:

/{entity}/
/{entity}/{id}
/{entity}/{id}/{relationship}
/{entity}/-/{category}
/{entity}/{id}/{relationship}/-/{category}

Where {entity} and {relationship} might be: actors, collections, items, or resources.
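
If you are not using a framework's router, the patterns above can be matched with a single regular expression. This is only a rough sketch in Python: the names ROUTE and parse_path are made up for this example, and it assumes numeric identifiers and the four conventional entity path names.

```python
import re

# Assumption: the connector uses the four conventional entity paths.
ENTITIES = ("actors", "collections", "items", "resources")

# Matches /{entity}[/{id}[/{relationship}]][/-/{category}]
ROUTE = re.compile(
    r"^/(?P<entity>[a-z]+)"
    r"(?:/(?P<id>[0-9][0-9,\-]*))?"
    r"(?:/(?P<relationship>[a-z]+))?"
    r"(?:/-/(?P<category>[^/]+))?"
    r"/?$"
)

def parse_path(path):
    """Split a connector path into its parts, or return None if it does not match."""
    match = ROUTE.match(path)
    if match is None:
        return None
    parts = match.groupdict()
    if parts["entity"] not in ENTITIES:
        return None
    if parts["relationship"] is not None:
        # A relationship only makes sense below an identifier.
        if parts["id"] is None or parts["relationship"] not in ENTITIES:
            return None
    return parts
```

A None result is the natural place to send a 404.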

{id} could be a single identifier, a comma delimited list or a hyphen delimited range or a combination:

1234
1234,1235,1236
1234-4321
1234,1243,1250-1260
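
Unpacking an identifier expression might look something like the following Python sketch. The function name expand_ids is hypothetical, and it assumes identifiers are integers and that ranges include both endpoints.

```python
def expand_ids(expression):
    """Expand '1234,1243,1250-1252' into [1234, 1243, 1250, 1251, 1252]."""
    ids = []
    for part in expression.split(","):
        if "-" in part:
            # Assumption: a hyphenated range includes both of its endpoints.
            start, end = part.split("-", 1)
            ids.extend(range(int(start), int(end) + 1))
        else:
            ids.append(int(part))
    return ids
```

For a wide range such as 1234-4321 you would probably hand the two endpoints to your database as a range query rather than enumerate every identifier like this.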

Also expect query parameters defining result paging and record format (the current frameworks use the arguments "offset" and "format", respectively). This article won't go into search, since that's complicated enough for its own HOWTO, but keep in mind this may introduce more query parameters.
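
Reading those two parameters could be as simple as this Python sketch. The helper name parse_paging is made up, and falling back to an offset of 0 for missing or malformed values is my assumption, not something the spec dictates.

```python
from urllib.parse import parse_qs

def parse_paging(query_string, default_format=None):
    """Return (offset, record_format) from a raw query string."""
    params = parse_qs(query_string)
    try:
        # Negative offsets are clamped to zero.
        offset = max(int(params.get("offset", ["0"])[0]), 0)
    except ValueError:
        # Assumption: a malformed offset falls back to the first page.
        offset = 0
    record_format = params.get("format", [default_format])[0]
    return offset, record_format
```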

The last thing your connector 'environment' needs to be able to do is accept the incoming X_CONNECTOR_BASE header from the Jangle core. This header specifies the full URL path to the connector resources that will be seen by Jangle clients. Use this to create your resource identifiers.

What this means:

Your Jangle connector lives at:

http://service.example.org/path/to/connector/

The Jangle core is proxying your connector at:

http://jangle.example.org/your_connector/

So as far as clients are concerned, your resource URIs look like:

http://jangle.example.org/your_connector/resources/1234/items

Your connector has no way of knowing this, however. In fact, your connector might be served by multiple cores (which means different resource URIs). This is why the core advertises the base URI that it serves the connector from.
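
In code, that boils down to preferring the advertised base over your own address whenever you build a URI. Here is a small Python sketch; client_uri is a hypothetical helper, and how the header actually reaches your code (and under what key) depends on your web server, so the plain dictionary lookup is an assumption.

```python
def client_uri(headers, local_base, path):
    """Build the URI a Jangle client would see for a connector-relative path."""
    # Assumption: request headers are available as a dict keyed by header name.
    base = headers.get("X_CONNECTOR_BASE", local_base)
    return base.rstrip("/") + "/" + path.lstrip("/")
```

Falling back to the connector's own base when the header is missing keeps the connector usable when you hit it directly while testing.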

One more thing. Use normal HTTP status codes. If a requested identifier doesn't exist, send a 404. If something used to exist and is now gone, send a 410. Be RESTful. Theoretically, your content-type header should always be set to application/json (although you may want to do some content negotiation so you can view the response in your web browser).
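
As a sketch of that decision in Python: the function status_for and the idea of keeping a separate set of deleted identifiers are assumptions for illustration. Your system may track deletions differently, or not at all.

```python
from http import HTTPStatus

def status_for(record_id, existing_ids, deleted_ids):
    """Pick the HTTP status for a request for a single identifier."""
    if record_id in existing_ids:
        return HTTPStatus.OK
    if record_id in deleted_ids:
        # It used to exist and is now gone.
        return HTTPStatus.GONE
    return HTTPStatus.NOT_FOUND
```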

The Services Object

This is probably the simplest to get working first, since the majority of the time this will be hard coded or built from a config file.

The connector must send back a services object at /base/path/to/connector/services. The entities defined in the services object must actually exist and the paths resolve.

Categories are a little fast and loose in Jangle. They can have strong semantic meaning (i.e. their scheme conforms to a Jangle vocabulary term), but it's not required.

Note, Jangle connector entity paths do not have to strictly conform to /actors, /collections, /items, /resources but it's recommended that you follow that convention since it reduces the risk of the Jangle core doing the wrong thing.

See more about the services response (such as the actual structure of the object) here.

Building your Feed Objects

Here is where a few layers of abstraction come into play, which is the rationale for the existence of the connector frameworks. Feed objects, regardless of the entity they are transporting, are all structurally identical, so this class or method or <however you need to get to the JSON serialization> can be the same for all of your responses, regardless of entity, regardless of whether the request is for all resources or one identifier.

Next, it's probably easiest to visualize the next layer as Jangle entity objects: Actor, Collection, Item and Resource. As I mentioned in the beginning of this post, the behavior of these objects would be very much dependent on the service desired and, ideally, would be dictated (at least to some extent) by an established community profile.

So, for an OPAC service (for which a community profile is, at this point, very much a "work in progress") your object definitions would probably look something like this:

Actor: In ILMS terms this would be a 'patron' or 'borrower'. A 'default' record format still needs to be agreed upon, but the current connectors use vCards as a simple, standard way to pass personal information.

Collection: This will probably never be strictly defined. Realistically, its most common use will be collections of Resources: journals, juvenile fiction, music, etc. The default record format for this currently is Dublin Core.

Item: This could be a single copy or serials holdings. The default currently for this is the DLF ILS-DI Expanded Records format (for either holdings or simpleavailability). Theoretically, this might eventually default to a standard like ISO 20775. An alternate format (especially when taken in an Actor/Item relationship) could be based on fines.

Resource: A bibliographic record. A logical default for this would be MARC21 XML.

These objects would be related to each other in various ways. An Actor may have several Items.

An Item would most likely have only one Actor and one Resource.

A Resource may have several Items (or one or none) and zero, one or more Collections.

A Collection may have many Resources.
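
One way to hold these rules in a connector is a simple lookup table. The Python sketch below just restates the relationships listed above for the OPAC example; RELATIONSHIPS and is_valid_relationship are names I made up, and a different service could well have a different table.

```python
# Which relationship feeds each entity may expose in the OPAC example.
RELATIONSHIPS = {
    "actors": ("items",),
    "items": ("actors", "resources"),
    "resources": ("items", "collections"),
    "collections": ("resources",),
}

def is_valid_relationship(entity, relationship):
    """True if /{entity}/{id}/{relationship} is meaningful for this service."""
    return relationship in RELATIONSHIPS.get(entity, ())
```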

The next layer down would be business logic objects specific to your system. The entity objects are specific to your system as well, of course (since they would need to harness these business logic objects) and, depending on your style or need, these two layers could be merged. I'm going to keep them separated in this example for the sake of clarity (I hope).

So let's say library account holders are defined in the ILMS by 5 tables:

BORROWER (which contains their name, barcode, borrower type, status, etc.)
BORROWER_ADDRESS
ADDRESS
BORROWER_PHONE
PHONE

Your Actor object would be built according to the requested record format and some combination of the fields in the above tables.
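
To make that a bit more concrete, here is a Python sketch that turns rows from those tables into a minimal vCard 3.0 record. Everything about the row layout is an assumption: the column names (first_name, last_name, street, city, region, postal_code, number) are invented for this example, and the sketch skips the escaping a real vCard serializer would need.

```python
def actor_vcard(borrower, addresses, phones):
    """Build a minimal vCard 3.0 string from hypothetical ILMS rows (dicts)."""
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        "N:{last_name};{first_name};;;".format(**borrower),
        "FN:{first_name} {last_name}".format(**borrower),
    ]
    for address in addresses:
        # ADR fields: PO box; extended; street; city; region; postal code; country
        lines.append(
            "ADR:;;{street};{city};{region};{postal_code};".format(**address)
        )
    for phone in phones:
        lines.append("TEL:{number}".format(**phone))
    lines.append("END:VCARD")
    return "\r\n".join(lines) + "\r\n"
```

The resulting string is what would go into the "content" field of the Actor's entry in the feed.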

Your base entity relationships need to be established at this point as well. So, for example, there is a request for /actors/1234

Your response would look something like:

{
  "type": "feed",
  "request": "\/actors\/1234",
  "offset": 0,
  "totalResults": 1,
  "formats": "http://jangle.org/vocab/formats#text/x-vcard",
  "data": [
    {
      "id": "\/actors\/1234",
      "format": "http://jangle.org/vocab/formats#text/x-vcard",
      "content_type": "text/x-vcard",
      "content": "<some vCard data here>",
      "relationships": {"http://jangle.org/vocab/Entity#Item": "\/actors\/1234\/items"}
    }
  ]
}

(See the spec for the actual format of the feed response)
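
Because every feed has the same envelope, one function can wrap any list of entries. The Python sketch below only mirrors the fields shown in the example response; build_feed is a hypothetical name, and the spec remains the authority on the actual structure.

```python
import json

VCARD_FORMAT = "http://jangle.org/vocab/formats#text/x-vcard"

def build_feed(request_path, entries, total_results, offset=0, formats=VCARD_FORMAT):
    """Wrap already-built entries in the feed envelope from the example."""
    return {
        "type": "feed",
        "request": request_path,
        "offset": offset,
        "totalResults": total_results,
        "formats": formats,
        "data": entries,
    }

entry = {
    "id": "/actors/1234",
    "format": VCARD_FORMAT,
    "content_type": "text/x-vcard",
    "content": "<some vCard data here>",
    "relationships": {"http://jangle.org/vocab/Entity#Item": "/actors/1234/items"},
}
body = json.dumps(build_feed("/actors/1234", [entry], total_results=1))
```

Note that Python's json module does not escape forward slashes as \/, which is fine: both forms are valid JSON and decode to the same string.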

Note that the relationships do not point to specific resources. These relationships should only appear if there is definitely a set of entity resources at the relationship URI.

One way to picture this is as the boolean response to something like:

Actor.get(1234).has_items?

If this is true, the set of items at the relationship URI should contain at least one entry. If it is false, the client should get a 404 error if it tries to access /actors/1234/items.
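
In the entity layer this amounts to building the relationships object conditionally. A Python sketch, where actor_relationships is a hypothetical helper and item_count stands in for whatever query your system uses to count an actor's items:

```python
ITEM_ENTITY = "http://jangle.org/vocab/Entity#Item"

def actor_relationships(actor_id, item_count):
    """Advertise the items relationship only when the actor really has items."""
    relationships = {}
    if item_count > 0:
        relationships[ITEM_ENTITY] = "/actors/{}/items".format(actor_id)
    return relationships
```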

Relationship Feeds

When you're building your Feed objects for requests like /actors/1234/items, be sure that the ids you return in your data array have URIs like /items/1, /items/2 rather than /actors/1234/items/1. Entity URIs should always be at the top level.

For a relationship feed that covers multiple identifiers (e.g. /actors/1234,1236,1241,1290/items), a practical way to maintain the associations between entities is to use the "link" element:

data: [
{id: '\/items\/1',
link: {"via": [
{"type": "application/atom+xml",
"href": "\/actors\/1234"}
]},
...
},
...
]

Note, this will be addressed directly in the Connector Response API for Jangle 1.1.
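
Putting the two points of this section together, a relationship feed for several actors might be assembled like this Python sketch. The function item_entries and its input shape (a dict from actor id to that actor's item ids) are assumptions for illustration, and the entries carry only the id and link fields.

```python
def item_entries(items_by_actor):
    """Build feed entries for items, each linked back to the actor it came via."""
    entries = []
    for actor_id, item_ids in items_by_actor.items():
        for item_id in item_ids:
            entries.append({
                # Entity URIs stay at the top level, never /actors/{id}/items/{id}.
                "id": "/items/{}".format(item_id),
                "link": {"via": [{
                    "type": "application/atom+xml",
                    "href": "/actors/{}".format(actor_id),
                }]},
            })
    return entries
```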

Tinker and experiment

Nothing is set in cement in Jangle yet (including this HOWTO), so let's start making some cow paths. This document will change according to people's comments and needs. As more connectors get built, patterns should start to emerge that can be used to begin making community profiles.

If it's still not clear how to get to:

  3. Jangle!!!

Leave a comment where this document needs more explanation or where more documents need to be written (such as 'search').

Comments

Operations of a connector

Don't you first also have to define the concrete operations your ILS must support to wrap them to Jangle? We have a SOAP API that supports all CRUD operations (create, retrieve, update, delete) for a PICA database:

http://www.gbv.de/wikis/cls/CWS#Webcatws - German description
http://cws.gbv.de/ws/webcatws.wsdl - WSDL
http://search.cpan.org/perldoc?PICA::Store - Perl client library

This should be very easy to map to Jangle, but I am still somewhat confused about where to start. Each record in the catalog is identified by an id, but there are no actors, collections and such. And is http://search.cpan.org/perldoc?Atompub the right library on which to build a Jangle server in Perl?

Greetings,
Jakob