Xml Web Service Harvester

The Xml Web Service Harvester can be used with any web service that returns well-formed XML. Examples include web services created using RISE, most public government web services, and plain XHTML web pages.

Configuring the harvester

Image 1, Xml Web Service Harvester properties.

Harvester
Name	The name of the harvester used by the selected Query-node. To change the harvester, select the Name property and click the ellipsis button, then select the appropriate harvester in the form that is displayed.
Settings
Accept	The accepted content type of the response. The default value for the Xml Web Service Harvester is application/xml.
Accept-Charset	The accepted charset of the response. The default is utf-8.
Content-Type	The content type of the request. This is only applicable if Method=POST. The default is application/x-www-form-urlencoded.
Custom Headers	You can add any number of custom HTTP headers to your request, e.g. Accept-Language.
Data Element	An XPath expression that selects nodes from the XML response. If the XML document uses a default namespace, you access its elements using the prefix "p" in your XPath expression, see image 1 above. For all other namespaces, use their respective prefixes.
Method	Request method. Possible values are POST and GET. The default value is POST.
Query string	If Method = GET, the query string is appended to the Url as Url?Query string, and the arguments in the query string should be url-encoded. If Method = POST, the content of the query string is sent in the request body. In that case, if the content type is application/json, the query string should contain JSON-style arguments, e.g. "arg1":"val1", "arg2":"val2". Otherwise the query string should be url-encoded.
Url	The url to the service from which you want to request information.
User authentication
Password	Supply a password if the service requires authentication. Leave empty for anonymous authentication.
User ID	Supply a user id if the service requires authentication. Leave empty for anonymous authentication.
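
The url-encoding requirement for the Query string property can be tried out outside the harvester. The following Python sketch is not part of RISE: the helper name build_get_url, the service url and the argument values are hypothetical. It only illustrates how url-encoded arguments end up appended to the Url when Method = GET.

```python
from urllib.parse import urlencode


def build_get_url(url, args):
    """Append url-encoded arguments to url, following the Url?Query string pattern."""
    query_string = urlencode(args)  # spaces become '+', '&' becomes '%26'
    return f"{url}?{query_string}" if query_string else url


# Hypothetical service url and arguments, for illustration only.
example_url = build_get_url(
    "http://example.org/service", {"name": "AC&DC live", "limit": 5}
)
print(example_url)
```

With Method = POST and the default content type, the same encoded string would be placed in the request body instead of being appended to the Url.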

Adding relations

In the model displayed in image 1, we want to list all releases for all artists and all tracks for each release, and finally retrieve the lyrics for each track. To accomplish this, we need to describe how the information in our model is related. For HTTP harvesters, there are two ways to do this. Which method to use depends mainly on the web service being called. You can use either method, or a combination of both.

1. Parent relation

Every query node except the root node, regardless of which harvester it uses, has a parent relation property in the XML section.

Image 2, the parent relation property

To add or edit a parent relation, select the property and click on the ellipsis button.

Image 3, managing the parent relation

All leaves of the selected node that have Column Name specified in the Source section are listed in the Column combo box. All parent leaves that have Column Name specified in the Source section are listed in the Parent column combo box. In the first column, you can either select one of the listed options or type your own value. In the second column, you must select one of the available parent column options. You can add any number of relations.

If request method = GET, the relations will be added to the query string as ColumnName=Parent column value&...

If request method = POST, and content type = application/x-www-form-urlencoded, the relation is sent in the request content according to ColumnName=Parent column value&...

If request method = POST, and content type = application/json, the relation is sent in the request content as JSON, e.g. {"ColumnName": "Parent column value", ... }

Here, ColumnName is the name of the column you selected or typed in the first column, and Parent column value is the value returned by the parent harvester for the selected parent column. See the JSON Web Service Harvester article for an example.
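
The three cases above can be sketched in Python. This is not the harvester's own code: the function encode_relations and the column name artistId are hypothetical, and the exact JSON layout produced by the harvester is assumed here to be a plain object keyed by column name.

```python
import json
from urllib.parse import urlencode


def encode_relations(method, content_type, relations):
    """Return (destination, payload) for a dict of ColumnName -> parent column value."""
    if method == "GET":
        # Relations become ColumnName=value pairs in the query string.
        return "query string", urlencode(relations)
    if method == "POST" and content_type == "application/x-www-form-urlencoded":
        # Same ColumnName=value pairs, but sent in the request content.
        return "request content", urlencode(relations)
    if method == "POST" and content_type == "application/json":
        # Assumed layout: one JSON object with the column names as keys.
        return "request content", json.dumps(relations)
    raise ValueError(f"unsupported combination: {method} / {content_type}")


# Hypothetical relation: column "artistId" receives the parent's artist id.
relations = {"artistId": "79239441-bfd5-4981-a70c-55c3f15c1287"}

for method, content_type in [
    ("GET", ""),
    ("POST", "application/x-www-form-urlencoded"),
    ("POST", "application/json"),
]:
    print(method, content_type, encode_relations(method, content_type, relations))
```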

2. Tagging the url

With any of the HTTP harvesters, you may tag the web service url, as well as the query string, using the syntax <parent.ColumnName>, where ColumnName is the value of the Column Name property of the parent column/leaf. At runtime, the tag is replaced with the harvested value of the specified parent column.

The root node does not have a parent, so harvesters applied to the root node cannot substitute tags with parent column values. However, when executing the model, you can supply a filter that is passed to the harvester of the root node. HTTP harvesters can make use of this filter by adding the tag <param.filter> to the Url property or the Query string property.

When you add an HTTP harvester to the root node, you may notice that the default value of the Query string property is <param.filter>, i.e. by default the harvester uses the supplied filter as the query string. Make sure that the filter has the appropriate formatting and encoding for the request method and content type used, see the description of the Query string property in the table above.

The example below illustrates how tag substitution works.

Sample Xml Response

We execute the model, supplying the filter query=Madonna. With the Url and Query string properties shown in image 1, this results in the request http://musicbrainz.org/ws/1/artist/?query=Madonna. Notice how the <param.filter> tag has been replaced with the supplied filter, i.e. query=Madonna.
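
The substitution can be reproduced with a few lines of Python. This is a minimal sketch rather than the harvester's implementation: the function apply_filter is hypothetical, and it assumes that image 1 shows Url = http://musicbrainz.org/ws/1/artist/, Query string = <param.filter> and Method = GET.

```python
def apply_filter(url, query_string, filter_value):
    """Replace every <param.filter> tag, then join Url and Query string as for GET."""
    tag = "<param.filter>"
    url = url.replace(tag, filter_value)
    query_string = query_string.replace(tag, filter_value)
    return f"{url}?{query_string}" if query_string else url


# Property values assumed from image 1, filter taken from the example above.
request_url = apply_filter(
    "http://musicbrainz.org/ws/1/artist/", "<param.filter>", "query=Madonna"
)
print(request_url)  # http://musicbrainz.org/ws/1/artist/?query=Madonna
```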

Image 4, Sample Xml Response.

In the Data Element property, image 1, we have entered an XPath expression that selects all the artists from the XML: p:metadata/p:artist-list/p:artist. We use the prefix p since the response XML uses a default namespace.

To store the id, name and type of each artist, we have created the leaves Id, Name and Type for our Query-node. For each leaf, we have entered an XPath expression in the Column Name property that selects the leaf's data. The leaf XPath expression is evaluated for each node returned by the expression entered in the Data Element property. To select the id attribute we use the XPath expression @id, to select the type attribute we use @type, and to select the name element we use p:name.
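
The same selection can be tried in Python with the standard library. The XML below is a hand-written stand-in for the response in image 4, so the namespace URI, the type values and the second artist are assumptions made for this sketch. Note also that ElementTree evaluates paths relative to the root element and reads attributes with get() instead of @name, so the expressions are adapted accordingly.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for image 4; namespace URI and second artist are assumed.
SAMPLE = """<metadata xmlns="http://musicbrainz.org/ns/mmd-1.0#">
  <artist-list>
    <artist id="79239441-bfd5-4981-a70c-55c3f15c1287" type="Person">
      <name>Madonna</name>
    </artist>
    <artist id="00000000-0000-0000-0000-000000000000" type="Group">
      <name>Example Artist</name>
    </artist>
  </artist-list>
</metadata>"""

# Bind the prefix "p" to the document's default namespace.
NS = {"p": "http://musicbrainz.org/ns/mmd-1.0#"}

root = ET.fromstring(SAMPLE)  # root is the <metadata> element itself

rows = []
# The leading p:metadata step is dropped because paths start at the root element.
for artist in root.findall("p:artist-list/p:artist", NS):
    rows.append({
        "Id": artist.get("id"),                            # leaf expression @id
        "Type": artist.get("type"),                        # leaf expression @type
        "Name": artist.findtext("p:name", namespaces=NS),  # leaf expression p:name
    })

print(rows)
```

Each dictionary corresponds to one row harvested for the Query-node, with one entry per leaf.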

Image 5, Leaf source properties

We now want to list all releases made by the artists returned. This means that we need to tie data from the Artist response to the Release request.

As an example, to list all releases for Madonna we would call the service using the following url:

http://musicbrainz.org/ws/1/artist/79239441-bfd5-4981-a70c-55c3f15c1287?type=xml&inc=sa-Official+release-events where "79239441-bfd5-4981-a70c-55c3f15c1287" is the id of the artist Madonna, see image 4.

Image 6, Release node harvester properties

In the Url property for the harvester, image 6, we have entered the url http://musicbrainz.org/ws/1/artist/<parent.@id>?type=xml&inc=sa-Official+release-events. Notice the tag <parent.@id>. The tag will be replaced with the content of the parent leaf with Source/Column Name = @id. You can add any number of tags to be substituted by the value of a parent leaf.
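
Parent tag substitution can be sketched in Python as follows. This is an illustration under stated assumptions, not the harvester's code: the function substitute_parent_tags is hypothetical, and the sketch assumes that one request url is produced per harvested parent row.

```python
import re

# Matches tags of the form <parent.ColumnName> and captures ColumnName.
PARENT_TAG = re.compile(r"<parent\.([^>]+)>")


def substitute_parent_tags(template, parent_row):
    """Replace each <parent.ColumnName> tag with the value from the parent row."""
    return PARENT_TAG.sub(lambda match: parent_row[match.group(1)], template)


url_template = (
    "http://musicbrainz.org/ws/1/artist/<parent.@id>"
    "?type=xml&inc=sa-Official+release-events"
)

# One dictionary per harvested artist; keys are the Column Name values of the leaves.
parent_rows = [
    {"@id": "79239441-bfd5-4981-a70c-55c3f15c1287", "p:name": "Madonna"},
]

release_urls = [substitute_parent_tags(url_template, row) for row in parent_rows]
print(release_urls[0])
```

A tag that names a column missing from the parent row raises a KeyError in this sketch. How the harvester itself handles that case is not covered here.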