LD4L Ontology Process and Description
At the outset, the ontology team recognized a great deal of prior art, in the form of published ontologies and significant ongoing initiatives, addressing the representation of bibliographic information in RDF. Elements of the Bibliographic Ontology (BIBO) and FaBIO had already been incorporated into the VIVO-ISF Ontology and were familiar to team members from previous work; Paolo Ciccarese of Harvard, a member of the LD4L ontology team, was a principal FaBIO contributor. The BIBFRAME initiative at the Library of Congress addresses the representation of MARC metadata in RDF, while OCLC has worked to extend the Schema.org ontology as a bridge between the library community and the Web. The Collections Ontology and ORE address collections of digital objects, the Open Annotation Data Model addresses annotations, and PROV-O and PAV address provenance.
Several principles have guided our discussions and influenced the projects selected by ontology team members. Early on, the group confirmed the intention stated in the proposal to reuse appropriate parts of currently available ontologies rather than build a new, self-contained ontology for LD4L. While there are advantages to working from a blank slate, we believe it makes eminent sense for a project focused on linked data to draw as much as possible on existing ontologies that have already achieved significant adoption or show promise of doing so.
Though still very much in flux, the BIBFRAME ontology was of particular interest to the project because, if successful, it would likely become the model implemented pervasively across the library community. The LD4L ontology team is heavily engaged in the evolution of BIBFRAME. The team's aim is to bring BIBFRAME closer to Linked Data best practices, so that the RDF data libraries contribute to the semantic web is not marginalized in an ontology that interoperates poorly outside the library domain. In April 2015 Rob Sanderson submitted "Analysis of the BIBFRAME Ontology for Linked Data Best Practices" to the Library of Congress. Building on the changes to BIBFRAME recommended by Rob Sanderson and LD4L, the team wrote a bibliographic ontology. The team hopes that the terms this ontology derives from BIBFRAME, as well as newly proposed terms applicable to BIBFRAME's targeted knowledge domain, will be folded into the official BIBFRAME namespace. The proposed changes include, but are not limited to, the following principles and conventions:
1. Reuse stable, pre-existing classes and properties from external ontologies rather than declaring new ones in the local namespace.
2. Use URIs rather than strings to identify resources.
3. Replace the BIBFRAME Authority classes with Real World Entity classes for people, places, things, etc.
4. Define only one pattern to model any single feature of the knowledge domain.
5. Clarify the directionality of properties through naming patterns, definitions, and, where applicable, domain and range constraints; add inverse properties where appropriate.
6. Name terms consistently, and make the distinction between classes, object properties, and datatype properties clear by adopting standard naming conventions.
Each class and property has been carefully considered, but this ontology remains largely untested in production at scale. LD4L may provide revised or expanded versions in the future. Such revisions may follow as we identify new use cases and begin testing the ontology with instance data, and as the BIBFRAME 2.0 revisions solidify. The RDF generated as an output of the project will be based on the LD4L ontology and will be made available for testing. Future work will include aligning the LD4L ontology with BIBFRAME 2.0, Schema.org, and other RDF models in the bibliographic and cultural heritage domains.
A high-level schematic representation of the proposed LD4L ontology is shown below, with detailed discussion following.
Works, Instances, and Items
The LD4L Bibliographic Ontology uses three core classes to describe items in library collections: Work, Instance, and Item. An ld4l:Work refers to the conceptual content of a resource, similar to the BIBFRAME definition of a Work. An ld4l:Instance, like Barbara Tillett's definition of a FRBR Manifestation, is understood to be “an abstract entity…[which] describes and represents physical entities” [Tillet]. An ld4l:Item is then an actual physical instantiation of an ld4l:Instance, such as a single copy held by a library. The following example represents a printed edition of Hamlet in the LD4L Bibliographic Ontology.
Work, Instance, and Item Example (Hamlet):
@base <http://example.org/> .
@prefix ld4l: <http://bib.ld4l.org/ontology/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<Hamlet> a ld4l:Work ;
    ld4l:hasInstance <SomePrintInstanceOfHamlet> .
<SomePrintInstanceOfHamlet> a ld4l:Print ;
    ld4l:hasHolding <SingleCopyOfSomePrintInstanceOfHamlet> ;
    ld4l:identifiedBy <SomeIsbn10> .
<SomeIsbn10> a ld4l:Isbn10 ;
    rdf:value "0486272788" .
ld4l:Print rdfs:subClassOf ld4l:Instance .
<SingleCopyOfSomePrintInstanceOfHamlet> a ld4l:Item ;
    ld4l:heldBy <SomeLibrary> ;
    ld4l:identifiedBy <SomeBarcode> .
<SomeBarcode> a ld4l:Barcode ;
    rdf:value "31117013206375" .
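The following sketch makes the three-level structure concrete. It stores the Hamlet example as plain (subject, predicate, object) tuples and walks from the Work through its Instances to the Items, and on to the institutions holding them. It is a minimal illustration using only the Python standard library. The tuple store and the helper names `objects`, `items_of_work`, and `holders_of_work` are hypothetical, not part of LD4L. A real system would more likely use an RDF library or a SPARQL query.

```python
# Hypothetical sketch: the Hamlet example as (subject, predicate, object) tuples.
# Plain strings stand in for IRIs; this is not an RDF library.
TRIPLES = {
    ("Hamlet", "rdf:type", "ld4l:Work"),
    ("Hamlet", "ld4l:hasInstance", "SomePrintInstanceOfHamlet"),
    ("SomePrintInstanceOfHamlet", "rdf:type", "ld4l:Print"),
    ("SomePrintInstanceOfHamlet", "ld4l:hasHolding", "SingleCopyOfSomePrintInstanceOfHamlet"),
    ("SingleCopyOfSomePrintInstanceOfHamlet", "rdf:type", "ld4l:Item"),
    ("SingleCopyOfSomePrintInstanceOfHamlet", "ld4l:heldBy", "SomeLibrary"),
}


def objects(triples, subject, predicate):
    """Return the sorted objects of all triples matching (subject, predicate, ?)."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def items_of_work(triples, work):
    """Walk Work -> Instance (ld4l:hasInstance) -> Item (ld4l:hasHolding)."""
    items = []
    for instance in objects(triples, work, "ld4l:hasInstance"):
        items.extend(objects(triples, instance, "ld4l:hasHolding"))
    return items


def holders_of_work(triples, work):
    """Return the institutions (ld4l:heldBy) holding any Item of the Work."""
    holders = set()
    for item in items_of_work(triples, work):
        holders.update(objects(triples, item, "ld4l:heldBy"))
    return sorted(holders)


print(items_of_work(TRIPLES, "Hamlet"))    # ['SingleCopyOfSomePrintInstanceOfHamlet']
print(holders_of_work(TRIPLES, "Hamlet"))  # ['SomeLibrary']
```

Because the Item hangs off the Instance rather than the Work, the same Work can gather holdings from any number of editions simply by adding more ld4l:hasInstance links.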
One of the motivations for aligning LD4L with the BIBFRAME model is BIBFRAME's collapse of the FRBR Work and Expression entities [Tillet] into a single Work class. Agreement on the boundary between Works and Expressions is notoriously difficult to reach in practice. Both BIBFRAME and the LD4L Bibliographic Ontology instead treat what FRBR would call an expression, such as a translation, as a Work in its own right. That Work is related to the original Work through explicit relationships, so no separate Expression class is needed.
Contributions and Provisions
The LD4L Bibliographic Ontology includes activity- or event-based patterns to describe relationships between Agents and Works or Instances; these patterns parallel the Contribution class proposed in BIBFRAME 2.0. Direct “shortcut” properties between Agents and Works, such as dc:creator, are simpler. The ld4l:Contribution and ld4l:Provision classes (and their related property paths) are more complex, but they provide greater context and more explicit semantics. For instance, if a book has two authors, distinct Contribution nodes allow the authors to be ordered correctly and consistently via a property like vivo:rank. They also allow the time and place of each Contribution to be recorded; shortcut properties cannot express such data. The same applies to the ld4l:Provision class, used for publication, distribution, and similar roles. Following principle 4 above, LD4L chose not to include shortcut properties such as dc:creator that link a Work or Instance directly to the relevant Agent.
A Novel with Contributions and Provisions:
@base <http://example.org/> .
@prefix ld4l: <http://bib.ld4l.org/ontology/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix vivo: <http://vivoweb.org/ontology/core#> .
@prefix bibo: <http://purl.org/ontology/bibo/> .
<SomeNovel> a ld4l:Text ;
    ld4l:hasContribution <AnAuthorContribution> , <AnotherAuthorContribution> ;
    ld4l:hasInstance <PrintVersionOfBook> .
<AnAuthorContribution> a ld4l:AuthorContribution ;
    prov:agent <SomePerson> ;
    prov:atLocation <SomePlace> ;
    dc:date "2015-10-05" ;
    vivo:rank 1 .
<AnotherAuthorContribution> a ld4l:AuthorContribution ;
    prov:agent <AnotherPerson> ;
    prov:atLocation <AnotherPlace> ;
    dc:date "2015-10-10" ;
    vivo:rank 2 .
ld4l:AuthorContribution rdfs:subClassOf ld4l:Contribution .
<PrintVersionOfBook> a bibo:Book , ld4l:Print ;
    ld4l:hasProvision <APublisherProvision> , <ADistributorProvision> ;
    ld4l:hasHolding <ASingleCopyOfPrintVersionOfBook> .
ld4l:Print rdfs:subClassOf ld4l:Instance .
<APublisherProvision> a ld4l:PublisherProvision ;
    prov:agent <SomePublisher> ;
    prov:atLocation <SomeOtherPlace> ;
    dc:date "2015-10-15" .
<ADistributorProvision> a ld4l:DistributorProvision ;
    prov:agent <SomeDistributor> ;
    prov:atLocation <YetAnotherPlace> ;
    dc:date "2015-10-20" .
ld4l:PublisherProvision rdfs:subClassOf ld4l:Provision .
ld4l:DistributorProvision rdfs:subClassOf ld4l:Provision .
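The ordering argument for Contribution nodes can be illustrated with a small sketch. RDF triples carry no inherent order, so a shortcut property like dc:creator gives a consumer no way to tell first author from second. A rank attached to each Contribution node does. The sketch below assumes integer vivo:rank values and mirrors the novel example as plain tuples. The helpers `value` and `ordered_agents` are hypothetical names, and unranked Contributions are simply sorted last, which is one possible design choice rather than anything LD4L prescribes.

```python
# Hypothetical sketch: recover author order from vivo:rank on Contribution nodes.
# Strings stand in for IRIs; ranks are assumed to be integers.
TRIPLES = [
    # Deliberately listed out of order: triple order carries no meaning in RDF.
    ("AnotherAuthorContribution", "prov:agent", "AnotherPerson"),
    ("AnotherAuthorContribution", "vivo:rank", 2),
    ("SomeNovel", "ld4l:hasContribution", "AnotherAuthorContribution"),
    ("SomeNovel", "ld4l:hasContribution", "AnAuthorContribution"),
    ("AnAuthorContribution", "prov:agent", "SomePerson"),
    ("AnAuthorContribution", "vivo:rank", 1),
]


def value(triples, subject, predicate):
    """Return the first object for (subject, predicate), or None if absent."""
    matches = [o for s, p, o in triples if s == subject and p == predicate]
    return matches[0] if matches else None


def ordered_agents(triples, work):
    """List the agents of a Work's Contributions, sorted by vivo:rank.

    Contributions without a rank sort after ranked ones, then by name.
    """
    contributions = [o for s, p, o in triples
                     if s == work and p == "ld4l:hasContribution"]

    def sort_key(contribution):
        rank = value(triples, contribution, "vivo:rank")
        return (rank is None, rank if rank is not None else 0, contribution)

    return [value(triples, c, "prov:agent")
            for c in sorted(contributions, key=sort_key)]


print(ordered_agents(TRIPLES, "SomeNovel"))  # ['SomePerson', 'AnotherPerson']
```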
The LD4L ontology team explored the PROV Ontology's Activity class for linking Agents to Works and Instances, but found its constraints too strict for most library use cases. These constraints require that each prov:Activity contributing to a Work or Instance produce a distinct intermediate Work or Instance entity, even when no other information about draft states is available. They also require that the prov:Activity entities be chronologically ordered with respect to one another. When libraries describe a work, they often have information only about the final state of the Work or Instance and some notion of the Agents' involvement in its creation. LD4L therefore did not want to require separate intermediate entities for each prov:Activity.
On the other hand, ld4l:Contribution and ld4l:Provision align easily with prov:Activity. When date and time data are available, the related PROV temporal properties can therefore be used to record the events in the correct sequence and to describe various draft or version “states” as distinct prov:Entity instances.
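The following sketch shows how a consumer might sequence events when time data exist. It is a minimal illustration under several assumptions. First, it assumes that Contribution and Provision nodes are treated as prov:Activity instances carrying prov:startedAtTime values. Second, it reuses the dates from the novel example as ISO 8601 date-times, although the example itself records them only as dc:date strings. Third, the names `ACTIVITIES`, `AnUndatedContribution`, and `timeline` are hypothetical. Activities without a time are set aside rather than guessed into the sequence, reflecting the point above that libraries often lack such data.

```python
from datetime import datetime

# Hypothetical sketch: Contribution/Provision nodes treated as prov:Activity
# instances, each mapped to an assumed prov:startedAtTime value (or None).
ACTIVITIES = {
    "ADistributorProvision": "2015-10-20T00:00:00",
    "AnAuthorContribution": "2015-10-05T00:00:00",
    "APublisherProvision": "2015-10-15T00:00:00",
    "AnotherAuthorContribution": "2015-10-10T00:00:00",
    "AnUndatedContribution": None,
}


def timeline(activities):
    """Split activities into a chronologically sorted list and an undated list."""
    dated = [(datetime.fromisoformat(t), name)
             for name, t in activities.items() if t is not None]
    undated = sorted(name for name, t in activities.items() if t is None)
    return [name for _, name in sorted(dated)], undated


ordered, undated = timeline(ACTIVITIES)
print(ordered)  # ['AnAuthorContribution', 'AnotherAuthorContribution', 'APublisherProvision', 'ADistributorProvision']
print(undated)  # ['AnUndatedContribution']
```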
Specificity Through Subclassing
The LD4L Bibliographic Ontology is meant to provide basic patterns for describing library collections and relating them to entities defined in external ontologies, including Agents, Concepts, and Places. These top-level classes are often conceptually abstract and require subclassing to provide the level of specificity that library use cases need. Few users would find meaning in a resource typed only as “Instance”, “Contribution”, or “Provision”. Libraries can therefore type resources with subclasses of these classes and/or with classes from external ontologies that carry additional real-world specificity. In the Contribution/Provision example above, subclassing offers an easy way to specify the agent's role, and the print Instance is also typed as bibo:Book. The LD4L ontology itself provides a number of subclasses of its broad top-level classes, and libraries may introduce new subclasses to accommodate their own use cases.
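Subclassing adds specificity without hiding resources from generic consumers. Under standard RDFS reasoning, a resource typed with a subclass is also an instance of every superclass. Rule rdfs9 propagates types up rdfs:subClassOf, and rule rdfs11 makes rdfs:subClassOf transitive. The sketch below applies those two rules to the subclass axioms stated in the examples above, so a query for ld4l:Provision still finds resources typed only with its subclasses. It is a hand-rolled illustration, not a full RDFS reasoner. The dictionaries and the helper names `superclasses`, `inferred_types`, and `instances_of` are hypothetical.

```python
# Hypothetical sketch of RDFS subclass reasoning (rules rdfs9 and rdfs11).
# Subclass axioms taken from the examples above; strings stand in for IRIs.
SUBCLASS_OF = {
    "ld4l:AuthorContribution": ["ld4l:Contribution"],
    "ld4l:PublisherProvision": ["ld4l:Provision"],
    "ld4l:DistributorProvision": ["ld4l:Provision"],
    "ld4l:Print": ["ld4l:Instance"],
}

# Asserted (explicit) types of some resources from the novel example.
RESOURCES = {
    "AnAuthorContribution": {"ld4l:AuthorContribution"},
    "APublisherProvision": {"ld4l:PublisherProvision"},
    "ADistributorProvision": {"ld4l:DistributorProvision"},
    "PrintVersionOfBook": {"bibo:Book", "ld4l:Print"},
}


def superclasses(cls, subclass_of):
    """Every class reachable from cls via rdfs:subClassOf (cycle-safe)."""
    seen = set()
    stack = list(subclass_of.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(subclass_of.get(parent, []))
    return seen


def inferred_types(asserted, subclass_of):
    """Asserted types plus all of their superclasses."""
    types = set(asserted)
    for cls in asserted:
        types |= superclasses(cls, subclass_of)
    return types


def instances_of(target, resources, subclass_of):
    """Resources whose asserted or inferred types include target."""
    return sorted(r for r, asserted in resources.items()
                  if target in inferred_types(asserted, subclass_of))


print(sorted(inferred_types({"bibo:Book", "ld4l:Print"}, SUBCLASS_OF)))
# ['bibo:Book', 'ld4l:Instance', 'ld4l:Print']
print(instances_of("ld4l:Provision", RESOURCES, SUBCLASS_OF))
# ['ADistributorProvision', 'APublisherProvision']
```

A library can therefore add its own narrower subclasses freely. As long as each one is declared with rdfs:subClassOf, tools that only understand the top-level LD4L classes still see the data.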
LD4L Workshop Agenda with linked presentations, February, 2015