Knowledge Representation and Discovery Systems


“We need new goals.” - Karen Coyle.

Karen Coyle ended her SWIB 2015 talk, “Mistakes have been made”, with a call to arms: “There’s so much more we can do for the user, but how can we develop a technology on top unless we have figured out what our goals are? And the goals have to be based on what users want to do not what we will let them do in the catalogue”. For her, supporting new goals does not require new tools: “we don’t need a new technology, we need data and technology that work together.” In the library world, the shape of this data and this technology is subsumed under the heading of linked data.

Now, as Coyle points out, simply adopting linked data will do nothing for our users unless we adopt a different set of rules, but I believe that the work currently being done in linked data is necessary for us to begin to understand what those goals might be. This blog post is an attempt to get down some ideas around linked data and library technology that I’ve been thinking about for a while.

Knowledge Representation: The Context of Linked Data.

Many, if not most, library workers have come to linked data through the work we were already doing. Cataloguers and metadata people may see linked data as an alternative or as a next stage in metadata work, especially with respect to object modeling in the currently very dynamic field of repositories. Systems workers may see linked data as a possible data model for underpinning library systems. Both groups may see linked data as a way to increase findability of library resources on the web. But the ideas, technologies, and practices that we include in the term “linked data” are part of a larger context known as “knowledge representation and reasoning”.

Knowledge Representation

A knowledge representation system is, essentially, a system which contains facts and rules about those facts, such that it can solve problems. According to the Wikipedia article, knowledge representation and reasoning “is the field of artificial intelligence dedicated to representing information about the world in a form that a computer system can utilize to solve complex tasks”. The system contains a picture of the world (a representation of knowledge about the world) which can be queried, which can create new facts from old facts (inference), and which can reason about the state of the world contained within it.
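To make the idea of "facts, rules, and inference" concrete, here is a minimal sketch in Python. It is not how any real triple store or reasoner is implemented; the facts, the predicate names (`hasAuthor`, `instanceOf`, `subClassOf`), and the single rule are all made up for illustration. Facts are stored as subject-predicate-object triples, one rule derives new facts from old ones, and a simple pattern match stands in for a query language.

```python
# Facts are (subject, predicate, object) triples. All names are illustrative.
facts = {
    ("Moby-Dick", "hasAuthor", "Herman Melville"),
    ("Moby-Dick", "instanceOf", "Novel"),
    ("Novel", "subClassOf", "Book"),
}


def infer(facts):
    """Apply one rule until nothing new appears:
    if X instanceOf C and C subClassOf D, then X instanceOf D."""
    known = set(facts)
    while True:
        new = {
            (x, "instanceOf", d)
            for (x, p1, c) in known if p1 == "instanceOf"
            for (c2, p2, d) in known if p2 == "subClassOf" and c2 == c
        }
        if new <= known:  # no new facts were derived, so stop
            return known
        known |= new


def query(facts, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in facts
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


knowledge = infer(facts)
books = query(knowledge, predicate="instanceOf", obj="Book")
print(books)  # [('Moby-Dick', 'instanceOf', 'Book')]
```

Nobody ever stated that Moby-Dick is a book; the system worked it out from what it was told. That is the sense in which a knowledge representation system can "create new facts from old facts".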

Davis, Shrobe, and Szolovits describe five characteristics of a knowledge representation:

It is a surrogate, it stands in for the real world in order to allow a system to reason about the world.
It is a set of ontological commitments, that is, terms in which a system thinks about the world.
It is in itself a theory of intelligent reasoning, that is, it models a particular understanding of what “intelligent reasoning” is.
It is a medium of computation, in other words, it is an effective system.
It is a medium of human expression, a language in which we, as humans, say things about the world.

(Modified from R. Davis, H. Shrobe, and P. Szolovits. What is a Knowledge Representation? AI Magazine, 14(1):17-33, 1993.)

Linked Data and Libraries

Now I’m sure a lot of cataloguers and systems people are wondering what any of this has to do with linked data, metadata, or library systems, and certainly moving towards linked data should be done with caution. But it seems to me that understanding the broader context of linked data might help us exercise that caution while still coming up with the new goals that are necessary for us to satisfy the needs of our users as we move towards a library infrastructure of “data and technology that work together”.

What we in the library world describe as “linked data” is in fact a knowledge representation with the potential of expressing all facts of the real world. The semantic web (a term often used interchangeably with linked data) aims to transform the world wide web into a single, vast knowledge representation system, whose knowledge base is the graph of linked open data we are currently contributing to. There are two ways in which libraries can participate in this project: the creation of ontologies and the publishing of linked data to the semantic web, and the creation of systems which can use the data of the semantic web to allow users to “do what they want to do”. The publishing of linked data and ontologies, the work being done on object modeling, and the creation of repository systems like Hydra and Islandora and platforms like the Digital Public Library of America, are all participating in the knowledge representation system that is the semantic web.

Discovery Systems

So where does library discovery fit in? The problems identified by Karen Coyle in “Mistakes have been made” - an outdated data model, technologies that don’t look “outside our own circle” for ideas and best-practices, and that are overlaid on top of a data model that no longer satisfies our needs or the needs of our users - all these problems are still to be addressed from a linked data perspective. Some work has been done on using linked data to make library resources discoverable on the web, like Dan Scott’s work with Schema.org (and indeed this is the single use-case covered by Zepheira’s “Libhub Initiative”). But we have not yet begun to think about what discovery systems might look like that take advantage of the possibilities of linked data.

Now, when I talk about library discovery systems, I don’t mean the current crop of “next-generation” OPAC replacements. It is clear that in many cases users are finding their own way to our resources, either through the open web, or through other linked systems, and it is equally clear that our discovery systems are increasingly inappropriate for use in a world where “the collection” is no longer a single identifiable thing. These systems are based on the outdated data model and technologies that we have been using since before the days of library automation, and they rely on search techniques that have not scaled well and are likely to be made completely obsolete by the advent of the semantic web. They are not equal to the challenges of what Lorcan Dempsey calls “the facilitated collection”. And worst of all, they fundamentally assume a model of user behaviour and requirements that is constrained by the data and technology at our disposal. We force users to “do what we allow them to do” rather than making it possible for them to do what they want.

This isn’t to say that there might not be a place for library-specific discovery systems, but I doubt they will be the norm, and they will be created for very specific, well-defined purposes. Library discovery will increasingly happen within the semantic web, made possible by linked data technologies and allowing our users to achieve their own goals by harnessing linked data and the technologies that work with that data. What I see happening is, as the linked data infrastructure of the semantic web becomes more mature, users will use querying techniques tailored for linked data to explore knowledge, to explore facts about the world. Eventually, this exploration may lead them to a resource that requires library support (a license, a subscription, proxying, etc), and it is at that point that library systems have to integrate with a user’s semantic workflow in order to connect them to the resource they want. This means that in addition to creating linked data and repository software, in addition to making our resources discoverable on the open web, we need mechanisms to make the connection between a user and an institution, technologies to handle authentication and proxying within the semantic web, and ways for users to continue their work within the web with support from fully integrated library data and technologies. Given this view of the future, it is unlikely that libraries will run out of work to be done.
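The hand-off I have in mind can be sketched in a few lines of Python. Everything here is hypothetical: the URIs, the `ENTITLEMENTS` table, and the `resolve_access` function are invented for illustration and do not correspond to any existing library system or standard. The point is only the shape of the interaction: the user's exploration happens elsewhere, and the library is consulted at the moment a licensed resource turns up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    uri: str
    requires_license: bool


# Hypothetical table of which institution licenses which resources.
ENTITLEMENTS = {
    "https://library.example.edu": {"https://publisher.example.org/article/42"},
}


def resolve_access(resource, institution=None):
    """Decide how a user reaches a resource found while exploring linked data.

    Returns "open" for freely available resources, "via-library" when the
    user's institution licenses the resource, and "no-access" otherwise.
    """
    if not resource.requires_license:
        return "open"
    if institution is not None and resource.uri in ENTITLEMENTS.get(institution, set()):
        return "via-library"
    return "no-access"


article = Resource("https://publisher.example.org/article/42", requires_license=True)
print(resolve_access(article, "https://library.example.edu"))  # via-library
print(resolve_access(article))                                 # no-access
```

In a real setting the "via-library" branch is where authentication and proxying would have to happen, and that is exactly the part we have not yet worked out for a linked data context.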

As I say, I think that we are currently working very hard within the metadata/cataloguing/repository side of our field, but on the discovery side, we seem to be focusing primarily on search engine optimization (either with Schema.org or by converting MARC records to BIBFRAME for the Libhub initiative). But we haven’t seen much done in the area of thinking about how our users might begin to explore this new data infrastructure, and we haven’t begun to think about how, for example, proxying might be done in a linked data context. Our attitude towards the library web site and the discovery system hasn’t fundamentally changed in forty years, and this attitude is, I think, holding us back from really participating in the linked data world. Right now two of our goals should be continuing the work we have already started in linked data and arguing against the continuation of out-of-date data and technologies that are doing nothing but preventing us from participating in the world of linked data.

[A note about vendors: it should go without saying that our vendors are some of the worst offenders in maintaining an outdated view of library systems and data. I think the Hydra and Islandora projects show that if we want to move forward with linked data technologies, open source is pretty much the only viable solution.]

[Note: I’ve learned pretty much everything I’ve talked about here through conversations with and talks given by Karen Coyle and Tom Johnson. All the good stuff is theirs, all the inaccuracies are my own].

Sam Popowich

Discovery and Web Services Librarian, University of Alberta
