Day 1 · 2014-12-01 Preconference

09:00 - 12:00

Collocated Events

Meeting of the DINI AG KIM, Germany

Stefanie Rühle

13:00 - 19:00

Workshops and Tutorials

Introduction to Linked Open Data

Felix Ostrowski / Adrian Pohl
graphthinking, Germany / hbz, Germany


This introductory workshop covers the fundamentals of Linked Data technologies on the one hand and the basic legal issues of Open Data on the other. The RDF data model will be discussed, along with the concepts of dereferenceable URIs and common vocabularies. The participants will continuously create and refine RDF documents to strengthen their knowledge of the topic. Linked Data tenets such as publishing RDF descriptions in a web environment and utilizing content negotiation will be demonstrated and applied by the participants. Aggregating data from several sources and querying this data will showcase the advantages of publishing Linked Data, and RDF Schema will be introduced as an effective means of data integration. On a side track, Open Data principles will be introduced, discussed and applied to the content created during the workshop.
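
A minimal sketch of two of these building blocks, assuming Python with rdflib and requests installed; all example.org URIs and the sample title are hypothetical:

```python
# A minimal illustration (not workshop material): a small RDF description in
# rdflib and a content-negotiation request. All example.org URIs and the sample
# title are hypothetical.
import requests
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import RDF

DCT = Namespace("http://purl.org/dc/terms/")

g = Graph()
g.bind("dct", DCT)

book = URIRef("http://example.org/id/book/1")  # a dereferenceable URI for a "thing"
g.add((book, RDF.type, DCT.BibliographicResource))
g.add((book, DCT.title, Literal("Linked Data in Bibliotheken", lang="de")))
g.add((book, DCT.creator, URIRef("http://example.org/id/person/42")))

print(g.serialize(format="turtle"))  # one possible serialization of the description

# Content negotiation: ask the same URI for Turtle instead of HTML.
response = requests.get("http://example.org/id/book/1",
                        headers={"Accept": "text/turtle"})
print(response.headers.get("Content-Type"))
```

In a working Linked Data setup the server would answer such a request with an RDF serialization of the resource description rather than an HTML page.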

Introducing RDFa with schema.org for web applications

Dan Scott
Laurentian University, Canada


Part 1 - Introducing RDFa: While the W3C issued the first RDFa recommendation in 2008, it remained a niche method for expressing linked data. However, the RDFa 1.1 specification in 2013 and the creation of RDFa Lite brought renewed interest in this means of uniting the semantic and the document web. This section of the workshop introduces the core concepts of RDFa Lite through a series of hands-on exercises that progressively enrich a sample document with RDFa structured data.

Part 2 - Adding schema.org structured data via RDFa to library applications: The schema.org vocabulary was created by the major search engines to provide a common means of publishing metadata about things that normal humans search for, such as events, people, and products. While one advantage of adopting schema.org for your library applications is that your library may benefit from more precise and enriched search results for the resources it holds, you can also enhance access to your organizational information such as locations, operating hours, and employee listings. This section of the workshop introduces the schema.org vocabulary through a set of hands-on exercises addressing use cases that are common to libraries, such as marking up different resource types and publishing organizational information.

Part 3 - Configuring a structured data search engine: Google's Custom Search Engine offers a quick-to-deploy but highly configurable search service that can take advantage of structured data. This section of the workshop will lead participants through creating a simple structured data search engine, then introduce some of the refinements that are available for more advanced use cases.

Part 4 - Crawling and extracting structured data: For reasons such as privacy or control, it is often preferable to create one's own search engine. This section of the workshop introduces a simple web crawler written in Python that uses RDFLib to extract structured data and store it so that it can be indexed and retrieved.

Prerequisites: Attendees are expected to have a working knowledge of HTML. Some familiarity with Python would be helpful for Part 4 but is not essential.
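
As a taste of Part 4, a minimal sketch of RDFa extraction with RDFLib; the URL is a hypothetical example, and the snippet assumes an rdflib version that bundles the RDFa parser (e.g. rdflib 4.x with html5lib installed):

```python
# A sketch of the Part 4 idea only: fetch a page and extract its RDFa into an
# rdflib graph. The URL is hypothetical; RDFa parsing assumes an rdflib version
# that bundles the RDFa parser (e.g. rdflib 4.x with html5lib installed).
from rdflib import Graph

url = "http://example.org/library/record/123"  # hypothetical page with RDFa markup
g = Graph()
g.parse(url, format="rdfa1.1")

# Each extracted triple could now be stored and indexed for later retrieval.
for subject, predicate, obj in g:
    print(subject, predicate, obj)
```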

RDF Application Profiles in Cultural Heritage

Valentine Charles / Evelyn Dröge / Stefanie Rühle / Karen Coyle / Kai Eckert
Europeana / HU Berlin / SUB Göttingen / / Mannheim University


The DCMI RDF Application Profiles Task Group (RDF-AP) deals with the development of recommendations regarding the proper creation of data models, in particular the proper reuse of existing data vocabularies. The Digitised Manuscripts to Europeana project (DM2E) was a driving factor in establishing the task group and provides one of the main case studies. Like many other projects in the domain of cultural heritage, DM2E created its own data model based on the Europeana Data Model (EDM), which itself reuses parts of other vocabularies like OAI-ORE and Dublin Core and allows the creation of rich mappings. This mix-and-match of vocabularies and the documentation of project-specific characteristics or even constraints in using the data model is called an application profile. In this workshop, we will first introduce the DM2E model as a typical application profile, as well as the infrastructure developed in the project that determines how the application profile is actually used. The DM2E model not only describes digitised manuscripts, it is also used to model workflows for the transformation of the metadata and to represent the provenance of the metadata. Data represented in the DM2E model can be accessed via a SPARQL endpoint and search interfaces. As the DM2E model is backwards compatible with the EDM, the data is also provided in plain EDM for ingestion into Europeana. The current status of the DCMI RDF Application Profiles Task Group will be presented, including a brief overview of the other case studies and the identified requirements for a generally applicable solution. We then want to discuss these requirements and work towards practicable solutions together with the participants, who can reflect on all approaches based on their own requirements in various institutions and projects. Concrete topics will be:
• defining application profiles as “local” versions of existing data models
• creating an application profile that remains interoperable
• formulating constraints regarding your data within an application profile
• linking your data to an application profile
• validating your data with respect to an application profile
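
For illustration only (not the RDF-AP or DM2E validation machinery): one way to check a simple application-profile-style constraint with a SPARQL query in rdflib. The constraint, the property choices and the file name are assumptions made for this sketch:

```python
# Illustration only, not the RDF-AP/DM2E machinery: check one application-
# profile-style constraint ("every ore:Aggregation needs edm:isShownAt or
# edm:isShownBy") with a SPARQL query in rdflib. The constraint, the property
# choices and the file name are assumptions made for this sketch.
from rdflib import Graph

g = Graph()
g.parse("dm2e-sample.ttl", format="turtle")  # hypothetical data file

violations = g.query("""
    PREFIX ore: <http://www.openarchives.org/ore/terms/>
    PREFIX edm: <http://www.europeana.eu/schemas/edm/>
    SELECT ?agg WHERE {
        ?agg a ore:Aggregation .
        FILTER NOT EXISTS { ?agg edm:isShownAt ?page }
        FILTER NOT EXISTS { ?agg edm:isShownBy ?file }
    }
""")
for row in violations:
    print("Constraint violated by", row.agg)
```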

Catmandu - Importing, transforming, storing and indexing data should be easy

Johann Rolschewski / Jakob Voß
Staatsbibliothek zu Berlin, Germany / Verbundzentrale des GBV (VZG), Germany


Catmandu provides a suite of software modules to ease the import, storage, retrieval, export and transformation of metadata records. Combine Catmandu modules with web application frameworks such as PSGI/Plack, document stores such as MongoDB and full-text indexes such as Elasticsearch to create a rapid development environment for digital library services. After a short introduction to Catmandu and its features, we will present the domain-specific language (DSL) and command line interface (CLI). Participants will be guided to transform (their) data records to a common metadata model, to store and index them in Elasticsearch or MongoDB, and to export them as Linked Data. Prior Experience: We will be using a simplified DSL. Participants should be familiar with command line interfaces (CLI). Any programming experience is welcome but not required. Requirements: Laptop with VirtualBox installed. Organisers will provide a VirtualBox image (Linux guest system) beforehand. Participants should bring their own data (CSV, JSON, MAB2, MARC, PICA+, RDF or YAML).

Analysis and Transformation of Library Metadata with Metafacture

Christoph Böhme / Pascal Christoph
Deutsche Nationalbibliothek, Germany / hbz, Germany


Metafacture is a versatile Java-based open source toolkit for all metadata-related tasks. It was developed in the Culturegraph project. Since then it has become an important part of the software infrastructure at the German National Library. Core applications of Metafacture are metadata transformation, search indexing and statistical analysis of metadata. Despite originating from the library domain, Metafacture is format-agnostic and has successfully been employed in metadata-related tasks in other domains. In this workshop participants will learn how to use Metafacture to analyse large datasets. After a short introduction to Metafacture, three types of analyses that are often encountered in day-to-day work at a library will be presented: counting distinct data values, quantifying relationships between metadata records, and joining metadata records. Participants will have the opportunity to perform these analyses themselves and to discuss their approaches. Prior Experience: No programming experience is required but participants should have advanced computer skills. Participants should have a basic understanding of XML. A library background is not necessary as the analyses presented in the workshop are applicable to other areas as well. Requirements: A laptop with a VirtualBox installation is required (or any other virtualisation environment that can open OVA files). Alternatively, users can install a custom Eclipse package prior to the workshop.

Day 2 · 2014-12-02 Conference

09:00 - 10:15



Alexander Vogt / Silke Schomburg / Timo Borst
Member of NRW Parliament, SPD (Social Democratic Party) spokesman for media policy / North Rhine-Westphalian Library Service Center (hbz) / ZBW Leibniz Information Centre for Economics

Using Linked Data to semantically annotate the BBC's content

Tom Grahame
BBC, United Kingdom


Linked Data at the BBC emerged as a set of ideas, techniques and technologies to build websites, and has gone on to show how those techniques can improve and simplify production workflows and provide interesting automated aggregations for our audiences. The success of applying the technology to deliver the online coverage of major sporting events has demonstrated the potential for reusing the semantic infrastructure as a central part of the BBC production workflow. To support this, the vision of semantic publishing at the BBC evolved towards connecting content around the things that matter to our audiences - those things can be politicians, athletes or musicians, places or organisations, topics of study or events. The BBC produces a plethora of content every day about these things, and the content varies from news articles to programmes, educational guides, clips and recipes. Because it is commissioned and used in different audience-facing products, this content is mastered in separate and disconnected systems, yet the things that the content is about are the same. By semantically describing and annotating the content with the things it is about, we enable journalists and content editors to access heterogeneous and previously isolated creative works in a unified manner. In this talk I will describe how BBC Sport's use of Linked Data has evolved from developing a single website covering the 2010 World Cup to supporting the annotation and dynamic aggregation of daily sports coverage and every major event including London 2012, Sochi 2014 and the 2014 FIFA World Cup. I will also discuss how the same platform and technology approach is being deployed across the BBC in domains as diverse as Education, News, Radio and Music, and how a Linked Data approach could be applied to similar challenges in the library environment.

10:15 - 10:45

Coffee break

10:45 - 12:15

LOD Applications

Applying a Dublin Core application profile to the digital Pina Bausch archive for ontology population management and data presentation purposes

Kerstin Diwisch / Bernhard Thull
Hochschule Darmstadt, Germany


In recent years, it has become common practice in Linked Data application development to utilize application profiles for describing and constraining the metadata and structure of these applications. Description Set Profiles (DSP), proposed by the Dublin Core Metadata Initiative (DCMI) as a means of machine-readable formalization of application profiles, take this development into account. However, even with the use of application profiles, Linked Data application developers are still left with challenges concerning data input and the presentation of data, especially for large datasets. An example of a large dataset is the digital Pina Bausch archive, which is realized as a Linked Data archive containing data on diverse materials such as manuscripts, choreography notes, programmes, photographs, posters, drawings and videos related to Pina Bausch’s work, adding up to about 20 million triples. To address this topic, we developed and implemented a Description Set Profile in RDF containing the metadata structure and constraints for the Pina Bausch archive ontology. Initially, the DSP was mostly used as a controller for an interactive form-based data editor. In the course of the project, we enhanced the DSP with information about frontend data presentation, so that it is now used not only to control the data input flow but also as part of a controller for our archive browser. In this talk, we will provide insight into the development of the DSP and discuss the approach.

Using Graphity Linked Data Platform for Danish newspaper registry. From printed books to Linked Data

Martynas Jusevicius / Dziugas Tornau
UAB Linked Data (Lithuania) / Semantic Web (Denmark), UAB Linked Data (Lithuania)


Danish Newspapers is a registry of newspapers with historical and factual metadata records. Three printed volumes with metadata about Danish newspaper publishing were scanned, the text optically recognized, and the result marked up using the TEI XML schema. The XML was converted into an XML-based RDF quad format using multiple domain-specific vocabularies (SKOS, BIBO, the Time ontology etc.) in a custom XSLT stylesheet. Rich unstructured text was preserved as XHTML literals, with links to images stored as JPG files. The data was stored in the Dydra cloud triplestore and presented as a Web application that publishes Linked Data as well as user-friendly and mobile-ready XHTML. It features interactive maps, faceted and text search, autocomplete, and complex content creation and editing for authenticated users. URI templates, SPARQL queries, data quality constraints and access control were defined declaratively as RDF data and processed at runtime by the Graphity platform, while XSLT stylesheets were used to generate a customized XHTML layout and facets. The Graphity processor is open source and works with any SPARQL 1.1 triplestore, while the commercial platform layer provides multi-tenant features such as access control and faceted search. Linking with external data sources and alignment with standard models were not in scope for this project, as the focus was on data conversion, content presentation and data filtering. The data can be mapped to generic bibliographic vocabularies such as BIBFRAME and EDM using SPARQL Update.

Entity Facts - A light-weight authority data service

Christoph Böhme / Michael Büchner
Deutsche Nationalbibliothek, Germany / Deutsche Digitale Bibliothek, Germany


The German National Library has published a new web service called Entity Facts. The main goal of Entity Facts is to provide aggregated information about entities from various sources in a way that makes it easy to present this data to the user. The information provided by Entity Facts is based on the Integrated Authority File (Gemeinsame Normdatei, GND) - the main authority file used in the German-speaking world - and merged with other sources such as Wikipedia, VIAF or IMDb. The information is provided as machine- and human-readable data in a straightforward and lightweight way via an Application Programming Interface (API). Our intention is to enable the reuse of authority data by developers who do not have domain-specific knowledge. This is realized through an easy-to-understand JSON-LD data model which provides ready-to-use data. Linking to and merging data from different sources offers new and ever-improving possibilities for data enrichment. The infrastructure of the service is designed so that the data sets can be extended and updated easily. In our contribution we introduce the main goals, the present status and the features of Entity Facts we plan to develop in the future. The Deutsche Digitale Bibliothek (DDB) acts as a pilot partner and is the first client to use the service in a productive scenario. The data sets provided by Entity Facts (e.g. the data set for J. W. v. Goethe) establish the basis for the entity pages about persons in the DDB portal. A short demonstration will illustrate the current functionality. The presentation will close with a road map for the future development of Entity Facts.
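
A minimal sketch of consuming such a data set; the endpoint URL and the field name used below are assumptions about the service and may differ:

```python
# A minimal consumer sketch; the endpoint URL and the "preferredName" field are
# assumptions about the service and may differ. 118540238 is the GND identifier
# for Johann Wolfgang von Goethe.
import requests

gnd_id = "118540238"
response = requests.get("https://hub.culturegraph.org/entityfacts/" + gnd_id,
                        headers={"Accept": "application/json"})
entity = response.json()  # a JSON-LD document describing the entity
print(entity.get("preferredName"))
```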

12:15 - 13:45


13:45 - 15:30

Multilingual Data

Wikidata: A Free Collaborative Knowledge Base

Markus Krötzsch
TU Dresden, Germany


Wikidata, the free knowledge base of Wikipedia, is one of the largest collections of human-authored structured information that are freely available on the Web. It is curated by a unique community of tens of thousands of editors who contribute in up to 400 different languages. Data is stored in a language-independent way, so that most users can access information in their native language. To support plurality, Wikidata uses a rich content model that gives up on the idea that the world can be described as a set of “true” facts. Instead, statements in Wikidata provide additional context information, such as temporal validity and provenance (in particular, most statements in Wikidata already provide one or more references). One could easily imagine this leading to a rather chaotic pile of disjointed facts that are hard to use or even navigate. However, large parts of the data are interlinked with international authority files, catalogues, databases, and, of course, Wikipedia. Moreover, the community strives to reach “global” agreement on how to organise knowledge: over 1,000 properties and 40,000 classes are currently used as an ontology of the system, and many aspects of this knowledge model are discussed extensively in the community. Together, this leads to a multilingual knowledge base of increasing quality that has many practical uses. This talk gives an overview of the project, explains design choices, and discusses emerging developments and opportunities related to Wikidata.
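
A small sketch of how the statement model (values plus references) surfaces in the data, using Wikidata's public entity data endpoint; Q42 is the item for Douglas Adams and P569 the property "date of birth":

```python
# A small sketch using Wikidata's public entity data endpoint: print each
# "date of birth" statement of Q42 (Douglas Adams) together with the number of
# references backing it.
import requests

item = "Q42"
data = requests.get(
    "https://www.wikidata.org/wiki/Special:EntityData/%s.json" % item).json()

for statement in data["entities"][item]["claims"].get("P569", []):
    value = statement["mainsnak"]["datavalue"]["value"]["time"]
    references = statement.get("references", [])
    print(value, "-", len(references), "reference(s)")
```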

When Semantics support Multilingual Access to Digital Cultural Heritage - the Europeana case

Valentine Charles / Juliane Stiller
Europeana Foundation, The Netherlands / Humboldt-Universität zu Berlin, Germany


For Europeana, the platform for Europe’s digital cultural heritage from libraries, museums and archives, multilingual access is a priority. Breaking down language barriers is an ongoing challenge with many facets, as Europeana provides content coming from 36 different countries and serves users across the world. For Europeana, multilingual access not only means the translation of the interface, but also comprises retrieving, browsing and understanding documents in languages the users do not speak. This talk will present the solutions implemented at Europeana enabling multilingual retrieval and browsing. Europeana leverages the semantic data layer by linking multilingual and open controlled vocabularies to objects. The Europeana Data Model (EDM) allows for semantic and multilingual metadata descriptions and gives support for contextual resources, including concepts from “value vocabularies” coming either from Europeana’s network of providers or from third-party data sources. To enable retrieval across languages and enhance data semantically, Europeana performs automatic metadata enrichment with external value vocabularies and datasets such as GEMET, GeoNames and DBpedia. Providers are also encouraged to send links from open vocabularies such as AAT, GND, Iconclass and VIAF or from their domain vocabularies following the EDM recommendations for contextual resources, especially when these vocabularies contain labels in different languages. By re-using these vocabularies, Europeana not only pursues efforts to demonstrate the potential of Linked Open vocabularies by exploiting their semantic relations and translations, but also aims at making Europeana truly multilingual.

Lightning talks


15:30 - 16:00

Coffee break

16:00 - 17:30

Publishing and Aggregation

schema.org: machine-readable cataloguing for the open web

Dan Scott
Laurentian University, Canada


The schema.org vocabulary was created by the major search engines (Bing, Google, Yahoo!, and Yandex) in 2011 to provide a common means of expressing metadata about popular search topics such as events, people, and products. While schema.org was enthusiastically adopted by web sites hoping for enhanced search results and rankings, libraries have integrated the vocabulary more cautiously. This session examines efforts to use schema.org to provide access points for library resources in major search engines via bibliographic metadata, holdings data, and data about libraries themselves. We highlight advances made by the integrated library systems Evergreen and Koha, discovery systems such as Blacklight and VuFind, and repositories such as Islandora and ScholarSphere in publishing metadata (from unstructured data, to structured data, to linked data). The role of the W3C Bibliographic Extension community group in filling gaps in schema.org and documenting best practices for libraries is also discussed. Finally, we show how common workflows (such as creating union catalogues and checking item availability) that currently rely on niche, library-specific protocols can be simplified and built with standard web tools by embracing a truly machine-readable vocabulary.
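
A minimal sketch of the kind of machine-readable description meant here, built with rdflib; the catalogue URIs and the title are hypothetical, and a real deployment would embed the same statements as RDFa or JSON-LD in catalogue pages:

```python
# A minimal sketch of a schema.org description of a library resource with a
# holding, built with rdflib; the catalogue URIs and the title are hypothetical.
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import RDF

SCHEMA = Namespace("http://schema.org/")

g = Graph()
g.bind("schema", SCHEMA)

book = URIRef("http://catalogue.example.org/record/42")
holding = URIRef("http://catalogue.example.org/record/42#holding")

g.add((book, RDF.type, SCHEMA.Book))
g.add((book, SCHEMA.name, Literal("Cataloguing for the Open Web")))
g.add((book, SCHEMA.offers, holding))
g.add((holding, RDF.type, SCHEMA.Offer))
g.add((holding, SCHEMA.availability, SCHEMA.InStock))

print(g.serialize(format="turtle"))
```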

d:swarm - A Library Data Management Platform Based on a Linked Open Data Approach

Jens Mittelbach / Robert Glaß / Ralf Talkenberger / Felix Lohmeier
SLUB Dresden, Germany / Avantgarde Labs


The rise of the concept of resource discovery, the increasing multiplicity of information channels and the exploding complexity of the technological infrastructure confront libraries with organizational and financial challenges. Library data has become more heterogeneous, and its sources have grown manifold. Bibliographic and authority data, licence and business data, usage data from library catalogues and the global science community (bibliometric data) as well as open data from the WWW constitute the graph that describes the resources managed by libraries. Consequently, there is an increasing need to integrate, normalize, and enrich existing library data sets as well as to assure data quality for production and presentation purposes. The Saxon State and University Library Dresden has chosen a new approach to data integration for libraries and other cultural heritage institutions. In the EFRE-funded project, a scalable cloud-based data management platform called d:swarm has been implemented. Featuring an easy-to-use web-based modelling GUI, d:swarm allows for the integration and interlinkage of heterogeneous data sources in a flexible property graph store. As a middleware layer, it runs on top of existing library software infrastructures. Thus, existing library workflows depending on a variety of software solutions can remain untouched while data integration can be flexibly tailored to the needs of the individual institution. Using d:swarm, feeding a library’s discovery front-end with high-quality normalized data or disseminating Linked Open Data is much easier. The project is published under an open source licence.

Linked Open Data in Aggregation Scenarios: The Case of The European Library

Nuno Freire
The European Library, Portugal


The paradigm of Linked Data brings many new challenges to libraries. The generic nature of the data representation used in Linked Data, while it allows any community to manipulate the data seamlessly, also leaves many possible paths to its implementation. The European Library Open Dataset is derived from the collections aggregated from member libraries. The dataset is published as Linked Open Data (LOD) and made available under the Creative Commons CC0 license, in order to promote and facilitate the reuse of the data by all communities. This presentation describes the experience of The European Library in the creation of this linked dataset, the data model, and the perspectives on the benefits of linking library data in large aggregation contexts. The dataset includes national bibliographies, library catalogues, and research collections. It addresses the linking of subject heading systems widely used in Europe by exploiting MACS (Multilingual Access to Subjects), since ontologies are key in many LOD applications, particularly in research. The task of creating LOD is demanding in terms of human and computational resources, and of expertise in both information and computer science. Library aggregators provide an organizational environment where conducting LOD activities becomes less demanding for libraries. This kind of organization can leverage existing information and communication technologies, the centralization of data, and its expertise in both library data and the semantic web.

Day 3 · 2014-12-03 Conference

09:00 - 10:15


Moving from MARC: How BIBFRAME moves the Linked Data in Libraries conversation to large-scale action

Eric Miller
Zepheira, United States of America


As the Library of Congress looked to the future of MARC, they looked to Linked Data principles and Semantic Web standards as the foundation of BIBFRAME. Libraries have an extensive history with MARC as a sophisticated and highly customized descriptive vocabulary, with billions of records spread across systems and providers. In order to recognize the value of connecting this legacy in new and contemporary ways, BIBFRAME’s design is intentionally extensible, with Profile-based vocabularies, flexible transformation utilities, and iterative linking strategies in mind. The migration from MARC (and other related library standards) to BIBFRAME offers the most widely actionable opportunity for libraries to adopt Linked Data as a foundation of their Web visibility and internal operations.
This session will include a review of practical tools we have used in helping libraries:
• evaluate their current data
• define local data priorities
• perform large-scale transformation
• create profile-based definitions for original content
• identify linking options
• move beyond simply representing legacy data to take full advantage of the Linked Data nature of Web vocabularies like BIBFRAME and schema.org
We benefit from looking back at how libraries have helped shape the Web of Data, and looking forward to how, given these standards, we can together raise the visibility of libraries on the Web.

Weaving repository contents into the Semantic Web

Pascal-Nicolas Becker
Technische Universität Berlin - Universitätsbibliothek, Germany


Repositories are systems to safely store and publish digital objects and their descriptive metadata. Repositories mainly serve their data through web interfaces that are primarily oriented towards human consumption. They either hide their data behind non-generic interfaces or do not publish it at all in a way a computer can easily process. At the same time, the data stored in repositories are particularly suited for use in the Semantic Web, as metadata are already available. They do not have to be generated or entered manually for publication as Linked Data. In my talk I will present a concept of how metadata and digital objects stored in repositories can be woven into the Linked (Open) Data Cloud, and which characteristics of repositories have to be considered while doing so. One problem it targets is the use of existing metadata to present Linked Data. The concept can be applied to almost every repository software. At the end of my talk I will present an implementation for DSpace, one of the most widely used repository software solutions. With this implementation every institution using DSpace should be able to export their repository content as Linked Data.

10:15 - 10:45

Coffee break

10:45 - 12:15

Knowledge Organization Systems

KOS evolution in Linked Data

Joachim Neubert
ZBW Leibniz Information Centre for Economics, Germany


Over time, Knowledge Organization Systems such as thesauri and classifications undergo many changes as the knowledge domains evolve. Most SKOS publishers therefore put a version tag on their vocabularies. With the vocabularies interwoven in the open web of data, however, different versions may be the basis for references in other datasets. So, updates by "third parties" are required, in indexing data as well as in mappings from or to other vocabularies. Yet answers to simple user questions such as "What's new?" or "What has changed?" are not easily obtainable. Best practices and shared standards for communicating changes precisely and making them (machine-) actionable still have to emerge. The STW Thesaurus for Economics is currently undergoing a series of major revisions. In a case study we review the amount and the types of changes in this process, and demonstrate how versioning in general and difficult types of changes such as the abandonment of descriptors in particular are handled. Furthermore, a method to get a tight grip on the changes, based on SPARQL queries over named graphs, is presented. And finally, the skos-history activity is introduced, which aims at the development of an ontology/application profile and best practices to describe SKOS versions and changes.
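
An illustrative sketch of the named-graph idea (not the actual skos-history tooling): load two thesaurus versions into named graphs with rdflib and list concepts that disappeared between them. Version URIs and file names are hypothetical:

```python
# Illustrative sketch only (not the actual skos-history tooling): load two
# thesaurus versions into named graphs and list concepts present in the old
# version but missing from the new one. Version URIs and file names are
# hypothetical; it assumes each dump types its concepts as skos:Concept.
from rdflib import Dataset, URIRef

OLD = URIRef("http://example.org/stw/version/8.14")
NEW = URIRef("http://example.org/stw/version/9.0")

ds = Dataset()
ds.graph(OLD).parse("stw_8.14.ttl", format="turtle")
ds.graph(NEW).parse("stw_9.0.ttl", format="turtle")

gone = ds.query("""
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT ?concept WHERE {
        GRAPH <http://example.org/stw/version/8.14> { ?concept a skos:Concept }
        FILTER NOT EXISTS {
            GRAPH <http://example.org/stw/version/9.0> { ?concept a skos:Concept }
        }
    }
""")
for row in gone:
    print("No longer present:", row.concept)
```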

Publish your SKOS vocabularies with Skosmos

Henri Ylikotila / Osma Suominen
The National Library of Finland, Finland


Skosmos is an open source web-based SKOS browser being developed at the National Library of Finland. It can be used by libraries and archives, for example, as a publishing platform for controlled vocabularies such as thesauri, lightweight ontologies, classifications and authority files. The Finnish national thesaurus and ontology service Finto is built using Skosmos, which was formerly known as ONKI Light. Finto is used by indexers at the National Library and other libraries, as well as by other organizations including the Finnish broadcasting company YLE and many museums. It is also used to support vocabulary development processes. Skosmos provides a multilingual user interface for browsing and searching the data and for visualizing concept hierarchies. The user interface has been developed by analysing the results of repeated usability tests. A developer-friendly REST API is also available, providing RDF/XML, Turtle or JSON-LD serializations and Linked Data access for utilizing vocabularies in other applications such as annotation systems. Skosmos relies on a SPARQL endpoint as its back-end and is written mainly in PHP. The main benefit of using a SPARQL endpoint is that the data provided by the service is always up to date. This allows fast update cycles in vocabulary development. Skosmos can be configured to suit different types of RDF data. The source code is available under the MIT license.
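
A small sketch against the Skosmos REST API, here pointed at the public Finto service; the endpoint path and response fields are assumptions based on the publicly documented API and may differ between installations:

```python
# The endpoint path and response fields below are assumptions based on the
# publicly documented Skosmos REST API and may differ between installations.
import requests

response = requests.get("http://api.finto.fi/rest/v1/search",
                        params={"query": "cat*", "lang": "en"})
for hit in response.json().get("results", []):
    print(hit.get("prefLabel"), hit.get("uri"))
```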

Turning three overlapping thesauri into a Global Agricultural Concept Scheme

Thomas Baker / Osma Suominen
Sungkyunkwan University, Korea / National Library of Finland, Finland


The AGROVOC Concept Scheme, the CAB Thesaurus (CABT) and the NAL Thesaurus (NALT) largely overlap in scope (agriculture and agricultural research). This duplication is both inefficient for their maintainers and a barrier to searching across databases indexed with their terms. Common representation in SKOS makes mapping easier, but it would in principle be more efficient to merge the thesauri into one shared concept scheme jointly maintained by the three organizations. A feasibility study has defined a semi-automatic method for mapping among the three thesauri. Confirmed mappings will be used to coin new concepts, with new URIs, for a shared Global Agricultural Concept Scheme (GACS). One key challenge will be to balance the inclusion of diverse concept hierarchies from the source thesauri against a desire to converge on common semantics through editorial intervention. Partners who currently use their thesauri to automatically generate derivative products will need to balance the efficiencies of sharing a concept scheme with the control required for local production processes. GACS will be natively represented as SKOS XL, edited using the VocBench software and published using the Skosmos platform (both open source software) under a Creative Commons license. The GACS project aspires to constitute a consortium open to other thesaurus maintainers. The first version of GACS will be available online in time for a presentation of lessons learned at SWIB 2014.

12:15 - 13:45


13:45 - 15:30

Linking Things

All knowledge, annotated

Dan Whaley, United States of America


This August the W3C chartered the Web Annotation Working Group, based around an RDF data model called Open Annotation, which is rapidly being integrated into new open source tools and software libraries and adopted by a diverse cross-section of scholars, scientists, educators and others. The potential is to create a new layer over the web as we know it, enabling a rich set of interactive capabilities that until now have not been possible. This talk will provide an overview of the history behind annotation as an essential idea of the web, demonstrate some of the ways it's being used and suggest plans for further development.
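
A minimal sketch of an Open Annotation description expressed in RDF with rdflib; the annotation, body and target URIs are hypothetical examples:

```python
# A minimal Open Annotation description in rdflib; annotation, body and target
# URIs are hypothetical examples.
from rdflib import Graph, URIRef, Namespace
from rdflib.namespace import RDF

OA = Namespace("http://www.w3.org/ns/oa#")

g = Graph()
g.bind("oa", OA)

anno = URIRef("http://example.org/anno/1")
g.add((anno, RDF.type, OA.Annotation))
g.add((anno, OA.hasBody, URIRef("http://example.org/comments/1")))
g.add((anno, OA.hasTarget, URIRef("http://example.org/page/7")))
g.add((anno, OA.motivatedBy, OA.commenting))

print(g.serialize(format="turtle"))
```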

Supporting Data Interlinking in Semantic Libraries with Microtask Crowdsourcing

Cristina Sarasua
Institute for Web Science and Technologies (WeST). University of Koblenz-Landau, Germany


Semantic Web technologies enable the integration of distributed data sets curated by different organisations and with different purposes. Descriptions of particular resources (e.g. events, persons or images) are connected through links that explicitly state the relationship between them. By connecting data from similar or disparate domains, libraries can offer more extensive and detailed information to their visitors, while librarians have better documentation for their cataloguing activities. Despite the advances in data interlinking technology, human intervention is still a core aspect of the process. Humans, in particular librarians, are crucial both as knowledge providers and as reviewers of the automatically computed links. One of the problems that arises in this scenario is that libraries might have limited human resources dedicated to authority control, so running the time-consuming interlinking process over external data sets becomes troublesome. Microtask crowdsourcing provides an economic and scalable way to involve humans systematically in data processing. The goal of this talk is to introduce the process of crowdsourced data interlinking in semantic libraries, a paid crowd-powered approach that can support librarians in the interlinking task. Several use cases are described to illustrate how our software, which implements the crowdsourced data interlinking process, could be useful to reduce the amount of information that librarians would need to process when enriching their data with other sources, or to obtain a different perspective from potential users. In addition, challenges that become relevant when adopting this approach are listed.

Entification: The Route to 'Useful' Library Data

Richard Wallis
OCLC, United Kingdom


Linked Data is all about identifying 'things', then describing them and their relationships in a web of other 'things', or entities. Many library linked data initiatives have focused on directly transforming records into RDF, with little linking between the shared concepts captured within those records or to external authoritative representations of the same things. The British Library, with a linked data version of the British National Bibliography, was an early pioneer in attempting to model real world entities as a foundation for their data model. Similar research within OCLC, which led to the release of entities as open linked data from WorldCat, such as Works, has demonstrated the benefits of such an approach. It also demonstrates that there is much more than record-by-record format conversion required to successfully achieve a web of real world entities. Significant data mining processes, the open availability of authoritative data hubs (such as VIAF, FAST, and the Library of Congress), and the use of flexible and widely accepted vocabularies all play a necessary part in this success. Richard will explore some of the issues and benefits of creating library data as descriptions of real world entities, and share some insights into the processes required and their results.


Please note that all information may be subject to change.

organized by:


Friedrich-Ebert-Stiftung Bonn
Bonner Haus
Godesberger Allee 149
53175 Bonn


Adrian Pohl
Tel. +49-(0)221-40075235
E-Mail: swib(at)

Joachim Neubert
Tel. +49-(0)40-42834462
E-Mail: j.neubert(at)

Twitter: #swib14