
DAY 1   |   2017-12-04   PRECONFERENCE
Treffen der DINI AG KIM (Meeting of the DINI AG KIM, Germany)
Stefanie Rühle / Jana Hentschke
DINI AG KIM, Germany
Agenda (the meeting is held in German)

Meeting of the working group "Kompetenzzentrum Interoperable Metadaten" (KIM, Competence Centre for Interoperable Metadata) within the Deutsche Initiative für Netzwerkinformation (DINI, German Initiative for Network Information).

  Meeting of the German speaking VIVO Community
Christian Hauschke
Technische Informationsbibliothek (TIB), Germany

User meeting of the German-speaking VIVO community to discuss current concerns and deepen cooperation. Anyone interested is welcome. The meeting will be held in English.

  BIBFRAME Use: Vocabulary, Conversion, Reconciliation
Ray Denenberg / Nate Trail / Wayne Schneider / Leif Andresen
Library of Congress, United States of America / Index Data, Boston, United States of America / Royal Danish Library, Denmark

This Workshop will be an opportunity to probe topics related to BIBFRAME like ontology, MARC data conversion, and merge and match of BF data converted from MARC to produce Works, Instances, and related resources. A portion of the time will treat ontology patterns and the BIBFRAME approach. The leaders have real experience in all these aspects of BIBFRAME and are interested in an exchange of ideas related to issues they have encountered. The Pilot 2 that the Library of Congress is currently immersed in has dealt with the whole continuum from vocabulary to conversion specs to conversion to merge and match to edit.
Sign-up: Send an email to Leif Andresen

Slides: Update

Slides: Ontology Patterns

Slides: Merge and Match

Slides: Conversion Reconciliation

Introduction to Linked Open Data
Christina Harlow / Uldis Bojars / Huda Jaliluddin Khan
Stanford University, United States of America / National Library of Latvia / Cornell University, United States of America

This introductory workshop aims to introduce the fundamentals of linked data technologies on the one hand, and the basic issues of open data on the other. The RDF data model will be discussed, along with the concepts of dereferenceable URIs and common vocabularies. The participants will continuously create and refine RDF documents about themselves including links to other participants to strengthen their knowledge of the topic. Based on the data created, the advantages of modeling in RDF and publishing linked data will be shown. On a side track, Open Data principles will be introduced, discussed and applied to the content that is being created during the workshop.

Attendees are not expected to have any technical, RDF or Linked Open Data experience. We do ask that attendees bring a laptop with a modern web browser for participation.
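
As a rough illustration of the kind of RDF document a participant might write about themselves, the following Python sketch prints a small Turtle description using the FOAF vocabulary. The example.org URIs, the names and the helper function describe_person are assumptions made for illustration; they are not taken from the workshop materials.

```python
# Hypothetical sketch: build a tiny Turtle document that describes one person
# and links to other participants via foaf:knows.
FOAF = "http://xmlns.com/foaf/0.1/"


def describe_person(uri, name, knows=()):
    """Return a Turtle document describing a foaf:Person.

    All arguments are illustrative; the URIs used in the demo below are
    placeholders under example.org, not real identifiers.
    """
    # Escape backslashes and double quotes so the name is a valid Turtle literal.
    safe_name = name.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        f"@prefix foaf: <{FOAF}> .",
        "",
        f"<{uri}> a foaf:Person ;",
        f'    foaf:name "{safe_name}"' + (" ;" if knows else " ."),
    ]
    if knows:
        # Several objects for the same predicate are separated by commas.
        objects = " ,\n        ".join(f"<{other}>" for other in knows)
        lines.append(f"    foaf:knows {objects} .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(describe_person(
        "https://example.org/people/alice",
        "Alice",
        knows=["https://example.org/people/bob"],
    ))
```

Because the foaf:knows object is itself a URI, another participant can publish their own description at that address, which is what turns two separate documents into linked data.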

Slides: Harlow

Slides: Wilcox

  (Meta)data Management with KNIME
Magnus Pfeffer / Kai Eckert
Stuttgart Media University, Germany

KNIME has established itself over the past years as an open platform for all kinds of data processing and analysis. In this workshop, participants can try their hand at using KNIME for basic data processing tasks such as loading data from files, transforming data into an intermediate format and saving data to files. In the second half, the focus will be on more advanced tasks such as using web APIs to enrich local data. Target audience: people with some knowledge of data formats in the bibliographic domain who want to learn about a different approach to data processing. KNIME uses a graphical UI and is quite intuitive. Knowledge of programming languages is helpful, but not required for many tasks.

Prerequisites: Participants should bring their own notebook, which should have at least 4 GB of RAM and a decent CPU. A 64-bit system with a 64-bit operating system is recommended. KNIME is open source and available for all major operating systems. Participants should download and install the latest version that includes all free extensions beforehand. It is a huge download and would overwhelm the internet connection at the conference.


  Managing Assets as Linked Data with Fedora
David John Wilcox
DuraSpace, Canada

Fedora is a flexible, extensible, open source repository platform for managing, preserving, and providing access to digital content. Fedora is used in a wide variety of institutions including libraries, museums, archives, and government organizations. Fedora 4 introduces native linked data capabilities and a modular architecture based on well-documented APIs and ease of integration with existing applications. Recent community initiatives have added more robust functionality for exporting resources from Fedora in standard formats to support complete digital preservation workflows. Both new and existing Fedora users will be interested in learning about and experiencing Fedora features and functionality first-hand.

Attendees will be given pre-configured virtual machines that include Fedora bundled with the Solr search application and a triplestore that they can install on their laptops and continue using after the workshop. These virtual machines will be used to participate in hands-on exercises that will give attendees a chance to experience Fedora by following step-by-step instructions. Participants will learn how to create and manage content in Fedora in accordance with linked data best practices and the Portland Common Data Model. Attendees will also learn how to import resources into Fedora and export resources from Fedora to external systems and services as part of a digital curation workflow. Finally, participants will learn how to search and run SPARQL queries against content in Fedora using the included Solr index and triplestore. (Source code)

Slides: Introduction to Fedora

Slides: Extended Services

  Practical Linked Data Annotations on IIIF Image Resources
Michael Appleby / Tom Crane / Glen Robson / Simeon Warner
Yale University, United States of America / Digirati, United Kingdom / National Library of Wales, United Kingdom / Cornell University, United States of America

The International Image Interoperability Framework (IIIF) defines a set of common application programming interfaces that support interoperability between image repositories. The IIIF APIs are designed to provide a common method for accessing high resolution images and to enable the development of sophisticated client applications that allow image comparison, manipulation, and annotation. A growing community of cultural heritage institutions is publishing content using these APIs, which are based on linked data models and specify JSON-LD serializations that are easily consumed by clients. The ease of access to content afforded by IIIF has accelerated the development of both automated and interactive workflows for enrichment using the Open Annotation Data Model, which plays a fundamental role in the IIIF Presentation API.

The goal of this workshop is to familiarize participants with annotation in the context of IIIF APIs and to provide examples of compatible server and client software.

The workshop will begin with a brief review of the IIIF Image API, followed by an in-depth overview of the IIIF Presentation and Search APIs, with particular emphasis on their use of Open Annotation. Participants will then step through a practical example of creating an online annotation environment using IIIF-compatible web clients (Mirador and the Universal Viewer) and the Simple Annotation Server. Simple workflows involving optical character recognition and named entity recognition will be demonstrated, with results being visualized in the IIIF clients. The workshop will conclude with discussion of future developments of the IIIF data model, including support for time-based media and use of the W3C Web Annotation Data Model.

Participants should have a basic knowledge of linked data and RDF. Attendees should come equipped with a laptop with a modern web browser installed. Those wishing to follow code examples should additionally have Python and an editor available.
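
For readers unfamiliar with the IIIF Image API mentioned above, the sketch below shows how an image request URL is assembled from its path segments (identifier, region, size, rotation, quality and format), following the URI pattern of Image API 2.x. The server base URL and the identifier are hypothetical, and the function name iiif_image_url is an assumption of this example rather than part of any IIIF library.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="full",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    The defaults request the whole image, unrotated, in default quality.
    (Image API 3.0 uses "max" instead of "full" for the size segment.)
    """
    # The identifier must be percent-encoded, including any slashes it contains.
    encoded_id = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


if __name__ == "__main__":
    # Hypothetical server and identifier: a 500x500 pixel crop from the
    # top-left corner, scaled to a width of 250 pixels.
    print(iiif_image_url(
        "https://iiif.example.org/image",
        "book1/page 2",
        region="0,0,500,500",
        size="250,",
    ))
```

An annotation that targets part of an image can refer to the same kind of pixel region, which is one reason the Image API is reviewed before the Presentation and Search APIs.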

Slides: Mirador User Interface Demonstration

Slides: Open Annotation Data Model

Slides: Introduction to IIIF

Slides: Search API

Slides: SAS - Simple Annotation Server

Slides: Visualising Annotations

Slides: Annotations

Slides: Introduction to the IIIF Presentation API

  Authoring, Annotations, and Notifications in a decentralised Web with Dokieli
Sarven Capadisli
University of Bonn, Germany

In this event we will explore dokieli's core principles, architectural and design patterns through demonstrations and discussions. Its sociotechnical design decisions will be justified based on freedom of expression, decentralisation, interoperability, and access.
The demonstrations will look into use cases involving authoring articles, annotations, and social interactions in a decentralised manner. This workshop is for the brave ones ;) that want to get a feel of just how far existing native Web technologies and recommendations – Linked Data, read-write Web, WebID, Web Annotations, Linked Data Notifications – can go and help researchers, librarians, annotators, and developers.

General familiarity with the conference's topics and aims is expected, but no previous expertise is required. Attendees may want to bring laptops.

DAY 2   |   2017-12-05   CONFERENCE

09:00 - 09:30
Klaus Tochtermann / Silke Schomburg
ZBW - Leibniz Information Centre for Economics, Germany / North Rhine-Westphalian Library Service Center (hbz), Germany
09:30 - 10:15 Keynote: Every Collection is a Snowflake
George Oates
Good, Form & Spectacle, United Kingdom

The promise of pristine linked data is powerful and compelling, but in practice, we’re working with data created by humans, which is full of inconsistencies and gotchas. Instead of trying to force computers to understand these foibles, we’ve been working directly with data creators and using data visualisation to see the grain of individual catalogues including Open Library and the fabulous Wellcome Library collection. This work has gently challenged myths of data completeness and accuracy, and even helped data creators see their own collection data in a new light.



10:15 - 10:45 COFFEE BREAK

10:45 - 11:10
DOREMUS: Doing Reusable Musical Data
Rodolphe Bailly / Jean Delahousse / Raphael Troncy
Cité de la musique - Philharmonie de Paris, France / Ourouk, France / Eurecom, France

DOREMUS is a research project that aims i) to propose an extension of the FRBRoo ontology to specifically describe musical resources and ii) to publish linked open datasets of several large catalogs coming from the French National Library (BnF), the Philharmonie of Paris and the French public service radio broadcaster (Radio France), which are originally described in MARC and ad-hoc XML formats. To support data interoperability between those datasets, the DOREMUS project has also produced controlled vocabularies describing keys, modes, derivations, musical genres and mediums of performance, by re-using existing resources (e.g. IAML vocabularies, Rameau) or creating new ones. We will present the results of the DOREMUS project, including the ontology, the controlled vocabularies, the tools for data conversion and interlinking, and examples of reuse of those results in a music recommendation system.

11:10 - 11:35 De l’Une à l’Autre: Towards Linked Data in Special Collections Cataloging
Regine Heberlein / Joyce Bell / Lidia Santarelli / Jennifer Baxmeyer / Peter Green
Princeton University Library, United States of America

A member of the Mellon-funded Linked Data for Production (LD4P) initiative, LD4P at Princeton is participating in defining linked data ontologies specifically for the description of special collections materials. It uses a hand-selected set of 525 items from the Library of Jacques Derrida – all of which bear inscriptions by persons who gifted the books to Derrida or his household – to investigate tools, workflows, and data models that will make the creation of linked descriptive data for annotated material viable in a production environment. To this end, LD4P at Princeton is exploring modeling these inscriptions on an adaptation of the W3C Web Annotation Data Model and linking them to bibliographic data converted to BIBFRAME 2.0 or related models. Since the VitroLib ontology editor, which was developed at Cornell University as part of the LD4L-Labs initiative, is designed to use bibliotek-o (a supplement to BIBFRAME developed by LD4L Labs and the LD4P Ontology Group), LD4P at Princeton anticipates using bibliotek-o at least part of the time.

The presentation will give an overview of the workflows and tools the group has developed, demonstrate the current data model, and discuss our current thinking on implementation issues such as front- and back-end interfaces, APIs and the curation of external data sources, as well as lessons learned along the way.



11:35 - 12:00 Integrating LOD into Library’s Digitized Special Collections
Myung-Ja K. Han / Deren Kudeki / Timothy W. Cole / Jacob Jett / Caroline Szylowicz
University of Illinois at Urbana-Champaign. Library, United States of America

Management of digitized special collections requires ‘special’ curation; integrating Linked Open Data (LOD) into these collections also requires customized approaches different from those used for general library collections. With support from the Andrew W. Mellon Foundation, the Linked Open Data for Digitized Special Collections project is examining challenges and opportunities of LOD for three digitized special collections held at the University of Illinois. We began by transforming legacy metadata into RDF, using semantics to identify classes of entities conflated in the original metadata accounts (e.g., visual artworks, person entities, theater productions, published works), and added links to linked data sources. These new LOD accounts were then embedded as JSON-LD in HTML pages to make the LOD visible to harvesting agents. To provide users with more context, client-side JavaScript (relying on jQuery and Mustache.js) generates a mash-up of local descriptions and properties fetched from external LOD providers in real time. Links to further context are provided for person entities (playwrights, composers, actors, directors), venues (theaters), works (plays), and performances.

This presentation will showcase the challenges of transforming special collection metadata into LOD and the opportunities to meet users’ needs by using LOD enrichment.
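
To make the embedding step concrete, here is a minimal Python sketch of how a JSON-LD description can be wrapped in a script element for inclusion in an HTML page. The record, its vocabulary (schema.org is assumed here purely for illustration) and the example.org identifier are hypothetical and do not reproduce the project's actual data or code.

```python
import json


def jsonld_script(data):
    """Serialize `data` as JSON-LD inside an HTML script element.

    Harvesting agents that understand embedded JSON-LD look for script
    elements with the media type application/ld+json.
    """
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a string value can never close the script element early;
    # "\/" is a valid JSON escape for "/", so the data is unchanged.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    # Hypothetical record for a person entity linked to an external source.
    record = {
        "@context": "https://schema.org/",
        "@type": "Person",
        "@id": "https://example.org/person/1",
        "name": "Example Playwright",
        "sameAs": "https://example.org/external-authority/123",
    }
    print(jsonld_script(record))
```

Client-side scripts can read the same embedded object to find the external identifiers from which additional properties are fetched.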



12:00 - 13:30 LUNCH

13:30 - 13:55
Perspectives on using Schema.org for publishing and harvesting Metadata at Europeana
Nuno Freire / Richard Wallis / Antoine Isaac / Valentine Charles / Hugo Manguinhas
INESC-ID, Portugal / Data Liberate consultancy, United Kingdom / Europeana Foundation, The Netherlands

Providing access to Europe’s digital cultural heritage is Europeana’s core mission. To do so, Europeana has progressively adopted the principles of Linked Data for representing, aggregating and enriching the metadata it collects and is now looking at the emerging web technologies to refine its services.

Over the past years we have followed very closely the development of Schema.org and looked into its potential for increasing the visibility of cultural heritage data on the web and their consumption by search engines. We will present a set of recommendations for publishing Europeana metadata using the Schema.org vocabulary and report on the status of implementation. We address the representation of the embedded metadata as part of the Europeana HTML pages and sitemaps so that the re-use of this data can be optimized. We also produce a representation of Europeana resources described with the Europeana Data Model (EDM) that is as rich as possible and tailored to Europeana’s realities and user needs as well as to the search engines and their users.

In an aggregation context, interoperability with Schema.org can also mean that Europeana can use Schema.org data as a source for its own data services. We will discuss how Schema.org data available in cultural heritage organizations’ websites can be used as a method to provide metadata for ingestion to Europeana and present our first harvesting experiments.



13:55 - 14:20 BIBFRAME Pilot
Ray Denenberg / Nate Trail / Sally McCallum
Library of Congress, United States of America

The Library of Congress has begun a second Pilot simulating the cataloging environment with Linked Data, a triple store, and the BIBFRAME data model. A great deal of information resulting from the Pilot will soon be available and the session will report on it. The Pilot and the whole BIBFRAME development are carried out in an open environment with specifications, conversion tools, and system components made downloadable via a web site or GitHub as they are developed. The BIBFRAME 2 ontology is also available as an OWL file and many of the controlled vocabularies used for the data, such as subjects and names, are available in RDF. The vocabulary services have been operational for over 5 years and were basic building blocks for this further development. The Pilot has taken a total-environment approach by converting the whole of the Library of Congress MARC catalog to RDF according to the BIBFRAME data model which the 60 catalogers in the Pilot “catalog against” as they create new descriptions of items. Pilot catalogers are specialists (with various language assignments) who deal with monographs, serials, moving image, recorded sound, still image, cartography, and music.

The presentation will discuss the tools used (what they did well and less well), aspects of converting a very large file to a very different data model, RDF and ontology issues, and cataloger efficiencies and problems in the new environment. We will also share thoughts on how this effort may fit into the global Linked Data environment, including how it can benefit from further engagement with other communities and services.



14:20 - 14:45 The bibliotek-o Framework: Principles, Patterns, and a Process for Community Engagement
Steven Folsom / Jason Kovari / Rebecca Younes
Cornell University, United States of America

This presentation provides a detailed description of ontology development efforts undertaken by Linked Data for Libraries Labs and Linked Data for Production partners to extend BIBFRAME 2.0 and enhance it with alternative models, which have yielded the bibliotek-o framework (available: bibliotek-o GitHub Repository and bibliotek-o website).
The framework includes the bibliotek-o ontology alongside well-established ontologies, building off BIBFRAME as its core. We are not creating a competitor to BIBFRAME; instead, our intention is to demonstrate select alternative models for consideration by the community and BIBFRAME architects as development continues in future versions of BIBFRAME.

We will discuss motivations and focus on bibliotek-o modeling patterns, notably areas of deviation from BIBFRAME; in doing so, we will demonstrate how we believe that these models provide queryable patterns and align with ontology principles and best practices. Further, we will discuss efforts around development of an application profile and MARC-to-bibliotek-o mapping for use in aligned, in-development tooling for metadata production and conversion of legacy data.

Our goal is to promote open development of bibliotek-o, including community engagement, feedback, collaboration, testing and adoption. To encourage engagement with the SWIB community, we will provide pointers to the various types of bibliotek-o documentation that are available and outline our strategy to engage with the community. With greater participation from the community we can hopefully begin to converge around a shared set of practices with a clear process for iterative improvements.



14:45 - 15:10 Will you be my bf: forever? Analysing Techniques for Conversion to BIBFRAME at the University of Alberta
Ian Bigelow / Sharon Farnel
University of Alberta, Canada

The University of Alberta is actively trying to ramp up for Linked Data through local experimentation, research and partnerships with other institutions. Though BIBFRAME is still in development, several transformation tools have already been created, and with many libraries thinking about planning for moving to Linked Data it would seem timely to compare approaches to moving legacy MARC data to BIBFRAME. Setting aside the question of whether BIBFRAME should be the approach for libraries to move to Linked Data, this investigation aimed at comparing two tools for converting MARC to bf:2.0: A) LC MARC to BIBFRAME XSLT: An XSLT 1.0 application aimed at converting MARC to RDF/XML released in March 2017; and B) Casalini SHARE Virtual Discovery Environment: A project by Casalini Libri and @Cult to develop a Linked Data discovery environment, including a conversion tool for MARC to bf:2.0 RDF.

Through the comparison and analysis of these transformation tools several topics will be explored: Comparison of underlying development models and performance of the tools; Comparison of data element conversion and impact for discovery; Impact of content standard on conversion efficacy (AACR vs. RDA); Implications for conversions for various formats (monographs, serials, et cetera); URI enrichment pre/post conversion; In house and vendor workflow implications.



15:10 - 15:40 COFFEE BREAK
15:40 - 18:00 OPEN SPACE
  Lightning Talks


European BIBFRAME Workshop 2017
Leif Andresen
Royal Danish Library, Denmark


Linked Open Citation Database (LOC-DB)
Kai Eckert
Stuttgart Media University, Germany


Real Time Information Channel
Violeta Ilik
Stony Brook University, New York, USA


  Breakout Sessions

Like last year, we offer a time slot (ca. 16.15 - 18.00 h) for breakout sessions after the lightning talks. This is a possibility for you to get together with other participants over a specific idea, project or problem, to do hands-on work, discuss or write. We hope the breakout sessions will be used for a lot of interesting exchanges and collaboration. Please let us and possible participants know in advance, and at the conference add your session to the breakout session board.

DAY 3   |   2017-12-06   CONFERENCE

09:00 - 09:45
Keynote: Unlocking Citations from tens of millions of scholarly Papers
Dario Taraborelli
Wikimedia Foundation, United States of America

Citations are the foundation for how we know what we know. Until recently, the idea of creating a freely accessible repository of citation data – representing how scholarly works cite each other – has been hampered by restrictive and inconsistent licenses and by the lack of comprehensive, machine-readable data sources: for decades, references have been locked inside PDFs or proprietary databases. Launched in April 2017, the Initiative for Open Citations (I4OC) has made nearly half of all indexed scholarly references freely available to everyone with no copyright restrictions. The percentage of indexed scholarly works with open reference data was 1% before the launch of the I4OC: as of July 2017, over 16 million scholarly works have open references available as machine-readable public domain data. There’s now momentum and a growing number of organizations, scholarly societies, funders, and publishers in support of the unconstrained availability of scholarly citation data. However, this is just the beginning of a journey to build high-quality scientific commons.

In this talk, I’ll present how the I4OC was created, its current vision and challenges. I'll showcase examples of real-world applications demonstrating how data unlocked by the initiative can be reused to accelerate scientific discovery and the broader impact of scholarly knowledge.



09:45 - 10:10 Finnish National Bibliography Fennica as Linked Data
Osma Suominen
National Library of Finland, Finland

The National Library of Finland is making our national bibliography Fennica available as Linked Open Data. We are converting the data from 1 million MARC bibliographic records first to BIBFRAME 2.0 and then further to the Schema.org data model. In the process, we are clustering works extracted from the bibliographic records, reconciling entities against internal and external authorities, cleaning up many aspects of the data and linking it to further resources. The Linked Data set is CC0 licensed and served using HDT technology.

The publishing of Linked Data supports other aspects of metadata development at the National Library. For some aspects of the Linked Data, we are relying on the RDA conversion of MARC records that was completed in early 2016. The work clustering methods, and their limitations, inform the discussions about potentially establishing a work authority, which is a prerequisite for real RDA cataloguing.

This presentation will discuss lessons learned during the publishing process, including the selection and design of the data model, the construction of the conversion pipeline using pre-existing tools, the methods used for work clustering, reconciliation and linking as well as the infrastructure for publishing the data and keeping it up to date.



10:15 - 10:45 COFFEE BREAK

10:45 - 11:10
A distributed Network of Heritage Information
Enno Meijers
National Library of the Netherlands

The Dutch Digital Heritage Network (NDE) was started in 2015 by the national cultural heritage institutions as a joint effort to improve the visibility, usability and sustainability of the cultural heritage collections maintained in the GLAM institutions. One of the goals is the realization of a distributed network of heritage information that no longer depends on aggregation of the data.

This talk will focus on our approach for developing a new, cross-domain, decentralized discovery infrastructure for the Dutch heritage collections. A core element in our strategy is to encourage institutions to align their information with formal Linked Data resources for people, places, periods and concepts, and to publish their data as Linked Open Data. The NDE program works on making all relevant terminology sources available as Linked Data and on providing facilities for term alignment and for building new thesauri. Another important goal is to provide means for browsing the collections in a cross-domain, user-centric fashion. Based on possibly relevant URIs identified in the user queries, we want to be able to browse the available Linked Data in the cultural heritage network. The bi-directional use of Linked Data without aggregation is still a technological challenge. We decided to build a registry that records the back links for all the URIs used in our network. Next to Linked Data definitions of organizations and datasets, we will also record fingerprints of the object descriptions. This information will provide the back links which make it possible to navigate from a term URI to the objects that have a relation with this term. We are currently developing a Proof-of-Concept and will show the first results at the SWIB conference.
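
The back-link idea can be pictured as an inverted index from term URIs to the objects that use them. The Python sketch below is only an illustration under that assumption; the class name BacklinkRegistry, its methods and the example.org URIs are hypothetical and say nothing about how the NDE registry is actually implemented.

```python
from collections import defaultdict


class BacklinkRegistry:
    """Toy in-memory registry mapping term URIs to the objects that use them."""

    def __init__(self):
        # term URI -> set of object URIs whose descriptions refer to that term
        self._backlinks = defaultdict(set)

    def register(self, object_uri, term_uris):
        """Record that the description of `object_uri` uses each term URI."""
        for term_uri in term_uris:
            self._backlinks[term_uri].add(object_uri)

    def objects_for(self, term_uri):
        """Navigate from a term URI back to the objects related to it."""
        return sorted(self._backlinks.get(term_uri, ()))


if __name__ == "__main__":
    registry = BacklinkRegistry()
    # Two hypothetical institutions publish objects that share one term.
    registry.register("https://museum.example.org/object/1",
                      ["https://terms.example.org/place/amsterdam"])
    registry.register("https://archive.example.org/record/42",
                      ["https://terms.example.org/place/amsterdam",
                       "https://terms.example.org/period/17th-century"])
    print(registry.objects_for("https://terms.example.org/place/amsterdam"))
```

The point of the design is that only the links are recorded centrally, while the object descriptions themselves stay with the institutions that publish them.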



11:10 - 11:35 High Quality Linked Data Generation
Anastasia Dimou / Ben de Meester / Pieter Heyvaert / Ruben Verborgh
imec - Ghent University, Belgium

Linked Data allows the description of domain-level knowledge that is understandable by both humans and machines. Nevertheless, as machines are intolerant of unexpected input, the quality of the underlying Linked Data largely determines the success of the envisaged Semantic Web. Despite the significant number of existing tools, generating Linked Data by incorporating heterogeneous data from multiple sources and different formats into the Linked Open Data cloud remains complicated, let alone generating its metadata. Raw data values are expected to be used as extracted, and when data transformations do occur, they remain coupled to case-specific, non-reusable systems. Moreover, quality assessment is performed after Linked Data is published, and adjustments are applied manually (and rarely), while the root cause of the violations is not identified. In this talk, we present a sustainable semantic-driven approach, based on the RML toolchain (RML Mapper, RML Workbench, RML Editor and RML Validator), which addresses the aforementioned shortcomings and enables data owners to generate high quality Linked Data by themselves. In this way, we facilitate and automate the generation of high quality Linked Data with accurate, consistent and complete metadata, offering a granular, sustainable and generic solution that shortens the Linked Data generation workflow and achieves higher integrity within Linked Data.



11:35 - 12:00 Practical Data Provenance in distributed Environment or: implementing Linked Data Broker using Microservices Architecture
Joonas Kesäniemi / Stefan Negru / João da Silva
University of Helsinki, Finland

Maintaining some sort of data provenance, i.e. the data about the actions that have led the target data to its current state, is an integral feature of a system acting as a data broker. After all, brokering can be seen as an activity that transforms external data sources, which one might not have any control over, into a new data source. This transformation can involve complex processing steps, which all contribute to the provenance data. Keeping track of who did what, why and when is therefore necessary in order to be able to ascribe responsibility for, e.g., data quality to the right (human or software) entity.

We have been developing a data broker solution based on semantic web technologies that is flexible and extendable both in terms of incoming and outgoing data, as well as the cloud based infrastructural resources employed to operate the broker instance. Our solution consists of components implementing different types of services such as workflow and graph management, processing, distribution and provenance.

We present the result of the ATTX project, which provides a set of software components that can be used to build scalable data brokers that work on linked data. We will cover issues and implementation related to modeling, acquisition, exposing and using provenance information produced by services that comprise the ATTX data broker instance.
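
The "who did what, why and when" requirement described in this abstract can be sketched as a small record type plus a walk back through the processing chain. The Python below is a generic illustration, loosely inspired by the agent, activity and entity notions of W3C PROV; the field names, service names and dataset identifiers are assumptions and do not describe the ATTX data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    agent: str        # who: the human or software entity that acted
    activity: str     # what: the processing step that was carried out
    reason: str       # why: the workflow or purpose that triggered the step
    used: tuple       # identifiers of the input datasets
    generated: str    # identifier of the output dataset
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))  # when


def responsible_agents(records, dataset):
    """Walk back from `dataset` and list every agent that contributed to it."""
    by_output = {record.generated: record for record in records}
    agents, stack, seen = [], [dataset], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        record = by_output.get(current)
        if record is None:
            continue  # external source: no provenance was recorded for it
        if record.agent not in agents:
            agents.append(record.agent)
        stack.extend(record.used)
    return agents


if __name__ == "__main__":
    # Hypothetical two-step pipeline: harvest an external source, then clean it.
    records = [
        ProvenanceRecord("harvester-service", "harvest", "nightly ingest",
                         ("external:source",), "graph:raw"),
        ProvenanceRecord("cleaning-service", "normalize", "nightly ingest",
                         ("graph:raw",), "graph:clean"),
    ]
    print(responsible_agents(records, "graph:clean"))
```

Tracing from the published dataset back to the external source is what allows a data quality problem to be attributed to a specific step.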



12:00 - 13:30 LUNCH

13:30 - 13:55
Visual Concept Detection and Linked Open Data at the TIB AV-Portal
Felix Saurbier / Matthias Springstein
Technische Informationsbibliothek (TIB), Germany

The German National Library of Science and Technology (TIB) researches and develops methods of automated content analysis and semantic web technologies to improve access to its library holdings and allow for advanced methods of information retrieval (e.g. semantic and cross-lingual search). Regarding scientific videos in the TIB AV-Portal spatio-temporal metadata is extracted by several algorithms analysing (1) superimposed text, (2) speech, and (3) visual content. In addition, the results are mapped against common authority files and knowledge bases via a process of automated Named Entity Linking and published as Linked Open Data to facilitate reuse and interlinking of information.

Against this background the TIB constantly aims to improve its automated content analysis and Linked Open Data quality. Currently, extensive research in the field of deep learning is conducted to significantly enhance methods of visual concept detection in the AV-Portal – both in terms of detection rates and coverage of subject-specific concepts. Our solution applies a state-of-the-art deep residual learning network based on the popular TensorFlow framework in order to predict and link visual concepts in audio-visual media. The resulting predictions are mapped against authority files and expressed as RDF triples.

Therefore, in our presentation we would like to demonstrate how research in the field of machine learning can be combined with semantic web technologies and transferred to library services like the AV-Portal to improve functionality and provide added value for users. In addition we would like to address the question of data quality assessment and present scenarios of metadata reuse.



13:55 - 14:20 Improving Named Entity Recognition in the Biodiversity Heritage Library with Machine Learning
Katie Mika / Alicia Esquivel
Museum of Comparative Zoology, Harvard University, United States of America / Chicago Botanic Garden, United States of America

Scientific names are important access points to biodiversity literature and significant indicators of content coverage. The Biodiversity Heritage Library (BHL) mines its content using the open source Global Names Recognition and Discovery (GNRD) tool from the Global Names Architecture (GNA) suite of machine learning and named entity recognition algorithms, to extract scientific names to index and attach to page records.

The 2017 BHL National Digital Stewardship Residents (NDSR) are working collaboratively on a group of projects designed to deliver a set of best-practice recommendations for the next version of the BHL digital library portal. NDSR Residents Katie Mika and Alicia Esquivel will discuss (i.) BHL and the significance of taxon names, (ii.) the current workflow, proposed improvements, and example workflows for linking content across scientific names, including semantic linking to biodiversity aggregators such as Encyclopedia of Life and the Global Biodiversity Information Facility, (iii.) how to use scientific names for content analysis, and (iv.) optimizing manuscript transcription of archival content, which introduces problems like outdated and common names, misspellings, and antiquated taxonomies to GNA tools.

The authors invite questions, comments, and discussion from audience members as the Residents prepare to submit their final recommendations at the end of the year.



14:20 - 14:45 Linking the Data: Building effective Authority and Identity Lookup
Huda Jaliluddin Khan / E. Lynette Rayle / David Eichmann / Simeon Warner / Dean Krafft
Cornell University, United States of America / The University of Iowa, United States of America

The Mellon Foundation-funded Linked Data for Libraries Labs (LD4L-Labs) and Linked Data for Libraries Production (LD4P) projects are exploring the library community’s transition to Linked Open Data. Authority and identity lookups are integral to cataloging workflows and provide excellent opportunities for exploring how to leverage the power of Linked Data for reconciliation and more effective lookups. Central questions in this work include the implementation, performance and reliability of lookup services; what multiple authority lookups mean with respect to reconciliation; and user interface design.

We are currently experimenting with caching mechanisms that address the issues of access, reliability, and speed, including an approach that uses a Linked Data Fragments server for caching and serving up authority results. We are also exploring the implementation of a containerized and potentially mirrored search index populated using Jena and Fuseki to execute SPARQL queries against Linked Data sources. We are extending the Questioning Authority (QA) work being developed for the Hydra repository to enable more configurable Linked Data lookups that can be used across different technology stacks. VitroLib, an experimental cataloging tool being developed using the Vitro platform, integrates QA lookups to support catalogers searching for and linking to authorities.

We will demonstrate the lookup configuration and integration into both the Hydra repository and VitroLib. All the software is open source.



14:45 - 15:10 Integrating Distributed Data Sources in VIVO via Lookup Services
Tatiana Walther / Martin Barber / Anna Kasprzik
Technische Informationsbibliothek (TIB) Hannover, Germany

Recording information about countries, conferences, organizations and concepts in a Linked Data application like VIVO initially requires importing a large number of data items, which must first be transformed into RDF and manually enriched with persistent identifiers, geographic positions, short descriptions, and multilingual labels. Collecting, enriching and converting this much information costs considerable time and administrative effort. Storing that amount of data can also slow down an application's performance, responsiveness and reasoning processes.

Lookup services, already developed for VIVO, DSpace-CRIS, Linked Data for Libraries (LD4L) and other projects, aim to facilitate the integration of external authority data.

Whereas some vocabularies and data sources like EuroVoc and Wikidata offer a SPARQL endpoint, other authority data sources such as the Integrated Authority File of the German National Library (GND) provide only data dumps. Our objective is to enable combined access to external sources via a single interface, using Named Entity Recognition tools, APIs and SKOSMOS in the background. Besides concepts, we also aim to integrate data items such as events, organizations and languages, supplemented with additional information. This requires mappings between source and target systems in order to insert and display attributes and relations of the selected entities. Furthermore, we are investigating how changes made in external vocabularies can be transferred automatically to the data in the target system.

This presentation outlines our achievements and lessons learned concerning the integration of semantically structured and enriched data from distributed sources via lookup services, similar to the external vocabulary services in VIVO and related projects.



15:10 - 15:15 CLOSING







Joachim Neubert
T. +49-(0)-40-42834462
E-mail j.neubert(at)



Adrian Pohl
T. +49-(0)-221-40075235


Twitter: #swib17