DAY 1   |   2018-11-26   PRECONFERENCE
DINI-AG KIM meeting
Jana Hentschke / Stefanie Rühle
DINI-AG Kompetenzzentrum Interoperable Metadaten (KIM)


Public meeting of the DINI-AG KIM. KIM is a forum for German-speaking metadata experts from LAM institutions. The meeting will be held in German.

13:00 - 19:00

Introduction to Linked Open Data
Christina Marie Harlow / Simeon Warner / Camille Villa
Stanford University, United States of America / Cornell University, United States of America

This introductory workshop aims to introduce the fundamentals of linked data technologies on the one hand, and the basic issues of open data on the other. The RDF data model will be discussed, along with the concepts of dereferenceable URIs and common vocabularies. The participants will continuously create and refine RDF documents about themselves including links to other participants to strengthen their knowledge of the topic. Based on the data created, the advantages of modeling in RDF and publishing linked data will be shown. On a side track, Open Data principles will be introduced, discussed and applied to the content that is being created during the workshop.

  From LOD to LOUD: making data usable
Fabian Steeg / Adrian Pohl
Hochschulbibliothekszentrum NRW (hbz), Germany

Linked Open Usable Data (LOUD) extends Linked Open Data (LOD) by focussing on use cases, being as simple as possible, and providing developer friendly web APIs with JSON-LD. The term was coined by Rob Sanderson. This workshop will introduce you to the basic concepts of LOUD, web APIs, and JSON-LD. You'll learn how to publish and document data as LOUD, and how to use that data in different contexts. In this workshop, we will: (1) Convert RDF data into usable JSON-LD (2) Index and query the data with Elasticsearch (3) Create a simple web application using the data (4) Visualize the data with Kibana (5) Document the data using Hypothesis annotations (6) Use the data with OpenRefine. Audience: librarians and developers working with linked open data. Requirements: Laptop with Elasticsearch 6.x, OpenRefine 2.8, a text editor, web browser, and a command line with cURL and jsonld.js via node.js. As an alternative, we'll also provide a fully configured virtual machine to workshop participants.

  Adding your own stuff to Wikidata
Jakob Voß / Joachim Neubert
(Verbundzentrale GBV) / ZBW Leibniz Information Centre for Economics, Germany

This hands-on tutorial introduces the use of and collaboration in Wikidata, with a focus on importing data and mapping existing datasets to Wikidata. The basic concepts of Wikidata, and how it can be edited and queried, are first explained with exercises. We will then work on connecting a common example dataset to the central database. Typical tools, such as Mix'n'match or QuickStatements, and workflows are shown and tested. The focus of the workshop is not only on technical aspects but also on questions of data modeling, data provenance, references, and policies. Basic experience with Wikidata, Wikipedia, and SPARQL is helpful but not required. Participants are asked to create a Wikimedia account in advance and to bring a laptop with them.

  Building a better repository with Fedora
David Wilcox
DuraSpace, Canada

Fedora is a flexible, extensible, open source repository platform for managing, preserving, and providing access to digital content. Fedora is used in a wide variety of institutions including libraries, museums, archives, and government organizations. The latest version of Fedora introduces native linked data capabilities and a modular architecture based on well-documented APIs and ease of integration with existing applications. This year the Fedora community will publish a formal specification of the Fedora REST API that includes alignment with modern web standards such as the Linked Data Platform, Memento, and Web Access Control. This workshop will provide an opportunity for new and existing users to get hands-on experience working with Fedora features and standards-based functionality. Attendees will be given pre-configured virtual machines that include Fedora bundled with the Solr search application and a triplestore that they can install on their laptops and continue using after the workshop. The VM requires a laptop with at least 4GB of RAM. Participants will learn how to create and manage content in Fedora in accordance with linked data best practices and the Portland Common Data Model. Attendees will also learn how to exercise standards-based functionality such as versioning using Memento and authorization using Web Access Control. Finally, participants will learn how to search and run SPARQL queries against content in Fedora using the included Solr index and triplestore. This is an introductory workshop - no prior Fedora experience is required, though some familiarity with repositories will be useful. (Source code)

  Wikibase: configure, customize, and collaborate
Stacy Allison-Cassin / Dan Scott
York University, Canada / Laurentian University, Canada

Originally developed for the Wikidata project, Wikibase "is a collection of applications and libraries for creating, managing and sharing structured data." It offers a multilingual platform for linked open data, including a human-friendly editing interface, a SPARQL endpoint, and programmatic means of loading and accessing data, making it a potential match for libraries that like Wikidata's platform but want to maintain a local store of linked open data. In this workshop, we will discuss how a local Wikibase instance can support library needs, and work through exercises that include: Setting up a local Wikibase instance - Adding users - Creating custom classes and properties - Adding and editing entries - Loading data in bulk - Querying the data - Integrating data with external applications. Prerequisites: Participants will need a laptop to install Wikibase and run through the exercises. The Wikibase virtual machine requires 4GB of RAM, so a remote virtual machine such as an Amazon Web Services or Google Cloud instance might be an alternative.

  Sharing RDF data models and validating RDF graphs with ShEx
Katherine Thornton / Tom Baker / Eric Prud'hommeaux / Andra Waagmeester
Yale University, United States of America / DCMI / W3C / Micelio

Participate in a hands-on workshop about Shape Expressions (ShEx), a concise, formal modeling and validation language for RDF structures. When reusing RDF graphs created by others, it is important to know how the data is represented. Current practices of using human-readable descriptions or ontologies to communicate data structures often lack sufficient precision for data consumers to quickly and easily understand data representation details. We provide concrete examples of how we use ShEx as a constraint and validation language that allows humans and machines to communicate unambiguously about data assets. We will provide an overview of the ShEx language and related tooling. We will introduce the JavaScript and Python implementations and their use in RDF data validation workflows in applied contexts. We will walk participants through a data modeling example drawn from the bibliographic domain. We will also demonstrate a validation workflow drawn from the domain of computing, where we will use ShEx to validate entity data from Wikidata. The workshop will take approximately four hours: (1) Overview of ShEx (60 mins) (2) Data Modeling with ShEx (40 mins) (3) Data Validation with ShEx (40 mins) (4) Hands-on exploration of the tools (60 mins). Participants in this workshop will understand the basics of creating a ShEx schema, install either ShEx.js or PyShEx on their local machines, and gain experience using the Online Validator to test entity data from Wikidata against a ShEx schema. Links to software we will use in the workshop: ShEx.js, PyShEx, ShEx Online Validator. Audience/Requirements: This workshop is for anyone interested in validating RDF data. A working knowledge of RDF is sufficient background for participation. Please bring your own laptop computer.

DAY 2   |   2018-11-27   CONFERENCE
09:00 - 09:25 OPENING
Silke Schomburg / Klaus Tochtermann
North Rhine-Westphalian Library Service Center (hbz), Germany / ZBW - Leibniz Information Centre for Economics, Germany
09:25 - 10:25 KEYNOTE: The Semantic Web: vision, reality and revision
James Hendler
Rensselaer Polytechnic Institute, United States of America

In 2001, James Hendler joined Web inventor Tim Berners-Lee and their colleague Ora Lassila in writing an article describing a vision for the Semantic Web. The paper, which appeared in Scientific American, has been widely cited and led to much work in both academia and industry aimed at adding machine-readable text to the Web. Now, nearly 20 years later, Google reports that machine-readable metadata is found on over 40% of their crawl and knowledge graph technology, which also grew from this vision, is now a big business used by major organizations around the world. Also growing out of that vision has been the use of linked data in many applications particularly including collection management in libraries, museums and video archiving applications. However, despite this success, much of the original vision of the Semantic Web remains unrealized. In this talk, he discusses what was in the original vision, what has occurred and, most importantly, what still remains to be done if we are truly to recognize the full potential of the Semantic Web.

10:30 - 11:00 COFFEE BREAK
11:00 - 11:25 Capturing cataloger expectations in an RDF editor: SHACL, lookups and VitroLib
Steven Folsom / Huda Khan / Lynette Rayle / Jason Kovari / Rebecca Younes / Simeon Warner
Cornell University, United States of America


The Linked Data for Libraries Labs (LD4L-Labs) and Linked Data for Production (LD4P) projects have created an RDF editor by involving the catalogers at every stage of a user centered design process. Faced with the challenge of developing an RDF editor which would support the desired data outputs that were loosely defined during ontology development, it became clear application profiles were required as an added layer on top of the formal definitions of the selected ontology terms. Through the lens of two use cases (cataloging LP records and rare monographs) we will discuss how the project involved a blend of catalogers, technologists, and ontologists to build VitroLib, a working prototype editor optimized for specific content types and with integrated lookup services. We will describe findings from cataloger assessment of the search behavior and display of information about entities across datasets from our work on lookup services based on the Questioning Authority gem. Continuing preliminary work discussed at SWIB 2017, we will provide details on the construction of SHACL in support of form building and the translation process from SHACL into configuration required for VitroLib. We will highlight challenges, including examples where SHACL semantics do not translate directly to VitroLib form definitions, the need for extensions to SHACL in support of forms, and the lack of best practices in the nascent SHACL community.

11:25 - 11:50 Powering Linked Open Data applications with Fedora and Islandora CLAW
David Wilcox
DuraSpace, Canada


Repositories have traditionally focused on storing content and metadata for use by local applications and services, but this is a poor fit for the world of linked open data. Fedora, the flexible, extensible, open source repository platform, has been designed and implemented as not just a repository but a linked data server. This has been accomplished primarily through alignment with the Linked Data Platform recommendation from the W3C, but Fedora also has a formally specified REST API that aligns with a variety of modern web standards, such as Memento, Web Access Control, and Activity Streams 2.0. This focus on linked data and web standards has allowed Fedora to serve as a reliable repository that also powers web-based linked open data applications. The latest version of Islandora, codenamed CLAW, integrates Fedora with Drupal 8, the popular content management system. CLAW takes full advantage of Fedora’s linked data capabilities while also leveraging Drupal’s powerful network of contributed modules to provide a modern, web-based repository platform that enables linked open data applications. This can be seen in production at the University of Toronto Scarborough (UTSC), where CLAW has been used to build a site that provides Palladio visualizations and exposes a SPARQL endpoint for complex RDF queries. This presentation will provide an overview of the latest versions of Fedora and Islandora CLAW with a focus on the linked data and web-based features and functionality. It will also use the UTSC site as an example of how Fedora can power linked open data applications. (Source code: fcrepo4, CLAW)

11:50 - 12:15 Connecting the dots of Linked Data of resource collections
Thorsten Liebig
derivo GmbH, Germany


Libraries and museums around the globe are transforming their catalogs and records into Linked Data to foster linkage and navigation across data silos. This in turn significantly enriches their own assets with valuable information stemming from related sources. The result is a large Knowledge Graph of cross-linked resources typically based on standards such as RDF or OWL. However, navigating and querying large graphs is challenging. Admittedly, many retrieval tasks are best served by standard user interfaces based on forms and fields. Nevertheless, data providers and users often complain about poor tool support for explorative navigation through complex cross-linked library data. In fact, there should be something in between query forms and SQL/SPARQL query syntax. We will discuss tool support for visually analyzing and querying large LOD volumes, for both providers and users, with the help of example data from museums and the scholarly domain. This includes interactive network rendering approaches, faceted search and other visualization paradigms that promise to help users understand, analyze and track graph-based data as a whole in an ad hoc manner. Relevant benchmark criteria in this respect include, among others: - Scalability of visualization approach and user orientation (data provider & user) - User guidance during data exploration (data provider & user) - Support in detection of data patterns or flaws to increase data quality (data provider) - Presumed knowledge of the data schema or query languages (user)

12:15 - 13:45 LUNCH
13:45 - 14:10 Applying Linked Data technologies as a backend infrastructure for scientific search portals
Benjamin Zapilko / Katarina Boland / Dagmar Kern
GESIS - Leibniz-Institute for the Social Sciences, Germany


In recent years, Linked Data has become a key technology for organizations that want to publish their data collections on the web and connect them with other data sources on the web. With the ongoing change in the research infrastructure landscape, where an integrated search for comprehensive research information is gaining importance, organizations are challenged to connect their historically unconnected databases with each other. In an online survey with 337 social science researchers in Germany, we found evidence that researchers are interested in links between information of different types and from different sources. However, this is often not yet reflected in current scientific portals. In this presentation, we present how Linked Open Data technologies can generally be used to build a backend infrastructure for scientific search portals. This backend infrastructure is set up as an additional layer on top of unconnected non-RDF data collections and makes the links between datasets visible and usable for retrieval via a search index. To address the heterogeneity that arises from vague links between datasets, a research data ontology is additionally used to represent different versions and aggregations of research datasets. The LOD backend infrastructure is in use at the search portal of GESIS. The in-use application of our approach has been evaluated in this scientific search portal for the social sciences by investigating the benefit of links between different data sources in a user study. The source code of this project is publicly available.

14:10 - 14:35 Documenting and preserving programming languages and software in Wikidata
John Samuel / Katherine Thornton / Kenneth Seals-Nutt
CPE Lyon, France / Yale University, United States of America


The digital landscape is evolving very fast. Programming languages and software once taught in universities and widely used among developers may not have the same acceptance among the new generation of developers. Initiatives like the Open Preservation Foundation and Software Heritage play an important role in documenting, studying and preserving this software for future generations. With the creation of Wikidata, the game has now changed. Wikidata provides an easy way to document and describe digital solutions using linked open data. Properties to describe various aspects of programming languages, software and mobile applications are being continuously proposed, created and supported by the Wikidata community. It is increasingly becoming a central hub for linked data sources, thus enabling users to get a complete picture of a given digital artifact, especially for complementary information like dependencies, versions etc. We will present WDProp, which can help both new and regular Wikidata contributors to get the latest information on Wikidata supported languages, datatypes and properties, as well as community curated projects for finding relevant properties. To further facilitate this process, we also introduce the portal Wikidata for Digital Preservation. This free software portal allows people to quickly contribute to Wikidata. The interface guides users in contributing data in alignment with current data models for the domain of computing.

14:35 - 15:00 Integrating library metadata in a semantic web research environment for university collections
Martin Scholz
Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany


The university library of Erlangen-Nürnberg harbors not only historic manuscripts and monographs but also valuable paintings, graphics, coins, etc. These cultural objects form part of the university’s rich and heterogeneous collections, ranging from anatomy and archaeology to school history and zoology. Embedded in a broader digitization strategy, the project "Objekte im Netz" develops a documentation and research platform together with a data model based on semantic web technologies and guidelines to provide means to ensure homogeneous data modelling and minimum data quality across the collections. Eventually, they shall provide scholars the means to work on transdisciplinary research questions, too. The tool builds upon the open source WissKI software, a virtual research environment for cultural heritage that natively stores data using RDF. The common data model is implemented as an OWL ontology that extends the CIDOC CRM and acts as the integrating link between the collections, and also to external sources. This presentation will focus on the integration of the university library's metadata of historic graphics: The metadata can be harvested as MARCXML from different systems and has to be transformed into RDF data aligned with the CIDOC CRM. Following a sketch of the general set-up, an outline of the practical integration steps will be given with a discussion of different technical and data mapping approaches, among others a mapping from BIBFRAME to CIDOC CRM. The discussion is accompanied by remarks on challenges and hindrances met so far, like the lack of officially coined URIs and ontology mismatches.

15:00 - 15:30 COFFEE BREAK
15:30 - 17:30 OPEN SPACE
  Lightning Talks
  Breakout Sessions

As in previous years, we offer a time slot (ca. 16 - 18 h) for breakout sessions after the lightning talks. This is an opportunity for you to get together with other participants around a specific idea, project or problem, to do hands-on work, discuss or write. We hope the breakout sessions will be used for a lot of interesting exchanges and collaboration. Please let us and possible participants know in advance, and at the conference add your session to the breakout session board.

DAY 3   |   2018-11-28   CONFERENCE
09:00 - 10:00 KEYNOTE: Libraries and their communities: participation from town halls to mobile phones
Mia Ridge
British Library, United Kingdom


The rise of the internet has provided both opportunities and challenges for libraries. Focusing on the recent history of public participation in libraries enabled by crowdsourcing built on networked, digital platforms, this talk will also consider how libraries have evolved to provide new forms of access to their collections with open data and new working practices.

10:00 - 10:25 Linked data implementations — who, what, why?
Karen Smith-Yoshimura
OCLC, United States of America


Prompted by the interest among metadata managers within the OCLC Research Library Partnership in the potential of linked data applications to make new, valuable uses of existing metadata, OCLC Research conducted an International Linked Data Survey for Implementers in 2014 and 2015, receiving responses from a total of 90 institutions in 20 countries. Curious about what might have changed in the three years since the last survey, and eager to learn about new projects or services that format metadata as linked data or make subsequent uses of it, OCLC Research repeated the survey between 17 April and 25 May 2018. The survey questions were mostly the same so we could more easily compare results. This presentation will summarize the 2018 survey results, and focus on comparing them with the results of the previous two surveys, including: 1) Which institutions have implemented or are implementing linked data and for what purpose. 2) What linked data sources these institutions are consuming, and why. Which linked data sources are cited more or less frequently than in the previous surveys? Have motivations changed? 3) What data are these institutions publishing as linked data, and why. Are there different types of data being published as linked data since 2015? Have the drivers for publishing linked data changed? 4) What barriers have implementers had to address, and how would they advise others who are considering starting a project or service that consumes and/or publishes linked data. 5) A sampling of linked data projects or services in production to represent a variety of different uses, scales, domains, and maturity, especially those described as "successful in achieving the desired outcome(s)." The surveys provide a partial view of the linked data landscape, as the analysis is confined to the implementers who responded, primarily from the library domain.

10:30 - 11:00 COFFEE BREAK
11:00 - 11:25 Supporting LCSH subject indexing with LOD in a Nigerian university library
Babarinde Ayodeji Odewumi / Adetoun Adebisi Oyelude
Abba & King Systems LLC, Nigeria / University of Ibadan, Nigeria


Navigating the peculiar challenges of a third world country can be something of an art, particularly when it comes to producing quality work in heavily under-funded libraries. With limited financial resources and limited internet connectivity, the University of Ibadan (UI) Library has had to rely on the generosity of organizations that provide datasets or metadata in various forms (e.g. LOD, MARC, SRU, etc.) to build tools that can support the classification and cataloguing of items, especially those published in Nigeria. One such tool is a search tool built on an LOD dataset from the Library of Congress: the LC Subject Headings (LCSH) dataset. The developers of the University’s Integrated Library System (UIILS) were able to use this dataset to create a web service within the UIILS that allows cataloguing staff to search through the LCSH entries, even when internet connectivity is down and access to updated print copies is not possible. This search tool provides the staff with the classification number for an item based on the subjects they search for. We'll walk through the processes involved, from setting up the Apache Jena and Fuseki server, to generating SPARQL queries from search parameters, processing query responses, displaying those responses to cataloguing staff and linking those responses to other queries.
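
The step of generating SPARQL queries from search parameters can be illustrated with a minimal sketch. It is not the UIILS implementation: the function name `build_lcsh_query`, the use of `skos:prefLabel` as the searched property, and the case-insensitive substring match are all assumptions made for illustration.

```python
def build_lcsh_query(term: str, limit: int = 20) -> str:
    """Build a SPARQL SELECT query for concepts whose label contains `term`.

    Hypothetical helper: the property (skos:prefLabel) and the matching
    strategy are assumptions, not taken from the talk.
    """
    # Escape characters that would otherwise break out of the string literal.
    escaped = (
        term.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?concept ?label WHERE {\n"
        "  ?concept skos:prefLabel ?label .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?label)), LCASE("{escaped}")))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # The resulting string could be sent to a local SPARQL endpoint,
    # for example one served by Fuseki.
    print(build_lcsh_query("Nigerian literature", limit=5))
```

Escaping the search term before it is placed in the query keeps user input from altering the query structure, which matters for any tool that accepts free-text searches.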

11:25 - 11:50 Engaging information professionals in the process of authoritative interlinking
Lucy McKenna / Christophe Debruyne / Declan O'Sullivan
ADAPT Centre, Trinity College Dublin, Ireland


Through the use of Linked Data (LD), Libraries, Archives and Museums (LAMs) have the potential to expose their collections to a larger audience and to allow for more efficient user searches. Despite this, relatively few LAMs have invested in LD projects and the majority of these display limited interlinking across datasets and institutions. A survey was conducted to understand Information Professionals' (IPs') position with regards to LD, with a particular focus on the interlinking problem. The survey was completed by 185 librarians, archivists, metadata cataloguers and researchers. Results indicated that, when interlinking, IPs find the process of ontology and property selection to be particularly challenging, and LD tooling to be technologically complex and unsuitable for their needs. Our research is focused on developing an authoritative interlinking framework for LAMs with a view to increasing IP engagement in the linking process. Our framework will provide a set of standards to facilitate IPs in the selection of link types, specifically when linking local resources to authorities. The framework will include guidelines for authority, ontology and property selection, and for adding provenance data. A user-interface will be developed which will direct IPs through the resource interlinking process as per our framework. Although there are existing tools in this domain, our framework differs in that it will be designed with the needs and expertise of IPs in mind. This will be achieved by involving IPs in the design and evaluation of the framework. A mock-up of the interface has already been tested and adjustments have been made based on results. We are currently working on developing a minimal viable product so as to allow for further testing of the framework. We will present our updated framework, interface, and proposed interlinking solutions.

11:50 - 12:15 Linking YSO and LCSH for better subject access
Satu Niininen / Osma Suominen
National Library of Finland, Finland


Linking concept schemes enables dynamic ways to reuse already existing metadata across linguistic and organisational barriers. This presentation outlines the lessons learned in the process of translating the General Finnish Ontology (YSO) into English and linking it to the Library of Congress Subject Headings (LCSH). The process involved manual translation of all 30,000 YSO concepts and establishing links to LCSH whenever an applicable equivalent was available. This meant connecting the indexing languages of two very different cultural spheres. Different practices in concept scheme construction also posed a set of challenges as the structure (e.g. hierarchy and view on precoordination) and intended use of concept schemes can vary significantly. It was a logical choice to do linking and translation simultaneously as both tasks include the same initial steps of specifying the scope and definition of each concept. Consulting LCSH also assisted in establishing more functional translations by clarifying how the concepts are used and understood in an English-speaking context. Out of all YSO concepts 44% have links to LCSH, but of the 100 most used LCSH concepts at the service, 69% are linked from YSO. In the future, the mappings can be used for generating Finnish YSO concepts to records with pre-existing LCSH annotations. An early experiment demonstrated that for such records, around half of the LCSH subjects seen in bibliographic records could be automatically converted to YSO concepts thanks to the mappings, despite the fact that the links from YSO cover only a fraction (less than 5%) of LCSH concepts.
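
The idea of converting existing LCSH annotations to YSO concepts through the mappings can be sketched as a simple lookup. This is only an illustration under assumptions: the function name `lcsh_to_yso` is hypothetical, the URIs are placeholders under example.org, and the mapping is treated as one-to-one, which the real links need not be.

```python
from typing import Dict, Iterable, List


def lcsh_to_yso(lcsh_subjects: Iterable[str], mapping: Dict[str, str]) -> List[str]:
    """Return YSO concept URIs for the LCSH URIs that have a mapping.

    Unmapped LCSH subjects are skipped; order is kept and duplicates dropped.
    """
    seen = set()
    result = []
    for uri in lcsh_subjects:
        target = mapping.get(uri)
        if target is not None and target not in seen:
            seen.add(target)
            result.append(target)
    return result


# Placeholder identifiers, not real LCSH or YSO URIs.
EXAMPLE_MAPPING = {
    "http://example.org/lcsh/A": "http://example.org/yso/1",
    "http://example.org/lcsh/B": "http://example.org/yso/2",
}

if __name__ == "__main__":
    record_subjects = [
        "http://example.org/lcsh/A",
        "http://example.org/lcsh/X",  # no mapping available, so it is skipped
        "http://example.org/lcsh/B",
    ]
    print(lcsh_to_yso(record_subjects, EXAMPLE_MAPPING))
```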

12:15 - 13:45 LUNCH

13:45 - 14:10 Transformations for aggregating Linked Open Data
Lukas Koster / Ivo Zandhuis
Library of the University of Amsterdam, The Netherlands / Ivo Zandhuis Onderzoek en Advies


Linked Open Data is usually provided as-is. Institutions make choices about how to model the data, including properties, blank nodes and URIs, for valid reasons. If you want to combine data, there are generally two options: 1) do a distributed query and inference on the data 2) aggregate the data into a new, single endpoint. Distribution enables the use of all available data structures, while aggregation provides easier-to-use data and better performance. For aggregation it is good practice to apply transformations to obtain the needed convenience. We give an overview of the transformation types needed, as learned in the AdamNet Library Association project AdamLink, a collaboration of the Amsterdam City Archives, Amsterdam Museum, University of Amsterdam Library, Public Library of Amsterdam and International Institute of Social History. The objective is to create a linked open data infrastructure connecting the member institutions’ collections on the topic of "Amsterdam", targeted at reuse by researchers, teachers, students, creative industry and general public. We discuss the (dis)advantages of creating an aggregation vs. distribution of queries. Every transformation type should solve a distribution problem to be useful. But transformation probably reduces the querying options on the data. We therefore need to find the best trade-off between complexity and usability. An interesting option to investigate is a caching node mechanism that could combine the best of both worlds. We distinguish 6 types of transformation: Mapping ontologies - Mapping and adding thesauri and authority lists - Mapping and adding object-types - Adding our own statements - Restructuring data - Data-typing. We will illustrate the transformations with real examples. We will also discuss the issues with feeding back the enriched data in the cache or aggregation to the original data sources.
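
The first transformation type, mapping ontologies, can be sketched as rewriting predicate URIs while triples are aggregated. This is an illustrative sketch, not the AdamLink implementation: triples are modelled as plain string tuples, the name `map_predicates` is hypothetical, and the Dublin Core mapping is only an example.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def map_predicates(triples: Iterable[Triple], predicate_map: Dict[str, str]) -> List[Triple]:
    """Rewrite predicate URIs according to predicate_map.

    Predicates without an entry in the map are left unchanged.
    """
    return [(s, predicate_map.get(p, p), o) for s, p, o in triples]


# Example mapping: harmonise Dublin Core Elements titles to DCMI Terms.
PREDICATE_MAP = {
    "http://purl.org/dc/elements/1.1/title": "http://purl.org/dc/terms/title",
}

if __name__ == "__main__":
    source = [
        # Subject URI and literal values are placeholders.
        ("http://example.org/object/1", "http://purl.org/dc/elements/1.1/title", "Map of Amsterdam"),
        ("http://example.org/object/1", "http://purl.org/dc/terms/created", "1650"),
    ]
    for triple in map_predicates(source, PREDICATE_MAP):
        print(triple)
```

Keeping the mapping in a separate table makes it possible to record which source properties were rewritten, which is relevant to the question of feeding enriched data back to the original sources.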

14:10 - 14:35 data.bnf.fr as a sandbox for FRBRization: automated work creation in data.bnf.fr
Sébastien Peyrard / Etienne Cavalié / Aude Le Moullec-Rieu / Raphaëlle Lapôtre
BnF, France

The French national library uses the online platform data.bnf.fr as an interface to disseminate its metadata on the semantic web. Source metadata includes notably, but is not limited to, MARC records to be converted and published as RDF triples, with authority records used as the basis for landing pages; the RDF graph is also enriched with alignments with external resources or FRBR relations between internal entities. 2018 is a game changer, as a number of works displayed in data.bnf.fr will not be grounded on existing authority records, but will be automatically computed from clusters of bibliographic records. The first corpus for this will be the French textual works from the 20th and 21st centuries, for which several hundred thousand works will be created. These works will be available as web pages on the public interface, and as RDF data retrievable through the SPARQL endpoint. In the long run, such works will be uploaded as MARC authority records in the main catalog and will comply with the traditional workflow where data.bnf.fr feeds on the BnF catalogue to generate its entities with persistent identifiers and records. In the meantime, the BnF must find answers to a number of questions: - Which data can reliably be used to group together bibliographic records as manifestations of the same work? - Which data can be used from those bibliographic records to compute work-level metadata? - What identifier can we use to identify such works that will not be persistent until they are uploaded in the BnF catalogue? - How can we promote this new metadata and inform its reuse by communicating about their specific nature and limits? The contribution will present methods and tools, steps and questions the team encountered in this process.
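
The idea of computing works from clusters of bibliographic records can be illustrated with a deliberately naive sketch that groups records by a normalized author and title key. This is not the BnF's method: the record fields (`id`, `author`, `title`), the function names and the normalization rules are all assumptions chosen for illustration.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def normalize(text: str) -> str:
    """Strip accents, lowercase, replace punctuation by spaces, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(cleaned.split())


def work_key(author: str, title: str) -> Tuple[str, str]:
    """Hypothetical clustering key: normalized author plus normalized title."""
    return (normalize(author), normalize(title))


def cluster_records(records: Iterable[dict]) -> Dict[Tuple[str, str], List[str]]:
    """Group record ids that share the same work key."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[work_key(rec["author"], rec["title"])].append(rec["id"])
    return dict(clusters)


if __name__ == "__main__":
    sample = [
        {"id": "r1", "author": "Camus, Albert", "title": "L'Étranger"},
        {"id": "r2", "author": "CAMUS, Albert", "title": "L'étranger."},
        {"id": "r3", "author": "Camus, Albert", "title": "La Peste"},
    ]
    for key, ids in cluster_records(sample).items():
        print(key, ids)
```

A key this simple would merge distinct works that share a title and split editions with variant titles, which is one way to see why the question of which data can reliably be used for grouping is hard.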

14:35 - 15:00 Annif: leveraging bibliographic metadata for automated subject indexing and classification
Osma Suominen
National Library of Finland, Finland

Manually indexing documents for subject-based access is a very labour-intensive intellectual process. A machine could perform similar subject indexing much faster. However, an algorithm needs to be trained and tested with examples of indexed documents. Libraries have a lot of training data in the form of bibliographic databases, but often only a title is available, not the full text. We propose to leverage both title-only metadata and, when available, already indexed full-text documents to help index new documents. To do so, we are developing Annif, an open source tool for automated indexing and classification. After being fed a SKOS vocabulary and existing metadata, Annif knows how to assign subject headings to new documents. It has a microservice-style REST API and a mobile web app that can analyse physical documents such as printed books. We have tested Annif with different document collections including scientific papers, old scanned books and current e-books, Q&A pairs from an “ask a librarian” service, Finnish Wikipedia, and the archives of a local newspaper. The results of analysing scientific papers and current books have been reassuring, while other types of documents have proved more challenging. The new version currently being developed is based on a combination of existing NLP and machine learning tools including Maui, fastText and Gensim. By combining multiple approaches, Annif can be adapted to different settings. The tool can be used with any vocabulary and, with suitable training data, documents in many different languages may be analysed. With Annif, we expect to improve subject indexing and classification processes, especially for electronic documents as well as for collections that would otherwise not be indexed at all.
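To illustrate the idea of learning from title-only metadata, here is a deliberately simple sketch that counts which title words co-occur with which subjects in already indexed records and scores new titles accordingly. It is not Annif's algorithm or API; the function names and the data shape (a title paired with a list of subject labels) are hypothetical.

```python
from collections import Counter, defaultdict


def train(indexed_titles):
    """Count, for every title word, how often each subject was assigned."""
    model = defaultdict(Counter)
    for title, subjects in indexed_titles:
        for word in set(title.lower().split()):
            model[word].update(subjects)
    return model


def suggest(model, title, limit=3):
    """Rank subjects by the summed co-occurrence counts of the title's words."""
    scores = Counter()
    for word in set(title.lower().split()):
        scores.update(model.get(word, {}))
    return [subject for subject, _ in scores.most_common(limit)]
```

A title made up only of unseen words yields no suggestions here, which is one reason a production tool would combine several approaches rather than rely on raw word counts.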

15:00 - 15:25 Automation and standardization of semantic video annotations for large-scale empirical film studies
Henning Agt-Rickauer / Christian Hentschel / Harald Sack
Hasso Plattner Institute for IT Systems Engineering, University of Potsdam, Potsdam, Germany / FIZ Karlsruhe & Karlsruhe Institute of Technology Karlsruhe, Karlsruhe, Germany


The study of audio-visual rhetorics of affect scientifically analyses the impact of auditory and visual staging patterns on the perception of media productions as well as the emotions they convey. Through large-scale corpus analysis of TV reports, documentaries and genre films on the topos “political crisis”, film scholars aim to investigate the hypothesis that TV reports draw on audio-visual patterns from cinematographic productions to affect viewers emotionally. However, the localization and description of these patterns is currently limited to micro-studies because of the extremely high manual annotation effort involved. The AdA Project presented here therefore pursues two main objectives: 1) the creation of a standardized annotation ontology based on Linked Open Data principles and 2) the semi-automatic classification of audio-visual patterns. Linked Open Data annotations enable the publication, reuse, retrieval, and visualization of data from film studies based on standardized vocabularies and Semantic Web technology. Furthermore, automatic analysis of video streams makes it possible to speed up the extraction of audio-visual patterns. Temporal video segmentation, visual concept detection and audio event classification are examples of the application of computer vision and machine learning technologies within this project. The ontology and the semantic annotations of audio-visual patterns created are published as Linked Open Data in order to enable reuse and extension by other researchers. The annotation software and the extensions for automatic video analysis developed and integrated by the project are published as open source, as we envision these tools being useful for general deep semantic analysis of audio-visual archives.

15:30 - 15:35 CLOSING


Please note that all information may be subject to change.




Adrian Pohl
T. +49-(0)-221-40075235



Joachim Neubert
T. +49-(0)-40-42834462
E-mail j.neubert(at)


Twitter: #swib18