Programme

Note that all times are displayed in UTC. Clicking on a time display will show your local time.


DAY 1   |   2021-11-29   CONFERENCE
12:00-13:00h UTC COLLOCATED EVENT: DINI-AG KIM MEETING
Tracy Arndt / Alex Jahnke
German National Library / Göttingen State and University Library, Germany
Abstract

Virtual public meeting of the DINI-AG Kompetenzzentrum Interoperable Metadaten (KIM). KIM is a forum for German-speaking metadata experts from LAM institutions. The meeting will be held in German. Everyone is welcome to attend. Agenda

14:00-15:00h UTC OPENING / KEYNOTE
Opening
Klaus Tochtermann / Silke Schomburg
ZBW Leibniz Information Centre for Economics, Germany / North Rhine-Westphalian Library Service Center (hbz), Germany
KEYNOTE: Surveillance capitalism in our libraries
Sarah Lamdan
CUNY School of Law
Abstract

In the transition from industrial to informational capitalism, much of our lived experience has gone from physical to digital, including library services. As publishers, library vendors, and other informational service providers have become internet-based companies, their business models have transitioned from analog services to data-based services. In short, our traditional library service providers are becoming data analytics companies, dabbling in, or diving into, personal data brokering.
From RELX to ProQuest, major library vendors are finding new ways to extract and monetize people's personal data. Researchers are finding surveillance software like ThreatMetrix in their research databases, and data analytics companies like Clarivate are trying to acquire ProQuest, a major library services platform provider, in order to exploit library patrons' data and create more academic metrics to sell to grant funders and research institutions. All of these corporate decisions are part of a trend of our vendors collecting library patrons' personal data. The increasing surveillance capitalism in our library spaces makes open access more important than ever.

Slides

Video

15:00-15:30h UTC Coffee break
15:30-16:30h UTC INTEROPERABILITY
Using linked data notifications to assemble the scholarly record on the decentralised web
Patrick Hochstenbach / Ruben Dedecker / Miel Vander Sande / Jeroen Werbrouck / Herbert Vande Sompel / Ruben Verborgh
Ghent University, Belgium / Flemish Institute for Archiving, Belgium / DANS, Netherlands
Abstract

In this presentation we report the results of the Mellon-funded Scholarly Communication project, which proposes a decentralised architecture for generating, propagating, and providing notifications about artefact lifecycle information in scholarly networks. Scholarly artefacts go through many stages: from the creation of artefacts, through their registration in repositories and requests for certification at publisher websites, where they are peer-reviewed and eventually published, to archiving in a (web) archive. The results of each of these events are typically stored in different environments that are rarely interconnected. This makes assembling the complete lifecycle of artefacts an expensive post-factum endeavour involving mining many information sources and applying heuristics to combine the information into a meaningful result.
The Mellon Scholarly Communication project proposes a researcher-centric, institution-enabled scholarly communication system aligned with Decentralised Web concepts and technologies. In this vision researchers use a personal domain and associated storage space (researcher pod) as their long-term scholarly hub. Using Linked Data Notifications, these scholarly hubs communicate with service hubs (such as peer review systems, discovery systems, and archives) to fulfil the functions of scholarly communication. The researcher pod stores all the information pertaining to the artefacts that the researcher contributes to the scholarly record, where it can be shared and consulted.
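To make the notification mechanism concrete, here is a minimal Python sketch (not taken from the project's codebase) of a service hub POSTing a Linked Data Notification to a researcher pod's inbox; the inbox URL and the Activity Streams payload are illustrative assumptions.

    # Minimal sketch: POST a Linked Data Notification (JSON-LD) to a pod's LDN inbox.
    # The inbox URL and the Activity Streams payload below are illustrative assumptions.
    import json
    import requests

    INBOX_URL = "https://researcher.example.org/inbox/"  # hypothetical researcher pod inbox

    notification = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Announce",
        "actor": "https://review-service.example.org/",
        "object": "https://researcher.example.org/artefacts/article-v1",
        "summary": "A review of this artefact has been published.",
    }

    response = requests.post(
        INBOX_URL,
        data=json.dumps(notification),
        headers={"Content-Type": "application/ld+json"},
        timeout=10,
    )
    response.raise_for_status()
    print("Notification accepted:", response.status_code)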

Slides

Video

Handling IIIF and Linked Data in the Handschriftenportal
Leander Seige / Annika Schröer
Leipzig University Library, Germany
Abstract

The new manuscript portal for Germany - Handschriftenportal - will become the central information platform for medieval and early modern manuscripts in German collections, providing descriptive information on manuscripts as well as digital facsimiles. In a decentralized approach, digital images from various institutions will be integrated by leveraging the International Image Interoperability Framework (IIIF). Both images and texts will later be the target of user-generated annotations. This poses various challenges for the Handschriftenportal, such as: publishing reliable URIs for digitized manuscripts with truly persistent content in the institutions’ backends; linking portal data with external authority data; linking annotations with author information; handling born-digital texts in the context of the IIIF image viewer Mirador; and turning annotations into citable, persistent micro-publications. There should also be ways to feed information on user-generated content back to the institutions holding the original manuscripts. The talk will report on the project's current implementations and future plans; by the time SWIB 2021 takes place, the Handschriftenportal will only recently have gone live. The Handschriftenportal is a DFG-funded joint project of the State Libraries in Berlin and Munich, the Herzog August Library Wolfenbüttel and Leipzig University Library.
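For readers unfamiliar with IIIF, a rough Python sketch of the basic interoperability mechanism follows: fetching a Presentation API manifest and listing its canvases. The manifest URL is a placeholder, and the code is not part of the Handschriftenportal implementation.

    # Sketch: fetch a IIIF Presentation API 2.x manifest and list its canvases.
    # The manifest URL is a placeholder; real manifests are published by the holding institutions.
    import requests

    MANIFEST_URL = "https://example.org/iiif/manuscript-123/manifest.json"  # hypothetical

    manifest = requests.get(MANIFEST_URL, timeout=10).json()
    print("Manifest label:", manifest.get("label"))

    # IIIF Presentation 2.x nests canvases inside sequences.
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            print(canvas.get("label"), "->", canvas["@id"])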

Slides

Video

Publication, dissemination and network collaboration in the documentation of digital collections of memory institutions: interoperability between the information environments Wikidata, Wikimedia Commons, Wikipedia and the free software Tainacan
Dalton Lopes Martins
University of Brasília, Brazil
Abstract

The presentation proposes the development of a service that facilitates the publication and monitoring of edits and collaborations carried out by users of digital collections from memory institutions in the Wikidata, Wikimedia Commons and Wikipedia environments. The project is in the initial phase of modeling and information architecture development. It proposes the technical modeling of a feedback ("roundtripping") service between the information networks of the wiki ecosystem and the free software Tainacan. Tainacan has become an important free software package for the management and dissemination of digital collections in a network of memory institutions in Brazil. There are important technical and operational challenges in implementing a service that not only allows the publication of collections from a memory institution's digital repository, but also monitors the reuse, edits and collaborations of users in other information networks. By connecting the Wikidata, Wikimedia Commons and Wikipedia information networks for the publication and encouragement of the reuse of digital collections already published in Tainacan, the project aims to expand the circulation of heritage collections, make the knowledge generated about them relevant and thus value Brazilian material culture in the network society. All functionality envisioned for the project will be implemented as part of the Tainacan plugin. The research project is a cooperation between the Wikimedia Brasil Association and the São Paulo Museum of the University of São Paulo.

Slides

Video

DAY 2   |   2021-11-30   CONFERENCE
14:00-16:30h UTC 🛠 WORKSHOPS
The workshops will be held in parallel.
An Introduction to SKOS and SkoHub Vocabs
Adrian Pohl / Steffen Rörtgen
hbz, Cologne, Germany / GWDG, Göttingen, Germany
Abstract

With the Simple Knowledge Organization System (SKOS), the World Wide Web Consortium (W3C) published, more than 15 years ago, a clear and simple RDF-based data model for publishing controlled vocabularies on the web following Linked Data principles. Although a large part of controlled vocabularies - from simple value lists to thesauri and classifications - is created and maintained in libraries, SKOS has not yet been widely adopted in the library world.
This workshop gives an introduction to SKOS with hands-on exercises. Participants will create and publish their own SKOS vocabulary using GitHub/GitLab and SkoHub Vocabs, a static site generator for SKOS concept schemes.
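As a taste of what participants will produce, here is a minimal sketch of a SKOS concept scheme built with Python and rdflib (the workshop itself uses SkoHub Vocabs and Turtle files; the namespace and labels below are made up).

    # Sketch: build and serialize a tiny SKOS concept scheme with rdflib.
    # The namespace and labels are illustrative, not from the workshop materials.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, SKOS

    EX = Namespace("https://example.org/vocab/")
    g = Graph()
    g.bind("skos", SKOS)

    scheme = EX["scheme"]
    g.add((scheme, RDF.type, SKOS.ConceptScheme))
    g.add((scheme, SKOS.prefLabel, Literal("Example vocabulary", lang="en")))

    concept = EX["libraries"]
    g.add((concept, RDF.type, SKOS.Concept))
    g.add((concept, SKOS.prefLabel, Literal("Libraries", lang="en")))
    g.add((concept, SKOS.inScheme, scheme))
    g.add((concept, SKOS.topConceptOf, scheme))
    g.add((scheme, SKOS.hasTopConcept, concept))

    print(g.serialize(format="turtle"))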

Introduction to the Annif automated indexing tool
Osma Suominen / Mona Lehtinen / Juho Inkinen / Moritz Fürneisen / Anna Kasprzik
National Library of Finland / ZBW – Leibniz Information Centre for Economics
Abstract

Many libraries and related institutions are looking at ways of automating their metadata production processes, for example through the adoption of AI technology. In this hands-on tutorial, participants will be introduced to the multilingual automated subject indexing tool Annif (annif.org) as a potential component in a library’s metadata generation system. By completing exercises, participants will gain practical experience in setting up Annif, training algorithms on example data, and using Annif to produce subject suggestions for new documents via the command line interface, the web user interface and the REST API provided by the tool. The tutorial will also introduce the corpus formats supported by Annif so that participants will be able to apply the tool to their own vocabularies and documents.
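As an illustration of the REST API mentioned above, a minimal Python sketch follows; it assumes a locally running Annif instance with a trained project named "my-project-en", and the exact endpoint path may vary between Annif versions.

    # Sketch: request subject suggestions from a locally running Annif instance via its REST API.
    # Assumes Annif is serving at localhost:5000 and a project "my-project-en" has been trained;
    # the endpoint path shown here may differ between Annif versions.
    import requests

    ANNIF_URL = "http://localhost:5000/v1/projects/my-project-en/suggest"

    text = "Linked open data in libraries: publishing controlled vocabularies as SKOS."
    response = requests.post(ANNIF_URL, data={"text": text, "limit": 5}, timeout=30)
    response.raise_for_status()

    for result in response.json()["results"]:
        print(f'{result["score"]:.3f}  {result["uri"]}  {result["label"]}')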
The tutorial will be organized using the flipped classroom approach: participants are provided with a set of instructional videos and written exercises, and are expected to attempt to complete them on their own time before the tutorial event, starting at least a week in advance. The actual event will be dedicated to solving problems, asking questions and getting a feeling of the community around Annif.
Participants are instructed to use a computer with at least 8 GB of RAM and at least 20 GB of free disk space to complete the exercises. The organizers will provide the software as a preconfigured VirtualBox virtual machine. Alternatively, Docker images and a native Linux install option are provided for users familiar with those environments. No prior experience with the Annif tool is required, but participants are expected to be familiar with subject vocabularies (e.g. thesauri, subject headings or classification systems) and subject metadata that reference those vocabularies. Exercises and introductory videos can be found in the Annif-tutorial GitHub repository.
We also plan to hold a hackathon-style event on the Open Day at SWIB21. This hackathon is targeted at those who can already work fluently with Annif but would like deeper knowledge of how to apply Annif to their own data.

Introduction to OpenRefine
Sandra Fauconnier
OpenRefine

Abstract

OpenRefine is a powerful free and open source tool for working with messy data: cleaning it, transforming it from one format into another, and connecting it with knowledge bases, including Wikidata. OpenRefine is used by quite diverse communities interested in data manipulation and cleaning: librarians, researchers, data scientists, data journalists, and the Wikimedia community.
This workshop demonstrates the basic functionalities of OpenRefine, including how to reconcile data with Wikidata.
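For those curious about what reconciliation looks like under the hood, a rough Python sketch of a direct call to the Wikidata reconciliation service follows (endpoint and response layout per the Reconciliation Service API; the details here are assumptions, not workshop material).

    # Sketch: call the Wikidata reconciliation service directly, outside of OpenRefine.
    # Endpoint and payload follow the Reconciliation Service API; treat details as assumptions.
    import json
    import requests

    RECON_ENDPOINT = "https://wikidata.reconci.link/en/api"

    queries = {"q0": {"query": "Douglas Adams", "type": "Q5"}}  # Q5 = human
    response = requests.post(RECON_ENDPOINT, data={"queries": json.dumps(queries)}, timeout=30)
    response.raise_for_status()

    for candidate in response.json()["q0"]["result"]:
        print(candidate["id"], candidate["name"], candidate["score"], candidate["match"])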

IIIF in the wild
Leander Seige / Karsten Heck
Leipzig University Library, Germany / Georg-August-University Göttingen, Germany

Abstract

The International Image Interoperability Framework (IIIF) allows cross-institutional use of digital images and other media. Not only does this technology allow scholarly work with unique artifacts of human history to be taken to new levels and provide researchers with unprecedented tools; the flexibility of the standard also allows artifacts, often made available under free licenses, to be used to create new works of art, culture and education. The workshop will start with an introduction on how to find and use IIIF resources. From there, we will show which implementation details of IIIF services enable or prevent the reusability of digital resources. For this we will use freely available IIIF applications and real IIIF repositories of well-known institutions around the globe. This may include digital workspaces, storytelling tools and fun apps. Participants should bring their own laptops. Having a pre-created GitHub account might be helpful. The workshop is intended both for those who provide IIIF services and for those who, as scholars or other creative workers, would like to use IIIF in their daily work.

Introduction to Fedora 6.0
Arran Griffith / Daniel Bernstein
LYRASIS, Canada / LYRASIS, USA
Abstract

Fedora is a flexible, extensible, open source repository platform for managing, preserving, and providing access to digital content. For the past several years the Fedora community has prioritized alignment with linked data best practices and modern web standards, including the Linked Data Platform (LDP), Web Access Controls (WebAC), Memento, Activity Streams, and more. With the recent release of Fedora 6.0, we have shifted our attention back to Fedora's digital preservation roots with a focus on durability and the Oxford Common File Layout (OCFL). This workshop will provide an introduction to Fedora 6.0 with a focus on both the linked data and digital preservation capabilities. Both new and existing Fedora users will be interested in learning about and experiencing Fedora features first-hand.
Attendees will be given access to individual cloud-hosted Fedora instances to use during the workshop. These instances will be used to participate in hands-on exercises that will give attendees a chance to experience Fedora by following step-by-step instructions. The workshop will include two modules, each of which can be delivered in 1 hour or less:
Introduction and Feature Tour
This module will feature an introduction to Fedora generally, with a focus on new capabilities in version 6.0, followed by an overview of the core Fedora features. It will include hands-on demonstrations of the linked data features (resource management, using RDF to create and update metadata for description and access).
Digital Preservation with the Oxford Common File Layout
Fedora 6.0 focuses on digital preservation by aligning with the Oxford Common File Layout (OCFL). The OCFL is an application-independent approach to the storage of digital objects in a structured, transparent, and predictable manner. This module will provide an overview of the OCFL and how it is used in Fedora. Participants will be able to see how resources created in the first half of the workshop are represented as OCFL Objects on the file system.
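As a flavour of the hands-on exercises, the following Python sketch creates an RDF resource in a Fedora container through its LDP-based REST API; the base URL and credentials are placeholders for a local test instance.

    # Sketch: create an RDF resource in a Fedora container via its LDP-based REST API.
    # Base URL and credentials are placeholders; the default Fedora REST root is typically /rest/.
    import requests

    FEDORA_ROOT = "http://localhost:8080/rest/"  # hypothetical local instance
    AUTH = ("fedoraAdmin", "fedoraAdmin")        # placeholder credentials

    turtle = """
    @prefix dcterms: <http://purl.org/dc/terms/> .
    <> dcterms:title "Example digital object" ;
       dcterms:creator "Jane Doe" .
    """

    response = requests.post(
        FEDORA_ROOT,
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle", "Slug": "example-object"},
        auth=AUTH,
        timeout=10,
    )
    response.raise_for_status()
    print("Created:", response.headers["Location"])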

CubicWeb, the Semantic Content Management System
Fabien Amarger / Elodie Thieblin / Nicolas Chauvat
Logilab, France

Abstract

Linked Open Data is often published via SPARQL endpoints or data "dumps" which do not provide content negotiation on the URIs of the data they contain.
Moreover, there is no easy-to-use user interface to manage linked data (including CRUD operations, but also user permissions, rendering, etc.).
CubicWeb is a SCMS (Semantic Content Management System) for Linked Open Data. This Python-based framework can be used to import an OWL schema and RDF data automatically to generate a new CubicWeb instance. This instance can be used out of the box as a single application to serve RDF data through a conventional web interface for browsing and through content negotiation for downloading. No need to configure anything, just import and launch the app. The CubicWeb framework implements an administration interface to manage data easily, even for non-technical people. All the common features of a web application framework are available.
We propose to explore CubicWeb together as a way to deploy and serve Linked Open Data: - How to create a CubicWeb instance from an OWL ontology; - How to import RDF data that conforms to the above ontology; - How to browse and query the imported data; - How to add/edit/delete data from the interface; - How to let other applications get RDF data through content negotiation.
Last but not least, CubicWeb is free/libre and open-source software. Do not hesitate to reuse and contribute!
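A minimal Python sketch of the content negotiation pattern discussed above (generic HTTP, not CubicWeb-specific API calls; the resource URI is a placeholder):

    # Sketch: content negotiation against a Linked Open Data URI.
    # The resource URI is a placeholder; a server publishing RDF would answer similarly.
    import requests

    RESOURCE_URI = "https://example.org/book/123"  # hypothetical

    # Ask for HTML (a browsable page) ...
    html = requests.get(RESOURCE_URI, headers={"Accept": "text/html"}, timeout=10)
    # ... or for RDF in Turtle, from the very same URI.
    turtle = requests.get(RESOURCE_URI, headers={"Accept": "text/turtle"}, timeout=10)

    print(html.headers.get("Content-Type"))
    print(turtle.headers.get("Content-Type"))
    print(turtle.text[:500])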

DAY 3   |   2021-12-01   CONFERENCE
14:00-15:00h UTC CONTROLLED VOCABULARIES
String matching algorithms in OpenRefine clustering and reconciliation functions - a case study of person name matching
Christiane Klaes
University of Hildesheim, Germany / University Library Braunschweig, Germany

Abstract

Person entities are important linking nodes both within and between Linked Open Data resources across different domains and use cases. Therefore, efficient identity management is a crucial part of resource development and maintenance. This case study is concerned with the task of semi-automatically populating a newly developed domain knowledge graph, the LexBib Wikibase, with high-quality person data. We aim to transform person name literals taken from publication metadata into Semantic Web entities, to enable improved retrieval and entity enrichment for the domain-specific discovery portal ElexiFinder.
In a prototype workflow, the open source tool OpenRefine is used as a one-tool solution to perform deduplication, disambiguation and reconciliation of person names with reference datasets, using a sample of 3,104 name literals taken from the LexBib bibliography. We closely examine OpenRefine’s clustering functions and their underlying string matching algorithms, focusing on their ability to account for different error types that frequently occur in person name matching, such as spelling errors, phonetic variations, initials, or double names. Following the same approach, the string matching processes implemented in two widely used reconciliation services for Wikidata and VIAF are examined. We also analyse the usefulness of OpenRefine’s features for supporting further processing of the algorithmic output.
The results of this case study may contribute to a better understanding and subsequent further development of interlinking features in OpenRefine and adjoining reconciliation services. By offering empirical data on OpenRefine’s underlying string matching algorithms, the study’s results supplement existing guides and tutorials on clustering and reconciliation, especially for person name matching projects.
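For readers unfamiliar with key-collision clustering, a simplified Python sketch in the spirit of OpenRefine's fingerprint method follows; it is not OpenRefine's actual implementation, and the sample names are invented.

    # Sketch: key-collision clustering with a fingerprint key, similar in spirit to
    # OpenRefine's fingerprint method (simplified; not OpenRefine's exact implementation).
    import re
    import unicodedata
    from collections import defaultdict

    def fingerprint(name: str) -> str:
        # Normalize, strip accents, lowercase, drop punctuation, sort unique tokens.
        normalized = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
        tokens = re.sub(r"[^\w\s]", " ", normalized.lower()).split()
        return " ".join(sorted(set(tokens)))

    names = ["Müller, Anna", "Anna Müller", "Mueller, Anna", "A. Müller"]

    clusters = defaultdict(list)
    for name in names:
        clusters[fingerprint(name)].append(name)

    # "Müller, Anna" and "Anna Müller" collide; "Mueller, Anna" and "A. Müller" do not,
    # illustrating the error types (spelling variants, initials) discussed in the study.
    for key, members in clusters.items():
        print(key, "->", members)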

Slides

Video

Automatic subject indexing with Annif at the German National Library
Sandro Uhlmann / Claudia Grote
German National Library

Abstract

The German National Library (Deutsche Nationalbibliothek, DNB) is currently setting up a new, modular system for fully automatic subject cataloguing, which allows for service-based flexible workflows.
For the automatic subject indexing task, Annif, open-source software for automatic indexing developed at the National Library of Finland, has been successfully evaluated on DNB data.
In the first part of the talk, results of the automatic subject indexing of German-language publications with descriptors of the Gemeinsame Normdatei (Integrated Authority File, GND) are presented. In this use case, 1.3 million GND descriptors are available for subject cataloguing and are applied to digital publications or tables of contents. Aspects of quality as well as the labeling of the metadata provenance of automatically assigned descriptors will be addressed.
The second part illustrates the Docker-based technical integration of Annif into the new, modular environment, where it becomes part of the automatic cataloguing workflow: from data collection and identification of the language of the text, through selection of the appropriate Annif model and communication with Annif, to updating the bibliographic record in the bibliographic database with the retrieved Annif results.

Slides

Semi-automated methods for BIBFRAME work entity description
Jim Hahn
University of Pennsylvania, United States of America

Abstract

Describing library resources with the BIBFRAME vocabulary and its core entities of Work, Instance, and Item is a resource intensive process. Cataloging in linked data RDF editors with BIBFRAME involves careful selection of, and referencing to, external authority entities. Creating external authoritative links is essential to produce an accurate context while describing the BIBFRAME Work entity in an RDF editor.
This presentation will report on an investigation of machine learning methods for the semi-automated creation of BIBFRAME Work entity descriptions within the RDF linked data editor Sinopia. The automated subject indexing software Annif was configured with the Library of Congress Subject Headings (LCSH) vocabulary from the Linked Data Service. A dataset comprising 9.3 million titles and LCSH linked data references from the IvyPlus POD project was used as the training corpus. POD is a data aggregation project involving member institutions of the IvyPlus Library Confederation and contains over 70 million MARC21 records, nearly 40 million of which are unique to the corpus.
The machine learning outputs, accessed via the Annif web API, enable a feature that dynamically auto-suggests subject attributes based on a cataloger-supplied title. This research and development is foregrounded with considerations for the ethical use of semi-automated subject description. Semi-automation as a potential integration target is in contrast to completely automated cataloging and is a very specific use of machine learning. In this work, automation was used as a way to support, not replace, professional expertise.

Slides

Video

15:00-15:30h UTC Coffee break
15:30-16:30h UTC TOOLS
SparqlExplorer, exploring Linked Open Data
Fabien Amarger / Elodie Thieblin / Nicolas Chauvat
Logilab, France

Abstract

Since developing and maintaining web applications is costly, a lot of the data published in SPARQL endpoints is difficult to explore for users who are not experts in LOD tools and languages.
In order to make that data more accessible, we developed SparqlExplorer, a web application that can be plugged into any SPARQL endpoint to make its content browsable without writing a single query, by applying views to render web pages. For example, to explore a dataset about books, SparqlExplorer would use a book-specific view that executes a SPARQL query to retrieve the data related to a book and generates a web page that displays the title, the ISBN and maybe a thumbnail of the cover.
Views are JavaScript functions that take data and output HTML. Views depend on the structure of the input data, but are independent of the SPARQL endpoint. A "Book" view can be reused on all library and bookstore SPARQL endpoints that publish data using the vocabulary that the view expects as input. Views are simple functions that do not require developers to have a deep understanding of Semantic Web technologies: usual front-end knowledge is enough.
Last but not least, SparqlExplorer is free/libre and open-source software. Do not hesitate to reuse and contribute!
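SparqlExplorer views themselves are JavaScript; purely to illustrate the concept of a view as a function from SPARQL results to HTML, here is a rough sketch in Python with a made-up endpoint and query.

    # Illustrative only (SparqlExplorer views are JavaScript): query a SPARQL endpoint
    # and render the bindings with a simple "Book" view function.
    import requests

    ENDPOINT = "https://example.org/sparql"  # hypothetical endpoint
    QUERY = """
    PREFIX schema: <http://schema.org/>
    SELECT ?title ?isbn WHERE {
      <https://example.org/book/123> schema:name ?title ;
                                     schema:isbn ?isbn .
    }
    """

    def book_view(bindings):
        """A 'view': turn SPARQL result bindings for a book into an HTML fragment."""
        row = bindings[0]
        return f"<h1>{row['title']['value']}</h1><p>ISBN: {row['isbn']['value']}</p>"

    results = requests.get(
        ENDPOINT,
        params={"query": QUERY},
        headers={"Accept": "application/sparql-results+json"},
        timeout=10,
    ).json()
    print(book_view(results["results"]["bindings"]))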

Slides

Video

Web editor for JSON/Bibframe data in libraries
Nicolas Prongué
RERO+, Switzerland

Abstract

This presentation shows a cataloguing form created to edit data stored in a Bibframe/JSON format. It points out the specificities, obstacles, advantages and difficulties of this implementation in comparison to a raw MARC editor. On the one hand, the interface must be really user-friendly, as it is targeted at public, school and special libraries. On the other hand, it must make it possible to receive granular bibliographic records from MARC and to edit JSON data that is hierarchically structured and deeply nested, composed of a multitude of fields with different conditionalities and validation rules.
The system does not allow RDF data to be created directly, but in this specific library context it represents a relevant step towards editing linked and interoperable data. The next step regarding the Semantic Web is to define a transformation and a context for exposing this data as JSON-LD.
This work takes place within the transition of RERO+, a Swiss competence and service centre for libraries, to an open-source, in-house library system called RERO ILS. About 60 libraries in Switzerland have been using it since July 2021. In this context, it was decided to abandon MARC21 in favour of JSON, using the Bibframe model as far as possible and maintaining interoperability for data import/export. RERO ILS is based on the Invenio 3 framework developed at CERN, which relies heavily on JSON and JSON Schema for resource management. The editor is also implemented within SONAR, another RERO+ software product made to manage institutional repositories.
RERO ILS source code: https://github.com/rero/rero-ils/
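To illustrate the kind of validation such deeply nested JSON records undergo, here is a small Python sketch using the jsonschema library; the schema and record are simplified inventions, not RERO ILS's actual schemas.

    # Sketch: validate a deeply nested, Bibframe-like JSON record against a JSON Schema,
    # as JSON-Schema-driven systems do. The schema and record below are made-up examples.
    from jsonschema import validate

    schema = {
        "type": "object",
        "required": ["title", "contribution"],
        "properties": {
            "title": {
                "type": "array",
                "items": {
                    "type": "object",
                    "required": ["mainTitle"],
                    "properties": {"mainTitle": {"type": "string"}},
                },
            },
            "contribution": {
                "type": "array",
                "items": {
                    "type": "object",
                    "required": ["agent", "role"],
                    "properties": {
                        "agent": {
                            "type": "object",
                            "properties": {"preferred_name": {"type": "string"}},
                        },
                        "role": {"type": "string", "enum": ["cre", "ctb"]},
                    },
                },
            },
        },
    }

    record = {
        "title": [{"mainTitle": "Example record"}],
        "contribution": [{"agent": {"preferred_name": "Doe, Jane"}, "role": "cre"}],
    }

    validate(instance=record, schema=schema)  # raises jsonschema.ValidationError if invalid
    print("Record is valid.")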

Slides

Video

coli-ana – Automatic analysis of the Dewey Decimal Classification, a service of the Verbundzentrale des GBV
Uma Balakrishnan / Stefan Peters / Jakob Voß
VZG, Germany

Abstract

Dewey Decimal Classification (DDC) is the most widely used classification system internationally. It was developed by Melvil Dewey in 1873, based on the decimal classification conceived by Gottfried Wilhelm Leibniz in the 17th century. At the dawn of the 21st century, DDC attracted great interest amongst academic libraries in Europe and was translated into German and several other European languages.

The contemporary Dewey Decimal System, with over 48,000 classes and six tables, allows great flexibility and granularity in building new numbers such as 700.90440747471. However, this process is complex and is based on several intricate rules and instructions. The lack of captions in such built numbers makes them difficult for non-Dewey catalogers to understand and re-use. Thus, in 2003, the sub-project coli-ana was initiated under the project VZG-colibri to develop a tool that automatically decomposes and analyzes any given DDC notation into its smallest components, enriches them with their captions, and provides the semantic relationships between the elements in JSKOS (a JSON-LD format based on SKOS).
The results of coli-ana improve information retrieval, facilitate re-use, and aid further study to enhance knowledge organization systems in general. coli-ana currently delivers DDC captions only in German, but it is planned to integrate the English version of DDC as well. The results of the tool are also incorporated in the mapping tool Cocoda to assist mappings to and from DDC built numbers.

This talk will give a brief background of the coli-ana project, elucidate with examples the decomposition of notations and the associated rules and instructions, present the coli-ana web service, and demonstrate its use cases in the coli-conc project and in the K10plus union catalogue.
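Purely for illustration, a rough sketch of what a decomposed built number might look like in a JSKOS-style structure; field names and values are hypothetical, so consult the JSKOS specification and the coli-ana documentation for the actual format returned by the service.

    # Hypothetical, simplified JSKOS-style representation of a decomposed DDC built number.
    # Notations, captions and field usage are placeholders, not real coli-ana output.
    decomposition = {
        "uri": "http://example.org/ddc/XXX.XX",   # hypothetical built number
        "notation": ["XXX.XX"],
        "memberList": [
            {"notation": ["XXX"], "prefLabel": {"de": "Caption of the main class"}},
            {"notation": ["T1--XX"], "prefLabel": {"de": "Caption of the table notation"}},
        ],
    }

    for part in decomposition["memberList"]:
        print(part["notation"][0], "-", part["prefLabel"]["de"])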

Slides

Video

DAY 4   |   2021-12-02   CONFERENCE
14:00-16:30h UTC 🏠 OPEN DAY - SHOWCASES, DEMOS, HACKATHONS
Tools and Projects are presented by their makers. The day will start with a common round of announcements. After that, the showcase & demo booths will be open in parallel.

Booths are a new SWIB format being tried out for the first time at SWIB21. Just like at a physical conference, a virtual booth enables direct interaction between people who maintain or “power use” some kind of software or tool and those who are interested in learning more about it or using it.

Take a stroll along the virtual booths and have a chat, or participate in a hackathon. Registration (free of charge) is needed to participate. For an overview and links to the booths see https://swib.org/swib21/boothes.html.

DAY 5   |   2021-12-03   CONFERENCE
14:00-15:00h UTC DATA MODELS
BIBFRAME as a data model for aggregating heterogeneous data in a search portal
Thorsten Fritze
UB JCS Frankfurt am Main, Germany

Abstract

The Lin|gu|is|tik portal is a research tool for the field of linguistics that has been developed at the University Library Frankfurt am Main since 2012. It provides an integrated search for discipline-specific scientific resources: printed as well as electronic publications, websites, and research data.
In order to facilitate the inclusion of language resources that are part of the Linguistic Linked Open Data cloud, linked data technologies have increasingly been incorporated into the portal since 2015. As a major part of this effort, the established thesaurus of the Bibliography of Linguistic Literature (BLL), comprising more than 9,000 subject terms, has been re-modeled as an ontology (the BLL Ontology) and made freely available online. Since our goal is to make full use of the opportunities that linked data technologies offer, we decided to replace the portal's underlying proprietary data scheme with a standardized data model expressly designed for linked data applications. BIBFRAME was selected to fulfill this role.
In this presentation we will give an overview of the ongoing work in the current project phase (2020-2022) and discuss why we chose BIBFRAME for this use case, how we adapted it in terms of an application profile, how it fits into the overall LOD-centric architecture of the portal, and in what way it interacts with the BLL Ontology that is used throughout the portal as authority data. We will also show specific features that the chosen data model makes possible as well as give a brief technological overview.

Slides

Video

RDA/RDF at the University of Washington Libraries
Benjamin Moore Riesenberg / Theodore Gerontakos / Jian Lee / Melissa Morgan / Crystal Clements
University of Washington Libraries, United States of America

Abstract

The Linked Data Team at the University of Washington Libraries describes their work with RDA/RDF. This work includes the ongoing creation and use of RDA/RDF machine-readable application profiles compatible with the Sinopia Linked Data Editor, a mapping and conversion project between the RDA/RDF and BIBFRAME data models, and a recently launched project to create a mapping from MARC21 to RDA/RDF with future large-scale metadata conversion in mind. The team's goal in doing this work is to demonstrate that RDA/RDF is a better tool for representing RDA bibliographic descriptions than BIBFRAME, and to lay the groundwork for expanding the use of the RDA/RDF ontology in the international GLAM community.

Slides

Video

Representing the Luxembourg Shared Authority File based on CIDOC-CRM in Wikibase
Jose Emilio Labra Gayo / Michelle Pfeiffer / Andra Waagmeester / Maarten Zeinstra / Maarten Brinkerink / Joël Thill / Christel Kayser
University of Oviedo, Spain / Centre national de recherche archéologique, Luxembourg / Micelio / IP Squared / Digitaal Werktuig / Archives Nationales de Luxembourg / Bibliothèque nationale de Luxembourg

Abstract

The Luxembourg Competency Network on Digital Cultural Heritage is developing a Shared Authority File to combine the knowledge of national heritage institutions and to increase the impact of the member institutions' digitised collections. The authority file uses a custom-developed CIDOC-CRM RDF/XML data model, starting with a model for authority records for person entities. The Shared Authority File uses the open source product Wikibase as its main data store and internal user interface.
Using CIDOC-CRM as an ontological standard in the context of Wikibase is particularly challenging due to the event-centric approach of the domain ontology.
This contribution describes how the project solved the challenge of hosting data conforming to CIDOC-CRM on a Wikibase instance, while leveraging the qualifier/reference model of Wikibase. This solution takes into account some quality attributes like intuitiveness, expressiveness and consistency of the hosted data. We have created a prototype of a populated Wikibase from existing database dumps, which demonstrates the feasibility of the proposed approach.
We consider that the mapping between the developed CIDOC-CRM model and the Wikibase model that we present in this paper can be valuable for other cultural heritage institutions working with Wikibase.

Slides

Video

15:00-15:30h UTC Coffee break
15:30-16:30h UTC DISCOVERY
From string to thing: Wikidata based query expansion
Bernd Uttenweiler
ETH Library Zurich, Switzerland

Abstract

Our discovery system Primo VE (Ex Libris) returns a result list after a search string is entered. Retrieval of resources via instances of the categories "Person" and "Place" is not possible. But this would have two advantages: we could better link (in both directions) to other sites that are built around persons or places, and we could offer additional research support to our users, especially from the Digital Humanities field.
We need to find a way from the entered search term (string) to the object (thing) "person" or "place". For this purpose, the user's "normal" search query is dynamically preprocessed with SPARQL queries against Wikidata in such a way that the user is presented with suggestions for matching places and people.
When the user makes a selection from the list of persons or places, we have completed the step from string to thing. This also allows names to be disambiguated and name variants that do not exist in the catalog to be taken into account.
If the user makes a selection, a person page or place page is rendered. The person page contains biographical information, links to other sites, researcher profiles, and archive holdings (from Wikidata, Entityfacts, beacon.findbuch, Metagrid).
Among other things, teacher-student relationships are also presented, and the user can go to the teacher's or student's person page. Based on Wikidata graphs, the resources of the catalog are linked together in new ways.
The presentation will provide insights into the objectives, decisions and implementation under Primo. The person and place pages are implemented in our production system. Selection options for the user within the search process are implemented in a test system and planned to go live soon.
Link to the test system: Suggestions for persons
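The presentation describes SPARQL-based preprocessing against Wikidata; as a simplified illustration of the "string to thing" step, the following Python sketch uses Wikidata's public wbsearchentities API instead.

    # Sketch: go "from string to thing" by searching Wikidata entities for a user's query term.
    # The project itself uses SPARQL queries; this simplified sketch uses the public
    # wbsearchentities API for brevity.
    import requests

    API = "https://www.wikidata.org/w/api.php"

    params = {
        "action": "wbsearchentities",
        "search": "Einstein",
        "language": "en",
        "type": "item",
        "limit": 5,
        "format": "json",
    }
    results = requests.get(API, params=params, timeout=10).json()

    for hit in results["search"]:
        print(hit["id"], "-", hit.get("label"), "-", hit.get("description", ""))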

Slides

Video

BIOfid: Accessing legacy literature the semantic (search) way
Adrian Pachzelt
Frankfurt University Library, Germany

Abstract

In BIOfid, we make full texts of legacy biodiversity literature available through a semantic search. The semantic search is capable of processing simple single queries for a species (e.g. "beeches"), but also handles restrictions for traits like "plants with red flowers" by applying Natural Language Processing combined with rule-based generation of database queries. For this purpose, the semantic search is aware of biological systematics (e.g. beeches are plants). Subsequently, the semantic search returns all documents that contain these species. Future expansions will include geolocated search for a species in a specific area (e.g. "beeches in the Alps").
To enable these search capabilities, the semantic search draws from a pool of both ontologies and semantically annotated full texts. Within BIOfid, these two kinds of data are intertwined to support the machine understanding of the full texts.
In this talk, I will give an introduction into how literature and data are automatically harvested, processed, and prepared for being queried and presented in the BIOfid portal. Furthermore, I will give insights into how user query analysis is done in the portal and discuss its pros and cons as well as alternative approaches (e.g. machine learning). Finally, I will give an insight into the current work in BIOfid that involves the extraction of facts from the full texts (information retrieval and extraction).
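Purely as an illustration of rule-based query generation from a trait restriction such as "plants with red flowers", here is a toy Python sketch; BIOfid's actual NLP pipeline is far more elaborate and is not reproduced here.

    # Toy sketch of rule-based query generation from a natural-language trait restriction.
    # The rules, taxa and query structure are invented for illustration only.
    import re

    TRAIT_RULES = {
        r"\bred flowers?\b": {"trait": "flower_color", "value": "red"},
        r"\bwhite flowers?\b": {"trait": "flower_color", "value": "white"},
    }
    TAXON_RULES = {r"\bplants?\b": "Plantae", r"\bbeech(es)?\b": "Fagus"}

    def build_query(text: str) -> dict:
        """Turn a simple trait restriction into a structured database query."""
        query = {"taxon": None, "filters": []}
        for pattern, taxon in TAXON_RULES.items():
            if re.search(pattern, text, re.IGNORECASE):
                query["taxon"] = taxon
        for pattern, restriction in TRAIT_RULES.items():
            if re.search(pattern, text, re.IGNORECASE):
                query["filters"].append(restriction)
        return query

    print(build_query("Plants with red flowers"))
    # -> {'taxon': 'Plantae', 'filters': [{'trait': 'flower_color', 'value': 'red'}]}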

Slides

Video

Using Linked Data relationships to enhance discovery and mitigate bias
Juliet L. Hardesty
Indiana University, United States of America

Abstract

This presentation will share an open-source, JavaScript-based Linked Data project that explores techniques to improve terminology used for discovering resources from systemically marginalized communities (metadataBias). As a research project, it is also investigating a practical application of Linked Data to enhance the usability of library systems. Controlled vocabularies used in cultural heritage organizations (galleries, libraries, archives, and museums) are a helpful way to standardize terminology but can also result in misrepresentation or exclusion of systemically marginalized communities. Library of Congress Subject Headings (LCSH) is one example of a widely used yet problematic controlled vocabulary for subject headings. Linked Data vocabularies can connect terms between larger, less representative vocabularies (like LCSH) and terms from a community’s vocabulary to aid and instruct end users conducting research online. This project uses The Homosaurus, an LGBTQ+ Linked Data controlled vocabulary, to provide an augmented and updated search experience to mitigate bias within a system that only uses LCSH for subject headings. The presentation will provide a demonstration; share progress to date on research findings, usability feedback, and implementation; give instructions on how to use or contribute to this research; and outline plans for further development.

Slides

Video

CONTACT (PROGRAMME)

ZBW

Joachim Neubert
T. +49-(0)-40-42834462
E-mail j.neubert(at)zbw.eu

 

hbz

Adrian Pohl
T. +49-(0)-221-40075235
E-mail
swib(at)hbz-nrw.de

 

Twitter: #swib21