swib24 – Semantic Web in Libraries conference


Programme

DAY 1 | Monday, 2024-11-25

04:00–08:00 h UTC

Workshop slot East Asia / Australia

Getting started with local LLMs

Nishad Thalhath 1,2
1 RIKEN Center for Integrative Medical Sciences (IMS), Japan; 2 University of Tsukuba, Japan

Abstract

Local large language models (LLMs) are becoming popular due to their privacy-first approach, security, and cost-effectiveness. This workshop introduces participants to the basics of setting up and running local LLMs using open-source tools.

The workshop will cover both basic and intermediate topics. In Part 1, participants will learn about the basics of LLMs and related concepts, setting up local LLMs, selecting the right tools and models, and the fundamentals of prompting. In Part 2, the focus will shift to intermediate topics such as programmatically using local LLMs and automating tasks, vector embeddings and semantic search, and retrieval augmented generation (RAG) basics.

This hands-on workshop allows participants to set up their local LLMs during the session or follow up later with the provided materials. It is designed for beginners and those seeking intermediate-level knowledge, making it suitable for participants at various stages of their journey with local LLMs.

Since LLMs are resource-intensive, participants are expected to have a computer with at least 8 GB of RAM and 20 GB of free disk space to follow along. Although the instructor will use Macs, the workshop is platform-agnostic, accommodating macOS, Linux, or Windows users. A stable and fast internet connection is recommended. For Part 2, participants should have a basic understanding of Python programming and a recent version of Python installed on their computers with Jupyter Notebook support.
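As a taste of Part 2, the sketch below shows how a locally hosted model can be called programmatically. It assumes an Ollama server running on the default port (one of several open-source options covered in the workshop); the model name and prompt are illustrative only.

```python
import requests

# Minimal sketch: query a locally running Ollama server (default port 11434).
# The model name and prompt are placeholders; any locally pulled model works.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",
        "prompt": "Summarize retrieval augmented generation in two sentences.",
        "stream": False,  # return a single JSON object instead of a token stream
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])  # the generated text
```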

08:00–12:00 h UTC

Workshop slot Europe / Africa

Introduction to the Annif automated indexing tool

Osma Suominen 1, Mona Lehtinen 1, Juho Inkinen 1, Anna Kasprzik 2, Ghulam Mustafa Majal 2, Lakshmi Rajendram Bashyam 2
1 National Library of Finland, Finland; 2 ZBW – Leibniz Information Centre for Economics, Germany

Abstract

Many libraries and related organizations are exploring automated methods for metadata creation. This workshop offers an introduction to the multilingual automated indexing tool, Annif (annif.org), which can be integrated into a library’s metadata production system. Participants will gain hands-on experience with Annif by setting it up, training its algorithms with sample data, and generating subject suggestions for new documents. The workshop includes both basic and complex scenarios.

Before the event, participants will have access to instructional videos and exercises from the Annif-tutorial GitHub repository. Working through this material in advance leaves more time during the workshop for troubleshooting, questions, and discussion.

Participants should have a computer with a minimum of 8 GB of RAM and 20 GB of free disk space. The software will be provided as a preconfigured VirtualBox virtual machine, though Docker images and a Linux installation option are also available. No previous experience with Annif is necessary. However, familiarity with subject vocabularies (like thesauri or classification systems) and corresponding subject metadata is expected.
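To illustrate the kind of interaction covered in the exercises, here is a hedged sketch of requesting subject suggestions from a locally running Annif instance via its REST API; the port and project identifier are assumptions based on a typical tutorial setup, not fixed values.

```python
import requests

# Sketch only: assumes Annif is running locally on port 5000 and that a
# project such as "yso-en" has already been configured and trained.
resp = requests.post(
    "http://localhost:5000/v1/projects/yso-en/suggest",
    data={"text": "A study of climate change impacts on boreal forests", "limit": 5},
    timeout=30,
)
resp.raise_for_status()
for suggestion in resp.json()["results"]:
    print(f'{suggestion["score"]:.3f}  {suggestion["uri"]}  {suggestion["label"]}')
```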

Introduction to property graphs

Jakob Voß
Verbundzentrale des GBV, Germany

Abstract

Property graphs are a powerful and flexible data structure for representing entities, their relationships, and their properties in graph form. The tutorial gives an introduction to property graphs, including a comparison with RDF and how to map between both data structuring languages. After a basic definition of property graphs and their elements, we will look at examples and practical applications as well as tools and technologies such as graph database management systems and visualization tools. The workshop includes exercises using an online graph database and an introduction to the query language Cypher and to the property graph exchange format PG. Data modeling with property graphs is demonstrated and compared to RDF, with the advantages and limitations of each approach.
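To make the contrast with RDF concrete, the following sketch creates and queries a tiny property graph with Cypher via the Neo4j Python driver; the connection details, labels, and data are illustrative and the workshop’s own online database may differ.

```python
from neo4j import GraphDatabase

# Sketch assuming a local Neo4j instance; URI and credentials are placeholders.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Nodes and relationships can both carry properties - the defining feature
    # of property graphs compared to plain RDF triples.
    session.run(
        "MERGE (a:Author {name: $name}) "
        "MERGE (w:Work {title: $title}) "
        "MERGE (a)-[:CREATED {year: 1605}]->(w)",
        name="Miguel de Cervantes", title="Don Quixote",
    )
    result = session.run(
        "MATCH (a:Author)-[r:CREATED]->(w:Work) RETURN a.name, r.year, w.title"
    )
    for record in result:
        print(record["a.name"], record["r.year"], record["w.title"])

driver.close()
```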

Leveraging Zurich Zentralbibliothek’s Jupyter Notebooks for metadata retrieval and analysis from Alma

Linda Samsinger
Zentralbibliothek Zürich, Switzerland

Abstract

Zurich Zentralbibliothek (ZB) has developed a powerful, versatile, and user-friendly series of seven Jupyter Notebooks for searching, downloading, and analyzing library data from the SLSP catalog (Alma), a Swiss-wide online repository holding over 25 million metadata records in MARCXML format. While library users can browse Swisscovery to search for media titles, its Excel export is unstructured and limited to 50 entries. In contrast, ZB’s Jupyter Notebooks allow downloading up to 10,000 highly structured metadata records, with bespoke calculation and grouping of the underlying MARC fields to yield each record’s title, author, publisher, year, language, …, summary, epoch, Swisscovery link, and link to the table of contents (ToC). Wikidata/GND data fields can also be added upon customization, and ToC records can be downloaded as PDFs en masse.

In a second step, the search results can be analyzed using frequency, bar, and pie charts, world maps, and word clouds, all exportable as PDF. The insights generated could form the basis for new, publication-quality findings about underlying trends and anomalies. The notebooks are customizable and open source: a treasure trove for researchers and library professionals alike who aim to explore and assess library data with data-driven methods. They can be leveraged for academic projects, for curating bibliographies and sourcing citations, and for improving cataloging entries and uncovering gaps in the library’s collection, thereby guiding future acquisitions. Following in the footsteps of the DNB, the ÖNB, and the University Library of Bern, ZB has released its own series of Jupyter Notebooks to extend its data services with newly crafted functionality. Users should be familiar with Jupyter Notebooks and basic Python programming as well as German.

For more information, visit the Zurich Zentralbibliothek Jupyter Notebooks project repository. This resource provides detailed documentation to help users get started with these tools.
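For readers who want a sense of what such a retrieval step looks like under the hood, here is a rough sketch of fetching MARCXML records via the standard SRU protocol with Python; the endpoint URL and query are hypothetical placeholders, not the notebooks’ actual configuration.

```python
import requests
from lxml import etree

# Hypothetical SRU endpoint and query - the real notebooks target the SLSP/Alma
# SRU service; consult the project repository for the actual configuration.
SRU_ENDPOINT = "https://example.org/sru"
params = {
    "version": "1.2",
    "operation": "searchRetrieve",
    "query": "alma.title=Zurich",
    "maximumRecords": "10",
    "recordSchema": "marcxml",
}
resp = requests.get(SRU_ENDPOINT, params=params, timeout=60)
resp.raise_for_status()

tree = etree.fromstring(resp.content)
ns = {"marc": "http://www.loc.gov/MARC21/slim"}
for record in tree.iter("{http://www.loc.gov/MARC21/slim}record"):
    # MARC field 245 $a holds the title proper
    title = record.findtext(
        'marc:datafield[@tag="245"]/marc:subfield[@code="a"]', namespaces=ns
    )
    print(title)
```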

The Open Research Knowledge Graph – a lighthouse in the publication flood

Anna-Lena Lorenz
TIB – Leibniz Information Centre for Science and Technology, Germany

Abstract

Science is facing a challenge: in the ever-growing flood of publications, maintaining an overview becomes increasingly hard. Many researchers wish for AI tools to aid them in finding and analyzing literature, but relying purely on large language models can lead to misinformation or hallucinations.

The Open Research Knowledge Graph (ORKG) aims to provide a solution by combining the power and high usability of LLMs with a human-in-the-loop approach for in-depth content curation.

Our novel tool, ORKG Ask, answers research questions based on almost 80 million open access articles. Researchers are given an overview of the key insights from the relevant publications as well as the methods used, limitations, and conclusions. The content is fully customizable, so researchers can extract additional details on demand or filter on the metadata.

But it does not end with a human-readable overview. In the ORKG, we structure the contents of scholarly publications in a knowledge graph, making them human- and machine-actionable. At its core, the ORKG serves as a central hub for organizing scientific information from scholarly publications, including papers, datasets, and software. The contents are curated by researchers in a crowd-based approach. While ORKG Ask serves as a good entry point to a topic, the knowledge graph represents knowledge in a structured form and gives more detail. All content is completely controlled by domain experts and can be collaboratively edited, reviewed, and discussed.

In this workshop we will give a brief introduction to the ORKG’s vision, followed by instructions on using the platform, a hands-on session on curating content, and a follow-up discussion on LLM-based methods for scholarly communication.

14:00–15:15 h UTC

Opening

Moderators: Anna Kasprzik, Katherine Thornton

Welcome

N.N. N.N.
N.N.

Abstract

Keynote: How knowledge representation is changing in a world of large language models

Denny Vrandečić
Wikimedia Foundation, Germany

Abstract

Over the last few years, large language models have profoundly impacted many research topics and product teams. From applications in health care to the creation of new soda flavors, artificial intelligence has captured the imagination of many people. Even though some of the initial enthusiasm and promises around large language models may have been somewhat exaggerated, it is clear that generative AI will have a massive impact that is still difficult to predict.

In areas such as libraries, bibliography, healthcare, finance, science metrics, and many others, we have invested heavily in structured knowledge representations, such as metadata and knowledge graphs, and it is not immediately clear how Semantic Web technologies and other structured knowledge representations will fit into a world that is being rapidly transformed by the deployment of large language models.

In this talk we will work on some answers as to how these two technologies might evolve and co-evolve. We will explore the weaknesses and strengths of the different approaches, and aim to identify the opportunities where they may complement each other. We may dream of what may lie beyond knowledge graphs and metadata, and of how advances in language models might allow us to reach bold new frontiers in knowledge representation that were not accessible before.

Community-based development of a metadata profile for educational resources

Adrian Pohl
hbz, Germany

Abstract

In October 2023 the first official version of the General Metadata Profile for Educational Resources (AMB – Allgemeines Metadatenprofil für Bildungsressourcen) was published: a schema.org-based specification for describing educational resources with structured data in JSON-LD. Though the metadata profile was developed by and for German-speaking OER initiatives, it may well be worth adapting in other contexts.

The spec is the result of a 3.5-year process involving individuals and organizations from the library field, from education, and from private initiatives. The GitHub repository alone shows 17 contributors; others have contributed by opening issues or joining the discussion in meetings. During development, issues were opened at schema.org and contributions were made to the controlled vocabularies of LRMI (Learning Resources Metadata Innovation) published within the Dublin Core Metadata Initiative (DCMI).

The AMB had already been adopted and implemented by several parties while still in draft status and is being promoted by various players, especially in the educational sector. It has been a notable vehicle for bringing good metadata practices from the Web, such as SKOS, into projects from the education sector.

The talk will briefly introduce the purpose and scope of the metadata profile, describe the tools and processes used for its development, and examine motivations for joining the development community as well as the benefits of implementation. It will close with challenges and lessons learned, e.g. regarding tools, community work, and contributions to upstream standard specs.
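To give a flavour of what such a description looks like, here is a minimal, hedged JSON-LD sketch of an educational resource using schema.org terms, parsed with rdflib; the actual AMB profile prescribes its own context, required properties, and controlled vocabularies, so treat this purely as an illustration with invented URIs.

```python
from rdflib import Graph

# Illustrative only: a schema.org-based JSON-LD description of an educational
# resource. The real AMB profile defines its own @context and required fields.
doc = """
{
  "@context": {"@vocab": "https://schema.org/"},
  "@type": "LearningResource",
  "@id": "https://example.org/oer/intro-to-rdf",
  "name": "Introduction to RDF",
  "inLanguage": "de",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "about": [{"@id": "https://example.org/subjects/semantic-web"}]
}
"""
g = Graph().parse(data=doc, format="json-ld")
print(g.serialize(format="turtle"))
```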

15:15–15:30 h UTC

Coffee Break

15:30–16:45 h UTC

Data & Interoperability

CapData Opéra: ease data interoperability for opera houses

Fabien Amarger 1, Nicolas Chauvat 1, Eudes Peyre 2
1 Logilab, France; 2 Réunion des Opéras de France

Abstract

The “CapData Opéra” project, initiated by ROF (Réunion des Opéras de France – French Opera Association) and supported by the French Ministry of Culture, uses Semantic Web technologies to share cultural data with the public and the artistic community.

The aim is to aggregate data produced by various domain actors to make it globally searchable. This highlights previously invisible data, such as the exchange of creative works and performances between opera houses. To achieve this, an ontology has been designed to define a common vocabulary and implement data interoperability objectives. This ontology is aligned with schema.org, and we are working to align additional models. A set of SHACL rules has been created to validate the data before publication. A dedicated tool, Rodolf, has been developed to monitor the RDF publishing process. This tool is used to execute the process and track which sources have been uploaded to the SPARQL endpoint, including upload times and any errors encountered. Exporting RDF data can be challenging for institutions unfamiliar with Semantic Web technologies, so a dedicated Software Development Kit (SDK) has been developed to assist web developers in exporting CapData RDF data even if they lack experience in this area.

In this presentation, we aim to share with the SWIB community the objectives and solutions we have found to federate heterogeneous data. We will present feedback on this project, focusing on technical and management aspects, and then describe the results we have achieved and the future of this project.
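As an illustration of the validation step described above, the following sketch runs a SHACL shapes graph against a data graph with the pySHACL library; the file names are placeholders, not the project’s actual artefacts.

```python
from rdflib import Graph
from pyshacl import validate

# File names are placeholders for a provider's export and the CapData shapes.
data_graph = Graph().parse("opera-events.ttl", format="turtle")
shapes_graph = Graph().parse("capdata-shapes.ttl", format="turtle")

conforms, report_graph, report_text = validate(
    data_graph,
    shacl_graph=shapes_graph,
    inference="rdfs",  # expand RDFS entailments before checking shapes
)
print("Publishable:" if conforms else "Validation failed:")
print(report_text)
```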

An aggregation workflow based on Linked Data for the common European data space for cultural heritage

Nuno Freire 1, Bob Coret 2, Enno Meijers 2, Hugo Manguinhas 1, Jochen Vermeulen 1, Antoine Isaac 1, Dimitra Atsidis 3
1 Europeana Foundation, The Netherlands; 2 Digital Heritage Network, The Netherlands; 3 Netherlands Institute for Sound & Vision, The Netherlands

Abstract

In the context of the activities for innovating the operating models and aggregation methods in the common European data space for cultural heritage, Europeana and the Dutch Digital Heritage Network cooperated to broaden the solutions for data aggregation in the data space by defining an aggregation method based on Linked Data.

Although the aggregation workflows currently in practice by both organisations are different, a generalisable method for Linked Data aggregation was successfully defined. This method builds on top of dataset-level metadata in the Data Catalogue Vocabulary (DCAT) model. These dataset descriptions must follow guidelines to ensure that the information about the dataset’s distributions can be fully understood by machines, allowing automatic harvesting, or downloading, of the datasets.

The aggregation workflow starts when a provider informs Europeana of the URI of a dataset. The URI must either resolve to a DCAT description of the dataset or be queryable with a SPARQL DESCRIBE query on a SPARQL endpoint (also communicated by the provider). Next, the locations of downloadable distributions of the dataset in EDM are obtained from the metadata, and the distributions are automatically downloaded by Europeana. The dataset distributions do not have to follow the EDM RDF/XML schema (any well-known RDF serialisation may be used), because Europeana segments the RDF data into individual EDM records. At this point, the datasets are ready for Europeana’s normal ingestion process.
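A rough sketch of the harvesting side of this workflow, assuming the dataset URI resolves to a DCAT description (all URIs below are hypothetical):

```python
from rdflib import Graph, Namespace, URIRef

DCAT = Namespace("http://www.w3.org/ns/dcat#")

# Hypothetical dataset URI supplied by a provider; it must resolve to a DCAT
# description (the SPARQL DESCRIBE route is the alternative, not shown here).
dataset_uri = URIRef("https://example.org/datasets/archive-collection-1")
catalog = Graph().parse(dataset_uri)

# Walk dcat:distribution links and collect download locations.
for distribution in catalog.objects(dataset_uri, DCAT.distribution):
    for url in catalog.objects(distribution, DCAT.downloadURL):
        media_type = catalog.value(distribution, DCAT.mediaType)
        print("download", url, "as", media_type)
        # ...fetch the file, segment the RDF into individual EDM records, ingest.
```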

The method was tested in practice in a pilot in which several datasets from the National Archives of the Netherlands were aggregated into Europeana. The datasets were converted by the Dutch aggregator DC4EU from the National Archives ontology to EDM, and the DCAT metadata was made available in the dataset register of the Dutch Digital Heritage Network.

It is not all Greek anymore: use of LOD to increase the interoperability of data created by the National Library of Greece

Sofia Zapounidou 1, Lazaros Ioannidis 2, Michalis Gerolimos 1, Eftychia Koufakou 1, Charalampos Bratsas 2,3
1 National Library of Greece; 2 Open Knowledge Greece; 3 Department of Information and Electronic Engineering, International Hellenic University

Abstract

Given that cataloguing policies have an impact on the interoperability of bibliographic data, the National Library of Greece (NLG) initiated its efforts to increase the interoperability of Greek bibliographic data by selecting a new set of standards to implement, and by changing its cataloguing policies and workflows. The set of selected standards consists of the IFLA/LRM model, the official RDA rules, the MARC21 format (Linky Marc approach) for the Koha integrated library system, and the RDA/RDF format for the Wikibase prototype. The presentation summarizes the key decisions and practices applied in everyday business to integrate Linked Data in MARC21 records and to enable future transformations of those records to Linked Data using the RDA/RDF vocabulary. As a proof of concept, the NLG develops and maintains a Wikibase prototype where NLG data is transformed, mapped, and linked to other Linked Data sources. This is an ongoing process that, in each iteration, focuses on a specific RDA entity. So far, the authority records describing RDA Persons have been represented using RDA/RDF. The next RDA entity to be transformed and mapped using the LRM/RDA conceptualizations and vocabularies is the RDA Nomen entity. Nomens (e.g., pseudonyms) related to the RDA Persons will be incorporated, aiming to provide integrated displays of a Person and the names (Nomens) that this person has used. The presentation will conclude with a short demonstration of the Wikibase prototype and a discussion of the next steps for transforming data and further developing the prototype.

17:00–21:00 h UTC

Workshop slot Americas

Fail4Lib @ SWIB24

Andreas Kyriacos Orphanides
NC State University Libraries, United States of America

Abstract

Everyone experiences failure in their professional lives, but no one likes to talk about it. When we see failure approaching, we distance ourselves, avert our eyes, or – if we’re in its path – brace for the worst. But failure has intrinsic value and is an essential step on the path to professional and organizational success. And since it’s inevitable, we ought to learn how to look back on our failures to derive value from them, and how to look ahead so that our past failures can inform our future successes.

Fail4Lib is a workshop dedicated to discussing and coming to terms with the failures that we all encounter in our work as information professionals. It originated as a preconference workshop at the Code4Lib annual conference, and we now hope to offer it at SWIB24. Fail4Lib is a safe space for us to explore failure, to talk about our own experiences with failure, and to encourage enlightened risk taking. The goal of Fail4Lib is for participants – and their organizations – to get better at failing gracefully, so that when we do fail, we do so in a way that moves us forward.

This 4-hour workshop includes three major components: a case-study review of a high-profile failure, presentation of failure lightning talks from workshop participants, and a structured discussion of our personal and organizational relationships with failure and approaches to accepting and learning from it.

Publish & reconcile against SKOS vocabularies with SkoHub

Steffen Rörtgen 1, Petra Maier 2, Tobias Bülte 2
1 FWU, Germany; 2 Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen, Germany

Abstract

This hands-on workshop will guide participants through the process of publishing controlled vocabularies encoded in SKOS using SkoHub Pages. SkoHub Pages offers an efficient way to publish vocabularies in both human- and machine-readable formats without requiring an additional server, by building on GitHub Pages. After that, we will see how to use these vocabularies in reconciliation tasks with SkoHub Reconcile.

Agenda:

  1. Introduction to RDF & SKOS (Simple Knowledge Organization System)
  • Brief overview of RDF & SKOS and their significance in managing controlled vocabularies.
  • Examples of SKOS usage in libraries and education infrastructures.
  2. Introduction to SkoHub
  • Overview of the SkoHub modules and their functionalities (SkoHub Vocabs, SkoHub Reconcile, SkoHub Shape, and SkoHub Pages)
  3. Hands-on Session: Building and Publishing a Vocabulary
  • Step-by-step guide to creating a controlled vocabulary in SKOS format.
  • Practical exercise: participants will build their own vocabulary.
  • Publishing the vocabulary using SkoHub Pages on GitHub in human- and machine-readable formats.
  4. Reconciliation
  • Introduction to reconciliation and SkoHub Reconcile
  • Practical exercise: using the published vocabulary in a simple reconciliation task
  5. Q&A and Discussion
  • Addressing participants’ questions and discussing potential applications.

Target Audience: Librarians, information professionals, and developers interested in Semantic Web technologies, linked data, and vocabulary management.

Prerequisites: Participants should have basic knowledge of controlled vocabularies.
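For orientation before the hands-on session, here is a minimal sketch of the kind of SKOS vocabulary participants will build and publish; all URIs and labels are invented for illustration.

```python
from rdflib import Graph

# A tiny SKOS concept scheme of the kind published with SkoHub Pages.
# URIs and labels are illustrative placeholders.
ttl = """
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<https://example.org/vocab/> a skos:ConceptScheme ;
    skos:prefLabel "Example subject vocabulary"@en ;
    skos:hasTopConcept <https://example.org/vocab/oer> .

<https://example.org/vocab/oer> a skos:Concept ;
    skos:prefLabel "Open educational resources"@en ;
    skos:prefLabel "Offene Bildungsmaterialien"@de ;
    skos:topConceptOf <https://example.org/vocab/> ;
    skos:inScheme <https://example.org/vocab/> .
"""

g = Graph().parse(data=ttl, format="turtle")
print(len(g), "triples")              # quick sanity check before publishing
print(g.serialize(format="json-ld"))  # the machine-readable form alongside HTML pages
```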

DAY 2 | Tuesday, 2024-11-26

14:00–15:15 h UTC

Generating Linked Metadata

Automating metadata extraction and cataloguing: experiences from the National Libraries of Norway and Finland

Pierre Beauguitte 1, Osma Suominen 2
1 National Library of Norway, Norway; 2 National Library of Finland, Finland

Abstract

The increasing volume of grey literature, such as reports produced by public sector organizations and academia, poses significant cataloguing, discoverability, and accessibility challenges in digital libraries. To help address these challenges, the National Library of Norway (NLN) and the National Library of Finland (NLF) have explored different strategies to automatically extract bibliographic metadata from PDF files. This presentation will first discuss METEOR, an open-source tool developed by the NLN that uses rule-based logic and keywords and is already integrated in the production workflow as a suggestion engine for librarians. Meanwhile, the NLF is exploring the potential of fine-tuned, locally hosted large language models for extracting bibliographic metadata. The strengths and weaknesses of both approaches are analyzed, as well as the common obstacles they face. This talk will also present our joint efforts to prepare high quality datasets for training and evaluation of metadata extraction systems along with newly developed metrics suited to the task. Finally, the discussion will focus on the integration of external catalogues and authority registries in these processes, enabling the use of persistent identifiers for entities in the metadata. Our presentation seeks to share practical solutions, promote methodology exchange, and inspire community collaboration.
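As a much-simplified illustration of the rule-and-keyword approach (not METEOR’s actual code), a first pass over the text of a report’s title page might look like this:

```python
import re

def extract_candidates(first_page_text: str) -> dict:
    """Toy rule-based extraction, loosely in the spirit of a keyword/rule approach.

    Real systems use far richer rules, layout information, and authority lookups.
    """
    isbn = re.search(r"ISBN[:\s]*([\d\-Xx]{10,17})", first_page_text)
    issn = re.search(r"ISSN[:\s]*(\d{4}-\d{3}[\dXx])", first_page_text)
    lines = [line.strip() for line in first_page_text.splitlines() if line.strip()]
    return {
        "title": lines[0] if lines else None,  # naive heuristic: first non-empty line
        "isbn": isbn.group(1) if isbn else None,
        "issn": issn.group(1) if issn else None,
    }

print(extract_candidates("Forest Inventory Report 2023\nOslo\nISBN 978-82-000-0000-1"))
```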

MARC21 bibliographic to LRM/RDA/RDF mapping and conversion project

Crystal Elisabeth Yragui, Junghae Lee, Cypress Payne
University of Washington Libraries, United States of America

Abstract

Initiated in 2021, the University of Washington Libraries have committed to using LRM/RDA/RDF (RDA as Linked Data) to represent RDA description sets. While LRM/RDA/RDF has not been widely adopted and few test datasets exist, this could change rapidly if libraries transform legacy MARC data to LRM/RDA/RDF. As they near completion of Phase 1 of the project, presenters will describe a groundbreaking mapping and conversion project between the MARC21 bibliographic standard and LRM/RDA/RDF. In addition to describing the cross-organizational team, the presentation summarizes the scope and strategy of the first phase of the project, describes key discussions, highlights important decisions, and shares project workflows. Challenges with aggregate works and expressions, ambiguously-described entities, reproduction manifestations, inconsistent cataloging practices, and more are discussed. Advance releases of project deliverables are shared ahead of their anticipated official publication at the end of 2024, including mappings in .csv format, XSLT transformation code, and sample transformation data in the original MARC21 format and the resulting LRM/RDA/RDF which will be published in a Wikibase Cloud instance. Project members hope that Phase 2 of the project will address entity reconciliation and identity management as well as more complex aggregate types. More information about the open project can be found in the following GitHub repository: https://github.com/uwlib-cams/MARC2RDA
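Since the deliverables include XSLT transformation code, applying it to a MARCXML record could look roughly like the following sketch; the file names are placeholders rather than the project’s published artefacts.

```python
from lxml import etree

# Placeholder file names; the actual stylesheets and sample data are published
# in the project's GitHub repository (uwlib-cams/MARC2RDA).
stylesheet = etree.XSLT(etree.parse("marc2rda.xsl"))
marc_record = etree.parse("bibliographic-record.xml")

rda_rdf = stylesheet(marc_record)  # LRM/RDA/RDF output as an XML tree
print(etree.tostring(rda_rdf, pretty_print=True).decode("utf-8"))
```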

ANTELOPE: open source service for accessible semantic annotation in GLAM

Kolja Bailly, Lozana Rossenova, Ina Blümel, Thassilo Schiepanski
TIB Hannover, Germany

Abstract

ANTELOPE (Annotation, Terminology Lookup and Personalization) is a comprehensive annotation service developed at TIB. It provides a single point of access to a plethora of terminology sources across the Semantic Web and bundles several machine learning approaches to facilitate the retrieval of candidate entities for annotation. The service is available through an integrated user interface as well as a customisable API.

Comprehensive user research within the NFDI4Culture community (the German national research data infrastructure for the cultural domain) has revealed a demand for services aiding the annotation process in GLAM: retrieving entities from different terminology sources based on a simple term, a full text, or an image is a specialised and time-consuming task. ANTELOPE supports these tasks with a lean interface that abstracts state-of-the-art natural language processing, classification, and information retrieval techniques.

More precisely, ANTELOPE currently provides three search interfaces: straightforward terminology search applies a given term to designated terminology sources and presents results in a distinctive hierarchy graph; entity recognition accepts full-text input, which supplies contextual information for narrowed-down results; image recognition accepts image files instead and applies powerful image classifiers to bridge multimedia approaches. Ultimately, there will be a universal results interface that allows entity selection, storage, and export.

We are currently working on integrating ANTELOPE functionality into research frameworks such as TS4NFDI and Wikibase to make its usage as seamless as possible.

15:15–15:30 h UTC

Coffee Break

15:30–16:45 h UTC

LOD for End Users

Empowering user discovery with Linked Data and Semantic Web technologies

Min Hoon Ee, Ashwin Nair, Robin Dresel
National Library Board, Singapore

Abstract

The National Library Board (NLB) of Singapore has made significant strides in leveraging data to enhance public access to its extensive collection of physical and digital resources. This talk explores the development and implementation of the Singapore Infopedia Sidebar, a recommendation service designed to guide users to related resources by utilizing metadata and a Linked Data knowledge graph.

By consolidating diverse datasets from various source systems and employing semantic web technologies such as RDF and Schema.org, NLB has created a robust knowledge graph that enriches user experience and facilitates seamless exploration. The Sidebar, integrated into Infopedia, the Singapore Encyclopedia, surfaces data through a user-friendly interface. It functions as a content-based filtering mechanism, allowing users to select topics based on their preference and presenting relevant resources by format.

Our resources are categorised using knowledge organisation systems’ controlled vocabularies and named entities. This effort will also supplement NLB’s work in publishing information about Singapore entities on international registries as linked data for reuse by others. The talk details the architecture of the Sidebar, the ranking algorithm used to prioritize resources, and the challenges faced in its development.

Future directions include integrating user feedback, enhancing semantic analysis, and scaling the service to other web platforms within NLB’s ecosystem. This initiative underscores NLB’s commitment to fostering innovation, knowledge sharing, and the continuous improvement of public data access by making our data available for reuse on NLB’s platform as well as on Wikidata and VIAF.

Setting the stage: enabling curation spaces for dialogues with Ibali Digital Collections UCT

Sanjin Muftic
University of Cape Town, South Africa

Abstract

In 2021, the Digital Library Services (DLS) department at University of Cape Town Libraries launched a university-wide showcasing platform for the university’s digital collections (https://ibali.uct.ac.za). The site is called Ibali (isiXhosa for ‘story’) and it utilises Semantic Web technologies through the open-source software Omeka S (https://omeka.org/s/) as well as IIIF (International Image Interoperability Framework). Ibali is part of UCT Libraries’ drive to nurture an Open Access space where digital collections can be created, curated, published, and showcased. It is a highly collaborative, flexible, and future-thinking online repository space that supports Digital Humanities projects.

Through the many modules developed by its growing open-source community, as well as its customisable nature, Omeka S allows the building of very diverse yet structured showcase sites. At UCT we have tried to balance this flexibility with the institutional aim of ensuring that media and digital objects are consistently presented under the university umbrella and enhanced with LOD descriptions. Our guiding principle is thus not only about the content, but about how well each site weaves the content together: in other words, with the help of Linked Data, how many useful links and connections within and across the showcase collections can be built?

In this presentation I will outline some of the DH projects that form part of Ibali, from collections of landscape photographs and historical documents to creative curations. I will also unpack the various customisations we have enabled towards a more open space for stories to be shared, in the hope of extending the digital showcase space for potential storytellers: embracing the transactional, transformative, and migratory nature of the images, events, and recordings of our archives and our collective memories.

E-LAUTE: establishing a music edition platform using Semantic Web technologies

Ilias Kyriazis 1, David M. Weigl 2, Christoph Steindl 1
1 Austrian National Library, Vienna, Austria; 2 University of Music and Performing Arts, Vienna, Austria

Abstract

E-LAUTE (https://e-laute.info/) is creating a comprehensive edition of German renaissance lute tablatures (GLT), a historically widespread music notation that has been largely neglected by modern research. It interlinks music and textual encodings, notation images, audio, semantic annotations, and bibliographic metadata by using open data formats and Linked Data throughout the entire process. Additionally, it builds on research data and information architectures provided by the Technical University of Vienna (research workflow management) and the Austrian National Library (ÖNB; GAMS digital edition platform and triplestore). We are extending the ÖNB platform with facilities for incorporating multifaceted music information, and we are augmenting the Music Encoding Initiative’s (MEI) existing XML schema for the representation of GLT documents. We contextually enrich the MEI encodings through interconnection with textual encodings of contemporary lyrics and instructional material, IIIF facsimile images, audio recordings (produced both project-internally and externally), and additional metadata. To do so, we apply Linked Data ontologies, XML transclusions between encoding schemas, and Web Annotations for external contributions through decentralized Solid pods. We aim to create a central hub for managing the enriched data and for publishing the results in uniform and state-of-the-art formats (e.g. JSON-LD), providing open APIs (e.g. SPARQL) and contributing innovative approaches to music informatics and musicological research, thus serving the needs of music researchers, practitioners and enthusiasts alike.

DAY 3 | Wednesday, 2024-11-27

14:00–14:25 h UTC

Lightning Talks

Use the opportunity to share your latest projects or ideas in a short (3-5 min) lightning talk. Registration of talks will open in the forum with the start of the conference.

14:25–15:15 h UTC

Workflows

Leveraging Linked Data Fragments for enhanced data publication: the Share-VDE case study

Andrea Gazzarini 1,2
1 Share Family, Italy; 2 SpazioCodice, Italy

Abstract

Share-Virtual Discovery Environment (SVDE) is a library-driven initiative that brings together, in a shared, BIBFRAME-based discovery environment, the bibliographic catalogs and authority files of a growing number of leading academic and national libraries from across North America, Europe, and beyond.

In (big) data-driven environments, accessing, querying, and processing vast datasets efficiently and flexibly is challenging. Linked Data Fragments (LDF) have emerged as a promising paradigm to address these challenges by providing a distributed and scalable approach for publishing and serving Linked Data.

As part of the SVDE Labs activities, we developed a set of web APIs that adopt that approach and provide several benefits: real-time RDF generation and publication, on-demand ontology mapping, and multi-provenance management.

The Linked Data Fragments paradigm also addresses one challenging point in the infrastructure: It no longer needs a dedicated RDF Store. The API set provided by the system (GraphQL, REST, and SPARQL) uses a centralized knowledge base implemented using a hybrid approach composed of a relational database and an inverted-index-based search engine.

A working prototype will be used during the presentation. However, the overall work is under development as we follow two parallel “investigation paths”: the first using a plain RDBMS and the second using a NoSQL storage. As part of the presentation, we will discuss technical details and the architecture/infrastructure by examining the lessons learned and the challenges.

The ongoing development will be applied to the linked data underlying SVDE discovery portal. It will benefit the other interconnected initiatives that are part of the broader Share Family linked data ecosystem.

Constraints for Linked Open Data

Shawn Goodwin
University of Chicago, United States of America

Abstract

The University of Chicago Library is creating a single platform, UChicagoNode, to support the development of future digital humanities projects and to maintain existing ones. The goal of the platform is to increase the visibility and sustainability of the digital humanities work of students and scholars at the university. At the forefront of making the data more discoverable is the adoption of the RDF standards Dublin Core and the Europeana Data Model. We are using OCHRE as the data input platform, which feeds a MarkLogic triple store. But how do we ensure that the data is accurate and standardized for ease of search? We have adopted SHACL, the Shapes Constraint Language, to validate the data in our RDF store and to ensure its accuracy; the implementation we currently use is the TopBraid reference implementation. SHACL provides practical constraints to complement the open-world assumption of OWL. Constraining RDF data brings several benefits: more reliable querying, assured data quality, and more accurate data integration pipelines. This presentation will show how SHACL fits into our workflow, how it aids quality control over our Linked Open Data, and some examples of complex SHACL constraints.
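A small, hedged example of the kind of constraint involved (the shape and vocabulary below are invented for illustration, not taken from UChicagoNode):

```python
from rdflib import Graph
from pyshacl import validate

# Invented shape: every edm:ProvidedCHO must carry at least one dc:title.
shapes_ttl = """
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix dc:  <http://purl.org/dc/elements/1.1/> .
@prefix edm: <http://www.europeana.eu/schemas/edm/> .
@prefix ex:  <https://example.org/shapes/> .

ex:ProvidedCHOShape a sh:NodeShape ;
    sh:targetClass edm:ProvidedCHO ;
    sh:property [
        sh:path dc:title ;
        sh:minCount 1 ;
        sh:message "Every described object needs at least one dc:title." ;
    ] .
"""
data_ttl = """
@prefix edm: <http://www.europeana.eu/schemas/edm/> .
<https://example.org/object/1> a edm:ProvidedCHO .
"""
conforms, _, report = validate(
    Graph().parse(data=data_ttl, format="turtle"),
    shacl_graph=Graph().parse(data=shapes_ttl, format="turtle"),
)
print(conforms)  # False: the object above is missing a title
print(report)
```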

15:15–15:30 h UTC

Coffee Break

15:30–16:45 h UTC

Vocabularies

This is an ontology, not a screwdriver

Helena Simões Patrício 1, Maria Inês Cordeiro 2, Pedro Nogueira Ramos 1
1 Information Science, Technologies and Architecture Research Center, University Institute of Lisbon (ISCTE-IUL); 2 National Library of Portugal, Portugal

Abstract

This presentation will explain and demonstrate the potential of a Reference Ontology (RO) for solving interoperability issues of bibliographic ontologies that derive not only from misalignments in the granularity and meaning of certain elements but also from their limited use of semantic mechanisms (e.g., classification, equivalence, hierarchy, or transitivity) and from the low level of interlinking among ontologies. These problems require alternatives to mapping techniques such as crosswalks and application profiles, which are poorly scalable and not well suited to the Semantic Web.

The proposed RO is intended for such mediation, i.e., for connecting different ontologies at a higher level without imposing a common central ontology. It is built on RDFS/OWL (Resource Description Framework Schema/Web Ontology Language) abstraction and inference mechanisms and on mediation techniques for interconnecting ontologies. It also makes use of SHACL (Shapes Constraint Language) to overcome the inability of RDFS/OWL to impose and validate data restrictions, an important aspect in the interoperation of bibliographic ontologies for avoiding inconsistencies.

The RO demonstration will focus on solving two interoperability problems of RDA (Resource Description and Access) and BIBFRAME (Bibliographic Framework Initiative): i) the polysemy of the bf:Work class (its multiple correspondence with the RDA concepts of Work and Expression); and ii) the lack of transitivity, non-reflexivity, and asymmetry in the RDA and BIBFRAME formalization of whole-part relationships. The RO will be applied to real-world bibliographic examples from the Library of Congress and Biblioteca Nacional de España datasets.

Both the RO main components (OWL and SHACL RDF/XML files) and the bibliographic examples datasets are available at https://libraryreferenceontology.com.

Shuai Wang, Maria Adamidou
VU Amsterdam, The Netherlands

Abstract

Recent years have witnessed significant adoption of LGBTQ+ ontologies and structured vocabularies in libraries, some of which are published as Linked Data on the Semantic Web. Homosaurus is among the most popular, with links from and to QLIT, GSSO, Wikidata, LCSH, and others. Over the past years, three versions of Homosaurus have been released, with updates every six months. Despite its rapid development, little has been reported about the properties of these links. In this study, we first retrieve all the mappings and links between these vocabularies, together with links about concept replacement and redirection, to form an integrated knowledge graph. Using this graph, we perform qualitative and quantitative analyses. We discuss the discovery of missing links using weakly connected components: among 105 newly discovered links from QLIT to LCSH, experts confirmed that 78 (72.38%) could be included. We analyze concept drift and change by providing examples of the convergence and divergence of concepts. Moreover, we study the reuse of multilingual labels: our program discovered 775 Swedish labels in Wikidata (524 prefLabels and 251 altLabels) for 524 QLIT entities. Finally, we discuss potential issues with publishing related multilingual information on the Semantic Web and the practical consequences of our findings for libraries, heritage institutions, and online literature databases. The project is available on GitHub at: https://github.com/Multilingual-LGBTQIA-Vocabularies/Examing_LGBTQ_Concepts.
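A hedged sketch of the weakly-connected-components idea for surfacing candidate links (a toy graph with invented identifiers, not the study’s actual data):

```python
import networkx as nx

# Toy link graph: nodes are concept identifiers, directed edges are existing
# mapping links. All identifiers below are invented for illustration.
G = nx.DiGraph()
G.add_edge("qlit:q1", "homosaurus:h1")    # QLIT -> Homosaurus mapping
G.add_edge("homosaurus:h1", "lcsh:sh1")   # Homosaurus -> LCSH mapping
G.add_edge("wikidata:Q42", "homosaurus:h1")

for component in nx.weakly_connected_components(G):
    qlit = [n for n in component if n.startswith("qlit:")]
    lcsh = [n for n in component if n.startswith("lcsh:")]
    # A QLIT and an LCSH concept that share a component but have no direct
    # link are candidate missing links, to be confirmed by domain experts.
    for q in qlit:
        for target in lcsh:
            if not G.has_edge(q, target):
                print("candidate link:", q, "->", target)
```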

A web-based translation interface for the MeSH vocabulary

Lukas Geist, Nelson Quiñones, Rohitha Ravinder, Dietrich Rebholz-Schuhmann, Gabriele Wollnik-Korn, Miriam Albers, Leyla Jael Castro
ZB MED Information Centre for Life Sciences, Germany

Abstract

Here we present a web application interface to facilitate English-German translations and curation of new terms coined in the Medical Subject Headings (MeSH) thesaurus, officially provided in German by ZB MED.

MeSH is a global, hierarchical vocabulary for biomedical terms, key to indexing databases, cataloging resources, and crafting search profiles. Accurate German translations improve accessibility for German speakers and retrieval of German biomedical literature, and promote international standardization within the biomedical domain.

The process starts with machine translations from DeepL, followed by a two-stage human translation and curation process for quality assurance, submission to the National Library of Medicine in the USA, and the publication of an XML file containing the original English terms and the approved German translations. The XML file is publicly available under open access principles on the PUBLISSO FRL platform, so it can be used, e.g., in entity annotation and recognition processes.

The interface builds upon the web framework Nuxt 3, uses an SQLite database integrated with the Prisma ORM, and is deployed with Nginx. It prioritizes one-to-one translation of English MeSH main headings, with additional synonyms capturing, e.g., variations across German-speaking countries and regions, providing German speakers with a localized experience. A curator then approves the translations, and the XML file is produced. The application could be used for translations into other languages by changing the DeepL seed.

In the future, the interface will be extended to also cover updated MeSH terms in addition to the new ones, to better reflect modified and deleted main headings, preferred labels, and synonyms. Although the interface itself is tailored to the MeSH vocabulary, the DeepL translation-curation process could be adapted to other vocabularies.
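The machine-translation step at the start of the pipeline could, in its simplest form, look like the sketch below using the official DeepL Python client; the auth key and term are placeholders, and the production workflow additionally handles synonyms and the two-stage curation.

```python
import deepl

# Placeholder credentials and term; real runs iterate over new MeSH headings.
translator = deepl.Translator("YOUR_DEEPL_AUTH_KEY")
result = translator.translate_text(
    "Myocardial infarction",
    source_lang="EN",
    target_lang="DE",
)
print(result.text)  # machine draft, subsequently reviewed in the curation stages
```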

16:45–17:00 h UTC

Closing