11:00 - 12:00
|
COLLOCATED EVENTS
DINI-AG KIM Meeting
Jana Hentschke / Stefanie Rühle
DINI-AG Kompetenzzentrum Interoperable Metadaten (KIM)
Public meeting of the DINI-AG KIM. KIM is a forum for German-speaking metadata experts from LAM institutions. The meeting will be held in German. Agenda
|
13:00 - 19:00 |
TUTORIALS AND WORKSHOPS
Introduction to Jupyter Notebooks
Magnus Pfeffer / Kai Eckert Stuttgart Media University, Germany
Jupyter Notebook is an open source web application for creating and sharing “live documents” that can contain code and the results of its execution alongside traditional document elements like text or images. Originally developed as part of the IPython project, it is now independent of Python and supports a long list of programming languages, including JavaScript, Ruby, R and Perl.
These live documents are uniquely suited to creating teaching materials and interactive manuals that allow the reader to make changes to program code and see the results within the same environment: program outputs can be displayed, and visualisation graphics or data tables can be updated on the fly. To support traditional use cases, static non-interactive versions can be exported in PDF, HTML or LaTeX format.
For data practitioners, Jupyter Notebooks are ideal for performing data analyses or transformations, e.g. to generate Linked Open Data, where the workflow documentation is part of the implementation. Single lines of code can be added or changed and then executed without losing the results of prior parts of the code. Visualizations can be generated in code and are directly embedded in the document. This makes prototyping and experimenting highly efficient and actually a lot of fun. Finally, Jupyter Notebooks are an ideal platform for beginners, as they can execute code line by line and immediately see how changes affect the result.
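To give a flavour of this workflow, here is a minimal sketch of what a single notebook cell might look like when generating a few Linked Open Data statements; it assumes the rdflib library is installed and is not part of the official workshop material.

```python
# One notebook cell: build a tiny RDF graph and show its Turtle serialization inline.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS

EX = Namespace("http://example.org/book/")   # placeholder namespace

g = Graph()
g.bind("dcterms", DCTERMS)
g.add((EX["1"], DCTERMS.title, Literal("Semantic Web in Libraries")))
g.add((EX["1"], DCTERMS.creator, Literal("Jane Doe")))

# In Jupyter, the output appears directly below the cell.
print(g.serialize(format="turtle"))          # rdflib >= 6 returns a str
```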
The workshop requires no prior knowledge of Jupyter Notebooks or the Python programming language; basic programming and HTML/Markdown skills are sufficient. Please bring your own laptop.
Agenda:
Part I: Introduction (Local installation of the necessary programming environment ++ Using existing documents ++ Creating documents with rich content ++ Notebook extensions)
Part II: Case studies (Using Jupyter Notebook in teaching data integration basics ++ Using Jupyter Notebook to develop, test and document a data management workflow with generation of RDF)
Part III: Advanced topics (Server installation and use ++ Version control ++ Using different language kernels)
|
|
Controlled vocabulary mapping with Cocoda
Jakob Voß Verbundzentrale des GBV, Germany
Over the last few years we have developed the web application Cocoda for creating and managing mappings between library classification schemes, authority files, knowledge graphs, and similar knowledge organization systems. The workshop will introduce you to the technical background of Cocoda and vocabulary mappings. You will learn how to set up and configure your own instance of Cocoda and related services, and how to integrate additional vocabularies. We will explore the JSKOS data format and APIs, discuss the integration of additional data sources, and brainstorm about quality assurance and usability.
Participants are required to bring their own computer with Node.js (at least version 8) and git installed, and should have basic practical knowledge of processing JSON files. Participants are encouraged to bring their own vocabularies, mappings, and/or data sources.
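As an illustration of the JSKOS format mentioned above, the following sketch expresses a single concept mapping in Python and serializes it to JSON; the URIs are placeholders and the exact set of fields used by Cocoda may differ.

```python
import json

# A simplified JSKOS-style mapping between two concepts (placeholder URIs).
mapping = {
    "from": {"memberSet": [{"uri": "http://example.org/vocabA/123"}]},
    "to": {"memberSet": [{"uri": "http://example.org/vocabB/456"}]},
    "type": ["http://www.w3.org/2004/02/skos/core#closeMatch"],
    "creator": [{"prefLabel": {"en": "Jane Doe"}}],
}

print(json.dumps(mapping, indent=2))
```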
Slides
|
|
Introduction to OpenRefine
Owen Stephens / Felix Lohmeier Owen Stephens Consulting, UK / Open Culture Consulting, Germany
This workshop will introduce participants to the OpenRefine software. The purpose and functionality of OpenRefine will be presented, and participants will use the tool in a range of hands-on exercises to examine, clean, link and publish a data set.
At the end of this course, participants will be able to:
* Create a project in OpenRefine
* Use OpenRefine to explore a data set and identify problems
* Clean a dataset using some of the features and tools of OpenRefine
* Use OpenRefine to link together datasets including GND (the integrated authority file of the German National Library) and Wikidata based on entity names
* Use OpenRefine to enhance a data set by looking up, linking and adding data from other data sources
* Add data to Wikidata using OpenRefine
* Map data in OpenRefine to RDF and export data as RDF
Participants will also be introduced to running an OpenRefine reconciliation service and to how they can contribute to the development of new reconciliation services and of the OpenRefine software generally.
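As background for the reconciliation exercises, the sketch below shows what a call to a reconciliation service looks like at the protocol level; the endpoint is an assumption (the public Wikidata reconciliation service), and the workshop itself uses OpenRefine's built-in client rather than hand-written code.

```python
import json
import requests

# Assumed endpoint of the public Wikidata reconciliation service.
ENDPOINT = "https://wikidata.reconci.link/en/api"

# A batch of reconciliation queries, keyed by arbitrary identifiers.
queries = {"q0": {"query": "Johann Sebastian Bach", "limit": 3}}

resp = requests.post(ENDPOINT, data={"queries": json.dumps(queries)})
resp.raise_for_status()

for candidate in resp.json()["q0"]["result"]:
    print(candidate["id"], candidate["name"], candidate["score"], candidate["match"])
```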
No specific prior knowledge is required.
Participants are requested to bring a laptop with the current stable release of OpenRefine installed, including the latest stable version of the RDF extension.
Slides
|
|
Automated subject indexing with Annif
Osma Suominen / Mona Lehtinen / Juho Inkinen / Anna Kasprzik National Library of Finland, Finland / ZBW - Leibniz Information Centre for Economics, Germany
Due to the proliferation of digital publications, intellectual subject indexing of every single literature resource in institutions such as libraries is no longer possible. To provide subject-based access to information resources of different kinds and with varying amounts of available metadata, it has become necessary to explore possibilities for automation.
In this hands-on tutorial, participants will be introduced to the multilingual automated subject indexing tool Annif as a potential component in a library's metadata generation system. By completing exercises, participants will gain practical experience in setting up Annif, training algorithms on example data from the organizing institutions NLF and ZBW, and using Annif to produce subject suggestions for new documents via the command line interface as well as the web user interface and REST API provided by the tool. The tutorial will also introduce the corpus formats supported by Annif so that participants will be able to apply the tool to their own vocabularies and documents.
Participants are requested to bring a laptop with at least 8 GB of RAM and at least 20 GB of free disk space. The organizers will provide the software as a preconfigured virtual machine. No prior experience with the Annif tool is required, but participants are expected to be familiar with subject vocabularies (e.g. thesauri, subject headings or classifications) and with subject metadata that references those vocabularies.
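To illustrate the REST API mentioned above, here is a minimal sketch of requesting subject suggestions from a running Annif instance; the base URL and project identifier are assumptions for a local setup such as the tutorial's virtual machine, and the exact API paths should be checked against the Annif documentation.

```python
import requests

ANNIF_API = "http://localhost:5000/v1"   # assumed local Annif instance
PROJECT = "yso-en"                       # hypothetical project identifier

text = "Climate change has a measurable impact on Baltic Sea fisheries."

resp = requests.post(f"{ANNIF_API}/projects/{PROJECT}/suggest",
                     data={"text": text, "limit": 5})
resp.raise_for_status()

for hit in resp.json()["results"]:
    print(f"{hit['score']:.3f}  {hit['uri']}  {hit['label']}")
```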
Slides
|
|
Hands-on IIIF: how to install, configure and prepare simple IIIF services
Leander Seige Leipzig University Library
In recent years, the International Image Interoperability Framework (IIIF) has developed into a widespread standard for making digital images of cultural objects available on the basis of Linked Open Data principles. The aim of this hands-on workshop is to walk the participants through a complete installation of a IIIF server in a virtual machine, to make images available according to the IIIF Image API, and to deliver the corresponding metadata according to the IIIF Presentation API. Participants can bring up to ten of their own high-resolution images, including simple metadata, to convert them into the appropriate IIIF formats during the workshop. The organizers will provide access to centrally hosted virtual machines for the participants. The hands-on session starts with a ready-made installation of a basic Linux operating system. During the workshop all necessary server components will be installed and configured by the participants. This includes the configuration of web and proxy servers, HTTPS and CORS. The provided images will be converted into tiled pyramidal images. IIIF manifests and collection files will be generated. Each step will be explained in detail and the participants will receive support in case of technical difficulties. Finally, participants will install IIIF viewers on their virtual machines in order to view IIIF images. It will be possible to access each other's IIIF servers via HTTPS in order to demonstrate the advantages of IIIF's interoperability.
Participants should have basic knowledge of Linux, the shell and SSH. The images should be available under a Creative Commons license or similar free conditions. Participants should bring their own laptop with an SSH client preinstalled.
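For orientation, the sketch below shows how a client typically addresses an Image API 2.1 service once a server like the one built in the workshop is running; the base URL and identifier are placeholders, not part of the workshop setup.

```python
import requests

# Placeholder base URL of an Image API 2.1 service and an image identifier.
IMAGE_API = "https://iiif.example.org/iiif/2"
identifier = "my-image-001"

# Technical metadata for an image lives at {base}/{identifier}/info.json.
info = requests.get(f"{IMAGE_API}/{identifier}/info.json").json()
print("full size:", info["width"], "x", info["height"])

# Image API URL pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
thumbnail_url = f"{IMAGE_API}/{identifier}/full/!300,300/0/default.jpg"
print("thumbnail:", thumbnail_url)
```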
Slides
|
19:30 - 21:00
|
GET TOGETHER
Reception with refreshments at the exhibition "OPEN UP! - How Digitisation Changes Science"
ZBW - Leibniz Information Centre for Economics
Neuer Jungfernstieg 21, Hamburg
|
09:30 - 09:55
|
OPENING
Welcome
Thorsten Meyer / Silke Schomburg ZBW - Leibniz Information Centre for Economics, Germany / North Rhine-Westphalian Library Service Center (hbz), Germany
|
09:55 - 10:40
|
KEYNOTE: Forever in between : similarities and differences, opportunities and responsibilities in the LODLAM universe
Saskia Scheltjens
Rijksmuseum, Netherlands
Digital cultural heritage data from libraries, archives and museums is often seen as homogeneous, sharing similar semantic technological challenges and possible solutions. While this indeed might be the case, there are also differences that are interesting to take a closer look at. Current interdisciplinary research, technological innovations and societal changes also pose new challenges and opportunities. Can the LODLAM world live up to the expectations? Should it? And why? This talk will cover research, current projects and real-world examples from the library and museum world with a focus on digital strategy, networked infrastructure and open science concepts.
|
10:40 - 11:10
|
Coffee break |
|
LD IN NATIONAL LIBRARIES |
11:10 - 11:35
|
Publishing Linked Data on Data.Bibliotheken.nl
René Voorburg KB national library of the Netherlands, Netherlands
In 2018, KB, national library of the Netherlands, published Data.Bibliotheken.nl (DBN), an online linked data publication environment. It was argued that linked data would help the KB to publish not just 'on the web' but also 'in the web', allowing others to reuse and link to KB data. Although there is growing interest in extending linked data principles to core library registries, for example the catalogue, the practice of publishing linked data at the KB involved following a cascade of export, modelling or remodelling, conversion, selection and transformation steps. Various conversion routes were followed, depending on the data, systems and tools at hand.
The presentation will cover the steps that were applied, leading to the RDF published at DBN. Besides these technical steps, perhaps most crucial in creating linked data is the semantic modelling, or the design principles to be adhered to. It is crucial since the model applied may greatly impact the usability of the output. Moreover, it is complex since it requires people with differing backgrounds to collaborate, to understand and acknowledge each other's expertise and perspectives, and to arrive at a common language and model.
This complex step hasn't reached its conclusion yet. The design principles behind the data now at DBN may be summarized as generally established best practices plus a schema.org serialization. To further enhance and extend the published data, a more elaborate set of principles is required. Those principles will not be applied to data at DBN only. As a 'generic KB entity model for content' it is thought of as the unifying metadata model for KB content-related metadata. This work-in-progress model, based on IFLA-LRM and PREMIS, will be presented.
Regardless of whether DBN has fulfilled its goals or not, it appears that embarking on the linked data trail has resulted in beneficial spin-offs.
Slides
Video
|
11:35 - 12:00
|
20 million URIs and the overhaul of the Finnish library sector subject indexing
Matias Frosterus / Jarmo Saarikko / Okko Vainonen
The National Library of Finland, Finland
The library sector of Finland has been using the General Finnish Thesaurus YSA and its Swedish-language counterpart Allärs for over thirty years. YSA is the most widely used thesaurus in Finland and comprises some 36,000 concepts. Lately, the National Library of Finland has been developing the General Finnish Ontology YSO, a multilingual successor of YSA and Allärs built according to linked data principles. YSO has been linked to the Library of Congress Subject Headings, over a dozen Finnish vocabularies, and Wikidata.
This year, the development of YSA and Allärs was frozen and the Finnish libraries are switching to using YSO and linked data en masse. This presentation describes the process of the switch and the lessons learned.
First, we needed two sets of conversion rules: one for converting the SKOS YSO into MARC authority records to support the subject indexing processes and another one to convert the millions of bibliographic records from YSA and Allärs annotations to YSO. Devising the rules turned out to be a very complex task and we formed an expert group with representation from various types of libraries.
The conversion extends from the national union catalog Melinda to the local library databases employing various library systems. To this end, we developed open source conversion programs and made them available to libraries and library system providers. Aside from the conversion, we also added YSO URIs to the MARC records, making linking and updates simpler in the future.
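As a toy illustration of the last step, the sketch below adds YSO URIs alongside existing subject terms in simplified, dictionary-shaped records; the concept URI is a placeholder, and the real conversion operates on MARC fields using the published rule sets and tooling.

```python
# Placeholder term-to-URI mapping; in reality this comes from the YSA/Allärs-to-YSO rules.
ysa_to_yso = {
    "kissat": "http://www.yso.fi/onto/yso/p0000",   # placeholder concept URI
}

def add_yso_uris(record):
    """Attach YSO URIs for any subject terms found in the mapping (simplified record)."""
    record["subject_uris"] = [
        ysa_to_yso[term] for term in record.get("subjects", []) if term in ysa_to_yso
    ]
    return record

print(add_yso_uris({"title": "Kissakirja", "subjects": ["kissat"]}))
```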
Slides
Video
|
12:00 - 12:25
|
In and out: workflows between library data and linked-data at the National Library of Spain
Ricardo Santos
National Library of Spain, Spain
Datos.bne.es is the linked data based catalogue of the National Library of Spain. It is built upon MARC21 library records as the main source of data, transformed into RDF through a pipeline of analysis, defragmentation, and data re-clustering into the FRBR-based data model. On top of this, a new entity-driven “catalogue” is built for use by the general public and for enhanced discovery from search engines.
Datos.bne.es is an experimental development, making room for testing unconventional workflows for metadata production and creation and breaking up the sometimes rigid conventions in place at national libraries. This presentation will explore some of the most prominent features of these workflows, discuss the pros and cons found along the way, and consider how this may influence library metadata production in the near future.
One of the most recent features has been the massive ingestion of data from Wikidata into some 80,000 person library records, including properties such as gender, birth place, occupation, language, field of work or membership. The matching between library records and Wikidata records was based on Wikidata identifiers present in authority records. These identifiers were initially extracted from the VIAF identifiers data dump and loaded into the library authority records. After that initial massive load, catalogers have been routinely adding more Wikidata URIs when available. A file containing BNE IDs and Wikidata IDs was then extracted from the library authority file, and the library's technology partner for this venture retrieved the data through the Wikidata API.
After extensive quality checks, massive modifications, and alignment with the library subject terms, the data were successfully loaded into the library records and will be processed for use in the datos platform, closing the feedback loop.
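For illustration, here is one way such properties could be retrieved from Wikidata given a BNE identifier (Wikidata property P950); this is a sketch using the public SPARQL endpoint with a placeholder identifier, not necessarily the workflow used at the BNE.

```python
import requests

bne_id = "XX0000000"   # placeholder BNE authority identifier

query = f"""
SELECT ?person ?personLabel ?genderLabel ?birthPlaceLabel ?occupationLabel WHERE {{
  ?person wdt:P950 "{bne_id}" .                   # P950 = BNE identifier
  OPTIONAL {{ ?person wdt:P21  ?gender . }}       # sex or gender
  OPTIONAL {{ ?person wdt:P19  ?birthPlace . }}   # place of birth
  OPTIONAL {{ ?person wdt:P106 ?occupation . }}   # occupation
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "es,en" . }}
}}
"""

resp = requests.get("https://query.wikidata.org/sparql",
                    params={"query": query, "format": "json"},
                    headers={"User-Agent": "swib-example/0.1"})
for row in resp.json()["results"]["bindings"]:
    print({k: v["value"] for k, v in row.items()})
```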
Slides
Video
|
12:25 - 13:55
|
Lunch |
|
AGGREGATION AND INTERLINKING |
13:55 - 14:20
|
From raw data to rich(er) data: lessons learned while aggregating metadata
Julia Beck Frankfurt University Library, Germany
In the Specialised Information Service (SIS) Performing Arts project, metadata from German-speaking cultural heritage institutions in the performing arts domain is aggregated in a VuFind-based search portal. As the gathered metadata tends to be very heterogeneous due to the various software solutions, workflows and material types in the participating GLAM institutions, differences in data format and standardization are not the only challenge for modeling the data as LOD. This talk also highlights how differences in the scope and detail of description, the lack of a common vocabulary and the handling of entities are tackled in order to make the collections linked and searchable. Over the years, we have learned that thorough analysis of the delivered metadata in cooperation with the data providers is key to improving the search experience for users.
The current workflow from raw to linked data basically involves four steps: (1) thorough analysis and documentation of the delivered data, (2) preprocessing of the raw data for more interoperability, (3) modeling and transforming the preprocessed title and authority data into EDM, (4) interlinking and enrichment of the entities. Though the resulting enriched metadata cannot always be fed back into the data providers' in-house databases, this workflow includes the creation of best practice documents for the performing arts GLAM community based on the results of the analysis. We will focus on the impact of the data's heterogeneity on the workflow by describing the different, possibly data provider-specific, stages of the process and their limitations.
Slides
Video
|
14:20 - 14:45
|
NAISC: an authoritative Linked Data interlinking approach for the library domain
Lucy McKenna / Christophe Debruyne / Declan O'Sullivan ADAPT Centre, Trinity College Dublin, Ireland
At SWIB 2018, we presented our early-stage work on a Linked Data (LD) interlinking approach for the library domain called NAISC – Novel Authoritative Interlinking of Schema and Concepts. The aim of NAISC is to meet the unique interlinking requirements of the library domain and to improve LD accessibility for domain expert users. At SWIB 2019 we will present our progress in the development of NAISC, including an improved graphical user interface (GUI), user testing results, and a demonstration of NAISC's interlink provenance components.
NAISC consists of an Interlinking Framework, a Provenance Model and a GUI. The Framework describes the steps of entity selection, link-type selection, and RDF generation for creating interlinks from entities, such as people, places, or works, stored in a library dataset to related entities held by another institution. NAISC specifically targets librarians by providing access to commonly used datasets and ontologies. NAISC includes interlink provenance to allow data users to assess the authoritativeness of each link generated. Our provenance model adopts PROV-O as the underlying ontology, which we extended to provide interlink-specific data. An instantiation of NAISC is provided through a GUI which reduces the need for expert LD knowledge by guiding users in choosing suitable link-types.
We will present NAISC and demonstrate the use of our GUI as a means of interlinking LD entities across libraries and other authoritative datasets. We will also discuss our user-evaluation processes and results, including a NAISC usability test, a field test/real-world application of NAISC, and a review of the interlink quality. Finally, we will demonstrate our provenance model and discuss how the provenance data could be modelled as the LD develops over time.
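To make the provenance idea concrete, the sketch below attaches simple PROV-O statements to an interlinking activity that produced a skos:exactMatch link; it is a generic illustration with placeholder URIs, not NAISC's actual extended provenance model.

```python
from datetime import datetime, timezone
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import PROV, RDF, SKOS, XSD

EX = Namespace("http://example.org/")   # placeholder namespace
g = Graph()
g.bind("prov", PROV)
g.bind("skos", SKOS)

# The interlink itself: an entity in library A matched to an entity in library B.
g.add((EX["libA/person/42"], SKOS.exactMatch, EX["libB/agent/42"]))

# Minimal provenance: the activity that generated the link and the agent involved.
activity = EX["interlinking-activity-1"]
g.add((activity, RDF.type, PROV.Activity))
g.add((activity, PROV.wasAssociatedWith, EX["cataloguer-jane"]))
g.add((activity, PROV.endedAtTime,
       Literal(datetime.now(timezone.utc).isoformat(), datatype=XSD.dateTime)))

print(g.serialize(format="turtle"))
```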
Slides
Video
|
14:45 - 15:10
|
Cool and the BnF gang : some thoughts at the Bibliothèque nationale de France about handling persistent identifiers.
Raphaëlle Lapôtre Bibliothèque nationale de France, France
“Cool URIs don't change,” as Tim Berners-Lee wrote in 1998, implying that URIs should remain the same for as long as possible. Still, cool URIs are also supposed to use web protocols to communicate information about the object they identify. This second requirement somehow contradicts the first, as web protocols are also mutable. Indeed, the recent transition of many web resources to the secure version of HTTP for indexing purposes brought change to URIs.
At the Bibliothèque nationale de France (BnF), this recent evolution of web technologies, combined with a revision of our indexing language RAMEAU, prompted us to start thinking about what, other than the immutability of the URI design, should constitute durability in URIs.
For now, the BnF’s answer is twofold: a URI should be considered completely stable only if the identified resource is somehow digitally preserved. As the BnF has a tool for digital preservation at its disposal (SPAR), it should also have a tool that keeps track of changes in URIs.
Furthermore, the institution should communicate about its policy regarding the future of the URIs it is handling, even if this policy implies possible mutation or disappearance of the identifier: thus, the reliability of the identifier wouldn't solely depend on the permanence of its design, but also on the trusted transparency of the institution with regard to its commitment to maintaining identifiers.
Slides
Video
|
15:10 - 15:40
|
Coffee break |
15:40 - 18:00
|
OPEN SPACE |
|
Lightning talks
Exploring and mapping the category system of the world's largest public press archives
Joachim Neubert
ZBW - Leibniz Information Centre for Economics, Germany
Slides
The created work
Karen Coyle
kcoylenet
Slides
RDA Entity Finder
Djoke Dam
Koninklijke Bibliotheek, Netherlands
Slides
New Linked swissbib workflows
Jonas Waeber / Lionel Walter
Universitätsbibliothek Basel, Switzerland
Slides
data.slub-dresden.de: SLUB goes LOD!
Jens Nauber
SLUB Dresden, Germany
Slides
WikiCite
Jakob Voß
Verbundzentrale des GBV (VZG), Germany
Slides
ELAG2020
Uldis Bojars
Slides
Code4Lib Journal author recruiting
Péter Király
GWDG, Germany
Slides
LOD and archives describing things/documenting actions
Oliver Schihin
State Archives of Basel-Stadt, Switzerland
Slides
Samvera Geo Predicates Working Group
John Huck
University of Alberta, Canada
Slides
|
|
Breakout sessions
As in previous years, we offer a time slot (ca. 16:00 to 18:00) for breakout sessions after the lightning talks. This is an opportunity for you to get together with other participants around a specific idea, project or problem, to do hands-on work, to discuss or to write. We hope the breakout sessions will be used for a lot of interesting exchanges and collaboration.
Please let us and potential participants know in advance (via the Etherpad), and add your session to the breakout session board at the conference.
|
19:00 |
CONFERENCE DINNER |
09:00 - 09:45
|
KEYNOTE: Smart Data for Digital Humanities
Marcia Zeng Kent State University, United States of America
Smart data, a concept aligned with big data, can be simply explained as making sense out of big data, or turning big data into actionable data. While the many “V”s of big data (volume, velocity, variety, variability, veracity) are crucial, the “V”alue of such data relies on the ability to achieve big insights from smart data: the trusted, contextualized, relevant, cognitive, predictive, and consumable data at any scale. The smart data concept is essential in relation to the role of libraries, archives, and museums (LAMs) in supporting Digital Humanities (DH) research. Rapid development of the digital humanities field creates a demand for bigger and smarter historical and cultural heritage data carried by information-bearing objects (textual or non-textual, digitized or non-digitized), which would typically not be obtainable through web crawling, scraping, real-time streams, mobile analytics, or agile development methods. Increased funding for research in DH and innovative semantic technologies have enabled LAMs to advance their data into smart data, thus supporting deeper and wider exploration and use of data in DH research.
In this talk, the speaker shares an understanding of the what, why, how, who, where, and which data, in relation to smart data and digital humanities. Particular attention is given to the distinctive roles big data and smart data play in the humanities, which is seemingly marked by a methodological shift rather than a primarily technological one. It is the speaker’s belief that smart data will have extraordinary value in digital humanities.
Slides
Video
|
09:45 - 10:10
|
Digital sources and research data: linked and usable
Florian Kräutli / Esther Chen Max Planck Institute for the History of Science, Germany
We present the Max Planck Digital Research Infrastructure for the Humanities (MP-DRIH), which we developed to address an immediate need: the ability to maintain digital sources and research outputs in ways that they remain not only accessible, but also usable in the long term. Our ambition is to close the digital research lifecycle: to make sure that digital research outputs can be discovered, accessed, and reused. We achieve this through the adoption of a common model to represent our digital knowledge, and the implementation of linked open data technologies for data storage and exchange.
At the centre of our infrastructure is a Knowledge Graph, which makes all our digital artefacts – be they sources, annotations or entire research databases – centrally accessible. Key challenges are to bring data from various sources together in ways that retain the original context and detail, and to provide users with an environment in which they are able to make sense of this vast information resource. We address these challenges by harmonising all input data to a common model (CIDOC-CRM) that does not compromise the data's original expressivity. The resulting graph becomes usable through ResearchSpace, a software system based on the semantic data platform Metaphactory.
We have successfully implemented a pilot project to evaluate the feasibility of our approach and have implemented a production-ready first version of the entire system together with our partners.
Slides
|
10:10 - 10:40
|
Coffee break |
|
A LONG WAY TO GO |
10:40 - 11:05
|
Data modeling in and beyond BIBFRAME
Tiziana Possemato @Cult and Casalini Libri, Italy
Share-VDE has reached its production phase, with over 20 libraries involved in the analysis, conversion, enrichment and publication of their data (originally in MARC21) according to BIBFRAME. Unlike some software components, the deliverables (library datasets and the Cluster Knowledge Base) are open source and accessible as a dump and/or via a SPARQL endpoint, available to enhance other databases. Share-VDE uses external sources in various formats (VIAF, ISNI, Wikidata, LC data etc.) in the enrichment process.
Share-VDE tools are evolving and aim to allow librarians wider and more direct interaction with bibliographic data expressed as Linked Data. The CKB editor and the URI Registry, two of the main tools for this direct interaction with data in LD, permit data to be validated, updated, controlled and maintained, ensuring a quality that massive automated procedures cannot fully guarantee. The CKB Editor will be released under an open source license as an RDF/BIBFRAME editor.
Since the CKB is released in two versions (Postgres database, for internal use, and RDF, for external use), the update procedures will connect with both versions, using APIs to align them through automatic and manual procedures.
Further analysis is also being carried out to verify how the BIBFRAME ontology complies with the requirements of a collective catalogue, intended as a catalogue produced by the integration of independent library catalogues. In this context, the SuperWork and Master Instance concepts implemented in Share-VDE are considered necessary extensions of the BIBFRAME ontology to better identify and qualify more specific entities. The presentation will touch upon these complex issues, typical of the current technological environment but still conditioned by the dominant cataloguing tradition. Discussions and analysis of these topics, shared with the Library of Congress, will be presented.
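For readers unfamiliar with BIBFRAME, the following sketch shows a minimal Work/Instance pair expressed with the BIBFRAME 2.0 vocabulary via rdflib; the resource URIs are placeholders, and Share-VDE's SuperWork and Master Instance extensions are not modelled here.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

BF = Namespace("http://id.loc.gov/ontologies/bibframe/")
g = Graph()
g.bind("bf", BF)

work = URIRef("http://example.org/work/1")        # placeholder URIs
instance = URIRef("http://example.org/instance/1")
title = URIRef("http://example.org/instance/1#title")

g.add((work, RDF.type, BF.Work))
g.add((instance, RDF.type, BF.Instance))
g.add((instance, BF.instanceOf, work))
g.add((instance, BF.title, title))
g.add((title, RDF.type, BF.Title))
g.add((title, BF.mainTitle, Literal("Don Quijote de la Mancha")))

print(g.serialize(format="turtle"))
```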
Slides
Video
|
11:05 - 11:30
|
Empirical evaluation of library catalogues
Péter Király GWDG, Germany
The library community is in a period of transition from Machine Readable Cataloguing (MARC) to linked-data-based metadata schemas. MARC is complex both as a data structure and as a semantic structure. This complexity leads to a wide range of errors. When we transform records from MARC to semantic schemas, we should not assume that we have structurally and semantically perfect records. The aim of this presentation is to call attention to the typical issues revealed by an investigation of 16 library catalogues. The most frequent issues are the use of undocumented schema elements, followed by improper values in places where a value should be taken from a data dictionary or should match other strict requirements.
MARC has a number of features that make validation a challenge: intensive use of information compression and encoding techniques, versions without versioning, dependency on internal and external data dictionaries, sheer size (the “core” MARC21 has about 3,000 semantic data elements, while other versions define several hundred more), and the fact that the standard itself is not a machine-readable rule set. Some of these errors might block the transformation of the records; others might survive into the new structure if we do not fix them.
The research aims to detect different issues in the metadata records, foremost among them those that do not fit the rules defined by the standard. Organized by structure, the tool detects issues at the record level, in control fields, in data fields, in indicators and in subfields. It also calculates completeness and Thompson-Traill completeness, runs a functional analysis based on the FRBR-defined “user tasks”, and provides a web-based user interface. In the course of the research, a tool has been built which contains an (exportable) object model of the standard.
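As a toy illustration of the simplest of these measures, the sketch below computes per-tag completeness over a few dictionary-shaped records; the actual tool works on full MARC records and covers far more checks (rule conformance, Thompson-Traill scores, FRBR user tasks).

```python
from collections import Counter

# Simplified records: each maps a MARC tag to a list of field values.
records = [
    {"245": ["Title A"], "100": ["Author A"], "650": ["Topic X", "Topic Y"]},
    {"245": ["Title B"], "650": ["Topic Z"]},
]

tag_presence = Counter()
for record in records:
    tag_presence.update(set(record))          # count each tag once per record

for tag, n in sorted(tag_presence.items()):
    print(f"{tag}: present in {n}/{len(records)} records ({n / len(records):.0%})")
```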
Slides
Video
|
11:30 - 11:55
|
Design for simple application profiles
Karen Coyle / Tom Baker kcoylenet / Dublin Core Metadata Initiative
The Dublin Core Metadata Initiative has long promoted the notion of semantic interoperability on the basis of shared global vocabularies, or namespaces, selectively used and constrained for specific purposes in application profiles. For twenty years, the Dublin Core community has used application profiles for requirements ranging from building consensus about metadata structures and content within communities of practice to serving as templates for metadata creation and, more recently, as a basis for conformance validation and quality control. The emergence in the past few years of new validation languages (ShEx and SHACL) has provided an impetus for re-examining the long-elusive goal of making it easier for content experts to develop actionable profiles with user-friendly interfaces without having to rely on IT experts.
A DCMI Application Profiles Interest Group, started in April 2019, aims at creating a core model for simple application profiles that can meet the most common, straightforward use cases. The talk will cover what the group has developed in a number of areas, including:
* Requirements and motivation for basic application profiles
* Comparison of some existing profile vocabularies
* Design patterns for simple constraints that can be used in profiles
* A proposed "core" vocabulary for the creation of application profiles
We will also report on the results of a Hack Day planned for the Dublin Core annual meeting in Seoul, Korea, in September 2019.
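To indicate how a profile constraint can become actionable with the validation languages mentioned above, here is a small sketch that checks a sample description against a SHACL shape using the pyshacl library; the shape and data are invented for illustration and do not come from the Interest Group's work.

```python
from rdflib import Graph
from pyshacl import validate

# A minimal shape: a schema.org Book must have at least one dct:title.
shapes = Graph().parse(data="""
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix ex:  <http://example.org/shapes/> .

ex:BookShape a sh:NodeShape ;
    sh:targetClass <http://schema.org/Book> ;
    sh:property [ sh:path dct:title ; sh:minCount 1 ] .
""", format="turtle")

# Sample data that violates the shape (no dct:title).
data = Graph().parse(data="""
<http://example.org/book/1> a <http://schema.org/Book> .
""", format="turtle")

conforms, _, report = validate(data, shacl_graph=shapes)
print(conforms)
print(report)
```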
Slides
Video
|
11:55 - 13:25
|
Lunch |
|
EXPLORING NEW WAYS |
13:25 - 13:50
|
SkoHub: KOS-based content syndication with ActivityPub
Adrian Pohl / Felix Ostrowski hbz, Germany / graphthinking GmbH
For a long time, openness movements and initiatives with labels like “Open Access”, “Open Educational Resources” (OER) or “Linked Science” have been working on establishing a culture where scientific or educational resources are by default published on the web with an open license, to be read, used, remixed and shared by anybody. With a growing supply of resources on the web, the challenge of learning about and finding resources relevant for teaching, studies, or research grows as well.
Current approaches provided by libraries for publishing and finding open content on the web are often focused on repositories as the place to publish content. Those repositories provide (ideally standardized) interfaces for crawlers to collect and index the metadata in order to offer search solutions on top. Besides being error-prone and requiring resources for keeping up with changes in the repositories, this approach also does not take into account how web standards work. In short, the repository metaphor guiding this practice obscures what constitutes the web: resources that are identified by HTTP URIs.
In this presentation, we describe the SkoHub project being carried out in 2019 by the hbz in cooperation with graphthinking GmbH. The project seeks to implement a prototype for a novel approach in syndicating content on the web. In order to do so, we make use of the decentralized social web protocol ActivityPub to build an infrastructure where services can send and subscribe to notifications for subjects defined in knowledge organization systems (KOS, sometimes also called “controlled vocabularies”). While the repository-centric approach favours content deposited in a repository that provides interfaces for harvesting, with SkoHub any web resource can make use of the notification mechanism.
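To sketch the idea of subject-based notifications, the example below posts an Activity Streams 2.0 style notification about a newly published resource to the inbox associated with a subject; the inbox URL, concept URI and payload shape are assumptions for illustration and may differ from SkoHub's actual implementation.

```python
import json
import requests

# Hypothetical inbox for a SKOS concept that subscribers follow (placeholder URL).
INBOX = "https://skohub.example.org/inbox?target=https://vocabs.example.org/subjects/linked-data"

notification = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Announce",
    "actor": "https://oer.example.org/",
    "object": {
        "id": "https://oer.example.org/resources/linked-data-intro",
        "type": "Document",
        "name": "Introduction to Linked Data",
    },
}

resp = requests.post(INBOX, data=json.dumps(notification),
                     headers={"Content-Type": "application/activity+json"})
print(resp.status_code)
```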
Slides
Video
|
13:50 - 14:15
|
Proposing rich views of linked open data sets : the S-paths prototype and the visualization of FRBRized data in data.bnf.fr
Raphaëlle Lapôtre / Marie Destandau / Emmanuel Pietriga Bibliothèque nationale de France, France / Institut National de Recherche en Informatique et en Automatique, France
In 2011, the National Library of France (BnF) launched its open data service, the data.bnf.fr project. Alongside a SPARQL endpoint providing a dynamic query service, an interface aims at guiding users through the BnF collections thanks to FRBRized metadata.
In 2017, the idea of proposing a visual recommendation system based on Linked Open Data technologies emerged from discussions with data.bnf.fr users. As this idea implied visualizing large and complex datasets, the data.bnf.fr team decided to partner with the Human Computer Interaction research team ILDA (Interacting with Large Data).
This collaboration contributed to the design of S-paths, an interactive data visualization interface that aims at providing meaningful insights about linked datasets. S-paths allows users to navigate linked open data more intuitively by systematically presenting the most readable view for a given set or subset of similar entities.
The main obstacles encountered during this experiment include the heterogeneity of the data, which challenged the system’s selection algorithm, as well as performance issues when querying SPARQL endpoints, partly related to the complexity of the FRBR model. Despite those difficulties, S-paths proved very useful to reveal defects in data sources, visualize modeling specificities, and show trends in the data that can be used for communication towards end users.
Slides
Video
|
14:15 - 14:40
|
Target vocabulary maps
Niklas Lindström National Library of Sweden
The union catalogue of the National Library of Sweden now has a core based on linked data structures. In the course of our continued development we have begun exploring the feasibility of semantic technologies (RDFS and OWL) for wider and more open-ended data integration.
OWL inferencing is sometimes considered a foundational mechanism for automatic semantic interoperability. But we need to go about it differently in order for semantic mappings to become a practical tool for effective data integration. OWL reasoners are rarely used when automating ingestion or publication of linked data. Reasoners aren't readily targeted at specific applications, but expand all possible implications of a set of statements, yielding rather unwieldy data.
Meanwhile the web of data at large continues to grow, and more and more integration needs crop up between organizations. These needs commonly have to be solved right now, requiring us to write ever more custom integration code, with little reuse even within one organization. It is not uncommon to devise custom mappings between RDF vocabularies as part of these often complex ETL pipelines. Sometimes these use SPARQL, sometimes XSLT, sometimes custom code with various non-portable dependencies.
We're exploring an arguably simpler approach to address a given set of described use cases. It is based on preprocessing of vocabulary data: scanning their mappings and creating a target map from each known property and class to a predefined selection of desired target properties and classes. This computed "target vocabulary map" is then used when reading input data expressed in the known terms, to produce the selected target description. This yields more predictable results tailored to the conceived use cases.
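A minimal sketch of this idea follows: it precomputes a property-level target map from mapping statements in the vocabulary data and then rewrites input triples against it. The vocabulary snippet and target selection are invented for illustration, and class mappings and more elaborate rules are omitted.

```python
from rdflib import Graph
from rdflib.namespace import OWL, RDFS

# Vocabulary data declaring mappings from local properties to a target property.
vocab = Graph().parse(data="""
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://example.org/kbv/title>    owl:equivalentProperty <http://purl.org/dc/terms/title> .
<http://example.org/kbv/altTitle> rdfs:subPropertyOf     <http://purl.org/dc/terms/title> .
""", format="turtle")

targets = {"http://purl.org/dc/terms/title"}   # predefined selection of target properties

# Precompute the target vocabulary map: known property -> desired target property.
target_map = {}
for mapping_property in (OWL.equivalentProperty, RDFS.subPropertyOf):
    for s, _, o in vocab.triples((None, mapping_property, None)):
        if str(o) in targets:
            target_map[s] = o

def rewrite(graph: Graph) -> Graph:
    """Rewrite input triples using the precomputed map; unknown properties pass through."""
    out = Graph()
    for s, p, o in graph:
        out.add((s, target_map.get(p, p), o))
    return out
```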
This presentation will elaborate on these considerations and explore the proposed solution, including limitations and possible shortcomings.
Slides
Video
|
14:40 - 15:05
|
Lessons from representing library metadata in OCLC Research's Linked Data Wikibase prototype
Karen Smith-Yoshimura OCLC, United States of America
This presentation highlights key lessons from OCLC Research's Linked Data Wikibase Prototype (“Project Passage”), a 10-month pilot carried out in 2018 in collaboration with metadata specialists in 16 U.S. libraries. Our Wikibase prototype provided a framework to reconcile, create, and manage bibliographic and authority data as linked data entities and relationships. We chose to install a local Wikibase instance for its built-in features and its ability to generate entities in RDF without requiring technical knowledge of linked data; this let us focus on what participants needed beyond the initial set of capabilities. It served as a “sandbox” for participants to experiment with describing library and archival resources in a linked data environment. Participants showcased “use cases”: non-English descriptions, visual resources, archives, a musical work, and events. Among the lessons learned:
* The building blocks of Wikibase can create structured data with a precision exceeding current library standards.
* The Wikibase platform, supplemented with OCLC’s enhancements and stand-alone utilities, lets librarians see the results of their effort in a discovery interface without leaving the metadata-creation workflow.
* Populating knowledge graphs with library metadata requires tools for importing and enhancing data created elsewhere.
* The pilot underscored the need for interoperability between data sources, both for ingest and export.
* The traditional distinction between authority and bibliographic data disappears in a Wikibase description.
* Transitioning from human-readable records to knowledge graphs represents a paradigm shift.
Slides
Video
|
15:05 - 15:10 |
Closing |
15:10 - 15:30 |
Farewell coffee |