Note that all times are displayed in UTC. Clicking on a time display will show your local time.
DAY 1 | 2020-11-23 CONFERENCE |
13:00-14:00h UTC | COLLOCATED EVENT: DINI-AG KIM MEETING |
Jana Hentschke / Alexander Jahnke DINI-AG Kompetenzzentrum Interoperable Metadaten (KIM) Virtual public meeting of the DINI-AG KIM. KIM is a forum for German-speaking metadata experts from LAM institutions. The meeting will be held in German. Agenda | |
14:00-15:00h UTC | OPENING / KEYNOTE Opening Silke Schomburg North Rhine-Westphalian Library Service Center (hbz), Germany |
KEYNOTE: Open Data & Social Innovation: Experiences from Taiwan Audrey Tang Digital Minister, Taiwan Abstract: When we see “internet of things”, let’s make it an internet of beings. |
|
15:00-15:30h UTC | Coffee break |
15:30-16:30h UTC | AUTOMATED SUBJECT INDEXING |
Automatic indexing of institutional repository content using SKOS Ricardo Eito-Brun Universidad Carlos III de Madrid, Spain Abstract: The lack of well-defined indexing practices is a common problem in most institutional repositories. Researchers typically assign keywords to their submissions; these terms, however, are not taken from a controlled vocabulary or thesaurus. This leads to ambiguity and a lack of specificity in the terms used to describe the content of their contributions. |
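A keyword-normalization step of this kind can be sketched in a few lines. The vocabulary below is a made-up stand-in for a real SKOS thesaurus (its URIs and labels are invented, not taken from the talk):

```python
# Hypothetical mini-vocabulary: URI -> SKOS-style preferred and alternative labels.
SKOS_VOCAB = {
    "http://example.org/skos/c1": {
        "prefLabel": "Information retrieval",
        "altLabels": ["IR", "document retrieval", "search"],
    },
    "http://example.org/skos/c2": {
        "prefLabel": "Machine learning",
        "altLabels": ["ML", "statistical learning"],
    },
}

def normalize_keyword(keyword):
    """Return (uri, prefLabel) for a free-text keyword, or None if unmatched."""
    needle = keyword.strip().lower()
    for uri, concept in SKOS_VOCAB.items():
        labels = [concept["prefLabel"]] + concept["altLabels"]
        if needle in (label.lower() for label in labels):
            return uri, concept["prefLabel"]
    return None

print(normalize_keyword("document retrieval"))
```

Author-supplied keywords that match either a preferred or an alternative label are replaced by the controlled preferred label; anything unmatched can be flagged for review.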
|
Annif and Finto AI: DIY automated subject indexing from prototype to production Osma Suominen / Mona Lehtinen / Juho Inkinen The National Library of Finland, Finland Abstract: The first prototype of Annif (annif.org), the multilingual automated subject indexing tool, was created at the National Library of Finland in early 2017. Since then, the open source tool has grown from an experiment into a production system. Through its REST API it has been integrated into the document repositories of several university libraries, into the metadata workflows of the book distributor Kirjavälitys Oy, which serves publishers, bookshops, libraries and schools, into the Dissemin service for publishing academic papers in open repositories, and into the automated subject indexing service Finto AI (ai.finto.fi), which was launched in May 2020 as a companion to the Finto thesaurus and ontology service. In the meantime, we have organized workshops and tutorials around Annif and automated indexing and have grown an international community of users and developers. |
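The REST API integration the abstract mentions can be illustrated roughly as follows. The `/v1/projects/{id}/suggest` endpoint shape follows Annif's published REST API, but the base URL path, project ID, sample text, and mocked response below are assumptions for illustration; no network call is made:

```python
import json
from urllib import parse, request

API_BASE = "https://ai.finto.fi/v1"  # assumed public endpoint base

def suggest_request(project_id, text, limit=5):
    """Build a POST request for Annif's suggest endpoint (shape per the REST API)."""
    url = f"{API_BASE}/projects/{project_id}/suggest"
    data = parse.urlencode({"text": text, "limit": limit}).encode()
    return request.Request(url, data=data, method="POST")

def top_subjects(response_body, n=3):
    """Extract the n highest-scoring subject labels from a suggest response."""
    results = json.loads(response_body)["results"]
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [r["label"] for r in ranked[:n]]

# Parsing a mocked response (URIs, labels and scores invented for illustration):
sample = ('{"results": ['
          '{"uri": "http://www.yso.fi/onto/yso/p1234", "label": "indexing", "score": 0.84},'
          '{"uri": "http://www.yso.fi/onto/yso/p2345", "label": "libraries", "score": 0.61}]}')
print(top_subjects(sample))
```

A repository integration would send the full text of a deposit to `suggest_request(...)` and store the returned subject URIs alongside the record.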
|
AutoSE@ZBW: Building a productive system for automated subject indexing at a scientific library Anna Kasprzik / Moritz Fürneisen / Christopher Bartz ZBW - Leibniz Information Centre for Economics, Germany Abstract: At ZBW we have been developing prototype machine learning solutions for automated subject indexing in the context of applied research for several years now. However, as of 2019 these solutions had yet to be integrated into the metadata management system and the subject indexing workflows at ZBW. It turns out that building a corresponding software architecture is a challenge on another level, one that requires additional resources on top of those for academic research as well as additional expertise. In order to create a productive system that makes these machine learning solutions usable in practice and allows continuous development, we need to look at aspects such as user and data interfaces, suitable development and test environments, system stability, modularity, and continuous integration. |
DAY 2 | 2020-11-24 CONFERENCE |
14:00-16:30h UTC | WORKSHOPS |
Automated subject indexing with Annif Osma Suominen / Mona Lehtinen / Juho Inkinen / Anna Kasprzik / Moritz Fürneisen National Library of Finland, Finland / ZBW - Leibniz Information Centre for Economics, Germany Abstract: Due to the proliferation of digital publications, intellectual subject indexing of every single literature resource in institutions such as libraries is no longer possible. For the task of providing subject-based access to information resources of different kinds and with varying amounts of available metadata, it has become necessary to explore possibilities of automation. The workshop Automated subject indexing with Annif is not recorded. Workshop material with step-by-step walkthroughs and video recordings for self-learning is available online. For more information on Annif, see also the Annif homepage and wiki. Last but not least, don't hesitate to contact us, e.g. via the Annif user group. |
|
Making use of the coli-conc infrastructure for controlled vocabularies Jakob Voß / Stefan Peters Verbundzentrale des GBV, Germany Abstract: Project coli-conc has created an infrastructure to facilitate the management and exchange of concordances between library knowledge organization systems. The most visible outcome of the project is Cocoda, a web application that simplifies the creation and evaluation of mappings between concepts from different classifications, thesauri, and other controlled vocabularies. This tutorial will give an introduction to the infrastructure, which makes it possible to work with controlled vocabularies from diverse sources. After a brief introduction to the architecture, the data format JSKOS, its API, and utility node packages, we will live-code a small cataloging application for semantic tagging of resources with concepts from controlled vocabularies. Active participation requires basic knowledge of JavaScript and HTML. The workshop Making use of the coli-conc infrastructure for controlled vocabularies is not recorded. Workshop material and slides for self-learning are available. The coli-conc homepage also provides pointers to screencasts, documents, and source code. Last but not least, don't hesitate to contact us! |
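A concordance mapping of the kind Cocoda produces can be represented as plain JSON. The sketch below uses JSKOS field names (`from`/`to` with `memberSet`, SKOS mapping types), while the particular concept pair is chosen for illustration only:

```python
# A minimal JSKOS-style mapping between two classification concepts.
# Field names follow JSKOS; the concept pair and match type are illustrative.
mapping = {
    "from": {"memberSet": [{"uri": "http://dewey.info/class/020/e23/",
                            "notation": ["020"]}]},
    "to": {"memberSet": [{"uri": "http://example.org/bk/06.00",
                          "notation": ["06.00"]}]},
    "type": ["http://www.w3.org/2004/02/skos/core#exactMatch"],
}

def mapped_notations(m):
    """Return the (source, target) notation lists of a JSKOS mapping."""
    src = [c["notation"][0] for c in m["from"]["memberSet"]]
    tgt = [c["notation"][0] for c in m["to"]["memberSet"]]
    return src, tgt

print(mapped_notations(mapping))
```

Because JSKOS is plain JSON, such mappings can be exchanged between the Cocoda editor, mapping registries, and local cataloging tools without format conversion.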
|
Managing and Preserving Linked Data with Fedora David Wilcox LYRASIS, Canada Abstract: Fedora is a flexible, extensible, open source repository platform for managing, preserving, and providing access to digital content. Fedora is used in a wide variety of institutions including libraries, museums, archives, and government organizations. For the past several years the Fedora community has prioritized alignment with linked data best practices and modern web standards. We are now shifting our attention back to Fedora's digital preservation roots with a focus on durability and the Oxford Common File Layout (OCFL). This workshop will provide an introduction to the latest version of Fedora with a focus on both the linked data and digital preservation capabilities. Both new and existing Fedora users will be interested in learning about and experiencing Fedora features first-hand. The SWIB20 workshops Introduction to Fedora and Fedora 6.0 and the Oxford Common File Layout are not recorded. Workshop slides for Introduction to Fedora are available online, as are the slides for Fedora 6.0 and the Oxford Common File Layout. For more information on Fedora, please see the Fedora homepage and follow our progress on the Road to Fedora 6.0. You can also follow along on our blog. Please feel free to contact us at any time through our Fedora Community communication channel. |
|
Using SkoHub for web-based metadata management & content syndication Adrian Pohl / Steffen Rörtgen hbz, Germany / GWDG, Germany Abstract: Authority files, thesauri and other controlled vocabulary systems have long been a key element of knowledge management. Frequently a controlled vocabulary is used in cataloguing by different institutions, thus indirectly connecting resources about one topic. In order to find all resources about one topic, one has to query different databases or create and maintain a discovery index. This approach is error-prone and requires high maintenance. The SWIB20 workshop Using SkoHub for web-based metadata management & content syndication is not recorded. Workshop material with step-by-step walkthroughs and video recordings for self-learning is available online. For more information on SkoHub, see also the SkoHub homepage and blog posts. Last but not least, don't hesitate to contact us: skohub[at]hbz-nrw[dot]de |
DAY 3 | 2020-11-25 CONFERENCE |
14:00-15:00h UTC | BIBFRAME |
Developing BIBFRAME application profiles for a cataloging community Paloma Graciani-Picardo / Nancy Lorimer / Christine DeZelar-Tiedman / Nancy Fallgren / Steven Folsom / Jodi Williamschen Harry Ransom Center, University of Texas at Austin / Stanford University / University of Minnesota / National Library of Medicine / Cornell University / Library of Congress Abstract: As libraries experiment with integrating BIBFRAME (BF) data into library workflows and applications, it is increasingly clear that there is little to no formal agreement on what a baseline BF description might be, or even on how specific properties are modeled in what is a very flexible ontology. This basic agreement is imperative, at least in these early days, for data producers and developers in building out and implementing practical workflows and viable interactions among disparate data sources; the more flavors that need to be dealt with, the more difficult initial implementation will be. Additionally, there is little consensus on how to integrate BF and RDA, our primary cataloging standard, and it is difficult to move ahead without a basic mapping. The Program for Cooperative Cataloging (PCC), with its close connection to the Library of Congress and the Linked Data for Production grants, and its focus on standards-building in the MARC cataloging community, is well set up to develop standards and become a steward of well-formed BF. To that end, the PCC Sinopia Application Profiles Task Group is working on developing BF application profiles by creating PCC templates in Sinopia, the linked data editor developed in the Linked Data for Production 2 grant, to serve as the basis for metadata creation by the PCC community. In this talk, we discuss the community process and challenges encountered in creating application profiles through template development, including modeling questions, technical challenges, and reconciling BF with RDA and PCC standards. |
|
Sinopia Linked Data Editor Jeremy Nelson Stanford University Libraries, United States of America Abstract: The Sinopia Linked Data environment is a Mellon Foundation-funded project that provides catalogers with a native linked data editor focused on the BIBFRAME ontology for describing resources. Following an iterative Agile development process, Sinopia is currently in its third version, with an improved user interface and third-party integrations based on continuous feedback from an international cohort of users. This presentation will start with a high-level introduction to Sinopia, followed by cataloger data workflows using external authority sources such as the Library of Congress and Share-VDE and their supporting technologies, and finish with the new and planned features for the upcoming year, including machine learning RDF classification. |
|
Cataloging rare books as linked data: a use case Paloma Graciani-Picardo / Brittney Washington Harry Ransom Center - University of Texas at Austin Abstract: Linked Data for Production phase 2 (LD4P2), a two-year pilot supported by the Andrew W. Mellon Foundation, wrapped up in May 2020. As a member of the LD4P2 cohort, the Harry Ransom Center is eager to share with the community some of our activities within the project and lessons learnt. Application profiles have been at the core of LD4P2 activities, and the Ransom Center has supported this effort with the evaluation of ontologies, models, vocabularies and best practices for item-level description of rare and special collection materials in a linked data collaborative environment. In this presentation, we will discuss our work analyzing MARC to BIBFRAME conversion, defining local workflows for linked data cataloging and self-training strategies, and developing an application profile for special collections materials. We will do a quick review of the existing ontologies and controlled vocabularies relevant to the project, and present data modeling approaches and challenges. Finally, and no less important, we will emphasize the value of special collections community engagement in these types of projects and the need for continued collaboration beyond the grant. |
|
15:00-15:30h UTC | Coffee break |
15:30-16:30h UTC | AUTHORITIES |
BIBFRAME instance mining: Toward authoritative publisher entities using association rules Jim Hahn University of Pennsylvania, United States of America Abstract: The catalyst for this talk stems from work within the Share-VDE initiative, a shared discovery environment based on linked data. The project encompasses enrichment with linked open data, subsequent conversion from MARC to BIBFRAME/RDF, and the creation of a cluster knowledge base made up of over 400 million triples. The resulting BIBFRAME network comprises the BIBFRAME entities Work and Instance, among other Share-VDE-specific entities. |
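Association-rule mining over instance data can be illustrated with a toy example. The feature sets and support/confidence computation below are a generic sketch of the technique named in the title, not the talk's actual pipeline; the publisher names and places are invented:

```python
# Each set stands for attributes mined from one BIBFRAME Instance,
# e.g. a publisher-name string and a place of publication (invented data).
instances = [
    {"name:Penguin Books", "place:London"},
    {"name:Penguin Books", "place:London"},
    {"name:Penguin", "place:London"},
    {"name:Springer", "place:Berlin"},
]

def rule_metrics(antecedent, consequent, transactions):
    """Support and confidence of the rule antecedent -> consequent."""
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent in t and consequent in t)
    ante = sum(1 for t in transactions if antecedent in t)
    return both / n, (both / ante if ante else 0.0)

support, confidence = rule_metrics("name:Penguin Books", "place:London", instances)
print(support, confidence)
```

High-confidence rules between name variants and shared attributes are one signal for clustering Instances under a single authoritative publisher entity.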
|
Generating metadata subject labels with Doc2Vec and DBPedia Charlie Harper Case Western Reserve University, United States of America Abstract: Previous approaches to metadata creation using unsupervised learning have often centered on generating document clusters, which then require manual labeling. Common approaches, such as topic modelling with Latent Dirichlet Allocation, are also limited by the need to determine the number of clusters prior to training. While this is useful for finding underlying relationships in corpora, unlabeled clustering does not provide an ideal way to generate metadata. In this presentation, I examine one way that unsupervised machine learning and linked data can be employed to generate rich metadata labels for textual resources and thereby improve resource discovery. |
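The core labeling idea — pick the linked-data label whose vector lies closest to a document's vector — can be sketched with toy numbers. The 3-dimensional vectors below stand in for real Doc2Vec embeddings, and the category names are illustrative:

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy embeddings standing in for vectors a trained Doc2Vec model would infer
# for the text of each DBpedia category label.
label_vectors = {
    "dbpedia:Category:Machine_learning": [0.9, 0.1, 0.0],
    "dbpedia:Category:Astronomy": [0.0, 0.2, 0.9],
}

def best_label(doc_vector, labels):
    """Assign the label whose vector is most similar to the document vector."""
    return max(labels, key=lambda name: cosine(doc_vector, labels[name]))

print(best_label([0.8, 0.3, 0.1], label_vectors))
```

Unlike LDA-style clustering, this assigns a human-readable label directly, and the set of candidate labels can grow without retraining the document model.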
|
Automated tools for propagating a common hierarchy from a set of vocabularies Joeli Takala National Library of Finland, Finland Abstract: Through finto.fi the National Library of Finland publishes controlled vocabularies for subject indexing and linking data. Linked open data formats such as SKOS also enable us to combine several vocabularies into a common repository of concepts from various sources. The purpose is to expand one general-purpose vocabulary with others from more specific fields of knowledge in a way that enables us to cover a wider context with one vocabulary. The problem lies in assessing and ensuring the interoperability of each vocabulary when used in this manner. |
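One simple interoperability check when combining vocabularies is to merge their skos:broader relations and verify that the result is still a cycle-free hierarchy. The sketch below uses invented concept labels and is only one of many checks such tooling would need:

```python
# skos:broader relations (child -> broader concept) from two vocabularies:
# a general-purpose one and a domain-specific one (invented data).
general = {"dogs": "animals", "animals": "organisms"}
specific = {"sled dogs": "dogs", "huskies": "sled dogs"}

def merge_broader(*vocabularies):
    """Combine broader relations from several vocabularies into one map."""
    merged = {}
    for vocab in vocabularies:
        merged.update(vocab)
    return merged

def has_cycle(broader):
    """Detect whether following broader links from any concept loops back."""
    for start in broader:
        seen, node = set(), start
        while node in broader:
            if node in seen:
                return True
            seen.add(node)
            node = broader[node]
    return False

hierarchy = merge_broader(general, specific)
print(has_cycle(hierarchy), hierarchy["huskies"])
```

Here the specific vocabulary's top concepts attach below the general one ("sled dogs" under "dogs"), which is the propagation pattern the title describes.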
DAY 4 | 2020-11-26 CONFERENCE |
14:00-15:00h UTC | IDENTIFIERS |
Integration and organization of knowledge in a Current Research Information System (CRIS) based on semantic technologies Ana Maria Fermoso García / Maria Isabel Manzano García / Julian Porras Reyes / Juan Blanco Castro Pontifical University of Salamanca, Spain Abstract: We present OpenUPSA, a system based, inter alia, on semantic technologies. It is a project developed in collaboration with the university library, whose goal is to share with society information about research at the university, its agents, and its scientific production. The result is a software system that can be regarded as a Current Research Information System (CRIS). |
|
ORCID for Wikidata: A workflow for matching author and publication items in Wikidata Eva Seidlmayer ZB MED - Information Centre for Life Sciences, Germany Abstract: In the context of a bibliometric project, we retrieved social context information on authors of scientific publications from Wikidata in order to import it into the metadata of our dataset. While we were able to capture about 95% of the requested scholarly publications in Wikidata, only 3% of the authors could be assigned and used for the retrieval of social context information. One reason is probably that authors in general are rarely curated in Wikidata: while research papers account for 31.5% of the items, only 8.9% of items represent humans in general, let alone researchers in particular (according to Wikidata statistics, January 2020). Another reason we observed is the frequent absence of relations between the Wikidata item of a publication and the Wikidata item of the author(s), even where the author is already listed. |
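A matching step like the one described can be driven by a SPARQL query against Wikidata. The property IDs are real (P496 is "ORCID iD", P50 is "author"), but the query is only a sketch and the actual endpoint call is omitted so the example stays offline:

```python
def orcid_author_query(orcid):
    """Build a SPARQL query finding the Wikidata item with a given ORCID iD
    and any publications already linked to it via the author property."""
    return f"""
SELECT ?author ?authorLabel ?work WHERE {{
  ?author wdt:P496 "{orcid}" .            # item carrying this ORCID iD
  OPTIONAL {{ ?work wdt:P50 ?author . }}  # publications naming the item as author
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""

query = orcid_author_query("0000-0002-1825-0097")
print(query)
```

The `OPTIONAL` clause surfaces exactly the gap the abstract observes: an author item may exist with an ORCID while the publication-to-author links are missing.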
|
id.loc.gov and Wikidata, one year later Matt Miller Library of Congress, United States of America Abstract: The id.loc.gov linked data platform at the Library of Congress has been ingesting Wikidata identifiers since mid-2019. This process has created over 1.2 million links between the two systems. These links are powered by Wikimedians adding a NACO or LCSH LCCN identifier to a Wikidata entity, which then flows into the ID platform. Due to the scale and nature of Wikidata, there is a high velocity of change in this data. New connections are made and broken every day, in addition to ancillary data changes such as Wikidata labels. This talk will present an analysis of this ingest process from 2019 and 2020. We will take a detailed look at trends that emerged from this analysis as well as a holistic look at linked records in both systems. Topics around vandalism, record completeness in the two systems, and change frequency will be explored. As more data from the Wikimedia ecosystem is leveraged in our bibliographic systems, it is important to understand the dynamics and differences between the two worlds. |
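The kind of change analysis described — links made and broken between ingest snapshots — can be sketched with plain set operations. The entity/LCCN pairs below are placeholders, not data from the actual ingest:

```python
# Two snapshots of a Wikidata-entity -> LCCN mapping (placeholder values).
snapshot_2019 = {"Q111": "n00000001", "Q222": "n00000002"}
snapshot_2020 = {"Q111": "n00000001", "Q333": "n00000003"}

def link_churn(old, new):
    """Return (added, removed, changed) entity sets between two snapshots."""
    added = {q for q in new if q not in old}
    removed = {q for q in old if q not in new}
    changed = {q for q in old.keys() & new.keys() if old[q] != new[q]}
    return added, removed, changed

print(link_churn(snapshot_2019, snapshot_2020))
```

Run periodically, this separates genuinely new connections from broken ones and from edits to existing links, which is the distinction the trend analysis depends on.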
|
15:00-15:30h UTC | Coffee break |
15:30-16:30h UTC | PROJECTS |
Changing the tires while driving the car: A pragmatic approach to implementing linked data David Seubert / Shawn Averkamp / Michael Lashutka American Discography Project, University of California, Santa Barbara, USA / AVP, USA / PropertlySorted Database Solutions, Beacon, NY, USA Abstract: The Discography of American Historical Recordings (DAHR) is an online database of sound recordings made by American record companies during the 78rpm era. Based at the University of California, Santa Barbara, DAHR now includes authoritative information on over 300,000 master recordings by over 60,000 artists and has 40,000 streaming audio files online. To provide even more context for researchers using the database, DAHR editors chose to use linked data to enrich the database with information from other open data sources. With funding from the Library of Congress National Recording Preservation Board, UCSB engaged consultants at AVP to develop a strategy for enriching DAHR by mining public data. After the harvesting and integration of data for over 18,000 names from the Library of Congress Name Authority File, Wikidata, and MusicBrainz, users can now find Wikipedia biographies, photographs, and links to additional content in many other databases, such as LP reissues in Discogs, record reviews on Allmusic, or streaming audio on Spotify, as well as links to names in other authority files like VIAF. In this presentation, we will share our process of harvesting data, retrofitting DAHR’s underlying FileMaker Pro data model and workflows to accommodate the addition of this new data and the minting of URIs, and leveraging the unique circumstances of the COVID-19 outbreak to redirect staff time towards quality control of this new data. We will also discuss current efforts to populate Wikidata and MusicBrainz with our newly minted URIs to provide broader entry and visibility to the DAHR database. |
|
Linked data for opening up discovery avenues in library catalogs Huda Khan Cornell University, United States of America Abstract: Exploring the integration of linked data sources into library discovery interfaces was an important goal for the recently concluded Linked Data for Production: Pathway to Implementation (LD4P2) grant. We conducted a series of focused experiments involving user studies and the implementation of Blacklight-based prototypes. In this presentation, we will provide an overview of lessons learned through these experiments as well as subsequent discovery research as part of the ongoing Linked Data for Production: Closing the Loop (LD4P3) grant. Examples of areas we investigated for the integration of linked data include: knowledge panels bringing in contextual information and relationships from knowledge graphs like Wikidata to describe people and subjects related to library resources in the catalog; suggested searches based on user-entered queries using results from Wikidata and DBpedia; browsing experiences for subjects and authors bringing in relationships and data from Wikidata and library authorities; and autosuggest for entities represented in the catalog using supplementary information from FAST, the Library of Congress authorities, and Wikidata. Grant work also supported the development of Blacklight functionality for embedding Schema.org JSON-LD representations of some catalog metadata. We will also review opportunities for the larger community to engage in discussions around use cases and implementation techniques for using linked data in discovery systems. |
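The Schema.org JSON-LD embedding mentioned near the end can be illustrated as follows. The record fields and HTML wrapper are a generic sketch of the pattern, not the exact output of the Blacklight implementation, and the title/author values are invented:

```python
import json

# A Schema.org description of one catalog record (illustrative values).
record = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "An Example Catalog Title",
    "author": {"@type": "Person", "name": "Jane Example"},
    "inLanguage": "en",
}

# Catalog pages embed this as a <script> block so search engines and other
# consumers can pick up the structured description alongside the HTML.
html_snippet = (
    '<script type="application/ld+json">'
    + json.dumps(record)
    + "</script>"
)
print(html_snippet)
```

Because the payload is standard JSON-LD, the same record object can also be served via a content-negotiated API without changes.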
|
Using IIIF and Wikibase to syndicate and share cultural heritage material on the Web Jeff Keith Mixter OCLC, United States of America Abstract: Digitized cultural heritage material is ubiquitous across the library, archive, and museum landscape, but the material descriptions can vary based on domain, institutional best practices, and the amount of effort dedicated to digitization programs. OCLC Research has spent the past few years exploring two primary functions of digital material management: syndication of the material for research use and best metadata management practices for discoverability. This work is closely tied to OCLC’s participation in the IIIF Community. |
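Syndication via IIIF revolves around publishing a manifest per digitized item. The skeleton below follows the IIIF Presentation 3.0 field names, while the URLs and label are placeholders:

```python
def make_manifest(item_id, label):
    """Build a minimal IIIF Presentation 3.0 manifest skeleton for one item."""
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"https://example.org/iiif/{item_id}/manifest",
        "type": "Manifest",
        "label": {"en": [label]},
        "items": [],  # one Canvas per digitized image would be appended here
    }

manifest = make_manifest("item-1", "Sample digitized pamphlet")
print(manifest["type"], manifest["label"]["en"][0])
```

Any IIIF-aware viewer or aggregator can then consume the manifest URL, which is what makes the material shareable across institutional boundaries regardless of local description practice.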
DAY 5 | 2020-11-27 CONFERENCE |
14:00-16:30h UTC | OPEN SPACE |
Lightning Talks
Semantic MediaWiki
Share VDE: A facilitator for the library community
Building Wikidata one Branch of Knowledge at a Time
W3C Entity Reconciliation CG
Building the SWIB20 participants map
Linking K10plus library union catalog with Wikidata |
|
Breakout sessions | |
hbz
Adrian Pohl
T. +49-(0)-221-40075235
E-mail swib(at)hbz-nrw.de
ZBW
Joachim Neubert
T. +49-(0)-40-42834462
E-mail j.neubert(at)zbw.eu
Twitter: #swib20