|DAY 1 | 2016-11-28 PRECONFERENCE|
|10:00 - 12:00||COLLOCATED EVENTS
Treffen der DINI AG KIM (Meeting of the DINI AG KIM, Germany)
Stefanie Rühle / Jana Hentschke
DINI AG KIM
Metafacture User Meeting
|13:00 - 19:00||WORKSHOPS AND TUTORIALS
Introduction to Linked Open Data
Jana Hentschke / Christina Harlow / Uldis Bojars
German National Library / Cornell University / National Library of Latvia
This introductory workshop covers the fundamentals of linked data technologies on the one hand, and the basic issues of open data on the other. The RDF data model will be discussed, along with the concepts of dereferenceable URIs and common vocabularies. The participants will continuously create and refine RDF documents about themselves, including links to other participants, to strengthen their knowledge of the topic. Based on the data created, the advantages of modeling in RDF and publishing linked data will be shown. On a side track, Open Data principles will be introduced, discussed and applied to the content that is being created during the workshop. Attendees are not expected to have any technical, RDF or Linked Open Data experience. We do ask that attendees bring a laptop with a modern web browser for participation.
IIIF: Linked-Data to Support the Presentation and Reuse of Image Resources
Cornell University, United States of America
The International Image Interoperability Framework (IIIF) defines simple APIs and data formats to give scholars uniform and rich access to image-based resources hosted around the world. The community aims to develop, cultivate and document shared technologies, including image servers and web clients that provide a world-class user experience in viewing, comparing, manipulating and annotating images. While the framework supports use by non-semantic clients using JSON, all data models and formats are based on linked data. The IIIF thus provides both a rich environment for semantically aware clients, and an opportunity for support within larger linked-data systems. We believe that the rigor of developing within the framework of linked data has helped in the development of clean semantics and provides a solid foundation for future work and extension. This workshop will briefly introduce the IIIF Image API, for image access and manipulation, and survey current client applications. It will then focus on the IIIF Presentation API and the detailed description of how resources are organized, related and presented. Participants will have the opportunity to work through some hands-on examples of manipulation of IIIF presentation information, using JSON-LD and linked-data tools, in order to support different interaction and viewing experiences. Audience: developers, systems librarians, those interested in image provision and presentation. Expertise: Participants should be familiar with the basics of linked data and RDF. To work through the hands-on examples, you will need the ability to edit and run simple (Python) programs. Participants not wanting to do this directly could pair with others to work through examples together. Required equipment to work through examples: Laptop with the ability to run Python (2.7 or 3.x) and internet access to download code and modules. I can provide assistance with setup on Linux/OSX but not on Windows machines. 
(If you would be able and willing to help others in the workshop using Windows then please let me know.)
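As a rough illustration of the kind of request the IIIF Image API is built around, the sketch below assembles an image URL from its path segments (identifier, region, size, rotation, quality, format), following the URI pattern of the Image API 2.x. The server address and identifier are made up for the example, and the helper name `iiif_image_url` is hypothetical; it is not part of any workshop material.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="full",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API 2.x style request URL from its path segments."""
    # The identifier is percent-encoded as a whole, including any slashes.
    encoded_id = quote(identifier, safe="")
    return "{}/{}/{}/{}/{}/{}.{}".format(
        base.rstrip("/"), encoded_id, region, size, rotation, quality, fmt
    )


if __name__ == "__main__":
    # Hypothetical server and identifier: crop a 500x500 pixel region
    # and scale it to a width of 250 pixels.
    print(iiif_image_url("https://example.org/iiif", "book1/page 3",
                         region="0,0,500,500", size="250,"))
```

Changing only the `region` and `size` segments is enough to request a different crop or scale of the same image, which is what allows generic viewers to zoom and pan without server-specific code.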
FREME: A Framework for Multilingual and Semantic Enrichment of Digital Content
Felix Sasaki / Phil Ritchie / Jan Nehring / Pieter Heyvaert / Kevin Koidl
DFKI, Germany / Vistatec, Ireland / iMinds, Belgium / Wripl Technologies Limited
This workshop introduces FREME, a framework for multilingual and semantic enrichment of digital content. FREME provides 1) a growing set of general knowledge sources and sources from the library and other domains for enrichment; 2) several widely used content formats as input and output of enrichment processes; 3) the ability to re-use published linked data in enrichment pipelines, and 4) the application of standards and best practices like the W3C Web Annotation model or NIF. The purpose of the FREME framework is to give access to so-called e-Services that provide certain enrichment functionalities, like named entity recognition or terminology annotation. The access is given via APIs and graphical user interfaces, e.g. a plugin for the CKEditor. The FREME documentation provides details on how to access the e-Services. The FREME framework is available as a set of configurable components. Most of the components and e-Services are available under the Apache 2.0 license and hence suitable for (commercial) use. All code is on GitHub, and contributions are very welcome. The goal of this workshop is to enable various types of users to work with FREME. We will start with examples for API users who want to deploy the e-Services endpoints relying on the publicly available FREME installation. We then will show how to install FREME on your own server and how to deploy selected components. Finally we will show how to parameterize FREME, e.g. by working with custom data sets for enrichment or by providing your own e-Service. Participants of the workshop should have basic knowledge of linked data and RESTful web services. We will contact registered participants before the workshop to provide use cases for semantic and multilingual enrichment in the realm of SWIB. We then will provide the participants with material to implement the use cases in the public FREME installation, or in their own installation. 
For the latter case, participants need to fulfill certain hardware requirements that will be shared before the workshop.
d:swarm - A Data Management Platform for Knowledge Workers
Thomas Gängler / Thomas Gersch / Christof Rodejohann
SLUB Dresden, Germany
d:swarm is a data management platform intended for knowledge workers, e.g., system librarians or metadata librarians. Right now, the focus of this application is on realizing one of the most important steps of an ETL task - the transformation part. Our handy d:swarm Back Office UI can be utilized to create and verify mappings of a certain data source (with help of a sample dataset) to a certain target schema (e.g. schema.org bib extension). This GUI simply runs as a web application in your browser. Besides, one can apply the created and verified mappings with help of the d:swarm Task Processing Unit to process larger amounts of data (i.e. the real datasets). Afterwards, the resulting mapped data can be imported into various data stores and exported to different formats, like search engine indices (e.g. Solr) or Linked Data. Workshop participants will learn how to use and interact with the d:swarm Back Office UI and d:swarm Task Processing Unit. Furthermore, some background about the design and architecture of the whole d:swarm application will be imparted. Finally, we will show how one can share (self-made) mappings rather easily with the rest of the (library) world. This workshop will cover a common, full ETL workflow of library data processing, i.e. preprocessing and/or harvesting of source data (optional), data mapping with help of d:swarm Back Office UI + applying the mappings to larger amounts of data with help of the d:swarm Task Processing Unit, loading the resulting data into a data store (specifically, a search engine index) + exporting as Linked Data, showing an application (catalogue frontend) that retrieves data from that datastore and uses Linked Data. Audience: systems librarians, metadata librarians, knowledge workers, data mungers. Expertise: Participants could be familiar with ETL Tools (GUI), e.g., OpenRefine or Talend. Required: Laptop with VirtualBox installed. Organizers will provide a VirtualBox image (Linux guest system) beforehand. 
Participants can also install their own environment. Programming experience: Not required. (However, domain knowledge and/or knowledge about the data formats themselves can and should be an advantage).
Catmandu & Linked Data Fragments
Patrick Hochstenbach / Carsten Klee / Johann Rolschewski
Berlin State Library, Germany / Ghent University Library, Belgium
"Catmandu" is a command line tool to access and convert data from your digital library, research services or any other open data sets. The "linked data fragments" (LDF) project developed lightweight tools to publish data on the web using the Resource Description Framework (RDF). In combination, both projects offer an easy way to transform your data to RDF and provide access via a graphical user interface (GUI) and application programming interface (API). We will present all required tools at the workshop. The participants will be guided to transform data to RDF, to host it with an LDF server and to run SPARQL queries against it. The participants should install a virtual machine (VM) as a development environment on their laptops, see here for further information. Audience: Systems librarians, metadata librarians, data managers. Expertise: Participants should be familiar with command line interfaces (CLI) and the basics of RDF.
Linked Open Development
Fabian Steeg / Adrian Pohl
Hochschulbibliothekszentrum NRW (hbz)
Increasingly, software in libraries (Software in Bibliotheken, SWIB) is developed as open source software by distributed communities in what could be described as linked open development (LOD). This workshop will introduce you to this way of developing software, its tools, and processes. It will empower you to both set up your own development projects in this way, and contribute to existing projects. Workshop topics are distributed version control basics, open source development workflows, markdown for issues and documentation and workflow visualization with Kanban boards. The current center of the open source community, both in the library world and beyond, is GitHub, a social network for software development. In different exercises, the workshop will introduce you to managing your source code with git, to tracking your issues on GitHub, to integrated development and review tools like Travis CI and Waffle boards, and to using the GitHub API for programmatic access to your data. Audience: developers and librarians involved in software projects; no previous experience needed; requirements: laptop with git, a modern web browser and text editor installed.
|DAY 2 | 2016-11-29 CONFERENCE|
|09:00 - 10:15||WELCOME / OPENING
Tom Baker / Silke Schomburg / Klaus Tochtermann
Friedrich-Ebert-Stiftung / North Rhine-Westphalian Library Service Center (hbz) / ZBW - Leibniz Information Centre for Economics, Germany
Keynote: (Packaged) Web Publication
World Wide Web Consortium (W3C)
The publication of EPUB3 has been a major step forward for digital publishing. Relying on Web Technologies like HTML, CSS, SVG, and others, EPUB3 offers a solid basis to publish not only digital books, but all sorts of digital publications in a portable, adaptable and accessible manner. However, it is possible to bring the publishing and the Web world even closer together, making the current format- and workflow-level separation between offline/portable and online (Web) document publishing eventually disappear. These should be merely two dynamic manifestations of the same publication: content authored with online use as the primary mode can easily be saved by the user for offline reading in portable document form. Content authored primarily for use as a portable document can be put online, without any need for refactoring the content. Essential features flow seamlessly between online and offline modes; examples include cross-references, user annotations, access to online databases, as well as licensing and rights management.
|10:15 - 10:45||COFFEE BREAK|
|10:45 - 12:00||PROJECTS
Using LOD to crowdsource Dutch WW2 underground newspapers on Wikipedia
Olaf Janssen / Gerard Kuys
Koninklijke Bibliotheek, national library of the Netherlands / Wikimedia Nederland / DBpedia
During the Second World War some 1,300 illegal newspapers were issued by the Dutch resistance. Right after the war as many of these newspapers as possible were physically preserved by Dutch memory institutions. They were described in formal library catalogues that were digitized and brought online in the 1990s. In 2010 the national collection of underground newspapers (some 200,000 pages) was full-text digitized in Delpher, the national aggregator for historical full-texts. Having created online metadata and full-texts for these publications, the third pillar, "context", was still missing, making it hard for people to understand the historic background of the newspapers. We are currently running a project to tackle this contextual problem. We started by extracting contextual entries from a hard-copy standard work on Dutch illegal press and combined these with data from the library catalogue and Delpher into a central LOD triple store. We then created links between historically related newspapers and used Named Entity Recognition to find persons, organisations and places related to the newspapers. We further semantically enriched the data using DBpedia. Next, using an article template to ensure uniformity and consistency, we generated 1,300 Wikipedia article stubs from the database. Finally, we sought collaboration with the Dutch Wikipedia volunteer community to extend these stubs into full encyclopedic articles. In this way we can give every newspaper its own Wikipedia article, making these WW2 materials much more visible to the Dutch public, over 80% of whom use Wikipedia. At the same time the triple store can serve as a source for alternative applications, like data visualizations. This will enable us to visualize connections and networks between underground newspapers, as they developed over time between 1940 and 1945.
|Improving data quality at Europeana: New requirements and methods for better measuring metadata quality
Péter Király / Hugo Manguinhas / Valentine Charles / Antoine Isaac / Timothy Hill
Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen / Europeana Foundation, The Netherlands
Europeana aggregates metadata from a wide variety of institutions, a significant proportion of which is of inconsistent or low quality. This low-quality metadata acts as a limiting factor for functionality, affecting e.g. information retrieval and usability. Europeana is accordingly implementing a user- and functionality-based framework for assessing and improving metadata quality. Currently, the metadata is being validated (against the EDM XML schema) prior to being loaded into the Europeana database. However, some technical choices with regard to the expression of rules impose limitations on the constraints that can be checked. Furthermore, Europeana and its partners sense that more than simple validation is needed. Finer-grained indicators for the 'fitness for use' of metadata would be useful for Europeana and its data providers to detect and solve potential shortcomings in the data. At the beginning of 2016, Europeana created a Data Quality Committee to work on data quality issues and to propose recommendations for its data providers, seeking to employ new technology and innovate metadata-related processes. This presentation will describe more specifically the activities of the Committee with respect to data quality checks: - Definition of new data quality requirements and measurements, such as metadata completeness measures; - Assessment of (new) technologies for data validation and quantification, such as SHACL for defining data patterns; - Recommendations to data providers, and integration of the results into the Europeana data aggregation workflow.
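To make the idea of a metadata completeness measure concrete, here is a minimal sketch that scores a record by the share of expected fields carrying a non-empty value. The choice of fields and the flat dictionary layout are assumptions made for the example; this is not the measure defined by the Data Quality Committee.

```python
def has_value(value):
    """Return True if a field value carries any usable content."""
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, (list, tuple, set)):
        # A multi-valued field counts if at least one entry is non-empty.
        return any(has_value(item) for item in value)
    return True


def completeness(record, fields):
    """Share of the expected fields that are filled, between 0.0 and 1.0."""
    if not fields:
        raise ValueError("fields must not be empty")
    filled = sum(1 for name in fields if has_value(record.get(name)))
    return filled / len(fields)


if __name__ == "__main__":
    # Hypothetical field selection and record, for illustration only.
    expected = ["dc:title", "dc:description", "dc:creator", "edm:rights"]
    record = {
        "dc:title": "Map of Bonn",
        "dc:description": "   ",
        "dc:creator": [],
        "edm:rights": "http://creativecommons.org/publicdomain/mark/1.0/",
    }
    print(completeness(record, expected))
```

A single ratio like this treats all fields as equally important; a functionality-based framework would more plausibly weight fields by the feature they support, such as search or multilingual display.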
|Linked Open Data in Practice: Emblematica Online
Myung-Ja K. Han / Timothy W. Cole / Maria Janina Sarol / Patricia Lampron / Mara Wade / Thomas Stäcker / Monika Biel
University of Illinois at Urbana-Champaign. Library, United States of America / Herzog August Bibliothek Wolfenbüttel, Germany
Emblematica Online allows humanities scholars to seamlessly discover and link to items in a unique virtual emblem corpus distributed across six institutions in the US and Europe. The site supports multi-granular discovery of 1,400+ digitized emblem books and 25,000+ individual emblems from selected emblem books. To better integrate with related digital images and textual resources elsewhere, and to provide additional context for users, the site exploits linked open data (LOD) in two ways. First, as a producer of LOD, it publishes emblem and emblem book metadata as HTML+RDFa with schema.org semantics, making emblem resources more visible and useful in a linked open data context. Second, as a consumer of LOD, it enhances user experiences by utilizing LOD services and resources. For example, using the Iconclass LOD service, Emblematica Online supports multi-lingual browsing of the Iconclass vocabulary and connects users to digital sources elsewhere that share Iconclass descriptors. Also, it provides additional context about authors and contributors, including gender, nationality, and occupation, by reconciling names appearing in emblem metadata with LOD sources, such as VIAF, DNB, and Wikipedia. This presentation discusses how Emblematica Online publishes its metadata as LOD and improves user experience using LOD sources, as well as Emblem ontology development and plans for new services that allow possible reuse of Emblem LOD.
|12:00 - 13:30||LUNCH|
|13:30 - 15:10||VISUALIZATION & IMAGES
RDF by Example: rdfpuml for True RDF Diagrams, rdf2rml for R2RML Generation
Ontotext Corp, Bulgaria
RDF is a graph data model, so the best way to understand RDF data schemas (ontologies, application profiles, RDF shapes) is with a diagram. Many RDF visualization tools exist, but they either focus on large graphs (where the details are not easily visible), or the visualization results are not satisfactory, or manual tweaking of the diagrams is required. We describe a tool *rdfpuml* that makes true diagrams directly from Turtle examples using PlantUML and GraphViz. Diagram readability is of prime concern, and rdfpuml introduces various diagram control mechanisms using triples in the puml: namespace. Special attention is paid to inlining and visualizing various Reification mechanisms (described with PRV). We give examples from Getty CONA, Getty Museum, AAC (mappings of museum data to CIDOC CRM), Multisensor (NIF and FrameNet), EHRI (Holocaust Research into Jewish social networks), Duraspace (Portland Common Data Model for holding metadata in institutional repositories), and video annotation. If the example instances include SQL queries and embedded field names, they can describe a mapping precisely. Another tool *rdf2rml* generates R2RML transformations from such examples, saving about 15x in complexity.
Towards visualizations-driven navigation of the scholarship data
Christina Harlow / Muhammad Javed / Sandy Payette
Cornell University, United States of America
One of the key goals of Cornell University Library (CUL) is to ensure preservation of the scholarly works being published by the Cornell faculty members and other researchers. VIVO is an open source, semantic-technology-driven application that enables the preservation of and open access to scholarship across institutions. Driven by different needs, users look at the VIVO implementation at Cornell from different viewpoints. The college requires structured data for reporting needs. The library is interested in preservation of the scholarship data. University executives are interested in identifying the areas where they should invest in the future. First, these viewpoints do not completely overlap with each other. Second, the current user interface represents the scholarship data in a list view format. Such a representation of the scholarship data is not easy for users to use or consume. In this presentation, we present our ongoing work on the integration of D3 visualizations into the VIVO pages. Such visualizations are constructed on the fly based on the underlying RDF data. A visualization-driven approach provides an efficient overview of the huge linked data network of interconnected resources. These visualizations are intuitive for users to interact with and offer the ability to visualize and navigate through the large linked data network. We discuss the (data) gap analysis performed as well as a few of the visualizations in detail and their integration into the VIVO framework.
|TIB|AV-Portal - Challenges managing audiovisual metadata encoded in RDF
Jörg Waitelonis / Margret Plank / Harald Sack
yovisto GmbH, Potsdam, Germany / German National Library of Science and Technology, Hannover, Germany / Hasso-Plattner-Institute, Potsdam, Germany
The TIB|AV-Portal provides access to high quality scientific videos from the subject areas of technology/engineering, architecture, chemistry, information technology, mathematics, and physics in English as well as German. A key feature of the portal is the use of automated video analysis technologies, further enhanced by semantic analyses, to enable pinpoint and cross-lingual searches at video segment level and to display content-based filter facets for further exploration of the steadily increasing number of its video resources. Based on text, speech and image recognition, text-based metadata are automatically extracted from the videos and mapped to subject-specific GND subject headings via named entity linking. This results in an enrichment of the reliable authoritative metadata by time-based metadata from video analysis. In the talk, we present the strategy and implementation for the RDF-based metadata export of the TIB|AV-Portal to illustrate encountered challenges as well as to justify adopted solutions. This includes the ontology design, balancing the best possible compromise between granularity, simplicity, extensibility and sustainability. Since the data is partially generated by an automatic process, it may contain errors or might be incomplete. Accordingly, a closer inspection of the data quality is mandatory. Therefore, the main focus of the talk is on data cleansing methods to ensure the best possible quality with reasonable effort. This includes the presentation of requirements as well as the comparison of different approaches ranging from semi-automated methods to manual editing and override. We further demonstrate additional application scenarios based on the semantically annotated data, such as content-based recommendations and exploratory search.
|Implementing the IIIF Presentation 2.0 API as a Linked Open Data Model in the Fedora Repository
Christopher Hanna Johnson
Akademie der Wissenschaften zu Göttingen, Germany
"The IIIF Presentation API specifies a web service that returns JSON-LD structured documents that together describe the structure and layout of a digitized object or other collection of images and related content." (IIIF website) The dynamic serialization of IIIF JSON-LD structured manifests via SPARQL CONSTRUCT is an interesting possibility that has great potential for cross-domain discovery and rendering of digitized objects with variable criteria. I have explored this possibility by implementing a data model in the Fedora Commons Repository that matches the specifications of the IIIF Presentation API. Fedora has the facility to index objects via Apache Camel directly to a triplestore. With SPARQL CONSTRUCT, the triplestore can serialize normalized JSON-LD as a graph. The use of "ordered lists" (aka collections) is a fundamental component of JSON-LD and a necessary feature of the IIIF manifest sequence, which is represented in a canonical RDF graph as a cascade of blank nodes. Dynamically creating the sequence with SPARQL requires that the data is modelled identically to the IIIF specification. This gist is a representation of a compacted and framed JSON-LD graph that was serialized from a SPARQL query of Fedora metadata. The ability to assemble parts of distinct, disparate and disassociated digital objects on demand in one cohesive presentation becomes a real possibility. For example, the "range" object is equivalent to a part of a sequence, like a chapter in a book. With SPARQL, it is possible to target ranges from different "editions" based on a metadata specification (i.e. a person, place, or date) and unify them in a manifest object which is then rendered by a client viewer like OpenSeadragon.
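The "cascade of blank nodes" mentioned above can be sketched in a few lines: an ordered list, such as the canvases of a sequence, expands in RDF into a chain of blank nodes linked by `rdf:first` and `rdf:rest` and terminated by `rdf:nil`. The function name, the blank node labels and the canvas URIs below are hypothetical, and the sketch uses plain tuples rather than an RDF library or Fedora itself.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def rdf_list_triples(items):
    """Expand an ordered Python list into rdf:first / rdf:rest triples.

    Returns the head node of the list and the triples describing it.
    An empty list is represented by rdf:nil and needs no triples.
    """
    if not items:
        return RDF + "nil", []
    # One blank node per list position (labels are arbitrary).
    nodes = ["_:b{}".format(i) for i in range(len(items))]
    triples = []
    for i, item in enumerate(items):
        triples.append((nodes[i], RDF + "first", item))
        # Each node points to the next one; the last points to rdf:nil.
        rest = nodes[i + 1] if i + 1 < len(items) else RDF + "nil"
        triples.append((nodes[i], RDF + "rest", rest))
    return nodes[0], triples


if __name__ == "__main__":
    canvases = [
        "https://example.org/iiif/book1/canvas/p1",
        "https://example.org/iiif/book1/canvas/p2",
    ]
    head, triples = rdf_list_triples(canvases)
    for triple in triples:
        print(triple)
```

Because the order lives only in this chain, a query that rebuilds a sequence has to reproduce the chain exactly, which is one way to read the requirement that the stored data be modelled identically to the specification.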
|15:10 - 15:40||COFFEE BREAK|
|15:40 - 17:30||OPEN SPACE
This year's SWIB will again provide space for participants to meet, discuss and exchange ideas or results of their work on specific topics. Please, introduce your topic or question to discuss with a short statement in the preceding Lightning talks.
|19:00||CONFERENCE DINNER (BIERHAUS MACHOLD)|
|DAY 3 | 2016-11-30 CONFERENCE|
|09:00 - 10:15||OPENING
Keynote: Linked Open Community
They say “build it, and they will come”, but what happens if you build it and they don’t? Getting people involved with open source projects takes more than good software or even a compelling use case: it’s about infrastructure, governance, and culture. This talk will cover research, current thinking, and real-world strategies for increasing and diversifying participation in open source projects.
|Swissbib goes Linked Data
Felix Bensmann / Nicolas Prongué / Mara Hellstern / Philipp Kuntschik
GESIS – Leibniz Institute for the Social Sciences / University of Applied Sciences HEG Geneva / University of Applied Sciences HTW Chur
The project linked.swissbib.ch aims to integrate the Swiss library metadata into the semantic web. A Linked Data infrastructure has been created to provide on the one hand a data service for other applications and on the other hand an improved interface for the end user (e.g. a searcher). The workflow for the development of this infrastructure involves basically five steps: (1) data modeling and transformation in RDF, (2) data indexing, (3) data interlinking and enrichment, (4) creation of a user interface and (5) creation of a RESTful API. The project team would like to highlight some challenges faced during these stages, and the means found to solve them. This includes for example the conception of various use cases of innovative semantic search functionalities to give specifications for data modelling, data enrichment and for the design of the search index. Data processing operations such as transformation and interlinking must be highly scalable, with the aim of an integration in the workflow of the already existing system. Wireframes have been made to realize early usability evaluations. Finally, negotiations have been undertaken with the various Swiss library networks to adopt a common open license for bibliographic data.
|10:15 - 10:45||COFFEE BREAK|
|10:45 - 12:00||LINKING THINGS
Person Entities: Lessons learned by a data provider
John W. Chapman
Continuing the longstanding research program by OCLC in the field of linked data, recent projects have focused on creating sets of entities of high interest for any organization wanting to utilize linked data paradigms. Through intensive mining and clustering of WorldCat bibliographic data, name and subject authority files, and other related data sets, OCLC has produced over 300 million entity representations. These clusters pull together and represent creative works, and persons related to those works. OCLC has engaged with a number of libraries and organizations to create and experiment with this data. A pilot project during October 2015-February 2016 to explore new methods of providing access to Person entities provided a number of new directions and insights. The core purpose of the work is to understand how these entities might best be leveraged to make library workflows more efficient, and to improve the quality of metadata produced in the library sector. This presentation will provide a background on data used in the project, as well as the development of services and APIs to provision the data. It will address challenges and opportunities in the area of creating and managing entities, and ways in which they could be improved and enriched over time.
|Performing LOD: Using the Europeana Data Model (EDM) for the aggregation of metadata from the performing arts domain
Julia Beck / Marko Knepper
University Library Frankfurt am Main, Germany / University Library Mainz, Germany
Imagine a theatre play. There are contributors such as the playwright, director, actors, etc. The play may have several performances with changing casts while actors may contribute to other plays. The play might be based on a drama which also has a screen adaptation. All this is documented in manuscripts, photos, videos and other materials. The more relations you find among these performance-related objects, the more it emerges as a perfect use case for linked data. At the University Library Frankfurt am Main, the Specialised Information Service Performing Arts aggregates performing arts-related metadata of artefacts gathered by German-speaking cultural heritage institutions. It is funded by the German Research Foundation and aims to give researchers access to specialized information by providing a VuFind-based search portal that presents the metadata modeled as linked and open data. The Europeana Data Model (EDM) offers a universal and flexible metadata standard that is able to model the heterogeneous data about cultural heritage objects resulting from the data providers’ variety of data acquisition workflows. Because EDM is a common aggregation standard in digitization projects, a comprehensive collection of mappings already exists. With the amount of delivered manuscript data in mind, the DM2E-extension of EDM was used and further extended by the ECLAP-namespace covering the specific properties for the performing arts domain. The presentation will show real life examples and focus on the modeling as linked data and the implementation within the VuFind framework.
|Entitifying Europeana: building an ecosystem of networked references for cultural objects
Hugo Manguinhas / Valentine Charles / Antoine Isaac / Timothy Hill
Europeana Foundation, The Netherlands
In the past years, the number of references to places, people, concepts and time in Europeana’s metadata has grown considerably and with it new challenges have arisen. These contextual entities are provided as references as part of the metadata delivered to Europeana or selected by Europeana for semantic enrichment or crowdsourcing. However, their diversity in terms of semantic and multilingual coverage and their very variable quality make it difficult for Europeana to fully exploit this rich information. Pursuing its efforts towards the creation of a semantic network around cultural heritage objects and intending in this way to further enhance its data and retrieval across languages, Europeana is now working on a long term strategy for entities. The cornerstone of this strategy is a “semantic entity collection” that acts as a centralised point of reference and access to data about contextual entities, which is based on the cached and curated data from the wider Linked Open Data cloud. While Europeana will have to address the technical challenges of integration and representation of the various sources, it will also have to define a content and curation plan for its maintenance. This presentation will highlight the design principles of the Europeana Entity Collection and its challenges. We will detail our plans regarding its curation and maintenance while providing the first examples of its use in Europeana users' services. We will also reflect on how our goals can fit our partners' processes and how organizations like national cultural heritage portals and smaller institutions can contribute to (and benefit from) such a project as a network.
|12:00 - 13:30||LUNCH|
|13:30 - 15:10||WHERE ARE WE?
From MARC silos to Linked Data silos
Osma Suominen / Nina Hyvönen
National Library of Finland, Finland
Many libraries are experimenting with publishing their metadata as Linked Data in order to open up bibliographic silos, usually based on MARC records, and make them more interoperable, accessible and understandable to developers who are not intimately familiar with library data. The libraries who have published Linked Data have all used different data models for structuring their bibliographic data. Some are using a FRBR-based model where Works, Expressions and Manifestations are represented separately. Others have chosen basic Dublin Core, dumbing down their data into a lowest common denominator format. The proliferation of data models limits the reusability of bibliographic data. In effect, libraries have moved from MARC silos to Linked Data silos of incompatible data models. Data sets can be difficult to combine, for example when one data set is modelled around Works while another mixes Work-level metadata such as author and subject with Manifestation-level metadata such as publisher and physical form. Small modelling differences may be overcome by schema mappings, but it is not clear that interoperability has improved overall. We present a survey of published bibliographic Linked Data, the data models proposed for representing bibliographic data as RDF, and tools used for conversion from MARC. We also present efforts at the National Library of Finland to open up metadata, including the national bibliography Fennica, the national discography Viola and the article database Arto, as Linked Data while trying to learn from the examples of others.
|Who is using our linked data?
Corine Deliot / Neil Wilson / Luca Costabello / Pierre-Yves Vandenbussche
British Library, United Kingdom / Fujitsu Ireland Ltd, Ireland
The British Library published the first Linked Open Data iteration of the British National Bibliography (BNB) in 2011. Since then it has continued to evolve with regular monthly updates, addition of new content (e.g. serials) and new links to external resources (e.g. International Standard Name Identifier (ISNI)). Data is available via dereferenceable URIs, a SPARQL endpoint and RDF dataset dumps. There has been clear value to the Library in its linked data work, e.g. learning about RDF modelling and linked data. However, like many linked open data publishers, the Library has found it challenging to find out how the data has been used and by whom. Although basic usage data are captured in logs, there is currently no widely available tool to extract Linked Open Data insights. This makes it challenging to justify continued investment at a time of limited resourcing. This talk will report on collaboration between Fujitsu Laboratories Limited, Fujitsu Ireland and the British Library in the development of a Linked Open Data Analytics platform. The aim of the project was twofold: to examine Linked Open BNB usage and to potentially develop a tool of interest to the wider Linked Open Data community. We will describe the analytics platform and the functionality it provides as well as demonstrate what we found out about the usage of our data. Over the period under consideration (April 2014-April 2015) usage of the Linked Open BNB increased, and there was a discernible growth in the number of SPARQL queries relative to HTTP queries. Usage patterns were traced to the addition of new metadata elements or to linked data tuition sites or events.
|Linked Data for Production
Philip Evan Schreur
Stanford University, United States of America
The Mellon Foundation recently approved a grant to Stanford University for a project called Linked Data for Production (LD4P). LD4P is a collaboration between six institutions (Columbia, Cornell, Harvard, Library of Congress, Princeton, and Stanford University) to begin the transition of technical services production workflows to ones based in Linked Open Data (LOD). This first phase of the transition focuses on the development of the ability to produce metadata as LOD communally, the enhancement of the BIBFRAME ontology to encompass multiple resource formats, and the engagement of the broader academic library community to ensure a sustainable and extensible environment. As its name implies, LD4P is focused on the immediate needs of metadata production such as ontology coverage and workflow transition. In parallel, Cornell also has been awarded a grant from the Mellon Foundation for Linked Data for Libraries-Labs (LD4L-Labs). LD4L-Labs will in turn focus on solutions that can be implemented in production at research libraries within the next three to five years. Their efforts will focus on the enhancement of linked data creation and editing tools, exploration of linked data relationships and analysis of the graph to directly improve discovery, BIBFRAME ontology development and piloting efforts in URI persistence, and metadata conversion tool development needed by LD4P and the broader library community. The presentation will focus on a brief description of the projects, how they interrelate, and what has been accomplished to date. Special emphasis will be given to extensibility and interactions with the broader LOD community.
|How We Killed Our Most-Loved Service and No One Batted an Eye
Matias Mikael Frosterus / Mikko Kalle Aleksanteri Lappalainen
The National Library of Finland, Finland
Controlled vocabularies and IT systems enabling their use have been in the forefront of library work for decades. In the National Library of Finland the national bibliography has been indexed using the YSA general thesaurus since the 1980s. A dedicated browser called VESA was developed in 1999 in order to eliminate the need to publish YSA as a printed document. In user surveys, VESA continually ranked as our most loved service. However, as years went on it became more difficult to integrate VESA’s old code into new environments. When the time came to renew VESA, the library world was already buzzing with open linked data, the semantic web, etc. So it was decided that the new system should provide YSA and other vocabularies as open linked data with the ability to integrate the vocabularies into other systems using modern APIs. In 2013 work began on the national ontology and thesaurus service Finto, slated to replace VESA. Due to VESA being so well-liked, Finto was developed in deep collaboration with the users. Regular usability tests were conducted during the development and in all aspects and features care was taken in order to not put any extra burden on the daily tasks of the annotators. Finto provides the functionalities that VESA did, but also offers various new features and possibilities. An example of an auxiliary feature is the new suggestions system streamlining the process of gathering suggestions for new concepts into Finto vocabularies. Furthermore, the modular design of Finto also allowed us to utilize open APIs in other systems to, e.g., provide direct links to content annotated using a given concept in a vocabulary. We present the lessons learned during the development of a replacement for an extremely well-loved core service of a national library. A particular focus will be on the collaboration with the users during the development process and the migration.
|15:10||END OF CONFERENCE|