PROGRAMME

Day 1 · 2012-11-26 Workshops

13:00 - 19:00

Introduction to Linked Open Data

About this workshop

This introductory workshop covers the fundamentals of Linked Data technologies on the one hand and the basic legal issues of Open Data on the other. The RDF data model will be discussed, along with the concepts of dereferenceable URIs and common vocabularies. The participants will continuously create and refine RDF documents to strengthen their knowledge of the topic. Linked Data tenets such as publishing RDF descriptions in a web environment and using content negotiation will be demonstrated and applied by the participants. Aggregating data from several sources and querying this data will showcase the advantages of publishing Linked Data, and RDF Schema will be introduced as an effective means of data integration. On a side track, Open Data principles will be introduced, discussed and applied to the content created during the workshop.
Workshop outcomes: The participants will have created openly licensed RDF descriptions of themselves, published them to a web server, and aggregated the data into a triplestore against which SPARQL queries are then executed. The possibilities of using RDFS to integrate data across vocabularies will also have been explored.
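Purely as an illustration of the workflow sketched above (and not part of the workshop materials), the following Python/rdflib snippet loads an openly licensed FOAF self-description written in Turtle into an in-memory graph and runs a SPARQL query against it; all names and URIs are invented.

```python
# A minimal sketch, assuming invented example URIs: create, aggregate and query
# a FOAF self-description as described in the workshop outline.
from rdflib import Graph

turtle_doc = """
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<http://example.org/people/alice#me> a foaf:Person ;
    foaf:name "Alice Example" ;
    foaf:knows <http://example.org/people/bob#me> ;
    dcterms:license <http://creativecommons.org/publicdomain/zero/1.0/> .
"""

g = Graph()
g.parse(data=turtle_doc, format="turtle")   # in the workshop, several such files are aggregated

query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?friend WHERE {
    ?person foaf:name ?name ;
            foaf:knows ?friend .
}
"""
for name, friend in g.query(query):
    print(name, "knows", friend)
```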

Felix Ostrowski / Pascal Christoph / Adrian Pohl
Humboldt-Universität, Germany / hbz, Germany

13:00 - 19:00

Metadata Provenance

About this workshop

When metadata is distributed, combined, and enriched as Linked Data, tracking its provenance becomes a hard problem. Using data encumbered with licenses that require attribution of authorship may eventually become impracticable as more and more data sets are aggregated - one of the main motivations for the call to publish open data under permissive licenses like CC0. Nonetheless, there are important scenarios where keeping track of provenance information becomes a necessity. A typical example is the enrichment of existing data with automatically obtained data, for instance as a result of automatic indexing. Ideally, the origins, conditions, rules and other means of production of every statement are known and can be used to put it into the right context.
Part 1 - Metadata Provenance in RDF: In RDF, the mere representation of provenance - i.e., statements about statements - is challenging. We explore the possibilities, from the unloved reification mechanism and other proposed alternatives through to named graphs and recent developments in the upcoming next version of RDF.
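As a minimal sketch of the two idioms named in Part 1 (reification and named graphs), with invented URIs and not taken from the workshop itself, the same statement can be annotated with provenance in rdflib as follows.

```python
# A minimal sketch, assuming invented URIs: the same triple once described via
# classic RDF reification and once placed in a named graph carrying provenance.
from rdflib import Dataset, Graph, URIRef, Literal, BNode
from rdflib.namespace import RDF, DCTERMS

s = URIRef("http://example.org/book/1")
p = DCTERMS.subject
o = Literal("Economics")

# 1) Reification: a separate resource describes the statement itself.
g = Graph()
stmt = BNode()
g.add((stmt, RDF.type, RDF.Statement))
g.add((stmt, RDF.subject, s))
g.add((stmt, RDF.predicate, p))
g.add((stmt, RDF.object, o))
g.add((stmt, DCTERMS.creator, URIRef("http://example.org/agents/automatic-indexer")))

# 2) Named graph: the triple lives in a graph whose URI carries the provenance.
ds = Dataset()
prov_graph = ds.graph(URIRef("http://example.org/graphs/automatic-indexing-2012-11"))
prov_graph.add((s, p, o))
print(ds.serialize(format="trig"))
```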
Part 2 - Interoperable Metadata Provenance: As with metadata itself, common vocabularies and data models are needed to express basic provenance information in an interoperable fashion. We investigate the PROV model that is currently being developed by the W3C Provenance Working Group and compare it to Dublin Core as a representative of a flat, descriptive metadata schema.
We actively encourage participants to present their own use cases and open challenges at this workshop. Please contact the organizers for details.

Kai Eckert / Magnus Pfeffer
University of Mannheim / Stuttgart Media University (HdM)

13:00 - 19:00

PhD Workshop

About this workshop

The Linked Open Data approach provides a framework for the generation, publishing and sharing of information by means of semantic technologies. It plays a vital role in the realization of the Semantic Web at a global scale by publishing and interlinking diverse data sources on the Web. Access to a huge amount of Linked Data presents exciting opportunities for the next generation of Web-based applications, especially with regard to data hosted and provided by libraries. However, even for use cases such as those depicted by the Library Linked Data Incubator Group, there is still a need for Linked Data applications and best-practice examples. This PhD workshop will provide an excellent opportunity for both beginning and senior PhD students to present their ideas and receive feedback from experienced researchers and other PhD students working in research areas related to Linked Data based infrastructures and applications in libraries.
Call for Participation

Atif Latif / Timo Borst
ZBW - Leibniz Information Centre for Economics

Day 2 · 2012-11-27 Conference

08:30 - 09:15

Registration

09:15 - 10:15

Opening

Silke Schomburg / Klaus Tochtermann
North Rhine-Westphalian Library Service Center (hbz) / ZBW - Leibniz Information Centre for Economics

Keynote: LODLAM - Fostering Global Collaboration and Community

Jon Voss
LOD-LAM & Historypin

Abstract

What happens when hundreds of thousands of archival photos are shared with open licenses, then mashed up with geolocation data and current photos? Or when app developers can freely utilize information and images from millions of books? Like the web of documents that became the World Wide Web, a web of data is the goal of Linked Open Data. Jon will discuss how the cultural, technological, and legal environment is enabling a growing ecosystem of open historical data and cross-pollination across business sectors, as particularly illustrated by the International Linked Open Data in Libraries, Archives and Museums Summit. We'll explore evolving examples of Linked Open Data in several institutions, and how a global community within libraries, archives and museums is beginning to play a critical role in the evolution of the Web.

10:15 - 10:45

Coffee Break

10:45 - 12:30

The Library Catalogue as Linked Open Data: How to Do It and What to Do with It

Asgeir Rekkavik / Benjamin Rokseth
Deichmanske bibliotek, Oslo Public Library

Abstract

Oslo Public Library has developed an open source toolkit for producing an RDF representation of the library catalogue and its authority files. In addition to plain conversion of MARC bibliographic data into RDF, the current implementation includes SPARQL methods for FRBR clustering of the library collection, as well as enriching catalogue data with external content, such as cover images and book reviews, from various APIs and Linked Open Data sources.
During the first half of 2012, Oslo Public Library has run two simultaneous projects that demonstrate and take advantage of some of the possibilities an enriched RDF representation of the library catalogue might provide. This has resulted in two working open source service prototypes:
* Book Reviews. Librarians produce lots of book reviews, a valuable resource that regrettably is often poorly exploited. The Book Reviews prototype is a web application, designed to collect, register and distribute library produced book reviews, as well as linking them to the library catalogue and making them easily accessible for use in other web applications.
* Active Shelves. The active shelf is a physical touch-screen device that makes use of open source software, RFID technology, RDF data and external web service APIs to provide information about any book a library patron is curious to know more about, as well as suggesting other titles that might spur the user's interest.
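Not the Oslo toolkit's actual code, but a rough sketch of one common heuristic behind SPARQL-based FRBR clustering as mentioned above: grouping manifestations that share a creator and a title. The vocabulary and data below are invented for illustration.

```python
# A rough sketch, assuming invented data: cluster manifestations into FRBR-like
# work groups by shared creator and title using a SPARQL aggregate query.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix dcterms: <http://purl.org/dc/terms/> .
<http://example.org/manifestation/1> dcterms:creator "Hamsun, Knut" ; dcterms:title "Sult" .
<http://example.org/manifestation/2> dcterms:creator "Hamsun, Knut" ; dcterms:title "Sult" .
""", format="turtle")

clusters = g.query("""
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?creator ?title (COUNT(?m) AS ?editions)
WHERE { ?m dcterms:creator ?creator ; dcterms:title ?title . }
GROUP BY ?creator ?title
""")
for row in clusters:
    print(row.creator, "/", row.title, "->", row.editions, "manifestation(s)")
```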

Linked Open Library Data in Practice: Lessons Learned and Opportunities for data.bnf.fr

Romain Wenz
Bibliothèque nationale de France

Abstract

The presentation will give a report on “data.bnf.fr”, the Linked Open Data project of the National Library of France (Bibliothèque nationale de France - BnF). The first purpose of data.bnf.fr is to gather data from several sources: the library catalogue, archives and manuscripts, and the digital collections. Data.bnf.fr has reached a stage of maturity, with millions of resources. Some issues had to be solved in the course of building the project. For implementing the FRBR principles, data matching had to be done programmatically. The alignments rely on the roles of authors, so as to match resources at the proper FRBR level. The presentation will explain the general modelling principles. Thanks to the “Open data” initiatives led by the French government, it is possible to use an Open Licence. Once data is linked and open, what comes next? First, general use changes, since people can now find the BnF’s resources directly on the Web. Secondly, the data is being used by broader communities; for instance, new procedures are being explored so that small public libraries can re-use the dataset in their local catalogues. In the long term, Semantic Web technologies could set a standard for library data, if we keep them linked and open.

Wikidata - the Wikipedia of Linked Open Data

Daniel Kinzler
Wikimedia Deutschland

Abstract

Wikidata is a new project by Wikimedia Deutschland with the goal of creating a data repository for Wikipedia and the world. It aims to be a Wikimedia Commons for data, allowing Wikipedia editors to put factual information like the population of a city into one central database, instead of having to maintain it as text in dozens or hundreds of languages. This presentation gives an overview of the planned software architecture of Wikidata and how it ties in with Wikipedia. We want to explain how we are going to address the many technical and conceptual challenges that arise from the complexity and scale of the data. Among other things, we will describe how data records are transcluded between wikis, and how changes are recorded and propagated throughout the system. Another important focus is the data model Wikidata will use to represent the diversity of knowledge throughout the world.

12:30 - 14:00

Lunch

14:00 - 15:30

Culturegraph Authorities

Markus Geipel
German National Library (DNB)

Abstract

Authority files play an increasing role in the fields of the Semantic Web and semantic search, as they provide reliable identifiers for entities that would otherwise be tedious to identify and disambiguate. The German Universal Authority File, “Gemeinsame Normdatei” (GND), comprises entities such as person names, conferences and events, corporate bodies, places and geographic names, subject headings, and works. It is maintained and used by a variety of cultural heritage organizations and thus has the potential to be an entry point for semantic search beyond institutional boundaries. However, as links are still unidirectional, pointing from catalogue data to the GND but not backwards, we currently face a navigational dead end. Adding the missing back links is the objective of Culturegraph Authorities: by analyzing the main German library catalogues as well as crosslink data in BEACON files, the respective back links are created. By making the results available online, Culturegraph Authorities brings a vast variety of cultural heritage data closer together, while increasing the visibility of all participating institutions.
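As an illustration only (the identifiers and catalogue URI pattern below are placeholders, and the real Culturegraph processing is more involved), the back-linking idea can be sketched roughly like this: read identifiers from a BEACON-style link dump and emit triples pointing from the GND entity back to a catalogue resource.

```python
# A rough sketch, assuming placeholder identifiers and an invented catalogue URI
# pattern: turn BEACON-style identifier lines into generic back-link triples.
from rdflib import Graph, URIRef
from rdflib.namespace import RDFS

GND_BASE = "http://d-nb.info/gnd/"                 # real GND URI prefix
CATALOGUE_BASE = "http://example.org/catalogue/"   # invented target pattern

beacon_dump = """#FORMAT: BEACON
#PREFIX: http://d-nb.info/gnd/
118540238
118512676
"""

g = Graph()
for line in beacon_dump.splitlines():
    line = line.strip()
    if not line or line.startswith("#"):           # skip header/meta lines
        continue
    identifier = line.split("|")[0]                # first field is the identifier
    g.add((URIRef(GND_BASE + identifier),
           RDFS.seeAlso,                           # deliberately generic back link
           URIRef(CATALOGUE_BASE + identifier)))

print(g.serialize(format="turtle"))
```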

Enrichment of Library Authority Files by Linked Open Data Sources

Gerd Zechmeister / Helmut Nagy
Semantic Web Company GmbH

Abstract

The Linked Data initiative enables institutions to publish and share their data following open standards and to merge, interlink and reuse data provided by numerous sources of the Linked Open Data (LOD) cloud. It might be the next paradigm shift in the library world: catalogues, for instance, are predestined to make use of publicly available data to enrich (add pictures, geodata etc.) or annotate (add reviews, comments etc.) their content. Dynamically generated mashups of cross-media information would add value to library systems and their records. The LOD community in return strongly benefits from these sources, as librarians revise and annotate resources, thus ensuring the quality of the data. The German National Library recently made a big step forward in its ambition to provide "cooperatively maintained German authority files for persons, corporate bodies and subject headings" as LOD DNB. The subject headings authority files are expressed in SKOS, an open W3C specification for building thesauri. We evaluated the dataset and identified potential LOD sources to be linked with the authority files. We will show how the technology stack from the LOD2 project can be used to create links and to enrich the subject headings with LOD sources, and demonstrate an example application that presents a mashup of authority file data and matching LOD sources. An outlook on further possible use cases and application scenarios will round off the presentation.
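A toy sketch of the linking step described above, with invented data and without the LOD2 tooling: match SKOS subject heading labels against labels from an external LOD source and record the result as skos:closeMatch links.

```python
# A toy sketch, assuming invented data: link subject headings to an external LOD
# source by exact label match and record skos:closeMatch statements.
from rdflib import Graph, URIRef, Literal
from rdflib.namespace import SKOS

subject_headings = Graph()
subject_headings.add((URIRef("http://example.org/gnd/subject/economy"),
                      SKOS.prefLabel, Literal("Wirtschaft", lang="de")))

external_source = Graph()
external_source.add((URIRef("http://dbpedia.org/resource/Economy"),
                     SKOS.prefLabel, Literal("Wirtschaft", lang="de")))

links = Graph()
for concept, _, label in subject_headings.triples((None, SKOS.prefLabel, None)):
    for candidate, _, _ in external_source.triples((None, SKOS.prefLabel, label)):
        links.add((concept, SKOS.closeMatch, candidate))

print(links.serialize(format="turtle"))
```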

First Insights into the Library Track of the OAEI

Dominique Ritze
Mannheim University Library

Abstract

The OAEI is the Ontology Alignment Evaluation Initiative, an annual campaign to evaluate ontology matching systems. To compare different matching systems, various test cases with different purposes are provided by several groups. To see how well current matching systems are able to deal with data that occurs specifically in libraries, we submitted a library track to the OAEI. In our test case, the task is to match the Thesaurus for Economics (STW) against the Thesaurus for the Social Sciences (TheSoz). As a result, the ontology matching systems create a list of cross-concordances. Since we are in possession of cross-concordances which were manually generated several years ago, we are able to evaluate the results of the ontology matching systems. In this presentation, we report on the OAEI ontology matching workshop held at ISWC 2012 and show how state-of-the-art matchers perform on this specific task and whether such systems can be applied in the bibliographic domain to automatically discover cross-concordances. Knowing these concordances is very important, since the subject authority data maintained by libraries are the backbone of the Semantic Web and can support semantic search.
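For illustration, a small sketch (with invented concept identifiers, not actual STW/TheSoz data) of how matcher output can be scored against manually created cross-concordances in terms of precision, recall and F-measure.

```python
# A small sketch, assuming invented alignments: score a matcher's output against
# a manually created reference alignment.
def evaluate(system_links: set, reference_links: set) -> dict:
    """Compare a matcher's (source, target) pairs with a reference alignment."""
    true_positives = system_links & reference_links
    precision = len(true_positives) / len(system_links) if system_links else 0.0
    recall = len(true_positives) / len(reference_links) if reference_links else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f-measure": f1}

reference = {("stw:concept-a", "thesoz:concept-a"), ("stw:concept-b", "thesoz:concept-b")}
system = {("stw:concept-a", "thesoz:concept-a"), ("stw:concept-c", "thesoz:concept-x")}
print(evaluate(system, reference))   # precision 0.5, recall 0.5, f-measure 0.5
```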

15:30 - 16:00

Coffee Break

16:00 - 17:30

Simple Semantic Enrichment of Scientific Papers in Social Sciences

Alexander Garcia / Philipp Mayr / Leyla Jael Garcia
Florida State University / GESIS - Leibniz Institute for the Social Sciences / Universität der Bundeswehr, E-Business and Web Science Research Group

Abstract

In this paper we present a simple methodology for enriching scholarly papers in the social sciences domain. We make use of existing technology; as a result of our approach we obtain a publication readily available for the Web of Data. Our scenario is that posed by the journal Methods, Data, Analysis (henceforth MDA): an archive of PDF files, rich bibliographic metadata, a publication workflow mostly based on Word files, no navigation tools tailored to this journal, and no predefined structure in the journal's content - authors are free to organize their documents as they see fit. As there is a pre-existing archive, the solution could not simply consider future publications. Our approach is based on the orchestration of various ontologies, for instance DOCO, BiRO, CiTO, DC, FOAF, and the Annotation Ontology (AO). We use the AO for structuring the markup of domain-specific concepts as well as for nanopublications derived from community-based annotation - focusing on hypotheses, results, etc. The structure of the document is represented by DOCO; in addition, we use BiRO and CiTO for bibliographic references and citation typing. Our model facilitates the exploration of content as well as the formulation of semantic queries expressed in SPARQL. In addition, we also model the relationship between the data and the document. External resources relevant to the document are aggregated as part of the end-user environment. The workflow for the pre-existing archive of PDFs starts by converting the PDF to RDF; to support this task we developed a web service for the “fully automated PDF-to-XML” process.
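A minimal, invented sketch of the kind of markup the approach produces: a document described with Dublin Core metadata and one CiTO-typed citation. The ontology namespaces are real; the article URIs and values are placeholders.

```python
# A minimal sketch, assuming placeholder article URIs: Dublin Core metadata plus
# a CiTO-typed citation relation between two documents.
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import DCTERMS

CITO = Namespace("http://purl.org/spar/cito/")

g = Graph()
g.bind("cito", CITO)
g.bind("dcterms", DCTERMS)

paper = URIRef("http://example.org/mda/articles/2012-01")
cited = URIRef("http://example.org/mda/articles/2010-07")

g.add((paper, DCTERMS.title, Literal("An invented article title")))
g.add((paper, DCTERMS.creator, URIRef("http://example.org/authors/jane-doe")))
g.add((paper, CITO.extends, cited))   # one of CiTO's typed citation properties

print(g.serialize(format="turtle"))
```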

Discovering Links for Metadata Enrichment on Computer Science Papers

Johann Schaible / Philipp Mayr
GESIS - Leibniz Institute for the Social Sciences

Abstract

Libraries with collections of scientific papers usually show only basic information comprising the title, authors, publication date, and abstract of a paper. The user can use this data to manually look up more information about the paper on the web. This is time-consuming, though, and simply not done by many users. In this paper we demonstrate a 3-step approach using Semantic Web technology, which integrates Linked Data in order to enrich the metadata of a paper with links to external data sources. For this we use a link discovery tool, in our case Silk. Our initial record for each computer science paper consists of its title, its authors, and its publication date. This information is represented in simple RDF using the dcterms namespace, i.e. dcterms:title, dcterms:creator, and dcterms:date. As external data sources with the highest expectation of finding additional data, we identified the DBLP computer science bibliography, the Association for Computing Machinery (ACM), and the Semantic Web Conference Corpus. In the first step we build a connection between our records and each of these data sources using Silk. By defining linkage rules (e.g. link dcterms:title in data set 'a' and rdfs:label in data set 'b') and a matching scenario (e.g. owl:sameAs), Silk looks up equal values in both data sources in the second step. If it finds an equal value, a link between the data sets is generated. All generated links are stored as N-Triples. In the third step we manually add these links to our data set, enriching a paper’s metadata. We also illustrate our experiences with Silk regarding correctness and the handling of RDF dumps and SPARQL endpoints.
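Not Silk itself, but a small Python sketch of the linkage rule just described: compare dcterms:title values in data set 'a' with rdfs:label values in data set 'b' and, on an exact match, emit an owl:sameAs link serialized as N-Triples. All data below is invented.

```python
# A small sketch, assuming invented data (Silk itself is configured declaratively):
# exact-match dcterms:title against rdfs:label and emit owl:sameAs links.
from rdflib import Graph, URIRef, Literal
from rdflib.namespace import DCTERMS, RDFS, OWL

dataset_a = Graph()
dataset_a.add((URIRef("http://example.org/papers/42"),
               DCTERMS.title, Literal("Linked Data on the Web")))

dataset_b = Graph()
dataset_b.add((URIRef("http://example.org/external/rec/0001"),
               RDFS.label, Literal("Linked Data on the Web")))

links = Graph()
for paper, _, title in dataset_a.triples((None, DCTERMS.title, None)):
    for candidate, _, _ in dataset_b.triples((None, RDFS.label, title)):
        links.add((paper, OWL.sameAs, candidate))

print(links.serialize(format="nt"))
```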

Building a High Performance Environment for RDF Publishing

Pascal Christoph
hbz, Germany

Abstract

Linked Open Data can be published in different forms (flat files, RDFa, triple store). In lobid we used an approach centered around a triple store (4store). Different RDF serializations of resources (Turtle, RDF/XML, RDFa-enriched HTML presentations etc.) are generated through SPARQL queries and provided via content negotiation. Running a triple store and providing a SPARQL endpoint allows powerful queries that make the best out of LOD. However, these queries can be slow, especially with a large data pool. While string searches are possible via SPARQL, we found that they do not perform well on large amounts of data, and language processing is not supported at all. Thus, we are running a search engine (elasticsearch) alongside the triple store to enable fast string searches. This talk is about indexing data into the triple store and into the search engine in parallel to reap even more benefits from elasticsearch. As elasticsearch indexes data as JSON, this approach makes use of JSON-LD, a JSON serialization for RDF. elasticsearch comes with many features that we want to use with our LOD anyway - like high availability, a distributed index, near-realtime updates, versioning and fast geo-searching. The talk will highlight how these benefits can be used for LOD.
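A rough sketch of the parallel-indexing idea (the localhost URL, index layout and resource below are assumptions, not lobid's actual setup): serialize a resource as JSON-LD and push the resulting document into elasticsearch over its REST API.

```python
# A rough sketch, assuming a local elasticsearch instance and an invented
# index/type/id layout: serialize RDF as JSON-LD and index the JSON document.
import json
import requests
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix dcterms: <http://purl.org/dc/terms/> .
<http://example.org/resource/1> dcterms:title "An example resource" .
""", format="turtle")

# JSON-LD serialization (built into recent rdflib; older versions need the
# rdflib-jsonld plugin). The output is a list of node objects.
nodes = json.loads(g.serialize(format="json-ld"))
doc = nodes[0]                                   # single subject in this example

response = requests.put(
    "http://localhost:9200/lod/resource/1",      # assumed index/type/id layout
    data=json.dumps(doc),
    headers={"Content-Type": "application/json"},
)
print(response.status_code, response.text)
```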

Day 3 · 2012-11-28 Conference - Towards an international LOD library ecology

09:00 - 10:15

Keynote: Linking Data, Linking People

Emmanuelle Bermès
Centre Pompidou

Abstract

Creating a linked graph of open library data means that libraries have to go global. Global beyond the barriers of countries, languages and continents. Global beyond the challenges of formats and software systems. Global beyond the differences of intellectual property rights and licensing traditions. Global beyond the diversity of domains and data models. Libraries are already global in many ways, and they have been for many years. International cooperation, shared cataloguing strategies and standardization have been promoted in libraries for decades. So what's new with Linked Data? What does it mean for libraries to change their old cooperation models and embrace the Linked Data movement? Such a paradigm change requires not only technology and standards, but also cooperation between institutions and people. Community building is also an important aspect of the creation of an international linked library graph.

Old Silos, New Silos, No Silos - From Redundancy to Aggregation or Distribution?

Lukas Koster
Library of the University of Amsterdam

Abstract

Traditionally, library systems and catalogues have been isolated local systems, thereby creating an enormous redundancy in both data and metadata backends and search frontends. Even in shared cataloguing environments the local subsystems are the real production environments. In recent years we have seen two separate developments that claim to solve the redundancy problem: aggregated (meta)data stores and distributed Linked Open Data networks. Examples of aggregation: content aggregators and publishers’ proprietary databases, discovery-layer global indexes, Europeana, WorldCat, etc. An earlier form of distribution, federated search/metasearch, is now gradually being abandoned. A number of its inherent problems (performance, relevance ranking, network) are solved by aggregation, but others are not. Do the new silos of aggregation with their limited content solve all our problems, or is a completely open, global, linked, networked model better? The pros and cons of both general models and of hybrid, blended options will be considered. We will also discuss the practical conditions, implications and feasibility of the models, looking at licensing, commercial interests, trust, authority, etc. Last but not least: Can and will libraries have a role in the new data universe that is outside their direct control, and what could these roles look like?

10:15 - 10:45

Coffee Break

10:45 - 12:30

Status Quo and Limitations of Library Linked Data

Asunción Gómez-Pérez / Phillip Cimiano / Daniel Vila-Suero
OEG-UPM

Abstract

In recent years, many libraries, museums and archives have started to release data as Linked Data (LD). The benefits of publishing library data as LD have recently been summarized by the W3C Incubator Group on Linked Library Data (W3C LLD XG), and several national libraries have started to publish metadata as LD. However, a number of limitations still exist which prevent the Library Linked Data (LLD) paradigm from reaching its full potential:
* No cross-lingual linking of bibliographic entities showing which books are translations of others, or whether a motion picture is based upon a specific play.
* High manual effort required for linking vocabularies and authority files across languages, making this very labor-intensive and costly.
* No semantic enrichment of the content (textual, sound and visual) and no linkage of this content to the URIs that represent the real-world entities.
* Lack of integration of temporal, geographical, provenance and IPR vocabularies into widely accepted library vocabularies.
* Lack of linking of cultural objects across media types and formats.
* Lack of the necessary LLD infrastructure allowing libraries to integrate, exchange and share content across libraries.
Publishing metadata as LD is a first step towards overcoming the above-mentioned limitations, but additional tools, services and lifecycle support for librarians need to be provided. As a first step in this direction, in this talk we will present an analysis of the limitations involved in the publication of LLD and suggest a number of challenges that need to be overcome to bring the LLD approach to its full potential.

Towards an Infrastructure for the Synchronisation of Metadata in Libraries

Christoph Böhme
German National Library (DNB)

Abstract

With LOD, information is represented in a giant distributed graph. This graph is constantly changing and evolving, but to date no infrastructure has been established for continuously propagating and tracking the changes of nodes and edges in this graph. In libraries, metadata is traditionally distributed and synchronised using protocols such as Z39.50, SRU or OAI-PMH to poll individual datasets for changes. While this works well when only a small number of datasets exchange data infrequently, it does not when synchronisation happens continuously and at the scale of the LOD graph. However, users expect data to always be up to date. For example, a search index which includes external information about the latest headlines is expected to contain not only last week’s headlines but also today’s. Constant polling for changes can constitute a major performance issue in such scenarios. In our contribution we discuss different synchronisation patterns for library metadata and the requirements for a synchronisation infrastructure arising from them. Of particular importance in this discussion is the fact that subscribers are not always interested in all changes to a dataset but only in those affecting a small set of selected records. Related to this is the aspect that changes to library metadata most of the time affect only single records, but sometimes large updates touch whole data sets, producing change sets which can consist of millions of records at a time. A synchronisation system must be able to handle these different-sized payloads and help participating systems cope with large change sets. We review existing solutions and outline a future solution for the library domain.
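For contrast, a minimal sketch of the polling pattern the abstract starts from: an OAI-PMH ListRecords request restricted to records changed since the last harvest. The endpoint URL and timestamp are placeholders.

```python
# A minimal sketch, assuming a placeholder OAI-PMH endpoint: poll for records
# changed since the previous harvest using the protocol's 'from' parameter.
import requests

OAI_ENDPOINT = "http://example.org/oai"      # placeholder repository
last_harvest = "2012-11-01T00:00:00Z"        # timestamp stored after the previous run

response = requests.get(OAI_ENDPOINT, params={
    "verb": "ListRecords",
    "metadataPrefix": "oai_dc",
    "from": last_harvest,                    # only records changed since then
})
print(response.status_code)
print(response.text[:500])                   # raw OAI-PMH XML, to be parsed further
```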

The Library of Congress's Bibliographic Framework Initiative

Kevin Ford
Library of Congress

Abstract

This presentation will provide a general update on the Library of Congress's Bibliographic Framework Initiative, including a short synopsis of its historical import, information about the attractiveness of Linked Data, and details about the work (models, tools, and findings) completed to date on transitioning from a MARC-based environment to a new bibliographic ecology. The general update on the Bibliographic Framework Initiative will cover work and progress since the Library's update at the American Library Association Annual Conference in Anaheim in June. The first phase of the Initiative is complete. Among the many objectives set for the initial phase, the most relevant outcome is the presentation of a draft model for community appraisal and on which to further build. Because an open process is desirable and expected, the community will already have seen the draft model, but this forum will provide an opportunity to go into greater detail, present the outcomes as a cohesive whole, and explore the ramifications and future directions stemming from the initial stage. Linked Data methods and strategies are proving to be a consistent yet flexible means to communicate data, which has always been one of the main aims of the MARC communication formats.

12:30 - 14:00

Lunch

14:00 - 15:30

Statistical Research Data on the Semantic Web

Daniel Bahls
ZBW - Leibniz Information Centre for Economics

Abstract

At present, efforts are being made to treat research data as bibliographic artefacts for re-use, transparency and citation. In the field of economics, a large amount of research is based on empirical data, which is often combined from several sources such as data centres, affiliated institutes or self-conducted surveys. A good deal of the data used in empirical research is protected or simply cannot be shared with third parties due to data usage rights, partly because some of the providers are commercial. As a consequence, a researcher is often not allowed to upload the entire data set as a whole to any independent data repository. Thus, we investigate techniques for fine-grained referencing that enable the exact reconstruction of a researcher's data set and suit an environment of distributed data sources with access restrictions and different curatorial versions of data. As this motivates the application of Semantic Web technologies, we examine the emerging RDF Data Cube Vocabulary, which integrates the SDMX standard for a harmonized representation of statistical data. In addition to statistical data resources, empirical research data sets in economics also comprise scripts for data processing. To support transparency to the highest level, we extend the scope of our research and elaborate a generating model which enables the reproduction of analyses and results for a given scientific publication. All in all, we aim to lay the grounds for machine-processable descriptions of conducted empirical research, enabling the automatic reconstruction of individual statistical data sets and the reproduction of results. An overview of the work at its latest stage will be presented in this talk.
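As a pointer to what the RDF Data Cube Vocabulary looks like in practice, here is a minimal sketch of a single statistical observation; the dataset URI, dimensions and value are invented.

```python
# A minimal sketch, assuming an invented dataset and values: one observation
# expressed with the RDF Data Cube Vocabulary and SDMX-RDF dimension/measure terms.
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import RDF, XSD

QB = Namespace("http://purl.org/linked-data/cube#")
SDMX_DIM = Namespace("http://purl.org/linked-data/sdmx/2009/dimension#")
SDMX_MEAS = Namespace("http://purl.org/linked-data/sdmx/2009/measure#")

g = Graph()
g.bind("qb", QB)

dataset = URIRef("http://example.org/dataset/unemployment")
obs = URIRef("http://example.org/dataset/unemployment/obs/de-2011")

g.add((dataset, RDF.type, QB.DataSet))
g.add((obs, RDF.type, QB.Observation))
g.add((obs, QB.dataSet, dataset))
g.add((obs, SDMX_DIM.refArea, Literal("DE")))
g.add((obs, SDMX_DIM.refPeriod, Literal("2011", datatype=XSD.gYear)))
g.add((obs, SDMX_MEAS.obsValue, Literal("5.9", datatype=XSD.decimal)))

print(g.serialize(format="turtle"))
```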

Encoding Patron Information in RDF

Jakob Voß
Common Library Network (GBV VZG)

Abstract

Current efforts to publish library data as Linked Open Data (LOD) have focused on bibliographic data, authority files and organizations. It took some time to also publish information about single items and their current availability in particular libraries, and applications that make use of these data sets are only just being created. Patron information, consisting of links between library patrons and documents held by libraries, however, is not published yet. This gap exists with good cause, because of privacy concerns, but also because it is difficult to get patron information out of legacy systems and because there is no encoding in RDF yet. The proposed research report will present an encoding of patron information in RDF that was created for two applications in the GBV library network. The encoding aligns with existing ontologies, namely the Document Availability Information Ontology (DAIA) and Semantically-Interlinked Online Communities (SIOC). It will be shown how patron information is extracted from (legacy) library systems and how the information is encoded in RDF, among other formats. The patron information can then be combined with other sources, while respecting patron privacy.
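A deliberately simplified, hypothetical sketch of the kind of statement such an encoding makes, linking a pseudonymous patron to a held item; the property names below are placeholders and not the actual DAIA/SIOC terms used in the GBV encoding.

```python
# A hypothetical sketch: placeholder vocabulary, not the actual DAIA/SIOC encoding.
from rdflib import Graph, URIRef, Namespace
from rdflib.namespace import FOAF, RDF

EX = Namespace("http://example.org/vocab/")          # stand-in vocabulary

g = Graph()
patron = URIRef("http://example.org/patrons/0815")   # pseudonymous patron URI
item = URIRef("http://example.org/items/PPN123456789")

g.add((patron, RDF.type, FOAF.Agent))
g.add((patron, EX.hasLoan, item))                    # placeholder for the real relation

print(g.serialize(format="turtle"))
```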

Lightning Talks

--

Abstract

Take the chance to present a new project, an elegant solution to a problem, or an open research question for which you are looking for collaborators - in a maximum of 5 minutes.

15:30 - 16:00

Coffee Break

16:00 - 17:30

Panel Discussion:
Standards, Services and Tools for Building a LOD Library Ecology - What We Got & What We Need

Emmanuelle Bermès (Chair) / Christoph Böhme / Kevin Ford / Asunción Gómez-Pérez / Daniel Kinzler / Romain Wenz
 

 

Please note that all information may be subject to change.


LOCATION

Bürgerhaus Stollwerck
Dreikönigenstr. 23
50678 Cologne

CONTACT

Adrian Pohl
hbz
Tel. +49-(0)221-40075235
E-Mail: swib(at)hbz-nrw.de

Joachim Neubert
ZBW
Tel. +49-(0)40-42834462
E-Mail: j.neubert(at)zbw.eu

Twitter: #swib12