|
13:00 - 19:00 |
Introduction to Linked Open Data
About this workshop
This introductory workshop covers the fundamentals of Linked Data technologies on the one hand and the basic legal issues of Open Data on the other. The RDF data model will be discussed, along with the concepts of dereferenceable URIs and common vocabularies. The participants will continuously create and refine RDF documents to strengthen their knowledge of the topic. Linked Data tenets such as publishing RDF descriptions in a web environment and using content negotiation will be demonstrated and applied by the participants. Aggregating data from several sources and querying this data will showcase the advantages of publishing Linked Data, and RDF Schema will be introduced as an effective means of data integration. On a side track, Open Data principles will be introduced, discussed and applied to the content that is being created during the workshop.
Felix Ostrowski / Pascal Christoph / Adrian Pohl |
|
13:00 - 19:00 |
Metadata Provenance
About this workshop
When metadata is distributed, combined, and enriched as Linked Data, tracking its provenance becomes a hard problem. Using data encumbered with licenses that require attribution of authorship may eventually become impracticable as more and more data sets are aggregated, which is one of the main motivations for the call to open data under permissive licenses like CC0. Nonetheless, there are important scenarios where keeping track of provenance information becomes a necessity. A typical example is the enrichment of existing data with automatically obtained data, for instance as a result of automatic indexing. Ideally, the origins, conditions, rules and other means of production of every statement are known and can be used to put it into the right context.
Kai Eckert / Magnus Pfeffer |
|
13:00 - 19:00 |
PhD Workshop
About this workshop
The Linked Open Data approach provides a framework for the generation, publishing and sharing of information by means of semantic technologies. It plays a vital role in the realization of the Semantic Web at a global scale by publishing and interlinking diverse data sources on the Web. Access to a huge amount of Linked Data presents exciting opportunities for the next generation of Web-based applications, especially with regard to data hosted and provided by libraries. However, given the use cases described by the Library Linked Data Incubator Group, there is still a need for Linked Data applications and best practice examples.
This PhD workshop will provide an excellent opportunity for both beginning and senior PhD students to present their ideas and receive feedback from experienced researchers and other PhD students working in research areas related to Linked Data based infrastructures and applications in libraries.
Atif Latif / Timo Borst |
|
08:30 - 09:15 |
Registration |
|
09:15 - 10:15 |
Opening
Silke Schomburg / Klaus Tochtermann
Keynote: LODLAM - Fostering Global Collaboration and Community
Jon Voss
Abstract
What happens when hundreds of thousands of archival photos are shared with open licenses, then mashed up with geolocation data and current photos? Or when app developers can freely utilize information and images from millions of books? Like the web of documents that became the World Wide Web, a web of data is the goal of Linked Open Data. Jon will discuss how the cultural, technological, and legal environment is enabling a growing ecosystem of open historical data and cross-pollination across business sectors, as particularly illustrated by the International Linked Open Data in Libraries, Archives and Museums Summit. We'll explore evolving examples of Linked Open Data in several institutions, and how a global community within libraries, archives and museums is beginning to play a critical role in the evolution of the Web. |
|
10:15 - 10:45 |
Coffee Break |
|
10:45 - 12:30 |
The Library Catalogue as Linked Open Data: How to Do It and What to Do with It
Asgeir Rekkavik / Benjamin Rokseth
Abstract
Oslo Public Library has developed an open source toolkit for producing an RDF representation of the library catalogue and its authority files. In addition to plain conversion of MARC bibliographic data into RDF, the current implementation includes SPARQL methods for FRBR clustering of the library collection, as well as enriching catalogue data with external content, such as cover images and book reviews, from various APIs and Linked Open Data sources.
Linked Open Library Data in Practice: Lessons Learned and Opportunities for data.bnf.fr
Romain Wenz
Abstract
The presentation will give a report on “data.bnf.fr”, the Linked Open Data project of the National Library of France (Bibliothèque nationale de France - BnF). The first purpose of data.bnf.fr is to gather data from several sources: library catalogue, archives and manuscripts, digital collections. Data.bnf.fr has reached a stage of maturity, with millions of resources. Some issues had to be solved in the course of building the project. To implement the FRBR principles, data matching had to be done programmatically. The alignments rely on the roles of authors, so as to be able to match resources at the proper FRBR level. The presentation will explain the general modelling principles. With the “Open data” initiatives led by the French government, it is possible to use an Open Licence. Once data is linked and open, what comes next? First, changes in general use, since people can now find BnF’s resources directly on the Web. Secondly, the data is being used by broader communities. For instance, for small public libraries, new procedures are being explored for re-use of the dataset in local catalogues. In the long term, Semantic Web technologies could set a standard for library data, if we keep the data linked and open.
Wikidata - the Wikipedia of Linked Open Data
Daniel Kinzler
Abstract
Wikidata is a new project by Wikimedia Deutschland with the goal of creating a data repository for Wikipedia and the world. It aims to be Wikimedia Commons for data, allowing Wikipedia editors to put factual information like the population of a city in one central database, instead of having to maintain it as text in dozens or hundreds of languages. This presentation gives an overview of the planned software architecture of Wikidata and how it ties in with Wikipedia. We want to explain how we are going to address the many technical and conceptual challenges that arise from the complexity and scale of the data. Among other things, we will describe how data records are transcluded between wikis, and how changes are recorded and propagated throughout the system. Another important focus is the data model Wikidata will use to represent the diversity of knowledge throughout the world. |
|
12:30 - 14:00 |
Lunch |
|
14:00 - 15:30 |
Culturegraph Authorities
Markus Geipel
Abstract
Authority files play an increasing role in the fields of Semantic Web and semantic search as they provide reliable identifiers for entities, which otherwise would be tedious to identify and disambiguate. The German Universal Authority File, “Gemeinsame Normdatei” (GND) comprises more than entities such as person names, conferences or events, corporate bodies, places or geographic names, subject headings, and works. It is maintained and used by a variety of cultural heritage organizations and thus has the potential to be an entry point for semantic search beyond institutional boundaries. However, as links are still unidirectional, pointing from catalogue data to the GND but not backwards, we currently face a navigational dead end. Adding the missing back links is the objective of Culturegraph Authorities. By analyzing the main German library catalogues as well as crosslink data in BEACON files, the respective back links are created. By making the results available online, Culturegraph Authorities brings a vast variety of cultural heritage data closer together, while increasing the visibility of all participating institutions.
Enrichment of Library Authority Files by Linked Open Data Sources
Gerd Zechmeister / Helmut Nagy
Abstract
The Linked Data initiative enables institutions to publish and share their data following open standards and to merge, interlink and reuse data provided by numerous sources of the Linked Open Data (LOD) cloud. It might be the next paradigm shift in the library world: Catalogues, for instance, are predestined to make use of publicly available data to enrich (add pictures, geodata etc.) or annotate (add reviews, comments etc.) the content. Dynamically generated mashups of cross-media information would add value to library systems and their records. The LOD community in return strongly benefits from these sources as librarians revise and annotate resources, thus ensuring the quality of data. The German National Library recently took a big step forward in its ambition to provide "cooperatively maintained German authority files for persons, corporate bodies and subject headings" as LOD DNB. The subject headings authority files are expressed with SKOS, an open W3C specification for building thesauri. We evaluated the dataset and identified potential LOD sources to be linked with the authority files. We will show how the technology stack from the LOD2 project can be used to create links and to enrich the subject headings with LOD sources, and demonstrate an example application that presents a mashup of authority file data and matching LOD sources. An outlook on further possible use cases and application scenarios will round off the presentation.
First Insights into the Library Track of the OAEI
Dominique Ritze
Abstract
The OAEI is the Ontology Alignment Evaluation Initiative, an annual campaign to evaluate ontology matching systems. To compare different matching systems, various test cases with different purposes are provided by several groups. To see how well current matching systems are able to deal with the kind of data that occurs in libraries, we submitted a library track to the OAEI. In our test case, the task is to match the Thesaurus for Economics (STW) against the Thesaurus for the Social Sciences (TheSoz). As a result, the ontology matching systems create a list of cross-concordances. Since we are in possession of cross-concordances that were manually generated several years ago, we are able to evaluate the results of the ontology matching systems. In this presentation, we report from the OAEI ontology matching workshop held at ISWC 2012 and show how state-of-the-art matchers perform on this specific task and whether such systems can be applied in the bibliographic domain to automatically discover cross-concordances. Knowing these concordances is very important since the subject authority data maintained by libraries are the backbone of the Semantic Web and are able to support semantic search. |
|
15:30 - 16:00 |
Coffee Break |
|
16:00 - 17:30 |
Simple Semantic Enrichment of Scientific Papers in Social Sciences
Alexander Garcia / Philipp Mayr / Leyla Jael Garcia
Abstract
In this paper we present a simple methodology for enriching scholarly papers in the social sciences domain. We make use of existing technology; our approach yields a publication that is readily available for the Web of Data. Our scenario is that posed by the journal Methods, Data, Analysis (henceforth MDA): an archive of PDF files, rich bibliographic metadata available, a publication workflow mostly based on Word files, no navigation tools tailored for this journal, and no previous structure in the content of the journal, since authors are free to organize the document as they see fit. As there is a pre-existing archive, the solution could not simply consider future publications. Our approach is based on the orchestration of various ontologies; for instance, DOCO, BiRO, CiTO, DC, FOAF, and the Annotation Ontology (AO). We are using the AO for structuring the markup of domain specific concepts as well as for nanopublications derived from community-based annotation, focusing on hypotheses, results, etc. The structure of the document is represented by DOCO; in addition, we are also using BiRO and CiTO for bibliographic references and citation typing. Our model facilitates the exploration of content as well as the formulation of semantic queries expressed in SPARQL. In addition, we are also modelling the relationship between the data and the document. External resources relevant for the document are aggregated as part of the end user environment. The workflow for the pre-existing archive of PDFs starts by converting the PDF to RDF; to support this task we developed a web service for the “fully automated PDF-to-XML” process.
Discovering Links for Metadata Enrichment on Computer Science Papers
Johann Schaible / Philipp Mayr
Abstract
Libraries that hold collections of scientific papers mostly show only basic information comprising the title, authors, publication date, and an abstract of a paper. The user can utilize this data to manually look up more information about the paper on the web. This is time-consuming, though, and simply not done by many users. In this paper we demonstrate a 3-step approach using semantic web technology, which consists of integrating Linked Data in order to enlarge the metadata of a paper with links to external data sources. For this we use a link discovery tool, in our case Silk. Our initial record for each computer science paper consists of its title, its authors, and its publication date. This information is represented in simple RDF using the dcterms namespace, i.e. dcterms:title, dcterms:creator, and dcterms:date. As the external data sources with the highest expectations of finding additional data we identified the DBLP computer science bibliography, the Association for Computing Machinery (ACM), and the Semantic Web Conference Corpus. In the first step we build a connection between our records and each of these data sources using Silk. In the second step, given linkage rules (e.g. link dcterms:title in data set 'a' and rdfs:label in data set 'b') and a matching scenario (e.g. owl:sameAs), Silk looks up equal values in both data sources. If it finds an equal value, a link between the data sets will be generated. All generated links are stored as N-Triples. In the third step we manually add these links to our data set, enriching a paper’s metadata. We also illustrate our experiences with Silk regarding correctness and the handling of RDF dumps and SPARQL endpoints.
Building a High Performance Environment for RDF Publishing
Pascal Christoph
Abstract
Linked Open Data can be published in different forms (flat files, RDFa, triple store). In lobid we used an approach centered around a triple store (4store). Different RDF serializations of resources (Turtle, RDF/XML, RDFa-enriched HTML presentations etc.) are generated through SPARQL queries and provided via content negotiation. Running a triple store and providing a SPARQL endpoint allows powerful queries to make the best out of LOD. But these queries often take their time, especially if you have a large data pool. While it is possible to do string searches via SPARQL, we found that it does not perform well with large amounts of data, and language processing is not supported at all. Thus, we are running a search engine (elasticsearch) alongside the triple store to enable fast string searches. This talk is about indexing data into the triple store and into the search engine in parallel to reap even more benefits from elasticsearch. As elasticsearch indexes data using JSON, this approach makes use of JSON-LD, a JSON serialization for RDF. elasticsearch comes with many features that we want to use with our LOD anyway, like high availability, distributed index, near-realtime updates, versioning and fast geo-searching. The talk will highlight how these benefits can be used for LOD. |
|
09:00 - 10:15 |
Keynote: Linking Data, Linking People
Emmanuelle Bermès
Abstract
Creating a linked graph of open library data means that libraries have to go global. Global beyond the barriers of countries, languages and continents. Global beyond the challenges of formats and software systems. Global beyond the differences of intellectual property rights and licensing traditions. Global beyond the diversity of domains and data models. Libraries already are global in many ways, and they have been for many years. International cooperation, shared cataloguing strategies and standardization have been promoted in libraries for decades. So what's new with Linked Data? What does it mean for libraries to change their old cooperation models and embrace the Linked Data movement? Such a paradigm change requires not only technology and standards, but also cooperation between institutions and people. Community building is also an important aspect of the creation of an international linked library graph.
Old Silos, New Silos, No Silos - From Redundancy to Aggregation or Distribution?
Lukas Koster
Abstract
Traditionally library systems/catalogues have been isolated local systems, thereby creating an enormous redundancy in both data and metadata backends and search frontends. Even in shared cataloguing environments the local subsystems are the real production environments. In recent years we have seen two separate developments that claim to solve the redundancy problem: aggregated (meta)data stores and distributed Linked Open Data networks. Examples of aggregation: content aggregators and publishers’ proprietary databases, discovery layer global indexes, Europeana, Worldcat, etc. An earlier form of distribution, federated search/metasearch, is now gradually being abandoned. A number of its inherent problems (performance, relevance ranking, network) are solved by aggregation, but others are not. Do the new silos of aggregation with their limited content solve all our problems, or is a completely open global linked networked model better? The pros and cons of both general models and of hybrid, blended options will be considered. We will also discuss the practical conditions, implications and feasibility of the models, looking at licensing, commercial interests, trust, authority, etc. Last but not least: Can and will libraries have a role in the new data universe that is outside their direct control, and what can these roles look like? |
|
10:15 - 10:45 |
Coffee Break |
|
10:45 - 12:30 |
Status Quo and Limitations of Library Linked Data
Asunción Gómez-Pérez / Philipp Cimiano / Daniel Vila-Suero
Abstract
In recent years, many libraries, museums and archives have started to release data as Linked Data (LD). The benefits of publishing library data as LD have recently been summarized by the W3C Incubator Group on Linked Library Data (W3C LLD XG). Several national libraries have started to publish metadata as LD. However, a number of limitations still exist which prevent the Library Linked Data (LLD) paradigm from reaching its full potential:
- No cross-lingual linking of bibliographic entities showing which books are translations of others, or if a motion picture is based upon a specific play.
- High manual effort required in linking vocabularies and authority files across languages, which makes this work very labor-intensive and costly.
- No semantic enrichment of the content (textual, sound and visual) and linkage of this content to the URIs that represent the real-world entities.
- Lack of integration of temporal, geographical, provenance and IPR vocabularies into widely accepted library vocabularies.
- Lack of linking of cultural objects across media types and formats.
- Lack of the necessary LLD infrastructure allowing libraries to integrate, exchange and share content across libraries.
Publishing metadata as LD is a first step towards overcoming the above-mentioned limitations, but additional tools and services and lifecycle support for librarians need to be provided. As a first step in this direction, in this talk we will present an analysis of the limitations involved in the publication of LLD and suggest a number of challenges that need to be overcome to bring the LLD approach to its full potential.
Towards an Infrastructure for the Synchronisation of Metadata in Libraries
Christoph Böhme
Abstract
With LOD, information is represented in a giant distributed graph. This graph is constantly changing and evolving, but to date no infrastructure has been established for continuously propagating and tracking the changes of nodes and edges in this graph. In libraries, metadata is traditionally distributed and synchronised using protocols such as Z39.50, SRU or OAI-PMH to poll individual datasets for changes. While this works well when only a small number of datasets exchange data infrequently, it does not when synchronisation happens continuously and at the scale of the LOD graph. However, users expect that data is always up-to-date. For example, a search index which includes external information about the latest headlines is expected to contain not only last week’s headlines but also today’s. Constant polling for changes can constitute a major performance issue in such scenarios. In our contribution we discuss different synchronisation patterns for library metadata and the requirements for a synchronisation infrastructure arising from them. Of particular importance in this discussion is the fact that subscribers are not always interested in all changes to a dataset but only in those affecting a small set of selected records. Related to this is the fact that changes to library metadata mostly affect only single records, but sometimes large updates touch whole data sets, producing change sets which can consist of millions of records at a time. A synchronisation system must be able to handle these differently sized payloads and help participating systems to cope with large change sets. We review existing solutions and outline a future solution for the library domain.
The Library of Congress's Bibliographic Framework Initiative
Kevin Ford
Abstract
This presentation will provide a general update about the Library of Congress's Bibliographic Framework Initiative, including a short synopsis of its historical import, information about the attractiveness of Linked Data, and details about the work completed to date (models, tools, and findings) on transitioning from a MARC-based environment to a new bibliographic ecology. The general update about the Bibliographic Framework Initiative will include work and progress since the Library's update at the American Library Association Annual Conference in Anaheim in June. The first phase of the Initiative is complete. Among the many objectives set for the initial phase, the most relevant outcome is the presentation of a draft model for the community to appraise and build on. Because an open process is desirable and expected, the community will already have seen the draft model, but this forum will provide an opportunity to go into greater detail, present the outcomes as a cohesive whole, and explore the ramifications and future directions stemming from the initial stage. Linked Data methods and strategies are proving to be a very consistent yet flexible means to communicate data, which has always been one of the main aims of the MARC communication formats. |
|
12:30 - 14:00 |
Lunch |
|
14:00 - 15:30 |
Statistical Research Data on the Semantic Web
Daniel Bahls
Abstract
At present, efforts are being made to pick up research data as bibliographic artefacts for re-use, transparency and citation. In the field of economics, a large amount of research is based on empirical data, which is often combined from several sources such as data centres, affiliated institutes or self-conducted surveys. A good deal of the data used in empirical research is protected or simply cannot be shared with third parties due to data usage rights, partly because some of the providers are commercial. As a consequence, a researcher is often not allowed to upload the entire data set as a whole to any independent data repository. Thus, we investigate techniques for fine-grained referencing that enable the exact reconstruction of a researcher's data set and suit an environment of distributed data sources with access restrictions and different curatorial versions of data. As it motivates the application of Semantic Web technologies, we examine the emerging RDF Data Cube Vocabulary, which integrates the SDMX standard for a harmonized representation of statistical data. In addition to statistical data resources, empirical research data sets in economics also comprise scripts for data processing. To support transparency to the highest level, we extend the scope of our research and elaborate a generating model, which enables the reproduction of analyses and results for a given scientific publication. All in all, we aim to lay the groundwork for machine-processable descriptions of conducted empirical research to enable the automatic reconstruction of individual statistical data sets and reproduction of results. An overview of the work at its latest stage will be presented in this talk.
Encoding Patron Information in RDF
Jakob Voß
Abstract
Current efforts to publish library data as Linked Open Data (LOD) have focused on bibliographic data, authority files and organizations. It took some time to also publish information about single items and their current availability in particular libraries. Applications that make use of these data sets are just being created. Patron information, consisting of links between library patrons and documents held by libraries, however, is not published yet. This gap exists with good cause because of privacy concerns, but also because it is difficult to get patron information out of legacy systems and because there is no encoding in RDF yet. The proposed research report will present an encoding of patron information in RDF that was created for two applications in the GBV library network. The encoding aligns with existing ontologies, namely the Document Availability Information Ontology (DAIA) and Semantically-Interlinked Online Communities (SIOC). It will be shown how patron information is extracted from (legacy) library systems and how the information is encoded in RDF, among other formats. The patron information can then be combined with other sources, while respecting patron privacy.
Lightning Talks
Abstract
Take the chance to present a new project, an elegant solution to a problem or an open research question for which you are looking for collaborators, in max. 5 minutes. |
|
15:30 - 16:00 |
Coffee Break |
|
16:00 - 17:30 |
Panel Discussion:
|
Please note that all information may be subject to change.
organized by:
Bürgerhaus Stollwerck
Dreikönigenstr. 23
50678 Cologne