MetaScholar: An Emory University Digital Library Research Initiative




background information | digital library frameworks | open source tools for digital libraries


Background Information
  • Metadata Harvesting
    Metadata harvesting refers to the sharing of digital library metadata records through protocols, APIs, and services which are designed specifically to facilitate this task.
    • Open Archives Initiative portal. This site is the premier source of information on the OAI protocol for metadata harvesting.
      [www.openarchives.org]
    • The Metadata Harvesting Initiative of the Mellon Foundation by Donald J. Waters, Program Officer, The Andrew W. Mellon Foundation. ARL Bimonthly Report 217, August 2001.
      [www.arl.org/newsltr/217/waters.html]
    • Metadata Harvesting and the Open Archives Initiative by Clifford A. Lynch, Executive Director, Coalition for Networked Information. ARL Bimonthly Report 217, August 2001.
      [www.arl.org/newsltr/217/mhp.html]
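
As an illustration of what a harvesting request looks like under the OAI protocol mentioned above, the following is a minimal sketch, not a complete harvester. It builds a ListRecords request URL and pulls Dublin Core titles out of a response. The base URL, the function names, and the sample response are assumptions made up for this example; the verb, argument names, and XML namespaces are those defined by OAI-PMH 2.0.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by Dublin Core elements inside OAI-PMH responses.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, set_spec=None,
                           resumption_token=None):
    """Build a ListRecords request URL for a repository (hypothetical helper)."""
    params = {"verb": "ListRecords"}
    if resumption_token is not None:
        # resumptionToken is an exclusive argument: it replaces all others.
        params["resumptionToken"] = resumption_token
    else:
        params["metadataPrefix"] = metadata_prefix
        if from_date is not None:
            params["from"] = from_date
        if set_spec is not None:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_titles(xml_text):
    """Return the text of every dc:title element found in a response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iterfind(".//dc:title", NS)]


# A hand-written, abbreviated response used only for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # example.org is a placeholder, not a real repository.
    print(build_list_records_url("http://example.org/oai", from_date="2004-01-01"))
    print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would also fetch the URL, follow resumption tokens until the list is exhausted, and handle protocol errors; those steps are omitted here.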


  • Focused Crawling
    Focused crawling is a method of discovering valuable resources within the uncontrolled body of the web. Focused crawling is meant to be more efficient than standard crawling, which is used to build general web-wide search engines. Focused crawling results in higher search precision within the subject domain used for the focusing, as well as shorter collection-building times and lower demands on data storage.
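
The idea can be sketched as a best-first crawl: score each fetched page against the subject domain, keep and expand only the on-topic pages, and visit the most promising links first. The example below is a toy under stated assumptions, not the method of any particular crawler. It runs over an in-memory "web" rather than the network, and the keyword-overlap relevance score, the threshold, and all names are hypothetical choices for illustration.

```python
import heapq


def relevance(text, topic_terms):
    """Fraction of words in the text that are topic terms (a crude score)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(1 for w in words if w in topic_terms) / len(words)


def focused_crawl(seeds, fetch, topic_terms, max_pages=10, threshold=0.1):
    """Best-first crawl that only expands pages judged on-topic.

    fetch(url) must return (text, links) or None if the page is unavailable.
    """
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    collected = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        page = fetch(url)
        if page is None:
            continue
        fetched += 1
        text, links = page
        score = relevance(text, topic_terms)
        if score < threshold:
            continue  # off-topic: neither stored nor expanded
        collected.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                # A link inherits the score of the page it was found on.
                heapq.heappush(frontier, (-score, link))
    return collected


# Toy link graph: url -> (page text, outgoing links).
TOY_WEB = {
    "a": ("digital library metadata harvesting", ["b", "c"]),
    "b": ("recipes for chocolate cake", ["d"]),
    "c": ("metadata records for a digital library archive", ["e"]),
    "d": ("more cake recipes", []),
    "e": ("library metadata standards", []),
}

if __name__ == "__main__":
    topic = {"digital", "library", "metadata"}
    print(focused_crawl(["a"], TOY_WEB.get, topic))
```

Because the off-topic page "b" is never expanded, page "d" is never fetched at all, which is where the savings in collection time and storage come from.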


  • Topics in Semantic Clustering
    Semantic clustering refers to the "bringing together" of items (such as digital library records) based on similarity of meaning. As we use the term, it covers several sub-methods: cluster formation and visualization, classification, and metatagging.
    • National Institute for Technology & Liberal Education introduction to the concept of "semantic clustering".
      [www.nitle.org/semantic_search.php]
    • An Introduction to Support Vector Machines (and other kernel-based learning methods) by N. Cristianini and J. Shawe-Taylor. Cambridge University Press, 2000. ISBN: 0 521 78019 5. Offers table of contents of text, as well as additional informative links on kernel methods.
      [www.support-vector.net]
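
To make the cluster-formation step concrete, here is a minimal sketch that groups short records by cosine similarity of their word counts. Word overlap is only a rough stand-in for similarity of meaning, and the single-pass grouping, the threshold value, and the sample records are assumptions chosen for illustration; this is not one of the methods used by the resources listed above.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: term -> count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(records, threshold=0.3):
    """Single-pass clustering: each record joins the most similar existing
    cluster if the similarity reaches the threshold, else starts a new one.
    Returns a list of clusters, each a list of record indices."""
    clusters = []  # each entry: {"centroid": Counter, "members": [int]}
    for i, text in enumerate(records):
        vec = vectorize(text)
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [i]})
        else:
            # Summed counts serve as the centroid; cosine ignores scale.
            best["centroid"].update(vec)
            best["members"].append(i)
    return [c["members"] for c in clusters]


if __name__ == "__main__":
    records = [
        "civil war letters from georgia",
        "letters written during the civil war",
        "protein folding simulation data",
        "simulation of protein structure",
    ]
    print(cluster(records))
```

Classification and metatagging differ in that the categories are fixed in advance; a trained classifier such as a support vector machine would replace the similarity threshold used here.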


Digital Library Frameworks
The term "digital library frameworks" refers to both formalized theoretical ideas about the organization of digital libraries and their realization in practice as deployed software. Formal frameworks save time and improve the quality of digital libraries by letting us build on the insights and previous efforts of others in the field.
  • Open Digital Libraries (ODLs) are systems built as networks of extended Open Archives.
    [oai.dlib.vt.edu/odl/]
  • A Framework for Building Open Digital Libraries by Hussein Suleman and Edward A Fox. In D-Lib Magazine, December 2001. ISSN 1082-9873.
    [www.dlib.org/dlib/december01/suleman/12suleman.html]
  • 5S
    5S is a theoretical framework for understanding and modeling digital libraries. It consists of five elements: structures, streams, scenarios, spaces, and societies. 5SL is an XML-based language, conforming to the 5S theory, for expressing digital library designs. Building on this, work is under way to develop 5SGraph, a graphical tool for generating 5SL specifications, and 5SLGen, a tool that takes 5SL specifications and generates a digital library from a pool of software components.
    • Scenario-based Generation of Digital Library Services by Rohit Dilip Kelapure (Master's thesis). 12-2003.
      [scholar.lib.vt.edu/theses/available/etd-06182003-055012/]
    • 5SGraph: A Modeling Tool for Digital Libraries by Qinwei Zhu (Master's thesis). 11-2002.
      [scholar.lib.vt.edu/theses/available/etd-11272002-210531/]
    • 5SL: A Language for Declarative Specification and Generation of Digital Libraries by Marcos André Gonçalves and Edward A Fox, Department of Computer Science, Virginia Polytechnic Institute and State University.
      [www.dlib.vt.edu/projects/5S-Model/p117-goncalves.pdf]
    • Streams, Structures, Spaces, Scenarios, Societies (5S): A Formal Model for Digital Libraries by Marcos André Gonçalves, Edward A Fox, Layne T Watson, and Neill A Kipp, Department of Computer Science, Virginia Polytechnic Institute and State University.
      [www.dlib.vt.edu/projects/5S-Model/5s6.pdf]


Open Source Tools for Digital Libraries
There are many open source tools that implement one or more services for digital libraries. The following tools are being studied as a part of these projects, and include crawlers, search engines, and semantic clustering-related software.

  • Comparing Open Source Indexers by Eric Lease Morgan. Last updated 08-2002. This text compares and contrasts the features and functionality of various open source indexers: freeWAIS-sf, Harvest, Ht://Dig, Isite/Isearch, MPS, SWISH, WebGlimpse, and Yaz/Zebra.
    [www.infomotions.com/musings/opensource-indexers/]
  • SVMlight is an implementation of Support Vector Machines (SVMs) in C.
    [svmlight.joachims.org]
  • Simple Web Indexing System for Humans - Enhanced (SWISH-E). {focused crawler}
    [swish-e.org]
  • Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. {focused crawler and indexer combined}
    [crawler.archive.org]
  • Jakarta Lucene is a high-performance, full-featured text search engine written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform applications. {indexer}
    [jakarta.apache.org/lucene/docs/]
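
For readers unfamiliar with what an indexer does, the core data structure behind full-text search is an inverted index, which maps each term to the documents that contain it. The following toy sketch shows the idea only; it does not reproduce the API or internals of any tool listed above, and the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


if __name__ == "__main__":
    docs = {
        1: "Open archives metadata harvesting",
        2: "Harvesting metadata from digital libraries",
        3: "Web crawler for archival collections",
    }
    index = build_index(docs)
    print(search(index, "metadata harvesting"))
```

Production indexers add tokenization rules, stemming, ranking, and on-disk storage on top of this basic structure.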