


Background Information
- Metadata Harvesting
Metadata harvesting refers to the sharing of digital library metadata
records through protocols, APIs, and services which are designed
specifically to facilitate this task.
- Open Archives Initiative portal. This site is the premier source of
information on the OAI protocol for metadata harvesting.
[www.openarchives.org]
- The Metadata Harvesting Initiative of the Mellon Foundation
by Donald J. Waters, Program Officer, The Andrew W. Mellon
Foundation. ARL Bimonthly Report 217, August 2001.
[www.arl.org/newsltr/217/waters.html]
- Metadata Harvesting and the Open Archives Initiative
by Clifford A. Lynch, Executive Director, Coalition for
Networked Information. ARL Bimonthly Report 217, August 2001.
[www.arl.org/newsltr/217/mhp.html]
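As a rough illustration of what harvesting looks like in practice, the sketch below builds an OAI protocol `ListRecords` request and pulls identifiers and Dublin Core titles out of a response. The base URL `http://example.org/oai`, the helper names, and the embedded sample response are assumptions made up for this example; a real harvester would fetch the URL over HTTP and would also need to handle resumption tokens and error responses.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses and by Dublin Core elements.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL (base_url is a hypothetical repository)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hand-written sample response, used so the sketch runs without a network.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>First sample record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Second sample record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return a list of (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        identifier = rec.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title = rec.findtext(f".//{{{DC_NS}}}title")
        records.append((identifier, title))
    return records


if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    for identifier, title in parse_records(SAMPLE_RESPONSE):
        print(identifier, "|", title)
```

Parsing an embedded sample keeps the example self-contained; swapping in a real repository only changes where `xml_text` comes from.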
- Focused Crawling
Focused crawling is a method of discovering valuable resources
within the uncontrolled body of the web. Focused crawling is meant to
be more efficient than standard crawling, which is used to build general
web-wide search engines. Focused crawling results in higher search
precision within the subject domain used for the focusing, as well as
shorter collection-building times and lower demands on data storage.
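The idea can be sketched as a best-first crawl: score each fetched page against the topic, keep and expand only pages that score well, and visit the most promising links first. The toy in-memory "web", the word-overlap relevance score, and the threshold below are all assumptions chosen for illustration; real focused crawlers typically use a trained classifier and fetch pages over HTTP.

```python
import heapq

# Toy "web": URL -> (page text, outgoing links). Entirely made up for illustration.
TOY_WEB = {
    "a": ("digital library metadata harvesting", ["b", "c"]),
    "b": ("football scores and sports news", ["d"]),
    "c": ("open archives metadata protocol for digital library", ["e"]),
    "d": ("celebrity gossip", []),
    "e": ("library metadata records", []),
}


def relevance(text, topic_terms):
    """Fraction of the page's words that are topic terms (a crude stand-in score)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(1 for w in words if w in topic_terms) / len(words)


def focused_crawl(seeds, topic_terms, threshold=0.3, max_pages=10):
    """Best-first crawl that keeps and expands only on-topic pages."""
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    collected = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = TOY_WEB.get(url, ("", []))  # stands in for an HTTP fetch
        fetched += 1
        score = relevance(text, topic_terms)
        if score < threshold:
            continue  # off-topic page: neither stored nor expanded
        collected.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                # A link inherits the score of the page it was found on.
                heapq.heappush(frontier, (-score, link))
    return collected


if __name__ == "__main__":
    topic = {"digital", "library", "metadata", "harvesting", "archives"}
    print(focused_crawl(["a"], topic))
```

In this toy run the off-topic page `b` is fetched but discarded, so the page `d` behind it is never fetched at all, which is where the savings in crawl time and storage come from.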
- Topics in Semantic Clustering
Semantic clustering refers to the "bringing together" of items (such as
digital library records) based on similarity of meaning. As we use the
term, it covers several sub-methods: cluster formation and
visualization, classification, and metatagging.
- The National Institute for Technology & Liberal Education's introduction to the concept of "semantic clustering".
[www.nitle.org/semantic_search.php]
- An Introduction to Support Vector Machines (and other kernel-based learning methods) by N. Cristianini and J. Shawe-Taylor.
Cambridge University Press, 2000. ISBN: 0 521 78019 5. The site offers the book's table of contents, as well as additional links on
kernel methods.
[www.support-vector.net]
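A minimal sketch of cluster formation, assuming records are short text strings: each record becomes a term-frequency vector, and a record joins the existing cluster whose accumulated vector it most resembles under cosine similarity, or starts a new cluster if none is similar enough. Word overlap is only a rough proxy for similarity of meaning, and the sample records, function names, and threshold are assumptions for illustration rather than the method used by any of the tools listed here.

```python
import math
from collections import Counter


def vectorize(text):
    """Term-frequency vector for a record (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(records, threshold=0.4):
    """Single-pass clustering: returns lists of record indices, one per cluster."""
    clusters = []  # each entry: (summed term vector, list of member indices)
    for i, text in enumerate(records):
        vec = vectorize(text)
        best, best_sim = None, threshold
        for candidate in clusters:
            sim = cosine(vec, candidate[0])
            if sim >= best_sim:
                best, best_sim = candidate, sim
        if best is None:
            clusters.append((Counter(vec), [i]))
        else:
            best[0].update(vec)  # fold the record into the cluster's vector
            best[1].append(i)
    return [members for _, members in clusters]


if __name__ == "__main__":
    records = [
        "metadata harvesting protocol for digital libraries",
        "support vector machines for text classification",
        "harvesting metadata records from digital libraries",
        "kernel methods and support vector machines",
    ]
    print(cluster(records))
```

Single-pass clustering depends on the order of the records and on the threshold, so it is best read as a starting point for experimentation.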
Digital Library Frameworks
The term digital library frameworks refers both to formalized theoretical ideas
about the organization of digital libraries and to their realization in
practice in the form of software and its subsequent deployment.
Formal frameworks save time and improve the quality of
digital libraries; they allow us to benefit from both
the insights and the previous efforts of others in the field.
- Open Digital Libraries (ODLs) are systems built as networks of extended Open Archives.
[oai.dlib.vt.edu/odl/]
- A Framework for Building Open Digital Libraries by Hussein Suleman and Edward A Fox. In D-Lib Magazine, December 2001.
ISSN 1082-9873.
[www.dlib.org/dlib/december01/suleman/12suleman.html]
- 5S
5S is a theoretical framework for understanding and modeling digital
libraries. It consists of five elements: structures, streams, scenarios,
spaces, and societies. 5SL is an XML-based language, conforming to the 5S
theory, for expressing digital library designs. Building on 5SL, work
is under way to develop 5SGraph, a graphical tool for generating 5SL
specifications, and 5SLGen, a tool for taking 5SL specifications and
generating a digital library from a software component pool.
- Scenario-based Generation of Digital Library Services by Rohit Dilip Kelapure (Master's thesis). 12-2003.
[scholar.lib.vt.edu/theses/available/etd-06182003-055012/]
- 5SGraph: A Modeling Tool for Digital Libraries by Qinwei Zhu (Master's thesis). 11-2002.
[scholar.lib.vt.edu/theses/available/etd-11272002-210531/]
- 5SL: A Language for Declarative Specification and Generation of Digital Libraries by
Marcos André Gonçalves and Edward A Fox, Department of Computer Science, Virginia Polytechnic Institute and State
University.
[www.dlib.vt.edu/projects/5S-Model/p117-goncalves.pdf]
- Streams, Structures, Spaces, Scenarios, Societies (5S): A Formal Model for Digital Libraries by
Marcos André Gonçalves, Edward A Fox, Layne T Watson, and Neill A Kipp, Department of Computer Science, Virginia
Polytechnic Institute and State University.
[www.dlib.vt.edu/projects/5S-Model/5s6.pdf]
Open Source Tools for Digital Libraries
There are many open source tools that implement one or more services for
digital libraries. The following tools are being studied as a part of
these projects, and include crawlers, search engines, and semantic
clustering-related software.
- Comparing Open Source Indexers by Eric Lease Morgan. Last updated: 08-2002.
This text compares and contrasts the features and functionality of various open source indexers: freeWAIS-sf,
Harvest, Ht://Dig, Isite/Isearch, MPS, SWISH, WebGlimpse, and Yaz/Zebra.
[www.infomotions.com/musings/opensource-indexers/]
- SVMlight is an implementation of Support Vector Machines (SVMs) in C.
[svmlight.joachims.org]
- Simple Web Indexing System for Humans - Enhanced (SWISH-E). {focused crawler}
[swish-e.org]
- Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. {focused crawler and indexer combined}
[crawler.archive.org]
- Jakarta Lucene is a high-performance, full-featured text search engine written entirely in Java. It is a technology suitable for nearly any
application that requires full-text search, especially cross-platform. {indexer}
[jakarta.apache.org/lucene/docs/]