Date: Thu, 15 Jul 2004 11:46:49 +0530 (IST)
From: I.R.N.Goudar
Open Access Services, Sources and Standards
A Half-Day Brainstorming Session
The President and office bearers of the Society for Information Science,
Bangalore Chapter, and the British Library, Bangalore cordially invite you to a
half-day brainstorming session on Open Access Services, Sources and Standards.
The following colleagues will initiate the discussion on the topics mentioned:
1. Dr. T.B. Rajashekar, Associate Chairman, NCSI, IISc, Bangalore
Open Access Movement, Services and Issues
2. Dr. A.R.D. Prasad, Associate Professor, DRTC/ISI, Bangalore
Open Source Software and Standards
VENUE: Conference Hall, British Library, Prestige Takt, 23,
Kasturba Road Cross, Bangalore 560 001 Tel: 22489220
DATE : 20th July, 2004 (Tuesday), 2.00 pm
Discussions will be moderated by Prof. C.R. Karisiddappa, ex-President, ILA,
and Professor, Dept. of LIS, Karnatak University, Dharwad.
(I.R.N. Goudar)
Secretary,
SIS Bangalore Chapter.
Please Note: I have appended a brief note on this topic, an article on open
source and open standards, and a select list of reading materials along with
URLs. Please make this program interesting and meaningful by your active
participation.
Society for Information Science
Bangalore Chapter
C/O ICAST, National Aerospace Laboratories, Airport Road, Bangalore-560017
Tel: 2508-6080; Fax: 2526-8072; e-mail: goudar@css.nal.res.in
Open Access Services, Sources and Standards
A Brief Note
Information resources, especially journals, have become very expensive.
Even the best universities and research organizations cannot afford to
buy all the resources they need. The open access movement has
brought some relief to this persistent problem. Vast collections of
high-quality publications are being made available through the Internet for
everybody to use without any payment. Creating and distributing this
information is expensive, yet individuals and organizations provide it
freely.
The roots of OAI lie in the development of e-print repositories. E-print
repositories were established in order to communicate the results of
ongoing scholarly research prior to peer review and journal publication.
The earliest of these was xxx (later arXiv), which began with high-energy
physics in 1991 and expanded to cover the field of physics plus related
fields of mathematics, nonlinear sciences and computer science. The
Networked Computer Science Technical Reference Library (NCSTRL) provided
access to computer sciences technical reports deposited either in xxx or
in departmental repositories of cooperating research bodies. The Networked
Digital Library of Theses and Dissertations (NDLTD) built a digital
library of electronic theses and dissertations (ETDs) authored by students
of member institutions. A few other open access services are the "PubMed"
service of the National Library of Medicine, "Books in Print +" offered
through the web site of Amazon.com, and the CERN Document Server.
Web interfaces allowed people to interact with these repositories and some
finding aids were provided. Different interfaces were designed for
different repositories, so end users were forced to learn diverse
interfaces in order to access the various repositories and finding aids.
Certain key players in these developments came to see interoperability as
an increasingly important issue to be addressed by the e-print community.
Two key interoperability problems were identified as impairing the impact
of e-print archives: end users were faced with multiple search interfaces
making resource discovery harder, and there was no machine-based way of
sharing the metadata. Solutions that were being explored included
cross-searching of archives and harvesting archive metadata in order to
provide centralised search services. OAI-PMH is a simple protocol based on
HTTP and XML with flexible deployment. A number of toolkits are available.
Metadata and full-text resources are typically made freely available.
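To make the protocol concrete, the sketch below builds an OAI-PMH request URL
and extracts record identifiers from a simplified XML response, using only
Python's standard library. The base URL and the response fragment are
illustrative assumptions, not a real repository:

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical repository base URL; any OAI-PMH data provider works the same way.
BASE_URL = "http://example.org/oai"

def build_request(verb, **kwargs):
    """An OAI-PMH request is just an HTTP GET with a 'verb' parameter."""
    params = {"verb": verb, **kwargs}
    return BASE_URL + "?" + urlencode(params)

# Ask for records in unqualified Dublin Core, the metadata format
# that every repository is required to support.
url = build_request("ListRecords", metadataPrefix="oai_dc")

# A much-simplified fragment of the XML a data provider returns:
SAMPLE_RESPONSE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:42</identifier></header>
    </record>
  </ListRecords>
</OAI-PMH>"""

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
root = ET.fromstring(SAMPLE_RESPONSE)
ids = [h.text for h in root.findall(".//oai:header/oai:identifier", NS)]
print(url)
print(ids)
```

Because the protocol is nothing more than HTTP plus XML, any language with an
HTTP client and an XML parser can act as a harvester.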
Open Source Initiative (OSI) is a non-profit corporation dedicated to
managing and promoting the Open Source Definition for the good of the
community, specifically through the OSI Certified Open Source Software
certification mark and program. One can read about successful software
products that have these properties, and about the OSI certification mark and
program, which allows one to be confident that software really is "Open
Source." One can also make copies of approved open source licenses there.
The basic idea behind open source is very simple: When programmers can
read, redistribute, and modify the source code for a piece of software,
the software evolves. People improve it, people adapt it, people fix bugs.
And this can happen at a speed that, if one is used to the slow pace of
conventional software development, seems astonishing.
Some key definitions
Open Archive Initiative (OAI)
OAI is an initiative to develop and promote interoperability standards
that aim to facilitate the efficient dissemination of content.
Archive
The term "archive" in the name Open Archives Initiative reflects the
origins of the OAI in the e-prints community, where the term is
generally accepted as a synonym for a repository of scholarly papers.
Members of the archiving profession have justifiably noted the stricter
definition of an "archive" within their domain, with connotations of
preservation of long-term value, statutory authorization and institutional
policy. The OAI uses the term "archive" in a broader sense: as a
repository for stored information. Language and terms are never
unambiguous and uncontroversial, and the OAI respectfully requests the
indulgence of the professional archiving community with this broader use
of "archive".
OAI Protocol for Metadata Harvesting (OAI-PMH)
OAI-PMH is a lightweight harvesting protocol for sharing metadata between
services.
Protocol
A protocol is a set of rules defining communication between systems. FTP
(File Transfer Protocol) and HTTP (Hypertext Transfer Protocol) are
examples of other protocols used for communication between systems across
the Internet.
Harvesting
In the OAI context, harvesting refers specifically to the gathering
together of metadata from a number of distributed repositories into a
combined data store.
Data Provider
A Data Provider maintains one or more repositories (web servers) that
support the OAI-PMH as a means of exposing metadata.
Service Provider
A Service Provider issues OAI-PMH requests to data providers and uses the
metadata as a basis for building value-added services. A Service
Provider in this manner is "harvesting" the metadata exposed by Data
Providers.
Aggregator
An OAI aggregator is both a Service Provider and a Data Provider. It is a
service that gathers metadata records from multiple Data Providers and
then makes those records available for gathering by others using the
OAI-PMH.
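The Data Provider / Service Provider split can be sketched as a simple loop: a
harvester keeps issuing requests until the provider stops returning a
resumptionToken (the protocol's paging mechanism). In the sketch below,
fake_data_provider is a stand-in for an HTTP GET against a real repository,
and the two XML "pages" are invented for the example:

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def fake_data_provider(token=None):
    """Stand-in for an HTTP request to a real data provider.
    Returns two 'pages'; the first carries a resumptionToken."""
    if token is None:
        return """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
          <ListIdentifiers>
            <header><identifier>oai:example.org:1</identifier></header>
            <resumptionToken>page-2</resumptionToken>
          </ListIdentifiers></OAI-PMH>"""
    return """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
      <ListIdentifiers>
        <header><identifier>oai:example.org:2</identifier></header>
      </ListIdentifiers></OAI-PMH>"""

def harvest(fetch):
    """Gather identifiers page by page until no resumptionToken remains."""
    collected, token = [], None
    while True:
        root = ET.fromstring(fetch(token))
        collected += [h.text for h in
                      root.findall(".//oai:header/oai:identifier", NS)]
        tok = root.find(".//oai:resumptionToken", NS)
        if tok is None or not (tok.text or "").strip():
            return collected
        token = tok.text.strip()

print(harvest(fake_data_provider))
```

An aggregator is simply this harvesting loop on one side and an OAI-PMH
response generator over the combined store on the other.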
Dublin Core
Dublin Core (DC) is a metadata format defined on the basis of
international consensus. The Dublin Core Metadata Element Set defines
fifteen elements for simple resource description and discovery, all of
which are recommended, and none of which are mandatory. DC has been
extended with further optional elements, element qualifiers and vocabulary
terms.
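Since every Dublin Core element is optional and repeatable, a record is just
whatever elements a describer has values for. The sketch below serializes a
few elements as XML; the values are invented for illustration, and a real
oai_dc record would wrap the elements in an oai_dc:dc container that is
omitted here:

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# A minimal Dublin Core description: all fifteen elements are optional
# and repeatable, so we use only the ones we have values for.
record = {
    "title": "Open Access Services, Sources and Standards",
    "creator": "Society for Information Science, Bangalore Chapter",
    "date": "2004-07-20",
    "type": "Text",
}

root = ET.Element("metadata")
for element, value in record.items():
    child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
    child.text = value

xml = ET.tostring(root, encoding="unicode")
print(xml)
```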
DCMI (Dublin Core Metadata Initiative)
The Dublin Core Metadata Initiative is an open forum engaged in the
development of interoperable online metadata standards that support a
broad range of purposes and business models. DCMI's activities include
consensus-driven working groups, global workshops, conferences, standards
liaison, and educational efforts to promote widespread acceptance of
metadata standards and practices.
E-print
An e-print is an author self-archived document. In the sense that the term
is ordinarily used, the content of an e-print is the result of scientific
or other scholarly research.
Interoperability
Interoperability is the ability of systems, services and organisations to
work together seamlessly toward common or diverse goals. In the technical
arena it is supported by open standards for communication between systems
and for description of resources and collections, among others.
Interoperability is considered here primarily in the context of resource
discovery and access.
Metadata
Structured information about resources (including both digital and
non-digital resources). Metadata can be used to help support a wide range
of operations on those resources. In the context of services based on
metadata harvested via OAI-PMH, the most common operation is discovery and
retrieval of resources.
-----------
Open Source, Open Standards
Karen Coyle
Information Technology and Libraries: 2002, 21(1), 33-37
(http://www.lita.org/ital/2101_coyle.html)
When people speak of open source software they are referring to computer
code - programs that run. But code is only the final step in the
information technology process. Prior to writing code the information
technology professional must do analysis to determine the nature of the
problem to be solved and the best way to solve it. When software projects
fail, the failure is more often than not attributable to shortcomings in
the planning and analysis phase rather than in the coding itself. Open
source software provides some particular challenges for planning since the
code itself will be worked on by different programmers and will evolve
over time. The success of an open source project will clearly depend on
the clarity of the shared vision of the goals of the software and some
strong definitions of basic functions and how they will work. This
all-important work of defining often takes place through standards and the
development of standards that everyone can use has become a movement in
itself: open standards.
Open standards are publicly available standards that anyone can
incorporate into their software. An example from the library environment
is the MARC record standard. The original documentation for the MARC
record was published by the American National Standards Institute.1 The
most common use of the standard, that of the MARC21 records that libraries
adhere to, is also published and available for use. No one owns the MARC
record format; there are no fees for its use and no restrictions on who
can use it in their products. Any software developer who wishes to write
for library systems therefore has access to a vital part of what such a
system needs: the basic data structure that libraries use today.
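To show what openness of that data structure buys a developer, here is a
hedged sketch of reading a record in the Z39.2/ISO 2709 layout that underlies
MARC: a 24-character leader, a directory of fixed-width entries, then the
field data. It is deliberately simplified (indicators and subfield codes are
ignored), and the sample record is handmade for the example:

```python
FT, RT = "\x1e", "\x1d"  # field and record terminators

def parse_marc(record):
    """Parse the directory of a Z39.2/ISO 2709-style record.
    Returns {tag: field_data}. Simplified: ignores indicators and
    subfield codes."""
    leader = record[:24]
    base = int(leader[12:17])                # base address of data
    directory = record[24:record.index(FT)]  # 12 chars per entry
    fields = {}
    for i in range(0, len(directory), 12):
        tag = directory[i:i+3]
        length = int(directory[i+3:i+7])
        start = int(directory[i+7:i+12])
        fields[tag] = record[base+start:base+start+length].rstrip(FT)
    return fields

# Handmade record: leader (length 66, base address 37), one directory
# entry for tag 245 (28 chars of data starting at offset 0), the data,
# and the record terminator. Leader filler positions are blanked out.
leader = "00066" + " " * 7 + "00037" + " " * 7
record = (leader + "245" + "0028" + "00000" + FT
          + "Open Source, Open Standards" + FT + RT)
print(parse_marc(record))
```

Because the layout is published and unencumbered, anyone can write a reader or
writer like this without a license from any vendor.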
This may seem so obvious that its importance is hard to grasp. In fact,
the library world has probably made more use of open standards than
practically any other industry. Let's face it, "open" is practically our
middle name. Examples from the non-open world of proprietary software
might help us understand the importance of our preference for open
standards, and the examples are not hard to find: Microsoft Windows versus
the Macintosh operating system; VHS versus Betamax; Nintendo versus Sega.
In each case you have unique products that are inherently incompatible. As
a matter of fact, this incompatibility is purposeful and actually enhanced
by the companies in question as part of their market strategy. If you need
to compete, then openness is a disadvantage. If you need to cooperate,
then openness is the way to go.
Goals of Open Standards
Open standards can serve multiple needs. The most common one is the need
for interoperability. Interoperability refers to communication between
systems or system parts. In the highly networked world of the twenty-first
century, the ability for computer systems to exchange data in order to
carry out basic functions is absolutely vital since most systems operate
in a vast and varied digital community. Our library systems communicate
electronically with sources of bibliographic records, book vendors, and
users. They also now interconnect themselves with networked information
resources outside of the library and deliver these through
library-maintained interfaces. Much of this communication is through
open-standard interfaces, such as Z39.50, Electronic Data Interchange
(EDI), and the Hypertext Transfer Protocol (HTTP).2 These standards operate
at the point where system boundaries touch; they determine the rules of
the digital membrane but do not determine how systems handle data up to
that point of permeability. Internally, few systems store bibliographic
data in the format prescribed by ANSI Z39.2, the basis for the MARC
record. But they are able to transform the data into that format for
communication with other systems.
Another purpose of open standards is to create the framework for a
community. In many ways this is the prime reason for many library
standards. The use of common cataloging rules does not so much allow
libraries to intercommunicate as it does create a certain look and feel
and a commonality between libraries that is an aid to users. It allows
users to move between libraries without having to learn a whole new
process for finding materials, and it makes it possible for the library
profession to train librarians and hire from among a pool of candidates.
The cataloging rules, published and readily available to anyone with the
desire and patience to learn them, contributed to the rise of professional
(rather than artisan) librarianship. Creating the rules brought members of
the library community together to ponder not only the vagaries of title
pages but also to confront some basic philosophical issues about the
organization of knowledge.
Today, in a world where many activities are performed through computer
programs, open standards can be promulgated as a way to encourage
decentralized development. Much of the work of the World Wide Web
Consortium (W3C) falls into this area. The W3C is a membership-sponsored
standards body that creates new standards for the Web. These standards can
be used by anyone writing software for the Web. What is critical about
many of these standards is that they set the foundation for entirely new
Web functions; functions that will only work if many different people
develop their part of the software that is needed.
This is rather hard to describe but should become clear with an example.
I'll use the recent development of the Platform for Privacy Preferences
(P3P).3 P3P is a set of rules that allows Web sites to describe their
privacy practices in a standard way. It would also allow Web users to
express their "privacy preferences" using the same standard vocabulary.
P3P does not specify how this will be implemented on the Web; the
development of actual software will be left to the rather amorphous Web
community. For P3P to be part of the Web, it will be necessary for Web
site owners to incorporate P3P into their sites, and for Web browsers to
create a user interface to the function. But for P3P to be successful, it
needs to be recognized by all major browsers (Internet Explorer, Netscape,
and AOL), and it must be used by a large number of Web sites. Since many
companies and institutions make use of software like FrontPage or Cold
Fusion to develop their large and complex Web sites, tools for building
P3P will need to be included in these packages. By specifying a standard
for privacy preferences, the W3C is attempting to set in motion a very
decentralized software development project that will need to be undertaken
by a wide variety of players.
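Schematically, the browser-side half of that bargain is a comparison of a
site's declared practices against a user's stated preferences. The sketch
below illustrates the idea only; the dictionaries stand in for the real P3P
and APPEL XML vocabularies, and all names and values are invented:

```python
# Schematic illustration only -- not the real P3P/APPEL XML syntax.
site_policy = {"purpose": {"admin", "develop"},
               "retention": "stated-purpose"}
user_prefs = {"forbidden_purposes": {"telemarketing"},
              "accepted_retention": {"no-retention", "stated-purpose"}}

def acceptable(policy, prefs):
    """A browser-side check: does the site's declared practice fall
    within what the user is willing to accept?"""
    # Reject if any declared purpose is on the user's forbidden list.
    if policy["purpose"] & prefs["forbidden_purposes"]:
        return False
    # Otherwise the retention practice must be one the user accepts.
    return policy["retention"] in prefs["accepted_retention"]

print(acceptable(site_policy, user_prefs))  # → True
```

The standard specifies only the shared vocabulary for the two sides of this
comparison; building the policies, the checkers, and the authoring tools is
exactly the decentralized work the W3C hoped to set in motion.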
Sort of Open versus Really Open
Although we speak about open standards, some are more open than others.
This is because there are a variety of aspects to open standards, and
standards that call themselves open do not always adhere to all of these.
Open standards are:
standards that anyone can use to develop software or functions;
standards in which anyone can participate in their development and
modification; and
standards that anyone can obtain without a significant price
barrier.
The best examples of standards that meet all of those criteria are those
created by the Internet Engineering Task Force (IETF). The IETF dates back
to pre-Internet days, when it was a group of engineers working on the
first developments that eventually became the basis for that network.
These engineers developed a way of chronicling and communicating their
technical ideas through a series of documents called Requests for Comments
or RFCs.4 The first RFCs were almost in the form of notices ("OK, I'm
going to send packets with a 5-byte header, let me know if you can read
them"), but as time went on the RFCs became well-thought-out standards
that had been developed by groups of volunteers. Anyone can comment on the
RFC, either to point out errors or to make suggestions. Even after the
technical decisions in the RFC are accepted and implemented, the RFC
remains an RFC. Some RFCs improve or comment on previous ones, as
technology changes or as better ideas arise.
The functioning of the IETF is like a lesson in democracy: one person or a
group of people sees the need for a new or modified function for the
Internet; they draft a proposal which is placed on the Internet for anyone
to read and comment on; if the proposed function meets a need and is
successfully tested with an actual program, it becomes part of Internet
use. The IETF is open to anyone who wishes to participate. That last
statement needs qualification, however: participation in the IETF requires
a high level of technical knowledge and a considerable amount of a
person's time. Those who make up the various IETF committees are a
self-selected technocracy. And while the philosophy of the IETF is one of
engineering "purity," today's committees invariably have members who
represent technology companies that often have a particular bias toward
their own products. Still, there is no other standards organization that
is as open as the IETF, and there is still considerable input from the
academic and research communities.
This can be compared to the W3C, the standards organization formed to
develop and promulgate standards for the Web. Participation in the W3C is
limited to members - predominantly technology companies - who pay between
$5,000 and $50,000 per year to belong to the group. Compared to the IETF,
this group is lacking the academic and research engineers who bring a
financially neutral viewpoint to the discussions. There are also almost no
members who might represent a public interest viewpoint. The latter point is
significant because the W3C does not limit itself to standards of
engineering; there is an effort called Technology and Society (within
which P3P was developed) that develops standards for functions like
content filtering and privacy.
There are a number of other standards bodies, such as the National
Information Standards Organization (NISO), the International Organization for
Standardization (ISO), and the American National Standards Institute (ANSI).
These organizations have members who participate directly in the
development of standards. The standards, once developed, are not only open
for use, but some of them are actually mandatory within certain
industries. Obtaining the actual text of the standards is, however,
another question.
Standards-making is an expensive enterprise and standards bodies have
traditionally made money on the sale of the printed form of their
standards. Since many companies and organizations would be required to
adhere to the standard, this provided a kind of guaranteed audience for
the standards documents, many of which carried rather hefty price tags.
The W3C, having arisen from the Internet community (and with the example
of the IETF preceding it) makes its standards available for open access on
the Web. In comparison, the document from ISO describing the Universal
Character Set, toward which all modern computing is moving, is priced at
about one hundred dollars. Although it isn't a huge price if viewed in
light of the research and development budget of a company, it does make it
difficult for small organizations, nonprofits, schools and libraries, and
individuals to make active use of the standards. Responding to these needs
and to the move toward greater openness in the standards area, in 2000
NISO became the only national standards organization making its standards
available over the Internet for free. There is some risk because this
removes a significant revenue stream from the organization. The gain is
that the organization should be even more successful in its primary
mission, which is that of providing standards for widespread use.
Open Standards and Libraries
The first of the library technology standards was the decision at the
first annual ALA meeting in September of 1877 to standardize the catalog
card at 7.5 x 12.5 cm.5 While this was intended to make mass production of
cards possible (and by analogy more standardized production of card
cabinets as well), the advantages of an open standard manifested
themselves when in 1898 the Library of Congress (LC) began its printed
card service. This was possible only because libraries in the United
States were using the same sized card and thus filing into cabinets that
held cards of that size. We can consider the LC card service the
technological predecessor of the MARC record service of the latter half of
the twentieth century. The card-size standard was its key to
interoperability.
The next technological standard of great interest was the computerization
of those same cards through the MARC record standard. Prior to the
development of what we now think of as MARC, a group of librarians led by
Henriette Avram of LC developed a machine-readable record format standard
for bibliographic data, ANSI Z39.2. This standard made use of other
national standards, such as the ASCII character set. Although at the time
only LC had the capability of producing the records (and the motivation to
do so), this is arguably the most significant technological development of
modern librarianship. By establishing an open standard for
machine-readable records, LC created the basis for the computerization of
library catalogs. That wasn't the intention in 1965 when Z39.2 was
proposed, however. LC was focused on automating its card production
services and creating a print-on-demand card service. Like Dewey's desire
to reduce the cost of card-stock production, the LC standard, because it
was open, was available to be used in ways that its creators had not yet
imagined.
Few library open standards have been as successful as the MARC standard.
After MARC, arguably the most widely used standard is Z39.50, the protocol
for information retrieval from remote databases. Z39.50 takes advantage of
the existence of searchable bibliographic databases in library automation
systems and the networking provided through the Internet. The protocol had
a somewhat slow beginning, partly due to its complexity, but today the
functionality is included in most library system packages and there are
even open source versions of the software.
Other standards have been less successful. One example is the Common
Command Language (CCL), Z39.58. CCL is a standard set of commands for
searching in online catalogs that was developed by NISO in 1992. When the
standard came up for its five-year NISO review the organization's members
allowed the standard to lapse. Although some systems claim to use a common
command language, these generally do not use the standard commands defined
in the NISO standard. So how did a standard become not a standard after
all?
The reason for creating a common command language was not unlike one of
the original motivations behind a standardized set of cataloging rules:
the uniformity between libraries makes it easier for users to move from
library to library. A common command language is especially important in
current times because users may be using a number of library systems
almost simultaneously over the Internet. Why would such a useful standard
fail? There are a number of reasons why standards might not be adopted.
One of the obvious ones in terms of the CCL standard is the fact that the
technology that the standard responded to, the command-line interface to
library databases, was eclipsed by a new technology, the Web browser.
Although some command-line searching remains, it is not the main user
interface. Another reason for the lack of adoption of the CCL is something
that gives standards development a tricky aspect: people seem less likely
to accept standards that affect the content aspects of their computer
systems. Successful standards tend to define background functions, and
leave a great deal of flexibility for system developers in terms of
presentation. For example, the protocols that control the Internet e-mail
function do not dictate how e-mail will be presented to the user.
Everything from the command-line Pine e-mail software to the almost
user-obsequious Microsoft Outlook product makes use of the same e-mail
protocols. Yet another reason is that standardizing the command line gains
you very little where the underlying indexes of the system are not
themselves standardized. The command line is merely the interface to a
much more complex set of decisions about what fields feed into what
indexes, and about how the data in those language-based fields is treated
for the purpose of searching.
The lesson here is that not all aspects of systems are ideal candidates
for normalization. Whether rational or whimsical, system developers
clearly express a need to have a certain amount of freedom. Standards need
to facilitate functionality without suppressing the creativity of system
developers or their ability to meet the needs of their particular target
audience. Standards work best in the underlying technology layers and less
well the closer one gets to the actual user.
Some library standards currently in development might fit this bill. For
example, the NISO Circulation Interchange Protocol (NCIP) standard for
interlibrary loan (ILL) is intended to facilitate interoperability between
library systems for ILL transactions.6 ILL is an obvious area where
communication between diverse systems is needed for automation of the
function.
Libraries don't live by library standards alone, however. Increasingly our
library systems are interacting with the wider world of technology,
delivering library services over public networks. We use mainstream
standards such as the Internet protocols developed by the IETF, the Web
protocols of the W3C, the character sets defined internationally by the
ISO. Library representatives were heavily involved in the latter effort,
having already participated in the development of a similar standard known
as Unicode. However, there is virtually no library participation in
organizations such as the IETF or W3C, even though the standards developed
by these organizations are vital to our operations. Not only are libraries
missing from the standards groups, so also are schools and nonprofit
organizations, which are kept out not only by the membership fees but also
by the labor requirements for active participation: the need to dedicate a
significant amount of time of a highly skilled technical worker to the
standards process.
While it is unlikely that individual libraries would be able to be active
in a standards organization, we now have a possible model for greater
library participation: in 2000, the American Library Association (ALA)
joined the Open eBook Forum (OEBF), an industry group working on e-book
standards. By leveraging the strength of ALA's membership, it has been
possible to spread the burden of participation while at the same time
providing a visible library presence in the standards process.
Conclusion
The Internet has given us an entirely new model for the cooperative
development of highly complex systems and subsequently of the standards
that allow those systems to work. Although the Internet has not lived up
to some of the Utopian promises of its early days, it still allows a low
entry barrier for active participation; so low in fact that individuals
can create their own Web sites right on the same network beside those of
major companies. We might not be able to give all of the credit to the
IETF and its example of open standards, but it is clear that open
standards are an essential element in the success of the Internet and its
widespread use. Continuing the open standards tradition will be essential
for its continued success.
References and Notes
1. National Information Standards Organization (U.S.), Information
Interchange Format (Bethesda, Md.: NISO Pr., 1994), National Information
Standards Series, ANSI/NISO Z39.2-1994.
2. National Information Standards Organization, Information Retrieval
(Z39.50): Application Service Definition and Protocol Specification
(Bethesda, Md.: NISO Pr., 1995).
3. Full documentation on P3P is available at www.w3.org/P3P. Accessed Oct.
2, 2001.
4. There are a number of sites that house searchable copies of the IETF
RFCs. The official IETF RFC site is www.ietf.org/rfc.html. Accessed Oct.
2, 2001.
5. Wayne A. Wiegand, Irrepressible Reformer: A Biography of Melvil Dewey
(Chicago: ALA, 1996): 53-54.
6. NISO Circulation Interchange Protocol is a draft standard, available
for review and testing. Accessed Oct. 2, 2001,
www.niso.org/committees/committee_at.html.