At 18:15 26/03/2005, Leslie Carr wrote:
On 26 Mar 2005, at 15:14, Franck Laloe wrote:
We now have good experience of this question at CCSD, since we have
run an archive for the CNRS (a French research institution) for a few
years. Actually, the cost of running an archive is not much; one
salary is needed to pay someone to check that the documents which are
uploaded are OK for the archive; the price of buying and
maintaining the hardware is comparable or less.
What costs more money, on the other hand, is writing new software. We
constantly improve ours (it is now significantly different from
ArXiv, although it remains compatible with it), and we pay three
engineers for this. I would say that for a whole (medium-sized)
country like France, a centralized system for all disciplines would
cost about 10 salaries; this is of course an extremely small fraction
of the research budget of the country.
This is very interesting and important information. Would you be able
to give an indication of the kinds of changes that you have had to
build on the base software (I assume from your message that you began
with arxiv)? With all of these systems, the devil (and the expense) is
in the details, but the precise details differ from one situation to
another. It would be a terrific insight to have an Institutional
Repository costing data-point at the National end of the spectrum!
---
Les Carr
Well, maybe I should first say that I was reasoning more in terms of the
contribution of one country (France for us) to international archives
(or repositories, I do not know which word is best). Of course, if each
institution in the country wants to have totally independent archives
(even if compatible through OAI-PMH for instance), the overall cost
would be much higher. In my country there are many institutions (we
have universities, research institutions, what we call "grandes
écoles", etc.), and the danger of building an expensive Tower of Babel is
real. The whole idea of CCSD is to offer a kind of national (or
international) service to all institutions that want to set up "direct
scientific communication" through open archives; CCSD develops the
software and maintains it, adapts it when special requirements arise,
and will ensure long-term preservation (technical migrations,
software and hardware). This is the general idea, with no special
Friends, especially friends in India:
Here is a very useful exchange. Can we in India think of a centralised
archive similar to the one run by CCSD in France for all research councils
and departments of the Central Government (CSIR, ICAR, DAE, Dept of Space,
ICMR, etc.)? Will it be better than each individual laboratory having its
own archive in the long run? I welcome your views.
Arun
---------------------------- Original Message ----------------------------
Subject: Re: [SI] Ann Okerson on institutional archives
From: "Stevan Harnad"
Date: Mon, March 28, 2005 8:14 pm
To: si@wsis-cs.org
Cc: "Leslie Carr"
-------------------------------------------------------------------
I have to point out that the information from Franck Laloe about CNRS's
HAL is correct and very helpful but risks being extremely misleading about
the cost of distributed institutional archiving. Here are the pertinent
points:
(1) France is unique in having a national research "mega-institution", the
CNRS. This consists of CNRS researchers in just about all scholarly and
scientific disciplines (not just those we call "science") distributed all
over the country, either in independent CNRS units or in CNRS units that
are administratively associated with local universities.
(2) I am not sure what percentage of the researchers and research output
of France the CNRS comprises, but it is considerable, and if we add in the
three other CNRS-like national research institutes (INSERM in medicine,
INRA in biology and INRIA in information/computer science, which are all
collaborating with CNRS in self-archiving their research output in HAL),
that covers the great majority of French research output.
(3) Because of this unified national mega-institution and mega-archive,
France is in a position to take a huge step forward toward making 100% of
French research output OA, thereby setting an example for the rest of the
world. The total cost of this is very low, because of the economies of
scale that come with having all national research output centralized in
this way.
(4) Most important of all, because all four of these institutions are
indeed institutions, with the status of employer (and, I am not sure about
this, but I believe also the status of research funder in some cases),
CNRS, INSERM, INRIA and INRA are in a position to adopt a unified
self-archiving policy at a national level, and to ensure that the policy
is implemented in the whole country, by just about all of its researchers,
for just about all of French research output, all at once.
(5) Now the misinterpretation of all this:
(5a) Few if any other countries are in a position to adopt and
implement a national self-archiving policy like this, distributed
across all disciplines. Their research output is local to their
distributed universities and research institutions, and hence
self-archiving policy must be distributed and local to those
institutions.
(5b) The cost of self-archiving *per local institution* (which is what
I and Les, and others who have actually implemented such local
archives said it was: about a $2000 server plus a few days one-time
sysad time for start-up and a few days a year sysad time for
maintenance) is far, far lower than the cost of national, central
archiving (which is itself quite low). It may be that the national sum
of the local costs of the institutional self-archiving across all
local universities in a country comparable to the size of France will
be somewhat higher than the price of France's single national archive,
HAL, but this national sum is *meaningless* in countries that have no
such national structure! It is like summing the library book
acquisition costs for each of the universities in a country and
comparing them to central costs: There is no national "pocket" out of
which all those local library acquisition budgets come, just as there
is no national pocket for the sum of institutions' computer, network,
telephone or research travel costs. Such a comparison only makes
sense for a country with centralized research, like France.
(5c) HAL, though an excellent and no doubt robust and highly
functional national research self-archiving system, happens to be
modelled on the properties of the Physics Arxiv. This is all fine, but
rather arbitrary, in making comparisons with distributed local
institutional self-archiving: There is no reason whatsoever why local
institutions need to adopt either the particular properties of Arxiv
or the strategies of a centralized national archive. The only thing
that these local university archives need to ensure is that they are
OAI-interoperable. The rest of the properties of HAL are merely
specific further choices that have been made (many no doubt based on a
priori guesses, not concrete experience or empirical study of
optimality) in the special case of HAL and CCSD.
(5d) Franck Laloe's guess that OAI-interoperability is not enough (to
forestall a 'Tower of Babel') is precisely that -- an a-priori guess.
It has not been tested; all the a-posteriori evidence to date, from
actual distributed university archives, is that the guess is simply
incorrect: that what archives need is not more functionality (whether
arxiv-like functionality, HAL-like
functionality, or otherwise) but *more contents*. Archive content is
the only thing standing between the research world and 100% OA.
(5e) The only systematic analysis that has been done, comparing the
merits of central, national self-archiving and distributed
institutional self-archiving has come out very strongly in favour of
distributed institutional self-archiving -- followed by central
*harvesting* and (if desired) metadata enhancement. A primary reason
given was the existing research culture of independent research
universities and institutions, which is local, not centralized or
national: CNRS and France are a prominent exception in this regard
(and hence not considered in this study). One of the secondary reasons
was cost.
Swan, Alma and Needham, Paul and Probets, Steve and Muir,
Adrienne and O'Brien, Ann and Oppenheim, Charles and Hardy, Rachel
and Rowland, Fytton (2005) Delivery, Management and Access Model
for E-prints and Open Access Journals within Further and Higher
Education. JISC Report.
http://cogprints.org/4122/
Swan, Alma and Needham, Paul and Probets, Steve and Muir, Adrienne
and Oppenheim, Charles and O'Brien, Ann and Hardy, Rachel and
Rowland, Fytton and Brown, Sheridan (2005) Developing a model for
e-prints and open access journal content in UK further and higher
education. Learned Publishing.
http://cogprints.org/4120/
So, in summary, the special case of CNRS+, HAL and France is a great asset
to world OA, accelerating French OA provision substantially, in a way not
possible in any other country, at a national and central level, and
setting a splendid example (of systematic self-archiving policy) that will
encourage the rest of the world's research institutions to self-archive
too.
But please, having already lost so much time in reaching 100% OA because
of so many other misunderstandings, let us not now lose still more time in
over-focusing on the local particulars of France's centralized research
institutions, as these cannot be generalized literally to other countries
lacking such centralized institutions. Even less should we focus on the
special Arxiv- and other features HAL has elected to incorporate, or,
indeed, the cost of HAL: The Arxiv features and their extensions are not
essential (nor even necessarily optimal!) ones, OAI-interoperability is
enough, and the costs of a national centralized archive have no basis for
comparison with countries that distribute their research across
independent universities and research institutions. What is essential is
more content, *not* more functionality!
The take-home message from France is that 100% self-archiving is desirable
and feasible -- but the details (central-institutional vs.
distributed-institutional, HAL's specific special features, and their
cost) are, as they say in hexagonese: << des précisions inutiles >>
(useless details). The principle of
adopting and implementing institutional self-archiving policies for 100%
of research output is what the rest of the world should be taking to heart
from France's splendid example and initiative.
Best wishes,
Stevan
Stevan Harnad
On Mon, 28 Mar 2005, Franck Laloe wrote:
limit put at the borders of the country: if any scientific institution
in the world wants to join, they are welcome, assuming a sufficient
scientific quality, of course.
The data base where the articles are stored is a single base, with
homogeneous metadata. But our technique allows institutions to create
personalized environments, with their own texts, logos, screen layout,
and even with additional metadata if useful. Everyone can have access to
the general system (submission and consultation) either through a
generic interface, or through a personalized interface that is
institution-dependent and selects only the articles belonging to the
institution. Institutions which want to have a mirror or backup of all
their data on a computer they own may do so, if for some reason they do
not trust CCSD for keeping their material.
I should add that it was agreed with our American friends who run ArXiv
that every document that is collected by CCSD and belongs to one of the
scientific categories of ArXiv will automatically be transferred to
ArXiv. This works pretty well, and ensures more visibility for the
articles we collect. But we also collect articles in history,
education, linguistics, etc., which do not go to ArXiv for obvious
reasons.
In practice, of course, there is still a long way to go before we
collect all scientific production of the country. CNRS is the largest
research institution in France, and roughly speaking half of the
scientific departments strongly support CCSD by asking their people to
deposit; there is good hope of including more departments soon. We now
have an agreement with another research institution in France, INRIA, so
this will expand the impact of the system. Negotiations with other
scientific institutions are underway. Just a figure to give an idea: in
2004 we collected 1,500 thesis files, i.e. about 10% of the
national production. My hope is to be at about 50% in two or three
years, but this is only an extrapolation for the moment. And our main
goal is not limited to theses; it includes all kinds of scientific
documents (articles, conference proceedings, etc.).
Now, at last, the answer to your questions! No, we did not start from the
ArXiv software, and actually were not advised by Paul Ginsparg and
colleagues to do so when we started in 2000, for good reasons. ArXiv is
almost 15 years old, and techniques have changed since. Our software, which
we call Hal (like the crazy computer in the movie!), does many things that
ArXiv does not do: as I said above, it allows a personalization of
environments, contains the notion of "stamps" and of "collections", can
extract lists of publications, etc.. It constantly evolves under the
pressure of various demands, and this is why we need three engineers at
CCSD.
This has been a long message, I will stop here! But please do not
hesitate to ask if you wish to know more. Concerning the cost of CCSD,
it is easy to calculate: salaries for three engineers (count 4 if you
count Marco and me, two part time physicists), offices, usual expenses,
computers and servers (but this is not much, except if you count backup
procedures which can be expensive if they are at a high level of
security).
best wishes
Franck Laloë
Franck Laloë, LKB, Dept de physique de l'ENS, 24 rue Lhomond, F 75005
Paris (France)
tel et fax 33 (1) 47 07 54 13 -- laloe@ens.fr
_______________________________________________
SI mailing list
SI@wsis-cs.org
http://mailman.greennet.org.uk/mailman/listinfo/si