[Fwd: Re: [SI] Ann Okerson on institutional archives]
At 18:15 26/03/2005, Leslie Carr wrote:
On 26 Mar 2005, at 15:14, Franck Laloe wrote:
We now have good experience with this question at CCSD, since we have run an archive for the CNRS (a French research institution) for a few years. Actually, running an archive does not cost much: one salary is needed to pay someone to check that uploaded documents are suitable for the archive, and the cost of buying and maintaining the hardware is comparable or less.
What costs more, on the other hand, is writing new software. We constantly improve ours (it is now significantly different from ArXiv, although it remains compatible with it), and we pay three engineers for this. I would say that for a whole (medium-sized) country like France, a centralized system for all disciplines would cost about 10 salaries; this is of course an extremely small fraction of the country's research budget.
This is very interesting and important information. Would you be able to give an indication of the kinds of changes that you have had to build on the base software (I assume from your message that you began with arxiv)? With all of these systems, the devil (and the expense) is in the details, but the precise details differ from one situation to another. It would be a terrific insight to have an Institutional Repository costing data-point at the National end of the spectrum! --- Les Carr
Well, maybe I should first say that I was reasoning more in terms of the contribution of one country (France, for us) to international archives (or repositories; I do not know which word is best). Of course, if each institution in the country wants to have totally independent archives (even if they are compatible through OAI-PMH, for instance), the overall cost would be much higher. In my country there are many institutions (universities, research institutions, what we call "grandes écoles", etc.), and the danger of building an expensive Tower of Babel is real. The whole idea of CCSD is to offer a kind of national (or international) service to all institutions that want to set up "direct scientific communication" through open archives; CCSD develops and maintains the software, adapts it when special requirements arise, and will ensure long-term preservation (technical migrations, software and hardware). This is the general idea, with no special limit put at the borders of the country: any scientific institution in the world that wants to join is welcome, assuming sufficient scientific quality, of course.

Friends, especially friends in India:

Here is a very useful exchange. Can we in India think of a centralised archive, similar to the one run by CCSD in France, for all research councils and departments of the Central Government (CSIR, ICAR, DAE, Dept of Space, ICMR, etc.)? Would it be better in the long run than each individual laboratory having its own archive? I welcome your views.

Arun

---------------------------- Original Message ----------------------------
Subject: Re: [SI] Ann Okerson on institutional archives
From: "Stevan Harnad" <harnad@ecs.soton.ac.uk>
Date: Mon, March 28, 2005 8:14 pm
To: si@wsis-cs.org
Cc: "Leslie Carr" <lac@ecs.soton.ac.uk>
-------------------------------------------------------------------

I have to point out that the information from Franck Laloe about CNRS's HAL is correct and very helpful, but it risks being extremely misleading about the cost of distributed institutional archiving. Here are the pertinent points:

(1) France is unique in having a national research "mega-institution", the CNRS. It consists of CNRS researchers in just about all scholarly and scientific disciplines (not just those we call "science"), distributed all over the country, either in independent CNRS units or in CNRS units that are administratively associated with local universities.

(2) I am not sure what percentage of France's researchers and research output the CNRS comprises, but it is considerable, and if we add the three other CNRS-like national research institutes (INSERM in medicine, INRA in biology and INRIA in information/computer science, which are all collaborating with CNRS in self-archiving their research output in HAL), that covers the great majority of French research output.

(3) Because of this unified national mega-institution and mega-archive, France is in a position to take a huge step forward toward making 100% of French research output OA, thereby setting an example for the rest of the world. The total cost of this is very low, because of the economies of scale that come with having all national research output centralized in this way.

(4) Most important of all, because all four of these institutions are indeed institutions, with the status of employer (and, though I am not sure about this, I believe also the status of research funder in some cases), CNRS, INSERM, INRIA and INRA are in a position to adopt a unified self-archiving policy at the national level, and to ensure that the policy is implemented across the whole country, by just about all of its researchers, for just about all of French research output, all at once.

(5) Now the misinterpretation of all this:

(5a) Few if any other countries are in a position to adopt and implement a national self-archiving policy like this, spanning all disciplines. Their research output is local to their distributed universities and research institutions, and hence self-archiving policy must be distributed and local to those institutions.

(5b) The cost of self-archiving *per local institution* (which is what Les, I, and others who have actually implemented such local archives have said it is: about a $2000 server, plus a few days of one-time sysadmin time for start-up and a few days a year of sysadmin time for maintenance) is far, far lower than the cost of national, central archiving (which is itself quite low). It may be that the national sum of the local costs of institutional self-archiving across all the universities in a country comparable in size to France will be somewhat higher than the price of France's single national archive, HAL, but this national sum is *meaningless* in countries that have no such national structure! It is like summing the library book acquisition costs of each of the universities in a country and comparing them to central costs: there is no national "pocket" out of which all those local library acquisition budgets come, just as there is no national pocket for the sum of institutions' computer, network, telephone or research travel costs. Such a comparison only makes sense for a country with centralized research, like France.

(5c) HAL, though an excellent and no doubt robust and highly functional national self-archiving system, happens to be modelled on the properties of the Physics Arxiv. This is all fine, but rather arbitrary when making comparisons with distributed local institutional self-archiving: there is no reason whatsoever why local institutions need to adopt either the particular properties of Arxiv or the strategies of a centralized national archive. The only thing these local university archives need to ensure is that they are OAI-interoperable. The rest of HAL's properties are merely further specific choices that have been made (many, no doubt, based on a priori guesses rather than concrete experience or empirical study of optimality) in the special case of HAL and CCSD.

(5d) Franck Laloe's guess that OAI-interoperability is not enough (to forestall a 'Tower of Babel') is precisely that: an a priori guess. It has not been tested, and all the a posteriori evidence to date, from actual distributed university archives, is that the guess is simply incorrect: what archives need is not more functionality (whether Arxiv-like, HAL-like, or otherwise) but *more content*. Missing archive content is the only thing standing between the research world and 100% OA.

(5e) The only systematic analysis done so far comparing the merits of central, national self-archiving and distributed institutional self-archiving has come out very strongly in favour of distributed institutional self-archiving, followed by central *harvesting* and (if desired) metadata enhancement. A primary reason given was the existing research culture of independent research universities and institutions, which is local, not centralized or national: CNRS and France are a prominent exception in this regard (and hence were not considered in this study). One of the secondary reasons was cost.

Swan, Alma and Needham, Paul and Probets, Steve and Muir, Adrienne and O'Brien, Ann and Oppenheim, Charles and Hardy, Rachel and Rowland, Fytton (2005) Delivery, Management and Access Model for E-prints and Open Access Journals within Further and Higher Education. JISC Report. http://cogprints.org/4122/

Swan, Alma and Needham, Paul and Probets, Steve and Muir, Adrienne and Oppenheim, Charles and O'Brien, Ann and Hardy, Rachel and Rowland, Fytton and Brown, Sheridan (2005) Developing a model for e-prints and open access journal content in UK further and higher education. Learned Publishing. http://cogprints.org/4120/

So, in summary, the special case of CNRS+, HAL and France is a great asset to world OA: it accelerates French OA provision substantially, at a national and central level, in a way not possible in any other country, and it sets a splendid example (of systematic self-archiving policy) that will encourage the rest of the world's research institutions to self-archive too. But please, having already lost so much time in reaching 100% OA because of so many other misunderstandings, let us not now lose still more time by over-focusing on the local particulars of France's centralized research institutions, as these cannot be generalized literally to other countries lacking such centralized institutions. Even less should we focus on the special Arxiv and other features HAL has elected to incorporate, or, indeed, on the cost of HAL: the Arxiv features and their extensions are not essential (nor even necessarily optimal!), OAI-interoperability is enough, and the costs of a national centralized archive have no basis for comparison with those of countries that distribute their research across independent universities and research institutions. What is essential is more content, *not* more functionality!

The take-home message from France is that 100% self-archiving is desirable and feasible, but the details (central-institutional vs. distributed-institutional, HAL's specific special features, and their cost) are, as they say in hexagonese, « des précisions inutiles » (useless details). The principle of adopting and implementing institutional self-archiving policies for 100% of research output is what the rest of the world should be taking to heart from France's splendid example and initiative.

Best wishes,
Stevan

Stevan Harnad

On Mon, 28 Mar 2005, Franck Laloe wrote:
The database where the articles are stored is a single database, with homogeneous metadata. But our technique allows institutions to create personalized environments, with their own texts, logos, screen layouts, and even additional metadata if useful. Everyone can access the general system (submission and consultation) either through a generic interface or through a personalized, institution-specific interface that selects only the articles belonging to that institution. Institutions that want a mirror or backup of all their data on a computer they own may do so, if for some reason they do not trust CCSD to keep their material.
I should add that it was agreed with our American friends who run ArXiv that every document collected by CCSD that belongs to one of ArXiv's scientific categories is automatically transferred to ArXiv. This works pretty well, and gives more visibility to the articles we collect. But we also collect articles in history, education, linguistics, etc., which do not go to ArXiv for obvious reasons.
In practice, of course, there is still a long way to go before we collect all the scientific production of the country. CNRS is the largest research institution in France, and, roughly speaking, half of its scientific departments strongly support CCSD by asking their researchers to deposit; there is good hope of including more departments soon. We now have an agreement with another French research institution, INRIA, which will expand the impact of the system. Negotiations with other scientific institutions are under way. Just a figure to give an idea: in 2004 we collected 1,500 thesis files, i.e. about 10% of the national production. My hope is to reach about 50% in two or three years, but this is only an extrapolation for the moment. And our main goal is not limited to theses; it includes all kinds of scientific documents (articles, conference proceedings, etc.).
Now, at last, the answer to your questions! No, we did not start from the ArXiv software, and Paul Ginsparg and colleagues did not advise us to do so when we started in 2000, for good reasons: ArXiv is almost 15 years old, and techniques have changed since. Our software, which we call HAL (as in the crazy computer in the movie!), does many things that ArXiv does not: as I said above, it allows personalized environments, includes the notions of "stamps" and "collections", can extract publication lists, etc. It constantly evolves under the pressure of various demands, and this is why we need three engineers at CCSD.
This has been a long message, so I will stop here! But please do not hesitate to ask if you would like to know more. Concerning the cost of CCSD, it is easy to calculate: salaries for three engineers (four if you include Marco and me, two part-time physicists), offices, usual expenses, and computers and servers (but these do not cost much, unless you count backup procedures, which can be expensive if they require a high level of security).
best wishes
Franck Laloë
Franck Laloë, LKB, Dept de physique de l'ENS, 24 rue Lhomond, F 75005 Paris (France)
tel. and fax 33 (1) 47 07 54 13 -- laloe@ens.fr
_______________________________________________ SI mailing list SI@wsis-cs.org http://mailman.greennet.org.uk/mailman/listinfo/si
participants (1)
Subbiah Arunachalam