This is a reply to Mr. Prafulla's mail.
Well, the images will be there in the book. The books are scanned and stored in TIFF format, and OCR is then run to create the text. You may like to look at http://www.dli.ernet.in/, where you can find the TIFF, HTML and text versions of each book. I am not sure why the archive is being called text-based. The reason may be that they also create a text version and allow searching inside the book using the index created by the OCR software. However, there is some ambiguity. The project document says that keyword search is allowed but that documents will be delivered only in image format; the text will be used only for creating the index and for searching. The Digital Library of India website, however, does not allow keyword search (only title and author search, with some browse options) and gives the full document in image as well as text format.
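To illustrate the design described in the project document (search over the OCR text, delivery as page images), here is a minimal Python sketch. It is only an illustration under my own assumptions, not the project's actual software: the sample page texts, the TIFF file names and the function names build_index and search are all hypothetical.

```python
import re
from collections import defaultdict


def build_index(ocr_pages):
    """Map each lowercased word to the set of page numbers whose OCR text contains it."""
    index = defaultdict(set)
    for page_no, text in ocr_pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_no)
    return index


def search(index, keyword, image_for_page):
    """Return the image files of the pages matching the keyword.

    The OCR text is used only for the lookup; it is never returned.
    """
    pages = sorted(index.get(keyword.lower(), ()))
    return [image_for_page[p] for p in pages]


# Hypothetical sample data: OCR text per page and the matching scanned images.
ocr_pages = {
    1: "Digital Library of India",
    2: "Books are scanned in TIFF format",
    3: "The library holds scanned books",
}
image_for_page = {1: "00000001.tif", 2: "00000002.tif", 3: "00000003.tif"}

index = build_index(ocr_pages)
print(search(index, "library", image_for_page))
```

A keyword lookup here returns only the names of the page images, which is the behaviour the project document describes; a site that also serves the text version would simply expose ocr_pages as well.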
I saw this scanning work almost two and a half years ago at the Digital Library of IISc, where they were using a Minolta overhead book scanner, which can scan 10,000 pages a day in 3 shifts. Because it is an overhead scanner, there is no need to open the binding of the book. After scanning, they were doing OCR (probably using ABBYY FineReader software). Someone connected with the project may be able to throw more light on this.
For more details about the project, you may like to see the following pages: http://www.library.cmu.edu/Libraries/MBP_FAQ.html and http://www.ulib.org/html/index.html
Thanks and Regards
Madhuresh Singhal
Aurigene Discovery Technologies Limited,
Electronic City, phase II, Hosur Road,
Bangalore 560100
Phone 28521314-16 Ext.- 422
Mobile 98861 82822
E-mail: madhureshsinghal@yahoo.com
http://nettalk2.tripod.com/
-----Original Message-----
From: Prafulla Chandra [mailto:prafulla@nisc.co.in]
Sent: Thursday, December 30, 2004 2:49 AM
To: lis-forum@ncsi.iisc.ernet.in; Madhuresh
Subject: Spam:Re: [nmlis] Internet Archive to build alternative to Google
Dear Friends,
This is with reference to Mr Madhuresh's email reproduced below.
What about the images contained in the digitised books? Are they going to be excluded or included? What does the phrase 'text-based' in the following extract mean? Can we search the full text of the digitised books by any keyword?
Ten major international libraries have agreed to combine their digitised book collections into a free text-based archive hosted online by the not-for-profit Internet Archive. All content digitised and held in the text archive will be freely available to online users.
Thank you
Sincerely
T.V. Prafulla Chandra
Senior Editor
From: "Madhuresh"
Date sent: Thu, 30 Dec 2004 14:36:28 +0530
Subject: [nmlis] Internet Archive to build alternative to Google
Dear Friends,
This announcement didn't get much attention in comparison to Google's
announcement. Internet Archive is hosting the digitized book collections
of 10 international libraries, including the library of the Indian Institute
of Science, Bangalore. See the news item below, taken from
http://www.iwr.co.uk/IWR/1160176
Internet Archive to build alternative to Google
Ten international libraries agree to add digitised book collections to
not-for-profit Internet Archive's new Text Archive project
By Mark Chillingworth [21-12-2004]
Ten major international libraries have agreed to combine their digitised
book collections into a free text-based archive hosted online by the
not-for-profit Internet Archive. All content digitised and held in the
text archive will be freely available to online users.
Two major US libraries have agreed to join the scheme: Carnegie Mellon
University library and The Library of Congress have committed their
Million Book Project and American Memory Projects, respectively, to the
text archive. The projects both provide access to digitised collections.
The Canadian universities of Toronto, Ottawa and McMaster have agreed to
add their collections, as have China's Zhejiang University, the Indian
Institute of Science, the European Archives and Bibliotheca Alexandrina
in Egypt.
In a statement, the Internet Archive describes the Text Archive as an
Open Access archive that will "ensure permanent and public access to our
published heritage". Over a million books have been committed to the
Text Archive by the member institutes, with 50,000 available in the
first quarter of 2005.
The San Francisco-based Internet Archive was founded in 1996 to build a
library for the internet that offered access to historical collections.
Its best-known online project is the Wayback Machine, which
harvests snapshots of freely available websites.
Announced 24 hours after Google's tie-up with the university libraries
of Oxford, Stanford, Michigan and Harvard, and the New York Public
Library, the Internet Archive project is likely to be seen as the first
of many alternatives to the Google Print library.
Internet Archive said: "Commercial companies are currently working with
libraries to digitise materials as well. We are encouraging these
efforts and hope most of these materials will also be available through
Text Archives."
Madhuresh Singhal
---------------------------------------------------------------------------------------------------------
NISC Export Services Pvt. Ltd.
(an affiliate of NISC International, Inc. USA)
S-1 Ballad Estates, St.Ann's School Road, Tarnaka, Hyderabad 500 017
Andhra Pradesh, India - Tel:+91 40 27001517 Tel/Fax:+91 40 27002538
WWW.NISC.COM
A company in service to NISC worldwide.
---------------------------------------------------------------------------------------------------------