RE: Spam:Re: [nmlis] Internet Archive to build alternative to Google
This is in reply to Mr. Prafulla's mail.

The images will be there in the book. The books are being scanned and stored in TIFF format, and OCR is being done to create the text. You may like to look at http://www.dli.ernet.in/ where you can find the TIFF, HTML and text versions of the books.

I am not sure why it is being called "text-based". The reason may be that they are also creating the text format and allowing search inside the book using the index created by the OCR software. However, there is a little ambiguity. The project document says that keyword search is allowed but the document will be delivered only in image format; the text will be used only for creating the index and for searching. But the Digital Library of India website does not allow keyword search (only Title and Author search, with some browse options) and gives the full document in image as well as text format.

I saw this scanning work almost two and a half years ago at the Digital Library of IISc. There they were using a Minolta overhead book scanner, which can scan 10,000 pages a day in 3 shifts. Because it is an overhead scanner, there is no need to open the binding of the book. After scanning, they were doing OCR (probably using ABBYY FineReader software).

Someone related to the project may throw more light on this. You may like to see the following pages for more details about the project:
http://www.library.cmu.edu/Libraries/MBP_FAQ.html and http://www.ulib.org/html/index.html

Thanks and Regards
Madhuresh Singhal
Aurigene Discovery Technologies Limited,
Electronic City, phase II, Hosur Road, Bangalore 560100
Phone 28521314-16 Ext.- 422
Mobile 98861 82822
E-mail: madhureshsinghal@yahoo.com
http://nettalk2.tripod.com/

-----Original Message-----
From: Prafulla Chandra [mailto:prafulla@nisc.co.in]
Sent: Thursday, December 30, 2004 2:49 AM
To: lis-forum@ncsi.iisc.ernet.in; Madhuresh
Subject: Spam:Re: [nmlis] Internet Archive to build alternative to Google

Dear Friends,

This has reference to Mr Madhuresh's email reproduced below. How about the images contained in the books digitised? Are they going to be excluded or included? What does the phrase 'text-based' in the following extract mean? Can we search full-text of the books digitised by any key word?

Ten major international libraries have agreed to combine their digitised book collections into a free text-based archive hosted online by the not-for-profit Internet Archive. All content digitised and held in the text archive will be freely available to online users.

Thank you
Sincerely
T.V. Prafulla Chandra
Senior Editor

To: <lis-forum@ncsi.iisc.ernet.in>, <nmlis@yahoogroups.com>, <india-lis@infoserv.inist.fr>
From: "Madhuresh" <madhuresh_s@aurigene.com>
Date sent: Thu, 30 Dec 2004 14:36:28 +0530
Subject: [nmlis] Internet Archive to build alternative to Google

Dear Friends,

This announcement didn't get much attention in comparison to Google's announcement. Internet Archive is hosting the digitized book collections of 10 international libraries, including the library of the Indian Institute of Science, Bangalore.
See below news item taken from http://www.iwr.co.uk/IWR/1160176

Internet Archive to build alternative to Google
Ten international libraries agree to add digitised book collections to not-for-profit Internet Archive's new Text Archive project
By Mark Chillingworth [21-12-2004]

Ten major international libraries have agreed to combine their digitised book collections into a free text-based archive hosted online by the not-for-profit Internet Archive. All content digitised and held in the text archive will be freely available to online users.

Two major US libraries have agreed to join the scheme: Carnegie Mellon University library and The Library of Congress have committed their Million Book Project and American Memory Projects, respectively, to the text archive. The projects both provide access to digitised collections.

The Canadian universities of Toronto, Ottawa and McMaster have agreed to add their collections, as have China's Zhejiang University, the Indian Institute of Science, the European Archives and Bibliotheca Alexandrina in Egypt.

In a statement, the Internet Archive describes the Text Archive as an Open Access archive that will "ensure permanent and public access to our published heritage". Over a million books have been committed to the Text Archive by the member institutes, with 50,000 available in the first quarter of 2005.

The San Francisco-based Internet Archive was founded in 1996 to build a library for the internet that offered access to historical collections. Its most well-known online project is the Wayback Machine, which harvests snapshots of freely-available websites.

Announced 24 hours after Google's tie-up with the university libraries of Oxford, Stanford, Michigan and Harvard, and the New York Public Library, the Internet Archive project is likely to be seen as the first of many alternatives to the Google Print library.

Internet Archive said: "Commercial companies are currently working with libraries to digitise materials as well.
We are encouraging these efforts and hope most of these materials will also be available through Text Archives."

Madhuresh Singhal

NISC Export Services Pvt. Ltd. (an affiliate of NISC International, Inc. USA)
S-1 Ballad Estates, St.Ann's School Road, Tarnaka, Hyderabad 500 017
Andhra Pradesh, India - Tel:+91 40 27001517 Tel/Fax:+91 40 27002538
WWW.NISC.COM
A company in service to NISC worldwide.