[LIS-Forum] FW: Google- the book scanning project- Microsoft of the
future other issues- the scenario gets murkier
Shalini R Urs
shalini at vidyanidhi.org.in
Wed Mar 2 16:24:47 IST 2005
The Google Library book scanning project has raised many issues in the
world of information.
Please read on the msg below for some more light (or is more dust in the
It makes an interesting read anyways -
From: owner-ipr at mailhost.soros.org [mailto:owner-ipr at mailhost.soros.org]
On Behalf Of Darius Cuplinskas
Sent: 01 March 2005 14:02
To: ipr & public domain list
Subject: [ipr] How Google will scan the world, 1 book at a time
This project, although valuable, will not provide full-text access, and
will entail severe restrictions even on "fair use" permitted by
"You can't print the book out or digitally cut and paste any of it,
even for this, an out-of-copyright volume. The restrictions will be more
severe on in-copyright volumes, essentially those printed after 1920,
limiting searchers to seeing a line or two around their search term."
Brewster Kahle of the Internet Archive is bringing together a group of
libraries to organize a rival project which would allow full-text access
and put fewer restrictions on in-copyright materials.
-------- Original Message --------
How Google will scan the world, 1 book at a time
February 25, 2005
As Google prepares to create the world's most comprehensive digital
library, it's getting harder not to think of the company as the next
Microsoft, morphing from a friendly Internet helper with a cutesy name
into an awesome and inescapable force of digital nature.
Already the dominant search engine, the California technology company
is testing Gmail, a free e-mail service that's likely to be a
blockbuster, Google Maps, a rival to Mapquest, and Google Desktop, a
function that allows users to search within their computers much more
quickly than Microsoft Windows does.
And then there's the small matter of accumulating vast chunks of
humankind's recorded knowledge. Even as you read this, Google scanners
are busy making bits and bytes out of books from the Harvard,
Stanford, Oxford and University of Michigan libraries, as well as from
the New York Public Library, a project that ultimately could total more than
57 million volumes.
The company's goal in all it does, it says, is "to organize the
world's information and make it universally accessible and useful" (along
with, you know, selling ads alongside the world's information). But the
goal seems to be ubiquity, and as Bill Gates has learned, when you
become unavoidable, you also become resented.
For now, though, Google remains mostly well-liked because the core of
its business, the search engine, is so good. And the book-scanning
project it's taking on seems more altruistic than not because Google
is bearing the enormous (and, of course, unspecified) cost of copying
books into the digital world.
"There are [other] people who had that vision," says Sidney Verba,
director of the Harvard University Library. "What Google had was the
vision and the ambition, the technical skill. ..."
Shifting to a whisper, he adds the most important factor: "a lot of
money -- in some sense I think more than they knew what to do with.
... For these kinds of people to invest millions and millions of dollars
in it, it is a good exploitation of the profit sector for public
The University of Michigan says its pre-Google digital collection of
21,000 volumes is among the country's "most ambitious." But making it
-- actually placing the books, page by page, onto scanners and then
making sure the result is clean and accurate -- is very slow, hard work.
"At its current rate of digital production," the school explains in a
press release, "it would take the university more than a thousand
years to digitize the 7 million volumes in the collection. Google plans
to do the job in a matter of years."
Beyond vague talk about Google having developed a much more efficient
process, the project's specifics are secret. At Harvard, for instance,
Google won't allow reporters to visit or photograph the scanning
currently being done -- of 40,000 volumes as a kind of pilot project,
just to make sure the books don't get damaged or lost -- at the
university library's 5-million-volume off-campus storage facility.
But the aims seem transparent enough. It will bring to the masses
these great research institutions, full of books one would normally need
a plane ride and permission to access, and make them as easy to search
for and within as a particular city's restaurant listings.
"The company as a whole has been really excited about it," says Susan
Wojcicki, Google's director of product management, in part because it
relates to the company's roots. "The founders were working on a
library digitization project when they wound up creating a search engine
that today is called Google.
"We're just really excited to be working with these institutions as
well. A lot of them have been around for hundreds of years," adds
Wojcicki. Google's age: 6 1/2.
Libraries will get copies
The books will be gathered by Google in the Google Print company
subdivision, known previously for trying to get publishers to put
their current books in print up for public perusal. The libraries will
get their own copies of their texts turned binary.
"We're very anxious to make sure this is of real service to our own
users and a public good, as well," Verba says. "We're very sensitive
to not having somebody come back and say, `Look, you've just turned over
to a monopoly something that should belong to the world.'"
The project, almost everybody agreed at its December announcement,
holds enormous promise. Scholars will be able to learn, at the press
of a button, which books have discussed Francis Bacon, for instance, and
in what manner. Journalists and bloggers can add books to their research
repertoire; previously, they have used mostly other journalism, the
already-digitized and quickly searchable record of newspapers,
magazines and some television shows.
"There'll be more of that contextual information which you'll be able
to get more readily," says Harvard's Verba. "How that'll change the
way people think, I don't know, but it's really exciting.
"It will give us this huge digitized file of our books to do things
with that we don't know yet what they are. Searching text is really a
thing that's on the frontier. As it evolves, we will have a text to
search. We are thinking of it as a very valuable resource for things
that are not 100 percent clear."
Ordinary users can see the project at work already. Type "books and
culture" into the company's familiar search interface, and you get
access not only to all the Web pages Google has indexed that contain
those words, but also, right near the top of your search, to one of
the early books to be scanned in, an 1896 tome of that title by Hamilton
Wright Mabie.
Right away you notice that there are no ads on the page, and there are
no plans to add them "at this time," Wojcicki says. You may also
notice there are links to help you "Buy this book" and to "Find this
in a library."
No printing allowed
You can't print the book out or digitally cut and paste any of it,
even for this, an out-of-copyright volume. (The restrictions will be more
severe on in-copyright volumes, essentially those printed after 1920,
limiting searchers to seeing a line or two around their search term.
But Stanford law professor Lawrence Lessig, writing in the Los Angeles
Times, questioned the right of Google to make available even
"snippets" of copyrighted material.) Verba says you can certainly make
notes on it, and that a print-on-demand feature is worth considering.
In the meantime, want to know whether Mabie's "Books and Culture"
contains the word "ominous"? Type the word into the "Search within
this book" field. There it is on page 193, in a sentence praising the
ability of the cultured man to recognize the characteristics of his era.
The thing that's really encouraging, Verba says, is that the project
turns the current fears of scholars and parents on their ear.
"There's no academic and in fact there's no parent with a teenager who
isn't worried about the fact that that generation will believe that
all knowledge is on the Internet and on Google and will never want to
a book again," he says. "The nice thing about this project is that it's
a kind of, `If you can't beat 'em, join 'em.' People will go to Google,
and they will find books, and they will have to then go to the library
and get the books. We think this is really a nice way of squaring that