Dear friends,

Please find an interesting article related to Information Retrieval.

Source: http://www.hinduonnet.com/stories/2003072800120200.htm

Aman Kumar Jha
Librarian
South Asia Human Rights Documentation Centre
New Delhi

Information retrieval with cURL

THIS WEEK NetSpeak focuses on a tool for automatically transferring files from the Net using a variety of Net protocols.

The innumerable and wide-ranging Net resources available on servers across the Net are stored and retrieved using different well-defined methods or protocols. The web protocol HTTP (Hyper Text Transfer Protocol), FTP (File Transfer Protocol) and the Gopher protocol are three popular examples. To retrieve resources stored on various servers, many tools that understand these protocols have been developed; these are called Net clients. The web browser, which understands the HTTP protocol, is a client tool for retrieving resources from a web server. Similarly, there are clients for other protocols such as FTP and POP.

The drawback of most of these tools is that, though quite easy to operate, they lack flexibility. For example, the browser has no easy mechanism for selectively downloading a few web pages from a site. Again, if you want to regularly and automatically download all newly updated files with a specific extension from a site, current browser features are inadequate. Here is another scenario: a Net resource is mirrored at several locations and, to speed up the download, you want to split the file into multiple parts and fetch each part from a different location, instead of downloading the whole file from one place. None of the current browsers can do this. All of this points to the need for a better information retrieval tool, one that helps automate download tasks and lets you download material exactly as required. One such client tool is cURL, which can be used to download Net resources automatically with extreme ease and flexibility.

cURL

As per its web site (http://curl.haxx.se/), cURL (which stands for `client URL') is a command-line utility for transferring files to and from diverse Net servers using protocols such as FTP, HTTP, HTTPS, GOPHER, TELNET, DICT, FILE and LDAP. Any Net resource that uses the standard URL format (like http:// or ftp://) can be retrieved with this tool. Please note that cURL has nothing to do with `Curl', the web programming language discussed in this column (http://www.hinduonnet.com/thehindu/biz/2001/12/13/stories/2001121300340100.htm) long ago. You may also note that this free tool, which runs on platforms such as Windows and Linux, not only lets you download files but can also be used to upload files to web and FTP servers. Another highlight is the free curl library, libcurl, which can be integrated with popular programming languages such as Basic, C, C++, Python and PHP to develop curl-based applications.

As usual, to get a good hold on the capabilities of this program and internalise them, let us go through a few examples. The command `curl http://www.hinduonnet.com' will download the home page of The Hindu's site and display it on your screen. If you want to view the HTTP exchange, the conversation that takes place between the client and the web server while the page is downloaded, execute the command with the `-v' (verbose) option, like this: curl -v http://www.hinduonnet.com

To upload a file to an FTP server, use curl with the `-T' option, as follows:

curl -T readme-file ftp://user:password@ftp-domain-name.com

The above command uploads the file `readme-file' to the FTP server. If a file is stored at two locations, the `-r' (range) option lets you retrieve different parts of it from each server simultaneously. For example, suppose a file called `example.doc', 39424 bytes in size, is stored on two servers called server1 and server2. To download the file in two parts from these servers, use the following commands:

curl -r 0-1499 -o example1 http://server1/example.doc (this downloads the first 1500 bytes of the file from server1 and stores them as `example1')

curl -r 1500- -o example2 http://server2/example.doc (this downloads the rest of the file from server2 and stores it as `example2')

Combining the two downloaded parts gives back the original file.
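Taken together, the basic commands quoted above look like this; the FTP host, user name and password are placeholders, and `-o' (write the output to a named file) is the same flag used in the split-download commands:

    # Fetch a page and print it to the screen
    curl http://www.hinduonnet.com

    # The same fetch, showing the client/server conversation
    curl -v http://www.hinduonnet.com

    # Save the page to a file instead of the screen
    curl -o hindu-home.html http://www.hinduonnet.com

    # Upload a local file to an FTP server
    curl -T readme-file ftp://user:password@ftp-domain-name.com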
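And here is the split-download scenario as a runnable sketch, assuming a Unix-like shell; the column does not say how the two parts are to be combined, so the `cat' step below is one common way of doing it (on Windows, `copy /b example1+example2 example.doc' has the same effect):

    # Fetch bytes 0-1499 (the first 1500 bytes) from the first mirror
    curl -r 0-1499 -o example1 http://server1/example.doc

    # Fetch the remainder, from byte 1500 to the end, from the second mirror
    curl -r 1500- -o example2 http://server2/example.doc

    # Concatenate the two parts to reconstruct the original 39424-byte file
    cat example1 example2 > example.doc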
Programs based on libcurl

As already mentioned, along with the cURL client program, a curl `program library' that can be used to create cURL-based applications is also available at the cURL site. The free program Getleft, developed for downloading complete web sites, is a good example of a product created with the help of this library. Getleft, written in Tcl (Tool Command Language, http://tcl.activestate.com/), can download a web site completely when you simply enter the site's address in its input box. During the download, it alters the links in the pages so that you can browse the site locally without any trouble. For more details, check out: http://personal1.iddeo.es/andresgarci/getleft/english/

Bootdisk

If you are a regular computer user, it is likely that you have faced a hard disk failure at least once. Once the machine fails to boot from the hard disk, the next feasible solution is to attempt to boot from a floppy disk. If you failed, or forgot, to create a boot disk during installation, this course will not be open to you. If you land in such a situation, check out the site BootDisk (http://www.bootdisk.com), which hosts many programs for creating boot disks. For example, to create a DOS 6.22 boot disk, access the site, click on the `DOS/Windows...' option and download the appropriate boot software. Then insert a fresh floppy disk and run the downloaded program, which turns the floppy into a bootable disk by transferring the DOS system files on to it. Apart from the several `boot' programs, the site carries much other valuable material, including information on hard disk partitioning, networking and several `how-to' guides.

Free web space index

It is quite likely that many of you have your own web sites or are planning to launch one. One easy route is to use the services of a free web space provider and build your site on its server. There are many free web space providers on the Net, such as Tripod (http://www.tripod.com) and Netfirms (http://netfirms.com/), offering varying tools and facilities. For the latest information on free web space providers, check out the search service `Free web hosting space finder' at: http://www.free-web-space-finder.com/

J. Murali