Hundreds of systems connected to the Internet have file libraries, or archives, accessible to the public. Much of what they hold consists of free or low-cost shareware programs for virtually every make of computer. If you want a different communications program for your IBM, or feel like playing a new game on your Amiga, you'll be able to get it from the Net.
But there are libraries of documents as well. If you want a copy of a recent U.S. Supreme Court decision, you can find it on the Net. Copies of historical documents, from the Magna Carta to the Declaration of Independence, are also yours for the asking, along with a translation of a telegram from Lenin ordering the execution of rebellious peasants. You can also find song lyrics, poems, even summaries of every "Lost in Space" episode ever made, as well as extensive files detailing everything you could ever possibly want to know about the Net itself. First you'll see how to get these files; then we'll show you where they're kept.
The commonest way to get these files is through the file transfer protocol, or ftp. As with telnet, not all systems that connect to the Net have access to ftp. However, if your system is one of these, you'll be able to get many of these files through e-mail (see section Advanced E-mail).
Starting ftp is as easy as using telnet. At your host system's command line, type
ftp site.name
and hit enter, where "site.name" is the address of the ftp site you want to reach. One major difference between telnet and ftp is that it is considered bad form to connect to most ftp sites during their business hours (generally 6 a.m. to 6 p.m. local time). This is because transferring files across the network takes up considerable computing power, which during the day is likely to be needed for whatever the computer's main function is. There are some ftp sites that are accessible to the public 24 hours a day, though. You'll find these noted in the list of ftp sites.
How do you find a file you want, though?
Until a few years ago, this could be quite the pain -- there was no master directory to tell you where a given file might be stored on the Net. Who'd want to slog through hundreds of file libraries looking for something?
Alan Emtage, Bill Heelan and Peter Deutsch, students at McGill University in Montreal, asked the same question. Unlike the weather, though, they did something about it.
They created a database system, called archie, that would periodically call up file libraries and basically find out what they had available.
In turn, anybody could dial into archie, type in a file name, and see where on the Net it was available. Archie currently catalogs close to 1,000 file libraries around the world.
Today, there are three ways to ask archie to find a file for you: through telnet, through a "client" archie program on your own host system, or through e-mail. All three methods let you type in a full or partial file name and will tell you where on the Net it's stored. If you have access to telnet, you can telnet to one of the following addresses: archie.mcgill.ca; archie.sura.net; archie.unl.edu; archie.ans.net; or archie.rutgers.edu. If asked for a log-in name, type
archie
and hit enter.
When you connect, the key command is prog, which you use in this form:
prog filename
followed by enter, where "filename" is the program or file you're looking for. If you're unsure of a file's complete name, try typing in part of the name. For example, `PKZIP' will work as well as `PKZIP201.EXE'. The system does not support DOS or Unix wildcards. If you ask archie to look for `PKZIP*', it will tell you it couldn't find anything by that name. One thing to keep in mind is that a file is not necessarily the same as a program -- it could also be a document. This means you can use archie to search for, say, everything online related to the Beatles, as well as computer programs and graphics files.
A number of Net sites now have their own archie programs that take your request for information and pass it on to the nearest archie database -- ask your system administrator if s/he has it online. These "client" programs seem to provide information a lot more quickly than the actual archie itself! If it is available, at your host system's command line, type
archie -s filename
where filename is the program or document you're looking for, and hit enter. The `-s' tells the program to ignore case in a file name and lets you search for partial matches. You might actually want to type it this way:
archie -s filename |more
which will stop the output every screen (handy if there are many sites that carry the file you want). Or you could open a file on your computer with your text-logging function.
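The matching rules described above (a partial name is enough, `-s' ignores case, and wildcards are not understood) can be imitated in a few lines of Python. This is only a sketch of the idea, not archie's actual code; the function name and the sample catalog are made up for illustration.

```python
def archie_style_search(catalog, term, ignore_case=True):
    """Return the names in catalog that contain term as a plain substring.

    Hypothetical helper: it mimics the behavior described in the text,
    where a partial name matches but '*' is just an ordinary character.
    """
    if ignore_case:
        term = term.lower()
    matches = []
    for name in catalog:
        candidate = name.lower() if ignore_case else name
        if term in candidate:
            matches.append(name)
    return matches


# Made-up catalog, for illustration only.
catalog = ["PKZIP201.EXE", "pkzip110.exe", "zterm-09.hqx"]

print(archie_style_search(catalog, "pkzip"))   # both PKZIP entries
print(archie_style_search(catalog, "PKZIP*"))  # nothing: '*' is not a wildcard
```

Because the search is a plain substring test, adding an asterisk makes the term longer and so it matches less, not more.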
The third way, for people without access to either of the above, is e-mail.
Send a message to an archie server's e-mail address. In the body of the message, type
prog filename
where filename is the file you're looking for. You can ask archie to look up several programs by putting their names on the same "prog" line, like this:
prog file1 file2 file3
Within a few hours, archie will write back with a list of the appropriate sites.
In all three cases, if there is a system that has your file, you'll get a response that looks something like this:
Host sumex-aim.stanford.edu
    Location: /info-mac/comm
        FILE -rw-r--r--     258256  Feb 15 17:07  zterm-09.hqx
    Location: /info-mac/misc
        FILE -rw-r--r--       7490  Sep 12 1991   zterm-sys7-color-icons.hqx
Chances are, you will get a number of similar-looking responses for each program. The "Host" is the system that has the file. The "Location" tells you which directory to look in when you connect to that system. Ignore the funny-looking collections of r's and hyphens for now. After them come the size of the file or directory listing in bytes, the date it was uploaded, and the name of the file.
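If you save archie's reply to a file, a short script can pull out the parts you need for the next step. The following is a rough sketch, assuming the reply arrives with one "Host", "Location:" or "FILE" entry per line as in the sample; the function name is hypothetical and real replies may carry extra lines that this simply skips.

```python
def parse_archie_reply(text):
    """Collect (host, directory, size_in_bytes, filename) tuples from a reply."""
    results = []
    host = None
    location = None
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == "Host":
            host = words[1]
        elif words[0] == "Location:":
            location = words[1]
        elif words[0] == "FILE":
            # words: FILE, permission flags, size, date fields..., file name
            size = int(words[2])
            filename = words[-1]
            results.append((host, location, size, filename))
    return results


sample = """\
Host sumex-aim.stanford.edu
    Location: /info-mac/comm
        FILE -rw-r--r-- 258256 Feb 15 17:07 zterm-09.hqx
    Location: /info-mac/misc
        FILE -rw-r--r-- 7490 Sep 12 1991 zterm-sys7-color-icons.hqx
"""

for host, location, size, filename in parse_archie_reply(sample):
    print(f"ftp {host}  ->  get {location}/{filename}  ({size} bytes)")
```

Taking the last word as the file name works here because the sample names contain no spaces.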
Now you want to get that file.
Assuming your host site does have ftp, you connect in a similar fashion to telnet, by typing:
ftp sumex-aim.stanford.edu
(or the name of whichever site you want to reach). Hit enter. If the connection works, you'll see this:
Connected to sumex-aim.stanford.edu.
220 SUMEX-AIM FTP server (Version 4.196 Mon Jan 13 13:52:23 PST 1992) ready.
Name (sumex-aim.stanford.edu:adamg):
If nothing happens after a minute or so, hit control-C to return to your host system's command line. But if it has worked, type
anonymous
and hit enter. You'll see a lot of references on the Net to "anonymous ftp." This is how it gets its name -- you don't really have to tell the library site what your name is. The reason is that these sites are set up so that anybody can gain access to certain public files, while letting people with accounts on the sites log on and access their own personal files. Next, you'll be asked for your password. As a password, use your e-mail address. This will then come up:
230 Guest connection accepted. Restrictions apply.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp>
Now type
ls
and hit enter. You'll see something awful like this:
200 PORT command successful.
150 Opening ASCII mode data connection for /bin/ls.
total 2636
-rw-rw-r--  1 0     31      4444 Mar  3 11:34 README.POSTING
dr-xr-xr-x  2 0     1        512 Nov  8 11:06 bin
-rw-r--r--  1 0     0   11030960 Apr  2 14:06 core
dr--r--r--  2 0     1        512 Nov  8 11:06 etc
drwxrwsr-x  5 13    22       512 Mar 19 12:27 imap
drwxr-xr-x 25 1016  31       512 Apr  4 02:15 info-mac
drwxr-x--   2 0     31      1024 Apr  5 15:38 pid
drwxrwsr-x 13 0     20      1024 Mar 27 14:03 pub
drwxr-xr-x  2 1077  20       512 Feb  6  1989 tmycin
226 Transfer complete.
ftp>
Ack! Let's decipher this Rosetta Stone.
First, ls is the ftp command for displaying a directory (you can actually use dir as well, but if you're used to MS-DOS, this could lead to confusion when you try to use dir on your host system, where it won't work, so it's probably better to just remember to always use ls for a directory while online).
The very first letter on each line tells you whether the listing is for a directory or a file. If the first letter is a `d', it's a directory. If it's an `l', it's a link, which usually leads to a directory as well. Otherwise, it's a file.
The rest of that weird set of letters and dashes consists of "flags" that tell the ftp site who can look at, change or delete the file. You can safely ignore it. You can also ignore the rest of the line until you get to the number just before the date. This tells you how large the file is, in bytes. If the line is for a directory, the number gives you a rough indication of how many items are in that directory -- a directory listing of 512 bytes is relatively small. Next comes the date the file or directory was uploaded, followed (finally!) by its name.
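The same reading of a listing line can be written down as a short Python function. This is a minimal sketch that assumes the column layout of the sample listing (flags, link count, owner, group, size, three date fields, then the name); other servers may lay out their listings differently, and the function name is made up for illustration. It reports an `l' entry as a link.

```python
def describe_listing_line(line):
    """Split one line of a Unix-style listing into (kind, size, name).

    Returns None for lines that are not entries, such as "total 2636"
    or the numbered status messages from the ftp site.
    """
    fields = line.split(None, 8)
    if len(fields) < 9 or not fields[4].isdigit():
        return None
    flags = fields[0]
    if flags.startswith("d"):
        kind = "directory"
    elif flags.startswith("l"):
        kind = "link"
    else:
        kind = "file"
    # fields[4] is the size in bytes, the number just before the date
    return kind, int(fields[4]), fields[8]


for line in [
    "total 2636",
    "-rw-rw-r-- 1 0 31 4444 Mar 3 11:34 README.POSTING",
    "drwxr-xr-x 25 1016 31 512 Apr 4 02:15 info-mac",
]:
    print(describe_listing_line(line))
```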
Notice the `README.POSTING' file up at the top of the directory. Most archive sites have a "read me" document, which usually contains some basic information about the site, its resources and how to use them. Let's get this file, both for the information in it and to see how to transfer files from there to here. At the ftp> prompt, type
get README.POSTING
and hit enter. Note that ftp sites are no different from Unix sites in general: they are case-sensitive, so type the name exactly as it appears in the listing. You'll see something like this:
200 PORT command successful.
150 Opening BINARY mode data connection for README.POSTING (4444 bytes).
226 Transfer complete.
4444 bytes received in 1.177 seconds (3.8 Kbytes/s)
And that's it! The file is now located in your home directory on your host system, from which you can now download it to your own computer. The simple `get' command is the key to transferring a file from an archive site to your host system.
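If your host system has Python, the whole session (connect, log in as anonymous, list the directory, get a file) can be scripted with the standard ftplib module. What follows is a sketch under stated assumptions rather than a recommended tool: the function name and the `ftp_factory' parameter are made up for illustration (the latter only so the function can be tried without a network), and the site in the commented-out call is the one from the example above, which may no longer exist.

```python
import ftplib


def fetch_file(host, remote_name, local_name, email, ftp_factory=ftplib.FTP):
    """Log in anonymously, print the directory listing, and copy one file."""
    ftp = ftp_factory(host)  # connects, like typing: ftp host
    try:
        # "anonymous" as the name, your e-mail address as the password
        ftp.login(user="anonymous", passwd=email)
        ftp.retrlines("LIST")  # prints a listing, roughly what ls shows
        with open(local_name, "wb") as out:
            # retrbinary always transfers in binary mode
            ftp.retrbinary("RETR " + remote_name, out.write)
    finally:
        ftp.close()


# Example call (needs a network connection; the address is a placeholder):
# fetch_file("sumex-aim.stanford.edu", "README.POSTING",
#            "README.POSTING", "you@your.host")
```

Giving a different `local_name' stores the file under a new name on your side, which is handy when the remote name is long.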
If a line starts with a `d', then that is a directory you can enter to look for more files. If it starts with a `-', then it's a file you can get. The next item of interest is the fifth column, which tells you how large the item is in bytes. That's followed by the date and time it was loaded to the archive, followed, finally, by its name. Many sites provide a `README' file that lists simple instructions and available files. Some sites use files named `Index' or `INDEX' or something similar.
If you want to download more than one file at a time (say, a series of documents), use mget instead of get; for example:
mget *.txt
This will transfer copies of every file ending with .txt in the given directory. Before each file is copied, you'll be asked if you're sure you want it. Despite this, mget could still save you considerable time -- you won't have to type in every single file name.
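To get a feel for which names a pattern such as `*.txt' will pick up, you can try it against a list of names with Python's fnmatch module. This is only an approximation for illustration: in a real session the matching is done by the ftp programs themselves, and the file names below are invented.

```python
from fnmatch import fnmatchcase


def names_matching(pattern, names):
    """Return the names that match a shell-style pattern, respecting case."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Invented directory contents, for illustration only.
names = ["README", "index.txt", "notes.txt", "NOTES.TXT", "zterm-09.hqx"]

print(names_matching("*.txt", names))
print(names_matching("zterm*", names))
```

Note that `NOTES.TXT' is not matched by `*.txt', which fits the warning that names on ftp sites are case-sensitive.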
There is one other command to keep in mind. If you want to get a copy of a computer program, type
bin
and hit enter. This tells the ftp site and your host site that you are sending a binary file, i.e., a program. Most ftp sites now use binary format as a default, but it's a good idea to do this in case you've connected to one of the few that doesn't.
To switch to a directory, type
cd directory-name
(substituting the name of the directory you want to access) and hit enter. Type
ls
and hit enter to get the file listing for that particular directory. To move back up the directory tree, type
cd ..
(note the space between the d and the first period) and hit enter. Or you could type
cdup
and hit enter. Keep doing this until you get to the directory of interest. Alternately, if you already know the directory path of the file you want (from our friend archie), after you connect, you could simply type
get directory/subdirectory/filename
On many sites, files meant for public consumption are in the pub or public directory; sometimes you'll see an info directory.
Almost every site has a bin directory, which at first glance sounds like a bin in which interesting stuff might be dumped. But it actually stands for "binary" and is simply a place for the system administrator to store the programs that run the ftp system. Lost+found is another directory that looks interesting but actually never has anything of public interest in it.
Before, you saw how to use archie. From our example, you can see that some system administrators go a little berserk when naming files. Fortunately, there's a way for you to rename the file as it's being transferred. Using our archie example, you'd type
get zterm-sys7-color-icons.hqx zterm.hqx
and hit enter. Instead of having to deal constantly with a file called `zterm-sys7-color-icons.hqx', you'll now have one called, simply, `zterm.hqx'.
Those last three letters bring up something else: Many program files are compressed to save on space and transmission time. In order to actually use them, you'll have to use an un-compress program on them first.
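There is a wide variety of compression methods in use. You can tell which method was used by the last one to three letters at the end of a file name. Here are some of the more common ones and what you'll need to un-compress the files they create (and these decompression programs can all be located through archie).
.txt or .TXT    A plain text document, not a compressed file; you can read it as is.
.ps or .PS    A PostScript document; you need a PostScript printer or viewer to make use of it.
.doc or .DOC    Another common ending for documents.
.Z    Made with the Unix compress program; restore it with uncompress.
.zip or .ZIP    Made with PKZIP or a compatible program, common for MS-DOS files; restore it with an unzip program such as PKUNZIP.
.gz    Made with GNU gzip; restore it with gunzip.
.zoo or .ZOO    Made with the zoo archiver; restore it with zoo.
.Hqx or .hqx    A Macintosh file encoded in BinHex; decode it with BinHex or a compatible program.
.shar or .Shar    A Unix shell archive; unpack it with unshar or by running it through sh.
.tar    A Unix tape archive, which bundles several files into one; unpack it with tar.
.TAZ    A tar archive that has also been run through compress, named this way for systems that can't handle longer endings such as .tar.Z.
.sit or .Sit    A Macintosh file made with StuffIt; restore it with StuffIt or a compatible program.
.ARC    Made with the ARC program, mostly for MS-DOS; restore it with ARC or a compatible program.
.LHZ    Made with LHarc, mostly for MS-DOS; restore it with LHarc.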
A few last words of caution: Check the size of a file before you get it. The Net moves data at phenomenal rates of speed. But that 500,000-byte file that gets transferred to your host system in a few seconds could take well over half an hour to download to your computer if you're using a 2400-baud modem. Your host system may also have limits on the number of bytes you can store online at any one time. Also, although it is extremely unlikely you will ever get a file infected with a virus, if you plan to do much downloading over the Net, you'd be wise to invest in a good anti-viral program, just in case.
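You can work out a rough download time before you commit to a big file. The sketch below treats a "2400-baud" modem as moving 2,400 bits per second and assumes 10 bits on the line for every byte (8 data bits plus a start and a stop bit); both figures are assumptions, and real transfers add protocol overhead on top, so take the answer as a lower bound.

```python
def download_minutes(size_bytes, modem_bps, bits_per_byte=10):
    """Estimate how many minutes a file takes to cross a modem link.

    bits_per_byte=10 assumes 8 data bits plus start and stop bits;
    protocol overhead and line noise are ignored.
    """
    bytes_per_second = modem_bps / bits_per_byte
    return size_bytes / bytes_per_second / 60


# The 500,000-byte file from the text over a 2400 bits-per-second modem
print(round(download_minutes(500_000, 2400)))  # about 35 minutes
```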
System administrators are like everybody else -- they try to make things easier for themselves. And when you sit in front of a keyboard all day, that can mean trying everything possible to reduce the number of keys you actually have to hit each day.
Unfortunately, that can make it difficult for the rest of us.
Connect to many ftp sites, and one of the entries you'll often see is a directory named `bin'.
You might think this is a bin where interesting things get thrown. It's not. "Bin" is short for "binary," i.e., the programs that make the ftp site work, to which you won't have access anyway.
Etc is another seemingly interesting directory that turns out to be another place to store files used by the ftp site itself. `lost+found' directories are used by Unix systems for some routine housekeeping -- again, nothing of any real interest.
Then, once you get into the actual file libraries, you'll find that in many cases, files will have such non-descriptive names as `V1.1-AK.TXT'. The best known example is probably a set of several hundred files known as RFCs, which provide the basic technical and organizational information on which much of the Internet is built. These files can be found on many ftp sites, but always in a form such as `RFC101.TXT', `RFC102.TXT' and so on, with no clue whatsoever as to what information they contain.
Fortunately, almost all ftp sites have a "Rosetta Stone" to help you decipher these names. Most will have a file named `README' (or some variant) that gives basic information about the system. Then, most directories will either have a similar `README' file or will have an index that does give brief descriptions of each file. These are usually the first file in a directory and often are in the form `00INDEX.TXT'. Use the ftp command to get this file. You can then scan it online or download it to see which files you might be interested in.
Another file you will frequently see is called `ls-lgR.Z'. This contains a listing of every file on the system, but without any descriptions (the name comes from the Unix command `ls -lgR', which gives you a listing of all the files in all your directories). The `.Z' at the end means the file has been compressed, which means you will have to use a Unix un-compress command before you can read the file.
And finally, we have those system administrators who almost seem to delight in making things difficult -- the ones who take full advantage of Unix's ability to create absurdly long file names. On some FTP sites, you will see file names as long as 80 characters or so, full of capital letters, underscores and every other orthographic device that will make it almost impossible for you to type the file name correctly when you try to get it. Your secret weapon here is the mget command. Just type mget, a space, and the first five or six letters of the file name, followed by an asterisk, for example:
mget This_F*
The FTP site will ask you if you want to get the file that begins with that name. If there are several files that start that way, you might have to answer `n' a few times, but it's still easier than trying to recreate a ludicrously long file name.
What follows is a list of some interesting ftp sites, arranged by category. With hundreds of ftp sites now on the Net, however, this list barely scratches the surface of what is available. Liberal use of archie will help you find specific files.
The times listed for each site are in Eastern time and represent the periods during which it is considered acceptable to connect.
gatekeeper.dec.com Recipes are in the `pub/recipes' directory.
ftp.netcom.com The `pub/profiles' directory has lists of ftp sites.
nptn.org The General Accounting Office (GAO) is the investigative wing of Congress. The `pub/e.texts/gao.reports' directory represents an experiment by the agency to use ftp to distribute its reports. Available 24 hours.
ra.msstate.edu Mississippi State maintains an eclectic database of historical documents, detailing everything from Attila's battle strategy to songs of soldiers in Vietnam, in the `docs/history' directory. 6 p.m. - 6 a.m.
seq1.loc.gov The Library of Congress has acquired numerous documents from the former Soviet government and has translated many of them into English. In the `pub/soviet.archive/text.english' directory, you'll find everything from telegrams from Lenin ordering the death of peasants to Khrushchev's response to Kennedy during the Cuban missile crisis. The `README' file in the `pub/soviet.archive' directory provides an index to the documents. 6 p.m. - 6 a.m.
nic.ddn.mil The `internet-drafts' directory contains information about Internet, while the `scc' directory holds network security bulletins. 6 p.m. - 6 a.m.
ftp.uu.net Supreme Court decisions are in the `court-opinions' directory. You'll want to get the index file, which tells you which file numbers go with which file names. The decisions come in WordPerfect and Atex format only. Available 24 hours a day.
world.std.com The `obi' directory has everything from online fables to accounts of Hiroshima survivors. 6 p.m. - 6 a.m.
ftp.uu.net Carries copies, or "mirrors," of Macintosh programs from the Simtel20 collection in the `systems/mac/simtel20' directory. Available 24 hours a day.
ftp.uu.net Carries copies, or "mirrors," of MS-DOS programs from the Simtel20 collection in the `systems/msdos/simtel20' directory. Available 24 hours a day.
SITES        1528  Other music-related FTP archive sites
classical/   -     (dir) Classical Buying Guide
database/    -     (dir) Music Database program
discog/      =     (dir) Discographies
faqs/        =     (dir) Music Frequently Asked questions files
folk/        -     (dir) Folk Music Files and pointers
guitar/      =     (dir) Guitar TAB files from ftp.nevada.edu
info/        =     (dir) rec.music.info archives
interviews/  -     (dir) Interviews with musicians/groups
lists/       =     (dir) Mailing lists archives
lyrics/      =     (dir) Lyrics Archives
misc/        -     (dir) Misc files that don't fit anywhere else
pictures/    =     (dir) GIFS, JPEGs, PBMs and more.
press/       -     (dir) Press Releases and misc articles
programs/    -     (dir) Misc music-related programs for various machines
releases/    =     (dir) Upcoming USA release listings
sounds/      =     (dir) Short sound samples
226 Transfer complete.
ftp>
When you switch to a directory, don't include the `/'. 7 p.m. - 7 a.m.
potemkin.cs.pdx.edu The Bob Dylan archive. Interviews, notes, year-by-year accounts of his life and more, in the `pub/dylan' directory. 9 p.m. - 9 a.m.
ftp.nevada.edu Guitar chords for contemporary songs are in the `pub/guitar' directory, in subdirectories organized by group or artist.
ftp.cs.widener.edu The `pub/simpsons' directory has more files than anybody could possibly need about Bart and family. The `pub/strek' directory has files about the original and Next Generation shows as well as the movies. See also under Science Fiction.
pit-manager.mit.edu This site contains all available FAQ ("frequently asked questions") files for Usenet newsgroups in the `pub/usenet' directory. For easy access, get the `index' file. See under Books for a caveat in using this ftp site. 6 p.m. - 6 a.m.
ftp.germany.eu.net Germany's backbone site, located at the University of Dortmund, is part of EUnet, the European part of the Internet. It is Germany's default server for X Window System releases and also "mirrors" several important sites; for example, `pub/packages/gnu' mirrors the GNU project's default server. You'll also find "mirrors" of `386BSD', `NetBSD' and `Linux'. Available 24 hours.
Liberal use of archie will help you find specific files or documents. For news of new ftp sites and interesting new files on existing sites, try the comp.archives newsgroup on Usenet. You can also look in the comp.misc, comp.sources.wanted or news.answers newsgroups on Usenet for lists of ftp sites posted every month by Tom Czarnik and Jon Granrose.
In the comp.virus newsgroup on Usenet, look for postings that list ftp sites carrying anti-viral software for Amiga, MS-DOS, Macintosh, Atari and other computers.
The comp.sys.ibm.pc.digest and comp.sys.mac.digest newsgroups provide information about new MS-DOS and Macintosh programs as well as answers to questions from users of those computers.
"Welch ein Ort zum Plündern!" (What a place to plunder!) --- General Gebhard Leberecht von Blücher