Wget can be obtained as part of the <a href="www.weihenstephan.de/~syring...">...</a>.
I have found this utility very useful in conjunction with my <a href="geocities.com/n8chz/td070...ml">web "to do" lists</a>.
I come to the library with a \wget directory on my USB key.
In this directory are wget.exe, wget.hlp and wget.GID (the last of which is probably the index file Windows Help creates when wget.hlp is first opened, rather than anything wget itself generated).
There is also a file called 'webtodo.bat', which consists of the following single line:
<code>wget -e robots=off -rH geocities.com/n8chz/td070602.html -l1 -p -T30 -t5</code>
'-e robots=off' makes wget ignore robot exclusions (robots.txt). That is bad netiquette, but it is a practical
necessity for vagrant netizens: we often get only one shot at an online session, with weeks between such
opportunities, and <a href="en.wikipedia.org/">wi...ia</a>, <a href="alltheweb.com/">alltheweb</a>
and many other very important sites exclude wget.
The greed of 'robots=off' is offset by the modesty of '-l1', which limits recursion to one level: the to-do list plus the pages it links to.
'-T30' sets a 30-second network timeout and '-t5' limits each file to five tries, so wget doesn't waste time on slow or dead
servers; connect time is often of the essence for the vagrant netizen. '-p' fetches each page's requisites, so you (usually)
get your pages with their pictures, and '-rH' gives you recursion (r) and spanning of hosts (H). Without the -H option,
recursion only fetches files from the host at which wget is aimed.
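As a hedged illustration, here is the webtodo.bat command written out as a Python argument list, one flag per line with the meaning described above. The function name build_wget_args is hypothetical; the idea is only that the to-do URL can be passed in rather than edited into the batch file each time.

```python
def build_wget_args(todo_url: str) -> list:
    """Same invocation as webtodo.bat, for whatever to-do list URL is current."""
    return [
        "wget",
        "-e", "robots=off",  # ignore robots.txt exclusions
        "-rH",               # recursive retrieval (-r), spanning hosts (-H)
        todo_url,
        "-l1",               # recurse only one level deep
        "-p",                # also fetch page requisites such as images
        "-T30",              # 30-second network timeout
        "-t5",               # at most five tries per file
    ]
```

Passing the result to `subprocess.run` would perform the same download, assuming wget can be found on the PATH.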
At the library, I pull up 'My Documents', select (in my case) "Travel Drive (F:\)" (your brand and drive letter may vary),
open the 'wget' folder, then right-click 'webtodo.bat' and choose Edit to replace the URL with that of the current web "to do" list.
Close 'webtodo.bat', saving changes, then double-click its icon. It runs in a command-line window, which disappears
on exit (in XP, anyway). I typically check my web-based email while the wget batch runs.
Back home, on my own computer, I read my bulk downloads offline.
This can be frustrating because of two shortcomings in wget:
1. URL pathnames are replicated as local disk pathnames, so HTML files end up as leaf nodes scattered deep in the directory tree.
2. URL filenames without a suffix are saved as files without a suffix on the USB disk, so Windows doesn't know to open them in a browser.
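To make both shortcomings concrete, the sketch below approximates how `wget -r` maps a URL to a local file when neither -nH nor -nd is given: the host name becomes a top-level folder, the URL path is copied below it, and a URL ending in a slash is saved as index.html. The function name is hypothetical, and the sketch deliberately ignores query strings, port numbers and the filename escaping wget applies on Windows.

```python
from urllib.parse import urlsplit


def local_path_for(url: str) -> str:
    """Approximate where `wget -r` (without -nH or -nd) saves a URL locally."""
    if "://" not in url:
        url = "http://" + url  # wget assumes http:// when no scheme is given
    parts = urlsplit(url)
    host = parts.hostname or ""  # one top-level folder per host (many of them with -H)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"  # directory URLs are saved as index.html
    # The URL path is copied as-is: every segment becomes a folder level
    # (shortcoming 1) and no .html suffix is added (shortcoming 2).
    return host + path
```

For example, `local_path_for("http://example.com/articles/foo")` gives `example.com/articles/foo`: a page two folders down with no extension at all.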
So far, the most efficient method I have found is as follows:
1. Start|Search|For Files or Folders, or right-click|Search... on the \wget folder icon.
2. Look in: F:\wget (or whatever drive letter the USB key gets assigned)
3. Containing text: <html
4. In 'Search Results', rename (F2) files to add an .html extension as necessary.
5. If picture frames come up empty, try editing the source, replacing (Ctrl-H) <code>src="http://</code> with <code>src="../</code>
6. Save (in Notepad) and refresh (in Firefox or another browser) after any HTML edits. Save pages to the hard disk for future reference
as desired.
7. Re-run the search for non-HTML files such as .txt and .pdf.
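The manual steps above could be scripted. What follows is a minimal sketch in Python, assuming the downloads sit in host folders below the \wget folder (where `wget -r` puts them) and that wget.exe, wget.hlp and webtodo.bat sit directly in that folder and are therefore skipped. All function names are hypothetical. Step 5 is generalized as an assumption: instead of a single "../", it uses one "../" per folder level between the page and the \wget folder, so `src="http://host/..."` points at the host folder wget presumably created for the images.

```python
from pathlib import Path

HTML_SUFFIXES = {".html", ".htm"}


def downloaded_files(root: Path) -> list:
    """Files below the host folders; skips wget.exe, webtodo.bat etc. in root itself."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and len(p.relative_to(root).parts) > 1
    )


def is_html(path: Path) -> bool:
    """Steps 1-3: does the file contain the text '<html' (in any case)?"""
    return b"<html" in path.read_bytes().lower()


def add_html_suffixes(root: Path) -> list:
    """Step 4: append .html to HTML files saved without an .html/.htm suffix."""
    renamed = []
    for path in downloaded_files(root):  # list is built before any renaming
        if path.suffix.lower() in HTML_SUFFIXES or not is_html(path):
            continue
        target = path.with_name(path.name + ".html")
        if not target.exists():  # never overwrite an existing file
            path.rename(target)
            renamed.append(target)
    return renamed


def rewrite_absolute_src(root: Path, page: Path) -> bool:
    """Step 5, generalized: turn src="http://host/..." into a relative path to root/host/..."""
    depth = len(page.relative_to(root).parts) - 1  # folder levels between root and page
    data = page.read_bytes()  # bytes in, bytes out: the page's encoding is left alone
    new_data = data.replace(b'src="http://', b'src="' + b"../" * depth)
    if new_data == data:
        return False
    page.write_bytes(new_data)
    return True


def find_by_suffix(root: Path, suffixes=(".txt", ".pdf")) -> list:
    """Step 7: list downloaded non-HTML files such as .txt and .pdf."""
    wanted = {s.lower() for s in suffixes}
    return [p for p in downloaded_files(root) if p.suffix.lower() in wanted]
```

For example, `add_html_suffixes(Path("F:/wget"))` followed by `rewrite_absolute_src` on each page would cover steps 4 and 5. As with the manual F2 rename, renaming can break links from other downloaded pages that still use the old name. wget's own -k (--convert-links) and -E options (--html-extension in older releases, --adjust-extension in newer ones) target similar problems at download time and may make some of this unnecessary.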
All in all, it's a pretty cumbersome process, but it pays off in the offline reading. It makes much more productive use of connect time
than manually clicking through to-do lists, can retrieve several weeks' worth of reading material in half an hour,
and makes some measure of boredom relief and alternative (noncommercial) reading material possible without paying actual $ for residential
internet access.
Needless to say, suggestions for and discussion of other retrieval agents, and their usefulness on public-access computers,
are always welcome here at the Vagrant Netizen tribe!