The Edit Tags function edits a set of web documents, changing all tags to upper/lower case. It can operate on:
The Check Links function lists and/or checks all hyperlinks in a set of web documents. It can operate on:
The Check Metatags function lists all metatags in a set of web documents. It can operate on:
The Submit URL function submits a URL to a set of search engines, and
may be used to promote a website.
Details of search engines are contained in a data file, referred to as the
Search Engine Database. See Section 3.7.4.2 for details.
Note: a "web document" is a file written in a web markup language (such as HTML, SHTML, XML). These are plain text languages employing plain text tags. In fact, some of the functions may be used to operate on plain text files which contain other markup languages (eg SGML) or plain text files which do not contain markup language.
1. INTRODUCTION
1.1 Overview
1.2 Contents
1.3 Document References
1.4 Document Control
1.5 Terms and Terminology
1.6 Copyright
2. WEBTOOLBOX: INTRODUCTION AND OVERVIEW
3. WEBTOOLBOX: USER GUIDE
3.1 Introduction
3.2 Invocation and Arguments
3.3 Action: Edit Strings
3.4 Action: Edit Tags
3.5 Action: Check Links
3.6 Action: Check Metatags
3.7 Action: Submit URL
4. WEBTOOLBOX: ADMINISTRATOR GUIDE
4.1 Introduction
4.2 Product Distribution & Contents
4.3 Installing and Configuring WebToolbox
4.4 The Perl Script
4.5 Restrictions and Deficiencies
4.6 Test Status
4.7 Errors and Diagnostics
[1] Learning Perl (2nd Edition), Schwartz & Christiansen, O'Reilly & Associates, Inc.
[2] Perl Cookbook, Christiansen & Torkington, O'Reilly & Associates, Inc.
The script may or may not be provided in source code form. This source code is the intellectual property of Beaumont Systems Ltd., and, as such, is subject to copyright and legal protection. It may not be copied or redistributed, in whole or in part, without the express permission of Beaumont Systems Ltd.

    webtool.pl [-e] [-xd xxx] [-xf xxx] [-xt]
    webtool.pl [-t] [-tl] [-xd xxx] [-xf xxx]
    webtool.pl [-l] [-lo xxx] [-lg] [-lt ttt] [-xt] [-xr]
    webtool.pl [-m] [-xf xxx] [-xt]
    webtool.pl [-s] [-sd xxx] [-xr]

The first argument (-e, -t, -l, -m, -s) specifies the basic action, and one of these must be selected. There is no default action.
The arguments are as follows...
| Argument | Used with | Use | Default |
|---|---|---|---|
| -e | - | Action: Edit Strings (edit set of web documents to replace strings). | - |
| -t | - | Action: Edit Tags (edit set of web documents to change tags to upper/lower case). | - |
| -l | - | Action: Check Links (in set of web documents). | - |
| -m | - | Action: Check Metatags (in set of web documents). | - |
| -s | - | Action: Submit URL (to set of search engines). | - |
| -tl | -t | Specifies the use of lower case. | Upper case |
| -ta | -t | Specifies that attributes (types and values) are to be converted also. | Tag types only |
| -lo xxx | -l | Specifies the extent of checking: list links without verification (-lo 0), check some links only (local files), or check all links. | Check all links |
| -lg | -l | Report good links (...as well as bad). | Report bad links only |
| -lt ttt | -l | Set timeout value = ttt seconds. | 20 |
| -sd xxx | -s | Specifies the filename of the search engine database. | se.dat |
| -xd xxx | - | Specifies the root directory. | - |
| -xf xxx | - | Specifies the input filename. | - |
| -xt | -e,-l,-m | Operation on ALL text files. | Some text files only (web documents) |
| -xr | - | Restart log file. (Previous contents discarded). | Append |
| -v | - | Specifies verbose mode. | - |
| -h | - | Requests help. | - |
Use WebToolbox with no arguments or webtool -h to obtain help information.
Upon invocation, WebToolbox prompts the user for further parameters, depending on the action selected. See the following subsections for further information.
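The argument rules in the table above (exactly one action, with per-action options and defaults) can be sketched with Python's argparse. This is an illustrative model only, not the parser used by the Perl script; the function name `build_parser` and the action labels such as `check_links` are hypothetical, and only defaults stated in the table (20 seconds, se.dat) are filled in.

```python
import argparse

def build_parser():
    """Hypothetical model of the WebToolbox command line (not the real script)."""
    parser = argparse.ArgumentParser(prog="webtool")
    # Exactly one action must be selected; there is no default action.
    actions = parser.add_mutually_exclusive_group(required=True)
    for flag, label in (("-e", "edit_strings"), ("-t", "edit_tags"),
                        ("-l", "check_links"), ("-m", "check_metatags"),
                        ("-s", "submit_url")):
        actions.add_argument(flag, dest="action", action="store_const", const=label)
    parser.add_argument("-tl", action="store_true")   # lower case instead of upper
    parser.add_argument("-ta", action="store_true")   # convert attributes as well
    parser.add_argument("-lo")                        # extent of link checking
    parser.add_argument("-lg", action="store_true")   # report good links too
    parser.add_argument("-lt", type=int, default=20)  # timeout in seconds
    parser.add_argument("-sd", default="se.dat")      # search engine database
    parser.add_argument("-xd")                        # root directory
    parser.add_argument("-xf")                        # input filename
    parser.add_argument("-xt", action="store_true")   # operate on all text files
    parser.add_argument("-xr", action="store_true")   # restart the log file
    parser.add_argument("-v", action="store_true")    # verbose mode
    return parser

if __name__ == "__main__":
    print(build_parser().parse_args(["-l", "-lo", "0", "-lg"]))
```

Parsing an empty argument list exits with a usage error, which mirrors the rule that an action is mandatory.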
This function is particularly useful for maintaining links in a collection of web documents (HTML documents).

    Directory or filename ? :
    String (to be found/replaced) ? :
    String (new) ? :

A directory or filename can be specified via the command-line arguments -xd and -xf respectively. The first prompt (Directory or filename ? :) appears when neither -xd nor -xf is used, or when one is used but the name given is found not to be a valid directory or filename.
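The Edit Strings action can be pictured as a find-and-replace over one file or a directory tree. The sketch below is a rough model under stated assumptions: the function name `replace_in_tree`, the list of "web document" suffixes, and the recursive directory walk are all hypothetical, and the `all_text` flag only imitates the documented effect of -xt.

```python
from pathlib import Path

# Assumed set of suffixes treated as web documents; the real list is not documented here.
WEB_SUFFIXES = {".html", ".htm", ".shtml", ".xml"}

def replace_in_tree(target, old, new, all_text=False):
    """Replace `old` with `new` in a file, or in every eligible file under a directory.

    Returns the list of files that were rewritten.
    """
    root = Path(target)
    files = [root] if root.is_file() else sorted(p for p in root.rglob("*") if p.is_file())
    changed = []
    for path in files:
        # Without all_text (compare -xt), only web documents are touched.
        if not all_text and path.suffix.lower() not in WEB_SUFFIXES:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip files that are not plain text
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed.append(path)
    return changed
```

A typical use is renaming a link target across a site, for example `replace_in_tree("html", "old.html", "new.html")`.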
| Before | After | Arguments |
|---|---|---|
| <center> | <CENTER> | (none*) |
| <hr noshade> | <HR noshade> ...or... <HR NOSHADE> | (none) ...or... -ta |
| <body bgcolor=white text="black"> | <BODY bgcolor=white text="black"> ...or... <BODY BGCOLOR=WHITE TEXT="BLACK"> | (none) ...or... -ta |
| <h1>WEBTOOLBOX<br>Administrator and User Guide</h1> | <H1>WEBTOOLBOX<BR>Administrator and User Guide</H1> | (none*) |
* Note: -ta not relevant here, because there are no tag attributes present.
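The conversions in the table can be approximated with a single regular expression. This is a minimal sketch, not the script's actual algorithm: the name `convert_tags` and its keyword arguments are hypothetical stand-ins for -tl and -ta, and the pattern deliberately ignores comments, declarations, and attribute values that contain `<` or `>`.

```python
import re

# Matches an opening or closing tag: optional slash, tag name, then the rest up to '>'.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>")

def convert_tags(text, lower=False, attributes=False):
    """Change tag names to upper case (or lower case), optionally with attributes."""
    conv = str.lower if lower else str.upper

    def fix(match):
        slash, name, rest = match.groups()
        if attributes:
            # Simplification: converts attribute names and values alike.
            rest = conv(rest)
        return "<" + slash + conv(name) + rest + ">"

    return TAG_RE.sub(fix, text)

if __name__ == "__main__":
    print(convert_tags("<hr noshade>"))                   # <HR noshade>
    print(convert_tags("<hr noshade>", attributes=True))  # <HR NOSHADE>
```

Text between tags is left untouched, so headings and body text keep their original case.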

    Directory or filename ? :

A directory or filename can be specified via the command-line arguments -xd and -xf respectively. The first prompt (Directory or filename ? :) appears when neither -xd nor -xf is used, or when one is used but the name given is found not to be a valid directory or filename.
There are a number of other arguments that may be used in conjunction with
-l. See Table 2-1 for a summary.
Use -lo to specify/restrict the extent of checking, ie to list links
(without verification), check some links only (local files) or check all
links.
Use -lg to report good links (as well as bad).
Use -lt ttt to alter the timeout value.
Use -xr to restart the log file.

    URL/local directory/local filename ? :

The response may take one of three forms:
The output takes the following form when listing links (using -lo 0) in a local file...
--------------------
WEBTOOLBOX started. Wed Sep 15 23:38:56 1999.
Filename: /usr/arf/web/html/andyf/aflinks.html, # links: 46
Link: tagtype=a, name=href, value=http://cm.bell-labs.com/cm/cs/who/dmr/st.html.
Link: tagtype=a, name=href, value=http://playground.sun.com/pub/ipng/html/INET-IPng-Paper.html.
Link: tagtype=a, name=href, value=http://www.nexor.co.uk/public/rfc/index/rfc.html.
Link: tagtype=a, name=href, value=http://www.cs.utah.edu/csinfo/texinfo/gnats/gnats.html.
Link: tagtype=a, name=href, value=http://www.hwg.org/resources/html/index.html.
Link: tagtype=a, name=href, value=http://WWW.Stars.com/Authoring/HTML/. (Full URL: http://www.stars.com/Authoring/HTML/).
Link: tagtype=a, name=href, value=http://www.gamelan.com/.
Link: tagtype=a, name=href, value=http://www.cgi-resources.com/.
Link: tagtype=a, name=href, value=http://www.perl.com/.
Link: tagtype=a, name=href, value=http://www.iconbazaar.com/.
Link: tagtype=a, name=href, value=http://www.clipart.co.uk/.
Link: tagtype=a, name=href, value=http://www.econ.cbs.dk/~gemal/urlheaven/index.html.
Link: tagtype=a, name=href, value=http://www.data.com/.
Link: tagtype=a, name=href, value=http://www.byte.com/.
The output takes the following form when checking links in a remote web page
(HTML document)...
--------------------
WEBTOOLBOX started. Thu Sep 16 11:11:57 1999.
URL: http://members.netscapeonline.co.uk/beaumontsystems. URL OK, base=http://members.netscapeonline.co.uk/beaumontsystems/, contenttype=text/html, # links: 7
Link: tagtype=a, name=href, value=http://members.netscapeonline.co.uk/beaumontsystems/products/uwipnifss.html.
ERROR: Problem accessing http://members.netscapeonline.co.uk/beaumontsystems/products/uwipnifss.html, response code=404, message=Not Found
Link: tagtype=a, name=href, value=http://members.netscapeonline.co.uk/beaumontsystems/products/netdbov.html.
ERROR: Problem accessing http://members.netscapeonline.co.uk/beaumontsystems/products/netdbov.html, response code=404, message=Not Found
If you anticipate a large number of links, you may prefer to run WebToolbox during a less busy period (perhaps overnight).
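To make the two modes above concrete (listing links versus verifying them), here is a small sketch built on Python's standard library. It is an assumption-laden illustration rather than the script's implementation: `LINK_ATTRS`, `list_links` and `check_url` are hypothetical names, the set of link-bearing attributes is a guess, and the output lines merely imitate the sample log format.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser

# Assumed set of attributes that carry links; the real tool may use a different set.
LINK_ATTRS = {"href", "src"}

class LinkLister(HTMLParser):
    """Collects (tag, attribute, value) triples for link-bearing attributes."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in LINK_ATTRS and value:
                self.links.append((tag, name, value))

def list_links(html):
    """Return log-style lines for every link found, without verifying them."""
    parser = LinkLister()
    parser.feed(html)
    parser.close()
    return [f"Link: tagtype={t}, name={n}, value={v}." for t, n, v in parser.links]

def check_url(url, timeout=20):
    """Try to fetch a URL; return (ok, detail). The timeout mirrors the -lt default."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return True, f"response code={response.status}"
    except urllib.error.HTTPError as err:
        return False, f"response code={err.code}, message={err.reason}"
    except (urllib.error.URLError, OSError) as err:
        return False, f"message={err}"
```

Because each remote check may wait up to the full timeout, a page with many dead links can take a long time to verify, which is why listing-only mode is useful for a quick survey.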
HTML meta tags contain meta-information about the page itself. This information is not displayed normally (when a person views the web page with a browser), but applications (and this includes search engines) can, and do, make use of it.
--------------------
WEBTOOLBOX started. Thu Nov 25 14:13:24 1999.
Action: 3.
Filename: xfileov.html
[TagName] copyright [TagValue] (c) 1999 Beaumont Systems Ltd.
[TagName] description [TagValue] XFILE, a utility for managing files
[TagName] keywords [TagValue] XFILE, file utility, link verification, Perl
Filename: netdbov.html
[TagName] copyright [TagValue] (c) 1999 Beaumont Systems Ltd.
[TagName] keywords [TagValue] NETDB, network publication, network application, CGI
Filename: wtbov.html
[TagName] copyright [TagValue] (c) 1999 Beaumont Systems Ltd.
[TagName] description [TagValue] WEBTOOLBOX, a utility for website management
[TagName] keywords [TagValue] WEBTOOLBOX, website management, link verification, URL submission, website promotion, search engine, Perl
Filename: wtb.html
[TagName] copyright [TagValue] (c) 1999 Beaumont Systems Ltd.
Filename: xfile.html
[TagName] copyright [TagValue] (c) 1999 Beaumont Systems Ltd.
Filename: wtbov_g.html
[TagName] copyright [TagValue] (c) 1999 Beaumont Systems Ltd.
[TagName] description [TagValue] WEBTOOLBOX, a utility for website management
[TagName] keywords [TagValue] WEBTOOLBOX, website management, link verification, URL submission, website promotion, search engine, Perl
Filename: wtbov_f.html
[TagName] copyright [TagValue] (c) 1999 Beaumont Systems Ltd.
[TagName] description [TagValue] WEBTOOLBOX, a utility for website management
[TagName] keywords [TagValue] WEBTOOLBOX, website management, link verification, URL submission, website promotion, search engine, Perl
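The kind of extraction that produces a listing like the one above can be sketched as follows. This is only an illustration in Python, not the tool's code: `MetaLister` and `format_metatags` are hypothetical names, and the sketch assumes that only meta tags carrying both a `name` and a `content` attribute are reported.

```python
from html.parser import HTMLParser

class MetaLister(HTMLParser):
    """Collects (name, content) pairs from <meta name=... content=...> tags."""
    def __init__(self):
        super().__init__()
        self.metatags = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        fields = dict(attrs)
        name, content = fields.get("name"), fields.get("content")
        # Assumption: tags without both attributes (e.g. http-equiv) are ignored.
        if name and content is not None:
            self.metatags.append((name, content))

def format_metatags(html):
    """Return one log-style line per meta tag found in the document."""
    parser = MetaLister()
    parser.feed(html)
    parser.close()
    return [f"[TagName] {name} [TagValue] {content}" for name, content in parser.metatags]
```

A document with no description or keywords tag simply yields fewer lines, as with wtb.html and xfile.html in the sample output.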
There are a number of other arguments that may be used in conjunction with
-s. See Table 2-1 for a summary.
Use -sd xxx to specify a search engine database other than the
default one (se.dat).
Use -xr to restart the log file.
Use -xf xxx to provide the submission parameters in an input file
(rather than interactively).
Search engines: Information relating to the search engines is contained in a separate data file, referred to as the Search Engine Database. See Section 3.7.4.2.
Submission parameters: The search engines require various parameters relating to submission. See Section 3.7.4.1. There are two ways to provide these parameters: interactively (the default method, not using -xf xxx) or via an input file (using -xf xxx).

    URL to be submitted ? :
    E-mail to be submitted ? :
    Name to be submitted ? :
    Title to be submitted ? :
    Description to be submitted ? :
    Keywords to be submitted ? :

If invoked with the -xf xxx argument, it reads the specified input file in order to obtain the parameters.
The output takes the following form...
--------------------
WEBTOOLBOX started. Sun Nov 14 09:39:22 1999.
Action: 3.
Number of search engines: 40/40/0.
URL: http://www.qwerty.com
Submitting URL to search engine (Altavista)...successful.
Submitting URL to search engine (Excite)...successful.
Submitting URL to search engine (Hotbot)...successful.
Submitting URL to search engine (Infoseek)...successful.
Submitting URL to search engine (Lycos)...successful.
Submitting URL to search engine (Webcrawler)...successful.
Submitting URL to search engine (Search United Kingdom)...successful.
Submitting URL to search engine (Excite United Kingdom)...successful.
Submitting URL to search engine (Lycos United Kingdom)...successful.
Submitting URL to search engine (Cyber Britain United Kingdom)...successful.
Submitting URL to search engine (Acoon Germany)...successful.
Submitting URL to search engine (Altavista Germany)...successful.
Submitting URL to search engine (Blitzsuche Germany)...successful.
Submitting URL to search engine (Infoseek Germany)...successful.
Submitting URL to search engine (Lotse Germany)...successful.
Skipping search engine (Rex Germany), requires description.
Submitting URL to search engine (Spider Germany)...successful.
Submitting URL to search engine (Voila France)...successful.
Submitting URL to search engine (Info Tiger)...successful.
Submitting URL to search engine (Aeiwi)...successful.
Submitting URL to search engine (Anzwers)...successful.
Submitting URL to search engine (Canada)...FAILED!
(Response Code=404, Message=Object Not Found)
Submitting URL to search engine (Claymont)...successful.
Submitting URL to search engine (Crawler Germany)...successful.
Submitting URL to search engine (E-Special Germany)...successful.
Submitting URL to search engine (Euro Ferret)...successful.
Submitting URL to search engine (Excite Australia)...successful.
Submitting URL to search engine (Funky Cat)...successful.
Submitting URL to search engine (Google)...successful.
Submitting URL to search engine (ICQ-It)...successful.
Submitting URL to search engine (Infomak)...successful.
Submitting URL to search engine (Magellan)...successful.
Submitting URL to search engine (Northern Light)...successful.
Skipping search engine (REX), requires description.
Submitting URL to search engine (Sear)...successful.
Submitting URL to search engine (Surf Gopher)...successful.
Submitting URL to search engine (UK MAX Search)...successful.
Submitting URL to search engine (Voila)...successful.
Submitting URL to search engine (What-U-Seek)...successful.
Submitting URL to search engine (World Search Engine)...FAILED!
(Response Code=500, Message=read timeout, chunk 6.)
Number of SE entries in database: 40
Number of valid entries: 40
Number of valid entries with missing parameters: 2
Number of invalid entries: 0
Number of SEs contacted: 38
Number of SEs skipped: 2
Number of successful submissions: 36
Number of unsuccessful submissions: 2
The above example shows: (mostly) successful submissions; some failed submissions;
some search engines being skipped because of insufficient parameters; at the end,
a summary (statistics).
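The relationship between the log lines and the closing statistics can be illustrated with a short tally routine. This is a sketch written against the sample output only; the function name `tally` and the regular expressions are hypothetical, and the real script may compute its summary differently.

```python
import re

# Patterns modelled on the sample log lines above (assumed to be stable).
SUBMIT_RE = re.compile(r"^Submitting URL to search engine \((.+)\)\.\.\.(successful\.|FAILED!)$")
SKIP_RE = re.compile(r"^Skipping search engine \((.+)\),")

def tally(log_lines):
    """Count successful, unsuccessful and skipped submissions in a log."""
    counts = {"successful": 0, "unsuccessful": 0, "skipped": 0}
    for line in log_lines:
        line = line.strip()
        match = SUBMIT_RE.match(line)
        if match:
            key = "successful" if match.group(2) == "successful." else "unsuccessful"
            counts[key] += 1
        elif SKIP_RE.match(line):
            counts["skipped"] += 1
    # Every search engine that was not skipped counts as contacted.
    counts["contacted"] = counts["successful"] + counts["unsuccessful"]
    return counts
```

In the sample summary the same identities hold: 38 contacted is 36 successful plus 2 unsuccessful, and 38 contacted plus 2 skipped gives the 40 database entries.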
The -sd xxx argument makes it possible to create and use your own search engine database. You might want to do this to: define a specific subset of search engines (eg the main ones, or country-specific ones); define a single entry, for test purposes.

    [URL]xxx
    [EMAIL]xxx
    [NAME]xxx
    [TITLE]xxx
    [DESCRIPTION]xxx
    [KEYWORDS]xxx

In all cases, 'xxx' is an appropriate value. The URL parameter is mandatory; the e-mail address is optional but recommended; the other parameters are optional. Note, however, that the more parameters you provide, the more search engines webtool will be able to contact.
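An input file in this format could be read along the following lines. The sketch assumes one `[KEY]value` entry per line, which the original layout suggests but does not state explicitly; the function name `parse_submission_params` is hypothetical, and unknown keys are silently ignored by choice rather than by documented behaviour.

```python
import re

FIELDS = ("URL", "EMAIL", "NAME", "TITLE", "DESCRIPTION", "KEYWORDS")
LINE_RE = re.compile(r"^\[([A-Z]+)\](.*)$")

def parse_submission_params(text):
    """Parse '[KEY]value' lines into a dict; the URL entry is mandatory."""
    params = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # blank or unrecognised line
        key, value = match.group(1), match.group(2).strip()
        if key in FIELDS and value:
            params[key] = value
    if "URL" not in params:
        raise ValueError("the URL parameter is mandatory")
    return params
```

A search engine that requires a parameter missing from the result (for example a description) would then be skipped, as in the sample log.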
| Filename | Type | Use | Notes |
|---|---|---|---|
| wtb.pl | application | - | (Executable) Perl script |
| wtb.exe | application | - | Win32 executable |
| se.dat | data | Search engine database. Contains information on all search engines. | Maintain as necessary. |
| wtb.html | documentation | Administrator and User Guide (this document) | - |
WebToolbox has been tested on the following platforms: