This page shows examples of all of the command line (and command file) inputs to Linklint. There are three main modes of operation:
- Local Site Checking
- Checks pages and links on your site locally, looking for files on the local file system. This is convenient for small sites that you build "at home" which will later be uploaded to an HTTP server. It can also be used for very simple sites that have little CGI.
- HTTP Site Checking
- Checks pages and links on your site by requesting pages via HTTP, just like a browser would. This mode is less efficient than reading directly from the file system because Linklint must make a socket connection for each page/file and your web server must respond to each request.
- Remote URL Checking
- The two site checking options only check pages from a single host computer. Remote URL checking can check all of the links on your site that go to other sites, or a list of specific URLs.
Input Files
There are two types of input files that can be specified on the command line: @command_files, which contain command line options, and @@http_files, which are parsed to find http:// URLs. Command files are indicated with a single @ sign before the file name. Http files are indicated with two @ signs before the file name. [This means that the actual file name of a command file cannot start with an @ sign. Oh well.]
Command Files
linklint @command_file
Reads in command line arguments from command_file. Command files can be nested. Each command file is interpreted line by line. Empty lines and lines beginning with # are ignored. Lines that start with -anything can only contain command line arguments. You can have multiple arguments on one line, and arguments can take repeated parameters in command files only. Example:
# This is a sample command file
#
-host www.linklint.org
-root /www/
-doc linkdoc
-index index.html index.cgi
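The line-by-line rules above can be illustrated with a short Python sketch. This is not Linklint's own parser; the function name `parse_command_file` is hypothetical, and the sketch assumes plain whitespace splitting (no quoting, no nested @files).

```python
def parse_command_file(text):
    """Return the command line arguments found in a command file's text.

    Illustrative only: assumes arguments are separated by whitespace and
    does not handle quoted values or nested @command_files.
    """
    args = []
    for line in text.splitlines():
        line = line.strip()
        # Empty lines and lines beginning with '#' are ignored.
        if not line or line.startswith("#"):
            continue
        # Several arguments (and repeated parameters) may share one line.
        args.extend(line.split())
    return args


sample = """\
# This is a sample command file
#
-host www.linklint.org
-root /www/
-doc linkdoc
-index index.html index.cgi
"""

print(parse_command_file(sample))
```

Note that a linkset such as /# is not mistaken for a comment, because the line starts with "/" rather than "#".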
Reading Commands from STDIN
linklint @ < command_file
linklint @STDIN < command_file
A plain @ sign or @STDIN will cause Linklint to read STDIN and treat it as a command file. This is useful if you want to run Linklint as a configurable CGI program. If no STDIN is available then Linklint will hang waiting for an end-of-file from STDIN. You can also use this mode to "interactively" feed commands to linklint. On Unix, terminate your input with ^D.
Files of Local Pages
If you only want to check the links on one or two pages then just use the path to those pages (starting each path with "/") on the command line instead of /@:
linklint /first/page.html /second/page.html
If you have a long list of pages (on your site) that you want to have link checked (not just the existence of each page, but all of the links on each page) then put the path to each page in a command file and pass that command file (with a leading @ sign) to Linklint:
linklint @local_pages
# local_pages
#
/first/page.html
/second/page.html
/third/page.html
# etc.
If the list of pages you want to check contains full URLs, it is very easy to write a little Perl program to strip off the scheme and host:
perl -ne "s{http://[^/]+}{} and print" full_links.in > rel_links.out
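For readers who prefer Python, the same transformation might look roughly like the sketch below. It mirrors the Perl one-liner: the first http://host prefix on each line is removed, and lines without one are dropped. The function name is hypothetical.

```python
import re

# Same pattern as the Perl one-liner: scheme plus everything up to the first "/".
_SCHEME_HOST = re.compile(r"http://[^/]+")


def strip_scheme_and_host(lines):
    """Return site-relative paths for lines that contain an http:// URL."""
    out = []
    for line in lines:
        stripped, count = _SCHEME_HOST.subn("", line, count=1)
        if count:  # like "and print": keep only lines where a substitution happened
            out.append(stripped)
    return out


print(strip_scheme_and_host([
    "http://www.example.com/first/page.html",
    "not a link",
]))
```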
Files of Remote Links
linklint @@http_file
Checks the status of all http:// references found in http_file. Linklint is very forgiving when looking for links. If the file looks like a remoteX.txt file generated by Linklint then failed URLs will be cross referenced.
linklint -doc linkdoc @@
When you specify @@ with no filename, Linklint will check all the http links found in the file linkdoc/remoteX.txt. You must specify a -doc directory. This is an easy way to recheck all of the remote links on your site.
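To give a feel for what "forgiving" link extraction from an http file can mean, here is a minimal Python sketch that pulls anything starting with http:// out of arbitrary text. The delimiter set (whitespace, quotes, angle brackets) and the de-duplication are assumptions made for illustration; Linklint's actual rules may differ.

```python
import re

# Assumption: a URL ends at whitespace, a quote character, or an angle bracket.
_HTTP_URL = re.compile(r"http://[^\s\"'<>]+")


def find_http_urls(text):
    """Return the distinct http:// URLs in text, in order of first appearance."""
    urls = []
    for url in _HTTP_URL.findall(text):
        if url not in urls:
            urls.append(url)
    return urls


sample = (
    'See <a href="http://www.example.com/a.html">a</a> and '
    "http://www.example.org/b\n"
    "http://www.example.com/a.html\n"
)
print(find_http_urls(sample))
```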
Which Files to Check
Linksets defined
Whether you are doing a local site check or an HTTP site check, you specify which directories (presumably containing HTML files) to check with one or more linksets. A linkset uses two wildcard characters, @ and #. A linkset can select one or more directories, much as the standard * and ? wildcard characters select file names within a single directory.
The @ character matches any string of characters (it acts somewhat like "*"), and the # character (somewhat like "?") matches any string of characters that does not contain "/". The best way to understand how @ and # work is to look at a few examples:
the entire site                         /@
the homepage only (default)             /
files in the root directory only        /#
. . . and one directory down            /#/#
files in the sub directory only         /sub/#
files in the sub directory and below    /sub/@
specific files                          /file1 /file2 ...
specific subdirectories                 /sub1/@ /sub2/@ ...
If you specify more than one linkset, files matching any of the linksets will be checked. HTML files that don't match any of the linksets will be skipped. Linklint will see if they exist but won't check any of their links.
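The wildcard rules can be modeled with a few lines of Python. This is a sketch of the described behavior, not Linklint's implementation: it assumes a linkset must match the whole path, and the helper names are hypothetical.

```python
import re


def linkset_to_regex(linkset):
    """Translate a linkset into a compiled regular expression."""
    parts = []
    for ch in linkset:
        if ch == "@":
            parts.append(".*")      # any string of characters
        elif ch == "#":
            parts.append("[^/]*")   # any string of characters except "/"
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches_any(path, linksets):
    """True if path matches at least one of the linksets (whole-path match)."""
    return any(linkset_to_regex(ls).fullmatch(path) for ls in linksets)


print(matches_any("/sub/deep/page.html", ["/sub/@"]))  # below /sub
print(matches_any("/sub/deep/page.html", ["/sub/#"]))  # not directly in /sub
```

Under this model /#/# matches only paths exactly one directory down, so covering the root directory as well would mean giving both /# and /#/#.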
Other File Selection Options
-skip skipset
Skips HTML files that match skipset. Linklint will make sure these files exist but won't add any of their links to the list of files to check. Multiple skipsets are allowed, but each must be preceded with -skip on the command line. Skipsets use the same wildcard characters as linksets.
-ignore ignoreset
Ignores files matching ignoreset. Linklint doesn't even check to see if these files exist. Multiple ignoresets are allowed, but each must be preceded with -ignore on the command line. Ignoresets use the same wildcard characters as linksets.
-limit n
Limits checking to n HTML files (default 500). All HTML files after the first n are skipped.
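The difference between skipping, ignoring, and hitting the limit is easiest to see side by side. The Python sketch below labels a list of HTML paths according to the descriptions above. It is illustrative only: the precedence (ignore before skip), the treatment of over-limit files as existence-only, and all names are assumptions.

```python
import re


def _matches(path, patterns):
    """Whole-path match where @ is any string and # is any string without '/'."""
    for pattern in patterns:
        regex = "".join(
            ".*" if ch == "@" else "[^/]*" if ch == "#" else re.escape(ch)
            for ch in pattern
        )
        if re.fullmatch(regex, path):
            return True
    return False


def plan_checks(html_paths, linksets, skipsets=(), ignoresets=(), limit=500):
    """Label each HTML path as 'ignore', 'exists-only', or 'check-links'."""
    plan = {}
    checked = 0
    for path in html_paths:
        if _matches(path, ignoresets):
            plan[path] = "ignore"        # existence is not even tested
        elif _matches(path, skipsets) or not _matches(path, linksets):
            plan[path] = "exists-only"   # file must exist; its links are not followed
        elif checked >= limit:
            plan[path] = "exists-only"   # assumption: files past the limit are skipped
        else:
            plan[path] = "check-links"
            checked += 1
    return plan


pages = ["/a.html", "/tmp/x.html", "/old/y.html", "/b.html", "/c.html"]
print(plan_checks(pages, ["/@"], skipsets=["/old/@"], ignoresets=["/tmp/@"], limit=2))
```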
Local Site Checking
If you are developing HTML pages on a computer that does not have an http server, or if you are developing a simple site that does not use Server Redirection or extensive CGI, you should use local site checking.
linklint /@
Checks all HTML files in the current directory and below. Assumes that the current directory is the server root directory so links starting with "/" default to this directory. You must specify /@ to check the entire site. See Which Files to Check for details.
linklint -root dir /@
Checks all HTML files in dir and below. This is useful if you want to check several sites on the same machine or if you don't want to run Linklint in your public HTML directory.
Other Local Site Options
-host hostname
By default Linklint assumes all links on your site that start with http:// are remote links to other sites. If you have absolute links to your own site, give Linklint your hostname and links starting with http://hostname will be treated as local files. If you specify -host hostname:port, only http links to this hostname and port will be treated as local files.
-case
Makes sure that the filename case (upper/lower) used in links inside of HTML tags matches the case used by the file system. This is for Windows only and is very handy if you are porting a site to a Unix host.
-orphan
Checks all directories that contain files used on the site for unused (orphan) files.
-index file
Uses file as the default index file instead of the default list used by Linklint. You can specify more than one file but each one must be preceded by -index on the command line. If a default index file is not found, Linklint uses a listing of the entire directory. See the Default File section for details.
-map /a=[/b]
Substitutes leading /a with /b. For server-side image maps or to simulate Server Redirection.
-no_warn_index
Turns off the "index file not found" warning. Applies to local site checking only.
-no_anchors
Tells Linklint to ignore named anchors. This could ease memory problems for people with large sites who are primarily interested in missing pages and not missing named anchors. This option works for both HTTP and local site checks.
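Two of the options in this section, -host and -map, amount to rewriting a link before it is looked up. The Python sketch below models both under stated assumptions: a -host without a port accepts any port, a -host with a port requires that port to appear explicitly in the link, and "/a=" with nothing after the equals sign simply drops the prefix. The function names are hypothetical and this is not Linklint's code.

```python
from urllib.parse import urlsplit


def localize(link, host):
    """Return the local path for an http:// link to host, or None if it is remote."""
    parts = urlsplit(link)
    if parts.scheme != "http":
        return None
    want_host, _, want_port = host.partition(":")
    if parts.hostname != want_host.lower():
        return None
    # Assumption: when a port is given, the link must name that same port.
    if want_port and parts.port != int(want_port):
        return None
    return parts.path or "/"


def apply_map(path, mapping):
    """Apply one -map spec such as '/a=/b' to a path with a leading /a."""
    old, _, new = mapping.partition("=")
    if path.startswith(old):
        return new + path[len(old):]
    return path


print(localize("http://www.example.com/docs/a.html", "www.example.com"))
print(apply_map("/cgi-bin/map/x", "/cgi-bin=/scripts"))
```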
HTTP Site Checking
If you have a complicated site that uses lots of CGI or Server Redirection, you should use HTTP site checking. Even though an HTTP site check reads pages via your HTTP server, you will get the best performance if you do your checking on a machine that has a high speed connection to your server.
linklint -http -host www.site.com /@
The -http flag tells Linklint to check HTML files on the site www.site.com via a remote http connection. You must specify a -host whenever you do an HTTP site check (otherwise Linklint won't know where to get your pages). You can specify /@ to check the entire site. See Which Files to Check for details.
HTTP Site Check Options
-http
This flag tells Linklint to perform an HTTP site check instead of a local site check. All files (except server side image maps) will be read via the HTTP protocol from your web server.
-host hostname:port
If you include :port at the end of your hostname, Linklint uses this port for the HTTP site check.
-password realm user:password
Uses user and password as authorization to enter password protected realm. Realms are named areas of a site that share a common set of usernames and passwords. If passwords are needed to check your site, Linklint will tell you which realms need passwords in warning messages. Enclose the realm in double quotes if it contains spaces. If no password is given for a specific realm, Linklint will try using the password for the "DEFAULT" realm if it was provided.
-timeout t
Times out after t seconds (default 15) when getting files via http. Once data is received, an additional t seconds is allowed. The timeout is disabled on Windows machines since the Windows port of Perl does not support the alarm() function.
-delay d
Delays d seconds between requests (default 0). If you want to remote check in the background you can set delay to a large number, and Linklint will spend most of its time sleeping.
-local linkset
Gets files that match linkset locally. The default -local linkset is @.map (which matches any link ending in .map). This allows Linklint to follow links through server-side image maps. The default is ignored if you specify your own -local expressions. You need to specify the -root directory for this option to work properly.
-map /a=[/b]
Substitutes leading /a with /b. For server-side image maps or to simulate Server Redirection.
-no_query_string
Up until version 2.3.4, Linklint did not use query strings while doing HTTP site checks. Query strings were removed before making HTTP requests. As of 2.3.4 query strings in links are used in the requests. Use the -no_query_string flag to get back the "old" behavior.
-http_header "Name:value"
Adds the HTTP header "Name: value" to all HTTP requests generated by Linklint. You will need to use quotation marks to hide spaces in the header line from the command line interpreter. Linklint will automatically add a space after the first colon if there is not one there already. Multiple (unique) header lines are allowed.
-language zz
This option is only useful if you are checking a site that uses content negotiation to present the same URL in different languages. Creates an HTTP Request header of the form "Accept-Language: zz" that is included as part of all HTTP requests generated by Linklint. Multiple -language specifications are allowed. This will result in a single Accept-Language: header that lists all of the languages you have specified in alphabetical order. Some web sites can use this information to return pages to you in a specific language.
If you need to get more complicated than this, use the more general purpose -http_header to create your own header. There is a partial list of language abbreviations (taken from Debian) included as part of the Linklint documentation.
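The header rules described for -http_header and -language can be sketched in Python as follows. The function names are hypothetical, and joining the languages with ", " is an assumption based on the usual Accept-Language syntax rather than something stated above.

```python
def normalize_header(header):
    """Ensure there is a space after the first colon of a 'Name:value' header."""
    name, sep, value = header.partition(":")
    if not sep:
        raise ValueError("header must contain a colon: %r" % header)
    if not value.startswith(" "):
        value = " " + value
    return name + ":" + value


def accept_language(languages):
    """Build a single Accept-Language header listing languages alphabetically."""
    # Assumption: entries are separated by ", ".
    return "Accept-Language: " + ", ".join(sorted(set(languages)))


print(normalize_header("X-Checked-By:linklint"))
print(accept_language(["fr", "de", "en"]))
```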
Remote URL Checking
A remote URL check is used to see if a remote URL exists (or has been recently modified). Links in the remote pages are not checked nor does Linklint look for named anchors in remote URLs.
Which URLs to check
Remote URL checking can be used to check all of the "remote" links on your site (those that link to pages on other sites) or it can check a list of URLs. There are several ways to specify which remote URLs to check:
linklint http://somehost/file.html
Checks to see if /file.html exists on somehost. Multiple URLs can be entered on the command line, in an @commandfile, or in an @@httpfile. Every URL to be checked must begin with http://. This will disable site checking.
linklint @@httpfile
Checks all the remote http URLs found in httpfile. Anything in the file starting with http:// is considered to be a URL. If the file looks like a remoteX.txt file generated by Linklint then all failed URLs will be cross referenced.
linklint @@ -doc linkdoc
Assuming you have already done a site check and used "-doc linkdoc" to put all of your output files in the linkdoc directory, Linklint will check all the remote links that were found on your site and cross reference all failed URLs without doing a site check. You can use the -netmod or -netset flags to enable the status-cache.
linklint -net [site check options]
The -net flag tells Linklint to check all remote links after doing either a local or HTTP site check. If you are having memory problems, don't use the -net option; use one of the @@ options above instead.
Other Remote URL Options
-timeout t
Times out after t seconds (default 15) when getting files via http. Once data is received, an additional t seconds is allowed. The timeout is disabled on Windows machines since the Windows port of Perl does not support the alarm() function.
-delay d
Delays d seconds between requests to the same host (default 0). This is a friendly thing to do especially if you are checking many links on the same host.
-redirect
Checks for <meta> redirects in the headers of remote URLs that are html files. If a redirect is found it is followed. This feature is disabled if the status cache is used.
-proxy hostname[:port]
Sends all remote HTTP requests through the proxy server hostname and the optional port. This allows you to check remote URLs or (new with version 2.3.1) your entire site from within a firewall that has an http proxy server. Some error messages (relating to host errors) may not be available through a proxy server.
-concise_url
Turns off printing successful URLs to STDOUT during remote link checking.
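The per-host behavior of -delay during remote checking can be pictured with a small Python sketch that tracks when each host was last contacted and reports how long to wait before the next request. The class name and the explicit `now` parameter (used here so the example does not depend on the real clock) are assumptions made for illustration.

```python
from urllib.parse import urlsplit


class HostDelay:
    """Track the last request time per host and compute the wait before the next."""

    def __init__(self, delay):
        self.delay = delay   # seconds between requests to the same host
        self._last = {}      # host -> time at which the last request was sent

    def wait_needed(self, url, now):
        host = urlsplit(url).hostname
        last = self._last.get(host)
        wait = 0.0 if last is None else max(0.0, self.delay - (now - last))
        # Record when this request will actually go out.
        self._last[host] = now + wait
        return wait


pacer = HostDelay(5)
print(pacer.wait_needed("http://a.example/x", now=100.0))  # first request to this host
print(pacer.wait_needed("http://b.example/y", now=101.0))  # different host
print(pacer.wait_needed("http://a.example/z", now=102.0))  # same host again
```

A caller would sleep for the returned number of seconds before sending the request.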
Status Cache Options
The Status Cache is a very powerful feature. It allows you to keep track of recent changes in all of the remote (off-site) pages you link to. You can then use the Linklint output files to quickly check changed pages to see if they still meet your needs.
The flags below make use of the status cache file linklint.url (kept in your HOME or LINKLINT directory). This file keeps track of the modification dates of all the remote URLs that you check.
-netmod
Operates just like -net but makes use of the status cache. Newly checked URLs will be entered in the cache. Linklint will tell you which (previously cached) URLs have been modified since the last -netset.
-netset
Like -netmod but also resets the last modified status in the cache for all URLs that checked ok. If you always use -netset, modified URLs will be reported just once.
-retry
Only checks URLs that have a host fail status in the cache. Sometimes a URL fails because its host is temporarily down. This flag enables you to recheck just those links. An easy way to recheck all the cached URLs with host failures is linklint @@ -retry. Use linklint @@linkdoc/remoteX.txt -retry if you want failed URLs to be cross referenced.
-flush
Removes all URLs from the cache that are not currently being checked. The -retry flag has no effect on which URLs are flushed.
-checksum
Ensures that every URL that has been modified is reported as such. This flag can make the remote checking take longer. Many of the pages that require a checksum are dynamically generated and will always be reported as modified.
-cache directory
Reads and writes the linklint.url cache file in this directory. The default directory is set by your LINKLINT or HOME environment variables.
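The difference between -netmod and -netset, and the effect of -flush, can be modeled with an in-memory dictionary. The Python sketch below is only a model of the behavior described above: the real linklint.url file format is not shown on this page, and the function names and the use of None for a failed URL are assumptions.

```python
def check_with_cache(cache, results, reset=False):
    """Report URLs whose modification stamp differs from the cached one.

    cache:   dict mapping URL -> last known modification stamp
    results: dict mapping URL -> stamp seen now, or None if the URL failed
    reset:   False models -netmod, True models -netset
    """
    modified = []
    for url, stamp in results.items():
        if stamp is None:
            continue                 # failed URLs leave the cache untouched
        if url not in cache:
            cache[url] = stamp       # newly checked URLs are entered in the cache
        elif cache[url] != stamp:
            modified.append(url)
            if reset:
                cache[url] = stamp   # -netset: report this change only once
    return modified


def flush(cache, current_urls):
    """Model of -flush: drop cached URLs that are not currently being checked."""
    for url in list(cache):
        if url not in current_urls:
            del cache[url]


cache = {}
check_with_cache(cache, {"http://a.example/": "Mon"})
print(check_with_cache(cache, {"http://a.example/": "Tue"}))              # reported
print(check_with_cache(cache, {"http://a.example/": "Tue"}, reset=True))  # reported, then reset
print(check_with_cache(cache, {"http://a.example/": "Tue"}, reset=True))  # no longer reported
```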
Output Options
By default no output files are generated; only progress and a brief summary of the results are printed to the screen. You can produce complete documentation (split up into separate files) in a -doc directory, put selected output in a single -out file, or redirect the standard output to a file. See the Output File Specification section for a detailed description of all output files.
Multi File Output
linklint -doc linkdoc
Sends all output to the linkdoc directory. The output is divided into separate .txt and .html files. Complete documentation is always produced regardless of the single file flags. The file index.txt contains an index to all the other files; index.html is an HTML version of the index. The index files for remote URL checking are url_index.txt and url_index.html.
-textonly
Prevents any HTML files from being created in the -doc directory.
-htmlonly
Erases redundant text files in the -doc directory after they have been used to create the HTML output files. The files remote.txt and remoteX.txt are not erased since they can be used by Linklint to recheck remote URLs.
-docbase base
Overrides the default base expression used for directing a browser to the resources listed in the output HTML files. The base is prepended to local links in the output HTML files. This only affects the links in HTML output files; it has no effect on what is displayed in these files. Ordinarily this flag would only be used during a local site check to set the base to http://host.
-output_frames
All HTML output data files are linked to from index.html. If you use this flag then the data files will be opened up in a new frame (window), which can be handy in some cases since it always leaves the index.html file open in its own window.
-output_index filename
The output index files were previously named linklint.txt and linklint.html. These have now been changed to index.txt and index.html. You can use the -output_index option to change this name back to linklint or to something else.
-url_doc_prefix url/
By default, the output files associated with remote URL checking all start with "url". You can change this with the -url_doc_prefix option. If the url_doc_prefix contains a "/" character then the appropriate directory will be created (as a subdirectory of the -doc directory).
-dont_output xxxx
Don't create output files that contain "xxxx". Can be repeated. Example: -dont_output "X$" will suppress the output of all cross reference files.
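The "X$" example suggests that the -dont_output argument behaves like a regular expression tested against the output file's base name. The Python sketch below works under that assumption (base names without the .txt or .html extension, case-sensitive search); it is not taken from Linklint itself, and the function name is hypothetical.

```python
import re


def kept_outputs(base_names, dont_output):
    """Return the output base names that no -dont_output pattern matches."""
    patterns = [re.compile(p) for p in dont_output]
    return [
        name for name in base_names
        if not any(p.search(name) for p in patterns)
    ]


# Base names taken from files mentioned elsewhere on this page.
print(kept_outputs(["index", "remote", "remoteX"], ["X$"]))
```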
Single File Output
linklint -error > linklint.out
Lists all errors to linklint.out. Progress and summary information will not be included. You can get cross referenced lists with the -xref flag or lists sorted by the files containing errors with the -forward flag.
linklint -error -out linklint.out
Lists all errors and a brief summary to linklint.out. You can get cross referenced lists, etc., as in the example above.
-out file    sends list output and summary information to file
-list        lists all found files, links, directories etc.
-error       lists missing files and other errors
-warn        lists all warnings
-xref        adds cross references to the lists
-forward     sorts lists by referring file
Debug and other Flags
Debug Flags
-db1    debugs command line input and linkset expressions
-db2    prints the name of every file that gets checked (not just HTML files)
-db3    debugs HTML parser, prints out tags and resulting links
-db4    debugs socket connection (kind of)
-db5    not used
-db6    details last-modified status for remote URLs (requires -netset or -netmod)
-db7    prints brief debug information while checking remote URLs
-db8    prints all http headers while checking remote URLs
-db9    generates random http errors
Other Flags
Use linklint with no command line arguments to get simple usage.
-version     gives version information
-help        lists a few simple examples of how to use Linklint
-help_all    lists all help (contained in program) including every input option
-quiet       disables printing progress to the screen
-silent      disables printing summaries to the screen
© Copyright 1997 - 2001 James B. Bowlin