This page provides a detailed description of every output file created by Linklint.
-doc Directory
If a -doc directory is specified, all output files are written in that directory. The directory will be created if it does not already exist. Two sets of files are created: one for site checks and the other for remote URL checks. New files are written only as needed, but all previously written files from a set are erased before new files from that set are written. Each set is independent of the other. The site-check files are erased only if you do another site check, and the url-check files are erased only if you check remote URLs again.
Many of the output files now come in HTML and text versions. The HTML versions look like the text versions but contain hyper-links to the resources they refer to. Files that come in both versions are listed without an extension.
Some of the output files are part of a family. Members of a family are related by a simple mnemonic: a trailing X means cross referenced and a trailing F means forward referenced. For example, error lists all missing files, errorX lists the missing files and all HTML files that reference them, and errorF lists all HTML files that contain links to missing files along with the names of the missing files.
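To make the X/F mnemonic concrete, here is a minimal Python sketch that derives all three members of the error family from a single forward-link table. It is an illustration and not Linklint's own code. The names `links`, `found`, and `error_family` are hypothetical. `links` maps each HTML file to the links it contains, and `found` is the set of files that exist.

```python
from collections import defaultdict

def error_family(links, found):
    """Return (error, errorX, errorF) built from a forward-link table.

    links: dict mapping an HTML file to the list of links it contains.
    found: set of files that were found on the site.
    """
    error_x = defaultdict(set)  # missing file -> HTML files that reference it
    error_f = defaultdict(set)  # HTML file -> missing files it references
    for page, targets in links.items():
        for target in targets:
            if target not in found:
                error_x[target].add(page)
                error_f[page].add(target)
    error = sorted(error_x)  # plain list: just the missing files
    cross = {missing: sorted(pages) for missing, pages in error_x.items()}
    forward = {page: sorted(missing) for page, missing in error_f.items()}
    return error, cross, forward

links = {
    "/index.html": ["/a.html", "/gone.html"],
    "/b.html": ["/gone.html", "/lost.gif"],
}
error, error_x, error_f = error_family(links, {"/index.html", "/a.html", "/b.html"})
print(error)    # ['/gone.html', '/lost.gif']
print(error_x)  # {'/gone.html': ['/b.html', '/index.html'], '/lost.gif': ['/b.html']}
print(error_f)  # {'/index.html': ['/gone.html'], '/b.html': ['/gone.html', '/lost.gif']}
```

The same pair of dictionaries, inverted in opposite directions, is all that separates an X list from an F list.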
Site-Check Output Files
During local site checking, local links ending in "/" are kept as-is to maintain consistency with HTTP site results. However, if a default index file was found, it is listed below the link in square brackets. If a directory listing is used because no default file was found, this information is listed in the brackets instead. Any link that Linklint knows has been redirected (either by the http server or with the -map option) gets an extra line in the output files showing, in parentheses, the original link it was mapped from. See the Server Redirection section for more details.
Site-Check Summary Files
index.html
a hyperlinked index to all site-check files created. On Windows machines this file will be named linklint.htm. This list differs from the summary list in two ways. First, it contains entries for every file written, including the cross referenced and forward referenced lists that are missing from the summary. Second, it lacks the detailed divisions by file type and external link scheme that exist in the summary.
index.txt
text version of the index file.
summary.txt
summary of site-check results, similar to what gets printed to the screen. Every non-empty data family (listed below) will cause a line to be printed in the summary. In addition, external links are listed by scheme: http, https, ftp, javascript, mailto, gopher, file, news, view-source, about, unknown. Found and missing files are listed by file type:
- cgi: starting with "/cgi-bin" or containing "?" or .cgi or .pl
- default index: ending with "/"
- HTML: ending with .htm .html or .shtml
- map: ending with .map
- image: ending with .gif .jpg .jpeg .tif .tiff .pic .pict .hdf .ras .xbm
- text: ending with .txt
- audio: ending with .au .snd .wav .aif .aiff .midi .mid
- video: ending with .mpg .mpeg .avi .qt .mov
- shockwave: ending with .dcr
- applet: ending with .class
- other: all other files
These distinctions sort files (mostly) by extension. They are not true indications of the tag the link was found inside, nor of the MIME format returned by your server.
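The file-type rules above can be approximated with a short Python function. This is a sketch and not Linklint's implementation. It checks the cgi rules first, treats .cgi and .pl as suffixes, and compares extensions case-insensitively. All three choices are assumptions. The name `file_type` is hypothetical.

```python
import posixpath

# Extension table transcribed from the file-type rules above.
EXTENSIONS = {
    "HTML": {".htm", ".html", ".shtml"},
    "map": {".map"},
    "image": {".gif", ".jpg", ".jpeg", ".tif", ".tiff", ".pic", ".pict",
              ".hdf", ".ras", ".xbm"},
    "text": {".txt"},
    "audio": {".au", ".snd", ".wav", ".aif", ".aiff", ".midi", ".mid"},
    "video": {".mpg", ".mpeg", ".avi", ".qt", ".mov"},
    "shockwave": {".dcr"},
    "applet": {".class"},
}

def file_type(path):
    """Classify a link path by the summary's file-type rules (approximation)."""
    # cgi rules come first; treating .cgi/.pl as suffixes is an assumption.
    if path.startswith("/cgi-bin") or "?" in path or path.endswith((".cgi", ".pl")):
        return "cgi"
    if path.endswith("/"):
        return "default index"
    # Case-insensitive extension matching is also an assumption.
    ext = posixpath.splitext(path)[1].lower()
    for kind, extensions in EXTENSIONS.items():
        if ext in extensions:
            return kind
    return "other"

for link in ("/cgi-bin/search", "/docs/", "/img/Logo.GIF", "/report.pdf"):
    print(link, "->", file_type(link))
```

For the four sample links this prints cgi, default index, image, and other. As the text notes, nothing here looks at the enclosing tag or the server's MIME type.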
log.txt
log of site-check progress, similar to what gets printed to the screen but also includes extra output created when -db flags are used.
Site-Check Data Files
action, actionX
a list of all ignored actions. Any link found inside a <form action=LINK> tag is listed here, but it is not checked since normally extra input from the form is required.
anchor, anchorX
a list of all named anchors found. The anchorX file lists all files that use each named anchor. If a named anchor is not referenced by any of the files checked by Linklint, it will have no cross references. Named maps are also included in this list.
case, caseX, caseF
If -case was used during a local site check on a Windows machine, all files that have references that do not match the (upper/lower) case of the file are listed here. This is handy if you are developing a site on a Windows machine that you are planning to port to a Unix server.
dir.txt
a list of all the directories that contain files used by the site. Only created during local site checking.
error, errorX, errorF
a list of all missing files. A file is listed as missing if:
- it is referenced by a file checked by Linklint, and
- it is a local file, and
- it does not match any -ignore expression, and
- Linklint could not find the file locally, or an error occurred when Linklint tried to get the file via http.
See the Parsing Html section for details.
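Read as a predicate, the four conditions above combine with a logical AND. The sketch below is illustrative only. `is_missing` and its two callables are hypothetical stand-ins for Linklint's -ignore matching and for its local or http lookup.

```python
from typing import Callable

def is_missing(link: str, referenced: bool, is_local: bool,
               is_ignored: Callable[[str], bool],
               retrieve_ok: Callable[[str], bool]) -> bool:
    """True only when every listed condition holds."""
    return (referenced                   # referenced by a checked file
            and is_local                 # a local file, not a remote URL
            and not is_ignored(link)     # matches no -ignore expression
            and not retrieve_ok(link))   # not found locally, or http error

never = lambda link: False
print(is_missing("/gone.html", True, True, never, never))  # True
print(is_missing("/gone.html", True, False, never, never))  # False: remote link
```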
errorA, errorAX
a list of missing named anchors. A named anchor is missing if:
- it is referenced by a file checked by Linklint, and
- it is located inside of a local file, and
- the file it is in is not -ignored or -skipped, and
- Linklint could not find the file, or the file was found but the named anchor does not exist.
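The anchor rules differ from the file rules mainly in the last step, where a missing file and a missing name inside a found file both count. In this hypothetical sketch, `anchors` maps each found file to the set of anchor names it defines, and `excluded` stands in for the -ignore and -skip tests.

```python
from typing import Callable, Dict, Set

def is_missing_anchor(link: str, referenced: bool, is_local: bool,
                      excluded: Callable[[str], bool],
                      anchors: Dict[str, Set[str]]) -> bool:
    """link looks like "/page.html#name"."""
    page, _, name = link.partition("#")
    if not (referenced and is_local) or excluded(page):
        return False                   # first three conditions not all met
    if page not in anchors:
        return True                    # the file itself could not be found
    return name not in anchors[page]   # file found, named anchor absent

anchors = {"/faq.html": {"top", "install"}}
keep = lambda page: False
print(is_missing_anchor("/faq.html#top", True, True, keep, anchors))    # False
print(is_missing_anchor("/faq.html#usage", True, True, keep, anchors))  # True
print(is_missing_anchor("/old.html#top", True, True, keep, anchors))    # True
```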
errorM, errorMX
a list of missing named client-side image maps. The rules for inclusion in this list are the same as those for missing named anchors.
file, fileX, fileF
a list of all files found on the site. file is a list of all files found, sorted by file type; fileX is a cross referenced version, showing a sublist of all the HTML files that reference each file in the list; fileF lists each HTML file and a sublist of all of the links it references. These lists are meant to show file dependencies, so multiple links to the same file result in a single listing. Likewise, a named anchor causes the file containing the anchor to be listed, but the actual named anchors are listed separately.
httpfail
a list of all the http errors that occurred while trying to remote check a site.
httpok
a list of all the files that were obtained without error while remote checking a site. If the -db6 flag is used and the status-cache is enabled, these entries are expanded to include the following extra information:
- ok (200)
- ok parsed HTML
- ok skipped
ignore, ignoreX
a list of all ignored files.
imgmap, imgmapX
a list of named image maps for client-side image maps, taken from the tags <img usemap=NAME> and <map name=NAME>. The format of this list is the same as the one used for named anchors.
mapped
a list of all the redirected files that were found while checking a site either locally or remotely.
Some servers automatically change the name of a link. For example, a link to http://host/subdir will automatically be mapped to http://host/subdir/ if subdir is a directory. Many other mappings are possible. Linklint follows these mappings and treats the resulting file as the actual link (just as a browser would).
This is a potential cause for confusion. If your index.html file has a link to A.html and this gets mapped to B.html, Linklint will tell you that index.html has a link to A.html and that it has a link to B.html, but only B.html will be listed as a found file.
See the Server Redirection section for more details.
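Following a chain of mappings, as described above, amounts to repeated lookups with a guard against loops. In this hypothetical sketch, the dictionary `redirects` stands in for whatever the server (or -map) reports. `resolve` returns the final link together with the chain it passed through. The hop limit of 10 is an arbitrary assumption.

```python
def resolve(link, redirects, max_hops=10):
    """Follow redirects starting at link; return (final_link, chain)."""
    chain = [link]
    while link in redirects:
        link = redirects[link]
        # A repeated link means an infinite loop; a very long chain is treated the same way.
        if link in chain or len(chain) > max_hops:
            raise ValueError("redirect loop or chain too long: "
                             + " -> ".join(chain + [link]))
        chain.append(link)
    return link, chain

redirects = {"/A.html": "/B.html", "http://host/subdir": "http://host/subdir/"}
print(resolve("/A.html", redirects))  # ('/B.html', ['/A.html', '/B.html'])
```

Here only `/B.html` would count as the found file, while `/A.html` would appear as the original link it was mapped from.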
orphan
If the -orphan flag was used during a local site check, all of the unused files and subdirectories in each directory that contains files used by the site are listed here, sorted by directory. If an orphan HTML file contains a meta refresh tag redirecting the visitor to a different file, this new file is listed under its parent, preceded by " =>". This method of redirection is often used to steer visitors to the current version of a file without requiring them to change their bookmarks.
remote, remoteX
a list of all references that are not to local files. The remoteX file lists which HTML files link to these resources.
skipped, skipX
a list of all files that were skipped by Linklint. These are generally HTML files which were found to exist but were not checked further because:
- they did not match any of the linksets specified, or
- they did match one of the -skip expressions, or
- more than -limit files had already been checked.
warn, warnX, warnF
a list of all warnings that were generated during the site check. Warnings include: unexpected I/O errors, HTML errors such as unterminated comments, missing index files (during local site checks), space characters inside of links, the use of "\" inside of links, files that are not world readable, mappings that cause infinite loops, and meta refresh tags that redirect to relative URLs.
For HTTP site checking, warnings are also generated for: files disallowed by robots.txt, files mapped to a different server, files that require a username and password (when none was provided), invalid usernames and passwords, and files mapped to non-http schemes.
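Some of the warnings above can be detected from the link text alone. The hypothetical `link_warnings` function below checks three of them: spaces, backslashes, and relative meta refresh targets. The message strings are made up for illustration and are not Linklint's exact wording.

```python
from urllib.parse import urlsplit

def link_warnings(link, from_meta_refresh=False):
    """Return warnings that can be detected from the link text alone."""
    warnings = []
    if " " in link:
        warnings.append("space character inside link")
    if "\\" in link:
        warnings.append('"\\" inside link')
    # A meta refresh target with no scheme (http:, https:, ...) is relative.
    if from_meta_refresh and not urlsplit(link).scheme:
        warnings.append("meta refresh to relative URL")
    return warnings

print(link_warnings("my page.html"))                       # ['space character inside link']
print(link_warnings("next.html", from_meta_refresh=True))  # ['meta refresh to relative URL']
```

Most of the other warnings, such as unreadable files or robots.txt exclusions, depend on the file system or the server and cannot be detected from the link text.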
Url-Check Output Files
You may notice that all of the files in this section start with the prefix url. You can change this prefix with the -url_doc_prefix option. The default value is still url for backward compatibility, but I now prefer either url/ or url_.
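If the prefix is simply prepended to each base file name, which is an assumption here, then a value ending in "/" would put the url-check files in a subdirectory of the -doc directory. The helper `url_doc_name` and the directory name `linkdoc` are hypothetical.

```python
import posixpath

def url_doc_name(prefix, base, doc_dir="."):
    """Join the -doc directory with the url-check prefix plus a base file name."""
    return posixpath.join(doc_dir, prefix + base)

for prefix in ("url", "url/", "url_"):
    print(url_doc_name(prefix, "fail.txt", "linkdoc"))
# linkdoc/urlfail.txt
# linkdoc/url/fail.txt
# linkdoc/url_fail.txt
```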
Url-Check Summary Files
urlindex.html
a hyperlinked index to all url-check files created. On Windows machines this file will be named urlindex.htm. This list differs from the summary list in two ways. First, it contains entries for every file written, including host failures, cross referenced lists, and forward referenced lists. Second, it lacks the detailed divisions by failure type that exist in the summary.
urlindex.txt
text version of the index file.
urlsum.txt
summary of url-check results, similar to what gets printed to the screen. The summary list includes an entry for every type of warning or failure.
urllog.txt
log of url-check progress, similar to what gets printed to the screen but also includes extra output created when -db flags are used.
Url-Check Data Files
urlfail, urlfailX, urlfailF
a list of URLs that failed due to one of the following errors:
- could not find ip address
- could not connect to host
- all timeout errors
- had no content (204)
- bad request (400)
- access forbidden (403)
- not found (404)
- internal server error (500)
- service not implemented on server (501)
- server temporarily overloaded (502)
- gateway timeout (503)
urlhost.txt
a list of all hosts that had failures during the url check. This includes:
- could not find ip address
- could not connect to host
- could not open socket
- malformed status line
- timeout errors
- server overloaded (502)
- gateway timeout (503)
URLs that fail due to host failures can be retried with the -retry flag if the status-cache was enabled.
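Grouping failures by host, as urlhost.txt does, is a small bucketing step. In this sketch the reason strings are copied from the host-failure list above. `results` (a mapping from URL to failure reason) and `failed_hosts` are hypothetical. The URLs collected under each host are the kind that the -retry flag is described as retrying.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Host-level failure reasons, copied from the urlhost.txt list above.
HOST_FAILURES = {
    "could not find ip address",
    "could not connect to host",
    "could not open socket",
    "malformed status line",
    "timeout errors",
    "server overloaded (502)",
    "gateway timeout (503)",
}

def failed_hosts(results):
    """Map each host with a host-level failure to its failed URLs.

    results: dict mapping a URL to the failure reason recorded for it.
    Other failures (for example "not found (404)") are left out.
    """
    hosts = defaultdict(list)
    for url, reason in results.items():
        if reason in HOST_FAILURES:
            hosts[urlsplit(url).hostname].append(url)
    return dict(hosts)

results = {
    "http://a.example/x": "could not connect to host",
    "http://a.example/y": "not found (404)",
    "http://b.example/": "gateway timeout (503)",
}
print(failed_hosts(results))
# {'a.example': ['http://a.example/x'], 'b.example': ['http://b.example/']}
```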
urlmod
a list of all URLs that have changed since the last time they were checked with the -netset flag. This list is only generated if -netmod or -netset are specified.
urlmoved
a list of URLs that were reported as redirected by their server. Often this redirection involves nothing more than adding a trailing "/" to a directory name. Sometimes it can be a precursor to a site changing location permanently. Some servers report that the URL has been moved temporarily, while others report that it has been moved permanently. As far as I can tell there is no real distinction between "temporary" and "permanent"; they seem to be used interchangeably.
urlok
a list of all URLs that were found with no errors. If the -db6 flag is used and the status-cache is enabled, these entries are expanded to include the following extra information:
- ok (200)
- ok not modified (304)
- ok last-modified date unchanged
- ok did not compute checksum
- ok checksum matched
urlskip
a list of URLs that were not checked. This is most often caused by the lack of password authorization but could also be due to an exceptional condition such as an infinite redirect loop or an unknown internal Linklint error.
urlwarn, urlwarnX, urlwarnF
a list of warning messages generated while Linklint was doing a url-check. The most common warning tells you the name of a realm which requires a username and password.
© Copyright 1997 - 2001 James B. Bowlin