weblock.io — Robust Linking and Archiving of Websites Cited in Scholarly, Scientific and Legal Documents
Weblock is currently in beta. You can already use the service, but things may still change a bit.
How to Read Weblock Link Reports
Summary: At the top of the report you will find a summary showing the total number of citations, the number of citations with web links, and the percentage of obsolete links (links that are broken for various reasons).
Visualization: Right below the summary you will find a visualization of failed links versus links that resolved successfully, a top-level summary of HTTP response classes, and a more detailed summary of HTTP response codes for failed links.
Log File: Further down in the report you will find the log file with details for each URL that was found in the reference list and tested by the WEBLOCK service. In particular, you will find the HTTP status code that was returned when the service tried to access the link. Status codes 2xx indicate valid links. Status codes 3xx, 4xx and 5xx indicate obsolete links: links that either vanished or need additional, case-specific action to work properly. The status code 444 is not an official status code; our service uses it to indicate links that timed out, i.e. links where the server took too long to respond or where no server responded at all. The latter typically happens when a website is completely gone and the host cannot be resolved to an IP address via DNS.
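The classification rule above can be sketched in a few lines of Python. This is only an illustration, not the WEBLOCK implementation: the function names are hypothetical, and the sketch assumes that the percentage of obsolete links is computed over all tested links, which the report description does not state explicitly.

```python
from collections import Counter


def classify(status: int) -> str:
    """Map a logged status code to 'valid' (2xx) or 'obsolete' (3xx, 4xx, 5xx).

    The WEBLOCK-specific code 444 falls into the 4xx range and is
    therefore treated as obsolete as well.
    """
    if 200 <= status <= 299:
        return "valid"
    if 300 <= status <= 599:
        return "obsolete"
    raise ValueError(f"status code outside the ranges used in the report: {status}")


def obsolete_percentage(statuses: list[int]) -> float:
    """Percentage of obsolete links among all tested links (assumed definition)."""
    if not statuses:
        return 0.0
    obsolete = sum(1 for s in statuses if classify(s) == "obsolete")
    return 100.0 * obsolete / len(statuses)


def response_classes(statuses: list[int]) -> Counter:
    """Count links per response class, e.g. '2xx', '4xx'."""
    return Counter(f"{s // 100}xx" for s in statuses)


if __name__ == "__main__":
    log = [200, 200, 301, 404, 444, 500]  # made-up status codes for illustration
    print(obsolete_percentage(log))
    print(response_classes(log))
```

Note that under this rule a redirect (3xx) counts as obsolete even though the target page may still exist, which matches the description above: such links need additional action to work properly.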
HTTP status codes according to RFC 2616 and additional status codes used by WEBLOCK:
- 2xx Successful: "This class of status code indicates that the client's request was successfully received, understood, and accepted."
- 3xx Redirection: "This class of status code indicates that further action needs to be taken by the user agent in order to fulfill the request."
- 4xx Client Error: "The 4xx class of status code is intended for cases in which the client seems to have erred."
- 5xx Server Error: "Response status codes beginning with the digit "5" indicate cases in which the server is aware that it has erred or is incapable of performing the request."
- 444: This status code is not specified in RFC 2616. It is used in two cases within WEBLOCK: (1) the host could not be resolved via the DNS system (e.g. the domain is no longer registered by anyone, there is no IP address specified for the domain, or there is no server connected to the specified IP address); and (2) the connection to the server timed out, the server took too long to respond, or the connection speed dropped below a minimum acceptable rate. Specifically, this error code is generated when:
  - the host could not be resolved via the DNS system
  - a connection to the host could not be established within 5 seconds
  - the host did not respond within 10 seconds
  - the transfer speed dropped below 1000 bytes/second for at least 15 seconds
- FTP: FTP servers do not respond with HTTP status codes. For links to FTP servers, we register status code 200 if the specified file was found on the FTP server, or status code 400 if the specified file was not found.
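The 444 conditions and the FTP mapping listed above can be expressed as a small decision function. The following is a minimal sketch under stated assumptions, not the code WEBLOCK runs: the `Probe` record and its field names are hypothetical, and the sketch assumes that the timeout checks are applied before the FTP mapping, which the list above does not specify. Only the thresholds (5 seconds, 10 seconds, 1000 bytes/second, 15 seconds) are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the list above.
CONNECT_TIMEOUT_S = 5      # connection must be established within 5 seconds
RESPONSE_TIMEOUT_S = 10    # host must respond within 10 seconds
MIN_SPEED_BPS = 1000       # minimum acceptable transfer speed, bytes/second
LOW_SPEED_WINDOW_S = 15    # tolerated duration below MIN_SPEED_BPS


@dataclass
class Probe:
    """Hypothetical measurements from one link check (illustrative field names)."""
    scheme: str                                # "http", "https" or "ftp"
    dns_resolved: bool = True
    connect_seconds: Optional[float] = None    # None: no connection was made
    response_seconds: Optional[float] = None   # None: no response arrived
    slow_seconds: float = 0.0                  # longest run below MIN_SPEED_BPS
    http_status: Optional[int] = None          # status returned by an HTTP server
    ftp_file_found: Optional[bool] = None      # result of an FTP lookup


def reported_status(p: Probe) -> int:
    """Return the status code that would appear in the log for this probe."""
    if not p.dns_resolved:
        return 444
    if p.connect_seconds is None or p.connect_seconds > CONNECT_TIMEOUT_S:
        return 444
    if p.response_seconds is None or p.response_seconds > RESPONSE_TIMEOUT_S:
        return 444
    if p.slow_seconds >= LOW_SPEED_WINDOW_S:
        return 444
    if p.scheme == "ftp":
        # FTP servers return no HTTP status, so one is assigned.
        return 200 if p.ftp_file_found else 400
    if p.http_status is None:
        raise ValueError("HTTP probe completed without a status code")
    return p.http_status


if __name__ == "__main__":
    gone = Probe(scheme="http", dns_resolved=False)
    missing_file = Probe(scheme="ftp", connect_seconds=0.3,
                         response_seconds=0.8, ftp_file_found=False)
    print(reported_status(gone), reported_status(missing_file))
```

Keeping the decision separate from the network access makes the rule easy to check against recorded measurements without contacting any server.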