Link Checker

Feature Highlights of the Link Checker:

1 Activation steps

This feature is a custom tool that needs to be activated for your Gentics CMS license key. Please contact your account manager for details.

  1. Activate and configure the feature in the respective configuration files
  2. Configure a scheduler task for regularly checking external links in a background job
  3. Configure the Link Checker Custom Tool for the Editor User Interface
  4. Configure user group permissions for the Link Checker Custom Tool

2 Configuration

2.1 Feature activation

/Node/etc/conf.d/*.conf

$FEATURE["link_checker"] = true;

Once the feature is generally activated, it can be turned on and off for each node using the Backend user interface by a user with edit permission on the node:

  1. Choose Features from the Context Menu of the Node in the tree.
  2. Activate the checkbox next to link_checker
  3. Click OK to activate the feature

2.2 Additional configuration

The Link Checker can be configured with the following configuration options:

/Node/etc/conf.d/*.conf

$LINK_CHECKER = array(
	"history_length"     => 5,
	"notify"             => true,
	"debounce"           => 3,
	"read_buffer_size"   => 100,
	"check_buffer_size"  => 10,
	"update_buffer_size" => 100,
	"retry_after"        => 3600,
	"call_timeout"       => 60,
	"connect_timeout"    => 60,
	"write_timeout"      => 60,
	"read_timeout"       => 60
);

| Parameter | Description | Default |
|---|---|---|
| history_length | Number of check results kept for each external link | 5 |
| notify | Whether editors are notified when links become invalid | true |
| debounce | Number of successive invalid checks required before the editor is notified (must be lower than history_length) | 3 |
| read_buffer_size | Size of the buffer for external links while checking. Bigger buffer sizes increase performance at the cost of memory consumption | 100 |
| check_buffer_size | Number of external links checked in parallel. Bigger buffer sizes increase performance at the cost of network traffic | 10 |
| update_buffer_size | Size of the buffer for updating link check results | 100 |
| retry_after | Number of seconds during which a specific host is not checked again if a request returns response code 429 (Too Many Requests) and the response does not contain a “Retry-After” header. See Handling 429 (Too Many Requests) responses | 3600 |
| call_timeout | Timeout in seconds for the overall call for checking a link | 60 |
| connect_timeout | Timeout in seconds to connect to the foreign host | 60 |
| write_timeout | Timeout in seconds for writing the request to the foreign host | 60 |
| read_timeout | Timeout in seconds for reading the response from the foreign host | 60 |
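
The interaction between history_length and debounce can be illustrated with a minimal Python sketch. It is not the Link Checker's actual implementation: the function name should_notify and the representation of check results as booleans are assumptions made for illustration only.

```python
from collections import deque


def should_notify(history, debounce=3):
    """Return True when the most recent `debounce` results are all invalid.

    `history` holds check results, oldest first (True = valid, False = invalid).
    This helper is hypothetical and only mirrors the rule described above.
    """
    recent = list(history)[-debounce:]
    return len(recent) == debounce and not any(recent)


# Keep only the last `history_length` results for a link.
history_length = 5
history = deque(maxlen=history_length)

for result in [True, False, False]:
    history.append(result)
print(should_notify(history))  # False: only two successive invalid checks

history.append(False)
print(should_notify(history))  # True: three successive invalid checks
```

Because the decision looks at the last debounce entries of a history that holds at most history_length entries, debounce has to fit inside the history.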

The Aloha Editor plugin can be configured with the following options:

/Node/etc/conf.d/*.conf

$ALOHA_SETTINGS["plugins"]["gcnlinkchecker"] = array(
	"livecheck" => true,
	"delay"     => 500
);

| Parameter | Description | Default |
|---|---|---|
| livecheck | Whether links shall be checked live (during editing) | true |
| delay | Delay in milliseconds for checking an entered link | 500 |

2.3 Scheduler Task

For automatic execution of the Link Checker, it is necessary to create a schedule for the internal linkchecker task.

2.4 Custom Tool

To enable the Link Checker Custom Tool in the new Editor User Interface, you must add the following to your CMS configuration:

/Node/etc/conf.d/*.conf

$CUSTOM_TOOLS[] = array(
    "id" => 1, // or whatever ID you want this tool to have
    "key" => "linkchecker", // this must be the key for this Custom Tool!
    "toolUrl" => '/tools/link-checker/?sid=${SID}',
    "iconUrl" => "link", // Material Icon name or a URL
    "name" => array(
        "de" => "Link Checker",
        "en" => "Link Checker"
    ),
    "newtab" => false
);

For more information regarding Gentics CMS Custom Tools, please see the Custom Tools documentation.

2.5 User Group permissions

To see the Link Checker Custom Tool in the Editor User Interface, you need to set the group permissions for the respective user groups accordingly.

2.6 Alerts and Alert Center in the Editor User Interface

If broken links are found, the Editor User Interface displays a red exclamation mark in the top-right icon bar. Clicking this icon shows an overview of the alerts in the User Profile Sidebar and offers shortcuts to the details.

3 Checking external links

External links are checked by making a HEAD request to the URL.

  • The check will follow redirects (validity of the final response will be checked)
  • The following response codes will be considered valid: 200 – 299, 401 (Unauthorized), 403 (Forbidden)
  • If the response has code 400 (Bad Request), 404 (Not Found) or 405 (Method Not Allowed), the check will be repeated with a GET request
  • All SSL certificates, including insecure ones, will be accepted for https requests.
  • URLs starting with mailto:, javascript:, file:, callto:, tel:, skype: or # are not checked.
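
The rules above can be summarised in a short Python sketch. It is an illustration under stated assumptions, not the Link Checker's code: check_link, is_valid_status and the send callable are hypothetical names, and send stands in for an HTTP client that follows redirects and returns the final status code.

```python
# Prefixes that the rules above list as "not checked".
SKIPPED_PREFIXES = ("mailto:", "javascript:", "file:", "callto:", "tel:", "skype:", "#")
# Response codes for which the HEAD request is repeated as a GET request.
RETRY_WITH_GET = {400, 404, 405}


def is_valid_status(code):
    """200-299, 401 and 403 count as valid."""
    return 200 <= code <= 299 or code in (401, 403)


def check_link(url, send):
    """Classify a URL as 'skipped', 'valid', 'invalid' or 'unchanged'.

    `send(method, url)` is a hypothetical callable that follows redirects
    and returns the status code of the final response.
    """
    if url.startswith(SKIPPED_PREFIXES):
        return "skipped"
    code = send("HEAD", url)
    if code in RETRY_WITH_GET:
        code = send("GET", url)
    if code == 429:
        # Not treated as invalid; see "Handling 429 (Too Many Requests) responses".
        return "unchanged"
    return "valid" if is_valid_status(code) else "invalid"


def fake_send(method, url):
    # Stand-in server that rejects HEAD but answers GET.
    return 405 if method == "HEAD" else 200


print(check_link("https://example.com/", fake_send))
print(check_link("mailto:editor@example.com", fake_send))
```

Passing the HTTP call in as a function keeps the classification logic separate from networking, so it can be exercised without contacting any host.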

For the Link Checker to successfully check external URLs, the GCMS Server must be allowed to make HTTP/HTTPS requests to all checked hosts. If a proxy is required, the JVM must be started with the parameters -Dhttp.proxyHost=… -Dhttp.proxyPort=… -Dhttp.nonProxyHosts=….

4 Handling 429 (Too Many Requests) responses

Some servers limit the number of allowed requests and might send a response with status 429 (Too Many Requests). Such responses will not be considered invalid, but the URL will remain in the previous status (which may be “unchecked”).

The host of the URL will be blocked for some time: either for the number of seconds returned in the Retry-After header or, if the header is missing, for the configuration value retry_after (which defaults to 3600 seconds = 1 hour).
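
As a rough illustration of this blocking rule, the following Python sketch keeps a per-host "blocked until" timestamp. All names (blocked_until, block_seconds, register_429, is_blocked) are hypothetical, and the handling of a Retry-After value that is not a plain number of seconds (HTTP also permits a date there) is an assumption: the sketch simply falls back to retry_after.

```python
from urllib.parse import urlsplit

# host -> timestamp (in seconds) until which the host is not checked again
blocked_until = {}


def block_seconds(headers, retry_after=3600):
    """Seconds to block a host after a 429 response.

    Uses the Retry-After header when it holds a number of seconds,
    otherwise the configured `retry_after` default (assumption: date
    values are not interpreted in this sketch).
    """
    value = headers.get("Retry-After")
    if value is not None and value.strip().isdigit():
        return int(value.strip())
    return retry_after


def register_429(url, headers, now, retry_after=3600):
    """Record that the host of `url` answered 429 at time `now`."""
    host = urlsplit(url).hostname
    blocked_until[host] = now + block_seconds(headers, retry_after)


def is_blocked(url, now):
    """True while the host of `url` must not be checked."""
    host = urlsplit(url).hostname
    return now < blocked_until.get(host, 0)


register_429("https://example.com/a", {"Retry-After": "10"}, now=1000)
print(is_blocked("https://example.com/b", now=1005))  # same host, still blocked
print(is_blocked("https://example.com/b", now=1010))  # block has expired
```

The block is keyed by host rather than by URL, which matches the statement that the host of the URL is blocked.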

This means that both the live check of URLs and the full check triggered by the scheduler task might not be able to check all URLs.

The full check will sort the URLs by their last status update time, so that URLs that have never been checked or have not been checked for the longest time will be checked first.
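
The ordering can be sketched in Python as follows. The function name check_order and the representation of a link as a (url, last_update) pair, with None meaning "never checked", are assumptions for illustration only.

```python
def check_order(links):
    """Sort (url, last_update) pairs: never-checked first, then oldest first.

    `last_update` is None for URLs that have never been checked,
    otherwise a numeric timestamp.
    """
    return sorted(links, key=lambda link: (link[1] is not None, link[1] or 0))


links = [
    ("https://example.com/recent", 300),
    ("https://example.com/new", None),
    ("https://example.com/old", 100),
]
for url, last_update in check_order(links):
    print(url, last_update)
```

With this order, URLs skipped in one run because their host was blocked are among the first candidates in the next run.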