
CONTENTS

  1. INTRODUCTION
  2. FEATURES
  3. DEPENDENCIES
  4. DOWNLOAD
  5. CONFIGURATION
  6. USAGE
  7. HISTORY

INTRODUCTION

Broken Link Checker (BLC) is a Bash shell script that checks for broken HTTP links extracted from a text file, such as a WordPress post/page XML export file or a file containing a list of URLs.

FEATURES

  • BLC tries to be server-friendly when making queries, for example by delaying consecutive requests to the same domain
  • broken links are saved to a text file along with the HTTP status code and, if there was one, the redirect URL
  • several configuration options are available for fine-tuning how the script works
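The status-code and redirect-URL feature can be sketched with cURL's `-w` (write-out) option. This is a minimal, hypothetical illustration rather than BLC's actual code: the function names `probe_url` and `record_if_broken`, the 15-second timeout, the report line format, and the rule that any non-2xx status counts as broken are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not BLC's own implementation.

# Print "<status> <redirect_url>" for one URL without following redirects.
probe_url() {
    local url=$1
    # -s: silent; -o /dev/null: discard the body; -w: print transfer details.
    # %{redirect_url} is empty when the response is not a redirect.
    curl -s -o /dev/null --max-time 15 \
        -w '%{http_code} %{redirect_url}\n' "$url"
}

# Append "<status> <url> <redirect_url>" to the report unless the status is 2xx.
# Treating every non-2xx status (including curl's 000 on failure) as broken
# is an assumption of this sketch.
record_if_broken() {
    local url=$1 status=$2 redirect=$3 report=${4:-brokenlinks.txt}
    if [[ $status != 2?? ]]; then
        printf '%s %s %s\n' "$status" "$url" "$redirect" >> "$report"
    fi
}
```

Leaving out curl's `-L` option means redirects are not followed, so the 3xx status and its target remain available for the report. A typical call would be `read -r status redirect < <(probe_url "$url")` followed by `record_if_broken "$url" "$status" "$redirect"`.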

DEPENDENCIES

  • bash
  • cURL
  • mapfile (a Bash builtin, available since Bash 4.0)
  • pcre2

DOWNLOAD

ZIP archive: https://codeberg.org/12bytes.org/broken-link-checker/archive/main.zip

TAR.GZ archive: https://codeberg.org/12bytes.org/broken-link-checker/archive/main.tar.gz

CONFIGURATION

Configuration options are stored in an INI-like format in the "## BEGIN USER DEFINED VARIABLES" section of the blc.sh script.

BE VERY CAREFUL not to introduce syntax errors when editing the options. Checking the script for syntax errors with ShellCheck (https://www.shellcheck.net/) is recommended.
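A quick check before running the edited script can be sketched as follows. `bash -n` parses a script without executing it, which catches errors such as an unterminated quote; ShellCheck, if installed, additionally reports lint warnings. The `check_script` helper is hypothetical, and treating ShellCheck warnings as failures is an assumption of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check an edited script before running it.
check_script() {
    local script=$1
    # bash -n reads and parses the script but executes no commands.
    if ! bash -n "$script"; then
        echo "syntax error in $script" >&2
        return 1
    fi
    # ShellCheck, when installed, signals lint warnings with a nonzero exit;
    # this sketch treats those warnings as failures too.
    if command -v shellcheck >/dev/null 2>&1; then
        shellcheck "$script" || return 1
    fi
}
```

For example, `check_script ./blc.sh && ./blc.sh -h` only starts the script if both checks pass.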

Each option is a key-value pair: the key is the part before the '=' character and the value is the part after it. Values are one of 2 types, integer or string, and each key's type is identified by its first letter: i for integer, s for string. Integer values must not be quoted, and floating-point numbers (e.g. 1.2) are not allowed. All string values must be enclosed in single quotes. The following are examples of properly formatted key-value pairs:

iInteger=1

sString='value'
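The formatting rules above can be expressed as a small validator. This is a hypothetical sketch that BLC does not include; it assumes key names consist of letters, digits and underscores, and that integer values are non-negative.

```shell
#!/usr/bin/env bash
# Hypothetical validator for the option format described above.

# Return 0 if the line is a well-formed option, 1 otherwise.
is_valid_option() {
    local line=$1
    # i-keys: unquoted digits only, so 1.2 and '1' are both rejected.
    local int_re='^i[[:alnum:]_]+=[0-9]+$'
    # s-keys: value enclosed in single quotes, with no quote inside.
    local str_re="^s[[:alnum:]_]+='[^']*'\$"
    [[ $line =~ $int_re || $line =~ $str_re ]]
}
```

Storing the regular expressions in variables and expanding them unquoted after `=~` avoids Bash's quoting rules turning parts of the pattern into literals.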

The "sPCREUrlMatch" option is particularly important because it holds the regular expression used to extract URLs from the source file. Its value is a Perl Compatible Regular Expression (PCRE). For PCRE syntax and testing, see:

syntax: https://www.pcre.org/original/doc/html/pcrepattern.html

testing: https://regexr.com/ (be sure to set the expression language to PCRE2)

or: https://regex101.com/

The default value for the "sPCREUrlMatch" option should be suitable for WordPress XML export files and possibly other XML files as well.

If you want to check a list of URLs contained in a text file, set the "sPCREUrlMatch" value to:

'(https?://\S+)'
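The effect of this pattern can be previewed without BLC. The sketch below is an approximation, not BLC's code: it uses Bash's built-in POSIX ERE matching (`=~`) instead of PCRE, so `\S` is written as `[^[:space:]]`, and it reads the file with `mapfile`. The `extract_urls` name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print every URL found in a text file, one per line.
# ERE approximation of the PCRE '(https?://\S+)'.
extract_urls() {
    local file=$1 line rest
    local re='(https?://[^[:space:]]+)'
    local -a lines
    # mapfile -t reads each line of the file into an array, dropping newlines.
    mapfile -t lines < "$file"
    for line in "${lines[@]}"; do
        rest=$line
        # Match repeatedly on the remainder so several URLs per line are found.
        while [[ $rest =~ $re ]]; do
            printf '%s\n' "${BASH_REMATCH[1]}"
            # Drop everything up to and including this match (quoted: literal).
            rest=${rest#*"${BASH_REMATCH[1]}"}
        done
    done
}
```

Because `\S+` stops only at whitespace, characters such as a trailing comma, quote or closing tag stay attached to the URL, so this pattern suits plain lists of URLs better than markup.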

USAGE

To run the script, open a terminal, cd to the script directory, and run:

$ ./blc.sh -h

Broken links are saved to the "brokenlinks.txt" file.

If you run a ClassicPress or WordPress site, you can export all posts and/or pages and feed the files directly to BLC without modifying them or the script's configuration options.

By default, if the domain being queried is the same as the previous one, BLC delays the request to reduce the risk of the server blocking your IP address.
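The same-domain delay can be sketched as below. BLC's actual delay length and host-comparison rules are not documented here, so `DELAY_SECONDS=5`, the `last_host` variable, and the `url_host` and `polite_wait` names are all hypothetical. Hosts are compared case-insensitively, and the port is stripped in a way that does not handle bracketed IPv6 addresses.

```shell
#!/usr/bin/env bash
# Hypothetical per-domain delay; the 5-second default is an assumption.
DELAY_SECONDS=${DELAY_SECONDS:-5}
last_host=''

# Print the host part of an http(s) URL.
url_host() {
    local rest=${1#*://}        # strip the scheme, e.g. "https://"
    rest=${rest%%[/?#]*}        # strip path, query string and fragment
    rest=${rest##*@}            # strip any "user:password@" prefix
    printf '%s\n' "${rest%%:*}" # strip a port such as ":8080" (not IPv6-safe)
}

# Sleep before a request if it targets the same host as the previous one.
# Must run in the current shell (not a subshell) so last_host is updated.
polite_wait() {
    local host
    host=$(url_host "$1")
    host=${host,,}              # hostnames are case-insensitive
    if [[ $host == "$last_host" ]]; then
        sleep "$DELAY_SECONDS"
    fi
    last_host=$host
}
```

In a checking loop, `polite_wait "$url"` would be called immediately before each cURL request.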

HISTORY

Broken Link Checker was written because of my frustration with the WordPress plugin of the same name. For a very long time the WordPress plugin has been buggy and has not received the attention it deserves. After the developer promised an update to fix some non-trivial issues and then did nothing for approximately 2 years, I decided to write my own script. While this script does not offer nearly the feature set of the WordPress plugin, it a) works and b) works well enough for my needs. I hope it helps you too.