General command-line parsing solution without using getopt[s]

I never liked getopts because it only parses short options; extending it would mean placing everything else under *). getopt, on the other hand, is an external tool whose output has to be expanded using eval or compgen. For these reasons I avoid both and use a simple while-case-shift loop instead.

The following code is a simple example of it.

#!/bin/bash

function show_help_info {
	echo "Usage: $0 [options] [--] [file ...]

Options:
  -l, --logfile logfile       Set target logfile
  -e, --error-file errorfile  Set target error file
  -v, --verbose               Enable verbose mode
  -h, --help                  Show this help info"
}

function fail {
	printf '%s\n' "$1" >&2
	exit "${2-1}"
}

function main {
	local files=() log_file=() error_file=() verbose_mode=false

	while [[ $# -gt 0 ]]; do
		case $1 in
		-h|--help)
			show_help_info
			return 2
			;;
		-l|--logfile)
			[[ -z ${2+.} ]] && fail "No argument specified to '$1'."
			log_file=$2
			shift
			;;
		-e|--error-file)
			[[ -z ${2+.} ]] && fail "No argument specified to '$1'."
			error_file=$2
			shift
			;;
		-v|--verbose)
			verbose_mode=true
			;;
		--)
			files+=("${@:2}")
			break
			;;
		-?*)
			fail "Invalid option: $1"
			;;
		*)
			files+=("$1")
			;;
		esac

		shift
	done

	...
}

main "$@"

This code however does not allow multiple options to be merged in a single argument; neither does it allow optargs to be placed directly next to the options. It also doesn’t allow optargs of long options to be specified using the --long-option=arg format.

While I was asking for opinions about a normalizer function in #bash, geirha shared with me the idea of splitting merged options as they are encountered. Conceptually, the shared code looked like this.

function main {
	local files=() log_file=() error_file=() verbose_mode=false

	while [[ $# -gt 0 ]]; do
		case $1 in
		-h|--help)
			show_help_info
			return 2
			;;
		-l|--logfile)
			[[ -z ${2+.} ]] && fail "No argument specified to '$1'."
			log_file=$2
			shift
			;;
		-e|--error-file)
			[[ -z ${2+.} ]] && fail "No argument specified to '$1'."
			error_file=$2
			shift
			;;
		-v|--verbose)
			verbose_mode=true
			;;
		--)
			files+=("${@:2}")
			break
			;;
		-[el]*)
			set -- "${1:0:2}" "${1:2}" "${@:2}"
			continue
			;;
		-[!-][!-]*)
			set -- "${1:0:2}" "-${1:2}" "${@:2}"
			continue
			;;
		--logfile=*|--error-file=*)
			set -- "${1%%=*}" "${1#*=}" "${@:2}"
			continue
			;;
		-?*)
			fail "Invalid option: $1"
			;;
		*)
			files+=("$1")
			;;
		esac

		shift
	done

	...
}

It works, but I’m not content with it because every option that takes an argument has to be listed in two more places: -[el]* and --logfile=*|--error-file=*.

The long-option patterns also cannot be simplified to just --*=*. That pattern would split an option that takes no argument, such as --verbose, from a bogus argument attached to it. Given --verbose=<invalid_argument>, the parser would split the argument, accept --verbose when the loop reiterates, and then treat invalid_argument as a file argument instead of reporting an error.
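
To make the pitfall concrete, here is a minimal sketch that uses the generic --*=* pattern. The function name parse_with_generic_split is made up for this illustration and is not part of the code above; it only knows --verbose, which takes no argument.

```shell
#!/bin/bash

# Hypothetical demo: every --name=value argument is split with the
# generic pattern, whether or not the option accepts an argument.
function parse_with_generic_split {
	local files=() verbose_mode=false

	while [[ $# -gt 0 ]]; do
		case $1 in
		-v|--verbose)
			verbose_mode=true
			;;
		--*=*)
			set -- "${1%%=*}" "${1#*=}" "${@:2}"
			continue
			;;
		-?*)
			echo "Invalid option: $1" >&2
			return 1
			;;
		*)
			files+=("$1")
			;;
		esac

		shift
	done

	echo "verbose_mode=${verbose_mode} files=${files[*]}"
}

# Prints "verbose_mode=true files=oops" instead of rejecting the input.
parse_with_generic_split --verbose=oops
```

The bogus argument silently ends up in the file list, which is why the patterns in geirha's version name each long option explicitly.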

Thankfully I came up with the idea of using a helper function in the option’s condition block to unify the expressions.

...

function get_opt_and_optarg {
	OPT=$1 OPTARG= OPTSHIFT=0

	if [[ $1 == -[!-]?* ]]; then
		OPT=${1:0:2} OPTARG=${1:2}
	elif [[ $1 == --*=* ]]; then
		OPT=${1%%=*} OPTARG=${1#*=}
	elif [[ ${2+.} ]]; then
		OPTARG=$2 OPTSHIFT=1
	else
		fail "No argument specified to '$1'." # Or 'return 1'
	fi
}

function main {
	local files=() log_file=() error_file=() verbose_mode=false

	while [[ $# -gt 0 ]]; do
		case $1 in
		-h|--help)
			show_help_info
			return 2
			;;
		-l*|--logfile|--logfile=*) # Or simply '--logfile?(=*)' when extglob is enabled
			get_opt_and_optarg "${@:1:2}"
			log_file=${OPTARG}
			shift "${OPTSHIFT}"
			;;
		-e*|--error-file|--error-file=*)
			get_opt_and_optarg "${@:1:2}"
			error_file=${OPTARG}
			shift "${OPTSHIFT}"
			;;
		-v|--verbose)
			verbose_mode=true
			;;
		--)
			files+=("${@:2}")
			break
			;;
		-[!-][!-]*)
			set -- "${1:0:2}" "-${1:2}" "${@:2}"
			continue
			;;
		-?*)
			fail "Invalid option: $1"
			;;
		*)
			files+=("$1")
			;;
		esac

		shift
	done

	...
}

main "$@"
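
To see what the helper hands back to the loop, the following sketch calls it directly with each supported form. The show_split function is a hypothetical wrapper added only for this demonstration, and the helper uses the 'return 1' variant so that a missing argument does not exit the shell.

```shell
#!/bin/bash

# Same helper as above, using 'return 1' in place of 'fail'.
function get_opt_and_optarg {
	OPT=$1 OPTARG= OPTSHIFT=0

	if [[ $1 == -[!-]?* ]]; then
		OPT=${1:0:2} OPTARG=${1:2}
	elif [[ $1 == --*=* ]]; then
		OPT=${1%%=*} OPTARG=${1#*=}
	elif [[ ${2+.} ]]; then
		OPTARG=$2 OPTSHIFT=1
	else
		return 1
	fi
}

# Hypothetical wrapper that prints what the parsing loop would see.
function show_split {
	if get_opt_and_optarg "$@"; then
		printf 'OPT=%s OPTARG=%s OPTSHIFT=%s\n' "${OPT}" "${OPTARG}" "${OPTSHIFT}"
	else
		printf 'missing argument for %s\n' "$1"
	fi
}

show_split -lout.log next       # OPT=-l OPTARG=out.log OPTSHIFT=0
show_split --logfile=out.log    # OPT=--logfile OPTARG=out.log OPTSHIFT=0
show_split --logfile out.log    # OPT=--logfile OPTARG=out.log OPTSHIFT=1
show_split -l                   # missing argument for -l
```

OPTSHIFT is 1 only when the optarg came from the next positional parameter, which is why the loop can always run shift "${OPTSHIFT}" followed by the common shift at the bottom.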

It can also be extended to allow optional arguments.

...

function get_opt_and_optarg {
	local optional=false

	if [[ $1 == @optional ]]; then
		optional=true
		shift
	fi

	OPT=$1 OPTARG= OPTSHIFT=0

	if [[ $1 == -[!-]?* ]]; then
		OPT=${1:0:2} OPTARG=${1:2}
	elif [[ $1 == --*=* ]]; then
		OPT=${1%%=*} OPTARG=${1#*=}
	elif [[ ${2+.} && (${optional} == false || $2 != -?*) ]]; then
		OPTARG=$2 OPTSHIFT=1
	elif [[ ${optional} == true ]]; then
		return 1
	else
		fail "No argument specified for '$1'."
	fi

	return 0
}

DEFAULT_LOG_FILE=${0##*/}.log # Just a concept

function main {
	local files=() error_file=() log_mode=false verbose_mode=false

	# Initialize log_file as an empty array so ${log_file} is null.
	# Alternatively it can be initialized to a default value.

	local log_file=()

	while [[ $# -gt 0 ]]; do
		case $1 in
		-h|--help)
			show_help_info
			return 2
			;;
		-l*|--log|--log=*)
			log_mode=true
			get_opt_and_optarg @optional "${@:1:2}" && log_file=${OPTARG}
			shift "${OPTSHIFT}"
			;;
		--logfile|--logfile=*)
			get_opt_and_optarg "${@:1:2}"
			log_file=${OPTARG}
			shift "${OPTSHIFT}"
			;;
		-e*|--error-file|--error-file=*)
			get_opt_and_optarg "${@:1:2}"
			error_file=${OPTARG}
			shift "${OPTSHIFT}"
			;;
		-v|--verbose)
			verbose_mode=true
			;;
		--)
			files+=("${@:2}")
			break
			;;
		-[!-][!-]*)
			set -- "${1:0:2}" "-${1:2}" "${@:2}"
			continue
			;;
		-?*)
			fail "Invalid option: $1"
			;;
		*)
			files+=("$1")
			;;
		esac

		shift
	done

	if [[ ${log_mode} == true ]]; then
		# Determine if log file has been specified and validate it.

		if [[ ${log_file+.} ]]; then
			[[ -z ${log_file} ]] && fail "Invalid log file specified."
		else
			log_file=${DEFAULT_LOG_FILE}
		fi

		...
	fi

	...
}

...

Note that my method for parsing optional arguments deviates from the usual convention of UNIX tools like sed and getopt, which only allow an optional argument of a short option (and even of a long option) to be specified directly next to the option, not separated by a space.

I chose not to follow the convention because it complicates both parsing and the syntax documentation. It’s also less intuitive: required arguments may be separated from the option by a space, but optional arguments may not. The convention also can’t accept an empty string as an explicit argument, because the empty string is what signals that no argument was specified. The option will always have a string value, so there really isn’t a true way to express “no argument”. TL;DR: I dislike its inconsistency.

A better approach is to allow optargs to be separated by a space but to exclude arguments that look like options (i.e. those matching -?*), as I did above.

This lets every kind of non-nil argument, including empty strings and arguments that look like options, be specified in the --long-option=arg format just as in the non-optional form, while an option can still be specified without an argument as plain -o or --long-option.
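
The rules can be checked with a short sketch. The describe_log_option function is a hypothetical wrapper written for this illustration, and the helper is reduced to a plain 'return 1' where the original calls fail, so the example stays self-contained.

```shell
#!/bin/bash

# Extended helper from above, with 'fail' replaced by 'return 1'.
function get_opt_and_optarg {
	local optional=false

	if [[ $1 == @optional ]]; then
		optional=true
		shift
	fi

	OPT=$1 OPTARG= OPTSHIFT=0

	if [[ $1 == -[!-]?* ]]; then
		OPT=${1:0:2} OPTARG=${1:2}
	elif [[ $1 == --*=* ]]; then
		OPT=${1%%=*} OPTARG=${1#*=}
	elif [[ ${2+.} && (${optional} == false || $2 != -?*) ]]; then
		OPTARG=$2 OPTSHIFT=1
	else
		return 1
	fi

	return 0
}

# Hypothetical wrapper: reports whether an optional argument was given.
function describe_log_option {
	if get_opt_and_optarg @optional "$@"; then
		printf 'argument given: "%s"\n' "${OPTARG}"
	else
		printf 'no argument\n'
	fi
}

describe_log_option --log out.log    # argument given: "out.log"
describe_log_option --log -v         # no argument
describe_log_option --log=-v         # argument given: "-v"
describe_log_option --log=           # argument given: ""
describe_log_option --log ""         # argument given: ""
describe_log_option -l               # no argument
```

An empty string counts as an argument in both the = form and the space-separated form, while "no argument" is reported through the return status rather than through the value of OPTARG.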

One known program that follows this method is Ruby’s OptionParser.

If we run its example code with -t and "$(date +%F)" as arguments separated by a space, we can see in the output that the date string is recognized as an argument and is assigned as a parsed value to time. Running the script with -t -v or --time --verbose on the other hand will assign nil as a value instead, and not just an empty string. This means that the parser recognizes that no argument was specified.

An empty string argument to -t, on the other hand, raises a not RFC 2616 compliant date: "" exception, because the empty string is recognized as an argument but is not a valid time string. The same thing happens with ruby example.rb --time "" or ruby example.rb --time=.

Thanks again to geirha for sharing the idea of splitting options as they are encountered.

The examples above are written in Bash but should be easy to convert to POSIX versions.
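
As a rough idea of what such a conversion could look like, here is a sketch of the non-optional helper in POSIX sh. It is an assumption of mine rather than tested production code: [[ ]] becomes a case statement and the substring expansions become prefix and suffix removal.

```shell
#!/bin/sh

# POSIX sh has no ${var:offset:length} and no [[ ]], so this version
# relies on 'case' patterns and prefix/suffix removal.
get_opt_and_optarg() {
	OPT=$1 OPTARG= OPTSHIFT=0

	case $1 in
	-[!-]?*)
		OPTARG=${1#-?}        # Everything after the option letter
		OPT=${1%"$OPTARG"}    # The leading '-x'
		;;
	--*=*)
		OPT=${1%%=*} OPTARG=${1#*=}
		;;
	*)
		[ "$#" -ge 2 ] || return 1
		OPTARG=$2 OPTSHIFT=1
		;;
	esac
}

# Prints "OPT=-l OPTARG=out.log OPTSHIFT=0"
get_opt_and_optarg -lout.log && echo "OPT=$OPT OPTARG=$OPTARG OPTSHIFT=$OPTSHIFT"
```

The helper only looks at $1 and $2, so a POSIX caller can pass "$@" in place of "${@:1:2}". The parts that need more care are the arrays and the "${@:2}" expansions in main, which have no direct POSIX equivalent; the latter would need a shift followed by set -- with the new leading words and "$@".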

For a real working example, see tail-follow-grep.bash.