General command-line parsing solution without using getopt[s]

I never liked getopts because it only parses short options; supporting anything else means cramming it into the *) branch. getopt, on the other hand, is an external tool, and its output has to be re-expanded with eval or compgen. For these reasons I avoid both and use a plain while-case-shift loop instead.
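To make the short-option limitation concrete, here is a minimal sketch of what Bash's getopts does with a long option. The function name demo_getopts and the option string ':hl:' are only assumptions for this illustration, not part of the parser developed below.

```bash
#!/bin/bash

# Hypothetical demo function: prints what getopts reports for each
# option it finds in the given arguments.
function demo_getopts {
	local OPTIND=1 opt

	# The leading ':' selects silent error reporting.
	while getopts ':hl:' opt "$@"; do
		printf 'opt=%s optarg=%s\n' "${opt}" "${OPTARG-}"
	done
}

demo_getopts -l out.log  # opt=l optarg=out.log
demo_getopts --help      # first reported option is '?', not a "help" option
```

getopts reads --help as a cluster of single-character options that starts with an unknown '-' option, so a long option never reaches the case block as one word.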

The following code is a simple example of it.

#!/bin/bash

function show_help_info {
	echo "Usage: $0 [options] [--] [file ...]

Options:
  -l, --logfile logfile       Set target logfile
  -e, --error-file errorfile  Set target error file
  -v, --verbose               Enable verbose mode
  -h, --help                  Show this help info"
}

function fail {
	printf '%s\n' "$1" >&2
	exit "${2-1}"
}

function main {
	local files=() log_file= error_file= verbose_mode=false

	while [[ $# -gt 0 ]]; do
		case $1 in
		-h|--help)
			show_help_info
			return 2
			;;
		-l|--logfile)
			[[ -z ${2+.} ]] && fail "No argument specified to '$1'."
			log_file=$2
			shift
			;;
		-e|--error-file)
			[[ -z ${2+.} ]] && fail "No argument specified to '$1'."
			error_file=$2
			shift
			;;
		-v|--verbose)
			verbose_mode=true
			;;
		--)
			files=("${@:2}")
			break
			;;
		-?*)
			fail "Invalid option: $1"
			;;
		*)
			files+=("$1")
			;;
		esac

		shift
	done

	...
}

main "$@"

This code, however, does not allow multiple short options to be merged into a single argument (e.g. -vl), nor does it allow optargs to be attached directly to their options (e.g. -lfile). It also doesn’t accept optargs of long options in the --long-option=arg format.

While I was asking for opinions about a normalizer function in #bash, geirha shared with me the idea of splitting merged options as they are encountered. The shared code was initially different, but conceptually it looks like this.

		...
		--)
			files=("${@:2}")
			break
			;;
		-[el]*)
			set -- "${1:0:2}" "${1:2}" "${@:2}"
			continue
			;;
		-[!-][!-]*)
			set -- "${1:0:2}" "-${1:2}" "${@:2}"
			continue
			;;
		--logfile=*|--error-file=*)
			set -- "${1%%=*}" "${1#*=}" "${@:2}"
			continue
			;;
		-?*)
			fail "Invalid option: $1"
			;;
		...

It works well, but I’m not satisfied with it because it requires option names to be specified in two more places.

The long-option patterns also can’t be reduced to just --*=*, because an argument-less option like --verbose given as --verbose=value would then be split into the option and its value. The value would then be treated as a file argument, which isn’t right.
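To see the splitting rules in isolation, here is a minimal sketch that prints how a single argument would be rewritten, one resulting word per line. The function split_arg is a hypothetical helper written only for this demonstration, and it deliberately uses the over-broad --*=* pattern to show the problem described above.

```bash
#!/bin/bash

# Hypothetical helper: prints the words one argument is split into.
function split_arg {
	case $1 in
	-[el]?*)
		# Short option that takes an argument: the rest of the word is the optarg.
		printf '%s\n' "${1:0:2}" "${1:2}"
		;;
	-[!-][!-]*)
		# Merged short options: peel off the first one, keep the rest as a new cluster.
		printf '%s\n' "${1:0:2}" "-${1:2}"
		;;
	--*=*)
		# Long option with an attached value: split at the first '='.
		printf '%s\n' "${1%%=*}" "${1#*=}"
		;;
	*)
		printf '%s\n' "$1"
		;;
	esac
}

split_arg -vl             # -v and -l
split_arg -lout.log       # -l and out.log
split_arg --logfile=a=b   # --logfile and a=b
split_arg --verbose=oops  # --verbose and oops
```

In the last call, oops becomes a separate word, which the main loop would then collect as a file argument. That is why the real patterns have to name each long option that takes an argument.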

Thankfully I came up with the idea of using a helper function inside each option’s case block instead.

...

function get_opt_and_optarg {
	OPT=$1 OPTARG= OPTSHIFT=0

	if [[ $1 == -[!-]?* ]]; then
		OPT=${1:0:2} OPTARG=${1:2}
	elif [[ $1 == --*=* ]]; then
		OPT=${1%%=*} OPTARG=${1#*=}
	elif [[ ${2+.} ]]; then
		OPTARG=$2 OPTSHIFT=1
	else
		fail "No argument specified to '$1'." # Or 'return 1'
	fi
}

function main {
	local files=() log_file= error_file= verbose_mode=false

	while [[ $# -gt 0 ]]; do
		case $1 in
		-h|--help)
			show_help_info
			return 2
			;;
		-l*|--logfile|--logfile=*) # Or simply '-l*|--logfile?(=*))' when extglob is enabled
			get_opt_and_optarg "${@:1:2}"
			log_file=${OPTARG}
			shift "${OPTSHIFT}"
			;;
		-e*|--error-file|--error-file=*)
			get_opt_and_optarg "${@:1:2}"
			error_file=${OPTARG}
			shift "${OPTSHIFT}"
			;;
		-v|--verbose)
			verbose_mode=true
			;;
		--)
			files=("${@:2}")
			break
			;;
		-[!-][!-]*)
			set -- "${1:0:2}" "-${1:2}" "${@:2}"
			continue
			;;
		-?*)
			fail "Invalid option: $1"
			;;
		*)
			files+=("$1")
			;;
		esac

		shift
	done

	...
}

main "$@"
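To check what the helper reports for each argument form, the following sketch calls it directly and prints the three globals. The helper and fail are repeated from the code above so that the sketch runs on its own; the wrapper named show is a hypothetical addition used only here.

```bash
#!/bin/bash

function fail {
	printf '%s\n' "$1" >&2
	exit "${2-1}"
}

# Same helper as above, repeated so the sketch is self-contained.
function get_opt_and_optarg {
	OPT=$1 OPTARG= OPTSHIFT=0

	if [[ $1 == -[!-]?* ]]; then
		OPT=${1:0:2} OPTARG=${1:2}
	elif [[ $1 == --*=* ]]; then
		OPT=${1%%=*} OPTARG=${1#*=}
	elif [[ ${2+.} ]]; then
		OPTARG=$2 OPTSHIFT=1
	else
		fail "No argument specified to '$1'."
	fi
}

# Hypothetical wrapper that prints the globals set by one call.
function show {
	get_opt_and_optarg "${@:1:2}"
	printf 'OPT=%s OPTARG=%s OPTSHIFT=%s\n' "${OPT}" "${OPTARG}" "${OPTSHIFT}"
}

show -lout.log          # OPT=-l OPTARG=out.log OPTSHIFT=0
show --logfile=out.log  # OPT=--logfile OPTARG=out.log OPTSHIFT=0
show --logfile out.log  # OPT=--logfile OPTARG=out.log OPTSHIFT=1
show --logfile=         # OPT=--logfile OPTARG= OPTSHIFT=0

# A missing argument makes fail exit, so run that case in a subshell.
( show --logfile ) || echo "exit status: $?"
```

OPTSHIFT is 1 only when the optarg came from the next word, which is why the option blocks can always run shift "${OPTSHIFT}" before the common shift at the end of the loop.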

The best part of it is that it can be extended to allow optional arguments.

...

function get_opt_and_optarg {
	local optional=false

	if [[ $1 == @optional ]]; then
		optional=true
		shift
	fi

	OPT=$1 OPTARG= OPTSHIFT=0

	if [[ $1 == -[!-]?* ]]; then
		OPT=${1:0:2} OPTARG=${1:2}
	elif [[ $1 == --*=* ]]; then
		OPT=${1%%=*} OPTARG=${1#*=}
	elif [[ ${2+.} && (${optional} == false || $2 != -?*) ]]; then
		OPTARG=$2 OPTSHIFT=1
	elif [[ ${optional} == true ]]; then
		return 1
	else
		fail "No argument specified for '$1'."
	fi

	return 0
}

function main {
	local files=() error_file= log_mode=false verbose_mode=false
	local log_file=${0##*/}.log # Just a conceptual default

	while [[ $# -gt 0 ]]; do
		case $1 in
		-h|--help)
			show_help_info
			return 2
			;;
		-l*|--log|--log=*)
			log_mode=true
			get_opt_and_optarg @optional "${@:1:2}" && log_file=${OPTARG}
			shift "${OPTSHIFT}"
			;;
		--logfile|--logfile=*)
			get_opt_and_optarg "${@:1:2}"
			log_file=${OPTARG}
			shift "${OPTSHIFT}"
			;;
		-e*|--error-file|--error-file=*)
			get_opt_and_optarg "${@:1:2}"
			error_file=${OPTARG}
			shift "${OPTSHIFT}"
			;;
		-v|--verbose)
			verbose_mode=true
			;;
		--)
			files=("${@:2}")
			break
			;;
		-[!-][!-]*)
			set -- "${1:0:2}" "-${1:2}" "${@:2}"
			continue
			;;
		-?*)
			fail "Invalid option: $1"
			;;
		*)
			files+=("$1")
			;;
		esac

		shift
	done

	...
}

...
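The effect of @optional is easiest to see by calling the helper on a few inputs. In this sketch the helper and fail are repeated from the code above so that it runs on its own, and show_optional is a hypothetical wrapper added only for the demonstration.

```bash
#!/bin/bash

function fail {
	printf '%s\n' "$1" >&2
	exit "${2-1}"
}

# Same helper as above, repeated so the sketch is self-contained.
function get_opt_and_optarg {
	local optional=false

	if [[ $1 == @optional ]]; then
		optional=true
		shift
	fi

	OPT=$1 OPTARG= OPTSHIFT=0

	if [[ $1 == -[!-]?* ]]; then
		OPT=${1:0:2} OPTARG=${1:2}
	elif [[ $1 == --*=* ]]; then
		OPT=${1%%=*} OPTARG=${1#*=}
	elif [[ ${2+.} && (${optional} == false || $2 != -?*) ]]; then
		OPTARG=$2 OPTSHIFT=1
	elif [[ ${optional} == true ]]; then
		return 1
	else
		fail "No argument specified for '$1'."
	fi

	return 0
}

# Hypothetical wrapper: reports whether an argument was found.
function show_optional {
	if get_opt_and_optarg @optional "${@:1:2}"; then
		printf 'argument=[%s] shift=%s\n' "${OPTARG}" "${OPTSHIFT}"
	else
		printf 'no argument, shift=%s\n' "${OPTSHIFT}"
	fi
}

show_optional -l          # no argument, shift=0
show_optional -l -v       # no argument, shift=0
show_optional -l out.log  # argument=[out.log] shift=1
show_optional --log=      # argument=[] shift=0
show_optional --log=-v    # argument=[-v] shift=0
```

The return status, not the value of OPTARG, tells the caller whether an argument was given, so --log= can still pass an explicit empty string.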

Note that this method deviates from the usual convention of UNIX tools like sed and getopt, which only allow optional arguments of short options (and even long options) to be specified directly next to the option, not separated from it by a space.

I chose not to follow this convention because it complicates both parsing and syntax documentation. It’s also less intuitive: required arguments may be separated from the option by a space, but optional ones may not. It doesn’t allow an empty string to be passed as an explicit argument either, because the empty string is itself what signals that no argument was specified. The option always ends up with a string value, and there is no true way to express “no argument”.

A better way is to allow optargs to be separated by a space but exclude arguments that look like an option (i.e., -?*), just like I did above.

This allows every kind of argument, including empty strings and arguments that look like an option, to be specified through the --long-option=arg format, just as in the non-optional form. At the same time, an option can still be specified without an argument through the plain -o or --long-option format.

One known program that implements this method is Ruby’s OptionParser.

If we run its example code with -t and "$(date +%F)" as arguments separated by a space, we can see in the output that the date string is recognized as an argument and is assigned as a parsed value to time. Running the script with -t -v or --time --verbose on the other hand will assign nil as a value instead, and not just an empty string. This means that the parser recognizes that no argument was specified.

An empty string argument to -t, on the other hand, raises a not RFC 2616 compliant date: "" exception, because the empty string is recognized as an argument but is not a valid time string. The same thing happens with ruby example.rb --time "" or ruby example.rb --time=.

Thanks again to geirha for sharing the idea of splitting options as they are encountered.

The examples above are written in Bash but should be easy to convert to POSIX versions.
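As a rough idea of what such a conversion could look like, here is a sketch of the first helper in POSIX sh. It is an assumption about one possible port, not a tested drop-in replacement: substring expansion is replaced with prefix and suffix removal, [[ ]] with case, and fail with a plain return 1.

```sh
#!/bin/sh

# POSIX sh has no ${1:0:2} or [[ ]], so this sketch cuts the word with
# prefix/suffix removal and matches it with case instead.
get_opt_and_optarg() {
	OPT=$1 OPTARG= OPTSHIFT=0

	case $1 in
	-[!-]?*)
		OPTARG=${1#??}      # everything after the first two characters
		OPT=${1%"$OPTARG"}  # the first two characters, e.g. "-l"
		;;
	--*=*)
		OPT=${1%%=*} OPTARG=${1#*=}
		;;
	*)
		if [ "$#" -ge 2 ]; then
			OPTARG=$2 OPTSHIFT=1
		else
			printf '%s\n' "No argument specified to '$1'." >&2
			return 1
		fi
		;;
	esac
}

get_opt_and_optarg -lout.log
printf 'OPT=%s OPTARG=%s OPTSHIFT=%s\n' "$OPT" "$OPTARG" "$OPTSHIFT"
# OPT=-l OPTARG=out.log OPTSHIFT=0
```

Since "${@:1:2}" is also a Bash feature, the caller would pass "$@" instead. POSIX sh has no arrays either, so collecting the file arguments needs a different approach than files+=("$1").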

For a real working example, see tail-follow-grep.bash.
