General command-line parsing solution without using getopt[s]
I never liked using getopts since it’s limited to parsing short options. Extending it would
require placing everything else under *). getopt, on the other hand, is an external tool and
requires its output to be expanded using eval or compgen. For these reasons I avoid both and use
a simple while-case-shift loop instead.
The following code is a simple example of it.
#!/bin/bash

function show_help_info {
    echo "Usage: $0 [options] [--] [file ...]
Options:
  -l, --logfile logfile       Set target logfile
  -e, --error-file errorfile  Set target error file
  -v, --verbose               Enable verbose mode
  -h, --help                  Show this help info"
}

function fail {
    printf '%s\n' "$1" >&2
    exit "${2-1}"
}

function main {
    local files=() log_file=() error_file=() verbose_mode=false

    while [[ $# -gt 0 ]]; do
        case $1 in
        -h|--help)
            show_help_info
            return 2
            ;;
        -l|--logfile)
            # ${2+.} expands to "." only if $2 is set, even when it is empty.
            [[ -z ${2+.} ]] && fail "No argument specified to '$1'."
            log_file=$2
            shift
            ;;
        -e|--error-file)
            [[ -z ${2+.} ]] && fail "No argument specified to '$1'."
            error_file=$2
            shift
            ;;
        -v|--verbose)
            verbose_mode=true
            ;;
        --)
            # Everything after -- is a file argument.
            files+=("${@:2}")
            break
            ;;
        -?*)
            fail "Invalid option: $1"
            ;;
        *)
            files+=("$1")
            ;;
        esac

        shift
    done

    ...
}

main "$@"
This code however does not allow multiple options to be merged in a single argument; neither does
it allow optargs to be placed directly next to the options. It also doesn’t allow optargs of long
options to be specified using the --long-option=arg format.
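To make these three limitations concrete, here is a minimal sketch, not part of the original script. It assumes a stripped-down copy of the loop above that keeps only -l and -v, uses the hypothetical name parse_simple, stores its results in global variables, and returns an error instead of exiting so that all cases can run in one script.

```bash
#!/bin/bash

# Hypothetical, reduced copy of the loop above (only -l and -v are kept).
# Results go to the globals files, log_file and verbose_mode.
function parse_simple {
    files=() log_file='' verbose_mode=false

    while [[ $# -gt 0 ]]; do
        case $1 in
        -l|--logfile)
            [[ -z ${2+.} ]] && { echo "No argument specified to '$1'." >&2; return 1; }
            log_file=$2
            shift
            ;;
        -v|--verbose)
            verbose_mode=true
            ;;
        --)
            files+=("${@:2}")
            break
            ;;
        -?*)
            echo "Invalid option: $1" >&2
            return 1
            ;;
        *)
            files+=("$1")
            ;;
        esac

        shift
    done
}

# Separate options and a separate optarg are accepted.
parse_simple -v -l out.log a.txt && echo "ok: log_file=${log_file} files=${files[*]}"

# Each of these falls through to the -?* branch and is rejected.
parse_simple -vl out.log a.txt || echo "rejected: merged options"
parse_simple -lout.log a.txt   || echo "rejected: optarg attached to short option"
parse_simple --logfile=out.log || echo "rejected: --long-option=arg"
```

Only the first call succeeds. The other three print "Invalid option: ..." to standard error followed by the corresponding "rejected" line.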
While I was asking for opinions about a normalizer function in #bash, geirha shared with me the
idea of splitting merged options as they are encountered. Conceptually, the shared code looked
like this.
function main {
    local files=() log_file=() error_file=() verbose_mode=false

    while [[ $# -gt 0 ]]; do
        case $1 in
        -h|--help)
            show_help_info
            return 2
            ;;
        -l|--logfile)
            [[ -z ${2+.} ]] && fail "No argument specified to '$1'."
            log_file=$2
            shift
            ;;
        -e|--error-file)
            [[ -z ${2+.} ]] && fail "No argument specified to '$1'."
            error_file=$2
            shift
            ;;
        -v|--verbose)
            verbose_mode=true
            ;;
        --)
            files+=("${@:2}")
            break
            ;;
        -[el]*)
            # Option that takes an argument: split -lARG into -l ARG.
            set -- "${1:0:2}" "${1:2}" "${@:2}"
            continue
            ;;
        -[!-][!-]*)
            # Merged short options: split -vx into -v -x.
            set -- "${1:0:2}" "-${1:2}" "${@:2}"
            continue
            ;;
        --logfile=*|--error-file=*)
            # Split --option=ARG into --option ARG.
            set -- "${1%%=*}" "${1#*=}" "${@:2}"
            continue
            ;;
        -?*)
            fail "Invalid option: $1"
            ;;
        *)
            files+=("$1")
            ;;
        esac

        shift
    done

    ...
}
It works, but I’m not content with it because it requires the options to be specified in two more
places, such as -[el]* and --logfile=*|--error-file=*.
The expressions for the long options also cannot be simplified to just --*=*, because an option
like --verbose, which takes no argument, would then be split from its supposedly invalid argument
if --verbose=<invalid_argument> were specified. Once the argument is split and the loop
reiterates, invalid_argument would be recognized as a file argument instead.
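The problem can be reproduced with a small sketch. It assumes a hypothetical function classify_args that knows only --verbose and splits every --name=value argument with the generic --*=* pattern.

```bash
#!/bin/bash

# Hypothetical demo: split with the generic pattern --*=* and report how the
# loop then classifies each piece.
function classify_args {
    local verbose_mode=false files=()

    while [[ $# -gt 0 ]]; do
        case $1 in
        -v|--verbose)
            verbose_mode=true
            ;;
        --*=*)
            # Too generic: this also splits options that take no argument.
            set -- "${1%%=*}" "${1#*=}" "${@:2}"
            continue
            ;;
        -?*)
            echo "Invalid option: $1" >&2
            return 1
            ;;
        *)
            files+=("$1")
            ;;
        esac

        shift
    done

    echo "verbose_mode=${verbose_mode} files=${files[*]}"
}

classify_args --verbose=oops    # prints: verbose_mode=true files=oops
```

Instead of being rejected, --verbose=oops quietly enables verbose mode and adds "oops" to the file list.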
Thankfully I came up with the idea of using a helper function in the option’s condition block to unify the expressions.
...

function get_opt_and_optarg {
    OPT=$1 OPTARG= OPTSHIFT=0

    if [[ $1 == -[!-]?* ]]; then
        OPT=${1:0:2} OPTARG=${1:2}
    elif [[ $1 == --*=* ]]; then
        OPT=${1%%=*} OPTARG=${1#*=}
    elif [[ ${2+.} ]]; then
        OPTARG=$2 OPTSHIFT=1
    else
        fail "No argument specified to '$1'."  # Or 'return 1'
    fi
}

function main {
    local files=() log_file=() error_file=() verbose_mode=false

    while [[ $# -gt 0 ]]; do
        case $1 in
        -h|--help)
            show_help_info
            return 2
            ;;
        -l*|--logfile|--logfile=*)  # Or simply '--logfile?(=*)' when extglob is enabled
            get_opt_and_optarg "${@:1:2}"
            log_file=${OPTARG}
            shift "${OPTSHIFT}"
            ;;
        -e*|--error-file|--error-file=*)
            get_opt_and_optarg "${@:1:2}"
            error_file=${OPTARG}
            shift "${OPTSHIFT}"
            ;;
        -v|--verbose)
            verbose_mode=true
            ;;
        --)
            files+=("${@:2}")
            break
            ;;
        -[!-][!-]*)
            set -- "${1:0:2}" "-${1:2}" "${@:2}"
            continue
            ;;
        -?*)
            fail "Invalid option: $1"
            ;;
        *)
            files+=("$1")
            ;;
        esac

        shift
    done

    ...
}

main "$@"
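As a quick check of the helper, the following sketch repeats get_opt_and_optarg and fail so it runs on its own, and prints what the helper sets for each of the four input forms. The show function is a hypothetical wrapper added only for this demonstration.

```bash
#!/bin/bash

function fail {
    printf '%s\n' "$1" >&2
    exit "${2-1}"
}

# Same helper as above, repeated so the sketch is self-contained.
function get_opt_and_optarg {
    OPT=$1 OPTARG= OPTSHIFT=0

    if [[ $1 == -[!-]?* ]]; then
        OPT=${1:0:2} OPTARG=${1:2}
    elif [[ $1 == --*=* ]]; then
        OPT=${1%%=*} OPTARG=${1#*=}
    elif [[ ${2+.} ]]; then
        OPTARG=$2 OPTSHIFT=1
    else
        fail "No argument specified to '$1'."
    fi
}

# Hypothetical wrapper: pass the first two arguments, as main does, and
# print the resulting globals.
function show {
    get_opt_and_optarg "${@:1:2}"
    printf '%s -> OPT=%s OPTARG=%s OPTSHIFT=%s\n' "$*" "$OPT" "$OPTARG" "$OPTSHIFT"
}

show -lout.log next           # OPT=-l OPTARG=out.log OPTSHIFT=0
show -l out.log               # OPT=-l OPTARG=out.log OPTSHIFT=1
show --logfile=out.log next   # OPT=--logfile OPTARG=out.log OPTSHIFT=0
show --logfile out.log        # OPT=--logfile OPTARG=out.log OPTSHIFT=1
```

OPTSHIFT is 1 only when the argument came from the next positional parameter, which is why the caller can pass it straight to shift.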
The helper can also be extended to allow optional arguments.
...

function get_opt_and_optarg {
    local optional=false

    if [[ $1 == @optional ]]; then
        optional=true
        shift
    fi

    OPT=$1 OPTARG= OPTSHIFT=0

    if [[ $1 == -[!-]?* ]]; then
        OPT=${1:0:2} OPTARG=${1:2}
    elif [[ $1 == --*=* ]]; then
        OPT=${1%%=*} OPTARG=${1#*=}
    elif [[ ${2+.} && (${optional} == false || $2 != -?*) ]]; then
        OPTARG=$2 OPTSHIFT=1
    elif [[ ${optional} == true ]]; then
        # No argument was given to an option whose argument is optional.
        return 1
    else
        fail "No argument specified for '$1'."
    fi

    return 0
}

DEFAULT_LOG_FILE=${0##*/}.log  # Just a concept

function main {
    local files=() error_file=() log_mode=false verbose_mode=false

    # Initialize log_file as an empty array so ${log_file} is null.
    # Alternatively it can be initialized to a default value.
    local log_file=()

    while [[ $# -gt 0 ]]; do
        case $1 in
        -h|--help)
            show_help_info
            return 2
            ;;
        -l*|--log|--log=*)
            log_mode=true
            get_opt_and_optarg @optional "${@:1:2}" && log_file=${OPTARG}
            shift "${OPTSHIFT}"
            ;;
        --logfile|--logfile=*)
            get_opt_and_optarg "${@:1:2}"
            log_file=${OPTARG}
            shift "${OPTSHIFT}"
            ;;
        -e*|--error-file|--error-file=*)
            get_opt_and_optarg "${@:1:2}"
            error_file=${OPTARG}
            shift "${OPTSHIFT}"
            ;;
        -v|--verbose)
            verbose_mode=true
            ;;
        --)
            files+=("${@:2}")
            break
            ;;
        -[!-][!-]*)
            set -- "${1:0:2}" "-${1:2}" "${@:2}"
            continue
            ;;
        -?*)
            fail "Invalid option: $1"
            ;;
        *)
            files+=("$1")
            ;;
        esac

        shift
    done

    if [[ ${log_mode} == true ]]; then
        # Determine if log file has been specified and validate it.
        if [[ ${log_file+.} ]]; then
            [[ -z ${log_file} ]] && fail "Invalid log file specified."
        else
            log_file=${DEFAULT_LOG_FILE}
        fi

        ...
    fi

    ...
}

...
Note that my method for parsing optional arguments deviates from the usual convention of UNIX
tools like sed and getopt, which only allow optional arguments of short options (and even long
options) to be specified directly next to the option, and not separated from it by a space.
I chose not to follow the convention because it complicates parsing and syntax documentation. It’s also less intuitive: required arguments may be separated from the option by a space, but optional arguments may not. Nor does it allow an empty string to be passed as an explicit argument, because the empty string is itself what signals that no argument was specified. The option always ends up with a string value, so there is no true way to express “no argument”. TL;DR: I dislike its inconsistency.
The better way to do it instead is to allow optargs to be specified with a space but exclude
arguments that look like an option (i.e. -?*), just like how I did it above.
This allows all forms of non-nil arguments to be specified through the --long-option=arg format,
including empty strings and arguments that look like an option, just as in the non-optional form,
while still allowing options to be specified without an argument through the -o or --long-option
format.
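The rule can be exercised in isolation with the sketch below. It repeats fail and the optional-aware get_opt_and_optarg so it runs on its own. The describe function is a hypothetical wrapper added only for this demonstration, and --log stands in for any option with an optional argument.

```bash
#!/bin/bash

function fail {
    printf '%s\n' "$1" >&2
    exit "${2-1}"
}

# Same optional-aware helper as above, repeated so the sketch is self-contained.
function get_opt_and_optarg {
    local optional=false

    if [[ $1 == @optional ]]; then
        optional=true
        shift
    fi

    OPT=$1 OPTARG= OPTSHIFT=0

    if [[ $1 == -[!-]?* ]]; then
        OPT=${1:0:2} OPTARG=${1:2}
    elif [[ $1 == --*=* ]]; then
        OPT=${1%%=*} OPTARG=${1#*=}
    elif [[ ${2+.} && (${optional} == false || $2 != -?*) ]]; then
        OPTARG=$2 OPTSHIFT=1
    elif [[ ${optional} == true ]]; then
        return 1
    else
        fail "No argument specified for '$1'."
    fi

    return 0
}

# Hypothetical wrapper: report whether an optarg was seen and what it was.
function describe {
    if get_opt_and_optarg @optional "${@:1:2}"; then
        echo "argument given: '${OPTARG}' (shift ${OPTSHIFT})"
    else
        echo "no argument"
    fi
}

describe --log            # no argument
describe --log -v         # no argument
describe --log out.log    # argument given: 'out.log' (shift 1)
describe --log=           # argument given: '' (shift 0)
describe --log=-v         # argument given: '-v' (shift 0)
```

The return status, not the content of OPTARG, tells the caller whether an argument was present, so an explicit empty string remains distinguishable from no argument at all.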
One known program that follows this method is Ruby’s OptionParser. If we run its example code
with -t and "$(date +%F)" as arguments separated by a space, we can see in the output that the
date string is recognized as an argument and is assigned as a parsed value to time. Running the
script with -t -v or --time --verbose, on the other hand, will assign nil as the value instead,
and not just an empty string. This means that the parser recognizes that no argument was specified.
An empty string argument to -t, on the other hand, will cause a 'not RFC 2616 compliant date: ""'
exception because the empty string is recognized as an argument but is not a valid time string.
The same thing happens with ruby example.rb --time "" or ruby example.rb --time=.
Thanks again to geirha for sharing the idea of splitting options as they are encountered.
The examples above are written in Bash but should be easy to convert to POSIX versions.
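As a rough idea of what such a conversion could look like, here is a minimal POSIX sh sketch. It is not a port of the code above: it assumes only -l and -v are supported, uses the hypothetical name parse_posix, leaves results in global variables, and merely counts file arguments because POSIX sh has no arrays. The Bash substring expansions are replaced with the ${var#pattern} and ${var%pattern} forms.

```sh
#!/bin/sh

# Hypothetical POSIX sh sketch: no arrays, no [[ ]], no ${var:offset:length}.
# Results are left in the globals log_file, verbose_mode and file_count.
parse_posix() {
    log_file='' verbose_mode=false file_count=0

    while [ "$#" -gt 0 ]; do
        case $1 in
        -l|--logfile)
            if [ "$#" -lt 2 ]; then
                echo "No argument specified to '$1'." >&2
                return 1
            fi
            log_file=$2
            shift
            ;;
        -l?*)
            log_file=${1#-l}
            ;;
        --logfile=*)
            log_file=${1#*=}
            ;;
        -v|--verbose)
            verbose_mode=true
            ;;
        --)
            # Everything after -- is a file argument.
            shift
            file_count=$((file_count + $#))
            break
            ;;
        -[!-][!-]*)
            # Split "-vx" into "-v" "-x": ${1#-?} is everything after the
            # first option letter; stripping that suffix leaves "-v".
            rest=${1#-?}
            first=${1%"$rest"}
            shift
            set -- "$first" "-$rest" "$@"
            continue
            ;;
        -?*)
            echo "Invalid option: $1" >&2
            return 1
            ;;
        *)
            file_count=$((file_count + 1))
            ;;
        esac

        shift
    done
}

parse_posix -vlout.log a.txt -- -b.txt
echo "log_file=$log_file verbose_mode=$verbose_mode file_count=$file_count"
# prints: log_file=out.log verbose_mode=true file_count=2
```

A real script would process each file where this sketch only increments file_count.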
For a real working example, see tail-follow-grep.bash.