Python Parser

argparse — Parser for command-line options, arguments and …

New in version 3.2.
Source code: Lib/
The argparse module makes it easy to write user-friendly command-line
interfaces. The program defines what arguments it requires, and argparse
will figure out how to parse those out of the command line. The argparse
module also automatically generates help and usage messages and issues errors
when users give the program invalid arguments.
The following code is a Python program that takes a list of integers and
produces either the sum or the max:
import argparse
parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('integers', metavar='N', type=int, nargs='+',
                    help='an integer for the accumulator')
parser.add_argument('--sum', dest='accumulate', action='store_const',
                    const=sum, default=max,
                    help='sum the integers (default: find the max)')
args = parser.parse_args()
print(args.accumulate(args.integers))
Assuming the Python code above is saved into a file, it can
be run at the command line and provides useful help messages:
$ python -h
usage: [-h] [--sum] N [N ...]
Process some integers.
positional arguments:
N an integer for the accumulator
-h, --help show this help message and exit
--sum sum the integers (default: find the max)
When run with the appropriate arguments, it prints either the sum or the max of
the command-line integers:
$ python 1 2 3 4
4
$ python 1 2 3 4 --sum
10
If invalid arguments are passed in, it will issue an error:
$ python a b c
error: argument N: invalid int value: 'a'
The following sections walk you through this example.
Creating a parser
The first step in using the argparse module is creating an
ArgumentParser object:
>>> parser = argparse.ArgumentParser(description='Process some integers.')
The ArgumentParser object will hold all the information necessary to
parse the command line into Python data types.
Adding arguments
Filling an ArgumentParser with information about program arguments is
done by making calls to the add_argument() method.
Generally, these calls tell the ArgumentParser how to take the strings
on the command line and turn them into objects. This information is stored and
used when parse_args() is called. For example:
>>> parser.add_argument('integers', metavar='N', type=int, nargs='+',
...                     help='an integer for the accumulator')
>>> parser.add_argument('--sum', dest='accumulate', action='store_const',
...                     const=sum, default=max,
...                     help='sum the integers (default: find the max)')
Later, calling parse_args() will return an object with
two attributes, integers and accumulate. The integers attribute
will be a list of one or more ints, and the accumulate attribute will be
either the sum() function, if --sum was specified at the command line,
or the max() function if it was not.
Parsing arguments
ArgumentParser parses arguments through the
parse_args() method. This will inspect the command line,
convert each argument to the appropriate type and then invoke the appropriate action.
In most cases, this means a simple Namespace object will be built up from
attributes parsed out of the command line:
>>> parser.parse_args(['--sum', '7', '-1', '42'])
Namespace(accumulate=<built-in function sum>, integers=[7, -1, 42])
In a script, parse_args() will typically be called with no
arguments, and the ArgumentParser will automatically take the
arguments from the command line the script was run with.
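As a minimal sketch (the helper names build_parser and run are hypothetical, introduced only for illustration), the integer example can be restructured so that parse_args() receives an explicit list. This makes the sum/max behaviour easy to check without a real command line:

```python
import argparse


def build_parser():
    # Same two arguments as the integer-processing example above.
    parser = argparse.ArgumentParser(description='Process some integers.')
    parser.add_argument('integers', metavar='N', type=int, nargs='+',
                        help='an integer for the accumulator')
    parser.add_argument('--sum', dest='accumulate', action='store_const',
                        const=sum, default=max,
                        help='sum the integers (default: find the max)')
    return parser


def run(argv):
    # An explicit list stands in for the real command line.
    args = build_parser().parse_args(argv)
    return args.accumulate(args.integers)


print(run(['1', '2', '3', '4']))           # max -> 4
print(run(['1', '2', '3', '4', '--sum']))  # sum -> 10
```

Keeping parser construction in a function means tests and the real entry point share exactly the same argument definitions.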
ArgumentParser objects
class argparse.ArgumentParser(prog=None, usage=None, description=None, epilog=None, parents=[], formatter_class=argparse.HelpFormatter, prefix_chars='-', fromfile_prefix_chars=None, argument_default=None, conflict_handler='error', add_help=True, allow_abbrev=True, exit_on_error=True)
Create a new ArgumentParser object. All parameters should be passed
as keyword arguments. Each parameter has its own more detailed description
below, but in short they are:
prog – The name of the program (default: derived from how the program was invoked)
usage – The string describing the program usage (default: generated from
arguments added to parser)
description – Text to display before the argument help (default: none)
epilog – Text to display after the argument help (default: none)
parents – A list of ArgumentParser objects whose arguments should
also be included
formatter_class – A class for customizing the help output
prefix_chars – The set of characters that prefix optional arguments
(default: '-')
fromfile_prefix_chars – The set of characters that prefix files from
which additional arguments should be read (default: None)
argument_default – The global default value for arguments
(default: None)
conflict_handler – The strategy for resolving conflicting optionals
(usually unnecessary)
add_help – Add a -h/--help option to the parser (default: True)
allow_abbrev – Allows long options to be abbreviated if the
abbreviation is unambiguous. (default: True)
exit_on_error – Determines whether or not ArgumentParser exits with
error info when an error occurs. (default: True)
Changed in version 3.5: allow_abbrev parameter was added.
Changed in version 3.8: In previous versions, allow_abbrev also disabled grouping of short
flags such as -vv to mean -v -v.
Changed in version 3.9: exit_on_error parameter was added.
The following sections describe how each of these is used.
By default, ArgumentParser objects use the name the program was invoked with to determine
how to display the name of the program in help messages. This default is almost
always desirable because it will make the help messages match how the program was
invoked on the command line. For example, consider a file
with the following code:
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--foo', help='foo help')
args = parser.parse_args()
The help for this program will display the script's file name as the program name
(regardless of where the program was invoked from):
$ python --help
usage: [-h] [--foo FOO]
--foo FOO foo help
$ cd ..
$ python subdir/ --help
To change this default behavior, another value can be supplied using the
prog= argument to ArgumentParser:
>>> parser = argparse.ArgumentParser(prog='myprogram')
>>> parser.print_help()
usage: myprogram [-h]
Note that the program name, whether determined automatically or from the
prog= argument, is available to help messages using the %(prog)s format specifier:
>>> parser = argparse.ArgumentParser(prog='myprogram')
>>> parser.add_argument('--foo', help='foo of the %(prog)s program')
>>> parser.print_help()
usage: myprogram [-h] [--foo FOO]
--foo FOO foo of the myprogram program
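A small sketch of the same idea outside the interactive prompt, assuming you want the help text as a string rather than printed: format_help() returns the text that print_help() would write, with %(prog)s already substituted.

```python
import argparse

parser = argparse.ArgumentParser(prog='myprogram')
parser.add_argument('--foo', help='foo of the %(prog)s program')

# format_help() returns the text print_help() would write,
# with %(prog)s already replaced by the program name.
help_text = parser.format_help()
print(help_text)
```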
By default, ArgumentParser calculates the usage message from the
arguments it contains:
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('--foo', nargs='?', help='foo help')
>>> parser.add_argument('bar', nargs='+', help='bar help')
>>> parser.print_help()
usage: PROG [-h] [--foo [FOO]] bar [bar ...]
bar bar help
--foo [FOO] foo help
The default message can be overridden with the usage= keyword argument:
>>> parser = argparse.ArgumentParser(prog='PROG', usage='%(prog)s [options]')
>>> parser.print_help()
usage: PROG [options]
The %(prog)s format specifier is available to fill in the program name in
your usage messages.
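To check the override without printing the full help, format_usage() returns only the usage line. A minimal sketch, reusing the arguments from the PROG example above:

```python
import argparse

parser = argparse.ArgumentParser(prog='PROG', usage='%(prog)s [options]')
parser.add_argument('--foo', nargs='?', help='foo help')
parser.add_argument('bar', nargs='+', help='bar help')

# format_usage() returns only the usage line; the custom string
# replaces the one argparse would otherwise generate from the arguments.
usage_line = parser.format_usage()
print(usage_line)
```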
Most calls to the ArgumentParser constructor will use the
description= keyword argument. This argument gives a brief description of
what the program does and how it works. In help messages, the description is
displayed between the command-line usage string and the help messages for the
various arguments:
>>> parser = argparse.ArgumentParser(description='A foo that bars')
>>> parser.print_help()
usage: [-h]
A foo that bars
By default, the description will be line-wrapped so that it fits within the
given space. To change this behavior, see the formatter_class argument.
Some programs like to display additional description of the program after the
description of the arguments. Such text can be specified using the epilog=
argument to ArgumentParser:
>>> parser = argparse.ArgumentParser(
...     description='A foo that bars',
...     epilog="And that's how you'd foo a bar")
>>> parser.print_help()
And that's how you'd foo a bar
As with the description argument, the epilog= text is by default
line-wrapped, but this behavior can be adjusted with the formatter_class
argument to ArgumentParser.
Sometimes, several parsers share a common set of arguments. Rather than
repeating the definitions of these arguments, a single parser with all the
shared arguments can be passed to the parents= argument of ArgumentParser.
The parents= argument takes a list of ArgumentParser
objects, collects all the positional and optional actions from them, and adds
these actions to the ArgumentParser object being constructed:
>>> parent_parser = argparse.ArgumentParser(add_help=False)
>>> parent_parser.add_argument('--parent', type=int)
>>> foo_parser = argparse.ArgumentParser(parents=[parent_parser])
>>> foo_parser.add_argument('foo')
>>> foo_parser.parse_args(['--parent', '2', 'XXX'])
Namespace(foo='XXX', parent=2)
>>> bar_parser = argparse.ArgumentParser(parents=[parent_parser])
>>> bar_parser.add_argument('--bar')
>>> bar_parser.parse_args(['--bar', 'YYY'])
Namespace(bar='YYY', parent=None)
Note that most parent parsers will specify add_help=False. Otherwise, the
ArgumentParser will see two -h/--help options (one in the parent
and one in the child) and raise an error.
You must fully initialize the parsers before passing them via parents=.
If you change the parent parsers after the child parser, those changes will
not be reflected in the child.
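The sketch below illustrates that last point under a simple assumption: an option (the hypothetical --late) is added to the parent after the child has been built. parse_known_args() is used so that the unrecognized option is returned instead of causing an exit:

```python
import argparse

parent = argparse.ArgumentParser(add_help=False)
parent.add_argument('--parent', type=int)

# The child picks up the parent's actions when it is constructed.
child = argparse.ArgumentParser(parents=[parent])

# Hypothetical option added to the parent *after* the child exists.
parent.add_argument('--late')

# parse_known_args() returns unrecognized strings instead of exiting.
ns, extras = child.parse_known_args(['--parent', '2', '--late', 'x'])
print(ns, extras)
```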
ArgumentParser objects allow the help formatting to be customized by
specifying an alternate formatting class. Currently, there are four such
classes:
class argparse.RawDescriptionHelpFormatter
class argparse.RawTextHelpFormatter
class argparse.ArgumentDefaultsHelpFormatter
class argparse.MetavarTypeHelpFormatter
RawDescriptionHelpFormatter and RawTextHelpFormatter give
more control over how textual descriptions are displayed.
By default, ArgumentParser objects line-wrap the description and
epilog texts in command-line help messages:
>>> parser = argparse.ArgumentParser(
...     prog='PROG',
...     description='''this description
...         was indented weird
...             but that is okay''',
...     epilog='''
...             likewise for this epilog whose whitespace will
...         be cleaned up and whose words will be wrapped
...         across a couple lines''')
>>> parser.print_help()
usage: PROG [-h]
this description was indented weird but that is okay
likewise for this epilog whose whitespace will be cleaned up and whose words
will be wrapped across a couple lines
Passing RawDescriptionHelpFormatter as formatter_class=
indicates that description and epilog are already correctly formatted and
should not be line-wrapped:
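>>> import textwrap
>>> parser = argparse.ArgumentParser(
...     prog='PROG',
...     formatter_class=argparse.RawDescriptionHelpFormatter,
...     description=textwrap.dedent('''\
...         Please do not mess up this text!
...         --------------------------------
...             I have indented it
...             exactly the way
...             I want it
...         '''))
>>> parser.print_help()
usage: PROG [-h]
Please do not mess up this text!
--------------------------------
    I have indented it
    exactly the way
    I want it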
RawTextHelpFormatter maintains whitespace for all sorts of help text,
including argument descriptions. However, multiple new lines are replaced with
one. If you wish to preserve multiple blank lines, add spaces between the
newlines.
ArgumentDefaultsHelpFormatter automatically adds information about
default values to each of the argument help messages:
>>> parser = argparse.ArgumentParser(
...     prog='PROG',
...     formatter_class=argparse.ArgumentDefaultsHelpFormatter)
>>> parser.add_argument('--foo', type=int, default=42, help='FOO!')
>>> parser.add_argument('bar', nargs='*', default=[1, 2, 3], help='BAR!')
>>> parser.print_help()
usage: PROG [-h] [--foo FOO] [bar ...]
bar BAR! (default: [1, 2, 3])
--foo FOO FOO! (default: 42)
MetavarTypeHelpFormatter uses the name of the type argument for each
argument as the display name for its values (rather than using the dest
as the regular formatter does):
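>>> parser = argparse.ArgumentParser(
...     prog='PROG',
...     formatter_class=argparse.MetavarTypeHelpFormatter)
>>> parser.add_argument('--foo', type=int)
>>> parser.add_argument('bar', type=float)
>>> parser.print_help()
usage: PROG [-h] [--foo int] float
--foo int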
Most command-line options will use - as the prefix, e.g. -f/--foo.
Parsers that need to support different or additional prefix
characters, e.g. for options
like +f or /foo, may specify them using the prefix_chars= argument
to the ArgumentParser constructor:
>>> parser = argparse.ArgumentParser(prog='PROG', prefix_chars='-+')
>>> parser.add_argument('+f')
>>> parser.add_argument('++bar')
>>> parser.parse_args('+f X ++bar Y'.split())
Namespace(bar='Y', f='X')
The prefix_chars= argument defaults to '-'. Supplying a set of
characters that does not include - will cause -f/--foo options to be
disallowed.
Sometimes, for example when dealing with a particularly long argument list, it
may make sense to keep the list of arguments in a file rather than typing it out
at the command line. If the fromfile_prefix_chars= argument is given to the
ArgumentParser constructor, then arguments that start with any of the
specified characters will be treated as files, and will be replaced by the
arguments they contain. For example:
>>> with open('', 'w') as fp:
...     fp.write('-f\nbar')
>>> parser = argparse.ArgumentParser(fromfile_prefix_chars='@')
>>> parser.add_argument('-f')
>>> parser.parse_args(['-f', 'foo', ''])
Arguments read from a file must by default be one per line (but see also
convert_arg_line_to_args()) and are treated as if they
were in the same place as the original file referencing argument on the command
line. So in the example above, the expression ['-f', 'foo', '']
is considered equivalent to the expression ['-f', 'foo', '-f', 'bar'].
The fromfile_prefix_chars= argument defaults to None, meaning that
arguments will never be treated as file references.
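A self-contained sketch of the same mechanism. It writes the argument file with tempfile, so the file name is generated at run time rather than taken from the example above, and it removes the file afterwards:

```python
import argparse
import os
import tempfile

# Write one argument per line to a temporary file; tempfile picks the name.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as fp:
    fp.write('-f\nbar')
    path = fp.name

try:
    parser = argparse.ArgumentParser(fromfile_prefix_chars='@')
    parser.add_argument('-f')
    # '@' + path is expanded in place to ['-f', 'bar'], so the later -f wins.
    args = parser.parse_args(['-f', 'foo', '@' + path])
    print(args)
finally:
    os.remove(path)
```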
Generally, argument defaults are specified either by passing a default to
add_argument() or by calling the
set_defaults() method with a specific set of name-value
pairs. Sometimes however, it may be useful to specify a single parser-wide
default for arguments. This can be accomplished by passing the
argument_default= keyword argument to ArgumentParser. For example,
to globally suppress attribute creation on parse_args()
calls, we supply argument_default=SUPPRESS:
>>> parser = argparse.ArgumentParser(argument_default=argparse.SUPPRESS)
>>> parser.add_argument('--foo')
>>> parser.add_argument('bar', nargs='?')
>>> parser.parse_args(['--foo', '1', 'BAR'])
Namespace(bar='BAR', foo='1')
>>> parser.parse_args([])
Namespace()
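For contrast, a short sketch comparing a parser with the usual default of None against one using argument_default=argparse.SUPPRESS; vars() turns each Namespace into a plain dict so the difference is easy to see:

```python
import argparse

plain = argparse.ArgumentParser()
plain.add_argument('--foo')

suppressed = argparse.ArgumentParser(argument_default=argparse.SUPPRESS)
suppressed.add_argument('--foo')

# With the usual default the attribute exists and is None;
# with SUPPRESS it is simply not created.
print(vars(plain.parse_args([])))       # {'foo': None}
print(vars(suppressed.parse_args([])))  # {}
```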
Normally, when you pass an argument list to the
parse_args() method of an ArgumentParser,
it recognizes abbreviations of long options.
This feature can be disabled by setting allow_abbrev to False:
>>> parser = argparse.ArgumentParser(prog='PROG', allow_abbrev=False)
>>> parser.add_argument('--foobar', action='store_true')
>>> parser.add_argument('--foonley', action='store_false')
>>> parser.parse_args(['--foon'])
usage: PROG [-h] [--foobar] [--foonley]
PROG: error: unrecognized arguments: --foon
New in version 3.5.
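For comparison, a sketch of the default behaviour (allow_abbrev left at True) with the same two options: --foon is an unambiguous prefix of --foonley, so it is accepted and expanded to that option.

```python
import argparse

parser = argparse.ArgumentParser(prog='PROG')  # allow_abbrev defaults to True
parser.add_argument('--foobar', action='store_true')
parser.add_argument('--foonley', action='store_false')

# '--foon' matches only '--foonley', so the store_false action runs.
args = parser.parse_args(['--foon'])
print(args)
```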
ArgumentParser objects do not allow two actions with the same option
string. By default, ArgumentParser objects raise an exception if an
attempt is made to create an argument with an option string that is already in
use:
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('-f', '--foo', help='old foo help')
>>> parser.add_argument('--foo', help='new foo help')
Traceback (most recent call last):
 ..
argparse.ArgumentError: argument --foo: conflicting option string(s): --foo
Sometimes (e.g. when using parents) it may be useful to simply override any
older arguments with the same option string. To get this behavior, the value
'resolve' can be supplied to the conflict_handler= argument of
ArgumentParser:
>>> parser = argparse.ArgumentParser(prog='PROG', conflict_handler='resolve')
>>> parser.add_argument('-f', '--foo', help='old foo help')
>>> parser.add_argument('--foo', help='new foo help')
>>> parser.print_help()
usage: PROG [-h] [-f FOO] [--foo FOO]
-f FOO old foo help
--foo FOO new foo help
Note that ArgumentParser objects only remove an action if all of its
option strings are overridden. So, in the example above, the old -f/--foo
action is retained as the -f action, because only the --foo option
string was overridden.
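To make the split visible, the sketch below gives the two actions different dest names (old and new are hypothetical names chosen for illustration). After resolution, -f still feeds the old action while --foo feeds the new one:

```python
import argparse

parser = argparse.ArgumentParser(prog='PROG', conflict_handler='resolve')
# 'old' and 'new' are illustrative dest names, not argparse conventions.
parser.add_argument('-f', '--foo', dest='old', help='old foo help')
parser.add_argument('--foo', dest='new', help='new foo help')

# -f still belongs to the first action; --foo now belongs to the second.
args = parser.parse_args(['-f', 'A', '--foo', 'B'])
print(args)
```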
By default, ArgumentParser objects add an option which simply displays
the parser's help message. If -h or --help is supplied at the command line,
the ArgumentParser help will be printed.
Occasionally, it may be useful to disable the addition of this help option.
This can be achieved by passing False as the add_help= argument to
ArgumentParser:
>>> parser = argparse.ArgumentParser(prog='PROG', add_help=False)
>>> parser.add_argument('--foo', help='foo help')
>>> parser.print_help()
usage: PROG [--foo FOO]
The help option is typically -h/--help. The exception to this is
if the prefix_chars= is specified and does not include -, in
which case -h and --help are not valid options. In
this case, the first character in prefix_chars is used to prefix
the help options:
>>> parser = argparse.ArgumentParser(prog='PROG', prefix_chars='+/')
>>> parser.print_help()
usage: PROG [+h]
+h, ++help show this help message and exit
Normally, when you pass an invalid argument list to the parse_args()
method of an ArgumentParser, it will exit with error info.
If the user would like to catch errors manually, the feature can be enabled by setting
exit_on_error to False:
>>> parser = argparse.ArgumentParser(exit_on_error=False)
>>> parser.add_argument('--integers', type=int)
_StoreAction(option_strings=['--integers'], dest='integers', nargs=None, const=None, default=None, type=<class 'int'>, choices=None, help=None, metavar=None)
>>> try:
...     parser.parse_args('--integers a'.split())
... except argparse.ArgumentError:
...     print('Catching an argumentError')
...
Catching an argumentError
New in version 3.9.
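A variation on the example above, as a sketch: binding the exception lets you inspect the message argparse would otherwise have printed before exiting. The variable name caught is only for illustration:

```python
import argparse

parser = argparse.ArgumentParser(exit_on_error=False)
parser.add_argument('--integers', type=int)

caught = None
try:
    parser.parse_args(['--integers', 'a'])
except argparse.ArgumentError as err:
    # str(err) holds the message argparse would otherwise print before exiting.
    caught = err
    print('Caught:', err)
```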
The add_argument() method
ArgumentParser.add_argument(name or flags...[, action][, nargs][, const][, default][, type][, choices][, required][, help][, metavar][, dest])
Define how a single command-line argument should be parsed. Each parameter
has its own more detailed description below, but in short they are:
name or flags – Either a name or a list of option strings, e.g. foo
or -f, --foo.
action – The basic type of action to be taken when this argument is
encountered at the command line.
nargs – The number of command-line arguments that should be consumed.
const – A constant value required by some action and nargs selections.
default – The value produced if the argument is absent from the
command line and if it is absent from the namespace object.
type – The type to which the command-line argument should be converted.
choices – A container of the allowable values for the argument.
required – Whether or not the command-line option may be omitted
(optionals only).
help – A brief description of what the argument does.
metavar – A name for the argument in usage messages.
dest – The name of the attribute to be added to the object returned by
parse_args().
name or flags
The add_argument() method must know whether an optional
argument, like -f or --foo, or a positional argument, like a list of
filenames, is expected. The first arguments passed to
add_argument() must therefore be either a series of
flags, or a simple argument name. For example, an optional argument could
be created like:
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('-f', '--foo')
while a positional argument could be created like:
>>> parser.add_argument('bar')
When parse_args() is called, optional arguments will be
identified by the - prefix, and the remaining arguments will be assumed to
be positional:
>>> parser.parse_args(['BAR'])
Namespace(bar='BAR', foo=None)
>>> parser.parse_args(['BAR', '--foo', 'FOO'])
Namespace(bar='BAR', foo='FOO')
>>> parser.parse_args(['--foo', 'FOO'])
usage: PROG [-h] [-f FOO] bar
PROG: error: the following arguments are required: bar
ArgumentParser objects associate command-line arguments with actions. These
actions can do just about anything with the command-line arguments associated with
them, though most actions simply add an attribute to the object returned by
parse_args(). The action keyword argument specifies
how the command-line arguments should be handled. The supplied actions are:
'store' – This just stores the argument's value. This is the default
action. For example:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo')
>>> parser.parse_args('--foo 1'.split())
Namespace(foo='1')
'store_const' – This stores the value specified by the const keyword
argument. The 'store_const' action is most commonly used with
optional arguments that specify some sort of flag. For example:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', action='store_const', const=42)
>>> parser.parse_args(['--foo'])
Namespace(foo=42)
'store_true' and 'store_false' – These are special cases of
'store_const' used for storing the values True and False
respectively. In addition, they create default values of False and
True respectively. For example:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', action='store_true')
>>> parser.add_argument('--bar', action='store_false')
>>> parser.add_argument('--baz', action='store_false')
>>> parser.parse_args('--foo --bar'.split())
Namespace(foo=True, bar=False, baz=True)
'append' – This stores a list, and appends each argument value to the
list. This is useful to allow an option to be specified multiple times.
Example usage:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', action='append')
>>> parser.parse_args('--foo 1 --foo 2'.split())
Namespace(foo=['1', '2'])
'append_const' – This stores a list, and appends the value specified by
the const keyword argument to the list. (Note that the const keyword
argument defaults to None.) The 'append_const' action is typically
useful when multiple arguments need to store constants to the same list. For
example:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--str', dest='types', action='append_const', const=str)
>>> parser.add_argument('--int', dest='types', action='append_const', const=int)
>>> parser.parse_args('--str --int'.split())
Namespace(types=[<class 'str'>, <class 'int'>])
'count' – This counts the number of times a keyword argument occurs. For
example, this is useful for increasing verbosity levels:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--verbose', '-v', action='count', default=0)
>>> parser.parse_args(['-vvv'])
Namespace(verbose=3)
Note, the default will be None unless explicitly set to 0.
‘help’ – This prints a complete help message for all the options in the
current parser and then exits. By default a help action is automatically
added to the parser. See ArgumentParser for details of how the
output is created.
'version' – This expects a version= keyword argument in the
add_argument() call, and prints version information
and exits when invoked:
>>> import argparse
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('--version', action='version', version='%(prog)s 2.0')
>>> parser.parse_args(['--version'])
PROG 2.0
'extend' – This stores a list, and extends each argument value to the
list. Example usage:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument("--foo", action="extend", nargs="+", type=str)
>>> parser.parse_args(["--foo", "f1", "--foo", "f2", "f3", "f4"])
Namespace(foo=['f1', 'f2', 'f3', 'f4'])
New in version 3.8.
You may also specify an arbitrary action by passing an Action subclass or
other object that implements the same interface. The BooleanOptionalAction
is available in argparse and adds support for boolean actions such as
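--foo and --no-foo:
>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', action=argparse.BooleanOptionalAction)
>>> parser.parse_args(['--no-foo'])
Namespace(foo=False)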
The recommended way to create a custom action is to extend Action,
overriding the __call__ method and optionally the __init__ and
format_usage methods.
An example of a custom action:
>>> class FooAction(argparse.Action):
...     def __init__(self, option_strings, dest, nargs=None, **kwargs):
...         if nargs is not None:
...             raise ValueError("nargs not allowed")
...         super().__init__(option_strings, dest, **kwargs)
...     def __call__(self, parser, namespace, values, option_string=None):
...         print('%r %r %r' % (namespace, values, option_string))
...         setattr(namespace, self.dest, values)
...
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', action=FooAction)
>>> parser.add_argument('bar', action=FooAction)
>>> args = parser.parse_args('1 --foo 2'.split())
Namespace(bar=None, foo=None) '1' None
Namespace(bar='1', foo=None) '2' '--foo'
>>> args
Namespace(bar='1', foo='2')
For more details, see Action.
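As another sketch of the Action interface, here is a hypothetical KeyValueAction (not part of argparse) that collects repeated KEY=VALUE options into a dict. It only overrides __call__, which is enough when the default constructor arguments are acceptable:

```python
import argparse


class KeyValueAction(argparse.Action):
    """Hypothetical action: collect repeated KEY=VALUE options into a dict."""

    def __call__(self, parser, namespace, values, option_string=None):
        # Start from a copy so a shared default object is never mutated.
        pairs = dict(getattr(namespace, self.dest, None) or {})
        key, sep, value = values.partition('=')
        if not sep:
            # parser.error() prints the usage plus this message and exits.
            parser.error(f'{option_string} expects KEY=VALUE, got {values!r}')
        pairs[key] = value
        setattr(namespace, self.dest, pairs)


parser = argparse.ArgumentParser()
parser.add_argument('--set', action=KeyValueAction, dest='settings')
args = parser.parse_args(['--set', 'a=1', '--set', 'b=2'])
print(args.settings)  # {'a': '1', 'b': '2'}
```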
ArgumentParser objects usually associate a single command-line argument with a
single action to be taken. The nargs keyword argument associates a
different number of command-line arguments with a single action. The supported
values are:
N (an integer). N arguments from the command line will be gathered
together into a list. For example:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', nargs=2)
>>> parser.add_argument('bar', nargs=1)
>>> parser.parse_args('c --foo a b'.split())
Namespace(bar=['c'], foo=['a', 'b'])
Note that nargs=1 produces a list of one item. This is different from
the default, in which the item is produced by itself.
'?'. One argument will be consumed from the command line if possible, and
produced as a single item. If no command-line argument is present, the value from
default will be produced. Note that for optional arguments, there is an
additional case – the option string is present but not followed by a
command-line argument. In this case the value from const will be produced. Some
examples to illustrate this:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', nargs='?', const='c', default='d')
>>> parser.add_argument('bar', nargs='?', default='d')
>>> parser.parse_args(['XX', '--foo', 'YY'])
Namespace(bar='XX', foo='YY')
>>> parser.parse_args(['XX', '--foo'])
Namespace(bar='XX', foo='c')
>>> parser.parse_args([])
Namespace(bar='d', foo='d')
One of the more common uses of nargs='?' is to allow optional input and
output files:
>>> import sys
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('infile', nargs='?', type=argparse.FileType('r'),
...                     default=sys.stdin)
>>> parser.add_argument('outfile', nargs='?', type=argparse.FileType('w'),
...                     default=sys.stdout)
>>> parser.parse_args(['', ''])
Namespace(infile=<_io.TextIOWrapper name='' encoding='UTF-8'>,
          outfile=<_io.TextIOWrapper name='' encoding='UTF-8'>)
>>> parser.parse_args([])
Namespace(infile=<_io.TextIOWrapper name='<stdin>' encoding='UTF-8'>,
          outfile=<_io.TextIOWrapper name='<stdout>' encoding='UTF-8'>)
'*'. All command-line arguments present are gathered into a list. Note that
it generally doesn't make much sense to have more than one positional argument
with nargs='*', but multiple optional arguments with nargs='*' is
possible. For example:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', nargs='*')
>>> parser.add_argument('--bar', nargs='*')
>>> parser.add_argument('baz', nargs='*')
>>> parser.parse_args('a b --foo x y --bar 1 2'.split())
Namespace(bar=['1', '2'], baz=['a', 'b'], foo=['x', 'y'])
'+'. Just like '*', all command-line args present are gathered into a
list. Additionally, an error message will be generated if there wasn't at
least one command-line argument present. For example:
>>> parser = argparse.ArgumentParser(prog='PROG')
>>> parser.add_argument('foo', nargs='+')
>>> parser.parse_args(['a', 'b'])
Namespace(foo=['a', 'b'])
>>> parser.parse_args([])
usage: PROG [-h] foo [foo ...]
PROG: error: the following arguments are required: foo
If the nargs keyword argument is not provided, the number of arguments consumed
is determined by the action. Generally this means a single command-line argument
will be consumed and a single item (not a list) will be produced.
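The following sketch combines several of the nargs values described above in one parser; the option names are illustrative. The positional values are placed first in the argument list so that the nargs='?' option at the end is not handed a value meant for the positional:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--pair', nargs=2)                      # exactly two -> list
parser.add_argument('--one', nargs=1)                       # one-item list
parser.add_argument('--maybe', nargs='?', const='c', default='d')
parser.add_argument('rest', nargs='*')                      # zero or more -> list

args = parser.parse_args(['x', 'y', '--pair', 'a', 'b', '--one', 'o', '--maybe'])
print(args)
print(parser.parse_args([]))
```

With no arguments at all, --maybe falls back to its default and the nargs='*' positional becomes an empty list.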
The const argument of add_argument() is used to hold
constant values that are not read from the command line but are required for
the various ArgumentParser actions. The two most common uses of it are:
When add_argument() is called with
action=’store_const’ or action=’append_const’. These actions add the
const value to one of the attributes of the object returned by
parse_args(). See the action description for examples.
When add_argument() is called with option strings
(like -f or --foo) and nargs='?'. This creates an optional
argument that can be followed by zero or one command-line arguments.
When parsing the command line, if the option string is encountered with no
command-line argument following it, the value of const will be assumed instead.
See the nargs description for examples.
With the ‘store_const’ and ‘append_const’ actions, the const
keyword argument must be given. For other actions, it defaults to None.
All optional arguments and some positional arguments may be omitted at the
command line. The default keyword argument of
add_argument(), whose value defaults to None,
specifies what value should be used if the command-line argument is not present.
For optional arguments, the default value is used when the option string
was not present at the command line:
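>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', default=42)
>>> parser.parse_args(['--foo', '2'])
Namespace(foo='2')
>>> parser.parse_args([])
Namespace(foo=42)
If the target namespace already has an attribute set, the action default
will not overwrite it:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', default=42)
>>> parser.parse_args([], namespace=argparse.Namespace(foo=101))
Namespace(foo=101)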
If the default value is a string, the parser parses the value as if it
were a command-line argument. In particular, the parser applies any type
conversion argument, if provided, before setting the attribute on the
Namespace return value. Otherwise, the parser uses the value as is:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--length', default='10', type=int)
>>> parser.add_argument('--width', default=10.5, type=int)
>>> parser.parse_args()
Namespace(length=10, width=10.5)
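For positional arguments with nargs equal to ? or *, the default value
is used when no command-line argument was present:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('foo', nargs='?', default=42)
>>> parser.parse_args(['a'])
Namespace(foo='a')
>>> parser.parse_args([])
Namespace(foo=42)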
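Providing default=argparse.SUPPRESS causes no attribute to be added if the
command-line argument was not present:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--foo', default=argparse.SUPPRESS)
>>> parser.parse_args([])
Namespace()
>>> parser.parse_args(['--foo', '1'])
Namespace(foo='1')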
By default, the parser reads command-line arguments in as simple
strings. However, quite often the command-line string should instead be
interpreted as another type, such as a float or int. The
type keyword for add_argument() allows any
necessary type-checking and type conversions to be performed.
If the type keyword is used with the default keyword, the type converter
is only applied if the default is a string.
The argument to type can be any callable that accepts a single string.
If the function raises ArgumentTypeError, TypeError, or
ValueError, the exception is caught and a nicely formatted error
message is displayed. No other exception types are handled.
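A minimal sketch of a user-defined converter that reports its own error. positive_int is a hypothetical helper; exit_on_error=False is set only so that the error surfaces as an argparse.ArgumentError that can be caught, instead of a printed message and an exit:

```python
import argparse


def positive_int(text):
    """Hypothetical converter: accept only integers greater than zero."""
    value = int(text)  # a ValueError here is also turned into a clean error
    if value <= 0:
        raise argparse.ArgumentTypeError(f'{text!r} is not a positive integer')
    return value


# exit_on_error=False makes errors catchable instead of exiting the program.
parser = argparse.ArgumentParser(exit_on_error=False)
parser.add_argument('--workers', type=positive_int, default=1)

print(parser.parse_args(['--workers', '4']))  # Namespace(workers=4)

try:
    parser.parse_args(['--workers', '0'])
except argparse.ArgumentError as err:
    print('Rejected:', err)
```

Because the default 1 is not a string, it is used as is and never passed through positive_int.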
Common built-in types and functions can be used as type converters:
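import argparse
import pathlib

parser = argparse.ArgumentParser()
parser.add_argument('count', type=int)
parser.add_argument('distance', type=float)
parser.add_argument('street', type=ascii)
parser.add_argument('code_point', type=ord)
parser.add_argument('source_file', type=open)
parser.add_argument('dest_file', type=argparse.FileType('w', encoding='latin-1'))
parser.add_argument('datapath', type=pathlib.Path)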
User defined functions can be used as well:
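>>> def hyphenated(string):
...     return '-'.join([word[:4] for word in string.casefold().split()])
...
>>> parser = argparse.ArgumentParser()
>>> _ = parser.add_argument('short_title', type=hyphenated)
>>> parser.parse_args(['"The Tale of Two Cities"'])
Namespace(short_title='"the-tale-of-two-citi')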
The bool() function is not recommended as a type converter. All it does
is convert empty strings to False and non-empty strings to True.
This is usually not what is desired.
In general, the type keyword is a convenience that should only be used for
simple conversions that can only raise one of the three supported exceptions.
Anything with more interesting error-handling or resource management should be
done downstream after the arguments are parsed.
For example, JSON or YAML conversions have complex error cases that require
better reporting than can be given by the type keyword. A
JSONDecodeError would not be well formatted and a
FileNotFoundError exception would not be handled at all.
Even FileType has its limitations for use with the type
keyword. If one argument uses FileType and then a subsequent argument fails,
an error is reported but the file is not automatically closed. In this case, it
would be better to wait until after the parser has run and then use the
with-statement to manage the files.
For type checkers that simply check against a fixed set of values, consider
using the choices keyword instead.
Some command-line arguments should be selected from a restricted set of values.
These can be handled by passing a container object as the choices keyword
argument to add_argument(). When the command line is
parsed, argument values will be checked, and an error message will be displayed
if the argument was not one of the acceptable values:
>>> parser = argparse.ArgumentParser(prog='')
>>> parser.add_argument('move', choices=['rock', 'paper', 'scissors'])
>>> parser.parse_args(['rock'])
Namespace(move='rock')
>>> parser.parse_args(['fire'])
usage: [-h] {rock,paper,scissors}
error: argument move: invalid choice: 'fire' (choose from 'rock',
'paper', 'scissors')
Note that inclusion in the choices container is checked after any type
conversions have been performed, so the type of the objects in the choices
container should match the type specified:
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('door', type=int, choices=range(1, 4))
>>> print(parser.parse_args(['3']))
Namespace(door=3)
>>> parser.parse_args(['4'])
usage: [-h] {1,2,3}
error: argument door: invalid choice: 4 (choose from 1, 2, 3)
Any container can be passed as the choices value, so list objects,
set objects, and custom containers are all supported.
Use of is not recommended because it is difficult to
control its appearance in usage, help, and error messages.
Formatted choices override the default metavar, which is normally derived
from dest. This is usually what you want because the user never sees the
dest parameter. If this display isn’t desirable (perhaps because there are
many choices), just specify an explicit metavar.
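A short sketch combining both points: type conversion happens before the choices check, so the choices are integers, and an explicit metavar on the hypothetical --level option replaces the brace-enclosed list in the usage message:

```python
import argparse

parser = argparse.ArgumentParser(prog='PROG', exit_on_error=False)
# Choices are checked after type conversion, so they must be ints here.
parser.add_argument('door', type=int, choices=range(1, 4))
# Hypothetical option: an explicit metavar hides the {low,high} listing.
parser.add_argument('--level', choices=['low', 'high'], metavar='LEVEL')

print(parser.parse_args(['3', '--level', 'low']))
print(parser.format_usage())  # usage: PROG [-h] [--level LEVEL] {1,2,3}
```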
In general, the argparse module assumes that flags like -f and —
Python Parser | Working of Python Parse with different Examples

Introduction to Python Parser
In this article, parsing is defined as processing a piece of Python code and converting it into a form the interpreter can execute (Python compiles source code to bytecode rather than directly to machine language). In general, parsing means dividing the given program code into small pieces in order to check that its syntax is correct. Older versions of Python (up to 3.9) included a built-in module called parser, which provided an interface to the Python internal parser and compiler; it allowed a Python program to edit small fragments of code and create executable code from the edited parse tree. Python also has a module called argparse for parsing command-line options.
Working of Python Parse with Examples
In this article, a Python parser is mainly used to convert data into a required format, and this conversion process is known as parsing. Data obtained by different applications can come in many formats, and those formats may not suit the application at hand. This is where a parser comes in. In general, then, parsing is the conversion of data from one format into another. A parser often consists of two parts: a lexer, which splits the input into tokens, and the parser proper, which checks how those tokens fit together. In some cases only a parser is used.
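To make the lexer/parser split concrete, here is a toy sketch for arithmetic such as "10 - 3 + 2". The grammar, token kinds, and tree shape are illustrative assumptions, and this is not how Python’s own parser is implemented.

```python
import re

# Hypothetical toy grammar:  expr := NUMBER (('+' | '-') NUMBER)*
TOKEN_RE = re.compile(r'\s*(?:(?P<num>\d+)|(?P<op>[+-]))')


def lex(text):
    """Lexer: split the source text into (kind, value) tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        if m.group('num') is not None:
            tokens.append(('NUM', int(m.group('num'))))
        else:
            tokens.append(('OP', m.group('op')))
        pos = m.end()
    return tokens


def parse(tokens):
    """Parser: check the token order and build a nested (op, left, right) tree."""
    if not tokens or tokens[0][0] != 'NUM':
        raise SyntaxError("expression must start with a number")
    tree, i = tokens[0][1], 1
    while i < len(tokens):
        if tokens[i][0] != 'OP' or i + 1 >= len(tokens) or tokens[i + 1][0] != 'NUM':
            raise SyntaxError("expected an operator followed by a number")
        tree = (tokens[i][1], tree, tokens[i + 1][1])  # left-associative
        i += 2
    return tree


def evaluate(tree):
    """Walk the tree produced by parse() and compute its value."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return evaluate(left) + right if op == '+' else evaluate(left) - right


tokens = lex("10 - 3 + 2")
tree = parse(tokens)
print(tokens)          # [('NUM', 10), ('OP', '-'), ('NUM', 3), ('OP', '+'), ('NUM', 2)]
print(tree)            # ('+', ('-', 10, 3), 2)
print(evaluate(tree))  # 9
```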
Parsing can be done in Python in various ways:

- with the parser module;
- with regular expressions;
- with string methods such as split() and strip();
- with pandas, for example by reading a CSV file into a DataFrame.

There is also argument parsing. The argparse module parses one or more arguments given at the terminal or command line, and other modules used for command-line arguments include getopt and sys. Parsers can also be built with tools such as parser generators, or with parser combinator libraries.
The example below shows how the parser module is used to parse a given expression.
Example #1
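# Note: the parser module was deprecated in Python 3.9 and removed in 3.10,
# so this example only runs on Python 3.9 or earlier.
import parser
print("Program to demonstrate parser module in Python")
exp = "5 + 8"
print("The given expression for parsing is as follows:")
print(exp)
print("Parsing of given expression results as: ")
st = parser.expr(exp)      # parse the expression into an ST object
print(st)
print("The parsed object is converted to the code object")
code = st.compile()        # compile the parse tree into a code object
print(code)
print("The evaluated result of the given expression is as follows:")
res = eval(code)           # evaluate the code object
print(res)                 # 13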
In the above program, we first import the parser module and then declare the expression to evaluate. To parse this expression we use the parser.expr() function, which returns a syntax tree (ST) object. The compile() method of that object turns it into a code object, which we then evaluate using the eval() function.
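The parser module no longer exists in Python 3.10 and later. A roughly equivalent sketch for current Python uses the standard ast module together with the built-in compile() and eval(). This swaps in a different module, so the intermediate tree is an abstract syntax tree (AST) rather than the parser module’s ST object.

```python
import ast

exp = "5 + 8"
# Parse the source into an AST; mode="eval" expects a single expression
tree = ast.parse(exp, mode="eval")
print(ast.dump(tree))
# Compile the tree into a code object, then evaluate it
code = compile(tree, filename="<expr>", mode="eval")
res = eval(code)
print(res)  # 13
```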
In Python, we sometimes get data that contains date-time values in CSV or text format. To parse such values into proper date-time types, pandas provides the parse_dates parameter of its read_csv() function (it is a parameter, not a separate function). Suppose we have a CSV file in which the date-time details are stored as comma-separated text, which makes them difficult to work with directly. In such cases we pass parse_dates to read_csv(). Because this feature is provided by pandas, we first have to import pandas.
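A minimal sketch of parse_dates follows. It assumes a small hypothetical CSV whose timestamp column holds date-time text, inlined with io.StringIO instead of being read from a file.

```python
import io

import pandas as pd

# Hypothetical CSV content; the timestamp column is plain text in the file
csv_text = """timestamp,value
2021-01-05 10:30:00,3
2021-01-06 14:45:00,7
"""

# parse_dates tells read_csv to convert the listed columns to datetime64
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])
print(df.dtypes)
```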
In Python, we can also parse command-line options and arguments using the argparse module, which makes it easy to build user-friendly command-line interfaces. Suppose we want a Python command-line interface similar to a Unix command such as ls, which lists the contents of a directory and accepts many different arguments. To create such an interface we do the following:

1. Import the argparse module.
2. Create an object to hold the argument definitions with argparse.ArgumentParser().
3. Add arguments to that object.
4. Parse the command line.

Note that before any arguments are added, the only option the program accepts is the built-in help option (-h/--help). Here is a small piece of code showing how to create a command-line interface with argparse.
import argparse
parser = argparse.ArgumentParser()
parser.parse_args()
Now we have created an object using ArgumentParser(), and we parse the command-line arguments using the parse_args() method.
To add arguments we use the add_argument() method, passing it the name of the argument, for example parser.add_argument("ls"). Let us see a small example below.
Example #2
args = parser.parse_args()
Running the program from the shell shows the help message that argparse generates automatically, and how the program accepts a value for its argument:
$ python --help
usage: [-h] echo
positional arguments:
optional arguments:
  -h, --help  show this help message and exit
$ python Educba
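For reference, here is a sketch of a complete echo program consistent with the usage line above. It passes an explicit list to parse_args() to simulate the command line. The value "Educba" is taken from the last command shown.

```python
import argparse

parser = argparse.ArgumentParser()
# One positional argument, shown as "echo" in the usage line
parser.add_argument("echo")

# A list simulates the command-line arguments; in a real script,
# call parser.parse_args() with no arguments to read sys.argv instead.
args = parser.parse_args(["Educba"])
print(args.echo)  # Educba
```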
In this article, we saw that Python supports parsing in several forms. Parsing in general is a simple idea: take a large string in one format and convert it into another, required format. In Python this can be done in many ways. We can use string methods such as split() or strip(), or use pandas to read CSV files into a usable structure. The parser module handles Python source code, and the argparse module lets us build command-line interfaces whose commands are easy to run. We also saw how to use argparse and how to run the resulting program from the terminal.
Parsing text with Python - vipinajayakumar

I hate parsing files, but it is something that I have had to do at the start of nearly every project. Parsing is not easy, and it can be a stumbling block for beginners. However, once you become comfortable with parsing files, you never have to worry about that part of the problem. That is why I recommend that beginners get comfortable with parsing files early on in their programming education. This article is aimed at Python beginners who are interested in learning to parse text files.
In this article, I will introduce you to my system for parsing files. I will briefly touch on parsing files in standard formats, but what I want to focus on is the parsing of complex text files. What do I mean by complex? Well, we will get to that, young padawan.
For reference, the slide deck that I use to present on this topic is available here. All of the code and the sample text that I use is available in my Github repo here.
Why parse files?
The big picture
Parsing text in standard format
Parsing text using string methods
Parsing text in complex format using regular expressions
Step 1: Understand the input format
Step 2: Import the required packages
Step 3: Define regular expressions
Step 4: Write a line parser
Step 5: Write a file parser
Step 6: Test the parser
Is this the best solution?
First, let us understand what the problem is. Why do we even need to parse files? In an imaginary world where all data existed in the same format, one could expect all programs to input and output that data. There would be no need to parse files. However, we live in a world where there is a wide variety of data formats. Some data formats are better suited to different applications. An individual program can only be expected to cater for a selection of these data formats. So, inevitably there is a need to convert data from one format to another for consumption by different programs. Sometimes data is not even in a standard format which makes things a little harder.
So, what is parsing?
Analyse (a string or text) into logical syntactic components.
I don’t like the above Oxford dictionary definition. So, here is my alternate definition.
Convert data in a certain format into a more usable format.
With that definition in mind, we can imagine that our input may be in any format. So, the first step, when faced with any parsing problem, is to understand the input data format. If you are lucky, there will be documentation that describes the data format. If not, you may have to decipher the data format for yourselves. That is always fun.
Once you understand the input data, the next step is to determine what would be a more usable format. Well, this depends entirely on how you plan on using the data. If the program that you want to feed the data into expects a CSV format, then that’s your end product. For further data analysis, I highly recommend reading the data into a pandas DataFrame.
If you are a Python data analyst, then you are most likely familiar with pandas. It is a Python package that provides the DataFrame class and other functions to do insanely powerful data analysis with minimal effort. It is an abstraction on top of NumPy, which provides multi-dimensional arrays similar to MATLAB. The DataFrame is a 2D array, but it can have multiple row and column indices, which pandas calls MultiIndex, and this essentially allows it to store multi-dimensional data. SQL or database style operations can be easily performed with pandas (Comparison with SQL). Pandas also comes with a suite of IO tools, which includes functions to deal with CSV, MS Excel, JSON, HDF5 and other data formats.
Although we would want to read the data into a feature-rich data structure like a pandas DataFrame, it would be very inefficient to create an empty DataFrame and directly write data to it. A DataFrame is a complex data structure, and writing something to a DataFrame item by item is computationally expensive. It’s a lot faster to read the data into a primitive data type like a list or a dict. Once the list or dict is created, pandas allows us to easily convert it to a DataFrame, as you will see later on. The image below shows the standard process when it comes to parsing any file.
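Here is a minimal sketch of that process, using two rows from the sample data later in this article. Rows are appended to a plain list of dicts, and the DataFrame is built once at the end rather than filled in item by item.

```python
import pandas as pd

# Append each parsed record to a plain list of dicts (cheap)...
rows = []
for number, name in [(0, "Phoebe"), (1, "Rachel")]:
    rows.append({"Student number": number, "Name": name})

# ...then convert the whole list to a DataFrame in one step.
df = pd.DataFrame(rows)
print(df)
```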
If your data is in a standard format or close enough, then there is probably an existing package that you can use to read your data with minimal effort.
For example, let’s say we have a CSV file,
a,b,c
1,2,3
4,5,6
7,8,9
You can handle this easily with pandas.
import pandas as pd
df = pd.read_csv('')  # path to the CSV file shown above
print(df)
a b c
0 1 2 3
1 4 5 6
2 7 8 9
Python is incredible when it comes to dealing with strings. It is worth internalising all the common string operations. We can use these methods to extract data from a string as you can see in the simple example below.
my_string = 'Names: Romeo, Juliet'

# split the string at ':'
step_0 = my_string.split(':')

# get the second item of the list (everything after ':')
step_1 = step_0[1]

# split the string at ','
step_2 = step_1.split(',')

# strip leading and trailing edge spaces of each item of the list
step_3 = [name.strip() for name in step_2]

# do all the above operations in one go
one_go = [name.strip() for name in my_string.split(':')[1].split(',')]

for idx, item in enumerate([step_0, step_1, step_2, step_3]):
    print("Step {}: {}".format(idx, item))

print("Final result in one go: {}".format(one_go))
Step 0: ['Names', ' Romeo, Juliet']
Step 1:  Romeo, Juliet
Step 2: [' Romeo', ' Juliet']
Step 3: ['Romeo', 'Juliet']
Final result in one go: ['Romeo', 'Juliet']
As you saw in the previous two sections, if the parsing problem is simple we might get away with just using an existing parser or some string methods. However, life ain’t always that easy. How do we go about parsing a complex text file?
with open('') as file:  # path to the sample text file
    file_contents = file.read()
    print(file_contents)
Sample text

A selection of students from Riverdale High and Hogwarts took part in a quiz.
Below is a record of their scores.

School = Riverdale High
Grade = 1
Student number, Name
0, Phoebe
1, Rachel

Student number, Score
0, 3
1, 7

Grade = 2
Student number, Name
0, Angela
1, Tristan
2, Aurora

Student number, Score
0, 6
1, 3
2, 9

School = Hogwarts
Grade = 1
Student number, Name
0, Ginny
1, Luna

Student number, Score
0, 8
1, 7

Grade = 2
Student number, Name
0, Harry
1, Hermione

Student number, Score
0, 5
1, 10

Grade = 3
Student number, Name
0, Fred
1, George

Student number, Score
0, 0
1, 0
That’s a pretty complex input file! Phew! The data it contains is pretty simple though as you can see below:
                                       Name  Score
School         Grade Student number
Hogwarts       1     0                Ginny      8
                     1                 Luna      7
               2     0                Harry      5
                     1             Hermione     10
               3     0                 Fred      0
                     1               George      0
Riverdale High 1     0               Phoebe      3
                     1               Rachel      7
               2     0               Angela      6
                     1              Tristan      3
                     2               Aurora      9
The sample text looks similar to a CSV in that it uses commas to separate out some information. There is a title and some metadata at the top of the file. There are five variables: School, Grade, Student number, Name and Score. School, Grade and Student number are keys. Name and Score are fields. For a given School, Grade, Student number there is a Name and a Score. In other words, School, Grade, and Student Number together form a compound key.
The data is given in a hierarchical format. First, a School is declared, then a Grade. This is followed by two tables providing Name and Score for each Student number. Then Grade is incremented. This is followed by another set of tables. Then the pattern repeats for another School. Note that the number of students in a Grade or the number of classes in a school is not constant, which adds a bit of complexity to the file. This is just a small dataset. You can easily imagine this being a massive file with lots of schools, grades and students.
It goes without saying that the data format is exceptionally poor. I have done this on purpose. If you understand how to handle this, then it will be a lot easier for you to master simpler formats. It’s not unusual to come across files like this if you have to deal with a lot of legacy systems. In the past, when those systems were being designed, it may not have been a requirement for the data output to be machine-readable. However, nowadays everything needs to be machine-readable!
We will need the Regular expressions module and the pandas package. So, let’s go ahead and import those.
import re
import pandas as pd
In the last step, we imported re, the regular expressions module. What is it though?
Well, earlier on we saw how to use the string methods to extract data from text. However, when parsing complex files, we can end up with a lot of stripping, splitting, slicing and whatnot and the code can end up looking pretty unreadable. That is where regular expressions come in. It is essentially a tiny language embedded inside Python that allows you to say what string pattern you are looking for. It is not unique to Python by the way (treehouse).
You do not need to become a master at regular expressions. However, some basic knowledge of regexes can be very handy in your programming career. I will only teach you the very basics in this article, but I encourage you to do some further study. I also recommend regexper for visualising regular expressions. regex101 is another excellent resource for testing your regular expression.
We are going to need three regexes. The first one, as shown below, will help us to identify the school. Its regular expression is School = (.*)\n. What do the symbols mean?
.: Any character (except a newline, by default)
*: 0 or more of the preceding expression
(.*): Placing part of a regular expression inside parentheses allows you to group that part of the expression. So, in this case, the grouped part is the name of the school.
\n: The newline character at the end of the line
We then need a regular expression for the grade. Its regular expression is Grade = (\d+)\n. This is very similar to the previous expression. The new symbols are:
\d: Short for [0-9]
+: 1 or more of the preceding expression
Finally, we need a regular expression to identify whether the table that follows the expression in the text file is a table of names or scores. Its regular expression is (Name|Score). The new symbol is:
|: Logical or statement, so in this case, it means 'Name' or 'Score'.
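The three patterns can be tried on single lines from the sample text. This sketch uses the unnamed-group versions described above, and the variable names are illustrative.

```python
import re

school_rx = re.compile(r'School = (.*)\n')
grade_rx = re.compile(r'Grade = (\d+)\n')
table_rx = re.compile(r'(Name|Score)')

print(school_rx.search('School = Riverdale High\n').group(1))  # Riverdale High
print(grade_rx.search('Grade = 2\n').group(1))                 # 2
print(table_rx.search('Student number, Score\n').group(1))     # Score
print(grade_rx.search('Grade = two\n'))                        # None: \d+ needs digits
```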
We also need to understand a few regular expression functions:
re.compile(pattern): Compile a regular expression pattern into a RegexObject (called re.Pattern in current Python versions).
A RegexObject has the following methods:
match(string): If the beginning of string matches the regular expression, return a corresponding MatchObject instance. Otherwise, return None.
search(string): Scan through string looking for a location where this regular expression produced a match, and return a corresponding MatchObject instance. Return None if there are no matches.
A MatchObject always has a boolean value of True. Thus, we can just use an if statement to identify positive matches. It has the following method:
group(): Returns one or more subgroups of the match. Groups can be referred to by their index. group(0) returns the entire match. group(1) returns the first parenthesized subgroup, and so on. The regular expressions we used only have a single group. Easy! However, what if there were multiple groups? It would get hard to remember which number a group belongs to. A Python-specific extension allows us to name the groups and refer to them by their name instead. We can specify a name within a parenthesized group (...) like so: (?P<name>...).
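A small sketch contrasting match() with search() and showing a named group (the line 'Current Grade = 3' is made up for illustration):

```python
import re

rx = re.compile(r'Grade = (?P<grade>\d+)')
line = 'Current Grade = 3'

print(rx.match(line))    # None: match() only tries at the start of the string
m = rx.search(line)      # search() scans the whole string
print(m.group(0))        # Grade = 3  (the entire match)
print(m.group(1))        # 3          (first parenthesized group)
print(m.group('grade'))  # 3          (the same group, referred to by name)
```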
Let us first define all the regular expressions. Be sure to use raw strings for regex, i.e., use the prefix r before each pattern.
# set up regular expressions
# use regexper to visualise these if required
rx_dict = {
    'school': re.compile(r'School = (?P<school>.*)\n'),
    'grade': re.compile(r'Grade = (?P<grade>\d+)\n'),
    'name_score': re.compile(r'(?P<name_score>Name|Score)'),
}
Then, we can define a function that checks for regex matches.
def _parse_line(line):
    """
    Do a regex search against all defined regexes and
    return the key and match result of the first matching regex
    """
    for key, rx in rx_dict.items():
        match = rx.search(line)
        if match:
            return key, match
    # if there are no matches
    return None, None
Finally, for the main event, we have the file parser function. It is quite big, but the comments in the code should hopefully help you understand the logic.
def parse_file(filepath):
    """
    Parse text at given filepath

    Parameters
    ----------
    filepath : str
        Filepath for file_object to be parsed

    Returns
    -------
    data : pd.DataFrame
        Parsed data
    """
    data = []  # create an empty list to collect the data
    # open the file and read through it line by line
    with open(filepath, 'r') as file_object:
        line = file_object.readline()
        while line:
            # at each line check for a match with a regex
            key, match = _parse_line(line)

            # extract school name
            if key == 'school':
                school = match.group('school')

            # extract grade
            if key == 'grade':
                grade = match.group('grade')
                grade = int(grade)

            # identify a table header
            if key == 'name_score':
                # extract type of table, i.e., Name or Score
                value_type = match.group('name_score')
                line = file_object.readline()
                # read each line of the table until a blank line
                while line.strip():
                    # extract number and value
                    number, value = line.strip().split(',')
                    value = value.strip()
                    # create a dictionary containing this row of data
                    row = {
                        'School': school,
                        'Grade': grade,
                        'Student number': number,
                        value_type: value,
                    }
                    # append the dictionary to the data list
                    data.append(row)
                    line = file_object.readline()

            line = file_object.readline()

    # create a pandas DataFrame from the list of dicts
    data = pd.DataFrame(data)
    # set the School, Grade, and Student number as the index
    data.set_index(['School', 'Grade', 'Student number'], inplace=True)
    # consolidate df to remove nans
    data = data.groupby(level=list(data.index.names)).first()
    # convert Score from text to integer
    data['Score'] = pd.to_numeric(data['Score'])
    return data
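The consolidation step in parse_file (the line commented "consolidate df to remove nans") can be seen in isolation in the sketch below. It assumes the step groups the rows by every index level and takes first(). Each student starts with one row holding Name and another holding Score, and first() keeps the first non-missing value of each column within the group.

```python
import pandas as pd

# Two partial rows for one student: one from the Name table, one from the Score table
rows = [
    {'School': 'Hogwarts', 'Grade': 1, 'Student number': '0', 'Name': 'Ginny'},
    {'School': 'Hogwarts', 'Grade': 1, 'Student number': '0', 'Score': '8'},
]
df = pd.DataFrame(rows).set_index(['School', 'Grade', 'Student number'])

# Merge rows that share the compound key; first() skips the NaN gaps
merged = df.groupby(level=list(df.index.names)).first()

# Scores were read as text, so convert them to numbers
merged['Score'] = pd.to_numeric(merged['Score'])
print(merged)
```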
We can use our parser on our sample text like so:
if __name__ == '__main__':
    filepath = ''  # path to the sample text file
    data = parse_file(filepath)
    print(data)
This is all well and good, and you can see by comparing the input and output by eye that the parser is working correctly. However, the best practice is to always write unittests to make sure your code is doing what you intended it to do. Whenever you write a parser, please ensure that it’s well tested. I have gotten into trouble with my colleagues for using parsers without testing before. Eeek! It’s also worth noting that this does not necessarily need to be the last step. Indeed, lots of programmers preach about Test Driven Development. I have not included a test suite here as I wanted to keep this tutorial concise.
I have been parsing text files for a year and perfected my method over time. Even so, I did some additional research to find out if there was a better solution. Indeed, I owe thanks to various community members who advised me on optimising my code. The community also offered some different ways of parsing the text file. Some of them were clever and exciting. My personal favourite was this one. I presented my sample problem and solution at the forums below:
Reddit post
Stackoverflow post
Code review post
If your problem is even more complex and regular expressions don’t cut it, then the next step would be to consider parsing libraries. Here are a couple of places to start with:
Parsing Horrible Things with Python:
A PyCon lecture by Erik Rose looking at the pros and cons of various parsing libraries.
Parsing in Python: Tools and Libraries:
Tools and libraries that allow you to create parsers when regular expressions are not enough.
Now that you understand how difficult and annoying it can be to parse text files, if you ever find yourselves in the privileged position of choosing a file format, choose it with care. Here are Stanford’s best practices for file formats.
I’d be lying if I said I was delighted with my parsing method, but I’m not aware of another way of quickly parsing a text file that is as beginner friendly as what I’ve presented above. If you know of a better solution, I’m all ears! I have hopefully given you a good starting point for parsing a file in Python! I spent a couple of months trying lots of different methods and writing some insanely unreadable code before I finally figured it out, and now I don’t think twice about parsing a file. So, I hope I have been able to save you some time. Have fun parsing text with Python!

Frequently Asked Questions about python parser

What is a parser in Python?

Introduction to Python Parser. A parser processes a piece of Python code, dividing it into small pieces to check that its syntax is correct and converting it into a form the interpreter can execute (bytecode).

How do you parse in Python?

Parsing text in complex format using regular expressions: Step 1: Understand the input format. … Step 2: Import the required packages. We will need the Regular expressions module and the pandas package. … Step 3: Define regular expressions. … Step 4: Write a line parser. … Step 5: Write a file parser. … Step 6: Test the parser. (Jan 7, 2018)

What kind of parser does Python use?

Since Python 3.9, CPython has used a PEG-based parser (PEP 617), generated as C code from the grammar. Debugging generated parsers: as the generated C parser is the one used by Python, if something goes wrong when adding some new rules to the grammar, you cannot correctly compile and execute Python anymore.
