CLI With Python

Motivation

I’ve been working a lot recently in python, making my own suite of analytic tools. It’s not publicly available and it won’t be for some time, but I’ve learned a great deal about python because of it.

In my work, python is the ‘toy’ language for writing quick and dirty scripts, to later be solidified in C++; I certainly didn’t expect to spend a large portion of the summer working with it! I’ve come to appreciate it and its features, for all they do to make my life so much easier.

There is no definitive ‘This is how to make a CLI (Command Line Interface) tool’ guide for python, or none that is easily found. Most guides paraphrase the documentation for `argparse` and stop there. But what I’ve been doing has been much more involved, so this post is just me reflecting on the experience.

Structure of a python package


For those unaware, here’s a quick rundown of a python package. Consider this sample package layout:

  package
  package/__init__.py
  package/__main__.py
  package/moduleA.py
  package/moduleB.py
  package/moduleC.py

A python package is just a system directory with an __init__.py file within it. This file can be empty, but it needs to be there for python to recognize your directory as a package; otherwise, python just sees a regular directory.

If a python package has a __main__.py, it can be run with `python package`. You are likely familiar with checking `if __name__ == '__main__'` to execute a module. This is where you would have that logic in your package.
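For illustration, a minimal __main__.py might look like the sketch below (the `main` function and what it prints are hypothetical, just to show the shape):

```python
# package/__main__.py -- hypothetical minimal entry point.
# Running `python package` from the parent directory executes this file.

def main():
    # Real work would be delegated to the package's modules.
    print('running the package as a program')

if __name__ == '__main__':
    main()
```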

The package’s contents are just regular python modules. These modules can be imported from your package into an existing project using `import package.moduleX`. However, you likely are used to using `import package`, then calling `package.moduleX`. Trying to do that will give you an error, but that’s easy to fix.

What’s happening is that when you import a specific member of a package, python knows to look in that package and import that module directly. However, when python imports a package itself, it behaves very differently: python runs __init__.py to set up the package. So to get the behavior you would expect when using your package as a module, your __init__.py should be set up as below.

  from . import moduleA
  from . import moduleB
  from . import moduleC

The Command Line Interface


Writing a python package is as easy as writing modules that are meant to work together. The real challenge is understanding the separation between the interface and the package’s functionality.

Your __main__.py is run when you run your package with python, but it shouldn’t do any real work. It should truly be an _interface_ to your code, only collecting the necessary data and then forwarding it to your package’s modules, where the work should be done.

This was one of the main issues I faced in my work. I would write test code for a feature, then face the issue of untangling the interface from functionality. Having the foresight to separately develop your __main__.py and modules will greatly boost your productivity.

I feel a CLI should follow the pattern I naturally fell into:

  1. Use argparse to collect input from the user when invoked.
  2. Check the sanity of the parsed arguments.
  3. Collapse your units of work into an iterable.
  4. Call the appropriate function for each unit.

The last two entries are probably a bit puzzling. What’s a unit of work?
Well, for my project I came across instances where I couldn’t statically figure out how much data I had to process. Rather than writing messy conditional branching, I found a way to reuse the same functions for arbitrary amounts of input. I broke the input into ‘units of work’, and then called my function once for each unit. This has worked very well for me, and is easily extensible.

To give you an idea of my approach, here’s a code snippet from the project:
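A rough sketch of that main-loop pattern, with hypothetical argument and function names (the real project's snippet would differ in the details):

```python
import argparse

def sub_main(unit):
    """Hypothetical worker: process a single unit of work."""
    return 'processed ' + unit

def main(argv=None):
    parser = argparse.ArgumentParser(prog='m2py')
    parser.add_argument('sources', nargs='+', help='benchmark directories')
    args = parser.parse_args(argv)

    # Collapse the input into one iterable of work units...
    units = list(args.sources)
    # ...then call the worker once per unit -- no special-casing needed.
    return [sub_main(unit) for unit in units]
```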

Now we have a nice main loop, avoiding any repetition or unnecessary conditionals. This pattern works well and makes additions a snap.

1: Argparse

Argparse is a module for handling user input. It’s a bit odd to use at first, since it feels too high level for defining a CLI. However, I found that using it was a breeze, and it eliminated nearly all my woes of parsing arguments myself with `sys.argv`. My original argument-handling code took a while to develop; the switch to argparse took maybe 15 minutes at most.

Here’s the argparse code:
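The construction described below can be sketched like this (the description string is a placeholder, not the project's actual text):

```python
import argparse

# RawDescriptionHelpFormatter preserves line breaks in the description,
# unlike the default formatter, which reflows them.
parser = argparse.ArgumentParser(
    prog='m2py',
    description='Tool suite for benchmark analysis',  # placeholder
    formatter_class=argparse.RawDescriptionHelpFormatter)
```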

This code constructs an `ArgumentParser` named m2py, and assigns it an appropriate description. It also gives it a non-default formatter, since the default formatter ignores line breaks.

Next, the parser gets assigned arguments. There’s some ambiguity about how to assign these, since several options are mutually exclusive. When specifying how data is parsed, you can use only one of the following per `add_argument(...)` call:

  • For a simple true/false flag, use `action='store_true'` for it to default to False unless specified, and `action='store_false'` for the opposite.
  • If you have an optional parameter, use `nargs='?'`, `type={int,bool,etc}`, and `default={10,False,etc}`.
  • For an argument with exactly N parameters, use `nargs=N` (an integer, not a string).
  • For an argument with 0 or more parameters, use `nargs='*'`.
  • For an argument with 1 or more parameters, use `nargs='+'`.
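The options above can be illustrated with a small parser (the argument names here are made up for the example):

```python
import argparse

parser = argparse.ArgumentParser(prog='demo')
# True/False flag: False by default, True when the flag is present.
parser.add_argument('--verbose', action='store_true')
# Optional parameter with a default value.
parser.add_argument('--threshold', nargs='?', type=int, default=10)
# Exactly two parameters (note the bare integer, not a string).
parser.add_argument('--range', nargs=2, type=int)
# One or more parameters.
parser.add_argument('--files', nargs='+')

args = parser.parse_args(['--verbose', '--range', '1', '5',
                          '--files', 'a', 'b'])
```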

Next we add sub-parsers:

  subparsers = parser.add_subparsers(help='m2py operation modes', dest='mode')

  bbv_p = subparsers.add_parser('bbv',
      description='Tool for collecting basic-block vectors')
  bbv_p.add_argument('source', help='Benchmark directory to work with')
  bbv_p.add_argument('-target', nargs='?', help='Save bbv to target path', default='')

What these subparsers do is add ‘modes’ to our operation. This can be very powerful, since the number of options in a good CLI tool can quickly get out of hand. I had previously been addressing this by setting default values and ensuring those values were never accessed, but using subparsers to create a tree of options is much cleaner.

> Note that if `dest='mode'` is not given as an argument, we won’t be able to check which sub-parser choice was made. This would be very problematic for launching the appropriate main function later on.

The subparsers themselves are very straightforward. They are assigned arguments just as the top-level parser is, because each is just another node in the parsing ‘tree’. It’s likely a good idea to limit your use of subparsers so they don’t become too nested, as that could confuse users of your program.

Our parser is invoked as below:

  result = parser.parse_args()

The returned object is of type `Namespace`. You can access values in this namespace with `result.x`, where x is the name of a parsed argument. The namespace contains all the parameters as key:value pairs, flattened into one level. You don’t need to worry about subparsers; the returned data simply omits the options for the paths not taken.
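For instance, with the subparser setup shown earlier, the namespace is flat and `dest='mode'` records which sub-parser ran (the input values here are made up):

```python
import argparse

parser = argparse.ArgumentParser(prog='m2py')
subparsers = parser.add_subparsers(help='m2py operation modes', dest='mode')
bbv_p = subparsers.add_parser('bbv')
bbv_p.add_argument('source')

result = parser.parse_args(['bbv', 'benchmarks/gcc'])
# Both the sub-parser's choice and its arguments sit at the top level:
# result.mode and result.source.
```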

That’s it for argparse! Let’s move on to sanity checking.

2: Sanity

Sanity checking is ensuring that the options you are passed are logically sound, and can be processed. Here’s a bit of sanity checking code from my work.
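A minimal sketch of that kind of check, with hypothetical function and variable names, might look like:

```python
import sys

def check_sanity(benchmarks):
    """Abort immediately if there is no work to do."""
    if not benchmarks:
        # sys.exit with a string prints it to stderr and exits non-zero.
        sys.exit('error: no benchmarks found to process')

check_sanity(['benchmarks/gcc'])  # passes silently
```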

This will be different for each project, but since this one dealt with benchmarks, it checked that we had at least one benchmark to process.

Sanity checking can be that easy, or more complicated. However, it is a very important part of writing a good CLI. If the initial input is bad, it’s important to immediately let the user know and abort the process.

Also, from a design perspective, no module code should have to worry about being passed bad input. That just distributes the burden of sanity checking onto code that shouldn’t be responsible for it. Not only is it inefficient and error-prone to check the state of the environment in each individual module, it also entirely violates DRY (don’t repeat yourself).

I was breaking this rule a lot when I first started this project, as I realized when certain modes of operation would break but not others. Moving the sanity checking into one module solved that problem and made everything easier.

3/4: Work Units

My main function delegates work to several ‘sub_main’ functions, which handle the work for each parser mode.
Originally, each of these sub_main functions individually decided what to do its work on, with a mess of conditionals checking for target directories.

As the project grew, this method of having sub_main functions determine what work to do became unsustainable.

Looking back, the solution was ultimately very straightforward. In the main function, I moved to collecting the data to be processed into one iterable, then having my sub_main functions operate on a single unit of work. Options that allowed multiple targets were no longer handled individually – they were moved to one place, increasing portability.

Here’s the gist of my work-unit handling:
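Under the pattern described above, the gist would look roughly like this (the mode name and helper functions are hypothetical):

```python
def bbv_main(unit):
    """Hypothetical sub_main: process ONE work unit for the 'bbv' mode."""
    return ('bbv', unit)

def collect_units(sources):
    """Flatten whatever the user passed into one iterable of work units."""
    return [s for s in sources if s]  # e.g. drop empty entries

def main(mode, sources):
    # Map each parser mode to its sub_main; each handles a single unit,
    # so the loop below is the only place that iterates over the input.
    dispatch = {'bbv': bbv_main}
    sub_main = dispatch[mode]
    return [sub_main(unit) for unit in collect_units(sources)]
```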

Conclusion


It’s very easy to build a CLI tool in python the wrong way.

From the start, keep your project organized into a package, and if you need to run it, use a __main__.py file. If your project gets big you’ll save yourself lots of time. If it doesn’t, you’ve done no harm to yourself.

Use argparse, and don’t be afraid to use subparsers for different modes rather than having one mode with lots of options. Consider having a ‘testing’ mode that would allow you to have one place to test new code, rather than bolting tests clumsily in after your `if __name__ == '__main__'` check. That extra bit of organization will save you from having orphaned bits of code throughout your project.

Check the sanity of your environment in one place. Don’t scatter about `assert()` statements or other checks; have them in one place. It seems trivial to think about, but it’s very easy to be developing and realize you need to check some parameter, and do it locally rather than in your sanity check. Be mindful of this, as it’s a common mistake during revisions.

Have your sub_main functions handle one unit of work, whatever that unit may be to you. Create an iterable from your work-units, and iterate over it to do all of your work. You could go absolutely crazy with this; the sky is the limit! However you move forward, sticking to a solid execution pattern will save you time and stress.

So my friends, keep your sub_mains short, eliminate repetitive code, and build something beautiful.
