I’ve been doing more work with Python, sticking to the conventions I described in my last post, but things didn’t feel quite right. There was just too much going on in my __main__.py; it felt wrong having it define both how to process input and how to do the work, all in one place.
I did some soul searching (and more Stack Overflow searching), and came to a rough idea of how to make what I was doing prettier. It took a while, but I think the results were worth it, and pretty awesome to boot.
I created a framework to allow for a __main_modules directory to be populated with different execution modes for my __main__.py, which in turn only handled the command line arguments, linking of main modules, and executing them on the different ‘work-units’, as I outlined in the last Python post.
In short, we’re separating ‘what are we working on?’ from ‘what work are we doing?’.
This may seem like needless refactoring, but I’ve never been a big fan of the “if it ain’t broke, don’t fix it” mentality, and it seems interesting enough to share. It should prove more modular later on. So let’s check it out!
Structure of a Python package (rev. 2)
Here’s roughly what our ‘hand-rolled’ package will look like. Notice that it’s very similar to what’s been done before, but with that neat new __main_modules directory in it.
package
|   __init__.py
|   __main__.py
|   __main_modules
|   |   __init__.py
|   |   submainA.py
|   |   submainB.py
|   |   ...
|   moduleA.py
|   moduleB.py
|   moduleC.py
|   ...
So what’s so interesting about this?
The new directory has an __init__.py, so it can technically still be imported from by regular means. This is good, since Python is big on abstraction that is non-obstructive (you can still mess with internals if you really want to). However, the __ prefix implies that it is internal, and shouldn’t be imported from unless someone really, really knows what they are doing. The casual user will see a name like this and shy away, knowing it’s meant for internal implementation.
This is the new main function, working with submain-modules.
It’s taken from a project, but with some sensitive information stripped out. So don’t try to run it as-is; it won’t work!
So we can see we’re using argparse to create a parser and add some arguments to it. The arguments added in main are global to all submain-modules, and should be arguments that define the program as a whole and what it is working on.
We use the no_run default to control whether the submain-module should run per work-item, but set it with set_defaults so that it isn’t a command-line option. Instead, it can be set when adding the parser arguments for a submain, for modes that make little sense to run over multiple work-items.
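As a minimal sketch of that trick (the mode name report here is a hypothetical example, not taken from the project), set_defaults attaches a value to the parsed namespace without ever exposing it as a flag:

```python
import argparse

# Sketch of the no_run trick: the 'report' mode is hypothetical.
# set_defaults stores a value on the parsed namespace without making
# it a command-line option the user could pass.
parser = argparse.ArgumentParser(prog='package')
subparsers = parser.add_subparsers(dest='mode')

sub = subparsers.add_parser('report')
sub.set_defaults(no_run=True)  # this mode runs once, not per work-item

args = parser.parse_args(['report'])
print(args.no_run)  # True, even though --no_run was never a valid flag
```

Trying `python package report --no_run` would fail with an unrecognized-argument error, which is exactly the point: the flag belongs to the implementation, not the user.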
The next big change is using the imp module to tap into Python’s internal import mechanics. We first use get_suffixes to get the valid module extensions, then find each module in ‘__main_modules’ with find_module and a list comprehension. Then we load each module with load_module, but only to make a single call into it.
Our only other change comes later on, where an if, elif, … , else chain was replaced with a single call to arguments.main().
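Here is a hedged sketch of the whole flow, not the project’s actual code. The post uses the old imp module (get_suffixes / find_module / load_module), which was removed in Python 3.12, so this sketch swaps in importlib, its modern replacement; the submain name submainA and the --verbose option are illustrative. To keep it self-contained it fabricates a tiny __main_modules directory in a temp folder, where the real package would look next to __main__.py:

```python
import argparse
import importlib.util
import os
import tempfile
import textwrap

# Fabricate a minimal __main_modules directory so this sketch runs on
# its own; in the real package mod_dir would be
# os.path.join(os.path.dirname(__file__), '__main_modules').
root = tempfile.mkdtemp()
mod_dir = os.path.join(root, '__main_modules')
os.mkdir(mod_dir)
with open(os.path.join(mod_dir, 'submainA.py'), 'w') as f:
    f.write(textwrap.dedent('''\
        def module_main(arguments):
            return 'submainA ran'

        def add_parser_args(subparsers):
            sub = subparsers.add_parser('submainA')
            sub.set_defaults(main=module_main)
        '''))

# Global arguments live on the top-level parser; no_run gets a default
# without becoming a command-line option.
parser = argparse.ArgumentParser(prog='package')
parser.add_argument('--verbose', action='store_true')
parser.set_defaults(no_run=False)
subparsers = parser.add_subparsers(dest='mode')

# Discover each module in __main_modules and make a single call into it.
for fn in sorted(os.listdir(mod_dir)):
    name, ext = os.path.splitext(fn)
    if ext != '.py' or name.startswith('__'):
        continue
    spec = importlib.util.spec_from_file_location(
        name, os.path.join(mod_dir, fn))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    module.add_parser_args(subparsers)  # the single call into each module

# The old if/elif/else chain becomes one line: the chosen mode has
# already pointed arguments.main at its own main function.
arguments = parser.parse_args(['submainA'])
print(arguments.main(arguments))
```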
Why the change?
As it was, it was too bulky having both the subparser code and the submain code in the main module itself. This pattern dynamically processes each module in __main_modules, adding their subparsers to the main parser. Whichever submain-module is selected sets arguments.main to be a reference to its main function.
Besides saving space, this also ensures we never have to tinker with the main module to add a new mode. Since main is now agnostic to what modes are passed to it, and instead depends only on arguments.main being set, it’s free of manually delegating which submain to run. All we need to do now is define submain-modules that expose an add_parser_args method, and have that method set arguments.main to a main function which takes the arguments provided.
The Sub-Main modules
The newly added submain modules are now where all of the script’s actual work gets done. The relationship between main and the modules is an overseer-worker one, where main organizes work items for the submodules and delegates them.
They can be as complicated as need be, but should adhere to the following template.
The module_main function should be called for each unit passed to it by main, and should take standard parameters across each module. In the gist it only takes the returned parser arguments as a parameter, but it can take as many parameters as needed to properly define a work-unit.
The add_parser_args method is how the main function reaches into the module and links it to the larger parser. All that is required is to start by adding a parser to the subparsers object, and to set main = module_main with a set_defaults call. Any other information added to the sub-parser is optional, but can be referenced in your module_main.
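A minimal sketch of the template, following the contract described above — the mode name frobnicate and its --limit option are hypothetical, not from the project’s gist:

```python
import argparse

def module_main(arguments):
    """Called once per work-unit handed out by __main__."""
    return 'processed up to %d' % arguments.limit

def add_parser_args(subparsers):
    # Register this module's mode and point `main` at module_main;
    # set_defaults on the sub-parser is the required hook.
    sub = subparsers.add_parser('frobnicate', help='example mode')
    sub.add_argument('--limit', type=int, default=10)
    sub.set_defaults(main=module_main)

# Demo wiring, as __main__ would do it after discovering the module:
parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers(dest='mode')
add_parser_args(subparsers)
args = parser.parse_args(['frobnicate', '--limit', '3'])
print(args.main(args))  # processed up to 3
```

Anything else the sub-parser defines (like --limit here) simply shows up on the arguments namespace that module_main receives.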
So why is this helpful?
- This framework requires no modification to __main__.py; there’s no need to fiddle with the internals each time you add a new operation mode.
- It’s easily extensible: to add a new mode, you just copy your submodule template, define your subparser arguments, and write your new main.
- It’s invariant: __main__.py never changes how it prepares your work items. You always know what it’s passing each module. There are no special cases.
Overall, I’m sure I’ll change my standards for writing big Python projects again when something starts to bother me, but this seems flexible enough for now. I concede that it seems to be a needlessly complicated abstraction, but as a codebase grows, having one __main__.py handle each operation mode would just be unsustainable. There’s no way to properly maintain code where each and every main method shares one file (and I shudder to think of the merge conflicts this would cause on a large project). Better to structure properly now than have spaghetti code down the line!