GDP 1 — Standalone Compiler CLI

Author

Rico Häuselmann <ricoh@cscs.ch>

Status

Accepted

Type

Feature

Created

17.04.2020

Discussion PR

https://github.com/GridTools/gt4py/pull/21

Implementation

https://github.com/GridTools/gt4py/pull/23

Abstract

GT4Py currently provides a Python API / embedded DSL for defining, compiling and running GridTools stencils from Python programs.

This GDP describes an additional CLI which can be integrated into the build process of non-Python programs to compile, link and run stencils written in GT4Py DSL (GTScript) from any host language that can link to C++ libraries.

A reference implementation exists. Upon acceptance of this GDP, it will be used as the basis for adding the proposed functionality in a series of separate pull requests.

Motivation and Scope

This GDP proposes adding a CLI command named gtpyc (the naming rationale and alternatives are documented in the Naming section below). The command will take a GTScript file and the name of one of the available backends as input and output backend-specific source code. This includes any language bindings supported by the backend. Command line options will allow full control over which bindings are created, if any, depending on what the backend supports.

A GTScript file denotes a file with a to-be-defined extension (suggestion: .gt.py), beginning with the line

# [GT] using-dsl: gtscript

All other content must be valid Python under the assumption that the first line has been replaced by from gt4py.cartesian.gtscript import *.

In support of this, two more features are proposed:

  • A mechanism to allow importing GTScript files as if they were Python modules. The usage will be from gt4py.cartesian import gtpy_import; gtpy_import.install().

  • A lazy variant or replacement of the stencil decorator, returning an object that supports manual stepwise compilation.

Limited Scope

In order to support the use cases listed below, the input Python files must be written in GTScript DSL without explicitly importing GT4Py. Instead of the decorators provided by the gtscript module, comments may be used, or functions may be inferred to be a stencil or submodule depending on whether they return something or not. In order to stay maintainable, gtpyc will not add any new logic beyond reading input Python code and command line options.

However, the final version will rely on more fine grained control over when and where backends create and store intermediate source files, which should become part of the backend API to be used for run-time compilation in order to avoid redundancy and guarantee maintainability.

Other language projects

In order to benefit from the higher abstraction level of the GT4Py eDSL it should not be required to run Python code at runtime. Especially for existing programs written in other languages it makes more sense to link to libraries created by GT4Py as part of the build process of the host program.

Avoiding runtime dependency

Even for Python projects it may be desirable to distribute only the extension modules created by GT4Py, not the code that generated them, thus requiring the final user of the generated code to only install GridTools, not GT4Py.

Licensing

Using only the GT4Py generated stencils in a project, without depending on GT4Py at runtime, allows that project to use a license other than GPL3 without the express permission of CSCS.

Flexibility

The import mechanism will provide the flexibility to define GTScript objects in GTScript files and use them in Python code without extra steps (as if they were defined directly in Python), while also allowing them to be compiled into other language sources / bindings from the same code base just by running the CLI tool on them. This allows prototyping in Python without making a final choice as to project language and license.

Usage and Impact

Basic CLI usage

The usage is explained best using a small example.

Note that .gt.py files could be replaced by equivalent .py files (importing GTScript symbols either from gt4py or from a .gt.py file) in all following examples. Python modules or packages are also valid input files to gtpyc, provided they are valid Python under the assumption that the import extensions are installed.

Assume the following file structure:

$ tree .
pwd
├── constants.py
└── stencils.gt.py

stencils.gt.py contains the GTScript code to be compiled to stencils. The contents might look something like the following example.

stencils.gt.py
# [GT] using-dsl: gtscript

from .constants import PI


@function
def square(inp_field):
   return inp_field * inp_field


@stencil
def stencil_a(inp_field: Field[float64], out_field: Field[float64]):
   with computation(PARALLEL), interval(...):
      out_field = square(inp_field)


@stencil
def stencil_b(inp_field: Field[float64], out_field: Field[float64]):
   from __externals__ import COMPILE_TIME_VALUE
   with computation(PARALLEL), interval(...):
      out_field = PI * inp_field + COMPILE_TIME_VALUE

Notice that this file uses names from gt4py.gtscript without importing gt4py. The names will be injected by gtpyc upon recognizing the # [GT] using-dsl: gtscript comment. Also note that stencil_b uses an external value which is not available in the file itself, so it will have to be supplied on the command line. The file constants.py contains some constant values (which might be templated by the build system).

In order to get C++ code, we can now run gtpyc with, for example, the GridTools multi core backend (-b gtmc) and tell it to generate the stencils in the new subdirectory stencils (-o stencils).

$ gtpyc -b gtmc stencils.gt.py -o stencils -e COMPILE_TIME_VALUE
$ tree stencils/
stencils
├── stencil_a.cpp
├── stencil_a.hpp
├── stencil_b.cpp
└── stencil_b.hpp

The current backends of gt4py (with the exception of the Python-only ones) all have the ability to generate Python bindings. Future backends might allow bindings for other languages. This is accessible through an additional CLI option, which should be validated based on the chosen backend.

$ gtpyc -b gtx86 stencils.gt.py -o stencils --bindings=python -e COMPILE_TIME_VALUE
$ tree stencils/
stencils
├── stencil_a_bindings.cpp
├── stencil_a.cpp
├── stencil_a.hpp
├── stencil_a.py
├── stencil_b_bindings.cpp
├── stencil_b.cpp
├── stencil_b.hpp
└── stencil_b.py
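
The validation of the bindings option against the chosen backend could look roughly like the following sketch. The SUPPORTED_BINDINGS table and the validate_bindings function are hypothetical names used for illustration only and are not part of gt4py; a real implementation would presumably ask the backend what it supports instead of consulting a hard-coded table.

```python
# Hypothetical capability table, for illustration only; a real
# implementation would presumably ask the backend what it supports.
SUPPORTED_BINDINGS = {
    "gtmc": {"python"},
    "gtx86": {"python"},
    "numpy": set(),  # stands in for a Python-only backend without bindings
}


def validate_bindings(backend, requested):
    """Return the requested bindings if the backend supports all of them."""
    if backend not in SUPPORTED_BINDINGS:
        raise ValueError(f"unknown backend: {backend!r}")
    unsupported = sorted(set(requested) - SUPPORTED_BINDINGS[backend])
    if unsupported:
        raise ValueError(
            f"backend {backend!r} cannot generate bindings for: {', '.join(unsupported)}"
        )
    return list(requested)


# A supported combination is accepted unchanged.
accepted = validate_bindings("gtx86", ["python"])

# An unsupported combination is rejected before any code is generated.
try:
    validate_bindings("numpy", ["python"])
    rejected = False
except ValueError:
    rejected = True
```

Failing early in this way would let the CLI report an unsupported combination before any backend code is generated.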

Finally, the backend may accept options specific to it. These can be passed using the --option or -O flag. For example, the GridTools multi core backend takes a debug flag, which does nothing during source file generation but activates debug flags when gt4py is asked to compile a readily importable Python extension.

$ gtpyc -b gtmc stencils.gt.py -o stencils -e COMPILE_TIME_VALUE -O debug True --bindings=python --compile-bindings
$ tree stencils/
stencils
├── stencil_a_bindings.cpp
├── stencil_a.cpp
├── stencil_a.hpp
├── _stencil_a.so  # compiled with debug flags
├── stencil_a.py
├── stencil_b_bindings.cpp
├── stencil_b.cpp
├── stencil_b.hpp
├── _stencil_b.so  # compiled with debug flags
└── stencil_b.py

Additional command line options will mostly correspond to the keyword arguments of the gtscript.stencil decorator.

This should be easy to incorporate into existing build systems, either as an additional step from .py source files to .cpp or .cu sources before building and linking, or as an alternative step to build .py sources into ready-to-link libraries.

Advanced CLI usage

For complex or mixed-language use cases it might be desirable to use a whole library of GTScript / Python files. The import mechanism makes this possible.

$ tree .
pwd
├── stencils.gt.py
└── lib
    ├── __init__.py
    ├── foo.gt.py
    └── bar
        ├── __init__.py
        └── baz.gt.py

Note that packages require an __init__.py which remains a valid Python module (no gt4py.gtscript injection). However any Python module inside the package can import from any GTScript file (including gt4py.gtscript members).

$ gtpyc -b <backend> stencils.gt.py -o stencils

Compiles all top-level stencil members of stencils.gt.py, whether they are defined directly in stencils or imported from lib.

$ gtpyc -b <backend> lib -o lib_stencils

Compiles all top-level stencil members of lib/__init__.py.

Usage from Python

After adding the following to the top of a Python module, any GTScript files in the PYTHONPATH can be imported as Python modules:

from gt4py.cartesian import gtpy_import; gtpy_import.install()

Backward compatibility

This GDP aims to be fully backward-compatible.

Detailed description

Any description of design ideas and implementation refers to the reference implementation. This section will be updated as the reference implementation progresses.

Naming

The accepted name, used throughout this document, is gtpyc, which derives from gt4py but is easier to type. The c at the end stands for “compiler”. The author does not have a strong preference for this name; it is simply the first one that came to mind.

The accepted conventional file extension for GTScript files is .gt.py. The extension .gtpy is also allowed for cases where double extensions may not be practical.

Alternatives under consideration:

  • gtscript / gtscriptc (or short version gts / gtsc) -> most intuitive file extension: .gts

  • same as above but prefixed with py -> most intuitive file extension: .pygt or .pyg

Rejected Alternatives:

  • gt4pyc: the sequence “gt4” is all typed with the left index finger on a standard keyboard. The author strongly feels that CLI command names should start with an easy-to-type sequence (afterwards tab-completion can be used).

It is recommended to allow one file extension for GTScript files which can be derived from the CLI command name by shortening it in an intuitive way. Since the accepted double extension might cause trouble for some tools or in some environments an additional fallback is acceptable. It is possible to allow many more extensions, however the potential confusion outweighs the benefits of being more permissive.

Enabling all of GTScript without importing from gt4py.cartesian

The currently chosen route for this is to require a comment at the very start of the file:

# [GT] using-dsl: gtscript

This serves two purposes. First, it marks the file as being written in GTScript: any name that in Python can be accessed by from gt4py.cartesian.gtscript import * will work when compiling with gtpyc but will be deemed undefined by the Python interpreter. It is not planned to provide any means of informing Python syntax checkers to consider these names as defined. Second, gtpyc can replace this line with an actual import line without changing line numbers for error messages.
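
As a minimal sketch of the second purpose (not the reference implementation), the marker line can be swapped for the import line so that every following line keeps its line number. The function name replace_marker is hypothetical.

```python
MARKER = "# [GT] using-dsl: gtscript"
IMPORT_LINE = "from gt4py.cartesian.gtscript import *"


def replace_marker(source: str) -> str:
    """Swap the marker comment on the first line for the import line.

    Only the first line changes, so line numbers reported in error
    messages still match the original GTScript file.
    """
    first_line, newline, rest = source.partition("\n")
    if first_line.strip() != MARKER:
        raise ValueError("not a GTScript file: marker comment missing on line 1")
    return IMPORT_LINE + newline + rest


gtscript_source = "# [GT] using-dsl: gtscript\n\n@stencil\ndef f(a):\n    pass\n"
python_source = replace_marker(gtscript_source)
```

Because exactly one line is replaced by exactly one line, a traceback pointing at line 3 of the transformed source still refers to line 3 of the GTScript file.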

Some symbols, like the @stencil decorator, will have to be either changed or complemented by an alternative. Loading the input GTScript file must not already trigger a compilation, and although default arguments for the backend may be given in the decorator, it must be possible to override them on the CLI.

Lazy stencil decorator

The gt4py.gtscript.stencil() decorator will be extended to return an intermediate object, a drop-in replacement for the compiled StencilObject which triggers the compilation process only when used in a way that requires the stencil to be compiled first. In addition, it will hold all contextual information given to the decorator, which will allow gtpyc to trigger its slightly modified build process.
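
The idea of deferring compilation can be illustrated with the following sketch. It does not use gt4py; the class LazyStencilSketch, its methods and the fake_build function are hypothetical stand-ins, and the backend names are used only as placeholder strings.

```python
class LazyStencilSketch:
    """Holds a definition function and build options; builds on first use."""

    def __init__(self, definition, build, **options):
        self.definition = definition
        self.options = options
        # Hypothetical build callable: (definition, options) -> compiled callable.
        self._build = build
        self._compiled = None

    def with_options(self, **overrides):
        """Return a new lazy object with some options overridden, e.g. from a CLI."""
        merged = {**self.options, **overrides}
        return LazyStencilSketch(self.definition, self._build, **merged)

    def compile(self):
        """Build once and cache the result."""
        if self._compiled is None:
            self._compiled = self._build(self.definition, self.options)
        return self._compiled

    def __call__(self, *args, **kwargs):
        # Calling the object is a use that requires the compiled stencil.
        return self.compile()(*args, **kwargs)


build_log = []


def fake_build(definition, options):
    """Stand-in for a real build step; records which backend was requested."""
    build_log.append(options["backend"])
    return definition


def double(x):
    return 2 * x


lazy = LazyStencilSketch(double, fake_build, backend="gtx86")
```

Creating the object does not build anything. The first call triggers the build, later calls reuse the cached result, and with_options gives a tool like gtpyc a way to override the options stored at definition time.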

GTScript import system

GTScript files can import Python modules and vice versa after installing the GTScript import system (which can be done in a single line). gtpyc installs the import system and (by default) adds the parent directory of the input file to sys.path, the search path for Python imports. This means Python and GTScript modules and packages in the same folder as the input file are found by default; other than that, imports behave as normal.
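
One way such an import system could be built on Python's standard importlib machinery is sketched below. This is an illustration under stated assumptions, not the reference implementation: the classes GtsLoader and GtsFinder are hypothetical, only the .gt.py extension is handled, and the injected line is a stand-in (from math import *) so that the sketch runs without gt4py installed.

```python
import importlib.abc
import importlib.util
import pathlib
import sys
import tempfile

MARKER = "# [GT] using-dsl: gtscript"
# Stand-in for "from gt4py.cartesian.gtscript import *" so the sketch
# runs without gt4py installed.
INJECTED = "from math import *"


class GtsLoader(importlib.abc.Loader):
    """Executes a .gt.py file after replacing the marker line."""

    def __init__(self, path):
        self.path = path

    def create_module(self, spec):
        return None  # fall back to the default module creation

    def exec_module(self, module):
        source = pathlib.Path(self.path).read_text()
        first_line, newline, rest = source.partition("\n")
        if first_line.strip() != MARKER:
            raise ImportError(f"{self.path}: missing GTScript marker comment")
        code = compile(INJECTED + newline + rest, self.path, "exec")
        exec(code, module.__dict__)


class GtsFinder(importlib.abc.MetaPathFinder):
    """Looks for <name>.gt.py on the package path or sys.path."""

    def find_spec(self, fullname, path, target=None):
        name = fullname.rpartition(".")[2]
        for entry in (path or sys.path):
            candidate = pathlib.Path(entry or ".") / f"{name}.gt.py"
            if candidate.is_file():
                loader = GtsLoader(str(candidate))
                return importlib.util.spec_from_loader(fullname, loader)
        return None


def install():
    """Register the finder once; regular Python imports keep priority."""
    if not any(isinstance(finder, GtsFinder) for finder in sys.meta_path):
        sys.meta_path.append(GtsFinder())


# Usage: write a tiny GTScript-style file and import it like a module.
demo_dir = tempfile.mkdtemp()
pathlib.Path(demo_dir, "demo_gts.gt.py").write_text(MARKER + "\n\nROOT = sqrt(16.0)\n")
sys.path.insert(0, demo_dir)
install()
import demo_gts
```

Appending the finder to the end of sys.meta_path means that ordinary .py modules and packages are still resolved first, which matches the statement that imports otherwise behave as normal.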

The public API consists of the gt4py.cartesian.gtpy_import.install function.

Passing externals

There are two supported ways to configure values at compile / generate time.

  • By relative import of a Python file, which may be automatically generated from a template. The latter could happen as part of a build system depending on build parameters. In this case the stencil definition can use the values without importing them from __externals__. If it does import them from __externals__, however, the external value can be overridden on the command line using the second option below.

  • By passing externals options on the command line. In this case the external will be passed to every stencil in this run of gtpyc and each stencil needs to import it from __externals__ to use it.
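
The intended precedence can be sketched as follows, assuming that externals given in a stencil's own definition act as defaults and that command line externals are applied to every stencil and win on conflicts. The function externals_for_run and the example values are hypothetical.

```python
def externals_for_run(stencil_defaults, cli_externals):
    """Return the externals each stencil would be built with.

    stencil_defaults maps a stencil name to the externals given in that
    stencil's own definition. cli_externals is applied to every stencil
    in the run and takes precedence on conflicts.
    """
    return {
        name: {**defaults, **cli_externals}
        for name, defaults in stencil_defaults.items()
    }


resolved = externals_for_run(
    {"stencil_a": {}, "stencil_b": {"COMPILE_TIME_VALUE": 1.0}},
    {"COMPILE_TIME_VALUE": 2.5},
)
```

In this sketch both stencils receive COMPILE_TIME_VALUE, but only a stencil that imports it from __externals__ would actually use it.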

Generating Language bindings

The intention of this GDP is to support generating language bindings for all languages the chosen backend supports. These language bindings are intended to be usable without gt4py as a requirement. This is important to allow usage of generated bindings in non-GPL3 projects.

Implications for Tools (IDEs, Linters, etc)

It has been remarked that it would be beneficial to use Python tools like linters, checkers, syntax highlighting etc. for GTScript files. This should work by default using the recommended .gt.py file extension. However it is natural that Python tools will flag some code which is perfectly valid GTScript code as faulty Python code. Most tools should expose configuration options to ignore or correctly consider such cases. These configuration options are very different from tool to tool and are documented for each tool separately. This GDP does not propose packaging any such configuration or even extensions for tools with gt4py.

Note that the following is a simple way to get most of the desired behaviour from any tools which have trouble with the .gt.py double extension (the author is not aware of any):

$ tree .
pwd
├── mystencils.py
└── mygts.gt.py

mygts.gt.py
# [GT] using-dsl: gtscript

mystencils.py
from mygts import lazy_stencil, Field, computation, interval, PARALLEL

@lazy_stencil
def mystencil(a: Field[float]):
   with computation(PARALLEL), interval(...):
      a = 1.

Now IDEs will recognize mystencils.py as a Python file and will highlight and check the syntax. Of course, tools will be unable to import mygts unless there is a way to configure them to run gt4py.cartesian.gtpy_import.install() before trying to import.

Implementation

Implementation will start with a proof-of-concept (PoC) CLI with an absolutely minimal feature set, taking a single function in an input .py file and outputting the result of the stencil compilation in a separate file.

If it becomes apparent at that stage that changes to the internal structure are necessary, these will likely be treated in separate GDPs.

The PoC will utilize the click framework for the CLI, since it encourages separation and reuse of CLI argument / option handling and documentation code from program logic. None of the known limitations of click are foreseen to be detrimental to what this GDP wants to achieve.

Reasons for choosing click

  • separation of concerns

  • ease of reuse of CLI components

  • built-in command completion for bash, zsh etc.

  • built-in testing API
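
To illustrate the separation of option handling from program logic and the built-in testing API, here is a minimal click sketch. The command gtpyc_sketch and its option names are assumptions chosen to mirror the usage examples in this document, not the actual gtpyc interface, and the command body only echoes what it parsed.

```python
import click
from click.testing import CliRunner


@click.command()
@click.option("-b", "--backend", required=True, help="Name of the backend to use.")
@click.option("-o", "--output-path", default=".", help="Directory for generated sources.")
@click.option(
    "-O",
    "--option",
    "options",
    nargs=2,
    multiple=True,
    help="Backend-specific option as a NAME VALUE pair.",
)
@click.argument("input_path")
def gtpyc_sketch(backend, output_path, options, input_path):
    """Parse the arguments and hand them to the (omitted) program logic."""
    click.echo(f"backend={backend} input={input_path} output={output_path}")
    for name, value in options:
        click.echo(f"option {name}={value}")


# click's testing API invokes the command in-process and captures its output.
runner = CliRunner()
result = runner.invoke(
    gtpyc_sketch,
    ["-b", "gtmc", "stencils.gt.py", "-o", "stencils", "-O", "debug", "True"],
)
```

The decorators carry all parsing and help text, so the function body receives plain Python values and can be tested without spawning a subprocess.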

Alternatives

Using argparse for the CLI

Using argparse has been rejected. Although it is not impossible to separate option handling code from program logic, any attempt to do so consistently would lead to partially reinventing one of the more advanced frameworks like click.

The author of this GDP does believe the additional requirement of a small pure-Python framework like click to be outweighed by the benefits.

Using plain .py extension in combination with the marker comment

The author believes that the two types of files serve distinctly separate purposes. While both types can be passed into gtpyc, plain .py files should represent valid Python modules whereas .gt.py files are treated as written in GTScript, a domain specific language that extends Python.

It may be a subtle difference in implementation but quite a difference in intent. The author of a .py file may use gt4py as a library, whereas the author of a GTScript file uses a different language which happens to have the same syntax.

Discussion

The discussion for this GDP takes place in its draft PR, linked under Discussion PR at the top of this document.

The discussion around the reference implementation is located in its separate pull request, linked under Implementation at the top of this document.

References and Footnotes

1

Each GDP must either be explicitly labeled as placed in the public domain (see this GDP as an example) or licensed under the Open Publication License.