GDP 1 — Standalone Compiler CLI
- Author
Rico Häuselmann <ricoh@cscs.ch>
- Status
Accepted
- Type
Feature
- Created
17.04.2020
- Discussion PR
- Implementation
Abstract
GT4Py currently provides a Python API / embedded DSL for defining, compiling and running GridTools stencils from Python programs.
This GDP describes an additional CLI which can be integrated into the build process of non-Python programs to compile, link and run stencils written in GT4Py DSL (GTScript) from any host language that can link to C++ libraries.
A reference implementation exists. Upon acceptance of this GDP, it will be drawn on to add the functionality described here in a series of separate pull requests.
Motivation and Scope
This GDP proposes adding a CLI command named gtpyc (naming rationale and alternatives are documented in the Naming section below). The command will take a GTScript file and the name of one of the available backends as input and output backend-specific source code. This includes any language bindings supported by the backend. Command-line options will allow full control over which bindings are created, if any, depending on what the backend supports.
A GTScript file denotes a file with a to-be-defined extension (suggestion: .gt.py), beginning with the line
# [GT] using-dsl: gtscript
All other content must be valid Python under the assumption that the first line has been replaced by from gt4py.cartesian.gtscript import *.
In support of this, two more features are proposed:
- A mechanism to allow importing GTScript files as if they were Python modules. The usage will be from gt4py.cartesian import gtpy_import; gtpy_import.install().
- A lazy variant or replacement of the stencil decorator, returning an object that supports manual stepwise compilation.
Limited Scope
In order to support the use cases listed below, the input Python files must be written in GTScript DSL without explicitly importing GT4Py. Instead of the decorators provided by the gtscript module, comments may be used, or functions may be inferred to be a stencil or submodule depending on whether they return something or not. In order to stay maintainable, gtpyc will not add any new logic beyond reading input Python code and command-line options.
However, the final version will rely on more fine-grained control over when and where backends create and store intermediate source files. This control should become part of the backend API and also be used for run-time compilation, in order to avoid redundancy and guarantee maintainability.
Other language projects
In order to benefit from the higher abstraction level of the GT4Py eDSL it should not be required to run Python code at runtime. Especially for existing programs written in other languages it makes more sense to link to libraries created by GT4Py as part of the build process of the host program.
Avoiding runtime dependency
Even for Python projects it may be desirable to distribute only the extension modules created by GT4Py, not the code that generated them, thus requiring the final user of the generated code to only install GridTools, not GT4Py.
Licensing
Using only the GT4Py-generated stencils in a project, without depending on GT4Py at runtime, makes it possible to use a license other than GPL3 in that project without the express permission of CSCS.
Flexibility
The import mechanism will allow the flexibility to define GTScript objects in GTScript files and use them in Python code without extra steps (as if they were defined directly in Python), yet also to compile them into other language sources / bindings from the same code base just by running the CLI tool on them. This allows prototyping in Python without making a final choice as to project language and license.
Usage and Impact
Basic CLI usage
The usage is explained best using a small example.
Note that .gt.py files could be replaced by equivalent .py files (importing GTScript symbols either from gt4py or from a .gt.py file) in all following examples. Python modules or packages are also valid input files to gtpyc, provided they are valid Python under the assumption that the import extensions are installed.
Assume the following file structure:
$ tree .
pwd
├── constants.py
└── stencils.gt.py
stencils.gt.py contains the GTScript code to be compiled to stencils. The contents might look something like the following example.
# [GT] using-dsl: gtscript
from .constants import PI


@function
def square(inp_field):
    return inp_field * inp_field


@stencil
def stencil_a(inp_field: Field[float64], out_field: Field[float64]):
    with computation(PARALLEL), interval(...):
        out_field = square(inp_field)


@stencil
def stencil_b(inp_field: Field[float64], out_field: Field[float64]):
    from __externals__ import COMPILE_TIME_VALUE

    with computation(PARALLEL), interval(...):
        out_field = PI * inp_field + COMPILE_TIME_VALUE
Notice that this file uses names from gt4py.cartesian.gtscript without importing gt4py. The names will be injected by gtpyc upon recognizing the # [GT] using-dsl: gtscript comment. Also note that stencil_b uses an external value which is not available in the file itself, so it will have to be supplied on the command line. The file constants.py contains some constant values (which might be templated by the build system).
In order to get C++ code we can now run gtpyc with, for example, the GridTools multi-core backend (-b gtmc) and tell it to generate the stencils in the new subdirectory stencils (-o stencils).
$ gtpyc -b gtmc stencils.gt.py -o stencils -e COMPILE_TIME_VALUE
$ tree ./stencils/
stencils
├── stencil_a.cpp
├── stencil_a.hpp
├── stencil_b.cpp
└── stencil_b.hpp
The current backends of gt4py (with the exception of the Python-only ones) all have the ability to generate Python bindings. Future backends might allow bindings for other languages. This is accessible through an additional CLI option, which should be validated based on the chosen backend.
$ gtpyc -b gtx86 stencils.gt.py -o stencils --bindings=python -e COMPILE_TIME_VALUE
$ tree ./stencils/
stencils
├── stencil_a_bindings.cpp
├── stencil_a.cpp
├── stencil_a.hpp
├── stencil_a.py
├── stencil_b_bindings.cpp
├── stencil_b.cpp
├── stencil_b.hpp
└── stencil_b.py
Finally, the backend may allow options specific to it. These can be passed using the --option or -O flag. For example, the GridTools multi-core backend takes a debug flag, which does nothing during source file generation but activates debug flags if we ask gt4py to compile a readily importable Python extension.
$ gtpyc -b gtmc stencils.gt.py -o stencils -e COMPILE_TIME_VALUE -O debug True --bindings=python --compile-bindings
$ tree ./stencils/
stencils
├── stencil_a_bindings.cpp
├── stencil_a.cpp
├── stencil_a.hpp
├── _stencil_a.so # compiled with debug flags
├── stencil_a.py
├── stencil_b_bindings.cpp
├── stencil_b.cpp
├── stencil_b.hpp
├── _stencil_b.so # compiled with debug flags
└── stencil_b.py
Additional command-line options will mostly correspond to the keyword arguments of the gtscript.stencil decorator.
This should be easy to incorporate into existing build systems, either as an additional step from .py source files to .cpp or .cu sources before building and linking, or as an alternative step to build .py sources into ready-to-link libraries.
Advanced CLI usage
For complex or mixed-language use cases it might be desirable to use a whole library of GTScript / Python files. The import mechanism makes this possible.
$ tree .
pwd
├── stencils.gt.py
└── lib
    ├── __init__.py
    ├── foo.gt.py
    └── bar
        ├── __init__.py
        └── baz.gt.py
Note that packages require an __init__.py which remains a valid Python module (no gt4py.cartesian.gtscript injection). However, any Python module inside the package can import from any GTScript file (including gt4py.cartesian.gtscript members).
$ gtpyc -b <backend> stencils.gt.py -o stencils
Compiles all top-level stencil members of stencils.gt.py, whether they are defined directly in stencils or imported from lib.
$ gtpyc -b <backend> lib -o lib_stencils
Compiles all top-level stencil members of lib/__init__.py.
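The selection of "top-level stencil members" can be pictured with a small sketch. It is not taken from the reference implementation: StencilStub is a hypothetical stand-in for whatever object the stencil decorator returns, and top_level_stencils is an illustrative helper name. The point it shows is that a stencil imported from lib into stencils is picked up just like one defined there directly.

```python
import types


class StencilStub:
    """Hypothetical stand-in for the object returned by the stencil decorator."""

    def __init__(self, name):
        self.name = name


def top_level_stencils(module):
    """Return all module-level attributes that are stencil objects."""
    return {
        name: obj for name, obj in vars(module).items() if isinstance(obj, StencilStub)
    }


# Simulate lib/foo.gt.py defining a stencil ...
lib = types.ModuleType("lib")
lib.foo_stencil = StencilStub("foo_stencil")

# ... and stencils.gt.py defining one itself and importing the other.
stencils = types.ModuleType("stencils")
stencils.stencil_a = StencilStub("stencil_a")
stencils.foo_stencil = lib.foo_stencil  # equivalent to "from lib import foo_stencil"
stencils.PI = 3.14159  # plain constants are ignored

found = top_level_stencils(stencils)
print(sorted(found))
```

Under this assumption, running the tool on a package would apply the same selection to the namespace of its __init__.py.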
Usage from Python
After adding the following to the top of a Python module, any GTScript files in the PYTHONPATH can be imported as Python modules:
from gt4py.cartesian import gtpy_import; gtpy_import.install()
Backward compatibility
This GDP aims to be fully backward-compatible.
Detailed description
Any description of design ideas and implementation refers to the reference implementation. This section will be updated as the reference implementation progresses.
Naming
The accepted name, used throughout this document, is gtpyc, which derives from gt4py but is easier to type. The c at the end stands for “compiler”. The author does not have a strong preference for this name; it is simply the first one that came to mind.
The accepted conventional file extension for GTScript files is .gt.py. The extension .gtpy is also allowed for cases where double extensions may not be practical.
Alternatives under consideration:
- gtscript / gtscriptc (or short version gts / gtsc) -> most intuitive file extension: .gts
- same as above but prefixed with py -> most intuitive file extension: .pygt or .pyg
Rejected Alternatives:
- gt4pyc: the sequence “gt4” is all typed with the left index finger on a standard keyboard. The author strongly feels that CLI command names should start with an easy-to-type sequence (afterwards tab-completion can be used).
It is recommended to allow one file extension for GTScript files which can be derived from the CLI command name by shortening it in an intuitive way. Since the accepted double extension might cause trouble for some tools or in some environments an additional fallback is acceptable. It is possible to allow many more extensions, however the potential confusion outweighs the benefits of being more permissive.
Enabling all of GTScript without importing from gt4py.cartesian
The currently chosen route for this is to require a comment at the very start of the file:
# [GT] using-dsl: gtscript
This serves two purposes. First, it marks the file as being written in GTScript. Any name that can be accessed in Python through from gt4py.cartesian.gtscript import * will work when compiling with gtpyc but will be deemed undefined by the Python interpreter. It is not planned to provide any means of informing Python syntax checkers to consider these names as defined. Second, gtpyc can replace this line with an actual import line without changing line numbers for error messages.
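A minimal sketch of the line replacement follows. The helper name swap_marker is hypothetical and the reference implementation may work differently; the sketch only shows why swapping exactly one line for one line keeps every later line number intact.

```python
MARKER = "# [GT] using-dsl: gtscript"
IMPORT_LINE = "from gt4py.cartesian.gtscript import *"


def swap_marker(source: str) -> str:
    """Replace the marker on the first line with the import line.

    One line is exchanged for one line, so the line numbers of all
    following lines (and of any error messages) stay the same.
    """
    first, newline, rest = source.partition("\n")
    if first.strip() != MARKER:
        raise ValueError(f"not a GTScript file: first line must be {MARKER!r}")
    return IMPORT_LINE + newline + rest


example = (
    "# [GT] using-dsl: gtscript\n"
    "@function\n"
    "def square(inp_field):\n"
    "    return inp_field * inp_field\n"
)
converted = swap_marker(example)
print(converted.splitlines()[0])
```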
Some symbols, like the @stencil decorator, will have to be either changed or offered in an alternative form: loading the input GTScript file should not already trigger a compilation, and although the decorator may supply default arguments to the backend, it must be possible to override them on the CLI.
Lazy stencil decorator
The gt4py.cartesian.gtscript.stencil() decorator will be extended to return an intermediate object, a drop-in replacement for the compiled StencilObject, which triggers the compilation process only when used in a way that requires the stencil to be compiled first. It will also hold all contextual information given to the decorator, which will allow gtpyc to trigger its slightly modified build process.
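The idea of deferring compilation can be sketched in plain Python. Everything here is illustrative: LazyStencil, lazy_stencil and the build callable are assumed names, and fake_build merely stands in for a real backend. The sketch shows the two properties described above: nothing is built when the decorator runs, and the stored options can be overridden before an explicit build.

```python
import functools


class LazyStencil:
    """Hypothetical sketch: keeps the definition and options, builds on first use."""

    def __init__(self, definition, build, **options):
        self.definition = definition
        self.options = options  # decorator arguments, may be overridden later
        self._build = build  # callable that performs the actual compilation
        self._compiled = None
        functools.update_wrapper(self, definition)

    def compile(self, **overrides):
        """Explicit build step, e.g. for a CLI overriding decorator defaults."""
        self.options.update(overrides)
        self._compiled = self._build(self.definition, **self.options)
        return self._compiled

    def __call__(self, *args, **kwargs):
        # Compile only when the stencil is actually needed.
        if self._compiled is None:
            self.compile()
        return self._compiled(*args, **kwargs)


def lazy_stencil(build, **options):
    def decorator(definition):
        return LazyStencil(definition, build, **options)

    return decorator


build_log = []


def fake_build(definition, **options):
    """Stand-in for a backend; a real one would generate and compile code."""
    build_log.append(options)
    return definition


@lazy_stencil(fake_build, backend="gtmc")
def add_one(x):
    return x + 1
```

Calling add_one(1) triggers the build once; later calls reuse the compiled object, while a tool could instead call add_one.compile(backend=...) itself.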
GTScript import system
GTScript files can import Python modules and vice versa, after installing the GTScript import system (which can be done in a single line). gtpyc installs the import system and (by default) adds the parent directory of the input file to sys.path, the search path for Python imports. This means Python and GTScript modules and packages in the same folder as the input file are found by default; other than that, imports behave as normal.
The public API consists of the gt4py.cartesian.gtpy_import.install function.
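One way such an import system could be built on the standard importlib machinery is sketched below. This is an assumption about the approach, not the reference implementation: GTScriptFinder, GTScriptLoader and the import_line parameter are hypothetical names, and the parameter exists only so the sketch can be tried without gt4py installed.

```python
import importlib.abc
import importlib.util
import pathlib
import sys

MARKER = "# [GT] using-dsl: gtscript"


class GTScriptLoader(importlib.abc.Loader):
    """Executes a .gt.py file after swapping the marker line for an import."""

    def __init__(self, path, import_line):
        self.path = path
        self.import_line = import_line

    def create_module(self, spec):
        return None  # fall back to default module creation

    def exec_module(self, module):
        source = pathlib.Path(self.path).read_text()
        first, newline, rest = source.partition("\n")
        if first.strip() == MARKER:
            source = self.import_line + newline + rest
        exec(compile(source, self.path, "exec"), module.__dict__)


class GTScriptFinder(importlib.abc.MetaPathFinder):
    """Finds <name>.gt.py on sys.path or on a parent package's path."""

    def __init__(self, import_line):
        self.import_line = import_line

    def find_spec(self, fullname, path=None, target=None):
        leaf = fullname.rpartition(".")[2]
        for directory in path if path is not None else sys.path:
            candidate = pathlib.Path(directory or ".") / f"{leaf}.gt.py"
            if candidate.is_file():
                loader = GTScriptLoader(str(candidate), self.import_line)
                return importlib.util.spec_from_file_location(
                    fullname, str(candidate), loader=loader
                )
        return None


def install(import_line="from gt4py.cartesian.gtscript import *"):
    """Register the finder once; regular Python imports keep priority."""
    if not any(isinstance(f, GTScriptFinder) for f in sys.meta_path):
        sys.meta_path.append(GTScriptFinder(import_line))
```

Because the finder is appended to sys.meta_path, ordinary modules are still resolved by the standard finders first, and only names that resolve to a .gt.py file reach the GTScript loader.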
Passing externals
There are two supported ways to configure values at compile / generate time.
- By relative import of a Python file, which may be automatically generated from a template, for example by a build system depending on build parameters. In this case the stencil definition can use the values without importing them from __externals__. If it does import them from __externals__, however, the external value can be overridden on the command line using the second option below.
- By passing externals options on the command line. In this case the external will be passed to every stencil in this run of gtpyc, and each stencil needs to import it from __externals__ to use it.
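To make the second option concrete, the sketch below uses the standard ast module to list which names each stencil imports from __externals__, which are the only names a command-line external can reach. The helper imported_externals is hypothetical and is not claimed to be how gtpyc resolves externals.

```python
import ast


def imported_externals(source):
    """Map each top-level function to the names it imports from __externals__."""
    result = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            result[node.name] = {
                alias.name
                for stmt in ast.walk(node)
                if isinstance(stmt, ast.ImportFrom) and stmt.module == "__externals__"
                for alias in stmt.names
            }
    return result


SOURCE = '''
@stencil
def stencil_a(inp_field: Field[float64], out_field: Field[float64]):
    with computation(PARALLEL), interval(...):
        out_field = inp_field * inp_field


@stencil
def stencil_b(inp_field: Field[float64], out_field: Field[float64]):
    from __externals__ import COMPILE_TIME_VALUE

    with computation(PARALLEL), interval(...):
        out_field = PI * inp_field + COMPILE_TIME_VALUE
'''

used = imported_externals(SOURCE)
print(used)
```

Here only stencil_b would see a COMPILE_TIME_VALUE passed on the command line; PI comes from a regular import and is not affected.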
Generating Language bindings
The intention of this GDP is to support generating language bindings for all languages the chosen backend supports. These language bindings are intended to be usable without gt4py as a requirement. This is important to allow usage of generated bindings in non-GPL3 projects.
Implications for Tools (IDEs, Linters, etc)
It has been remarked that it would be beneficial to use Python tools like linters, checkers, syntax highlighting etc. for GTScript files. This should work by default using the recommended .gt.py file extension. However it is natural that Python tools will flag some code which is perfectly valid GTScript code as faulty Python code. Most tools should expose configuration options to ignore or correctly consider such cases. These configuration options are very different from tool to tool and are documented for each tool separately. This GDP does not propose packaging any such configuration or even extensions for tools with gt4py.
Note that the following is a simple way to get most of the desired behaviour from any tools which have trouble with the .gt.py double extension (the author is not aware of any):
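$ tree .
pwd
├── mystencils.py
└── mygts.gt.py
mygts.gt.py contains only the marker comment:
# [GT] using-dsl: gtscript
mystencils.py imports the GTScript names it needs from it:
from mygts import lazy_stencil, Field, computation, interval, PARALLEL


@lazy_stencil
def mystencil(a: Field[float]):
    with computation(PARALLEL), interval(...):
        a = 1.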
Now IDEs will recognize mystencils.py as a Python file and will highlight and check the syntax. Of course tools will be unable to import mygts, unless there is a way to configure them to run gt4py.cartesian.gtpy_import.install() before trying to import.
Implementation
Implementation will start with a proof-of-concept (PoC) CLI with an absolutely minimal feature set, taking a single function in an input .py file and outputting the result of the stencil compilation in a separate file.
If it becomes apparent at that stage that changes to the internal structure are necessary, these will likely be treated in separate GDPs.
The PoC will utilize the click framework for the CLI, since it encourages separation and reuse of CLI argument / option handling and documentation code from program logic. None of the known limitations of click are foreseen to be detrimental to what this GDP wants to achieve.
Reasons for choosing click
- separation of concerns
- ease of reuse of CLI components
- built-in command completion for bash, zsh etc.
- built-in testing API
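As a rough illustration of the separation click encourages, the sketch below declares the options used in the examples of this GDP and keeps the command body free of parsing code. It is a sketch under stated assumptions, not the reference implementation: the long option names --backend, --output-path and --external, the SUPPORTED_BINDINGS table and the echoed output are all hypothetical, and no code generation takes place.

```python
import click

# Assumption: a lookup of which bindings each backend can generate.
SUPPORTED_BINDINGS = {"gtmc": {"python"}, "gtx86": {"python"}}


@click.command()
@click.argument("input_path", type=click.Path())
@click.option("-b", "--backend", required=True,
              type=click.Choice(sorted(SUPPORTED_BINDINGS)))
@click.option("-o", "--output-path", default=".", show_default=True)
@click.option("-e", "--external", "externals", multiple=True)
@click.option("-O", "--option", "options", type=(str, str), multiple=True)
@click.option("--bindings", multiple=True)
@click.option("--compile-bindings", is_flag=True)
def gtpyc(input_path, backend, output_path, externals, options, bindings,
          compile_bindings):
    """Parse and validate options only; code generation is omitted here."""
    # Validate the requested bindings against the chosen backend.
    unsupported = set(bindings) - SUPPORTED_BINDINGS[backend]
    if unsupported:
        raise click.BadParameter(
            f"backend {backend!r} cannot generate {sorted(unsupported)}",
            param_hint="--bindings",
        )
    click.echo(f"backend={backend} input={input_path} output={output_path}")
    click.echo(f"externals={list(externals)} options={dict(options)}")
    click.echo(f"bindings={list(bindings)} compile_bindings={compile_bindings}")


if __name__ == "__main__":
    gtpyc()
```

The built-in testing API mentioned above (click.testing.CliRunner) can invoke such a command in-process with a list of arguments, which makes option handling testable without spawning a subprocess.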
Alternatives
Using argparse for the CLI
Using argparse has been rejected. Although it is not impossible to separate option handling code from program logic, any attempt to do so consistently would lead to partially reinventing one of the more advanced frameworks like click.
The author of this GDP believes the additional requirement of a small pure-Python framework like click is outweighed by the benefits.
Using plain .py extension in combination with the marker comment
The author believes that the two types of files serve distinctly separate purposes. While both types can be passed into gtpyc, plain .py files should represent valid Python modules whereas .gt.py files are treated as written in GTScript, a domain-specific language that extends Python.
It may be a subtle difference in implementation but quite a difference in intent. The author of a .py file may use gt4py as a library, whereas the author of a GTScript file uses a different language which happens to have the same syntax.
Discussion
The discussion for this GDP will be in the draft PR for it, which is to be found here.
The discussion around the reference implementation is located in its separate pull request.
References and Footnotes
- 1
Each GDP must either be explicitly labeled as placed in the public domain (see this GDP as an example) or licensed under the Open Publication License.
Copyright
This document has been placed in the public domain. 1