

Date: Sun, 29 Jul 2001 14:21:58 -0600 (MDT)
From: Carleton DeTar <detar@physics.utah.edu>
To: brower@lns.mit.edu
Subject: Re: C/C++
Cc: detar@physics.utah.edu, rdm@physics.columbia.edu

Hi Rich,

To get C-language accessibility as well as assembly-language
pluggability for single-site or small-aggregate matrix-vector
operations (Level 1), I wouldn't use class methods.  Notice I didn't
do that in the example.  But a C++ programmer could construct classes
with methods that call these API routines to do the actual
computation.  Then even the class methods could be assembly-optimized
by plug-ins.
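
To make the pluggability point concrete, here is a minimal sketch in
plain C.  All of the names (l1_complex, l1_su3_matrix, l1_mat_vec,
l1_set_mat_vec) are hypothetical and are not taken from any existing
API.  A function pointer is only one possible plug-in mechanism;
substituting an object file at link time would serve equally well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical Level 1 types: 3x3 complex matrix, 3-component vector. */
typedef struct { double re, im; } l1_complex;
typedef struct { l1_complex e[3][3]; } l1_su3_matrix;
typedef struct { l1_complex c[3]; } l1_su3_vector;

/* Signature shared by the portable routine and any plug-in. */
typedef void (*l1_mat_vec_fn)(const l1_su3_matrix *a,
                              const l1_su3_vector *b,
                              l1_su3_vector *c);

/* Portable reference version: c = a * b.  c must not alias b. */
static void l1_mat_vec_portable(const l1_su3_matrix *a,
                                const l1_su3_vector *b,
                                l1_su3_vector *c)
{
    for (int i = 0; i < 3; i++) {
        double re = 0.0, im = 0.0;
        for (int j = 0; j < 3; j++) {
            re += a->e[i][j].re * b->c[j].re - a->e[i][j].im * b->c[j].im;
            im += a->e[i][j].re * b->c[j].im + a->e[i][j].im * b->c[j].re;
        }
        c->c[i].re = re;
        c->c[i].im = im;
    }
}

/* Currently installed implementation; defaults to the portable one. */
static l1_mat_vec_fn l1_mat_vec_impl = l1_mat_vec_portable;

/* Install an optimized (e.g. assembly) routine; NULL restores the default. */
void l1_set_mat_vec(l1_mat_vec_fn fn)
{
    l1_mat_vec_impl = fn ? fn : l1_mat_vec_portable;
}

/* The entry point that C code, or a C++ class method, would call. */
void l1_mat_vec(const l1_su3_matrix *a, const l1_su3_vector *b,
                l1_su3_vector *c)
{
    assert(a && b && c);
    l1_mat_vec_impl(a, b, c);
}
```

A C++ class method would simply forward to l1_mat_vec, so whatever
routine has been plugged in is picked up without touching the class.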

Don Holmgren's experience with the P4 underscores the need to maintain
assembly-language pluggability, even at Level 1.

As for our discussion about Level 2 (lattice-wide) operations, the
issues are a lot clearer if we can look at a concrete example.  I
think it would be helpful if Bob constructed or cited an example
(presumably C++) interface for a couple of the Level 2 operations:
say, a simple one for multiplying a vector by a constant and adding it
to another vector, and then one for a Dslash.  I will also think about
how that might be done.  We have to have the flexibility to mask
arbitrary sites (e.g. Schroedinger functional methods do not touch
boundary sites) but still function efficiently for standard choices,
i.e. odd/even checkerboard.  The MILC practice (probably that of
others as well) is to lump the even sites together.
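
As a starting point for that discussion, here is a rough sketch in C
of the "multiply by a constant and add" operation with site masking.
Everything here is an assumption made for illustration: the names
(l2_subset, l2_axpy, l2_even_subset, l2_odd_subset) are hypothetical,
the field holds one real number per site rather than a color vector,
and "lumping" is taken to mean that the even sites are stored
contiguously ahead of the odd ones.  With that layout a checkerboard
is a plain range, while an arbitrary mask falls back to an index list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset descriptor.  If index is NULL the subset is the
   contiguous range [start, start + count); otherwise it is the count
   sites listed in index[] (start is then ignored). */
typedef struct {
    size_t start;
    size_t count;
    const size_t *index;
} l2_subset;

/* Checkerboard subsets, assuming even sites occupy the first half of
   the storage and odd sites the second half. */
l2_subset l2_even_subset(size_t volume)
{
    assert(volume % 2 == 0);
    l2_subset s = { 0, volume / 2, NULL };
    return s;
}

l2_subset l2_odd_subset(size_t volume)
{
    assert(volume % 2 == 0);
    l2_subset s = { volume / 2, volume / 2, NULL };
    return s;
}

/* y[i] += alpha * x[i] for every site i in the subset. */
void l2_axpy(double alpha, const double *x, double *y, const l2_subset *s)
{
    assert(x && y && s);
    if (s->index == NULL) {
        /* Fast path: unit-stride loop over a contiguous block. */
        for (size_t i = s->start; i < s->start + s->count; i++)
            y[i] += alpha * x[i];
    } else {
        /* General path: arbitrary mask given as a list of sites. */
        for (size_t k = 0; k < s->count; k++) {
            size_t i = s->index[k];
            y[i] += alpha * x[i];
        }
    }
}
```

A Schroedinger functional code would pass an index list that omits the
boundary sites; the standard even/odd case keeps the unit-stride loop.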

Regards,
Carleton


REPLY BY RICH BROWER:
========================================================================

Carleton,

I agree in general with the thrust of your remarks. We need to develop
some test examples.

I think level 2 is the crucial issue: 

My idea is that we have a library of PURE C routines at level 2. (Of
course in many instances these will in fact be written largely in
assembly language.) These routines would provide a QCD "data parallel"
tool box of lattice-wide operations, such as the matrix operation
A = SHIFT(B) * C, etc.  They would be implemented using the
API_message_passing and API_single_site standards of level 1, although
on occasion the actual implementation may be machine dependent.  The C
purist would build from there.  Why not?  Even a die-hard Fortran
programmer could call these level 2 functions and construct a rather
efficient code.
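
To illustrate the layering, here is a toy sketch in C of a level 2
routine for A = SHIFT(B) * C built on a level 1 single-site product.
It rests on several simplifying assumptions: the lattice is
one-dimensional and periodic, each site holds a single complex number
instead of a matrix, SHIFT(B) at site x is taken to mean B at site
x+1, and everything lives on one processor so no message passing is
needed.  The names qc, l1_mul and l2_shift_mul are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-site data: one complex number stands in for a matrix. */
typedef struct { double re, im; } qc;

/* Level 1 stand-in: single-site product a * b. */
static qc l1_mul(qc a, qc b)
{
    qc r;
    r.re = a.re * b.re - a.im * b.im;
    r.im = a.re * b.im + a.im * b.re;
    return r;
}

/* Level 2: a[x] = b[(x + 1) mod n] * c[x] for all n sites.
   a must not alias b, since b is read at a neighboring site. */
void l2_shift_mul(qc *a, const qc *b, const qc *c, size_t n)
{
    assert(a && b && c && n > 0);
    assert(a != b);
    for (size_t x = 0; x < n; x++)
        a[x] = l1_mul(b[(x + 1) % n], c[x]);
}
```

On a parallel machine the neighbor fetch b[(x + 1) % n] is where the
level 1 message-passing calls would enter; the caller of l2_shift_mul
would not see the difference.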

A C++ person would use these level 2 functions as individual
methods bundled into classes with all the "beauty" of encapsulation,
operator overloading, etc.  Organizing these C routines into classes
will of course simplify the appearance of the interface to the
application programmer. At least the C++ programmer will feel at home
here. Indeed this class organization, if matched well with naming
conventions etc., should promote a more uniform and intelligible
interface to the C user as well. However, SZIN, MILC and CPS users may
choose to organize the calling of level 2 in different ways. Some
students may write novel applications from scratch using the data
parallel tool box at level 2, etc.

Of course there will be many specific level 3 "applications" with full
inverters, gauge fixers etc. Hopefully most of these can be written
entirely using level 2 functions.  I view this as "above" the core API
(level 1 & 2). It may be that data remapping has to be put above level
2 to avoid excessive overhead, and that some level 3 routines, to be
fully optimized, will go directly to assembly level.  This is an
experimental question. Nonetheless I would encourage the design
objective that when a level 3 routine finds elements missing at level
2, they be added and released in the next version of level 2.

We need to generate a few examples of C routines at level 2 being used
as methods to see if this is a viable approach. Any volunteers?

Rich




 
