Carleton's Suggestion on Data Types:

Hi Rich and Robert,

James and I have been discussing a suggestion for level 2 LA routines that somewhat sidesteps the site layout issue and follows the line of reasoning we were developing at the end of the JLab meeting.

(1) We provide a routine that creates a lattice field with a specified layout. This routine allocates the needed space and sets a flag that defines the layout. There are a couple of predefined layout types: even first, lexicographic, etc. One can imagine a simple class or structure that has at least three members: a pointer to the base allocated address, a stride, and the layout flag.

(2) The level 2 linear algebra routines operate only on the objects created in (1). When calling a level 2 routine, one specifies the operands and the subset of the lattice on which the operation is to take place. The subset (e.g. even only, odd only, both) is specified by an integer. The routine checks the layout first and then decides how to traverse the subset. For this purpose we keep a catalog of known layouts and known subsets. For each layout and subset, the catalog provides one of two possible ways to traverse the subset: (a) a specified base index, stride, and quantity, or (b) a mask. The level 2 routine would then call level 1.5 LA routines of the type proposed by Robert to complete the task.

(3) A user could create new layouts and subsets. So level 2 would have a layout librarian that adds traversal instructions to the catalog of known layouts and their subsets. New subsets would be defined and added to all layouts in use, predefined or user-defined.

(4) Each layout would have associated with it required routines that provide the forward and reverse maps between lattice four-coordinate and array index plus machine node. For each layout, there would also be a map from four-coordinate to subset index.

(5) Some level 3 routines might not accept all layouts without remapping.
So a remapping utility would be provided to convert between layouts without inconveniencing the user. If it is convenient to do so, we could adopt a standard layout for level 3 routines, so that level 3 modules would at least be compatible among themselves.

The overhead associated with this scheme would be amortized easily as long as we have reasonably large chunks of data on each node. Probably even 4^4 wouldn't be too costly.

Comments, criticism?

Regards,
Carleton
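
Points (1), (2), and (4) above might be sketched roughly as follows. This is only an illustration under stated assumptions, written in Python for brevity, and not an implementation of the proposal. Every name in it (`LatticeField`, `CATALOG`, `site_indices`, `scale_subset`, and so on) is hypothetical. It assumes a single node (so the machine-node part of the map in (4) is omitted), a site parity of (x+y+z+t) mod 2, one number per site, and an "even first" layout that stores all even sites and then all odd sites, each in lexicographic order with x fastest.

```python
from dataclasses import dataclass

LEXICOGRAPHIC, EVEN_FIRST = 0, 1   # layout flags (hypothetical values)
EVEN, ODD, ALL = 0, 1, 2           # subset ids; EVEN/ODD equal the site parity
DIMS = (4, 4, 4, 4)                # local lattice, x fastest; DIMS[0] must be even
VOLUME = DIMS[0] * DIMS[1] * DIMS[2] * DIMS[3]
HALF = VOLUME // 2


@dataclass
class LatticeField:                # point (1): the three members
    data: list                     # stands in for the base allocated address
    stride: int                    # elements between consecutive sites
    layout: int                    # layout flag


def create_field(layout, value=0.0, stride=1):
    return LatticeField([value] * (VOLUME * stride), stride, layout)


def parity(coord):
    return sum(coord) % 2


def lex_index(coord):
    x, y, z, t = coord
    return x + DIMS[0] * (y + DIMS[1] * (z + DIMS[2] * t))


def lex_coord(index):
    coord = []
    for d in DIMS:
        coord.append(index % d)
        index //= d
    return tuple(coord)


def site_index(layout, coord):     # point (4): forward map
    r = lex_index(coord)
    if layout == LEXICOGRAPHIC:
        return r
    # With DIMS[0] even, lexicographic ranks 2k and 2k+1 have opposite parity.
    return r // 2 + parity(coord) * HALF


def site_coord(layout, index):     # point (4): reverse map
    if layout == LEXICOGRAPHIC:
        return lex_coord(index)
    want, k = divmod(index, HALF)
    coord = lex_coord(2 * k)
    return coord if parity(coord) == want else lex_coord(2 * k + 1)


def build_catalog():               # point (2): traversal per (layout, subset)
    catalog = {
        (EVEN_FIRST, EVEN): ("strided", 0, 1, HALF),   # (a) base, stride, quantity
        (EVEN_FIRST, ODD): ("strided", HALF, 1, HALF),
        (EVEN_FIRST, ALL): ("strided", 0, 1, VOLUME),
        (LEXICOGRAPHIC, ALL): ("strided", 0, 1, VOLUME),
    }
    for subset in (EVEN, ODD):                         # (b) a mask
        mask = [parity(lex_coord(i)) == subset for i in range(VOLUME)]
        catalog[(LEXICOGRAPHIC, subset)] = ("mask", mask)
    return catalog


CATALOG = build_catalog()


def site_indices(layout, subset):
    entry = CATALOG[(layout, subset)]
    if entry[0] == "strided":
        _, base, stride, count = entry
        return range(base, base + stride * count, stride)
    return [i for i, keep in enumerate(entry[1]) if keep]


def scale_subset(field, factor, subset):
    # A level 2 routine: check the layout, traverse the subset, do the work.
    # A real version would hand each run of sites to a level 1.5 routine.
    for site in site_indices(field.layout, subset):
        field.data[site * field.stride] *= factor


field = create_field(EVEN_FIRST, value=1.0)
scale_subset(field, 2.0, EVEN)     # touches only the first half of the array
```

In this sketch the lexicographic layout needs a mask for the even and odd subsets, because site parity does not alternate uniformly across row boundaries, while the even-first layout gets by with a base index, stride, and quantity. That is the kind of per-layout difference the catalog is meant to hide from the caller.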