From Carleton: Mon, 30 Jul 2001 21:31:47 -0600 (MDT) Hi Rich, So we have something concrete to shoot at in our conference call tomorrow, I offer the following proposal for a "Level 2" interface. I believe it is very close to what you are proposing. Since the intention is to make it possible to replace these primitives with assembly code, or call it from plain C, C++ programmers will have to treat it as a lower-level call, after data remapping. However, I would think that any serious remapping would be expensive enough that one would do it only at the level of the full inverter. Regards, Carleton ---------------------------------------------------------------------- Proposed Level 2 Interface Here is a possible approach to Dslash that should be possible to replace with assembly code, if desired: //====================================================================== // Wilson Dslash // Compute SUM_dirs ( // ( 1 + sign*gamma[dir] ) * U(x,dir) * rhs(x+dir) // + ( 1 - sign*gamma[dir] ) * U_adj(x-dir,dir) * rhs(x-dir) //) extern "C" { void sglDslashWilson(wilson_vector *lhs, su3_matrix *u, wilson_vector *rhs, int *mask, int nsites, bool adjoint, int parity, bool sign ) void dblDslashWilson(wilson_vector_double *lhs, su3_matrix_double *u, wilson_vector_double *rhs, int *mask, int nsites, bool adjoint, int parity, bool sign ) } //====================================================================== // A wilson_vector of any precision is defined as four su3_vectors: template class Wilson_vector { public: Su3_vector d[4]; } and then we have the specializations: typedef wilson_vector Wilson_vector typedef wilson_vector_double Wilson_vector //====================================================================== Storage assumptions: The Wilson vectors and su3_matrices are stored in contiguous storage, starting from the pointer passed, incrementing by the size of the wilson_vector, "even" checkerboard sites first. The parity integer determines whether only even, only odd, or both parities are to be processed. mask[i] determines whether to exclude lhs[i]. nsites is the total number of sites, assumed to be an even integer. The SU3 link matrices are likewise stored in contiguous storage, but four directions at a time. "adjoint" specifies the convention for interpreting the SU(3) matrices (use adjoint or not) Note: We have to make some assumptions about Dirac matrices as well. How are we going to pass information about the communications map? For more generality, we should probably also include a mask on direction so one could do a spatial Dslash - useful for NRQCD, etc. And I should have specified the hopping parameter, which also could differ for each direction if we are working on an asymmetric lattice. There are really only a few lattice-wide primitives needed for sparse-matrix inversion and eigensolving: (W's are Wilson or KS vectors over the lattice. R is real. C is complex. Operations are performed according to "parity" and "mask", as in Dslash above) W = 0 W1 = W2 W1 = r*W2 + W3 norm(W) And a special one needed for clover fermions: W1 = (F * sigma)^{-1} W2 (site-diagonal operation, but mixes spin, color) Are there others we should consider? ======================================================================