SU(3) linear algebra primitives in the MILC code (Carleton)

See: http://cliodhna.cop.uop.edu/~hetrick/milc/su3.html

The most critical primitives are typically coded in assembly language.

These are the ones critical for Dslash:

Staggered Dslash:
    mult_adj_su3_mat_vec_4dir (does four su3 matrix x vector operations)
    mult_su3_mat_vec_sum_4dir
    sub_four_su3_vecs

Wilson Dslash:
    wp_shrink
    wp_grow
    mult_su3_mat_hwvec
    wp_grow_add

Also needed for conjugate gradient are:

Staggered congrad:
    magsq_su3vec
    g_doublesum
    su3_rdot
  and some specialized routines for staggered fermions:
    scalar_mult_latvec (performs scalar_mult_su3_vector with the same constant over the entire lattice)
    scalar_mult_add_latvec (same, but scalar_mult_add_su3_vector)
    copy_latvec (lattice global copy)
    clear_latvec (lattice global clear)

Wilson congrad:
    scalar_mult_add_wvec
    magsq_wvec
    g_doublesum
    copy_wvec
    wvec_dot
    c_scalar_mult_add_wvec
    g_dcomplexsum

For the more complicated improved actions with dynamical fermions, we also have to worry about optimizing the calculation of the gauge force term, since the time spent there is comparable to the time spent in the fermion inversions. Here are the primitives involved there:

    mult_su3_nn (and, by extension, the other su3 matrix-matrix multiplications)
    scalar_mult_add_su3_matrix
    mult_adj_su3_mat_hwvec
    mult_su3_mat_hwvec
    make_anti_hermitian
    uncompress_anti_hermetian
    su3_projector
    mult_adj_su3_mat_vec

Even with assembly-coded versions of complete inverters for standard actions, we still need an optimized set of primitives in order to develop new actions.
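
To make concrete what a few of the primitives named above compute, here is a minimal plain-C sketch of the single-site operations behind the staggered Dslash and congrad lists: matrix times vector, adjoint matrix times vector, squared magnitude, real dot product, and scalar multiply-add. It is not the MILC source. The struct layouts, the `sk_` names, the argument order and the use of double precision are all assumptions made for illustration, and an optimized or assembly version would be organized quite differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical data layouts for this sketch; the real MILC types may differ. */
typedef struct { double real, imag; } sk_complex;
typedef struct { sk_complex c[3]; } sk_su3_vector;
typedef struct { sk_complex e[3][3]; } sk_su3_matrix;

/* c = A * b (cf. the matrix x vector step of mult_su3_mat_vec_sum_4dir). */
void sk_mult_su3_mat_vec(const sk_su3_matrix *a, const sk_su3_vector *b,
                         sk_su3_vector *c)
{
    assert(a != NULL && b != NULL && c != NULL && b != c); /* no aliasing */
    for (int i = 0; i < 3; i++) {
        double re = 0.0, im = 0.0;
        for (int j = 0; j < 3; j++) {
            const sk_complex m = a->e[i][j], v = b->c[j];
            re += m.real * v.real - m.imag * v.imag;
            im += m.real * v.imag + m.imag * v.real;
        }
        c->c[i].real = re;
        c->c[i].imag = im;
    }
}

/* c = A^dagger * b, using (A^dagger)_ij = conj(A_ji)
   (cf. mult_adj_su3_mat_vec and its _4dir variant). */
void sk_mult_adj_su3_mat_vec(const sk_su3_matrix *a, const sk_su3_vector *b,
                             sk_su3_vector *c)
{
    assert(a != NULL && b != NULL && c != NULL && b != c); /* no aliasing */
    for (int i = 0; i < 3; i++) {
        double re = 0.0, im = 0.0;
        for (int j = 0; j < 3; j++) {
            const sk_complex m = a->e[j][i], v = b->c[j];
            re += m.real * v.real + m.imag * v.imag;
            im += m.real * v.imag - m.imag * v.real;
        }
        c->c[i].real = re;
        c->c[i].imag = im;
    }
}

/* Re(a^dagger . b) (cf. su3_rdot). */
double sk_su3_rdot(const sk_su3_vector *a, const sk_su3_vector *b)
{
    double sum = 0.0;
    for (int i = 0; i < 3; i++)
        sum += a->c[i].real * b->c[i].real + a->c[i].imag * b->c[i].imag;
    return sum;
}

/* |a|^2 (cf. magsq_su3vec); the same as the real dot product of a with itself. */
double sk_magsq_su3vec(const sk_su3_vector *a)
{
    return sk_su3_rdot(a, a);
}

/* dest = src1 + s * src2 (cf. scalar_mult_add_su3_vector).
   dest may alias either source, since each component is read before it is written. */
void sk_scalar_mult_add_su3_vector(const sk_su3_vector *src1,
                                   const sk_su3_vector *src2,
                                   double s, sk_su3_vector *dest)
{
    for (int i = 0; i < 3; i++) {
        dest->c[i].real = src1->c[i].real + s * src2->c[i].real;
        dest->c[i].imag = src1->c[i].imag + s * src2->c[i].imag;
    }
}
```

In this picture, a lattice-wide routine such as scalar_mult_add_latvec would amount to a loop over sites applying the single-site operation with the same constant, and a global sum such as g_doublesum would combine per-node partial results of something like `sk_magsq_su3vec`. The inner loops have fixed trip counts of three, so they are natural candidates for full unrolling in a hand-tuned version.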