======================CARLETON'S MESSAGE================================

Date: Wed, 27 Feb 2002 13:03:25 -0700 (MST)
From: Carleton DeTar
To: edwards@jlab.org
CC: brower@lns.mit.edu, avp@lns.mit.edu, osborn@physics.utah.edu,
    detar@physics.utah.edu

Hi Robert,

You have made great progress on the Level 2 specification! It looks
very good. Here are a few preliminary comments and suggestions. James
and I will certainly find more as we dig deeper. Nothing so far would
change the basic framework, however. The most difficult suggestion we
make is the notion of a custom data type.

1. Layout routines - entry points.

   QDP_pe_rank and QDP_pe_index: These routines are intended to be used
   only outside Level 2 and by the implementation of Level 2.

   Names QDP_pe_rank, QDP_pe_index. The names should probably be
   modified to conform to the Level 1 MP nomenclature. Weren't we using
   "logical node number" instead of "pe_rank" there? There should be a
   way of getting the logical machine grid coordinate as well as the
   rank.

2. Data Types

   a. Other predefined types

      We have found it useful for storage economy to add an
      intermediate Dirac type, "spin_wilson_vector", with 4 source spin
      and 12 sink spin-color combinations. One could do the same with
      source color instead of spin, but we haven't used that one in our
      code.

   b. Custom types

      We need a provision for allocating custom data types. For
      example, a user may want a vector of complex values, which is not
      in the list. Arithmetic operations would be done by callback and
      shifts would be done easily as long as the datum byte stride is
      encoded in the object created.

   c. Naming conventions

      James prefers a less tight abbreviation of the data types to make
      the names less like assembly language op codes, so I asked him to
      propose an alternative.

3. Subsets

   We need to think through the issue of using masks vs using arrays of
   indices for the general subset specification.
   While the subset object should be opaque, which makes this to some
   extent implementation-specific, it impinges on what we need for
   Level 1. James is advocating lists of indices, which would conform
   to Robert's Level 1 proposal. Masks are currently not supported in
   the Level 1 calls.

4. General Permutation Map

   If we follow MILC practice, there would be an initialization call
   that builds the map, given the map function. It returns a map object
   that is then used as an argument to the QDP_F_map call. So we need a
   QDP_build_map(). The specification of the map object could be as
   simple as a unique integer assigned to the map. Each map would, in
   general, have a meaningful forward and reverse, as do the shifts. In
   fact, from that perspective, the _shift and _map operations could be
   combined into a single operation with "dir" having a more general
   meaning. Any reason not to?

5. Data Parallel Operations: Reductions

   We should have a sum operation that collects values on each time
   slice and returns a vector - needed for correlators. By
   generalization there should be such an operation for each lattice
   axis. True, this could be done by defining N_t subsets and looping
   over subsets, but correlators are commonly needed.

6. Entry and Exit from Level 2

   I am unsure how a comprehensive entry and exit would be used. I
   would prefer that this concept apply selectively to the lattice
   fields. Thus one could create a Level 2 lattice field, starting from
   a simple array of values (of supported types) in the correct order.
   And one could extract the same array from a Level 2 lattice field.
   The only thing that really needs to happen during extraction is to
   finish pending communication and convert any pointers to values. The
   user should be strongly advised to use QDP_pe_index() calls to get
   the correct, portable, layout-independent ordering of values. My
   goal would be to allow free mixing of Level 2 and Level 1
   programming.
   Of course the implications of such flexibility for the Level 2
   implementation would need to be understood. But this could be an
   easy evolutionary path for legacy code.

7. Copying

   Type conversions should be permitted here. For example, we might
   want to convert double to single precision via a copy.

8. Filling

   Shouldn't we also allow filling a field via a callback function?

9. Logical machine, layouts, and modularity

   It is important that calls to the Level 1 "get logical node number"
   and calls to QDP_pe_rank agree with each other. It is good, also, to
   maintain as much modularity as possible to allow a variety of
   programming approaches.

   One way to maintain this consistency and modularity is to segregate
   the part of our current Level 1 QMP that declares the logical
   machine. (We have discussed this point before.) We then have an
   organization like this:

   |----------------|----------------------------|
   | logical        | Level 2                    |
   | machine        |                            |
   | definition     | QDP_pe_rank     |----------|
   |                | QDP_pe_index    | Level 1  |
   | get_log_node_no|                 | QLA      |
   |                |----------------------------|
   |                | Level 1 QMP                |
   |----------------|----------------------------|

   The lines are intended to represent modularity - i.e. parts that
   could be omitted or replaced.

   We require that the logical machine be defined before calling either
   Level 2 or Level 1 QMP. Then the QDP_layout routine would get the
   rank from a call to "get_logical_node_number" and everything would
   be consistent.

   As for modularity, an implementation of Level 2 need not use Level 1
   at all, so it shouldn't need the rest of the Level 1 library - just
   the logical machine definition. In fact, if a user chose not to use
   Level 1 and the Level 2 implementation also did not, he/she could
   even supply a stripped-down version of the logical machine
   definition, keeping only the call to "get_logical_node_number",
   which might just return the value from MPI_Comm_rank, for example.

Regards,
Carleton
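
The stripped-down logical machine definition described in item 9,
together with the grid-coordinate lookup asked for in item 1, might look
roughly like the following minimal sketch. Everything named here is
hypothetical and not part of any QMP or QDP specification: the lm_*
function names, the LM_MAX_DIM limit, and the assumption that ranks are
laid out lexicographically with axis 0 varying fastest. The rank is
passed in by the caller so that the sketch stays free of any message
passing library; in an MPI build it could be the value obtained from
MPI_Comm_rank, as the message suggests.

```c
#include <assert.h>
#include <stddef.h>

#define LM_MAX_DIM 4  /* hypothetical limit on machine grid dimensions */

/* Hypothetical logical machine state (illustration only). */
static struct {
    int ndim;              /* number of machine grid axes            */
    int dims[LM_MAX_DIM];  /* number of nodes along each axis        */
    int rank;              /* logical node number of this process    */
    int declared;          /* nonzero once lm_declare has succeeded  */
} lm_state;

/* Declare the logical machine before any Level 1 or Level 2 call.
   Returns 0 on success, -1 if the arguments are inconsistent. */
int lm_declare(int ndim, const int *dims, int rank)
{
    int i, nodes = 1;

    if (ndim < 1 || ndim > LM_MAX_DIM || dims == NULL) return -1;
    for (i = 0; i < ndim; i++) {
        if (dims[i] < 1) return -1;
        nodes *= dims[i];
    }
    if (rank < 0 || rank >= nodes) return -1;

    lm_state.ndim = ndim;
    for (i = 0; i < ndim; i++) lm_state.dims[i] = dims[i];
    lm_state.rank = rank;
    lm_state.declared = 1;
    return 0;
}

/* The single call both levels would rely on for the rank.
   Returns -1 if the logical machine has not been declared. */
int lm_get_logical_node_number(void)
{
    return lm_state.declared ? lm_state.rank : -1;
}

/* Fill coords[0..ndim-1] with this node's machine grid coordinates,
   assuming lexicographic rank order with axis 0 varying fastest.
   Returns 0 on success, -1 on error. */
int lm_get_logical_coordinates(int *coords)
{
    int i, r;

    if (!lm_state.declared || coords == NULL) return -1;
    r = lm_state.rank;
    for (i = 0; i < lm_state.ndim; i++) {
        coords[i] = r % lm_state.dims[i];
        r /= lm_state.dims[i];
    }
    assert(r == 0);  /* holds because rank < product of dims */
    return 0;
}
```

With a single source for the rank, a Level 2 routine such as
QDP_pe_rank and the Level 1 "get logical node number" call would both
read the same stored value, which is the consistency property item 9
asks for.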