Some scientific applications can be partitioned into smaller, weakly independent sub-problems, which makes them well suited to many-core processors such as GPUs. However, GPU performance is very sensitive to memory hierarchy access patterns [1][2][3]. One important basic scientific application is N-body, where the sub-problems share data elements, which causes a high cache miss rate or cache contention; in such applications, the number of distinct data segments is smaller than the total number of memory accesses made by the processors. One approach for these applications is to duplicate the shared data and make the processors more independent.

Although expanding the data volume on a GPU is generally undesirable, my performance modeling showed that data redundancy can make a GPU application substantially faster than the non-redundant version, namely when a significantly large portion of the run-time is spent on GPU computation and the speedup from lower cache miss ratios outweighs the overhead incurred in the data collection and data transfer phases. I found this condition satisfied in astrophysical, electromagnetic, and electrostatic simulations, but not in chemical simulations, where the data volume itself is a significant bottleneck. The idea of data redundancy was tested in [5] on the MLFMA algorithm, an efficient version of N-body, for a typical potential problem. I am preparing another paper on its applicability to other scientific simulations that use MLFMA.

However, I want to generalize my modeling technique to a broader class of applications and to different data re-organizations; that is, we should validate it on standard benchmark suites such as Rodinia. This is where I need your collaboration as co-authors.
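To make the data-redundancy idea concrete, the following is a minimal sketch, not taken from [5]: a direct N-body force loop where each "worker" (standing in for a GPU processor or thread block) either reads one shared position array or works from its own private copy. The function names and the serial worker loop are illustrative assumptions; on a real GPU the duplicated copies would live in separate memory partitions or shared-memory tiles, and the payoff would appear as fewer cache misses rather than in this CPU model.

```python
import numpy as np

def forces_shared(pos, n_workers=4):
    # Every worker indexes into the single shared `pos` array; on a GPU,
    # many processors touching the same data segments is the access
    # pattern that can cause cache contention.
    n = len(pos)
    acc = np.zeros_like(pos)
    for chunk in np.array_split(np.arange(n), n_workers):
        for i in chunk:
            d = pos - pos[i]                       # reads shared data
            r2 = (d * d).sum(axis=1) + 1e-9        # softened distance^2
            acc[i] = (d / r2[:, None] ** 1.5).sum(axis=0)
    return acc

def forces_redundant(pos, n_workers=4):
    # Data redundancy: each worker first takes a private copy of the
    # positions, trading extra memory for independent accesses.
    n = len(pos)
    acc = np.zeros_like(pos)
    for chunk in np.array_split(np.arange(n), n_workers):
        local = pos.copy()                         # duplicated data
        for i in chunk:
            d = local - local[i]
            r2 = (d * d).sum(axis=1) + 1e-9
            acc[i] = (d / r2[:, None] ** 1.5).sum(axis=0)
    return acc
```

Both variants compute identical accelerations; only the access pattern differs, which is exactly the trade-off the performance model has to capture: the duplication cost (extra memory and the data collection/transfer phases) against the gain from contention-free accesses.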