El-pö-ette, Hybrid parallelism models for neutron Monte-Carlo solver in an AMR framework, Proceedings of the Joint International Conference on Supercomputing in Nuclear Applications and Monte Carlo 2013, p.
Petitet, LINPACK Benchmark, Concurrency and Computation: Practice and Experience, 2003.
Aloisio et al., The International Exascale Software Project roadmap, International Journal of High Performance Computing Applications, vol.
Nieplocha, Scalable work stealing, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, pp.
Cook, CUDA Programming: A Developer's Guide to Parallel Computing with GPUs, 2013.
Erache, Evaluation of OpenMP Task Scheduling Algorithms for Large NUMA Architectures, Euro-Par 2014 Parallel Processing - 20th International Conference.
McDonald et al., Parallel Programming in OpenMP, 2001.
Lusk, An Efficient Format for Nearly Constant-Time Access to Arbitrary Time Intervals in Large Trace Files, Scientific Programming, pp.
Zima, Parallel Programmability and the Chapel Language, International Journal of High Performance Computing Applications, vol.
Jourdren, Enabling Low-Overhead Hybrid MPI/OpenMP Parallelism with MPC, Proceedings of the 6th International Conference on Beyond Loop Level Parallelism in OpenMP: Accelerators, Tasking and More, pp.
Etiemble, MPI versus MPI+OpenMP on the IBM SP for the NAS Benchmarks, ACM/IEEE SC 2000 Conference (SC'00), 2000.
Butenhof, Programming with POSIX Threads, 1997.
Bull, Measuring synchronisation and scheduling overheads in OpenMP, 1999.
Namyst, ForestGOMP: An Efficient OpenMP Environment for NUMA Architectures, International Journal of Parallel Programming, vol.
Goglin et al., hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.
Randall et al., Cilk: An efficient multithreaded runtime system, 1996.
Hey, Grid Computing: Making the Global Infrastructure a Reality, 2003.
Colella, Local adaptive mesh refinement for shock hydrodynamics, Journal of Computational Physics, vol.
Shende, ParaProf: A Portable, Extensible, and Scalable Tool for Parallel Performance Profile Analysis, Euro-Par, pp.
Valensi, Performance Tuning of x86 OpenMP Codes with MAQAO, Tools for High Performance Computing 2009 - Proceedings of the 3rd International Workshop on Parallel Tools for High Performance Computing, pp.
Lang et al., Entering the petaflop era: The architecture and performance of Roadrunner, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.
Norton, Highly parallel structured adaptive mesh refinement using parallel language-based approaches, Parallel Comput., vol.
Lin et al., The Design of OpenMP Tasks, IEEE Transactions on Parallel and Distributed Systems, vol.
The X10 programming language, 2.1.
Wacrenier, StarPU: A unified platform for task scheduling on heterogeneous multicore architectures, Proceedings of the 15th International Euro-Par Conference on Parallel Processing, Euro-Par '09, pp.
Eschweiler et al., Score-P: A unified performance measurement system for petascale applications, Proc. of the CiHPC: Competence in High Performance Computing, HPC Status Konferenz der Gauß-Allianz e.V., pp.
Zheng, UPC collectives library 2.0, Fifth Conference on Partitioned Global Address Space Programming Models, 2011.