



# **Shared-Memory-Copy Bandwidth**

- Bandwidth on iMesh networks to caches and memory controllers
  - Shared memory performance critical for TSHMEM
  - Bandwidth of memory operations influenced by 3 of 5 iMesh networks
    - QDN: memory request network
    - RDN: memory response network
    - SDN: cache sharing network
  - Performance transitions occur at cache-size limits
    - L1 data cache: 3100 MB/s
    - L2 cache: 2700 MB/s
    - Tilera DDC L3 cache
    - Memory-to-memory
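A minimal sketch of how a copy-bandwidth sweep like this could be measured, assuming a plain `memcpy` between two heap buffers stands in for the shared-memory copy. The names `copy_bandwidth_mbs` and `sweep_transfer_sizes`, the iteration counts, and the use of POSIX `clock_gettime` are illustrative assumptions, not TSHMEM's actual benchmark harness.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Volatile sink so the compiler cannot discard the copied data. */
static volatile char sink;

/* Copy `size` bytes `iters` times and return the bandwidth in MB/s
   (1 MB = 1e6 bytes). Returns -1.0 on allocation or verification failure.
   Hypothetical helper: not part of TSHMEM or OpenSHMEM. */
static double copy_bandwidth_mbs(size_t size, int iters)
{
    assert(size > 0 && iters > 0);
    char *src = malloc(size);
    char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x5A, size);  /* touch both buffers before timing */
    memset(dst, 0, size);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++) {
        src[0] = (char)i;      /* change the source so each copy matters */
        memcpy(dst, src, size);
        sink = dst[0];         /* read the result back */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    int ok = (memcmp(dst, src, size) == 0);
    free(src);
    free(dst);
    if (!ok)
        return -1.0;

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        secs = 1e-9;           /* guard against a zero-length interval */
    return ((double)size * (double)iters) / 1e6 / secs;
}

/* Double the transfer size from min_size to max_size and print one
   bandwidth figure per size; about 64 MiB is moved at each step. */
static void sweep_transfer_sizes(size_t min_size, size_t max_size)
{
    assert(min_size > 0);
    for (size_t size = min_size; size <= max_size; size *= 2) {
        int iters = (int)((64u << 20) / size);
        if (iters < 1)
            iters = 1;
        printf("%10zu B  %10.1f MB/s\n", size, copy_bandwidth_mbs(size, iters));
    }
}
```

Calling `sweep_transfer_sizes(1024, 64u << 20)` from `main` prints one line per transfer size. On a cached architecture the figures would be expected to step down as the working set outgrows each cache level, which is the kind of transition listed above.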

[Figure: shared-memory-copy bandwidth versus transfer size]

## **Performance – Put/Get**







## **Performance – Barrier Sync**



- TSHMEM barriers leverage UDN for *better scaling* than most Tilera TMC barriers on TILE-Gx36 and TILE*Pro*64





#### **Pull-based Broadcast**

And many more results!



- Single-device broadcast reaches up to 45 GB/s aggregate bandwidth, and 37 GB/s at 36 tiles
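A minimal single-process sketch of the pull-based idea, assuming "pull-based" means that every non-root tile copies the payload out of the root's buffer itself (a get) rather than the root writing to each tile in turn (a put). `N_PES`, `BUF_LEN`, `sym_buf`, `pull_from_root`, and `pull_broadcast` are hypothetical names for this model. The loop below runs the per-tile steps one after another, whereas on a real device they would run concurrently on separate tiles after a barrier.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N_PES   4   /* hypothetical tile count for the model */
#define BUF_LEN 8   /* hypothetical payload capacity in bytes */

/* One buffer per modeled PE; stands in for a symmetric variable. */
static char sym_buf[N_PES][BUF_LEN];

/* Step executed by PE `me`: pull the payload from `root`. */
static void pull_from_root(int me, int root, size_t nbytes)
{
    assert(me >= 0 && me < N_PES);
    assert(root >= 0 && root < N_PES);
    assert(nbytes <= BUF_LEN);
    if (me != root)
        memcpy(sym_buf[me], sym_buf[root], nbytes);
}

/* Sequential stand-in for all PEs pulling at once. */
static void pull_broadcast(int root, size_t nbytes)
{
    for (int pe = 0; pe < N_PES; pe++)
        pull_from_root(pe, root, nbytes);
}
```

Because each destination issues its own copy, the copies are independent of one another, so aggregate bandwidth can grow with the number of participating tiles until the root's memory or the network saturates.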





# **OpenSHMEM Extensions**

- Tilera's user dynamic network (UDN) needs to be shut down properly
  - TSHMEM uses UDN for barriers and explicit inter-tile communication
  - During termination, processes may hang if UDN is not deactivated
  - shmem\_finalize support required






#### Come visit us at SC'12!

- NSF CHREC: Booth 2405
- PGAS: Booth 2137





