[Openshmem-list] shmem_tool_sync
John C. Linford
jlinford at paratools.com
Mon Mar 20 21:37:56 UTC 2017
Hi Jeff,
shmem_tool_sync() would be nice for tool developers, since we could count on
SHMEM still being usable at the time of the dump. When shmem_finalize is in a
DSO, it is suddenly hard to assert that SHMEM calls are safe to use if the
tool is dumping from a static destructor. From the user's perspective, the
tools should already provide their own mechanisms for dumping data; e.g., TAU
has the TAU_DB_DUMP call, and setting the TAU_TRACK_SIGNALS=1 environment
variable causes TAU to dump profiles if the application crashes.
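
To make the phase-boundary pattern from Jeff's message concrete, here is a minimal sketch in plain C. shmem_tool_sync() is only a proposal and is not part of the OpenSHMEM specification, so the sketch uses a hypothetical stand-in, tool_sync_stub(), and a hypothetical run_phase(); neither is a real SHMEM or TAU routine.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the proposed shmem_tool_sync(). No such routine
 * exists in OpenSHMEM today; a real one would presumably be collective and
 * would be intercepted by whatever tool is attached. */
int dumps_completed = 0;

void tool_sync_stub(const char *label)
{
    /* A tool would flush its profile or trace buffers here, while the
     * SHMEM library is still known to be initialized. */
    printf("tool dump after %s\n", label);
    dumps_completed++;
}

/* Hypothetical application phase; returns 0 on success. */
int run_phase(int i)
{
    printf("phase %d done\n", i);
    return 0;
}

/* Run the phases in order and sync tool state after each one that succeeds,
 * so the data for phase i is already persistent if phase i+1 crashes.
 * Returns the number of phases that completed. */
int run_all_phases(int nphases)
{
    int i;

    assert(nphases >= 0);
    for (i = 0; i < nphases; i++) {
        char label[32];

        if (run_phase(i) != 0)
            break;
        snprintf(label, sizeof label, "phase %d", i);
        tool_sync_stub(label);
    }
    return i;
}
```

The point of the explicit call is that the dump happens at a moment the application chose, not from a static destructor that may run after shmem_finalize.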
At the F2F I floated the idea of a SHMEM_T interface to expose and manipulate
SHMEM internal variables. I could imagine a user setting flags to instruct
the tool to dump on certain events, every Nth collective, etc.
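
As a rough illustration of the "every Nth collective" flag, a tool-side check might look like the sketch below. The dump_policy struct and dump_due_on_collective() are hypothetical names invented for this example; no SHMEM_T interface has been specified, so nothing here reflects an actual API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy a user might set through a SHMEM_T-style interface. */
struct dump_policy {
    unsigned every_nth_collective; /* 0 disables periodic dumps */
    unsigned collectives_seen;     /* running count kept by the tool */
};

/* Intended to be called from the tool's wrapper around each collective.
 * Returns true when the policy says a dump is due after this collective. */
bool dump_due_on_collective(struct dump_policy *p)
{
    assert(p != NULL);
    p->collectives_seen++;
    if (p->every_nth_collective == 0)
        return false;
    return p->collectives_seen % p->every_nth_collective == 0;
}
```

With every_nth_collective set to 3, the check fires on the 3rd, 6th, 9th, ... collective; choosing N is the trade-off Jeff describes between dump overhead and how much data is lost in a crash.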
~John C.
On Mon, Mar 20, 2017 at 5:07 PM, Jeff Hammond <jeff.science at gmail.com>
wrote:
> The discussion of shmem_finalize made me wonder if it would not be helpful
> to have an explicit call to tell any active tools to dump their logs.
>
> For example, if I have a code like NWChem that goes through a series of
> phases, and the odds of phase i+1 crashing are nontrivial, but I want to
> know how phase i did with some tool, then I can call shmem_tool_sync
> between the phases to ensure that my tool state is (1) persistent and (2)
> in an analyzable state.
>
> It really sucks if an app crashes in such a way that tool state is not
> usable, particularly when said tool state would help figure out why the
> app crashed.
>
> One advantage of this call vs some hidden implementation is that the
> hidden implementation is either asynchronous or probably runs more often
> than it needs to. For example, if I automatically dump logs every Nth
> collective, then I might slow down the app noticeably. On the other hand,
> if I have to do the dump asynchronously, I cannot take advantage of I/O
> aggregation or any other coordinated analysis.
>
> It would be ideal if somebody from the TAU team could comment on this.
>
> Jeff
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
--
John C. Linford, Ph.D.
Senior Computer Scientist
ParaTools, Inc. <http://www.paratools.com>
5520 Research Park Drive, Suite 100
Baltimore, MD 21228
Phone: 540-808-9250