STAPL: Adaptive Remote Method Invocation (ARMI)

Parallelism, communication and synchronization support.

Classes
class	stapl::gang
	Creates a new gang by partitioning the existing one from which the gang construction is invoked.

Modules
	ARMI Tags
	ARMI primitives tags.

	ARMI Type traits
	Type traits related to ARMI.

	Distributed objects
	Distributed object creation, registration, retrieval and destruction.

	ARMI One-sided primitives
	One-sided (point-to-point or point-to-many) communication primitives.

	ARMI Collective primitives
	Collective communication primitives.

	ARMI Synchronization primitives
	Synchronization primitives.

	ARMI Unordered primitives
	Communication primitives with relaxed consistency.

	Request aggregation control
	Primitives that control aggregation.

	ARMI Utilities
	Utility classes and variables.

Typedefs
typedef int	stapl::process_id
	Process id type.

typedef std::uint8_t	stapl::level_type
	Level type.

Functions
void	stapl::abort (std::string const &)
	Displays the given `std::string` and aborts execution.

void	stapl::abort (void)
	Aborts execution.

template<typename T >
void	stapl::abort (T const &t)
	Outputs the given object to `std::cerr` as a string and aborts execution.

std::set< unsigned int >	stapl::external_callers (void)
	Returns the location ids that are going to make the external call.

template<typename F , typename... T>
runtime::external_caller< typename std::result_of< F(T...)>::type >::result_type	stapl::external_call (F &&f, T &&... t)
	Calls an external library function.

void	stapl::initialize (option opts=option{})
	Initializes the STAPL Runtime System.

void	stapl::initialize (int &argc, char **&argv)
	Initializes the STAPL Runtime System.

void	stapl::finalize (void)
	Finalizes the STAPL Runtime System.

bool	stapl::is_initialized (void) noexcept
	Returns `true` if the STAPL Runtime System is initialized.

std::vector< unsigned int > const &	stapl::get_hierarchy_widths (void) noexcept
	Returns the widths of all hierarchy levels.

unsigned int	stapl::get_available_levels (void) noexcept
	Returns the available parallelism levels.

void	stapl::execute (std::function< void(void)> f, unsigned int n=1)
	Executes the given function in a newly created environment.

process_id	stapl::get_process_id (void) noexcept
	Returns the current process id.

process_id	stapl::get_num_processes (void) noexcept
	Returns the number of processes.

unsigned int	stapl::get_location_id (void) noexcept
	Returns the current location id.

unsigned int	stapl::get_num_locations (void) noexcept
	Returns the number of locations in the current gang.

std::pair< unsigned int, unsigned int >	stapl::get_location_info (void) noexcept
	Returns the current location information consisting of the location id and the number of locations in the gang.

void	stapl::rmi_poll (void)
	Causes the calling location to check for and process all available requests. If none are available it returns immediately.

template<typename Predicate >
void	stapl::block_until (Predicate &&pred)
	Causes the calling location to block until the given predicate returns `true`.

stapl::exit_code	stapl_main (int argc, char *argv[])
	The starting point for SPMD user code execution.

affinity_tag	get_affinity (void) noexcept
	Returns the affinity of the current processing element.

Variables
constexpr process_id	stapl::invalid_process_id = -1
	Invalid process id.

constexpr unsigned int	stapl::invalid_location_id
	Invalid location id.

Detailed Description

Parallelism, communication and synchronization support.

ARMI (Adaptive Remote Method Invocation) primitives are designed to abstract the creation, registration, communication and synchronization of parallelism in a STAPL program, allowing for performance and portability on different systems.

The unit of parallel execution is called a location. Contrary to the concept of shared-memory threads, locations may or may not live in the same address space. As such, it is undefined behavior to try to share writeable global variables, references and pointers, including static class members, between locations.

Upon program startup, all locations begin SPMD execution in parallel. There are no purely sequential regions. The starting point for execution is

stapl::exit_code stapl_main(int argc, char* argv[])

which replaces the sequential standard

int main(int argc, char* argv[])

The primitives provide shared-object parallelism through distributed objects named p_objects. Locations communicate with each other using Remote Method Invocations (RMI) on distributed objects. As such, each location in which a distributed object has been constructed has a local part of the distributed object.

Distributed objects are identified by a handle, and their local objects are identified by that handle (stapl::rmi_handle) and a location id. As such, all objects that are communication targets must have a handle and register with it. This handle allows for proper address translation between locations.
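The role of the handle in address translation can be illustrated with a toy model. This is only a sketch and not STAPL code: `toy_handle`, `toy_registry` and `run_handle_example` are hypothetical names, and the scheme assumes that every location registers its objects in the same order so that the same handle value is produced everywhere.

```cpp
#include <cassert>
#include <map>

// Toy model, not STAPL code. `toy_handle` and `toy_registry` are hypothetical
// names; the real handle type is stapl::rmi_handle.
using toy_handle = unsigned int;

class toy_registry
{
public:
  // Registers a local object and returns the handle that identifies it.
  toy_handle register_object(void* local_part)
  {
    toy_handle const h = m_next++;
    m_objects[h] = local_part;
    return h;
  }

  // Translates a handle into the address that is valid on this location.
  void* lookup(toy_handle h) const
  {
    auto it = m_objects.find(h);
    return it == m_objects.end() ? nullptr : it->second;
  }

private:
  toy_handle m_next = 0;
  std::map<toy_handle, void*> m_objects;
};

bool run_handle_example()
{
  int part_on_loc0 = 10;
  int part_on_loc1 = 20;
  toy_registry loc0, loc1;  // one registry per simulated location

  // Assumption of this toy: both locations register in the same order,
  // so they obtain the same handle value.
  toy_handle const h0 = loc0.register_object(&part_on_loc0);
  toy_handle const h1 = loc1.register_object(&part_on_loc1);

  // Same handle, different local addresses.
  return h0 == h1
      && loc0.lookup(h0) == &part_on_loc0
      && loc1.lookup(h1) == &part_on_loc1;
}
```

A request in this model only needs to carry the handle; the receiving side resolves it to its own local address.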

Since each location owns a local portion of the distributed object, it is not necessary for a location to use RMI to access its local object. However, it is still valid to use RMI on the local objects. It is up to the distributed object implementation to keep track of which portions are local and which are remote.

Some communication primitives are collective, meaning all locations must call the function before it can complete. Collective calls typically need to perform complicated communication patterns among all locations, such as reductions and broadcasts. The rest of the communication calls are point-to-point or one-sided collective operations, and hence need to be called by only one location.

Point-to-point calls cannot be used to explicitly synchronize specific locations. Collective calls imply synchronization if they return a value.

Any RMI call may be aggregated and/or combined for improved performance, by decreasing the amount of network congestion that can happen due to many small messages. See stapl::set_aggregation() for more details.

To ensure portability, only these primitives should be used to express parallelism and synchronization within a STAPL program. The actual implementation varies (OpenMP, pthreads, MPI, etc.). Even if it is known that the primitives have been implemented a certain way for a certain system, using calls outside this specification (e.g., MPI calls) is non-portable and highly discouraged.

SEMANTICS OF RMIs:
RMIs make a number of guarantees. First, RMI requests always maintain order, i.e., a newer request may not overtake and execute before an older request, unless explicitly specified (e.g., the unordered primitives). However, there is no guarantee of fairness between locations. For example, although locations 0 and 1 may simultaneously issue requests to location 2, location 2 may receive all of location 1's requests before receiving any of location 0's requests.

Second, remotely invoked methods execute atomically, i.e., they will not be interrupted by other incoming requests or local operations. The only exception is if the remotely invoked method explicitly uses any of the primitives. In this case, all operations before the usage are atomic, as well as all operations after, until either the end of the method or the next RMI operation.

RMI also has a few semantic differences from traditional C++ method invocation. First, the arguments to RMI are pass-by-value, regardless of type (e.g., pointers and references), i.e., the calling location will not see any modifications made to the arguments. Likewise, the receiving location will not see any modifications made to a return value of an RMI. References and pointers are not allowed as return values.
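The pass-by-value rule can be mimicked in plain C++ with a small stand-in. This is a sketch under stated assumptions: `toy_rmi` and `run_pass_by_value_example` are hypothetical names, and the call runs locally, so it only models the copy that a real RMI makes when shipping an argument.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an RMI, not STAPL code: the argument is copied before the
// target function runs, as it would be when shipped to another location.
template<typename F, typename T>
void toy_rmi(F f, T const& arg)
{
  T copy(arg);  // the callee only ever sees this copy
  f(copy);
}

std::size_t run_pass_by_value_example()
{
  std::vector<int> v{1, 2, 3};

  // The callee takes a reference and modifies its argument...
  toy_rmi([](std::vector<int>& x) { x.push_back(4); }, v);

  // ...but the caller's object is unchanged.
  return v.size();
}
```

Even though the callee's parameter is a reference, the caller's vector still holds three elements after the call.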

Second, although remotely invoked methods may use and modify the supplied arguments freely, they should not store pointers or references to the arguments after the invocation completes. This allows the runtime to reuse buffers, instead of continuously allocating space. Also, since arguments may exist within RMI-maintained buffers, remotely invoked methods should not try to delete/free the object, or perform a realloc().

In many cases, especially when using aggregation settings greater than 1, starting a request does not imply that it has been transferred to or executed by the destination location. A request goes through three stages: creation, issue, and execution. When an asynchronous call returns, only the creation stage is guaranteed to have completed; this stage gathers and stores enough information to ensure that the request can subsequently execute as expected. Once the aggregation settings are met, a group of requests is issued to the destination location, performing the necessary data transfer.
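The creation, issue and execution stages can be pictured with a single-process toy model. It is a sketch, not the runtime's implementation: `toy_channel` and `run_aggregation_example` are hypothetical names, and issue and execution are collapsed into one step because there is no real network transfer.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy model, not STAPL code: requests are created into a buffer and issued
// as a group once `aggregation` requests have accumulated.
class toy_channel
{
public:
  explicit toy_channel(std::size_t aggregation) : m_aggregation(aggregation) {}

  // Creation stage: returns as soon as the request is stored.
  void create(std::function<void()> request)
  {
    m_pending.push_back(std::move(request));
    if (m_pending.size() >= m_aggregation)
      flush();
  }

  // Issue stage followed by execution, preserving creation order.
  void flush()
  {
    std::vector<std::function<void()>> batch;
    batch.swap(m_pending);
    for (auto& request : batch)
      request();
  }

  std::size_t pending() const { return m_pending.size(); }

private:
  std::size_t m_aggregation;
  std::vector<std::function<void()>> m_pending;
};

int run_aggregation_example()
{
  int executed = 0;
  toy_channel ch(3);

  ch.create([&executed] { ++executed; });
  ch.create([&executed] { ++executed; });
  assert(executed == 0 && ch.pending() == 2);  // created, not yet issued

  ch.create([&executed] { ++executed; });      // third request triggers issue
  assert(executed == 3 && ch.pending() == 0);
  return executed;
}
```

With an aggregation of 3, the first two calls to `create` return without any request having run, which mirrors the point that returning from an asynchronous call says nothing about execution.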

OPTIMAL USAGE:
As in traditional C++ method invocation, the style of passing arguments can have a significant impact on performance. Most arguments are passed more quickly by const reference, since no intermediate copies are necessary. Although RMIs require one copy from the calling to the receiving location to preserve pass-by-value semantics, all other copies are eliminated. It is almost always quickest to pass an object of type T as a T const& if it will not be mutated and sizeof(T)>sizeof(T*).
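The cost difference between the two parameter styles can be observed in plain C++ by counting copy constructions. The sketch below uses ordinary local calls only; `payload`, `by_value`, `by_const_ref` and the two counting helpers are hypothetical names, and the copy that an RMI makes between locations is not modeled.

```cpp
#include <cassert>

// Hypothetical type that records how often it is copy-constructed.
struct payload
{
  static int copies;     // number of copy constructions observed
  double data[16] = {};  // larger than a pointer, so copies are not free

  payload() = default;
  payload(payload const&) { ++copies; }
};

int payload::copies = 0;

double by_value(payload p)            { return p.data[0]; }
double by_const_ref(payload const& p) { return p.data[0]; }

int copies_made_by_value()
{
  payload p;
  payload::copies = 0;
  by_value(p);            // p is an lvalue, so the parameter is copied
  return payload::copies;
}

int copies_made_by_const_ref()
{
  payload p;
  payload::copies = 0;
  by_const_ref(p);        // binds a reference, no copy
  return payload::copies;
}
```

Passing by value costs one extra copy per call here, while the const reference costs none; that extra copy is the kind of intermediate copy the text recommends avoiding.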

Warning: Some compilers have problems with function template argument deduction. If your compiler issues such an error, it may have several causes: multiple member functions of the class matching the member function's name, arguments that require implicit conversion before they match the member function's expected parameters, etc. A simple solution is to specify the member function more explicitly:
Rtn (Class::*pmf)(Args...) = &Class::f;
async_rmi(..., ..., pmf, ...);
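The overload case can be reproduced without the runtime. In the sketch below, `call_member` is a hypothetical stand-in that simply invokes the member function locally; it is not an ARMI primitive, and `counter` and `run_overload_example` are likewise illustrative names.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for an RMI primitive, not part of STAPL: it invokes
// the member function locally and exists only to show the deduction issue.
template<typename Class, typename Rtn, typename... Params, typename... Args>
Rtn call_member(Class& obj, Rtn (Class::*pmf)(Params...), Args&&... args)
{
  return (obj.*pmf)(std::forward<Args>(args)...);
}

struct counter
{
  int value = 0;

  // Two overloads share the name `add`, so `&counter::add` alone is ambiguous.
  int add(int x)        { value += x; return value; }
  int add(int x, int y) { value += x + y; return value; }
};

int run_overload_example()
{
  counter c;

  // call_member(c, &counter::add, 5);  // error: overload cannot be deduced

  // Naming the exact signature selects one overload before deduction happens.
  int (counter::*pmf)(int) = &counter::add;
  return call_member(c, pmf, 5);
}
```

The commented-out call fails because both overloads match the parameter pattern, so neither `Rtn` nor `Params` can be deduced; the typed pointer removes the ambiguity.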

Function Documentation

◆ external_call()

template<typename F , typename... T>

runtime::external_caller< typename std::result_of<F(T...)>::type >::result_type stapl::external_call ( F && f, T &&... t )

Calls an external library function.

Template Parameters

F	Function type.
T	Argument types.

This function is useful for calling functions that are not STAPL-aware or thread-safe, such as MPI-based libraries. It calls f from only one location per process.

It is the user's responsibility to call external_call() in a gang in which f can be called correctly. Most of the time, external_call() should be called in stapl_main().

Warning: Calling any runtime primitive inside f is undefined behavior.

Parameters

f	External library function to be called.
t	Arguments to pass to the function.

Returns: Let R be the result type of f(t...). If R is not void, the result of f(t...) is returned in a boost::optional<R>, which has a value in all locations in which f has been called. If R is void, it returns true in all locations in which f has been called and false otherwise.

◆ initialize() [1/2]

void stapl::initialize ( option opts = option{} )

Initializes the STAPL Runtime System.

Warning: This is an SPMD function.

Parameters

opts	Options to pass for initialization.

◆ initialize() [2/2]

void stapl::initialize ( int & argc, char **& argv )

Initializes the STAPL Runtime System.

Warning: This is an SPMD function.

Parameters

argc	Number of arguments from `main()`.
argv	Argument vector from `main()`.

◆ finalize()

void stapl::finalize ( void )

Finalizes the STAPL Runtime System.

Warning: This is an SPMD function.

◆ is_initialized()

bool stapl::is_initialized ( void ) noexcept

Returns true if the STAPL Runtime System is initialized.

Warning: This is an SPMD function.

◆ get_available_levels()

unsigned int stapl::get_available_levels ( void ) noexcept

Returns the available parallelism levels.

This is based on the environment variable STAPL_PROC_HIERARCHY, which accepts a comma-separated list of values describing the shared-memory hierarchy.

Each time execute() is called, one or more levels are consumed.
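As an illustration of the comma-separated format only, the following sketch splits such a value into one number per entry. `parse_hierarchy` is a hypothetical helper, not part of STAPL, and how the runtime actually interprets each entry is not specified here.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of STAPL: splits a comma-separated list such
// as "2,4" into one unsigned value per entry. Empty entries are skipped.
std::vector<unsigned int> parse_hierarchy(std::string const& value)
{
  std::vector<unsigned int> entries;
  std::istringstream is(value);
  std::string item;
  while (std::getline(is, item, ',')) {
    if (!item.empty())
      entries.push_back(static_cast<unsigned int>(std::stoul(item)));
  }
  return entries;
}
```

The input would typically come from `std::getenv("STAPL_PROC_HIERARCHY")`, which returns a null pointer when the variable is unset, so it should be checked before constructing a `std::string`.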

◆ execute()

void stapl::execute ( std::function< void(void)> f, unsigned int n = 1 )

Executes the given function in a newly created environment.

This function will consume n levels of the machine hierarchy.

Parameters

f	Function to be executed.
n	Parallelism levels that will be consumed.

See also: get_available_levels()

◆ rmi_poll()

void stapl::rmi_poll ( void )

Causes the calling location to check for and process all available requests. If none are available it returns immediately.

The main purpose of rmi_poll() is to improve timeliness of request processing for a location that does not perform much communication, in support of a location that does.

Warning: User code should never call this function.

◆ block_until()

template<typename Predicate >

void stapl::block_until ( Predicate && pred )

Causes the calling location to block until the given predicate returns true.

While the predicate returns false, requests may be executed.
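The interplay between the predicate and request execution can be sketched with a single-threaded toy model. This is not how the runtime implements block_until(): `toy_queue`, `toy_block_until` and `run_block_until_example` are hypothetical names, and a plain queue stands in for the requests arriving at one location.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Toy model, not STAPL code: the queue holds requests waiting at a location.
using toy_queue = std::deque<std::function<void()>>;

template<typename Predicate>
void toy_block_until(toy_queue& incoming, Predicate&& pred)
{
  // While the predicate is false, keep executing incoming requests.
  while (!pred()) {
    // A real runtime would wait for more requests; the toy requires that
    // enough of them are already queued.
    assert(!incoming.empty());
    std::function<void()> request = std::move(incoming.front());
    incoming.pop_front();
    request();
  }
}

int run_block_until_example()
{
  int received = 0;
  toy_queue incoming;
  for (int i = 0; i < 5; ++i)
    incoming.push_back([&received] { ++received; });

  // Returns once three requests have run; the other two stay queued.
  toy_block_until(incoming, [&received] { return received >= 3; });

  assert(incoming.size() == 2);
  return received;
}
```

The predicate is re-evaluated after every executed request, so the call returns as soon as the condition holds and leaves later requests untouched.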

◆ stapl_main()

stapl::exit_code stapl_main ( int argc, char * argv[] )

The starting point for SPMD user code execution.

It replaces the sequential equivalent:

int main(int argc, char* argv[])

Variable Documentation

◆ invalid_location_id

constexpr unsigned int stapl::invalid_location_id

Initial value:

std::numeric_limits<unsigned int>::max()

Invalid location id.
