STAPL API Reference
Parallelism, communication and synchronization support.
Classes | |
class | stapl::gang |
Creates a new gang by partitioning the existing one from which the gang construction is invoked. | |
Modules | |
ARMI Tags | |
ARMI primitives tags. | |
ARMI Type traits | |
Type traits related to ARMI. | |
Distributed objects | |
Distributed object creation, registration, retrieval and destruction. | |
ARMI One-sided primitives | |
One-sided (point-to-point or point-to-many) communication primitives. | |
ARMI Collective primitives | |
Collective communication primitives. | |
ARMI Synchronization primitives | |
Synchronization primitives. | |
ARMI Unordered primitives | |
Communication primitives with relaxed consistency. | |
Request aggregation control | |
Primitives that control aggregation. | |
ARMI Utilities | |
Utility classes and variables. | |
Typedefs | |
typedef int | stapl::process_id |
Process id type. | |
typedef std::uint8_t | stapl::level_type |
Level type. | |
Functions | |
void | stapl::abort (std::string const &) |
Displays the given std::string and aborts execution. | |
void | stapl::abort (void) |
Aborts execution. | |
template<typename T > | |
void | stapl::abort (T const &t) |
Outputs the given object to std::cerr as a string and aborts execution. | |
std::set< unsigned int > | stapl::external_callers (void) |
Returns the location ids that are going to make the external call. | |
template<typename F , typename... T> | |
runtime::external_caller< typename std::result_of< F(T...)>::type >::result_type | stapl::external_call (F &&f, T &&... t) |
Calls an external library function. |
void | stapl::initialize (option opts=option{}) |
Initializes the STAPL Runtime System. |
void | stapl::initialize (int &argc, char **&argv) |
Initializes the STAPL Runtime System. |
void | stapl::finalize (void) |
Finalizes the STAPL Runtime System. |
bool | stapl::is_initialized (void) noexcept |
Returns true if the STAPL Runtime System is initialized. |
std::vector< unsigned int > const & | stapl::get_hierarchy_widths (void) noexcept |
Returns the widths of all hierarchy levels. | |
unsigned int | stapl::get_available_levels (void) noexcept |
Returns the available parallelism levels. |
void | stapl::execute (std::function< void(void)> f, unsigned int n=1) |
Executes the given function in a newly created environment. |
process_id | stapl::get_process_id (void) noexcept |
Returns the current process id. | |
process_id | stapl::get_num_processes (void) noexcept |
Returns the number of processes. | |
unsigned int | stapl::get_location_id (void) noexcept |
Returns the current location id. | |
unsigned int | stapl::get_num_locations (void) noexcept |
Returns the number of locations in the current gang. | |
std::pair< unsigned int, unsigned int > | stapl::get_location_info (void) noexcept |
Returns the current location information consisting of the location id and the number of locations in the gang. | |
void | stapl::rmi_poll (void) |
Causes the calling location to check for and process all available requests. If none are available it returns immediately. |
template<typename Predicate > | |
void | stapl::block_until (Predicate &&pred) |
Causes the calling location to block until the given predicate returns true. |
stapl::exit_code | stapl_main (int argc, char *argv[]) |
The starting point for SPMD user code execution. |
affinity_tag | get_affinity (void) noexcept |
Returns the affinity of the current processing element. | |
Variables | |
constexpr process_id | stapl::invalid_process_id = -1 |
Invalid process id. | |
constexpr unsigned int | stapl::invalid_location_id |
Invalid location id. |
Parallelism, communication and synchronization support.
ARMI (Adaptive Remote Method Invocation) primitives are designed to abstract the creation, registration, communication and synchronization of parallelism in a STAPL program, allowing for performance and portability on different systems.
The unit of parallel execution is called a location. Contrary to the concept of shared-memory threads, locations may or may not live in the same address space. As such, it is undefined behavior to try to share writeable global variables, references and pointers, including static class members, between locations.
Upon program startup, all locations begin SPMD execution in parallel. There are no purely sequential regions. The starting point for execution is stapl_main(int argc, char *argv[]), which replaces the sequential standard main().
The primitives provide shared-object parallelism through distributed objects named p_objects. Locations communicate with each other using Remote Method Invocations (RMI) on distributed objects. As such, each location in which a distributed object has been constructed has a local part of the distributed object.
Distributed objects are identified by a handle, and their local objects are identified by that handle (stapl::rmi_handle) and a location id. As such, all objects that are communication targets must have a handle and register with it. This handle allows for proper address translation between locations.
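The role of the handle in address translation can be illustrated with a toy model. This is a minimal sketch in plain C++, not STAPL code: handle_id and location_registry are hypothetical names standing in for stapl::rmi_handle and the runtime's internal bookkeeping, whose actual interfaces are not shown here.

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for a handle; the real type is stapl::rmi_handle.
using handle_id = std::size_t;

// One registry per location (toy model). It translates a handle into the
// address of the local part of a distributed object on that location.
class location_registry {
public:
  // Registers the local object under a handle shared by all locations.
  void register_object(handle_id h, void* local_object) {
    table_[h] = local_object;
  }

  // Removes the registration, e.g. before the local object is destroyed.
  void unregister_object(handle_id h) { table_.erase(h); }

  // Returns the local address for the handle, or nullptr if it is unknown.
  void* translate(handle_id h) const {
    auto it = table_.find(h);
    return it == table_.end() ? nullptr : it->second;
  }

private:
  std::unordered_map<handle_id, void*> table_;
};
```

In this model the same handle resolves to a different address in each registry, which is why a raw pointer is meaningless outside the location that produced it while a handle plus a location id is not.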
Since each location owns a local portion of the distributed object, it is not necessary for a location to use RMI to access its local object. However, it is still valid to use RMI on the local objects. It is up to the distributed object implementation to keep track of which portions are local and which are remote.
Some communication primitives are collective, meaning all locations must call the function before it can complete. Collective calls typically need to perform complicated communication patterns among all locations, such as reductions and broadcasts. The rest of the communication calls are point-to-point or one-sided collective operations, and hence need to be called by only one location.
Point-to-point calls cannot be used to explicitly synchronize specific locations. Collective calls imply synchronization if they return a value.
Any RMI call may be aggregated and/or combined for improved performance, by decreasing the amount of network congestion that can happen due to many small messages. See stapl::set_aggregation() for more details.
To ensure portability, only these primitives should be used to express parallelism and synchronization within a STAPL program. The actual implementation varies (OpenMP, pthreads, MPI, etc.). Even if it is known that the primitives have been implemented a certain way for a certain system, using calls outside this specification (e.g., MPI calls) is non-portable and highly discouraged.
SEMANTICS OF RMIs:
RMIs make a number of guarantees. First, RMI requests always maintain order, i.e., a newer request may not overtake and execute before an older request, unless explicitly specified (e.g., the unordered primitives). However, there is no guarantee of fairness between locations. For example, although locations 0 and 1 may simultaneously issue requests to location 2, location 2 may receive all of location 1's requests before receiving any of location 0's requests.
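The ordering guarantee and the lack of fairness can be modeled with one FIFO queue per source location. The sketch below is a toy in plain C++, not STAPL code; the class inbox and its members are hypothetical names chosen for illustration.

```cpp
#include <deque>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model of the requests arriving at one destination location.
class inbox {
public:
  // Requests from a given source are appended in the order they were issued.
  void receive(unsigned int source, std::string request) {
    per_source_[source].push_back(std::move(request));
  }

  // Executes every pending request of one source. The destination may pick
  // sources in any order (no fairness), but it never reorders the requests
  // that came from the same source.
  std::vector<std::string> drain(unsigned int source) {
    std::vector<std::string> executed;
    auto& queue = per_source_[source];
    while (!queue.empty()) {
      executed.push_back(std::move(queue.front()));
      queue.pop_front();
    }
    return executed;
  }

private:
  std::map<unsigned int, std::deque<std::string>> per_source_;
};
```

Draining source 1 completely before source 0 is a legal schedule in this model, matching the example in which location 2 sees all of location 1's requests first.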
Second, remotely invoked methods execute atomically, i.e., they will not be interrupted by other incoming requests or local operations. The only exception is if the remotely invoked method explicitly uses any of the primitives. In this case, all operations before the usage are atomic, as well as all operations after, until either the end of the method or the next RMI operation.
RMI also has a few semantic differences from traditional C++ method invocation. First, the arguments to RMI are pass-by-value, regardless of type (e.g., pointers and references), i.e., the calling location will not see any modifications made to the arguments. Likewise, the receiving location will not see any modifications made to a return value of an RMI. References and pointers are not allowed as return values.
Second, although remotely invoked methods may use and modify the supplied arguments freely, they should not store pointers or references to the arguments after the invocation completes. This allows the runtime to reuse buffers instead of continuously allocating space. Also, since arguments may reside in RMI-maintained buffers, remotely invoked methods should not try to delete or free the objects, or call realloc() on them.
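The pass-by-value rule can be seen in a small single-process analogue. This is a sketch only: toy_rmi and accumulator are hypothetical names, and the local copy stands in for the buffer the runtime would use to carry arguments to the receiving location.

```cpp
#include <vector>

// Toy stand-in for an RMI: the argument is copied, as if packed into a
// runtime buffer, and the invoked method only ever sees that copy.
template<typename Object, typename Arg>
void toy_rmi(Object& target, void (Object::*method)(Arg&), Arg const& arg) {
  Arg buffered = arg;          // copy made for the receiving side
  (target.*method)(buffered);  // the callee may modify its copy freely
}                              // buffer is released here, so the callee
                               // must not keep pointers into it

struct accumulator {
  int total = 0;

  // Sums the values and zeroes them; the zeroing affects only the copy.
  void add_all(std::vector<int>& values) {
    for (int& v : values) {
      total += v;
      v = 0;
    }
  }
};
```

After toy_rmi(acc, &accumulator::add_all, data) the caller's data is unchanged even though add_all overwrote every element it received.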
In many cases, especially when using aggregation settings greater than 1, starting a request does not imply that it has been transferred to or executed by the destination location. A request passes through three stages: creation, issue, and execution. Only the creation stage, which gathers and stores enough information to ensure that the request can subsequently execute as expected, is guaranteed to have completed when an asynchronous call returns. Once the aggregation settings are met, a group of requests is issued to the destination location, performing the necessary data transfer.
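The three stages can be traced in a toy aggregator. This is a sketch under simplifying assumptions, not the runtime's implementation: toy_aggregator is a hypothetical class, the aggregation setting is modeled as a plain request count, and the destination is modeled as a second vector in the same process.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

class toy_aggregator {
public:
  explicit toy_aggregator(std::size_t aggregation)
    : aggregation_(aggregation) {}

  // Creation: record the request. It is issued only once enough requests
  // have accumulated to meet the aggregation setting.
  void create(std::function<void()> request) {
    pending_.push_back(std::move(request));
    if (pending_.size() >= aggregation_) flush();
  }

  // Issue: hand the whole group over to the destination.
  void flush() {
    for (auto& request : pending_) issued_.push_back(std::move(request));
    pending_.clear();
  }

  // Execution: the destination runs what was issued to it, in order.
  void execute_issued() {
    for (auto& request : issued_) request();
    issued_.clear();
  }

  std::size_t created_not_issued() const { return pending_.size(); }
  std::size_t issued_not_executed() const { return issued_.size(); }

private:
  std::size_t aggregation_;
  std::vector<std::function<void()>> pending_;
  std::vector<std::function<void()>> issued_;
};
```

With an aggregation of 3, the first two create() calls return while their requests are still only created; the third call triggers the issue of all three, and nothing runs until the destination executes them.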
OPTIMAL USAGE:
As in traditional C++ method invocation, the style of passing arguments can have a significant impact on performance. Most arguments are passed faster by const reference, since no intermediate copies are necessary. Although RMIs require one copy from the calling to the receiving location to preserve pass-by-value semantics, all other copies are eliminated. It is almost always fastest to pass an object of type T as a T const& if it will not be mutated and sizeof(T) > sizeof(T*).
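The cost difference between the two passing styles can be observed with a type that counts its own copies. This is ordinary C++ with no STAPL calls; tracked, tracked_copies and the two functions are hypothetical names used only for this illustration.

```cpp
#include <array>

// Global counter incremented by every copy construction of tracked.
int tracked_copies = 0;

// A type that is clearly larger than a pointer.
struct tracked {
  std::array<double, 64> payload{};

  tracked() = default;
  tracked(tracked const& other) : payload(other.payload) { ++tracked_copies; }
};

// Pass by value: calling this with an lvalue makes an intermediate copy.
double first_by_value(tracked t) { return t.payload[0]; }

// Pass by const reference: no copy is made at the call site.
double first_by_cref(tracked const& t) { return t.payload[0]; }
```

Calling first_by_value() with an existing object increments the counter once, while first_by_cref() leaves it untouched.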
template<typename F, typename... T>
runtime::external_caller< typename std::result_of< F(T...)>::type >::result_type stapl::external_call (F &&f, T &&... t)
Calls an external library function.
F | Function type. |
T | Argument types. |
This function is useful for calling functions that are not STAPL-aware or thread-safe, such as MPI-based libraries. It calls f from only one location per process.
It is the user's responsibility to call external_call() in a gang in which f can be called correctly. In most cases, external_call() should be called in stapl_main().
f is undefined behavior.
f | External library function to be called. |
t | Arguments to pass to the function. |
If R (the result type of f(t...)) is not void, the result of f(t...) is returned in a boost::optional<R> which has a value in all locations in which f has been called. If R is void, then it returns true in all locations in which f has been called, otherwise false.
void stapl::initialize (option opts=option{})
Initializes the STAPL Runtime System.
opts | Options to pass for initialization. |
void stapl::initialize (int &argc, char **&argv)
Initializes the STAPL Runtime System.
argc | Number of arguments from main() . |
argv | Argument vector from main() . |
void stapl::finalize (void)
Finalizes the STAPL Runtime System.
bool stapl::is_initialized (void) noexcept
Returns true if the STAPL Runtime System is initialized.
unsigned int stapl::get_available_levels (void) noexcept
Returns the available parallelism levels.
This is based on the environment variable STAPL_PROC_HIERARCHY, which accepts a comma-separated list of values describing the shared-memory hierarchy.
Each time execute() is called, one or more levels are consumed.
void stapl::execute (std::function< void(void)> f, unsigned int n = 1)
Executes the given function in a newly created environment.
This function will consume n levels of the machine hierarchy.
f | Function to be executed. |
n | Parallelism levels that will be consumed. |
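The bookkeeping of hierarchy levels can be pictured with a toy model. Everything here is an assumption made for illustration: parse_widths and toy_hierarchy are hypothetical names, the sketch assumes one level per value in the comma-separated list, and refusing to consume more levels than remain is a choice of this sketch rather than documented behavior of execute().

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Parses a comma-separated list such as "4,2" into per-level widths.
std::vector<unsigned int> parse_widths(std::string const& value) {
  std::vector<unsigned int> widths;
  std::istringstream in(value);
  std::string item;
  while (std::getline(in, item, ',')) {
    if (!item.empty())
      widths.push_back(static_cast<unsigned int>(std::stoul(item)));
  }
  return widths;
}

// Toy model: tracks how many hierarchy levels are still unconsumed.
class toy_hierarchy {
public:
  explicit toy_hierarchy(std::vector<unsigned int> widths)
    : widths_(std::move(widths)) {}

  std::size_t available_levels() const { return widths_.size() - consumed_; }

  // Models the effect of execute(f, n) on the level count only.
  // Returns false (an assumption of this sketch) if too few levels remain.
  bool consume(std::size_t n = 1) {
    if (n > available_levels()) return false;
    consumed_ += n;
    return true;
  }

private:
  std::vector<unsigned int> widths_;
  std::size_t consumed_ = 0;
};
```

Under these assumptions a list of three widths gives three available levels, and each consume(n) call reduces that number by n.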
void stapl::rmi_poll (void)
Causes the calling location to check for and process all available requests. If none are available it returns immediately.
The main purpose of rmi_poll() is to improve timeliness of request processing for a location that does not perform much communication, in support of a location that does.
template<typename Predicate>
void stapl::block_until (Predicate &&pred)
Causes the calling location to block until the given predicate returns true.
While the predicate returns false, requests may be executed.
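The relationship between polling and blocking on a predicate can be sketched with a single-threaded toy. This is not the runtime's implementation: toy_location and its members are hypothetical names, and incoming requests are modeled as a local queue of callables.

```cpp
#include <deque>
#include <functional>
#include <utility>

class toy_location {
public:
  // Models a request arriving from another location.
  void enqueue(std::function<void()> request) {
    incoming_.push_back(std::move(request));
  }

  // Analogue of rmi_poll(): runs every request that is currently queued
  // and returns immediately if there is none.
  void poll() {
    while (run_one()) {}
  }

  // Analogue of block_until(): keeps executing requests while pred() is
  // false. In this single-threaded toy it would spin forever if no queued
  // request ever makes pred() true.
  template<typename Predicate>
  void block_until(Predicate&& pred) {
    while (!pred()) run_one();
  }

private:
  // Executes the oldest queued request, if any.
  bool run_one() {
    if (incoming_.empty()) return false;
    auto request = std::move(incoming_.front());
    incoming_.pop_front();
    request();
    return true;
  }

  std::deque<std::function<void()>> incoming_;
};
```

In this model block_until() makes progress only because requests keep executing while the predicate is false, which mirrors the statement that requests may be executed during the wait.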
stapl::exit_code stapl_main (int argc, char *argv[])
The starting point for SPMD user code execution.
It replaces the sequential equivalent, int main(int argc, char *argv[]).
constexpr unsigned int stapl::invalid_location_id
Invalid location id.