What is OCL-MLA?

OCL-MLA is exactly what its name implies: a mid-level set of abstractions to make OpenCL development easier. OCL-MLA provides a set of compile-time configurable logical devices that are mapped to actual node-level device resources. This removes the normal boiler-plate configuration that many people find intimidating and tedious. Logical devices are pre-configured (think MPI_COMM_WORLD communicator) and initialized with a single call to ocl_init(). OCL-MLA insulates the application developer from differences in particular compute devices accessed by the OpenCL runtime, while still allowing an expert OpenCL administrator to choose how each physical device is configured and used. Additionally, OCL-MLA provides a convenience hash-table interface for creating and accessing OpenCL constructs such as kernels, programs and buffers. OCL-MLA supports C and Fortran APIs.
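To make the idea of compile-time logical devices concrete, here is a minimal sketch in plain C. It is not OCL-MLA's implementation: the enum, the struct, the PERFORMANCE_*/AUXILIARY_* macros, and resolve_device() are all hypothetical names chosen for illustration, and the platform/device indices are placeholder values.

```c
#include <assert.h>

// Hypothetical logical device identifiers (illustrative, not OCL-MLA names).
enum logical_device {
   LOGICAL_DEFAULT,
   LOGICAL_PERFORMANCE,
   LOGICAL_AUXILIARY,
   LOGICAL_COUNT // number of logical devices
};

// A physical device addressed by platform index and device index.
struct physical_device {
   unsigned platform;
   unsigned device;
};

// Compile-time configuration: each value can be overridden on the
// compiler command line, e.g. -DPERFORMANCE_DEVICE=2.
#ifndef PERFORMANCE_PLATFORM
#define PERFORMANCE_PLATFORM 0
#endif
#ifndef PERFORMANCE_DEVICE
#define PERFORMANCE_DEVICE 1
#endif
#ifndef AUXILIARY_PLATFORM
#define AUXILIARY_PLATFORM 0
#endif
#ifndef AUXILIARY_DEVICE
#define AUXILIARY_DEVICE 0
#endif

// Table mapping each logical device to a physical one, fixed at build time.
static const struct physical_device logical_device_map[LOGICAL_COUNT] = {
   [LOGICAL_DEFAULT]     = { 0, 0 },
   [LOGICAL_PERFORMANCE] = { PERFORMANCE_PLATFORM, PERFORMANCE_DEVICE },
   [LOGICAL_AUXILIARY]   = { AUXILIARY_PLATFORM, AUXILIARY_DEVICE }
};

// Look up the physical device behind a logical identifier.
struct physical_device resolve_device(enum logical_device id) {
   assert((unsigned)id < (unsigned)LOGICAL_COUNT);
   return logical_device_map[id];
}
```

In a scheme like this, the application only ever names a logical device, and an administrator retargets it by rebuilding with different macro values.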

Features

Compile-time logical device configuration
Hash interface for creating and managing OpenCL tokens
Fortran bindings
Timer utilities
Support for multiple OpenCL platforms in a single configuration (via the ICD, the installable client driver)
Convenience functions for event manipulation
Utilities for program manipulation, e.g., static compilation of input kernel source code
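The hash interface listed above lets an application refer to OpenCL objects by name. As a rough sketch of that idea only, the following plain C registry maps string names to opaque handles. It is an assumption-laden illustration, not OCL-MLA's data structure: struct registry, registry_add(), registry_get(), and the fixed slot count are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REGISTRY_SLOTS 64 // fixed capacity for this sketch

// One slot: a name and the opaque handle registered under it.
struct registry_entry {
   const char * name; // not copied; caller keeps the string alive
   void * handle;
};

// Zero-initialize before use, e.g. struct registry r = {0};
struct registry {
   struct registry_entry slots[REGISTRY_SLOTS];
};

// 32-bit FNV-1a hash of a NUL-terminated string.
static uint32_t hash_name(const char * s) {
   uint32_t h = 2166136261u;
   while(*s) {
      h ^= (unsigned char)*s++;
      h *= 16777619u;
   } // while
   return h;
}

// Register a handle under a name using linear probing.
// Returns 0 on success, -1 if the name exists or the table is full.
int registry_add(struct registry * r, const char * name, void * handle) {
   assert(r != NULL && name != NULL);
   size_t start = hash_name(name) % REGISTRY_SLOTS;
   for(size_t i=0; i<REGISTRY_SLOTS; ++i) {
      struct registry_entry * e = &r->slots[(start + i) % REGISTRY_SLOTS];
      if(e->name == NULL) {
         e->name = name;
         e->handle = handle;
         return 0;
      } // if
      if(strcmp(e->name, name) == 0) {
         return -1;
      } // if
   } // for
   return -1;
}

// Look up a handle by name; returns NULL if the name is not registered.
void * registry_get(const struct registry * r, const char * name) {
   assert(r != NULL && name != NULL);
   size_t start = hash_name(name) % REGISTRY_SLOTS;
   for(size_t i=0; i<REGISTRY_SLOTS; ++i) {
      const struct registry_entry * e = &r->slots[(start + i) % REGISTRY_SLOTS];
      if(e->name == NULL) {
         return NULL; // empty slot ends the probe sequence
      } // if
      if(strcmp(e->name, name) == 0) {
         return e->handle;
      } // if
   } // for
   return NULL;
}
```

Because the sketch never deletes entries, stopping a lookup at the first empty slot is safe. A real interface would also need removal and ownership rules for the stored names.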

Example

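#include <stdio.h>   // fprintf
#include <stdlib.h>  // free
// plus the OCL-MLA header provided by your installation

const size_t ELEMENTS = 32;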

int main(int argc, char ** argv) {
   size_t global_size = ELEMENTS;

   // initialize OpenCL runtime
   ocl_init();

   // create a host-side array
   float h_array[ELEMENTS];

   // initialize host-side array
   for(size_t i=0; i<ELEMENTS; ++i) {
      h_array[i] = 0.0;
   } // for

   // create a device-side array
   ocl_create_buffer(OCL_PERFORMANCE_DEVICE, "array", ELEMENTS*sizeof(float),
      CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, h_array);

   // create program source from static input string
   char * source = NULL;
   ocl_add_from_string(test_PPSTR, &source, 0);

   // add program
   ocl_add_program(OCL_PERFORMANCE_DEVICE, "program", source, "-DMY_DEFINE");
   free(source);

   // add kernel
   ocl_add_kernel(OCL_PERFORMANCE_DEVICE, "program", "test", "my test");

   // use hints interface to decide what work-group size to use
   ocl_kernel_hints_t hints;
   size_t local_size;
   size_t work_group_indices;
   size_t single_indices;

   // get kernel hints (same device the kernel was added to)
   ocl_kernel_hints(OCL_PERFORMANCE_DEVICE, "program", "my test", &hints);

   // heuristic for how to execute global_size work-items
   ocl_ndrange_hints(global_size, hints.max_work_group_size,
      0.5, 0.5, &local_size, &work_group_indices, &single_indices);

   // set kernel argument
   ocl_set_kernel_arg_buffer("program", "my test", "array", 0);

   // initialize event for timings
   // (ocl_event_t is an assumed type name; confirm against the OCL-MLA headers)
   ocl_event_t event;
   ocl_initialize_event(&event);

   // invoke kernel, starting at global work-item index 0
   size_t global_offset = 0;
   ocl_enqueue_kernel_ndrange(OCL_PERFORMANCE_DEVICE, "program",
      "my test", 1, &global_offset, &global_size, &local_size, &event);

   // block for kernel completion
   ocl_finish(OCL_PERFORMANCE_DEVICE);

   // add a timer event for the kernel invocation
   ocl_add_timer("kernel", &event);

   // read data from device, starting at the beginning of the buffer
   size_t offset = 0;
   ocl_enqueue_read_buffer(OCL_PERFORMANCE_DEVICE, "array", 1, offset,
      ELEMENTS*sizeof(float), h_array, &event);

   // print data read from device
   for(size_t i=0; i<ELEMENTS; ++i) {
      fprintf(stderr, "%f\n", h_array[i]);
   } // for
   fprintf(stderr, "\n");

   // print timer results
   ocl_report_timer("kernel");

   // finalize OpenCL runtime
   ocl_finalize();
}
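
The example relies on ocl_ndrange_hints() to choose a local size and to report how many indices fall into full work-groups and how many are left over. The sketch below shows one plausible way such a split could be computed. It is a guess for illustration, not the library's heuristic: split_ndrange() and struct ndrange_split are hypothetical, and the two 0.5 arguments from the example are not modeled because their meaning is not described here.

```c
#include <assert.h>
#include <stddef.h>

// Result of splitting a 1-D index range (hypothetical type).
struct ndrange_split {
   size_t local_size;          // chosen work-group size
   size_t work_group_indices;  // indices covered by full work-groups
   size_t single_indices;      // leftover indices, handled individually
};

// Split global_size indices into full work-groups plus a remainder,
// using the largest work-group size the kernel allows.
struct ndrange_split split_ndrange(size_t global_size,
   size_t max_work_group_size) {
   assert(max_work_group_size > 0);

   struct ndrange_split s = { 0, 0, 0 };

   // never pick a work-group larger than the range itself
   s.local_size = global_size < max_work_group_size ?
      global_size : max_work_group_size;

   if(s.local_size == 0) {
      return s; // empty range: nothing to execute
   } // if

   // round down to a whole number of work-groups
   s.work_group_indices = (global_size / s.local_size) * s.local_size;
   s.single_indices = global_size - s.work_group_indices;

   return s;
}
```

Under this scheme, 1000 work-items with a maximum work-group size of 256 would yield three full groups (768 indices) and 232 leftover indices.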