OpenCL Mid-Level Abstractions Project
This project is maintained by tuxfan
OCL-MLA is what its name implies: a mid-level set of abstractions that makes OpenCL development easier. OCL-MLA provides a set of compile-time configurable logical devices that are mapped to actual node-level device resources. This removes the usual boilerplate configuration that many people find intimidating and tedious. Logical devices are pre-configured (much like MPI's MPI_COMM_WORLD communicator) and are initialized with a single call to ocl_init(). OCL-MLA insulates the application developer from differences between the particular compute devices accessed through the OpenCL runtime, while still allowing an expert OpenCL administrator to choose how each physical device is configured and used. Additionally, OCL-MLA provides a convenience hash-table interface for creating and accessing OpenCL constructs such as kernels, programs, and buffers by name. OCL-MLA supports C and Fortran APIs.
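To make the idea of a name-keyed convenience interface concrete, here is a minimal sketch of a string-to-handle table in plain C. It is not OCL-MLA's implementation: the names registry_t, registry_put, and registry_get are hypothetical, the table has a fixed capacity, and the stored handle is an opaque pointer standing in for an OpenCL object such as a buffer or kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed capacity; must be a power of two for the mask below. */
#define REGISTRY_SLOTS 64

/* One slot: a name and an opaque handle (stand-in for a buffer, kernel, ...). */
typedef struct {
    const char *name;   /* NULL marks an empty slot */
    void *handle;
} registry_entry_t;

typedef struct {
    registry_entry_t slots[REGISTRY_SLOTS];
} registry_t;

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t registry_hash(const char *s) {
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Insert or overwrite an entry using linear probing.
 * The name pointer is stored, not copied, so it must outlive the table
 * (string literals are fine). Returns 0 on success, -1 if the table is full. */
static int registry_put(registry_t *r, const char *name, void *handle) {
    assert(r != NULL && name != NULL);
    size_t start = registry_hash(name) & (REGISTRY_SLOTS - 1);
    for (size_t i = 0; i < REGISTRY_SLOTS; ++i) {
        registry_entry_t *e = &r->slots[(start + i) & (REGISTRY_SLOTS - 1)];
        if (e->name == NULL || strcmp(e->name, name) == 0) {
            e->name = name;
            e->handle = handle;
            return 0;
        }
    }
    return -1;
}

/* Look up a handle by name; returns NULL if the name is not registered. */
static void *registry_get(const registry_t *r, const char *name) {
    assert(r != NULL && name != NULL);
    size_t start = registry_hash(name) & (REGISTRY_SLOTS - 1);
    for (size_t i = 0; i < REGISTRY_SLOTS; ++i) {
        const registry_entry_t *e = &r->slots[(start + i) & (REGISTRY_SLOTS - 1)];
        if (e->name == NULL) {
            return NULL;            /* an empty slot ends the probe sequence */
        }
        if (strcmp(e->name, name) == 0) {
            return e->handle;
        }
    }
    return NULL;
}
```

A zero-initialized registry_t is an empty table. The complete example that follows uses the same idea through the library itself: the buffer, program, and kernel are created once under the names "array", "program", and "my test" and are referred to by those names afterwards.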
#include <stdio.h>   // fprintf
#include <stdlib.h>  // free
// The OCL-MLA header and the header that defines test_PPSTR must also be
// included; they are not shown here.

const size_t ELEMENTS = 32;

int main(int argc, char ** argv) {
    size_t global_offset = 0;
    size_t global_size = ELEMENTS;
    size_t local_size = 1;
    size_t offset = 0;  // offset into the device buffer for the read
    ocl_event_t event;  // type name assumed; check the OCL-MLA headers

    // initialize OpenCL runtime
    ocl_init();

    // create a host-side array
    float h_array[ELEMENTS];

    // initialize host-side array
    for(size_t i=0; i<ELEMENTS; ++i) {
        h_array[i] = 0.0f;
    } // for

    // create a device-side array named "array", copied from h_array
    ocl_create_buffer(OCL_PERFORMANCE_DEVICE, "array", ELEMENTS*sizeof(float),
        CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, h_array);

    // create program source from static input string
    char * source = NULL;
    ocl_add_from_string(test_PPSTR, &source, 0);

    // add program
    ocl_add_program(OCL_PERFORMANCE_DEVICE, "program", source, "-DMY_DEFINE");
    free(source);

    // add kernel "test" from "program" under the name "my test"
    ocl_add_kernel(OCL_PERFORMANCE_DEVICE, "program", "test", "my test");

    // use hints interface to decide what work-group size to use
    ocl_kernel_hints_t hints;
    size_t work_group_indeces;
    size_t single_indeces;

    // get kernel hints from the device the kernel was added to
    ocl_kernel_hints(OCL_PERFORMANCE_DEVICE, "program", "my test", &hints);

    // heuristic for how to execute global_size work-items
    ocl_ndrange_hints(global_size, hints.max_work_group_size,
        0.5, 0.5, &local_size, &work_group_indeces, &single_indeces);

    // set kernel argument 0 to the buffer named "array"
    ocl_set_kernel_arg_buffer("program", "my test", "array", 0);

    // initialize event for timings
    ocl_initialize_event(&event);

    // invoke kernel
    ocl_enqueue_kernel_ndrange(OCL_PERFORMANCE_DEVICE, "program",
        "my test", 1, &global_offset, &global_size, &local_size, &event);

    // block for kernel completion
    ocl_finish(OCL_PERFORMANCE_DEVICE);

    // add a timer event for the kernel invocation
    ocl_add_timer("kernel", &event);

    // read data from device
    ocl_enqueue_read_buffer(OCL_PERFORMANCE_DEVICE, "array", 1, offset,
        ELEMENTS*sizeof(float), h_array, &event);

    // print data read from device
    for(size_t i=0; i<ELEMENTS; ++i) {
        fprintf(stderr, "%f\n", h_array[i]);
    } // for
    fprintf(stderr, "\n");

    // print timer results
    ocl_report_timer("kernel");

    // finalize OpenCL runtime
    ocl_finalize();

    return 0;
}
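The example asks ocl_ndrange_hints() for a work-group size and for two index counts, work_group_indeces and single_indeces. The library's actual heuristic, including the meaning of the two 0.5 arguments, is not documented here, so the following is only a sketch of the general idea under stated assumptions: the hypothetical helper split_ndrange() picks the largest allowed work-group size and splits the global range into a part covered by full work-groups and a leftover part. A split of this kind matters because OpenCL 1.x requires the global work size passed to clEnqueueNDRangeKernel to be evenly divisible by the local work size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of OCL-MLA.
 * Splits global_size work-items into:
 *   *grouped - items covered by full work-groups of *local_size items
 *   *single  - leftover items that do not fill a whole work-group
 * Returns 0 on success, -1 if either size is zero. */
static int split_ndrange(size_t global_size, size_t max_work_group_size,
                         size_t *local_size, size_t *grouped, size_t *single) {
    assert(local_size != NULL && grouped != NULL && single != NULL);
    if (global_size == 0 || max_work_group_size == 0) {
        return -1;
    }

    /* Use the largest work-group the kernel allows, capped at the range. */
    *local_size = (global_size < max_work_group_size)
                      ? global_size
                      : max_work_group_size;

    /* Round down to a whole number of work-groups. */
    *grouped = (global_size / *local_size) * *local_size;

    /* Whatever is left has to be handled separately. */
    *single = global_size - *grouped;
    return 0;
}
```

With 1000 work-items and a maximum work-group size of 256, this sketch yields a local size of 256, 768 grouped items (three full work-groups), and 232 leftover items. With the 32 elements used in the example and any limit of 32 or more, the whole range fits in one work-group and nothing is left over.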