Openblas backend implementation#722
Conversation
|
Thank you much for your contribution, @CDAC-SSDG !
|
|
Hi @CDAC-SSDG - could you please apply the changes from the clang-format check so it passes? Thanks! |
sknepper
left a comment
There was a problem hiding this comment.
@CDAC-SSDG - thank you very much for this valuable contribution! The changes look good to me.
@ndingle-arm - could you please check the latest changes and verify that all your questions have been answered?
| *******************************************************************************/ | ||
|
|
||
| #ifndef _NETLIB_COMMON_HPP_ | ||
| #define _NETLIB_COMMON_HPP_ |
There was a problem hiding this comment.
I think this should be OPENBLAS_COMMON_HPP , or else you are using the same guard as in src/blas/backends/netlib/netlib_common.hpp and therefore risk inadvertently excluding one of these two files.
| float* A = reinterpret_cast<float*>(acc.get_pointer()); | ||
| const float* alpha_ptr = reinterpret_cast<const float*>(&alpha); | ||
|
|
||
| CBLAS_TRANSPOSE t = (trans == transpose::nontrans) ? CblasNoTrans : CblasTrans; |
There was a problem hiding this comment.
This maps every non-notrans case to CblasTrans and does not differentiate between CblasTrans and CblasConjTrans like it should. Compare with complex omatcopy above, which does:
CBLAS_TRANSPOSE t;
if (trans == transpose::nontrans) {
t = CblasNoTrans;
}
else if (trans == transpose::trans) {
t = CblasTrans;
}
else {
t = CblasConjTrans;
}
| double* A = reinterpret_cast<double*>(acc.get_pointer()); | ||
| const double* alpha_ptr = reinterpret_cast<const double*>(&alpha); | ||
|
|
||
| CBLAS_TRANSPOSE t = (trans == transpose::nontrans) ? CblasNoTrans : CblasTrans; |
There was a problem hiding this comment.
Same comment as for cimatcopy -- it needs to consider conjugate-transposition as well.
| blasint incX = (blasint)incx; | ||
| blasint incY = (blasint)incy; | ||
|
|
||
| cblas_saxpby(N, alpha, X, incX, beta, Y, incY); |
There was a problem hiding this comment.
I am not sure why [cdsz]axpby calls the corresponding CBLAS routine directly, but [cdsz]axpy and all other functions here go via the queue, e.g. in axpy it does queue.submit([&](sycl::handler& cgh) {...} but here queue is not used at all (and in other axpby precisions too)?
| const float* B = B0 + i * stride_b; | ||
| float* C = C0 + i * stride_c; | ||
|
|
||
| cblas_sgemm(order, tA, tB, (blasint)m, (blasint)n, (blasint)k, alpha, A, (blasint)lda, B, |
There was a problem hiding this comment.
I would expect [cdsz]gemm_batch to go via queue.submit like the other BLAS calls, rather than calling the CBLAS function directly (this comment occurs in several files).
| #endif | ||
|
|
||
| #ifdef ROW_MAJOR | ||
| cblas_somatcopy(CblasRowMajor, t, m, n, alpha, A, lda, B, ldb); |
There was a problem hiding this comment.
As above, I would expect these calls to use queue.submit() rather than direct CBLAS calls, based on how other backends are implmented.
| CBLAS_TRANSPOSE t = (trans == transpose::nontrans) ? CblasNoTrans : CblasTrans; | ||
|
|
||
| #ifdef COLUMN_MAJOR | ||
| cblas_simatcopy(CblasColMajor, t, m, n, alpha, A, lda, ldb); |
There was a problem hiding this comment.
As above, I would expect these calls to use queue.submit() rather than direct CBLAS calls, based on how other backends are implmented.
| target_include_directories(${LIB_OBJ} | ||
| PUBLIC | ||
| ${ONEMATH_INCLUDE_DIRS} | ||
| ${OPENBLAS_INCLUDE} |
There was a problem hiding this comment.
Backend-specific include directories are exported as PRIVATE by other backends (e.g. src/blas/backends/netlib/CMakeLists.txt:39) . Should OpenBLAS do the same? Otherwise a build machine’s OpenBLAS header path will be exported as part of its public CMake interface.
ndingle-arm
left a comment
There was a problem hiding this comment.
Thank you for the change @CDAC-SSDG but I spotted a few more things reading through again.
Description
This PR adds support for using OpenBLAS as a new backend within oneMath for BLAS operations.
The implementation follows the design proposed in RFC#707 and integrates OpenBLAS as a CPU backend through the existing backend abstraction layer. BLAS operations are mapped to the OpenBLAS interface while preserving the oneMath API and execution model.
The implementation provides coverage across Level 1, Level 2, and Level 3 BLAS routines, along with support for batch operations and relevant BLAS extensions where applicable. Functionality is aligned with OpenBLAS capabilities.
Build system integration has been added to enable selection of the OpenBLAS backend through CMake configuration. The implementation has been validated with both DPC++ and AdaptiveCpp toolchains.
Documentation has been added describing how to configure and build the project with OpenBLAS support:
docs/building_the_project_with_openblas.rstMotivation and Context
This change provides an additional backend option for BLAS operations using OpenBLAS, allowing users to build and run oneMath in environments where other backends may not be available or desired. It improves flexibility in backend selection while remaining consistent with the modular backend design of oneMath.
Testing
The implementation has been built and tested, test logs are attached with this PR.
OpenBlas with DPC++ unit tests log.txt