Skip to content

Openblas backend implementation#722

Open
CDAC-SSDG wants to merge 6 commits into
uxlfoundation:developfrom
CDAC-Bengaluru:openblas-backend-implementation
Open

Openblas backend implementation#722
CDAC-SSDG wants to merge 6 commits into
uxlfoundation:developfrom
CDAC-Bengaluru:openblas-backend-implementation

Conversation

@CDAC-SSDG
Copy link
Copy Markdown

Description

This PR adds support for using OpenBLAS as a new backend within oneMath for BLAS operations.

The implementation follows the design proposed in RFC#707 and integrates OpenBLAS as a CPU backend through the existing backend abstraction layer. BLAS operations are mapped to the OpenBLAS interface while preserving the oneMath API and execution model.

The implementation provides coverage across Level 1, Level 2, and Level 3 BLAS routines, along with support for batch operations and relevant BLAS extensions where applicable. Functionality is aligned with OpenBLAS capabilities.

Build system integration has been added to enable selection of the OpenBLAS backend through CMake configuration. The implementation has been validated with both DPC++ and AdaptiveCpp toolchains.

Documentation has been added describing how to configure and build the project with OpenBLAS support:

  • docs/building_the_project_with_openblas.rst

Motivation and Context

This change provides an additional backend option for BLAS operations using OpenBLAS, allowing users to build and run oneMath in environments where other backends may not be available or desired. It improves flexibility in backend selection while remaining consistent with the modular backend design of oneMath.

Testing

The implementation has been built and tested, test logs are attached with this PR.
OpenBlas with DPC++ unit tests log.txt

@CDAC-SSDG CDAC-SSDG requested review from a team as code owners April 13, 2026 14:36
@sknepper
Copy link
Copy Markdown
Contributor

Thank you much for your contribution, @CDAC-SSDG !
A few questions/comments:

  1. More domains than just BLAS had changes. Could you please provide an explanation for those changes? Ideally, separating commits can also help in the review process, but that's probably too challenging at this point.
  2. The clang-format check is failing - could you please apply the changes it suggests so that check passes?
  3. Some of the ArmPL tests are failing - I'm guessing this isn't due to your changes, but could you please check, particularly for the BLAS domain?
  4. As a next step after this PR is merged, having github runners testing the OpenBLAS backend would be a nice check to add (but not required).

Comment thread cmake/mkl/MKLConfig.cmake Outdated
Comment thread include/oneapi/math/detail/backends_table.hpp Outdated
Comment thread tests/unit_tests/CMakeLists.txt Outdated
@sknepper
Copy link
Copy Markdown
Contributor

Hi @CDAC-SSDG - could you please apply the changes from the clang-format check so it passes? Thanks!

Copy link
Copy Markdown
Contributor

@sknepper sknepper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CDAC-SSDG - thank you very much for this valuable contribution! The changes look good to me.
@ndingle-arm - could you please check the latest changes and verify that all your questions have been answered?

*******************************************************************************/

#ifndef _NETLIB_COMMON_HPP_
#define _NETLIB_COMMON_HPP_
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be OPENBLAS_COMMON_HPP , or else you are using the same guard as in src/blas/backends/netlib/netlib_common.hpp and therefore risk inadvertently excluding one of these two files.

float* A = reinterpret_cast<float*>(acc.get_pointer());
const float* alpha_ptr = reinterpret_cast<const float*>(&alpha);

CBLAS_TRANSPOSE t = (trans == transpose::nontrans) ? CblasNoTrans : CblasTrans;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This maps every non-notrans case to CblasTrans and does not differentiate between CblasTrans and CblasConjTrans like it should. Compare with complex omatcopy above, which does:

CBLAS_TRANSPOSE t;
if (trans == transpose::nontrans) {
    t = CblasNoTrans;
}
else if (trans == transpose::trans) {
    t = CblasTrans;
}
else {
    t = CblasConjTrans;
}

double* A = reinterpret_cast<double*>(acc.get_pointer());
const double* alpha_ptr = reinterpret_cast<const double*>(&alpha);

CBLAS_TRANSPOSE t = (trans == transpose::nontrans) ? CblasNoTrans : CblasTrans;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as for cimatcopy -- it needs to consider conjugate-transposition as well.

blasint incX = (blasint)incx;
blasint incY = (blasint)incy;

cblas_saxpby(N, alpha, X, incX, beta, Y, incY);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why [cdsz]axpby calls the corresponding CBLAS routine directly, but [cdsz]axpy and all other functions here go via the queue, e.g. in axpy it does queue.submit([&](sycl::handler& cgh) {...} but here queue is not used at all (and in other axpby precisions too)?

const float* B = B0 + i * stride_b;
float* C = C0 + i * stride_c;

cblas_sgemm(order, tA, tB, (blasint)m, (blasint)n, (blasint)k, alpha, A, (blasint)lda, B,
Copy link
Copy Markdown
Contributor

@ndingle-arm ndingle-arm May 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect [cdsz]gemm_batch to go via queue.submit like the other BLAS calls, rather than calling the CBLAS function directly (this comment occurs in several files).

#endif

#ifdef ROW_MAJOR
cblas_somatcopy(CblasRowMajor, t, m, n, alpha, A, lda, B, ldb);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, I would expect these calls to use queue.submit() rather than direct CBLAS calls, based on how other backends are implmented.

CBLAS_TRANSPOSE t = (trans == transpose::nontrans) ? CblasNoTrans : CblasTrans;

#ifdef COLUMN_MAJOR
cblas_simatcopy(CblasColMajor, t, m, n, alpha, A, lda, ldb);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, I would expect these calls to use queue.submit() rather than direct CBLAS calls, based on how other backends are implmented.

target_include_directories(${LIB_OBJ}
PUBLIC
${ONEMATH_INCLUDE_DIRS}
${OPENBLAS_INCLUDE}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backend-specific include directories are exported as PRIVATE by other backends (e.g. src/blas/backends/netlib/CMakeLists.txt:39) . Should OpenBLAS do the same? Otherwise a build machine’s OpenBLAS header path will be exported as part of its public CMake interface.

Copy link
Copy Markdown
Contributor

@ndingle-arm ndingle-arm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the change @CDAC-SSDG but I spotted a few more things reading through again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants