Copyright (C) 2024, Advanced Micro Devices, Inc.

Copyright (C) 2014, The University of Texas at Austin

AOCL-BLAS - Release Notes - version 5.0.0
--------------------------------------------

AOCL-BLAS is a portable software framework for instantiating high-performance
BLAS-like dense linear algebra libraries. The framework was designed to isolate
essential kernels of computation that, when optimized, enable optimized
implementations of most of its commonly used and computationally intensive
operations. AMD has extensively optimized the implementation of BLIS, on which
AOCL-BLAS is based, for AMD processors.

Highlights of AOCL-BLAS 5.0.0
--------------------------------

- Zen5 configuration on Turin.
- Turin optimizations for D/ZGEMM, DTRSM, and DNRM2 APIs.
- LPGEMM features and bug fixes.
	* Added Trans A feature for all INT8 LPGEMM APIs.
	* Matrix Add post-operation support for integer (s16|s32) LPGEMM APIs.
	* Implemented an optimized AVX-512 variant of f32 LPGEMV.
	* SWISH post-op support for all LPGEMM APIs.
- CMake build system changes and bug fixes.
	* Static and shared builds for Linux.
	* ILP64 build on Windows.
- GTest test-case updates for the following APIs:
	* GEMM, GER, GEMV, and TRSV
	* GEMMT, AMAXV, SWAPV, and TRSM
	* AXPBYV, AXPYV, and COPYV
	* DOTV, SCALV, ASUMV, and NRM2
- GTestSuite: BLAS1, BLAS2, BLAS3 thresholds.
- Test-case development for ?OMATCOPY APIs.
- Improved ZTRSM performance on Zen4.
- AVX-512 improvements:
	* ZGEMV, D/ZAXPYF, D/ZDOTXF, ZDOTV, C/ZSCALV, DNRM2, S/D/ZCOPY
	* S/D/C/ZAXPBYV, DTRSV, DGEMMT, D/ZTRSM, and D/ZGEMM
- AOCL_ENABLE_INSTRUCTIONS improvements.
- Minor bug fixes.
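
For readers unfamiliar with the post-operations named above, the following is
a minimal reference sketch of what a GEMM followed by Matrix Add and SWISH
post-ops computes. It is not the AOCL-BLAS LPGEMM API: the function names
(swish, gemm_with_post_ops), the plain floating-point arithmetic, and the
SWISH form x * sigmoid(alpha * x) with a scaling parameter alpha are all
assumptions made for illustration.

```python
import math


def swish(x, alpha=1.0):
    """SWISH activation, assumed here to be x * sigmoid(alpha * x)."""
    z = alpha * x
    if z >= 0:
        return x / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form to avoid overflow in exp().
    e = math.exp(z)
    return x * e / (1.0 + e)


def gemm_with_post_ops(a, b, d, alpha=1.0):
    """Hypothetical reference: swish(A * B + D) on row-major lists of lists.

    a is m x k, b is k x n, d is m x n. The two post-ops are applied
    element-wise to each GEMM output value, in order.
    """
    m, k, n = len(a), len(b), len(b[0])
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = sum(a[i][p] * b[p][j] for p in range(k))  # GEMM
            acc += d[i][j]                                  # Matrix Add post-op
            row.append(swish(acc, alpha))                   # SWISH post-op
        out.append(row)
    return out
```

Applying the post-ops while each output element is still at hand, rather than
in separate passes over the output matrix, is the general motivation for
fusing them into the GEMM call.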
 

Please refer to the AOCL User Guide for supported operating systems and compilers.

The package contains the AOCL-BLAS library binaries, which include optimizations
for the AMD EPYC and AMD Ryzen processor families, along with header files and
examples.

