Name

    NV_shader_atomic_fp16_vector

Name Strings

    GL_NV_shader_atomic_fp16_vector

Contact

    Jeff Bolz, NVIDIA Corporation (jbolz 'at' nvidia.com)

Contributors

    Pat Brown, NVIDIA
    Mathias Heyer, NVIDIA

Status

    Shipping

Version

    Last Modified Date:         February 4, 2015
    NVIDIA Revision:            3

Number

    OpenGL Extension #474
    OpenGL ES Extension #261

Dependencies

    This extension is written against the OpenGL 4.3 (Compatibility Profile)
    Specification.

    This extension is written against version 4.30 of the OpenGL Shading
    Language Specification.

    This extension interacts with NV_shader_buffer_store and NV_gpu_shader5.

    This extension interacts with NV_gpu_program5, NV_shader_buffer_store, and
    NV_gpu_program5_mem_extended.

    This extension requires NV_gpu_shader5.

    This extension interacts with NV_shader_storage_buffer_object.

    This extension interacts with NV_compute_program5.

    This extension interacts with NV_image_formats.

    This extension interacts with OES_shader_image_atomic.

    This extension interacts with EXT_shader_image_load_store.

Overview

    This extension provides GLSL built-in functions and assembly opcodes
    allowing shaders to perform a limited set of atomic read-modify-write
    operations to buffer or texture memory with 16-bit floating point vector
    surface formats.

New Procedures and Functions

    None.

New Tokens

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

GLX Protocol

    None.

Modifications to the OpenGL Shading Language Specification, Version 4.30

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_NV_shader_atomic_fp16_vector : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_NV_shader_atomic_fp16_vector         1
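
    As an illustration only, the sketch below shows one way a compute shader
    might use the directive and the preprocessor define together.  The buffer
    name "acc", the helper "accumulate", and the fallback path are assumptions
    made for this example and are not part of the extension.  The fallback
    assumes that an f16vec2 occupies the same 32 bits as the value produced by
    packHalf2x16 and that the buffer is non-empty.

```glsl
#version 430
#extension GL_NV_gpu_shader5 : enable
#extension GL_NV_shader_atomic_fp16_vector : enable

layout(local_size_x = 64) in;

#ifdef GL_NV_shader_atomic_fp16_vector

// Extension path: the buffer holds fp16 pairs and the add is one built-in call.
layout(std430, binding = 0) buffer Accum {
    f16vec2 acc[];              // hypothetical accumulation buffer
};

void accumulate(uint i, vec2 v) {
    atomicAdd(acc[i], f16vec2(v));
}

#else

// Fallback path (assumption): view the same 32 bits as a packed uint and
// update them with a compare-and-swap loop from core GLSL 4.30.
layout(std430, binding = 0) buffer Accum {
    uint acc[];
};

void accumulate(uint i, vec2 v) {
    uint old = acc[i];
    uint assumed;
    do {
        assumed = old;
        // Unpack both halves, add in fp32, and repack to two fp16 values.
        uint desired = packHalf2x16(unpackHalf2x16(assumed) + v);
        old = atomicCompSwap(acc[i], assumed, desired);
    } while (old != assumed);   // retry if another invocation intervened
}

#endif

void main() {
    uint i = gl_GlobalInvocationID.x % uint(acc.length());
    accumulate(i, vec2(1.0, 0.5));
}
```

    The fallback updates both halves as a single 32-bit unit and may round
    differently from a native fp16 add, so the two paths are not expected to
    be bit-identical.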

    Modify Section 8.11, Atomic Memory Functions (p. 163)

    Add before the table of functions:

    Some atomic memory operations are supported on two- and four-component
    vectors with 16-bit floating-point components.

    Add new functions to the table:

        // Computes a new value per-component using the specified operation.
        // Atomicity is only guaranteed on a per-component basis.
        f16vec2 atomicAdd(inout f16vec2 mem, f16vec2 data);
        f16vec4 atomicAdd(inout f16vec4 mem, f16vec4 data);
        f16vec2 atomicMin(inout f16vec2 mem, f16vec2 data);
        f16vec4 atomicMin(inout f16vec4 mem, f16vec4 data);
        f16vec2 atomicMax(inout f16vec2 mem, f16vec2 data);
        f16vec4 atomicMax(inout f16vec4 mem, f16vec4 data);
        f16vec2 atomicExchange(inout f16vec2 mem, f16vec2 data);
        f16vec4 atomicExchange(inout f16vec4 mem, f16vec4 data);
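
    A minimal sketch of how these functions might be used on shader storage
    buffer variables follows.  The block and member names are hypothetical,
    and the example assumes the application has pre-filled "boundsMin" with
    +infinity, "boundsMax" with -infinity, and "sum" with zero.

```glsl
#version 430
#extension GL_NV_gpu_shader5 : require
#extension GL_NV_shader_atomic_fp16_vector : require

layout(local_size_x = 64) in;

layout(std430, binding = 0) readonly buffer Points {
    vec4 points[];              // hypothetical input data
};

layout(std430, binding = 1) buffer Bounds {
    f16vec4 boundsMin;          // assumed pre-filled with +infinity
    f16vec4 boundsMax;          // assumed pre-filled with -infinity
    f16vec4 sum;                // assumed pre-filled with zero
};

void main() {
    uint i = gl_GlobalInvocationID.x;
    if (i >= uint(points.length())) {
        return;
    }

    f16vec4 p = f16vec4(points[i]);

    // Each call updates all four components, but only each individual
    // component is guaranteed to be updated atomically.
    atomicMin(boundsMin, p);
    atomicMax(boundsMax, p);
    atomicAdd(sum, p);
}
```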


    Modify Section 8.12, Image Functions (p. 164)

    Add before the table of functions:

    Some atomic memory operations are supported on two- and four-component
    vectors with 16-bit floating-point components, for images with format
    qualifiers of <rg16f> and <rgba16f>.

    Add new functions to the table:

        // Computes a new value per-component using the specified operation.
        // Atomicity is only guaranteed on a per-component basis.
        f16vec2 imageAtomicAdd(IMAGE_PARAMS, f16vec2 data);
        f16vec4 imageAtomicAdd(IMAGE_PARAMS, f16vec4 data);
        f16vec2 imageAtomicMin(IMAGE_PARAMS, f16vec2 data);
        f16vec4 imageAtomicMin(IMAGE_PARAMS, f16vec4 data);
        f16vec2 imageAtomicMax(IMAGE_PARAMS, f16vec2 data);
        f16vec4 imageAtomicMax(IMAGE_PARAMS, f16vec4 data);
        f16vec2 imageAtomicExchange(IMAGE_PARAMS, f16vec2 data);
        f16vec4 imageAtomicExchange(IMAGE_PARAMS, f16vec4 data);
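
    The following fragment shader is a sketch of the image variants, assuming
    the application has bound an RGBA16F texture to image unit 0 and an RG16F
    texture to image unit 1.  The names "accumImage", "rangeImage", and
    "fragColor" are invented for this example.

```glsl
#version 430
#extension GL_NV_gpu_shader5 : require
#extension GL_NV_shader_atomic_fp16_vector : require

// Hypothetical images; the format qualifiers must be rg16f or rgba16f.
layout(rgba16f, binding = 0) uniform image2D accumImage;
layout(rg16f,   binding = 1) uniform image2D rangeImage;

in vec4 fragColor;              // hypothetical interpolated input

void main() {
    ivec2 texel = ivec2(gl_FragCoord.xy);

    // Accumulate the color of every fragment that lands on this texel.
    imageAtomicAdd(accumImage, texel, f16vec4(fragColor));

    // Track the largest luminance (x) and the largest alpha (y) seen so far.
    float lum = dot(fragColor.rgb, vec3(0.2126, 0.7152, 0.0722));
    imageAtomicMax(rangeImage, texel, f16vec2(lum, fragColor.a));
}
```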

Dependencies on OES_shader_image_atomic

    If implemented in OpenGL ES and OES_shader_image_atomic is not
    supported, do not introduce additional imageAtomic* functions.

Dependencies on NV_image_formats

    If implemented in OpenGL ES and NV_image_formats is not
    supported, remove references to two-component images of format
    <rg16f>.

Dependencies on NV_shader_buffer_store and NV_gpu_shader5

    If NV_shader_buffer_store and NV_gpu_shader5 are supported, the following
    functions should be added to the "Section 8.Y, Shader Memory Functions"
    language in the NV_shader_buffer_store specification:

      // Computes a new value per-component using the specified operation.
      // Atomicity is only guaranteed on a per-component basis.
      f16vec2 atomicAdd(f16vec2 *address, f16vec2 data);
      f16vec4 atomicAdd(f16vec4 *address, f16vec4 data);
      f16vec2 atomicMin(f16vec2 *address, f16vec2 data);
      f16vec4 atomicMin(f16vec4 *address, f16vec4 data);
      f16vec2 atomicMax(f16vec2 *address, f16vec2 data);
      f16vec4 atomicMax(f16vec4 *address, f16vec4 data);
      f16vec2 atomicExchange(f16vec2 *address, f16vec2 data);
      f16vec4 atomicExchange(f16vec4 *address, f16vec4 data);
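
    As a hedged sketch of the pointer-based variants, the compute shader below
    adds into an array of fp16 pairs addressed through a pointer uniform.  The
    uniform names "histogram" and "binCount" are hypothetical.  The example
    assumes that "binCount" is non-zero and that the application has made the
    buffer resident for read-write access and loaded its GPU address into
    "histogram" using the mechanisms of NV_shader_buffer_load.

```glsl
#version 430
#extension GL_NV_gpu_shader5 : require
#extension GL_NV_shader_buffer_load : require
#extension GL_NV_shader_buffer_store : require
#extension GL_NV_shader_atomic_fp16_vector : require

layout(local_size_x = 64) in;

// Hypothetical uniforms: GPU address of an array of fp16 pairs and its size.
uniform f16vec2 *histogram;
uniform uint binCount;          // assumed to be non-zero

void main() {
    int bin = int(gl_GlobalInvocationID.x % binCount);

    // Pointer arithmetic selects the element; both fp16 components are
    // updated, each one atomically on its own.
    atomicAdd(histogram + bin, f16vec2(1.0, 0.25));
}
```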

Dependencies on NV_gpu_program5, NV_shader_buffer_store, and
NV_gpu_program5_mem_extended

    If NV_gpu_program5 is supported and "OPTION NV_shader_atomic_fp16_vector"
    is specified in an assembly program, "F16X2" and "F16X4" should be allowed
    as storage modifiers to the ATOM instruction for the atomic operations
    "ADD", "MIN", "MAX", and "EXCH". These operate on each of the two or four
    fp16 values independently. Atomicity is only guaranteed on a per-component
    basis.

    (Add to "Section 2.X.6, Program Options" of the NV_gpu_program4 extension,
    as extended by NV_gpu_program5:)

      + Floating-Point Vector Atomic Operations (NV_shader_atomic_fp16_vector)

      If a program specifies the "NV_shader_atomic_fp16_vector" option, it may
      use the "F16X2" and "F16X4" storage modifiers with the "ATOM" opcodes to
      perform atomic floating-point add, minimum, maximum, or exchange
      operations.

    (Add to the table in "Section 2.X.8.Z, ATOM" in NV_gpu_program5:)

      atomic     storage
      modifier   modifiers            operation
      --------   ------------------   --------------------------------------
       ADD       U32, S32, U64,       compute a sum
                 F16X2, F16X4
       MIN       U32, S32,            compute minimum
                 F16X2, F16X4
       MAX       U32, S32,            compute maximum
                 F16X2, F16X4
       EXCH      U32, S32, F32,       exchange memory with operand
                 F16X2, F16X4
       ...

Dependencies on EXT_shader_image_load_store and NV_gpu_program5

    If EXT_shader_image_load_store and NV_gpu_program5 are supported and
    "OPTION NV_shader_atomic_fp16_vector" is specified in an assembly program,
    "F16X2" and "F16X4" should be allowed as storage modifiers to the ATOMIM
    instruction for the atomic operations "ADD", "MIN", "MAX", and "EXCH".
    These operate on each of the two or four fp16 values independently.
    Atomicity is only guaranteed on a per-component basis.

    (Add to the table in "Section 2.X.8.Z, ATOMIM" in the "Dependencies on
    NV_gpu_program5" portion of the EXT_shader_image_load_store specification)

      atomic     storage
      modifier   modifiers       operation
      --------   -------------   --------------------------------------
       ADD       U32, S32,       compute a sum
                 F16X2, F16X4
       MIN       U32, S32,       compute minimum
                 F16X2, F16X4
       MAX       U32, S32,       compute maximum
                 F16X2, F16X4
       EXCH      U32, S32, F32,  exchange memory with operand
                 F16X2, F16X4
       ...

Dependencies on NV_compute_program5

    If NV_compute_program5 is supported and "OPTION
    NV_shader_atomic_fp16_vector" is specified in an assembly program, "F16X2"
    and "F16X4" should be allowed as storage modifiers to the ATOMS instruction
    for the atomic operations "ADD", "MIN", "MAX", and "EXCH". These operate on
    each of the two or four fp16 values independently. Atomicity is only
    guaranteed on a per-component basis.

    (Add to the table in "Section 2.X.8.Z, ATOMS" in the "Dependencies on
    NV_gpu_program5" portion of the NV_compute_program5 specification)

      atomic     storage
      modifier   modifiers          operation
      --------   -------------      --------------------------------------
       ADD       U32, S32, U64,     compute a sum
                 F32, F16X2, F16X4
       MIN       U32, S32,          compute minimum
                 F16X2, F16X4
       MAX       U32, S32,          compute maximum
                 F16X2, F16X4
       EXCH      U32, S32, F32,     exchange memory with operand
                 F16X2, F16X4
       ...

Dependencies on NV_shader_storage_buffer_object

    If NV_shader_storage_buffer_object is supported and "OPTION
    NV_shader_atomic_fp16_vector" is specified in an assembly program, "F16X2"
    and "F16X4" should be allowed as storage modifiers to the ATOMB instruction
    for the atomic operations "ADD", "MIN", "MAX", and "EXCH". These operate on
    each of the two or four fp16 values independently. Atomicity is only
    guaranteed on a per-component basis.

    (Add to the table in "Section 2.X.8.Z, ATOMB" in the "Dependencies on
    NV_gpu_program5" portion of the NV_shader_storage_buffer_object
    specification)

      atomic     storage
      modifier   modifiers          operation
      --------   -------------      --------------------------------------
       ADD       U32, S32, U64,     compute a sum
                 F32, F16X2, F16X4
       MIN       U32, S32,          compute minimum
                 F16X2, F16X4
       MAX       U32, S32,          compute maximum
                 F16X2, F16X4
       EXCH      U32, S32, F32,     exchange memory with operand
                 F16X2, F16X4
       ...


Errors

    None.

New State

    None.

New Implementation Dependent State

    None.

Issues

    (1) Should we allow "partial" atomics to an f16vec2 or f16vec4, only
    modifying some of the components?

    RESOLVED: No. An application that needs this can supply "special" values
    in the remaining components that cause the atomic to have no effect on
    them (e.g., add zero, min with +infinity, max with -infinity).  This
    works for atomicAdd, atomicMin, and atomicMax, but not for
    atomicExchange.
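
    The "special value" approach can be sketched as follows, assuming a
    hypothetical f16vec2 whose x component holds a running sum and whose y
    component holds a running maximum.  The block name, member name, and the
    per-invocation value are placeholders for this example.

```glsl
#version 430
#extension GL_NV_gpu_shader5 : require
#extension GL_NV_shader_atomic_fp16_vector : require

layout(local_size_x = 64) in;

layout(std430, binding = 0) buffer Stats {
    f16vec2 sumAndPeak;         // hypothetical: x = running sum, y = running max
};

void main() {
    // Placeholder per-invocation value.
    float v = float(gl_LocalInvocationIndex) * 0.01;

    // Update only x: adding zero leaves the value of y unchanged.
    atomicAdd(sumAndPeak, f16vec2(v, 0.0));

    // Update only y: a maximum with -infinity leaves x unchanged.
    float negInf = uintBitsToFloat(0xFF800000u);
    atomicMax(sumAndPeak, f16vec2(negInf, v));
}
```

    No comparable neutral operand exists for atomicExchange, since any value
    supplied for a component replaces what is in memory.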

    (2) Are these vector atomics guaranteed to update all components of the
    vector atomically?

    RESOLVED:  No.  The spec only guarantees that individual components of a
    vector be updated atomically.  The initial implementation of this
    extension will only atomically update pairs of components.  For many of
    the algorithms supported by this extension (computing component-wise sums,
    minimums, or maximums of multi-component vectors), it is not necessary to
    update all components in a vector as a single unit.

    (3) What support should we provide for four-component vectors?

    RESOLVED:  Image, global, buffer, and shared memory atomic operations
    all fully support two- and four-component variants.  While one
    might emulate some four-component atomic operations using pairs of
    two-component operations, we choose to support four-component operations
    universally.  Supporting atomics on four-component vectors seems useful,
    as it supports computing sums, minimums, or maximums on RGBA color values
    and other data with more than two components.
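
    For comparison, the emulation mentioned above might look like the sketch
    below, which splits an RGBA sum into two two-component adds.  The buffer
    layout (element 2*i holds red and green, element 2*i+1 holds blue and
    alpha) and the helper name "addRGBA" are assumptions for this example, and
    the buffer is assumed to hold at least two elements.

```glsl
#version 430
#extension GL_NV_gpu_shader5 : require
#extension GL_NV_shader_atomic_fp16_vector : require

layout(local_size_x = 64) in;

layout(std430, binding = 0) buffer Accum {
    f16vec2 halves[];           // hypothetical: (r, g) then (b, a) per color
};

// Emulates a four-component add with two two-component adds.
void addRGBA(uint i, vec4 c) {
    atomicAdd(halves[2u * i],      f16vec2(c.xy));
    atomicAdd(halves[2u * i + 1u], f16vec2(c.zw));
}

void main() {
    uint colorCount = uint(halves.length()) / 2u;
    uint i = gl_GlobalInvocationID.x % colorCount;
    addRGBA(i, vec4(0.25, 0.5, 0.75, 1.0));
}
```

    With the four-component built-ins the two calls collapse into a single
    atomicAdd on an f16vec4.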

Revision History

    Revision 2
    - Add OpenGL ES interactions
    Revision 1
    - Internal revisions.
