Monday, June 19, 2017

Basis's RDO DXTc compression API

This is a work in progress, but here's the API to the new rate distortion optimizing DXTc codec I've been working on for Basis. There's only one function (excluding basis_get_version()): basis_rdo_dxt_encode(). You call it with some encoding parameters and an array of input images (or "slices"), and it gives you back a blob of DXTc blocks which you then feed to any LZ codec like zlib, zstd, LZHAM, Oodle, etc.

The output DXTc blocks are organized in simple raster order, with slice 0's blocks first, then slice 1's, etc. The slices could be mipmap levels, or cubemap faces, etc. For highest compression, it's very important to feed the output blocks to the LZ codec in the order that this function gives them back to you.

On my near-term TODO list is to allow the user to specify custom per-channel weightings, and to add more color distance functions. Right now it supports either uniform weights, or a custom model for sRGB colorspace photos/textures. Also, I may expose optional per-slice weightings (for mipmaps).

I'm shipping the first version (as a Windows DLL) tomorrow.

// File: basis_rdo_dxt_public.h
#pragma once

#include <stdlib.h>
#include <memory.h>

#ifdef BASIS_DLL_EXPORTS
#define BASIS_DLL_EXPORT __declspec(dllexport)  
#else  
#define BASIS_DLL_EXPORT
#endif  

#if defined(_MSC_VER)
#define BASIS_CDECL __cdecl
#else
#define BASIS_CDECL
#endif

namespace basis
{
    // The codec's current version number.
    const int BASIS_CODEC_VERSION = 0x0106; 
    
    // The codec can accept rdo_dxt_params's from previous versions for backwards compatibility purposes. This is the oldest version it accepts.
    const int BASIS_CODEC_MIN_COMPATIBLE_VERSION = 0x0106;

    typedef unsigned int basis_uint;
    typedef basis_uint rdo_dxt_bool;
    typedef float basis_float;

    enum rdo_dxt_format
    {
        cRDO_DXT1 = 0,
        cRDO_DXT5,
        cRDO_DXN,
        cRDO_DXT5A,

        cRDO_DXT_FORCE_DWORD = 0xFFFFFFFF
    };


    enum rdo_dxt_encoding_speed_t
    {
        cEncodingSpeedSlowest,
        cEncodingSpeedFaster,
        cEncodingSpeedFastest
    };

    const basis_uint RDO_DXT_STRUCT_VERSION = 0xABCD0001;

    const basis_uint RDO_DXT_QUALITY_MIN = 1;
    const basis_uint RDO_DXT_QUALITY_MAX = 255;
    const basis_uint RDO_DXT_MAX_CLUSTERS = 32768;
        
    struct rdo_dxt_params
    {
        basis_uint m_struct_size;
        basis_uint m_struct_version;

        rdo_dxt_format m_format;

        basis_uint m_quality;

        basis_uint m_alpha_component_indices[2];

        basis_uint m_lz_max_match_dist;

        // Output block size to use in RDO optimization stage, note this does NOT impact the blocks written to pOutput_blocks by basis_rdo_dxt_encode()
        basis_uint m_output_block_size;

        basis_uint m_num_color_endpoint_clusters;
        basis_uint m_num_color_selector_clusters;

        basis_uint m_num_alpha_endpoint_clusters;
        basis_uint m_num_alpha_selector_clusters;

        basis_float m_l;
        basis_float m_selector_rdo_quality_threshold;
        basis_float m_selector_rdo_quality_threshold_low;

        basis_float m_block_max_y_std_dev_rdo_quality_scaler;

        basis_uint m_endpoint_refinement_steps;
        basis_uint m_selector_refinement_steps;
        basis_uint m_final_block_refinement_steps;

        basis_float m_adaptive_tile_color_psnr_derating;
        basis_float m_adaptive_tile_alpha_psnr_derating;

        basis_uint m_selector_rdo_max_search_distance;

        basis_uint m_endpoint_search_height;
        basis_uint m_endpoint_search_width_first_line;
        basis_uint m_endpoint_search_width_other_lines;

        rdo_dxt_bool m_optimize_final_endpoint_clusters;
        rdo_dxt_bool m_optimize_final_selector_clusters;

        rdo_dxt_bool m_srgb_metrics;
        rdo_dxt_bool m_debugging;
        rdo_dxt_bool m_debug_output;
        rdo_dxt_bool m_hierarchical_mode;
        rdo_dxt_bool m_multithreaded;
        rdo_dxt_bool m_use_sse41_if_available;
    };

    inline void rdo_dxt_params_set_encoding_speed(rdo_dxt_params *p, rdo_dxt_encoding_speed_t encoding_speed)
    {
        if (encoding_speed == cEncodingSpeedFaster)
        {
            p->m_endpoint_refinement_steps = 1;
            p->m_selector_refinement_steps = 1;
            p->m_final_block_refinement_steps = 1;

            p->m_selector_rdo_max_search_distance = 3072;
        }
        else if (encoding_speed == cEncodingSpeedFastest)
        {
            p->m_endpoint_refinement_steps = 1;
            p->m_selector_refinement_steps = 1;
            p->m_final_block_refinement_steps = 0;

            p->m_selector_rdo_max_search_distance = 2048;
        }
        else
        {
            p->m_endpoint_refinement_steps = 2;
            p->m_selector_refinement_steps = 2;
            p->m_final_block_refinement_steps = 1;

            p->m_endpoint_search_width_first_line = 2;
            p->m_endpoint_search_height = 3;

            p->m_selector_rdo_max_search_distance = 4096;
        }
    }

    inline void rdo_dxt_params_set_to_defaults(rdo_dxt_params *p, rdo_dxt_encoding_speed_t default_speed = cEncodingSpeedFaster)
    {
        memset(p, 0, sizeof(rdo_dxt_params));

        p->m_struct_size = sizeof(rdo_dxt_params);
        p->m_struct_version = RDO_DXT_STRUCT_VERSION;

        p->m_format = cRDO_DXT1;

        p->m_quality = 128;

        p->m_alpha_component_indices[0] = 0;
        p->m_alpha_component_indices[1] = 1;

        p->m_l = .001f;

        p->m_selector_rdo_quality_threshold = 1.75f;
        p->m_selector_rdo_quality_threshold_low = 1.3f;

        p->m_block_max_y_std_dev_rdo_quality_scaler = 8.0f;

        p->m_lz_max_match_dist = 32768;
        p->m_output_block_size = 8;

        p->m_endpoint_refinement_steps = 1;
        p->m_selector_refinement_steps = 1;
        p->m_final_block_refinement_steps = 1;

        p->m_adaptive_tile_color_psnr_derating = 1.5f;
        p->m_adaptive_tile_alpha_psnr_derating = 1.5f;
        p->m_selector_rdo_max_search_distance = 0;

        p->m_optimize_final_endpoint_clusters = true;
        p->m_optimize_final_selector_clusters = true;

        p->m_selector_rdo_max_search_distance = 3072;

        p->m_endpoint_search_height = 1;
        p->m_endpoint_search_width_first_line = 1;
        p->m_endpoint_search_width_other_lines = 1;

        p->m_hierarchical_mode = true;

        p->m_multithreaded = true;
        p->m_use_sse41_if_available = true;

        rdo_dxt_params_set_encoding_speed(p, default_speed);
    }
        
    const basis_uint RDO_DXT_MAX_IMAGE_DIMENSION = 16384;

    struct rdo_dxt_slice_desc
    {
        // Pixel dimensions of this slice. A slice may be a mipmap level, a cubemap face, a video frame, or whatever.
        basis_uint m_image_width;
        basis_uint m_image_height;
        basis_uint m_image_pitch_in_pixels;

        // Pointer to 32-bit raster image. Format in memory: RGBA (R is first byte, A is last)
        const void *m_pImage_pixels;
    };

} // namespace basis

extern "C" BASIS_DLL_EXPORT basis::basis_uint BASIS_CDECL basis_get_version();
extern "C" BASIS_DLL_EXPORT basis::basis_uint BASIS_CDECL basis_get_minimum_compatible_version();

extern "C" BASIS_DLL_EXPORT bool BASIS_CDECL basis_rdo_dxt_encode(
    const basis::rdo_dxt_params *pEncoder_params,
    basis::basis_uint total_input_image_slices, const basis::rdo_dxt_slice_desc *pInput_image_slices,
    void *pOutput_blocks, basis::basis_uint output_blocks_size_in_bytes);

2 comments:

  1. This is going to be a bit nit-picky, but API reviews should be…

    Looks like it wouldn't be too difficult to provide a C99 (89 if you don't use bool) API instead of C++, which would be nice.

    Instead of RDO_QUALITY_MIN/MAX, why not just make that field a unsigned char (or uint8_t)? 0 could mean default (128).

    Why bother with typedefs for rdo_dxt_uint and rdo_dxt_bool? Why not just use unsigned int and bool directly? I can understand not wanting to use bool directly in C, so I guess rdo_dxt_uint could be for consistency, but then why no rdo_dxt_float?

    You could drop by doing *p = { 0, }, or is that a C-only thing? Better yet, get rid of the inline functions and just have a const struct rdo_dxt_params with the default values set. Instead of rdo_dxt_params_set_to_defaults(&foo), people could just do foo = rdo_dxt_default_params; (or whatever you want to call it). That would be especially nice for params on the stack. An inline function doesn't save you from issues with a mismatched header and library anyway, though it might not be a bad idea to make rdo_dxt_params_set_to_defaults a non-inline function in order to let people work around that.

    It might be useful to provide a row stride member in struct rdo_dxt_slice_desc to improve interoperability.

    If you reverse the last two arguments in basis_rdo_dxt_encode, in C99 you could use conformant array parameters (see ). Doesn't work with MSVC (it's one of the things from C99 they *still* don't support), or C++, though, so you'd need to hide it behind a macro. Still, good for documenting the API, and some C compilers and static analyzers can provide warnings if the memory block provided is too small.

    If you don't want reverse the last two args, you should probably reverse total_input_image_slices and pInput_image_slices. Right now you have two pairs of parameters with inconsistent ordering; one is (length, data) the other is (data, length).

    I'm not a fan of the "m_" prefix for struct members, but you're consistent, so it's not really a big deal.

    ReplyDelete
  2. Did you try to look into x264 rdo (https://github.com/pps83/x264/blob/master/encoder/rdo.c)? Not sure how much relevant could it be, but in general I think everything in x264 is very well done and super optimized.

    ReplyDelete