           Conventions and Design in the FreeType library


TOC

Introduction

I. Style and Formatting

  1. Naming
  2. Declarations & Statements
  3. Blocks
  4. Macros

II. Usage conventions

  1. Error handling
  2. Font File I/O
  3. Memory management (due to change soon)
  4. Support for multi-threaded environments
  5. Support for re-entrancy


Introduction:

This text introduces the many conventions used within the FreeType
library.  Please read it before trying any modifications or extensions
of the source code.

I. Style and Formatting:

The following coding rules are extremely important to keep the
library's source code homogeneous.  Keep in mind the following points:

  - "Humans read source code, not machines" (Donald Knuth)

    The library source code should be as readable as possible, even by
    non-C experts.  By readable, two things are meant: first, the
    source code should be pleasant to the eye, with sufficient
    whitespace and newlines, to not look like a boring stack of
    characters stuck to each other.  Second, the source should be
    _expressive_ enough about its goals.  This convention contains
    rules that can help the source focus on its purpose, not on a
    particular implementation.

    
  - "Paper is the _ultimate_ debugger" (Myself)

    There is nothing like sheets of paper (and a large floor) to help
    you understand the design of a library you're new to, or to debug
    it.  The formatting style presented here is targeted at printing.
    For example, it is strongly recommended never to produce a source
    line wider than 78 columns.  More on this below.


1. Naming:

  a. Components:

    A unit of the library is called a 'component'.  Each component has
    at least an interface, and often a body.  The library comes in two
    language flavors, C and Pascal.  A C component is defined by two
    files, one '.h' header and one '.c' body, while a Pascal component
    is contained in a single '.pas' file.

    All component source file names begin with the 'tt' prefix, with
    the exception of the 'FreeType' component.  For example, the file
    component is implemented by the files 'ttfile.h', 'ttfile.c' and
    'ttfile.pas'.  Only lowercase letters should be used, following the
    8+3 naming convention to allow compilation under DOS.

    In the C version, a single component can have multiple bodies.  For
    example, 'ttfile.c' provides stream i/o through standard ANSI libc
    calls, while 'ttfile2.c' implements the same thing using a Unix
    memory-mapping API.

    The FreeType component is an interface-only component.


  b. Long and expressive labels:

   Never hesitate to use long labels for your types, variables, etc.!
   Except perhaps for very trivial types, the longer the better, as
   length increases the source's _expressiveness_.  Never forget that
   the role of a label is to express the 'function' of the entity it
   represents, not its implementation!

   NOTE:   Hungarian notation is NOT expressive, as it sticks the
           'type' of a variable to its name.  A label like 'usFoo'
           rarely tells the use of the variable it represents.

           Encoding the storage class of a variable (global, static,
           dynamic) in its name isn't helpful either.

   Avoid Hungarian Notation like the *plague*!


   When forging a name from several nouns (e.g. "number-of-points"),
   capitalize the first letter of each noun after the first, like:

     numberOfPoints

   You are also welcome to introduce underscores '_' in your labels,
   especially when sticking long nouns together, as it 'airs' the code
   greatly.  E.g.:

     'numberOfPoints' or 'number_Of_Points'

     'IncredibleFunction' or 'Incredible_Function'
       
   And finally, always put a capital letter after an underscore, except
   in variable labels that are all lowercase:

     'number_of_points' is OK for a variable (_all_ lowercase label)

     'incredible_function' is NOT OK for a function!
      ^          ^

     'Microsoft_windows' is a *shame*!
      ^         ^

     'Microsoft_Windows' isn't really better, but at least it's a
      ^         ^        correct label within this convention. :) 

  c. Types:

   All types that are defined for use by FreeType client applications
   are defined in the FreeType component.  All types defined there have
   a label beginning with 'TT_'.  For example:

     TT_Stream, TT_F26Dot6, etc.

   However, the library uses many more internal types, which are
   defined in the Types and Tables components ('tttypes' & 'tttables'
   files).

   By convention, all internal types, except the simplest ones like
   integers, have their name beginning with a capital 'T', like in
   'TFoo'.  Note that the first letter of 'foo' is also capitalized. 
   The corresponding pointer type uses a capital 'P' instead, i.e. 
   (TFoo*) is simply named 'PFoo'. Examples:

      typedef struct  _TTableDir
      {
        TT_Fixed  version;        /* should be 0x10000 */
        UShort    numTables;      /* Tables number     */
    
        UShort    searchRange;    /* These parameters are only used  */
        UShort    entrySelector;  /* for a dichotomy search in the   */
        UShort    rangeShift;     /* directory. We ignore them.      */
      } TTableDir;
    
      typedef TTableDir*  PTableDir;
    
   Note that we _always_ define a typedef for structures.  The original
   struct label starts with '_T'.

   This convention is a famous one from the Pascal world.


   Use native C or Pascal types as little as possible!  Rely on the
   internally defined equivalent types instead.  For example, not all
   compilers agree on the sign of 'char', the size of 'int' is
   platform-specific, etc.

   The Types component provides equivalents of the most common types,
   like 'Short', 'UShort', etc.  Using the internal types guarantees
   that you won't need to replace every occurrence of 'short' or
   whatever when compiling on a weird platform or with a weird
   compiler, and there are many more of those than you might think...

  d. Functions:  

   The name of a function should always begin with a capital letter, as
   lowercase first letters are reserved for variables.  The name of a
   function should be, again, _expressive_!  Never hesitate to put long
   function names in your code: it will make the code much more
   readable.

   Expressive doesn't necessarily imply long though; for instance,
   reading bytes, shorts and longs from the file stream is performed
   using the following functions defined in the File component:

     Get_Byte, Get_Short, Get_UShort, Get_Long, etc.

   which is somewhat more readable than:

     cget, sget, usget, lget, etc.

  e. Variables:

   Variable names should always begin with a lowercase letter, since
   lowercase first letters are reserved for variables in this
   convention, as explained above.  You're still welcome to use long
   and expressive variable names.

   Something like 'numP' can express a number of pixels, pigs,
   pancakes, and much more...  Something like 'num_points' won't.

   Today, we're still using short variable labels in some parts of
   the library.  We're working on removing them, however...

   As a side note, a field name is a variable name too.  There are
   exceptions to the first-lowercase-letter rule, but these are only
   related to fields within the structures defined by the TrueType
   specification (well, at least it _should_ be that way).


2. Declarations & Statements:

  a. Columning:

  Try to align declarations and assignments in columns, when it makes
  sense.  For example (taken from ttraster.c):

  struct _TProfile
  {                                                                     
    Int        flow;        /* Profile orientation : Asc/Descending     */
    Int        height;      /* profile's height in scanlines            */
    Int        start;       /* profile's start scanline                 */
    ULong      offset;      /* offset of profile's data in render pool  */
    PProfile   link;        /* link to next profile                     */
    Int        index;       /* index of profile's entry in trace table  */
    Int        count_lines; /* count of lines having to be drawn        */
    Int        start_line;  /* lines to be rendered before this profile */
    PTraceRec  trace;       /* pointer to profile's current trace table */
  };
 
  instead of

  struct _TProfile {
    Int flow;           /* Profile orientation : Asc/Descending     */
    Int height;         /* profile's height in scanlines            */
    Int start;          /* profile's start scanline                 */
    ULong offset;       /* offset of profile's data in render pool  */
    PProfile link;      /* link to next profile                     */
    Int index;          /* index of profile's entry in trace table  */
    Int count_lines;    /* count of lines having to be drawn        */
    Int start_line;     /* lines to be rendered before this profile */
    PTraceRec  trace;   /* pointer to profile's current trace table */
  };

  This is because you are usually more interested in the variable and
  its function than in its type.

  Or:

    x   = i + 1;
    y  += j;
    min = 100;

  instead of

    x=i+1;
    y+=j;
    min=100;

  And don't hesitate to separate blocks of declarations with newlines
  to "distinguish" logical sections.

  E.g., taken from tttables.c, in the declarations of the CMap
  loader:

    long             n, num_SH;
    unsigned short   u;
    long             off;
    unsigned short   l;
    long             num_Seg;
    unsigned short*  glArray;
    long             table_start;
    int              limit, i;

    TCMapDir         cmap_dir;
    TCMapDirEntry    entry_;
    PCMapTable       Plcmt;
    PCMap2SubHeader  Plcmsub;
    PCMap4           Plcm4;
    PCMap4Segment    segments;

  instead of

    long n, num_SH;
    unsigned short u;
    long off;
    unsigned short l;
    long num_Seg;
    unsigned short *glArray;
    long table_start;
    int limit, i;

    TCMapDir cmap_dir;
    TCMapDirEntry entry_;
    PCMapTable Plcmt;
    PCMap2SubHeader Plcmsub;
    PCMap4 Plcm4;
    PCMap4Segment segments;

  b. Aliases and the 'with' clause:

   The Pascal language comes with a very handy 'with' clause that is
   often used when dealing with the fields of a single record.  The
   following Pascal source extract

    with table[incredibly_long_index] do
    begin
      x := some_x;
      y := some_y;
      z := wathever_the_hell;
    end;

  is usually translated to:

    table[incredibly_long_index].x = some_x;
    table[incredibly_long_index].y = some_y;
    table[incredibly_long_index].z = wathever_the_hell;

  When a lot of fields are involved, it is usually helpful to define
  an 'alias' for the record, like in:

    alias = table + incredibly_long_index;

    alias->x = some_x;
    alias->y = some_y;
    alias->z = wathever_the_hell;

  which gives a clearer source code, and eases the compiler's
  optimization work.

  Although the use of aliases is not yet standardized in the current
  library source, it is useful to follow the following rule:

  - avoid giving an alias a meaningless or cryptic name, like:

    TFooRecord  tfr;
    ....
    [lots of lines snipped]
    ....

    tfr = weird_table + weird_index;

    ...

    tfr->num = n;  

    Several lines after its declaration, it is hard to guess what
    'tfr' stands for, even if it is an extreme contraction of one
    particular type name.

    Something like 'cur_record' or 'alias_cmap' is better.  The current
    source also uses a 'Pl' prefix for such aliases (as in Pointer to
    Local alias), but this practice is _not_ encouraged.  If you want to
    use prefixes, at least use 'loc_', 'cur_' or 'al_', followed by a
    descriptive name.

    Or simply use a local variable with a semi-expressive name:

    {
      PHorizontalHeader  hheader;
      PVerticalHeader    vheader;

      hheader = instance->fontRes->horizontalHeader;
      vheader = instance->fontRes->verticalHeader;

      hheader->foo = bar;
      vheader->foo = bar2;
      ...
    }

    which is much better than:

    {
      PHorizontalHeader thh;
      PVerticalHeader tvh;

      thh = instance->fontRes->horizontalHeader;
      tvh = instance->fontRes->verticalHeader;

      thh->foo = bar;
      tvh->foo = bar2;
      ...
    }

    or:

    {
      PHorizontalHeader Plhhead;
      PVerticalHeader Plvhead;

      Plhhead = instance->fontRes->horizontalHeader;
      Plvhead = instance->fontRes->verticalHeader;

      Plhhead->foo = bar;
      Plvhead->foo = bar2;
      ...
    }


3. Blocks:

  Block separation is done with '{' and '}'.  We do not use the K&R
  convention, which is only really useful with extensive use of tabs.
  The '{' and its corresponding '}' should always be in the same
  column.  This makes it easier to separate a block from the rest of
  the source, and it helps your _brain_ match the braces easily (ask
  any Lisp programmer on the topic!).
  
  Use 2 spaces per indentation level.

  Never use tabs in your code; their widths may vary between editors
  and systems.

  Example:

    if (condition_test) {
            waow mamma;
            I'm doing K&R format;
            just like the Linux kernel;
    } else {
            This test failed poorly;
    }

  is _OUT_!


    if (condition_test)
    {
      This code isn't stuck to the condition;
      read it on paper, you'll find it more;
      pleasant to the eye;
    }
    else
    {
      Of course, this is a matter of taste;
      That's just the way it is in this convention;
      and you should follow it to be homogeneous with;
      the rest of the FreeType code;
    }

  is _IN_!


4. Macros:

  Macros should be made of uppercase letters.  When a macro label is
  forged from several words, it is possible to uppercase only the
  first word, using an underscore to separate the nouns.  This is used
  in tttables.c and ttfile.c with macros like:

    ACCESS_Frame, GET_UShort, CUR_Stream

  The role of the macros used throughout the engine is explained later
  in this document.



II. Usage conventions:


1. Error Handling:

  Within the engine, functions should return a boolean that indicates
  the success or failure of the call.  Success is indicated by
  returning TRUE, failure by returning FALSE.

  To improve readability, two macros called SUCCESS and FAILURE are
  defined in the Types component.  Developers should use them when
  returning the error state of a function.

  The test condition for failure is then something like:

        ...

          if ( !Function( args ) )
          {
            /* The call has failed */
            free( some previously allocated memory );
            return FAILURE;
          }
        
          return SUCCESS;
        }

  When an error code must be returned, it should be stored in the
  global variable 'Error' defined by the Error component in the
  thread-safe build, and in an instance-specific error field in the
  re-entrant build.

  Note that an ERROR or CUR_Error macro is defined in the components to
  automatically set the right error variable, whatever the build.
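  The following is a minimal sketch of these conventions in a
  thread-safe (global error variable) build.  The numeric values of
  SUCCESS and FAILURE, the error codes, and the 'Alloc_Two_Arrays'
  function are assumptions made for the example, not the actual
  contents of the Types or Error components.

```c
#include <assert.h>
#include <stdlib.h>

typedef int    Bool;
typedef short  Short;
typedef long   Long;

#define TRUE     1
#define FALSE    0

#define SUCCESS  TRUE      /* assumed values */
#define FAILURE  FALSE

/* Hypothetical error codes and global error variable, standing in
   for the Error component of a thread-safe build. */
#define TT_Err_Ok             0
#define TT_Err_Invalid_Size   1
#define TT_Err_Out_Of_Memory  2

int  Error = TT_Err_Ok;

/* Thread-safe flavour: set the global.  A re-entrant build would
   write to an instance field here instead. */
#define ERROR( code )  ( Error = (code) )

/* Allocates two parallel arrays.  On failure, frees whatever was
   already allocated, records an error code and returns FAILURE. */
Bool  Alloc_Two_Arrays( Long  count, Short**  first, Long**  second )
{
  assert( first && second );

  if ( count <= 0 )
  {
    ERROR( TT_Err_Invalid_Size );
    return FAILURE;
  }

  *first = (Short*)malloc( (size_t)count * sizeof ( Short ) );
  if ( !*first )
  {
    ERROR( TT_Err_Out_Of_Memory );
    return FAILURE;
  }

  *second = (Long*)malloc( (size_t)count * sizeof ( Long ) );
  if ( !*second )
  {
    free( *first );          /* undo the previous allocation */
    *first = NULL;
    ERROR( TT_Err_Out_Of_Memory );
    return FAILURE;
  }

  return SUCCESS;
}
```

  The caller only tests the boolean result; it reads 'Error' when it
  needs to know why the call failed.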


2. Font File I/O:

  a. Streams:

    The engine uses 'streams' to access the font files.  A stream is a
    structure defined in the File component containing information
    used to access files through a system-specific i/o library.

    The current implementation of the File component uses the ANSI libc
    i/o functions.  However, to allow embedding in lightweight systems
    that lack a complete libc, the component can be re-implemented for a
    specific system or OS, using its native system calls.
    
    A stream is of type 'TT_Stream' defined in the FreeType interface
    component.  The type is (void*) but actually points to a structure
    defined within the File component.

    A stream is created, managed and closed through the interface of
    the File component.  Several implementations of the same component
    can co-exist, each taking advantage of specific system features
    ('ttfile2.c' uses memory-mapped files, for instance), as long as
    they respect the interface.
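    A minimal sketch of how a libc-based File component might hide its
    stream structure behind the (void*) 'TT_Stream' handle follows.
    The 'TStreamRec' layout and the 'Open_Stream' and 'Close_Stream'
    functions are hypothetical; the real component's interface may
    differ.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* What client code sees: an opaque handle. */
typedef void*  TT_Stream;

/* Private to this hypothetical libc-based File body.  A memory-mapped
   body would keep a base address and a size here instead. */
typedef struct  _TStreamRec
{
  FILE*  file;
  long   size;     /* file size in bytes */
} TStreamRec;

typedef TStreamRec*  PStreamRec;

/* Opens 'filename' for reading; returns nonzero on success. */
int  Open_Stream( const char*  filename, TT_Stream*  stream )
{
  PStreamRec  rec;

  assert( filename && stream );

  rec = (PStreamRec)malloc( sizeof ( TStreamRec ) );
  if ( !rec )
    return 0;

  rec->file = fopen( filename, "rb" );
  if ( !rec->file )
  {
    free( rec );
    return 0;
  }

  /* Record the file size, so frame accesses can be bounds-checked. */
  if ( fseek( rec->file, 0L, SEEK_END ) != 0     ||
       ( rec->size = ftell( rec->file ) ) < 0    ||
       fseek( rec->file, 0L, SEEK_SET ) != 0     )
  {
    fclose( rec->file );
    free( rec );
    return 0;
  }

  *stream = (TT_Stream)rec;   /* clients never see the structure */
  return 1;
}

void  Close_Stream( TT_Stream  stream )
{
  PStreamRec  rec = (PStreamRec)stream;

  if ( rec )
  {
    fclose( rec->file );
    free( rec );
  }
}
```

    Because clients only ever hold a 'TT_Stream', a memory-mapped body
    could store entirely different fields in its own record without
    any client code changing.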

  b. Frames:

    TrueType files store their data in big-endian format, which implies
    that reading shorts or longs from a font file may require byte-order
    conversions on little-endian machines.  To make read errors easy to
    detect and allow simple conversion calls or macros, the engine
    accesses font files using 'frames'.

    A frame is simply a sequence of successive bytes taken from the
    input file at the current position.  A frame is pre-loaded in
    memory by a 'TT_Access_Frame' call of the File component.

    It is then possible to read all sizes of data through the Get_xxx
    functions, like Get_Byte, Get_Short, Get_UShort, etc.

    When all the needed data has been read, the frame can be released
    by a call to 'TT_Forget_Frame'.

    Frames have several benefits:

      Consider these two approaches at extracting values:

        if ( !Read_Short( &var1 ) ||
             !Read_Long ( &var2 ) ||
             !Read_Long ( &var3 ) ||
             !Read_Short( &var4 ) )

          return FAILURE;

      and

        if ( !TT_Access_Frame( 12L ) ) /* Read the next 12 bytes */
          return FAILURE;              /* The Frame could not be read */

        var1 = Get_Short();   /* extract values from the frame */
        var2 = Get_Long();
        var3 = Get_Long();
        var4 = Get_Short();

        TT_Forget_Frame();   /* release the frame */

      In the first case, you check the read on every value extracted,
      which unnecessarily increases the size of the generated code.
      Moreover, you must make sure that var1 and var4 are shorts, and
      var2 and var3 are longs, if you want to avoid bugs and/or compiler
      warnings.

      In the second case, you perform only one check for the read, and
      exit immediately on failure.  Then the values are extracted from
      the frame, as the result of function calls.  This means that you
      can use automatic type conversion; there is no problem if var1
      and var4 are longs, unlike previously.

      On big-endian machines, the Get_xxx functions could also be
      simple macros that merely peek at the values directly in the
      frame, which speeds up and simplifies the generated code!

      And finally, frames are ideal when you're using memory-mapped
      files, as the frame is not really 'pre-loaded' and never uses any
      space.

      IMPORTANT    You CANNOT nest several frame accesses.  There is
                   only one available at a time for a specific
                   instance.

                   It is also the programmer's responsibility never to
                   extract more data than was pre-loaded in the frame!
                   (But you usually know how many values you want to
                   extract from the file before doing so.)
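    The frame mechanism can be sketched as follows.  This is not the
    File component's code: to keep the example self-contained, the
    'file' is an in-memory buffer installed by a hypothetical
    'Set_Source' function, and the frame is copied with malloc() and
    memcpy().  It shows the points made above: a single check in
    'TT_Access_Frame', big-endian decoding in the Get_xxx functions
    that works on any host, and the no-nesting and no-over-read rules
    enforced with assert().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint8_t   Byte;
typedef int16_t   Short;
typedef uint16_t  UShort;
typedef int32_t   Long;
typedef int       Bool;

/* Source "file": an in-memory buffer, so the sketch needs no font. */
static const Byte*  file_base;
static long         file_size;
static long         file_pos;      /* current position in the file */

/* The single frame of this (non-re-entrant) sketch. */
static Byte*        frame_base;    /* NULL when no frame is active */
static long         frame_size;
static long         frame_cursor;  /* next byte to extract         */

void  Set_Source( const Byte*  base, long  size )
{
  file_base = base;
  file_size = size;
  file_pos  = 0;
}

/* Pre-loads the next 'size' bytes; returns 0 if they can't be read. */
Bool  TT_Access_Frame( long  size )
{
  assert( frame_base == NULL );    /* frames cannot be nested */

  if ( size <= 0 || size > file_size - file_pos )
    return 0;

  frame_base = (Byte*)malloc( (size_t)size );
  if ( !frame_base )
    return 0;

  memcpy( frame_base, file_base + file_pos, (size_t)size );
  file_pos    += size;
  frame_size   = size;
  frame_cursor = 0;
  return 1;
}

void  TT_Forget_Frame( void )
{
  free( frame_base );
  frame_base = NULL;
}

/* TrueType data is big-endian (most significant byte first), so
   assembling values byte by byte works on any host byte order. */
Byte  Get_Byte( void )
{
  assert( frame_cursor < frame_size );   /* never over-read */
  return frame_base[frame_cursor++];
}

UShort  Get_UShort( void )
{
  UShort  hi = Get_Byte();
  UShort  lo = Get_Byte();

  return (UShort)( ( hi << 8 ) | lo );
}

Short  Get_Short( void )
{
  return (Short)Get_UShort();   /* wraps to negative on 2's complement */
}

Long  Get_Long( void )
{
  uint32_t  value = 0;
  int       i;

  for ( i = 0; i < 4; i++ )
    value = ( value << 8 ) | Get_Byte();

  return (Long)value;
}
```

    With the twelve bytes FF FE 00 01 00 00 FF FF FF FE 12 34 as the
    source, accessing a 12-byte frame and calling Get_Short, Get_Long,
    Get_Long and Get_UShort yields -2, 65536, -2 and 0x1234.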


3. Memory Management:

  Though this will change in the next release, the library currently
  uses a simple growing heap, called the Font_Pool, for all of its
  memory management.  This is limited, but useful for packing several
  arrays together.

  The idea is that the information found in TrueType files is
  sufficient to pre-allocate all the arrays that will be needed later
  for any rendering.  The 'Maximum Profile' table, for example, gives
  many useful maximum sizes for the data found in the corresponding
  file.  It is then possible to pre-allocate a large number of small
  arrays, each with a very distinct purpose, in a single malloc block.
  This greatly simplifies memory management within the library, and
  reduces the risk of memory leaks.

  That's why a growing heap was a good idea to get started with the
  library.  Now that we are reaching 'real' usability and nearing our
  first beta, we have to be able to use real 'malloc' and 'free', or
  whatever memory management the host machine supports.  This implies
  some changes in the current source.

  However, the possibility and idea of packing several structures and
  arrays into a few malloc'ed blocks is still there and will be kept.
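  A growing heap of this kind can be sketched as a simple 'bump'
  allocator: one malloc block, carved into sub-arrays by advancing a
  cursor, and released with a single free().  The 'TFontPool' type,
  the Pool_xxx functions and the 8-byte rounding are assumptions made
  for this sketch, not the actual Font_Pool code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growing-heap record, following the TFoo/PFoo naming. */
typedef struct  _TFontPool
{
  unsigned char*  base;   /* the single malloc block    */
  size_t          size;   /* total bytes in the block   */
  size_t          used;   /* bytes handed out so far    */
} TFontPool;

typedef TFontPool*  PFontPool;

/* Allocates the whole pool at once; returns nonzero on success. */
int  Pool_Create( PFontPool  pool, size_t  size )
{
  assert( pool );

  pool->base = (unsigned char*)malloc( size );
  pool->size = pool->base ? size : 0;
  pool->used = 0;
  return pool->base != NULL;
}

/* Carves 'size' bytes from the pool, or returns NULL when it is
   exhausted.  Sizes are rounded up to a multiple of 8 so that each
   sub-array stays aligned (8 is an assumption, enough for the
   integer types used here). */
void*  Pool_Alloc( PFontPool  pool, size_t  size )
{
  size_t  rounded = ( size + 7 ) & ~(size_t)7;
  void*   block;

  if ( !pool->base || rounded > pool->size - pool->used )
    return NULL;

  block       = pool->base + pool->used;
  pool->used += rounded;
  return block;
}

/* Releases every sub-array with a single free(). */
void  Pool_Done( PFontPool  pool )
{
  free( pool->base );
  pool->base = NULL;
  pool->size = 0;
  pool->used = 0;
}
```

  The idea would be to compute the pool size up front from the 'maxp'
  table (for instance from its maxPoints and maxContours fields), so
  that loading a glyph never needs a separate allocation.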


4. Support for multi-threaded environments:

  We have recently added support for multi-threaded environments.  This
  means that parts of the library have been made thread-safe, and use
  mutexes to synchronize access to some important shared variables.

  We have thus added a Mutex component.  However, its current
  implementation is completely empty: its functions do nothing.  This
  means that you should currently use this library with a single
  thread.  There is no portable standard for mutexes, and the Mutex
  component must be re-implemented for each platform.
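  For instance, on a POSIX system the Mutex component could be
  re-implemented on top of pthreads, as sketched below.  The 'TMutex'
  type, the Mutex_xxx function names and the 'open_faces' counter are
  hypothetical; a platform without threads would simply keep empty
  function bodies.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical Mutex component record for POSIX systems. */
typedef struct  _TMutex
{
  pthread_mutex_t  handle;
} TMutex;

typedef TMutex*  PMutex;

/* Returns nonzero on success. */
int  Mutex_Create( PMutex  mutex )
{
  return pthread_mutex_init( &mutex->handle, NULL ) == 0;
}

void  Mutex_Lock( PMutex  mutex )
{
  int  error = pthread_mutex_lock( &mutex->handle );

  assert( error == 0 );
  (void)error;            /* avoid 'unused' warnings under NDEBUG */
}

void  Mutex_Release( PMutex  mutex )
{
  int  error = pthread_mutex_unlock( &mutex->handle );

  assert( error == 0 );
  (void)error;
}

void  Mutex_Delete( PMutex  mutex )
{
  pthread_mutex_destroy( &mutex->handle );
}

/* Example: a shared variable protected by the mutex. */
static TMutex  counter_lock;
static long    open_faces = 0;

void  Increment_Open_Faces( void )
{
  Mutex_Lock( &counter_lock );
  open_faces++;                  /* only one thread at a time here */
  Mutex_Release( &counter_lock );
}
```

  Code that uses the component only calls Mutex_Lock and
  Mutex_Release, so swapping the pthread body for a Win32 or empty one
  would not affect the rest of the library.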


5. Support for re-entrancy:

  We have also added support for re-entrancy, which has led us to
  create and use various macros within the main components.

  In a re-entrant implementation, global variables are avoided, which
  means that _all_ state must be kept in instance-specific structures.
  This implies that a pointer to one such instance must be passed to
  _every_ function that might read or change the state.  For instance,
  you would typically replace something like:

        if ( !TT_Access_Frame( 12L ) ) /* Read the next 12 bytes */
          return FAILURE;              /* The Frame could not be read */

        var1 = Get_Short();   /* extract values from the frame */
        var2 = Get_Long ();
        var3 = Get_Long ();
        var4 = Get_Short();

        TT_Forget_Frame();   /* release the frame */

  (as used in a thread-safe build) with the following, in a re-entrant
  build:

        if ( !TT_Access_Frame( instance, 12L ) ) /* Read the next 12 bytes */
          return FAILURE;              /* The Frame could not be read */

        var1 = Get_Short( instance );   /* extract values from the frame */
        var2 = Get_Long ( instance );
        var3 = Get_Long ( instance );
        var4 = Get_Short( instance );

        TT_Forget_Frame( instance );   /* release the frame */

  Without macros, a second copy of the source code would be needed,
  which is often less readable than the original.  Moreover, the
  chances of keeping both sources synchronized are minimal without
  _serious_ care from the maintainers.

  We thus define macros like:

  #ifdef TT_CONFIG_REENTRANT  /* for re-entrant builds */

    #define  ACCESS_Frame( size )  TT_Access_Frame( instance, size )
    #define  FORGET_Frame()        TT_Forget_Frame( instance )

    #define  GET_Short()   Get_Short( instance )
    #define  GET_Long()    Get_Long( instance )
  
  #else                       /* for thread-safe builds */

    #define  ACCESS_Frame( size )  TT_Access_Frame(size)
    #define  FORGET_Frame()        TT_Forget_Frame()

    #define  GET_Short()   Get_Short()
    #define  GET_Long()    Get_Long()
  
  #endif

  And the previous code becomes:

        if ( !ACCESS_Frame( 12L ) ) /* Read the next 12 bytes */
          return FAILURE;           /* The Frame could not be read */

        var1 = GET_Short();   /* extract values from the frame */
        var2 = GET_Long ();
        var3 = GET_Long ();
        var4 = GET_Short();

        FORGET_Frame();   /* release the frame */

  Here, we have lost little readability, and both implementations
  share a _single_ source code, which is a considerable gain in terms
  of maintenance.

  However, this doesn't come for free: it requires that all instance
  parameters have the same name in every function that uses the macros
  (in this example, the name is 'instance').

  Moreover, we use macros in the function declarations:

    #ifdef TT_CONFIG_REENTRANT   /* for re-entrant builds  */
  
       #define INSTANCE_OPS    PInstance_Record  instance,
       #define INSTANCE_OP     PInstance_Record  instance
       #define INSTANCE_ARGS   instance,
       #define INSTANCE_ARG    instance
  
    #else                        /* for thread-safe builds */
  
       #define INSTANCE_OPS    /* void */
       #define INSTANCE_OP     /* void */
       #define INSTANCE_ARGS   /* void */
       #define INSTANCE_ARG    /* void */
  
    #endif

  with

    Bool  Load_That_Table( INSTANCE_OPS arg1_type  arg1,
                                        arg2_type  arg2 )
    {

    }
  
    Short  Get_Short( INSTANCE_OP )
    {
      ... read a short from the current instance frame
    }

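  where the _OPS and _OP macros declare the instance parameter in
  function definitions, and the _ARGS and _ARG macros pass it in
  function calls.  The forms ending in 'S' include a trailing comma,
  for use when other parameters follow.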

  Please read the Tables, Ins and File components for examples of this
  implementation.  This may not be the most secure way to do things,
  but it's the most elegant one we've found to provide both thread-safe
  and re-entrant builds from a single source file.  Any suggestions are
  welcome, though...
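  To see both builds side by side, here is a small self-contained
  sketch.  The layout of 'TInstance_Record', the CUR_Frame and
  CUR_Cursor macros (modelled on names like CUR_Stream mentioned
  earlier), and the 'Sum_Shorts' and 'Sum_Header' functions are
  hypothetical.  Unlike the definitions above, it expands INSTANCE_OP
  to 'void' in the thread-safe build, so that function prototypes stay
  complete ANSI prototypes.

```c
#include <assert.h>
#include <stdint.h>

/* Remove this line to get the thread-safe (global state) build. */
#define TT_CONFIG_REENTRANT

typedef int16_t  Short;
typedef int32_t  Long;

#ifdef TT_CONFIG_REENTRANT    /* for re-entrant builds */

  /* Hypothetical instance record holding all mutable state. */
  typedef struct  _TInstance_Record
  {
    const unsigned char*  frame;    /* current frame bytes */
    Long                  cursor;   /* next byte to read   */
  } TInstance_Record;

  typedef TInstance_Record*  PInstance_Record;

  #define INSTANCE_OPS   PInstance_Record  instance,
  #define INSTANCE_OP    PInstance_Record  instance
  #define INSTANCE_ARGS  instance,
  #define INSTANCE_ARG   instance

  #define CUR_Frame      ( instance->frame )
  #define CUR_Cursor     ( instance->cursor )

#else                         /* for thread-safe builds */

  static const unsigned char*  frame;    /* one global state */
  static Long                  cursor;

  #define INSTANCE_OPS   /* void */
  #define INSTANCE_OP    void
  #define INSTANCE_ARGS  /* void */
  #define INSTANCE_ARG   /* void */

  #define CUR_Frame      frame
  #define CUR_Cursor     cursor

#endif

/* From here on, the source is identical for both builds. */

Short  Get_Short( INSTANCE_OP )
{
  Short  value;

  assert( CUR_Frame );
  value = (Short)( ( CUR_Frame[CUR_Cursor] << 8 ) |
                     CUR_Frame[CUR_Cursor + 1] );
  CUR_Cursor += 2;
  return value;
}

#define GET_Short()  Get_Short( INSTANCE_ARG )

/* Reads 'count' big-endian shorts and returns their sum. */
Long  Sum_Shorts( INSTANCE_OPS Long  count )
{
  Long  sum = 0;

  while ( count-- > 0 )
    sum += GET_Short();

  return sum;
}

/* Callers forward their instance with INSTANCE_ARGS. */
Long  Sum_Header( INSTANCE_OP )
{
  return Sum_Shorts( INSTANCE_ARGS 2 );
}
```

  Removing the TT_CONFIG_REENTRANT definition turns every function into
  its parameterless, global-state form without touching a single
  function body.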


------- current end of document, more will follow later -------------
