


CCF(1)                   USER COMMANDS                     CCF(1)



NAME
     ccf - a code conversion filter for the Chinese coding systems
     widely used in Taiwan and for the GB code widely used in
     mainland China. Support for HZ (a 7-bit encoding for GB) and
     for one of its extensions, the B5E3 (B5Encode3) format for
     7-bit Big5 character encoding, is also included.

SYNOPSIS
     ccf -ST [ (SrcFile|-) [ (TrgFile|-) ] ]

DESCRIPTION
     ccf reads a source file SrcFile, or the standard input (-),
     encoded in a particular source code S, and converts it to
     another target code T, writing the result to a target file
     TrgFile or to the standard output (-). The standard input is
     read if no SrcFile is given. If TrgFile is not specified, or
     is `-', the output is written to the standard output. The
     filename `-' means the standard input or the standard output.
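
     The argument defaults described above can be sketched as a
     small shell function. This is only an illustration of the
     documented rules; the name resolve_io is hypothetical and is
     not part of ccf.

```shell
# Hypothetical helper (not part of ccf): report which input and output
# ccf would use for the given SrcFile/TrgFile arguments, following the
# rules in DESCRIPTION: a missing argument or `-' selects stdin/stdout.
resolve_io() {
    src=${1:--}
    trg=${2:--}
    if [ "$src" = "-" ]; then src="stdin"; fi
    if [ "$trg" = "-" ]; then trg="stdout"; fi
    printf '%s -> %s\n' "$src" "$trg"
}

resolve_io                   # stdin -> stdout
resolve_io eucfile           # eucfile -> stdout
resolve_io - big5file        # stdin -> big5file
resolve_io eucfile big5file  # eucfile -> big5file
```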

OPTIONS
     -ST  Use `S' to specify the type of the input code, and  use
          `T' for the type of the output code.

          The type of a Chinese code is specified by one of the
          following 1-character code designators (the characters
          are case sensitive):

               b - BIG5 code

               i - IBM5550 code

               n - CNS-11643/86 (internal code)

               e - EUC (Extended UNIX Code)

               t - TCA (Taipei Computer Association) code

               l - Telegram code

               H - IBM Host code

               p - CNS-11643/86 (7-bit interchange protocol code)

               7 - BIG5 7-bit ASCII exchange code (in B5E3
               format)

               g - GB code

               h - HZ 7-bit ASCII exchange code (for GB)

               o - use optimal conversion mode for producing
               (possibly) mixed HZ+B5E3 ASCII exchange codes when
               cross-converting from Big5 (or other Taiwanese
               codes) to HZ, or from GB to B5E3. (See the
               "WHAT'S NEW (Rev. 1.53)" section.)

     SrcFile
          The source file (or stdin if `-' is used.)

     TrgFile
          The target file (or stdout if `-' is used.)
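
     Because the designators are case sensitive, a wrapper script
     may want to check the option before invoking ccf. The sketch
     below is a minimal illustration; valid_ccf_option is a
     hypothetical name, and restricting `o' to the target position
     is an assumption drawn from the -Xo form described in this
     page, not a documented rule.

```shell
# Hypothetical helper (not part of ccf): accept only a `-ST' option
# built from the designators listed above. Assumption: `o' is allowed
# only in the target position, as in the -Xo options.
valid_ccf_option() {
    case $1 in
        -[binetlHp7gh][binetlHp7gho]) return 0 ;;
        *) return 1 ;;
    esac
}

for opt in -eb -bo -hH -EB -ob; do
    if valid_ccf_option "$opt"; then
        echo "$opt: ok"
    else
        echo "$opt: rejected"
    fi
done
```

     With this pattern, `-EB' is rejected because the designators
     `E' and `B' do not exist in upper case.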

CHANGES IN COMMAND LINE OPTIONS
     The command line options for code set designation are now
     case sensitive, because `h' is used for HZ and `H' for the
     IBM Host code.

7-BIT CHINESE CODE SUPPORT
     Currently, two 7-bit encoding methods are supported for
     Chinese characters: the HZ encoding method for GB and the
     B5E3 (B5Encode3) encoding method for Big5. B5E3 is an
     extension of the HZ encoding, so its decoder can decode a
     mixed HZ+B5E3 document as well as pure HZ or pure B5E3
     documents.
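
     To make the 7-bit idea concrete, the sketch below wraps one
     GB character in HZ form. It relies on the published HZ
     convention (GB text enclosed between `~{' and `~}', with the
     high bit of each byte cleared), not on ccf itself. The
     function name gb_to_hz is hypothetical, and the B5E3 format
     is not covered here; see the documents of this package for
     its specification.

```shell
# Hypothetical helper (not part of ccf): print the HZ form of one GB
# character given as two hexadecimal byte values, e.g. D6 D0.
gb_to_hz() {
    hi=$(( 0x$1 & 0x7F ))   # clear the high bit of the first byte
    lo=$(( 0x$2 & 0x7F ))   # clear the high bit of the second byte
    # Emit both bytes as octal escapes between the HZ shift sequences.
    printf '~{%b%b~}\n' "\\0$(printf '%o' "$hi")" "\\0$(printf '%o' "$lo")"
}

gb_to_hz D6 D0   # prints ~{VP~}
```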

     The ccf encoder also provides  cross  encoding  from  either
     Big5  or GB to either B5E3 or HZ, or a mixture of B5E3+HZ in
     an optimal encoding mode (using  the  -Xo  options  for  any
     input code X described above).

     Other Taiwanese codes are currently translated to B5E3 or HZ
     by first converting them to Big5 and then performing
     B5ENCODE. (Decoding is done in the reverse direction.)
     Therefore, it is possible, for instance, to produce an HZ
     file from an EUC file produced by a Unix workstation or from
     an IBM5550 file produced by an OS/2-T PC.



WHAT'S NEW (Rev. 1.50)
     Compared with v1.4, ccf Rev. 1.50 adds a 7-bit encoding
     method for code sets that have essentially the same character
     set as Big5. It is intended for mail and other applications
     that lack an 8-bit clean communication path for sending Big5
     characters directly, or that strip the high bits of Chinese
     characters. See the documents of this package for the
     specification of the encoding scheme.

     The decoding mechanism also allows you to decode a mixed
     ASCII/BIG5/HZ/B5E3 (7-bit Big5) document (or a pure
     HZ/B5Encode3 document), since the B5E3 encoding method is an
     extension of the HZ encoding scheme. Other such extensions
     include HZ-2, HZ+S (HZ+), B5E1 and B5E2.

WHAT'S NEW (Rev. 1.53)
     In Rev. 1.53, the code sets are further augmented to include
     the GB code, which is widely used in mainland China.
     Therefore, almost all major Chinese coding systems used in
     Taiwan and mainland China, whether 7-bit or 8-bit, primitive
     internal codes, ISO-type exchange codes or HZ-style exchange
     codes, are now cross-translatable.

     For cross translation between the Big5/B5E3 and GB/HZ
     families, an `optimal encoding mode' for 7-bit encoding is
     supported. This mode of translation follows the idea of
     Ricky Yeung (Ricky.Yeung@Eng.Sun.Com) to leave untranslatable
     characters in their native encoded form so as to avoid
     information loss. In other words, such untranslatable
     characters are not translated to a default character (like a
     square box); instead, they are translated into B5E3 (if
     their source is Big5 or another Taiwanese code) or HZ (if
     their source is GB).

     With this translation convention, all characters will be
     translated back properly for a Big5 terminal if the original
     document was produced using the `-bo' (Big5 to HZ optimal
     mode) option (or most -Xo options, where X is a Taiwanese
     code); similarly, a GB terminal user will see all characters
     translated back properly for a document prepared with the
     `-ho' (HZ to B5E3 optimal mode) option.

     (Note: A Big5 terminal user still cannot read the
     untranslatable characters in a `-ho' document; likewise, a
     GB terminal user cannot read the untranslatable characters in
     a `-bo' document with a simple terminal. This situation
     could be resolved by a terminal emulator or a multi-lingual
     editor that supports multiple font sets.)

EXAMPLES
     Convert an EUC file eucfile(.Z) (compressed  or  not)  to  a
     Big5 file big5file for a Big5 terminal user:


               ccf -eb eucfile big5file

               ccf -eb eucfile | more


               zcat eucfile.Z | ccf -eb | more

               zcat eucfile.Z | ccf -eb - big5file

     Translate to/from 7-bit Big5 in B5Encode3 Format.


               ccf -b7 big5file > big5file.be3

               (== `b5encode')


               ccf -7b file.hz > file.b5

               (== `hz2big5' or `b5decode')

               ccf -g7 file.gb file.be3


               ccf -7n file.asc+hz+be3 > file.cns

               ccf -7n file.asc+hz+be3 | ccf -nb | more

     Prepare an HZ document from a Big5 (or GB, EUC) file:


               ccf -bh file.b5 file.hz

               ccf -bo file.b5 file.hzo


               ccf -gh file.gb file.hz

               ccf -eh file.euc file.hz

AUTHOR AND CONTRIBUTORS
     Jing-Shin  Chang  (shin@hermes.ee.nthu.edu.tw),   NLP   Lab,
     Department  of  Electrical  Engineering,  National Tsing-Hua
     University, Hsinchu, Taiwan, ROC.

     Dan-Yi Liu, Institute for Information Industry (III),
     Taipei, Taiwan, ROC. (The source code for converting the
     major Chinese codes used in Taiwan was made public by III,
     ROC.)

     Yong-Guang Zhang (ygz@cs.purdue.edu), the author of the
     famous hztty package, kindly provided his code and tables
     for the Big5 to GB conversion functions. (The mapping
     between GB's Japanese symbols and their counterparts in the
     Big5/ETen extension is provided by the author.)

     Ricky Yeung (Ricky.Yeung@Eng.Sun.Com) proposed the encoding
     convention of keeping untranslatable characters in their
     native (encoded) form to avoid information loss. This idea
     led to the -Xo (optimal encoding mode) option of the current
     implementation.

OTHER INFORMATION
     See the "README.1st" file for this package for more details.

SEE ALSO
     betty(1), b5encode(1), b5decode(1), b5e_format(5)

Sun Release 4.1   Last change: 18 October 1994                  5



