*** INSTALLING ***

Set BINDIR in the Makefile to where you keep your binaries and MANDIR
to where you keep your man pages (in their source form).  (If you're
using RosettaMan with TkMan, BINDIR needs to be a component of your
bin PATH.)  After properly editing the Makefile, type `make install'.
Thereafter (perhaps after a `rehash') type `rman' to invoke RosettaMan.
RosettaMan requires an ANSI C compiler.  To compile on a Macintosh
under MPW, use Makefile.mac.

If you send me bug reports and/or suggestions for new features,
include the version of RosettaMan (available by typing `rman -v').
RosettaMan doesn't parse every aspect of every man page perfectly, but
if it blows up spectacularly where it doesn't seem like it should, you
can send me the man page (or a representative man page if it blows up
on a class of man pages) in BOTH: (1) [tn]roff source form, from
.../man/manX and (2) formatted form (as formatted by `nroff -man'),
uuencoded to preserve the control characters, from .../man/catX.

The home location for RosettaMan is ftp.cs.berkeley.edu:
/ucb/people/phelps/tcltk/rman.tar.Z (this is a softlink to the latest,
numbered version).  If you discover a bug and you obtained RosettaMan
at some other site, first grab it from this one to see if the problem
has been fixed.

Be sure to look in the contrib directory for WWW server interfaces,
a batch converter, and a wrapper for SCO.

--------------------------------------------------


*** COPYRIGHT ***

Copyright (c) 1993-1996  T.A. Phelps  (phelps@cs.Berkeley.EDU)
All Rights Reserved.

Permission to use, copy, modify, and distribute this software and its
documentation for educational, research and non-profit purposes, 
without fee, and without a written agreement is hereby granted, 
provided that the above copyright notice and the following three 
paragraphs appear in all copies.  

Permission to incorporate this software into commercial products may 
be obtained from the Office of Technology Licensing, 2150 Shattuck 
Avenue, Suite 510, Berkeley, CA  94704. 

IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY
FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES
ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF
THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.

THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE
PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND THE UNIVERSITY OF
CALIFORNIA HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES,
ENHANCEMENTS, OR MODIFICATIONS.

--------------------------------------------------


*** SYNOPSIS ***

RosettaMan is a filter for UNIX manual pages.  It takes as input man
pages formatted for a variety of UNIX flavors (not [tn]roff source)
and produces as output a variety of file formats.  Currently
RosettaMan accepts man pages as formatted by the following flavors of
UNIX: Hewlett-Packard HP-UX, AT&T System V, SunOS, Sun Solaris, OSF/1,
DEC Ultrix, SGI IRIX, Linux, SCO; and produces output for the following
formats: printable ASCII only (stripping page headers and footers),
section and subsection headers only, TkMan, [tn]roff, Ensemble, RTF,
SGML (soon--I finally found a DTD), HTML, MIME, LaTeX, LaTeX 2e, Perl 5's pod.

RosettaMan improves on other man page filters in several ways: (1) its
analysis recognizes the structural pieces of man pages, enabling high
quality output, (2) its modular structure permits easy augmentation of
output formats, (3) it accepts man pages formatted with the varient
macros of many different flavors of UNIX, and (4) it doesn't require
modification or cooperation with any other program.

RosettaMan is a rewrite of TkMan's man page filter, called bs2tk.  (If
you haven't heard about TkMan, a hypertext man page browser, you
should grab it via anonymous ftp from ftp.cs.berkeley.edu:
/ucb/people/phelps/tkman.tar.Z.)  Whereas bs2tk generated output only for
TkMan, RosettaMan generalizes the process so that the analysis can be
leveraged to new output formats.  A single analysis engine recognizes
section heads, subsection heads, body text, lists, references to other
man pages, boldface, italics, bold italics, special characters (like
bullets), tables (to a degree) and strips out page headers and
footers.  The engine sends signals to the selected output functions so
that an enhancement in the engine improves the quality of output of
all of them.  Output format functions are easy to add, and thus far
average about about 75 lines of C code each.



*** NOTES ON CURRENT VERSION ***

Help!  I'm looking for people to help with the following projects.
(1) Better RTF output format.  The current one works, but could be
made better.  (2) Roff macros that produce text that is easily
parsable.  RosettaMan handles a great variety, but some things, like
H-P's tables, are intractable.  If you write an output format or
otherwise improve RosettaMan, please send in your code so that I may
share the wealth in future releases.

This version can try to identify tables (turn this on with the -T
switch) by looking for lines with a large amount of interword spacing,
reasoning that this is space between columns of a table.  This
heuristic doesn't always work and sometimes misidentifies ordinary
text as tables.  In general I think it is impossible to perfectly
identify tables from nroff formatted text.  However, I do think the
heuristics can be tuned, so if you have a collection of manual pages
with unrecognized tables, send me the lot, in formatted form (i.e.,
after formatting with nroff -man), and uuencode them to preserve the
control characters.  Better, if you can think of heuristics that
distinguish tables from ordinary text, I'd like to hear them.


Notes for HTML consumers: This filter does real (heuristic)
parsing--no <PRE>!  Man page references are turned into hypertext links.

Several people have extended World Wide Web servers to format man
pages on the fly.  Check the README file in the contrib directory for
a list.
