mTCP HTGet
2011-07-29 Version
Michael Brutman (mbbrutman@gmail.com)


Introduction

  HTGet is a small utility that can fetch files served from HTTP servers.
  It uses HTTP 1.1 which is the current HTTP protocol that enables use with
  virtual hosts.  HTTP 1.0 is supported for compatibility with older
  HTTP servers.

  HTGet was inspired by a version written by Ken Yap (ken@syd.dit.csiro.au)
  in 1997 for the WATTCP stack.  This version is a complete rewrite using
  some of the concepts from the original.


Hardware requirements

  8088 processor or better
  165KB available RAM
  CGA, Monochrome (MDA), EGA or VGA display
  Supported Ethernet card, SLIP or PPP connection
 

Software requirements

  DOS 2.1 or newer
  Packet driver for your Ethernet adapter


Setup instructions

  HTGet is built using the mTCP library.  The setup instructions for
  mTCP can be found in SETUP.TXT.


Using HTGet

  Usage is like this:

    htget [options] <URL>

  Options are:

    -h                       Shows help text
    -v                       Print extra status messages
    -headers                 Fetch only the HTTP headers
    -m <file>                Fetch file only if modified after <file>'s date
    -o <file>                Write content to <file> instead of stdout
    -pass <user:password>    Send authorization
    -10                      Use HTTP 1.0 protocol
    -11                      Use HTTP 1.1 protocol (default)


  The URL format is:

    http://yourservernamehere[:port]/path

  The port parameter is optional - if it is not specified the default is
  port 80.  Authentication information is not included in the URL syntax;
  this is slightly nonstandard but it makes the code simpler.  (See
  the description of the -pass option below for details on how
  authentication is supported.)

  When you run HTGet it will attempt to connect to the HTTP server,
  send headers, and receive content.  If you do not give any options
  the downloaded content is sent to STDOUT.  You can redirect STDOUT to
  a file or use the -o option to specify a filename for HTGet to write
  to directly.

  Using -v gives you extra messages that tell you how the data transfer
  is progressing.  The extra output is sent to STDERR to avoid intefering
  with the content that is being set to STDOUT.

  The -m option checks the modification date of the file that was specified
  with the -o option and sends that date to the server during the request
  phase.  If the content on the server has not been updated since the
  timestamp on the local file the server will return a "304" status code
  and no content.  If the content on the server has been updated you will
  get the new content instead.  This option is useful for when you only
  want to update a file based on if it has been updated on the server.  To
  use it you should have the TZ environment variable set.  (See SNTP.TXT
  for notes on how to use the TZ environment variable.)

  If the content you are requesting is password protected try using the
  -pass option.  The -pass option takes a string in the form of 
  username:password and encodes it using BASE64 before sending it to the
  HTTP server.  The authenication type is known as "BASIC" - it is not
  secure but it is widely supported.  If the HTTP server requires
  something more secure then this option probably will not work.

  The current HTTP protocol version number is 1.1 which is what this
  program is using.  The big improvement that version 1.1 adds is support
  for virtual hosts, which are HTTP servers that share the same IP address.
  Since this program is designed to run on vintage computers it is quite
  possible that you might run into a vintage web server using HTTP 1.0.
  These older servers might not understand the headers used by HTTP 1.1;
  if that happens the request will fail.  Try using the -10 option
  to force the program to fall back to HTTP 1.0 if you suspect you have
  an ancient server that is confused by HTTP 1.1.


Examples

  htget -o google.htm http://www.google.com/

    This downloads the Google search page and writes it to google.htm.

  htget -m -o google.htm http://www.google.com/

    This does the same thing as above, but if the copy on the server has
    not changed since the local copy was written it will not be downloaed
    and rewritten.

  htget -pass john:secret http://someserver.com/protected/secret.file

    This grabs the password protected file "secret.file" using "john" as
    the userid and "secret" as the password.  The userid and password get
    converted to BASE64 before being sent to the server.  If the server
    supports HTTP BASIC authentication you will get the file.

  htget -10 http://yeancient.server.net/readme.txt

    Force HTTP 1.0 mode for a prehistoric server that does not use HTTP 1.1.

  htget -headers http://www.google.com/

    Don't get the content - just look at the headers that would have been
    sent.


When the server sends content ...

  The server can send content even when the server reports a bad return code.
  The content might be a page that describes the error and tells you what to
  do.

  HTGet will not write output if there is a socket error or if the file
  has not been modified and you use the -m switch.  But due to the way HTTP
  works, it will write data if it is given content, even if the content is
  an error message.  So don't overwrite anything important assuming that the
  server will give you a fresh copy - it might get replaced with an error
  message.


Return codes

  The HTTP server always returns a result code which can be used to tell
  you if the request was successful.  Unfortunately, the range of possible
  return codes from the HTTP server is greater than what can be expressed
  in a program exit code so HTTP return codes can not be used as exit
  codes directly.

  It is valuable to be able to determine if a request was successful or not
  using a program instead of having a human make the decision.  To make this
  easier HTGet maps HTTP response codes to a smaller set of return codes.
  Here are some of the mappings:

     0 - No socket errors, but unknown HTTP code (should never happen)
     1 - Socket or protocol error

    10 - HTTP response code from 100 to 199
    20 - HTTP response code from 200 to 299 if not shown below
    21 - 200 OK
    24 - 203 Non-Authoritative Information
    30 - HTTP response code from 300 to 399 if not shown below
    32 - 301 Moved Permanently
    35 - 304 Not Modified
    37 - 307 Temporary Redirect
    40 - HTTP response code from 400 to 499 if not shown below
    41 - 400 Bad Request
    42 - 401 Unauthorized
    43 - 403 Forbidden
    44 - 404 Not Found
    45 - 410 Gone
    50 - HTTP response code from 500 to 599 if not shown below
    51 - 500 Internal Server Error
    53 - 503 Service Unavailable
    54 - 505 HTTP Version Not Supported
    55 - 509 Bandwidth Limit Exceeded

  The general idea is to give the common response codes a unique return code
  and compress the rest into a generic return codes.  For a complete list
  of the mappings refer to the source code.


Differences with the original WATTCP HTGet

  As mentioned earlier, this is a complete rewrite of the original HTGet.
  A few variable names survived - not much else.  Here are the major
  differences:

  - HTTP 1.1 request support with a switch to use HTTP 1.0 if necessary
  - The ability to hit Ctrl-Break, Ctrl-C or ESC to abort a transfer
  - Far better performance due to better buffer handling.
  - Return codes that try to give a better indication of what happened
  - This version uses mTCP :-)


Support

  Have a comment or need help?  Please email me at mbbrutman@gmail.com.


Recent changes

  2011-07-17: Initial version


More information: http://www.brutman.com/mTCP

Created July 17th, 2011, Last updated July 29th, 2011
(C)opyright Michael B. Brutman, mbbrutman@gmail.com
