org.apache.commons.httpclient

Class URI

public class URI extends Object implements Cloneable, Comparable, Serializable

The interface for the URI (Uniform Resource Identifiers) version of RFC 2396. This class supports parsing a URI reference in a way that can be extended to specific protocols, taking into account the character encoding of the protocol used for transport and the charset of the document.

A URI is always in an "escaped" form, since escaping or unescaping a completed URI might change its semantics.

Implementers should be careful not to escape or unescape the same string more than once, since unescaping an already unescaped string might lead to misinterpreting a percent data character as another escaped character, or vice versa in the case of escaping an already escaped string.
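The pitfall is easy to reproduce. The sketch below uses a hypothetical, simplified encoder (`DoubleEscapeDemo.escape` and `unescape`, not this class's `encode`/`decode` methods) that assumes UTF-8 and treats only the RFC 2396 unreserved characters as allowed. Escaping "100%" once gives "100%25"; escaping that result again gives "100%2525". In the other direction, the escaped form of the literal data "%41" is "%2541"; unescaping it once recovers "%41", but unescaping a second time wrongly yields "A".

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical helper for illustration; it is not HttpClient's URI.encode()/decode().
// Assumes UTF-8 and treats only RFC 2396 "unreserved" (alphanum | mark) as allowed.
class DoubleEscapeDemo {
    static final BitSet UNRESERVED = new BitSet(128);
    static {
        UNRESERVED.set('a', 'z' + 1);
        UNRESERVED.set('A', 'Z' + 1);
        UNRESERVED.set('0', '9' + 1);
        for (char c : "-_.!~*'()".toCharArray()) {
            UNRESERVED.set(c);
        }
    }

    /** Percent-escapes every UTF-8 octet that is not an unreserved ASCII character. */
    static String escape(String original) {
        StringBuilder sb = new StringBuilder();
        for (byte b : original.getBytes(StandardCharsets.UTF_8)) {
            int octet = b & 0xFF;
            if (UNRESERVED.get(octet)) {
                sb.append((char) octet);
            } else {
                sb.append('%').append(String.format("%02X", octet));
            }
        }
        return sb.toString();
    }

    /** Turns each "%XX" triplet back into an octet, then decodes the octets as UTF-8. */
    static String unescape(String escaped) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < escaped.length(); i++) {
            char c = escaped.charAt(i);
            if (c == '%' && i + 2 < escaped.length()) {
                out.write(Integer.parseInt(escaped.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write(c); // assumes ASCII input, as an escaped URI should be
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String once = escape("100%");    // "100%25"
        String twice = escape(once);     // "100%2525": the escape indicator itself got escaped
        System.out.println(once + " -> " + twice);
        String data = unescape("%2541"); // "%41": the literal data, correctly recovered
        String over = unescape(data);    // "A": a data "%" misread as an escape indicator
        System.out.println(data + " -> " + over);
    }
}
```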

In order to avoid these problems, data types are used as follows:

   URI character sequence: char
   octet sequence: byte
   original character sequence: String
 

So a URI is a sequence of characters held in an array of char, which is not necessarily the same as a sequence of octets held in an array of byte.
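To see why the char and byte views differ, the sketch below encodes the same original String under two charsets. `CharsetOctets.escapeAllButAlnum` is a hypothetical helper written for illustration, and its rule (escape everything except ASCII letters and digits) is a simplifying assumption. The single character "é" becomes two octets in UTF-8 but one in ISO-8859-1, so the resulting escaped char sequence depends on the charset chosen.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: original String -> octets (byte[]) -> escaped chars.
class CharsetOctets {
    /** Escapes every octet of original, encoded with charset, except ASCII letters and digits. */
    static String escapeAllButAlnum(String original, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : original.getBytes(charset)) {
            int octet = b & 0xFF;
            boolean alnum = (octet >= 'a' && octet <= 'z')
                    || (octet >= 'A' && octet <= 'Z')
                    || (octet >= '0' && octet <= '9');
            if (alnum) {
                sb.append((char) octet);
            } else {
                sb.append('%').append(String.format("%02X", octet));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "caf\u00e9"; // original character sequence: 4 chars
        System.out.println(escapeAllButAlnum(s, StandardCharsets.UTF_8));      // caf%C3%A9
        System.out.println(escapeAllButAlnum(s, StandardCharsets.ISO_8859_1)); // caf%E9
    }
}
```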

URI Syntactic Components

 - In general, written as follows:
   Absolute URI = <scheme>:<scheme-specific-part>
   Generic URI = <scheme>://<authority><path>?<query>

 - Syntax
   absoluteURI   = scheme ":" ( hier_part | opaque_part )
   hier_part     = ( net_path | abs_path ) [ "?" query ]
   net_path      = "//" authority [ abs_path ]
   abs_path      = "/"  path_segments
 

The following examples illustrate URIs that are in common use.

 ftp://ftp.is.co.za/rfc/rfc1808.txt
    -- ftp scheme for File Transfer Protocol services
 gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles
    -- gopher scheme for Gopher and Gopher+ Protocol services
 http://www.math.uio.no/faq/compression-faq/part1.html
    -- http scheme for Hypertext Transfer Protocol services
 mailto:mduerst@ifi.unizh.ch
    -- mailto scheme for electronic mail addresses
 news:comp.infosystems.www.servers.unix
    -- news scheme for USENET news groups and articles
 telnet://melvyl.ucop.edu/
    -- telnet scheme for interactive services via the TELNET Protocol
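RFC 2396, Appendix B, gives a regular expression that breaks any URI reference into its five components without validating it. The Java sketch below applies that expression to some of the examples above. The class and method names are hypothetical, and this is not claimed to be how this class parses references.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration using the RFC 2396 Appendix B regular expression.
class Rfc2396Split {
    static final Pattern URI_REF =
        Pattern.compile("^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    /** Returns {scheme, authority, path, query, fragment}; absent components are null. */
    static String[] split(String uriReference) {
        Matcher m = URI_REF.matcher(uriReference);
        if (!m.matches()) {
            throw new IllegalArgumentException(uriReference);
        }
        return new String[] { m.group(2), m.group(4), m.group(5), m.group(7), m.group(9) };
    }

    public static void main(String[] args) {
        String[] examples = {
            "ftp://ftp.is.co.za/rfc/rfc1808.txt",
            "mailto:mduerst@ifi.unizh.ch",
            "news:comp.infosystems.www.servers.unix",
        };
        for (String e : examples) {
            System.out.println(e + " -> " + Arrays.toString(split(e)));
        }
    }
}
```

Note that for mailto: and news: the whole scheme-specific part lands in the path group, since there is no "//" authority.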
 
Note that RFC 2396 makes many modifications to the URL (RFC 1738) and relative URL (RFC 1808) specifications.

The expressions for a URI

 For escaped URI forms
  - URI(char[]) // constructor
  - char[] getRawXxx() // method
  - String getEscapedXxx() // method
  - String toString() // method
 

 For unescaped URI forms
  - URI(String) // constructor
  - String getXXX() // method
 
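The split between escaped and unescaped accessors mirrors `java.net.URI` in the JDK, which offers `getRawPath()` alongside the decoding `getPath()`. The sketch below uses that JDK class purely as an analogy: it is a different class from the one documented here, and its raw accessors return String rather than char[]. `java.net.URI` decodes escaped octets as UTF-8.

```java
import java.net.URI;

// Analogy only: java.net.URI from the JDK, not org.apache.commons.httpclient.URI.
class RawVersusDecoded {
    /** Returns {raw path, decoded path, raw query, decoded query}. */
    static String[] forms(String uri) {
        URI u = URI.create(uri); // URI.create wraps URISyntaxException in IllegalArgumentException
        return new String[] { u.getRawPath(), u.getPath(), u.getRawQuery(), u.getQuery() };
    }

    public static void main(String[] args) {
        String[] f = forms("http://example.com/Los%20Angeles?city=L%C3%BCbeck");
        System.out.println(f[0]); // /Los%20Angeles   (escaped form)
        System.out.println(f[1]); // /Los Angeles     (unescaped form)
        System.out.println(f[2]); // city=L%C3%BCbeck (escaped form)
        System.out.println(f[3]); // the decoded query, with %C3%BC turned into U+00FC
    }
}
```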

Version: $Revision: 564973 $ $Date: 2002/03/14 15:14:01

Author: Sung-Gu, Mike Bowler

Nested Class Summary
static class URI.DefaultCharsetChanged
Thrown as part of normal operation to alert the user that the default charset has been changed.
static class URI.LocaleToCharsetMap
A mapping to determine the (somewhat arbitrarily) preferred charset for a given locale.
Field Summary
protected static BitSet absoluteURI
BitSet for absoluteURI.
protected static BitSet abs_path
URI absolute path.
static BitSet allowed_abs_path
Those characters that are allowed for the abs_path.
static BitSet allowed_authority
Those characters that are allowed for the authority component.
static BitSet allowed_fragment
Those characters that are allowed for the fragment component.
static BitSet allowed_host
Those characters that are allowed for the host component.
static BitSet allowed_IPv6reference
Those characters that are allowed for the IPv6reference component.
static BitSet allowed_opaque_part
Those characters that are allowed for the opaque_part.
static BitSet allowed_query
Those characters that are allowed for the query component.
static BitSet allowed_reg_name
Those characters that are allowed for the reg_name.
static BitSet allowed_rel_path
Those characters that are allowed for the rel_path.
static BitSet allowed_userinfo
Those characters that are allowed for the userinfo component.
static BitSet allowed_within_authority
Those characters that are allowed within the authority component.
static BitSet allowed_within_path
Those characters that are allowed within the path.
static BitSet allowed_within_query
Those characters that are allowed within the query component.
static BitSet allowed_within_userinfo
Those characters that are allowed within the userinfo component.
protected static BitSet alpha
BitSet for alpha.
protected static BitSet alphanum
BitSet for alphanum (join of alpha & digit).
protected static BitSet authority
BitSet for authority.
static BitSet control
BitSet for control.
protected static String defaultDocumentCharset
The default charset of the document.
protected static String defaultDocumentCharsetByLocale
protected static String defaultDocumentCharsetByPlatform
protected static String defaultProtocolCharset
The default charset of the protocol.
static BitSet delims
BitSet for delims.
protected static BitSet digit
BitSet for digit.
static BitSet disallowed_opaque_part
Disallowed opaque_part before escaping.
static BitSet disallowed_rel_path
Disallowed rel_path before escaping.
protected static BitSet domainlabel
BitSet for domainlabel.
protected static BitSet escaped
BitSet for escaped.
protected static BitSet fragment
BitSet for fragment (alias for uric).
protected int hash
Cache the hash code for this URI.
protected static BitSet hex
BitSet for hex.
protected static BitSet hier_part
BitSet for hier_part.
protected static BitSet host
BitSet for host.
protected static BitSet hostname
BitSet for hostname.
protected static BitSet hostport
BitSet for hostport.
protected static BitSet IPv4address
BitSet that combines digit and dot for IPv4address.
protected static BitSet IPv6address
BitSet for IPv6address (RFC 2373).
protected static BitSet IPv6reference
BitSet for IPv6reference (RFC 2732, 2373).
protected static BitSet mark
BitSet for mark.
protected static BitSet net_path
BitSet for net_path.
protected static BitSet opaque_part
URI bitset that combines uric_no_slash and uric.
protected static BitSet param
BitSet for param (alias for pchar).
protected static BitSet path
URI bitset that combines absolute path and opaque part.
protected static BitSet path_segments
BitSet for path segments.
protected static BitSet pchar
BitSet for pchar.
protected static BitSet percent
The percent "%" character always has the reserved purpose of being the escape indicator; it must be escaped as "%25" in order to be used as data within a URI.
protected static BitSet port
Port, a logical alias for digit.
protected String protocolCharset
The charset of the protocol used by this URI instance.
protected static BitSet query
BitSet for query (alias for uric).
protected static BitSet reg_name
BitSet for reg_name.
protected static BitSet relativeURI
BitSet for relativeURI.
protected static BitSet rel_path
BitSet for rel_path.
protected static BitSet rel_segment
BitSet for rel_segment.
protected static BitSet reserved
BitSet for reserved.
protected static char[] rootPath
The root path.
protected static BitSet scheme
BitSet for scheme.
protected static BitSet segment
BitSet for segment.
protected static BitSet server
BitSet for server.
static BitSet space
BitSet for space.
protected static BitSet toplabel
BitSet for toplabel.
protected static BitSet unreserved
Data characters that are allowed in a URI but do not have a reserved purpose are called unreserved.
static BitSet unwise
BitSet for unwise.
protected static BitSet uric
BitSet for uric.
protected static BitSet uric_no_slash
URI bitset for encoding typical non-slash characters.
protected static BitSet userinfo
BitSet for userinfo.
protected static BitSet URI_reference
BitSet for URI-reference.
static BitSet within_userinfo
BitSet for characters within the userinfo component, such as the user name and password.
protected char[] _authority
The authority.
protected char[] _fragment
The fragment.
protected char[] _host
The host.
protected boolean _is_abs_path
protected boolean _is_hier_part
protected boolean _is_hostname
protected boolean _is_IPv4address
protected boolean _is_IPv6reference
protected boolean _is_net_path
protected boolean _is_opaque_part
protected boolean _is_reg_name
protected boolean _is_rel_path
protected boolean _is_server
protected char[] _opaque
The opaque.
protected char[] _path
The path.
protected int _port
The port.
protected char[] _query
The query.
protected char[] _scheme
The scheme.
protected char[] _uri
This Uniform Resource Identifier (URI).
protected char[] _userinfo
The userinfo.
Constructor Summary
protected URI()
Create an instance for internal use.
URI(String s, boolean escaped, String charset)
Construct a URI from a string with the given charset.
URI(String s, boolean escaped)
Construct a URI from a string.
URI(char[] escaped, String charset)
Construct a URI as an escaped form of a character array with the given charset.
URI(char[] escaped)
Construct a URI as an escaped form of a character array.
URI(String original, String charset)
Construct a URI from the given string with the given charset.
URI(String original)
Construct a URI from the given string.
URI(String scheme, String schemeSpecificPart, String fragment)
Construct a general URI from the given components.
URI(String scheme, String authority, String path, String query, String fragment)
Construct a general URI from the given components.
URI(String scheme, String userinfo, String host, int port)
Construct a general URI from the given components.
URI(String scheme, String userinfo, String host, int port, String path)
Construct a general URI from the given components.
URI(String scheme, String userinfo, String host, int port, String path, String query)
Construct a general URI from the given components.
URI(String scheme, String userinfo, String host, int port, String path, String query, String fragment)
Construct a general URI from the given components.
URI(String scheme, String host, String path, String fragment)
Construct a general URI from the given components.
URI(URI base, String relative)
Construct a general URI with the given relative URI string.
URI(URI base, String relative, boolean escaped)
Construct a general URI with the given relative URI string.
URI(URI base, URI relative)
Construct a general URI with the given relative URI.
Method Summary
Object clone()
Create and return a copy of this object: the URI reference, including the userinfo component.
int compareTo(Object obj)
Compare this URI to another object.
protected static String decode(char[] component, String charset)
Decode a URI-encoded string.
protected static String decode(String component, String charset)
Decode a URI-encoded string.
protected static char[] encode(String original, BitSet allowed, String charset)
Encode a URI string.
protected boolean equals(char[] first, char[] second)
Test if the first array is equal to the second array.
boolean equals(Object obj)
Test whether this URI is equal to another object.
String getAboveHierPath()
Get the level above this hierarchy level.
String getAuthority()
Get the authority.
String getCurrentHierPath()
Get the current hierarchy level.
static String getDefaultDocumentCharset()
Get the recommended default charset of the document.
static String getDefaultDocumentCharsetByLocale()
Get the default charset of the document by locale.
static String getDefaultDocumentCharsetByPlatform()
Get the default charset of the document by platform.
static String getDefaultProtocolCharset()
Get the default charset of the protocol.
String getEscapedAboveHierPath()
Get the escaped level above this hierarchy level.
String getEscapedAuthority()
Get the escaped authority.
String getEscapedCurrentHierPath()
Get the escaped current hierarchy level.
String getEscapedFragment()
Get the escaped fragment.
String getEscapedName()
Get the escaped basename of the path.
String getEscapedPath()
Get the escaped path.
String getEscapedPathQuery()
Get the escaped path and query.
String getEscapedQuery()
Get the escaped query.
String getEscapedURI()
Get the escaped URI character sequence.
String getEscapedURIReference()
Get the escaped URI reference string.
String getEscapedUserinfo()
Get the escaped userinfo.
String getFragment()
Get the fragment.
String getHost()
Get the host.
String getName()
Get the basename of the path.
String getPath()
Get the path.
String getPathQuery()
Get the path and query.
int getPort()
Get the port.
String getProtocolCharset()
Get the protocol charset used by this URI instance.
String getQuery()
Get the query.
char[] getRawAboveHierPath()
Get the raw-escaped level above this hierarchy level.
char[] getRawAuthority()
Get the raw-escaped authority.
protected char[] getRawCurrentHierPath(char[] path)
Get the raw-escaped current hierarchy level in the given path.
char[] getRawCurrentHierPath()
Get the raw-escaped current hierarchy level.
char[] getRawFragment()
Get the raw-escaped fragment.
char[] getRawHost()
Get the raw-escaped host.
char[] getRawName()
Get the raw-escaped basename of the path.
char[] getRawPath()
Get the raw-escaped path.
char[] getRawPathQuery()
Get the raw-escaped path and query.
char[] getRawQuery()
Get the raw-escaped query.
char[] getRawScheme()
Get the raw scheme.
char[] getRawURI()
Get the raw-escaped URI character sequence.
char[] getRawURIReference()
Get the URI reference character sequence.
char[] getRawUserinfo()
Get the raw-escaped userinfo.
String getScheme()
Get the scheme.
String getURI()
Get the URI character sequence in unescaped form.
String getURIReference()
Get the original URI reference string.
String getUserinfo()
Get the userinfo.
boolean hasAuthority()
Tell whether or not this URI has an authority component.
boolean hasFragment()
Tell whether or not this URI has a fragment.
int hashCode()
Return a hash code for this URI.
boolean hasQuery()
Tell whether or not this URI has a query.
boolean hasUserinfo()
Tell whether or not this URI has userinfo.
protected int indexFirstOf(String s, String delims)
Get the index of the first occurrence in s of any of the characters in delims.
protected int indexFirstOf(String s, String delims, int offset)
Get the index of the first occurrence in s, starting at offset, of any of the characters in delims.
protected int indexFirstOf(char[] s, char delim)
Get the index of the first occurrence of delim in s.
protected int indexFirstOf(char[] s, char delim, int offset)
Get the index of the first occurrence of delim in s, starting at offset.
boolean isAbsoluteURI()
Tell whether or not this URI is absolute.
boolean isAbsPath()
Tell whether or not the relativeURI or hier_part of this URI is abs_path.
boolean isHierPart()
Tell whether or not the absoluteURI of this URI is hier_part.
boolean isHostname()
Tell whether or not the host part of this URI is a hostname.
boolean isIPv4address()
Tell whether or not the host part of this URI is an IPv4address.
boolean isIPv6reference()
Tell whether or not the host part of this URI is an IPv6reference.
boolean isNetPath()
Tell whether or not the relativeURI or hier_part of this URI is net_path.
boolean isOpaquePart()
Tell whether or not the absoluteURI of this URI is opaque_part.
boolean isRegName()
Tell whether or not the authority component of this URI is reg_name.
boolean isRelativeURI()
Tell whether or not this URI is relative.
boolean isRelPath()
Tell whether or not the relativeURI of this URI is rel_path.
boolean isServer()
Tell whether or not the authority component of this URI is server.
protected char[] normalize(char[] path)
Normalize the given hier path part.
void normalize()
Normalize the path part of this URI.
protected void parseAuthority(String original, boolean escaped)
Parse the authority component.
protected void parseUriReference(String original, boolean escaped)
Parse a URI reference as a String with the character encoding of the local system or the document, in order to avoid any possibility of conflict with non-ASCII characters.
protected boolean prevalidate(String component, BitSet disallowed)
Pre-validate the unescaped URI string within a specific component.
protected char[] removeFragmentIdentifier(char[] component)
Remove the fragment identifier of the given component.
protected char[] resolvePath(char[] basePath, char[] relPath)
Resolve the base and relative path.
static void setDefaultDocumentCharset(String charset)
Set the default charset of the document.
static void setDefaultProtocolCharset(String charset)
Set the default charset of the protocol.
void setEscapedAuthority(String escapedAuthority)
Set the escaped authority.
void setEscapedFragment(String escapedFragment)
Set the escaped fragment string.
void setEscapedPath(String escapedPath)
Set the escaped path.
void setEscapedQuery(String escapedQuery)
Set the escaped query string.
void setFragment(String fragment)
Set the fragment.
void setPath(String path)
Set the path.
void setQuery(String query)
Set the query.
void setRawAuthority(char[] escapedAuthority)
Set the raw-escaped authority.
void setRawFragment(char[] escapedFragment)
Set the raw-escaped fragment.
void setRawPath(char[] escapedPath)
Set the raw-escaped path.
void setRawQuery(char[] escapedQuery)
Set the raw-escaped query.
protected void setURI()
Once it has been parsed successfully, set this URI.
String toString()
Get the escaped URI string.
protected boolean validate(char[] component, BitSet generous)
Validate the URI characters within a specific component.
protected boolean validate(char[] component, int soffset, int eoffset, BitSet generous)
Validate the URI characters within a specific component.

Field Detail

absoluteURI

protected static final BitSet absoluteURI
BitSet for absoluteURI.

 absoluteURI   = scheme ":" ( hier_part | opaque_part )
 

abs_path

protected static final BitSet abs_path
URI absolute path.

 abs_path      = "/"  path_segments
 

allowed_abs_path

public static final BitSet allowed_abs_path
Those characters that are allowed for the abs_path.

allowed_authority

public static final BitSet allowed_authority
Those characters that are allowed for the authority component.

allowed_fragment

public static final BitSet allowed_fragment
Those characters that are allowed for the fragment component.

allowed_host

public static final BitSet allowed_host
Those characters that are allowed for the host component. The characters '[', ']' in IPv6reference should be excluded.

allowed_IPv6reference

public static final BitSet allowed_IPv6reference
Those characters that are allowed for the IPv6reference component. The characters '[', ']' in IPv6reference should be excluded.

allowed_opaque_part

public static final BitSet allowed_opaque_part
Those characters that are allowed for the opaque_part.

allowed_query

public static final BitSet allowed_query
Those characters that are allowed for the query component.

allowed_reg_name

public static final BitSet allowed_reg_name
Those characters that are allowed for the reg_name.

allowed_rel_path

public static final BitSet allowed_rel_path
Those characters that are allowed for the rel_path.

allowed_userinfo

public static final BitSet allowed_userinfo
Those characters that are allowed for the userinfo component.

allowed_within_authority

public static final BitSet allowed_within_authority
Those characters that are allowed within the authority component.

allowed_within_path

public static final BitSet allowed_within_path
Those characters that are allowed within the path.

allowed_within_query

public static final BitSet allowed_within_query
Those characters that are allowed within the query component.

allowed_within_userinfo

public static final BitSet allowed_within_userinfo
Those characters that are allowed within the userinfo component.

alpha

protected static final BitSet alpha
BitSet for alpha.

 alpha         = lowalpha | upalpha
 

alphanum

protected static final BitSet alphanum
BitSet for alphanum (join of alpha & digit).

  alphanum      = alpha | digit
 

authority

protected static final BitSet authority
BitSet for authority.

 authority     = server | reg_name
 

control

public static final BitSet control
BitSet for control.

defaultDocumentCharset

protected static String defaultDocumentCharset
The default charset of the document (RFC 2277, 2396). The platform's charset is used for the document by default.

defaultDocumentCharsetByLocale

protected static String defaultDocumentCharsetByLocale

defaultDocumentCharsetByPlatform

protected static String defaultDocumentCharsetByPlatform

defaultProtocolCharset

protected static String defaultProtocolCharset
The default charset of the protocol (RFC 2277, 2396).

delims

public static final BitSet delims
BitSet for delims.

digit

protected static final BitSet digit
BitSet for digit.

 digit    = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
            "8" | "9"
 

disallowed_opaque_part

public static final BitSet disallowed_opaque_part
Disallowed opaque_part before escaping.

disallowed_rel_path

public static final BitSet disallowed_rel_path
Disallowed rel_path before escaping.

domainlabel

protected static final BitSet domainlabel
BitSet for domainlabel.

 domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
 

escaped

protected static final BitSet escaped
BitSet for escaped.

 escaped       = "%" hex hex
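Because escaped is exactly "%" hex hex, checking that a raw string uses well-formed escapes is a simple scan. The sketch below is a hypothetical helper for illustration, not a method of this class; it only checks the shape of each triplet and does not decide which characters should have been escaped.

```java
// Hypothetical helper: checks the escaped = "%" hex hex rule only.
class EscapedCheck {
    // hex = digit | "A" .. "F" | "a" .. "f"
    static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
    }

    /** True if every '%' in s starts a complete "%" hex hex triplet. */
    static boolean hasWellFormedEscapes(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '%') {
                if (i + 2 >= s.length() || !isHex(s.charAt(i + 1)) || !isHex(s.charAt(i + 2))) {
                    return false;
                }
                i += 2; // skip the two hex digits just checked
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(hasWellFormedEscapes("Los%20Angeles")); // true
        System.out.println(hasWellFormedEscapes("100%"));          // false: dangling "%"
        System.out.println(hasWellFormedEscapes("%zz"));           // false: not hex
    }
}
```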
 

fragment

protected static final BitSet fragment
BitSet for fragment (alias for uric).

 fragment      = *uric
 

hash

protected int hash
Cache the hash code for this URI.

hex

protected static final BitSet hex
BitSet for hex.

 hex           = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                         "a" | "b" | "c" | "d" | "e" | "f"
 

hier_part

protected static final BitSet hier_part
BitSet for hier_part.

 hier_part     = ( net_path | abs_path ) [ "?" query ]
 

host

protected static final BitSet host
BitSet for host.

 host          = hostname | IPv4address | IPv6reference
 

hostname

protected static final BitSet hostname
BitSet for hostname.

 hostname      = *( domainlabel "." ) toplabel [ "." ]
 

hostport

protected static final BitSet hostport
BitSet for hostport.

 hostport      = host [ ":" port ]
 

IPv4address

protected static final BitSet IPv4address
BitSet that combines digit and dot for IPv4address.

 IPv4address   = 1*digit "." 1*digit "." 1*digit "." 1*digit
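The host, hostname, domainlabel, toplabel and IPv4address rules above translate directly into regular expressions. The sketch below (hypothetical names; IPv6reference is left out for brevity) classifies a host the way the grammar does. The IPv4address rule only requires four dot-separated runs of digits, so "999.0.0.1" matches the grammar even though it is not a usable address, and a dotted quad never matches hostname because toplabel must start with a letter.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of the RFC 2396 host grammar (IPv6reference omitted).
class HostGrammar {
    // domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
    static final String DOMAINLABEL = "[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?";
    // toplabel = alpha | alpha *( alphanum | "-" ) alphanum
    static final String TOPLABEL = "[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?";
    // hostname = *( domainlabel "." ) toplabel [ "." ]
    static final Pattern HOSTNAME =
        Pattern.compile("(?:" + DOMAINLABEL + "\\.)*" + TOPLABEL + "\\.?");
    // IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
    static final Pattern IPV4ADDRESS = Pattern.compile("[0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+");

    /** Returns "IPv4address", "hostname", or null if neither rule matches. */
    static String classify(String host) {
        if (IPV4ADDRESS.matcher(host).matches()) {
            return "IPv4address";
        }
        if (HOSTNAME.matcher(host).matches()) {
            return "hostname";
        }
        return null;
    }

    public static void main(String[] args) {
        for (String h : new String[] {"www.math.uio.no", "128.10.2.30", "999.0.0.1", "-bad.com"}) {
            System.out.println(h + " -> " + classify(h));
        }
    }
}
```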
 

IPv6address

protected static final BitSet IPv6address
BitSet for IPv6address (RFC 2373).

 IPv6address = hexpart [ ":" IPv4address ]
 

IPv6reference

protected static final BitSet IPv6reference
BitSet for IPv6reference (RFC 2732, 2373).

 IPv6reference   = "[" IPv6address "]"
 

mark

protected static final BitSet mark
BitSet for mark.

 mark          = "-" | "_" | "." | "!" | "~" | "*" | "'" |
                 "(" | ")"
 

net_path

protected static final BitSet net_path
BitSet for net_path.

 net_path      = "//" authority [ abs_path ]
 

opaque_part

protected static final BitSet opaque_part
URI bitset that combines uric_no_slash and uric.

 opaque_part   = uric_no_slash *uric
 

param

protected static final BitSet param
BitSet for param (alias for pchar).

 param         = *pchar
 

path

protected static final BitSet path
URI bitset that combines absolute path and opaque part.

 path          = [ abs_path | opaque_part ]
 

path_segments

protected static final BitSet path_segments
BitSet for path segments.

 path_segments = segment *( "/" segment )
 

pchar

protected static final BitSet pchar
BitSet for pchar.

 pchar         = unreserved | escaped |
                 ":" | "@" | "&" | "=" | "+" | "$" | ","
 

percent

protected static final BitSet percent
The percent "%" character always has the reserved purpose of being the escape indicator; it must be escaped as "%25" in order to be used as data within a URI.

port

protected static final BitSet port
Port, a logical alias for digit.

protocolCharset

protected String protocolCharset
The charset of the protocol used by this URI instance.

query

protected static final BitSet query
BitSet for query (alias for uric).

 query         = *uric
 

reg_name

protected static final BitSet reg_name
BitSet for reg_name.

 reg_name      = 1*( unreserved | escaped | "$" | "," |
                     ";" | ":" | "@" | "&" | "=" | "+" )
 

relativeURI

protected static final BitSet relativeURI
BitSet for relativeURI.

 relativeURI   = ( net_path | abs_path | rel_path ) [ "?" query ]
 

rel_path

protected static final BitSet rel_path
BitSet for rel_path.

 rel_path      = rel_segment [ abs_path ]
 

rel_segment

protected static final BitSet rel_segment
BitSet for rel_segment.

 rel_segment   = 1*( unreserved | escaped |
                     ";" | "@" | "&" | "=" | "+" | "$" | "," )
 

reserved

protected static final BitSet reserved
BitSet for reserved.

 reserved      = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                 "$" | ","
 

rootPath

protected static final char[] rootPath
The root path.

scheme

protected static final BitSet scheme
BitSet for scheme.

 scheme        = alpha *( alpha | digit | "+" | "-" | "." )
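The scheme rule is small enough to check directly. In the hypothetical helper below, the pattern follows the rule above; the lowercase comparison is an added assumption based on RFC 2396, which asks programs to treat upper- and lowercase letters in scheme names as equivalent.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper for the RFC 2396 scheme rule.
class SchemeRule {
    // scheme = alpha *( alpha | digit | "+" | "-" | "." )
    static final Pattern SCHEME = Pattern.compile("[A-Za-z][A-Za-z0-9+.-]*");

    static boolean isScheme(String s) {
        return SCHEME.matcher(s).matches();
    }

    /** Compares scheme names case-insensitively; Locale.ROOT avoids locale-specific case rules. */
    static boolean sameScheme(String a, String b) {
        return a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isScheme("svn+ssh"));           // true
        System.out.println(isScheme("1http"));             // false: must start with alpha
        System.out.println(sameScheme("HTTP", "http"));    // true
    }
}
```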
 

segment

protected static final BitSet segment
BitSet for segment.

 segment       = *pchar *( ";" param )
 

server

protected static final BitSet server
BitSet for server.

 server        = [ [ userinfo "@" ] hostport ]
 

space

public static final BitSet space
BitSet for space.

toplabel

protected static final BitSet toplabel
BitSet for toplabel.

 toplabel      = alpha | alpha *( alphanum | "-" ) alphanum
 

unreserved

protected static final BitSet unreserved
Data characters that are allowed in a URI but do not have a reserved purpose are called unreserved.

 unreserved    = alphanum | mark
 

unwise

public static final BitSet unwise
BitSet for unwise.

uric

protected static final BitSet uric
BitSet for uric.

 uric          = reserved | unreserved | escaped
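The field summaries describe most character classes as BitSets built from smaller ones. The sketch below shows that kind of composition for uric using `java.util.BitSet.or()`. It is a hypothetical illustration, not this class's source, and like any character-class test it only checks membership, not that each "%" is followed by two hex digits.

```java
import java.util.BitSet;

// Hypothetical illustration of composing RFC 2396 character classes with BitSet.or();
// not the source of org.apache.commons.httpclient.URI.
class UricBitSets {
    static BitSet chars(String s) {
        BitSet set = new BitSet(128);
        for (char c : s.toCharArray()) {
            set.set(c);
        }
        return set;
    }

    static BitSet range(char from, char to) {
        BitSet set = new BitSet(128);
        set.set(from, to + 1); // set(fromIndex, toIndexExclusive)
        return set;
    }

    static BitSet union(BitSet... parts) {
        BitSet result = new BitSet(128);
        for (BitSet p : parts) {
            result.or(p);
        }
        return result;
    }

    static final BitSet ALPHANUM = union(range('a', 'z'), range('A', 'Z'), range('0', '9'));
    static final BitSet MARK = chars("-_.!~*'()");
    static final BitSet UNRESERVED = union(ALPHANUM, MARK);           // alphanum | mark
    static final BitSet RESERVED = chars(";/?:@&=+$,");
    // escaped = "%" hex hex; as a set of characters this is '%' plus the hex digits
    static final BitSet ESCAPED = union(chars("%"), range('0', '9'), range('A', 'F'), range('a', 'f'));
    static final BitSet URIC = union(RESERVED, UNRESERVED, ESCAPED);  // reserved | unreserved | escaped

    /** True if every character of s is in URIC; the "%" hex hex structure is not checked. */
    static boolean allUric(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!URIC.get(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allUric("http://a/b/c/d;p?q")); // true
        System.out.println(allUric("a b"));                // false: space is not uric
        System.out.println(allUric("a#b"));                // false: "#" is a delimiter, not uric
    }
}
```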
 

uric_no_slash

protected static final BitSet uric_no_slash
URI bitset for encoding typical non-slash characters.

 uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" |
                 "&" | "=" | "+" | "$" | ","
 

userinfo

protected static final BitSet userinfo
BitSet for userinfo.

 userinfo      = *( unreserved | escaped |
                    ";" | ":" | "&" | "=" | "+" | "$" | "," )
 

URI_reference

protected static final BitSet URI_reference
BitSet for URI-reference.

 URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
 

within_userinfo

public static final BitSet within_userinfo
BitSet for characters within the userinfo component, such as the user name and password.

_authority

protected char[] _authority
The authority.

_fragment

protected char[] _fragment
The fragment.

_host

protected char[] _host
The host.

_is_abs_path

protected boolean _is_abs_path

_is_hier_part

protected boolean _is_hier_part

_is_hostname

protected boolean _is_hostname

_is_IPv4address

protected boolean _is_IPv4address

_is_IPv6reference

protected boolean _is_IPv6reference

_is_net_path

protected boolean _is_net_path

_is_opaque_part

protected boolean _is_opaque_part

_is_reg_name

protected boolean _is_reg_name

_is_rel_path

protected boolean _is_rel_path

_is_server

protected boolean _is_server

_opaque

protected char[] _opaque
The opaque.

_path

protected char[] _path
The path.

_port

protected int _port
The port.

_query

protected char[] _query
The query.

_scheme

protected char[] _scheme
The scheme.

_uri

protected char[] _uri
This Uniform Resource Identifier (URI). The URI is always in an "escaped" form, since escaping or unescaping a completed URI might change its semantics.

_userinfo

protected char[] _userinfo
The userinfo.

Constructor Detail

URI

protected URI()
Create an instance for internal use.

URI

public URI(String s, boolean escaped, String charset)
Construct a URI from a string with the given charset. The input string can be either in escaped or unescaped form.

Parameters: s - the URI character sequence; escaped - true if the URI character sequence is in escaped form, false otherwise; charset - the charset used for escape encoding, if required

Throws: URIException - if the URI cannot be created; NullPointerException - if the input string is null

Since: 3.0

See Also: URI

URI

public URI(String s, boolean escaped)
Construct a URI from a string. The input string can be either in escaped or unescaped form.

Parameters: s - the URI character sequence; escaped - true if the URI character sequence is in escaped form, false otherwise

Throws: URIException - if the URI cannot be created; NullPointerException - if the input string is null

Since: 3.0

See Also: URI

URI

public URI(char[] escaped, String charset)

Deprecated: Use URI(String, boolean, String).

Construct a URI as an escaped form of a character array with the given charset.

Parameters: escaped - the URI character sequence; charset - the charset used for escape encoding

Throws: URIException - if the URI cannot be created; NullPointerException - if escaped is null

See Also: URI

URI

public URI(char[] escaped)

Deprecated: Use URI(String, boolean).

Construct a URI as an escaped form of a character array. A URI can be placed within double quotes or angle brackets, as in "http://test.com/" and <http://test.com/>.

Parameters: escaped - the URI character sequence

Throws: URIException - if the URI cannot be created; NullPointerException - if escaped is null

See Also: URI

URI

public URI(String original, String charset)

Deprecated: Use URI(String, boolean, String).

Construct a URI from the given string with the given charset.

Parameters: original - the string to be represented as a URI character sequence (either an absoluteURI or a relativeURI); charset - the charset used for escape encoding

Throws: URIException If the URI cannot be created.

See Also: URI

URI

public URI(String original)

Deprecated: Use URI(String, boolean).

Construct a URI from the given string.

   URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
 

A URI can be placed within double quotes or angle brackets, as in "http://test.com/" and <http://test.com/>.

Parameters: original - the string to be represented as a URI character sequence (either an absoluteURI or a relativeURI)

Throws: URIException If the URI cannot be created.

See Also: URI

URI

public URI(String scheme, String schemeSpecificPart, String fragment)
Construct a general URI from the given components.

   URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
   absoluteURI   = scheme ":" ( hier_part | opaque_part )
   opaque_part   = uric_no_slash *uric
 

This constructor is for an absolute URI of the form <scheme>:<scheme-specific-part>#<fragment>.

Parameters: scheme - the scheme string; schemeSpecificPart - the scheme-specific part; fragment - the fragment string

Throws: URIException If the URI cannot be created.

See Also: URI

URI

public URI(String scheme, String authority, String path, String query, String fragment)
Construct a general URI from the given components.

   URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
   absoluteURI   = scheme ":" ( hier_part | opaque_part )
   relativeURI   = ( net_path | abs_path | rel_path ) [ "?" query ]
   hier_part     = ( net_path | abs_path ) [ "?" query ]
 

This constructor is for an absolute URI of the form <scheme>:<path>?<query>#<fragment> and a relative URI of the form <path>?<query>#<fragment>.

Parameters: scheme - the scheme string; authority - the authority string; path - the path string; query - the query string; fragment - the fragment string

Throws: URIException If the new URI cannot be created.

See Also: URI
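This five-argument form has a close counterpart in the JDK, `java.net.URI(String scheme, String authority, String path, String query, String fragment)`, which likewise takes unescaped components and escapes illegal characters. The sketch below uses that JDK constructor as an analogy only; the `FromComponents.build` wrapper is hypothetical, and how this class escapes each component may differ.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Analogy only: the JDK's java.net.URI, not org.apache.commons.httpclient.URI.
class FromComponents {
    /** Builds a URI string from unescaped components; null components are omitted. */
    static String build(String scheme, String authority, String path, String query, String fragment) {
        try {
            return new URI(scheme, authority, path, query, fragment).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Absolute URI: <scheme>://<authority><path>?<query>#<fragment>
        System.out.println(build("http", "example.com", "/a b", "q=1", "top")); // http://example.com/a%20b?q=1#top
        // Relative URI: <path>#<fragment>, no scheme or authority
        System.out.println(build(null, null, "docs/a b", null, "top"));         // docs/a%20b#top
    }
}
```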

URI

public URI(String scheme, String userinfo, String host, int port)
Construct a general URI from the given components.

Parameters: scheme - the scheme string; userinfo - the userinfo string; host - the host string; port - the port number

Throws: URIException If the new URI cannot be created.

See Also: URI

URI

public URI(String scheme, String userinfo, String host, int port, String path)
Construct a general URI from the given components.

Parameters: scheme - the scheme string; userinfo - the userinfo string; host - the host string; port - the port number; path - the path string

Throws: URIException If the new URI cannot be created.

See Also: URI

URI

public URI(String scheme, String userinfo, String host, int port, String path, String query)
Construct a general URI from the given components.

Parameters: scheme - the scheme string; userinfo - the userinfo string; host - the host string; port - the port number; path - the path string; query - the query string

Throws: URIException If the new URI cannot be created.

See Also: URI

URI

public URI(String scheme, String userinfo, String host, int port, String path, String query, String fragment)
Construct a general URI from the given components.

Parameters: scheme the scheme string userinfo the userinfo string host the host string port the port number path the path string query the query string fragment the fragment string

Throws: URIException If the new URI cannot be created.

See Also: URI
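A sketch of the server-based variant, again using the JDK's java.net.URI seven-argument constructor as a stand-in for this class; `ServerUriDemo.compose` is hypothetical. Passing -1 as the port omits it from the result, and a non-null userinfo is written before "@".

```java
import java.net.URI;
import java.net.URISyntaxException;

class ServerUriDemo {
    // userinfo may be null; a port of -1 means "no explicit port".
    static URI compose(String scheme, String userinfo, String host, int port,
                       String path, String query, String fragment) {
        try {
            return new URI(scheme, userinfo, host, port, path, query, fragment);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // http://example.com:8080/docs/index.html?lang=en#intro
        System.out.println(compose("http", null, "example.com", 8080,
                "/docs/index.html", "lang=en", "intro"));
        // ftp://anonymous@ftp.is.co.za/rfc/rfc1808.txt
        System.out.println(compose("ftp", "anonymous", "ftp.is.co.za", -1,
                "/rfc/rfc1808.txt", null, null));
    }
}
```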

URI

public URI(String scheme, String host, String path, String fragment)
Construct a general URI from the given components.

Parameters: scheme the scheme string host the host string path the path string fragment the fragment string

Throws: URIException If the new URI cannot be created.

See Also: URI

URI

public URI(URI base, String relative)

Deprecated: Use URI(URI, String, boolean)

Construct a general URI with the given relative URI string.

Parameters: base the base URI relative the relative URI string

Throws: URIException If the new URI cannot be created.

URI

public URI(URI base, String relative, boolean escaped)
Construct a general URI with the given relative URI string.

Parameters: base the base URI relative the relative URI string escaped true if the URI character sequence is in escaped form, false otherwise

Throws: URIException If the new URI cannot be created.

Since: 3.0

URI

public URI(URI base, URI relative)
Construct a general URI with the given relative URI.

   URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
   relativeURI   = ( net_path | abs_path | rel_path ) [ "?" query ]
 

Resolving Relative References to Absolute Form. For example, within an object with a well-defined base URI of

   http://a/b/c/d;p?q
 

relative URIs would be resolved as follows (normal examples):

   g:h           =  g:h
   g             =  http://a/b/c/g
   ./g           =  http://a/b/c/g
   g/            =  http://a/b/c/g/
   /g            =  http://a/g
   //g           =  http://g
   ?y            =  http://a/b/c/?y
   g?y           =  http://a/b/c/g?y
   #s            =  (current document)#s
   g#s           =  http://a/b/c/g#s
   g?y#s         =  http://a/b/c/g?y#s
   ;x            =  http://a/b/c/;x
   g;x           =  http://a/b/c/g;x
   g;x?y#s       =  http://a/b/c/g;x?y#s
   .             =  http://a/b/c/
   ./            =  http://a/b/c/
   ..            =  http://a/b/
   ../           =  http://a/b/
   ../g          =  http://a/b/g
   ../..         =  http://a/
   ../../        =  http://a/ 
   ../../g       =  http://a/g
 

Some URI schemes do not allow a hierarchical syntax matching the <hier_part> syntax, and thus cannot use relative references.

Parameters: base the base URI relative the relative URI

Throws: URIException If the new URI cannot be created.
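The normal examples above can be checked against the JDK's java.net.URI.resolve, which implements the RFC 2396 resolution algorithm. This is a sketch, not this class's code; `ResolveDemo` is a hypothetical name. Only a subset of the table is exercised, because for a lone fragment such as "#s" java.net.URI returns the base URI with that fragment attached rather than a reference to "the current document".

```java
import java.net.URI;

class ResolveDemo {
    static final URI BASE = URI.create("http://a/b/c/d;p?q");

    // Resolves a relative reference against BASE using java.net.URI.resolve.
    static String resolve(String relative) {
        return BASE.resolve(relative).toString();
    }

    public static void main(String[] args) {
        String[] refs = {"g:h", "g", "./g", "g/", "/g", "//g", "g?y#s", "../g", "../../g"};
        for (String r : refs) {
            System.out.printf("%-8s = %s%n", r, resolve(r));
        }
    }
}
```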

Method Detail

clone

public Object clone()
Creates and returns a copy of this object, including the userinfo component of the URI reference. Note that the whole URI reference including the userinfo component cannot be obtained as a String.

Use this method to make an identical copy of a URI object, including its userinfo component.

Returns: a clone of this instance

compareTo

public int compareTo(Object obj)
Compare this URI to another object.

Parameters: obj the object to be compared.

Returns: 0 if the two URIs are the same; -1 if the comparison fails. The authority components are compared first.

Throws: ClassCastException if the argument is not a URI

decode

protected static String decode(char[] component, String charset)
Decodes a URI-encoded string. This is a two-step mapping: first from URI characters to octets, and then from octets to original characters:

   URI character sequence->octet sequence->original character sequence
 

A URI must be separated into its components before the escaped characters within those components can be safely decoded.

Note that there is a chance that non-UTF-8 URI characters may be parsed as valid UTF-8. A recent non-scientific analysis found that EUC-encoded Japanese words had a 2.7% false-reading rate and SJIS a 0.0005% rate, while other encodings such as ASCII or KOI-8 had a 0% rate.

The percent character "%" always has the reserved purpose of being the escape indicator; it must be escaped as "%25" in order to be used as data within a URI.

Unescaping is performed internally by this method.

Parameters: component the URI character sequence charset the protocol charset

Returns: original character sequence

Throws: URIException incomplete trailing escape pattern or unsupported character encoding
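The two-step mapping can be illustrated with a minimal JDK-only sketch. `PercentDecoder.decode` is a hypothetical helper, not this protected method: it reports errors with IllegalArgumentException instead of URIException and takes a java.nio.charset.Charset rather than a charset name. Decoding the same octets with two charsets shows why the protocol charset matters. Unlike java.net.URLDecoder, it leaves "+" unchanged, since "+" stands for a space only in HTML form encoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class PercentDecoder {
    static String decode(String component, Charset charset) {
        ByteArrayOutputStream octets = new ByteArrayOutputStream();
        int i = 0;
        while (i < component.length()) {
            char c = component.charAt(i);
            if (c == '%') {
                // Step 1a: an escaped octet is "%" followed by two hex digits.
                if (i + 2 >= component.length()) {
                    throw new IllegalArgumentException("incomplete trailing escape pattern");
                }
                int hi = Character.digit(component.charAt(i + 1), 16);
                int lo = Character.digit(component.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("invalid escape at index " + i);
                }
                octets.write((hi << 4) | lo);
                i += 3;
            } else {
                // Step 1b: unescaped URI characters are US-ASCII, one octet each.
                if (c > 0x7F) {
                    throw new IllegalArgumentException("non-ASCII URI character at index " + i);
                }
                octets.write(c);
                i++;
            }
        }
        // Step 2: interpret the octet sequence in the given charset.
        return new String(octets.toByteArray(), charset);
    }

    public static void main(String[] args) {
        // C3 A9 is one character in UTF-8 but two characters in ISO-8859-1.
        String utf8 = decode("caf%C3%A9", StandardCharsets.UTF_8);
        String latin1 = decode("caf%C3%A9", StandardCharsets.ISO_8859_1);
        System.out.println(utf8.length() + " vs " + latin1.length()); // 4 vs 5
    }
}
```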

decode

protected static String decode(String component, String charset)
Decodes a URI-encoded string. This is a two-step mapping: first from URI characters to octets, and then from octets to original characters:

   URI character sequence->octet sequence->original character sequence
 

A URI must be separated into its components before the escaped characters within those components can be safely decoded.

Note that there is a chance that non-UTF-8 URI characters may be parsed as valid UTF-8. A recent non-scientific analysis found that EUC-encoded Japanese words had a 2.7% false-reading rate and SJIS a 0.0005% rate, while other encodings such as ASCII or KOI-8 had a 0% rate.

The percent character "%" always has the reserved purpose of being the escape indicator; it must be escaped as "%25" in order to be used as data within a URI.

Unescaping is performed internally by this method.

Parameters: component the URI character sequence charset the protocol charset

Returns: original character sequence

Throws: URIException incomplete trailing escape pattern or unsupported character encoding

Since: 3.0

encode

protected static char[] encode(String original, BitSet allowed, String charset)
Encodes a URI string. This is a two-step mapping: first from original characters to octets, and then from octets to URI characters:

   original character sequence->octet sequence->URI character sequence
 

An escaped octet is encoded as a character triplet, consisting of the percent character "%" followed by the two hexadecimal digits representing the octet code. For example, "%20" is the escaped encoding for the US-ASCII space character.

Conversion from the local filesystem character set to UTF-8 normally involves a two-step process: first convert the local character set to the UCS, then convert the UCS to UTF-8. The first step can be performed by maintaining a mapping table that pairs each local character set code with the corresponding UCS code. The second step converts the UCS character code to its UTF-8 encoding.

Mapping between vendor codepages can be done in a very similar manner as described above.

The only time escape encodings can safely be made is when a URI is being created from its component parts. Escaping and validation are performed internally by this method.

Parameters: original the original character sequence allowed those characters that are allowed within a component charset the protocol charset

Returns: URI character sequence

Throws: URIException null component or unsupported character encoding
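A matching sketch of the encoding direction, assuming the allowed set holds only US-ASCII characters. `PercentEncoder.encode` and the `UNRESERVED` set (RFC 2396 alphanum plus mark characters) are hypothetical; unlike the method documented here, this sketch returns a String rather than char[] and does not validate the result.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

class PercentEncoder {
    // RFC 2396: unreserved = alphanum | mark
    static final BitSet UNRESERVED = new BitSet(128);
    static {
        for (char c = 'a'; c <= 'z'; c++) UNRESERVED.set(c);
        for (char c = 'A'; c <= 'Z'; c++) UNRESERVED.set(c);
        for (char c = '0'; c <= '9'; c++) UNRESERVED.set(c);
        for (char c : "-_.!~*'()".toCharArray()) UNRESERVED.set(c);
    }

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    static String encode(String original, BitSet allowed, Charset charset) {
        if (original == null) {
            throw new IllegalArgumentException("null component");
        }
        StringBuilder out = new StringBuilder();
        // Step 1: original characters -> octets in the given charset.
        for (byte b : original.getBytes(charset)) {
            int octet = b & 0xFF;
            // Step 2: octets -> URI characters; anything not allowed becomes %XX.
            if (octet < 128 && allowed.get(octet)) {
                out.append((char) octet);
            } else {
                out.append('%').append(HEX[octet >> 4]).append(HEX[octet & 0x0F]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("a b/\u00e9", UNRESERVED, StandardCharsets.UTF_8)); // a%20b%2F%C3%A9
    }
}
```

Passing a wider allowed set (for example one that also contains "/") is how a caller would keep a path's segment separators intact.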

equals

protected boolean equals(char[] first, char[] second)
Test if the first array is equal to the second array.

Parameters: first the first character array second the second character array

Returns: true if they're equal

equals

public boolean equals(Object obj)
Test whether this URI is equal to another object.

Parameters: obj an object to compare

Returns: true if two URI objects are equal

getAboveHierPath

public String getAboveHierPath()
Get the level above this hierarchy level.

Returns: the above hierarchy level

Throws: URIException If (char[]) fails.

See Also: URI

getAuthority

public String getAuthority()
Get the authority.

Returns: the authority

Throws: URIException If URI fails

getCurrentHierPath

public String getCurrentHierPath()
Get the current hierarchy level.

Returns: the current hierarchy level

Throws: URIException If (char[]) fails.

See Also: URI

getDefaultDocumentCharset

public static String getDefaultDocumentCharset()
Get the recommended default charset of the document.

Returns: the default charset string

getDefaultDocumentCharsetByLocale

public static String getDefaultDocumentCharsetByLocale()
Get the default charset of the document by locale.

Returns: the default charset string by locale

getDefaultDocumentCharsetByPlatform

public static String getDefaultDocumentCharsetByPlatform()
Get the default charset of the document by platform.

Returns: the default charset string by platform

getDefaultProtocolCharset

public static String getDefaultProtocolCharset()
Get the default charset of the protocol.

An individual URI scheme may require a single charset, define a default charset, or provide a way to indicate the charset used.

To work globally either requires support of a number of character sets and to be able to convert between them, or the use of a single preferred character set. For support of global compatibility it is STRONGLY RECOMMENDED that clients and servers use UTF-8 encoding when exchanging URIs.

Returns: the default charset string

getEscapedAboveHierPath

public String getEscapedAboveHierPath()
Get the level above this hierarchy level.

Returns: the raw above hierarchy level

Throws: URIException If (char[]) fails.

getEscapedAuthority

public String getEscapedAuthority()
Get the escaped authority.

Returns: the escaped authority

getEscapedCurrentHierPath

public String getEscapedCurrentHierPath()
Get the escaped current hierarchy level.

Returns: the escaped current hierarchy level

Throws: URIException If (char[]) fails.

getEscapedFragment

public String getEscapedFragment()
Get the escaped fragment.

Returns: the escaped fragment string

getEscapedName

public String getEscapedName()
Get the escaped basename of the path.

Returns: the escaped basename string

getEscapedPath

public String getEscapedPath()
Get the escaped path.

   path          = [ abs_path | opaque_part ]
   abs_path      = "/"  path_segments 
   opaque_part   = uric_no_slash *uric
 

Returns: the escaped path string
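To see why both escaped and unescaped accessors exist, the sketch below uses the analogous JDK accessors java.net.URI.getRawPath() and getPath() (not this class). The second example shows the danger noted in the class overview: the escaped slash %2F decodes to an ordinary "/", so the unescaped path no longer shows the original segment boundaries. `EscapedPathDemo.paths` is a hypothetical helper.

```java
import java.net.URI;

class EscapedPathDemo {
    // Returns {escaped (raw) path, unescaped path}; java.net.URI decodes with UTF-8.
    static String[] paths(String uri) {
        URI u = URI.create(uri);
        return new String[] {u.getRawPath(), u.getPath()};
    }

    public static void main(String[] args) {
        String[] p = paths("http://www.example.com/Los%20Angeles/caf%C3%A9");
        System.out.println(p[0]); // /Los%20Angeles/caf%C3%A9
        String[] q = paths("http://h/a%2Fb");
        System.out.println(q[0] + " -> " + q[1]); // /a%2Fb -> /a/b
    }
}
```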

getEscapedPathQuery

public String getEscapedPathQuery()
Get the escaped path and query.

Returns: the escaped path and query string

getEscapedQuery

public String getEscapedQuery()
Get the escaped query.

Returns: the escaped query string

getEscapedURI

public String getEscapedURI()
Get the URI character sequence in escaped form. This is useful when the URI is to be transported by a protocol.

Returns: the escaped URI string

getEscapedURIReference

public String getEscapedURIReference()
Get the escaped URI reference string.

Returns: the escaped URI reference string

getEscapedUserinfo

public String getEscapedUserinfo()
Get the escaped userinfo.

Returns: the escaped userinfo

See Also: URI

getFragment

public String getFragment()
Get the fragment.

Returns: the fragment string

Throws: URIException incomplete trailing escape pattern or unsupported character encoding

See Also: URI

getHost

public String getHost()
Get the host.

   host          = hostname | IPv4address | IPv6reference
 

Returns: the host

Throws: URIException If URI fails

See Also: URI

getName

public String getName()
Get the basename of the path.

Returns: the basename string

Throws: URIException incomplete trailing escape pattern or unsupported character encoding

See Also: URI

getPath

public String getPath()
Get the path.

   path          = [ abs_path | opaque_part ]
 

Returns: the path string

Throws: URIException If URI fails.

See Also: URI

getPathQuery

public String getPathQuery()
Get the path and query.

Returns: the path and query string.

Throws: URIException incomplete trailing escape pattern or unsupported character encoding

See Also: URI

getPort

public int getPort()
Get the port. To get the specific default port, use the protocol-specific subclass of URI that has the server-based naming authority.

Returns: the port; -1 if the URI uses the default port for its scheme, or if the specific URI does not support a server-based naming authority
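Since -1 only means "no explicit port", callers typically map the scheme to its default. The sketch below is a hypothetical stand-in for the protocol-specific subclasses mentioned above, built on java.net.URI.getPort(), which likewise returns -1 when no port is written; the table covers only the well-known ports 80 (http), 443 (https), and 21 (ftp).

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

class PortDemo {
    // Hypothetical scheme-to-default-port table (IANA well-known ports).
    private static final Map<String, Integer> DEFAULT_PORTS =
            Map.of("http", 80, "https", 443, "ftp", 21);

    // Returns the explicit port if one is written in the URI, otherwise the
    // scheme's default from the table, otherwise -1.
    static int effectivePort(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        String scheme = uri.getScheme();
        if (scheme == null) {
            return -1;
        }
        return DEFAULT_PORTS.getOrDefault(scheme.toLowerCase(Locale.ROOT), -1);
    }

    public static void main(String[] args) {
        System.out.println(effectivePort(URI.create("http://example.com/")));      // 80
        System.out.println(effectivePort(URI.create("http://example.com:8080/"))); // 8080
    }
}
```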

getProtocolCharset

public String getProtocolCharset()
Get the protocol charset used by this URI instance. It is set by the constructor for this instance; if the constructor did not set it, the default protocol charset is returned.

Returns: the protocol charset string

See Also: URI

getQuery

public String getQuery()
Get the query.

Returns: the query string.

Throws: URIException incomplete trailing escape pattern or unsupported character encoding

See Also: URI

getRawAboveHierPath

public char[] getRawAboveHierPath()
Get the level above this hierarchy level.

Returns: the raw above hierarchy level

Throws: URIException If (char[]) fails.

getRawAuthority

public char[] getRawAuthority()
Get the raw-escaped authority.

Returns: the raw-escaped authority

getRawCurrentHierPath

protected char[] getRawCurrentHierPath(char[] path)
Get the raw-escaped current hierarchy level in the given path. If the last namespace is a collection, the path string should end with a slash ('/').

Parameters: path the path

Returns: the current hierarchy level

Throws: URIException no hierarchy level

getRawCurrentHierPath

public char[] getRawCurrentHierPath()
Get the raw-escaped current hierarchy level.

Returns: the raw-escaped current hierarchy level

Throws: URIException If (char[]) fails.

getRawFragment

public char[] getRawFragment()
Get the raw-escaped fragment.

The optional fragment identifier is not part of a URI, but is often used in conjunction with a URI.

The format and interpretation of fragment identifiers is dependent on the media type [RFC2046] of the retrieval result.

A fragment identifier is only meaningful when a URI reference is intended for retrieval and the result of that retrieval is a document for which the identified fragment is consistently defined.

Returns: the raw-escaped fragment

getRawHost

public char[] getRawHost()
Get the host.

   host          = hostname | IPv4address | IPv6reference
 

Returns: the host

See Also: URI

getRawName

public char[] getRawName()
Get the raw-escaped basename of the path.

Returns: the raw-escaped basename

getRawPath

public char[] getRawPath()
Get the raw-escaped path.

   path          = [ abs_path | opaque_part ]
 

Returns: the raw-escaped path

getRawPathQuery

public char[] getRawPathQuery()
Get the raw-escaped path and query.

Returns: the raw-escaped path and query

getRawQuery

public char[] getRawQuery()
Get the raw-escaped query.

Returns: the raw-escaped query

getRawScheme

public char[] getRawScheme()
Get the scheme.

Returns: the scheme

getRawURI

public char[] getRawURI()
Get the URI character sequence in raw-escaped form. This is useful when the URI is to be transported by a protocol.

It is clearly unwise to use a URL that contains a password which is intended to be secret. In particular, the use of a password within the 'userinfo' component of a URL is strongly disrecommended except in those rare cases where the 'password' parameter is intended to be public.

To get the individual parts of the userinfo, use the methods of the specific URL subclass; what is available depends on the specific URL.

Returns: the URI character sequence

getRawURIReference

public char[] getRawURIReference()
Get the URI reference character sequence.

Returns: the URI reference character sequence

getRawUserinfo

public char[] getRawUserinfo()
Get the raw-escaped userinfo.

Returns: the raw-escaped userinfo

See Also: URI

getScheme

public String getScheme()
Get the scheme.

Returns: the scheme, or null if the scheme is undefined

getURI

public String getURI()
Get the URI character sequence in its original (unescaped) form.

Returns: the original URI string

Throws: URIException incomplete trailing escape pattern or unsupported character encoding

See Also: URI

getURIReference

public String getURIReference()
Get the original URI reference string.

Returns: the original URI reference string

Throws: URIException If URI fails.

getUserinfo

public String getUserinfo()
Get the userinfo.

Returns: the userinfo

Throws: URIException If URI fails

See Also: URI

hasAuthority

public boolean hasAuthority()
Tell whether or not this URI has authority. It's the same function as the isNetPath() method.

Returns: true iff this URI has authority

See Also: URI

hasFragment

public boolean hasFragment()
Tell whether or not this URI has fragment.

Returns: true iff this URI has fragment

hashCode

public int hashCode()
Return a hash code for this URI.

Returns: a hash code value for this URI

hasQuery

public boolean hasQuery()
Tell whether or not this URI has query.

Returns: true iff this URI has query

hasUserinfo

public boolean hasUserinfo()
Tell whether or not this URI has userinfo.

Returns: true iff this URI has userinfo

indexFirstOf

protected int indexFirstOf(String s, String delims)
Get the earliest index at which any of the given delimiter characters first occurs in the string.

Parameters: s the string to be indexed delims the delimiters used to index

Returns: the earliest index if any delimiter is found

indexFirstOf

protected int indexFirstOf(String s, String delims, int offset)
Get the earliest index, starting from the given offset, at which any of the given delimiter characters first occurs in the string.

Parameters: s the string to be indexed delims the delimiters used to index offset the index to start searching from

Returns: the earliest index if any delimiter is found

indexFirstOf

protected int indexFirstOf(char[] s, char delim)
Get the index of the first occurrence of the given delimiter in the array.

Parameters: s the character array to be indexed delim the delimiter used to index

Returns: the earliest index if the delimiter is found

indexFirstOf

protected int indexFirstOf(char[] s, char delim, int offset)
Get the index of the first occurrence of the given delimiter in the array, starting from the given offset.

Parameters: s the character array to be indexed delim the delimiter used to index offset the offset to start searching from

Returns: the earliest index if the delimiter is found
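A minimal sketch of the String overloads, written from the descriptions above. The -1 result when no delimiter occurs and the clamping of a negative offset to 0 are assumptions of this sketch rather than documented behavior; `IndexFirstOfDemo` is a hypothetical name.

```java
class IndexFirstOfDemo {
    // Earliest index >= offset at which any character of delims occurs in s,
    // or -1 if none occurs (assumed convention).
    static int indexFirstOf(String s, String delims, int offset) {
        if (s == null || delims == null || delims.isEmpty()) {
            return -1;
        }
        int from = Math.max(offset, 0);
        int min = -1;
        for (int i = 0; i < delims.length(); i++) {
            int at = s.indexOf(delims.charAt(i), from);
            if (at >= 0 && (min == -1 || at < min)) {
                min = at;
            }
        }
        return min;
    }

    static int indexFirstOf(String s, String delims) {
        return indexFirstOf(s, delims, 0);
    }

    public static void main(String[] args) {
        // Where does the query or fragment start?
        System.out.println(indexFirstOf("http://h/p?q#f", "?#")); // 10
    }
}
```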

isAbsoluteURI

public boolean isAbsoluteURI()
Tell whether or not this URI is absolute.

Returns: true iff this URI is absoluteURI

isAbsPath

public boolean isAbsPath()
Tell whether or not the relativeURI or hier_part of this URI is abs_path.

Returns: true iff the relativeURI or hier_part is abs_path

isHierPart

public boolean isHierPart()
Tell whether or not the absoluteURI of this URI is hier_part.

Returns: true iff the absoluteURI is hier_part

isHostname

public boolean isHostname()
Tell whether or not the host part of this URI is hostname.

Returns: true iff the host part is hostname

isIPv4address

public boolean isIPv4address()
Tell whether or not the host part of this URI is IPv4address.

Returns: true iff the host part is IPv4address

isIPv6reference

public boolean isIPv6reference()
Tell whether or not the host part of this URI is IPv6reference.

Returns: true iff the host part is IPv6reference

isNetPath

public boolean isNetPath()
Tell whether or not the relativeURI or hier_part of this URI is net_path. It's the same function as the hasAuthority() method.

Returns: true iff the relativeURI or hier_part is net_path

See Also: URI

isOpaquePart

public boolean isOpaquePart()
Tell whether or not the absoluteURI of this URI is opaque_part.

Returns: true iff the absoluteURI is opaque_part

isRegName

public boolean isRegName()
Tell whether or not the authority component of this URI is reg_name.

Returns: true iff the authority component is reg_name

isRelativeURI

public boolean isRelativeURI()
Tell whether or not this URI is relative.

Returns: true iff this URI is relativeURI

isRelPath

public boolean isRelPath()
Tell whether or not the relativeURI of this URI is rel_path.

Returns: true iff the relativeURI is rel_path

isServer

public boolean isServer()
Tell whether or not the authority component of this URI is server.

Returns: true iff the authority component is server
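The predicates above can be approximated with the JDK's java.net.URI, which exposes isOpaque(), isAbsolute(), getRawAuthority(), and getRawPath(). The mapping to the RFC 2396 grammar names in `UriKindDemo.kind` is a hypothetical illustration, not this class's implementation.

```java
import java.net.URI;

class UriKindDemo {
    // Rough RFC 2396 classification of a URI reference string.
    static String kind(String s) {
        URI u = URI.create(s);
        if (u.isOpaque()) {
            return "absoluteURI with opaque_part";
        }
        String form = u.isAbsolute() ? "absoluteURI with hier_part" : "relativeURI";
        if (u.getRawAuthority() != null) {
            return form + " (net_path)";       // "//" authority follows
        }
        String path = u.getRawPath();
        if (path != null && path.startsWith("/")) {
            return form + " (abs_path)";
        }
        return form + " (rel_path)";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"mailto:mduerst@ifi.unizh.ch", "http://a/b/c", "//g", "../g"}) {
            System.out.println(s + " -> " + kind(s));
        }
    }
}
```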

normalize

protected char[] normalize(char[] path)
Normalize the given hier path part.

Algorithm taken from URI reference parser at http://www.apache.org/~fielding/uri/rev-2002/issues.html.

Parameters: path the path to normalize

Returns: the normalized path

Throws: URIException no more higher path level to be normalized

normalize

public void normalize()
Normalizes the path part of this URI. Normalization is only meant to be performed on URIs with an absolute path. Calling this method on a relative path URI will have no effect.

Throws: URIException no more higher path level to be normalized

See Also: isAbsPath
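Dot-segment removal can be tried with the JDK's java.net.URI.normalize(), shown here as a stand-in. One difference is worth knowing: according to its Javadoc, java.net.URI removes a ".." segment only when it is preceded by a non-".." segment and otherwise keeps it, whereas the method documented here throws URIException when there is no higher path level. `NormalizeDemo` is a hypothetical name.

```java
import java.net.URI;

class NormalizeDemo {
    // Removes "." segments and "segment/.." pairs from the path.
    static String normalize(String s) {
        return URI.create(s).normalize().toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("http://a/b/c/./../g")); // http://a/b/g
    }
}
```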

parseAuthority

protected void parseAuthority(String original, boolean escaped)
Parse the authority component.

Parameters: original the original character sequence of authority component escaped true if original is escaped

Throws: URIException If an error occurs.

parseUriReference

protected void parseUriReference(String original, boolean escaped)
To avoid any possibility of conflict with non-ASCII characters, parse a URI reference as a String with the character encoding of the local system or the document.

The following line is the regular expression for breaking-down a URI reference into its components.

   ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
    12            3  4          5       6  7        8 9
 

For example, matching the above expression to http://jakarta.apache.org/ietf/uri/#Related results in the following subexpression matches:

               $1 = http:
  scheme    =  $2 = http
               $3 = //jakarta.apache.org
  authority =  $4 = jakarta.apache.org
  path      =  $5 = /ietf/uri/
               $6 = 
  query     =  $7 = 
               $8 = #Related
  fragment  =  $9 = Related
 

Parameters: original the original character sequence escaped true if original is escaped

Throws: URIException If an error occurs.
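The regular expression above can be used directly with java.util.regex. In this sketch (`UriRegexDemo.split` is a hypothetical helper), groups 2, 4, 5, 7, and 9 give the scheme, authority, path, query, and fragment. Java returns null for a group that did not participate, which distinguishes a missing query from an empty one ("?" followed by nothing).

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UriRegexDemo {
    // The same expression, escaped for a Java string literal.
    static final Pattern URI_REFERENCE =
            Pattern.compile("^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Returns {scheme, authority, path, query, fragment}; absent parts are null.
    static String[] split(String reference) {
        Matcher m = URI_REFERENCE.matcher(reference);
        if (!m.matches()) {
            throw new IllegalArgumentException(reference);
        }
        return new String[] {m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)};
    }

    public static void main(String[] args) {
        // [http, jakarta.apache.org, /ietf/uri/, null, Related]
        System.out.println(Arrays.toString(split("http://jakarta.apache.org/ietf/uri/#Related")));
    }
}
```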

prevalidate

protected boolean prevalidate(String component, BitSet disallowed)
Pre-validate the unescaped URI string within a specific component.

Parameters: component the component string within the component disallowed those characters disallowed within the component

Returns: true if the component contains no disallowed characters; false if the component is undefined or incorrect
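A minimal sketch of the pre-validation check, assuming the disallowed set is a java.util.BitSet indexed by character code. The example set (controls, space, and the RFC 2396 "delims" and "unwise" characters) and the `PrevalidateDemo` name are illustrative assumptions, not the sets this class uses.

```java
import java.util.BitSet;

class PrevalidateDemo {
    // Example disallowed set: controls, space, DEL, plus RFC 2396 delims and unwise.
    static final BitSet DISALLOWED = new BitSet(128);
    static {
        for (int c = 0; c <= 0x20; c++) DISALLOWED.set(c);
        DISALLOWED.set(0x7F);
        for (char c : "<>#%\"{}|\\^[]`".toCharArray()) DISALLOWED.set(c);
    }

    // false for an undefined (null) component or one containing a disallowed character.
    static boolean prevalidate(String component, BitSet disallowed) {
        if (component == null) {
            return false;
        }
        for (int i = 0; i < component.length(); i++) {
            if (disallowed.get(component.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(prevalidate("index.html", DISALLOWED)); // true
        System.out.println(prevalidate("a b", DISALLOWED));        // false
    }
}
```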

removeFragmentIdentifier

protected char[] removeFragmentIdentifier(char[] component)
Remove the fragment identifier of the given component.

Parameters: component the component that may include a fragment

Returns: the component with the fragment identifier removed

resolvePath

protected char[] resolvePath(char[] basePath, char[] relPath)
Resolve the base and relative path.

Parameters: basePath a character array of the basePath relPath a character array of the relPath

Returns: the resolved path

Throws: URIException no more higher path level to be resolved

setDefaultDocumentCharset

public static void setDefaultDocumentCharset(String charset)
Set the default charset of the document.

Note that a URI may contain characters from mixed character sets (e.g. ftp://host/KoreanNamespace/ChineseResource). To handle the bidirectional display of these character sets, the protocol charset can simply be used again, because extracting the BIDI control characters inserted at different points during composition is not yet implemented.

The setter always succeeds and then always throws a DefaultCharsetChanged exception, so API programmers should use the following pattern:

  import org.apache.commons.httpclient.URI;
  import org.apache.commons.httpclient.URI.DefaultCharsetChanged;
  // ...
  try {
      URI.setDefaultDocumentCharset("EUC-KR");
  } catch (DefaultCharsetChanged cc) {
      // CASE 1: the exception can be ignored when the charset was set deliberately by the user
      // CASE 2: otherwise, let the user know which default charset changed
      if (cc.getReasonCode() == DefaultCharsetChanged.DOCUMENT_CHARSET) {
          // the default document charset changed
      } else {
          // the default protocol charset changed
      }
  }

The API programmer is responsible for setting the correct charset, and each application should remember the charset it supports.

Parameters: charset the default charset for the document

Throws: DefaultCharsetChanged default charset changed

setDefaultProtocolCharset

public static void setDefaultProtocolCharset(String charset)
Set the default charset of the protocol.

The character set used to store files SHALL remain a local decision and MAY depend on the capability of local operating systems. Prior to the exchange of URIs they SHOULD be converted into an ISO/IEC 10646 format and UTF-8 encoded. This approach, while allowing international exchange of URIs, will still allow backward compatibility with older systems because the code set positions for ASCII characters are identical to the one byte sequence in UTF-8.

An individual URI scheme may require a single charset, define a default charset, or provide a way to indicate the charset used.

The setter always succeeds and then always throws a DefaultCharsetChanged exception, so API programmers should use the following pattern:

  import org.apache.commons.httpclient.URI;
  import org.apache.commons.httpclient.URI.DefaultCharsetChanged;
  // ...
  try {
      URI.setDefaultProtocolCharset("UTF-8");
  } catch (DefaultCharsetChanged cc) {
      // CASE 1: the exception can be ignored when the charset was set deliberately by the user
      // CASE 2: otherwise, let the user know which default charset changed
      if (cc.getReasonCode() == DefaultCharsetChanged.PROTOCOL_CHARSET) {
          // the default protocol charset changed
      } else {
          // the default document charset changed
      }
  }

The API programmer is responsible for setting the correct charset, and each application should remember the charset it supports.

Parameters: charset the default charset for each protocol

Throws: DefaultCharsetChanged default charset changed

setEscapedAuthority

public void setEscapedAuthority(String escapedAuthority)
Set the authority. It can be one of server, hostport, hostname, IPv4address, IPv6reference, or reg_name. Note that there is no setAuthority method, for escape-encoding reasons.

Parameters: escapedAuthority the escaped authority string

Throws: URIException If URI fails

setEscapedFragment

public void setEscapedFragment(String escapedFragment)
Set the escaped fragment string.

Parameters: escapedFragment the escaped fragment string

Throws: URIException escaped fragment not valid

setEscapedPath

public void setEscapedPath(String escapedPath)
Set the escaped path.

Parameters: escapedPath the escaped path string

Throws: URIException encoding error or not proper for initial instance

See Also: URI

setEscapedQuery

public void setEscapedQuery(String escapedQuery)
Set the escaped query string.

Parameters: escapedQuery the escaped query string

Throws: URIException escaped query not valid

setFragment

public void setFragment(String fragment)
Set the fragment.

Parameters: fragment the fragment string.

Throws: URIException If an error occurs.

setPath

public void setPath(String path)
Set the path.

Parameters: path the path string

Throws: URIException set incorrectly or fragment only

See Also: URI

setQuery

public void setQuery(String query)
Set the query.

If the reserved special characters ("&", "=", "+", ",", and "$") within the query component do not need to keep their special meaning, it is recommended to encode the whole query with this method.

Additional APIs that use the reserved special characters for protocol-specific purposes are implemented in the protocol classes derived from URI, so refer to the same-named APIs in each specific protocol class.

Parameters: query the query string.

Throws: URIException incomplete trailing escape pattern or unsupported character encoding

See Also: URI

setRawAuthority

public void setRawAuthority(char[] escapedAuthority)
Set the authority. It can be one of server, hostport, hostname, IPv4address, IPv6reference, or reg_name.

   authority     = server | reg_name
 

Parameters: escapedAuthority the raw escaped authority

Throws: URIException If URI fails NullPointerException null authority

setRawFragment

public void setRawFragment(char[] escapedFragment)
Set the raw-escaped fragment.

Parameters: escapedFragment the raw-escaped fragment

Throws: URIException escaped fragment not valid

setRawPath

public void setRawPath(char[] escapedPath)
Set the raw-escaped path.

Parameters: escapedPath the path character sequence

Throws: URIException encoding error or not proper for initial instance

See Also: URI

setRawQuery

public void setRawQuery(char[] escapedQuery)
Set the raw-escaped query.

Parameters: escapedQuery the raw-escaped query

Throws: URIException escaped query not valid

setURI

protected void setURI()
Set this URI once it has been parsed successfully.

See Also: URI

toString

public String toString()
Get the escaped URI string.

In documents, the URI-reference form is used only without the userinfo component (e.g. http://jakarta.apache.org/), for security reasons. However, the URI-reference form with the userinfo component can still be parsed.

In other words, this URI and any of its subclasses must not expose the URI-reference expression with the userinfo component, such as http://user:password@hostport/restricted_zone.
This means that the API client programmer must extract the user and password manually to gain access. Subclasses may support this, but not as a whole URI-reference expression.

Returns: the escaped URI string

See Also: clone
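Because the JDK's java.net.URI.toString() does include the userinfo, code that logs such URIs may want the same protection described above. The hypothetical `RedactUserinfoDemo.withoutUserinfo` sketch rebuilds a server-based URI from its raw components without the userinfo, so the remaining escapes are kept exactly as they were.

```java
import java.net.URI;

class RedactUserinfoDemo {
    // Returns the URI string with any "user:password@" part removed.
    static String withoutUserinfo(String s) {
        URI u = URI.create(s);
        if (u.isOpaque() || u.getRawUserInfo() == null) {
            return u.toString(); // nothing to hide
        }
        StringBuilder sb = new StringBuilder();
        if (u.getScheme() != null) {
            sb.append(u.getScheme()).append(':');
        }
        sb.append("//").append(u.getHost());
        if (u.getPort() != -1) {
            sb.append(':').append(u.getPort());
        }
        // Raw accessors keep the existing escapes (e.g. %20) untouched.
        if (u.getRawPath() != null) {
            sb.append(u.getRawPath());
        }
        if (u.getRawQuery() != null) {
            sb.append('?').append(u.getRawQuery());
        }
        if (u.getRawFragment() != null) {
            sb.append('#').append(u.getRawFragment());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // http://example.com:8080/restricted_zone
        System.out.println(withoutUserinfo("http://user:password@example.com:8080/restricted_zone"));
    }
}
```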

validate

protected boolean validate(char[] component, BitSet generous)
Validate the URI characters within a specific component. The component must already be escape-encoded, or must contain no characters that require escaping.

Parameters: component the characters sequence within the component generous those characters that are allowed within a component

Returns: true if it is a correct URI character sequence

validate

protected boolean validate(char[] component, int soffset, int eoffset, BitSet generous)
Validate the URI characters within a specific component. The component must already be escape-encoded, or must contain no characters that require escaping.

This validation is generous rather than strict; stricter validation may be performed before this method is called.

Parameters: component the characters sequence within the component soffset the starting offset of the given component eoffset the ending offset of the given component (-1 means the length of the component) generous those characters that are allowed within a component

Returns: true if it is a correct URI character sequence

Copyright (c) 1999-2005 - Apache Software Foundation