| .\" (C) Copyright 1999-2000 David A. Wheeler (dwheeler@dwheeler.com) |
| .\" |
| .\" %%%LICENSE_START(VERBATIM) |
| .\" Permission is granted to make and distribute verbatim copies of this |
| .\" manual provided the copyright notice and this permission notice are |
| .\" preserved on all copies. |
| .\" |
| .\" Permission is granted to copy and distribute modified versions of this |
| .\" manual under the conditions for verbatim copying, provided that the |
| .\" entire resulting derived work is distributed under the terms of a |
| .\" permission notice identical to this one. |
| .\" |
| .\" Since the Linux kernel and libraries are constantly changing, this |
| .\" manual page may be incorrect or out-of-date. The author(s) assume no |
| .\" responsibility for errors or omissions, or for damages resulting from |
| .\" the use of the information contained herein. The author(s) may not |
| .\" have taken the same level of care in the production of this manual, |
| .\" which is licensed free of charge, as they might when working |
| .\" professionally. |
| .\" |
| .\" Formatted or processed versions of this manual, if unaccompanied by |
| .\" the source, must acknowledge the copyright and authors of this work. |
| .\" %%%LICENSE_END |
| .\" |
| .\" Fragments of this document are directly derived from IETF standards. |
| .\" For those fragments which are directly derived from such standards, |
| .\" the following notice applies, which is the standard copyright and |
| .\" rights announcement of The Internet Society: |
| .\" |
| .\" Copyright (C) The Internet Society (1998). All Rights Reserved. |
| .\" This document and translations of it may be copied and furnished to |
| .\" others, and derivative works that comment on or otherwise explain it |
| .\" or assist in its implementation may be prepared, copied, published |
| .\" and distributed, in whole or in part, without restriction of any |
| .\" kind, provided that the above copyright notice and this paragraph are |
| .\" included on all such copies and derivative works. However, this |
| .\" document itself may not be modified in any way, such as by removing |
| .\" the copyright notice or references to the Internet Society or other |
| .\" Internet organizations, except as needed for the purpose of |
| .\" developing Internet standards in which case the procedures for |
| .\" copyrights defined in the Internet Standards process must be |
| .\" followed, or as required to translate it into languages other than English. |
| .\" |
| .\" Modified Fri Jul 25 23:00:00 1999 by David A. Wheeler (dwheeler@dwheeler.com) |
| .\" Modified Fri Aug 21 23:00:00 1999 by David A. Wheeler (dwheeler@dwheeler.com) |
| .\" Modified Tue Mar 14 2000 by David A. Wheeler (dwheeler@dwheeler.com) |
| .\" |
| .TH URI 7 2021-03-22 "Linux" "Linux Programmer's Manual" |
| .SH NAME |
| uri, url, urn \- uniform resource identifier (URI), including a URL or URN |
| .SH SYNOPSIS |
| .nf |
| .HP 0.2i |
| URI = [ absoluteURI | relativeURI ] [ "#" fragment ] |
| .HP |
| absoluteURI = scheme ":" ( hierarchical_part | opaque_part ) |
| .HP |
| relativeURI = ( net_path | absolute_path | relative_path ) [ "?" query ] |
| .HP |
| scheme = "http" | "ftp" | "gopher" | "mailto" | "news" | "telnet" | |
| "file" | "man" | "info" | "whatis" | "ldap" | "wais" | \&... |
| .HP |
| hierarchical_part = ( net_path | absolute_path ) [ "?" query ] |
| .HP |
| net_path = "//" authority [ absolute_path ] |
| .HP |
| absolute_path = "/" path_segments |
| .HP |
| relative_path = relative_segment [ absolute_path ] |
| .fi |
| .SH DESCRIPTION |
| A Uniform Resource Identifier (URI) is a short string of characters |
| identifying an abstract or physical resource (for example, a web page). |
| A Uniform Resource Locator (URL) is a URI |
| that identifies a resource through its primary access |
| mechanism (e.g., its network "location"), rather than |
| by name or some other attribute of that resource. |
| A Uniform Resource Name (URN) is a URI |
| that must remain globally unique and persistent even when |
| the resource ceases to exist or becomes unavailable. |
| .PP |
| URIs are the standard way to name hypertext link destinations |
| for tools such as web browsers. |
| The string "http://www.kernel.org" is a URL (and thus it |
| is also a URI). |
| Many people use the term URL loosely as a synonym for URI |
| (though technically URLs are a subset of URIs). |
| .PP |
| URIs can be absolute or relative. |
| An absolute identifier refers to a resource independent of |
| context, while a relative |
| identifier refers to a resource by describing the difference |
| from the current context. |
| Within a relative path reference, the complete path segments "." and |
| ".." have special meanings: "the current hierarchy level" and "the |
| level above this hierarchy level", respectively, just like they do in |
| UNIX-like systems. |
| A path segment which contains a colon |
| character can't be used as the first segment of a relative URI path |
| (e.g., "this:that"), because it would be mistaken for a scheme name; |
| precede such segments with ./ (e.g., "./this:that"). |
| Note that descendants of MS-DOS (e.g., Microsoft Windows) replace |
| devicename colons with the vertical bar ("|") in URIs, so "C:" becomes "C|". |
| .PP |
| A fragment identifier, if included, refers to a particular named portion |
| (fragment) of a resource; text after a \(aq#\(aq identifies the fragment. |
| A URI beginning with \(aq#\(aq refers to that fragment in the current resource. |
| .SS Usage |
| There are many different URI schemes, each with specific |
| additional rules and meanings, but they are intentionally made to be |
| as similar as possible. |
| For example, many URL schemes |
| permit the authority to be the following format, called here an |
| .I ip_server |
| (square brackets show what's optional): |
| .HP |
| .IR "ip_server = " [ user " [ : " password " ] @ ] " host " [ : " port ] |
| .PP |
| This format allows you to optionally insert a username, |
| a user plus password, and/or a port number. |
| The |
| .I host |
| is the name of the host computer, either its name as determined by DNS |
| or an IP address (numbers separated by periods). |
| Thus the URI |
| <http://fred:fredpassword@example.com:8080/> |
| logs into a web server on host example.com |
| as fred (using fredpassword) using port 8080. |
| Avoid including a password in a URI if possible because of the many |
| security risks of having a password written down. |
| If the URL supplies a username but no password, and the remote |
| server requests a password, the program interpreting the URL |
| should request one from the user. |
| .PP |
| Here are some of the most common schemes in use on UNIX-like systems |
| that are understood by many tools. |
| Note that many tools using URIs also have internal schemes or specialized |
| schemes; see those tools' documentation for information on those schemes. |
| .PP |
| .B "http \- Web (HTTP) server" |
| .PP |
| .RI http:// ip_server / path |
| .br |
| .RI http:// ip_server / path ? query |
| .PP |
| This is a URL accessing a web (HTTP) server. |
| The default port is 80. |
| If the path refers to a directory, the web server will choose what |
| to return; usually if there is a file named "index.html" or "index.htm" |
| its content is returned, otherwise, a list of the files in the current |
| directory (with appropriate links) is generated and returned. |
| An example is <http://lwn.net>. |
| .PP |
| A query can be given in the archaic "isindex" format, consisting of a |
| word or phrase and not including an equal sign (=). |
| A query can also be in the longer "GET" format, which has one or more |
| query entries of the form |
| .IR key = value |
| separated by the ampersand character (&). |
| Note that |
| .I key |
| can be repeated more than once, though it's up to the web server |
| and its application programs to determine if there's any meaning to that. |
| There is an unfortunate interaction with HTML/XML/SGML and |
| the GET query format; when such URIs with more than one key |
| are embedded in SGML/XML documents (including HTML), the ampersand |
| (&) has to be rewritten as &. |
| Note that not all queries use this format; larger forms |
| may be too long to store as a URI, so they use a different |
| interaction mechanism (called POST) which does |
| not include the data in the URI. |
| See the Common Gateway Interface specification at |
| .UR http://www.w3.org\:/CGI |
| .UE |
| for more information. |
| .PP |
| .B "ftp \- File Transfer Protocol (FTP)" |
| .PP |
| .RI ftp:// ip_server / path |
| .PP |
| This is a URL accessing a file through the file transfer protocol (FTP). |
| The default port (for control) is 21. |
| If no username is included, the username "anonymous" is supplied, and |
| in that case many clients provide as the password the requestor's |
| Internet email address. |
| An example is |
| <ftp://ftp.is.co.za/rfc/rfc1808.txt>. |
| .PP |
| .B "gopher \- Gopher server" |
| .PP |
| .RI gopher:// ip_server / "gophertype selector" |
| .br |
| .RI gopher:// ip_server / "gophertype selector" %09 search |
| .br |
| .RI gopher:// ip_server / "gophertype selector" %09 search %09 gopher+_string |
| .br |
| .PP |
| The default gopher port is 70. |
| .I gophertype |
| is a single-character field to denote the |
| Gopher type of the resource to |
| which the URL refers. |
| The entire path may also be empty, in |
| which case the delimiting "/" is also optional and the gophertype |
| defaults to "1". |
| .PP |
| .I selector |
| is the Gopher selector string. |
| In the Gopher protocol, |
| Gopher selector strings are a sequence of octets which may contain |
| any octets except 09 hexadecimal (US-ASCII HT or tab), 0A hexadecimal |
| (US-ASCII character LF), and 0D (US-ASCII character CR). |
| .PP |
| .B "mailto \- Email address" |
| .PP |
| .RI mailto: email-address |
| .PP |
| This is an email address, usually of the form |
| .IR name @ hostname . |
| See |
| .BR mailaddr (7) |
| for more information on the correct format of an email address. |
| Note that any % character must be rewritten as %25. |
| An example is <mailto:dwheeler@dwheeler.com>. |
| .PP |
| .B "news \- Newsgroup or News message" |
| .PP |
| .RI news: newsgroup-name |
| .br |
| .RI news: message-id |
| .PP |
| A |
| .I newsgroup-name |
| is a period-delimited hierarchical name, such as |
| "comp.infosystems.www.misc". |
| If <newsgroup-name> is "*" (as in <news:*>), it is used to refer |
| to "all available news groups". |
| An example is <news:comp.lang.ada>. |
| .PP |
| A |
| .I message-id |
| corresponds to the Message-ID of |
| .UR http://www.ietf.org\:/rfc\:/rfc1036.txt |
| IETF RFC\ 1036, |
| .UE |
| without the enclosing "<" |
| and ">"; it takes the form |
| .IR unique @ full_domain_name . |
| A message identifier may be distinguished from a news group name by the |
| presence of the "@" character. |
| .PP |
| .B "telnet \- Telnet login" |
| .PP |
| .RI telnet:// ip_server / |
| .PP |
| The Telnet URL scheme is used to designate interactive text services that |
| may be accessed by the Telnet protocol. |
| The final "/" character may be omitted. |
| The default port is 23. |
| An example is <telnet://melvyl.ucop.edu/>. |
| .PP |
| .B "file \- Normal file" |
| .PP |
| .RI file:// ip_server / path_segments |
| .br |
| .RI file: path_segments |
| .PP |
| This represents a file or directory accessible locally. |
| As a special case, |
| .I ip_server |
| can be the string "localhost" or the empty |
| string; this is interpreted as "the machine from which the URL is |
| being interpreted". |
| If the path is to a directory, the viewer should display the |
| directory's contents with links to each containee; |
| not all viewers currently do this. |
| KDE supports generated files through the URL <file:/cgi-bin>. |
| If the given file isn't found, browser writers may want to try to expand |
| the filename via filename globbing |
| (see |
| .BR glob (7) |
| and |
| .BR glob (3)). |
| .PP |
| The second format (e.g., <file:/etc/passwd>) |
| is a correct format for referring to |
| a local file. |
| However, older standards did not permit this format, |
| and some programs don't recognize this as a URI. |
| A more portable syntax is to use an empty string as the server name, |
| for example, |
| <file:///etc/passwd>; this form does the same thing |
| and is easily recognized by pattern matchers and older programs as a URI. |
| Note that if you really mean to say "start from the current location", don't |
| specify the scheme at all; use a relative address like <../test.txt>, |
| which has the side-effect of being scheme-independent. |
| An example of this scheme is <file:///etc/passwd>. |
| .PP |
| .B "man \- Man page documentation" |
| .PP |
| .RI man: command-name |
| .br |
| .RI man: command-name ( section ) |
| .PP |
| This refers to local online manual (man) reference pages. |
| The command name can optionally be followed by a |
| parenthesis and section number; see |
| .BR man (7) |
| for more information on the meaning of the section numbers. |
| This URI scheme is unique to UNIX-like systems (such as Linux) |
| and is not currently registered by the IETF. |
| An example is <man:ls(1)>. |
| .PP |
| .B "info \- Info page documentation" |
| .PP |
| .RI info: virtual-filename |
| .br |
| .RI info: virtual-filename # nodename |
| .br |
| .RI info:( virtual-filename ) |
| .br |
| .RI info:( virtual-filename ) nodename |
| .PP |
| This scheme refers to online info reference pages (generated from |
| texinfo files), |
| a documentation format used by programs such as the GNU tools. |
| This URI scheme is unique to UNIX-like systems (such as Linux) |
| and is not currently registered by the IETF. |
| As of this writing, GNOME and KDE differ in their URI syntax |
| and do not accept the other's syntax. |
| The first two formats are the GNOME format; in nodenames all spaces |
| are written as underscores. |
| The second two formats are the KDE format; |
| spaces in nodenames must be written as spaces, even though this |
| is forbidden by the URI standards. |
| It's hoped that in the future most tools will understand all of these |
| formats and will always accept underscores for spaces in nodenames. |
| In both GNOME and KDE, if the form without the nodename is used the |
| nodename is assumed to be "Top". |
| Examples of the GNOME format are <info:gcc> and <info:gcc#G++_and_GCC>. |
| Examples of the KDE format are <info:(gcc)> and <info:(gcc)G++ and GCC>. |
| .PP |
| .B "whatis \- Documentation search" |
| .PP |
| .RI whatis: string |
| .PP |
| This scheme searches the database of short (one-line) descriptions of |
| commands and returns a list of descriptions containing that string. |
| Only complete word matches are returned. |
| See |
| .BR whatis (1). |
| This URI scheme is unique to UNIX-like systems (such as Linux) |
| and is not currently registered by the IETF. |
| .PP |
| .B "ghelp \- GNOME help documentation" |
| .PP |
| .RI ghelp: name-of-application |
| .PP |
| This loads GNOME help for the given application. |
| Note that not much documentation currently exists in this format. |
| .PP |
| .B "ldap \- Lightweight Directory Access Protocol" |
| .PP |
| .RI ldap:// hostport |
| .br |
| .RI ldap:// hostport / |
| .br |
| .RI ldap:// hostport / dn |
| .br |
| .RI ldap:// hostport / dn ? attributes |
| .br |
| .RI ldap:// hostport / dn ? attributes ? scope |
| .br |
| .RI ldap:// hostport / dn ? attributes ? scope ? filter |
| .br |
| .RI ldap:// hostport / dn ? attributes ? scope ? filter ? extensions |
| .PP |
| This scheme supports queries to the |
| Lightweight Directory Access Protocol (LDAP), a protocol for querying |
| a set of servers for hierarchically organized information |
| (such as people and computing resources). |
| See |
| .UR http://www.ietf.org\:/rfc\:/rfc2255.txt |
| RFC\ 2255 |
| .UE |
| for more information on the LDAP URL scheme. |
| The components of this URL are: |
| .IP hostport 12 |
| the LDAP server to query, written as a hostname optionally followed by |
| a colon and the port number. |
| The default LDAP port is TCP port 389. |
| If empty, the client determines which the LDAP server to use. |
| .IP dn |
| the LDAP Distinguished Name, which identifies |
| the base object of the LDAP search (see |
| .UR http://www.ietf.org\:/rfc\:/rfc2253.txt |
| RFC\ 2253 |
| .UE |
| section 3). |
| .IP attributes |
| a comma-separated list of attributes to be returned; |
| see RFC\ 2251 section 4.1.5. |
| If omitted, all attributes should be returned. |
| .IP scope |
| specifies the scope of the search, which can be one of |
| "base" (for a base object search), "one" (for a one-level search), |
| or "sub" (for a subtree search). |
| If scope is omitted, "base" is assumed. |
| .IP filter |
| specifies the search filter (subset of entries |
| to return). |
| If omitted, all entries should be returned. |
| See |
| .UR http://www.ietf.org\:/rfc\:/rfc2254.txt |
| RFC\ 2254 |
| .UE |
| section 4. |
| .IP extensions |
| a comma-separated list of type=value |
| pairs, where the =value portion may be omitted for options not |
| requiring it. |
| An extension prefixed with a \(aq!\(aq is critical |
| (must be supported to be valid), otherwise it is noncritical (optional). |
| .PP |
| LDAP queries are easiest to explain by example. |
| Here's a query that asks ldap.itd.umich.edu for information about |
| the University of Michigan in the U.S.: |
| .PP |
| .nf |
| ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US |
| .fi |
| .PP |
| To just get its postal address attribute, request: |
| .PP |
| .nf |
| ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US?postalAddress |
| .fi |
| .PP |
| To ask a host.com at port 6666 for information about the person |
| with common name (cn) "Babs Jensen" at University of Michigan, request: |
| .PP |
| .nf |
| ldap://host.com:6666/o=University%20of%20Michigan,c=US??sub?(cn=Babs%20Jensen) |
| .fi |
| .PP |
| .B "wais \- Wide Area Information Servers" |
| .PP |
| .RI wais:// hostport / database |
| .br |
| .RI wais:// hostport / database ? search |
| .br |
| .RI wais:// hostport / database / wtype / wpath |
| .PP |
| This scheme designates a WAIS database, search, or document |
| (see |
| .UR http://www.ietf.org\:/rfc\:/rfc1625.txt |
| IETF RFC\ 1625 |
| .UE |
| for more information on WAIS). |
| Hostport is the hostname, optionally followed by a colon and port number |
| (the default port number is 210). |
| .PP |
| The first form designates a WAIS database for searching. |
| The second form designates a particular search of the WAIS database |
| .IR database . |
| The third form designates a particular document within a WAIS |
| database to be retrieved. |
| .I wtype |
| is the WAIS designation of the type of the object and |
| .I wpath |
| is the WAIS document-id. |
| .PP |
| .B "other schemes" |
| .PP |
| There are many other URI schemes. |
| Most tools that accept URIs support a set of internal URIs |
| (e.g., Mozilla has the about: scheme for internal information, |
| and the GNOME help browser has the toc: scheme for various starting |
| locations). |
| There are many schemes that have been defined but are not as widely |
| used at the current time |
| (e.g., prospero). |
| The nntp: scheme is deprecated in favor of the news: scheme. |
| URNs are to be supported by the urn: scheme, with a hierarchical name space |
| (e.g., urn:ietf:... would identify IETF documents); at this time |
| URNs are not widely implemented. |
| Not all tools support all schemes. |
| .SS Character encoding |
| URIs use a limited number of characters so that they can be |
| typed in and used in a variety of situations. |
| .PP |
| The following characters are reserved, that is, they may appear in a |
| URI but their use is limited to their reserved purpose |
| (conflicting data must be escaped before forming the URI): |
| .IP |
| ; / ? : @ & = + $ , |
| .PP |
| Unreserved characters may be included in a URI. |
| Unreserved characters |
| include uppercase and lowercase English letters, |
| decimal digits, and the following |
| limited set of punctuation marks and symbols: |
| .IP |
| \- _ . ! \(ti * ' ( ) |
| .PP |
| All other characters must be escaped. |
| An escaped octet is encoded as a character triplet, consisting of the |
| percent character "%" followed by the two hexadecimal digits |
| representing the octet code (you can use uppercase or lowercase letters |
| for the hexadecimal digits). |
| For example, a blank space must be escaped |
| as "%20", a tab character as "%09", and the "&" as "%26". |
| Because the percent "%" character always has the reserved purpose of |
| being the escape indicator, it must be escaped as "%25". |
| It is common practice to escape space characters as the plus symbol (+) |
| in query text; this practice isn't uniformly defined |
| in the relevant RFCs (which recommend %20 instead) but any tool accepting |
| URIs with query text should be prepared for them. |
| A URI is always shown in its "escaped" form. |
| .PP |
| Unreserved characters can be escaped without changing the semantics |
| of the URI, but this should not be done unless the URI is being used |
| in a context that does not allow the unescaped character to appear. |
| For example, "%7e" is sometimes used instead of "\(ti" in an HTTP URL |
| path, but the two are equivalent for an HTTP URL. |
| .PP |
| For URIs which must handle characters outside the US ASCII character set, |
| the HTML 4.01 specification (section B.2) and |
| IETF RFC\ 2718 (section 2.2.5) recommend the following approach: |
| .IP 1. 4 |
| translate the character sequences into UTF-8 (IETF RFC\ 2279)\(emsee |
| .BR utf\-8 (7)\(emand |
| then |
| .IP 2. |
| use the URI escaping mechanism, that is, |
| use the %HH encoding for unsafe octets. |
| .SS Writing a URI |
| When written, URIs should be placed inside double quotes |
| (e.g., "http://www.kernel.org"), |
| enclosed in angle brackets (e.g., <http://lwn.net>), |
| or placed on a line by themselves. |
| A warning for those who use double-quotes: |
| .B never |
| move extraneous punctuation (such as the period ending a sentence or the |
| comma in a list) |
| inside a URI, since this will change the value of the URI. |
| Instead, use angle brackets instead, or |
| switch to a quoting system that never includes extraneous characters |
| inside quotation marks. |
| This latter system, called the 'new' or 'logical' quoting system by |
| "Hart's Rules" and the "Oxford Dictionary for Writers and Editors", |
| is preferred practice in Great Britain and in various European languages. |
| Older documents suggested inserting the prefix "URL:" |
| just before the URI, but this form has never caught on. |
| .PP |
| The URI syntax was designed to be unambiguous. |
| However, as URIs have become commonplace, traditional media |
| (television, radio, newspapers, billboards, etc.) have increasingly |
| used abbreviated URI references consisting of |
| only the authority and path portions of the identified resource |
| (e.g., <www.w3.org/Addressing>). |
| Such references are primarily |
| intended for human interpretation rather than machine, with the |
| assumption that context-based heuristics are sufficient to complete |
| the URI (e.g., hostnames beginning with "www" are likely to have |
| a URI prefix of "http://" and hostnames beginning with "ftp" likely |
| to have a prefix of "ftp://"). |
| Many client implementations heuristically resolve these references. |
| Such heuristics may |
| change over time, particularly when new schemes are introduced. |
| Since an abbreviated URI has the same syntax as a relative URL path, |
| abbreviated URI references cannot be used where relative URIs are |
| permitted, and can be used only when there is no defined base |
| (such as in dialog boxes). |
| Don't use abbreviated URIs as hypertext links inside a document; |
| use the standard format as described here. |
| .SH CONFORMING TO |
| .UR http://www.ietf.org\:/rfc\:/rfc2396.txt |
| (IETF RFC\ 2396) |
| .UE , |
| .UR http://www.w3.org\:/TR\:/REC\-html40 |
| (HTML 4.0) |
| .UE . |
| .SH NOTES |
| Any tool accepting URIs (e.g., a web browser) on a Linux system should |
| be able to handle (directly or indirectly) all of the |
| schemes described here, including the man: and info: schemes. |
| Handling them by invoking some other program is |
| fine and in fact encouraged. |
| .PP |
| Technically the fragment isn't part of the URI. |
| .PP |
| For information on how to embed URIs (including URLs) in a data format, |
| see documentation on that format. |
| HTML uses the format <A HREF="\fIuri\fP"> |
| .I text |
| </A>. |
| Texinfo files use the format @uref{\fIuri\fP}. |
| Man and mdoc have the recently added UR macro, or just include the |
| URI in the text (viewers should be able to detect :// as part of a URI). |
| .PP |
| The GNOME and KDE desktop environments currently vary in the URIs |
| they accept, in particular in their respective help browsers. |
| To list man pages, GNOME uses <toc:man> while KDE uses <man:(index)>, and |
| to list info pages, GNOME uses <toc:info> while KDE uses <info:(dir)> |
| (the author of this man page prefers the KDE approach here, though a more |
| regular format would be even better). |
| In general, KDE uses <file:/cgi-bin/> as a prefix to a set of generated |
| files. |
| KDE prefers documentation in HTML, accessed via the |
| <file:/cgi-bin/helpindex>. |
| GNOME prefers the ghelp scheme to store and find documentation. |
| Neither browser handles file: references to directories at the time |
| of this writing, making it difficult to refer to an entire directory with |
| a browsable URI. |
| As noted above, these environments differ in how they handle the |
| info: scheme, probably the most important variation. |
| It is expected that GNOME and KDE |
| will converge to common URI formats, and a future |
| version of this man page will describe the converged result. |
| Efforts to aid this convergence are encouraged. |
| .SS Security |
| A URI does not in itself pose a security threat. |
| There is no general guarantee that a URL, which at one time |
| located a given resource, will continue to do so. |
| Nor is there any |
| guarantee that a URL will not locate a different resource at some |
| later point in time; such a guarantee can be |
| obtained only from the person(s) controlling that namespace and the |
| resource in question. |
| .PP |
| It is sometimes possible to construct a URL such that an attempt to |
| perform a seemingly harmless operation, such as the |
| retrieval of an entity associated with the resource, will in fact |
| cause a possibly damaging remote operation to occur. |
| The unsafe URL |
| is typically constructed by specifying a port number other than that |
| reserved for the network protocol in question. |
| The client unwittingly contacts a site that is in fact |
| running a different protocol. |
| The content of the URL contains instructions that, when |
| interpreted according to this other protocol, cause an unexpected |
| operation. |
| An example has been the use of a gopher URL to cause an |
| unintended or impersonating message to be sent via a SMTP server. |
| .PP |
| Caution should be used when using any URL that specifies a port |
| number other than the default for the protocol, especially when it is |
| a number within the reserved space. |
| .PP |
| Care should be taken when a URI contains escaped delimiters for a |
| given protocol (for example, CR and LF characters for telnet |
| protocols) that these are not unescaped before transmission. |
| This might violate the protocol, but avoids the potential for such |
| characters to be used to simulate an extra operation or parameter in |
| that protocol, which might lead to an unexpected and possibly harmful |
| remote operation to be performed. |
| .PP |
| It is clearly unwise to use a URI that contains a password which is |
| intended to be secret. |
| In particular, the use of a password within |
| the "userinfo" component of a URI is strongly recommended against except |
| in those rare cases where the "password" parameter is intended to be public. |
| .SH BUGS |
| Documentation may be placed in a variety of locations, so there |
| currently isn't a good URI scheme for general online documentation |
| in arbitrary formats. |
| References of the form |
| <file:///usr/doc/ZZZ> don't work because different distributions and |
| local installation requirements may place the files in different |
| directories |
| (it may be in /usr/doc, or /usr/local/doc, or /usr/share, |
| or somewhere else). |
| Also, the directory ZZZ usually changes when a version changes |
| (though filename globbing could partially overcome this). |
| Finally, using the file: scheme doesn't easily support people |
| who dynamically load documentation from the Internet (instead of |
| loading the files onto a local filesystem). |
| A future URI scheme may be added (e.g., "userdoc:") to permit |
| programs to include cross-references to more detailed documentation |
| without having to know the exact location of that documentation. |
| Alternatively, a future version of the filesystem specification may |
| specify file locations sufficiently so that the file: scheme will |
| be able to locate documentation. |
| .PP |
| Many programs and file formats don't include a way to incorporate |
| or implement links using URIs. |
| .PP |
| Many programs can't handle all of these different URI formats; there |
| should be a standard mechanism to load an arbitrary URI that automatically |
| detects the users' environment (e.g., text or graphics, |
| desktop environment, local user preferences, and currently executing |
| tools) and invokes the right tool for any URI. |
| .\" .SH AUTHOR |
| .\" David A. Wheeler (dwheeler@dwheeler.com) wrote this man page. |
| .SH SEE ALSO |
| .BR lynx (1), |
| .BR man2html (1), |
| .BR mailaddr (7), |
| .BR utf\-8 (7) |
| .PP |
| .UR http://www.ietf.org\:/rfc\:/rfc2255.txt |
| IETF RFC\ 2255 |
| .UE |