Subject: Re: URI parsing
From: (Rob Warnock)
Date: 6 Apr 2001 13:23:10 GMT
Newsgroups: comp.lang.lisp
Message-ID: <9akfvu$818lf$>

Jochen Schmidt wrote:
| Raymond Wiker wrote:
| >         I wrote a small (37 lines) function yesterday for parsing
| > URIs. When I compared it with NET.URI, I noticed that I didn't handle
| > fragments (internal anchors in html files). On the other hand, it
| > *does* handle usernames and passwords.
| Thanks - my new parsing function for NET.URI handles
| scheme, authority, path, query and fragment parts. It is only 22 lines
| long. I didn't think it would be such a good idea to bring in more
| specialized fields, as after the scheme each URI is free to define its
| own syntax.

But at least for the "Common Internet Scheme Syntax" [RFC 1738 Section 3.1],
that is, anything that starts with "//" after the scheme (including the
"http:", "ftp:", & "telnet:" schemes), you definitely should parse *all*
of the elements (if present):
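
For reference, RFC 1738 Section 3.1 gives the general form as
//<user>:<password>@<host>:<port>/<url-path>. A minimal sketch of
splitting that form apart follows, written in Python purely for
illustration. The function name parse_common_syntax and the regular
expression are assumptions of this sketch, not part of NET.URI or of
either parser discussed above, and the sketch leaves any query or
fragment inside the path element.

```python
import re

# Hypothetical helper, not part of NET.URI: splits the "//" form of
# RFC 1738 Section 3.1 into its individual elements.
_COMMON_SYNTAX = re.compile(
    r"""^(?P<scheme>[A-Za-z][A-Za-z0-9+.-]*):   # scheme, e.g. http
        //
        (?:(?P<user>[^:@/]*)                    # optional user
           (?::(?P<password>[^@/]*))?           # optional :password
           @)?
        (?P<host>[^:/]*)                        # host
        (?::(?P<port>\d*))?                     # optional :port
        (?P<path>/.*)?$                         # optional /url-path
    """,
    re.VERBOSE,
)


def parse_common_syntax(url):
    """Return a dict of the elements, or None if url lacks the "//" form."""
    m = _COMMON_SYNTAX.match(url)
    return m.groupdict() if m else None
```

Elements that are absent come back as None, so for
"ftp://rob:secret@ftp.example.com:2121/pub/file.txt" the user is "rob",
the password "secret", the host "ftp.example.com" and the port "2121",
while "http://example.com/" yields None for user, password and port.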


There have been published exploits recently that involved deceiving users
by formatting a "user" component that *looked* like a domain name but wasn't,
because of a later "@". See the following RISKS Digest articles for an
especially sneaky example:

    "Making something look hacked when it isn't"

    "The risk of a seldom-used URL syntax"
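
The trick can be illustrated with a rough check, sketched here in Python
using the standard urllib.parse module. The function name
looks_deceptive and the "user part contains a dot" heuristic are
assumptions made for this example only; they are not taken from the
RISKS articles, and a real filter would need to be more careful.

```python
from urllib.parse import urlsplit


def looks_deceptive(url):
    """Heuristic (an assumption of this sketch): flag a URL whose user
    part contains a dot, since it may be mistaken for a host name."""
    user = urlsplit(url).username   # text before "@" (and before any ":")
    return user is not None and "." in user


def real_host(url):
    """Return the host a client would actually connect to."""
    return urlsplit(url).hostname
```

For "http://www.example.com@192.0.2.7/login" the host is 192.0.2.7 and
"www.example.com" is only the user name, which is exactly the confusion
the exploits rely on.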


Rob Warnock, 31-2-510
SGI Network Engineering
1600 Amphitheatre Pkwy.		Phone: 650-933-1673
Mountain View, CA  94043	PP-ASEL-IA