Subject: Re: Where to start
From: rpw3@rpw3.org (Rob Warnock)
Date: Tue, 19 Aug 2003 09:09:48 -0500
Newsgroups: comp.lang.lisp
Message-ID: <bFidnepfXNgxst-iXTWc-g@speakeasy.net>
Daniel Barlow  <dan@telent.net> wrote:
+---------------
| rpw3@rpw3.org (Rob Warnock) writes:
| >4. Just *do* it. That is, go ahead and start coding whatever infrastructure
| >   you're going to want to use; put together a web site that uses it...
| 
| A point that someone raised (I think on IRC, apologies to whoever it
| was, because I can't remember) the other day was that it might sooner
| or later be good to standardize[*] an API for Lisp web applications:
+---------------

I agree, I think. Thinking back to about a year ago, when I started the
first of these projects, the reasons that I "did my own thing" were:
(1) I had just started trying to do serious coding in CL (after ~10 years
of Scheme) and web stuff was a good thing [I think] to cut my teeth on;
(2) at the time, OpenAllegroServe & Araneida were a bit daunting to me
in their complexity [maybe "opaque" would be a better term!]; (3) there
was no server-side demo code with "mod_lisp" other than the most trivial
stub; and (4) "cl-modlisp" hadn't been released yet [only recently].

I had already written a number of CGI-style apps in MzScheme years ago,
so I started by trying to duplicate the same thing in CL [successfully,
almost trivially so], and eventually proceeded in stages as the application
grew larger [and the interpretive "converting" time of CMUCL grew larger]
to end up writing a "mod_lisp"-like persistent server similar to [but
simpler than?] Araneida -- which I did browse from time to time, by the
way, mainly to see if I was missing anything major.

+---------------
| ...something that fulfills a similar role to the java servlet API.
| Right now you can choose from AllegroServe or Araneida or CL-HTTP or
| whatever...
+---------------

...or "cl-modlisp", or your own home-grown backend to "mod_lisp".
[Or my as-yet-unpublished "sock.cgi" style, a hybrid intermediate
with elements of both classic CGI and "mod_lisp"/"cl-modlisp" style.]

+---------------
| ...and you have to make your choice both on performance/platform
| considerations ("does it handle 100 requests/second?  can I run it
| in CLISP?") _and_ on the API exported by that server or server shim.
+---------------

Yes, these are all issues. And given different usage scenarios, very
different answers are often reasonable. E.g., for the kind of load
that they get [essentially zero, unless I've just slashdotted myself
by mentioning them!], either of these two is perfectly reasonable, even
though they both fork/exec a complete new Lisp image per web hit:

	<URL:http://rpw3.org/hacks/lisp/cmucl-demo.cgi>
	<URL:http://rpw3.org/hacks/lisp/clisp-demo.cgi>

[Somewhat surprisingly, according to "ab" the CMUCL one, even though
not compiled, runs ever-so-slightly *faster* than the CLISP one: 150 ms
vs. 160 ms. Go figure. Probably FreeBSD's excellent VM and file cache,
not to mention having 0.5GB RAM on the server...]

And then if the load gets too large, why, then one can use a persistent
server ("mod_lisp" would probably be even faster, but "ab" says my
"sock.cgi" hybrid takes only 13ms, even though there's a fork/exec
of the small C-coded trampoline per request):

	<URL:http://rpw3.org/hacks/lisp/appsrv-demo.lhp>

The latter causes the usual "good things" to happen: loading the
source if it hasn't been loaded; reloading it if it's been changed;
recompiling it if there was already a FASL there (but not if there
wasn't, so you can choose desired space/speed); and otherwise just
using the page function already registered in memory.
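
As a rough illustration of that policy (a sketch only, not the actual
server code): the function name PAGE-LOAD-ACTION and its keyword results
are hypothetical, and the choice to load an up-to-date FASL when nothing
is registered yet is an assumption. Times are universal times such as
FILE-WRITE-DATE returns, with NIL meaning "not there".

```lisp
;; Hypothetical sketch of the load/reload/recompile decision.
;; REGISTERED-TIME: when the in-memory page function was loaded, or NIL.
;; SOURCE-TIME:     write date of the page source file.
;; FASL-TIME:       write date of the FASL, or NIL if there is none.
(defun page-load-action (registered-time source-time fasl-time)
  (cond
    ;; Already loaded and the source has not changed since: just call it.
    ((and registered-time (<= source-time registered-time))
     :use-registered)
    ;; No FASL on disk: the user chose "interpreted", so load the source.
    ((null fasl-time)
     :load-source)
    ;; A FASL exists but is older than the source: recompile, then load.
    ((< fasl-time source-time)
     :recompile-and-load)
    ;; A FASL exists and is current (assumption: prefer it to the source).
    (t
     :load-fasl)))
```

For instance, (page-load-action 100 200 150) asks for a recompile, since
both the registered function and the FASL predate the source.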

+---------------
| I very much like the Araneida API...
+---------------

Initially it seemed rather complex to me, all those different "stages",
and the complexity of the URI parsing seemed a bit over-the-top, but
I dunno, maybe there are cases where it's needed. Plus, (at least, in 0.51)
it only handled :EXACT and :PREFIX matches, and I needed :SUFFIX, too
(for ".lhp" files like the above).

Instead, I started with a slightly different interface (with only one
"stage"):

(defun register-uri-handler
       (&key (kind :exact)	; (member :exact :prefix :suffix :default)
             (path (or (and *http-request* (http-request-self *http-request*))
                       (required-argument)))    ; If not within CGI or LHP.
             (function (required-argument))     ; Always required.
             (source-path *load-truename*))     ; Would be XPATH, but for CGI.
  ...)

The registered function is called with a parsed HTTP request object
(with the POST data, if any, already sucked up, and the GET/POST query
parsed into an alist of bindings, along with the full "CGI" bindings),
and then it does whatever it wants to do throughout the various "stages",
calling back into the infrastructure for help when needed [such as
issuing any of a few flavors of standard HTML error pages].
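
To make the four KIND values concrete, here is a minimal sketch of how
a lookup over registered handlers might work. It is not the unreleased
code: FIND-HANDLER, the (kind path function) entry format, and the
precedence order (:exact, then :prefix, then :suffix, then :default)
are all assumptions made for illustration.

```lisp
;; Hypothetical sketch: match a request path against registered handlers.
(defun prefix-p (prefix string)
  "True if STRING starts with PREFIX."
  (let ((n (length prefix)))
    (and (<= n (length string))
         (string= prefix string :end2 n))))

(defun suffix-p (suffix string)
  "True if STRING ends with SUFFIX."
  (let ((n (length suffix))
        (len (length string)))
    (and (<= n len)
         (string= suffix string :start2 (- len n)))))

(defun handler-matches-p (kind path uri-path)
  "Does an entry of KIND registered under PATH apply to URI-PATH?"
  (ecase kind
    (:exact   (string= path uri-path))
    (:prefix  (prefix-p path uri-path))
    (:suffix  (suffix-p path uri-path))
    (:default t)))

(defun find-handler (uri-path handlers)
  "HANDLERS is a list of (kind path function) entries.
Return the function of the first match, trying the kinds in an assumed
precedence order, or NIL if nothing matches."
  (dolist (kind '(:exact :prefix :suffix :default) nil)
    (dolist (entry handlers)
      (destructuring-bind (entry-kind path function) entry
        (when (and (eq entry-kind kind)
                   (handler-matches-p entry-kind path uri-path))
          (return-from find-handler function))))))
```

With a (:suffix ".lhp" ...) entry registered, a request for
"/foo/bar.lhp" finds it unless an :exact or :prefix entry also matches.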

But my API seems to be growing (e.g., I now find that I need the ability
to handle multiple virtual hosts), so maybe some convergence with a
community "standard" would be reasonable.

+---------------
| but the actual code underneath it is a bit grotty in places...
+---------------

Yes... well... The only reason I haven't yet released my code
(will have a BSD-style license) is that I still feel enough like
a CL newbie that I was worried that people would throw up at it.  ;-}
(But maybe not, I dunno.)

+---------------
| ...and as it's SBCL-only...
+---------------

Can't it be made to run on CMUCL pretty easily?

+---------------
| ...it restricts my apps (such as CLiki, which other people sometimes
| express an interest in running on other implementations) likewise.
+---------------

For myself, I initially started my CGI work trying carefully to keep the
code running on *both* CLISP and CMUCL. But as I moved to a persistent
server model I'm afraid I grew dependent on having CMUCL's threads --
*so* convenient for things like socket-based REPL listeners -- that I
let the CLISP compatibility fall away (except for simple fork/exec per
request CGI, where it still all works, as shown above). Though it should
work well enough on anything with CLIM-like threads (SBCL, etc.).

+---------------
| Apps written to this hypothetical standard interface, however, could
| be deployed on any platform that implements it: if my webmail archive
| is too slow using a CLISP/CGI implementation of cl-webapi I could move
| it to the Allegro/AllegroServe-based implementation; if I can't afford
| the Allegro licenses to deploy it I could move it back to CMUCL and
| some server optimized to run on that platform.  Or whatever.
+---------------

Again, I tried to stick to that initially, but found myself using
too many CMUCL-specific features (e.g., the extended args to LOAD
that let you automatically recompile if the source is out-of-date).
Yes, many could be backed out or moved into compatibility libraries,
but not all, I think. [E.g., threads.]

Nevertheless, I succeeded to this extent: I was able to make the
classic CGI and persistent server models close enough that the exact
same ".lhp" file can run (as long as it doesn't play any games with
the memory state on the persistent server) in any of three modes:
fork/exec-a-whole-Lisp-image/per-request CGI mode; my hybrid CGI-
trampoline-connects-to-Lisp-server mode; or pure "mod_lisp" (Apache
connects directly to the Lisp server). That is, given an executable
Unix file "foo.lhp" which contains this [and a magic "run-lhp" CMUCL
script]:

	#!/usr/bin/env /u/rpw3/bin/run-lhp
	(in-package :lhp-user)

	(lhp-basic-page ()
	  (:html ()
	    (:head ()
	      (:title () "Simple Test Page")) (lfd)
	    (:body ()
	      (:h1 () "Simple Test Page") (lfd)
	      "This is a simple test page with not much on it." (lfd)))
	  (lfd))

If the Apache config or an applicable ".htaccess" file contains:

	AddHandler cgi-script .lhp

then the file will run as a classic CGI. Whereas if the ".htaccess" file
contains this (and an "AddHandler cgi-script .cgi" was already done
at the $DOCUMENT_ROOT level):

	Action lisp-handled-pages /sock.cgi
	AddHandler lisp-handled-pages .lhp

then the "sock.cgi" program (tiny C-coded trampoline) will connect to
the Lisp server and pass it ("mod_lisp" style) the information about the
request. Finally, if you have real "mod_lisp" installed, just do this:

	AddHandler lisp-handler .lhp

The file "foo.lhp" remains the same in all three cases.

Given this existence proof, I think it's reasonable for your hypothetical
standard interface to attempt at least the same amount of mode independence.
In particular, it should be at least *possible* for a CGI shim library to
fake up classic CGI into looking like it's running under the Lisp server API
(which is what I did), albeit only to the extent that the page doesn't make
use of the memory persistence of the Lisp server.
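
One way such a shim could be shaped, as a sketch under stated
assumptions: the SKETCH-REQUEST structure, its slots, and
MAKE-REQUEST-FROM-CGI-ENV are hypothetical names, not the real request
object. The environment is passed in as an alist of strings because
reading environment variables is implementation-specific; a persistent
server could build the same alist from whatever its front end sends.

```lisp
;; Hypothetical sketch: build one request object from CGI-style
;; variables, regardless of which front end supplied them.
(defstruct sketch-request
  http-method    ; e.g. "GET" or "POST"
  self           ; URL path of the page being served
  query-string   ; raw query string, "" if none
  cgi-bindings)  ; the full alist of (name . value) strings

(defun env-ref (name env)
  "Look up NAME in ENV, an alist of (name . value) strings."
  (cdr (assoc name env :test #'string=)))

(defun make-request-from-cgi-env (env)
  "Fake up a server-style request from classic CGI variables."
  (make-sketch-request
   :http-method  (env-ref "REQUEST_METHOD" env)
   ;; Assumption: prefer REDIRECT_URL (set for "handled" files),
   ;; falling back to SCRIPT_NAME for a plain CGI.
   :self         (or (env-ref "REDIRECT_URL" env)
                     (env-ref "SCRIPT_NAME" env))
   :query-string (or (env-ref "QUERY_STRING" env) "")
   :cgi-bindings env))
```

A page function that only looks at the request object then cannot tell
which of the modes it is running under.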

+---------------
| I'd suggest the following starting point for a design
| 
|  - implement something that reflects the HTTP request/response model 
|    fairly closely, rather than requiring users to tie themselves into
|    some application server model providing sessions or state.  That
|    can be layered on top.
+---------------

I agree. In fact, I've been meaning to suggest to Marc Battyani that
maybe "mod_lisp" shouldn't be doing so much special-casing of which
variables it passes through and which ones it doesn't (and just forget
about all the downcasing!) -- just pass 'em *all* unchanged and let
the Lisp side sort it out. [That's what my "sock.cgi" trampoline does.]

+---------------
|  - HTML generation is likewise out of scope.  Generate your HTML any
|    way you choose
+---------------

Agreed, though in a related vein I think there might need to be some
support or at least recognition for the various patterns of HTML generation.
That is, the whole issue (mentioned in passing above) of what might be
called "dynamic defsystem" issues (that is, when do Lisp sources get
loaded/reloaded/recompiled). There are many styles I looked at using
[basically, everything listed at <URL:http://www.cliki.net/Web> plus
several more, such as BRL & LAML] before I went off and did my own
LHP ("Lisp-Handled Pages"). The ones I looked at varied widely in
several dimensions:

- Whether the HTML is pre-generated off-line (TML, LML) or at HTTP-request
  time (ASP, ALP, LHP); and for the latter, whether the HTML is cached or not,
  and if so, where (e.g., if you're using a front end such as Apache in
  proxy mode [as with Araneida], you can cause caching to happen there
  "for free" -- but then there are issues with updates);

- Where the source comes from: initial server load (Araneida, HTTP.LSP) or
  demand-loaded pages from the filesystem (ALP, LHP); and for the latter,
  what and how much is compiled and/or cached, and how changes in the
  filesystem are handled (what I called "dynamic defsystem" issues above);

- If pages are in the filesystem, their source syntax:
  - Plain HTML with ASP-like <%...%> escapes (ALP, LSP);
  - Plain HTML with some different escapes (e.g., BRL uses [...]);
  - Some other "HTML-like" markup (TML, which is really HTOUT s-exprs
     with angle brackets, e.g., "here comes a <font :size 6|BIG> word");
  - Pure Lisp code or at least s-exprs (LML/LML2, LHP).

And I'm sure there are others. But the point is that the API at least
needs to recognize the existence of these issues -- especially the "dynamic
defsystem" issue -- and provide *some* minimal hooks to support whichever
policy the developer wants to use. For example, my URI-HANDLER object
provides:

  (defstruct uri-handler
    kind                  ; (member :exact :prefix :suffix :default)
    path                  ; One of: absolute exact path, absolute prefix,
                          ; relative suffix, or NIL [must match KIND].
    function              ; #'(lambda (request &rest TBD) ...)
    function-time         ; Time when FUNCTION was set. XXX Currently unused.
    source-path           ; (or pathname-designator null)
    source-time           ; (and source-path (file-write-date source-path))
    fasl-path             ;XXX Currently unused.
    fasl-time             ;XXX Currently unused.
    )

but even that might not be enough, if the Lisp source is actually derived
from some *other* format (as with ALP or TML). In that case, you might also
need a PAGE-SOURCE-PATH/-TIME pair.

Just something to think about...

+---------------
|  - URI/URL parsing is probably _in_ scope, however: URI are used all
|    over the place in HTTP; it'd be madness to treat them as text...
+---------------

Well, I'd agree, except... If you're using one of the "active pages
stored in the filesystem" styles, fairly often you need to go back and
forth from URI to pathnames (and often, *namestrings*). After looking
at the complexity of URI parsing in Araneida, I just punted it completely,
since Apache already did most of the parsing I needed (except for turning
GET/POST query strings into binding alists, which I had already solved
for CGI scripting). If you're using "handled" files, you already get
$REDIRECT_URL with all the nasty "/./" and "/../" removed; $PATH_INFO,
which has the query string stripped; $PATH_TRANSLATED, which has the
URI translated to a filesystem path (albeit possibly imaginary!) with
"/./" & "/../" stripped; and $QUERY_STRING. [And several other useful
ones.] So as it happened, except for the QUERY-STRING-ALIST function,
I didn't actually need to do any URI parsing at all!
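
For readers who haven't written one, a query-string parser of this sort
can be quite small. The following is a guess at the general shape, not
the actual QUERY-STRING-ALIST: the helper names are hypothetical, and it
assumes "&"-separated pairs with %XX escapes decoded one byte per
character (no UTF-8 handling).

```lisp
;; Hypothetical sketch of a query-string-to-alist parser.
(defun url-decode (string)
  "Decode + as space and %XX hex escapes; malformed escapes stay literal."
  (with-output-to-string (out)
    (let ((i 0)
          (n (length string)))
      (loop while (< i n)
            do (let ((ch (char string i)))
                 (cond ((char= ch #\+)
                        (write-char #\Space out)
                        (incf i))
                       ((and (char= ch #\%)
                             (<= (+ i 3) n)
                             (digit-char-p (char string (+ i 1)) 16)
                             (digit-char-p (char string (+ i 2)) 16))
                        (write-char (code-char
                                     (parse-integer string
                                                    :start (+ i 1)
                                                    :end (+ i 3)
                                                    :radix 16))
                                    out)
                        (incf i 3))
                       (t
                        (write-char ch out)
                        (incf i))))))))

(defun split-on (char string)
  "Split STRING at each CHAR into a list of (possibly empty) substrings."
  (loop with start = 0
        for pos = (position char string :start start)
        collect (subseq string start pos)
        while pos
        do (setf start (1+ pos))))

(defun query-string-alist (query)
  "Turn \"a=1&b=2\" into ((\"a\" . \"1\") (\"b\" . \"2\")).
A name with no = gets the empty string as its value."
  (loop for pair in (split-on #\& query)
        for eq-pos = (position #\= pair)
        unless (string= pair "")
          collect (cons (url-decode (subseq pair 0 eq-pos))
                        (if eq-pos
                            (url-decode (subseq pair (1+ eq-pos)))
                            ""))))
```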

[I had even gone to the trouble of breaking the $REDIRECT_URL up into
path elements, Scheme Underground Server style, e.g., "/foo/bar/baz" ==>
("foo" "bar" "baz"), but then ended up never using it!]
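
[That splitting is only a few lines anyway. A sketch, with the
hypothetical name PATH-ELEMENTS, which assumes empty elements from
doubled or trailing slashes should simply be dropped:

```lisp
;; Hypothetical sketch: "/foo/bar/baz" ==> ("foo" "bar" "baz")
(defun path-elements (path)
  "Split PATH at slashes, discarding empty elements."
  (loop with start = 0
        for pos = (position #\/ path :start start)
        for piece = (subseq path start pos)
        unless (string= piece "") collect piece
        while pos
        do (setf start (1+ pos))))
```

So it costs little to leave it out until something actually needs it.]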

Maybe a more complex application or site structure would make more complex
URI parsing useful, I dunno. But I suggest that if there *is* a URI parser
in the infrastructure, the default parsing be *very* simple, and that it be
very easy to get the string version back when needed (e.g., for filesystem
path operations).

+---------------
| ...and pretty silly for each implementation to define a private URI
| object but not let the client applications use it.
+---------------

That I can certainly agree with.

+---------------
| Anyone?  If there's interest from other server shim implementors and
| users, we could move the discussion to the lispweb list to catch the
| people who don't read c.l.l 
+---------------

"Lispweb"?  Whazzat?


-Rob

-----
Rob Warnock, PP-ASEL-IA		<rpw3@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607