Subject: Re: Regexp notation
From: rpw3@rigden.engr.sgi.com (Rob Warnock)
Date: 1997/01/10
Newsgroups: comp.lang.scheme.scsh,comp.lang.scheme
Message-ID: <5b4ja6$bci@tokyo.engr.sgi.com>

Olin Shivers <shivers@ai.mit.edu> wrote:
+---------------
| ...regular languages are closed under intersection and set-complement.
| But when was the last time you thought to yourself "I'd like to match
| all the strings that *aren't* matched by regexp R"...
+---------------

At the Unix shell level, I do "grep -v {pattern}" all the time!
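
For anyone without a shell handy, here is a minimal Python sketch of that
line-level complement. The name grep_v is made up for illustration; it only
mimics the filtering behavior, not grep's option handling.

```python
import re

def grep_v(pattern, lines):
    """Return the lines that contain no match for pattern (like grep -v)."""
    rx = re.compile(pattern)
    # Keep a line only when the pattern matches nowhere in it.
    return [line for line in lines if rx.search(line) is None]
```

Note that this complements "the line contains a match", which is what one
usually wants at the shell level.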

+---------------
| or "I'd like to match all the strings that match pattern R1 *and*
| pattern R2"?
+---------------

Quite a lot, actually. I now use "agrep" by default instead of "grep"
precisely because it has conjunction (";") and disjunction (","), and
because "(AND pat1 pat2)" -- or in agrep, "pat1;pat2" -- matches in any order.
Conjunction is very useful for refining a search. For example, here I grep
the index of RFCs ("rfc" is just a shell-script wrapper around "agrep" that
adds the option that makes "lines" be everything between occurrences of
blank lines, i.e., paragraphs):

% rfc 'protocol;multicast'
grepping...

1458  I   R. Braudes, S. Zabele, "Requirements for Multicast Protocols", 
           05/26/1993. (Pages=19) (Format=.txt) 

1301  I   S. Armstrong, A. Freier, K. Marzullo, "Multicast Transport  
           Protocol", 02/19/1992. (Pages=38) (Format=.txt) 

1075  E   S. Deering, C. Partridge, D. Waitzman, "Distance Vector Multicast  
           Routing Protocol", 11/01/1988. (Pages=24) (Format=.txt) 

0966       S. Deering, D. Cheriton, "Host groups: A multicast extension to  
           the Internet Protocol", 12/01/1985. (Pages=27) (Format=.txt) 
           (Obsoleted by RFC0988) 
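
The same kind of search can be roughed out in Python. This is only a sketch
of the semantics (paragraph records, every pattern must match somewhere, in
any order), not of how agrep works internally. The names paragraphs and
match_all are hypothetical, and the case-insensitive default is an
assumption made so that 'protocol' finds "Protocol".

```python
import re

def paragraphs(text):
    """Split text into records separated by one or more blank lines."""
    return [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]

def match_all(patterns, text, flags=re.IGNORECASE):
    """Return the paragraphs in which every pattern matches, in any order."""
    compiled = [re.compile(p, flags) for p in patterns]
    # A paragraph is kept only if each pattern is found somewhere in it;
    # the patterns may match across line breaks within the paragraph.
    return [p for p in paragraphs(text)
            if all(rx.search(p) for rx in compiled)]
```

Called as match_all(["protocol", "multicast"], index_text), it keeps only
the entries that mention both words, whichever comes first.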

+---------------
| A further, serious difficulty is the issue of how you would translate
| such a spec into a classic regexp string to interface to traditional
| matching engines.
+---------------

Well, you might have to use extended matching engines, not just the
"libc" version...  ;-}  ;-}
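
As one illustration of "extended": engines with Perl-style lookahead (Python's
"re" module is used here) can express AND and NOT inside a single pattern
string. This is a sketch under assumptions, not a general translation scheme:
the helper names and_regex and not_regex are made up, the result must be
applied with match() at the start of the string, and sub-patterns that carry
their own inline flags or backreferences may not survive being wrapped.

```python
import re

def and_regex(*patterns):
    """Build one regexp that succeeds when every pattern occurs, in any order."""
    # Each lookahead scans ahead from position 0 without consuming input,
    # so the sub-patterns are checked independently of one another.
    return re.compile("".join(f"(?=.*(?:{p}))" for p in patterns), re.DOTALL)

def not_regex(pattern):
    """Build a regexp that succeeds when pattern occurs nowhere in the string."""
    return re.compile(f"(?!.*(?:{pattern}))", re.DOTALL)

both = and_regex("protocol", "multicast")
print(bool(both.match("multicast transport protocol")))
print(bool(both.match("protocol only")))
```

A classic matcher without lookahead has no such shortcut, which is the
difficulty the quoted paragraph is pointing at.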

Note: The "agrep" matcher limits the complexity of regexps that can
be used if you're using either "fat lines" or {con,dis}junction. Is
that a theoretical problem or just where the author decided to stop?
(I don't know.)


-Rob

-----
Rob Warnock, 7L-551		rpw3@sgi.com
Silicon Graphics, Inc.		http://reality.sgi.com/rpw3/
2011 N. Shoreline Blvd.		Phone: 415-933-1673  FAX: 415-933-0979
Mountain View, CA  94043	PP-ASEL-IA