--- /dev/null
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
+
+<!--Converted with LaTeX2HTML 99.2beta8 (1.43)
+original version by: Nikos Drakos, CBLU, University of Leeds
+* revised and updated by: Marcus Hennecke, Ross Moore, Herb Swan
+* with significant contributions from:
+ Jens Lippmann, Marek Rouchal, Martin Wilck and others -->
+<HTML>
+<HEAD>
+<TITLE>JFlex User's Manual</TITLE>
+<META NAME="description" CONTENT="JFlex User's Manual">
+<META NAME="keywords" CONTENT="manual">
+<META NAME="resource-type" CONTENT="document">
+<META NAME="distribution" CONTENT="global">
+
+<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
+<META NAME="Generator" CONTENT="LaTeX2HTML v99.2beta8">
+<META HTTP-EQUIV="Content-Style-Type" CONTENT="text/css">
+
+<LINK REL="STYLESHEET" HREF="manual.css">
+
+</HEAD>
+
+<BODY >
+
+<P>
+
+<CENTER>
+<A NAME="TOP"></a>
+<A HREF="http://www.jflex.de"><IMG SRC="logo.gif" BORDER=0 HEIGHT=223 WIDTH=577></a>
+</CENTER>
+
+<P>
+<DIV ALIGN="CENTER">
+<I><FONT SIZE="+2">The Fast Lexical Analyser Generator</FONT>
+<BR></I></DIV>
+<P></P>
+<DIV ALIGN="CENTER"></DIV>
+<P></P>
+<DIV ALIGN="CENTER"><I>Copyright ©1998-2004 by <A NAME="tex2html1"
+ HREF="http://www.doclsf.de">Gerwin Klein</A>
+<BR></I></DIV>
+<P><P><BR>
+<DIV ALIGN="CENTER"><I><FONT SIZE="+4"><I><B>JFlex User's Manual</B></I></FONT>
+<BR></I></DIV>
+<P><P><BR>
+<DIV ALIGN="CENTER"><I>Version 1.4, April 12, 2004
+
+</I></DIV>
+
+<P>
+<BR>
+
+<H2><A NAME="SECTION00010000000000000000">
+Contents</A>
+</H2>
+<!--Table of Contents-->
+
+<UL>
+<LI><A NAME="tex2html80"
+ HREF="manual.html">Contents</A>
+<LI><A NAME="tex2html81"
+ HREF="manual.html#SECTION00020000000000000000">Introduction</A>
+<UL>
+<LI><A NAME="tex2html82"
+ HREF="manual.html#SECTION00021000000000000000">Design goals</A>
+<LI><A NAME="tex2html83"
+ HREF="manual.html#SECTION00022000000000000000">About this manual</A>
+</UL>
+<LI><A NAME="tex2html84"
+ HREF="manual.html#SECTION00030000000000000000">Installing and Running JFlex</A>
+<UL>
+<LI><A NAME="tex2html85"
+ HREF="manual.html#SECTION00031000000000000000">Installing JFlex</A>
+<LI><A NAME="tex2html86"
+ HREF="manual.html#SECTION00032000000000000000">Running JFlex</A>
+</UL>
+<LI><A NAME="tex2html87"
+ HREF="manual.html#SECTION00040000000000000000">A simple Example: How to work with JFlex</A>
+<UL>
+<LI><A NAME="tex2html88"
+ HREF="manual.html#SECTION00041000000000000000">Code to include</A>
+<LI><A NAME="tex2html89"
+ HREF="manual.html#SECTION00042000000000000000">Options and Macros</A>
+<LI><A NAME="tex2html90"
+ HREF="manual.html#SECTION00043000000000000000">Rules and Actions</A>
+<LI><A NAME="tex2html91"
+ HREF="manual.html#SECTION00044000000000000000">How to get it going</A>
+</UL>
+<LI><A NAME="tex2html92"
+ HREF="manual.html#SECTION00050000000000000000">Lexical Specifications</A>
+<UL>
+<LI><A NAME="tex2html93"
+ HREF="manual.html#SECTION00051000000000000000">User code</A>
+<LI><A NAME="tex2html94"
+ HREF="manual.html#SECTION00052000000000000000">Options and declarations</A>
+<LI><A NAME="tex2html95"
+ HREF="manual.html#SECTION00053000000000000000">Lexical rules</A>
+</UL>
+<LI><A NAME="tex2html96"
+ HREF="manual.html#SECTION00060000000000000000">Encodings, Platforms, and Unicode</A>
+<UL>
+<LI><A NAME="tex2html97"
+ HREF="manual.html#SECTION00061000000000000000">The Problem</A>
+<LI><A NAME="tex2html98"
+ HREF="manual.html#SECTION00062000000000000000">Scanning text files</A>
+<LI><A NAME="tex2html99"
+ HREF="manual.html#SECTION00063000000000000000">Scanning binaries</A>
+</UL>
+<LI><A NAME="tex2html100"
+ HREF="manual.html#SECTION00070000000000000000">A few words on performance</A>
+<UL>
+<LI><A NAME="tex2html101"
+ HREF="manual.html#SECTION00071000000000000000">Comparison of JLex and JFlex</A>
+<LI><A NAME="tex2html102"
+ HREF="manual.html#SECTION00072000000000000000">How to write a faster specification</A>
+</UL>
+<LI><A NAME="tex2html103"
+ HREF="manual.html#SECTION00080000000000000000">Porting Issues</A>
+<UL>
+<LI><A NAME="tex2html104"
+ HREF="manual.html#SECTION00081000000000000000">Porting from JLex</A>
+<LI><A NAME="tex2html105"
+ HREF="manual.html#SECTION00082000000000000000">Porting from lex/flex</A>
+</UL>
+<LI><A NAME="tex2html106"
+ HREF="manual.html#SECTION00090000000000000000">Working together</A>
+<UL>
+<LI><A NAME="tex2html107"
+ HREF="manual.html#SECTION00091000000000000000">JFlex and CUP</A>
+<LI><A NAME="tex2html108"
+ HREF="manual.html#SECTION00092000000000000000">JFlex and BYacc/J</A>
+</UL>
+<LI><A NAME="tex2html109"
+ HREF="manual.html#SECTION000100000000000000000">Bugs and Deficiencies</A>
+<UL>
+<LI><A NAME="tex2html110"
+ HREF="manual.html#SECTION000101000000000000000">Deficiencies</A>
+<LI><A NAME="tex2html111"
+ HREF="manual.html#SECTION000102000000000000000">Bugs</A>
+</UL>
+<LI><A NAME="tex2html112"
+ HREF="manual.html#SECTION000110000000000000000">Copying and License</A>
+<LI><A NAME="tex2html113"
+ HREF="manual.html#SECTION000120000000000000000">Bibliography</A>
+</UL>
+<!--End of Table of Contents-->
+
+<H1><A NAME="SECTION00020000000000000000"></A><A NAME="Intro"></A><BR>
+Introduction
+</H1>
+JFlex is a lexical analyzer generator for Java<A NAME="tex2html2"
+ HREF="#foot32"><SUP><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="footnote.png"></SUP></A> written in Java. It is also a rewrite of the very useful tool JLex [<A
+ HREF="manual.html#JLex">3</A>], which
+was developed by Elliot Berk at Princeton University. As Vern Paxson states
+for his C/C++ tool flex [<A
+ HREF="manual.html#flex">11</A>]: they do not share any code, though.
+
+<P>
+
+<H2><A NAME="SECTION00021000000000000000">
+Design goals</A>
+</H2>
+The main design goals of JFlex are:
+
+<UL>
+<LI><B>Full Unicode support</B>
+</LI>
+<LI><B>Fast generated scanners </B>
+</LI>
+<LI><B>Fast scanner generation</B>
+</LI>
+<LI><B>Convenient specification syntax</B>
+</LI>
+<LI><B>Platform independence</B>
+</LI>
+<LI><B>JLex compatibility</B>
+</LI>
+</UL>
+
+<P>
+
+<H2><A NAME="SECTION00022000000000000000">
+About this manual</A>
+</H2>
+This manual gives a brief but complete description of the tool JFlex. It
+assumes that you are familiar with the issue of lexical analysis. The references [<A
+ HREF="manual.html#Aho">1</A>],
+[<A
+ HREF="manual.html#Appel">2</A>], and [<A
+ HREF="manual.html#Maurer">13</A>] provide a good introduction to this topic.
+
+<P>
+The next section of this manual describes <A HREF="manual.html#Installing"><I>installation procedures</I></A>
+for JFlex. If you have never worked with JLex or
+just want to compare a JLex and a JFlex scanner specification you
+should also read <A HREF="manual.html#Example"><I>Working with JFlex - an example</I></A>
+(section <A HREF="manual.html#Example"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A>). All options and the complete
+specification syntax are presented in
+<A HREF="manual.html#Specifications"><I>Lexical specifications</I></A> (section <A HREF="manual.html#Specifications"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A>);
+<A HREF="manual.html#sec:encodings"><I>Encodings, Platforms, and Unicode</I></A> (section <A HREF="manual.html#sec:encodings"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A>)
+provides information about scanning text vs. binary files.
+If you are interested in performance
+considerations and comparing JLex with JFlex speed,
+<A HREF="manual.html#performance"><I>a few words on performance</I></A> (section <A HREF="manual.html#performance"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A>)
+might be just right for you. Those who want to
+use their old JLex specifications may want to check out section <A HREF="manual.html#Porting"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A>
+<A HREF="manual.html#Porting"><I>Porting from JLex</I></A> to avoid possible problems
+with non-portable or non-standard JLex behavior that has been fixed in
+JFlex. Section <A HREF="manual.html#lexport"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A> talks about porting scanners from the
+Unix tools lex and flex. Interfacing JFlex scanners with the LALR
+parser generators CUP and BYacc/J is explained in <A HREF="manual.html#WorkingTog"><I>working
+ together</I></A> (section <A HREF="manual.html#WorkingTog"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A>). Section <A HREF="manual.html#Bugs"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A>
+<A HREF="manual.html#Bugs"><I>Bugs</I></A> gives a list of currently known active bugs.
+The manual concludes with notes about
+<A HREF="manual.html#Copyright"><I>Copying and License</I></A> (section <A HREF="manual.html#Copyright"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A>) and
+<A HREF="manual.html#References">references</A>.
+
+<P>
+
+<H1><A NAME="SECTION00030000000000000000"></A><A NAME="Installing"></A><BR>
+Installing and Running JFlex
+</H1>
+
+<P>
+
+<H2><A NAME="SECTION00031000000000000000">
+Installing JFlex</A>
+</H2>
+
+<P>
+
+<H3><A NAME="SECTION00031100000000000000"></A><A NAME="install:windows"></A><BR>
+Windows
+</H3>
+To install JFlex on Windows 95/98/NT/XP, follow these three steps:
+
+<OL>
+<LI>Unzip the file you downloaded into the directory you want JFlex in (using
+something like
+<A NAME="tex2html3"
+ HREF="http://www.winzip.com">WinZip</A>).
+If you unzipped it to, say, <code>C:\</code>, the following directory structure
+should be generated:
+
+<PRE>
+C:\JFlex\
+     +--bin\                   (start scripts)
+     +--doc\                   (FAQ and manual)
+     +--examples\
+          +--binary\           (scanning binary files)
+          +--byaccj\           (calculator example for BYacc/J)
+          +--cup\              (calculator example for cup)
+          +--interpreter\      (interpreter example for cup)
+          +--java\             (Java lexer specification)
+          +--simple\           (example scanner)
+          +--standalone\       (a simple standalone scanner)
+     +--lib\                   (the precompiled classes)
+     +--src\
+          +--JFlex\            (source code of JFlex)
+          +--JFlex\gui         (source code of JFlex UI classes)
+          +--java_cup\runtime\ (source code of cup runtime classes)
+</PRE>
+
+<P>
+</LI>
+<LI>Edit the file <B><code>bin\jflex.bat</code></B>
+(in the example it's <code>C:\JFlex\bin\jflex.bat</code>)
+such that
+
+<P>
+
+<UL>
+<LI><B><TT>JAVA_HOME</TT></B> contains the directory where your Java JDK is installed
+ (for instance <code>C:\java</code>) and
+</LI>
+<LI><B><TT>JFLEX_HOME</TT></B> the directory that contains JFlex (in the example:
+ <code>C:\JFlex</code>)
+</LI>
+</UL>
+
+<P>
+</LI>
+<LI>Include the <code>bin\</code> directory of JFlex (the one that contains the start
+script, in the example <code>C:\JFlex\bin</code>) in your path.
+</LI>
+</OL>
+
+<P>
+
+<H3><A NAME="SECTION00031200000000000000">
+Unix with tar archive</A>
+</H3>
+
+<P>
+To install JFlex on a Unix system, follow these two steps:
+
+<UL>
+<LI>Uncompress the archive into a directory of your choice
+ with GNU tar, for instance to <TT>/usr/share</TT>:
+
+<P>
+<TT>tar -C /usr/share -xvzf jflex-1.4.tar.gz</TT>
+
+<P>
+(The example is for a site-wide installation, which requires root
+ privileges. A user installation works exactly the same way; just
+ choose a directory where you have write permission.)
+
+<P>
+</LI>
+<LI>Make a symbolic link from somewhere in your binary
+ path to <TT>bin/jflex</TT>, for instance:
+
+<P>
+<TT>ln -s /usr/share/JFlex/bin/jflex /usr/bin/jflex</TT>
+
+<P>
+If the Java interpreter is not in your binary path, you
+ need to supply its location in the script <TT>bin/jflex</TT>.
+</LI>
+</UL>
+
+<P>
+You can verify the integrity of the downloaded file with
+the MD5 checksum available on the <A NAME="tex2html4"
+ HREF="http://www.jflex.de/download.html">JFlex download page</A>.
+If you put the checksum file in the same directory
+as the archive, run:
+
+<P>
+<code>md5sum --check </code><TT>jflex-1.4.tar.gz.md5</TT>
+
+<P>
+It should tell you
+
+<P>
+<TT>jflex-1.4.tar.gz: OK</TT>
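+
+<P>
+Where <TT>md5sum</TT> is not available (for instance on Windows), the same
+check can be sketched in Java (7 or later) with the standard <TT>MessageDigest</TT>
+class. This is an illustration, not a tool shipped with JFlex: the class name
+<TT>Md5Check</TT> is hypothetical, and it assumes that the <TT>.md5</TT> file
+starts with the hexadecimal digest, as the output of <TT>md5sum</TT> does.
+<PRE>
+import java.io.IOException;
+import java.nio.charset.StandardCharsets;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.security.MessageDigest;
+import java.security.NoSuchAlgorithmException;
+
+/** Hypothetical helper: recomputes an MD5 digest and compares it with a .md5 file. */
+class Md5Check {
+
+    /** Returns the lowercase hexadecimal MD5 digest of the given bytes. */
+    static String hex(byte[] data) throws NoSuchAlgorithmException {
+        byte[] digest = MessageDigest.getInstance("MD5").digest(data);
+        StringBuilder sb = new StringBuilder();
+        for (byte b : digest) {
+            sb.append(String.format("%02x", b & 0xff));
+        }
+        return sb.toString();
+    }
+
+    /** True if the first word of checksumFile equals the MD5 digest of archive. */
+    static boolean verify(Path archive, Path checksumFile)
+            throws IOException, NoSuchAlgorithmException {
+        String line = new String(Files.readAllBytes(checksumFile), StandardCharsets.US_ASCII);
+        String expected = line.trim().split("\\s+")[0];
+        return expected.equalsIgnoreCase(hex(Files.readAllBytes(archive)));
+    }
+
+    public static void main(String[] args) throws Exception {
+        Path archive = Paths.get("jflex-1.4.tar.gz");
+        Path md5 = Paths.get("jflex-1.4.tar.gz.md5");
+        System.out.println(archive + ": " + (verify(archive, md5) ? "OK" : "FAILED"));
+    }
+}
+</PRE>
+Reading the whole archive into memory keeps the sketch short; for large files a
+<TT>java.security.DigestInputStream</TT> would be the more economical choice.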
+
+<P>
+
+<H3><A NAME="SECTION00031300000000000000">
+Linux with RPM</A>
+</H3>
+
+<P>
+
+<UL>
+<LI>become root
+</LI>
+<LI>issue
+<BR> <TT>rpm -U jflex-1.4-0.rpm</TT>
+</LI>
+</UL>
+
+<P>
+You can verify the integrity of the downloaded <TT>rpm</TT> file with
+
+<P>
+<code>rpm --checksig </code><TT>jflex-1.4-0.rpm</TT>
+
+<P>
+This requires my PGP public key. If you don't have it, you can use
+
+<P>
+<code>rpm --checksig --nopgp </code><TT>jflex-1.4-0.rpm</TT>
+
+<P>
+or you can get it from <A NAME="tex2html5"
+ HREF="http://www.jflex.de/public-key.asc"><TT>http://www.jflex.de/public-key.asc</TT></A>.
+
+<P>
+
+<H2><A NAME="SECTION00032000000000000000">
+Running JFlex</A>
+</H2>
+You run JFlex with:
+
+<P>
+<TT>jflex <options> <inputfiles></TT>
+
+<P>
+It is also possible to skip the start script in <code>bin\</code>
+and include the file <code>lib\JFlex.jar</code>
+in your <TT>CLASSPATH</TT> environment variable instead.
+
+<P>
+Then you run JFlex with:
+
+<P>
+<TT>java JFlex.Main <options> <inputfiles></TT>
+
+<P>
+The input files and options are in both cases optional. If you don't provide a file name on
+the command line, JFlex will pop up a window to ask you for one.
+
+<P>
+JFlex knows about the following options:
+
+<P>
+<DL>
+<DT></DT>
+<DD><code>-d <directory></code>
+<BR> writes the generated file to the directory <code><directory></code>
+
+<P>
+</DD>
+<DT></DT>
+<DD><code>--skel <file></code>
+<BR> uses external skeleton <code><file></code>. This is mainly for JFlex
+ maintenance and special low level customizations. Use only when you
+ know what you are doing! JFlex comes with a skeleton file in the
+ <TT>src</TT> directory that reflects exactly the internal, precompiled
+ skeleton and can be used with the <TT>--skel</TT> option.
+
+<P>
+</DD>
+<DT></DT>
+<DD><code>--nomin</code>
+<BR> skip the DFA minimization step during scanner generation.
+
+<P>
+</DD>
+<DT></DT>
+<DD><code>--jlex</code>
+<BR> tries even harder to comply with the JLex interpretation of specifications.
+
+<P>
+</DD>
+<DT></DT>
+<DD><code>--dot</code>
+<BR> generate Graphviz dot files for the NFA, DFA and minimized
+ DFA. This feature is still in alpha status, and not
+ fully implemented yet.
+
+<P>
+</DD>
+<DT></DT>
+<DD><code>--dump</code>
+<BR> display transition tables of NFA, initial DFA, and minimized DFA
+
+<P>
+</DD>
+<DT></DT>
+<DD><code>--verbose</code> or <TT>-v</TT>
+<BR> display generation progress messages (enabled by default)
+
+<P>
+</DD>
+<DT></DT>
+<DD><code>--quiet</code> or <TT>-q</TT>
+<BR> display error messages only (no chatter about what JFlex is
+ currently doing)
+
+<P>
+</DD>
+<DT></DT>
+<DD><code>--time</code>
+<BR> display time statistics about the code generation process
+ (not very accurate)
+
+<P>
+</DD>
+<DT></DT>
+<DD><code>--version</code>
+<BR> print version number
+
+<P>
+</DD>
+<DT></DT>
+<DD><code>--info</code>
+<BR> print system and JDK information (useful if you'd like
+ to report a problem)
+
+<P>
+</DD>
+<DT></DT>
+<DD><code>--pack</code>
+<BR> use the %pack code generation method by default
+
+<P>
+</DD>
+<DT></DT>
+<DD><code>--table</code>
+<BR> use the %table code generation method by default
+
+<P>
+</DD>
+<DT></DT>
+<DD><code>--switch</code>
+<BR> use the %switch code generation method by default
+
+<P>
+</DD>
+<DT></DT>
+<DD><code>--help</code> or <TT>-h</TT>
+<BR> print a help message explaining options and usage of JFlex.
+</DD>
+</DL>
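+
+<P>
+When JFlex is started from a build script or another Java program rather than
+from the shell, the command line can be assembled with the standard
+<TT>ProcessBuilder</TT> class (Java 8 or later). The sketch below assumes
+<TT>lib/JFlex.jar</TT> is reachable at the given path and uses only the
+<TT>-d</TT> and <TT>-q</TT> options listed above; the class <TT>JFlexCommand</TT>
+is hypothetical.
+<PRE>
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+
+/** Hypothetical helper that assembles a "java JFlex.Main ..." command line. */
+class JFlexCommand {
+
+    static List<String> build(String jarPath, String outDir, boolean quiet, String... specs) {
+        List<String> cmd = new ArrayList<>();
+        cmd.add("java");
+        cmd.add("-classpath");
+        cmd.add(jarPath);          // e.g. lib/JFlex.jar instead of a CLASSPATH entry
+        cmd.add("JFlex.Main");     // main class, as in "java JFlex.Main"
+        if (outDir != null) {
+            cmd.add("-d");         // write the generated file to outDir
+            cmd.add(outDir);
+        }
+        if (quiet) {
+            cmd.add("-q");         // display error messages only
+        }
+        cmd.addAll(Arrays.asList(specs));
+        return cmd;
+    }
+
+    public static void main(String[] args) throws Exception {
+        List<String> cmd = build("lib/JFlex.jar", "gen", true, "java-lang.flex");
+        System.out.println(String.join(" ", cmd));
+        // Run JFlex with this process's console and wait for it to finish.
+        Process p = new ProcessBuilder(cmd).inheritIO().start();
+        System.out.println("exit code: " + p.waitFor());
+    }
+}
+</PRE>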
+
+<P>
+
+<H1><A NAME="SECTION00040000000000000000"></A><A NAME="Example"></A><BR>
+A simple Example: How to work with JFlex
+</H1>
+To demonstrate what a lexical specification with JFlex looks like, this
+section presents a part of the specification for the Java language.
+The example does not describe the whole lexical structure of Java programs,
+but only a small and simplified part of it (some keywords, some operators,
+comments and only two kinds of literals). It also shows how to interface
+with the LALR parser generator CUP [<A
+ HREF="manual.html#CUP">8</A>] and therefore
+uses a class <TT>sym</TT> (generated by CUP), where integer constants for
+the terminal tokens of the CUP grammar are declared. JFlex comes with a
+directory <TT>examples</TT>, where you can find a small standalone scanner
+that doesn't need other tools like CUP to give you a running example.
+The "<TT>examples</TT>" directory also contains a <EM>complete</EM> JFlex
+specification of the lexical structure of Java programs together with the
+CUP parser specification for Java by
+<A NAME="tex2html6"
+ HREF="mailto:cananian@alumni.princeton.edu">C. Scott Ananian</A>, obtained
+from the CUP [<A
+ HREF="manual.html#CUP">8</A>] website (it was modified to interface with the JFlex scanner).
+Both specifications adhere to the Java Language Specification [<A
+ HREF="manual.html#LangSpec">7</A>].
+
+<P>
+<FONT SIZE="-1"><A NAME="CodeTop"></A></FONT><PRE>
+/* JFlex example: part of Java language lexer specification */
+import java_cup.runtime.*;
+
+/**
+ * This class is a simple example lexer.
+ */
+%%
+</PRE><FONT SIZE="-1">
+<A NAME="CodeOptions"></A></FONT><PRE>
+%class Lexer
+%unicode
+%cup
+%line
+%column
+</PRE><FONT SIZE="-1">
+<A NAME="CodeScannerCode"></A></FONT><PRE>
+%{
+ StringBuffer string = new StringBuffer();
+
+ private Symbol symbol(int type) {
+ return new Symbol(type, yyline, yycolumn);
+ }
+ private Symbol symbol(int type, Object value) {
+ return new Symbol(type, yyline, yycolumn, value);
+ }
+%}
+</PRE><FONT SIZE="-1">
+<A NAME="CodeMacros"></A></FONT><PRE>
+LineTerminator = \r|\n|\r\n
+InputCharacter = [^\r\n]
+WhiteSpace = {LineTerminator} | [ \t\f]
+
+/* comments */
+Comment = {TraditionalComment} | {EndOfLineComment} | {DocumentationComment}
+
+TraditionalComment = "/*" [^*] ~"*/" | "/*" "*"+ "/"
+EndOfLineComment = "//" {InputCharacter}* {LineTerminator}
+DocumentationComment = "/**" {CommentContent} "*"+ "/"
+CommentContent = ( [^*] | \*+ [^/*] )*
+
+Identifier = [:jletter:] [:jletterdigit:]*
+
+DecIntegerLiteral = 0 | [1-9][0-9]*
+</PRE><FONT SIZE="-1">
+<A NAME="CodeStateDecl"></A></FONT><PRE>
+%state STRING
+
+%%
+</PRE><FONT SIZE="-1">
+<A NAME="CodeRulesYYINITIAL"></A></FONT><PRE>
+/* keywords */
+<YYINITIAL> "abstract" { return symbol(sym.ABSTRACT); }
+<YYINITIAL> "boolean" { return symbol(sym.BOOLEAN); }
+<YYINITIAL> "break" { return symbol(sym.BREAK); }
+</PRE><FONT SIZE="-1">
+<A NAME="CodeRulesBunch"></A></FONT><PRE>
+<YYINITIAL> {
+ /* identifiers */
+ {Identifier} { return symbol(sym.IDENTIFIER); }
+
+ /* literals */
+ {DecIntegerLiteral} { return symbol(sym.INTEGER_LITERAL); }
+ \" { string.setLength(0); yybegin(STRING); }
+
+ /* operators */
+ "=" { return symbol(sym.EQ); }
+ "==" { return symbol(sym.EQEQ); }
+ "+" { return symbol(sym.PLUS); }
+
+ /* comments */
+ {Comment} { /* ignore */ }
+
+ /* whitespace */
+ {WhiteSpace} { /* ignore */ }
+}
+</PRE><FONT SIZE="-1">
+<A NAME="CodeRulesYYtext"></A></FONT><PRE>
+<STRING> {
+ \" { yybegin(YYINITIAL);
+ return symbol(sym.STRING_LITERAL,
+ string.toString()); }
+ [^\n\r\"\\]+ { string.append( yytext() ); }
+ \\t { string.append('\t'); }
+ \\n { string.append('\n'); }
+
+ \\r { string.append('\r'); }
+ \\\" { string.append('\"'); }
+ \\\\ { string.append('\\'); }
+}
+</PRE><FONT SIZE="-1">
+<A NAME="CodeRulesAllStates"></A></FONT><PRE>
+/* error fallback */
+.|\n { throw new Error("Illegal character <"+
+ yytext()+">"); }
+</PRE>
+
+<P>
+From this specification JFlex generates a <TT>.java</TT> file with one
+class that contains code for the scanner. The class will have a
+constructor taking a <TT>java.io.Reader</TT> from which the input is
+read. The class will also have a function <TT>yylex()</TT> that runs the
+scanner and that can be used to get the next token from the input (in this
+example the function actually has the name <TT>next_token()</TT> because
+the specification uses the <TT>%cup</TT> switch).
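+
+<P>
+To make this interface concrete without running JFlex, the following hand-written
+stand-in has the same outward shape: a constructor taking a <TT>java.io.Reader</TT>
+and a <TT>yylex()</TT> method that returns one token code per call. It is not
+generated code; the class <TT>TinyLexer</TT>, its token codes and the end-of-input
+value <TT>YYEOF</TT> are assumptions for illustration, and a generated scanner runs
+a DFA instead of this hand-coded loop. Whitespace shows an action that returns
+nothing, after which scanning simply continues.
+<PRE>
+import java.io.IOException;
+import java.io.Reader;
+import java.io.StringReader;
+
+/** Hand-written stand-in with the outward shape of a generated scanner. */
+class TinyLexer {
+    static final int YYEOF = -1;      // returned at the end of input
+    static final int NUMBER = 1;      // hypothetical token codes
+    static final int PLUS = 2;
+
+    private final Reader in;
+    private int lookahead = -2;       // -2 means "no character buffered"
+    private final StringBuilder text = new StringBuilder();
+
+    TinyLexer(Reader in) {
+        this.in = in;
+    }
+
+    private int peek() throws IOException {
+        if (lookahead == -2) {
+            lookahead = in.read();
+        }
+        return lookahead;
+    }
+
+    private int next() throws IOException {
+        int c = peek();
+        lookahead = -2;
+        return c;
+    }
+
+    /** Text of the last matched token, like yytext(). */
+    String yytext() {
+        return text.toString();
+    }
+
+    /** Returns the next token code, or YYEOF at the end of input. */
+    int yylex() throws IOException {
+        while (true) {
+            text.setLength(0);
+            int c = next();
+            if (c == -1) return YYEOF;
+            text.append((char) c);
+            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
+                continue;             // action returns nothing: keep scanning
+            }
+            if (c == '+') return PLUS;
+            if (Character.isDigit(c)) {
+                while (peek() != -1 && Character.isDigit(peek())) {
+                    text.append((char) next());
+                }
+                return NUMBER;
+            }
+            throw new Error("Illegal character <" + yytext() + ">");
+        }
+    }
+
+    public static void main(String[] args) throws IOException {
+        TinyLexer lexer = new TinyLexer(new StringReader("12 + 3"));
+        for (int t = lexer.yylex(); t != YYEOF; t = lexer.yylex()) {
+            System.out.println(t + " " + lexer.yytext());
+        }
+    }
+}
+</PRE>
+A scanner generated with <TT>%cup</TT> is driven the same way, except that the
+method is called <TT>next_token()</TT> and returns <TT>Symbol</TT> objects.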
+
+<P>
+As with JLex, the specification consists of three parts, divided by <TT>%%</TT>:
+
+<UL>
+<LI><A HREF="manual.html#ExampleUserCode">user code</A>,
+</LI>
+<LI><A HREF="manual.html#ExampleOptions">options and declarations</A> and
+</LI>
+<LI><A HREF="manual.html#ExampleLexRules">lexical rules</A>.
+</LI>
+</UL>
+
+<P>
+
+<H2><A NAME="SECTION00041000000000000000"></A><A NAME="ExampleUserCode"></A><BR>
+Code to include
+</H2>
+Let's take a look at the first section, "user code": the text up to the
+first line starting with <TT>%%</TT> is copied verbatim to the top
+of the generated lexer class (before the actual class declaration).
+Besides <TT>package</TT> and <TT>import</TT> statements there is usually not much
+to do here. If the code ends with a javadoc class comment, the generated class
+will get this comment; if not, JFlex will generate one automatically.
+
+<P>
+
+<H2><A NAME="SECTION00042000000000000000"></A><A NAME="ExampleOptions"></A><BR>
+Options and Macros
+</H2>
+The second section, "options and declarations", is more interesting. It consists
+of a set of options, code that is included inside the generated scanner
+class, lexical states and macro declarations. Each JFlex option must begin
+a line of the specification and starts with a <TT>%</TT>. In our example
+the following options are used:
+
+<P>
+
+<UL>
+<LI><TT><A HREF="manual.html#CodeOptions">%class Lexer</A></TT> tells JFlex to give the
+ generated class the name "Lexer" and to write the code to a file "<TT>Lexer.java</TT>".
+
+<P>
+</LI>
+<LI><TT><A HREF="manual.html#CodeOptions">%unicode</A></TT> defines the set of characters the scanner will
+ work on. For scanning text files, <TT>%unicode</TT> should always be used. See also
+ section <A HREF="manual.html#sec:encodings"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A> for more information on character sets, encodings, and
+ scanning text vs. binary files.
+
+<P>
+</LI>
+<LI><TT><A HREF="manual.html#CodeOptions">%cup</A></TT> switches to CUP compatibility
+ mode to interface with a CUP generated parser.
+
+<P>
+</LI>
+<LI><TT><A HREF="manual.html#CodeOptions">%line</A></TT> switches line counting on (the
+ current line number can be accessed via the variable <TT>yyline</TT>)
+
+<P>
+</LI>
+<LI><TT><A HREF="manual.html#CodeOptions">%column</A></TT> switches column counting on
+ (current column is accessed via <TT>yycolumn</TT>)
+
+<P>
+</LI>
+</UL>
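+
+<P>
+Conceptually, <TT>%line</TT> and <TT>%column</TT> make the scanner advance two
+counters over each matched text, so that <TT>yyline</TT> and <TT>yycolumn</TT>
+describe where the current token starts. The sketch below shows that bookkeeping
+in plain Java; it is not the code JFlex generates. It assumes counters that start
+at 0 and treats only CR, LF and CR LF as line terminators (CR LF counted once, as
+in the <TT>LineTerminator</TT> macro of the example); a real scanner may recognize
+further Unicode line terminators. The class <TT>PositionTracker</TT> is hypothetical.
+<PRE>
+/** Hypothetical tracker that updates a line and column after each matched text. */
+class PositionTracker {
+    int line = 0;       // like yyline: counts from 0
+    int column = 0;     // like yycolumn: counts from 0
+    private boolean lastWasCR = false;
+
+    /** Advances the position past the text of one matched token. */
+    void advance(CharSequence matched) {
+        for (int i = 0; i < matched.length(); i++) {
+            char c = matched.charAt(i);
+            if (c == '\n') {
+                if (!lastWasCR) {
+                    line++;              // an LF directly after a CR was already counted
+                }
+                column = 0;
+                lastWasCR = false;
+            } else if (c == '\r') {
+                line++;
+                column = 0;
+                lastWasCR = true;
+            } else {
+                column++;
+                lastWasCR = false;
+            }
+        }
+    }
+
+    public static void main(String[] args) {
+        PositionTracker pos = new PositionTracker();
+        String[] tokens = {"int", " ", "x", ";", "\r\n", "x", "++"};
+        for (String t : tokens) {
+            // the position printed is where the token starts
+            System.out.println("line " + pos.line + ", column " + pos.column + ": "
+                    + t.replace("\r", "\\r").replace("\n", "\\n"));
+            pos.advance(t);
+        }
+    }
+}
+</PRE>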
+<A NAME="ExampleScannerCode"></A>
+<P>
+The code included in <TT><A HREF="manual.html#CodeScannerCode">%{...%}</A></TT>
+is copied verbatim into the generated lexer class source.
+Here you can declare member variables and functions that are used
+inside scanner actions. In our example we declare a <TT>StringBuffer</TT> "<TT>string</TT>"
+in which we will store parts of string literals and two helper functions
+"<TT>symbol</TT>" that create <TT>java_cup.runtime.Symbol</TT> objects
+with position information of the current token (see section <A HREF="manual.html#CUPWork"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A>
+<A HREF="manual.html#CUPWork"><I>JFlex and CUP</I></A>
+for how to interface with the parser generator CUP). Like JFlex options, both
+<code>%{</code> and <code>%}</code> must begin a line.
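+
+<P>
+To see what the two <TT>symbol</TT> helpers amount to outside a generated scanner,
+here is a self-contained sketch. Since the CUP runtime may not be on the classpath,
+it uses a minimal stand-in class <TT>Symbol</TT> whose field names (<TT>sym</TT>,
+<TT>left</TT>, <TT>right</TT>, <TT>value</TT>) follow <TT>java_cup.runtime.Symbol</TT>;
+the stand-in and the class <TT>SymbolHelpers</TT> are assumptions, not the CUP classes.
+As in the example specification, the line goes into <TT>left</TT> and the column
+into <TT>right</TT>.
+<PRE>
+/** Minimal stand-in for java_cup.runtime.Symbol: same field names, not the real class. */
+class Symbol {
+    final int sym;        // terminal code, e.g. sym.IDENTIFIER in the example
+    final int left;       // the helpers store the line here
+    final int right;      // and the column here
+    final Object value;   // semantic value, or null
+
+    Symbol(int sym, int left, int right, Object value) {
+        this.sym = sym;
+        this.left = left;
+        this.right = right;
+        this.value = value;
+    }
+
+    Symbol(int sym, int left, int right) {
+        this(sym, left, right, null);
+    }
+}
+
+/** The two helpers from the %{...%} block, with yyline and yycolumn as plain fields. */
+class SymbolHelpers {
+    int yyline;     // maintained by a generated scanner when %line is set
+    int yycolumn;   // maintained by a generated scanner when %column is set
+
+    Symbol symbol(int type) {
+        return new Symbol(type, yyline, yycolumn);
+    }
+
+    Symbol symbol(int type, Object value) {
+        return new Symbol(type, yyline, yycolumn, value);
+    }
+
+    public static void main(String[] args) {
+        SymbolHelpers h = new SymbolHelpers();
+        h.yyline = 3;
+        h.yycolumn = 7;
+        Symbol s = h.symbol(42, "foo");   // 42: a hypothetical terminal code
+        System.out.println(s.sym + " at " + s.left + ":" + s.right + " value " + s.value);
+    }
+}
+</PRE>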
+<A NAME="ExampleMacros"></A>
+<P>
+The specification continues with macro declarations. Macros are
+abbreviations for regular expressions, used to make lexical specifications
+easier to read and understand. A macro declaration
+consists of a macro identifier followed by <TT>=</TT>, then followed by
+the regular expression it represents. This regular expression may
+itself contain macro usages. Although this allows a grammar-like specification
+style, macros are still just abbreviations and not nonterminals: they
+cannot be recursive or mutually recursive. Cycles in macro definitions
+are detected and reported at generation time by JFlex.
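+
+<P>
+Expanding macros by substitution only terminates if no macro refers to itself,
+directly or through other macros. One standard way to find such cycles is a
+depth-first search over the "uses" relation between macros, sketched below. It
+illustrates the idea and is not JFlex's implementation; the class
+<TT>MacroCycles</TT> is hypothetical, and it finds macro usages with a simple
+regular expression that ignores quoting.
+<PRE>
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+/** Hypothetical cycle check for macro definitions such as  A = {B} "a". */
+class MacroCycles {
+    private static final Pattern USE = Pattern.compile("\\{([A-Za-z_][A-Za-z0-9_]*)\\}");
+    private static final int ON_PATH = 1;
+    private static final int DONE = 2;
+
+    /** Returns a cycle such as [A, B, A], or an empty list if there is none. */
+    static List<String> findCycle(Map<String, String> macros) {
+        Map<String, Integer> state = new HashMap<>();
+        for (String name : macros.keySet()) {
+            List<String> cycle = visit(name, macros, state, new ArrayList<String>());
+            if (!cycle.isEmpty()) {
+                return cycle;
+            }
+        }
+        return Collections.emptyList();
+    }
+
+    /** Depth-first search; path holds the macros currently being expanded. */
+    private static List<String> visit(String name, Map<String, String> macros,
+                                      Map<String, Integer> state, List<String> path) {
+        String body = macros.get(name);
+        Integer s = state.get(name);
+        if (body == null || (s != null && s == DONE)) {
+            return Collections.emptyList();       // undefined or already checked
+        }
+        if (s != null && s == ON_PATH) {          // name uses itself, directly or not
+            List<String> cycle = new ArrayList<>(path.subList(path.indexOf(name), path.size()));
+            cycle.add(name);
+            return cycle;
+        }
+        state.put(name, ON_PATH);
+        path.add(name);
+        Matcher m = USE.matcher(body);
+        while (m.find()) {
+            List<String> cycle = visit(m.group(1), macros, state, path);
+            if (!cycle.isEmpty()) {
+                return cycle;
+            }
+        }
+        path.remove(path.size() - 1);
+        state.put(name, DONE);
+        return Collections.emptyList();
+    }
+
+    public static void main(String[] args) {
+        Map<String, String> macros = new LinkedHashMap<>();
+        macros.put("A", "{B} \"a\"");
+        macros.put("B", "{A} | \"b\"");
+        System.out.println(findCycle(macros));   // [A, B, A]
+    }
+}
+</PRE>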
+
+<P>
+Here are some of the example macros in more detail:
+
+<UL>
+<LI><TT><A HREF="manual.html#CodeMacros">LineTerminator</A></TT> stands for the regular
+ expression that matches an ASCII CR, an ASCII LF or a CR followed by LF.
+
+<P>
+</LI>
+<LI><TT><A HREF="manual.html#CodeMacros">InputCharacter</A></TT> stands for all characters
+ that are not a CR or LF.
+
+<P>
+</LI>
+<LI><TT><A HREF="manual.html#CodeMacros">TraditionalComment</A></TT> is the expression
+ that matches the string <TT>"/*"</TT> followed by a character that is not
+ a <TT>*</TT>, followed by everything up to and including the first
+ <TT>*/</TT> (this is what the <TT>~</TT> operator expresses), or alternatively the string
+ <TT>"/*"</TT> followed by one or more <TT>*</TT> followed by <TT>/</TT>.
+
+<P>
+</LI>
+<LI><TT><A HREF="manual.html#CodeMacros">CommentContent</A></TT> matches zero or more
+ occurrences of either a character other than <TT>*</TT>, or one or more
+ <TT>*</TT> followed by a character that is neither <TT>/</TT> nor <TT>*</TT>.
+
+<P>
+</LI>
+<LI><TT><A HREF="manual.html#CodeMacros">Identifier</A></TT> matches each string that
+ starts with a character of class <TT>jletter</TT> followed by zero or more characters
+ of class <TT>jletterdigit</TT>. <TT>jletter</TT> and <TT>jletterdigit</TT>
+ are predefined character classes. <TT>jletter</TT> includes all characters for which
+ the Java function <TT>Character.isJavaIdentifierStart</TT> returns <TT>true</TT> and
+ <TT>jletterdigit</TT> all characters for which <TT>Character.isJavaIdentifierPart</TT>
+ returns <TT>true</TT>.
+</LI>
+</UL>
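+
+<P>
+To experiment with what some of these macros accept, they can be approximated
+with <TT>java.util.regex</TT>. The translation below is hand-made and only
+approximate: Java regular expressions use a different syntax, have no direct
+equivalent of JFlex's <TT>~</TT> operator (a reluctant <TT>.*?</TT> followed by
+<TT>*/</TT> stands in for it), and try alternatives from left to right instead of
+taking the longest match. The identifier check calls the same <TT>Character</TT>
+methods that define <TT>jletter</TT> and <TT>jletterdigit</TT>. The class
+<TT>MacroDemo</TT> is hypothetical.
+<PRE>
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+/** Hypothetical java.util.regex approximations of some example macros. */
+class MacroDemo {
+    // LineTerminator = \r|\n|\r\n. Java takes the first alternative that fits, not the
+    // longest one, so \r\n has to come first to be matched as a single terminator.
+    static final Pattern LINE_TERMINATOR = Pattern.compile("\\r\\n|\\r|\\n");
+
+    // TraditionalComment = "/*" [^*] ~"*/" | "/*" "*"+ "/"
+    // The reluctant (?s:.*?)\*/ plays the role of ~"*/": stop at the first "*/".
+    static final Pattern TRADITIONAL_COMMENT =
+            Pattern.compile("/\\*[^*](?s:.*?)\\*/|/\\*\\*+/");
+
+    /** Returns the traditional comment at the start of input, or null if there is none. */
+    static String leadingComment(String input) {
+        Matcher m = TRADITIONAL_COMMENT.matcher(input);
+        return m.lookingAt() ? m.group() : null;
+    }
+
+    /** Identifier = [:jletter:] [:jletterdigit:]* */
+    static boolean isIdentifier(String s) {
+        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
+            return false;
+        }
+        for (int i = 1; i < s.length(); i++) {
+            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
+                return false;
+            }
+        }
+        return true;
+    }
+
+    public static void main(String[] args) {
+        System.out.println(leadingComment("/* a */ b */"));   // stops at the first "*/"
+        System.out.println(leadingComment("/**/ x"));
+        System.out.println(isIdentifier("breaker") + " " + isIdentifier("9lives"));
+        Matcher lt = LINE_TERMINATOR.matcher("\r\n");
+        System.out.println(lt.lookingAt() && lt.end() == 2);  // CR LF as one terminator
+    }
+}
+</PRE>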
+<A NAME="ExampleStateDecl"></A>
+<P>
+The last part of the second section in our
+lexical specification is a lexical state declaration:
+<TT><A HREF="manual.html#CodeStateDecl">%state STRING</A></TT>
+declares a lexical state <TT>STRING</TT> that can be
+used in the "lexical rules" part of the specification. A state declaration
+is a line starting with <TT>%state</TT> followed by a space- or comma-separated
+list of state identifiers. There can be more than one line starting
+with <TT>%state</TT>.
+
+<P>
+
+<H2><A NAME="SECTION00043000000000000000"></A><A NAME="ExampleLexRules"></A><BR>
+Rules and Actions
+</H2>
+The "lexical rules" section of a JFlex specification contains regular expressions
+and actions (Java code) that are executed when the scanner matches the
+associated regular expression. As the scanner reads its input, it keeps
+track of all regular expressions and activates the action of the expression
+that has the longest match. For instance, with the input
+"<TT>breaker</TT>" our specification above matches the regular expression for <TT><A HREF="manual.html#CodeMacros">Identifier</A></TT>
+and not the keyword "<TT><A HREF="manual.html#CodeRulesYYINITIAL">break</A></TT>"
+followed by the Identifier "<TT>er</TT>", because rule <code>{Identifier}</code>
+matches more of this input at once (in fact, all of it)
+than any other rule in the specification. If two regular expressions both
+have the longest match for a certain input, the scanner chooses the action
+of the expression that appears first in the specification. In that way, we
+get for input "<TT>break</TT>" the keyword "<TT>break</TT>" and not an
+Identifier "<TT>break</TT>".
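+
+<P>
+The two disambiguation rules, longest match first and then earliest rule, can be
+imitated by trying every rule at the current position and keeping the best
+candidate. A generated JFlex scanner does not work this way (it runs a single DFA
+over the input), so the sketch below, with its hypothetical <TT>RuleMatcher</TT>
+class, only illustrates the selection policy. Java's regex engine does not in
+general return the longest match of a pattern; the patterns used here are simple
+enough that it does.
+<PRE>
+import java.util.Arrays;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+/** Hypothetical illustration of the rule selection policy (not JFlex's DFA). */
+class RuleMatcher {
+    static final class Rule {
+        final String name;
+        final Pattern pattern;
+
+        Rule(String name, String regex) {
+            this.name = name;
+            this.pattern = Pattern.compile(regex);
+        }
+    }
+
+    // The order of this list decides ties between matches of equal length.
+    static final List<Rule> RULES = Arrays.asList(
+            new Rule("BREAK", "break"),
+            new Rule("IDENTIFIER", "[A-Za-z_][A-Za-z0-9_]*"),
+            new Rule("ERROR", "(?s:.)"));    // fallback: any single character
+
+    /** Returns "NAME:text" for the rule chosen at position pos, or null at end of input. */
+    static String choose(String input, int pos) {
+        Rule best = null;
+        int bestLength = -1;
+        for (Rule r : RULES) {
+            Matcher m = r.pattern.matcher(input);
+            m.region(pos, input.length());
+            // A later rule replaces the current best only with a strictly longer
+            // match, so among equally long matches the earliest rule wins.
+            if (m.lookingAt() && m.end() - pos > bestLength) {
+                best = r;
+                bestLength = m.end() - pos;
+            }
+        }
+        return best == null ? null : best.name + ":" + input.substring(pos, pos + bestLength);
+    }
+
+    public static void main(String[] args) {
+        System.out.println(choose("breaker", 0));   // longest match: IDENTIFIER
+        System.out.println(choose("break", 0));     // tie: BREAK is listed first
+        System.out.println(choose("+x", 0));        // only the fallback matches
+    }
+}
+</PRE>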
+
+<P>
+In addition to regular expression matches, one can use lexical states to
+refine a specification. A lexical state acts like a start condition.
+If the scanner is in lexical state <TT>STRING</TT>, only expressions that
+are preceded by the start condition <TT><STRING></TT> can be matched.
+A start condition of a regular expression can contain more than one lexical
+state. It is then matched when the lexer is in any of these lexical states.
+The lexical state <TT>YYINITIAL</TT> is predefined and is also the state
+in which the lexer begins scanning. If a regular expression has no start
+conditions it is matched in <EM>all</EM> lexical states.
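+
+<P>
+The activation rule for start conditions fits in a few lines. In the sketch below
+(hypothetical names and state numbering, not JFlex internals) each rule carries the
+set of lexical states listed in its start condition, and an empty set stands for a
+rule without a start condition, which is active in all states.
+<PRE>
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Set;
+
+/** Hypothetical model of start conditions: is a rule active in the current state? */
+class StartConditions {
+    static final int YYINITIAL = 0;   // predefined; scanning starts in this state
+    static final int STRING = 1;      // declared with "%state STRING"
+
+    /** An empty start condition means the rule is active in every lexical state. */
+    static boolean isActive(Set<Integer> startCondition, int currentState) {
+        return startCondition.isEmpty() || startCondition.contains(currentState);
+    }
+
+    public static void main(String[] args) {
+        Set<Integer> onlyString = Collections.singleton(STRING);
+        Set<Integer> bothStates = new HashSet<>(Arrays.asList(YYINITIAL, STRING));
+        Set<Integer> noCondition = Collections.emptySet();
+        int state = YYINITIAL;
+        System.out.println(isActive(onlyString, state));    // false
+        System.out.println(isActive(bothStates, state));    // true
+        System.out.println(isActive(noCondition, state));   // true
+    }
+}
+</PRE>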
+<A NAME="ExampleRulesStateBunch"></A>
+<P>
+Since you often have a bunch of expressions with the same start conditions,
+JFlex allows the same abbreviation as the Unix tool <TT>flex</TT>:
+<PRE>
+<STRING> {
+ expr1 { action1 }
+ expr2 { action2 }
+}
+</PRE>
+means that both <TT>expr1</TT> and <TT>expr2</TT> have start condition <TT><STRING></TT>.
+<A NAME="ExampleRulesYYINITIAL"></A>
+<P>
+The first three rules in our example demonstrate the syntax of a regular
+expression preceded by the start condition <TT><YYINITIAL></TT>.
+
+<P>
+<TT><A HREF="manual.html#CodeRulesYYINITIAL"><YYINITIAL> "abstract"</A><code> {</code> return symbol(sym.ABSTRACT); <code>}</code></TT>
+
+<P>
+matches the input "<TT>abstract</TT>" only if the scanner is in its
+start state "<TT>YYINITIAL</TT>". When the string "<TT>abstract</TT>" is
+matched, the scanner function returns the CUP symbol <TT>sym.ABSTRACT</TT>.
+If an action does not return a value, the scanning process is resumed immediately
+after executing the action.
+<A NAME="ExampleRulesBunch"></A>
+<P>
+The rules enclosed in
+
+<P>
+<TT><A HREF="manual.html#CodeRulesBunch"><YYINITIAL> {
+<BR> ...
+<BR>}</A></TT>
+
+<P>
+demonstrate the abbreviated syntax and are also only matched in state <TT>YYINITIAL</TT>.
+<A NAME="ExampleRulesYYbegin"></A>
+<P>
+Of these rules, one may be of special interest:
+
+<P>
+<code>\" { </code> <TT><A HREF="manual.html#CodeRulesBunch">string.setLength(0); yybegin(STRING);</A></TT><code> }</code>
+
+<P>
+If the scanner matches a double quote in state <TT>YYINITIAL</TT> we
+have recognized the start of a string literal. Therefore we clear our <TT>StringBuffer</TT>
+that will hold the content of this string literal and tell the scanner
+with <TT>yybegin(STRING)</TT> to switch into the lexical state <TT>STRING</TT>.
+Because we do not yet return a value to the parser, our scanner proceeds
+immediately.
+<A NAME="ExampleRulesYYtext"></A>
+<P>
+In lexical state <TT>STRING</TT> another
+rule demonstrates how to refer to the input that has been matched:
+
+<P>
+<code>[^\n\r\"\\]+ { </code> <TT><A HREF="manual.html#CodeRulesYYtext">string.append( yytext() );</A></TT><code> }</code>
+
+<P>
+The expression <code>[^\n\r\"\\]+</code> matches
+all characters in the input up to the next backslash (indicating an
+escape sequence such as <code>\n</code>), double quote (indicating the end
+of the string), or line terminator (which must not occur in a string literal).
+The matched region of the input is referred to with <TT><A HREF="manual.html#CodeRulesYYtext">yytext()</A></TT>
+and appended to the content of the string literal parsed so far.
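+
+<P>
+Taken together, the <TT>YYINITIAL</TT> and <TT>STRING</TT> rules form a small
+two-state machine. The sketch below spells that machine out by hand so it can be
+run without JFlex: a state variable plays the role of <TT>yybegin</TT>, and each
+run of ordinary characters is appended in one piece, like
+<TT>string.append( yytext() )</TT>. It handles only the escapes of the example
+specification; the class <TT>StringLiteralScanner</TT> and its method are hypothetical.
+<PRE>
+/** Hypothetical hand-written version of the example's string literal rules. */
+class StringLiteralScanner {
+    private static final int YYINITIAL = 0;
+    private static final int STRING = 1;
+
+    /** Scans the literal at the start of input (opening quote included); returns its content. */
+    static String scan(String input) {
+        StringBuilder string = new StringBuilder();
+        int state = YYINITIAL;
+        int pos = 0;
+        while (pos < input.length()) {
+            char c = input.charAt(pos);
+            if (state == YYINITIAL) {
+                // \"  { string.setLength(0); yybegin(STRING); }
+                if (c != '"') {
+                    throw new Error("Illegal character <" + c + ">");
+                }
+                string.setLength(0);
+                state = STRING;
+                pos++;
+            } else if (c == '"') {
+                // \"  { yybegin(YYINITIAL); return symbol(sym.STRING_LITERAL, ...); }
+                return string.toString();
+            } else if (c == '\\' && pos + 1 < input.length()) {
+                char e = input.charAt(pos + 1);   // the escape rules match two characters
+                switch (e) {
+                    case 't': string.append('\t'); break;
+                    case 'n': string.append('\n'); break;
+                    case 'r': string.append('\r'); break;
+                    case '"': string.append('"'); break;
+                    case '\\': string.append('\\'); break;
+                    default: throw new Error("Illegal character <\\>");
+                }
+                pos += 2;
+            } else if (c == '\n' || c == '\r' || c == '\\') {
+                // no rule in state STRING matches: the error fallback applies
+                throw new Error("Illegal character <" + c + ">");
+            } else {
+                // [^\n\r\"\\]+  { string.append( yytext() ); }  takes the whole run at once
+                int end = pos;
+                while (end < input.length() && "\n\r\"\\".indexOf(input.charAt(end)) < 0) {
+                    end++;
+                }
+                string.append(input, pos, end);
+                pos = end;
+            }
+        }
+        throw new Error("Unterminated string literal");
+    }
+
+    public static void main(String[] args) {
+        // The Java literal below holds the characters "a\tb\\c" (quotes included).
+        System.out.println(scan("\"a\\tb\\\\c\""));
+    }
+}
+</PRE>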
+<A NAME="ExampleRuleLast"></A>
+<P>
+The last lexical rule in the example specification
+is used as an error fallback. It matches any character in any state that
+has not been matched by another rule. It doesn't conflict with any other
+rule because it has the lowest priority (it is the last rule) and
+because it matches only one character (so it can't have longest match
+precedence over any other rule).
+
+<P>
+
+<H2><A NAME="SECTION00044000000000000000">
+How to get it going</A>
+</H2>
+
+<UL>
+<LI>Install JFlex (see section <A HREF="manual.html#Installing"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A> <A HREF="manual.html#Installing"><I>Installing JFlex</I></A>)
+
+<P>
+</LI>
+<LI>Once you have written your specification file (or chosen one from the <TT>examples</TT>
+directory), save it, say under the name <TT>java-lang.flex</TT>.
+
+<P>
+</LI>
+<LI>Run JFlex with
+
+<P>
+<TT>jflex java-lang.flex</TT>
+
+<P>
+</LI>
+<LI>JFlex should then report some progress messages about generating the scanner
+and write the generated code to the directory of your specification file.
+
+<P>
+</LI>
+<LI>Compile the generated <TT>.java</TT> file and your own classes. (If you
+use CUP, generate your parser classes first.)
+
+<P>
+</LI>
+<LI>That's it.
+</LI>
+</UL>
+
+<P>
+
+<H1><A NAME="SECTION00050000000000000000"></A><A NAME="Specifications"></A><BR>
+Lexical Specifications
+</H1>
+As shown above, a lexical specification file for JFlex consists of three
+parts divided by a single line starting with <TT>%%</TT>:
+
+<P>
+<TT><A HREF="manual.html#SpecUsercode">UserCode</A></TT>
+<BR><TT>%%</TT>
+<BR><TT><A HREF="manual.html#SpecOptions">Options and declarations</A></TT>
+<BR><TT>%%</TT>
+<BR><TT><A HREF="manual.html#LexRules">Lexical rules</A></TT>
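+
+<P>
+Because the division into three parts is line-based, a rough splitter is easy to
+write. The sketch below (hypothetical class <TT>SpecSplitter</TT>) cuts a
+specification at the first two lines that start with <TT>%%</TT>. Unlike JFlex,
+it knows nothing about comments or code blocks, so a line starting with
+<TT>%%</TT> inside a <TT>%{...%}</TT> block would confuse it.
+<PRE>
+import java.util.ArrayList;
+import java.util.List;
+
+/** Hypothetical splitter for the three parts of a JFlex specification. */
+class SpecSplitter {
+
+    /** Returns [user code, options and declarations, lexical rules]. */
+    static List<String> split(String spec) {
+        List<String> parts = new ArrayList<>();
+        StringBuilder current = new StringBuilder();
+        for (String line : spec.split("\\r\\n|\\r|\\n")) {
+            if (line.startsWith("%%") && parts.size() < 2) {
+                parts.add(current.toString());   // the separator line itself is dropped
+                current.setLength(0);
+            } else {
+                current.append(line).append('\n');
+            }
+        }
+        parts.add(current.toString());
+        if (parts.size() != 3) {
+            throw new IllegalArgumentException("expected two lines starting with %%");
+        }
+        return parts;
+    }
+
+    public static void main(String[] args) {
+        String spec = "import java_cup.runtime.*;\n"
+                + "%%\n"
+                + "%class Lexer\n"
+                + "%%\n"
+                + "\"+\" { return symbol(sym.PLUS); }\n";
+        List<String> parts = split(spec);
+        System.out.print("user code:\n" + parts.get(0)
+                + "options and declarations:\n" + parts.get(1)
+                + "lexical rules:\n" + parts.get(2));
+    }
+}
+</PRE>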
+
+<P>
+In all parts of the specification, comments of the form
+<TT>/* comment text */</TT> and Java-style end-of-line comments starting with <TT>//</TT>
+are permitted. JFlex comments do nest, so the number of <TT>/*</TT> and <TT>*/</TT>
+should be balanced.
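+<P>
+Because comments nest, checking them takes a depth counter rather than a search for
+the next <TT>*/</TT>. The following Java sketch (hypothetical class
+<TT>NestedComments</TT>) shows the idea. For simplicity it ignores <TT>//</TT>
+comments and comment delimiters that appear inside string literals.

```java
final class NestedComments {
    // Returns true if every opening comment delimiter is closed by a matching
    // closing delimiter, taking nesting into account.
    static boolean balanced(String text) {
        int depth = 0;
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith("/*", i)) {
                depth++;
                i += 2;             // consume both characters of the opener
            } else if (text.startsWith("*/", i)) {
                depth--;
                if (depth < 0) {
                    return false;   // closer without a matching opener
                }
                i += 2;             // consume both characters of the closer
            } else {
                i++;
            }
        }
        return depth == 0;          // every opener must have been closed
    }
}
```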
+
+<P>
+
+<H2><A NAME="SECTION00051000000000000000"></A><A NAME="SpecUsercode"></A><BR>
+User code
+</H2>
+The first part contains user code that is copied verbatim into the beginning
+of the source file of the generated lexer before the scanner class is declared.
+As shown in the example above, this is the place to put <TT>package</TT>
+declarations and <TT>import</TT>
+statements. It is possible, but not considered good Java programming
+style, to put your own helper classes (such as token classes) in this section.
+They should get their own <TT>.java</TT> file instead.
+
+<P>
+
+<H2><A NAME="SECTION00052000000000000000"></A><A NAME="SpecOptions"></A><BR>
+Options and declarations
+</H2>
+The second part of the lexical specification contains <A HREF="manual.html#SpecOptDirectives">options</A>
+to customize your generated lexer (JFlex directives and Java code to include in
+different parts of the lexer), declarations of <A HREF="manual.html#StateDecl">lexical states</A> and
+<A HREF="manual.html#MacroDefs">macro definitions</A> for use in the third section
+<A HREF="manual.html#LexRules">``Lexical rules''</A> of the lexical specification file.
+<A NAME="SpecOptDirectives"></A>
+<P>
+Each JFlex directive must start at the beginning of a line
+and begins with the <TT>%</TT> character. Directives that have one or
+more parameters are described as follows:
+
+<P>
+<TT>%class "classname"</TT>
+
+<P>
+means that you start a line with <TT>%class</TT> followed by a space followed
+by the name of the class for the generated scanner (the double quotes are
+<I>not</I> to be entered, see the <A HREF="manual.html#CodeOptions">example specification</A> in
+section <A HREF="manual.html#CodeOptions"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A>).
+
+<P>
+
+<H3><A NAME="SECTION00052100000000000000"></A><A NAME="ClassOptions"></A><BR>
+Class options and user class code
+</H3>
+These options concern the name, constructor, API, and related parts of the
+generated scanner class.
+
+<UL>
+<LI><B><TT>%class "classname"</TT></B>
+
+<P>
+Tells JFlex to give the generated class the name "<TT>classname</TT>" and to
+write the generated code to a file "<TT>classname.java</TT>". If the
+<TT>-d <directory></TT> command line option is not used, the code
+will be written to the directory where the specification file resides. If
+no <TT>%class</TT> directive is present in the specification, the generated
+class will get the name "<TT>Yylex</TT>" and will be written to a file
+"<TT>Yylex.java</TT>". There should be only one <TT>%class</TT> directive
+in a specification.
+
+<P>
+</LI>
+<LI><B><TT>%implements "interface 1"[, "interface 2", ..]</TT></B>
+
+<P>
+Makes the generated class implement the specified interfaces. If more than
+one <TT>%implements</TT> directive is present, all the specified interfaces
+will be implemented.
+
+<P>
+</LI>
+<LI><B><TT>%extends "classname"</TT></B>
+
+<P>
+Makes the generated class a subclass of the class ``<TT>classname</TT>''.
+There should be only one <TT>%extends</TT> directive in a specification.
+
+<P>
+</LI>
+<LI><B><TT>%public</TT></B>
+
+<P>
+Makes the generated class public (the class is only accessible in its
+own package by default).
+
+<P>
+</LI>
+<LI><B><TT>%final</TT></B>
+
+<P>
+Makes the generated class final.
+
+<P>
+</LI>
+<LI><B><TT>%abstract</TT></B>
+
+<P>
+Makes the generated class abstract.
+
+<P>
+</LI>
+<LI><B><TT>%apiprivate</TT></B>
+
+<P>
+Makes all generated methods and fields of the class
+private. Exceptions are the constructor, user code in the
+specification, and, if <code>%cup</code> is present, the method
+<TT>next_token</TT>. All occurrences of
+<TT>" public "</TT> (one space character before and after <TT>public</TT>)
+in the skeleton file are replaced by
+<TT>" private "</TT> (even if a user-specified skeleton is used).
+Access to the generated class is expected to be mediated by user class
+code (see next switch).
+
+<P>
+</LI>
+<LI><B><code>%{</code></B>
+<BR><B><TT>...</TT></B>
+<BR><B><code>%}</code></B>
+
+<P>
+The code enclosed in <code>%{</code> and <code>%}</code> is copied verbatim
+into the generated class. Here you can define your own member variables
+and functions in the generated scanner. Like all options, both <code>%{</code>
+and <code>%}</code> must start a line in the specification. If more than one
+class code directive <code>%{...%}</code> is present, the code is concatenated
+in order of appearance in the specification.
+
+<P>
+</LI>
+<LI><B><code>%init{</code></B>
+<BR><B><TT>...</TT></B>
+<BR><B><code>%init}</code></B>
+
+<P>
+The code enclosed in <code>%init{</code> and <code>%init}</code> is copied
+verbatim into the constructor of the generated class. Here, member
+variables declared in the <code>%{...%}</code> directive can be initialized.
+If more than one initializer option is present, the code is concatenated
+in order of appearance in the specification.
+
+<P>
+</LI>
+<LI><B><code>%initthrow{</code></B>
+<BR><B><TT>"exception1"[, "exception2", ...]</TT></B>
+<BR><B><code>%initthrow}</code></B>
+
+<P>
+or (on a single line) just
+
+<P>
+<B><TT>%initthrow "exception1" [, "exception2", ...]</TT></B>
+
+<P>
+Causes the specified exceptions to be declared in the <TT>throws</TT>
+clause of the constructor. If more than one <code>%initthrow{</code> <TT>...</TT> <code>%initthrow}</code>
+directive is present in the specification, all specified exceptions will
+be declared.
+
+<P>
+</LI>
+<LI><B><TT>%scanerror "exception"</TT></B>
+
+<P>
+Causes the generated scanner to throw an instance of the specified
+exception in case of an internal error (default is
+<TT>java.lang.Error</TT>). Note that this exception is only for
+internal scanner errors. With usual specifications it should never
+occur (i.e. if there is an error fallback rule in the specification
+and only the documented scanner API is used).
+
+<P>
+</LI>
+<LI><B><TT>%buffer "size"</TT></B>
+
+<P>
+Sets the initial size of the scan buffer to the specified value
+(decimal, in bytes). The default value is 16384.
+
+<P>
+</LI>
+<LI><B><TT>%include "filename"</TT></B>
+
+<P>
+Replaces the <TT>%include</TT> directive verbatim with the contents of the
+specified file. This feature is still experimental. It works, but error
+reporting can be strange if a syntax error occurs on the last token in the
+included file.
+
+<P>
+</LI>
+</UL>
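+<P>
+To show how several of these options end up in Java, here is a hand-written sketch of
+the class shape they describe. It is not actual JFlex output. The class
+<TT>Lexer</TT>, the interface <TT>TokenCounter</TT> and all member names are
+hypothetical, and a real generated scanner also contains buffers, tables and the
+scanning method.

```java
import java.io.Reader;

// Hypothetical interface named in a "%implements TokenCounter" directive.
interface TokenCounter {
    int tokenCount();
}

// Hand-written approximation (not real JFlex output) of a class generated from
//   %class Lexer
//   %final
//   %implements TokenCounter
//   %{ ... %}           the field and method marked below
//   %init{ ... %init}   the assignment marked in the constructor
// With %public the class would additionally be declared public.
final class Lexer implements TokenCounter {
    private final Reader input; // stands in for the scanner's input reader

    // ---- copied verbatim from %{ ... %} ----
    private int count;

    public int tokenCount() {
        return count;
    }
    // ---- end of %{ ... %} code ----

    Lexer(Reader in) {
        this.input = in;
        // ---- copied verbatim from %init{ ... %init} ----
        count = 0;
    }
}
```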
+
+<P>
+
+<H3><A NAME="SECTION00052200000000000000"></A><A NAME="ScanningMethod"></A><BR>
+Scanning method
+</H3>
+This section shows how the scanning method can be customized. You can redefine
+the name and return type of the method and it is possible to declare
+exceptions that may be thrown in one of the actions of the specification.
+If no return type is specified, the scanning method will be declared as
+returning values of class <TT>Yytoken</TT>.
+
+<UL>
+<LI><B><TT>%function "name"</TT></B>
+
+<P>
+Causes the scanning method to get the specified name. If no <TT>%function</TT>
+directive is present in the specification, the scanning method gets the
+name ``<TT>yylex</TT>''. This directive overrides settings of the
+<TT><A HREF="manual.html#CupMode">%cup</A></TT> switch. Please note that the default name
+of the scanning method with the <TT><A HREF="manual.html#CupMode">%cup</A></TT> switch is
+<TT>next_token</TT>. Overriding this name might lead to the generated scanner
+being implicitly declared as <TT>abstract</TT>, because it does not provide
+the method <TT>next_token</TT> of the interface <TT>java_cup.runtime.Scanner</TT>.
+It is of course possible to provide a dummy implementation of that method
+in the class code section if you still want to override the function name.
+
+<P>
+</LI>
+<LI><B><TT>%integer</TT></B>
+<BR><B><TT>%int</TT></B>
+
+<P>
+Both cause the scanning method to be declared as returning the Java type <TT>int</TT>.
+Actions in the specification can then return <TT>int</TT> values as tokens.
+The default end of file value under this setting is <TT>YYEOF</TT>, which is a <TT>public
+static final int</TT> member of the generated class.
+
+<P>
+</LI>
+<LI><B><TT>%intwrap</TT></B>
+
+<P>
+Causes the scanning method to be declared as returning the Java wrapper type
+<TT>Integer</TT>. Actions in the specification can then return <TT>Integer</TT>
+values as tokens. The default end of file value under this setting is <TT>null</TT>.
+
+<P>
+</LI>
+<LI><B><TT>%type "typename"</TT></B>
+
+<P>
+Causes the scanning method to be declared as returning values of the specified type.
+Actions in the specification can then return values of <TT>typename</TT>
+as tokens. The default end of file value under this setting is <TT>null</TT>.
+If <TT>typename</TT> is not a subclass of <TT>java.lang.Object</TT>,
+you should specify another end of file value using the
+<A HREF="manual.html#eofval"><TT>%eofval{</TT> <TT>...</TT> <TT>%eofval}</TT></A>
+directive or the <A HREF="manual.html#EOFRule"><TT><<EOF>></TT> rule</A>.
+The <TT>%type</TT> directive overrides settings of the
+<TT><A HREF="manual.html#CupMode">%cup</A></TT> switch.
+
+<P>
+</LI>
+<LI><B><code>%yylexthrow{</code></B>
+<BR><B><TT>"exception1"[, "exception2", ... ]</TT></B>
+<BR><B><code>%yylexthrow}</code></B>
+
+<P>
+or (on a single line) just
+
+<P>
+<B><TT>%yylexthrow "exception1" [, "exception2", ...]</TT></B>
+
+<P>
+The exceptions listed inside <code>%yylexthrow{</code> <TT>...</TT> <code>%yylexthrow}</code>
+will be declared in the throws clause of the scanning method. If there is
+more than one <code>%yylexthrow{</code> <TT>...</TT> <code>%yylexthrow}</code> clause in
+the specification, all specified exceptions will be declared.
+</LI>
+</UL>
+
+<P>
+
+<H3><A NAME="SECTION00052300000000000000"></A><A NAME="EOF"></A><BR>
+The end of file
+</H3>
+There is always a default value that the scanning method will return when
+the end of file has been reached. You may however define a specific value
+to return and a specific piece of code that should be executed when the
+end of file is reached.
+
+<P>
+The default end of file value depends on the return type of the scanning method:
+
+<UL>
+<LI>For <B><TT>%integer</TT></B>, the scanning method will return the value
+<B><TT>YYEOF</TT></B>, which is a <TT>public static final int</TT> member
+of the generated class.
+
+<P>
+</LI>
+<LI>For <B><TT>%intwrap</TT></B>, for no specified type at all, or for a
+user defined type declared using <B><TT>%type</TT></B>, the value is <B><TT>null</TT></B>.
+
+<P>
+</LI>
+<LI>In CUP compatibility mode, using <B><TT>%cup</TT></B>, the value is
+
+<P>
+<B><TT>new java_cup.runtime.Symbol(sym.EOF)</TT></B>
+</LI>
+</UL>
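+<P>
+A caller therefore has to test for the end of file value that matches the return
+type. The sketch below shows both loop patterns using hand-written stand-ins for
+generated scanners. The classes <TT>IntScanner</TT>, <TT>WrapScanner</TT> and
+<TT>EofLoops</TT> are hypothetical. The value -1 for <TT>YYEOF</TT> is an
+assumption of this sketch, so compare against the constant, not the literal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a scanner generated with %integer.
class IntScanner {
    public static final int YYEOF = -1; // end of file marker (value assumed for this sketch)
    private final int[] tokens;
    private int pos = 0;

    IntScanner(int... tokens) { this.tokens = tokens; }

    int yylex() {
        return pos < tokens.length ? tokens[pos++] : YYEOF;
    }
}

// Hypothetical stand-in for a scanner generated with %intwrap (or %type):
// the end of file is signalled by null.
class WrapScanner {
    private final Integer[] tokens;
    private int pos = 0;

    WrapScanner(Integer... tokens) { this.tokens = tokens; }

    Integer yylex() {
        return pos < tokens.length ? tokens[pos++] : null;
    }
}

class EofLoops {
    // Collects tokens until the int end of file marker appears.
    static List<Integer> drain(IntScanner s) {
        List<Integer> out = new ArrayList<>();
        for (int t = s.yylex(); t != IntScanner.YYEOF; t = s.yylex()) {
            out.add(t);
        }
        return out;
    }

    // Collects tokens until null appears.
    static List<Integer> drain(WrapScanner s) {
        List<Integer> out = new ArrayList<>();
        for (Integer t = s.yylex(); t != null; t = s.yylex()) {
            out.add(t);
        }
        return out;
    }
}
```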
+
+<P>
+User values and code to be executed at the end of file can be defined using these directives:
+
+<A NAME="eofval"></A><UL>
+<LI><B><code>%eofval{</code></B>
+<BR><B><TT>...</TT></B>
+<BR><B><code>%eofval}</code></B>
+
+<P>
+The code included in <code>%eofval{</code> <TT>...</TT> <code>%eofval}</code> will
+be copied verbatim into the scanning method and will be executed <EM>each time</EM>
+the end of file is reached (this can happen more than once, if
+the scanning method is called again after the end of file has been
+reached). The code should return the value that indicates the end of
+file to the parser. There should be only one <code>%eofval{</code>
+<TT>...</TT> <code>%eofval}</code> clause in the specification.
+The <code>%eofval{ ... %eofval}</code> directive overrides settings of the
+<TT><A HREF="manual.html#CupMode">%cup</A></TT> switch and <TT><A HREF="manual.html#YaccMode">%byaccj</A></TT> switch.
+As of version 1.2 JFlex provides
+a more readable way to specify the end of file value using the
+<A HREF="manual.html#EOFRule"><TT><<EOF>></TT> rule</A> (see also section <A HREF="manual.html#EOFRule"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A>).
+
+<P>
+</LI>
+<LI><A NAME="eof"></A> <B><code>%eof{</code></B>
+<BR> <B><TT>...</TT></B>
+<BR> <B><code>%eof}</code></B>
+
+<P>
+The code included in <code>%eof{ ... %eof}</code> will be executed
+ exactly once, when the end of file is reached. The code is included
+ inside a method <TT>void yy_do_eof()</TT> and should not return any
+ value (use <code>%eofval{...%eofval}</code> or
+ <A HREF="manual.html#EOFRule"><TT><<EOF>></TT></A> for this purpose). If more than one
+ end of file code directive is present, the code will be concatenated
+ in order of appearance in the specification.
+
+<P>
+</LI>
+<LI><B><code>%eofthrow{</code></B>
+<BR> <B><TT>"exception1"[,"exception2", ... ]</TT></B>
+<BR> <B><code>%eofthrow}</code></B>
+
+<P>
+or (on a single line) just
+
+<P>
+<B><TT>%eofthrow "exception1" [, "exception2", ...]</TT></B>
+
+<P>
+The exceptions listed inside <code>%eofthrow{...%eofthrow}</code> will
+ be declared in the throws clause of the method <TT>yy_do_eof()</TT>
+ (see <A HREF="manual.html#eof"><TT>%eof</TT></A> for more on that method).
+ If there is more than one <code>%eofthrow{...%eofthrow}</code> clause
+ in the specification, all specified exceptions will be declared.
+
+<P>
+<A NAME="eofclose"></A></LI>
+<LI><B><TT>%eofclose</TT></B>
+
+<P>
+Causes JFlex to close the input stream at the end of file. The code
+ <TT>yyclose()</TT> is appended to the method <TT>yy_do_eof()</TT>
+ (together with the code specified in <code>%eof{...%eof}</code>) and
+ the exception <TT>java.io.IOException</TT> is declared in the throws
+ clause of this method (together with those of
+ <code>%eofthrow{...%eofthrow}</code>).
+
+<P>
+</LI>
+<LI><B><TT>%eofclose false</TT></B>
+
+<P>
+Turns the effect of <TT>%eofclose</TT> off again (e.g. if closing the
+ input stream is not wanted after <TT>%cup</TT>).
+
+<P>
+</LI>
+</UL>
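+<P>
+The difference between <TT>%eofval</TT> (executed on every call at end of file)
+and <TT>%eof</TT> (executed exactly once) can be pictured with the following
+hand-written Java sketch. It is not the generated code. The class
+<TT>EofDemo</TT>, the guard flag and the counters are hypothetical; only the
+method name <TT>yy_do_eof</TT> is taken from this section.

```java
class EofDemo {
    private final String input;
    private int pos = 0;
    private boolean eofDone = false; // guards the %eof code so it runs only once
    int eofCodeRuns = 0;             // how often the %eof{ ... %eof} code ran
    int eofvalRuns = 0;              // how often the %eofval{ ... %eofval} code ran

    EofDemo(String input) { this.input = input; }

    // Stands in for the method that holds the %eof{ ... %eof} code.
    private void yy_do_eof() {
        if (!eofDone) {
            eofDone = true;
            eofCodeRuns++;           // user code from %eof{ ... %eof} would go here
        }
    }

    // Simplified scanning method: returns one character per call as a token.
    Character yylex() {
        if (pos < input.length()) {
            return input.charAt(pos++);
        }
        yy_do_eof();                 // executed at most once
        eofvalRuns++;                // %eofval code runs on every call at end of file
        return null;                 // the value the %eofval code returns in this sketch
    }
}
```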
+
+<P>
+
+<H3><A NAME="SECTION00052400000000000000"></A><A NAME="Standalone"></A><BR>
+Standalone scanners
+</H3>
+
+<UL>
+<LI><B><TT>%debug</TT></B>
+
+<P>
+Creates a main function in the generated class that expects the name
+of an input file on the command line and then runs the scanner on this
+input file, printing information about each returned token to the Java
+console until the end of file is reached. The information includes:
+line number (if line counting is enabled), column (if column counting is enabled),
+the matched text, and the executed action (with line number in the specification).
+
+<P>
+</LI>
+<LI><B><TT>%standalone</TT></B>
+
+<P>
+Creates a main function in the generated class that expects the name
+of an input file on the command line and then runs the scanner on this
+input file. The values returned by the scanner are ignored, but any unmatched
+text is printed to the Java console instead (as the C/C++ tool flex does, if
+run as a standalone program). To avoid having to use an extra token class, the
+scanning method will be declared as having default type <TT>int</TT>, not <TT>Yytoken</TT>
+(if no other type is explicitly specified).
+This is in most cases irrelevant, but could be useful to know when making
+another scanner standalone for some purpose. You should also consider using
+the <TT>%debug</TT> directive, if you just want to be able to run the scanner
+without a parser attached for testing etc.
+
+<P>
+</LI>
+</UL>
+
+<P>
+
+<H3><A NAME="SECTION00052500000000000000"></A><A NAME="CupMode"></A><BR>
+CUP compatibility
+</H3>
+You may also want to read section <A HREF="manual.html#CUPWork"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A> <A HREF="manual.html#CUPWork"><I>JFlex and CUP</I></A>
+if you are interested in how to interface your generated
+scanner with CUP.
+
+<UL>
+<LI><B><TT>%cup</TT></B>
+
+<P>
+The <TT>%cup</TT> directive enables the CUP compatibility mode and is equivalent
+to the following set of directives:
+
+<P>
+<PRE>
+%implements java_cup.runtime.Scanner
+%function next_token
+%type java_cup.runtime.Symbol
+%eofval{
+ return new java_cup.runtime.Symbol(<CUPSYM>.EOF);
+%eofval}
+%eofclose
+</PRE>
+
+<P>
+The value of <TT><CUPSYM></TT> defaults to <TT>sym</TT> and can be
+changed with the <TT>%cupsym</TT> directive. In JLex compatibility
+mode (<TT>-jlex</TT> switch on the command line), <TT>%eofclose</TT>
+will not be turned on.
+
+<P>
+</LI>
+<LI><B><TT>%cupsym "classname"</TT></B>
+
+<P>
+Customizes the name of the CUP generated class/interface
+containing the names of terminal tokens. Default is <TT>sym</TT>.
+This directive should be placed before <TT>%cup</TT>, not after it.
+
+<P>
+</LI>
+<LI><B><TT>%cupdebug</TT></B>
+
+<P>
+Creates a main function in the generated class that expects the name
+of an input file on the command line and then runs the scanner on this
+input file. Prints line, column, matched text, and CUP symbol name for
+each returned token to standard out.
+
+<P>
+</LI>
+</UL>
+
+<P>
+
+<H3><A NAME="SECTION00052600000000000000"></A><A NAME="YaccMode"></A><BR>
+BYacc/J compatibility
+</H3>
+You may also want to read section <A HREF="manual.html#YaccWork"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A> <A HREF="manual.html#YaccWork"><I>JFlex and BYacc/J</I></A>
+if you are interested in how to interface your generated
+scanner with BYacc/J.
+
+<UL>
+<LI><B><TT>%byaccj</TT></B>
+
+<P>
+The <TT>%byaccj</TT> directive enables the BYacc/J compatibility mode and is equivalent
+to the following set of directives:
+
+<P>
+<PRE>
+%integer
+%eofval{
+ return 0;
+%eofval}
+%eofclose
+</PRE>
+
+<P>
+</LI>
+</UL>
+
+<P>
+
+<H3><A NAME="SECTION00052700000000000000"></A><A NAME="CodeGeneration"></A><BR>
+Code generation
+</H3>
+The following options define what kind of lexical analyzer code JFlex
+will produce. <TT>%pack</TT> is the default setting and will be used,
+when no code generation method is specified.
+
+<P>
+
+<UL>
+<LI><B><TT>%switch</TT></B>
+
+<P>
+With <TT>%switch</TT> JFlex will generate a scanner that has
+ the DFA hard coded into a nested switch statement. This method gives
+ a good deal of compression in terms of the size of the compiled
+ <TT>.class</TT> file while still providing very good performance. If your
+ scanner gets too big, though (say more than about 200 states),
+ performance may degrade considerably and you should consider using one
+ of the <TT>%table</TT> or <TT>%pack</TT> directives. If your scanner
+ gets even bigger (about 300 states), the Java compiler <TT>javac</TT>
+ could produce corrupted code that will crash when executed or will
+ give you a <TT>java.lang.VerifyError</TT> when checked by the virtual
+ machine. This is due to the 64 KB size limit on Java
+ methods described in the Java Virtual Machine Specification
+ [<A
+ HREF="manual.html#MachineSpec">10</A>]. In this case you will be forced to use the
+ <TT>%pack</TT> directive; switching to <TT>%table</TT> will usually not help,
+ since <TT>%switch</TT> usually compresses the DFA table more than
+ <TT>%table</TT> does.
+
+<P>
+</LI>
+<LI><B><TT>%table</TT></B>
+
+<P>
+The <TT>%table</TT> directive causes JFlex to produce a classical
+ table driven scanner that encodes its DFA table in an array. In
+ this mode, JFlex only does a small amount of table compression (see
+ [<A
+ HREF="manual.html#ParseTable">6</A>], [<A
+ HREF="manual.html#SparseTable">12</A>], [<A
+ HREF="manual.html#Aho">1</A>] and [<A
+ HREF="manual.html#Maurer">13</A>]
+ for more details on the matter of table compression) and uses the
+ same method that JLex did up to version 1.2.1. See section <A HREF="manual.html#performance"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A>
+ <A HREF="manual.html#performance">performance</A> of this manual to compare
+ these methods. The 64 KB size limit on methods mentioned above
+ causes the same problem when the scanner gets too big, because
+ static initializers of arrays are compiled into ordinary method
+ code (the class initializer). In this case you will again be forced to
+ use the <TT>%pack</TT> directive to avoid the problem.
+
+<P>
+</LI>
+<LI><B><TT>%pack</TT></B>
+
+<P>
+<TT>%pack</TT> causes JFlex to compress the generated DFA table and to
+ store it in one or more string literals. JFlex takes care that the
+ strings are not longer than permitted by the class file format.
+ The strings have to be unpacked when
+ the first scanner object is created and initialized.
+ After unpacking, the internal access to the DFA table is exactly the
+ same as with option <TT>%table</TT>; the only extra work to be done
+ at runtime is the unpacking process, which is quite fast (not noticeable
+ in normal cases). Its time complexity is proportional to the
+ size of the expanded DFA table, and it is static,
+ i.e. it is done only once for a given scanner class, no matter
+ how often it is instantiated. Again, see section
+ <A HREF="manual.html#performance"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A> <A HREF="manual.html#performance">performance</A>
+ on the performance of these scanners.
+ With <TT>%pack</TT>, there should be practically no
+ limitation to the size of the scanner. <TT>%pack</TT> is the default
+ setting and will be used when no code generation method is specified.
+</LI>
+</UL>
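+<P>
+The unpacking step can be illustrated with a small run-length decoder. The format
+used here, a string of (count, value) character pairs, is an assumption made for
+illustration and may differ from the exact format JFlex emits. The class
+<TT>PackDemo</TT> is hypothetical. As described above, the running time is linear
+in the size of the expanded table.

```java
class PackDemo {
    // Expands a run-length encoded string of (count, value) character pairs
    // into an int array of the given size.
    static int[] unpack(String packed, int size) {
        int[] result = new int[size];
        int j = 0;
        for (int i = 0; i + 1 < packed.length(); i += 2) {
            int count = packed.charAt(i);     // how many times the value repeats
            int value = packed.charAt(i + 1); // the table entry itself
            for (int k = 0; k < count; k++) {
                result[j++] = value;
            }
        }
        return result;
    }
}
```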
+
+<P>
+
+<H3><A NAME="SECTION00052800000000000000"></A><A NAME="CharacterSets"></A><BR>
+Character sets
+</H3>
+
+<UL>
+<LI><B><TT>%7bit</TT></B>
+
+<P>
+Causes the generated scanner to use a 7 bit input character set (character
+codes 0-127). Because this is the default value in JLex, JFlex also defaults
+to 7 bit scanners. If an input character with a code greater than 127 is
+encountered in an input at runtime, the scanner will throw an <TT>ArrayIndexOutOfBoundsException</TT>.
+For this and other reasons, you should consider using the <TT>%unicode</TT> directive.
+See also section <A HREF="manual.html#sec:encodings"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A> for information about character encodings.
+
+<P>
+</LI>
+<LI><B><TT>%full</TT></B>
+<BR><B><TT>%8bit</TT></B>
+
+<P>
+Both options cause the generated scanner to use an 8 bit input character
+set (character codes 0-255). If an input character with a code greater
+than 255 is encountered in an input at runtime, the scanner will throw
+an <TT>ArrayIndexOutOfBoundsException</TT>. Note that even if your platform
+uses only one byte per character, the Unicode value of a character may
+still be greater than 255. If you are scanning text files, you should
+consider using the <TT>%unicode</TT> directive. See also section <A HREF="manual.html#sec:encodings"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A>
+for more information about character encodings.
+
+<P>
+</LI>
+<LI><B><TT>%unicode</TT></B>
+<BR><B><TT>%16bit</TT></B>
+
+<P>
+Both options cause the generated scanner to use the full 16 bit Unicode input
+character set (character codes 0-65535). There will be no runtime overflow when
+using this set of input characters. <TT>%unicode</TT> does not mean that the
+scanner will read two bytes at a time. What is read and what constitutes a
+character depends on the runtime platform. See also section <A HREF="manual.html#sec:encodings"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A>
+for more information about character encodings.
+
+<P>
+<A NAME="caseless"></A></LI>
+<LI><B><TT>%caseless</TT></B>
+<BR><B><TT>%ignorecase</TT></B>
+
+<P>
+This option causes JFlex to handle all characters and strings in the
+specification as if they were specified in both uppercase and lowercase form.
+This enables an easy way to specify a scanner for a language with
+case insensitive keywords. The string "<TT>break</TT>" in a specification is for instance
+handled like the expression <TT>([bB][rR][eE][aA][kK])</TT>. The <TT>%caseless</TT>
+option does not change the matched text and does not affect character classes. So
+<TT>[a]</TT> still only matches the character <TT>a</TT>, not <TT>A</TT>.
+Which letters are uppercase and which are lowercase is defined by the Unicode standard
+and determined by JFlex with the Java methods <TT>Character.toUpperCase</TT> and
+<TT>Character.toLowerCase</TT>. In JLex compatibility
+mode (<TT>-jlex</TT> switch on the command line), <TT>%caseless</TT>
+and <TT>%ignorecase</TT> also affect character classes.
+
+<P>
+</LI>
+</UL>
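+<P>
+The string expansion performed by <TT>%caseless</TT> can be imitated with the same
+two <TT>Character</TT> methods. The following sketch (hypothetical class
+<TT>CaselessDemo</TT>) turns a string into a <TT>java.util.regex</TT> pattern.
+Java's regular expression syntax differs from JFlex's in places, so this only
+illustrates the idea.

```java
import java.util.regex.Pattern;

class CaselessDemo {
    // Expands a string so that every cased character matches both forms,
    // e.g. "break" becomes "[bB][rR][eE][aA][kK]".
    static String caseless(String literal) {
        StringBuilder sb = new StringBuilder();
        for (char c : literal.toCharArray()) {
            char upper = Character.toUpperCase(c);
            char lower = Character.toLowerCase(c);
            if (upper == lower) {
                // no case distinction (digits, punctuation): match literally
                sb.append(Pattern.quote(String.valueOf(c)));
            } else {
                sb.append('[').append(lower).append(upper);
                if (c != lower && c != upper) {
                    sb.append(c); // e.g. a titlecase letter that is neither form
                }
                sb.append(']');
            }
        }
        return sb.toString();
    }
}
```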
+<H3><A NAME="SECTION00052900000000000000"></A><A NAME="Counting"></A><BR>
+Line, character and column counting
+</H3>
+
+<UL>
+<LI><B><TT>%char</TT></B>
+
+<P>
+Turns character counting on. The <TT>int</TT> member variable <TT>yychar</TT>
+contains the number of characters (starting with 0) from the beginning
+of input to the beginning of the current token.
+
+<P>
+</LI>
+<LI><B><TT>%line</TT></B>
+
+<P>
+Turns line counting on. The <TT>int</TT> member variable <TT>yyline</TT>
+contains the number of lines (starting with 0) from the beginning of input
+to the beginning of the current token.
+
+<P>
+</LI>
+<LI><B><TT>%column</TT></B>
+
+<P>
+Turns column counting on. The <TT>int</TT> member variable <TT>yycolumn</TT>
+contains the number of characters (starting with 0) from the beginning
+of the current line to the beginning of the current token.
+
+<P>
+</LI>
+</UL>
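+<P>
+The three counters can be recomputed from the raw input, for example to check
+reported positions in tests. The following sketch (hypothetical class
+<TT>PositionDemo</TT>) returns the values that <TT>yychar</TT>, <TT>yyline</TT> and
+<TT>yycolumn</TT> would hold for a token starting at a given offset. It treats only
+<code>\r\n</code>, <code>\r</code> and <code>\n</code> as line terminators; the set
+JFlex actually counts may be larger.

```java
class PositionDemo {
    // Returns {yychar, yyline, yycolumn} for a token that starts at 'offset'.
    // All values are 0-based; a CR LF pair counts as a single line break.
    static int[] position(String input, int offset) {
        int line = 0;
        int column = 0;
        for (int i = 0; i < offset; i++) {
            char c = input.charAt(i);
            if (c == '\n' || c == '\r') {
                // a CR immediately followed by LF is one terminator:
                // skip the CR and let the LF do the counting
                if (c == '\r' && i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                    continue;
                }
                line++;
                column = 0;
            } else {
                column++;
            }
        }
        return new int[] {offset, line, column};
    }
}
```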
+
+<P>
+
+<H3><A NAME="SECTION000521000000000000000"></A><A NAME="Obsolete"></A><BR>
+Obsolete JLex options
+</H3>
+
+<UL>
+<LI><B><TT>%notunix</TT></B>
+
+<P>
+This JLex option is obsolete in JFlex but still recognized as a valid directive.
+It used to switch between Windows- and Unix-style line terminators (<code>\r\n</code>
+and <code>\n</code>) for the <TT>$</TT> operator in regular expressions. JFlex
+always recognizes both styles of platform dependent line terminators.
+
+<P>
+</LI>
+<LI><B><TT>%yyeof</TT></B>
+
+<P>
+This JLex option is obsolete in JFlex but still recognized as a valid directive.
+In JLex it declares a public member constant <TT>YYEOF</TT>. JFlex declares it in any case.
+</LI>
+</UL>
+
+<P>
+
+<H3><A NAME="SECTION000521100000000000000"></A><A NAME="StateDecl"></A><BR>
+State declarations
+</H3>
+State declarations have the following form:
+
+<P>
+<TT>%s[tate] "state identifier" [, "state identifier", ... ]</TT> for inclusive or
+<BR><TT>%x[state] "state identifier" [, "state identifier", ... ]</TT> for exclusive states
+
+<P>
+There may be more than one line of state declarations, each starting with
+<TT>%state</TT> or <TT>%xstate</TT> (the first character is sufficient,
+<TT>%s</TT> and <TT>%x</TT> work, too). A state identifier is a letter followed
+by a sequence of letters, digits or underscores. State identifiers can be separated
+by whitespace or commas.
+
+<P>
+The sequence
+
+<P>
+<TT>%state STATE1</TT>
+<BR><TT>%xstate STATE3, XYZ, STATE_10</TT>
+<BR><TT>%state ABC STATE5</TT>
+
+<P>
+declares the set of identifiers <TT>STATE1, STATE3, XYZ,
+ STATE_10, ABC, STATE5</TT> as lexical states, <TT>STATE1</TT>, <TT>ABC</TT>, <TT>STATE5</TT>
+as inclusive, and <TT>STATE3</TT>, <TT>XYZ</TT>, <TT>STATE_10</TT> as exclusive.
+See also section
+<A HREF="manual.html#HowMatched"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A> on the way lexical states influence how the input is
+matched.
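+<P>
+In short (see the section referenced above for the details), a rule with a state
+list is active exactly in the listed states. A rule without a state list is active
+in every inclusive state, including <TT>YYINITIAL</TT>, but in no exclusive state.
+The following hypothetical Java sketch (class <TT>StateDemo</TT>) expresses that
+check using the declarations from the example above.

```java
import java.util.Set;

class StateDemo {
    // Declarations from the example above:
    //   %state STATE1  and  %state ABC STATE5   (inclusive)
    //   %xstate STATE3, XYZ, STATE_10           (exclusive)
    // YYINITIAL is always inclusive.
    static final Set<String> INCLUSIVE = Set.of("YYINITIAL", "STATE1", "ABC", "STATE5");

    // ruleStates is the rule's <...> prefix; an empty set means no prefix.
    static boolean isActive(Set<String> ruleStates, String currentState) {
        if (ruleStates.isEmpty()) {
            return INCLUSIVE.contains(currentState); // unprefixed: inclusive states only
        }
        return ruleStates.contains(currentState);    // prefixed: exactly the listed states
    }
}
```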
+
+<P>
+
+<H3><A NAME="SECTION000521200000000000000"></A><A NAME="MacroDefs"></A><BR>
+Macro definitions
+</H3>
+A macro definition has the form
+
+<P>
+<TT>macroidentifier = regular expression</TT>
+
+<P>
+That means, a macro definition is a macro identifier (letter followed
+by a sequence of letters, digits or underscores), that can later be
+used to reference the macro, followed by optional whitespace, followed
+by an "<TT>=</TT>", followed by optional whitespace, followed by a
+regular expression (see section <A HREF="manual.html#LexRules"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A> <A HREF="manual.html#LexRules"><I>lexical
+ rules</I></A> for more information about regular expressions).
+
+<P>
+The regular expression on the right hand side must be well formed and
+must not contain the <code>^</code>, <TT>/</TT> or <TT>$</TT> operators. <B>Unlike
+in JLex, macros are not just pieces of text that are expanded by copying</B>;
+they are parsed and must be well formed.
+
+<P>
+<B>This is a feature.</B> It eliminates some very hard to find bugs in
+lexical specifications (such as missing parentheses around more
+complicated macros, which are not necessary with JFlex). See section
+<A HREF="manual.html#Porting"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A> <A HREF="manual.html#Porting"><I>Porting from JLex</I></A> for more
+details on the problems of JLex style macros.
+
+<P>
+Since it is allowed to have macro usages in macro definitions, it is
+possible to use a grammar like notation to specify the desired lexical
+structure. Macros however remain just abbreviations of the regular expressions
+they represent. They are not nonterminals of a grammar and cannot be used
+recursively in any way. JFlex detects cycles in macro definitions and reports
+them at generation time. JFlex also warns you about macros that have been
+defined but never used in the ``lexical rules'' section of the specification.
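+<P>
+Detecting such cycles amounts to a depth-first search over the macro reference
+graph, in which every <TT>{name}</TT> on a right hand side is an edge. The sketch
+below is not JFlex's implementation. The class <TT>MacroCycles</TT> is hypothetical,
+macros are given as plain strings, and references are found with a simple pattern
+that ignores the special meaning of braces inside strings and character classes.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MacroCycles {
    // A macro usage: '{' Identifier '}' (repetition counts like {3} do not match).
    private static final Pattern USE = Pattern.compile("\\{([a-zA-Z][a-zA-Z0-9_]*)\\}");

    // Returns true if some macro refers to itself, directly or indirectly.
    static boolean hasCycle(Map<String, String> macros) {
        Set<String> done = new HashSet<>();
        for (String name : macros.keySet()) {
            if (visit(name, macros, new HashSet<>(), done)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String name, Map<String, String> macros,
                                 Set<String> onPath, Set<String> done) {
        if (onPath.contains(name)) return true; // back edge: cycle found
        if (done.contains(name) || !macros.containsKey(name)) return false;
        onPath.add(name);
        Matcher m = USE.matcher(macros.get(name));
        while (m.find()) {
            if (visit(m.group(1), macros, onPath, done)) return true;
        }
        onPath.remove(name);
        done.add(name); // fully explored, no cycle through this macro
        return false;
    }
}
```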
+
+<P>
+
+<H2><A NAME="SECTION00053000000000000000"></A><A NAME="LexRules"></A><BR>
+Lexical rules
+</H2>
+The ``lexical rules'' section of a JFlex specification contains a set of
+regular expressions and actions (Java code) that are executed when the
+scanner matches the associated regular expression.
+
+<P>
+
+<H3><A NAME="SECTION00053100000000000000"></A><A NAME="Grammar"></A><BR>
+Syntax
+</H3>
+The syntax of the "lexical rules" section is described by the following
+BNF grammar (terminal symbols are enclosed in 'quotes'):
+
+<P>
+<PRE>
+LexicalRules ::= Rule+
+Rule ::= [StateList] ['^'] RegExp [LookAhead] Action
+ | [StateList] '<<EOF>>' Action
+ | StateGroup
+StateGroup ::= StateList '{' Rule+ '}'
+StateList ::= '<' Identifier (',' Identifier)* '>'
+LookAhead ::= '$' | '/' RegExp
+Action ::= '{' JavaCode '}' | '|'
+
+RegExp ::= RegExp '|' RegExp
+ | RegExp RegExp
+ | '(' RegExp ')'
+ | ('!'|'~') RegExp
+ | RegExp ('*'|'+'|'?')
+ | RegExp '{' Number [',' Number] '}'
+ | '[' ['^'] (Character|Character'-'Character)* ']'
+ | PredefinedClass
+ | '{' Identifier '}'
+ | '"' StringCharacter+ '"'
+ | Character
+
+PredefinedClass ::= '[:jletter:]'
+ | '[:jletterdigit:]'
+ | '[:letter:]'
+ | '[:digit:]'
+ | '[:uppercase:]'
+ | '[:lowercase:]'
+ | '.'
+</PRE>
+
+<P>
+<A NAME="Terminals"></A>The grammar uses the following terminal symbols:
+
+<UL>
+<LI><TT>JavaCode</TT>
+<BR> a sequence of <EM><TT>BlockStatements</TT></EM> as described in the Java
+ Language Specification [<A
+ HREF="manual.html#LangSpec">7</A>], section 14.2.
+
+<P>
+</LI>
+<LI><TT>Number</TT>
+<BR> a non negative decimal integer.
+
+<P>
+</LI>
+<LI><TT>Identifier</TT>
+<BR> a letter <code>[a-zA-Z]</code> followed by a sequence of zero or more
+ letters, digits or underscores <code>[a-zA-Z0-9_]</code>
+
+<P>
+</LI>
+<LI><TT>Character</TT>
+<BR> an escape sequence or any unicode character that is not one of these
+ meta characters:
+ <code> | ( ) { } [ ] < > \ . * + ? ^ $ / " ~ !</code>
+
+<P>
+</LI>
+<LI><TT>StringCharacter</TT>
+<BR> an escape sequence or any unicode character that is not one of these
+ meta characters:
+ <code> \ "</code>
+
+<P>
+</LI>
+<LI>An escape sequence
+
+<P>
+
+<UL>
+<LI><code>\n</code> <code>\r</code> <code>\t</code> <code>\f</code> <code>\b</code>
+</LI>
+<LI>a <code>\x</code> followed by two hexadecimal digits <TT>[a-fA-F0-9]</TT> (denoting
+ a standard ASCII escape sequence),
+
+<P>
+</LI>
+<LI>a <code>\u</code> followed by four hexadecimal digits <TT>[a-fA-F0-9]</TT>
+ (denoting a Unicode escape sequence),
+
+<P>
+</LI>
+<LI>a backslash followed by a three digit octal number from 000 to 377 (denoting
+ a standard ASCII escape sequence), or
+
+<P>
+</LI>
+<LI>a backslash followed by any other unicode character that stands for this
+ character.
+
+<P>
+</LI>
+</UL>
+
+<P>
+</LI>
+</UL>
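+<P>
+The escape sequences listed above can be decoded with a few lines of Java. The
+helper below (hypothetical class <TT>EscapeDemo</TT>) is an illustration, not JFlex's
+own parser. It assumes its argument starts with a backslash and is a well-formed
+escape sequence, and it returns the decoded character code together with the number
+of characters consumed.

```java
class EscapeDemo {
    // Decodes one escape sequence; s.charAt(0) must be a backslash.
    // Returns {decodedCharCode, charactersConsumed}.
    static int[] decode(String s) {
        char c = s.charAt(1);
        switch (c) {
            case 'n': return new int[] {'\n', 2};
            case 'r': return new int[] {'\r', 2};
            case 't': return new int[] {'\t', 2};
            case 'f': return new int[] {'\f', 2};
            case 'b': return new int[] {'\b', 2};
            case 'x': // two hexadecimal digits
                return new int[] {Integer.parseInt(s.substring(2, 4), 16), 4};
            case 'u': // four hexadecimal digits
                return new int[] {Integer.parseInt(s.substring(2, 6), 16), 6};
            default:
                // three-digit octal number from 000 to 377
                if (c >= '0' && c <= '3' && s.length() >= 4
                        && isOctal(s.charAt(2)) && isOctal(s.charAt(3))) {
                    return new int[] {Integer.parseInt(s.substring(1, 4), 8), 4};
                }
                // any other character stands for itself
                return new int[] {c, 2};
        }
    }

    private static boolean isOctal(char c) {
        return c >= '0' && c <= '7';
    }
}
```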
+
+<P>
+Please note that the <code>\n</code> escape sequence stands for the ASCII
+LF character, not for the end of line. If you would like to match the
+line terminator, use the expression <code>\r|\n|\r\n</code> if you want
+the Java conventions, or <code>\r|\n|\r\n|\u2028|\u2029|\u000B|\u000C|\u0085</code>
+if you want to be fully Unicode compliant (see also [<A
+ HREF="manual.html#unicode_rep">5</A>]).
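+<P>
+The following Java class is a small sketch (a hypothetical helper, not part of
+JFlex). It writes both line terminator expressions as
+<TT>java.util.regex</TT> patterns and counts the terminators in a string.
+One adjustment is needed. Java's regex alternation takes the first
+alternative that succeeds, not the longest one, so <code>\r\n</code> is
+listed first. That makes CR LF count as a single terminator, as it would
+under JFlex's longest match rule.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the two line-terminator conventions from the text as
// java.util.regex patterns. Java alternation is ordered (the first alternative
// that succeeds wins), unlike JFlex's longest match, so "\r\n" comes first.
public class LineTerminators {
    // Java conventions: CR LF, CR, or LF
    public static final Pattern JAVA_EOL = Pattern.compile("\r\n|\r|\n");

    // Unicode conventions: additionally U+2028, U+2029, VT, FF and NEL.
    // The regex escapes are doubled so that javac leaves them to the regex engine.
    public static final Pattern UNICODE_EOL =
        Pattern.compile("\r\n|[\r\n\\u2028\\u2029\\u000B\\u000C\\u0085]");

    /** Counts how many line terminators the pattern finds in the text. */
    public static int countTerminators(Pattern eol, CharSequence text) {
        Matcher m = eol.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "a\r\nb\u2028c\nd";
        System.out.println(countTerminators(JAVA_EOL, text));    // prints 2
        System.out.println(countTerminators(UNICODE_EOL, text)); // prints 3
    }
}
```

+<P>
+The Java convention does not treat the line separator U+2028 as a line
+terminator, so the two patterns disagree on the sample text.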
+
+<P>
+As of version 1.1, JFlex allows the whitespace characters <TT>" "</TT>
+(space) and <code>"\t"</code> (tab) to be used to improve the readability of
+regular expressions; JFlex ignores them. In character
+classes and strings, however, whitespace characters still stand for
+themselves (so the string <TT>" "</TT> still matches exactly one space
+character and <code>[ \n]</code> still matches an ASCII LF or a space
+character).
+
+<P>
+JFlex applies the following standard operator precedences in regular
+expressions (from highest to lowest):
+
+<P>
+
+<UL>
+<LI>unary postfix operators (<code>'*', '+', '?', {n}, {n,m}</code>)
+
+<P>
+</LI>
+<LI>unary prefix operators (<code>'!', '~'</code>)
+
+<P>
+</LI>
+<LI>concatenation (<TT>RegExp::= RegExp RegExp</TT>)
+
+<P>
+</LI>
+<LI>union (<code>RegExp::= RegExp '|' RegExp</code>)
+</LI>
+</UL>
+
+<P>
+So the expression <code>a | abc | !cd*</code> for instance is parsed as
+<code>(a|(abc)) | ((!c)(d*))</code>.
+
+<P>
+
+<H3><A NAME="SECTION00053200000000000000"></A><A NAME="Semantics"></A><BR>
+Semantics
+</H3>
+This section gives an informal description of which text is matched by
+a regular expression (i.e. an expression described by the <TT>RegExp</TT>
+production of the grammar presented <A HREF="manual.html#Grammar">above</A>).
+
+<P>
+A regular expression that consists solely of
+
+<UL>
+<LI>a <TT>Character</TT> matches this character.
+
+<P>
+</LI>
+<LI>a character class <code>'[' (Character|Character'-'Character)* ']'</code> matches
+ any character in that class. A <TT>Character</TT> is considered an
+ element of a class if it is listed in the class or if its code lies within
+ a listed character range <TT>Character'-'Character</TT>. So <code>[a0-3\n]</code>
+ for instance matches the characters
+
+<P>
+<code>a 0 1 2 3 \n</code>
+
+<P>
+If the list of characters is empty (i.e. just <code>[]</code>), the expression
+ matches nothing at all (the empty set), not even the empty string. This
+ may be useful in combination with the negation operator <code>'!'</code>.
+
+<P>
+</LI>
+<LI>a negated character class <code>'[^' (Character|Character'-'Character)* ']'</code>
+ matches all characters not listed in the class. If the list of characters
+ is empty (i.e. <code>[^]</code>), the expression matches any character of the
+ input character set.
+
+<P>
+</LI>
+<LI>a string <TT>'"' StringCharacter+ '"'</TT> matches the exact
+ text enclosed in double quotes. All meta characters except <code>\</code> and
+ <TT>"</TT> lose their special meaning inside a string. See also the
+ <A HREF="manual.html#caseless"><TT>%ignorecase</TT></A> switch.
+
+<P>
+</LI>
+<LI>a macro usage <code>'{' Identifier '}'</code> matches the input that is matched
+ by the right-hand side of the macro with name "<TT>Identifier</TT>".
+
+<P>
+<A NAME="predefCharCl"></A></LI>
+<LI>a predefined character class matches any of
+ the characters in that class. The following predefined character
+ classes are available:
+
+<P>
+<TT>.</TT> contains all characters but <code>\n</code>.
+
+<P>
+All other predefined character classes are defined in the Unicode
+ specification or the Java Language Specification and are determined by
+ the following methods of class
+ <TT>java.lang.Character</TT>.
+
+<P>
+<PRE>
+[:jletter:] isJavaIdentifierStart()
+[:jletterdigit:] isJavaIdentifierPart()
+[:letter:] isLetter()
+[:digit:] isDigit()
+[:uppercase:] isUpperCase()
+[:lowercase:] isLowerCase()
+</PRE>
+
+<P>
+They are especially useful when working with the Unicode character set.
+
+<P>
+</LI>
+</UL>
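+<P>
+As an illustration only, the Java sketch below models the character classes
+described above as <TT>IntPredicate</TT>s over character codes. It covers explicit
+lists and ranges, the empty class <code>[]</code>, its negation <code>[^]</code>,
+<TT>.</TT>, and the predefined classes. The predefined classes delegate to the
+<TT>java.lang.Character</TT> methods listed in the table. The class and field
+names are hypothetical; JFlex itself compiles classes into its DFA tables.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch: JFlex character classes modelled as predicates on a
// character code. This only illustrates class membership; it is not how
// JFlex represents classes internally.
public class CharClasses {
    /** A class given as inclusive ranges lo1, hi1, lo2, hi2, ... (a single char c is c, c). */
    public static IntPredicate ranges(char... bounds) {
        return c -> {
            for (int i = 0; i + 1 < bounds.length; i += 2) {
                if (c >= bounds[i] && c <= bounds[i + 1]) {
                    return true;
                }
            }
            return false;
        };
    }

    // [a0-3\n]: the single characters 'a' and LF plus the range 0-3
    public static final IntPredicate EXAMPLE = ranges('a', 'a', '0', '3', '\n', '\n');
    // []: matches nothing at all
    public static final IntPredicate EMPTY = ranges();
    // [^]: the negation of the empty class, i.e. any character
    public static final IntPredicate ANY = EMPTY.negate();
    // . : every character except LF
    public static final IntPredicate DOT = c -> c != '\n';

    // Predefined classes delegate to java.lang.Character
    public static final IntPredicate JLETTER = Character::isJavaIdentifierStart;
    public static final IntPredicate JLETTERDIGIT = Character::isJavaIdentifierPart;
    public static final IntPredicate LETTER = Character::isLetter;
    public static final IntPredicate DIGIT = Character::isDigit;
    public static final IntPredicate UPPERCASE = Character::isUpperCase;
    public static final IntPredicate LOWERCASE = Character::isLowerCase;
}
```

+<P>
+A negated class such as <code>[^a-z]</code> corresponds to
+<TT>ranges('a', 'z').negate()</TT> in this model.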
+
+<P>
+If <TT>a</TT> and <TT>b</TT> are regular expressions, then
+
+<P>
+<DL COMPACT>
+<DT><TT>a | b</TT></DT>
+<DD>(union)
+
+<P>
+is the regular expression that matches
+ all input that is matched by <TT>a</TT> or by <TT>b</TT>.
+
+<P>
+</DD>
+<DT><TT>a b</TT></DT>
+<DD>(concatenation)
+
+<P>
+is the regular expression
+ that matches the input matched by <TT>a</TT> followed by the
+ input matched by <TT>b</TT>.
+
+<P>
+</DD>
+<DT><TT>a*</TT></DT>
+<DD>(Kleene closure)
+
+<P>
+matches zero or more repetitions
+ of the input matched by <TT>a</TT>.
+
+<P>
+</DD>
+<DT><TT>a+</TT></DT>
+<DD>(iteration)
+
+<P>
+is equivalent to <TT>aa*</TT>
+
+<P>
+</DD>
+<DT><TT>a?</TT></DT>
+<DD>(option)
+
+<P>
+matches the empty input or the input matched
+ by <TT>a</TT>.
+
+<P>
+</DD>
+<DT><TT>!a</TT></DT>
+<DD>(negation)
+
+<P>
+matches everything but the strings matched by <TT>a</TT>.
+ Use with care: the construction of <code>!a</code> involves
+ an additional, possibly exponential NFA-to-DFA transformation
+ on the NFA for <TT>a</TT>. Note that
+ with negation and union you also have (by De Morgan's laws)
+ intersection and set difference: the intersection of
+ <TT>a</TT> and <TT>b</TT> is <code>!(!a|!b)</code>, and the expression
+ that matches everything of <TT>a</TT> not matched by <TT>b</TT> is
+ <code>!(!a|b)</code>.
+
+<P>
+</DD>
+<DT><TT>~a</TT></DT>
+<DD>(upto)
+
+<P>
+matches everything up to (and including) the first occurrence of a text
+ matched by <TT>a</TT>. The expression <code>~a</code> is equivalent
+ to <code>!([^]* a [^]* | "") a</code>. A traditional C-style comment
+ is matched by <code>"/*" ~"*/"</code>.
+
+<P>
+</DD>
+<DT><TT>a{n}</TT></DT>
+<DD>(repeat)
+
+<P>
+is equivalent to the concatenation of <TT>n</TT> copies of <TT>a</TT>.
+ So <code>a{4}</code>, for instance, is equivalent to the expression <TT>a a a a</TT>.
+ The decimal integer <TT>n</TT> must be positive.
+
+<P>
+</DD>
+<DT><TT>a{n,m}</TT></DT>
+<DD>is equivalent to at least <TT>n</TT> and at most <TT>m</TT> concatenated
+ copies of <TT>a</TT>. So <code>a{2,4}</code>, for instance, is equivalent
+ to the expression <code>a a a? a?</code>. Both <TT>n</TT> and <TT>m</TT> are
+ non-negative decimal integers, and <TT>m</TT> must not be smaller than <TT>n</TT>.
+
+<P>
+</DD>
+<DT><TT>( a )</TT></DT>
+<DD>matches the same input as <TT>a</TT>.
+
+<P>
+</DD>
+</DL>
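+<P>
+The operators above can be made concrete with a small Java sketch. It models
+a regular expression as the set of complete strings it matches, represented
+by a <TT>java.util.function.Predicate</TT>. All names are hypothetical and the
+approach is deliberately naive (JFlex compiles expressions into a DFA). It
+only checks three of the equivalences given above: <code>a{2,4}</code> built
+as <code>a a a? a?</code>, intersection as <code>!(!a|!b)</code>, and
+difference as <code>!(!a|b)</code>.

```java
import java.util.function.Predicate;

// Hypothetical sketch: a regular expression is modelled as the set of whole
// strings it matches. This only illustrates the semantics of the operators;
// JFlex compiles expressions into a DFA instead.
public class RegExpSemantics {
    /** A string literal matches exactly that text. */
    public static Predicate<String> str(String s) {
        return s::equals;
    }

    /** a | b */
    public static Predicate<String> union(Predicate<String> a, Predicate<String> b) {
        return a.or(b);
    }

    /** !a: everything a does not match */
    public static Predicate<String> not(Predicate<String> a) {
        return a.negate();
    }

    /** a b: some split point divides the input into an a-part and a b-part */
    public static Predicate<String> concat(Predicate<String> a, Predicate<String> b) {
        return s -> {
            for (int i = 0; i <= s.length(); i++) {
                if (a.test(s.substring(0, i)) && b.test(s.substring(i))) {
                    return true;
                }
            }
            return false;
        };
    }

    /** a?: the empty input or whatever a matches */
    public static Predicate<String> opt(Predicate<String> a) {
        return union(str(""), a);
    }

    /** a{n,m}: n copies of a followed by (m - n) copies of a? */
    public static Predicate<String> repeat(Predicate<String> a, int n, int m) {
        Predicate<String> r = str("");
        for (int i = 0; i < n; i++) {
            r = concat(r, a);
        }
        for (int i = n; i < m; i++) {
            r = concat(r, opt(a));
        }
        return r;
    }

    /** Intersection by De Morgan: !(!a|!b) */
    public static Predicate<String> and(Predicate<String> a, Predicate<String> b) {
        return not(union(not(a), not(b)));
    }

    /** Everything of a not matched by b: !(!a|b) */
    public static Predicate<String> minus(Predicate<String> a, Predicate<String> b) {
        return not(union(not(a), b));
    }
}
```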
+
+<P>
+In a lexical rule, a regular expression <TT>r</TT> may be preceded by a
+'<code>^</code>' (the beginning-of-line operator). <TT>r</TT> is then
+only matched at the beginning of a line in the input. A line begins
+after each occurrence of <code>\r|\n|\r\n|\u2028|\u2029|\u000B|\u000C|\u0085</code>
+(see also [<A
+ HREF="manual.html#unicode_rep">5</A>]) and at the beginning of input.
+The preceding line terminator in the input is not consumed and can
+be matched by another rule.
+
+<P>
+In a lexical rule, a regular expression <TT>r</TT> may be followed by a
+lookahead expression. A lookahead expression is either a '<TT>$</TT>'
+(the end-of-line operator) or a <code>'/'</code> followed by an arbitrary
+regular expression. In both cases the lookahead is not consumed and
+not included in the matched text region, but it <EM>is</EM> considered
+while determining which rule has the longest match (see also
+<A HREF="manual.html#HowMatched"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A> <A HREF="manual.html#HowMatched"><I>How the input is matched</I></A>).
+
+<P>
+In the '<TT>$</TT>' case <TT>r</TT> is only matched at the end of a line in
+the input. The end of a line is denoted by the regular expression
+<code>\r|\n|\r\n|\u2028|\u2029|\u000B|\u000C|\u0085</code>.
+So <code>a$</code> is equivalent to <code>a / \r|\n|\r\n|\u2028|\u2029|\u000B|\u000C|\u0085</code>. This is a bit different from the situation described in [<A
+ HREF="manual.html#unicode_rep">5</A>]:
+since in JFlex <code>$</code> is a true trailing context, the end of file
+does <B>not</B> count as end of line.
+
+<P>
+<A NAME="trailingContext"></A>
+<P>
+For arbitrary lookahead (also called <EM>trailing context</EM>) the
+expression is matched only when followed by input that matches the
+trailing context. Unfortunately, the lookahead expression is not
+really arbitrary: in a rule <TT>r1 / r2</TT>, either the text matched
+by <TT>r1</TT> must have a fixed length (e.g. if <TT>r1</TT> is a string)
+or the beginning of the trailing context <TT>r2</TT> must not match the
+end of <TT>r1</TT>. So, for example, <code>"abc" / "a"|"b"</code> is fine because
+<TT>"abc"</TT> has a fixed length, and <code>"a"|"ab" / "x"*</code> is fine because
+no prefix of <TT>"x"*</TT> matches a suffix of <code>"a"|"ab"</code>, but
+<code>"x"|"xy" / "yx"</code> is <EM>not</EM> possible, because the suffix <TT>"y"</TT>
+of <TT>"x"|"xy"</TT> is also a prefix of <TT>"yx"</TT>. JFlex will report
+such cases at generation time. The algorithm JFlex currently uses for matching
+trailing context expressions is the one described in [<A
+ HREF="manual.html#Aho">1</A>], which is the source of
+the deficiencies mentioned above.
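+<P>
+The restriction on <TT>r1 / r2</TT> can be sketched in Java for the special
+case where both sides are finite lists of strings. This is a hypothetical
+simplification for illustration, not the check JFlex performs on its
+automata. An infinite expression such as <TT>"x"*</TT> has to be
+approximated by a few of its strings.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the documented restriction on r1 / r2, for the
// special case where both sides are finite lists of strings. JFlex performs
// its own check on automata at generation time; this is not that code.
public class TrailingContextCheck {
    /** True if every string r1 can match has the same length. */
    public static boolean fixedLength(List<String> r1) {
        return r1.stream().mapToInt(String::length).distinct().count() <= 1;
    }

    /** True if a non-empty prefix of some r2 string is also a suffix of some r1 string. */
    public static boolean overlaps(List<String> r1, List<String> r2) {
        for (String x : r1) {
            for (String y : r2) {
                for (int k = 1; k <= Math.min(x.length(), y.length()); k++) {
                    if (x.endsWith(y.substring(0, k))) {
                        return true;
                    }
                }
            }
        }
        return false;
    }

    /** r1 / r2 is accepted if r1 has a fixed length or the two sides do not overlap. */
    public static boolean supported(List<String> r1, List<String> r2) {
        return fixedLength(r1) || !overlaps(r1, r2);
    }

    public static void main(String[] args) {
        // "abc" / "a"|"b"
        System.out.println(supported(Arrays.asList("abc"), Arrays.asList("a", "b")));         // true
        // "a"|"ab" / "x"*, with "x"* approximated by "", "x", "xx"
        System.out.println(supported(Arrays.asList("a", "ab"), Arrays.asList("", "x", "xx"))); // true
        // "x"|"xy" / "yx"
        System.out.println(supported(Arrays.asList("x", "xy"), Arrays.asList("yx")));         // false
    }
}
```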
+
+<P>
+<A NAME="EOFRule"></A>As of version 1.2, JFlex allows lex/flex style <TT><<EOF>></TT> rules in
+lexical specifications. A rule
+<PRE>
+[StateList] <<EOF>> { some action code }
+</PRE>
+is very similar to the <A HREF="manual.html#eofval"><TT>%eofval</TT> directive</A> (section <A HREF="manual.html#eofval"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A>).
+The difference lies in the optional <TT>StateList</TT> that may precede the <TT><<EOF>></TT> rule. The
+action code will only be executed when the end of file is read and the
+scanner is currently in one of the lexical states listed in <TT>StateList</TT>.
+The same <TT>StateGroup</TT> (see section <A HREF="manual.html#HowMatched"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A>
+<A HREF="manual.html#HowMatched"><I>How the input is matched</I></A>) and precedence
+rules as in the "normal" rule case apply
+(i.e. if there is more than one <TT><<EOF>></TT>
+rule for a certain lexical state, the action of the one appearing
+earlier in the specification will be executed). <TT><<EOF>></TT> rules
+override settings of the <TT>%cup</TT> and <TT>%byaccj</TT> options and
+should not be mixed with the <TT>%eofval</TT> directive.
+
+<P>
+An <TT>Action</TT> is either a piece of Java code enclosed in
+curly braces or the special <code>|</code> action. The <code>|</code> action is
+an abbreviation for the action of the following expression.
+
+<P>
+Example:
+<PRE>
+expression1 |
+expression2 |
+expression3 { some action }
+</PRE>
+is equivalent to the expanded form
+<PRE>
+expression1 { some action }
+expression2 { some action }
+expression3 { some action }
+</PRE>
+
+<P>
+The <code>|</code> action is useful when working with trailing context expressions. The
+expression <TT>a | (c / d) | b</TT> is not syntactically legal, but can
+easily be expressed using the <code>|</code> action:
+<PRE>
+a |
+c / d |
+b { some action }
+</PRE>
+
+<P>
+
+<H3><A NAME="SECTION00053300000000000000"></A><A NAME="HowMatched"></A><BR>
+How the input is matched
+</H3>
+When consuming its input, the scanner determines the regular expression
+that matches the longest portion of the input (longest match rule). If
+there is more than one regular expression that matches the longest portion
+of input (i.e. they all match the same input), the generated scanner chooses
+the expression that appears first in the specification. After determining
+the active regular expression, the associated action is executed. If there
+is no matching regular expression, the scanner terminates the program with
+an error message (if the <TT>%standalone</TT> directive has been used, the
+scanner prints the unmatched input to <TT>java.lang.System.out</TT> instead
+and resumes scanning).
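+<P>
+The following is a sketch of the longest match rule, with ties going to the
+rule that appears first. It assumes the rules are given as
+<TT>java.util.regex.Pattern</TT> objects in specification order. A real JFlex
+scanner runs all rules in one DFA instead of trying them one after another.
+This hypothetical class only reproduces which rule is selected.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the longest-match rule with first-rule-wins tie
// breaking. JFlex compiles all rules into one DFA; this class tries them one by one.
public class LongestMatch {
    /** Length of the longest prefix of input[pos..] that p matches completely, or -1. */
    public static int longestPrefix(Pattern p, CharSequence input, int pos) {
        Matcher m = p.matcher(input);
        for (int end = input.length(); end >= pos; end--) {
            m.region(pos, end);           // also resets the matcher
            if (m.matches()) {
                return end - pos;
            }
        }
        return -1;
    }

    /** Returns {ruleIndex, length} of the winning rule at pos, or null if no rule matches. */
    public static int[] select(List<Pattern> rules, CharSequence input, int pos) {
        int[] best = null;
        for (int i = 0; i < rules.size(); i++) {
            int len = longestPrefix(rules.get(i), input, pos);
            // only a strictly longer match replaces the current winner,
            // so on equal length the rule listed earlier is kept
            if (len >= 0 && (best == null || len > best[1])) {
                best = new int[] {i, len};
            }
        }
        return best;
    }
}
```

+<P>
+With the rules <TT>"if"</TT> and <code>[a-z]+</code> in that order, the input
+<TT>if</TT> selects the first rule, because both match two characters. The input
+<TT>iffy</TT> selects the second rule, because it matches four characters.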
+
+<P>
+Lexical states can be used to further restrict the set of regular expressions
+that match the current input.
+
+<P>
+
+<UL>
+<LI>A regular expression can only be matched when its associated set of lexical
+states includes the currently active lexical state of the scanner or if
+the set of associated lexical states is empty and the currently active lexical
+state is inclusive. Exclusive and inclusive states differ only in this respect:
+how they treat rules with an empty set of associated states.
+
+<P>
+</LI>
+<LI>The currently active lexical state of the scanner can be changed from within
+an action of a regular expression using the method <TT>yybegin()</TT>.
+
+<P>
+</LI>
+<LI>The scanner starts in the inclusive lexical state
+<TT>YYINITIAL</TT>, which is always declared by default.
+
+<P>
+</LI>
+<LI>The set of lexical states associated with a regular expression is
+the <TT>StateList</TT> that precedes the expression. If a rule is
+contained in one or more <TT>StateGroups</TT>, then the states of
+these are also associated with the rule, i.e. they accumulate over
+<TT>StateGroups</TT>.
+
+<P>
+Example:
+<PRE>
+%states A, B
+%xstates C
+%%
+expr1 { yybegin(A); action }
+<YYINITIAL, A> expr2 { action }
+<A> {
+ expr3 { action }
+ <B,C> expr4 { action }
+}
+</PRE>
+The first line declares two (inclusive) lexical states <TT>A</TT> and <TT>B</TT>,
+the second line an exclusive lexical state <TT>C</TT>.
+The default (inclusive) state <TT>YYINITIAL</TT> is always implicitly there and
+doesn't need to be declared. The rule with <TT>expr1</TT> has no
+states listed, and is thus matched in all states but the exclusive
+ones, i.e. <TT>A</TT>, <TT>B</TT>, and <TT>YYINITIAL</TT>. In its
+action, the scanner is switched to state <TT>A</TT>. The second rule
+<TT>expr2</TT> can only match when the scanner is in state
+<TT>YYINITIAL</TT> or <TT>A</TT>. The rule <TT>expr3</TT> can only be
+matched in state <TT>A</TT> and <TT>expr4</TT> in states <TT>A</TT>, <TT>B</TT>,
+and <TT>C</TT>.
+
+<P>
+</LI>
+<LI>Lexical states are declared and used as Java <TT>int</TT> constants in
+the generated class under the same name as they are used in the specification.
+There is no guarantee that the values of these integer constants are
+distinct. They are pointers into the generated DFA table, and if JFlex
+recognizes two states as lexically equivalent (if they are used with the
+exact same set of regular expressions), then the two constants will get
+the same value.
+
+<P>
+</LI>
+</UL>
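+<P>
+The activation test for lexical states can be sketched in Java as follows.
+It uses the states of the example above: <TT>A</TT>, <TT>B</TT>, and
+<TT>YYINITIAL</TT> are inclusive, and <TT>C</TT> is exclusive. The class is
+hypothetical and represents states by name rather than by the generated
+<TT>int</TT> constants.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the rule-activation test for lexical states, using the
// states of the example specification: YYINITIAL, A and B inclusive, C exclusive.
// It is not the code JFlex generates.
public class LexicalStates {
    /** States declared with %xstates in the example. */
    public static final Set<String> EXCLUSIVE = new HashSet<>(Arrays.asList("C"));

    /** The accumulated state set of a rule (its own list plus those of enclosing StateGroups). */
    public static Set<String> states(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    /** True if a rule with the given state set can match in the current state. */
    public static boolean active(Set<String> ruleStates, String current) {
        if (ruleStates.isEmpty()) {
            // no states listed: the rule is active in every inclusive state
            return !EXCLUSIVE.contains(current);
        }
        return ruleStates.contains(current);
    }

    public static void main(String[] args) {
        Set<String> expr1 = states();              // no state list
        Set<String> expr4 = states("A", "B", "C"); // group A plus its own list B, C
        System.out.println(active(expr1, "B"));    // true
        System.out.println(active(expr1, "C"));    // false
        System.out.println(active(expr4, "C"));    // true
    }
}
```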
+
+<P>
+
+<H3><A NAME="SECTION00053400000000000000">
+The generated class</A>
+</H3>
+JFlex generates exactly one file containing one class from the specification
+(unless you have declared another class in the first specification section).
+
+<P>
+The generated class contains (among other things) the DFA tables, an input buffer,
+the lexical states of the specification, a constructor, and the scanning method
+with the user supplied actions.
+
+<P>
+By default the class is named <TT>Yylex</TT>; the name can be customized
+with the <TT>%class</TT> directive (see also section
+<A HREF="manual.html#ClassOptions"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A>). The input buffer of the lexer is connected to an
+input stream via the <TT>java.io.Reader</TT> object that is passed
+to the lexer in the generated constructor. If you want to provide your
+own constructor for the lexer, you should always call the generated
+one in it to initialize the input buffer. The input buffer should not
+be accessed directly, but only through the advertised API (see also
+section <A HREF="manual.html#ScannerMethods"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A>). Its internal implementation may change
+between releases or skeleton files without notice.
+
+<P>
+The main interface to the outside world is the generated scanning
+method (default name <TT>yylex</TT>, default return type
+<TT>Yytoken</TT>). Most of its aspects are customizable (name, return
+type, declared exceptions etc., see also section
+<A HREF="manual.html#ScanningMethod"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A>). If it is called, it will consume input until
+one of the expressions in the specification is matched or an error
+occurs. If an expression is matched, the corresponding action is
+executed. It may return a value of the specified return type (in which
+case the scanning method returns this value), or if it doesn't
+return a value, the scanner resumes consuming input until the next
+expression is matched. If the end of file is reached, the scanner
+executes the EOF action, and (also upon each further call to the scanning
+method) returns the specified EOF value (see also section <A HREF="manual.html#EOF"><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="crossref.png"></A>).
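+<P>
+Code that uses the scanner typically calls the scanning method in a loop
+until the EOF value comes back. The sketch below assumes a hypothetical
+interface <TT>TokenScanner</TT> in place of the generated class, so that the
+loop can be shown without running JFlex. With a real scanner you would call
+<TT>yylex()</TT> on an instance of <TT>Yylex</TT>, or of whatever names the
+specification configures.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Sketch of a driver loop around a scanning method. TokenScanner is a
// hypothetical stand-in for the generated class (default name Yylex).
public class ScanDriver {
    /** Hypothetical stand-in for the generated scanner's scanning method. */
    public interface TokenScanner<T> {
        /** Consumes input until an action returns a value; returns the EOF value at end of input. */
        T yylex() throws IOException;
    }

    /** Collects tokens until the scanning method returns the EOF value (which may be null). */
    public static <T> List<T> tokenize(TokenScanner<T> scanner, T eofValue) {
        List<T> tokens = new ArrayList<>();
        try {
            for (T t = scanner.yylex(); !Objects.equals(t, eofValue); t = scanner.yylex()) {
                tokens.add(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }
}
```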
+
+<P>
+
+<H3><A NAME="SECTION00053500000000000000"></A><A NAME="ScannerMethods"></A><BR>
+Scanner methods and fields accessible in actions (API)
+</H3>
+Generated methods and member fields in JFlex scanners are prefixed
+with <TT>yy</TT> to indicate that they are generated and to avoid name
+conflicts with user code copied into the class. Since user code is
+part of the same class, JFlex has no language means like the
+<TT>private</TT> modifier to indicate which members and methods are
+internal and which ones belong to the API. Instead, JFlex follows a
+naming convention: everything starting with a <TT>zz</TT> prefix like
+<TT>zzStartRead</TT> is to be considered internal and subject to
+change without notice between JFlex releases. Methods and members of
+the generated class that do not have a <TT>zz</TT> prefix like
+<TT>yycharat</TT> belong to the API that the scanner class provides to
+users in action code of the specification. They will remain stable
+and supported between JFlex releases as long as possible.
+
+<P>
+Currently, the API consists of the following methods and member fields:
+
+<UL>
+<LI><TT>String yytext()</TT>
+<BR> returns the matched input text region
+
+<P>
+</LI>
+<LI><TT>int yylength()</TT>
+<BR> returns the length of the matched input text region (does not require
+ a <TT>String</TT> object to be created)
+
+<P>
+</LI>
+<LI><TT>char yycharat(int pos)</TT>
+<BR> returns the character at position <TT>pos</TT> from the matched text.
+ It is equivalent to <TT>yytext().charAt(pos)</TT>, but faster. <TT> pos</TT> must be a value from <TT>0</TT> to <TT>yylength()-1</TT>.
+
+<P>
+</LI>
+<LI><TT>void yyclose()</TT>
+<BR> closes the input stream. All subsequent calls to the scanning method will
+ return the end of file value.
+
+<P>
+</LI>
+<LI><TT>void yyreset(java.io.Reader reader)</TT>
+<BR> closes the current input stream, and resets the scanner to read from
+ a new input stream. All internal variables are reset, the old input
+ stream <EM>cannot</EM> be reused (content of the internal buffer is
+ discarded and lost). The lexical state is set to <TT>YYINITIAL</TT>.
+
+<P>
+</LI>
+<LI><TT>void yypushStream(java.io.Reader reader)</TT>
+<BR> Stores the current input stream on a stack, and
+ reads from a new stream. Lexical state, line,
+ char, and column counting remain untouched.
+ The current input stream can be restored with
+ <TT>yypopStream</TT> (usually in an <TT><<EOF>></TT> action).
+
+<P>
+A typical use case is include files in the
+ style of the C preprocessor. The corresponding
+ JFlex specification could look somewhat like this:
+<PRE>
+"#include" {FILE} { yypushStream(new FileReader(getFile(yytext()))); }
+..
+<<EOF>> { if (yymoreStreams()) yypopStream(); else return EOF; }
+</PRE>
+
+<P>
+This method is only available in the skeleton file
+ <TT>skeleton.nested</TT>. You can find it in the
+ <TT>src</TT> directory of the JFlex distribution.
+
+<P>
+</LI>
+<LI><TT>void yypopStream()</TT>
+<BR> Closes the current input stream and continues to
+ read from the one on top of the stream stack.
+
+<P>
+This method is only available in the skeleton file
+ <TT>skeleton.nested</TT>. You can find it in the
+ <TT>src</TT> directory of the JFlex distribution.
+
+<P>
+</LI>
+<LI><TT>boolean yymoreStreams()</TT>
+<BR> Returns true iff there are still streams for <TT>yypopStream</TT>
+ left to read from on the stream stack.
+
+<P>
+This method is only available in the skeleton file
+ <TT>skeleton.nested</TT>. You can find it in the
+ <TT>src</TT> directory of the JFlex distribution.
+
+<P>
+</LI>
+<LI><TT>int yystate()</TT>
+<BR> returns the current lexical state of the scanner.
+
+<P>
+</LI>
+<LI><TT>void yybegin(int lexicalState)</TT>
+<BR> enters the lexical state <TT>lexicalState</TT>
+
+<P>
+</LI>
+<LI><TT>void yypushback(int number)</TT>
+<BR> pushes <TT>number</TT> characters of the matched text back into the input stream.
+ They will be read again in the next call of the scanning method.
+ The number of characters to be read again must not be greater than the length
+ of the matched text. After the call to <TT>yypushback</TT>, the pushed-back
+ characters are no longer included in <TT>yylength()</TT> and <TT>yytext()</TT>.
+ Please note that Java strings are immutable, i.e. an action code like
+ <PRE>
+ String matched = yytext();
+ yypushback(1);
+ return matched;
+</PRE>
+ will return the whole matched text, while
+ <PRE>
+ yypushback(1);
+ return yytext();
+</PRE>
+ will return the matched text minus the last character.
+
+<P>
+</LI>
+<LI><TT>int yyline</TT>
+<BR> contains the current line of input (starting with 0, only active with
+ the <TT><A HREF="manual.html#Counting">%line</A></TT> directive)
+
+<P>
+</LI>
+<LI><TT>int yychar</TT>
+<BR> contains the current character count in the input (starting with 0,
+ only active with the <TT><A HREF="manual.html#Counting">%char</A></TT> directive)
+
+<P>
+</LI>
+<LI><TT>int yycolumn</TT>
+<BR> contains the current column of the current line (starting with 0, only
+ active with the <TT><A HREF="manual.html#Counting">%column</A></TT> directive)
+
+<P>
+</LI>
+</UL>
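+<P>
+The effect of <TT>yypushback</TT> on <TT>yytext()</TT> can be sketched with a
+hypothetical class that models only the window of the matched text in the
+input buffer. The real generated class keeps this state in internal
+<TT>zz</TT>-prefixed fields, which this sketch does not reproduce. Rejecting
+an oversized pushback with an exception is also an assumption of the sketch.

```java
// Hypothetical model of the matched-text window in a scanner's input buffer.
// It only illustrates the documented behaviour of yytext, yylength, yycharat
// and yypushback; the generated class works differently internally.
public class MatchWindow {
    private final char[] buffer;
    private final int start;  // index of the first matched character
    private int end;          // index one past the last matched character

    public MatchWindow(String input, int start, int end) {
        this.buffer = input.toCharArray();
        this.start = start;
        this.end = end;
    }

    public String yytext() { return new String(buffer, start, end - start); }

    public int yylength() { return end - start; }

    public char yycharat(int pos) { return buffer[start + pos]; }

    /** Shrinks the match; the next scan would resume reading at the new end. */
    public void yypushback(int number) {
        if (number < 0 || number > yylength()) {
            throw new IllegalArgumentException("cannot push back more than the matched text");
        }
        end -= number;
    }

    public static void main(String[] args) {
        MatchWindow w = new MatchWindow("while(x)", 0, 6);
        String matched = w.yytext();    // taken before the pushback
        w.yypushback(1);
        System.out.println(matched);     // while(
        System.out.println(w.yytext());  // while
    }
}
```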
+
+<P>
+
+<H1><A NAME="SECTION00060000000000000000"></A><A NAME="sec:encodings"></A><BR>
+Encodings, Platforms, and Unicode
+</H1>
+
+<P>
+This section tries to shed some light on the issues of Unicode and
+encodings, cross platform scanning, and how to deal with binary data.
+My thanks go to Stephen Ostermiller for his input on this topic.
+
+<P>
+
+<H2><A NAME="SECTION00061000000000000000"></A><A NAME="sec:howtoencoding"></A><BR>
+The Problem
+</H2>
+
+<P>
+Before we dive straight into details, let's take a look at what the
+problem is. The problem arises from Java's platform independence, once you
+actually want to make use of it. For scanners, the interesting part of platform
+independence is character encodings and how they are handled.
+
+<P>
+If a program reads a file from disk, it gets a stream of bytes. In
+earlier times, when the grass was green, and the world was much
+simpler, everybody knew that the byte value 65 is, of course, an A.
+It was no problem to see which bytes meant which characters (actually
+these times never existed, but anyway). The normal Latin alphabet
+only has 26 characters, so 7 bits or 128 distinct values should surely
+be enough to map them, even if you allow yourself the luxury of upper
+and lower case. Nowadays, things are different. The world suddenly
+grew much larger, and all kinds of people wanted all kinds of special
+characters, just because they use them in their language and writing.
+This is where the mess starts. Since the 128 distinct values were
+already filled up with other stuff, people began to use all 8 bits of
+the byte, and extended the byte/character mappings to fit their needs,
+and of course everybody did it differently. Some people, for instance,
+may have said "let's use the value 213 for the German character ä". Others
+may have found that 213 should much rather mean é, because they didn't need
+German and wrote French instead. As long as you use your program and
+data files only on one platform, this is no problem, since everyone knows what
+means what, and everything gets used consistently.
+
+<P>
+Now Java comes into play, and wants to run everywhere (once written,
+that is) and now there suddenly is a problem: how do I get the same
+program to say ä to a certain byte when it runs in Germany and maybe é
+when it runs in France? And also the other way around: when I want to
+say é on the screen, which byte value should I send to the operating
+system?
+
+<P>
+Java's solution to this is to use Unicode internally. Unicode aims to
+be a superset of all known character sets and is therefore a perfect base
+for encoding things that might get used all over the world. To make
+things work correctly, you still have to know where you are and how to
+map byte values to Unicode characters and vice versa, but the
+important thing is, that this mapping is at least possible (you can
+map Kanji characters to Unicode, but you cannot map them to ASCII or
+iso-latin-1).
+
+<P>
+
+<H2><A NAME="SECTION00062000000000000000"></A><A NAME="sec:howtotext"></A><BR>
+Scanning text files
+</H2>
+
+<P>
+Scanning text files is the standard application for scanners like
+JFlex. Therefore it should also be the most convenient one. Most times
+it is.
+
+<P>
+The following scenario works like a breeze:
+You work on a platform X, write your lexer specification there, can
+use any obscure Unicode character in it as you like, and compile the
+program. Your users work on any platform Y (possibly but not
+necessarily something different from X), they write their input files
+on Y and they run your program on Y. No problems.
+
+<P>
+Java does this as follows:
+If you want to read anything in Java that is supposed to contain text,
+you use a <TT>FileReader</TT> or some <TT>InputStream</TT> together with
+an <TT>InputStreamReader</TT>. <TT>InputStreams</TT> return the raw bytes, the
+<TT>InputStreamReader</TT> converts the bytes into Unicode characters with
+the platform's default encoding. If a text file is produced on the
+same platform, the platform's default encoding should do the mapping
+correctly. Since JFlex also uses readers and Unicode internally, this
+mechanism also works for the scanner specifications. If you write an
+<TT>A</TT> in your text editor and the editor uses the platform's encoding (say <TT>A</TT> is 65),
+then Java translates this into the logical Unicode <TT>A</TT> internally.
+If a user writes an <TT>A</TT> on a completely different platform (say <TT>A</TT> is 237 there),
+then Java also translates this into the logical Unicode <TT>A</TT> internally. Scanning
+is performed after that translation and both match.
+
+<P>
+Note that because of this mapping from bytes to characters, you should always
+use the <TT>%unicode</TT> switch in your lexer specification if you want to scan
+text files. <TT>%8bit</TT> may not be enough, even if
+you know that your platform only uses one byte per character. The encoding
+Cp1252, used on many Windows machines, for instance knows 256 characters, but
+the right single quotation mark, with Cp1252 code <code>\x92</code>, has the Unicode value <code>\u2019</code>, which
+is larger than 255 and which would make your scanner throw an
+<TT>ArrayIndexOutOfBoundsException</TT> if it is encountered.
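+<P>
+The following Java sketch decodes single bytes with the <TT>Cp1252</TT>
+charset and lists those whose character codes do not fit into 8 bits. Those
+characters are what makes <TT>%8bit</TT> insufficient. The sketch assumes the
+Java runtime provides <TT>Cp1252</TT>, which is an alias of
+<TT>windows-1252</TT>. The class name is hypothetical.

```java
import java.nio.charset.Charset;

// Sketch: decode single bytes with a given charset to see which Unicode
// code they map to. Assumes the runtime provides the "Cp1252" charset.
public class Cp1252Check {
    /** Returns the Unicode code of the char that byte value b decodes to. */
    public static int decodeByte(int b, String charsetName) {
        byte[] bytes = { (byte) b };
        return new String(bytes, Charset.forName(charsetName)).charAt(0);
    }

    public static void main(String[] args) {
        // List the Cp1252 byte values whose characters do not fit into 8 bits
        for (int b = 0; b < 256; b++) {
            int c = decodeByte(b, "Cp1252");
            if (c > 0xFF) {
                System.out.printf("byte 0x%02X -> U+%04X%n", b, c);
            }
        }
    }
}
```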
+
+<P>
+So for the usual case you don't have to do anything but use the
+<TT>%unicode</TT> switch in your lexer specification.
+
+<P>
+Things may break when you produce a text file on platform X and
+consume it on a different platform Y. Let's say you have a file
+written on a Windows PC using the encoding Cp1252. Then you move
+this file to a Linux PC with encoding ISO 8859-1 and there you want
+to run your scanner on it. Java now thinks the file is encoded
+in ISO 8859-1 (the platform's default encoding) while it really is
+encoded in Cp1252. For most characters
+Cp1252 and ISO 8859-1 are the same, but for the byte values <code>\x80</code>
+to <code>\x9f</code> they disagree: ISO 8859-1 is undefined there. You can fix
+the problem by telling Java explicitly which encoding to use. When
+constructing the <TT>InputStreamReader</TT>, you can give the encoding
+as argument. The line
+<DIV ALIGN="CENTER">
+<TT>Reader r = new InputStreamReader(input, "Cp1252"); </TT>
+
+</DIV>
+will do the trick.
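+<P>
+The explicit encoding can be sketched in Java by decoding one byte sequence
+with two different encodings. The sketch uses the <TT>InputStreamReader</TT>
+constructor that takes a <TT>java.nio.charset.Charset</TT>. That constructor
+avoids the checked <TT>UnsupportedEncodingException</TT> of the variant that
+takes a <TT>String</TT>. The helper names are hypothetical. The resulting
+<TT>Reader</TT> is what you would pass to the scanner's constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Sketch: the same bytes read with an explicitly chosen encoding.
public class ExplicitEncoding {
    /** Wraps a byte stream in a Reader that decodes with the named charset. */
    public static Reader open(InputStream input, String charsetName) {
        return new InputStreamReader(input, Charset.forName(charsetName));
    }

    /** Reads everything from the reader into a String. */
    public static String readAll(Reader r) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            for (int n = r.read(buf); n != -1; n = r.read(buf)) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "hi" between the Cp1252 curly quotes 0x93 and 0x94, as written on Windows
        byte[] data = { (byte) 0x93, 'h', 'i', (byte) 0x94 };
        String asLatin1 = readAll(open(new ByteArrayInputStream(data), "ISO-8859-1"));
        String asCp1252 = readAll(open(new ByteArrayInputStream(data), "Cp1252"));
        System.out.println((int) asLatin1.charAt(0));  // 147, a C1 control character
        System.out.println((int) asCp1252.charAt(0));  // 8220, the left double quotation mark
    }
}
```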
+
+<P>
+Of course the encoding to use can also come from the data itself:
+for instance, when you scan an HTML page, it may have embedded
+information about its character encoding in the headers.
+
+<P>
+More information about encodings, which ones are supported, how
+they are called, and how to set them may be found in the
+official Java documentation in the chapter about
+internationalization.
+The link
+<A NAME="tex2html7"
+ HREF="http://java.sun.com/j2se/1.3/docs/guide/intl/"><TT>http://java.sun.com/j2se/1.3/docs/guide/intl/</TT></A>
+leads to an online version of this for Sun's JDK 1.3.
+
+<P>
+
+<H2><A NAME="SECTION00063000000000000000"></A><A NAME="sec:howtobinary"></A><BR>
+Scanning binaries
+</H2>
+
+<P>
+Scanning binaries is both easier and more difficult
+than scanning text files. It's easier because you want
+the raw bytes and not their meaning, i.e. you don't want
+any translation.
+It's more difficult because it's not so easy to get
+"no translation" when you use Java readers.
+
+<P>
+The problem (for binaries) is that JFlex scanners are
+designed to work on text. Therefore the interface is
+the <TT>Reader</TT> class (there is a constructor
+for <TT>InputStream</TT> instances, but it's just there
+for convenience and wraps an <TT>InputStreamReader</TT>
+around it to get characters, not bytes).
+You can still get a binary scanner when you write
+your own custom <TT>InputStreamReader</TT> class that
+does explicitly no translation, but just copies
+byte values to character codes instead. It sounds
+quite easy, and actually it is no big deal, but there
+are a few little pitfalls on the way. In the scanner
+specification you can only enter non-negative character
+codes (for bytes that is <code>\x00</code>
+to <code>\xFF</code>). Java's <TT>byte</TT> type, on the other hand,
+is a signed 8-bit integer (-128 to 127), so you have to convert
+the values properly in your custom <TT>Reader</TT>. Also, you should
+take care when you write your lexer spec: if you
+use text in there, it gets interpreted by an encoding
+first, and what scanner you get as result might depend
+on which platform you run JFlex on when you generate
+the scanner (this is what you want for text, but for binaries it
+gets in the way). If you are not sure, or if the development
+platform might change, it's probably best to use character
+code escapes in all places, since they don't change their
+meaning.
+
+<P>
+To illustrate these points, the example in <TT>examples/binary</TT>
+contains a very small binary scanner that tries to
+detect whether a file is a Java <TT>class</TT> file. For that
+purpose it checks whether the file begins with the magic number <code>0xCAFEBABE</code>.
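+<P>
+A minimal sketch of such a non-translating <TT>Reader</TT> in Java, using only
+the standard library. The important detail is the <TT>& 0xFF</TT> mask. It maps
+Java's signed bytes to the character codes <code>\x00</code> to <code>\xFF</code>.
+Without it, a byte such as <code>\xCA</code> would become the char
+<TT>0xFFCA</TT>. The helper <TT>looksLikeClassFile</TT> is hypothetical. It only
+mimics the idea of the <TT>examples/binary</TT> scanner and does not use JFlex.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.UncheckedIOException;

// Sketch of a Reader that does no character decoding: every byte value
// 0x00..0xFF becomes the char with the same code. An instance could be passed
// to a generated scanner's Reader constructor instead of an InputStreamReader.
public class ByteReader extends Reader {
    private final InputStream in;

    public ByteReader(InputStream in) { this.in = in; }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) return 0;
        byte[] bytes = new byte[len];
        int n = in.read(bytes, 0, len);
        if (n == -1) return -1;                          // end of input
        for (int i = 0; i < n; i++) {
            cbuf[off + i] = (char) (bytes[i] & 0xFF);    // signed -128..127 -> 0..255
        }
        return n;
    }

    @Override
    public void close() throws IOException { in.close(); }

    /** Hypothetical helper: true if the data starts with the class-file magic number 0xCAFEBABE. */
    public static boolean looksLikeClassFile(byte[] data) {
        char[] head = new char[4];
        int got = 0;
        try (Reader r = new ByteReader(new ByteArrayInputStream(data))) {
            while (got < 4) {
                int n = r.read(head, got, 4 - got);
                if (n == -1) break;
                got += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return got == 4 && head[0] == 0xCA && head[1] == 0xFE && head[2] == 0xBA && head[3] == 0xBE;
    }
}
```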
+
+<P>
+
+<H1><A NAME="SECTION00070000000000000000"></A><A NAME="performance"></A><BR>
+A few words on performance
+</H1>
+This section gives some empirical results about the speed of JFlex generated
+scanners in comparison to those generated by JLex,
+compares a JFlex scanner with a <A HREF="manual.html#PerformanceHandwritten">handwritten</A>
+one, and presents some <A HREF="manual.html#PerformanceTips">tips</A> on how to make
+your specification produce a faster scanner.
+
+<P>
+
+<H2><A NAME="SECTION00071000000000000000"></A><A NAME="PerformanceJLex"></A><BR>
+Comparison of JLex and JFlex
+</H2>
+Scanners generated by the tool JLex are quite fast. It was however
+possible to further improve the performance of generated scanners
+using JFlex. The following table shows the results that were produced
+by the scanner specification of a small toy programming language (in
+fact the example from the JLex website). The scanner was generated
+using JLex and all three different JFlex code generation methods. Then
+it was run on a Windows 98 system using Sun's JDK 1.3 with different sample inputs
+of that toy programming language. All test runs were made under the
+same conditions on an otherwise idle machine.
+
+<P>
+The values presented in the table denote the time from the first call
+to the scanning method until it returns the EOF value, together with the speedup
+of JFlex over JLex in percent. The tests were run both in the mixed (HotSpot) JVM mode and
+in the pure interpreted mode. The mixed mode JVM brings
+about a factor of 10 performance improvement, while the difference between
+JLex and JFlex decreases only slightly.
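+
+<P>
+The speedup entries in the JLex/JFlex tables below can be reproduced as 100 * (JLex time / JFlex time - 1). In other words, they measure how much longer JLex takes, relative to the JFlex time. A speedup of 100 % would therefore mean that the JFlex scanner needs half the time of the JLex scanner. Here is a small Java sketch of that calculation; the class and method names are made up for illustration.
+
+```java
+import java.util.Locale;
+
+public class Speedup {
+
+    // Speedup in percent: how much longer the JLex scanner runs than the
+    // JFlex scanner, relative to the JFlex time.
+    static double speedupPercent(double jlexMillis, double jflexMillis) {
+        return (jlexMillis / jflexMillis - 1.0) * 100.0;
+    }
+
+    public static void main(String[] args) {
+        // 496 KB input, HotSpot: JLex 325 ms, %switch 261 ms
+        System.out.println(String.format(Locale.ROOT, "%.1f %%", speedupPercent(325, 261))); // 24.5 %
+        // 93 KB input, interpreted: JLex 817 ms, %switch 573 ms
+        System.out.println(String.format(Locale.ROOT, "%.1f %%", speedupPercent(817, 573))); // 42.6 %
+    }
+}
+```
+
+<P>
+<TT>Locale.ROOT</TT> keeps the decimal point independent of the platform's locale settings.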
+
+<P>
+<TABLE CELLPADDING=3 BORDER="1" WIDTH="100%">
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">input (KB)</TD>
+<TD ALIGN="CENTER">JVM</TD>
+<TD ALIGN="RIGHT">JLex</TD>
+<TD ALIGN="RIGHT"><FONT SIZE="-1"><TT>%switch</TT></FONT></TD>
+<TD ALIGN="RIGHT">speedup</TD>
+<TD ALIGN="RIGHT"><FONT SIZE="-1"><TT>%table</TT></FONT></TD>
+<TD ALIGN="RIGHT">speedup</TD>
+<TD ALIGN="RIGHT"><FONT SIZE="-1"><TT>%pack</TT></FONT></TD>
+<TD ALIGN="RIGHT">speedup</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">496</TD>
+<TD ALIGN="CENTER">hotspot</TD>
+<TD ALIGN="RIGHT">325 ms</TD>
+<TD ALIGN="RIGHT">261 ms</TD>
+<TD ALIGN="RIGHT">24.5 %</TD>
+<TD ALIGN="RIGHT">261 ms</TD>
+<TD ALIGN="RIGHT">24.5 %</TD>
+<TD ALIGN="RIGHT">261 ms</TD>
+<TD ALIGN="RIGHT">24.5 %</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">187</TD>
+<TD ALIGN="CENTER">hotspot</TD>
+<TD ALIGN="RIGHT">127 ms</TD>
+<TD ALIGN="RIGHT">98 ms</TD>
+<TD ALIGN="RIGHT">29.6 %</TD>
+<TD ALIGN="RIGHT">94 ms</TD>
+<TD ALIGN="RIGHT">35.1 %</TD>
+<TD ALIGN="RIGHT">96 ms</TD>
+<TD ALIGN="RIGHT">32.3 %</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">93</TD>
+<TD ALIGN="CENTER">hotspot</TD>
+<TD ALIGN="RIGHT">66 ms</TD>
+<TD ALIGN="RIGHT">50 ms</TD>
+<TD ALIGN="RIGHT">32.0 %</TD>
+<TD ALIGN="RIGHT">50 ms</TD>
+<TD ALIGN="RIGHT">32.0 %</TD>
+<TD ALIGN="RIGHT">48 ms</TD>
+<TD ALIGN="RIGHT">37.5 %</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">496</TD>
+<TD ALIGN="CENTER">interpr.</TD>
+<TD ALIGN="RIGHT">4009 ms</TD>
+<TD ALIGN="RIGHT">3025 ms</TD>
+<TD ALIGN="RIGHT">32.5 %</TD>
+<TD ALIGN="RIGHT">3258 ms</TD>
+<TD ALIGN="RIGHT">23.1 %</TD>
+<TD ALIGN="RIGHT">3231 ms</TD>
+<TD ALIGN="RIGHT">24.1 %</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">187</TD>
+<TD ALIGN="CENTER">interpr.</TD>
+<TD ALIGN="RIGHT">1641 ms</TD>
+<TD ALIGN="RIGHT">1155 ms</TD>
+<TD ALIGN="RIGHT">42.1 %</TD>
+<TD ALIGN="RIGHT">1245 ms</TD>
+<TD ALIGN="RIGHT">31.8 %</TD>
+<TD ALIGN="RIGHT">1234 ms</TD>
+<TD ALIGN="RIGHT">33.0 %</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">93</TD>
+<TD ALIGN="CENTER">interpr.</TD>
+<TD ALIGN="RIGHT">817 ms</TD>
+<TD ALIGN="RIGHT">573 ms</TD>
+<TD ALIGN="RIGHT">42.6 %</TD>
+<TD ALIGN="RIGHT">617 ms</TD>
+<TD ALIGN="RIGHT">32.4 %</TD>
+<TD ALIGN="RIGHT">613 ms</TD>
+<TD ALIGN="RIGHT">33.3 %</TD>
+</TR>
+</TABLE>
+
+<P><BR>
+
+<P>
+Since the scanning time of the lexical analyzer examined in the table
+above includes lexical actions that often need to create new object instances,
+another table shows the execution time for the same specification with empty
+lexical actions to compare the pure scanning engines.
+
+<P>
+<TABLE CELLPADDING=3 BORDER="1" WIDTH="100%">
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">input (KB)</TD>
+<TD ALIGN="CENTER">JVM</TD>
+<TD ALIGN="RIGHT">JLex</TD>
+<TD ALIGN="RIGHT"><FONT SIZE="-1"><TT>%switch</TT></FONT></TD>
+<TD ALIGN="RIGHT">speedup</TD>
+<TD ALIGN="RIGHT"><FONT SIZE="-1"><TT>%table</TT></FONT></TD>
+<TD ALIGN="RIGHT">speedup</TD>
+<TD ALIGN="RIGHT"><FONT SIZE="-1"><TT>%pack</TT></FONT></TD>
+<TD ALIGN="RIGHT">speedup</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">496</TD>
+<TD ALIGN="CENTER">hotspot</TD>
+<TD ALIGN="RIGHT">204 ms</TD>
+<TD ALIGN="RIGHT">140 ms</TD>
+<TD ALIGN="RIGHT">45.7 %</TD>
+<TD ALIGN="RIGHT">138 ms</TD>
+<TD ALIGN="RIGHT">47.8 %</TD>
+<TD ALIGN="RIGHT">140 ms</TD>
+<TD ALIGN="RIGHT">45.7 %</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">187</TD>
+<TD ALIGN="CENTER">hotspot</TD>
+<TD ALIGN="RIGHT">83 ms</TD>
+<TD ALIGN="RIGHT">55 ms</TD>
+<TD ALIGN="RIGHT">50.9 %</TD>
+<TD ALIGN="RIGHT">52 ms</TD>
+<TD ALIGN="RIGHT">59.6 %</TD>
+<TD ALIGN="RIGHT">52 ms</TD>
+<TD ALIGN="RIGHT">59.6 %</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">93</TD>
+<TD ALIGN="CENTER">hotspot</TD>
+<TD ALIGN="RIGHT">41 ms</TD>
+<TD ALIGN="RIGHT">28 ms</TD>
+<TD ALIGN="RIGHT">46.4 %</TD>
+<TD ALIGN="RIGHT">26 ms</TD>
+<TD ALIGN="RIGHT">57.7 %</TD>
+<TD ALIGN="RIGHT">26 ms</TD>
+<TD ALIGN="RIGHT">57.7 %</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">496</TD>
+<TD ALIGN="CENTER">interpr.</TD>
+<TD ALIGN="RIGHT">2983 ms</TD>
+<TD ALIGN="RIGHT">2036 ms</TD>
+<TD ALIGN="RIGHT">46.5 %</TD>
+<TD ALIGN="RIGHT">2230 ms</TD>
+<TD ALIGN="RIGHT">33.8 %</TD>
+<TD ALIGN="RIGHT">2232 ms</TD>
+<TD ALIGN="RIGHT">33.6 %</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">187</TD>
+<TD ALIGN="CENTER">interpr.</TD>
+<TD ALIGN="RIGHT">1260 ms</TD>
+<TD ALIGN="RIGHT">793 ms</TD>
+<TD ALIGN="RIGHT">58.9 %</TD>
+<TD ALIGN="RIGHT">865 ms</TD>
+<TD ALIGN="RIGHT">45.7 %</TD>
+<TD ALIGN="RIGHT">867 ms</TD>
+<TD ALIGN="RIGHT">45.3 %</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">93</TD>
+<TD ALIGN="CENTER">interpr.</TD>
+<TD ALIGN="RIGHT">628 ms</TD>
+<TD ALIGN="RIGHT">395 ms</TD>
+<TD ALIGN="RIGHT">59.0 %</TD>
+<TD ALIGN="RIGHT">432 ms</TD>
+<TD ALIGN="RIGHT">45.4 %</TD>
+<TD ALIGN="RIGHT">432 ms</TD>
+<TD ALIGN="RIGHT">45.4 %</TD>
+</TR>
+</TABLE>
+
+<P><BR>
+
+<P>
+The execution time of individual instructions depends on the platform and
+on the implementation of the Java Virtual Machine the program is executed
+on. Therefore the tables above cannot serve as a general guide to which
+JFlex code generation method is the right one to choose.
+The following tables were produced with the same lexical specification and
+the same input on a Linux system, also using Sun's JDK 1.3.
+
+<P>
+With actions:
+
+<P>
+<TABLE CELLPADDING=3 BORDER="1" WIDTH="100%">
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">input (KB)</TD>
+<TD ALIGN="CENTER">JVM</TD>
+<TD ALIGN="RIGHT">JLex</TD>
+<TD ALIGN="RIGHT"><FONT SIZE="-1"><TT>%switch</TT></FONT></TD>
+<TD ALIGN="RIGHT">speedup</TD>
+<TD ALIGN="RIGHT"><FONT SIZE="-1"><TT>%table</TT></FONT></TD>
+<TD ALIGN="RIGHT">speedup</TD>
+<TD ALIGN="RIGHT"><FONT SIZE="-1"><TT>%pack</TT></FONT></TD>
+<TD ALIGN="RIGHT">speedup</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">496</TD>
+<TD ALIGN="CENTER">hotspot</TD>
+<TD ALIGN="RIGHT">246 ms</TD>
+<TD ALIGN="RIGHT">203 ms</TD>
+<TD ALIGN="RIGHT">21.2 %</TD>
+<TD ALIGN="RIGHT">193 ms</TD>
+<TD ALIGN="RIGHT">27.5 %</TD>
+<TD ALIGN="RIGHT">190 ms</TD>
+<TD ALIGN="RIGHT">29.5 %</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">187</TD>
+<TD ALIGN="CENTER">hotspot</TD>
+<TD ALIGN="RIGHT">99 ms</TD>
+<TD ALIGN="RIGHT">76 ms</TD>
+<TD ALIGN="RIGHT">30.3 %</TD>
+<TD ALIGN="RIGHT">69 ms</TD>
+<TD ALIGN="RIGHT">43.5 %</TD>
+<TD ALIGN="RIGHT">70 ms</TD>
+<TD ALIGN="RIGHT">41.4 %</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">93</TD>
+<TD ALIGN="CENTER">hotspot</TD>
+<TD ALIGN="RIGHT">48 ms</TD>
+<TD ALIGN="RIGHT">36 ms</TD>
+<TD ALIGN="RIGHT">33.3 %</TD>
+<TD ALIGN="RIGHT">34 ms</TD>
+<TD ALIGN="RIGHT">41.2 %</TD>
+<TD ALIGN="RIGHT">35 ms</TD>
+<TD ALIGN="RIGHT">37.1 %</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">496</TD>
+<TD ALIGN="CENTER">interpr.</TD>
+<TD ALIGN="RIGHT">3251 ms</TD>
+<TD ALIGN="RIGHT">2247 ms</TD>
+<TD ALIGN="RIGHT">44.7 %</TD>
+<TD ALIGN="RIGHT">2430 ms</TD>
+<TD ALIGN="RIGHT">33.8 %</TD>
+<TD ALIGN="RIGHT">2444 ms</TD>
+<TD ALIGN="RIGHT">33.0 %</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">187</TD>
+<TD ALIGN="CENTER">interpr.</TD>
+<TD ALIGN="RIGHT">1320 ms</TD>
+<TD ALIGN="RIGHT">848 ms</TD>
+<TD ALIGN="RIGHT">55.7 %</TD>
+<TD ALIGN="RIGHT">958 ms</TD>
+<TD ALIGN="RIGHT">37.8 %</TD>
+<TD ALIGN="RIGHT">920 ms</TD>
+<TD ALIGN="RIGHT">43.5 %</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">93</TD>
+<TD ALIGN="CENTER">interpr.</TD>
+<TD ALIGN="RIGHT">658 ms</TD>
+<TD ALIGN="RIGHT">423 ms</TD>
+<TD ALIGN="RIGHT">55.6 %</TD>
+<TD ALIGN="RIGHT">456 ms</TD>
+<TD ALIGN="RIGHT">44.3 %</TD>
+<TD ALIGN="RIGHT">452 ms</TD>
+<TD ALIGN="RIGHT">45.6 %</TD>
+</TR>
+</TABLE>
+
+<P><BR>
+
+<P>
+Without actions:
+
+<P>
+<TABLE CELLPADDING=3 BORDER="1" WIDTH="100%">
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">input (KB)</TD>
+<TD ALIGN="CENTER">JVM</TD>
+<TD ALIGN="RIGHT">JLex</TD>
+<TD ALIGN="RIGHT"><FONT SIZE="-1"><TT>%switch</TT></FONT></TD>
+<TD ALIGN="RIGHT">speedup</TD>
+<TD ALIGN="RIGHT"><FONT SIZE="-1"><TT>%table</TT></FONT></TD>
+<TD ALIGN="RIGHT">speedup</TD>
+<TD ALIGN="RIGHT"><FONT SIZE="-1"><TT>%pack</TT></FONT></TD>
+<TD ALIGN="RIGHT">speedup</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">496</TD>
+<TD ALIGN="CENTER">hotspot</TD>
+<TD ALIGN="RIGHT">136 ms</TD>
+<TD ALIGN="RIGHT">78 ms</TD>
+<TD ALIGN="RIGHT">74.4 %</TD>
+<TD ALIGN="RIGHT">76 ms</TD>
+<TD ALIGN="RIGHT">78.9 %</TD>
+<TD ALIGN="RIGHT">77 ms</TD>
+<TD ALIGN="RIGHT">76.6 %</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">187</TD>
+<TD ALIGN="CENTER">hotspot</TD>
+<TD ALIGN="RIGHT">59 ms</TD>
+<TD ALIGN="RIGHT">31 ms</TD>
+<TD ALIGN="RIGHT">90.3 %</TD>
+<TD ALIGN="RIGHT">48 ms</TD>
+<TD ALIGN="RIGHT">22.9 %</TD>
+<TD ALIGN="RIGHT">32 ms</TD>
+<TD ALIGN="RIGHT">84.4 %</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">93</TD>
+<TD ALIGN="CENTER">hotspot</TD>
+<TD ALIGN="RIGHT">28 ms</TD>
+<TD ALIGN="RIGHT">15 ms</TD>
+<TD ALIGN="RIGHT">86.7 %</TD>
+<TD ALIGN="RIGHT">15 ms</TD>
+<TD ALIGN="RIGHT">86.7 %</TD>
+<TD ALIGN="RIGHT">15 ms</TD>
+<TD ALIGN="RIGHT">86.7 %</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">496</TD>
+<TD ALIGN="CENTER">interpr.</TD>
+<TD ALIGN="RIGHT">1992 ms</TD>
+<TD ALIGN="RIGHT">1047 ms</TD>
+<TD ALIGN="RIGHT">90.3 %</TD>
+<TD ALIGN="RIGHT">1246 ms</TD>
+<TD ALIGN="RIGHT">59.9 %</TD>
+<TD ALIGN="RIGHT">1215 ms</TD>
+<TD ALIGN="RIGHT">64.0 %</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">187</TD>
+<TD ALIGN="CENTER">interpr.</TD>
+<TD ALIGN="RIGHT">859 ms</TD>
+<TD ALIGN="RIGHT">408 ms</TD>
+<TD ALIGN="RIGHT">110.5 %</TD>
+<TD ALIGN="RIGHT">479 ms</TD>
+<TD ALIGN="RIGHT">79.3 %</TD>
+<TD ALIGN="RIGHT">487 ms</TD>
+<TD ALIGN="RIGHT">76.4 %</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">93</TD>
+<TD ALIGN="CENTER">interpr.</TD>
+<TD ALIGN="RIGHT">435 ms</TD>
+<TD ALIGN="RIGHT">200 ms</TD>
+<TD ALIGN="RIGHT">117.5 %</TD>
+<TD ALIGN="RIGHT">237 ms</TD>
+<TD ALIGN="RIGHT">83.5 %</TD>
+<TD ALIGN="RIGHT">242 ms</TD>
+<TD ALIGN="RIGHT">79.8 %</TD>
+</TR>
+</TABLE>
+
+<P><BR>
+
+<P>
+Although all JFlex scanners were faster than those generated by JLex,
+the relative performance of the JFlex code generation methods differs slightly
+from the run on the Windows 98 system.
+<A NAME="PerformanceHandwritten"></A>
+<P>
+The following table compares a handwritten scanner for the Java language,
+obtained from the CUP website, with the JFlex-generated scanner for Java
+that comes with JFlex in the <TT>examples</TT> directory. Both were tested
+on different <TT>.java</TT> files on a Linux machine with Sun's JDK 1.3.
+
+<P>
+<TABLE CELLPADDING=3 BORDER="1" WIDTH="100%">
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">input (lines)</TD>
+<TD ALIGN="RIGHT">input (KB)</TD>
+<TD ALIGN="CENTER">JVM</TD>
+<TD ALIGN="RIGHT">handwritten scanner</TD>
+<TD ALIGN="CENTER" COLSPAN=2>JFlex generated scanner</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">19050</TD>
+<TD ALIGN="RIGHT">496</TD>
+<TD ALIGN="CENTER">hotspot</TD>
+<TD ALIGN="RIGHT">824 ms</TD>
+<TD ALIGN="RIGHT">248 ms</TD>
+<TD ALIGN="RIGHT">235 % faster</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">6350</TD>
+<TD ALIGN="RIGHT">165</TD>
+<TD ALIGN="CENTER">hotspot</TD>
+<TD ALIGN="RIGHT">272 ms</TD>
+<TD ALIGN="RIGHT">84 ms</TD>
+<TD ALIGN="RIGHT">232 % faster</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">1270</TD>
+<TD ALIGN="RIGHT">33</TD>
+<TD ALIGN="CENTER">hotspot</TD>
+<TD ALIGN="RIGHT">53 ms</TD>
+<TD ALIGN="RIGHT">18 ms</TD>
+<TD ALIGN="RIGHT">194 % faster</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">19050</TD>
+<TD ALIGN="RIGHT">496</TD>
+<TD ALIGN="CENTER">interpreted</TD>
+<TD ALIGN="RIGHT">5.83 s</TD>
+<TD ALIGN="RIGHT">3.85 s</TD>
+<TD ALIGN="RIGHT">51 % faster</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">6350</TD>
+<TD ALIGN="RIGHT">165</TD>
+<TD ALIGN="CENTER">interpreted</TD>
+<TD ALIGN="RIGHT">1.95 s</TD>
+<TD ALIGN="RIGHT">1.29 s</TD>
+<TD ALIGN="RIGHT">51 % faster</TD>
+</TR>
+<TR><TD ALIGN="LEFT"> </TD><TD ALIGN="RIGHT">1270</TD>
+<TD ALIGN="RIGHT">33</TD>
+<TD ALIGN="CENTER">interpreted</TD>
+<TD ALIGN="RIGHT">0.38 s</TD>
+<TD ALIGN="RIGHT">0.25 s</TD>
+<TD ALIGN="RIGHT">52 % faster</TD>
+</TR>
+</TABLE>
+
+<P><BR>
+
+<P>
+Although JDK 1.3 seems to speed up the handwritten scanner more than the
+generated one (compared to JDK 1.1 or 1.2), the generated scanner is
+still up to 3.3 times as fast as the handwritten one. One example of
+a handwritten scanner that is
+considerably slower than the equivalent generated one is of course no
+proof that generated scanners are always faster than handwritten ones. It is
+clearly impossible to prove something like that, since you could
+always write the generated scanner by hand. From a software
+engineering point of view, however, there is no excuse for writing a
+scanner by hand, since this task takes more time, is more difficult, and is
+therefore more error-prone than writing a compact, readable, and easy-to-change
+lexical specification. (I'd like to add that I do <EM>not</EM>
+think that the handwritten scanner from the CUP website used in
+this test is stupid or badly written or anything like that. I actually
+think Scott did a great job with it, and that for learning about
+lexers it is quite valuable to study it or even to write a similar one
+yourself.)
+
+<P>
+
+<H2><A NAME="SECTION00072000000000000000"></A><A NAME="PerformanceTips"></A><BR>
+How to write a faster specification
+</H2>
+Although JFlex generated scanners show good performance without
+special optimizations, there are some heuristics that can make a
+lexical specification produce an even faster scanner. Those are
+(roughly in order of performance gain):
+
+<P>
+
+<UL>
+<LI>Avoid rules that require backtracking
+
+<P>
+From the C/C++ flex [<A
+ HREF="manual.html#flex">11</A>] manpage: <EM>``Getting rid
+of backtracking is messy and often may be an enormous amount of work for
+a complicated scanner.''</EM> Backtracking is introduced by the longest match
+rule and occurs for instance on this set of expressions:
+
+<P>
+<TT> "averylongkeyword"</TT>
+<BR><TT> .</TT>
+
+<P>
+With input <TT>"averylongjoke"</TT> the scanner has to read all characters
+up to <TT>'j'</TT> to decide that rule <TT>.</TT> should be matched. All
+characters of <TT>"verylong"</TT> have to be read again for the next
+matching process. Backtracking can be avoided in general by adding
+error rules that match those error conditions:
+
+<P>
+<code> "av"|"ave"|"aver"|"avery"|"averyl"|..</code>
+
+<P>
+While this is impractical in most scanners, there is still the
+possibility to add a ``catch all'' rule for a lengthy list of keywords:
+<PRE>
+"keyword1" { return symbol(KEYWORD1); }
+..
+"keywordn" { return symbol(KEYWORDn); }
+[a-z]+ { error("not a keyword"); }
+</PRE>
+Most programming language scanners already have a rule like this for
+some kind of variable-length identifier.
+
+<P>
+</LI>
+<LI>Avoid line and column counting
+
+<P>
+It costs multiple additional comparisons per input character and the
+ matched text has to be rescanned for counting. In most scanners it
+ is possible to do the line counting in the specification by
+ incrementing <TT>yyline</TT> each time a line terminator has been
+ matched. Column counting could also be included in actions. This
+ will be faster, but can in some cases become quite messy.
+
+<P>
+</LI>
+<LI>Avoid lookahead expressions and the end of line operator '$'
+
+<P>
+The trailing context will first have to be read and then (because
+ it is not to be consumed) read again.
+
+<P>
+</LI>
+<LI>Avoid the beginning of line operator '<code>^</code>'
+
+<P>
+It costs multiple additional comparisons per match. In some
+ cases one extra lookahead character is needed (when the last character read is
+ <code>\r</code> the scanner has to read one character ahead to check if
+ the next one is an <code>\n</code> or not).
+
+<P>
+</LI>
+<LI>Match as much text as possible in a rule.
+
+<P>
+Each iteration of the innermost loop of the scanner matches one rule. After
+ each action, some overhead for setting up the internal state of the
+ scanner is necessary, so fewer, longer matches mean less overhead.
+</LI>
+</UL>
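+
+<P>
+The Java sketch below makes the cost of backtracking from the first tip concrete. It is a toy model, not JFlex-generated code. It counts how many characters a longest-match scanner reads when it has only the two rules <TT>"averylongkeyword"</TT> and <TT>.</TT>. The model assumes that the scanner stops reading as soon as no longer match is possible. The class and method names are hypothetical.
+
+```java
+public class BacktrackCount {
+
+    // Toy model of a longest-match scanner with exactly two rules:
+    //   keyword   (a fixed string, e.g. "averylongkeyword")
+    //   .         (any single character)
+    // Returns the total number of characters read while scanning the input,
+    // assuming the scanner stops as soon as no longer match is possible.
+    static int charactersRead(String input, String keyword) {
+        if (keyword.isEmpty()) {
+            throw new IllegalArgumentException("keyword must not be empty");
+        }
+        int reads = 0;
+        int pos = 0;
+        while (pos < input.length()) {
+            // k = length of the common prefix of the remaining input and the keyword
+            int k = 0;
+            while (k < keyword.length() && pos + k < input.length()
+                    && input.charAt(pos + k) == keyword.charAt(k)) {
+                k++;
+            }
+            if (k == keyword.length()) {
+                reads += k;   // whole keyword matched
+                pos += k;
+            } else {
+                // k prefix characters plus the mismatching one (if any) were read,
+                // but only rule "." matches: one character is consumed and the
+                // rest is read again by the next matching process
+                reads += (pos + k < input.length()) ? k + 1 : k;
+                pos += 1;
+            }
+        }
+        return reads;
+    }
+
+    public static void main(String[] args) {
+        String input = "averylongjoke";
+        int reads = charactersRead(input, "averylongkeyword");
+        System.out.println(input.length() + " characters, " + reads + " reads"); // 13 characters, 22 reads
+    }
+}
+```
+
+<P>
+In this model, the 9 extra reads for <TT>"averylongjoke"</TT> are exactly the second pass over <TT>"verylong"</TT> and <TT>'j'</TT>.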
+
+<P>
+Note that writing more rules in a specification does not make the generated
+scanner slower (except when you have to switch to another code generation
+method because of the larger size).
+
+<P>
+The two main rules of optimization also apply to lexical specifications:
+
+<OL>
+<LI><B>don't do it</B>
+</LI>
+<LI><B>(for experts only) don't do it yet</B>
+</LI>
+</OL>
+
+<P>
+Some of the performance tips above conflict with a readable and compact
+specification style. When in doubt, or when the requirements are not yet
+fixed, don't use them: the specification can always be optimized
+at a later stage of the development process.
+
+<P>
+
+<H1><A NAME="SECTION00080000000000000000">
+Porting Issues</A>
+</H1>
+
+<P>
+
+<H2><A NAME="SECTION00081000000000000000"></A><A NAME="Porting"></A><BR>
+Porting from JLex
+</H2>
+JFlex was designed to read old JLex specifications unchanged and to
+generate a scanner which behaves exactly like the one generated
+by JLex, the only difference being that it is faster.
+
+<P>
+This works as expected on all well-formed JLex specifications.
+
+<P>
+Since the statement above is somewhat absolute, let's take a look at
+what ``well-formed'' means here. A JLex specification is well-formed when
+it
+
+<UL>
+<LI>generates a working scanner with JLex
+
+<P>
+</LI>
+<LI>doesn't contain the unescaped characters <TT>!</TT> and <TT>~</TT>
+
+<P>
+They are operators in JFlex while JLex treats them as normal
+ input characters. You can easily port such a JLex specification
+ to JFlex by replacing every <TT>!</TT> with <code>\!</code> and every
+ <code>~</code> with <code>\~</code> in all regular expressions.
+
+<P>
+</LI>
+<LI>has only complete regular expressions surrounded by parentheses in
+ macro definitions
+
+<P>
+This may sound a bit harsh, but it could otherwise be a major problem,
+ and it can also help you find some nasty bugs in your
+ specification that did not show up at first. In JLex, the
+ right-hand side of a macro is just a piece of text that is copied
+ to the point where the macro is used. With this, a weird kind of
+ construct like
+ <PRE>
+ macro1 = ("hello"
+ macro2 = {macro1})*
+</PRE>
+ was possible (with <TT>macro2</TT> expanding to <code>("hello")*</code>). This
+ is not allowed in JFlex, and you will have to transform such
+ definitions. There are, however, more subtle kinds of errors that
+ JLex macros can introduce. Consider a definition like
+ <code>macro = a|b</code> and a usage like <code>{macro}*</code>.
+ In JLex this expands to <code>a|b*</code> and not to the probably intended
+ <code>(a|b)*</code>.
+
+<P>
+JFlex always uses the second form of expansion, since this is the natural
+ way of thinking about abbreviations for regular expressions.
+
+<P>
+Most specifications shouldn't suffer from this problem, because
+ macros often only contain (harmless) character classes like
+ <TT>alpha = [a-zA-Z]</TT> and more dangerous definitions like
+
+<P>
+<code> ident = {alpha}({alpha}|{digit})*</code>
+
+<P>
+are only used to write rules like
+
+<P>
+<code> {ident} { .. action .. }</code>
+
+<P>
+and not more complex expressions like
+
+<P>
+<code> {ident}* { .. action .. }</code>
+
+<P>
+where the kind of error presented above would show up.
+</LI>
+</UL>
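+
+<P>
+A few lines of Java show the difference between the two expansion strategies for <code>macro = a|b</code> and <code>{macro}*</code>. The sketch uses <TT>java.util.regex</TT> only as a stand-in regular expression engine; for <TT>|</TT>, <TT>*</TT> and parentheses its syntax agrees with JFlex. The helper names are hypothetical and do not correspond to JLex or JFlex internals.
+
+```java
+import java.util.regex.Pattern;
+
+public class MacroExpansion {
+
+    // JLex-style: the macro's right-hand side is pasted in as plain text.
+    static String expandTextually(String body, String usage) {
+        return usage.replace("{macro}", body);
+    }
+
+    // JFlex-style: the right-hand side is treated as one grouped expression.
+    static String expandAsGroup(String body, String usage) {
+        return usage.replace("{macro}", "(" + body + ")");
+    }
+
+    public static void main(String[] args) {
+        String body = "a|b";       // macro = a|b
+        String usage = "{macro}*";
+        String jlex = expandTextually(body, usage);   // a|b*
+        String jflex = expandAsGroup(body, usage);    // (a|b)*
+        System.out.println(jlex + " matches \"abab\": " + Pattern.matches(jlex, "abab"));   // false
+        System.out.println(jflex + " matches \"abab\": " + Pattern.matches(jflex, "abab")); // true
+    }
+}
+```
+
+<P>
+The textual expansion parses as <code>a</code> or <code>b*</code>, so it rejects <TT>abab</TT>. The grouped expansion accepts it.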
+
+<P>
+
+<H2><A NAME="SECTION00082000000000000000"></A><A NAME="lexport"></A><BR>
+Porting from lex/flex
+</H2>
+This section tries to give an overview of activities and possible
+problems when porting a lexical specification from the C/C++ tools lex
+and flex [<A
+ HREF="manual.html#flex">11</A>] available on most Unix systems to JFlex.
+
+<P>
+Most of the C/C++ specific features are naturally not present in JFlex,
+but most ``clean'' lex/flex lexical specifications can be ported to
+JFlex without very much work.
+
+<P>
+This section is far from complete and is based mainly on a survey of
+the flex man page and very little personal experience. If you do
+engage in any porting activity from lex/flex to JFlex and encounter
+problems, have better solutions for points presented here, or just have
+some tips you would like to share, please do <A NAME="tex2html8"
+ HREF="mailto:lsf@jflex.de">contact me</A>. I will
+incorporate your experiences in this manual (with all due credit to you,
+of course).
+
+<P>
+
+<H3><A NAME="SECTION00082100000000000000">
+Basic structure</A>
+</H3>
+A lexical specification for flex has the following basic structure:
+<PRE>
+definitions
+%%
+rules
+%%
+user code
+</PRE>
+
+<P>
+The <TT>user code</TT> section usually contains some C code that is used
+in actions of the <TT>rules</TT> part of the specification. For JFlex, most
+of this code will have to be included in the class code directive <code>%{..%}</code>
+in the <TT>options and declarations</TT> section (after
+translating the C code to Java, of course).
+
+<P>
+
+<H3><A NAME="SECTION00082200000000000000">
+Macros and Regular Expression Syntax</A>
+</H3>
+The <TT>definitions</TT> section of a flex specification is quite similar
+to the <TT>options and declarations</TT> part of JFlex specs.
+
+<P>
+Macro definitions in flex have the form:
+<PRE>
+<identifier> <expression>
+</PRE>
+To port them to JFlex macros, just insert a <TT>=</TT> between <TT><identifier></TT>
+and <TT><expression></TT>.
+
+<P>
+The syntax and semantics of regular expressions in flex are pretty much the
+same as in JFlex. A little attention is needed for some escape sequences
+present in flex (such as <code>\a</code>) that are not supported in JFlex. These
+escape sequences should be transformed into their octal or hexadecimal
+equivalent.
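+
+<P>
+Both porting steps can be automated for simple cases. The following Java sketch is a hypothetical helper, not a tool shipped with JFlex. It inserts the <TT>=</TT> into a flex macro definition and replaces the flex escape <code>\a</code> (alert, character code 7) with the hexadecimal escape <code>\x07</code>. It assumes one definition per line and does not handle any other flex-only syntax.
+
+```java
+public class FlexMacroPort {
+
+    // Hypothetical helper: turns one flex macro definition
+    // "name   expression" into the JFlex form "name = expression".
+    static String toJFlexMacro(String flexLine) {
+        String line = flexLine.trim();
+        int split = 0;
+        while (split < line.length() && !Character.isWhitespace(line.charAt(split))) {
+            split++;
+        }
+        String name = line.substring(0, split);
+        String expression = line.substring(split).trim();
+        return name + " = " + replaceAlertEscape(expression);
+    }
+
+    // Replaces the flex-only escape \a (alert, code 7) with the hex escape \x07.
+    // Escape pairs are processed left to right, so an escaped backslash
+    // followed by the letter a is left untouched.
+    static String replaceAlertEscape(String expression) {
+        StringBuilder out = new StringBuilder();
+        for (int i = 0; i < expression.length(); i++) {
+            char c = expression.charAt(i);
+            if (c == '\\' && i + 1 < expression.length()) {
+                char next = expression.charAt(++i);   // consume the escaped character
+                out.append(next == 'a' ? "\\x07" : "\\" + next);
+            } else {
+                out.append(c);
+            }
+        }
+        return out.toString();
+    }
+
+    public static void main(String[] args) {
+        System.out.println(toJFlexMacro("DIGIT    [0-9]"));  // DIGIT = [0-9]
+        System.out.println(toJFlexMacro("BELL     \\a"));    // BELL = \x07
+    }
+}
+```
+
+<P>
+The converted lines should still be checked by hand, since flex-specific constructs other than <code>\a</code> pass through unchanged.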
+
+<P>
+Another point is the predefined character classes. Flex offers the ones directly
+supported by C; JFlex offers the ones supported by Java. These classes will
+sometimes have to be listed manually (if there is need for this feature, it
+may be implemented in a future JFlex version).
+
+<P>
+
+<H3><A NAME="SECTION00082300000000000000">
+Lexical Rules</A>
+</H3>
+Since flex is mostly Unix-based, its '<code>^</code>' (beginning of line) and
+'<code>$</code>' (end of line) operators consider the <code>\n</code> character the only line terminator. This should usually not cause many problems, but you
+should be prepared for occurrences of <code>\r</code> or <code>\r\n</code> or one of
+the characters <code>\u2028</code>, <code>\u2029</code>, <code>\u000B</code>, <code>\u000C</code>,
+or <code>\u0085</code>. They are line terminators in Unicode and
+therefore may not be consumed when
+<code>^</code> or <code>$</code> is present in a rule.
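+
+<P>
+The two views of line ends can diverge considerably on the same text. The Java sketch below counts line ends twice: once using only <code>\n</code> (the Unix-oriented flex view), and once using the Unicode line terminators listed above, with <code>\r\n</code> counted as a single terminator. It is an illustration with hypothetical names, not code taken from JFlex or flex.
+
+```java
+public class LineTerminators {
+
+    // The line terminators listed above: \n, \r, U+000B, U+000C, U+0085,
+    // U+2028 and U+2029 (the \r\n pair is handled by the counting loop).
+    static boolean isUnicodeLineTerminator(char c) {
+        return c == '\n' || c == '\r' || c == 0x000B || c == 0x000C
+            || c == 0x0085 || c == 0x2028 || c == 0x2029;
+    }
+
+    // Counts line ends in the Unicode view; \r\n counts as one terminator.
+    static int countUnicodeLineEnds(String text) {
+        int count = 0;
+        for (int i = 0; i < text.length(); i++) {
+            char c = text.charAt(i);
+            if (!isUnicodeLineTerminator(c)) {
+                continue;
+            }
+            count++;
+            if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
+                i++;   // skip the \n of a \r\n pair
+            }
+        }
+        return count;
+    }
+
+    // Counts line ends the way a Unix-oriented flex scanner sees them: only \n.
+    static int countNewlines(String text) {
+        int count = 0;
+        for (int i = 0; i < text.length(); i++) {
+            if (text.charAt(i) == '\n') {
+                count++;
+            }
+        }
+        return count;
+    }
+
+    public static void main(String[] args) {
+        String text = "one\r\ntwo\rthree" + (char) 0x2028 + "four\n";
+        System.out.println(countUnicodeLineEnds(text));  // 4
+        System.out.println(countNewlines(text));         // 2
+    }
+}
+```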
+<P>
+The trailing context algorithm of flex is better than the one used in
+JFlex, so lookahead expressions could cause major headaches. JFlex
+will issue an error message at generation time if it cannot generate
+a scanner for a certain lookahead expression. (Sorry, I have no further tips
+on that yet. If anyone knows how the flex lookahead algorithm (or any better one) works
+and how it can be implemented efficiently, again: please <A NAME="tex2html9"
+ HREF="mailto:lsf@jflex.de">contact me</A>.)
+
+<P>
+
+<H1><A NAME="SECTION00090000000000000000"></A><A NAME="WorkingTog"></A><BR>
+Working together
+</H1>
+
+<P>
+
+<H2><A NAME="SECTION00091000000000000000"></A><A NAME="CUPWork"></A><BR>
+JFlex and CUP
+</H2>
+One of the main design goals of JFlex was to make interfacing with the free
+Java parser generator CUP [<A
+ HREF="manual.html#CUP">8</A>] as easy as possible.
+This has been done by giving
+the <TT><A HREF="manual.html#CupMode">%cup</A></TT> directive a special meaning. An
+interface, however, always has two sides. This section concentrates on the
+CUP side of the story.
+
+<P>
+
+<H3><A NAME="SECTION00091100000000000000">
+CUP version 0.10j</A>
+</H3>
+Since CUP version 0.10j, this has been simplified greatly by the new
+CUP scanner interface <TT>java_cup.runtime.Scanner</TT>. JFlex lexers now implement
+this interface automatically when the <TT><A HREF="manual.html#CupMode">%cup</A></TT>
+switch is used. You no longer have to provide special <TT>parser code</TT>, <TT>init
+ code</TT>, or <TT>scan with</TT> options
+in your CUP parser specification. You can just concentrate on your grammar.
+
+<P>
+If your generated lexer has the class name <TT>Scanner</TT>, the parser
+is started from a main program like this:
+
+<P>
+<PRE>
+...
+  try {
+    parser p = new parser(new Scanner(new FileReader(fileName)));
+    Object result = p.parse().value;
+  }
+  catch (Exception e) {
+...
+</PRE>
+
+<P>
+
+<H3><A NAME="SECTION00091200000000000000">
+Using existing JFlex/CUP specifications with CUP 0.10j</A>
+</H3>
+If you already have an existing specification and you would like to upgrade
+both JFlex and CUP to their newest versions, you will probably have to adjust
+your specification.
+
+<P>
+The main difference between the <TT><A HREF="manual.html#CupMode">%cup</A></TT> switch in
+JFlex 1.2.1 and lower and the current JFlex version is that JFlex scanners
+now automatically implement the <TT>java_cup.runtime.Scanner</TT> interface.
+This means that the scanning function changes its name from <TT>yylex()</TT>
+to <TT>next_token()</TT>.
+
+<P>
+The main difference between older CUP versions and 0.10j is that CUP now
+provides a constructor that accepts a <TT>java_cup.runtime.Scanner</TT>
+as argument and uses this scanner by
+default (so no <TT>scan with</TT> code is necessary any more).
+
+<P>
+If you have an existing CUP specification, it will probably look somewhat like this:
+<PRE>
+parser code {:
+  Lexer lexer;
+
+  public parser (java.io.Reader input) {
+    lexer = new Lexer(input);
+  }
+:};
+
+scan with {: return lexer.yylex(); :};
+</PRE>
+
+<P>
+To upgrade to CUP 0.10j, you could change it to look like this:
+<PRE>
+parser code {:
+  public parser (java.io.Reader input) {
+    super(new Lexer(input));
+  }
+:};
+</PRE>
+
+<P>
+If you don't mind changing the method that calls the parser,
+you could remove the constructor entirely (and, if there is nothing else
+in it, the whole <TT>parser code</TT> section as well, of course). The calling
+main procedure would then construct the parser as shown in the section above.
+
+<P>
+The JFlex specification does not need to be changed.
+
+<P>
+
+<H3><A NAME="SECTION00091300000000000000">
+Using older versions of CUP</A>
+</H3>
+For people who like or have to use older versions of CUP, the following section
+explains ``the old way''. Please note that the standard name of the scanning
+function with the <TT><A HREF="manual.html#CupMode">%cup</A></TT> switch is not
+<TT>yylex()</TT>, but <TT>next_token()</TT>.
+
+<P>
+If you have a scanner specification that begins like this:
+
+<P>
+<PRE>
+package PACKAGE;
+import java_cup.runtime.*; /* for convenience only, not necessary */
+
+%%
+
+%class Lexer
+%cup
+..
+</PRE>
+
+<P>
+then it matches a CUP specification starting like
+
+<P>
+<PRE>
+package PACKAGE;
+
+parser code {:
+  Lexer lexer;
+
+  public parser (java.io.Reader input) {
+    lexer = new Lexer(input);
+  }
+:};
+
+scan with {: return lexer.next_token(); :};
+
+..
+</PRE>
+
+<P>
+This assumes that the generated parser will get the name <TT>parser</TT>.
+If it doesn't, you have to adjust the constructor name.
+
+<P>
+The parser can then be started in a main routine like this:
+
+<P>
+<PRE>
+..
+  try {
+    parser p = new parser(new FileReader(fileName));
+    Object result = p.parse().value;
+  }
+  catch (Exception e) {
+..
+</PRE>
+
+<P>
+If you want the parser specification to be independent of the name of the generated
+scanner, you can instead write an interface Lexer
+
+<P>
+<PRE>
+public interface Lexer {
+  public java_cup.runtime.Symbol next_token() throws java.io.IOException;
+}
+</PRE>
+
+<P>
+change the parser code to:
+
+<P>
+<PRE>
+package PACKAGE;
+
+parser code {:
+  Lexer lexer;
+
+  public parser (Lexer lexer) {
+    this.lexer = lexer;
+  }
+:};
+
+scan with {: return lexer.next_token(); :};
+
+..
+</PRE>
+
+<P>
+tell JFlex about the Lexer
+interface using the <TT>%implements</TT>
+directive:
+
+<P>
+<PRE>
+..
+%class Scanner /* not Lexer now since that is our interface! */
+%implements Lexer
+%cup
+..
+</PRE>
+
+<P>
+and finally change the main routine to look like
+
+<P>
+<PRE>
+...
+  try {
+    parser p = new parser(new Scanner(new FileReader(fileName)));
+    Object result = p.parse().value;
+  }
+  catch (Exception e) {
+...
+</PRE>
+
+<P>
+If you want to improve the error messages that CUP-generated parsers
+produce, you can also override the methods <TT>report_error</TT> and <TT>report_fatal_error</TT>
+in the ``parser code'' section of the CUP specification. The new methods
+could for instance use <TT>yyline</TT> and <TT>yycolumn</TT> (stored in
+the <TT>left</TT> and <TT>right</TT> members of class <TT>java_cup.runtime.Symbol</TT>)
+to report error positions more conveniently for the user. The lexer and
+parser for the Java language in the <TT>examples/java</TT> directory of the
+JFlex distribution use this style of error reporting. These specifications
+also demonstrate the techniques above in action.
+
+<P>
+
+<H2><A NAME="SECTION00092000000000000000"></A><A NAME="YaccWork"></A><BR>
+JFlex and BYacc/J
+</H2>
+
+<P>
+JFlex has built-in support for the Java extension
+<A NAME="tex2html10"
+ HREF="http://troi.lincom-asg.com/~rjamison/byacc/">BYacc/J</A>
+[<A
+ HREF="manual.html#BYaccJ">9</A>] by Bob Jamison
+to the classical Berkeley Yacc parser generator.
+This section describes how to interface BYacc/J with JFlex. It
+builds on many helpful suggestions and comments from Larry Bell.
+
+<P>
+Since Yacc's architecture is a bit different from CUP's, the
+interface setup also works in a slightly different manner.
+BYacc/J expects a function <TT>int yylex()</TT> in the parser
+class that returns the next token. Semantic values are expected
+in a field <TT>yylval</TT> of type <TT>parserval</TT> where ``<TT>parser</TT>''
+is the name of the generated parser class.
+
+<P>
+For a small calculator example, one could use a setup like the
+following on the JFlex side:
+
+<P>
+<PRE>
+%%
+
+%byaccj
+
+%{
+  /* store a reference to the parser object */
+  private parser yyparser;
+
+  /* constructor taking an additional parser object */
+  public Yylex(java.io.Reader r, parser yyparser) {
+    this(r);
+    this.yyparser = yyparser;
+  }
+%}
+
+NUM = [0-9]+ ("." [0-9]+)?
+NL = \n | \r | \r\n
+
+%%
+
+/* operators */
+"+" |
+..
+"(" |
+")" { return (int) yycharat(0); }
+
+/* newline */
+{NL} { return parser.NL; }
+
+/* float */
+{NUM}  { yyparser.yylval = new parserval(Double.parseDouble(yytext()));
+         return parser.NUM; }
+</PRE>
+
+<P>
+The lexer expects a reference to the parser in its constructor.
+Since Yacc allows direct use of terminal characters like <TT>'+'</TT>
+in its specifications, we just return the character code for
+single char matches (e.g. the operators in the example). Symbolic
+token names are stored as <TT>public static int</TT> constants in
+the generated parser class. They are used as in the <TT>NL</TT> token
+above. Finally, for some tokens, a semantic value may have to be
+communicated to the parser. The <TT>NUM</TT> rule demonstrates that
+bit.
+
+<P>
+A matching BYacc/J parser specification could look like this:
+<PRE>
+%{
+ import java.io.*;
+%}
+
+%token NL /* newline */
+%token <dval> NUM /* a number */
+
+%type <dval> exp
+
+%left '-' '+'
+..
+%right '^' /* exponentiation */
+
+%%
+
+..
+
+exp:     NUM          { $$ = $1; }
+       | exp '+' exp  { $$ = $1 + $3; }
+       ..
+       | exp '^' exp  { $$ = Math.pow($1, $3); }
+       | '(' exp ')'  { $$ = $2; }
+       ;
+
+%%
+  /* a reference to the lexer object */
+  private Yylex lexer;
+
+  /* interface to the lexer */
+  private int yylex () {
+    int yyl_return = -1;
+    try {
+      yyl_return = lexer.yylex();
+    }
+    catch (IOException e) {
+      System.err.println("IO error :"+e);
+    }
+    return yyl_return;
+  }
+
+  /* error reporting */
+  public void yyerror (String error) {
+    System.err.println ("Error: " + error);
+  }
+
+  /* lexer is created in the constructor */
+  public parser(Reader r) {
+    lexer = new Yylex(r, this);
+  }
+
+  /* that's how you use the parser */
+  public static void main(String args[]) throws IOException {
+    parser yyparser = new parser(new FileReader(args[0]));
+    yyparser.yyparse();
+  }
+</PRE>
+
+<P>
+Here, the customized part is mostly in the user code section:
+the parser creates the lexer in its constructor and stores
+a reference to it for later use in its own <TT>int yylex()</TT>
+method. This <TT>yylex</TT> in the parser simply calls <TT>int yylex()</TT>
+of the generated lexer and passes the result on. Because the generated
+<TT>yylex()</TT> may throw an <TT>IOException</TT>, the wrapper catches it,
+prints an error message, and returns -1 instead.
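+<P>
+The error handling in the parser's <TT>yylex</TT> wrapper can be tried out
+in isolation. In this minimal sketch, the interface <TT>Lexer</TT> and the
+class <TT>LexerWrapper</TT> are hypothetical stand-ins for the generated lexer
+and parser; the wrapper method has the same shape as the one in the
+specification above.
+
```java
import java.io.IOException;

public class LexerWrapper {

    /* Hypothetical minimal view of a generated lexer: yylex() may throw. */
    public interface Lexer {
        int yylex() throws IOException;
    }

    private final Lexer lexer;

    public LexerWrapper(Lexer lexer) {
        this.lexer = lexer;
    }

    /* Delegates to the lexer; an IOException is reported and turned into -1. */
    public int yylex() {
        int yyl_return = -1;
        try {
            yyl_return = lexer.yylex();
        } catch (IOException e) {
            System.err.println("IO error: " + e);
        }
        return yyl_return;
    }

    public static void main(String[] args) {
        LexerWrapper ok = new LexerWrapper(() -> 43);
        LexerWrapper broken = new LexerWrapper(() -> { throw new IOException("read failed"); });
        System.out.println(ok.yylex());      // 43
        System.out.println(broken.yylex());  // -1, after the error is printed to stderr
    }
}
```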
+
+<P>
+Runnable versions of the specifications above
+are located in the <TT>examples/byaccj</TT> directory of the JFlex
+distribution.
+
+<P>
+
+<H1><A NAME="SECTION000100000000000000000"></A><A NAME="Bugs"></A><BR>
+Bugs and Deficiencies
+</H1>
+
+<P>
+
+<H2><A NAME="SECTION000101000000000000000">
+Deficiencies</A>
+</H2>
+The trailing context algorithm described in [<A
+ HREF="manual.html#Aho">1</A>] and used in
+JFlex is incorrect. It does not work when a postfix of the regular
+expression matches a prefix of the trailing context and the text
+matched by the expression does not have a fixed length. The expression
+<TT>a*/ab</TT> is of this kind: the postfix <TT>a</TT> of <TT>a*</TT> matches
+the prefix <TT>a</TT> of <TT>ab</TT>, and <TT>a*</TT> matches text of varying length.
+JFlex will report these cases as errors at generation time.
+
+<P>
+
+<H2><A NAME="SECTION000102000000000000000">
+Bugs</A>
+</H2>
+
+<P>
+As of April 12, 2004, the following bugs are known in JFlex:
+
+<UL>
+<LI>The check whether a lookahead expression is legal fails on some expressions.
+ The lookahead algorithm itself works as advertised, but at generation time
+ JFlex reports only some, not all, of the lookahead expressions that the
+ algorithm cannot handle.
+
+<P>
+<B>Workaround:</B> Check lookahead expressions manually. A lookahead expression
+ <TT>r1/r2</TT> is safe if no non-empty postfix of <TT>r1</TT> can match a prefix of <TT>r2</TT>.
+</LI>
+</UL>
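+<P>
+Checking lookahead expressions by hand can be supported by a small, bounded
+search for counterexamples. The Java sketch below is not part of JFlex: the
+class <TT>LookaheadOverlap</TT> and its parameters are hypothetical, the
+expressions are written in <TT>java.util.regex</TT> syntax rather than JFlex
+syntax, and only strings over a given alphabet up to length <TT>maxLen</TT>
+are tried. Finding an overlap shows that <TT>r1/r2</TT> violates the condition
+in the workaround; finding none proves nothing about longer strings.
+
```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LookaheadOverlap {

    /* All strings over the alphabet of length 0 to maxLen, shortest first. */
    static List<String> allStrings(String alphabet, int maxLen) {
        List<String> result = new ArrayList<>();
        result.add("");
        int start = 0;                         // index of the first string of the previous length
        for (int len = 1; len <= maxLen; len++) {
            int end = result.size();
            for (int i = start; i < end; i++) {
                for (char c : alphabet.toCharArray()) {
                    result.add(result.get(i) + c);
                }
            }
            start = end;
        }
        return result;
    }

    /* Returns a non-empty string that is a suffix of some match of r1 and a
       prefix of some match of r2, or null if the bounded search finds none. */
    public static String findOverlap(String r1, String r2, String alphabet, int maxLen) {
        Pattern p1 = Pattern.compile(r1);
        Pattern p2 = Pattern.compile(r2);
        List<String> matches1 = new ArrayList<>();
        List<String> matches2 = new ArrayList<>();
        for (String w : allStrings(alphabet, maxLen)) {
            if (p1.matcher(w).matches()) matches1.add(w);
            if (p2.matcher(w).matches()) matches2.add(w);
        }
        for (String a : matches1) {
            for (int i = 0; i < a.length(); i++) {
                String suffix = a.substring(i);  // non-empty because i < a.length()
                for (String b : matches2) {
                    if (b.startsWith(suffix)) {
                        return suffix;
                    }
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findOverlap("a*", "ab", "ab", 4));  // a
        System.out.println(findOverlap("a+", "b", "ab", 4));   // null
    }
}
```
+
+<P>
+Brute-force enumeration keeps the sketch short and obviously correct for
+the strings it inspects; an exact check would instead intersect the suffix
+language of <TT>r1</TT> with the prefix language of <TT>r2</TT> on automata.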
+
+<P>
+If you find new ones, please use the bugs section of the
+<A NAME="tex2html11"
+ HREF="http://www.jflex.de/">JFlex website</A>
+to report them.
+
+<P>
+
+<H1><A NAME="SECTION000110000000000000000"></A><A NAME="Copyright"></A><BR>
+Copying and License
+</H1>
+JFlex is free software, published under the terms of the
+<A NAME="tex2html12"
+ HREF="http://www.fsf.org/copyleft/gpl.html">GNU General Public License</A>.
+
+<P>
+There is absolutely NO WARRANTY for JFlex, its code and its documentation.
+
+<P>
+The code generated by JFlex inherits the copyright of the specification it
+was produced from. If it was your specification, you may use the generated
+code without restriction.
+
+<P>
+See the file <A NAME="tex2html13"
+ HREF="COPYRIGHT"><TT>COPYRIGHT</TT></A>
+for more information.
+
+<P>
+
+<H2><A NAME="SECTION000120000000000000000"></A><A NAME="References"></A><BR>
+Bibliography
+</H2><DL COMPACT><DD>
+
+<P>
+<P></P><DT><A NAME="Aho">1</A>
+<DD>
+ A. Aho, R. Sethi, J. Ullman, <EM>Compilers: Principles, Techniques, and Tools</EM>, 1986
+
+<P>
+<P></P><DT><A NAME="Appel">2</A>
+<DD>
+ A. W. Appel, <EM>Modern Compiler Implementation in Java: Basic Techniques</EM>, 1997
+
+<P>
+<P></P><DT><A NAME="JLex">3</A>
+<DD>
+ E. Berk, <EM>JLex: A lexical analyser generator for Java</EM>,
+<BR> <A NAME="tex2html14"
+ HREF="http://www.cs.princeton.edu/~appel/modern/java/JLex/"><TT>http://www.cs.princeton.edu/~appel/modern/java/JLex/</TT></A>
+<P>
+<P></P><DT><A NAME="fast">4</A>
+<DD>
+ K. Brouwer, W. Gellerich, E. Ploedereder,
+ <EM>Myths and Facts about the Efficient Implementation of Finite Automata and Lexical Analysis</EM>,
+ in: Proceedings of the 7th International Conference on Compiler Construction (CC '98), 1998
+
+<P>
+<P></P><DT><A NAME="unicode_rep">5</A>
+<DD>
+ M. Davis, <EM>Unicode Regular Expression Guidelines</EM>, Unicode Technical Report #18, 2000
+<BR> <A NAME="tex2html15"
+ HREF="http://www.unicode.org/unicode/reports/tr18/tr18-5.1.html"><TT>http://www.unicode.org/unicode/reports/tr18/tr18-5.1.html</TT></A>
+<P>
+<P></P><DT><A NAME="ParseTable">6</A>
+<DD>
+ P. Dencker, K. Dürre, J. Henft, <EM>Optimization of Parser Tables for Portable Compilers</EM>,
+ in: ACM Transactions on Programming Languages and Systems 6(4), 1984
+
+<P>
+<P></P><DT><A NAME="LangSpec">7</A>
+<DD>
+ J. Gosling, B. Joy, G. Steele, <EM>The Java Language Specification</EM>, 1996,
+<BR> <A NAME="tex2html16"
+ HREF="http://www.javasoft.com/docs/books/jls/"><TT>http://www.javasoft.com/docs/books/jls/</TT></A>
+<P>
+<P></P><DT><A NAME="CUP">8</A>
+<DD>
+ S. E. Hudson, <EM>CUP LALR Parser Generator for Java</EM>,
+<BR> <A NAME="tex2html17"
+ HREF="http://www.cs.princeton.edu/~appel/modern/java/CUP/"><TT>http://www.cs.princeton.edu/~appel/modern/java/CUP/</TT></A>
+<P>
+<P></P><DT><A NAME="BYaccJ">9</A>
+<DD>
+ B. Jamison, <EM>BYacc/J</EM>,
+<BR> <A NAME="tex2html18"
+ HREF="http://troi.lincom-asg.com/~rjamison/byacc/"><TT>http://troi.lincom-asg.com/~rjamison/byacc/</TT></A>
+<P>
+<P></P><DT><A NAME="MachineSpec">10</A>
+<DD>
+ T. Lindholm, F. Yellin, <EM>The Java Virtual Machine Specification</EM>, 1996,
+<BR> <A NAME="tex2html19"
+ HREF="http://www.javasoft.com/docs/books/vmspec/"><TT>http://www.javasoft.com/docs/books/vmspec/</TT></A>
+<P>
+<P></P><DT><A NAME="flex">11</A>
+<DD>
+ V. Paxson, <EM>flex - The fast lexical analyzer generator</EM>, 1995
+
+<P>
+<P></P><DT><A NAME="SparseTable">12</A>
+<DD>
+ R. E. Tarjan, A. Yao, <EM>Storing a Sparse Table</EM>, in: Communications of the ACM 22(11), 1979
+
+<P>
+<P></P><DT><A NAME="Maurer">13</A>
+<DD>
+ R. Wilhelm, D. Maurer, <EM>Übersetzerbau</EM>, Berlin 1997<SUP>2</SUP>
+
+<P>
+</DL>
+
+<P>
+<BR><HR><H4>Footnotes</H4>
+<DL>
+<DT><A NAME="foot32">... Java</A><A NAME="foot32"
+ HREF="manual.html#tex2html2"><SUP><IMG ALIGN="BOTTOM" BORDER="1" ALT="[*]" SRC="footnote.png"></SUP></A>
+<DD>Java is a trademark of
+Sun Microsystems, Inc., and refers to Sun's Java programming language.
+JFlex is not sponsored by or affiliated with Sun Microsystems, Inc.
+
+</DL><BR><HR>
+<ADDRESS>
+Mon Apr 12 20:58:12 EST 2004, <a href="http://www.doclsf.de">Gerwin Klein</a>
+</ADDRESS>
+</BODY>
+</HTML>