17.1 文字处理-编程书籍在线阅读平台-duidaima 堆代码

17.1 文字处理
如果您有C或C++的经验，那么最开始可能会对Java控制文本的能力感到怀疑。事实上，我们最害怕的就是速度特别慢，这可能妨碍我们创造能力的发挥。然而，Java对应的工具（特别是String类）具有很强的功能，就象本节的例子展示的那样（而且性能也有一定程度的提升）。

正如大家即将看到的那样，建立这些例子的目的都是为了解决本书编制过程中遇到的一些问题。但是，它们的能力并非仅止于此。通过简单的改造，即可让它们在其他场合大显身手。除此以外，它们还揭示出了本书以前没有强调过的一项Java特性。

17.1.1 提取代码列表
对于本书每一个完整的代码列表（不是代码段），大家无疑会注意到它们都用特殊的注释记号起始与结束（'//:'和'///:~'）。之所以要包括这种标志信息，是为了能将代码从本书自动提取到兼容的源码文件中。在我的前一本书里，我设计了一个系统，可将测试过的代码文件自动合并到书中。但对于这本书，我发现一种更简便的做法是一旦通过了最初的测试，就把代码粘贴到书中。而且由于很难第一次就编译通过，所以我在书的内部编辑代码。但如何提取并测试代码呢？这个程序就是关键。如果你打算解决一个文字处理的问题，那么它也很有利用价值。该例也演示了String类的许多特性。

我首先将整本书都以ASCII文本格式保存成一个独立的文件。CodePackager程序有两种运行模式（在usageString有相应的描述）：如果使用-p标志，程序就会检查一个包含了ASCII文本（即本书的内容）的一个输入文件。它会遍历这个文件，按照注释记号提取出代码，并用位于第一行的文件名来决定创建文件使用什么名字。除此以外，在需要将文件置入一个特殊目录的时候，它还会检查package语句（根据由package语句指定的路径选择）。

但这样还不够。程序还要对包（package）名进行跟踪，从而监视章内发生的变化。由于每一章使用的所有包都以c02，c03，c04等等起头，用于标记它们所属的是哪一章（除那些以com起头的以外，它们在对不同的章进行跟踪的时候会被忽略）——只要每一章的第一个代码列表包含了一个package，所以CodePackager程序能知道每一章发生的变化，并将后续的文件放到新的子目录里。每个文件提取出来时，都会置入一个SourceCodeFile对象，随后再将那个对象置入一个集合（后面还会详尽讲述这个过程）。这些SourceCodeFile对象可以简单地保存在文件中，那正是本项目的第二个用途。如果直接调用CodePackager，不添加-p标志，它就会将一个“打包”文件作为输入。那个文件随后会被提取（释放）进入单独的文件。所以-p标志的意思就是提取出来的文件已被“打包”（packed）进入这个单一的文件。

但为什么还要如此麻烦地使用打包文件呢？这是由于不同的计算机平台用不同的方式在文件里保存文本信息。其中最大的问题是换行字符的表示方法；当然，还有可能存在另一些问题。然而，Java有一种特殊类型的IO数据流——DataOutputStream——它可以保证“无论数据来自何种机器，只要使用一个DataInputStream收取这些数据，就可用本机正确的格式保存它们”。也就是说，Java负责控制与不同平台有关的所有细节，而这正是Java最具魅力的一点。所以-p标志能将所有东西都保存到单一的文件里，并采用通用的格式。用户可从Web下载这个文件以及Java程序，然后对这个文件运行CodePackager，同时不指定-p标志，文件便会释放到系统中正确的场所（亦可指定另一个子目录；否则就在当前目录创建子目录）。为确保不会留下与特定平台有关的格式，凡是需要描述一个文件或路径的时候，我们就使用File对象。除此以外，还有一项特别的安全措施：在每个子目录里都放入一个空文件；那个文件的名字指出在那个子目录里应找到多少个文件。

下面是完整的代码，后面会对它进行详细的说明：
```
//: CodePackager.java
// "Packs" and "unpacks" the code in "Thinking 
// in Java" for cross-platform distribution.
/* Commented so CodePackager sees it and starts
   a new chapter directory, but so you don't 
   have to worry about the directory where this
   program lives:
package c17;
*/
import java.util.*;
import java.io.*;

class Pr {
  static void error(String e) {
    System.err.println("ERROR: " + e);
    System.exit(1);
  }
}

class IO {
  static BufferedReader disOpen(File f) {
    BufferedReader in = null;
    try {
      in = new BufferedReader(
        new FileReader(f));
    } catch(IOException e) {
      Pr.error("could not open " + f);
    }
    return in;
  }
  static BufferedReader disOpen(String fname) {
    return disOpen(new File(fname));
  }
  static DataOutputStream dosOpen(File f) {
    DataOutputStream in = null;
    try {
      in = new DataOutputStream(
        new BufferedOutputStream(
          new FileOutputStream(f)));
    } catch(IOException e) {
      Pr.error("could not open " + f);
    }
    return in;
  }
  static DataOutputStream dosOpen(String fname) {
    return dosOpen(new File(fname));
  }
  static PrintWriter psOpen(File f) {
    PrintWriter in = null;
    try {
      in = new PrintWriter(
        new BufferedWriter(
          new FileWriter(f)));
    } catch(IOException e) {
      Pr.error("could not open " + f);
    }
    return in;
  }
  static PrintWriter psOpen(String fname) {
    return psOpen(new File(fname));
  }
  static void close(Writer os) {
    try {
      os.close();
    } catch(IOException e) {
      Pr.error("closing " + os);
    }
  }
  static void close(DataOutputStream os) {
    try {
      os.close();
    } catch(IOException e) {
      Pr.error("closing " + os);
    }
  }
  static void close(Reader os) {
    try {
      os.close();
    } catch(IOException e) {
      Pr.error("closing " + os);
    }
  }
}

class SourceCodeFile {
  public static final String 
    startMarker = "//:", // Start of source file
    endMarker = "} ///:~", // End of source
    endMarker2 = "}; ///:~", // C++ file end
    beginContinue = "} ///:Continued",
    endContinue = "///:Continuing",
    packMarker = "###", // Packed file header tag
    eol = // Line separator on current system
      System.getProperty("line.separator"),
    filesep = // System's file path separator
      System.getProperty("file.separator");
  public static String copyright = "";
  static {
    try {
      BufferedReader cr =
        new BufferedReader(
          new FileReader("Copyright.txt"));
      String crin;
      while((crin = cr.readLine()) != null)
        copyright += crin + "\n";
      cr.close();
    } catch(Exception e) {
      copyright = "";
    }
  }
  private String filename, dirname,
    contents = new String();
  private static String chapter = "c02";
  // The file name separator from the old system:
  public static String oldsep;
  public String toString() {
    return dirname + filesep + filename;
  }
  // Constructor for parsing from document file:
  public SourceCodeFile(String firstLine, 
      BufferedReader in) {
    dirname = chapter;
    // Skip past marker:
    filename = firstLine.substring(
        startMarker.length()).trim();
    // Find space that terminates file name:
    if(filename.indexOf(' ') != -1)
      filename = filename.substring(
          0, filename.indexOf(' '));
    System.out.println("found: " + filename);
    contents = firstLine + eol;
    if(copyright.length() != 0)
      contents += copyright + eol;
    String s;
    boolean foundEndMarker = false;
    try {
      while((s = in.readLine()) != null) {
        if(s.startsWith(startMarker))
          Pr.error("No end of file marker for " +
            filename);
        // For this program, no spaces before 
        // the "package" keyword are allowed
        // in the input source code:
        else if(s.startsWith("package")) {
          // Extract package name:
          String pdir = s.substring(
            s.indexOf(' ')).trim();
          pdir = pdir.substring(
            0, pdir.indexOf(';')).trim();
          // Capture the chapter from the package
          // ignoring the 'com' subdirectories:
          if(!pdir.startsWith("com")) {
            int firstDot = pdir.indexOf('.');
            if(firstDot != -1)
              chapter = 
                pdir.substring(0,firstDot);
            else
              chapter = pdir;
          }
          // Convert package name to path name:
          pdir = pdir.replace(
            '.', filesep.charAt(0));
          System.out.println("package " + pdir);
          dirname = pdir;
        }
        contents += s + eol;
        // Move past continuations:
        if(s.startsWith(beginContinue))
          while((s = in.readLine()) != null)
            if(s.startsWith(endContinue)) {
              contents += s + eol;
              break;
            }
        // Watch for end of code listing:
        if(s.startsWith(endMarker) ||
           s.startsWith(endMarker2)) {
          foundEndMarker = true;
          break;
        }
      }
      if(!foundEndMarker)
        Pr.error(
          "End marker not found before EOF");
      System.out.println("Chapter: " + chapter);
    } catch(IOException e) {
      Pr.error("Error reading line");
    }
  }
  // For recovering from a packed file:
  public SourceCodeFile(BufferedReader pFile) {
    try {
      String s = pFile.readLine();
      if(s == null) return;
      if(!s.startsWith(packMarker))
        Pr.error("Can't find " + packMarker
          + " in " + s);
      s = s.substring(
        packMarker.length()).trim();
      dirname = s.substring(0, s.indexOf('#'));
      filename = s.substring(s.indexOf('#') + 1);
      dirname = dirname.replace(
        oldsep.charAt(0), filesep.charAt(0));
      filename = filename.replace(
        oldsep.charAt(0), filesep.charAt(0));
      System.out.println("listing: " + dirname 
        + filesep + filename);
      while((s = pFile.readLine()) != null) {
        // Watch for end of code listing:
        if(s.startsWith(endMarker) ||
           s.startsWith(endMarker2)) {
          contents += s;
          break;
        }
        contents += s + eol;
      }
    } catch(IOException e) {
      System.err.println("Error reading line");
    }
  }
  public boolean hasFile() { 
    return filename != null; 
  }
  public String directory() { return dirname; }
  public String filename() { return filename; }
  public String contents() { return contents; }
  // To write to a packed file:
  public void writePacked(DataOutputStream out) {
    try {
      out.writeBytes(
        packMarker + dirname + "#" 
        + filename + eol);
      out.writeBytes(contents);
    } catch(IOException e) {
      Pr.error("writing " + dirname + 
        filesep + filename);
    }
  }
  // To generate the actual file:
  public void writeFile(String rootpath) {
    File path = new File(rootpath, dirname);
    path.mkdirs();
    PrintWriter p =
      IO.psOpen(new File(path, filename));
    p.print(contents);
    IO.close(p);
  }
}

class DirMap {
  private Hashtable t = new Hashtable();
  private String rootpath;
  DirMap() {
    rootpath = System.getProperty("user.dir");
  }
  DirMap(String alternateDir) {
    rootpath = alternateDir;
  }
  public void add(SourceCodeFile f){
    String path = f.directory();
    if(!t.containsKey(path))
      t.put(path, new Vector());
    ((Vector)t.get(path)).addElement(f);
  }
  public void writePackedFile(String fname) {
    DataOutputStream packed = IO.dosOpen(fname);
    try {
      packed.writeBytes("###Old Separator:" +
        SourceCodeFile.filesep + "###\n");
    } catch(IOException e) {
      Pr.error("Writing separator to " + fname);
    }
    Enumeration e = t.keys();
    while(e.hasMoreElements()) {
      String dir = (String)e.nextElement();
      System.out.println(
        "Writing directory " + dir);
      Vector v = (Vector)t.get(dir);
      for(int i = 0; i < v.size(); i++) {
        SourceCodeFile f = 
          (SourceCodeFile)v.elementAt(i);
        f.writePacked(packed);
      }
    }
    IO.close(packed);
  }
  // Write all the files in their directories:
  public void write() {
    Enumeration e = t.keys();
    while(e.hasMoreElements()) {
      String dir = (String)e.nextElement();
      Vector v = (Vector)t.get(dir);
      for(int i = 0; i < v.size(); i++) {
        SourceCodeFile f = 
          (SourceCodeFile)v.elementAt(i);
        f.writeFile(rootpath);
      }
      // Add file indicating file quantity
      // written to this directory as a check:
      IO.close(IO.dosOpen(
        new File(new File(rootpath, dir),
          Integer.toString(v.size())+".files")));
    }
  }
}

public class CodePackager {
  private static final String usageString =
  "usage: java CodePackager packedFileName" +
  "\nExtracts source code files from packed \n" +
  "version of Tjava.doc sources into " +
  "directories off current directory\n" +
  "java CodePackager packedFileName newDir\n" +
  "Extracts into directories off newDir\n" +
  "java CodePackager -p source.txt packedFile" +
  "\nCreates packed version of source files" +
  "\nfrom text version of Tjava.doc";
  private static void usage() {
    System.err.println(usageString);
    System.exit(1);
  }
  public static void main(String[] args) {
    if(args.length == 0) usage();
    if(args[0].equals("-p")) {
      if(args.length != 3)
        usage();
      createPackedFile(args);
    }
    else {
      if(args.length > 2)
        usage();
      extractPackedFile(args);
    }
  }
  private static String currentLine; 
  private static BufferedReader in;
  private static DirMap dm;
  private static void 
  createPackedFile(String[] args) {
    dm = new DirMap();
    in = IO.disOpen(args[1]);
    try {
      while((currentLine = in.readLine()) 
          != null) {
        if(currentLine.startsWith(
            SourceCodeFile.startMarker)) {
          dm.add(new SourceCodeFile(
                   currentLine, in));
        }
        else if(currentLine.startsWith(
            SourceCodeFile.endMarker))
          Pr.error("file has no start marker");
        // Else ignore the input line
      }
    } catch(IOException e) {
      Pr.error("Error reading " + args[1]);
    }
    IO.close(in);
    dm.writePackedFile(args[2]);
  }
  private static void 
  extractPackedFile(String[] args) {
    if(args.length == 2) // Alternate directory
      dm = new DirMap(args[1]);
    else // Current directory
      dm = new DirMap();
    in = IO.disOpen(args[0]);
    String s = null;
    try {
       s = in.readLine();
    } catch(IOException e) {
      Pr.error("Cannot read from " + in);
    }
    // Capture the separator used in the system
    // that packed the file:
    if(s.indexOf("###Old Separator:") != -1 ) {
      String oldsep = s.substring(
        "###Old Separator:".length());
      oldsep = oldsep.substring(
        0, oldsep. indexOf('#'));
      SourceCodeFile.oldsep = oldsep;
    }
    SourceCodeFile sf = new SourceCodeFile(in);
    while(sf.hasFile()) {
      dm.add(sf);
      sf = new SourceCodeFile(in);
    }
    dm.write();
  }
} ///:~
```
我们注意到package语句已经作为注释标志出来了。由于这是本章的第一个程序，所以package语句是必需的，用它告诉CodePackager已改换到另一章。但是把它放入包里却会成为一个问题。当我们创建一个包的时候，需要将结果程序同一个特定的目录结构联系在一起，这一做法对本书的大多数例子都是适用的。但在这里，CodePackager程序必须在一个专用的目录里编译和运行，所以package语句作为注释标记出去。但对CodePackager来说，它“看起来”依然象一个普通的package语句，因为程序还不是特别复杂，不能侦查到多行注释（没有必要做得这么复杂，这里只要求方便就行）。

头两个类是“支持／工具”类，作用是使程序剩余的部分在编写时更加连贯，也更便于阅读。第一个是Pr，它类似ANSI C的perror库，两者都能打印出一条错误提示消息（但同时也会退出程序）。第二个类将文件的创建过程封装在内，这个过程已在第10章介绍过了；大家已经知道，这样做很快就会变得非常累赘和麻烦。为解决这个问题，第10章提供的方案致力于新类的创建，但这儿的“静态”方法已经使用过了。在那些方法中，正常的违例会被捕获，并相应地进行处理。这些方法使剩余的代码显得更加清爽，更易阅读。

帮助解决问题的第一个类是SourceCodeFile（源码文件），它代表本书一个源码文件包含的所有信息（内容、文件名以及目录）。它同时还包含了一系列String常数，分别代表一个文件的开始与结束；在打包文件内使用的一个标记；当前系统的换行符；文件路径分隔符（注意要用System.getProperty()侦查本地版本是什么）；以及一大段版权声明，它是从下面这个Copyright.txt文件里提取出来的：

//////////////////////////////////////////////////
// Copyright (c) Bruce Eckel, 1998
// Source code file from the book "Thinking in Java"
// All rights reserved EXCEPT as allowed by the
// following statements: You may freely use this file
// for your own work (personal or commercial),
// including modifications and distribution in
// executable form only. Permission is granted to use
// this file in classroom situations, including its
// use in presentation materials, as long as the book
// "Thinking in Java" is cited as the source.
// Except in classroom situations, you may not copy
// and distribute this code; instead, the sole
// distribution point is http://www.BruceEckel.com
// (and official mirror sites) where it is
// freely available. You may not remove this
// copyright and notice. You may not distribute
// modified versions of the source code in this
// package. You may not use this file in printed
// media without the express permission of the
// author. Bruce Eckel makes no representation about
// the suitability of this software for any purpose.
// It is provided "as is" without express or implied
// warranty of any kind, including any implied
// warranty of merchantability, fitness for a
// particular purpose or non-infringement. The entire
// risk as to the quality and performance of the
// software is with you. Bruce Eckel and the
// publisher shall not be liable for any damages
// suffered by you or any third party as a result of
// using or distributing software. In no event will
// Bruce Eckel or the publisher be liable for any
// lost revenue, profit, or data, or for direct,
// indirect, special, consequential, incidental, or
// punitive damages, however caused and regardless of
// the theory of liability, arising out of the use of
// or inability to use software, even if Bruce Eckel
// and the publisher have been advised of the
// possibility of such damages. Should the software
// prove defective, you assume the cost of all
// necessary servicing, repair, or correction. If you
// think you've found an error, please email all
// modified files with clearly commented changes to:
// Bruce@EckelObjects.com. (please use the same
// address for non-code errors found in the book).
//////////////////////////////////////////////////
从一个打包文件中提取文件时，当初所用系统的文件分隔符也会标注出来，以便用本地系统适用的符号替换它。

当前章的子目录保存在chapter字段中，它初始化成c02（大家可注意一下第2章的列表正好没有包含一个打包语句）。只有在当前文件里发现一个package（打包）语句时，chapter字段才会发生改变。

构建一个打包文件
第一个构建器用于从本书的ASCII文本版里提取出一个文件。发出调用的代码（在列表里较深的地方）会读入并检查每一行，直到找到与一个列表的开头相符的为止。在这个时候，它就会新建一个SourceCodeFile对象，将第一行的内容（已经由调用代码读入了）传递给它，同时还要传递BufferedReader对象，以便在这个缓冲区中提取源码列表剩余的内容。

从这时起，大家会发现String方法被频繁运用。为提取出文件名，需调用substring()的过载版本，令其从一个起始偏移开始，一直读到字串的末尾，从而形成一个“子串”。为算出这个起始索引，先要用length()得出startMarker的总长，再用trim()删除字串头尾多余的空格。第一行在文件名后也可能有一些字符；它们是用indexOf()侦测出来的。若没有发现找到我们想寻找的字符，就返回-1；若找到那些字符，就返回它们第一次出现的位置。注意这也是indexOf()的一个过载版本，采用一个字串作为参数，而非一个字符。

解析出并保存好文件名后，第一行会被置入字串contents中（该字串用于保存源码清单的完整正文）。随后，将剩余的代码行读入，并合并进入contents字串。当然事情并没有想象的那么简单，因为特定的情况需加以特别的控制。一种情况是错误检查：若直接遇到一个startMarker（起始标记），表明当前操作的这个代码列表没有设置一个结束标记。这属于一个出错条件，需要退出程序。

另一种特殊情况与package关键字有关。尽管Java是一种自由形式的语言，但这个程序要求package关键字必须位于行首。若发现package关键字，就通过检查位于开头的空格以及位于末尾的分号，从而提取出包名（注意亦可一次单独的操作实现，方法是使用过载的substring()，令其同时检查起始和结束索引位置）。随后，将包名中的点号替换成特定的文件分隔符——当然，这里要假设文件分隔符仅有一个字符的长度。尽管这个假设可能对目前的所有系统都是适用的，但一旦遇到问题，一定不要忘了检查一下这里。默认操作是将每一行都连接到contents里，同时还有换行字符，直到遇到一个endMarker（结束标记）为止。该标记指出构建器应当停止了。若在endMarker之前遇到了文件结尾，就认为存在一个错误。

从打包文件中提取
第二个构建器用于将源码文件从打包文件中恢复（提取）出来。在这儿，作为调用者的方法不必担心会跳过一些中间文本。打包文件包含了所有源码文件，它们相互间紧密地靠在一起。需要传递给该构建器的仅仅是一个BufferedReader，它代表着“信息源”。构建器会从中提取出自己需要的信息。但在每个代码列表开始的地方还有一些配置信息，它们的身份是用packMarker（打包标记）指出的。若packMarker不存在，意味着调用者试图用错误的方法来使用这个构建器。

一旦发现packMarker，就会将其剥离出来，并提取出目录名（用一个'#'结尾）以及文件名（直到行末）。不管在哪种情况下，旧分隔符都会被替换成本地适用的一个分隔符，这是用String replace()方法实现的。老的分隔符被置于打包文件的开头，在代码列表稍靠后的一部分即可看到是如何把它提取出来的。

构建器剩下的部分就非常简单了。它读入每一行，把它合并到contents里，直到遇见endMarker为止。

程序列表的存取
接下来的一系列方法是简单的访问器：directory()、filename()（注意方法可能与字段有相同的拼写和大小写形式）和contents()。而hasFile()用于指出这个对象是否包含了一个文件（很快就会知道为什么需要这个）。

最后三个方法致力于将这个代码列表写进一个文件——要么通过writePacked()写入一个打包文件，要么通过writeFile()写入一个Java源码文件。writePacked()需要的唯一东西就是DataOutputStream，它是在别的地方打开的，代表着准备写入的文件。它先把头信息置入第一行，再调用writeBytes()将contents（内容）写成一种“通用”格式。

准备写Java源码文件时，必须先把文件建好。这是用IO.psOpen()实现的。我们需要向它传递一个File对象，其中不仅包含了文件名，也包含了路径信息。但现在的问题是：这个路径实际存在吗？用户可能决定将所有源码目录都置入一个完全不同的子目录，那个目录可能是尚不存在的。所以在正式写每个文件之前，都要调用File.mkdirs()方法，建好我们想向其中写入文件的目录路径。它可一次性建好整个路径。

整套列表的包容
以子目录的形式组织代码列表是非常方便的，尽管这要求先在内存中建好整套列表。之所以要这样做，还有另一个很有说服力的原因：为了构建更“健康”的系统。也就是说，在创建代码列表的每个子目录时，都会加入一个额外的文件，它的名字包含了那个目录内应有的文件数目。

DirMap类可帮助我们实现这一效果，并有效地演示了一个“多重映射”的概述。这是通过一个散列表（Hashtable）实现的，它的“键”是准备创建的子目录，而“值”是包含了那个特定目录中的SourceCodeFile对象的Vector对象。所以，我们在这儿并不是将一个“键”映射（或对应）到一个值，而是通过对应的Vector，将一个键“多重映射”到一系列值。尽管这听起来似乎很复杂，但具体实现时却是非常简单和直接的。大家可以看到，DirMap类的大多数代码都与向文件中的写入有关，而非与“多重映射”有关。与它有关的代码仅极少数而已。

可通过两种方式建立一个DirMap（目录映射或对应）关系：默认构建器假定我们希望目录从当前位置向下展开，而另一个构建器让我们为起始目录指定一个备用的“绝对”路径。

add()方法是一个采取的行动比较密集的场所。首先将directory()从我们想添加的SourceCodeFile里提取出来，然后检查散列表（Hashtable），看看其中是否已经包含了那个键。如果没有，就向散列表加入一个新的Vector，并将它同那个键关联到一起。到这时，不管采取的是什么途径，Vector都已经就位了，可以将它提取出来，以便添加SourceCodeFile。由于Vector可象这样同散列表方便地合并到一起，所以我们从两方面都能感觉得非常方便。

写一个打包文件时，需打开一个准备写入的文件（当作DataOutputStream打开，使数据具有“通用”性），并在第一行写入与老的分隔符有关的头信息。接着产生对Hashtable键的一个Enumeration（枚举），并遍历其中，选择每一个目录，并取得与那个目录有关的Vector，使那个Vector中的每个SourceCodeFile都能写入打包文件中。

用write()将Java源码文件写入它们对应的目录时，采用的方法几乎与writePackedFile()完全一致，因为两个方法都只需简单调用SourceCodeFile中适当的方法。但在这里，根路径会传递给SourceCodeFile.writeFile()。所有文件都写好后，名字中指定了已写文件数量的那个附加文件也会被写入。

主程序
前面介绍的那些类都要在CodePackager中用到。大家首先看到的是用法字串。一旦最终用户不正确地调用了程序，就会打印出介绍正确用法的这个字串。调用这个字串的是usage()方法，同时还要退出程序。main()唯一的任务就是判断我们希望创建一个打包文件，还是希望从一个打包文件中提取什么东西。随后，它负责保证使用的是正确的参数，并调用适当的方法。

创建一个打包文件时，它默认位于当前目录，所以我们用默认构建器创建DirMap。打开文件后，其中的每一行都会读入，并检查是否符合特殊的条件：

(1) 若行首是一个用于源码列表的起始标记，就新建一个SourceCodeFile对象。构建器会读入源码列表剩下的所有内容。结果产生的句柄将直接加入DirMap。

(2) 若行首是一个用于源码列表的结束标记，表明某个地方出现错误，因为结束标记应当只能由SourceCodeFile构建器发现。

提取／释放一个打包文件时，提取出来的内容可进入当前目录，亦可进入另一个备用目录。所以需要相应地创建DirMap对象。打开文件，并将第一行读入。老的文件路径分隔符信息将从这一行中提取出来。随后根据输入来创建第一个SourceCodeFile对象，它会加入DirMap。只要包含了一个文件，新的SourceCodeFile对象就会创建并加入（创建的最后一个用光输入内容后，会简单地返回，然后hasFile()会返回一个错误）。

17.1.2 检查大小写样式
尽管对涉及文字处理的一些项目来说，前例显得比较方便，但下面要介绍的项目却能立即发挥作用，因为它执行的是一个样式检查，以确保我们的大小写形式符合“事实上”的Java样式标准。它会在当前目录中打开每个.java文件，并提取出所有类名以及标识符。若发现有不符合Java样式的情况，就向我们提出报告。

为了让这个程序正确运行，首先必须构建一个类名，将它作为一个“仓库”，负责容纳标准Java库中的所有类名。为达到这个目的，需遍历用于标准Java库的所有源码子目录，并在每个子目录都运行ClassScanner。至于参数，则提供仓库文件的名字（每次都用相同的路径和名字）和命令行开关-a，指出类名应当添加到该仓库文件中。

为了用程序检查自己的代码，需要运行它，并向它传递要使用的仓库文件的路径与名字。它会检查当前目录中的所有类和标识符，并告诉我们哪些没有遵守典型的Java大写写规范。

要注意这个程序并不是十全十美的。有些时候，它可能报告自己查到一个问题。但当我们仔细检查代码的时候，却发现没有什么需要更改的。尽管这有点儿烦人，但仍比自己动手检查代码中的所有错误强得多。

下面列出源代码，后面有详细的解释：
```
//: ClassScanner.java
// Scans all files in directory for classes
// and identifiers, to check capitalization.
// Assumes properly compiling code listings.
// Doesn't do everything right, but is a very
// useful aid.
import java.io.*;
import java.util.*;

class MultiStringMap extends Hashtable {
  public void add(String key, String value) {
    if(!containsKey(key))
      put(key, new Vector());
    ((Vector)get(key)).addElement(value);
  }
  public Vector getVector(String key) {
    if(!containsKey(key)) {
      System.err.println(
        "ERROR: can't find key: " + key);
      System.exit(1);
    }
    return (Vector)get(key);
  }
  public void printValues(PrintStream p) {
    Enumeration k = keys();
    while(k.hasMoreElements()) {
      String oneKey = (String)k.nextElement();
      Vector val = getVector(oneKey);
      for(int i = 0; i < val.size(); i++)
        p.println((String)val.elementAt(i));
    }
  }
}

public class ClassScanner {
  private File path;
  private String[] fileList;
  private Properties classes = new Properties();
  private MultiStringMap 
    classMap = new MultiStringMap(),
    identMap = new MultiStringMap();
  private StreamTokenizer in;
  public ClassScanner() {
    path = new File(".");
    fileList = path.list(new JavaFilter());
    for(int i = 0; i < fileList.length; i++) {
      System.out.println(fileList[i]);
      scanListing(fileList[i]);
    }
  }
  void scanListing(String fname) {
    try {
      in = new StreamTokenizer(
          new BufferedReader(
            new FileReader(fname)));
      // Doesn't seem to work:
      // in.slashStarComments(true);
      // in.slashSlashComments(true);
      in.ordinaryChar('/');
      in.ordinaryChar('.');
      in.wordChars('_', '_');
      in.eolIsSignificant(true);
      while(in.nextToken() != 
            StreamTokenizer.TT_EOF) {
        if(in.ttype == '/')
          eatComments();
        else if(in.ttype == 
                StreamTokenizer.TT_WORD) {
          if(in.sval.equals("class") || 
             in.sval.equals("interface")) {
            // Get class name:
               while(in.nextToken() != 
                     StreamTokenizer.TT_EOF
                     && in.ttype != 
                     StreamTokenizer.TT_WORD)
                 ;
               classes.put(in.sval, in.sval);
               classMap.add(fname, in.sval);
          }
          if(in.sval.equals("import") ||
             in.sval.equals("package"))
            discardLine();
          else // It's an identifier or keyword
            identMap.add(fname, in.sval);
        }
      }
    } catch(IOException e) {
      e.printStackTrace();
    }
  }
  void discardLine() {
    try {
      while(in.nextToken() != 
            StreamTokenizer.TT_EOF
            && in.ttype != 
            StreamTokenizer.TT_EOL)
        ; // Throw away tokens to end of line
    } catch(IOException e) {
      e.printStackTrace();
    }
  }
  // StreamTokenizer's comment removal seemed
  // to be broken. This extracts them:
  void eatComments() {
    try {
      if(in.nextToken() != 
         StreamTokenizer.TT_EOF) {
        if(in.ttype == '/')
          discardLine();
        else if(in.ttype != '*')
          in.pushBack();
        else 
          while(true) {
            if(in.nextToken() == 
              StreamTokenizer.TT_EOF)
              break;
            if(in.ttype == '*')
              if(in.nextToken() != 
                StreamTokenizer.TT_EOF
                && in.ttype == '/')
                break;
          }
      }
    } catch(IOException e) {
      e.printStackTrace();
    }
  }
  public String[] classNames() {
    String[] result = new String[classes.size()];
    Enumeration e = classes.keys();
    int i = 0;
    while(e.hasMoreElements())
      result[i++] = (String)e.nextElement();
    return result;
  }
  public void checkClassNames() {
    Enumeration files = classMap.keys();
    while(files.hasMoreElements()) {
      String file = (String)files.nextElement();
      Vector cls = classMap.getVector(file);
      for(int i = 0; i < cls.size(); i++) {
        String className = 
          (String)cls.elementAt(i);
        if(Character.isLowerCase(
             className.charAt(0)))
          System.out.println(
            "class capitalization error, file: "
            + file + ", class: " 
            + className);
      }
    }
  }
  public void checkIdentNames() {
    Enumeration files = identMap.keys();
    Vector reportSet = new Vector();
    while(files.hasMoreElements()) {
      String file = (String)files.nextElement();
      Vector ids = identMap.getVector(file);
      for(int i = 0; i < ids.size(); i++) {
        String id = 
          (String)ids.elementAt(i);
        if(!classes.contains(id)) {
          // Ignore identifiers of length 3 or
          // longer that are all uppercase
          // (probably static final values):
          if(id.length() >= 3 &&
             id.equals(
               id.toUpperCase()))
            continue;
          // Check to see if first char is upper:
          if(Character.isUpperCase(id.charAt(0))){
            if(reportSet.indexOf(file + id)
                == -1){ // Not reported yet
              reportSet.addElement(file + id);
              System.out.println(
                "Ident capitalization error in:"
                + file + ", ident: " + id);
            }
          }
        }
      }
    }
  }
  static final String usage =
    "Usage: \n" + 
    "ClassScanner classnames -a\n" +
    "\tAdds all the class names in this \n" +
    "\tdirectory to the repository file \n" +
    "\tcalled 'classnames'\n" +
    "ClassScanner classnames\n" +
    "\tChecks all the java files in this \n" +
    "\tdirectory for capitalization errors, \n" +
    "\tusing the repository file 'classnames'";
  private static void usage() {
    System.err.println(usage);
    System.exit(1);
  }
  public static void main(String[] args) {
    if(args.length < 1 || args.length > 2)
      usage();
    ClassScanner c = new ClassScanner();
    File old = new File(args[0]);
    if(old.exists()) {
      try {
        // Try to open an existing 
        // properties file:
        InputStream oldlist =
          new BufferedInputStream(
            new FileInputStream(old));
        c.classes.load(oldlist);
        oldlist.close();
      } catch(IOException e) {
        System.err.println("Could not open "
          + old + " for reading");
        System.exit(1);
      }
    }
    if(args.length == 1) {
      c.checkClassNames();
      c.checkIdentNames();
    }
    // Write the class names to a repository:
    if(args.length == 2) {
      if(!args[1].equals("-a"))
        usage();
      try {
        BufferedOutputStream out =
          new BufferedOutputStream(
            new FileOutputStream(args[0]));
        c.classes.save(out,
          "Classes found by ClassScanner.java");
        out.close();
      } catch(IOException e) {
        System.err.println(
          "Could not write " + args[0]);
        System.exit(1);
      }
    }
  }
}

class JavaFilter implements FilenameFilter {
  public boolean accept(File dir, String name) {
    // Strip path information:
    String f = new File(name).getName();
    return f.trim().endsWith(".java");
  }
} ///:~
```
MultiStringMap类是个特殊的工具，允许我们将一组字串与每个键项对应（映射）起来。和前例一样，这里也使用了一个散列表（Hashtable），不过这次设置了继承。该散列表将键作为映射成为Vector值的单一的字串对待。add()方法的作用很简单，负责检查散列表里是否存在一个键。如果不存在，就在其中放置一个。getVector()方法为一个特定的键产生一个Vector；而printValues()将所有值逐个Vector地打印出来，这对程序的调试非常有用。

为简化程序，来自标准Java库的类名全都置入一个Properties（属性）对象中（来自标准Java库）。记住Properties对象实际是个散列表，其中只容纳了用于键和值项的String对象。然而仅需一次方法调用，我们即可把它保存到磁盘，或者从磁盘中恢复。实际上，我们只需要一个名字列表，所以为键和值都使用了相同的对象。

针对特定目录中的文件，为找出相应的类与标识符，我们使用了两个MultiStringMap：classMap以及identMap。此外在程序启动的时候，它会将标准类名仓库装载到名为classes的Properties对象中。一旦在本地目录发现了一个新类名，也会将其加入classes以及classMap。这样一来，classMap就可用于在本地目录的所有类间遍历，而且可用classes检查当前标记是不是一个类名（它标记着对象或方法定义的开始，所以收集接下去的记号——直到碰到一个分号——并将它们都置入identMap）。

ClassScanner的默认构建器会创建一个由文件名构成的列表（采用FilenameFilter的JavaFilter实现形式，参见第10章）。随后会为每个文件名都调用scanListing()。

在scanListing()内部，会打开源码文件，并将其转换成一个StreamTokenizer。根据Java帮助文档，将true传递给slashStartComments()和slashSlashComments()的本意应当是剥除那些注释内容，但这样做似乎有些问题（在Java 1.0中几乎无效）。所以相反，那些行被当作注释标记出去，并用另一个方法来提取注释。为达到这个目的，'/'必须作为一个原始字符捕获，而不是让StreamTokeinzer将其当作注释的一部分对待。此时要用ordinaryChar()方法指示StreamTokenizer采取正确的操作。同样的道理也适用于点号（'.'），因为我们希望让方法调用分离出单独的标识符。但对下划线来说，它最初是被StreamTokenizer当作一个单独的字符对待的，但此时应把它留作标识符的一部分，因为它在static final值中以TT_EOF等等形式使用。当然，这一点只对目前这个特殊的程序成立。wordChars()方法需要取得我们想添加的一系列字符，把它们留在作为一个单词看待的记号中。最后，在解析单行注释或者放弃一行的时候，我们需要知道一个换行动作什么时候发生。所以通过调用eollsSignificant(true)，换行符（EOL）会被显示出来，而不是被StreamTokenizer吸收。

scanListing()剩余的部分将读入和检查记号，直至文件尾。一旦nextToken()返回一个final static值——StreamTokenizer.TT_EOF，就标志着已经抵达文件尾部。

若记号是个'/'，意味着它可能是个注释，所以就调用eatComments()，对这种情况进行处理。我们在这儿唯一感兴趣的其他情况是它是否为一个单词，当然还可能存在另一些特殊情况。

如果单词是class（类）或interface（接口），那么接着的记号就应当代表一个类或接口名字，并将其置入classes和classMap。若单词是import或者package，那么我们对这一行剩下的东西就没什么兴趣了。其他所有东西肯定是一个标识符（这是我们感兴趣的），或者是一个关键字（对此不感兴趣，但它们采用的肯定是小写形式，所以不必兴师动众地检查它们）。它们将加入到identMap。

discardLine()方法是一个简单的工具，用于查找行末位置。注意每次得到一个新记号时，都必须检查行末。

只要在主解析循环中碰到一个正斜杠，就会调用eatComments()方法。然而，这并不表示肯定遇到了一条注释，所以必须将接着的记号提取出来，检查它是一个正斜杠（那么这一行会被丢弃），还是一个星号。但假如两者都不是，意味着必须在主解析循环中将刚才取出的记号送回去！幸运的是，pushBack()方法允许我们将当前记号“压回”输入数据流。所以在主解析循环调用nextToken()的时候，它能正确地得到刚才送回的东西。

为方便起见，classNames()方法产生了一个数组，其中包含了classes集合中的所有名字。这个方法未在程序中使用，但对代码的调试非常有用。

接下来的两个方法是实际进行检查的地方。在checkClassNames()中，类名从classMap提取出来（请记住，classMap只包含了这个目录内的名字，它们按文件名组织，所以文件名可能伴随错误的类名打印出来）。为做到这一点，需要取出每个关联的Vector，并遍历其中，检查第一个字符是否为小写。若确实为小写，则打印出相应的出错提示消息。

在checkIdentNames()中，我们采用了一种类似的方法：每个标识符名字都从identMap中提取出来。如果名字不在classes列表中，就认为它是一个标识符或者关键字。此时会检查一种特殊情况：如果标识符的长度等于3或者更长，而且所有字符都是大写的，则忽略此标识符，因为它可能是一个static final值，比如TT_EOF。当然，这并不是一种完美的算法，但它假定我们最终会注意到任何全大写标识符都是不合适的。

这个方法并不是报告每一个以大写字符开头的标识符，而是跟踪那些已在一个名为reportSet()的Vector中报告过的。它将Vector当作一个“集合”对待，告诉我们一个项目是否已在那个集合中。该项目是通过将文件名和标识符连接起来生成的。若元素不在集合中，就加入它，然后产生报告。

程序列表剩下的部分由main()构成，它负责控制命令行参数，并判断我们是准备在标准Java库的基础上构建由一系列类名构成的“仓库”，还是想检查已写好的那些代码的正确性。不管在哪种情况下，都会创建一个ClassScanner对象。

无论准备构建一个“仓库”，还是准备使用一个现成的，都必须尝试打开现有仓库。通过创建一个File对象并测试是否存在，就可决定是否打开文件并在ClassScanner中装载classes这个Properties列表（使用load()）。来自仓库的类将追加到由ClassScanner构建器发现的类后面，而不是将其覆盖。如果仅提供一个命令行参数，就意味着自己想对类名和标识符名字进行一次检查。但假如提供两个参数（第二个是"-a"），就表明自己想构成一个类名仓库。在这种情况下，需要打开一个输出文件，并用Properties.save()方法将列表写入一个文件，同时用一个字串提供文件头信息。
留下你的读书笔记
你还没登录，点击这里
 用户笔记留言