Java's Pattern Class Explained

Published: 2023-03-25 14:17:10 | Category: Tutorials

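Java added support for regular expressions in JDK 1.4.

Java's regex tooling lives mainly in the java.util.regex package, whose two main classes are Pattern and Matcher.

Pattern

Declaration: public final class Pattern implements java.io.Serializable

Pattern is declared final, so it cannot be subclassed.

Meaning: the pattern class, a compiled representation of a regular expression.

Note: instances of this class are immutable and are safe for use by multiple concurrent threads.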
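
The thread-safety note can be illustrated with a small sketch: one compiled Pattern is shared by several tasks, while each call creates its own Matcher, since Matcher instances are not safe for concurrent use. The class name SharedPatternDemo, the countDigitRuns helper, and the sample inputs are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SharedPatternDemo {
    // Compiled once; sharing is safe because Pattern instances are immutable.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: counts the runs of digits in the input.
    static int countDigitRuns(CharSequence input) {
        // A fresh Matcher per call; Matcher itself is not thread-safe.
        Matcher m = DIGITS.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (String s : new String[] {"a1b22c333", "no digits", "7 8 9"}) {
                Callable<Integer> task = () -> countDigitRuns(s);
                results.add(pool.submit(task));
            }
            for (Future<Integer> f : results) {
                System.out.println(f.get()); // 3, then 0, then 3
            }
        } finally {
            pool.shutdown();
        }
    }
}
```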

Fields:

/** Enables Unix lines mode. */
public static final int UNIX_LINES = 0x01;

/** Enables case-insensitive matching. */
public static final int CASE_INSENSITIVE = 0x02;

/** Permits whitespace and comments in the pattern. */
public static final int COMMENTS = 0x04;

/** Enables multiline mode. */
public static final int MULTILINE = 0x08;

/** Enables literal parsing of the pattern. */
public static final int LITERAL = 0x10;

/** Enables dotall mode. */
public static final int DOTALL = 0x20;

/** Enables Unicode-aware case folding. */
public static final int UNICODE_CASE = 0x40;

/** Enables canonical equivalence. */
public static final int CANON_EQ = 0x80;

private static final long serialVersionUID = 5073258162644648461L;

/** The original regular-expression pattern string. */
private String pattern;

/** The original pattern flags. */
private int flags;

/**
 * Boolean indicating this Pattern is compiled; this is necessary in order
 * to lazily compile deserialized Patterns.
 */
private transient volatile boolean compiled = false;

/** The normalized pattern string. */
private transient String normalizedPattern;

/**
 * The starting point of state machine for the find operation. This allows
 * a match to start anywhere in the input.
 */
transient Node root;

/**
 * The root of object tree for a match operation. The pattern is matched
 * at the beginning. This may include a find that uses BnM or a First
 * node.
 */
transient Node matchRoot;

/** Temporary storage used by parsing pattern slice. */
transient int[] buffer;

/** Temporary storage used while parsing group references. */
transient GroupHead[] groupNodes;

/** Temporary null terminated code point array used by pattern compiling. */
private transient int[] temp;

/**
 * The number of capturing groups in this Pattern. Used by matchers to
 * allocate storage needed to perform a match.
 */
transient int capturingGroupCount;

/**
 * The local variable count used by parsing tree. Used by matchers to
 * allocate storage needed to perform a match.
 */
transient int localCount;

/**
 * Index into the pattern string that keeps track of how much has been
 * parsed.
 */
private transient int cursor;

/** Holds the length of the pattern string. */
private transient int patternLength;
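
The one-line descriptions of the flag constants are terse, so here is a minimal sketch of what three of them change in practice. The class name FlagEffectsDemo, the find helper, and the sample strings are assumptions for illustration; only Pattern.compile, matcher, and find are real API calls.

```java
import java.util.regex.Pattern;

class FlagEffectsDemo {
    // Hypothetical helper: compiles regex with the given flags and searches the input.
    static boolean find(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "first\nsecond";

        // MULTILINE: ^ and $ also match at line boundaries, not only at the ends of the input.
        System.out.println(find("^second$", 0, text));                  // false
        System.out.println(find("^second$", Pattern.MULTILINE, text));  // true

        // DOTALL: '.' also matches line terminators.
        System.out.println(find("first.second", 0, text));              // false
        System.out.println(find("first.second", Pattern.DOTALL, text)); // true

        // LITERAL: metacharacters such as '.' lose their special meaning.
        System.out.println(find("a.c", Pattern.LITERAL, "abc"));        // false
        System.out.println(find("a.c", Pattern.LITERAL, "a.c"));        // true
    }
}
```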

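Groups and capturing
Capturing groups are numbered by counting their opening parentheses from left to right.

In the expression ((A)(B(C))), there are four such groups:

1   ABC
2   A
3   BC
4   C
Group zero always stands for the entire expression.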
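
The numbering above can be checked with a short sketch that matches the same expression against "ABC" and prints every group. The class name GroupNumberingDemo and the groupsOf helper are assumptions for illustration; group and groupCount are the real Matcher methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Returns groups 0 through groupCount() for the nested expression from the text,
    // or an empty array if the input does not match.
    static String[] groupsOf(String input) {
        Matcher m = Pattern.compile("((A)(B(C)))").matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        // groupCount() excludes group zero, hence the + 1.
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] groups = groupsOf("ABC");
        for (int i = 0; i < groups.length; i++) {
            System.out.println(i + " -> " + groups[i]);
        }
        // 0 -> ABC
        // 1 -> ABC
        // 2 -> A
        // 3 -> BC
        // 4 -> C
    }
}
```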

private Pattern(String p, int f) {
    pattern = p;
    flags = f;
    // Reset group index count
    capturingGroupCount = 1;
    localCount = 0;
    if (pattern.length() > 0) {
        compile();
    } else {
        root = new Start(lastAccept);
        matchRoot = lastAccept;
    }
}
The constructor is private, so a Pattern object cannot be created with new.
How, then, do we obtain an instance of Pattern?

Looking through the class's methods turns up:

public static Pattern compile(String regex) {
    return new Pattern(regex, 0);
}
public static Pattern compile(String regex, int flags) {
    return new Pattern(regex, flags);
}
So a Pattern instance is obtained by calling the static compile method on Pattern.
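
As a brief sketch of the two factory overloads in use: the one-argument form compiles with no flags, while the two-argument form accepts flag constants such as CASE_INSENSITIVE. The class name CompileDemo and the matchesWith helper are hypothetical names introduced for this example.

```java
import java.util.regex.Pattern;

class CompileDemo {
    // Hypothetical helper: compiles with the given flags and tests a full match.
    static boolean matchesWith(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // One-argument overload: no flags, so matching is case-sensitive.
        Pattern strict = Pattern.compile("a*b");
        // Two-argument overload: CASE_INSENSITIVE lets "AAB" match as well.
        Pattern relaxed = Pattern.compile("a*b", Pattern.CASE_INSENSITIVE);

        System.out.println(strict.matcher("AAB").matches());  // false
        System.out.println(relaxed.matcher("AAB").matches()); // true
    }
}
```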

Some of the other methods:
1. public Matcher matcher(CharSequence input)

Creates a matcher that will match the given input against this pattern, and returns that new matcher.

public Matcher matcher(CharSequence input) {
    if (!compiled) {
        synchronized (this) {
            if (!compiled)
                compile();
        }
    }
    Matcher m = new Matcher(this, input);
    return m;
}
2. public static boolean matches(String regex, CharSequence input)

Compiles the given regular expression and attempts to match the given input against it.

public static boolean matches(String regex, CharSequence input) {
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(input);
    return m.matches();
}
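Test:

Code 1 (based on the example in the JDK 1.6 API documentation):

Pattern p = Pattern.compile("a*b");
Matcher m = p.matcher("aaaaab");
boolean b = m.matches();
System.out.println(b); // true
Code 2:

System.out.println(Pattern.matches("a*b", "aaaaab")); // true
Reading the matcher and matches methods shows that matches performs the compile and match steps itself, so Code 2 can be seen as a shorthand for Code 1; the two are equivalent.
If a pattern is going to be used many times, compiling it once and reusing it is more efficient than calling this method each time.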
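
The advice on reuse might look like the following sketch, where the pattern is compiled once into a constant and then applied to several inputs. The class name ReusePatternDemo, the countMatches helper, and the input list are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class ReusePatternDemo {
    // Compiled once and reused for every input.
    private static final Pattern A_STAR_B = Pattern.compile("a*b");

    // Hypothetical helper: counts the inputs that match the pattern in full.
    static int countMatches(List<String> inputs) {
        int count = 0;
        for (String s : inputs) {
            // Same result as Pattern.matches("a*b", s), without recompiling on each iteration.
            if (A_STAR_B.matcher(s).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("aaaaab", "b", "ba", "aab");
        System.out.println(countMatches(inputs)); // 3 ("ba" does not match)
    }
}
```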

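3. public String[] split(CharSequence input) and public String[] split(CharSequence input, int limit)

input: the character sequence to split;

limit: the result threshold;

Splits the input sequence around matches of this pattern.
What the limit parameter does:

The limit parameter controls how many times the pattern is applied and therefore affects the length of the resulting array. In the rules below, n is the value of limit.

If n is greater than zero, the pattern is applied at most n - 1 times, the array's length is no greater than n, and the array's last entry contains all input beyond the last matched delimiter.

If n is non-positive, the pattern is applied as many times as possible and the array can have any length.

If n is zero, the pattern is applied as many times as possible, the array can have any length, and trailing empty strings are discarded.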

The source of split(CharSequence input):

public String[] split(CharSequence input) {
    return split(input, 0);
}
So split(CharSequence input) simply calls split(CharSequence input, int limit); the rest of this section discusses only split(CharSequence input, int limit).

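Suppose:

input = "boo:and:foo" and the delimiter pattern is "o". The pattern can then be applied at most 4 times, so the array has at most 5 elements.

1. When limit = -2, the number of applications is unlimited and the array can have any length. Prediction: the pattern is applied 4 times and the array has 5 elements: {"b","",":and:f","",""}.

2. When limit = 2, the pattern is applied at most once and the array's length is no greater than 2, with the second element holding all input beyond the last matched delimiter. Prediction: the pattern is applied once and the array has 2 elements: {"b","o:and:foo"}.

3. When limit = 7, the pattern is applied at most 6 times and the array's length is no greater than 7. Prediction: the pattern is applied 4 times and the array has 5 elements: {"b","",":and:f","",""}.

4. When limit = 0, the number of applications is unlimited, the array can have any length, and trailing empty strings are discarded. Prediction: the pattern is applied 4 times and the array has 3 elements: {"b","",":and:f"}.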

public static void main(String[] args) {
    String[] arr = null;
    CharSequence input = "boo:and:foo";
    Pattern p = Pattern.compile("o");
    arr = p.split(input, -2);
    System.out.println(printArr(arr)); // {"b","",":and:f","",""}, 5 elements in total
    arr = p.split(input, 2);
    System.out.println(printArr(arr)); // {"b","o:and:foo"}, 2 elements in total
    arr = p.split(input, 7);
    System.out.println(printArr(arr)); // {"b","",":and:f","",""}, 5 elements in total
    arr = p.split(input, 0);
    System.out.println(printArr(arr)); // {"b","",":and:f"}, 3 elements in total
}
// Formats a String array for printing
static String printArr(String[] arr) {
    int length = arr.length;
    StringBuilder sb = new StringBuilder();
    sb.append("{");
    for (int i = 0; i < length; i++) {
        sb.append("\"").append(arr[i]).append("\"");
        if (i != length - 1)
            sb.append(",");
    }
    sb.append("}").append(", " + length + " elements in total");
    return sb.toString();
}
The output matches the predictions above.
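4. toString() and pattern()

The two methods have identical bodies: both return the string representation of this pattern, that is, the regular expression it was compiled from.

public String toString() {
    return pattern;
}
public String pattern() {
    return pattern;
}
Test:

Pattern p = Pattern.compile("\\d+");
System.out.println(p.toString()); // prints \d+
System.out.println(p.pattern()); // prints \d+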
5. public int flags()

public int flags() {
    return flags;
}
Test:

Pattern p = Pattern.compile("a+", Pattern.CASE_INSENSITIVE);
System.out.println(p.flags()); // 2
From the Pattern source code:

public static final int CASE_INSENSITIVE = 0x02;
CASE_INSENSITIVE equals 2, so the test prints 2.
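
Because each flag constant is a distinct bit, several flags can be combined with bitwise OR and read back from flags() with a bit mask. The sketch below assumes a hypothetical hasFlag helper and class name FlagsDemo; flags() and the constants are the real API.

```java
import java.util.regex.Pattern;

class FlagsDemo {
    // Hypothetical helper: true if the given flag bit is set on the pattern.
    static boolean hasFlag(Pattern p, int flag) {
        return (p.flags() & flag) != 0;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("a+", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        System.out.println(p.flags());                            // 10, i.e. 0x02 | 0x08
        System.out.println(hasFlag(p, Pattern.CASE_INSENSITIVE)); // true
        System.out.println(hasFlag(p, Pattern.DOTALL));           // false
    }
}
```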

(Editor: 汽车网)