Java Notes: java.lang.String#trim

cc11001100 2018-08-06 22:53

 

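String's trim() method gets used all the time. Until recently, though, I was not sure how it treats line breaks when it strips whitespace from both ends, so I opened the source to check. It turned out that String#trim is implemented nothing like I had imagined, and that I had misunderstood this method all along.

I had imagined trim to look something like this: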

package cc11001100.trimStudy;

/**
 * @author CC11001100
 */
public class CustomString {

	private char[] values;

	public CustomString(char[] values) {
		this.values = values;
	}

	// ...

	public CustomString trim() {
		char[] localValues = values;
		int left = 0, right = localValues.length;
		while (left < right && isBlankChar(localValues[left])) {
			left++;
		}
		while (right > left && isBlankChar(localValues[right - 1])) {
			right--;
		}
		if (left != 0 || right != localValues.length) {
			char[] newValue = new char[right - left];
			System.arraycopy(localValues, left, newValue, 0, newValue.length);
			return new CustomString(newValue);
		} else {
			return this;
		}
	}

	private boolean isBlankChar(char c) {
		return c == ' ' || c == '\t' || c == '\r' || c == '\n';
	}

	@Override
	public String toString() {
		return new java.lang.String(values);
	}

	// ...

}

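That is, it would strip spaces, tabs, carriage returns and line feeds from both ends of the string. The actual implementation of String#trim, however, looks like this: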

/**
 * Returns a string whose value is this string, with any leading and trailing
 * whitespace removed.
 * <p>
 * If this {@code String} object represents an empty character
 * sequence, or the first and last characters of character sequence
 * represented by this {@code String} object both have codes
 * greater than {@code '\u005Cu0020'} (the space character), then a
 * reference to this {@code String} object is returned.
 * <p>
 * Otherwise, if there is no character with a code greater than
 * {@code '\u005Cu0020'} in the string, then a
 * {@code String} object representing an empty string is
 * returned.
 * <p>
 * Otherwise, let <i>k</i> be the index of the first character in the
 * string whose code is greater than {@code '\u005Cu0020'}, and let
 * <i>m</i> be the index of the last character in the string whose code
 * is greater than {@code '\u005Cu0020'}. A {@code String}
 * object is returned, representing the substring of this string that
 * begins with the character at index <i>k</i> and ends with the
 * character at index <i>m</i>-that is, the result of
 * {@code this.substring(k, m + 1)}.
 * <p>
 * This method may be used to trim whitespace (as defined above) from
 * the beginning and end of a string.
 *
 * @return  A string whose value is this string, with any leading and trailing white
 *          space removed, or this string if it has no leading or
 *          trailing white space.
 */
public String trim() {
    int len = value.length;
    int st = 0;
    char[] val = value;    /* avoid getfield opcode */

    while ((st < len) && (val[st] <= ' ')) {
        st++;
    }
    while ((st < len) && (val[len - 1] <= ' ')) {
        len--;
    }
    return ((st > 0) || (len < value.length)) ? substring(st, len) : this;
}

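It strips every character less than or equal to the space character from both ends of the string. Here \u005Cu0020 can simply be read as ASCII 0x20, decimal 32: every character whose ASCII code is 32 or lower gets removed: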

[Image: ASCII code table]

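First, the characters that trim definitely has to remove:

\t is 9

\r is 13

\n is 10

All of these are indeed below the space. Codes 0 through 31 are non-printing control characters and 32 is the space, so this approach is not really a problem. Just remember, whenever you call trim, to ask whether your data might begin or end with a character at or below 32 that is not a space, tab or line break and that you need to keep.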
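To make the "32 or lower" rule concrete, here is a minimal sketch that lists the codes trim() strips even though Character.isWhitespace does not treat them as whitespace. The class name TrimBoundaryDemo and the helper strippedByTrim are made up for this illustration; the helper only mirrors the comparison in the loops shown above.

```java
public class TrimBoundaryDemo {

	// Same test as the loops in String#trim: a char is stripped when its code is <= ' ' (32).
	static boolean strippedByTrim(char c) {
		return c <= ' ';
	}

	public static void main(String[] args) {
		// Collect the codes that trim() strips although Character.isWhitespace rejects them.
		StringBuilder codes = new StringBuilder();
		for (char c = 0; c <= ' '; c++) {
			if (strippedByTrim(c) && !Character.isWhitespace(c)) {
				codes.append((int) c).append(' ');
			}
		}
		System.out.println("stripped but not whitespace: " + codes.toString().trim());

		// A BEL (7) and a NUL (0) at the ends disappear along with the real whitespace.
		System.out.println("[" + "\u0007 abc\t\u0000".trim() + "]"); // prints [abc]
	}
}
```

If your data can legitimately start or end with one of the listed control codes, trim() will silently drop it.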

 

Here is a simple test of String#trim:

package cc11001100.trimStudy;

/**
 * @author CC11001100
 */
public class TrimStudy {

	public static void main(String[] args) {

		StringBuilder sb = new StringBuilder();
		for (int i = 0; i < 128; i++) {
			sb.append((char) i);
		}
		String s = sb.toString().trim();
		// The effect of trim
		System.out.println("-" + s + "-");
		// ASCII code of the first character after trim
		System.out.println((int) s.charAt(0));
		// The DEL (delete) character
		System.out.println((char) 127);
		// See how the other blank characters print
		System.out.println(sb.toString());

	}

}

Output:
[Image: screenshot of the program output]

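Note that ASCII 127, the DEL character, could arguably also count as an invisible blank character, yet trim leaves it in place because 127 is greater than 32.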
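The upper side of the boundary can be checked the same way. The sketch below assumes nothing beyond the `<= ' '` comparison in the source above; the class TrimUpperBoundDemo and the helper untouchedByTrim are hypothetical names. Besides DEL it also tries two space-like characters outside ASCII, the no-break space (U+00A0) and the ideographic space (U+3000), which sit above 32 as well.

```java
public class TrimUpperBoundDemo {

	// True when trim() removes nothing from the given string.
	static boolean untouchedByTrim(String s) {
		return s.trim().equals(s);
	}

	public static void main(String[] args) {
		// DEL (127) at both ends survives trim().
		System.out.println(untouchedByTrim("\u007Fabc\u007F")); // prints true
		// So do the no-break space and the ideographic space.
		System.out.println(untouchedByTrim("\u00A0abc\u3000")); // prints true
		// An ordinary leading space is removed, so the string changes.
		System.out.println(untouchedByTrim(" abc"));            // prints false
	}
}
```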

 

Not ready to give up, I went and looked at StringUtils#trim in Apache commons-lang, a library a huge number of projects depend on:

/**
 * <p>Removes control characters (char &lt;= 32) from both
 * ends of this String, handling <code>null</code> by returning
 * <code>null</code>.</p>
 *
 * <p>The String is trimmed using {@link String#trim()}.
 * Trim removes start and end characters &lt;= 32.
 * To strip whitespace use {@link #strip(String)}.</p>
 *
 * <p>To trim your choice of characters, use the
 * {@link #strip(String, String)} methods.</p>
 *
 * <pre>
 * StringUtils.trim(null)          = null
 * StringUtils.trim("")            = ""
 * StringUtils.trim("     ")       = ""
 * StringUtils.trim("abc")         = "abc"
 * StringUtils.trim("    abc    ") = "abc"
 * </pre>
 *
 * @param str  the String to be trimmed, may be null
 * @return the trimmed string, <code>null</code> if null String input
 */
public static String trim(String str) {
    return str == null ? null : str.trim();
}

But it just delegates to String#trim, so it is not what I had imagined either...
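The Javadoc above separates trimming (chars <= 32) from stripping whitespace. As a rough sketch of that second idea, and not the commons-lang implementation, the hypothetical helper stripWhitespace below removes only characters that Character.isWhitespace accepts, using the JDK alone.

```java
public class WhitespaceStrip {

	// Hypothetical helper: strips only chars that Character.isWhitespace accepts, null-safe.
	static String stripWhitespace(String s) {
		if (s == null) {
			return null;
		}
		int start = 0, end = s.length();
		while (start < end && Character.isWhitespace(s.charAt(start))) {
			start++;
		}
		while (end > start && Character.isWhitespace(s.charAt(end - 1))) {
			end--;
		}
		return s.substring(start, end);
	}

	public static void main(String[] args) {
		// BEL, space, "abc", ideographic space: six chars in total.
		String s = "\u0007 abc\u3000";
		// trim() drops the leading BEL and space but keeps the trailing U+3000.
		System.out.println(s.trim().length());           // prints 4
		// The whitespace-based version keeps the BEL and drops the trailing U+3000.
		System.out.println(stripWhitespace(s).length()); // prints 5
	}
}
```

The two rules disagree at both ends of this string, which is exactly the gap between "code <= 32" and "whitespace".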

 

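So it seems I have misunderstood trim all along. Trimming is a fairly universal concept in string handling, and I do not know how other languages actually implement it.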

 
