About Java identifiers

From: 2-2 Getting to Know Java Identifiers

Oracle's Java Language Specification describes identifiers as follows:
        An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter.
Identifier:
    IdentifierChars but not a Keyword or BooleanLiteral or NullLiteral
IdentifierChars:
    JavaLetter {JavaLetterOrDigit}
JavaLetter:
    any Unicode character that is a "Java letter"
JavaLetterOrDigit:
    any Unicode character that is a "Java letter-or-digit"

A "Java letter" is a character for which the method Character.isJavaIdentifierStart(int) returns true.

A "Java letter-or-digit" is a character for which the method Character.isJavaIdentifierPart(int) returns true.

The "Java letters" include uppercase and lowercase ASCII Latin letters A-Z (\u0041-\u005a), and a-z (\u0061-\u007a), and, for historical reasons, the ASCII underscore (_, or \u005f) and dollar sign ($, or \u0024). The $ sign should be used only in mechanically generated source code or, rarely, to access pre-existing names on legacy systems.

The "Java digits" include the ASCII digits 0-9 (\u0030-\u0039).

Letters and digits may be drawn from the entire Unicode character set, which supports most writing scripts in use in the world today, including the large sets for Chinese, Japanese, and Korean. This allows programmers to use identifiers in their programs that are written in their native languages.

An identifier cannot have the same spelling (Unicode character sequence) as a keyword (§3.9), boolean literal (§3.10.3), or the null literal (§3.10.7), or a compile-time error occurs.

Two identifiers are the same only if they are identical, that is, have the same Unicode character for each letter or digit. Identifiers that have the same external appearance may yet be different.

For example, the identifiers consisting of the single letters LATIN CAPITAL LETTER A (A, \u0041), LATIN SMALL LETTER A (a, \u0061), GREEK CAPITAL LETTER ALPHA (Α, \u0391), CYRILLIC SMALL LETTER A (а, \u0430) and MATHEMATICAL BOLD ITALIC SMALL A (𝒂, \ud835\udc82) are all different.

Unicode composite characters are different from their canonical equivalent decomposed characters. For example, a LATIN CAPITAL LETTER A ACUTE (Á, \u00c1) is different from a LATIN CAPITAL LETTER A (A, \u0041) immediately followed by a NON-SPACING ACUTE (´, \u0301) in identifiers. See The Unicode Standard, Section 3.11 "Normalization Forms".

Examples of identifiers are:
    String
    i3
    αρετη
    MAX_VALUE
    isLetterOrDigit

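    Roughly speaking, an identifier is a sequence of Java letters and Java digits, and both may be drawn from the entire Unicode character set: the ASCII letters A-Z (\u0041-\u005a) and a-z (\u0061-\u007a), the digits 0-9 (\u0030-\u0039), the underscore (_, or \u005f), the dollar sign ($, or \u0024), and letters and digits from the rest of Unicode (Unicode is a superset of ASCII). The first character must be a Java letter, so it cannot be a digit such as those in \u0030-\u0039, and the identifier as a whole cannot have the same spelling as a keyword, true, false, or null.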
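As a rough sketch of the rule, the check can be written with the two Character methods the specification names, plus javax.lang.model.SourceVersion.isKeyword to reject keywords and the literals true, false and null. The class and method names (IdentifierCheck, hasIdentifierChars, isValidIdentifier) are made up for illustration, and SourceVersion.isKeyword answers for the latest source version the running JDK knows, so results for a few words (for example a lone underscore) may differ from Java 8.

```java
import javax.lang.model.SourceVersion;

public class IdentifierCheck {

    // Lexical rule only: the first code point must be a "Java letter",
    // every following code point a "Java letter-or-digit".
    public static boolean hasIdentifierChars(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Step by code point so characters outside the BMP are handled.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    // Full rule: valid characters, and not a keyword, boolean literal or null.
    public static boolean isValidIdentifier(String s) {
        return hasIdentifierChars(s) && !SourceVersion.isKeyword(s);
    }

    public static void main(String[] args) {
        String[] samples = {"i3", "3i", "MAX_VALUE", "$tmp", "变量", "class", "null", "a-b"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValidIdentifier(s));
        }
    }
}
```

Note that the first-character test is broader than "not 0-9": a digit from another script, such as ARABIC-INDIC DIGIT THREE (\u0663), is also rejected as a first character but accepted later in the name.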

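So Chinese characters can be used in identifiers too. To keep code consistent, though, we normally build identifiers from ASCII characters rather than Chinese or other non-ASCII Unicode characters.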
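A small illustration of both points, assuming the source file is saved and compiled as UTF-8 (for example with javac -encoding UTF-8 on older JDKs). The names UnicodeIdentifiers, 求和 and lookAlikes are invented for this example. The second method also shows the specification's remark that identifiers which look alike can still be different: the Cyrillic letter is written as a Unicode escape so the difference is visible in the source.

```java
public class UnicodeIdentifiers {

    // Chinese method and parameter names are legal identifiers.
    public static int 求和(int 甲, int 乙) {
        return 甲 + 乙;
    }

    // Two variables whose names look the same but are different identifiers.
    // If they were the same identifier, the second declaration would not compile.
    public static int lookAlikes() {
        int a = 1;        // LATIN SMALL LETTER A
        int \u0430 = 2;   // CYRILLIC SMALL LETTER A, a separate variable
        return a * 10 + \u0430;
    }

    public static void main(String[] args) {
        System.out.println(求和(1, 2));
        System.out.println(lookAlikes());
    }
}
```

The look-alike case is one practical reason for the ASCII-only convention: two names that render identically on screen can refer to different variables.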

Documentation (Java 8): https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.8

Asked by: Cerky 2019-04-24 20:38
