System.String.Normalize 方法

方法描述

返回一个新字符串,其文本值与此字符串相同,但其二进制表示形式符合 Unicode 范式 C。

语法定义(C# System.String.Normalize 方法 的用法)

public string Normalize()

参数/返回值

参数值/返回值 参数类型/返回类型 参数描述/返回描述
返回值 System.String 一个新的规范化字符串,其文本值与此字符串相同,但其二进制表示形式符合范式 C。

提示和注释

某些 Unicode 字符具有多种等效的二进制表示形式。这些表示形式由组合和/或复合 Unicode 字符集组成。 单个字符存在多种表示形式增加了搜索、排序、匹配和其他操作的复杂性。

Unicode 标准定义了一个称为规范化的过程,此过程将一个字符的任何一种等价二进制表示形式转换为统一的二进制表示形式。 可使用多种遵循不同规则的算法执行规范化,这些算法也称为范式。 .NET Framework 当前支持范式 C、D、KC 和 KD。

有关支持的 Unicode 范式的说明,请参见 System.Text.NormalizationForm。

对调用者的说明

一旦在字符串中遇到第一个非规范化字符,IsNormalized 方法将返回 false。 因此,如果字符串包含后面跟着无效 Unicode 字符的非标准化字符,Normalize 方法可能引发 ArgumentException,尽管 IsNormalized 会返回 false。

System.String.Normalize 方法例子

下面的示例将一个字符串规范化为四种范式的每一种,确认字符串已规范化为指定的范式,然后列出规范化字符串中的码位。

// This example demonstrates the String.Normalize method
//                       and the String.IsNormalized method

using System;
using System.Text;

class Sample 
{
    public static void Main() 
    {
// Character c; combining characters acute and cedilla; character 3/4
    string s1 = new String( new char[] {'\u0063', '\u0301', '\u0327', '\u00BE'});
    string s2 = null;
    string divider = new String('-', 80);
    divider = String.Concat(Environment.NewLine, divider, Environment.NewLine);

    try 
    {
    Show("s1", s1);
    Console.WriteLine();
    Console.WriteLine("U+0063 = LATIN SMALL LETTER C");
    Console.WriteLine("U+0301 = COMBINING ACUTE ACCENT");
    Console.WriteLine("U+0327 = COMBINING CEDILLA");
    Console.WriteLine("U+00BE = VULGAR FRACTION THREE QUARTERS");
    Console.WriteLine(divider);

    Console.WriteLine("A1) Is s1 normalized to the default form (Form C)?: {0}", 
                                 s1.IsNormalized());
    Console.WriteLine("A2) Is s1 normalized to Form C?:  {0}", 
                                 s1.IsNormalized(NormalizationForm.FormC));
    Console.WriteLine("A3) Is s1 normalized to Form D?:  {0}", 
                                 s1.IsNormalized(NormalizationForm.FormD));
    Console.WriteLine("A4) Is s1 normalized to Form KC?: {0}", 
                                 s1.IsNormalized(NormalizationForm.FormKC));
    Console.WriteLine("A5) Is s1 normalized to Form KD?: {0}", 
                                 s1.IsNormalized(NormalizationForm.FormKD));

    Console.WriteLine(divider);

    Console.WriteLine("Set string s2 to each normalized form of string s1.");
    Console.WriteLine();
    Console.WriteLine("U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE");
    Console.WriteLine("U+0033 = DIGIT THREE");
    Console.WriteLine("U+2044 = FRACTION SLASH");
    Console.WriteLine("U+0034 = DIGIT FOUR");
    Console.WriteLine(divider);

    s2 = s1.Normalize();
    Console.Write("B1) Is s2 normalized to the default form (Form C)?: ");
    Console.WriteLine(s2.IsNormalized());
    Show("s2", s2);
    Console.WriteLine();

    s2 = s1.Normalize(NormalizationForm.FormC);
    Console.Write("B2) Is s2 normalized to Form C?: ");
    Console.WriteLine(s2.IsNormalized(NormalizationForm.FormC));
    Show("s2", s2);
    Console.WriteLine();

    s2 = s1.Normalize(NormalizationForm.FormD);
    Console.Write("B3) Is s2 normalized to Form D?: ");
    Console.WriteLine(s2.IsNormalized(NormalizationForm.FormD));
    Show("s2", s2);
    Console.WriteLine();

    s2 = s1.Normalize(NormalizationForm.FormKC);
    Console.Write("B4) Is s2 normalized to Form KC?: ");
    Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKC));
    Show("s2", s2);
    Console.WriteLine();

    s2 = s1.Normalize(NormalizationForm.FormKD);
    Console.Write("B5) Is s2 normalized to Form KD?: ");
    Console.WriteLine(s2.IsNormalized(NormalizationForm.FormKD));
    Show("s2", s2);
    Console.WriteLine();
    }

    catch (Exception e) 
        {
        Console.WriteLine(e.Message);
        }
    }

    private static void Show(string title, string s)
    {
    Console.Write("Characters in string {0} = ", title);
    foreach(short x in s.ToCharArray())
        {
        Console.Write("{0:X4} ", x);
        }
    Console.WriteLine();
    }
}
/*
This example produces the following results:

Characters in string s1 = 0063 0301 0327 00BE

U+0063 = LATIN SMALL LETTER C
U+0301 = COMBINING ACUTE ACCENT
U+0327 = COMBINING CEDILLA
U+00BE = VULGAR FRACTION THREE QUARTERS

--------------------------------------------------------------------------------

A1) Is s1 normalized to the default form (Form C)?: False
A2) Is s1 normalized to Form C?:  False
A3) Is s1 normalized to Form D?:  False
A4) Is s1 normalized to Form KC?: False
A5) Is s1 normalized to Form KD?: False

--------------------------------------------------------------------------------

Set string s2 to each normalized form of string s1.

U+1E09 = LATIN SMALL LETTER C WITH CEDILLA AND ACUTE
U+0033 = DIGIT THREE
U+2044 = FRACTION SLASH
U+0034 = DIGIT FOUR

--------------------------------------------------------------------------------

B1) Is s2 normalized to the default form (Form C)?: True
Characters in string s2 = 1E09 00BE

B2) Is s2 normalized to Form C?: True
Characters in string s2 = 1E09 00BE

B3) Is s2 normalized to Form D?: True
Characters in string s2 = 0063 0327 0301 00BE

B4) Is s2 normalized to Form KC?: True
Characters in string s2 = 1E09 0033 2044 0034

B5) Is s2 normalized to Form KD?: True
Characters in string s2 = 0063 0327 0301 0033 2044 0034

*/

异常

异常 异常描述
ArgumentException 当前实例包含无效的 Unicode 字符。

命名空间

namespace: System

程序集: mscorlib(在 mscorlib.dll 中)

版本信息

.NET Framework 受以下版本支持:4、3.5、3.0、2.0 .NET Framework Client Profile 受以下版本支持:4、3.5 SP1

适用平台

Windows 7, Windows Vista SP1 或更高版本, Windows XP SP3, Windows XP SP2 x64 Edition, Windows Server 2008(不支持服务器核心), Windows Server 2008 R2(支持 SP1 或更高版本的服务器核心), Windows Server 2003 SP2 .NET Framework 并不是对每个平台的所有版本都提供支持。有关支持的版本的列表,请参见.NET Framework 系统要求。