html - ISO Latin recognizes the characters in an HTML document, but UTF-8 does not
Problem description
I have the following code:
<html>
<head>
<meta charset="utf-8">
</head>
<body>
<p>Schrödinger
</body>
When I open it in the browser, I get:
Schr�dinger
When I change the encoding to ISO Latin:
<html>
<head>
<meta charset="ISO-8859-1">
</head>
<body>
<p>Schrödinger
</body>
it works fine:
Schrödinger
Strangely, with the code snippet tool on this site, utf-8 works fine:
<html>
<head>
<meta charset="utf-8">
</head>
<body>
<p>Schrödinger
</body>
</html>
Using UTF-8 should be better than ISO Latin (it supports more characters).
What could the problem be?
I tested in both Chrome and Firefox. I am using Windows 7 on an old PC.
Solution
You are right that UTF-8 can represent more characters than ISO-8859-1, but it also represents the same characters differently.
To understand what that means, you need to think about the binary representation that a computer uses for text. When you save some text to a file, what you are actually doing is writing some sequence of ones and zeroes to disk; when you load that file in a web browser, it has to look at that sequence of ones and zeroes and decide what to display.
A character encoding is the way that the browser decides what to display for each sequence of ones and zeroes.
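As a rough illustration of that idea, the Python sketch below decodes one and the same byte under two different encodings. The byte value 0xF6 and the variable names are chosen only for this example; a browser does something comparable internally, but this is not a description of any particular browser's code.

```python
# One byte, as it might sit on disk in a file saved by a Latin-1 editor.
raw = bytes([0xF6])

# Read as ISO-8859-1, the byte maps directly to a single character.
as_latin1 = raw.decode("iso-8859-1")

# Read as UTF-8, the byte is not a valid sequence on its own, so a
# lenient decoder substitutes U+FFFD, the replacement character.
as_utf8 = raw.decode("utf-8", errors="replace")

print(repr(as_latin1))  # 'ö'
print(repr(as_utf8))    # '\ufffd', usually rendered as a question-mark glyph
```

The bytes never change between the two calls; only the rule used to interpret them does.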
In ISO-8859-1, the character "ö" is written as the sequence 11110110. In UTF-8, that same character would instead be written 1100001110110110, and 11110110 would mean something else (in fact, because of the way UTF-8 works, it looks like the start of a longer multi-byte sequence, so it can't be displayed on its own).
Your file contains 11110110, so the correct thing to tell the browser is "read this as ISO-8859-1, please". Alternatively, you can open the file in an editor that "knows" both encodings, and tell that editor to rewrite it as UTF-8, so the character will be saved as 1100001110110110 instead.
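If no suitable editor is at hand, the same rewrite can be sketched in a few lines of Python. This assumes the file really was saved as ISO-8859-1; the function names `latin1_to_utf8` and `bits` are made up for this example, and the file is overwritten in place, so try it on a copy first.

```python
from pathlib import Path


def latin1_to_utf8(path: Path) -> None:
    """Rewrite a file that was saved as ISO-8859-1 so it is stored as UTF-8."""
    # Interpret the existing bytes with the encoding they were written in.
    text = path.read_bytes().decode("iso-8859-1")
    # Write the same characters back out using the UTF-8 byte patterns.
    path.write_bytes(text.encode("utf-8"))


def bits(data: bytes) -> str:
    """Return the bytes as a string of ones and zeroes, 8 bits per byte."""
    return "".join(f"{byte:08b}" for byte in data)


print(bits("ö".encode("iso-8859-1")))  # 11110110
print(bits("ö".encode("utf-8")))       # 1100001110110110
```

For example, `latin1_to_utf8(Path("page.html"))` would convert a file with the hypothetical name `page.html`, after which `<meta charset="utf-8">` matches what is actually on disk.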
This is what happens when you paste the character here: your browser knows that Stack Overflow wants the UTF-8 version, and converts it to 1100001110110110 for you.