Latin ISO (ISO-8859-1) recognizes characters in an HTML document but UTF-8 doesn't

Problem description

I have the following code:

<html>
<head>
    <meta charset="utf-8">
</head>

<body>
    <p>Schrödinger
</body>

When I open it in a browser, I get:

Schr�dinger

When I change the encoding to Latin ISO (ISO-8859-1):

<html>
<head>
    <meta charset="ISO-8859-1">
</head>

<body>
    <p>Schrödinger
</body>

It works fine:

Schrödinger

Strangely, with the code snippet tool on this site, UTF-8 works fine:

<html>
	<head>
		<meta charset="utf-8">
	</head>

	<body>
		<p>Schrödinger
	</body>
</html>

Using UTF-8 should be better than Latin ISO (it supports more characters).

What could the problem be?

I tested in both Chrome and Firefox. I'm using Windows 7 on an old PC.

Tags: html

Solution


You are right that UTF-8 can represent more characters than ISO-8859-1, but it also represents the same characters differently.

To understand what that means, you need to think about the binary representation that a computer uses for text. When you save some text to a file, what you are actually doing is writing some sequence of ones and zeroes to disk; when you load that file in a web browser, it has to look at that sequence of ones and zeroes and decide what to display.

A character encoding is the way that the browser decides what to display for each sequence of ones and zeroes.
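To see this concretely, here is a minimal Python sketch (Python is only a convenient tool here; the question itself involves no Python). It assumes the file stores "Schrödinger" as ISO-8859-1, so "ö" is the single byte 0xF6, and decodes those exact same bytes under both encodings. Passing `errors="replace"` roughly mimics what browsers do with bytes that aren't valid UTF-8: substitute the replacement character U+FFFD.

```python
# Assumption: these are the bytes on disk, "Schrödinger" saved as ISO-8859-1,
# where "ö" is the single byte 0xF6.
raw = b"Schr\xf6dinger"

# Interpreting the bytes as ISO-8859-1: every byte maps to one character.
as_latin1 = raw.decode("iso-8859-1")

# Interpreting the same bytes as UTF-8: 0xF6 is not valid UTF-8 here,
# so it is replaced with U+FFFD, much like a browser would display it.
as_utf8 = raw.decode("utf-8", errors="replace")

print(as_latin1)  # Schrödinger
print(as_utf8)    # Schr�dinger
```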

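In ISO-8859-1, the character "ö" is written as the single byte 11110110 (0xF6). In UTF-8, that same character is instead written as the two bytes 1100001110110110 (0xC3 0xB6), and 11110110 means something else: because of the way UTF-8 works, its leading 1s mark it as the start of a multi-byte sequence, and no valid continuation follows it, so it can't be displayed. The browser shows the replacement character � instead.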
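A quick way to check these bit patterns yourself is to ask Python to encode "ö" both ways and print each byte as eight binary digits. The helper name `bits` is just an illustrative choice.

```python
def bits(data: bytes) -> str:
    """Render each byte as 8 binary digits, concatenated."""
    return "".join(f"{b:08b}" for b in data)

latin1 = "ö".encode("iso-8859-1")  # one byte
utf8 = "ö".encode("utf-8")         # two bytes

print(latin1, bits(latin1))  # b'\xf6' 11110110
print(utf8, bits(utf8))      # b'\xc3\xb6' 1100001110110110
```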

Your file contains 11110110, so the correct thing to tell the browser is "read this as ISO-8859-1, please". Alternatively, you can open the file in an editor that "knows" both encodings and tell it to re-save the file as UTF-8, so the character will be stored as 1100001110110110 instead.
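If you'd rather script the conversion than use an editor, a minimal sketch is below, assuming the file really is ISO-8859-1 (every byte is a valid ISO-8859-1 character, so decoding cannot fail, but it will silently produce the wrong text if the file was actually in some other encoding). The function name `latin1_to_utf8` and the file names are hypothetical. It works on raw bytes so line endings are preserved exactly.

```python
import tempfile
from pathlib import Path

def latin1_to_utf8(src: Path, dst: Path) -> None:
    """Re-encode a file from ISO-8859-1 to UTF-8 (hypothetical helper)."""
    text = src.read_bytes().decode("iso-8859-1")  # bytes -> characters
    dst.write_bytes(text.encode("utf-8"))         # characters -> UTF-8 bytes

# Usage: a page that declares UTF-8 but was saved as ISO-8859-1,
# which is the situation in the question.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "page-latin1.html"
    dst = Path(tmp) / "page-utf8.html"
    src.write_bytes(b'<meta charset="utf-8">\n<p>Schr\xf6dinger\n')
    latin1_to_utf8(src, dst)
    converted = dst.read_bytes()

print(converted)  # b'<meta charset="utf-8">\n<p>Schr\xc3\xb6dinger\n'
```

After conversion the bytes match the `<meta charset="utf-8">` declaration, so the browser can display "Schrödinger" correctly.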

This is what happens when you paste the character here: your browser knows that Stack Overflow wants the UTF-8 version, and converts it to 1100001110110110 for you.

