![Python Web Scraping Cookbook](https://wfqqreader-1252317822.image.myqcloud.com/cover/240/36700240/b_36700240.jpg)
Getting ready
We will read a file named unicode.html from our local web server, located at http://localhost:8080/unicode.html. This file is UTF-8 encoded and contains characters from several different regions of the Unicode code space. The page looks as follows in a browser:
![](https://epubservercos.yuewen.com/02C97C/19470398001588706/epubprivate/OEBPS/Images/89c7b066-5d99-4dff-a318-3d97e1d6be0a.png?sign=1739378785-ihvgXTa7Jn4bwzLBDC6fCceqSVXamUGh-0-b0c235c0c9216951cfa93a7ac755e55c)
The Page in the Browser
Opening the file in an editor that supports UTF-8 shows how the Cyrillic characters are rendered:
![](https://epubservercos.yuewen.com/02C97C/19470398001588706/epubprivate/OEBPS/Images/afdf6e7f-3bbb-4226-bc69-356d01a27d5a.png?sign=1739378785-YUTxEeGhqA6C8pwCzYZl3me0m1bYLanL-0-147c4f18bd090fbc0876b45f31c9a3eb)
The HTML in an Editor
Code for the sample is in 02/06_unicode.py.
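To see why the encoding matters before fetching the page, here is a minimal sketch that is not taken from 02/06_unicode.py. It uses a hypothetical Cyrillic string (the actual contents of unicode.html are not reproduced here) and shows how the same UTF-8 bytes decode correctly with the right codec and turn into garbage with the wrong one:

```python
# Hypothetical sample text; the real contents of unicode.html are not shown here.
sample = "Привет, мир"

# Encoding to UTF-8 produces two bytes for each Cyrillic letter,
# and one byte each for the ASCII comma and space.
raw = sample.encode("utf-8")

# Decoding with the correct codec recovers the original text.
decoded_ok = raw.decode("utf-8")

# Decoding with the wrong codec (Latin-1 here) raises no error,
# but yields one garbage character per byte.
decoded_bad = raw.decode("latin-1")

print(len(sample), len(raw))   # character count vs. byte count
print(decoded_ok)
print(repr(decoded_bad))
```

The wrong-codec case fails silently, which is why a scraper needs to know (or correctly detect) the encoding of the page it reads.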