Retrieving binary file content with JavaScript, base64-encoding it, and decoding it with Python


I'm trying to download a binary file using XMLHttpRequest (on a recent WebKit) and base64-encode its contents using this simple function:

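function getBinary(file) {
    var xhr = new XMLHttpRequest();
    // Synchronous request (third argument is false)
    xhr.open("GET", file, false);
    // Ask the browser to hand back the raw bytes as text instead of decoding them
    xhr.overrideMimeType("text/plain; charset=x-user-defined");
    xhr.send(null);
    return xhr.responseText;
}

function base64encode(binary) {
    // UTF-8 encode the string, then base64-encode the resulting bytes
    return btoa(unescape(encodeURIComponent(binary)));
}

var binary = getBinary('http://some.tld/sample.pdf');
var base64encoded = base64encode(binary);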

As a side note, everything above is standard JavaScript, including btoa() and encodeURIComponent(): https://developer.mozilla.org/en/DOM/window.btoa
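For readers following along on the Python side, here is a rough Python 3 sketch of what the base64encode() chain above appears to compute: the string is UTF-8 encoded first, and only then base64-encoded. The helper name js_base64encode and the sample string are hypothetical; the sample assumes the x-user-defined charset took effect, in which case a byte such as 0x89 shows up in responseText as the private-use character U+F789.

```python
import base64


def js_base64encode(text: str) -> str:
    """Approximate btoa(unescape(encodeURIComponent(text))) in Python 3."""
    # encodeURIComponent + unescape yields one character per UTF-8 byte,
    # and btoa then base64-encodes those bytes.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


# Hypothetical 4-character responseText prefix (assumption, not from the question)
sample_text = "\uf789PNG"
encoded = js_base64encode(sample_text)

print(encoded)
# The decoded payload is longer than the text: U+F789 takes 3 bytes in UTF-8
print(len(sample_text), len(base64.b64decode(encoded)))
```

The point of the sketch is that the base64 payload holds the UTF-8 form of the text, not the original file bytes, so decoding the base64 alone does not give the file back.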

This works quite smoothly, and I can even decode the base64 content using JavaScript:

function base64decode(base64) {
    return decodeURIComponent(escape(atob(base64)));
}

var decodedBinary = base64decode(base64encoded);
decodedBinary === binary // true

Now I want to decode the base64-encoded content using Python, which consumes a JSON string to get the base64encoded string value. Naively, this is what I do:

import urllib
import base64

# ... retrieving of base64 encoded string through JSON
# (named base64_string so it does not shadow the base64 module)
base64_string = "77+9UE5HDQ……………oaCgA="
source_contents = urllib.unquote(base64.b64decode(base64_string))
destination_file = open(destination, 'wb')
destination_file.write(source_contents)
destination_file.close()

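But the resulting file is invalid. It looks like the content has been mangled by UTF-8, by the encoding step, or by something else that is still unclear to me.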

If I try to decode the content as UTF-8 before writing it to the destination file, an error is raised:

import urllib
import base64

# ... retrieving of base64 encoded string through JSON
# (named base64_string so it does not shadow the base64 module)
base64_string = "77+9UE5HDQ……………oaCgA="
source_contents = urllib.unquote(base64.b64decode(base64_string)).decode('utf-8')
destination_file = open(destination, 'wb')
destination_file.write(source_contents)
destination_file.close()

$ python test.py
// ...
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)
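The u'\ufffd' in that message is worth a closer look. As a quick diagnostic, the Python 3 sketch below decodes only the visible prefix of the base64 string quoted in the question (the elided middle is left alone, and the padding is added just so the fragment decodes). The variable names are illustrative.

```python
import base64

# Only the characters visible before the elision in the question
visible_prefix = "77+9UE5HDQ"

# Pad to a multiple of 4 so this fragment can be decoded on its own
padded = visible_prefix + "=" * (-len(visible_prefix) % 4)

raw = base64.b64decode(padded)
text = raw.decode("utf-8")

print(raw)
print(["U+%04X" % ord(ch) for ch in text])
```

The first decoded character is U+FFFD, the Unicode replacement character, followed by "PNG" and a carriage return. That suggests, though it does not prove, that the first byte of the file had already been replaced before the string was base64-encoded, in which case no amount of decoding on the Python side can bring that byte back.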

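As a side note, here is a screenshot of two textual representations of the same file. Left: the original; right: the one created from the base64-decoded string: http://cl.ly/0U3G34110z3c132O2e2x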

Is there a known trick to work around these encoding problems when trying to recreate the file? How would you achieve this yourself?

Any help or hint is much appreciated :)
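One possible direction, offered as a minimal Python 3 sketch rather than a confirmed fix: if the x-user-defined charset really took effect in the browser, each byte 0x80-0xFF arrives as a character in the range U+F780-U+F7FF, so the original bytes can be recovered by UTF-8 decoding the base64 payload and keeping the low 8 bits of each character. No urllib.unquote step is involved, since the payload contains no percent-escapes. The function name recover_bytes and the sample input are hypothetical.

```python
import base64


def recover_bytes(base64_string: str) -> bytes:
    """Undo base64 + UTF-8, then map x-user-defined characters back to bytes."""
    text = base64.b64decode(base64_string).decode("utf-8")
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        # x-user-defined keeps 0x00-0x7F as-is and maps 0x80-0xFF to U+F780-U+F7FF
        if cp < 0x80 or 0xF780 <= cp <= 0xF7FF:
            out.append(cp & 0xFF)
        else:
            # e.g. U+FFFD: the byte was lost before encoding and cannot be recovered
            raise ValueError("unexpected character U+%04X in payload" % cp)
    return bytes(out)


# Hypothetical payload: base64 of the UTF-8 form of "\uf789PNG" (assumption)
print(recover_bytes("756JUE5H"))
```

If the function raises on real data, the text was probably decoded with some other charset in the browser, and the more robust option would be to avoid the text round trip on the JavaScript side altogether.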

慕无忌1623718
