c++ - multipart/form-data loses bytes on large files
Problem description
I'm writing a multipart/form-data parser in C++, because the available options seem to be very scarce.
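My initial approach was to use istream::getline to buffer one line (or part of a line) at a time so that I could detect the boundaries. While this works for smaller files, it does not work for larger ones. For large (>50MB) files, cin sometimes sets the badbit, and after clearing the istream I notice that I have lost bytes. I don't know why, and that is the reason for this question.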
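For reference, the gcount() bookkeeping around istream::getline can be isolated in one place. When a line is longer than the buffer, getline stores size-1 characters, sets failbit, and leaves the rest of the line in the stream; when it reaches '\n', the delimiter is extracted and counted by gcount() but not stored. A minimal sketch of a wrapper that normalizes these cases (readLineChunk is a hypothetical name, not part of the original code):

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, used only in the usage example below

// Hypothetical helper: reads at most size-1 characters of the current line
// into buf (always '\0'-terminated) and returns how many were stored.
//   line_complete == true  -> the '\n' was consumed (and is not in buf)
//   line_complete == false -> the line did not fit, or input ended without '\n'
// On a truncated line the failbit is cleared so the caller can keep reading.
// A badbit is left set for the caller to handle.
std::size_t readLineChunk(std::istream& in, char* buf, std::size_t size,
                          bool& line_complete) {
    assert(size >= 2);  // room for at least one character plus '\0'
    in.getline(buf, static_cast<std::streamsize>(size));
    const std::size_t extracted = static_cast<std::size_t>(in.gcount());
    line_complete = false;
    if (in.bad()) {
        return extracted;  // stream error: the caller must treat data as unreliable
    }
    if (in.eof()) {
        return extracted;  // input ended before a '\n'; every extracted char was stored
    }
    if (in.fail()) {
        in.clear();        // line longer than the buffer: size-1 chars stored
        return extracted;
    }
    line_complete = true;
    return extracted - 1;  // gcount() included the discarded '\n'
}

// Usage:
//   std::istringstream in("abcdefgh\nxy");
//   char buf[5];
//   bool complete;
//   readLineChunk(in, buf, sizeof buf, complete);  // "abcd", complete == false
//   readLineChunk(in, buf, sizeof buf, complete);  // "efgh", complete == true
//   readLineChunk(in, buf, sizeof buf, complete);  // "xy", complete == false, in.eof()
```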
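However, if I increase the buffer size to 4MB and use istream::read to dump the entire multipart/form-data request into a file, I don't lose any bytes and cin never sets the badbit. I can then reopen the dump file as an ifstream and, using that instead of cin, my original small-buffer getline approach works just fine.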
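The dump-to-file workaround does not need a 4MB buffer, and a `char buf[4000000]` declared as a local variable lives on the stack, which is easy to exhaust. A minimal sketch of copying a stream in fixed-size chunks with istream::read and gcount(), assuming a heap-allocated std::vector buffer and a hypothetical function name copyStream:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>  // std::istringstream / std::ostringstream, usage example only
#include <vector>

// Hypothetical helper: copies all remaining bytes from in to out through a
// fixed-size heap buffer, so memory use does not grow with the request size.
// Returns the number of bytes written. The last read() is usually short: it
// sets eofbit and failbit, but gcount() still reports the bytes it delivered.
std::size_t copyStream(std::istream& in, std::ostream& out,
                       std::size_t chunk_size = 64 * 1024) {
    assert(chunk_size > 0);
    std::vector<char> buf(chunk_size);
    std::size_t total = 0;
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::streamsize got = in.gcount();
        if (got <= 0) break;
        if (!out.write(buf.data(), got)) break;  // e.g. out of disk space
        total += static_cast<std::size_t>(got);
    }
    return total;
}

// Usage (real code would pass the FastCGI input stream and an
// std::ofstream opened with std::ios::binary):
//   std::istringstream in("payload");
//   std::ostringstream out;
//   copyStream(in, out);  // returns 7
```

Where the byte count is not needed, `out << in.rdbuf();` performs the same copy in a single statement.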
Any insight into what is going on here? Could it be some side effect of FastCGI or Lighttpd?
Edit:
Here are the relevant code snippets:
#include <fcgio.h>
//...
int main()
{
    //...
    FCGX_Request request;
    FCGX_Init();
    FCGX_InitRequest(&request, 0, 0);
    const size_t LEN = 1024;
    vector<char> v(LEN); // Workaround for getting duplicates of every byte?
    while (FCGX_Accept_r(&request) == 0) {
        fcgi_streambuf cin_fcgi_streambuf(request.in, &v[0], v.size());
        //... (eventually calls _parseMultipartFormFieldFile)
    }
    //...
}
/*
Extract a file from a multipart form section
istream should already have boundary and headers removed up through the final "\r\n"
Note that there are a lot of potential off-by-one errors here. Need to pay special attention
to gcount() and what is present in the buffer in each given scenario. Hence why you see:
gcount
gcount-1
gcount-2
These offsets are due to null terminator sometimes being appended, sometimes not, and/or '\r' being present or not.
It is possible for a few rare things to happen that will break this function:
1. Malicious content length
Client could lie about content length and send much more than we have room for. Should count bytes eventually, but easy enough to configure webserver to protect us.
*/
bool _parseMultipartFormFieldFile(
    Request & req,
    istream & input,
    const string & name,
    const string & upload_dir,
    const string & boundary,
    const string & end_boundary
    )
{
    static unsigned int file_id = 0; //used to generate unique file names
    //Need fixed buffer size to prevent running out of RAM (malicious or not)
    char buf[4096];
    string file_name = upload_dir + ECPP_TMP_FILE + to_string(file_id++);
    ofstream f(file_name, std::ofstream::out | std::ofstream::binary);
    if (!f.is_open())
        return false;
    bool eof = false;
    while (!eof) {
        //Out of space in flash?
        if (!f.good())
            return false;
        f.flush();
        input.getline(buf, sizeof(buf));
        unsigned int gcount = input.gcount();
        if (input.bad()) {
            //Crap! If we're here, we have most likely lost a few bytes...
            input.clear();
            continue;
        }
        else if (input.eof()) {
            //If we are here, the multipart/form-data request was malformed
            f.close();
            remove(file_name.c_str()); //Delete malformed file
            return false;
        }
        else if (input.fail()) {
            //If we are in this condition, it means we encountered a line longer than our buffer
            //There is no null terminator in this case, so write out what we have
            f.write(buf, gcount);
            input.clear(); //clear fail flag
            continue;
        }
        if (gcount >= 2 && buf[gcount-2] == '\r') {
            string peek = peekLine(input); //uses putback - modifies gcount()
            if (peek == boundary || peek == end_boundary) {
                //If we are in here, it means we encountered the last line in the section
                //That means there is a trailing '\r' which we need to remove in addition to the null terminator
                f.write(buf, gcount-2); // Remove null terminator and \r before writing
                req.file[name] = file_name;
                eof = true;
                continue;
            }
        }
        //If we are here it means we read in the entire line.
        //Write out everything (minus the null terminator), and also add in the newline that was stripped by getline()
        f.write(buf, gcount-1);
        f.write("\n", 1);
    }
    return true;
}
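The function above relies on peekLine, which is not shown and which, according to its comment, uses putback. std::istream::putback sets badbit whenever the stream buffer's sputbackc() fails, and how many characters a stream buffer can take back depends on its implementation and on what is still in its get area, so this path may be worth ruling out as a source of the badbit. A minimal sketch of one-line lookahead that never calls putback, assuming a hypothetical LineReader class and std::string lines (which, unlike the fixed 4096-byte buffer, do not cap memory per line):

```cpp
#include <cassert>
#include <istream>
#include <sstream>  // std::istringstream, usage example only
#include <string>
#include <utility>

// Hypothetical helper: reads lines with one line of lookahead, so the caller
// can look at the next line (e.g. to compare it with the boundary) without
// pushing characters back into the stream.
class LineReader {
public:
    explicit LineReader(std::istream& in) : in_(in) { advance(); }

    // True while there is a line that peek()/next() can return.
    bool hasLine() const { return has_next_; }

    // The upcoming line, without its '\n' (a trailing '\r' is kept); not consumed.
    const std::string& peek() const {
        assert(has_next_);
        return next_;
    }

    // Consumes and returns the upcoming line, then reads the one after it.
    std::string next() {
        assert(has_next_);
        std::string current = std::move(next_);
        advance();
        return current;
    }

private:
    void advance() {
        // std::getline also accepts a final line that lacks a trailing '\n'.
        has_next_ = static_cast<bool>(std::getline(in_, next_));
    }

    std::istream& in_;
    std::string next_;
    bool has_next_ = false;
};

// Usage:
//   std::istringstream in("data\r\n--XYZ\r\n");
//   LineReader r(in);
//   r.next();               // "data\r"
//   r.peek() == "--XYZ\r";  // true, and the line is still available
```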
So, in short: if I pass cin_fcgi_streambuf to _parseMultipartFormFieldFile, I lose bytes (the badbit is set), but if I first indiscriminately dump cin_fcgi_streambuf to a file using a char buf[4000000] plus input.read(), and then pass an ifstream on that file to _parseMultipartFormFieldFile, it works fine.
Solution
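No. input.getline is line-based (it splits the input at the '\n' of each CRLF), so what happens if you post a binary file? Beyond that, your example source code cannot handle a request with multiple posted files: in your case, you only open a single file stream. That is why you have to change the pattern of your source code.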
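A sketch of what changing the pattern could look like, shown as an illustration and not as the answerer's library: instead of splitting the body into lines, copy raw bytes until the part delimiter appears. Per RFC 2046, a part's body ends just before CRLF followed by "--" and the boundary parameter, so the file data can be streamed out while only the partially matched prefix of that delimiter is held back. The function name copyUntil is hypothetical; matching uses the Knuth-Morris-Pratt failure table so that overlapping partial matches are handled correctly.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>  // std::istringstream / std::ostringstream, usage example only
#include <string>
#include <vector>

// Hypothetical helper: copies bytes from in to out until delim is found.
// delim itself is consumed but not written. Returns true if delim was found,
// false if input ended first (then everything read has been written to out and
// eofbit is set on in). Binary-safe: bytes are never interpreted as lines.
bool copyUntil(std::istream& in, std::ostream& out, const std::string& delim) {
    assert(!delim.empty());
    // fail[i]: length of the longest proper prefix of delim[0..i] that is
    // also a suffix of it (the KMP failure table).
    std::vector<std::size_t> fail(delim.size(), 0);
    for (std::size_t i = 1, k = 0; i < delim.size(); ++i) {
        while (k > 0 && delim[i] != delim[k]) k = fail[k - 1];
        if (delim[i] == delim[k]) ++k;
        fail[i] = k;
    }

    std::streambuf* sb = in.rdbuf();
    std::size_t matched = 0;  // delim[0..matched) is held back, not yet written
    for (;;) {
        const int c = sb->sbumpc();
        if (c == std::char_traits<char>::eof()) {
            out.write(delim.data(), static_cast<std::streamsize>(matched));
            in.setstate(std::ios::eofbit);
            return false;
        }
        const char ch = static_cast<char>(c);
        // On mismatch, release the held-back bytes that can no longer start a match.
        while (matched > 0 && ch != delim[matched]) {
            const std::size_t fallback = fail[matched - 1];
            out.write(delim.data(), static_cast<std::streamsize>(matched - fallback));
            matched = fallback;
        }
        if (ch == delim[matched]) {
            if (++matched == delim.size()) return true;
        } else {
            out.put(ch);
        }
    }
}

// Usage:
//   std::istringstream in("file data\r\n--XYZ--\r\n");
//   std::ostringstream out;
//   copyUntil(in, out, "\r\n--XYZ");  // true; out.str() == "file data"
```

For the parser in the question, the delimiter would be "\r\n--" followed by the boundary parameter (adjust if the boundary variable there already includes the leading dashes). After a match, what follows in the stream is either "--" (end of the multipart body) or the line ending before the next part's headers.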
This way you can upload data or files of unlimited size. Try this solution:
const char* ctype = "multipart/form-data; boundary=----WebKitFormBoundaryfm9qwXVLSbFKKR88";
size_t content_length = 1459606;
http_payload* hp = new http_payload(ctype, content_length);
if (hp->is_multipart()) {
    int ret = hp->read_all("C:\\temp\\");
    if (ret < 0) {
        std::cout << hp->get_last_error() << std::endl;
        hp->clear();
    }
    else {
        std::string dir_str("C:\\upload_dir\\");
        ret = hp->read_files([&dir_str](http_posted_file* file) {
            std::string path(dir_str.c_str());
            path.append(file->get_file_name());
            file->save_as(path.c_str());
            file->clear(); path.clear();
            std::string().swap(path);
        });
        hp->clear();
        std::cout << "Total file uploaded :" << ret << std::endl;
    }
}
else {
    int ret = hp->read_all();
    if (ret < 0) {
        std::cout << hp->get_last_error() << std::endl;
        hp->clear();
    }
    else {
        std::cout << "Posted data :" << hp->get_body() << std::endl;
        hp->clear();
    }
}
delete hp; // release the payload allocated with new above