How to send a GET request to an http/https website using sockets in C

Question

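I want to download the contents of a web page. When I send a GET request to example.com, the connection is established and I receive a response.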

#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

int main(void) {
    //Stream sockets and rcv()
    
    struct addrinfo hints, *res;
    int sockfd;
    
    char buf[2056];
    int byte_count;
    
    //get host info, make socket and connect it
    memset(&hints, 0,sizeof hints);
    hints.ai_family=AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    getaddrinfo("example.com","80", &hints, &res);
    sockfd = socket(res->ai_family,res->ai_socktype,res->ai_protocol);
    printf("Connecting...\n");
    connect(sockfd,res->ai_addr,res->ai_addrlen);
    printf("Connected!\n");
    char *header = "GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n";
    send(sockfd,header,strlen(header),0);
    printf("GET Sent...\n");
    //all right ! now that we're connected, we can receive some data!
    byte_count = recv(sockfd,buf,sizeof(buf)-1,0); // <-- -1 to leave room for a null terminator
    buf[byte_count] = 0; // <-- add the null terminator
    printf("recv()'d %d bytes of data in buf\n",byte_count);
    printf("%s",buf);
    return 0;
}

However, if I use http://info.cern.ch/ or http://galileoandeinstein.physics.virginia.edu/lectures/newton.pdf (to download a PDF) instead of www.example.com, I get a segmentation fault (with both port 80 and port 443).

The code that does not work:

#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

int main(void) {
    //Stream sockets and rcv()
    
    struct addrinfo hints, *res;
    int sockfd;
    
    char buf[2056];
    int byte_count;
    
    //get host info, make socket and connect it
    memset(&hints, 0,sizeof hints);
    hints.ai_family=AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    getaddrinfo("http://galileoandeinstein.physics.virginia.edu/lectures/newton.pdf","80", &hints, &res);
    sockfd = socket(res->ai_family,res->ai_socktype,res->ai_protocol);
    printf("Connecting...\n");
    connect(sockfd,res->ai_addr,res->ai_addrlen);
    printf("Connected!\n");
    char *header = "GET /index.html HTTP/1.1\r\nHost: http://galileoandeinstein.physics.virginia.edu/lectures/newton.pdf\r\n\r\n";
    send(sockfd,header,strlen(header),0);
    printf("GET Sent...\n");
    //all right ! now that we're connected, we can receive some data!
    byte_count = recv(sockfd,buf,sizeof(buf)-1,0); // <-- -1 to leave room for a null terminator
    buf[byte_count] = 0; // <-- add the null terminator
    printf("recv()'d %d bytes of data in buf\n",byte_count);
    printf("%s",buf);
    return 0;
}

Tags: c, sockets, tcp, get

Answer


Both getaddrinfo() and the Host: header field must be given the hostname only, not the full URI.

In your example, that is galileoandeinstein.physics.virginia.edu.
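As a rough illustration of separating the hostname from the rest of the URI, here is a minimal sketch. The helper name split_url is hypothetical (it is not part of any library), and it deliberately ignores ports, user info, and IPv6 literals.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies the host part of `url` into `host` and the
 * path into `path`. Returns 0 on success, -1 if the host is empty or a
 * buffer is too small. Ports, user info and IPv6 literals are not handled. */
int split_url(const char *url, char *host, size_t host_len,
              char *path, size_t path_len)
{
    assert(url != NULL && host != NULL && path != NULL);

    /* Skip an optional "scheme://" prefix. */
    const char *p = strstr(url, "://");
    p = (p != NULL) ? p + 3 : url;

    /* The host ends at the first '/', or at the end of the string. */
    size_t n = strcspn(p, "/");
    if (n == 0 || n >= host_len)
        return -1;
    memcpy(host, p, n);
    host[n] = '\0';

    /* An empty path is requested as "/". */
    const char *rest = p + n;
    if (*rest == '\0')
        rest = "/";
    if (strlen(rest) >= path_len)
        return -1;
    strcpy(path, rest);
    return 0;
}
```

The host buffer is what would go to getaddrinfo() and the Host: field, and the path buffer is what would go on the GET line.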

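Because you do not check the return value of getaddrinfo(), you never notice that it failed and left the res pointer uninitialized. Dereferencing that pointer to read the structure's members (res->ai_family and so on) then causes the segmentation fault.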
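A minimal sketch of such a check might look like the following. The wrapper name resolve_tcp and its extra_flags parameter are assumptions made for illustration; getaddrinfo(), gai_strerror() and freeaddrinfo() are the standard calls. The return values of socket() and connect() deserve the same kind of check.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: resolves host:port for a TCP stream socket.
 * Returns the address list (release it with freeaddrinfo()) or NULL on
 * failure. extra_flags is stored in hints.ai_flags: pass 0 for a normal
 * lookup, or AI_NUMERICHOST to accept only numeric addresses. */
struct addrinfo *resolve_tcp(const char *host, const char *port, int extra_flags)
{
    assert(host != NULL && port != NULL);

    struct addrinfo hints, *res = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = extra_flags;

    /* getaddrinfo() returns 0 on success and a nonzero EAI_* code on error. */
    int rc = getaddrinfo(host, port, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo(%s): %s\n", host, gai_strerror(rc));
        return NULL;
    }
    return res;
}
```

With a check like this, passing the full URL as the host produces an error message and a NULL return instead of a crash further down.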

The request should look something like this:

"GET /lectures/newton.pdf HTTP/1.1\r\n"
"Host: galileoandeinstein.physics.virginia.edu\r\n"
"Connection: close\r\n"
"\r\n"

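Connection: close is not mandatory, but it makes a simple experiment like yours easier: the server closes the connection once the response has been sent, so you can keep calling recv() until it returns 0.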
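To make that concrete, here is a hedged sketch of building the request and then reading the whole response. The helper names build_get_request and read_until_close are hypothetical. The loop assumes a blocking socket and treats any recv() error as fatal; it does not retry on EINTR.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: formats a GET request for `path` on `host` into
 * `out`. Returns the request length, or -1 if it does not fit. */
int build_get_request(char *out, size_t out_len,
                      const char *host, const char *path)
{
    assert(out != NULL && host != NULL && path != NULL);

    int n = snprintf(out, out_len,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Connection: close\r\n"
                     "\r\n",
                     path, host);
    return (n < 0 || (size_t)n >= out_len) ? -1 : n;
}

/* Hypothetical helper: copies everything the peer sends to `out` until the
 * peer closes the connection. Returns the total bytes read, or -1 on error. */
long read_until_close(int sockfd, FILE *out)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = recv(sockfd, buf, sizeof buf, 0);
        if (n == 0)   /* orderly shutdown: the response is complete */
            break;
        if (n < 0)    /* recv() failed; errno holds the reason */
            return -1;
        /* fwrite() rather than printf("%s"): a PDF body contains NUL bytes. */
        if (fwrite(buf, 1, (size_t)n, out) != (size_t)n)
            return -1;
        total += n;
    }
    return total;
}
```

A single recv() call, as in the question, returns only whatever has arrived so far, which is rarely a whole PDF. Note that the output of this loop still begins with the HTTP status line and response headers, followed by the body.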

To experiment with HTTPS, this example may be a good starting point.

