Netty does not process all incoming requests if one handler gets stuck in an infinite loop

Problem description

We hit a bug in our application where, while processing our protocol, one handler entered an infinite loop and got stuck in its channelRead() method.

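However, this then caused other new connections (some, not all) to get stuck somewhere during connection handling as well. No thread visibly indicated that those connections were stuck. The number of established connections grew steadily until, eventually, new connections could no longer connect and started timing out.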

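Why would one thread stuck in an infinite loop in channelRead block any other connections, when roughly 32 threads are available for processing? I confirmed that as soon as the stuck thread continues, all the stuck connections recover.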

I reproduced the behavior with this simple example:

AppServer.java:

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelOption;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.epoll.EpollEventLoopGroup;
import io.netty.channel.epoll.EpollServerSocketChannel;

public class AppServer {
    private static final int HTTP_PORT = 8080;

    public void run() throws Exception {
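        // No thread count is given, so each group uses Netty's default
        // (2 * available processors unless io.netty.eventLoopThreads is set).
        EventLoopGroup bossGroup = new EpollEventLoopGroup();
        EventLoopGroup workerGroup = new EpollEventLoopGroup();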

        try {
            ServerBootstrap httpBootstrap = new ServerBootstrap();
            httpBootstrap
                    .group(bossGroup, workerGroup)
                    .channel(EpollServerSocketChannel.class)
                    .childHandler(new ServerInitializer())
                    .option(ChannelOption.SO_BACKLOG, 512)
                    .childOption(ChannelOption.SO_KEEPALIVE, true);

            // Bind and start to accept incoming connections.
            ChannelFuture httpChannel = httpBootstrap.bind(HTTP_PORT).sync();

            // Wait until the server socket is closed
            httpChannel.channel().closeFuture().sync();
        }
        finally {
            workerGroup.shutdownGracefully();
            bossGroup.shutdownGracefully();
        }
    }

    public static void main(String[] args) throws Exception {
        new AppServer().run();
    }
}

ServerHandler.java:

import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.handler.codec.http.*;
import io.netty.util.CharsetUtil;

public class ServerHandler extends SimpleChannelInboundHandler<FullHttpRequest> {
    public static int count = 0;

    @Override
    protected void channelRead0(ChannelHandlerContext ctx, FullHttpRequest msg) {
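        // Deliberately block the event loop thread that handles the first request,
        // to reproduce the problem. The shared counter is not thread-safe, which is
        // tolerable only because this is a minimal reproduction.
        if (count == 0) {
            count++;
            while (true) {}
        }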
        ByteBuf content = Unpooled.copiedBuffer("Hello World!", CharsetUtil.UTF_8);
        FullHttpResponse response = new DefaultFullHttpResponse(HttpVersion.HTTP_1_1, HttpResponseStatus.OK, content);
        response.headers().set(HttpHeaderNames.CONTENT_TYPE, "text/html");
        response.headers().set(HttpHeaderNames.CONTENT_LENGTH, content.readableBytes());
        ctx.write(response);
        ctx.flush();
        count++;
    }
}

ServerInitializer.java:

import io.netty.channel.Channel;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelPipeline;
import io.netty.handler.codec.http.HttpObjectAggregator;
import io.netty.handler.codec.http.HttpServerCodec;

public class ServerInitializer extends ChannelInitializer<Channel> {
    @Override
    protected void initChannel(Channel ch) {
        ChannelPipeline pipeline = ch.pipeline();
        pipeline.addLast(new HttpServerCodec());
        pipeline.addLast(new HttpObjectAggregator(Integer.MAX_VALUE));
        pipeline.addLast(new ServerHandler());
    }
}

I simply use xargs and curl to open many connections: curl -v http://[ip]:8080

After a while, it gets into a state where connections fail with a timeout:

nc -vz [ip] 8080
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connection timed out.

This does not show up on the loopback interface.

When no thread is stuck, Netty does not run into this problem under the same test. It processes all requests and no connections get stuck.

I also tried Nio, with the same result.

Netty 4.1

Stuck established connections:

netstat -n | grep 8080 |  sed -E 's/[[:space:]]+/ /g' | cut -d' ' -f 6 | sort | uniq -c
  14551 ESTABLISHED
   6839 TIME_WAIT

A stuck connection looks like this:

curl -v http://localhost:8080
* Rebuilt URL to: http://localhost:8080/
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.61.1
> Accept: */*
>

tcpdump when the connection works:

00:25:35.135929 IP 10.94.158.96.50192 > 10.200.154.102.8080: Flags [S], seq 648221383, win 65340, options [mss 1210,nop,wscale 8,nop,nop,sackOK], length 0
00:25:35.135993 IP 10.200.154.102.8080 > 10.94.158.96.50192: Flags [S.], seq 167219764, ack 648221384, win 35844, options [mss 8961,nop,nop,sackOK,nop,wscale 8], length 0
00:25:35.174362 IP 10.94.158.96.50192 > 10.200.154.102.8080: Flags [.], ack 1, win 515, length 0
00:25:35.176419 IP 10.94.158.96.50192 > 10.200.154.102.8080: Flags [P.], seq 1:84, ack 1, win 515, length 83
00:25:35.176440 IP 10.200.154.102.8080 > 10.94.158.96.50192: Flags [.], ack 84, win 140, length 0
00:25:35.177307 IP 10.200.154.102.8080 > 10.94.158.96.50192: Flags [P.], seq 1:77, ack 84, win 140, length 76
00:25:35.216609 IP 10.94.158.96.50192 > 10.200.154.102.8080: Flags [F.], seq 84, ack 77, win 514, length 0
00:25:35.216856 IP 10.200.154.102.8080 > 10.94.158.96.50192: Flags [F.], seq 77, ack 85, win 140, length 0
00:25:35.254134 IP 10.94.158.96.50192 > 10.200.154.102.8080: Flags [.], ack 78, win 514, length 0

tcpdump when the connection is stuck:

00:25:38.409177 IP 10.94.158.96.50193 > 10.200.154.102.8080: Flags [S], seq 8750522, win 65340, options [mss 1210,nop,wscale 8,nop,nop,sackOK], length 0
00:25:38.409254 IP 10.200.154.102.8080 > 10.94.158.96.50193: Flags [S.], seq 1214051234, ack 8750523, win 35844, options [mss 8961,nop,nop,sackOK,nop,wscale 8], length 0
00:25:38.446641 IP 10.94.158.96.50193 > 10.200.154.102.8080: Flags [.], ack 1, win 515, length 0
00:25:38.449108 IP 10.94.158.96.50193 > 10.200.154.102.8080: Flags [P.], seq 1:84, ack 1, win 515, length 83
00:25:38.449141 IP 10.200.154.102.8080 > 10.94.158.96.50193: Flags [.], ack 84, win 140, length 0
00:25:39.535154 IP 10.94.158.96.50193 > 10.200.154.102.8080: Flags [.], seq 83:84, ack 1, win 515, length 1
00:25:39.535211 IP 10.200.154.102.8080 > 10.94.158.96.50193: Flags [.], ack 84, win 140, options [nop,nop,sack 1 {83:84}], length 0
00:25:40.641378 IP 10.94.158.96.50193 > 10.200.154.102.8080: Flags [.], seq 83:84, ack 1, win 515, length 1
00:25:40.641404 IP 10.200.154.102.8080 > 10.94.158.96.50193: Flags [.], ack 84, win 140, options [nop,nop,sack 1 {83:84}], length 0
00:25:41.741142 IP 10.94.158.96.50193 > 10.200.154.102.8080: Flags [.], seq 83:84, ack 1, win 515, length 1
00:25:41.741199 IP 10.200.154.102.8080 > 10.94.158.96.50193: Flags [.], ack 84, win 140, options [nop,nop,sack 1 {83:84}], length 0
00:25:42.844891 IP 10.94.158.96.50193 > 10.200.154.102.8080: Flags [.], seq 83:84, ack 1, win 515, length 1
00:25:42.844947 IP 10.200.154.102.8080 > 10.94.158.96.50193: Flags [.], ack 84, win 140, options [nop,nop,sack 1 {83:84}], length 0
00:25:44.035849 IP 10.94.158.96.50193 > 10.200.154.102.8080: Flags [.], seq 83:84, ack 1, win 515, length 1
00:25:44.035869 IP 10.200.154.102.8080 > 10.94.158.96.50193: Flags [.], ack 84, win 140, options [nop,nop,sack 1 {83:84}], length 0
00:25:45.135646 IP 10.94.158.96.50193 > 10.200.154.102.8080: Flags [.], seq 83:84, ack 1, win 515, length 1
00:25:45.135702 IP 10.200.154.102.8080 > 10.94.158.96.50193: Flags [.], ack 84, win 140, options [nop,nop,sack 1 {83:84}], length 0
... repeats ...

Tags: java, netty, nio, epoll

Solution


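This makes sense. Netty uses the concept of an EventLoop, which means it processes tasks and I/O in a loop. The important point here is that Netty uses non-blocking I/O, so it never blocks on I/O and can therefore serve many connections with a single thread. This makes it possible to handle 1M+ connections with only a small number of threads. It also means that if you "block" one thread (which is effectively what the infinite loop does), you also affect every other connection handled by that thread. Each channel stays bound to a single EventLoop for its lifetime, so new connections assigned to the blocked loop stall while connections on the other loops keep working, which matches the "some, not all" behavior in the question.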
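
To make the effect concrete, here is a minimal JDK-only simulation of the idea. It is not Netty code and does not use Netty's API: the class EventLoopStallDemo, the method countStalled, and the round-robin assignment by connection index are all assumptions made for illustration. Each "event loop" is a single-threaded executor, each "connection" is pinned to one loop, and the handler for the first connection blocks until it is released.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// JDK-only simulation; this is NOT Netty code. All names here are hypothetical.
class EventLoopStallDemo {

    // Pins each simulated connection to one single-threaded "event loop" in
    // round-robin order, blocks the handler of connection 0, and returns how
    // many connections did not complete within timeoutMillis.
    static int countStalled(int loops, int connections, long timeoutMillis) {
        ExecutorService[] eventLoops = new ExecutorService[loops];
        for (int i = 0; i < loops; i++) {
            eventLoops[i] = Executors.newSingleThreadExecutor();
        }
        // Stands in for "the infinite loop finally ends".
        CountDownLatch release = new CountDownLatch(1);
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int c = 0; c < connections; c++) {
                final boolean stuck = (c == 0);
                Runnable handler = () -> {
                    if (stuck) {
                        try {
                            release.await(); // occupies this loop's only thread
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    }
                };
                // Work for connection c always runs on the same loop.
                pending.add(eventLoops[c % loops].submit(handler));
            }
            int stalled = 0;
            for (Future<?> f : pending) {
                try {
                    f.get(timeoutMillis, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    stalled++; // queued behind the blocked handler, or is the blocked one
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
            return stalled;
        } finally {
            release.countDown(); // unblock the stuck handler so the threads can exit
            for (ExecutorService loop : eventLoops) {
                loop.shutdown();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("stalled connections: " + countStalled(4, 16, 200));
    }
}
```

With 4 loops and 16 connections, connections 0, 4, 8 and 12 share the first loop, so those four are counted as stalled while the other twelve complete normally. Once the latch is released, the queued work on that loop drains, which mirrors the observation that all stuck connections recover as soon as the thread continues.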

