The httprouter framework (the router used by Gin)

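As mentioned in the earlier Gin post, Gin is considerably faster than Martini, and the main reason is that it uses httprouter as its routing layer. The code is on GitHub: httprouter source. Today I took a quick look at how httprouter is implemented. In short, it uses a radix tree (a compressed prefix tree) to manage request URLs. Let's look at how it works in detail.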

###1. Basic structure of httprouter

In httprouter, every HTTP method has its own tree: all GET routes are managed by one tree, all POST routes by another, and so on. First, here is what the Router struct looks like:

type Router struct {
// The radix trees are the most important structure here.
// Routes are grouped by HTTP method, and each method has its own radix tree.
trees map[string]*node

// Enables automatic redirection if the current route can't be matched but a
// handler for the path with (without) the trailing slash exists.
// For example if /foo/ is requested but a route only exists for /foo, the
// client is redirected to /foo with http status code 301 for GET requests
// and 307 for all other request methods.
// Whether to redirect to /foo when /foo/ has no match.
RedirectTrailingSlash bool

// If enabled, the router tries to fix the current request path, if no
// handle is registered for it.
// First superfluous path elements like ../ or // are removed.
// Afterwards the router does a case-insensitive lookup of the cleaned path.
// If a handle can be found for this route, the router makes a redirection
// to the corrected path with status code 301 for GET requests and 307 for
// all other request methods.
// For example /FOO and /..//Foo could be redirected to /foo.
// RedirectTrailingSlash is independent of this option.
// Whether the router may try to fix the request path.
RedirectFixedPath bool

// If enabled, the router checks if another method is allowed for the
// current route, if the current request can not be routed.
// If this is the case, the request is answered with 'Method Not Allowed'
// and HTTP status code 405.
// If no other Method is allowed, the request is delegated to the NotFound
// handler.
// If the request cannot be routed, check whether another method matches this route.
HandleMethodNotAllowed bool

// If enabled, the router automatically replies to OPTIONS requests.
// Custom OPTIONS handlers take priority over automatic replies.
// Whether to answer OPTIONS requests automatically. Note: explicitly registered OPTIONS handlers take priority.
HandleOPTIONS bool

// Configurable http.Handler which is called when no matching route is
// found. If it is not set, http.NotFound is used.
// Called when no route matches. If it is not configured, http.NotFound is used.
NotFound http.Handler

// Configurable http.Handler which is called when a request
// cannot be routed and HandleMethodNotAllowed is true.
// If it is not set, http.Error with http.StatusMethodNotAllowed is used.
// The "Allow" header with allowed request methods is set before the handler
// is called.
// Used when there is no match and HandleMethodNotAllowed is true.
MethodNotAllowed http.Handler

// Function to handle panics recovered from http handlers.
// It should be used to generate a error page and return the http error code
// 500 (Internal Server Error).
// The handler can be used to keep your server from crashing because of
// unrecovered panics.
// Handler for recovered panics.
PanicHandler func(http.ResponseWriter, *http.Request, interface{})
}
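
To make the "one tree per method" idea and the HandleMethodNotAllowed flag a little more concrete, here is a minimal sketch. It is not httprouter's code: `miniRouter`, `add` and `lookup` are hypothetical names, and a plain map stands in for the radix tree so the example stays short.

```go
package main

import "fmt"

// miniRouter is a hypothetical, heavily simplified stand-in for Router.
// Each HTTP method gets its own route table; a plain map replaces the
// radix tree so the sketch stays short.
type miniRouter struct {
	trees                  map[string]map[string]string // method -> path -> handler name
	handleMethodNotAllowed bool
}

// add registers a handler name for the given method and path.
func (r *miniRouter) add(method, path, name string) {
	if r.trees == nil {
		r.trees = make(map[string]map[string]string)
	}
	if r.trees[method] == nil {
		r.trees[method] = make(map[string]string)
	}
	r.trees[method][path] = name
}

// lookup returns the handler name and the HTTP status the router would use.
func (r *miniRouter) lookup(method, path string) (string, int) {
	// Only the table that belongs to the request's own method is searched first.
	if name, ok := r.trees[method][path]; ok {
		return name, 200
	}
	// No match: optionally check whether another method knows this path.
	if r.handleMethodNotAllowed {
		for m, routes := range r.trees {
			if m == method {
				continue
			}
			if _, ok := routes[path]; ok {
				return "", 405
			}
		}
	}
	return "", 404
}

func main() {
	r := &miniRouter{handleMethodNotAllowed: true}
	r.add("GET", "/search/", "func1")
	r.add("POST", "/upload/", "func2") // hypothetical route, not from the article

	fmt.Println(r.lookup("GET", "/search/"))  // found in the GET table
	fmt.Println(r.lookup("GET", "/upload/"))  // only POST has this path
	fmt.Println(r.lookup("GET", "/missing/")) // no method has this path
}
```

Keeping the methods apart means a GET lookup never has to look at POST routes unless the first lookup fails.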

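In the struct above, trees map[string]*node is effectively a forest: it holds a GET tree, a POST tree, and so on.
Each of these trees is a prefix tree. Here is a diagram borrowed from the project's GitHub page:

[figure: prefix tree diagram from GitHub]

Suppose the diagram above shows the GET tree. It would then correspond to registering the following GET routes: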

GET("/search/", func1)
GET("/support/", func2)
GET("/blog/:post/", func3)
GET("/about-us/", func4)
GET("/about-us/team/", func5)
GET("/contact/", func6)

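Note that the tree is organized by shared prefixes. For example, search and support share the prefix s, so s is split out as a separate parent node, and that s node has no handle. With /about-us/ and /about-us/team/, the former is the parent of the latter, but here the parent does have a handle of its own. That is the difference between the two cases.
In general, both creating a node and looking one up walk the tree level by level. Here is the structure of a tree node: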

type node struct {
    // The URL path fragment stored in this node.
    // For search and support in the figure above, the shared parent has path="s"
    // and the two child nodes have the paths "earch" and "upport".
	path      string
	// Whether this node's child is a wildcard (parameter) node. In the figure above, the node right before :post has wildChild set
	wildChild bool
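	// Node type: one of static, root, param, catchAll
	// static: a plain node (the default), e.g. the parent s created by the split above
	// root: the first node inserted into the tree
	// param: a named parameter node (:name)
	// catchAll: a catch-all node (*name)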
	nType     nodeType
	// Maximum number of parameters on any path through this node
	maxParams uint8
	// Parallel to children[]: holds the first character of each branch.
	// For search and support, the s node has indices "eu",
	// meaning two branches whose first letters are e and u.
	indices   string
	// Child nodes
	children  []*node
	// Handler for this node
	handle    Handle
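	// Priority: bumped for every route that passes through this node. addRoute calls incrementChildPrio with it, so busier branches can be tried first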
	priority  uint32
}
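
To see how path, indices and children fit together, the search/support branch can be built by hand. This is only an illustrative sketch: `sketchNode`, `childFor`, `dump` and `searchSupportBranch` are hypothetical names, the struct keeps just four of the fields above, and the handle is reduced to a string.

```go
package main

import (
	"fmt"
	"strings"
)

// sketchNode keeps only four fields of httprouter's node. The handle is a
// plain string here; "" means no handle is registered on the node.
type sketchNode struct {
	path     string
	indices  string
	children []*sketchNode
	handle   string
}

// childFor returns the child whose branch starts with c, or nil.
// indices[i] is always the first byte of children[i].path.
func (n *sketchNode) childFor(c byte) *sketchNode {
	for i := 0; i < len(n.indices); i++ {
		if n.indices[i] == c {
			return n.children[i]
		}
	}
	return nil
}

// dump renders the subtree, one node per line, indented by depth.
func dump(n *sketchNode, depth int) []string {
	lines := []string{fmt.Sprintf("%spath=%q indices=%q handle=%q",
		strings.Repeat("  ", depth), n.path, n.indices, n.handle)}
	for _, c := range n.children {
		lines = append(lines, dump(c, depth+1)...)
	}
	return lines
}

// searchSupportBranch builds the branch for /search/ and /support/ by hand,
// starting at the shared parent node "s".
func searchSupportBranch() *sketchNode {
	return &sketchNode{
		path:    "s",
		indices: "eu", // first bytes of the two children below
		children: []*sketchNode{
			{path: "earch/", handle: "func1"},
			{path: "upport/", handle: "func2"},
		},
	}
}

func main() {
	s := searchSupportBranch()
	for _, line := range dump(s, 0) {
		fmt.Println(line)
	}
	fmt.Println(s.childFor('u').path)
}
```

Storing the first byte of every branch in one string lets a lookup pick the right child with a short byte scan instead of comparing whole paths.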

###2. Building the tree

Building the tree mainly involves two functions, addRoute and insertChild. Let's look at both.
First, the addRoute function:

// addRoute adds a node with the given handle to the path.
// Not concurrency-safe!
// Inserts a new route into the tree.
func (n *node) addRoute(path string, handle Handle) {
	fullPath := path
	n.priority++
	numParams := countParams(path)

	// non-empty tree
	// The tree for this method already contains nodes
	if len(n.path) > 0 || len(n.children) > 0 {
	walk:
		for {
			// Update maxParams of the current node
			if numParams > n.maxParams {
				n.maxParams = numParams
			}

			// Find the longest common prefix.
			// This also implies that the common prefix contains no ':' or '*'
			// since the existing key can't contain those chars.
			i := 0
			max := min(len(path), len(n.path))
			// Advance while the bytes match
			for i < max && path[i] == n.path[i] {
				i++
			}

			// Split edge
			// Only part of n.path matched. For example, the tree holds /search and /support
			// arrives: /s becomes the parent node, with earch and upport as its children.
			if i < len(n.path) {
				// The part of the existing path after index i becomes a child of the part before it
				child := node{
					path:      n.path[i:],
					wildChild: n.wildChild,
					nType:     static,
					indices:   n.indices,
					children:  n.children,
					handle:    n.handle,
					priority:  n.priority - 1,
				}

				// Update maxParams (max of all children)
				for i := range child.children {
					if child.children[i].maxParams > child.maxParams {
						child.maxParams = child.children[i].maxParams
					}
				}
				// The current node's only child is now the suffix node split off above
				n.children = []*node{&child}
				// []byte for proper unicode char conversion, see #65
				n.indices = string([]byte{n.path[i]})
				// The current node keeps only the first i bytes of the path
				n.path = path[:i]
				n.handle = nil
				n.wildChild = false
			}

			// Make new node a child of this node
			// That is, attach the incoming route as a child of the (possibly just split) parent
			if i < len(path) {
				// The remainder after index i becomes the path, e.g. upport for support above
				path = path[i:]

				// n has a wildcard child (a ':' or '*' node)
				if n.wildChild {
					n = n.children[0]
					n.priority++

					// Update maxParams of the child node
					if numParams > n.maxParams {
						n.maxParams = numParams
					}
					numParams--

					// Check if the wildcard matches
					// e.g. /blog/:ppp vs /blog/:ppppppp: a longer wildcard name must not pass as a match
					if len(path) >= len(n.path) && n.path == path[:len(n.path)] {
						// check for longer wildcard, e.g. :name and :names
						if len(n.path) >= len(path) || path[len(n.path)] == '/' {
							continue walk
						}
					}

					panic("path segment '" + path +
						"' conflicts with existing wildcard '" + n.path +
						"' in path '" + fullPath + "'")
				}

				c := path[0]

				// slash after param
				if n.nType == param && c == '/' && len(n.children) == 1 {
					n = n.children[0]
					n.priority++
					continue walk
				}

				// Check if a child with the next path byte exists
				// e.g. search and support start with the same character
				for i := 0; i < len(n.indices); i++ {
					// Found the branch whose first character matches
					if c == n.indices[i] {
						i = n.incrementChildPrio(i)
						n = n.children[i]
						continue walk
					}
				}

				// Otherwise insert it
				// Create a new node
				if c != ':' && c != '*' {
					// []byte for proper unicode char conversion, see #65
					// Record the first character in indices
					n.indices += string([]byte{c})
					child := &node{
						maxParams: numParams,
					}
					// Append the child node
					n.children = append(n.children, child)
					n.incrementChildPrio(len(n.indices) - 1)
					n = child
				}
				// Insert the node
				n.insertChild(numParams, path, fullPath, handle)
				return

				// The path ends exactly at this node, so the handle is attached here.
				// If the node already has a handle, registering a second one panics.
			} else if i == len(path) { // Make node a (in-path) leaf
				if n.handle != nil {
					panic("a handle is already registered for path '" + fullPath + "'")
				}
				n.handle = handle
			}
			return
		}
	} else { // Empty tree
		// Empty tree: insert the node directly
		n.insertChild(numParams, path, fullPath, handle)
		// This node is the root
		n.nType = root
	}
}
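
The edge split that addRoute performs can be isolated in a much smaller sketch. The version below is an assumption-laden simplification, not the library's code: it handles static paths only (no ':' or '*'), ignores priorities and maxParams, and `sketchNode`, `commonPrefixLen` and `addStatic` are hypothetical names.

```go
package main

import "fmt"

// sketchNode is a hypothetical, static-only version of the node struct.
type sketchNode struct {
	path     string
	indices  string
	children []*sketchNode
	handle   string // "" means no handle registered
}

// commonPrefixLen returns the length of the longest common prefix of a and b.
func commonPrefixLen(a, b string) int {
	i := 0
	for i < len(a) && i < len(b) && a[i] == b[i] {
		i++
	}
	return i
}

// addStatic inserts a wildcard-free path, splitting an edge when needed.
func (n *sketchNode) addStatic(path, handle string) {
	if n.path == "" && len(n.children) == 0 { // empty tree
		n.path, n.handle = path, handle
		return
	}
	for {
		i := commonPrefixLen(path, n.path)
		if i < len(n.path) {
			// Split: the tail of n.path moves into a child that inherits
			// the old children and handle; n keeps the shared prefix.
			child := &sketchNode{
				path:     n.path[i:],
				indices:  n.indices,
				children: n.children,
				handle:   n.handle,
			}
			n.children = []*sketchNode{child}
			n.indices = string([]byte{n.path[i]})
			n.path = n.path[:i]
			n.handle = ""
		}
		if i == len(path) { // the new path ends exactly at this node
			if n.handle != "" {
				panic("a handle is already registered for this path")
			}
			n.handle = handle
			return
		}
		path = path[i:]
		var next *sketchNode
		for j := 0; j < len(n.indices); j++ {
			if n.indices[j] == path[0] {
				next = n.children[j]
				break
			}
		}
		if next == nil { // no branch starts with this byte: add a new leaf
			n.indices += string([]byte{path[0]})
			n.children = append(n.children, &sketchNode{path: path, handle: handle})
			return
		}
		n = next // descend and repeat with the remaining path
	}
}

func main() {
	root := &sketchNode{}
	root.addStatic("/search", "func1")
	root.addStatic("/support", "func2")
	fmt.Printf("%q %q\n", root.path, root.indices)
	for _, c := range root.children {
		fmt.Printf("  %q %q\n", c.path, c.handle)
	}
}
```

After the second insert the root holds the shared prefix and no handle, while the two original routes live in its children.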

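The purpose of the function above is to find the position at which the node should be inserted. Note that if the new path shares a common prefix with an existing node, that node has to be split first, and only then is the new child inserted. Next, let's look at the insertChild function: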

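// insertChild inserts the node(s) for a path.
// numParams: number of parameters
// path: the path to insert
// fullPath: the full route path
// handle: the handler associated with the path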
func (n *node) insertChild(numParams uint8, path, fullPath string, handle Handle) {
	var offset int // already handled bytes of the path

	// find prefix until first wildcard (beginning with ':' or '*')
	for i, max := 0, len(path); numParams > 0; i++ {
		c := path[i]
		if c != ':' && c != '*' {
			continue
		}

		// find wildcard end (either '/' or path end)
		end := i + 1
		// After a ':' or '*', no further '*' or ':' may appear before the
		// next '/'; that would be a malformed parameter
		for end < max && path[end] != '/' {
			switch path[end] {
			// the wildcard name must not contain ':' and '*'
			case ':', '*':
				panic("only one wildcard per path segment is allowed, has: '" +
					path[i:] + "' in path '" + fullPath + "'")
			default:
				end++
			}
		}

		// check if this node has existing children which would be
		// unreachable if we insert the wildcard here
		if len(n.children) > 0 {
			panic("wildcard route '" + path[i:end] +
				"' conflicts with existing children in path '" + fullPath + "'")
		}

		// check if the wildcard has a name
		// A bare ':' or '*' without a name is not valid either
		if end-i < 2 {
			panic("wildcards must be named with a non-empty name in path '" + fullPath + "'")
		}

		// ':' matches a single parameter
		if c == ':' { // param
			// split path at the beginning of the wildcard
			// The node's path is the stretch before the parameter; offset tracks how many bytes of path are already handled
			if i > 0 {
				n.path = path[offset:i]
				offset = i
			}
			// Create a child
			child := &node{
				nType:     param,
				maxParams: numParams,
			}
			n.children = []*node{child}
			n.wildChild = true
			// The next iteration works on this new child
			n = child
			// The new child lies on the route being added, so its priority is incremented
			n.priority++
			numParams--

			// if the path doesn't end with the wildcard, then there
			// will be another non-wildcard subpath starting with '/'
			if end < max {
				n.path = path[offset:end]
				offset = end

				child := &node{
					maxParams: numParams,
					priority:  1,
				}
				n.children = []*node{child}
				n = child
			}

		} else { // catchAll
			// '*' matches everything that follows
			if end != max || numParams > 1 {
				panic("catch-all routes are only allowed at the end of the path in path '" + fullPath + "'")
			}

			if len(n.path) > 0 && n.path[len(n.path)-1] == '/' {
				panic("catch-all conflicts with existing handle for the path segment root in path '" + fullPath + "'")
			}

			// currently fixed width 1 for '/'
			i--
			if path[i] != '/' {
				panic("no / before catch-all in path '" + fullPath + "'")
			}

			n.path = path[offset:i]

			// first node: catchAll node with empty path
			child := &node{
				wildChild: true,
				nType:     catchAll,
				maxParams: 1,
			}
			n.children = []*node{child}
			n.indices = string(path[i])
			n = child
			n.priority++

			// second node: node holding the variable
			child = &node{
				path:      path[i:],
				nType:     catchAll,
				maxParams: 1,
				handle:    handle,
				priority:  1,
			}
			n.children = []*node{child}

			return
		}
	}

	// insert remaining path part and handle to the leaf
	n.path = path[offset:]
	n.handle = handle
}

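insertChild splits the path at its wildcards: the static stretch before each wildcard and each wildcard segment are stored as separate nodes, forming a chain in the tree. Note the difference between ':' and '*' in parameter matching: the former matches a single path segment, the latter matches the whole rest of the path. For the finer details, see the comments in the code.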
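
The difference between the two wildcard kinds is easier to see in isolation. The matcher below is a standalone sketch and not how httprouter matches (it walks the tree instead): `match` is a hypothetical helper, and the pattern `/files/*filepath` is an assumed example route. As in getValue's catchAll case, the catch-all value keeps its leading '/'.

```go
package main

import "fmt"

// match compares one route pattern with one request path.
// ":name" consumes exactly one non-empty segment; "*name" consumes the
// rest of the path, including the '/' right before it.
func match(pattern, path string) (map[string]string, bool) {
	params := map[string]string{}
	pi, si := 0, 0 // positions in pattern and in path
	for pi < len(pattern) {
		switch pattern[pi] {
		case ':':
			// The parameter name runs to the next '/' or the pattern end.
			pend := pi + 1
			for pend < len(pattern) && pattern[pend] != '/' {
				pend++
			}
			// The value runs to the next '/' or the path end.
			send := si
			for send < len(path) && path[send] != '/' {
				send++
			}
			if send == si {
				return nil, false // an empty segment does not match
			}
			params[pattern[pi+1:pend]] = path[si:send]
			pi, si = pend, send
		case '*':
			// A catch-all must directly follow a '/'.
			if pi == 0 || pattern[pi-1] != '/' {
				return nil, false
			}
			params[pattern[pi+1:]] = path[si-1:]
			return params, true
		default:
			// Static byte: must be identical in the path.
			if si >= len(path) || path[si] != pattern[pi] {
				return nil, false
			}
			pi++
			si++
		}
	}
	return params, si == len(path)
}

func main() {
	fmt.Println(match("/blog/:post/", "/blog/hello/"))
	fmt.Println(match("/blog/:post/", "/blog/hello/extra/"))
	fmt.Println(match("/files/*filepath", "/files/css/site.css"))
}
```

A ':' parameter stops at the next '/', which is why the second call fails, while the catch-all in the third call swallows every remaining segment.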

###3. Looking up a path

The lookup simply matches the request path against each node's path in turn, walking down the tree until the end of the path is reached.

// Returns the handle registered with the given path (key). The values of
// wildcards are saved to a map.
// If no handle can be found, a TSR (trailing slash redirect) recommendation is
// made if a handle exists with an extra (without the) trailing slash for the
// given path.
func (n *node) getValue(path string) (handle Handle, p Params, tsr bool) {
walk: // outer loop for walking the tree
	for {
		// The request path is longer than this node's path, so the end has not been reached yet
		if len(path) > len(n.path) {
			// The leading part must be identical to this node's path
			if path[:len(n.path)] == n.path {
				path = path[len(n.path):]
				// If this node does not have a wildcard (param or catchAll)
				// child,  we can just look up the next child node and continue
				// to walk down the tree
				if !n.wildChild {
					c := path[0]
					// Use the first character of the remaining path to pick the child
					for i := 0; i < len(n.indices); i++ {
						if c == n.indices[i] {
							n = n.children[i]
							continue walk
						}
					}

					// Nothing found.
					// We can recommend to redirect to the same URL without a
					// trailing slash if a leaf exists for that path.
					tsr = (path == "/" && n.handle != nil)
					return

				}

				// handle wildcard child
				n = n.children[0]
				switch n.nType {
				// For a plain ':' node, scan to the next '/' or the end of the path to get the parameter
				case param:
					// find param end (either '/' or path end)
					end := 0
					for end < len(path) && path[end] != '/' {
						end++
					}
					// save param value
					if p == nil {
						// lazy allocation
						p = make(Params, 0, n.maxParams)
					}
					i := len(p)
					p = p[:i+1] // expand slice within preallocated capacity
					// Store the key and the value
					p[i].Key = n.path[1:]
					p[i].Value = path[:end]

					// we need to go deeper!
					// The path is not fully consumed yet, so keep walking
					if end < len(path) {
						if len(n.children) > 0 {
							path = path[end:]
							n = n.children[0]
							continue walk
						}

						// ... but we can't
						tsr = (len(path) == end+1)
						return
					}
					// Otherwise return this node's handle, if it has one
					if handle = n.handle; handle != nil {
						return
					} else if len(n.children) == 1 {
						// No handle found. Check if a handle for this path + a
						// trailing slash exists for TSR recommendation
						n = n.children[0]
						tsr = (n.path == "/" && n.handle != nil)
					}

					return

				case catchAll:
					// save param value
					if p == nil {
						// lazy allocation
						p = make(Params, 0, n.maxParams)
					}
					i := len(p)
					p = p[:i+1] // expand slice within preallocated capacity
					p[i].Key = n.path[2:]
					p[i].Value = path

					handle = n.handle
					return

				default:
					panic("invalid node type")
				}
			}
			// Reached the end of the path
		} else if path == n.path {
			// We should have reached the node containing the handle.
			// Check if this node has a handle registered.
			if handle = n.handle; handle != nil {
				return
			}
			// Otherwise decide whether a trailing slash redirect should be recommended
			if path == "/" && n.wildChild && n.nType != root {
				tsr = true
				return
			}

			// No handle found. Check if a handle for this path + a
			// trailing slash exists for trailing slash recommendation
			for i := 0; i < len(n.indices); i++ {
				if n.indices[i] == '/' {
					n = n.children[i]
					tsr = (len(n.path) == 1 && n.handle != nil) ||
						(n.nType == catchAll && n.children[0].handle != nil)
					return
				}
			}

			return
		}

		// Nothing found. We can recommend to redirect to the same URL with an
		// extra trailing slash if a leaf exists for that path
		tsr = (path == "/") ||
			(len(n.path) == len(path)+1 && n.path[len(path)] == '/' &&
				path == n.path[:len(n.path)-1] && n.handle != nil)
		return
	}
}
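
Stripped of wildcard handling, the walk in getValue and its trailing slash recommendation can be sketched as follows. This is a simplified illustration under stated assumptions: `sketchNode`, `lookup` and `buildTree` are hypothetical names, the tree is built by hand from three of the routes above plus an assumed `/ping` route, and the odd corner cases of the original tsr expression are left out.

```go
package main

import "fmt"

// sketchNode is a hypothetical, static-only version of the node struct.
type sketchNode struct {
	path     string
	indices  string
	children []*sketchNode
	handle   string // "" means no handle registered
}

// lookup walks static nodes only. It returns the handle name and whether
// a trailing slash redirect (tsr) would be recommended.
func (n *sketchNode) lookup(path string) (string, bool) {
	for {
		if len(path) > len(n.path) {
			if path[:len(n.path)] != n.path {
				return "", false // simplified: the original falls through to its tsr check
			}
			path = path[len(n.path):]
			var next *sketchNode
			for i := 0; i < len(n.indices); i++ {
				if n.indices[i] == path[0] {
					next = n.children[i]
					break
				}
			}
			if next == nil {
				// "/foo/" requested while only "/foo" is registered.
				return "", path == "/" && n.handle != ""
			}
			n = next
			continue
		}
		if path == n.path {
			if n.handle != "" {
				return n.handle, false
			}
			// No handle here: is there a child "/" that has one?
			for i := 0; i < len(n.indices); i++ {
				if n.indices[i] == '/' {
					c := n.children[i]
					return "", c.path == "/" && c.handle != ""
				}
			}
			return "", false
		}
		// "/foo" requested while this node is "foo/" with a handle.
		return "", n.handle != "" && n.path == path+"/"
	}
}

// buildTree assembles by hand the tree for /search/, /support/, /contact/
// and a hypothetical /ping route that has no trailing slash.
func buildTree() *sketchNode {
	return &sketchNode{
		path: "/", indices: "scp",
		children: []*sketchNode{
			{path: "s", indices: "eu", children: []*sketchNode{
				{path: "earch/", handle: "func1"},
				{path: "upport/", handle: "func2"},
			}},
			{path: "contact/", handle: "func6"},
			{path: "ping", handle: "pingHandler"},
		},
	}
}

func main() {
	root := buildTree()
	for _, p := range []string{"/search/", "/search", "/ping/", "/nope"} {
		h, tsr := root.lookup(p)
		fmt.Printf("%-10s handle=%q tsr=%v\n", p, h, tsr)
	}
}
```

The lookup itself never redirects. It only reports tsr, and the caller decides what to do with it, which in httprouter depends on RedirectTrailingSlash.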
