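A Detailed Look at the Python HTTP Server Implementation

This article walks through the implementation of the HTTP server in Python 2.7.

I dug into this code mainly because the following problem showed up in production.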

Traceback (most recent call last):
  File "xxx.py", line 3245, in __init__
    BaseHTTPRequestHandler.__init__(self, A, B, C)
  File "/usr/lib64/python2.7/SocketServer.py", line 649, in __init__
    self.handle()
  File "/usr/lib64/python2.7/BaseHTTPServer.py", line 340, in handle
    self.handle_one_request()
  File "/usr/lib64/python2.7/BaseHTTPServer.py", line 310, in handle_one_request
    self.raw_requestline = self.rfile.readline(65537)
  File "/usr/lib64/python2.7/socket.py", line 476, in readline
    data = self._sock.recv(self._rbufsize)
error: [Errno 104] Connection reset by peer

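At first I suspected that an HTTP/1.0 server could not respond correctly to HTTP/1.1 requests, but it turned out that HTTP/1.1 support in Python 2.7 comes down to whether the close_connection field is set. I then read through the code and found no place where the library would try to read from a socket after that socket had been closed. In fact, self.socket.close() is called only in TCPServer.server_close, and socket.close() is also the only method that clears self._sock.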

How a request is routed

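Let's start with BaseHTTPRequestHandler, the class you deal with most. This is where you define how each kind of request is handled, for example by implementing do_GET, do_POST, and so on. A new instance of it is created for every request, so anything that must be shared across requests has to live in class attributes.
The inheritance chain of BaseHTTPRequestHandler is BaseRequestHandler -> StreamRequestHandler -> BaseHTTPRequestHandler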
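
To make the per-request instantiation concrete, here is a minimal sketch. CountingHandler, hits, and fetch_once are names made up for this illustration, and socket.socketpair() stands in for a real client connection so that no port has to be opened. The fallback import for the Python 3 module name is my addition; the article itself targets Python 2.7.

```python
import socket

try:
    from BaseHTTPServer import BaseHTTPRequestHandler  # Python 2.7
except ImportError:
    from http.server import BaseHTTPRequestHandler  # Python 3 name (assumption)


class CountingHandler(BaseHTTPRequestHandler):
    # Class attribute: shared by every handler instance (hypothetical name).
    hits = 0

    def do_GET(self):
        CountingHandler.hits += 1
        body = ("hit %d" % CountingHandler.hits).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once():
    """Push one GET through a fresh handler instance and return the raw reply."""
    client, server_side = socket.socketpair()
    client.sendall(b"GET / HTTP/1.0\r\n\r\n")
    # Instantiating the handler is what processes the request.
    # Passing None as the server is a shortcut for this sketch.
    CountingHandler(server_side, ("127.0.0.1", 0), None)
    server_side.close()
    chunks = []
    while True:
        data = client.recv(4096)
        if not data:
            break
        chunks.append(data)
    client.close()
    return b"".join(chunks)
```

Calling fetch_once() twice builds two separate handler instances, yet the second reply carries the body hit 2, because the counter lives on the class rather than on the instance.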

The inheritance chain of the Server is BaseServer -> TCPServer -> HTTPServer

From Server to Handler

BaseServer(TCPServer).process_request -> BaseServer(TCPServer).finish_request -> BaseRequestHandler.__init__
This call chain shows how a request travels from the Server to the Handler. As you can see, finish_request directly creates an instance of RequestHandlerClass, which is the BaseHTTPRequestHandler subclass we wrote.

class BaseServer:
    def process_request(self, request, client_address):
        """Call finish_request.

        Overridden by ForkingMixIn and ThreadingMixIn.

        """
        self.finish_request(request, client_address)
        self.shutdown_request(request)

    def finish_request(self, request, client_address):
        """Finish one request by instantiating RequestHandlerClass."""
        self.RequestHandlerClass(request, client_address, self)

class BaseRequestHandler:
    def __init__(self, request, client_address, server):
        self.request = request  # a socket._socketobject, i.e. the connected socket
        self.client_address = client_address  # an (addr, port) tuple
        self.server = server  # the Server object
        self.setup()
        try:
            self.handle()
        finally:
            self.finish()
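
The chain can be observed without any socket at all. The sketch below drives BaseServer.process_request by hand with a handler that only records which hooks run. TracingHandler, calls, and trace_one_request are hypothetical names, the client address is a documentation address, and passing None as the request is a shortcut that only works because this handler never touches it.

```python
try:
    from SocketServer import BaseServer, BaseRequestHandler  # Python 2.7
except ImportError:
    from socketserver import BaseServer, BaseRequestHandler  # Python 3 name (assumption)

calls = []  # records the order in which the hooks run


class TracingHandler(BaseRequestHandler):
    def setup(self):
        calls.append("setup")

    def handle(self):
        calls.append("handle for %s:%d" % self.client_address)

    def finish(self):
        calls.append("finish")


def trace_one_request():
    del calls[:]
    # BaseServer itself opens no socket; it only stores the handler class.
    server = BaseServer(("127.0.0.1", 0), TracingHandler)
    # process_request -> finish_request -> TracingHandler.__init__
    server.process_request(None, ("192.0.2.1", 12345))
    return list(calls)
```

trace_one_request() returns the three entries in the order setup, handle, finish, all produced from inside the handler's constructor.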

StreamRequestHandler overrides setup and finish; this is where wfile and rfile are set up.

class StreamRequestHandler(BaseRequestHandler):
    def setup(self):
        self.connection = self.request
        if self.timeout is not None:
            self.connection.settimeout(self.timeout)
        if self.disable_nagle_algorithm:
            self.connection.setsockopt(socket.IPPROTO_TCP,
                                       socket.TCP_NODELAY, True)
        self.rfile = self.connection.makefile('rb', self.rbufsize)
        self.wfile = self.connection.makefile('wb', self.wbufsize)

    def finish(self):
        if not self.wfile.closed:
            try:
                self.wfile.flush()
            except socket.error:
                # A final socket error may have occurred here, such as
                # the local error ECONNABORTED.
                pass
        self.wfile.close()
        self.rfile.close()
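
Once setup() has run, a handler can treat the connection as a pair of file objects. A minimal sketch, assuming a made-up UpperCaseHandler and a roundtrip helper, with socket.socketpair() again standing in for a real client:

```python
import socket

try:
    from SocketServer import StreamRequestHandler  # Python 2.7
except ImportError:
    from socketserver import StreamRequestHandler  # Python 3 name (assumption)


class UpperCaseHandler(StreamRequestHandler):
    def handle(self):
        # rfile and wfile were created by StreamRequestHandler.setup().
        line = self.rfile.readline()
        self.wfile.write(line.upper())


def roundtrip(payload):
    """Send one line through a handler instance and return its reply."""
    client, server_side = socket.socketpair()
    client.sendall(payload)
    # setup(), handle() and finish() all run inside the constructor.
    UpperCaseHandler(server_side, ("127.0.0.1", 0), None)
    server_side.close()
    reply = client.recv(4096)
    client.close()
    return reply
```

For example, roundtrip(b"hello\n") gives back b"HELLO\n".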

Handler

The work done in BaseRequestHandler.__init__ mainly involves the setup() and handle() methods.
setup() is inherited from StreamRequestHandler:

def setup(self):
    self.connection = self.request
    if self.timeout is not None:
        self.connection.settimeout(self.timeout)
    if self.disable_nagle_algorithm:
        self.connection.setsockopt(socket.IPPROTO_TCP,
                                   socket.TCP_NODELAY, True)
    self.rfile = self.connection.makefile('rb', self.rbufsize)
    self.wfile = self.connection.makefile('wb', self.wbufsize)

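The code of handle() is shown below. It starts with close_connection set to 1 and always handles one request. close_connection becomes 0 only when the handler has HTTP/1.1 enabled (protocol_version is "HTTP/1.1") and the incoming request either uses HTTP/1.1 itself or carries Connection: Keep-Alive; in that case handle_one_request keeps being called. On a timeout, on Connection: Close, or when the request line is empty, close_connection is set to 1.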

def handle(self):
    """Handle multiple requests if necessary."""
    self.close_connection = 1

    self.handle_one_request()
    while not self.close_connection:
        self.handle_one_request()
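
The effect of close_connection on this loop can be checked by sending two requests over one connection. The sketch below is an illustration under my own assumptions: KeepAliveHandler, Http10Handler, seen_paths, TWO_REQUESTS, and paths_handled are hypothetical names, and socket.socketpair() replaces a real client. The second request carries Connection: close so that the HTTP/1.1 handler leaves the loop instead of blocking on a third readline.

```python
import socket

try:
    from BaseHTTPServer import BaseHTTPRequestHandler  # Python 2.7
except ImportError:
    from http.server import BaseHTTPRequestHandler  # Python 3 name (assumption)


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables persistent connections
    seen_paths = []                # class attribute, shared across instances

    def do_GET(self):
        self.seen_paths.append(self.path)
        body = b"ok"
        self.send_response(200)
        # Content-Length is required so a keep-alive client can frame the reply.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


class Http10Handler(KeepAliveHandler):
    protocol_version = "HTTP/1.0"  # the library default
    seen_paths = []


TWO_REQUESTS = (
    b"GET /first HTTP/1.1\r\nHost: example\r\n\r\n"
    b"GET /second HTTP/1.1\r\nHost: example\r\nConnection: close\r\n\r\n"
)


def paths_handled(handler_class):
    """Feed both requests to one handler instance; return the paths it served."""
    del handler_class.seen_paths[:]
    client, server_side = socket.socketpair()
    client.sendall(TWO_REQUESTS)
    handler_class(server_side, ("127.0.0.1", 0), None)
    server_side.close()
    client.close()
    return list(handler_class.seen_paths)
```

With KeepAliveHandler both /first and /second are served on the same connection. With Http10Handler only /first is served, because close_connection stays at its initial value and handle() returns after a single request.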

handle_one_request

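This is where the HTTP request is actually processed.
Note that self.rfile.readline may block. readline takes a size argument that caps how many bytes are read; the request line it reads looks like GET /login_validate?username=aaa&password=bbb HTTP/1.1\r\n. If the line read is too long (more than 65536 bytes), a 414 error is returned, meaning the request URL is too long.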

def handle_one_request(self):
    """Handle a single HTTP request.

    You normally don't need to override this method; see the class
    __doc__ string for information on how to handle specific HTTP
    commands such as GET and POST.

    """
    try:
        self.raw_requestline = self.rfile.readline(65537)
        if len(self.raw_requestline) > 65536:
            self.requestline = ''
            self.request_version = ''
            self.command = ''
            self.send_error(414)
            return
        if not self.raw_requestline:
            self.close_connection = 1
            return
        if not self.parse_request():
            # An error code has been sent, just exit
            return
        mname = 'do_' + self.command
        if not hasattr(self, mname):
            self.send_error(501, "Unsupported method (%r)" % self.command)
            return
        method = getattr(self, mname)
        method()
        self.wfile.flush() #actually send the response if not already done.
    except socket.timeout, e:
        #a read or a write timed out. Discard this connection
        self.log_error("Request timed out: %r", e)
        self.close_connection = 1
        return
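
Note that the listing catches only socket.timeout. Any other socket error raised by readline, such as the Errno 104 in the traceback at the top, propagates out of handle() and out of the handler's __init__. One possible way to treat a reset as a normal end of the connection is sketched below. This is my own illustration, not something the library provides: ResetTolerantHandler, _ResettingFile, _DemoHandler, and simulate_reset are hypothetical names, and the fake file object exists only to simulate the reset.

```python
import errno
import socket

try:
    from BaseHTTPServer import BaseHTTPRequestHandler  # Python 2.7
except ImportError:
    from http.server import BaseHTTPRequestHandler  # Python 3 name (assumption)


class ResetTolerantHandler(BaseHTTPRequestHandler):
    def handle_one_request(self):
        try:
            BaseHTTPRequestHandler.handle_one_request(self)
        except socket.error as exc:
            if exc.errno != errno.ECONNRESET:
                raise  # anything else is still a real error
            # The peer is gone: end the handle() loop quietly.
            self.close_connection = 1


class _ResettingFile(object):
    """Fake file object whose reads fail like a reset connection."""
    closed = False

    def readline(self, size=-1):
        raise socket.error(errno.ECONNRESET, "Connection reset by peer")

    def flush(self):
        pass

    def close(self):
        pass


class _DemoHandler(ResetTolerantHandler):
    def setup(self):
        # Skip the real socket setup and install the failing file object.
        self.rfile = self.wfile = _ResettingFile()


def simulate_reset():
    """Return close_connection after a simulated reset; no exception escapes."""
    handler = _DemoHandler(None, ("127.0.0.1", 0), None)
    return handler.close_connection
```

Whether swallowing the reset is appropriate depends on the application; logging it before closing may be the better choice when resets are unexpected.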