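Beautiful Soup, commonly abbreviated as BS4 (the 4 is the major version number), is a third-party Python library for quickly extracting specific data from HTML or XML documents. Its syntax is simple and easy to understand, so you can pick it up quickly. This section covers the basic syntax of BS4.
Installing BS4
Because Beautiful Soup is a third-party library, it must be installed separately. Run the following command: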
pip install beautifulsoup4
BS4 relies on a document parser to parse pages, so it is also common to install lxml as the parser library:
pip install lxml
Python also ships with a built-in parser, html.parser, although it is somewhat slower than lxml. Besides these two, you can also use the html5lib parser, installed as follows:
pip install html5lib
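This parser produces HTML5-format documents, but it is slow.
"Parser fault tolerance" means that when the document being parsed contains errors or is malformed, the parser can still turn it into a correctly structured result.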
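To make fault tolerance concrete, the following minimal sketch feeds the same malformed fragment to each parser mentioned above. The fragment and the parse_with helper are illustrative assumptions, not part of the original tutorial; parsers that are not installed are skipped rather than treated as errors.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Deliberately malformed: the <p> and <b> tags are never closed.
broken_html = "<p>Hello<b>world"


def parse_with(parser_name):
    """Return the repaired markup, or None if the parser is not installed.

    Hypothetical helper written for this sketch only.
    """
    try:
        return str(BeautifulSoup(broken_html, parser_name))
    except FeatureNotFound:
        return None


results = {name: parse_with(name) for name in ("html.parser", "lxml", "html5lib")}
for name, markup in results.items():
    print(name, "->", markup if markup is not None else "(not installed)")
```

Each parser repairs the fragment in its own way, so the printed markup can differ from one parser to another. If consistent output matters, it is safer to name the parser explicitly, as the examples in this section do.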
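Creating a BS4 Parse Object
Creating a BS4 parse object is the first step, and it is very simple. The syntax is as follows:
# Import the parsing package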
from bs4 import BeautifulSoup
# Create the BeautifulSoup parse object
soup = BeautifulSoup(html_doc, 'html.parser')
In the code above, html_doc is the document to parse and html.parser is the parser to use. The parser can also be 'lxml' or 'html5lib'. A sample follows:
#coding:utf8
html_doc = """
<html><head><title>"堆代码"</title></head>
<body>
<p class="title"><b>www.duidaima.com</b></p>
<p class="website">一个学习编程的网站
<a href="http://www.duidaima.com/python/" id="link1">python教程</a>
<a href="http://www.duidaima.com/c/" id="link2">c语言教程</a>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')
# prettify() formats the HTML/XML document for output
print(soup.prettify())
Output:
<html>
 <head>
  <title>
   "堆代码"
  </title>
 </head>
 <body>
  <p class="title">
   <b>
    www.duidaima.com
   </b>
  </p>
  <p class="website">
   一个学习编程的网站
   <a href="http://www.duidaima.com/python/" id="link1">
    python教程
   </a>
   <a href="http://www.duidaima.com/c/" id="link2">
    c语言教程
   </a>
  </p>
 </body>
</html>
For an external document, you can also open and read it with open(). The syntax is as follows:
soup = BeautifulSoup(open('html_doc.html', encoding='utf8'), 'lxml')
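The one-liner above leaves the file handle open until it is garbage collected. A minimal sketch of the same idea using a with block is shown below. The temporary file and its contents are assumptions made so the example is self-contained, and html.parser is used instead of lxml so that no extra package is needed.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Write a small sample file so the sketch is self-contained (assumed content).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "html_doc.html")
with open(path, "w", encoding="utf8") as f:
    f.write("<html><head><title>demo</title></head><body><p>hi</p></body></html>")

# The with block closes the file as soon as parsing has finished.
with open(path, encoding="utf8") as f:
    soup = BeautifulSoup(f, "html.parser")

title_text = soup.title.string
print(title_text)
```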
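Common BS4 Syntax
The BS4 parsing methods most often used in web crawlers are described in detail below.
Beautiful Soup converts an HTML document into a tree structure, which makes the document quick to traverse and search. The following HTML document is described as a tree:
<html><head><title>堆代码</title></head><body><h1>www.duidaima.com</h1><p><b>一个学习编程的网站</b></p></body></html>
The tree diagram is shown below: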
html
├── head
│   └── title
│       └── 堆代码
└── body
    ├── h1
    │   └── www.duidaima.com
    └── p
        └── b
            └── 一个学习编程的网站
Figure 1: Tree structure of the HTML document
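The tree structure described above can also be printed programmatically. The sketch below walks the parsed document recursively through the children attribute. The outline helper is hypothetical, and the sample markup assumes the opening <body> tag is present.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = ("<html><head><title>堆代码</title></head>"
        "<body><h1>www.duidaima.com</h1>"
        "<p><b>一个学习编程的网站</b></p></body></html>")
soup = BeautifulSoup(html, "html.parser")


def outline(node, depth=0, lines=None):
    """Collect one indented line per tag or text node (hypothetical helper)."""
    if lines is None:
        lines = []
    for child in node.children:
        if isinstance(child, Tag):
            # Tags are listed by name, and their children are visited one level deeper.
            lines.append("  " * depth + child.name)
            outline(child, depth + 1, lines)
        else:
            # Text nodes are leaves; repr() makes them easy to tell apart from tag names.
            lines.append("  " * depth + repr(str(child)))
    return lines


tree_lines = outline(soup)
print("\n".join(tree_lines))
```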
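Every node in the document tree is a Python object. These objects fall roughly into four types: Tag, NavigableString, BeautifulSoup, and Comment. Tag and NavigableString are used most often.
Tag: the tag class. Every tag in an HTML document can be treated as a Tag object.
NavigableString: the string class, representing the text inside a tag. Use text, string, or strings to retrieve text content.
BeautifulSoup: represents the entire content of an HTML document. You can treat it as a special Tag object.
Comment: represents comments and other special strings in an HTML document. It is a special kind of NavigableString.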
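As a quick check of the four object types, the following sketch parses a tiny fragment (an assumed example, chosen to contain one text node and one comment) and inspects the class of each node.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

soup = BeautifulSoup("<p>text<!--note--></p>", "html.parser")
p_tag = soup.p
# The <p> tag has exactly two children here: a text node and a comment.
text_node, comment_node = p_tag.contents

print(type(soup).__name__)          # the document object
print(type(p_tag).__name__)         # a tag
print(type(text_node).__name__)     # text inside the tag
print(type(comment_node).__name__)  # the comment

# BeautifulSoup is a subclass of Tag, and Comment is a subclass of NavigableString.
soup_is_tag = isinstance(soup, Tag)
comment_is_string = isinstance(comment_node, NavigableString)
```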
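1) Tag nodes
Tags are the basic building blocks of an HTML document. In BS4, you can extract the content you want by tag name and tag attributes. Here is a simple example: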
from bs4 import BeautifulSoup
soup = BeautifulSoup('<p class="Web site url"><b>www.duidaima.com</b></p>', 'html.parser')
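# Get the full HTML of the p tag
print(soup.p)
# Get the b tag
print(soup.p.b)
# Get the text inside the p tag; .string, .text, or get_text() can be used
print(soup.p.text)
# Return a dictionary containing all attributes and their values
print(soup.p.attrs)
# Check the type of the returned object
print(type(soup.p))
# Get an attribute value by name; for class the return value is a list
print(soup.p['class'])
# Reassign the class attribute; the list is rendered as a space-separated string in the output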
soup.p['class']=['Web','Site']
print(soup.p)
The output is as follows:
Output of soup.p:
<p class="Web site url"><b>www.duidaima.com</b></p>
Output of soup.p.b:
<b>www.duidaima.com</b>
Output of soup.p.text:
www.duidaima.com
Output of soup.p.attrs:
{'class': ['Web', 'site', 'url']}
Output of type(soup.p):
<class 'bs4.element.Tag'>
Output of soup.p['class']:
['Web', 'site', 'url']
After reassigning the class attribute:
<p class="Web Site"><b>www.duidaima.com</b></p>
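A few more Tag accessors are often useful alongside the ones above. The sketch below is illustrative only: the id attribute is an assumption added to the sample markup to contrast a single-valued attribute with the multi-valued class attribute.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<p class="Web site url" id="intro"><b>www.duidaima.com</b></p>',
    "html.parser",
)
p = soup.p

tag_name = p.name          # the tag's name as a string
p_id = p["id"]             # single-valued attribute: returned as a string
p_classes = p["class"]     # multi-valued attribute: returned as a list
missing = p.get("title")   # get() returns None instead of raising KeyError
has_id = p.has_attr("id")  # True when the attribute is present

print(tag_name, p_id, p_classes, missing, has_id)
```

Indexing with p["title"] would raise KeyError for a missing attribute, so get() is the safer choice when an attribute may be absent.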
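Traversing nodes
The Tag object provides many attributes for traversing nodes: contents and children traverse child nodes, parent and parents traverse ancestor nodes, and next_sibling and previous_sibling traverse sibling nodes. An example follows: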
#coding:utf8
from bs4 import BeautifulSoup
html_doc = """
<html><head><title>"堆代码"</title></head>
<body>
<p class="title"><b>www.duidaima.com</b></p>
<p class="website">一个学习编程的网站</p>
<a href="http://www.duidaima.com/python/" id="link1">python教程</a>,
<a href="http://www.duidaima.com/c/" id="link2">c语言教程</a> and
"""
soup = BeautifulSoup(html_doc, 'html.parser')
body_tag=soup.body
print(body_tag)
# Output all child nodes as a list
print(body_tag.contents)
Output:
<body>
<p class="title"><b>www.duidaima.com</b></p>
<p class="website">一个学习编程的网站</p>
<a href="http://www.duidaima.com/python/" id="link1">python教程</a>,
<a href="http://www.duidaima.com/c/" id="link2">c语言教程</a> and
</body>
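# Output as a list
['\n', <p class="title"><b>www.duidaima.com</b></p>, '\n', <p class="website">一个学习编程的网站</p>, '\n', <a href="http://www.duidaima.com/python/" id="link1">python教程</a>, ',\n', <a href="http://www.duidaima.com/c/" id="link2">c语言教程</a>, ' and\n']
The children attribute of a Tag produces an iterable that can be used to traverse the child nodes, for example:
for child in body_tag.children:
    print(child)
Output:
# Note: the newline-only strings "\n" and blank lines are omitted here
<p class="title"><b>www.duidaima.com</b></p>
<p class="website">一个学习编程的网站</p>
<a href="http://www.duidaima.com/python/" id="link1">python教程</a>
,
<a href="http://www.duidaima.com/c/" id="link2">c语言教程</a>
 and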
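The examples above cover contents and children only. The following is a minimal sketch of the remaining traversal attributes (parent, parents, next_sibling, previous_sibling). It assumes a compact hypothetical document with no whitespace between tags, so that siblings are tags rather than "\n" strings.

```python
from bs4 import BeautifulSoup

# Hypothetical compact document: no newlines between tags
html_doc = ('<html><head><title>demo</title></head><body>'
            '<p class="title"><b>www.duidaima.com</b></p>'
            '<p class="website">text</p>'
            '</body></html>')
soup = BeautifulSoup(html_doc, 'html.parser')

b_tag = soup.b
# parent is the direct parent; parents walks up to the document root
parent_name = b_tag.parent.name
ancestor_names = [tag.name for tag in b_tag.parents]

first_p = soup.p
# Siblings share the same parent
next_p = first_p.next_sibling
back_again = next_p.previous_sibling

print(parent_name)
print(ancestor_names)
print(next_p)
print(first_p.previous_sibling)  # None: first_p is the first child of body
```

When the markup contains line breaks between tags, as in the tutorial's html_doc, next_sibling often returns a whitespace string first, so calling it twice or using find_next_sibling() may be needed.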
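find_all() and find()
find_all() and find() are the most commonly used methods for parsing HTML documents. They search the document for content that meets given conditions, which act as filters. The two methods share a similar syntax, so what you learn about one carries over to the other.
The BS4 library defines many search methods. find() and find_all() are the two most important; the remaining methods take similar parameters and are used in much the same way.
1) find_all()
The find_all() method searches all descendants of the current tag, checks whether each node meets the filter conditions, and returns the matching content as a list. The syntax is as follows:
find_all( name , attrs , recursive , text , limit )
Parameters:
name: finds all tags whose name is name; string objects are automatically ignored.
attrs: searches for tags by attribute name and value. Because class is a reserved keyword in Python, use "class_" when filtering by class as a keyword argument.
recursive: by default find_all() searches all descendants of the tag; set recursive=False to search only its direct children.
text: searches the string content of the document. This parameter accepts a string, a regular expression, a list, or True. Since Beautiful Soup 4.4.0 the parameter is also available under the name string.
limit: find_all() returns every match by default, which can hurt performance on large documents; the limit parameter caps the number of results returned.
An example of find_all():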
from bs4 import BeautifulSoup
import re
html_doc = """
<html><head><title>"堆代码"</title></head>
<body>
<p class="title"><b>www.duidaima.com</b></p>
<p class="website">一个学习编程的网站</p>
<a href="http://www.duidaima.com/python/" id="link1">python教程</a>
<a href="http://www.duidaima.com/c/" id="link2">c语言教程</a>
<a href="http://www.duidaima.com/django/" id="link3">django教程</a>
<p class="vip">加入我们阅读所有教程</p>
<a href="http://vip.biancheng.net/?from=index" id="link4">成为vip</a>
"""
# Create the soup object
soup = BeautifulSoup(html_doc, 'html.parser')
# Find and return all a tags
print(soup.find_all("a"))
# Find and return only the first two a tags
print(soup.find_all("a",limit=2))
The results are returned as lists, as shown below:
[<a href="http://www.duidaima.com/python/" id="link1">python教程</a>, <a href="http://www.duidaima.com/c/" id="link2">c语言教程</a>, <a href="http://www.duidaima.com/django/" id="link3">django教程</a>, <a href="http://vip.biancheng.net/?from=index" id="link4">成为vip</a>]
[<a href="http://www.duidaima.com/python/" id="link1">python教程</a>, <a href="http://www.duidaima.com/c/" id="link2">c语言教程</a>]
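The recursive and text parameters of find_all() are not exercised by the example above. The sketch below illustrates both on a shortened hypothetical document. It uses the keyword string, which is the name the text parameter has carried since Beautiful Soup 4.4.0; on older versions text would be the assumption to make instead.

```python
from bs4 import BeautifulSoup
import re

# Shortened hypothetical document based on the tutorial's sample
html_doc = """<html><head><title>demo</title></head>
<body>
<p class="website">一个学习编程的网站</p>
<a href="http://www.duidaima.com/python/" id="link1">python教程</a>
<a href="http://www.duidaima.com/c/" id="link2">c语言教程</a>
</body></html>"""
soup = BeautifulSoup(html_doc, 'html.parser')

# title is a grandchild of html, so a non-recursive search finds nothing
direct_titles = soup.html.find_all('title', recursive=False)
all_titles = soup.html.find_all('title')

# The a tags are direct children of body, so they are still found
direct_links = soup.body.find_all('a', recursive=False)

# Searching by string returns the matching strings, not the tags
matched_strings = soup.find_all(string=re.compile('教程'))

print(direct_titles)
print(all_titles)
print(len(direct_links))
print(matched_strings)
```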
Searching the HTML document by tag attribute and attribute value:
print(soup.find_all("p",class_="website"))
print(soup.find_all(id="link4"))
Output:
[<p class="website">一个学习编程的网站</p>]
[<a href="http://vip.biancheng.net/?from=index" id="link4">成为vip</a>]
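Regular expressions, lists, and True can also be used as filters, for example:
# Find tags by a list of names
print(soup.find_all(['b','a']))
# Match the id attribute value with a regular expression
print(soup.find_all('a',id=re.compile(r'.\d')))
# True matches any value; here it returns every tag that has an id attribute
print(soup.find_all(id=True))
# The code below finds all tags and prints each tag name
for tag in soup.find_all(True):
    print(tag.name,end=" ")
# Print all tags whose name starts with b
for tag in soup.find_all(re.compile("^b")):
    print(tag.name)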
The output is as follows:
Output of the first print:
[<b>www.duidaima.com</b>, <a href="http://www.duidaima.com/python/" id="link1">python教程</a>, <a href="http://www.duidaima.com/c/" id="link2">c语言教程</a>, <a href="http://www.duidaima.com/django/" id="link3">django教程</a>, <a href="http://vip.biancheng.net/?from=index" id="link4">成为vip</a>]
Output of the second print:
[<a href="http://www.duidaima.com/python/" id="link1">python教程</a>, <a href="http://www.duidaima.com/c/" id="link2">c语言教程</a>, <a href="http://www.duidaima.com/django/" id="link3">django教程</a>, <a href="http://vip.biancheng.net/?from=index" id="link4">成为vip</a>]
Output of the third print:
[<a href="http://www.duidaima.com/python/" id="link1">python教程</a>, <a href="http://www.duidaima.com/c/" id="link2">c语言教程</a>, <a href="http://www.duidaima.com/django/" id="link3">django教程</a>, <a href="http://vip.biancheng.net/?from=index" id="link4">成为vip</a>]
Output of the first loop:
html head title body p b p a a a p a
Output of the second loop:
body
b
To keep code short, BS4 provides a shorthand for find_all(), as shown below:
# Full form
soup.find_all("a")
# Shorthand
soup("a")
Both forms produce the same result.
2) find()
The find() method is similar to find_all(). The difference is that find_all() returns every match in the document, while find() returns only the first match, so find() has no limit parameter. For example:
from bs4 import BeautifulSoup
import re
html_doc = """
<html><head><title>"堆代码"</title></head>
<body>
<p class="title"><b>www.duidaima.com</b></p>
<p class="website">一个学习编程的网站</p>
<a href="http://www.duidaima.com/python/" id="link1">python教程</a>
<a href="http://www.duidaima.com/c/" id="link2">c语言教程</a>
<a href="http://www.duidaima.com/django/" id="link3">django教程</a>
<p class="vip">加入我们阅读所有教程</p>
<a href="http://vip.biancheng.net/?from=index" id="link4">成为vip</a>
"""
# Create the soup object
soup = BeautifulSoup(html_doc, 'html.parser')
# Find the first a tag and return it directly
print(soup.find('a'))
# Find the title tag
print(soup.find('title'))
# Match the a tag with the specified href attribute
print(soup.find('a',href='http://www.duidaima.com/python/'))
# Match an attribute value with a regular expression
print(soup.find(class_=re.compile('tit')))
# Use the attrs parameter
print(soup.find(attrs={'class':'vip'}))
The output is as follows:
a tag:
<a href="http://www.duidaima.com/python/" id="link1">python教程</a>
title:
<title>"堆代码"</title>
a tag with the specified href attribute:
<a href="http://www.duidaima.com/python/" id="link1">python教程</a>
Regular expression match:
<p class="title"><b>www.duidaima.com</b></p>
attrs parameter:
<p class="vip">加入我们阅读所有教程</p>
When find() does not find a matching tag it returns None, whereas find_all() returns an empty list. For example:
print(soup.find('bdi'))
print(soup.find_all('audio'))
The output is as follows:
None
[]
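Because find() can return None, accessing an attribute on its result without a check may raise an error. The following is a small sketch of one way to guard against that; the helper name first_link_href and both sample fragments are hypothetical and serve only as illustration.

```python
from bs4 import BeautifulSoup

def first_link_href(doc):
    """Return the href of the first a tag in doc, or None if there is no a tag."""
    link = doc.find('a')
    if link is None:
        # Nothing matched, so there is no attribute to read
        return None
    # get() also returns None if the a tag has no href attribute
    return link.get('href')

# Hypothetical fragments: one without links, one with a link
soup_without_link = BeautifulSoup('<p class="title"><b>www.duidaima.com</b></p>', 'html.parser')
soup_with_link = BeautifulSoup('<a href="http://www.duidaima.com/python/" id="link1">python教程</a>', 'html.parser')

no_href = first_link_href(soup_without_link)
some_href = first_link_href(soup_with_link)
print(no_href)
print(some_href)
```

With find_all() the equivalent guard is usually unnecessary, since iterating over an empty list simply does nothing.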
BS4 also provides a shorthand for find(), as shown below:
# Shorthand
print(soup.head.title)
# The code above is equivalent to
print(soup.find("head").find("title"))
Both forms produce the same output, as shown below:
<title>"堆代码"</title>
<title>"堆代码"</title>
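CSS selectors
BS4 supports most CSS selectors, including the common tag selectors, class selectors, id selectors, and hierarchy selectors. Beautiful Soup provides a select() method; pass a selector to it to find the matching content in the HTML document. For example: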
#coding:utf8
html_doc = """
<html><head><title>"堆代码"</title></head>
<body>
<p class="title"><b>www.duidaima.com</b></p>
<p class="website">一个学习编程的网站</p>
<a href="http://www.duidaima.com/python/" id="link1">python教程</a>
<a href="http://www.duidaima.com/c/" id="link2">c语言教程</a>
<a href="http://www.duidaima.com/django/" id="link3">django教程</a>
<p class="vip">加入我们阅读所有教程</p>
<a href="http://vip.biancheng.net/?from=index" id="link4">成为vip</a>
<p class="introduce">介绍:
<a href="http://www.duidaima.com/view/8066.html" id="link5">关于网站</a>
<a href="http://www.duidaima.com/view/8092.html" id="link6">关于站长</a>
</p>
"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')
# Find by tag name
print(soup.select('title'))
# Find by attribute selector
print(soup.select('a[href]'))
# Find by class
print(soup.select('.vip'))
# Find descendant nodes
print(soup.select('html head title'))
# Find adjacent sibling nodes
print(soup.select('p + a'))
# Select a following sibling of a p tag by id
print(soup.select('p ~ #link3'))
# The nth-of-type(n) selector matches the nth sibling element of the same type
print(soup.select('p ~ a:nth-of-type(1)'))
# Find child nodes
print(soup.select('p > a'))
print(soup.select('.introduce > #link5'))
Output:
First output:
[<title>"堆代码"</title>]
Second output:
[<a href="http://www.duidaima.com/python/" id="link1">python教程</a>, <a href="http://www.duidaima.com/c/" id="link2">c语言教程</a>, <a href="http://www.duidaima.com/django/" id="link3">django教程</a>, <a href="http://vip.biancheng.net/?from=index" id="link4">成为vip</a>, <a href="http://www.duidaima.com/view/8066.html" id="link5">关于网站</a>, <a href="http://www.duidaima.com/view/8092.html" id="link6">关于站长</a>]
Third output:
[<p class="vip">加入我们阅读所有教程</p>]
Fourth output:
[<title>"堆代码"</title>]
Fifth output:
[<a href="http://www.duidaima.com/python/" id="link1">python教程</a>, <a href="http://vip.biancheng.net/?from=index" id="link4">成为vip</a>]
Sixth output:
[<a href="http://www.duidaima.com/django/" id="link3">django教程</a>]
Seventh output:
[<a href="http://www.duidaima.com/python/" id="link1">python教程</a>]
Eighth output:
[<a href="http://www.duidaima.com/view/8066.html" id="link5">关于网站</a>, <a href="http://www.duidaima.com/view/8092.html" id="link6">关于站长</a>]
Output of the last print:
[<a href="http://www.duidaima.com/view/8066.html" id="link5">关于网站</a>]
[<a href="http://www.duidaima.com/view/8066.html" id="link5">关于网站</a>]
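select() always returns a list. As a hedged complement, the sketch below uses select_one(), which returns only the first match or None, and pulls attribute values out of a select() result. The shortened document and variable names are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Shortened hypothetical document based on the tutorial's sample
html_doc = """
<p class="introduce">介绍:
<a href="http://www.duidaima.com/view/8066.html" id="link5">关于网站</a>
<a href="http://www.duidaima.com/view/8092.html" id="link6">关于站长</a>
</p>
"""
soup = BeautifulSoup(html_doc, 'html.parser')

# select_one() returns the first matching tag instead of a list
first_link = soup.select_one('.introduce > a')
# When nothing matches, select_one() returns None
missing = soup.select_one('#no-such-id')

# Collect the href of every direct a child of the .introduce paragraph
hrefs = [a['href'] for a in soup.select('.introduce > a')]

print(first_link)
print(missing)
print(hrefs)
```

The relationship mirrors find() and find_all(): select_one() suits the case where a single element is expected, and its result should be checked for None before use.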
For more on using the BS4 library, see the official documentation: https://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/#