This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible.

If you're interested in financially supporting Kenneth Reitz's open source work, consider visiting this link. Your support helps tremendously with sustainability of motivation, as Open Source is no longer part of my day job.

When using this library you automatically get:

- Full JavaScript support!
- CSS Selectors (a.k.a. jQuery-style, thanks to PyQuery).
- XPath Selectors, for the faint of heart.
- Mocked user-agent (like a real web browser).
- Automatic following of redirects.
- Connection-pooling and cookie persistence.
- The Requests experience you know and love, with magical parsing abilities.
- Async Support

Tutorial & Usage

Make a GET request to 'python.org', using Requests:
Try async and get some sites at the same time:
Grab a list of all links on the page, as-is (anchors excluded):
Grab a list of all links on the page, in absolute form (anchors excluded):
Select an element with a CSS Selector:
Grab an element’s text contents:
Introspect an Element’s attributes:
Render out an Element’s HTML:
Select Elements within Elements:
Search for links within an element:
Search for text on the page:
More complex CSS Selector example (copied from Chrome dev tools):
XPath is also supported:
JavaScript Support

Let's grab some text that's rendered by JavaScript:
Or you can do this async also:
Note, the first time you ever run the render() method, it will download Chromium into your home directory (e.g. ~/.pyppeteer/). This only happens once.

Using without Requests

You can also use this library without Requests:
Installation
Only Python 3.6 is supported.
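The package is on PyPI under the name requests-html, so installation is a one-liner:

```shell
pip install requests-html
```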