Basic Usage¶
To use dude, start by importing the library.
from dude import select
A basic handler function consists of the structure below.
A handler function should accept 1 argument (element) and should be decorated with @select().
The handler should return a dictionary.
The numbered comments in the code are explained in the list below it.
@select(css="<put-your-selector-here>") # (1)
def handler(element):  # (2)
    ...  # (3)
    return {"<key>": "<value-extracted-from-element>"}  # (4)
- The @select() decorator.
- The function should accept 1 parameter, the element object found in the page being scraped.
- You can specify your Python algorithm here.
- Return a dictionary. This can contain an arbitrary number of key-value pairs.
The example handler below extracts the text content of any element that matches the CSS selector .title.
from dude import select
@select(css=".title")
def result_title(element):
    """
    Result title.
    """
    return {"title": element.text_content()}
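Because a handler only relies on the element it receives, its extraction logic can be tried out without opening a page. The sketch below is only an illustration under assumptions: FakeElement is a hypothetical stand-in that mimics nothing but text_content(), and the function is left undecorated so that the snippet runs without dude installed.

```python
class FakeElement:
    """Hypothetical stand-in for a scraped element (not part of dude)."""

    def __init__(self, text):
        self._text = text

    def text_content(self):
        # Mimics the only method the handler below relies on.
        return self._text


def result_title(element):
    """Same body as the handler above, without the @select() decorator."""
    return {"title": element.text_content()}


# Call the handler directly with the stand-in element.
print(result_title(FakeElement("Hello, dude!")))
```

Keeping the handler body this small makes it easy to check the returned dictionary before running a real scrape.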
It is possible to attach a single handler to multiple selectors.
from dude import select
@select(css="<a-selector>")
@select(selector="<another-selector>")
def handler(element):
    return {"<key>": "<value-extracted-from-element>"}
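Stacking works in general whenever a decorator records the function and then returns it unchanged. The sketch below illustrates that pattern with a hypothetical register() decorator and REGISTRY dictionary; it is not dude's implementation of @select(), only a plain-Python model of the idea.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping each selector string to its handlers.
REGISTRY: Dict[str, List[Callable]] = {}


def register(selector: str) -> Callable[[Callable], Callable]:
    """Illustrative decorator factory; not dude's @select()."""

    def decorator(func: Callable) -> Callable:
        REGISTRY.setdefault(selector, []).append(func)
        # Returning the function unchanged is what allows stacking.
        return func

    return decorator


@register(".title")
@register("h1")
def handler(element):
    return {"title": element}


# Both selectors now point at the same function object.
print(sorted(REGISTRY))
```

Decorators are applied from the bottom up, so in this model "h1" is registered first and ".title" second.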
Supported selector types¶
The @select() decorator accepts not only selector but also css, xpath, text and regex.
Please note that css, xpath, text and regex are specific, while selector can contain any of these types.
from dude import select
@select(css="<css-selector>") #(1)
@select(xpath="<xpath-selector>") #(2)
@select(text="<text-selector>") #(3)
@select(regex="<regex-selector>") #(4)
def handler(element):
    return {"<key>": "<value-extracted-from-element>"}
- CSS Selector
- XPath Selector
- Text Selector
- Regular Expression Selector
It is possible to use 2 or more of these types at the same time, but only one will be used, following the precedence selector -> css -> xpath -> text -> regex.
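The precedence rule can be modelled in a few lines. The helper below is hypothetical (pick_selector and PRECEDENCE are names invented for this sketch, not part of dude); it simply returns the first non-empty argument in the documented order.

```python
from typing import Optional, Tuple

# Order taken from the text above; the helper itself is hypothetical.
PRECEDENCE = ("selector", "css", "xpath", "text", "regex")


def pick_selector(**kwargs: Optional[str]) -> Tuple[str, str]:
    """Return the (type, value) pair that wins under the documented order."""
    for name in PRECEDENCE:
        value = kwargs.get(name)
        if value:
            return name, value
    raise ValueError("no selector given")


# css comes before xpath in the order, so css wins here.
print(pick_selector(xpath="//h1", css=".title"))
```

In practice it is usually clearer to pass a single selector type per decorator, so that the choice does not depend on this ordering.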
How to run the scraper¶
To start scraping, use any of the following options. The numbered comments are explained below each example.
dude scrape --url "<url>" --output data.json path/to/script.py #(1)
- You can run your scraper from the terminal/shell/command-line by supplying URLs, the output filename of your choice and the paths to your Python scripts to the dude scrape command.
if __name__ == "__main__":
    import dude
    dude.run(urls=["https://dude.ron.sh/"])  #(1)
- You can also use the dude.run() function and run python path/to/script.py from the terminal.
Examples¶
Check out the example in examples/flat.py and run it in your terminal using the command python examples/flat.py.