Basic Usage¶

To use dude, start by importing the library.

Python

from dude import select

A basic handler function consists of the structure below. A handler function should accept 1 argument (element) and should be decorated with @select(). The handler should return a dictionary. Click on the annotations (+ sign) for more details.

Python

@select(css="<put-your-selector-here>") # (1)
def handler(element): # (2)
    ... # (3)
    return {"<key>": "<value-extracted-from-element>"} # (4)

@select() decorator.
Function should accept 1 parameter, the element object found in the page being scraped.
You can specify your Python algorithm here.
Return a dictionary. This can contain an arbitrary amount of key-value pairs.

The example handler below extracts the text content of any element that matches the CSS selector .title.

Python

from dude import select


@select(css=".title")
def result_title(element):
    """
    Result title.
    """
    return {"title": element.text_content()}

It is possible to attach a single handler to multiple selectors.

Python

from dude import select


@select(css="<a-selector>")
@select(selector="<another-selector>")
def handler(element):
    return {"<key>": "<value-extracted-from-element>"}

Supported selector types¶

The @select() decorator does not only accept selector but also css, xpath, text and regex. Please take note that css, xpath, text and regex are specific and selector can contain any of these types.

Python

from dude import select


@select(css="<css-selector>")     #(1)
@select(xpath="<xpath-selector>") #(2)
@select(text="<text-selector>")  #(3)
@select(regex="<regex-selector>") #(4)
def handler(element):
    return {"<key>": "<value-extracted-from-element>"}

CSS Selector
XPath Selector
Text Selector
Regular Expression Selector

It is possible to use 2 or more of these types at the same time but only one will be used taking the precedence selector -> css -> xpath -> text -> regex.

How to run the scraper¶

To start scraping, use any of the following options. Click on the annotations (+ sign) for more details.

TerminalPython

dude scrape --url "<url>" --output data.json path/to/script.py #(1)

You can run your scraper from terminal/shell/command-line by supplying URLs, the output filename of your choice and the paths to your python scripts to dude scrape command.

if __name__ == "__main__":
    import dude

    dude.run(urls=["https://dude.ron.sh/"]) #(1)

You can also use dude.run() function and run python path/to/script.py from terminal.

Examples¶

Check out the example in examples/flat.py and run it on your terminal using the command python examples/flat.py.