Selenium Scraper

The option to use Selenium as the parser backend instead of Playwright was added in release 0.9.0. Selenium is an optional dependency and can only be installed via pip using the command below.

Terminal

pip install "pydude[selenium]"

Required changes to your script in order to use Selenium

With Selenium as the parser backend, the decorated functions receive Selenium WebElement objects instead of the ElementHandle objects that the Playwright backend passes.

Python

from dude import select


@select(css="a.url")
def result_url(element, page):
    return {"url": element.get_attribute("href")}


@select(css=".title")
def result_title(element, page):
    return {"title": element.text}
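
The practical difference shows up when reading element text: Selenium's WebElement exposes it through the text property, while Playwright's synchronous ElementHandle uses the text_content() method. A minimal sketch of a helper that accepts either object is shown below. The name element_text and the two stand-in classes are hypothetical and are not part of Dude; the stand-ins only exist so the snippet runs without a browser.

```python
def element_text(element):
    """Return the text of an element from either parser backend."""
    text_content = getattr(element, "text_content", None)
    if callable(text_content):
        # Playwright's sync ElementHandle exposes text through a method.
        return text_content()
    # Selenium's WebElement exposes text through the `text` property.
    return element.text


class FakeWebElement:
    """Stand-in for a Selenium WebElement (illustration only)."""

    text = "Example title"


class FakeElementHandle:
    """Stand-in for a Playwright ElementHandle (illustration only)."""

    def text_content(self):
        return "Example title"


print(element_text(FakeWebElement()))
print(element_text(FakeElementHandle()))
```

A decorated function could then return {"title": element_text(element)} under either backend. This sketch does not cover Playwright's async API, where text_content() returns a coroutine.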

Running Dude with Selenium

You can run the Selenium parser backend by passing the --selenium command-line argument or the parser="selenium" parameter to run().

Terminal

dude scrape --url "<url>" --selenium --output data.json path/to/script.py

Python

if __name__ == "__main__":
    import dude

    dude.run(urls=["https://dude.ron.sh/"], parser="selenium", output="data.json")

Limitations

Selenium does not support XPath 2.0, so XPath selectors cannot use regular expressions (for example, the XPath 2.0 matches() function).
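
One possible workaround is to select elements with a plain CSS or XPath 1.0 selector and apply the regular expression in Python afterwards. The sketch below assumes a price-like pattern purely for illustration. The names PRICE_PATTERN, first_match, result_price, and FakeElement are hypothetical, and in a real script result_price would carry a @select(...) decorator.

```python
import re

# Hypothetical pattern: a dollar amount such as $19.99 or $5.
PRICE_PATTERN = re.compile(r"\$\d+(?:\.\d{2})?")


def first_match(text, pattern=PRICE_PATTERN):
    """Return the first substring of text matching pattern, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


def result_price(element):
    # In a real script this function would be decorated with @select(...).
    # The regular expression runs in Python, not inside the selector.
    return {"price": first_match(element.text)}


class FakeElement:
    """Stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self, text):
        self.text = text


print(result_price(FakeElement("Now only $19.99!")))
```

This keeps the selector within what Selenium supports and moves the pattern matching into the handler body.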

Examples

Examples can be found at examples/selenium_sync.py and examples/selenium_async.py.