Thursday, January 18, 2024

[SOLVED] Resolving Selenium Driver issue on AWS EC2 Linux RedHat

January 18, 2024 amazon-ec2, python, selenium-chromedriver, selenium-webdriver

Issue

I have a peculiar issue while trying to set up Selenium WebDriver on an AWS EC2 instance running Red Hat. I'm using Google Chrome version 120.0.6099.109 on this instance.

I have been trying to install the appropriate ChromeDriver for this Chrome version, but I've encountered an intermittent problem. My Selenium Python script sometimes runs successfully, and other times, it reports that "chrome has crashed." I have tried multiple versions of ChromeDriver without consistent success.

Here's the version of Google Chrome I have:

$ google-chrome --version
Google Chrome 120.0.6099.109

I've attempted to download the corresponding ChromeDriver version (e.g., 120.0.6099.71), but the issue persists. I've also tried various versions of ChromeDriver with no consistent results.
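For reference, the ChromeDriver version-selection guidance says each ChromeDriver release supports Chrome versions with the same major, minor and build numbers. By that rule, ChromeDriver 120.0.6099.71 should already match Chrome 120.0.6099.109, because both start with 120.0.6099. The sketch below checks that rule. It assumes both binaries are on the PATH as google-chrome and chromedriver, and parse_version, builds_match and installed_versions are hypothetical helper names, not part of Selenium.

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the first dotted version number (e.g. 120.0.6099.109) from --version output."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def builds_match(chrome_text: str, driver_text: str) -> bool:
    """True if Chrome and ChromeDriver share major.minor.build (the first three components)."""
    chrome = parse_version(chrome_text)
    driver = parse_version(driver_text)
    if chrome is None or driver is None:
        return False
    return chrome[:3] == driver[:3]


def installed_versions(chrome_cmd: str = "google-chrome",
                       driver_cmd: str = "chromedriver") -> Tuple[str, str]:
    """Run `<cmd> --version` for both binaries and return their raw output.

    Raises FileNotFoundError if a binary is not on the PATH.
    """
    def run(cmd: str) -> str:
        result = subprocess.run([cmd, "--version"], capture_output=True, text=True, check=True)
        return result.stdout.strip()

    return run(chrome_cmd), run(driver_cmd)
```

On the instance, builds_match(*installed_versions()) compares whatever Chrome and ChromeDriver the PATH actually resolves to, which also catches the case where an older chromedriver earlier on the PATH shadows the one you downloaded.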

Below is a snippet of the Python script using Selenium:

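from selenium import webdriver

# Directory where downloaded files are saved (placeholder path; replace with your own)
file_path_descargas_guias = "/home/ec2-user/descargas_guias"

chrome_options = webdriver.ChromeOptions()
prefs = {
    "download.prompt_for_download": False,       # save downloads without asking
    "plugins.always_open_pdf_externally": True,  # download PDFs instead of opening the built-in viewer
    "download.open_pdf_in_system_reader": False,
    "profile.default_content_settings.popups": 0,
    "download.default_directory": file_path_descargas_guias,
}
chrome_options.add_experimental_option("prefs", prefs)

# Flags commonly used on a display-less Linux server
chrome_options.add_argument("--headless")                # run without a visible window
chrome_options.add_argument("--no-sandbox")              # Chrome's sandbox often fails when running as root
chrome_options.add_argument("--disable-dev-shm-usage")   # use /tmp instead of a small /dev/shm

driver = webdriver.Chrome(options=chrome_options)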

The odd thing is that sometimes it does work, but most of the time it does not: the cell in JupyterLab keeps "running" until I get this error:

---------------------------------------------------------------------------
SessionNotCreatedException                Traceback (most recent call last)
Cell In[11], line 16
     13 chrome_options.add_argument('--no-sandbox')
     14 chrome_options.add_argument("--disable-dev-shm-usage")
---> 16 driver = webdriver.Chrome(options=chrome_options)

File ~/anaconda3/lib/python3.11/site-packages/selenium/webdriver/chrome/webdriver.py:45, in WebDriver.__init__(self, options, service, keep_alive)
     42 service = service if service else Service()
     43 options = options if options else Options()
---> 45 super().__init__(
     46     browser_name=DesiredCapabilities.CHROME["browserName"],
     47     vendor_prefix="goog",
     48     options=options,
     49     service=service,
     50     keep_alive=keep_alive,
     51 )

File ~/anaconda3/lib/python3.11/site-packages/selenium/webdriver/chromium/webdriver.py:61, in ChromiumDriver.__init__(self, browser_name, vendor_prefix, options, service, keep_alive)
     52 executor = ChromiumRemoteConnection(
     53     remote_server_addr=self.service.service_url,
     54     browser_name=browser_name,
   (...)
     57     ignore_proxy=options._ignore_local_proxy,
     58 )
     60 try:
---> 61     super().__init__(command_executor=executor, options=options)
     62 except Exception:
     63     self.quit()

File ~/anaconda3/lib/python3.11/site-packages/selenium/webdriver/remote/webdriver.py:209, in WebDriver.__init__(self, command_executor, keep_alive, file_detector, options)
    207 self._authenticator_id = None
    208 self.start_client()
--> 209 self.start_session(capabilities)

File ~/anaconda3/lib/python3.11/site-packages/selenium/webdriver/remote/webdriver.py:293, in WebDriver.start_session(self, capabilities)
    286 """Creates a new session with the desired capabilities.
    287 
    288 :Args:
    289  - capabilities - a capabilities dict to start the session with.
    290 """
    292 caps = _create_caps(capabilities)
--> 293 response = self.execute(Command.NEW_SESSION, caps)["value"]
    294 self.session_id = response.get("sessionId")
    295 self.caps = response.get("capabilities")

File ~/anaconda3/lib/python3.11/site-packages/selenium/webdriver/remote/webdriver.py:348, in WebDriver.execute(self, driver_command, params)
    346 response = self.command_executor.execute(driver_command, params)
    347 if response:
--> 348     self.error_handler.check_response(response)
    349     response["value"] = self._unwrap_value(response.get("value", None))
    350     return response

File ~/anaconda3/lib/python3.11/site-packages/selenium/webdriver/remote/errorhandler.py:229, in ErrorHandler.check_response(self, response)
    227         alert_text = value["alert"].get("text")
    228     raise exception_class(message, screen, stacktrace, alert_text)  # type: ignore[call-arg]  # mypy is not smart enough here
--> 229 raise exception_class(message, screen, stacktrace)

SessionNotCreatedException: Message: session not created: DevToolsActivePort file doesn't exist
Stacktrace:
#0 0x5635242acd33 <unknown>
#1 0x563523f69f87 <unknown>
#2 0x563523fa5e21 <unknown>
#3 0x563523fa1d9f <unknown>
#4 0x563523f9e4de <unknown>
#5 0x563523feea90 <unknown>
#6 0x563523fe30e3 <unknown>
#7 0x563523fab044 <unknown>
#8 0x563523fac44e <unknown>
#9 0x563524271861 <unknown>
#10 0x563524275785 <unknown>
#11 0x56352425f285 <unknown>
#12 0x56352427641f <unknown>
#13 0x56352424320f <unknown>
#14 0x56352429a028 <unknown>
#15 0x56352429a1f7 <unknown>
#16 0x5635242abed4 <unknown>
#17 0x7f84fd69f802 start_thread

or:

I did download and unzip the WebDriver and put it on the PATH, but I'm lost right now.

I am seeking any insights into the following:

- The recommended ChromeDriver version for Google Chrome 120.0.6099.109 on Linux/Red Hat.
- Any specific configuration or adjustments needed to run Selenium WebDriver in headless mode on an AWS EC2 instance without display hardware.
- The proper way to set the path to the WebDriver.
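On the last point, Selenium 4 lets you pass the driver location explicitly through a Service object instead of relying on the PATH. A hedged sketch follows; find_driver, make_chrome and any extra directories you pass are hypothetical names for illustration. If you pass no Service and no driver is found on the PATH, recent Selenium 4 releases (4.6 and later) fall back to Selenium Manager, which tries to fetch a matching driver on its own.

```python
import os
import shutil
from typing import Iterable, Optional


def find_driver(name: str = "chromedriver", extra_dirs: Iterable[str] = ()) -> Optional[str]:
    """Return the full path of an executable called `name`, checking $PATH first,
    then the (hypothetical) extra directories; None if nothing is found."""
    found = shutil.which(name)
    if found:
        return found
    dirs = list(extra_dirs)
    if dirs:
        return shutil.which(name, path=os.pathsep.join(dirs))
    return None


def make_chrome(driver_path: str):
    """Start headless Chrome with an explicitly chosen ChromeDriver binary (Selenium 4)."""
    # Imported here so find_driver() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    service = Service(executable_path=driver_path)
    return webdriver.Chrome(service=service, options=options)
```

For example, make_chrome(find_driver(extra_dirs=["/usr/local/bin"])) would use the first chromedriver on the PATH, or one unzipped into /usr/local/bin (an assumed location) if none is on the PATH.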

Solution

Instead of Selenium, you can try Playwright, a newer Python library that does not require a separate WebDriver binary such as ChromeDriver.

You can install it using: pip install playwright

Then download the browsers Playwright drives by running playwright install; this fetches Playwright's own builds of Chromium, WebKit, and Firefox. You can also drive an installed Google Chrome or Microsoft Edge by selecting it through the channel option.

For example:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(channel="chrome")  # uses the installed Google Chrome
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())  # title() is a method, so it must be called
    page.click("button#something")  # placeholder CSS selector; replace with one that exists on your page
    page.close()
    browser.close()

and that should work. Note that the line browser = p.chromium.launch(channel="chrome") does not launch Playwright's bundled Chromium; it launches the Google Chrome installed on the machine. p.chromium selects Playwright's Chromium engine, and because Chrome and Microsoft Edge are both built on Chromium, they are launched through it, with channel choosing which one. You can do the same with Firefox and WebKit by replacing chromium with firefox or webkit, but only pass channel when you want Chrome or Edge (channel="chrome" or channel="msedge").
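The engine/channel rule above can be written as a small lookup table. This is a sketch: BROWSERS, launch_args and get_title are hypothetical names, while p.chromium, p.firefox, p.webkit, launch(channel=...) and page.title() are Playwright's sync API.

```python
from typing import Any, Dict, Optional, Tuple

# Browser name -> (Playwright engine attribute, channel or None).
# Chrome and Edge are Chromium-based, so they go through p.chromium with a channel;
# Playwright's own Chromium, Firefox and WebKit builds need no channel.
BROWSERS: Dict[str, Tuple[str, Optional[str]]] = {
    "chromium": ("chromium", None),
    "chrome": ("chromium", "chrome"),
    "msedge": ("chromium", "msedge"),
    "firefox": ("firefox", None),
    "webkit": ("webkit", None),
}


def launch_args(name: str) -> Tuple[str, Dict[str, Any]]:
    """Return the engine attribute name and the keyword arguments for launch()."""
    engine, channel = BROWSERS[name]
    kwargs: Dict[str, Any] = {}
    if channel is not None:
        kwargs["channel"] = channel
    return engine, kwargs


def get_title(name: str, url: str = "https://example.com") -> str:
    """Open url in the chosen browser and return the page title."""
    # Imported here so launch_args() works even without Playwright installed.
    from playwright.sync_api import sync_playwright

    engine, kwargs = launch_args(name)
    with sync_playwright() as p:
        browser = getattr(p, engine).launch(**kwargs)
        try:
            page = browser.new_page()
            page.goto(url)
            return page.title()
        finally:
            browser.close()
```

With this, get_title("chrome") and get_title("firefox") differ only in the table entry, which keeps the "channel only for Chrome or Edge" rule in one place.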

Note that Playwright launches the browser in headless mode by default (it runs without a visible window), much like Selenium's --headless option. To show the window, pass headless=False when launching, e.g. p.chromium.launch(channel="chrome", headless=False).
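On a display-less EC2 instance there is no X server, so headless=False will generally fail unless you run under a virtual display such as Xvfb (for example via xvfb-run). A hedged sketch that only turns headless off when a DISPLAY variable is set; the DISPLAY check is an assumption that fits Linux/X11 and may not hold under Wayland or other setups, and want_headless and open_page are hypothetical names.

```python
import os
from typing import Mapping, Optional


def want_headless(env: Optional[Mapping[str, str]] = None) -> bool:
    """Run headless unless an X display is advertised via DISPLAY (Linux/X11 assumption)."""
    env = os.environ if env is None else env
    return not env.get("DISPLAY")


def open_page(url: str = "https://example.com") -> str:
    """Open url in the installed Chrome, headless only when no display is available."""
    # Imported here so want_headless() works even without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(channel="chrome", headless=want_headless())
        try:
            page = browser.new_page()
            page.goto(url)
            return page.title()
        finally:
            browser.close()
```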

I suggest Playwright over Selenium because Selenium is older and relies on a separately installed driver whose version has to match the browser, a common source of setup problems. Playwright is a newer, independent library (not a new version of Selenium) that manages its own browser builds, so it may well get around your crashing problem.

Playwright: https://playwright.dev/python/docs/intro

Answered By - 5rod

Answer Checked By - Katrina (WPSolving Volunteer)

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0
