I’m having problems scraping HTTPS sites with LibreOffice’s bundled Python.
I have LibreOffice 5.3.4.2 on Windows 7 and can demonstrate the problem with this simple script:

    try:
        import urllib.request
        myUrl = 'https://ask.libreoffice.org/c/english/5'
        hdr = {'User-Agent': 'Mozilla/5.0'}
        req = urllib.request.Request(url=myUrl, headers=hdr)
        response = urllib.request.urlopen(req)
    except Exception as e:
        print(e)

This fails immediately with “urlopen error unknown url type: https”.
It works fine with an http URL but fails with any https URL.
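One way to narrow this down is to check whether the interpreter can import the `ssl` module at all. In CPython, `urllib.request` only defines `HTTPSHandler` when `import ssl` succeeds, so an `ssl`/`_ssl` module that fails to load (one possible cause on Windows being an OpenSSL DLL that cannot be loaded) would give exactly this error while http URLs keep working. Below is a minimal diagnostic sketch, assuming it is run with LibreOffice’s bundled python.exe; the function name `https_support_report` is hypothetical, and the code avoids f-strings so it also runs on Python 3.3.

```python
import sys


def https_support_report():
    """Report whether this interpreter is able to open https:// URLs.

    urllib.request only defines HTTPSHandler if 'import ssl' succeeded;
    without that handler, urlopen() raises "unknown url type: https".
    """
    report = {"python": sys.version.split()[0], "executable": sys.executable}
    try:
        import ssl
    except ImportError as exc:
        # The ssl module (or its C extension _ssl) could not be loaded.
        report["ssl_importable"] = False
        report["ssl_error"] = str(exc)
    else:
        report["ssl_importable"] = True
        report["openssl_version"] = ssl.OPENSSL_VERSION
    import urllib.request
    # False here means https:// URLs cannot be opened by urlopen().
    report["has_https_handler"] = hasattr(urllib.request, "HTTPSHandler")
    return report


if __name__ == "__main__":
    for key, value in sorted(https_support_report().items()):
        print("{0}: {1}".format(key, value))
```

If `ssl_importable` is False, the accompanying `ssl_error` message usually says which module or DLL failed to load, which is a more useful starting point than the generic urlopen error.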
I tried the above as an embedded script in a LibreOffice Calc document, and it failed. It also failed when I ran it from a terminal window using LibreOffice’s own interpreter, C:\Program Files (x86)\LibreOffice 5\program\python-core-3.3.0\bin\python.exe.
The script works fine with my standalone Python 3.3.2 running from a terminal window.
I’ve also tried various LibreOffice Portable installations I have:
- 4.0.2.2: works
- 5.3.1.2: fails
- 5.3.2.2: fails
I’ve uninstalled and reinstalled 5.3.4.2 more times than I can count and cannot get it to work. Yet when I install it on Windows 10 in a virtual machine on the same PC, it works fine.
Any idea what is going on?
=================================
Update, 19 July 2017:
I tried Safe Mode in LibreOffice 5 and the script works fine. When I went back to normal mode, it failed again. I uninstalled LibreOffice 5.3.4.2, deleted everything I could find relating to LibreOffice, and reinstalled 5.3.4.2 (x86). The behaviour is unchanged: it works in Safe Mode and fails in normal mode.
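Since Safe Mode starts LibreOffice with a temporary fresh user profile, comparing what the embedded interpreter sees in each mode might show what differs. The sketch below is a hypothetical diagnostic macro, not a known fix: it records `sys.path`, where `ssl` and `_ssl` are loaded from (or why they fail), and the `PATH` entries, which Windows also searches when loading DLLs. It assumes the macro is embedded in a document and run the same way as the test script in both modes; the function names and the output file name pattern are illustrative choices.

```python
import os
import sys
import time


def collect_python_environment():
    """Gather interpreter details that might differ between Safe Mode and normal mode."""
    lines = ["executable: {0}".format(sys.executable), "version: {0}".format(sys.version)]
    lines.append("sys.path:")
    lines.extend("  {0}".format(entry) for entry in sys.path)
    # Record where ssl and its C extension _ssl are loaded from, or why they fail.
    for name in ("ssl", "_ssl"):
        try:
            module = __import__(name)
        except ImportError as exc:
            lines.append("{0} failed to import: {1}".format(name, exc))
        else:
            location = getattr(module, "__file__", "(built-in)")
            lines.append("{0} loaded from: {1}".format(name, location))
    # Windows also searches PATH when loading DLLs, so list its entries one per line.
    lines.append("PATH:")
    lines.extend("  {0}".format(entry) for entry in os.environ.get("PATH", "").split(os.pathsep))
    return lines


def dump_python_environment(*args):
    """Macro entry point: write the report to a timestamped file in the home folder.

    The file name pattern is a hypothetical choice; run once per mode and compare.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = os.path.join(os.path.expanduser("~"), "lo_python_env_{0}.txt".format(stamp))
    with open(target, "w") as handle:
        handle.write("\n".join(collect_python_environment()) + "\n")


# Only the functions listed here appear in LibreOffice's macro selector.
g_exportedScripts = (dump_python_environment,)
```

Comparing the two resulting files (for example with `fc` in a Windows command prompt) would show whether `sys.path`, the location of `_ssl`, or `PATH` differs between the two modes.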