We use the command-line converter to extract text from PDFs. This usually works well, but fairly regularly the output file, despite its .txt extension, contains what looks like garbage and turns out to be a zip archive. Change the extension to “zip” and you can open it and see, among other things, the content in XML. I just tried exporting one of these files to HTML instead of TXT, and that worked as expected. Any idea what’s going on here? This is a recurring issue for us.
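
In case it helps with reproducing this, here is a minimal Python sketch of how the affected outputs could be spotted without renaming them by hand. It is not part of the converter and says nothing about the cause; the function names `inspect_output` and `find_zipped_txt` are hypothetical, and it assumes the converted files sit together in one folder with a .txt extension. It relies only on the standard-library `zipfile` module.

```python
import zipfile
from pathlib import Path


def inspect_output(path):
    """Return the archive member names if `path` is really a zip file, else None."""
    # is_zipfile checks the file's contents, not its extension.
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


def find_zipped_txt(folder):
    """Map each .txt file in `folder` that is actually a zip archive to its member names."""
    hits = {}
    for txt in sorted(Path(folder).glob("*.txt")):
        members = inspect_output(txt)
        if members is not None:
            hits[txt.name] = members
    return hits
```

Calling `find_zipped_txt("converted")` on a hypothetical output folder named `converted` would return a dictionary of the misnamed files and the entries inside each one, which might make it easier to see what the affected PDFs have in common.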