Print Unicode to a file

I’m using the simple code below to print a string to a file. It works fine except when the string contains extended Unicode characters. In that case those characters are all converted to U+FFFD. Examining the string just before printing it to the file shows the correct Unicode values. How can I set up my file output to preserve the extended Unicode characters?

    Dim fileHandle As Integer
    Dim FileWriteGPX As String  ' output file name
    Dim strGPX As String        ' data assembled for output

    ' Write strGPX to file
    fileHandle = FreeFile
    Open ConvertToUrl(FileWriteGPX) For Output As #fileHandle
    Print #fileHandle, strGPX
    Close #fileHandle
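
One possible explanation, not confirmed anywhere in this thread, is that `Print #` writes the string in the system's legacy encoding rather than UTF-8, so a reader expecting UTF-8 sees invalid bytes. A minimal sketch of an alternative is to write through the UNO services `com.sun.star.ucb.SimpleFileAccess` and `com.sun.star.io.TextOutputStream`, setting the encoding explicitly. The helper name `WriteUtf8File` is hypothetical, and whether Basecamp accepts the result is an assumption that would need testing.

```basic
' Sketch: write a string to a file as UTF-8 instead of using Print #.
' WriteUtf8File is a hypothetical helper name, not part of the original macro.
Sub WriteUtf8File(sFileName As String, sText As String)
    Dim sUrl As String
    Dim oFileAccess As Object
    Dim oStream As Object
    Dim oTextOut As Object

    sUrl = ConvertToUrl(sFileName)

    oFileAccess = createUnoService("com.sun.star.ucb.SimpleFileAccess")
    ' openFileWrite does not truncate, so remove any existing file first
    If oFileAccess.exists(sUrl) Then oFileAccess.kill(sUrl)
    oStream = oFileAccess.openFileWrite(sUrl)

    oTextOut = createUnoService("com.sun.star.io.TextOutputStream")
    oTextOut.setOutputStream(oStream)
    oTextOut.setEncoding("UTF-8")   ' explicit encoding for the output
    oTextOut.writeString(sText)
    oTextOut.closeOutput()          ' also closes the underlying stream
End Sub
```

With the variables from the question, the call would be `WriteUtf8File FileWriteGPX, strGPX`.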

You do not say which operating system, language, or output file type you are using. U+FFFD is the replacement character that a system substitutes when it encounters unknown or invalid data. This often happens, for example, when copying and pasting into a file type that is not Unicode, or when writing a .csv file without explicitly specifying that you are using Unicode.

More information would help us understand the problem.

Thank you for responding. I guess I was too brief. My Calc spreadsheet is written to run on both Windows and Mac using macros to take location information from the sheet, reformat it as GPS waypoints and print it to a .GPX formatted file. The waypoints are built up into one long string, with header information, and printed to the open file. The .GPX file is intended to be imported into the Garmin Basecamp program. The spreadsheet, called Bonii-Prep, is available at http://turnlink.net.

The exporting/importing works fine as long as only ASCII characters (0-127) are used. Since I’m not the only user of the spreadsheet, occasionally an extended Unicode character is entered and appears in the printed file as U+FFFD. That kills the import into Basecamp. I need to prevent that from happening. Basecamp will accept “valid” Unicode characters.

In your comment you mention that I need to specifically define that I am using Unicode. That would seem to be the solution. How can I do that?

LibreOffice, like the internet, uses Unicode. You do not need to define this in LibO. How do you export the file as a .GPX? What data types does .GPX support?

A .GPX file uses an XML format, with each data item surrounded by opening and closing tags. In my macro, I take each piece of data, put the appropriate tags around it, and append it to the developing string. When done, I print the string to the file and save it with a .gpx suffix. It seems that the extended Unicode characters are corrupted while the string is being printed.
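
The tag-wrapping step described here might look roughly like the sketch below. The function names `XmlEscape` and `MakeWaypoint` are hypothetical and are not taken from the Bonii-Prep macros. The `wpt` and `name` elements and the `lat`/`lon` attributes follow the public GPX schema. Escaping `&`, `<`, and `>` is an added assumption about what the data might contain, since those characters would also produce invalid XML.

```basic
' Sketch: wrap one location in GPX waypoint tags.
' XmlEscape and MakeWaypoint are hypothetical helper names.
Function XmlEscape(sIn As String) As String
    Dim s As String
    s = Replace(sIn, "&", "&amp;")   ' must run first so later entities are not re-escaped
    s = Replace(s, "<", "&lt;")
    s = Replace(s, ">", "&gt;")
    XmlEscape = s
End Function

Function MakeWaypoint(sName As String, sLat As String, sLon As String) As String
    Dim q As String
    q = Chr(34)   ' double-quote character for attribute values
    MakeWaypoint = "<wpt lat=" & q & sLat & q & " lon=" & q & sLon & q & ">" & _
        "<name>" & XmlEscape(sName) & "</name></wpt>" & Chr(10)
End Function
```

Each result would then be appended to the output string, for example `strGPX = strGPX & MakeWaypoint(sName, sLat, sLon)`, where the three arguments stand for values read from the sheet.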

Although some of the data items are numeric, such as latitude/longitude, all are printed as string text.

You say you are printing the file, but you are creating a .gpx file. I am confused, as a .gpx file is not a print file, is it? If a � character (U+FFFD) is being created, which program is creating it? I have seen a similar problem when cutting and pasting with a program that does not support Unicode, which was first published in 1991. US-ASCII (0-127, 1968) and ISO-8859-1 (0-255, 1987) are subsets of Unicode. What text do you have that is causing the problem? Perhaps you can save as a .csv or .dif file, which would allow you to define the character set you are using, and then use that file for the import.

By printing the file I mean that I am using the Print command to send one long string to an open file. It could be described as a .TXT file structured as an XML file, with identifying tags (similar to HTML) around individual pieces of text. The data include location names, latitude/longitude values, etc. I mark the file as .GPX because that is the file type that the Basecamp mapping program imports.

As I mentioned in my answer, it now appears that the Unicode characters beyond ASCII 127 are being written to my file (in some form), because Notepad displays them. LibreOffice Writer doesn’t display them. MS Word and Garmin Basecamp fail to open the file. My programming text editor, BBEdit, displays them as U+FFFD. Specific characters that I’ve had trouble with are U+017D ( Ž ), U+201A ( ‚ ), U+00E9 ( é ), and U+003F ( ? ).

Thank you for taking an interest in this problem. I’ve essentially decided to “fix” the problem by replacing the problem characters. Others may not be able to

It seems that the situation is both simpler and more complicated than I expected. Opening my file with Notepad shows the extended Unicode characters. Opening it with LibreOffice Writer shows the characters missing, and opening it with MS Word fails with an invalid-character error. My solution is to replace all characters above ASCII 126 with ^. For my application, it is more important to have the file import into Basecamp than to preserve those few characters, which are typically not critical.
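
The replacement described above could be sketched as follows. The function name `ReplaceNonAscii` is hypothetical, and the loop is an illustration of the idea rather than the code actually used in the spreadsheet. The check for a negative code is a precaution, on the assumption that `Asc` may report high code units as negative values in some versions.

```basic
' Sketch: replace every character above ASCII 126 with "^".
' ReplaceNonAscii is a hypothetical helper name.
Function ReplaceNonAscii(sIn As String) As String
    Dim i As Long
    Dim code As Long
    Dim ch As String
    Dim sOut As String

    sOut = ""
    For i = 1 To Len(sIn)
        ch = Mid(sIn, i, 1)
        code = Asc(ch)
        ' code < 0 guards against Asc returning a signed 16-bit value
        If code < 0 Or code > 126 Then
            sOut = sOut & "^"
        Else
            sOut = sOut & ch
        End If
    Next i
    ReplaceNonAscii = sOut
End Function
```

It would be applied just before writing the file, for example `strGPX = ReplaceNonAscii(strGPX)`. A character outside the Basic Multilingual Plane is stored as two code units, so it would come out as two carets.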