Display non-printing characters (ASCII code < &H20)

torreone · January 21, 2023, 3:27pm

I have to display a long Writer document that contains inserted control characters (ASCII code < &H20).
Is there any way to highlight these characters with colors or with placeholders?
I have not found anything, either in the options or on the net.
Thanks in advance for any suggestions.

LO 7.4.4.2 (snap), Ubuntu 16.04, Writer document saved as .odt

Edit: I know how to enter characters with CTRL+SHIFT+U+hex code+Enter.
I am only interested in highlighting the ones already present in the document, left over from batch insertion tests I ran in the past. They are not visible: each occupies one byte in the text, I can see them from a macro, and search finds them, but I cannot see them on screen.
In a text editor I see a placeholder for them (a square icon with two zeros in the first row and 0 and 1 in the second); in Writer I don’t see this placeholder.
Is there any way to see it for these characters? It would be enough for me to see this placeholder or something similar.
I do not want them to be effective, i.e. to have the same effect as in a terminal.
I do not want to display them with their names.
I only want to display them with a placeholder or with a color, on request.
But I don’t see anything relevant in the options.
Apparently it is not possible, but before trying any further I would like to ask for confirmation.

ajlittoz · January 21, 2023, 4:00pm

Edit your question with OS name, LO version and save format.

What do you expect from these control character insertions? Do you want them to be effective, i.e. have the same effect as in a terminal? This is not possible since Writer, and any document processing application in general, reinterprets the input character sequence.

Do you want to display them with their names? If so, don’t insert them directly but use Unicode characters in the range U+2400 to U+241F. In other words, to display character 0xYZ (hexadecimal code), type U+24YZ followed by Alt+X.
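
As a small illustration of the U+24YZ mapping, here is a sketch in Python (not a Writer macro; the function name `to_control_picture` is made up for this example). It assumes the text is available as a plain string and replaces each control code with its counterpart from the Unicode "Control Pictures" block:

```python
def to_control_picture(text: str) -> str:
    """Replace each C0 control code (U+0000-U+001F) by its visible symbol.

    The Control Pictures block places the symbol for code 0xYZ at U+24YZ,
    so the mapping is a fixed offset of 0x2400.
    """
    return "".join(
        chr(0x2400 + ord(ch)) if ord(ch) < 0x20 else ch
        for ch in text
    )


sample = "A\x01B\x1fC"
print(to_control_picture(sample))  # prints A␁B␟C
```

Whether the symbols actually show up depends on the font having glyphs for U+2400 to U+241F.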

Lupp · January 21, 2023, 7:46pm

Help meeee…!
What do you mean by Ascii < &H20H?
&H is a prefix used in some contexts to write numeric constants in hexadecimal notation, but &H20 is simply the ASCII (and Unicode) number of the ordinary space. The final H of your string isn’t a hex digit. If we enter a character by its Unicode code point in Writer, we generally type 4 hex digits and then press Alt+X.
…
I have no clue what Ascii < &H20H means. Neither does my search engine.

If you want to highlight whitespace characters in a Writer document, use regular expressions with Find All in F&R. If you know the correct Unicode code point of a character, you can enter it into the Find field as \uABCD, where A, B, C, D stand for hexadecimal digits.
For example, the regex \u0020|\u0009|\u000A will find the ordinary space, HT, and LF.
In the same way you can find “special space characters” and the like.

===Edited in response to the comment below===
Taking the question literally as it is worded now, the appropriate regex would be [\u0000-\u001F]. However, \u000D (that is, CR), and probably some additional old-style control characters I don’t know exactly, cannot occur in a Writer document. On the other hand, you may have imported or inserted many additional non-printing or combining “characters” defined by Unicode. If this can be an issue for you, “<&H20” won’t be sufficient. In addition, there are “visible” space/whitespace characters that are not “ordinary” spaces: narrower and wider spaces, no-break space (U+00A0), and many others whose details I don’t know.
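
To show what the character class [\u0000-\u001F] selects, here is a minimal sketch using Python’s `re` module on a plain string. This is an illustration outside Writer: Writer’s Find & Replace uses its own regex engine, and the helper name `find_control_chars` is hypothetical.

```python
import re

# Same character class as the Find & Replace pattern [\u0000-\u001F].
CONTROL_CHARS = re.compile(r"[\u0000-\u001F]")


def find_control_chars(text: str) -> list:
    """Return (offset, 'U+XXXX') for every control code found in text."""
    return [
        (m.start(), f"U+{ord(m.group()):04X}")
        for m in CONTROL_CHARS.finditer(text)
    ]


print(find_control_chars("one\x01two\ttab"))
# [(3, 'U+0001'), (7, 'U+0009')]
```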

torreone · January 21, 2023, 11:47pm

I’m sorry, I didn’t notice that final H, which I then repeated by copy and paste.
I meant non-printable characters, with ASCII code < &H20, or 32 decimal.
For now I have corrected the title and the text of the message…
I am having some problems finding these characters in F&R that I didn’t have yesterday; I have to run some tests before answering.

ajlittoz · January 22, 2023, 7:26am

It is probably simpler to filter your data before inserting it into your Writer document. This is a simple task for a text editor.
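
A minimal sketch of such a pre-filter, written in Python rather than done by hand in a text editor. It assumes the data is plain text, and the decision to keep tab and line feed is an assumption of this example, as is the function name `strip_control_chars`:

```python
# Tab (0x09) and line feed (0x0A) are kept; that choice is an assumption.
KEEP = {0x09, 0x0A}

# str.translate deletes every code point that is mapped to None.
DELETE_TABLE = {code: None for code in range(0x20) if code not in KEEP}


def strip_control_chars(text: str) -> str:
    """Remove C0 control codes, except those listed in KEEP."""
    return text.translate(DELETE_TABLE)


print(repr(strip_control_chars("a\x01b\tc\x1fd\n")))  # 'ab\tcd\n'
```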

torreone · January 22, 2023, 1:16pm

In my tests I used both LO Writer 6.2.8.2 (Canonical) and LO Writer 7.4.4.2 (snap), both on an .odt file, both under Ubuntu 16.04.

In the two versions of writer I entered non-printable characters with CTRL+SHIFT+u+hexcode+ENTER, hexcode can consist of 1 to 4 hexadecimal digits

Strangely, I don’t know if due to settings/options/or other problems, on writer 6.2.8.2 the insertion takes place as under gedit, i.e. an underlined u is displayed and the characters inserted before ENTER can be seen on the screen
On the contrary, under writer 7.4.4.2 the same insertion works, but nothing is seen on the screen, making the insertion much more cumbersome and exposed to errors

The presence of a non-printing character can be detected by placing the cursor before the displayable character that precedes it and moving right with the arrow key: the key must be pressed twice to reach the next printable character. On the second press it is as if the cursor stopped on an invisible, zero-width character.

The search performed with Search & Replace always requires the regular expression flag activated and works with two specific different search patterns, \uhhhh or \xhh
However, if only the non printing character is searched for, the search returns no results.
Therefore, if the pattern should be only \u0001, NO character with ascii code &H1 is found, even if it exists: after the search, the message “not found” is displayed

Conversely, by searching for a printable character preceded or followed by a non-printing character, every combination existing in the document is selected

For example, 1[\x0E-\x1F]2 matches every occurrence of the characters 1 and 2 with a non-printing character with ASCII code from &H0E to &H1F between them.
However, even though the selection includes the non-printing character, this character is NOT visible on screen.

What is really feasible is, with a macro, to scan all the recognized occurrences verifying their length (Len function) and verifying the content of each character with the ASC function.
Again with a macro, upon request, it is possible to insert a single or many different placeholders in the text, one for each non-printing character to be recognised.
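
The scanning idea can be sketched as follows. This is Python on a list of plain strings, not the actual Basic macro: in a real macro the paragraphs would come from the document, `ord()` stands in for Basic’s Asc, and the name `index_control_chars` is hypothetical.

```python
def index_control_chars(paragraphs):
    """Map paragraph number -> list of (offset, code) for each control code."""
    index = {}
    for number, paragraph in enumerate(paragraphs):
        hits = [
            (offset, ord(ch))
            for offset, ch in enumerate(paragraph)
            if ord(ch) < 0x20
        ]
        if hits:
            # Only paragraphs that contain at least one control code are listed.
            index[number] = hits
    return index


paragraphs = ["no markers here", "1\x0e2 and 1\x1f2", "end\x01"]
print(index_control_chars(paragraphs))
```

A map like this gives direct access to every occurrence, so placeholders could be inserted or removed later without searching the text again.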

Therefore, unless there are display options whose existence I don’t know, the answer is NO, these characters cannot be displayed even on request (apart from the processing with macros I was talking about before)

ajlittoz · January 22, 2023, 1:32pm

This is probably a function in your OS (Ubuntu). Doesn’t work here in Writer under Fedora 37.

The standard Writer way to enter an arbitrary Unicode character is to type U+<hex code> followed by Alt+X. Conversely, pressing Alt+X converts the character to the left of the cursor into the sequence U+<hex code>.
Note: It looks like Writer won’t let you enter control codes (0x00-0x1F) with this method.

I wonder why you want to insert control codes into your document. Could you explain?

mikekaganski · January 22, 2023, 2:14pm

Note that some such codes can be used internally by writer, e.g. to represent object anchors and such. I am afraid that using those in text could be impossible or inconsistent…

See e.g. https://gerrit.libreoffice.org/c/core/+/121134

torreone · January 22, 2023, 2:14pm

@ajlittoz You’re probably right, I have a distant memory of this being a technique to be used only under ubuntu, but I need to verify this
To answer your other question, I start by saying that I use writer for personal reasons, I don’t have to share my files with anyone.
I have full control over it, even using macros
I don’t worry if in the future some of my techniques should be incompatible with the next versions of LO, I always have more alternative ways to apply in batch mode as a fallback
I use libreoffice writer mainly to build manuals on very broad topics.
I don’t handle master documents well at the moment, I prefer to use very large files in terms of size
For me it is essential to reduce its weight as much as possible (e.g. using styles, avoiding tables, using shift+enter a lot for paragraphs with homogeneous content, etc.)
I know (or so it seems to me) that the misuse of some techniques is discouraged if there are other alternatives
However, alternative functions that do not weigh in small documents, systematically applied to large documents weigh too much
I know I’m a bit of an anomalous user, but precisely this, as you can see from my other questions, prompted me to get to know and make the most of LO’s resources and features, including macros, exploring niche topics whose documentation is often non-existent.
Having clarified this, the purpose you asked about is to lightly highlight portions of text: usually paragraphs associated with examples, definitions, applications, etc., or words and sequences of keywords distributed throughout my document.
Using character styles systematically would greatly increase the number of text portions, especially when they are inserted in the middle of a paragraph instead of at the beginning or end.
Using these non-printable characters in key positions keeps this information invisible, is very light in terms of memory, and in a 400-page document the characters are recognized in fractions of a second.
When the document is opened, a macro can trace all these occurrences, obtain for each one a reference to the paragraph that contains it and the related additional information, and very quickly build a direct-access enumerated map with all this information, including a unique recognition code for each distinct sequence.
For now it’s just an experiment; being able to display these characters on request during editing would help me avoid accidental deletions.

torreone · January 22, 2023, 2:18pm

Thanks @mikekaganski , for now it’s just one of the many experiments I’m doing, sometimes they work sometimes they don’t
But I am autonomous enough to deal with possible malfunctions
The important thing for me is to use these experiments to get to know even underexploited or poorly documented LO features
It is now a tool that I will use for several years, this is a long term investment
Many thanks for the link

torreone · January 22, 2023, 2:24pm

I forgot: thanks to some problems with textframes that I had reported in a previous request here, I reevaluated dialog boxes, which perhaps I didn’t know well enough. Now I’m starting to use mixed techniques, dialog boxes and textframes.
Thank you therefore for your previous observations, also in my previous requests

Wanderer · January 22, 2023, 2:54pm

I can’t really help with how to handle this in Writer. I have sometimes used the features of Notepad++ (Windows) to show characters like CR/LF or tab.
But I guess even if Writer tries to show something, the font in use has to have a glyph there (as in the linked article, “Text Editor Fonts for Programmers”). But I guess this is unusual for modern Unicode fonts. A second problem is the characters handled internally by Writer, like tab, as already pointed out by Mike Kaganski.

torreone · January 22, 2023, 3:07pm

thanks @wanderer,
after removing the codes &H09 (tab), &H0A (LF), &H0D (CR), there are 29 other non-printing characters.
For example, it’s enough for me that if one of these is inserted after the « or before the » character, there is no interference.
It would allow me, in a very light way, to quickly recognize and select sequences of interest of the type «sequence of words» without interfering with other uses of these two characters «».
Using macros I can quickly run many tests parameterized on the chosen non-printing character.
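
A minimal sketch of this tagging idea, in Python on a plain string rather than as a Writer macro. The marker &H01 placed right after « is an arbitrary choice for the example, and the names `MARKER` and `tagged_sequences` are hypothetical:

```python
import re

MARKER = "\x01"  # hypothetical choice; any otherwise unused control code would do

# A tagged sequence is «, then the marker, then text up to the closing ».
TAGGED = re.compile("«" + re.escape(MARKER) + "([^»]*)»")


def tagged_sequences(text: str) -> list:
    """Return the contents of every «...» pair that carries the marker."""
    return TAGGED.findall(text)


text = "He said «hello» but «\x01key term» is tagged."
print(tagged_sequences(text))  # ['key term']
```

Ordinary «…» quotations without the marker are ignored, which is the “no interference” property described above.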

ajlittoz · January 22, 2023, 3:12pm

Not at all.
My technical manuals are in the 250-300 pages range. They exhibit very sophisticated layout and formatting. I have absolutely no problems with styles. And styles are the basic principles upon which Writer is built, so we can expect they are correctly and efficiently managed.

Once you have fully analysed your document structure and semantics, and the message you intend to pass to your readers (even to yourself, if your documents are strictly personal), you should make do with ~15 paragraph styles, ~15 character styles, ~5 page styles for the various “non recurrent” parts, i.e. cover, front material, TOC, index, + 1 or 3 page styles for the chapters, at most 5 frame styles, 2-3 list styles.

There is no penalty in memory with styles. Memory representation is different from file storage. A styled document may be bigger than your tagged one but there is no need for decoding macros. As soon as the document is loaded, it is ready for use.

If you don’t fully use styles, i.e. overload your text with direct formatting (because it looks easier on first sight), you’ll really increase your document size on disk because every occurrence of direct formatting is a unique instance. Writer creates an “anonymous” style for each instance. And this really inflates the document.

content: you probably mean “formatting”

Wrong! Here are some reasons:

paragraphs are the basic units of significance
A paragraph presents one idea. Merging several consecutive paragraphs means merging several ideas, which all writing doctrines advise against.
the distribution of your text into paragraphs should strictly follow the semantics/significance
using a line break instead of a paragraph break suppresses the inter-paragraph vertical spacing, which must then be replaced by empty lines (multiple consecutive line breaks), which is another form of direct formatting
A line break is nearly as “heavy” as a paragraph break in the underlying XML.
you lose some formatting power in the merge

torreone · January 22, 2023, 4:00pm

@ajlittoz

Unfortunately, all my most important documents were started 3 years ago, when I began using LO and, out of ignorance, didn’t use styles much.
My manual in .odt format would all have to be copied one piece at a time into a new file and reformatted from scratch, a huge job that I will do little by little
I agree, it is my goal to build essential but efficient stylesheets common to all my manuals
A fundamental element in my manuals must be search, i.e. looking for a term, its definition or its application.
For this purpose I’m focusing on formatting the text using paragraph and character styles linked to the meaning of the text, working on the attributes and on searching by attributes (e.g. all the definitions in a slightly different character color from other areas of the text).
I have no constraints because they are all documents for personal use
The ultimate goal of my work is to be able to associate a unique but very short code and code lists of other associated information units to each information unit (IU), certainly to the headings but not only. In Html-css-javascript it’s simple, in LO it’s possible, but not simple.
The goal is to be able to position oneself on an IU and, upon request, queue the portions of the document text pertaining to that IU in a textframe or in a multiline textbox for consultation.
Clearly it is an anomalous use of a wordprocessor, because it combines the representation of the information with the database, but in the past I had built a working prototype in html-javascript (to be resumed today though) and I know that technically it can be done in libreoffice using macros and in particular the enumerated maps serving the dialog boxes
Also for this reason, if a definition or a concept are graphically concatenated better in several paragraphs (max 3 or 4), therefore with homogeneous contents and formatting, I often use the soft return, in order to be able to select and move all that portion of homogeneous text
I would like to talk separately about the considerations on soft return and compressed XML coding, it is not entirely clear to me what you are saying, perhaps it is appropriate to open a separate request on their use

ajlittoz · January 22, 2023, 4:14pm

Click on the icon next to my name. This will pop up an information box with a Message button. Click on it and you’re in private mail mode.