How to parse an .odg file?

I need to parse an .odg file that shows a flowchart.
How can I output the connections between blocks?
As a result of parsing, I want to fill a database containing related tables with this data.
I need your help!

Tell us what you want to do. What is the expected result, and what is it for? Do you want a list of connectors like A → B, A → C, B → C, and in which format? Once you have this information, what will you do with it? Pass it to GraphViz?
Edit your question to improve it.

Thanks, I edited my question.

I had a look at one of my drawings describing a DB and dumped the underlying XML. Retrieving the information is possible but extremely difficult. One of the assumptions is that you are using connectors between boxes and not lines. Connectors are glued to glue points of the boxes, and therefore there is positive information in the XML about how the boxes are connected.
The first tool needed is an XML parser. The second tool is a complete understanding of the ODF XML description, basically how to retrieve shapes (XML elements) and the XML attributes where the glue points are described. You’ll have to customise the parser to recognise the ODF elements.
An important step is to build a dictionary of internal names used to designate the styles, shapes and connectors. These names are not the same as your human-assigned names, so a translation dictionary needs to be built. You must also associate each shape (XML element) with its label (XML element contents).
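A minimal Python sketch of such a translation dictionary, using only the standard library. It assumes, as I believe LibreOffice does when saving, that each shape carries a draw:id (or xml:id) attribute and that its visible label is stored as text:p paragraphs inside the shape element. The function name shape_labels is my own.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OpenDocument specification.
NS = {
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
DRAW_ID = "{%s}id" % NS["draw"]
DRAW_PAGE = "{%s}page" % NS["draw"]
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def shape_labels(content_xml: bytes) -> dict:
    """Map internal shape ids (draw:id or xml:id) to the visible label text."""
    root = ET.fromstring(content_xml)
    labels = {}
    for elem in root.iter():
        if elem.tag == DRAW_PAGE:
            continue  # a page is not a shape, even if it has an id
        internal_id = elem.get(DRAW_ID) or elem.get(XML_ID)
        if internal_id is None:
            continue
        # Join all text:p paragraphs inside the shape into one label.
        paragraphs = elem.findall(".//text:p", NS)
        labels[internal_id] = " ".join(
            "".join(p.itertext()).strip() for p in paragraphs
        )
    return labels
```

Shapes without any paragraph get an empty label, which is a hint that they cannot be identified by text alone.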
All in all, if you’re not a computer science professional versed in parsing lore, I discourage you from embarking on such an adventure unless you find an already existing parser targeted at ODF.

You could possibly study a proof of concept in Python, since it provides a SAX XML parser, but a lot of work would remain to be done to target ODF.
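As a hedged proof of concept with Python’s SAX parser: the handler below only collects the draw:start-shape and draw:end-shape attributes of draw:connector elements, i.e. the internal ids of the shapes each connector is glued to. It assumes connectors (not plain lines) were used. The class and function names are hypothetical.

```python
import xml.sax
import xml.sax.handler

DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"


class ConnectorHandler(xml.sax.ContentHandler):
    """Collect (start-shape id, end-shape id) pairs from draw:connector elements."""

    def __init__(self):
        super().__init__()
        self.edges = []

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        if uri == DRAW_NS and local == "connector":
            start = attrs.get((DRAW_NS, "start-shape"))
            end = attrs.get((DRAW_NS, "end-shape"))
            # A connector not glued at both ends carries no connection info.
            if start and end:
                self.edges.append((start, end))


def connector_edges(content_xml: bytes) -> list:
    handler = ConnectorHandler()
    parser = xml.sax.make_parser()
    # Namespace processing makes element/attribute names (uri, localname) pairs.
    parser.setFeature(xml.sax.handler.feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.feed(content_xml)
    parser.close()
    return handler.edges
```

The ids returned are internal names; they still have to be translated to the visible labels, as described above.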


Thank you, you really helped me.
But it is very important for me to parse this.
Maybe you know some other ways, via other formats? Maybe I could first convert it to PDF and parse that?

No, PDF is a page description language. This means it will position “basic objects”, like strokes, dots, characters, on the page in any order convenient for it. In the end, all structuring present in the original file is lost.

Another approach would be to start from the source which created the drawing if you have access to it.

To give you an idea of the parsing effort, save your drawing as .fodg (flat XML ODF drawing). Open this file in a simple text editor and you’ll understand the difficulty.
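For experiments, a small loader can accept both forms: an .odg package, where the drawing is stored in content.xml inside a ZIP archive, and a flat .fodg file, which is a single plain XML document. The helper name load_content_xml is hypothetical.

```python
import zipfile


def load_content_xml(path: str) -> bytes:
    """Return the XML describing the drawing.

    .odg is a ZIP package whose drawing lives in content.xml;
    .fodg (flat ODF) is one plain XML file that can be read as is.
    """
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as package:
            return package.read("content.xml")
    with open(path, "rb") as flat:
        return flat.read()
```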


Attach a reduced example showing what you have (the graphical object) and what you want to get in the respective textual format (representation of connections). IMO the term “parse” isn’t sufficiently clear.
Parsing XML isn’t my hobby, and I never tried to use this related service in LibO.

The dedicated services for drawings might help.

If I am to look into the mentioned alternative, I will need your help. See above.

[Screenshot from 2022-02-14 16-14-23]
Hi, this is the reduced example.
The output should look something like this: entity_name, entity_type, next_element (for example, ‘A’, ‘input’, ‘S1’).

I am afraid that there is not enough data in the drawing: ‘input’ is not present. This means you need “meta-rules” to describe what should be added.

I made a little experiment to see what could be done.

  • the tape symbol doesn’t exist in the flowchart shape collection; I hope it has the same properties as others, i.e. it has glue points
  • your screenshot doesn’t show whether the “lines” connecting the symbols are connectors (primary Drawing objects with “nice” properties, i.e. they glue to attached shapes)
  • the screenshot doesn’t show whether the labels are written in independent text boxes or are properties of the shapes and connectors themselves

Please attach an .odg file corresponding to the sample. Also, give us the full textual translation.
If the drawing is made with a strict methodology, with as few shapes as possible and following the Draw rules regarding connectors, there may be hope, but the effort needed is still considerable.

I’m a bit tired of asking you (@gulshat) for your help. I wouldn’t try to become an expert in your field. But it shouldn’t be difficult for you to understand that the “reduced example” I asked for wasn’t an image, but an .odg containing the real object.

Anyway: the idea behind my suggestion is that LibO, when loading an .odg, parses the XML representation of your FC (flowchart) anyway, so you should probably make use of the results.

To shorten this, I made a trivial example of an arbitrary FC and wrote a few lines of Basic code using the LibO API to demonstrate what you can get very easily.
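For readers who prefer Python over Basic, here is a rough sketch of the same idea through the UNO API; it is not @Lupp’s macro. It assumes the documented StartShape/EndShape properties of com.sun.star.drawing.ConnectorShape and that shapes offer getName() and getString(). It is duck-typed, so the logic can be tried without LibreOffice; I have not run it inside LibreOffice. Function names are hypothetical.

```python
CONNECTOR = "com.sun.star.drawing.ConnectorShape"


def designator(shape):
    """Prefer the API-level Name; fall back to the visible text."""
    return shape.getName() or shape.getString()


def list_connections(draw_page):
    """Return (source, destination) designator pairs for connectors on one page.

    Assumes the UNO interfaces: getCount()/getByIndex() on the page,
    getShapeType() on each shape, and the StartShape/EndShape
    properties of a ConnectorShape (None when an end is not glued).
    """
    pairs = []
    for i in range(draw_page.getCount()):
        shape = draw_page.getByIndex(i)
        if shape.getShapeType() != CONNECTOR:
            continue
        start = shape.getPropertyValue("StartShape")
        end = shape.getPropertyValue("EndShape")
        if start is None or end is None:  # not glued at both ends
            continue
        pairs.append((designator(start), designator(end)))
    return pairs
```

Inside a Python macro, the page would presumably come from XSCRIPTCONTEXT.getDocument().getDrawPages().getByIndex(0).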

To run the code from the example, you should first check it for malicious parts, and then reload the file with permission for the document macros to run.

disask73971nonsenseFlowChart.odg (13.1 KB)


@Lupp: very good job, simple and clean

test.odg (10.7 KB)
Sorry that I don’t think of everything immediately; this is my first experience with this. I hope for your understanding, thank you.
Here I have input ‘A’ that goes to ‘S1’, from ‘S1’ to branching ‘B’, and from ‘B’ to ‘S2’ or to ‘S3’. ‘C’ and what should have been ‘D’ are the outputs.
Yes, this is not exactly a flowchart, only something like one, but it is very important for me to get the connections between entities.
Could this be a problem for parsing?
Thank you very much for your answers!!

@gulshat: your drawing has all the required properties. I hope (because I can’t check it for sure) that all the connectors have the correct orientation from source to destination, or from prior task to later task. In this case @Lupp’s macro is a starting point.
What is still missing at this stage is a clean specification of what should be output.

In turn, I hope I wasn’t too harsh.
It’s, of course, always problematic to start as a newcomer with a rather demanding question. Don’t worry, however.

Your example looks as if you assume that the flowchart is the only content of the file. I would suggest grouping the objects belonging to the chart, which allows for additional content in the file.

Another issue was already emphasized by @ajlittoz: you can’t clearly report the results without a specification of a format. In any case, the connectable elements must be uniquely identified (not only by their position). You seem to take it for granted that this is done by an assigned text. Preferring a name as the designator might be better in some cases. Alas, the UI doesn’t provide naming for connectors, although they have a .Name property at the API level.
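Before reporting anything, it may be worth checking that every designator is unique and non-empty. A minimal, library-independent sketch; the function name check_unique is hypothetical, and the designators are whatever you chose to use (Name or label text).

```python
from collections import Counter


def check_unique(designators):
    """Raise if any designator (Name or label) is empty or used twice.

    Connections can only be reported unambiguously when every connectable
    shape has its own identifier; position alone is not an identifier.
    """
    counts = Counter(designators)
    if counts.get("", 0):
        raise ValueError("%d shape(s) have neither a name nor a label" % counts[""])
    duplicates = sorted(d for d, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError("duplicate designators: " + ", ".join(duplicates))
    return set(counts)
```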

In short: I reworked your example and also the code, which can now recursively resolve groups down to the single shapes, and this way also report on connectors grouped together with other shapes.

Reworked example:
test_Combined_disask73971nonsenseFlowChartConnectorEdgesRecursively.odg (16.1 KB)
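The recursive resolution of groups can be sketched in Python as follows (my own sketch, not the code in the attachment). It assumes that both a DrawPage and a GroupShape expose getCount()/getByIndex() through XIndexAccess, and that getShapeType() returns the service names used below.

```python
GROUP = "com.sun.star.drawing.GroupShape"
CONNECTOR = "com.sun.star.drawing.ConnectorShape"


def iter_shapes(container):
    """Yield every non-group shape, descending into GroupShapes recursively.

    Works for a DrawPage as well as a GroupShape, since both are
    assumed to offer getCount()/getByIndex().
    """
    for i in range(container.getCount()):
        shape = container.getByIndex(i)
        if shape.getShapeType() == GROUP:
            yield from iter_shapes(shape)
        else:
            yield shape


def connectors(container):
    """All connector shapes, including ones grouped together with other shapes."""
    return [s for s in iter_shapes(container) if s.getShapeType() == CONNECTOR]
```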


Thank you so much! You really helped me.
As a result, I want to get a dict like this (I’ll add the entity_type values myself):

      [
        { "entity_name": "A",  "entity_type": "input",     "next_entity": "S1" },
        { "entity_name": "S1", "entity_type": "conveyor",  "next_entity": "B" },
        { "entity_name": "B",  "entity_type": "branching", "next_entity": "S2" },
        { "entity_name": "B",  "entity_type": "branching", "next_entity": "S3" },
        ...
      ]

and to fill a database with this data.
Does this help you understand what I am going to do?
Is it possible to parse it with Python?

@gulshat: to summarise, you do not want to describe the connectors, something like merging your first pair:

    { "connector_name": "S1",
      "source": "A",
      "destination": "B" }

which can be done easily from @Lupp’s macro.
Instead of describing your graph by its edges, you want to describe it by a mix of vertex-to-edge and edge-to-vertex half entities, i.e. its graphical components. In addition, to be able to make a distinction between the nature of the components, you insert the “type” of the egress half entity in the descriptor.
My comment is: your specification is rather weird in graph theory, where we prefer to list vertices and edges (as pairs of source-dest vertices) separately. Note that this is the way you describe a graph to GraphViz, a utility to automatically lay out and display graphs.
The traditional approach to graphs is easier to implement. If you really want your “half-entity” specification, you can post-process the vertex/edge list to produce any problem-specific description.
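Such a post-processing step might look like this: it produces one record per outgoing edge, in the format requested above, so a branching block appears once per successor. The types mapping is the information you said you would add yourself; giving blocks with no successor (the outputs) a record with next_entity set to None is my assumption. The function name is hypothetical.

```python
def to_entity_records(edges, types=None):
    """Turn (source, destination) edges into per-successor records.

    One record per outgoing edge; sinks (blocks without an outgoing
    edge) get a single record with next_entity None.
    """
    types = types or {}
    records = [
        {"entity_name": src, "entity_type": types.get(src), "next_entity": dst}
        for src, dst in edges
    ]
    sources = {src for src, _ in edges}
    sinks = sorted({dst for _, dst in edges} - sources)
    records += [
        {"entity_name": n, "entity_type": types.get(n), "next_entity": None}
        for n in sinks
    ]
    return records
```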

I forgot to write that I am going to use S1, S2, S3 as blocks, not as connector names; I edited the file accordingly.
test1.odg (10.6 KB)

Then it comes back to enumerating the Draw connectors and @Lupp’s macro can do the job. It only needs small adjustments. I guess your Sn blocks are processing steps.

There are some remaining questions possibly causing serious problems:

  1. Are all the entities marked by a pair of square brackets on the same level? (I assume “yes”.)
    Or should grouped entities also be allowed, to be resolved on a nested level?
    Difficult? Probably not too much.
  2. How shall the order be defined?
    Seems: You expect the graphical representation of the FC to “show” the order.
    Fact: This order is only represented by the .Position properties of graphical blocks. It is not represented by the order of objects inside their groups (or the DrawPage). Only the connectors also know from where to where they are pointing, but even among the connectors no logical order is represented in the graphic.
    Difficult: To analyse the order based on positions.
    Alternative1: Ensure a specific naming that can represent the order. Apply it via the .Name properties; don’t use the .String property for the purpose.
    Alternative2: A probably complicated analysis of the paths from a unique Start element to a unique End element via any connectors. This may be a non-trivial task in the field of graph theory, which I don’t know in any detail.
  3. For blocks being the .EdgeStartConnection for many (>=2) connectors no reasonable order of these connectors can be defined (imo) except via the name (see above).
  4. As a consequence of 2./3.: Before a textual representation regarding an order can be created, sorting is required as an antecedent.
  5. Let me solve just my simple case for now. A generalizing enhancement will follow later.
    This attitude may cause double labour, or much more.
  6. Most of the expected problems will not be solvable (imo) based on some additional hints given via a Q&A site. The solution involves thorough and complete considerations and must be based on your own expertise, whether present or needing to be developed.
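For the ordering question (points 2 to 4), one option from graph theory is a topological sort. The sketch below swaps in that technique (Kahn’s algorithm): blocks are ranked by traversal starting from the blocks without incoming connectors, and whenever several blocks are ready at once, the smallest name wins, so naming decides among siblings as suggested in point 3. It only works if the chart has no loops; a cycle raises an error. The function name is hypothetical.

```python
import heapq
from collections import defaultdict


def ordered_edges(edges):
    """Sort (source, destination) edges by a topological rank of their source.

    Kahn's algorithm with a min-heap, so ties between ready blocks are
    broken by name. Edges with the same source are ordered by destination.
    """
    successors = defaultdict(set)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        nodes.update((src, dst))
        if dst not in successors[src]:  # ignore duplicate connectors
            successors[src].add(dst)
            indegree[dst] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    heapq.heapify(ready)
    rank = {}
    while ready:
        node = heapq.heappop(ready)
        rank[node] = len(rank)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, nxt)
    if len(rank) != len(nodes):
        raise ValueError("the chart contains a cycle; no topological order exists")
    return sorted(set(edges), key=lambda e: (rank[e[0]], e[1]))
```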

Somebody may choose to see this as his (f/m) own problem and solve it. He will probably (hopefully) share the solution then.

This is what I never understood anyway. Sorry.
[[How and to what end shall any textual representation of a (DB-related) FC be “filled” into a DB?
Are you trying to create a “DB of useful FlowCharts”? How to query it? How to apply results without a solution for the reverse task? Would you also add the graphic to the DB?]]

Since this was somehow interesting to me, I played around a bit more following the suggestions from my own comments.

As in related cases, I then used a spreadsheet as the frontend of a kind of batch processor, and also as the data container for the results. The specific syntax and the sorting were not included, but spreadsheet tools can do that, either interactively or called by user code.

The example was made with LibO V 7.2.3 and also tested with Portable V 6.2.5. I did not investigate why it didn’t work in AOO V 4.1.7.

You may play with the attached sheet. The example source is loaded via http. You may use your own example odg files, of course. Of the included code, only some parts are written specially for this example. Most of it is from my toolboxes. Suggestions received over the years from other contributors are freely used without explicit quotations. For the way to call a Basic script by name, I want to thank @sokol92. It was only recently that I learned this from one of his posts.
devDisask73971Sheet2.ods (15.4 KB)