Extracting serialized data

Hi all, I’ve got a problem whereby I need to extract some data from a serialized data string that has come from a websites product export.

An example of the serialized data is shown below:


The text I need to extract from the serialized data for each product is:

  • pa_colour
  • pa_material

So I would need the values ‘Grey’ and ‘Tweed’ in to separate columns in my sheet for the above example.

Can anyone help with a suggestion of how to achieve this?

All help greatly appreciated. Thanks in advance!

That looks like the ugly PHP serialize()'ed output. Probably easiest would be to find a converter that converts between PHP serialize and JSON (don’t know one off-head), and then use another converter that converts between JSON and CSV, which you can import to Calc. For JSON to CSV you could use in2csv from the Python csvkit which should be available in most Linux distributions, or from GitHub.

This is a nontrivial case of parsing.
Spreadsheet software does not provide powerful standard functions for the purpose.
Therefore I think you will either need to write a bit of user code, or to accept to split the task to a series of steps based on finding positions with the help of SEARCH with RegEx enabled.
In any case it is necessary to describe precisely the syntax of the compound information you got. Based on just one example, it’s a guessing like stabbing in the dark. Even the key "pa_colour" occurs twice.

Please check the attached example about how to extract the colour information. An attempt to merge the different steps in one cell-formula would result in an unmaintainable monster formula.

BTW: Anyone providing information in such compounds should be obliged to also provide the syntax and openly available means to analyse the compond.