Changing the complicated dates

Hello everyone. I’m fairly new here. In LibreOffice Calc, I have a time series data (starting from 2003 to 2011 so right about 28000 ish observation data lol) which I got from a certain database.

The problem is the included “Time and Date” column in the file. They are in a “non-conventional” DOY (Day of Year = YYYY-DOY-HHMMSS) format with those underscores and no colon signs to ‘differentiate’ the hour, minutes and seconds e.g. 2011_271_190334 making it more complicated.

Certainly, I want to import this file in Python - Astropy (as .csv). The problem is it doesn’t read it but only in a correct way e.g. 2011:271:19:03:34 and it’s totally time consuming to change those ~28000 ish dates one by one or manually.

May I ask if there’s a way, sort of like a function, to modify the whole column from YYYY_DOY_HHMMSS into YYYY:DOY:HH:MM:SS correct format.

Attach here is the spreadsheet I’m having trouble with. In Column F labeled as Time and Date (DOY), the initial 500 cells are already converted ‘manually’ into the correct one which Astropy.time reads. I highlighted with yellow which where I stopped.

Thanks and I hope for your understanding and response(s).

It looks pretty awkward, but I think it gives the right result. Just try

=LEFT(SUBSTITUTE(F502;"_";":");9) & MID(F502;10;2) & ":" & MID(F502;12;2) & ":" & RIGHT(F502;2)

(Sorry, communication failure did not allow to tell in full in one go)
Place the formula (mine or @mikekaganski’s) in a helper cell outside the spreadsheet, for example AK502. Select this cell and press Ctrl+Shift+End - cells from current to last will be selected, range AK502:AK28240. Now press Ctrl+D and wait a little - the column will be filled with formulas and you will see the results. Press Ctrl+X to cut out what you get, go to cell F502 and use Ctrl+Shift+V (Paste Special) to insert rows (not formulas!). I told you it was awkward.

Another way is a little easier - use the Find&Replace tool (Ctrl+H)

Find: (\d+)_(\d+)_(\d\d)(\d\d)

Replace: $1:$2:$3:$4:

Regular expressions: ON

ReplaceDOY.png

Or alternatively, on LibreOffice 6.2+:

=REGEX(F502;"^(\d+)_(\d+)_(\d\d)(\d\d)(\d\d)$";"$1:$2:$3:$4:$5")

(OP’s document is reported to be edited on 6.4.6.2)

The same regex and replacement strings may be used in replacement dialog (Ctrl+H) on any version to make in-place changes.

it actually works…The thing is, when doing this on bulk (that is to say, doing this in a whole column), only gives me 1 value. :confused:

both sir johnsun and Mike Kaganski’s soln works…The thing is, when doing this on bulk (a whole column), it only returns me a single value :c (or maybe I’m missing out things lol)

@Senkei: that means that you are doing that “on bulk” incorrectly, copying the formula verbatim, instead of copying the cell (or you may put the formula into AJ502, then select the cell, and double-click on the small rectangle at the bottom right corner of the cell to auto-fill the formula all the way down).

@mikekaganski, it seems to me that the parts (\d\d)$ and $5 are not needed in the formula - these numbers are not processed in any way, so they can be ignored either in the search string and in the replacement pattern (IMHO)

@JohnSUN: you are right, as always! Yours is simpler and shorter, and the only use case of the full regex as I posted may be when one needs to make sure that the full cell content is matched, not only the first part of it.

By the way, the same could be done for the first numeric part, to make it even shorter :slight_smile:

Alright. It worked :smiley: Thanks to both of you.

@mikekaganski, You mean _(.{3})_(..)(..) replace with :$1:$2:$3:?Yes, you’re right, it’s much shorter.

.{3} is longer than \d+ or ... (the latter has the same benefit as .{3} making sure the length is 3, at the cost of not caring about character class)… just in the hunt for the shortest regex possible :slight_smile: