Ask Your Question
0

REGEX problem splitting dates [closed]

asked 2019-12-12 15:07:42 +0100

Bastiaan gravatar image

updated 2019-12-13 11:01:49 +0100

I have the following rows in column A. They are dates, but sometimes years or nothing. Also there could be till dates etc.

1999
5-1700/5-1960

4-1700/4-1960
1983-7-23
2019-07-10


1982-5-19
1982-5-19
1982-5-21
1982-5-18
1983-7-19/1983-7-22

I would like to split them in from_date from_year from_month from_day till_date till_year till_month till_day.

I tried the following in the from_year column: =IF(REGEX(A2,"^(\d{4})$","$1"),REGEX(A2,"^(\d{4})$","$1")) But it does give me #VALUE! for columns that do not match but it does give me the whole years also in the correct places How should I do this? if I use =REGEX(A2,"^(\d{4})$","$1") I get returned all values! (That should be plain wrong as I put beginning and end in the regex)

Perhaps I should first split the from and till dates first and then to the regex?

I have the feeling regex isn't working properly in Libreoffice calc version: 6.2.8.2 (I am used to vim)

Any help would be appreciated!

Thanks Bastiaan EDIT 2019-XII-13: changed the example text as preformatted text and adjusted the dates day month order

edit retag flag offensive reopen merge delete

Closed for the following reason the question is answered, right answer was accepted by erAck
close date 2020-01-19 19:13:05.414023

Comments

Hello,

1999 5-1700/5-1960
4-1700/4-1960 1983-23-7 2019-10-07
1982-19-5 1982-19-5 1982-21-5 1982-18-5 1983-7-19/1983-7-22

You really have these three rows in a single column? If not, please reformat your question.

Opaque gravatar imageOpaque ( 2019-12-12 15:51:05 +0100 )edit
2

Quoting from the question:
1982-19-5 1982-19-5 1982-21-5 1982-18-5 1983-7-19/1983-7-22

1982-19-5 isn't a date at all. Knowing that there are truely strange "date formats" around,we would probably consider someone thought YYYY-DD-M was an acceptable date format, but what about 1983-7-22 then?

You cannot interpret or handle in any way textual data supposed to mean/contain dates as long as not there is a minimum of reliable assurance about the format.

Lupp gravatar imageLupp ( 2019-12-12 18:51:10 +0100 )edit

Of course they are dates but you are right that the month and day are swapped here, i adjusted it in the example. The example regex isnt looking at dates but structures anyway so that should not matter.

Bastiaan gravatar imageBastiaan ( 2019-12-13 10:32:44 +0100 )edit

I can see that you reformatted. However, Lupp's comment still stands. The date layout is not consistent, so a procedure for extracting a date from the entries is not going to be reliable.

First you seem to have a (day-)month-year layout. Then you switch to year-day-month, and finally year-month-day (ISO-standard, sort of). A solution to handle all possible layouts is going to be messy.

The 2019 entry in the middle is ambiguous, and this is where "messy" changes to "impossible". Clean up your data, or include a date format specifier in your data, or else the dates will not be reliable.

That notwithstanding, splitting date segments is probably easy. I'd use simple FIND() instead of REGEX(). Not as elegant, but easier to debug. It looks like the from/till divider is always a slash and within dates you always use hyphen, so you only need to locate ...(more)

keme gravatar imagekeme ( 2019-12-13 10:52:17 +0100 )edit

1 Answer

Sort by » oldest newest most voted
2

answered 2019-12-13 19:04:29 +0100

erAck gravatar image

if I use =REGEX(A2,"^(\d{4})$","$1") I get returned all values! (That should be plain wrong as I put beginning and end in the regex)

That is a misconception of how regular expression search and replace works. Only iff there is a match then the replacement happens, otherwise the original string is returned unaltered. It is the same as how sed s/// or awk sub(,,) work.

Your example data can be treated with regular expressions though quite cumbersome. You'll have to first check if there is a match for a given expression and only then do the replacement, else do something else. The following returns a yyyy-mm-dd/yyyy-mm-dd string for each of your sample data, which then can be processed further in a uniform manner. I do it on several lines and indented to line out the IF() blocks with their condition, then and else paths.

=IF(
    ISNA(REGEX(A1;"^\d{4}-\d{1,2}-\d{1,2}$"));
    IF(
        ISNA(REGEX(A1;"^\d{1,2}-\d{4}/\d{1,2}-\d{4}$"));
        IF(
            ISNA(REGEX(A1;"^\d{4}$"));
            A1;
            REGEX(A1;"^(\d{4})$";"$1-1-1/$1-12-31")
          );
        REGEX(A1;"^(\d{1,2})-(\d{4})/(\d{1,2})-(\d{4})$";"$2-$1-1/$4-$3-28")
      );
    REGEX(A1;"^(\d{4}-\d{1,2}-\d{1,2})$";"$1/$1")
   )

Note that for mm-yyyy the first of month is assumed for the start day thus results in yyyy-mm-1, and for the end day the 28th is assumed to have always a valid date. If only a year is given the entire year results, yyyy-1-1/yyyy-12-31. Alter to your liking.

edit flag offensive delete link more

Question Tools

1 follower

Stats

Asked: 2019-12-12 15:07:42 +0100

Seen: 163 times

Last updated: Dec 13 '19