Importing dates from .csv files into Base via Calc

I found this quite tricky, not because it’s tricky in principle but because there are a number of things that we can overlook if we are in a hurry. I will explain what can go wrong both in terms of what to do, and then in terms of why it can look right in Calc but not import into Base properly. I thought I would document how I get it to work reliably, both for my convenience next year, and for anyone else who only does this occasionally and gets as stuck as I did today.

  1. Get your data into a .csv file from whatever source.
  2. Open this file in Calc. IMPORTant (pun intended) any columns containing dates must be loaded as Date columns, not as Standard – these are the headers. This will give you the choice between DMY, MDY, or YMD formats
  3. Check that the data is lin3d up properly, if not tweak it, save as .csv, reload (with Barclays bank file I seem always to have to adjust the first two rows) and start again from top
  4. Select all the data you want
  5. Go over to Base
  6. Go to the Tables screen (click on the tables icon, or View → Tables and put the mouse pinter in the bottom half
  7. Edit → Paste or right-click ->paste
  8. Change the table name if it has suggested an existing one to a suitable temporary table name (I will use T)
  9. Choose definition and Data
  10. Remember to change the format of the date column, and any numeric columns - Base will not pick up what you did in Calc :frowning_face:
  11. Look at T in edit mode: are you happy with the data types
  12. Look at T: did the dates and any numbers come across?
    13.Use Tools → SQL to execute INSERT INTO ( … ) SELECT … FROM T to move the data into whjrever you really wanted it

OK that’s my recipe. If your brain works like mine, know how and why it can go wrong helps to get it right, so I will explain that below. Stop reading here if your brain is not wired like mine :slight_smile:

Why it can go wrong is that while dates look like dates in Calc, really they are numbers. Christmas day 2022 is really 44920 for example. Midday on that day is 44920.5

It looks like text, whether you loaded it as a date or as Standard. Sadly, if you loaded it into Calc as Standard, it really is text, and when you try to load it into a field you said was a Date, you get that annoying Java error saying the data type is wrong, and all your dates are NULLed out (or no rows are transferred if you made the date an essential value).

Conversely, if you loaded the .csv file correctly into Calc, but then forgot to tell Base you wanted it as a date, you will get a five digit number (integer for a date, or a timestamp at midnight; fraction for timestamps at other times).

If you get it wrong, delete the temporary table and start again: trying to fix it after its gone pear shaped takes far longer, at least in my experience. For one thing, different versions of the back end HSQLDB database either do or don’t accept the functions to convert from text to date.

I hope this is helpful. I am new here, (or rather newly returned after a 4 year absence) so if this should have been posted somewhere else please could a suitable wizard move it for me and tell me where it really belongs so I will know next time I want to post a tip rather than a cry for help.

Merry Christmas to anyone else who’s online as I type: HO! HO! and indeed HO!

2 Likes

Open your text in a text editor, copy and paste a few lines here. csv is not a file format. in particular, csv has nothing to do with spreadsheets.

I’m sorry, .csv is a standard file format used by many banks and other financial organisations when you want to download transaction data. That is why it is IMPORTant to know how to load them into Base.

As far as I am aware, the only way to do this is via Calc. If you know better then please tell me how, but I do need to start with a .csv

That is because, however much you and I agree to dislike the format, the the .csv file seems to be a finance industry de facto standard, and the OP is the only way I have succeeded in making it work. Barclays Bank and PayPal both use it for example, at least in the UK. Barclays Bank also offers two other download formats, but neither seems to be suitable for loading into any of the LO suite of programs.

And NO, I am not posting even a snippet of the files that come from either PayPal or Barclays online, because every line (apart from the heading line) contains details that are confidential either to myself or to other businesses.

I agree it is a cludge for spreadsheets, and a worse cludge for databases, but I do not have a choice of how the data arrives, and Calc does come with a built in import mechanism for loading it; even though it is still a cludge it is one that the LO team supports

PS: ExCell (yuk), OO, and LO all export data to .csv and re-import it from/to their respective spreadshhet programs,; so nobody told any of those spreadsheet devs that .csv has nothing to do with them…

I’m also sorry. @Villeroy is right, as csv files are widespread, but no standard at all. So you always have to find out, what is really in the box.

  • First problem: Charset is not defined. No big problem in the US where 7-bit ASCII is often sufficient. But Europe has more accents ê, umulauts ä, ligatures like ß and much more ø. And your file may use the internal charset of the bank, the charset of the OS where it was saved or any other, and even unicode leaves utf8, utf16 (with or without BOM)
  • separator may be a comma as csv was named as “comma separated values”, but as comma is sometimes used in numbers like 1,234.56 (US) or 1.234,56 (germany) other separators like ; or tab are also common and I’ve often seen also the pipe | Some people use tsv for tab separated, but not all.
  • You may have seen above: No standard to format numbers
  • Same problem: 10/11/12:is a “valid” date but wich November 10 in 2012 (or 1912), October 11 in 2012 or 2010, Nov 12.
  • Rules on quoting necessary, especially for comma as separator

After you found all answers to this questions you can usually import your csv. I really like your idea of explaining this to people, but because of the bunch of variants it will stay a constant recurring problem.
.
IMHO on of the first problems is for some users to see the import-dialog as something, where you can change the behaviour. A lot will just press Ok or Cancel, nothing else.

1 Like

When importing text data, you need to declare the correct import settings because csv is not standardized at all.
Most common issues regarding date strings:

  1. Always check “Detect special numbers”. Dates on sheets are special numbers.
  2. Choose the right import language which describes the numerals best. It might be some flavour of English which implies a decimal point. In case of M/D/Y dates you have to choose “English (USA)”. In case of D/M/Y dates, you have to choose a non-US dialect of English.
  3. If a text delimiters has been declared, option “Format quoted field as text” should be unchecked if the dates and financial amounts are quoted.

Ctrl+F8 highlights constant numbers in blue font. If ALL the dates and amounts are blue, they have been imported correctly.

I do import all my banking csv into Base directly without using any spreadsheet. It is rather complex to setup. However once it has been set up, it is very easy to handle. Never gets any duplicates nor wrong data types.

This is a line from a csv of one of my German bank accounts. 3 Banks, 3 different flavours of csv:

10.01.2022;10.01.2022;"Dauerauftrag Überweisung";"Stadtwerke xxxxxxx";"VKto 2135004 Stadtwerke xxxxxxxx Info: VKto 2135004";EUR;;579;"759,78"

The fields are separated by semicolon, because the comma is the decimal separator in the amounts.
First 2 columns are German dates DD.MM.YYYY
Last 2 columns are comma decimals.
Some strings are double-quoted. The EUR is unquoted. Surprisingly, the last comma decimal is quoted too. This is a rather silly flavour of csv.
That bank encodes text with Windows 1252. With wrong encoding, the umlauts (“Überweisung”) would be wrong.

1 Like

For your info, I started to do these imports via Calc some years ago (back in the day of embedded Firebird) because at that time I could not get data directly from a .csv file into a table in a pre-existing embedded Firebird database, and the habit stuck.

(I do not know if that was my mistake at the time, or if going via Calc really was the best way at the time – it’s history now and at this stage I don’t care)

At your suggestion I will try the direct import over the next two days, and report back. It looks like there is no way to import directly from a .csv file into a single table while leaving all the other tables alone am I right? If not, then please tell me how to do that.

Unless you say otherwise, I will assume that the best way to go is to create a new and temporary database in the opening menu, based on a folder containing only the relevant .csv file(s), and then copy/paste the resulting tables across from the temporary databse to the permanent one. Is that what you advise, or is ther4e something even better I could be trying, please.

Thanks for that, but I was well past that point and had already cracked most of the problems of getting Calc to read the input appropriately for the files I am given.

The thing that confused me, and prompted me to write that post, was the fact that in doing a copy/paste from Calc to Base dates need to be expressed as integer days since some epoch. It can look like a date in Calc and really be a number underneath the format, or it can be an apparently identical text string which will merely go across as Text. There is no special number option in that copy/paste, you have to have identified that column as a date when opening Calc, That was one of the two gotchas that my recipe is designed to protect people from.

Indeed, Firebird can not import text directly, therefore the spreadsheet is one way to do this without programming effort.

It is not possible to help you with that unless you give a small sample of your actual text data as I did on Dec 28th. Simply replace text and account numbers with xxx or 000 so all that remains is meaningless numbers and dates.

Hi again:

I think I am still not succeeding in making my point clear.

I do not need help in selecting the options to load a .csv file into any part of LO. I have been doing that successfully for over two decades.

My post was to help other users overcome a DIFFERENT issue than one cause by .csv – that of the counter intuitive behavious when using copy/paste to transfwer data from Calc to Base. It is not a .csv issue by that point, it is that Base does not recognise anything like 25/12/2022 as a date in that process, it wants a five digit integer.

My post was a work-around for that issue by beig careful how to load the .csv into Calc in the first place.

I do thank you for your continued efforts to help. Actually the most useful point you have made is not about .csv at all, but the fact that we can now in LO 6.4 load .csv directly into a database, which did not seem to work under firebird.

That makes my original post obsolete, except for anyone still using the Firebird vintage of Base.

I will therefore start a new thread giving my new, improved, and easier method. Thanks once again :slight_smile:

Thanks for that: yes in a general help post I should have made clear the absolute need to make clear that at the time of import of a .csv file (into ANY program, not just the LO components) it is essential to consider all the options very carefully, especially if the data does not “look right”

When trouble shooting it is equally important to check what format the data is after being loaded – here when I got the import into Calc wrong the data data was loaded as text, not as number.

(NB Internally to Calc there is not such thing as a date format, date is an output firmat of a number of days past some defined epoch, and in Base import from paste refuses to convert text into date)

It seems to me that everyone trying to assist me on this has missed the point about copy and paste from Calc to Base, and instead assumed that the issue was one of the many many .csv issues.

Anyway I am giving up on this thread now, if anyone else wished to take it further then be my guest.

First thought: “You import text dates from csv. Virtually everybody imports text dates.”
Second thought: “Oh, gosh. Do you really use embedded Firebird?”
A quick test with such bastard reveals that it is possible to copy&paste dates from Calc.
Localized date strings 12/31/1999 NO
ISO date strings: 1999-12-31 NO
Day numbers: 36525 NO
Formatted day numbers 31/12/1999 NO
Day numbers formatted as ISO dates YES

Tested with LibreOffice 7.4 on Linux.

DO I use embedded Firebird: not any more, but I did back in the day. a lot of my working habits were formed then, and clearly some are obsolete now.

The other questions I am not going to answer because, as I said, you are not trying to help the same problem that I am addressing. Thanks again for your good intentions.

date_columns.ods (23.9 KB)
hsql_dates.odb (3.3 KB)

Hands-on demo with spreadsheet having various date formatted columns and a database with one date column.
Open side by side.
Copy one column (row 1 until row 21) from the sheet.
Select the table icon and paste. Confirm the dialog.
After copying all 5 columns you have 100 correct dates in the database. HSQL accepts date formatted numbers as well as ISO strings.

To see how little is standardize for “csv” (if anything at all):

The best informed explanations regarding the current state of options usable in Calc with csv import/export (organized as “tokens”):
helppage
See also:

(@erAck is a developer who actually knows things.)
The fact that data exchange works so miserably regarding dates is solely the responsibility of those who refuse to apply the clear ISO guidelines. This has already cost millions of users many millions of hours. A special contribution to that mess are the stubborn habits invented in the USA.

1 Like

agree totally!

Yes we have a lot we can legitimately complain about, but in the meantime we have to work with what we are given, not what we would have preferred if only they had asked us first :wink:

1 Like

Since you refuse to reveal anything at all, I posted a sample of my own csv data. You remember? It contained German dates like 31.12.1999.
One possible “formula” to convert this into a date is:

SELECT CAST(SUBSTR("DT",7,4)||'-'||SUBSTR("DT",4,2)||'-'||SUBSTR("DT",1,2) AS DATE) AS "D" FROM "APO_Import"

where “APO_Import” is the name of a text table linked to a csv file having a column “DT” with the German date strings.
This works with all versions of HSQL and I’m confident that something very similar works with all SQL databases on the market.

And from a Calc sheet I can copy numbers that are formatted as dates into a date filed as well as ISO strings (text values like 1999-12-31). I still suspect that you import date strings into your Calc sheet. For me this is the end of this topic since you refuse to give concrete information.