CSV linked table

Hi,

I’m trying to link a CSV table in Base and some [automatically detected] fields data type are incorrect ([DECIMAL] identified as [VARCHAR] so that I can’t correctly filter them in queries.

Could somebody explain me why and how to correct this ?

THX

Here is a short extract of my cvs file (sorry, csv are not uploadable here …) :

Local time,Download (Rx) bandwidth (bps),Download bandwidth,Rx packet loss (percent),Upload (Tx) bandwidth (bps),Upload bandwidth,Tx packet loss (percent),RTT (ping) (ms),Rx packet loss burst length (packets),Tx packet loss burst length (packets),Rx instant jitter (ms),Tx instant jitter (ms),Rx RFC3550 jitter (ms),Tx RFC3550 jitter (ms)
2022-08-04 17:49:56,10466680,9.98Mbps/10.00Mbps,0,10461738,9.98Mbps/10.00Mbps,0,34.1792,0,0,52.773197,14.961499,5.3724713,5.6450286
2022-08-04 17:49:57,10474231,9.99Mbps/10.00Mbps,0,10471938,9.99Mbps/10.00Mbps,0,28.546501,0,0,13.503099,14.998799,2.4642513,5.511099
2022-08-04 17:49:58,10477353,9.99Mbps/10.00Mbps,0,10477929,9.99Mbps/10.00Mbps,0,22,0,0,8.2404995,17.5199,1.1321371,6.3444633
2022-08-04 17:49:59,10480189,9.99Mbps/10.00Mbps,0,10478924,9.99Mbps/10.00Mbps,0,22,0,0,4.0049,17.5199,0.80543256,6.3444633
2022-08-04 17:50:00,10480891,10.00Mbps/10.00Mbps,0,10481786,10.00Mbps/10.00Mbps,0,18,0,0,6.5267,16.2887,2.4026084,5.7177634
2022-08-04 17:50:01,10481112,10.00Mbps/10.00Mbps,0,10483386,10.00Mbps/10.00Mbps,0,18,0,0,20.0628,19.220299,3.4091177,5.8376527
2022-08-04 17:50:02,10478805,9.99Mbps/10.00Mbps,0,10481293,10.00Mbps/10.00Mbps,0.21011452,29,0,1,11.4219,41.4852,4.633186,10.155594
2022-08-04 17:50:03,10485384,10.00Mbps/10.00Mbps,0,10482960,10.00Mbps/10.00Mbps,0.18379664,34,0,1,4.6328,18.5009,1.5157533,8.333704
2022-08-04 17:50:04,10484976,10.00Mbps/10.00Mbps,0,10484260,10.00Mbps/10.00Mbps,0.14154649,32.306503,0,1,59.192898,11.8984995,5.7748494,4.71473
2022-08-04 17:50:05,10484225,10.00Mbps/10.00Mbps,0,10484684,10.00Mbps/10.00Mbps,0.12191343,32.306503,0,1,3.5869,10.5047,4.7440615,4.71473
2022-08-04 17:50:06,10485760,10.00Mbps/10.00Mbps,0,10483083,10.00Mbps/10.00Mbps,0.09488011,34.903503,0,1,7.0442,14.454499,1.313329,5.620639
2022-08-04 17:50:07,10484617,10.00Mbps/10.00Mbps,0,10484475,10.00Mbps/10.00Mbps,0.07223342,34.903503,0,1,4.9853,12.2119,1.8018888,6.1370406
2022-08-04 17:50:08,10483414,10.00Mbps/10.00Mbps,0,10484296,10.00Mbps/10.00Mbps,0.06263185,32.134705,0,1,25.144,12.2119,3.0382674,6.1370406
2022-08-04 17:50:09,10482317,10.00Mbps/10.00Mbps,0,10485760,10.00Mbps/10.00Mbps,0.047844563,32.134705,0,1,3.885,10.7904,1.8716671,5.4966993
2022-08-04 17:50:10,10481781,10.00Mbps/10.00Mbps,0,10485760,10.00Mbps/10.00Mbps,0.041381467,34.2953,0,1,3.3267,11.434199,1.068448,5.235266
2022-08-04 17:50:11,10483287,10.00Mbps/10.00Mbps,0,10482359,10.00Mbps/10.00Mbps,0.031947434,34.2953,0,1,4.818,12.0589,1.7139931,5.9259667
2022-08-04 17:50:12,10483588,10.00Mbps/10.00Mbps,0,10485033,10.00Mbps/10.00Mbps,0.027603492,33.3573,0,1,59.834198,12.212,6.0532823,5.512632
2022-08-04 17:50:13,10483836,10.00Mbps/10.00Mbps,0,10482932,10.00Mbps/10.00Mbps,0.021297868,32.9396,0,1,4.0453,11.236799,1.7067077,4.902788
2022-08-04 17:50:14,10483473,10.00Mbps/10.00Mbps,0,10485245,10.00Mbps/10.00Mbps,0.018413423,30.789902,0,1,3.9764998,16.3286,0.97920585,6.128749

How did you link this file:. Opened it with Calc and set the language to English(USA), because the decimal separator is a point. So it will see this in Calc as with the separator of the language of my system (Germany, comma). Has been imported all as decimal numbers.

Then I copied this content to Base, set the import to recognize the first 100 rows and press “Automatic”. All fields where imported with a working field definition. Many fields have been set as DOUBLE, which would work there also.

Second test: I connected directly to a folder with different *.csv-files. The Base file will show all these files as separate “tables”. The content is detected right. All numeric values are shown as DECIMAL. You couldn’t use any special functions here. Is like using a CALC-file as datasource for a BASE file.

Another option will be to connect to the file through internal HSQLDB. You need to define the table for this through direct SQL and could connect after this in a HSQLDB file.

No, the cvs file is the database I’m working on.

So in LO → New → Database, I choose “Connect to an existing database” → Text, then Next, and *.cvs file (comma separated), with , as field separator, " as test separator, . as decimal and nothing for thousand.

This done, the field “Rx packet loss (percent)” is identified as [DECIMAL], and “Tx packet loss (percent)” as [VARCHAR]. Both have strictly identical format in the cvs file.

I can’t use Calc to manipulate data as the cvs file is dynamically generated by another process (and read only file).

Have had a look: I get only “Download bandwith” and Upload bandwith" as VARCHAR, all other as Decimal instead of “Local Time” (TIMESTAMP).

This happens with the data you uploaded. Could be there is a wrong entry somewhere in the column

If you couldn’t upload the file here you could send it to me per private mail to test. I tested it here with LO 7.4.0.2 and also with LO 7.3.5.2 .

Could also help: Open the database file, go to Edit → Database → Advanced Settings…. Have set Rows to scan columns types: 100 here.

Of course strings like 9.98Mbps/10.00Mbps are VARCHAR, they are obviously not decimal values.

Of course yes ! But my problem is “Tx packet loss (percent)” detected as [VARCHAR] (should be [DECIMAL], as “Rx packet loss (percent)”)

The complete file and base are here : Divers - pCloud

And Edit → Database → Advanced Settings…. Have set Rows to scan columns types: is set to 100

Is that because you have those 5.30E-05 in the column?

This can not work. Base has no GUI to generate linked text tables. And how should any software guess the column type of zeroes?

Reason for getting VARCHAR instead of DECIMAL starts at row 46. If you set Rows to scan columns types: 40 it will set this column also to DECIMAL. It will recognize the values well.

If “Base” happens to mean “embedded HSQL”: Chapter 6. Text Tables
Link the columns with Mbps units as varchar(20).
Then create a query that eliminates the units.
If you want to copy the result to a database table:
Right-click the query> Create as View
Copy the view icon onto the table icon.

1 Like

More spoon feeding although we still don’t even know which software we are talking about. In case of embedded HSQL, this gets the job done. However, you are the only one to know the possible value ranges. The following is based on what I see in your csv.
Store text file named “import.csv” in the same directory as the database document.
menu:Tools>SQL…

CREATE TEXT TABLE "import.csv"(
"Local time" TIMESTAMP,
"Download (Rx) bandwidth (bps)" INT,
"Download bandwidth" VARCHAR(20),
"Rx packet loss (percent)" DECIMAL(4,1),
"Upload (Tx) bandwidth (bps)" INT,
"Upload bandwidth" VARCHAR(20),
"Tx packet loss (percent)" DECIMAL(4,1),
"RTT (ping) (ms)" DECIMAL(6,4),
"Rx packet loss burst length (packets)" INT,
"Tx packet loss burst length (packets)" INT,
"Rx instant jitter (ms)" DECIMAL(6,4),
"Tx instant jitter (ms)" DECIMAL(6,4),
"Rx RFC3550 jitter (ms)" DECIMAL(6,4),
"Tx RFC3550 jitter (ms)" DECIMAL(6,4)
)
SET TABLE "import.csv" SOURCE "import.csv;ignore_first=true"

Select the “Tables” section.
menu:View>Refresh Tables
menu:Insert> Query in SQL view

SELECT "Local time", "Download (Rx) bandwidth (bps)",
Cast(Left("Download bandwidth", Locate('M',"Download bandwidth")-1) AS DECIMAL(4,2)) AS "Download bandwidth",
"Rx packet loss (percent)", 
"Upload (Tx) bandwidth (bps)", 
Cast(Left("Upload bandwidth", Locate('M',"Upload bandwidth")-1) AS DECIMAL(4,2) )  AS "Upload bandwidth",
"Tx packet loss (percent)", "RTT (ping) (ms)", "Rx packet loss burst length (packets)", "Tx packet loss burst length (packets)", "Rx instant jitter (ms)", "Tx instant jitter (ms)", "Rx RFC3550 jitter (ms)", "Tx RFC3550 jitter (ms)" FROM "import.csv"

Have downloaded the files. @Narsil has created a simple connection to csv-files in a folder. So it has nothing to do with HSQLDB or other internal or external databases.

The LibreOffice csv driver is close to useless but not useless enough to not run my SELECT query.
SQL Functions for file based database drivers
Nevertheless, I would create a new, blank HSQLDB and run the CREATE statement.

No, it won’t run the query because of the CAST function. The csv driver is close to useless because text-to-number conversion depends on the global office locale. 9.98Mbps/10.00Mbps may or may not convert to decimal even when you strip the unit. This is not a problem when you do the job in SQL where decimals are digits with one decimal point.

Sorry, my knowledge is very limited. As @RobertG said below, it’s a basic cvs table connection.

  1. call Tools>Options… Security > button [Macro Security… ]
    Set highest security level and specify some directory (NOT your download directory) as “trusted source”.

  2. Download zip archive https://www.mediafire.com/file/vjl8p90jin84wms/text_import_80330.zip/file and extract it to the trusted directory or any subdirectory therein.

  3. Put your file named “data.csv” into the database subdirectory.

  4. Open the database document text_import_80330.odb and switch to the tables section. Give it some seconds to load. The contained macro should confirm the new database connection.

  5. Copy the “View” icon, select the “DATA” icon, paste, click button [Create…] and wait a few seconds.

  6. switch to the forms section and open the input form.


Next time when you get a new log file:

  1. Shut down the entire office suite
  2. replace the data.csv in the database subfolder with a new data.csv
  3. open the database, copy the view icon onto the data icon
  4. open the form
1 Like

Thank you very much (to everyboby here, but specially @Villeroy ) !

I don’t understand a word about your answer, but it does exactly what I needed !