Libreoffice Base: How to filter and search in multiple tables? [closed]

Hello.

Using the definition of Libreoffice Base manual (https://documentation.libreoffice.org/assets/Uploads/Documentation/en/BH4.0/PDF/BH40-BaseHandbook.pdf p. 198-201), I need to build a query for filtering/searching data in multiple tables, avoiding information repetition.

I am using version 6.0.7.3 of Libreoffice Base, with the default SQL language (HSQLDB).

DATABASE DESCRIPTION

I am building a database for some donations to an archive.

Each donation (“Incorporações”) has the following fields:

  • the donation’s primary key (“ID_incorporação”);
  • the name of the entity that donated the documents (“Nome_entidade”);
  • a brief description of the donation (“Descrição_incorporação”);
  • the oldest known year in which the documents of the donation were produced (“Ano_inicial_incorporação”);
  • the newest known year in which the documents of the donation were produced (“Ano_final_incorporação”).

Each donation is divided into series. The series itself are defined by a table (“Séries”) that has the following fields:

  • the serie’s primary key (“ID_série”);
  • the type of the serie (“Nome_série”).

The relation of the series with the donations (that has a many-to-many nature) is represented in a table (“Rel_incorporação_série”) that contains:

  • its unique primary key (“ID_rel_incorporação_série”);
  • the donation’s foreign key (“ID_incorporação”);
  • the serie’s foreign key (“ID_série”);
  • a full description of the donation’s documents belonging to that serie (“Descrição_série”).

Each series within the donation also have some keywords attached. The keywords, in an identical way to the series, are defined by a table (“Assuntos”) that has the following fields:

  • the keyword’s primary key (“ID_assunto”);
  • the name of the keyword (“Nome_assunto”);
  • the type of the keyword (“Temática”);
  • a description of the keyword (“Descrição”).

Finally, I have made a table for recording which keywords each series has (“Rel_série_assunto”) that contains only two fields, defining, together, the table’s primary key:

  • the foreign key regarding the relationship of the serie with the donation (“ID_rel_incorporação_série”);
  • the foreign key of the keyword (“ID_assunto”).

I have placed a picture of all these tables with all the relationships.

PROBLEM DESCRIPTION

  1. I want to filter the various series that are contained in the donations (any field in “Rel_incorporação_série” table) by its keywords (“ID_assunto”).
  2. I want to search for words in the series’ and in the donation’s descriptions (“Descrição_incorporação” in “Incorporações” table and “Descrição_série” in “Rel_incorporação_série” table), ideally using only one field as the search criteria.
  3. I want to search for words in the entities that donated the documents (“Nome_entidade” in “Incorporações” table).
  4. I don’t want any repetition of information.

Using a filter table, for storing the filter/search criterias, I created a form for this purpose. I am trying to make one big query, using the examples given in the refered pages of the manual, showing all the information on a subform through a single table. The problem with that solution is that the table captures all the repetitions, regarding how much series I have in a donation, but must important, how much keywords I have in each serie.

Here is the query I tried. “Tabela_filtro” refers to the filter table, while “Nome_entidade”, “Descrições” and “ID_assunto1” to “ID_assunto4” to the fields contained in it. Thus, I have added, in the filter table, a field for searching for the name of the entity that donated the documents (“Nome_entidade”), only one field for searching both the donation’s as well as the serie’s descriptions (“Descrições”) and 4 separate fields for filtering multiple keywords (“ID_asssunto1” to “ID_assunto4”).

SELECT * FROM “Incorporações”, “Rel_incorporação_série”, “Rel_série_assunto” WHERE (LOWER (“Nome_entidade”) LIKE IFNULL (( SELECT ‘%’ || LOWER (“Nome_entidade”) || ‘%’ FROM “Tabela_filtro”), LOWER (“Nome_entidade”))) AND (LOWER (“Descrição_incorporação”) LIKE IFNULL ((SELECT ‘%’ || LOWER (“Descrições”) || ‘%’ FROM “Tabela_filtro”), LOWER (“Descrição_incorporação”))) AND (“ID_assunto” = IFNULL((SELECT “ID_assunto1” FROM “Tabela_filtro” ), “ID_assunto” )) AND (“ID_assunto” = IFNULL((SELECT “ID_assunto2” FROM “Tabela_filtro” ), “ID_assunto” )) AND (“ID_assunto” = IFNULL((SELECT “ID_assunto3” FROM “Tabela_filtro” ), “ID_assunto” )) AND (“ID_assunto” = IFNULL((SELECT “ID_assunto4” FROM “Tabela_filtro” ), “ID_assunto” ))

Other solution that I tried was to SELECT only the donation or the series table, but then I couldn’t define all the conditions because some of them returned more than one value.

Maybe it’s better to approach the problem with several subforms or several tables, where the actual real number of donations, series and keywords appears. But I don’t know how to do that.

I simply want to solve the given points, using a form (I am willing, if its impossible to execute, to abdicate 3. and even the capability of search in the donation’s descriptions). Can you help me?

Relationship definition in the query search for that in your help link

this answer replaces the original in order to reflect the objectives as i now understand them.

ok so user pmfs1987 wants to search the form to show only those series which contain ALL of the keywords (filter terms). if no filtration terms are specified then show all records.

this is much more complex than showing the series which contain any of the keywords.

i have moved the filtration code relating to the table “Rel_série_assunto” from the WHERE clause to the FROM clause (no point in doing everything twice).

the reason that i have focused on a single query rather than the data form is an attempt to clarify the necessary steps required.

once the user is able to grasp the concept then form design will be relatively simple.

hope it works! i obviously cannot test it.

select
	a."Nome_entidade", a."ID_CDI", a."Ano_inicial_incorporação", a."Ano_final_incorporação", a."Descrição_incorporação",
	b."Descrição_série", b."Ano_inicial_série", b."Ano_final_série",  d."Nome_série", e."Nome_assunto", e."Temática", e."Descrição"
from
	"Incorporações" a
join "Rel_incorporação_série" b on a."ID_incorporação" = b."ID_incorporação"
join
(	select *
	from "Rel_série_assunto"
	where
	(		(select "ID_assunto1" from "Tabela_filtro") is null
			and
			(select "ID_assunto2" from "Tabela_filtro") is null
			and
			(select "ID_assunto3" from "Tabela_filtro") is null
			and
			(select "ID_assunto4" from "Tabela_filtro") is null
	)
	or
		(
			"ID_rel_incorporação_série"
			in
			(
				select r."ID_rel_incorporação_série"
				from
					"Rel_série_assunto" r,
					(select "ID_assunto1", "ID_assunto2", "ID_assunto3", "ID_assunto4" from "Tabela_filtro") f
				where r."ID_assunto" in(f."ID_assunto1", f."ID_assunto2", f."ID_assunto3", f."ID_assunto4")
				group by r."ID_rel_incorporação_série"
				having count(r."ID_assunto") =
				(
					case when f."ID_assunto1" is not null then 1 else 0 end
					+
					case when f."ID_assunto2" is not null then 1 else 0 end
					+
					case when f."ID_assunto3" is not null then 1 else 0 end
					+
					case when f."ID_assunto4" is not null then 1 else 0 end
				)
			)
		)
) c
on b."ID_rel_incorporação_série" = c."ID_rel_incorporação_série"
join "Séries" d on b."ID_série" = d."ID_série"
join "Assuntos" e on c."ID_assunto" = e."ID_assunto"
where 
(
    lower(a."Nome_entidade") like(select '%' || lower("Nome_entidade") || '%' from "Tabela_filtro")
    or
    (select "Nome_entidade" from "Tabela_filtro") is null
)
and
    ( 
        lower(a."Descrição_incorporação") like(select '%' || lower("Descrições") || '%' from "Tabela_filtro")
        or
        (select "Descrições" from "Tabela_filtro") is null
    )

Sorry, I had to disconnect for a few days.
Your suggestion works, but I want the search form to show me the series that had ALL the of keywords selected (AND), not AT LEAST ONE (OR).

I found out about UNION. Using it with IN I’ve reached the following condition:

(c."ID_assunto" IN (SELECT "ID_assunto1" FROM "Tabela_filtro" UNION SELECT "ID_assunto2" FROM "Tabela_filtro" UNION SELECT "ID_assunto3" FROM "Tabela_filtro"  UNION SELECT "ID_assunto4" FROM "Tabela_filtro") OR ((SELECT "ID_assunto1" FROM "Tabela_filtro" IS NULL) AND (SELECT "ID_assunto2" FROM "Tabela_filtro" IS NULL) AND (SELECT "ID_assunto3" FROM "Tabela_filtro" IS NULL) AND (SELECT "ID_assunto4" FROM "Tabela_filtro" IS NULL)))

But this also shows the series that has AT LEAST ONE of the chosen keywords. But I think with UNION clause I could get a solution to what I want, because the UNION clause can transform the values of the 4 “ID_assuntosn” to a single column. Although I could not reach a solution yet. Any ideas?

Also, I don’t care if the solution to what I want uses one or various queries. I just want a solution, so if you know about one with more than one query it will be perfect.

That worked! Thanks a lot!

I just had to made these additions:

  • Added b.“ID_incorporação”, c.“ID_rel_incorporação_série” and c.“ID_assunto” in the first SELECT to be able to choose these fields for link form to subforms.
  • Added the condition removed OR LOWER (b.“Descrição_série”) LIKE (SELECT ‘%’ || LOWER (“Descrições”) || ‘%’ FROM “Tabela_filtro”) (because I want the words saved on “Descrições” to include any results either from a.“Descrição_incorporação” or b.“Descrição_série” fields).

In the mean time I discover another way to solve the problem and I wanted to ask you which one do you think it’s faster. I will post that in my answer.

@pmfs1987.

it’s gratifying to see that you have achieved your own solution.

query execution depends largely upon the size of the derived table.

while i have filtered out unnecessary data you have added to it.

if you can perceive a difference in execution time between the queries with what is likely to be a relatively small result set then the performance gap will be wide.

i do not think that it’s possible to measure query execution speed in an embedded database.

you should split your database anyway because embedded databases are prone to corruption.

a split database enables the use of the most recent version of that database currently hsqldb 2.5.0.

if you do decide to upgrade then i think that the ‘CASE’ structures will require replacement with:

what an effort to post a comment!!!
struggling to format this.

(		
	case when (select "ID_assunto1" from "Tabela_filtro") is null then 0 else 1 end
	+
 	case when (select "ID_assunto2" from "Tabela_filtro") is null then 0 else 1 end
 	+
 	case when (select "ID_assunto3" from "Tabela_filtro") is null then 0 else 1 end
	+
	case when (select "ID_assunto4" from "Tabela_filtro") is null then 0 else 1 end
)

Thanks for all the help and all the feedback.

Following @mariosv 's tip, and @cpb 's answer, I have reached a solution. Thanks very much for your precious help!

FORM DESIGN

I have recorded the query in a View, a feature explained in the Libreoffice Base Manual pointed by @mariosv ’s tip.

Then, in the search form I added two independent forms: one for the filter table (Form_Tabela_Filtro) and another fed by the big table created by the View feature (Form_Resultados). From this last form I have added subforms showing the information I needed to visualize without repetition, selecting distinct from the big View table and using the master/slave fields to connect the information together (SubForm_Incorporações, SubForm_Séries and SubForm_Assuntos).


The form's structure

SEARCH QUERY

I used the query suggested by @cpb 's answer with a few minor additions to customize it to my needs:

  • Added b.“ID_incorporação”, c.“ID_rel_incorporação_série” and c.“ID_assunto” in the first SELECT to be able to choose these fields for link form to subforms.

  • Added the condition removed “OR LOWER (b.“Descrição_série”) LIKE (SELECT ‘%’ || LOWER (“Descrições”) || ‘%’ FROM “Tabela_filtro”)” because I want the words saved on “Descrições” to include any results either from a.“Descrição_incorporação” or b.“Descrição_série”.

ALTERNATIVE SEARCH QUERY

Although the suggested query works, I have reached another one just by joining 4 duplicates of the “Rel_série_assunto” table and selecting 4 different “ID_assunto” fields (with the help of aliasing). Then I could make the AND conditions regarding the “ID_assunto” field, because now this field is repeated into 4 different fields. Here is the query:

SELECT DISTINCT
	"Nome_entidade", 
	"ID_CDI", 
	"Ano_inicial_incorporação", 
	"Ano_final_incorporação", 
	"Descrição_incorporação", 
	ris."ID_incorporação", 
	ris."ID_série", 
	"Descrição_série", 
	"Ano_inicial_série", 
	"Ano_final_série", 
	rsa."ID_assunto" AS "Assunto 1", 
	rsb."ID_assunto" AS "Assunto 2", 
	rsc."ID_assunto" AS "Assunto 3", 
	rsd."ID_assunto" AS "Assunto 4", 
	rsa."ID_rel_incorporação_série", 
	"Nome_série", 
	"Nome_assunto", 
	"Temática", 
	"Descrição" 
FROM
	"Incorporações" i
	JOIN "Rel_incorporação_série" ris ON i."ID_incorporação" = ris."ID_incorporação"
	JOIN "Rel_série_assunto" rsa ON ris."ID_rel_incorporação_série" = rsa."ID_rel_incorporação_série"
	JOIN "Rel_série_assunto" rsb ON ris."ID_rel_incorporação_série" = rsb."ID_rel_incorporação_série"
	JOIN "Rel_série_assunto" rsc ON ris."ID_rel_incorporação_série" = rsc."ID_rel_incorporação_série"
	JOIN "Rel_série_assunto" rsd ON ris."ID_rel_incorporação_série" = rsd."ID_rel_incorporação_série"
	JOIN "Séries" s ON ris."ID_série" = s."ID_série"
	JOIN "Assuntos" a ON rsa."ID_assunto" = a."ID_assunto"
WHERE 
	(LOWER (i."Nome_entidade") LIKE IFNULL ((SELECT '%' || LOWER ("Nome_entidade") || '%' FROM "Tabela_filtro"), LOWER (i."Nome_entidade"))) 
AND 
	(LOWER (i."Descrição_incorporação") LIKE IFNULL ((SELECT '%' || LOWER ("Descrições") || '%' FROM "Tabela_filtro"), LOWER (i."Descrição_incorporação")) OR (LOWER (ris."Descrição_série") LIKE IFNULL ((SELECT '%' || LOWER ("Descrições") || '%' FROM "Tabela_filtro"), LOWER (ris."Descrição_série"))))
AND 
	"Assunto 1" = IFNULL ((SELECT "ID_assunto1" FROM "Tabela_filtro"), "Assunto 1") AND "Assunto 2" = IFNULL ((SELECT "ID_assunto2" FROM "Tabela_filtro"), "Assunto 2") AND "Assunto 3" = IFNULL ((SELECT "ID_assunto3" FROM "Tabela_filtro"), "Assunto 3") AND "Assunto 4" = IFNULL ((SELECT "ID_assunto4" FROM "Tabela_filtro"), "Assunto 4")

In this solution I also had to get another query for the “SubForm_Assuntos”, that grouped together all 4 columns that I have created again into a single one. Also, I had to use a View for this, because the links of master and slave fields didn’t worked without activating “Analyze SQL command” option (and that option didn’t recognize the UNION clauses in the query). Here is this query used for this purpose:

SELECT DISTINCT "ID_rel_incorporação_série", "Assunto 1" AS "Assunto" 
FROM "Teste" 
WHERE ("Assunto 1" = (SELECT "ID_assunto1" FROM "Tabela_filtro") OR  (SELECT "ID_assunto1" FROM "Tabela_filtro" IS NULL))
UNION SELECT "ID_rel_incorporação_série", "Assunto 2" AS "Assunto" 
FROM "Teste" 
WHERE ("Assunto 2" = (SELECT "ID_assunto2" FROM "Tabela_filtro") OR  (SELECT "ID_assunto2" FROM "Tabela_filtro" IS NULL))
UNION SELECT "ID_rel_incorporação_série", "Assunto 3" AS "Assunto" 
FROM "Teste" 
WHERE ("Assunto 3" = (SELECT "ID_assunto3" FROM "Tabela_filtro") OR  (SELECT "ID_assunto3" FROM "Tabela_filtro" IS NULL))
UNION SELECT "ID_rel_incorporação_série", "Assunto 4" AS "Assunto" 
FROM "Teste" 
WHERE ("Assunto 4" = (SELECT "ID_assunto4" FROM "Tabela_filtro") OR  (SELECT "ID_assunto4" FROM "Tabela_filtro" IS NULL))

It seems that this last solution is slower than @cpb 's answer, in terms of computer’s usage. What do you think about that?

Again thanks very much for the help.

hello pmfs1987,

your code is fine.
you have explicitly declared your table relationships in the ‘WHERE’ clause.
there has been some debate over many years amongst developers as to where these declarations should be made but Fred Toussi the man behind HSQLDB suggests that the ‘FROM’ clause is the correct place and the reason given is that the ‘FROM’ clause is parsed and data is retrieved before the ‘WHERE’ clause is executed. this results in smaller rowsets which speed query execution.

the error you received (you did not describe it) is likely related to duplication of the field “ID_assunto” because you did not explicitly declare it as ‘“Rel_série_assunto”.“ID_assunto”’ or using the alias C ‘C.“ID_assunto”’ e.g.

AND ( C."ID_assunto" = IFNULL( ( SELECT "ID_assunto1" FROM "Tabela_filtro" ), C."ID_assunto" ) )

I just realized that the multiple conditions for “ID_assunto” field are not working when I choose multiple different fields in “Tabela_filtro”.“ID_assunto1”, “Tabela_filtro”.“ID_assunto2”, “Tabela_filtro”.“ID_assunto3” or “Tabela_filtro”.“ID_assunto4”. I have searched for solutions, but they suggest using IN clause, which, in this case, does not seems to be valid (because I can only use one SELECT statement in the IN clause). I could keep only one field in “Tabela_filtro” table to filter the “ID_assunto”, but maybe there is a solution for that. Do you think there is a way to maintain these multiple fields with a different solution in the query (or even to reformulate the search form structure)?

i have updated my original answer yet again with regard to your filtration issue.