Suggestion: consolidate tags

We currently have almost 16000 tags. Some of them are typos or duplicates, which I suspect makes searches harder and might confuse future askers. Discourse’s tagging UI is arguably better than AskBot’s, and given the large number of tags we already have, we don’t need to let newly created (TL0) users create tags, do we? If so, I assume typos won’t accumulate as much anymore, and I propose consolidating the existing tags. In the table from post #4 you’ll find all tags that have a counterpart with a trailing ‘s’, with topic counts, grouped by category. The list needs to be reviewed of course, but typically one version is much more popular than the other, so it seems that many tags can be merged.
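A minimal sketch of how a table like the one in post #4 could be generated, assuming the topic counts have been exported into a plain dict of the form {category: {tag: topic_count}} (the export step and all function names here are hypothetical). It lists every tag whose trailing-‘s’ counterpart also exists and suggests merging the less popular spelling into the more popular one, leaving ties for a human to decide.

```python
def plural_pairs(tag_counts):
    """Return (tag, tag + 's', count, plural_count) for every tag whose
    trailing-'s' counterpart also exists, most-used pairs first."""
    pairs = []
    for tag, count in tag_counts.items():
        plural = tag + "s"
        if plural in tag_counts:
            pairs.append((tag, plural, count, tag_counts[plural]))
    # Sort by combined usage so the pairs that matter most come first.
    pairs.sort(key=lambda p: p[2] + p[3], reverse=True)
    return pairs


def plural_pairs_by_category(counts_by_category):
    """Apply plural_pairs to each category (language) separately."""
    return {cat: plural_pairs(counts) for cat, counts in counts_by_category.items()}


def merge_direction(tag, plural, count, plural_count):
    """Suggest (source, target): merge the less popular tag into the more
    popular one. Returns None on a tie so a human can decide."""
    if count > plural_count:
        return (plural, tag)
    if plural_count > count:
        return (tag, plural)
    return None
```

The suggestion is only a starting point: pairs such as ‘http’/‘https’ or ‘xp’/‘xps’ in the English table name different things rather than singular and plural forms, which is why the list needs review.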

If there is interest in that consolidation, I can also provide a list of tag pairs at Levenshtein distance 1 and 2, but the number of false positives will likely be higher. Either way, please save yourself the trouble of merging the tags by hand; this can be scripted :-).
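The Levenshtein list mentioned above could be produced along these lines. This is a plain dynamic-programming edit distance written in Python, not any particular library; `close_pairs` and its `max_distance` parameter are illustrative names.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance counting single-character insertions, deletions
    and substitutions, using two rolling rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def close_pairs(tags, max_distance=2):
    """Return (tag_a, tag_b, distance) for distinct tags within max_distance."""
    result = []
    for a, b in combinations(sorted(tags), 2):
        # Strings whose lengths differ by more than max_distance cannot qualify.
        if abs(len(a) - len(b)) > max_distance:
            continue
        d = levenshtein(a, b)
        if 0 < d <= max_distance:
            result.append((a, b, d))
    return result
```

Comparing all pairs is quadratic: with roughly 16000 tags that is about 128 million candidate pairs before the length filter prunes them, so a real run may take a while in pure Python. Accented variants such as ‘grafico’/‘gráfico’ come out at distance 1, which is useful for the Portuguese and Spanish lists.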

Speaking of tags, Discourse has some features we might find useful:

  • Auto-tag questions based on words found in the first post: for instance, if the first post is “The border around the cells is the wrong color.”, we could automatically tag the question with ‘border’, ‘cell’ and ‘color’. No complex AI, though: only simple word and regexp matching.
  • Restrict tags by category: some tags only make sense for a single language, and listing them in all categories arguably “pollutes” the tag selection UI. Tags for which every single topic belongs to the same category (language) could be restricted to that category. Given our large pool of tags, I don’t think we’ll have too many false positives as long as we don’t add another language.
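Both features could be prototyped offline before touching the forum settings. The sketch below assumes a hypothetical rule table mapping each tag to a regular expression (this is not Discourse’s own configuration format) and a list of (category, tags) pairs, one per topic. It returns the auto-tags for a post and the tags whose topics all fall in a single category.

```python
import re
from collections import defaultdict

# Hypothetical rule table: tag -> pattern searched in the first post.
# Optional suffixes let one rule cover plurals and en_* spellings.
AUTO_TAG_RULES = {
    "border": re.compile(r"\bborders?\b", re.IGNORECASE),
    "cell": re.compile(r"\bcells?\b", re.IGNORECASE),
    "color": re.compile(r"\bcolou?rs?\b", re.IGNORECASE),
}


def auto_tags(first_post, rules=AUTO_TAG_RULES):
    """Return the sorted tags whose pattern occurs in the first post."""
    return sorted(tag for tag, pattern in rules.items() if pattern.search(first_post))


def single_category_tags(topics):
    """topics: iterable of (category, tags) pairs, one per topic.

    Return {tag: category} for every tag whose topics all share one category.
    """
    categories = defaultdict(set)
    for category, tags in topics:
        for tag in tags:
            categories[tag].add(category)
    return {tag: next(iter(cats)) for tag, cats in categories.items() if len(cats) == 1}
```

With these rules, `auto_tags("The border around the cells is the wrong color.")` yields `['border', 'cell', 'color']`; the `s?` and `u?` suffixes are what let ‘cells’ and ‘colour’ map to the singular, en_US tags.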

Completely agree.

+1.

  • Tag synonyms: if people keep adding plural versions, or for the most common typos, we might consider creating aliases. But AFAICT this is a global setting, not a per-category one, so we need to be careful if the typoed/plural version is a tag that makes sense in another language. Besides singular vs. plural, another obvious use case is the various en_* spellings; as an example, I just merged ‘colour’ into ‘color’ and made the former an alias of the latter.
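Because synonyms are global, one precaution is to check, before creating an alias, whether the alias is also used as a tag in categories other than the one where the merge was reviewed. A small sketch, assuming the proposed aliases and the per-tag category sets are available as plain Python data (all names are hypothetical):

```python
def risky_aliases(proposed, tag_categories):
    """proposed: iterable of (alias, target, reviewed_category) tuples.
    tag_categories: {tag: set of categories where the tag is used}.

    Return {alias: sorted other categories} for aliases that are also used
    outside the category in which the merge was reviewed.
    """
    risky = {}
    for alias, _target, reviewed_in in proposed:
        others = tag_categories.get(alias, set()) - {reviewed_in}
        if others:
            risky[alias] = sorted(others)
    return risky


def resolve(tag, synonyms):
    """Map an alias to its target tag; other tags pass through unchanged."""
    return synonyms.get(tag, tag)
```

With the data from post #4, ‘nas’ would be flagged, since it appears in both the English and the Portuguese lists, whereas ‘colour’ only occurs in English.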

English

${TAG} ${TAG}s
windows-10 ×1875 windows-10s ×2
math ×537 maths ×4
xl ×2 xls ×154
new ×64 news ×1
wp ×1 wps ×13
right ×27 rights ×3
permission ×29 permissions ×6
recent-document ×9 recent-documents ×43
recent-file ×2 recent-files ×8
stream ×4 streams ×1
window ×5 windows ×17
reference ×36 references ×12
if ×3 ifs ×8
countif ×56 countifs ×22
require ×5 requires ×2
text-function ×2 text-functions ×5
averageif ×7 averageifs ×3
master-page ×3 master-pages ×6
resource ×4 resources ×2
show-change ×2 show-changes ×1
xp ×2 xps ×1
sumif ×48 sumifs ×26
module ×3 modules ×2
part ×3 parts ×2
na ×4 nas ×5
ff ×1 ffs ×1
http ×9 https ×9

Deutsch

${TAG} ${TAG}s
window ×2 windows ×20
io ×1 ios ×2
aktuelle ×1 aktuelles ×1
automatische ×1 automatisches ×1
extention ×1 extentions ×1
geht ×1 gehts ×1

Português do Brasil

${TAG} ${TAG}s
lista ×15 listas ×1
formulario ×13 formularios ×2
tabela ×48 tabelas ×11
planilha ×77 planilhas ×18
título ×4 títulos ×1
arquivo ×35 arquivos ×9
hiperlink ×19 hiperlinks ×5
gráfico ×18 gráficos ×5
documento ×14 documentos ×4
modelo ×2 modelos ×7
somase ×2 somases ×7
consulta ×10 consultas ×3
borda ×2 bordas ×6
countif ×3 countifs ×1
diferente ×3 diferentes ×1
nota ×1 notas ×3
numero ×1 numeros ×3
palavra ×2 palavras ×6
protegida ×3 protegidas ×1
fórmula ×29 fórmulas ×10
grafico ×14 graficos ×5
relatório ×14 relatórios ×5
célula ×51 células ×19
tecla ×5 teclas ×2
página ×19 páginas ×8
número ×9 números ×4
campo ×5 campos ×10
capítulo ×1 capítulos ×2
contse ×14 contses ×7
da ×2 das ×1
estilo ×8 estilos ×16
etiqueta ×4 etiquetas ×8
feriado ×2 feriados ×1
hora ×5 horas ×10
legenda ×4 legendas ×2
medida ×1 medidas ×2
régua ×2 réguas ×1
tema ×2 temas ×4
time ×1 times ×2
multipla ×5 multiplas ×3
referência ×5 referências ×3
celula ×8 celulas ×5
pagina ×7 paginas ×11
maiúscula ×3 maiúsculas ×2
na ×3 nas ×2
parâmetro ×2 parâmetros ×3
rótulo ×3 rótulos ×2
figura ×10 figuras ×7
dia ×3 dias ×4
linha ×32 linhas ×24
ícone ×5 ícones ×6
fonte ×14 fontes ×12
coluna ×21 colunas ×20
aplicativo ×1 aplicativos ×1
branca ×1 brancas ×1
centímetro ×1 centímetros ×1
empresa ×1 empresas ×1
idade ×1 idades ×1
janela ×4 janelas ×4
minúscula ×2 minúsculas ×2
múltipla ×2 múltiplas ×2
nomeada ×1 nomeadas ×1
oculta ×1 ocultas ×1
pergunta ×1 perguntas ×1
pontilhada ×1 pontilhadas ×1
ponto ×3 pontos ×3
resultado ×1 resultados ×1
símbolo ×1 símbolos ×1
todo ×1 todos ×1
usuário ×2 usuários ×2

Español

${TAG} ${TAG}s
math ×32 maths ×2
consulta ×8 consultas ×1
lista ×2 listas ×12
hoja ×13 hojas ×3
pagina ×12 paginas ×3
página ×13 páginas ×4
borde ×2 bordes ×6
diapositiva ×1 diapositivas ×3
enlace ×9 enlaces ×3
forma ×1 formas ×3
grafico ×2 graficos ×6
nota ×2 notas ×6
hiperenlace ×11 hiperenlaces ×4
fila ×3 filas ×8
referencia ×3 referencias ×8
columna ×7 columnas ×18
animacione ×1 animaciones ×2
cabecera ×4 cabeceras ×2
calculado ×2 calculados ×1
carpeta ×2 carpetas ×1
casilla ×2 casillas ×1
desplegable ×6 desplegables ×3
directo ×1 directos ×2
duda ×1 dudas ×2
hipervínculo ×2 hipervínculos ×1
hora ×4 horas ×2
línea ×2 líneas ×1
menú ×4 menús ×2
objeto ×2 objetos ×4
punto ×2 puntos ×1
rango ×2 rangos ×1
tabla ×17 tablas ×32
estilo ×11 estilos ×19
salto ×5 saltos ×3
encabezado ×2 encabezados ×3
fórmula ×2 fórmulas ×3
campo ×15 campos ×20
informe ×3 informes ×4
número ×4 números ×3
formulario ×17 formularios ×13
archivo ×10 archivos ×8
celda ×16 celdas ×20
documento ×11 documentos ×9
cita ×1 citas ×1
countif ×1 countifs ×1
cuadro ×1 cuadros ×1
dañado ×1 dañados ×1
doble ×1 dobles ×1
externo ×2 externos ×2
fuente ×6 fuentes ×6
gráfica ×2 gráficas ×2
gráfico ×2 gráficos ×2
io ×1 ios ×1
marco ×2 marcos ×2
plantilla ×6 plantillas ×6
repositorio ×1 repositorios ×1
tema ×2 temas ×2
ventana ×2 ventanas ×2

Nederlands

${TAG} ${TAG}s
notitie ×2 notities ×1
regel ×1 regels ×2
suggestie ×2 suggesties ×1
versie ×2 versies ×1
functie ×3 functies ×2
hoofdletter ×1 hoofdletters ×1

Last updated on Sun, 08 Aug 2021 at 23:22:11 +0000

While digging into that, I noticed that tags are sometimes misused or abused.

Some users (though, interestingly, not so much in English) take every single word in their question title and add it to the tag list, yielding high tag counts for pronouns and connectors. This really defeats the point of tagging in the first place.
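Topics where the title was copied wholesale into the tag list could be spotted by measuring what fraction of the title’s words were also used as tags. A rough sketch; the 0.8 threshold and the (topic_id, title, tags) input shape are assumptions:

```python
import re


def title_word_ratio(title, tags):
    """Fraction of the distinct title words that were also used as tags."""
    # For str patterns, \w matches Unicode letters, so accented words stay whole.
    words = {w.lower() for w in re.findall(r"\w+", title)}
    if not words:
        return 0.0
    tag_set = {t.lower() for t in tags}
    return len(words & tag_set) / len(words)


def suspicious_topics(topics, threshold=0.8):
    """topics: iterable of (topic_id, title, tags). Return ids at or above threshold."""
    return [tid for tid, title, tags in topics
            if title_word_ratio(title, tags) >= threshold]
```

Multi-word tags such as ‘regular-expressions’ never match a single title word, so this only catches the one-word-per-tag pattern described above.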

“Compound” tags are sometimes written as “word₁-word₂”, but sometimes as “word₁_word₂”, “word₁word₂” or even as two separate tags “word₁” + “word₂”. I think the latter is not ideal, as it forces users to use the advanced search. A classic example of this is “regular expressions”.
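Separator variants can be grouped by stripping hyphens, underscores and whitespace, and topics using the two-tag form can be found by checking for all parts of a hyphenated compound. A sketch under the assumption that the hyphenated form is the canonical one; the function names are hypothetical:

```python
import re
from collections import defaultdict


def compound_key(tag):
    """Normalize separators so 'regular-expressions', 'regular_expressions'
    and 'regularexpressions' share the same key."""
    return re.sub(r"[-_\s]+", "", tag.lower())


def compound_variants(tags):
    """Return {key: sorted variants} for keys shared by more than one tag."""
    groups = defaultdict(list)
    for tag in tags:
        groups[compound_key(tag)].append(tag)
    return {key: sorted(v) for key, v in groups.items() if len(v) > 1}


def split_compound_topics(topics, compound):
    """topics: iterable of (topic_id, tags). Return ids tagged with every part
    of a hyphenated compound but not with the compound itself."""
    parts = set(compound.split("-"))
    return [tid for tid, tags in topics
            if parts <= set(tags) and compound not in tags]
```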

Over half the tags (nearly 8000) are used for a single topic. That’s arguably not very useful, and we could simply remove such tags when their topic was posted years ago. Dunno how far we want to go, but on that note, ⅔ of the tags are used for ≤2 topics, and about ¾ for ≤3 topics.
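The shares quoted above are points on the cumulative distribution of topic counts per tag. A short sketch, assuming a {tag: topic_count} dict as input (function names are hypothetical):

```python
def share_used_at_most(tag_counts, k):
    """Fraction of tags attached to at most k topics."""
    if not tag_counts:
        return 0.0
    return sum(1 for count in tag_counts.values() if count <= k) / len(tag_counts)


def cumulative_shares(tag_counts, ks=(1, 2, 3)):
    """Return {k: percentage of tags used for <= k topics}, rounded to 0.1."""
    return {k: round(100 * share_used_at_most(tag_counts, k), 1) for k in ks}
```

Evaluating it for k = 1, 2 and 3 gives the kind of figures quoted above (over half, ⅔, about ¾).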

Can we start with just dropping all tags with a single use?

Absolutely, but for now excluding those within Levenshtein distance ≤1 of some other tag (though we might want to review the remaining tags and merge the typoed ones), as well as those used in the past 6 months. Removed 4907 single-use tags, and there are now 2847 left.
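The selection rule described above can be written as a simple filter. This sketch approximates “6 months” as 183 days and assumes each tag comes with its topic count and last-use timestamp; the input format, names and toy data are hypothetical. Instead of a full edit-distance computation it uses a linear-time check for distance exactly 1.

```python
from datetime import datetime, timedelta, timezone


def within_one_edit(a, b):
    """True if a and b differ by exactly one insertion, deletion or substitution."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # ensure len(a) <= len(b)
    i = j = 0
    edited = False
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
            continue
        if edited:
            return False  # a second difference means distance >= 2
        edited = True
        if len(a) == len(b):
            i += 1  # substitution: advance both strings
        j += 1      # otherwise treat it as an extra character in b
    return True


def removable_tags(tags, now, recent=timedelta(days=183)):
    """tags: {name: (topic_count, last_used)} with timezone-aware datetimes.

    Keep only single-use tags not used within `recent` and with no other
    tag at Levenshtein distance 1 (those are left for manual merging).
    """
    names = list(tags)
    result = []
    for name, (count, last_used) in tags.items():
        if count != 1 or now - last_used < recent:
            continue
        if any(within_one_edit(name, other) for other in names if other != name):
            continue
        result.append(name)
    return sorted(result)
```

On toy data where ‘extention’ has a near-twin ‘extension’, only an isolated, old, single-use tag survives the filter; the near-twins are the ones worth reviewing by hand.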
