How do I get document information from the command line?

I love the command-line capability because it lets me work with documents like never before. However, I was wondering whether there is a way to get document info/properties in the same manner.

Maybe something like:

oodocinfo

There are probably others who have more experience in writing scripts,
but I submitted a script to pastebin.com:
http://pastebin.com/mpMA1qxM

#!/bin/bash
# documentinfo, gives document info about a LibreOffice document.
# Created: 2012-02-19
#
# uses: xml, from http://xmlstar.sourceforge.net/
#

# require one argument that names an existing file
if [[ -z "$1" || ! -e "$1" ]]; then
        echo "Usage: $(basename "$0") <filename>"
        exit 1
fi

if [ -e "meta.xml" ]; then
        echo "Sorry, this cannot be done because some 'meta.xml' already exists"
else
        # extract 'meta.xml' from the inputfile into the current directory
        unzip -qo "$1" meta.xml

        # walk every element path that xml reports for meta.xml
        for f in $(xml el meta.xml);
        do
                # strip everything up to the last colon to get the bare element name
                n=${f/*:}
                # skip the container elements, whose names contain "meta"
                if [[ ! "$n" =~ "meta" ]]; then
                        w=$(xml sel -t -v "$f" meta.xml)
                        echo "$n: $w"
                fi
        done
        rm meta.xml
fi

It’s not perfect, but usable:

unzip -c [FILE NAME].odt meta.xml | tr -s " " "\n" | fmt

You could wrap this command in a ~/.bashrc function that extracts the desired information, or maybe even extend the file command. Not a one-liner, but possible.
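A minimal sketch of such a function, in case it helps. The names meta_fields and odfinfo are made up for this example, and it assumes unzip, tr and sed are installed. It only handles simple elements that contain plain text, and it does not decode XML entities such as &amp;amp;, so treat it as a starting point and not as a real XML parser.

```shell
# Hypothetical helpers for ~/.bashrc (the names are not part of any tool).

# Read meta.xml on stdin and print "name: value" for every element
# that directly contains text, e.g. <dc:title>Test</dc:title>.
meta_fields() {
    # Break the stream before every "<" so each tag starts its own line,
    # then keep lines of the form prefix:name ...>text
    tr '<' '\n' |
        sed -n 's/^[A-Za-z-]*:\([A-Za-z-]*\)[^>]*>\(..*\)$/\1: \2/p'
}

# Print the metadata of an OpenDocument file given as the only argument.
odfinfo() {
    if [ $# -ne 1 ] || [ ! -f "$1" ]; then
        echo "Usage: odfinfo <filename>" >&2
        return 1
    fi
    # unzip -p writes the archive member to stdout, so nothing is
    # unpacked into the current directory.
    unzip -p "$1" meta.xml | meta_fields
}
```

After reloading ~/.bashrc, something like `odfinfo test.odt` should list the title, subject and so on, one per line.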


That’s nice! Also remember ODF files are ZIP files, for which you can obtain detailed information using any ZIP file utility (zipinfo for example in GNU/Linux).

$> file test.doc
test.doc: Composite Document File V2 Document, Little Endian, Os: Windows, Version 1.0, Code page: -535, Title: this is the title, Subject: this is the subject, Keywords: some keywords here, Comments: and some uninteresting comments here, Revision Number: 1, Total Editing Time: 01:05, Create Time/Date: Fri Feb 17 13:50:36 2012, Last Saved Time/Date: Fri Feb 17 13:51:39 2012

But this only works when the document is saved as .doc…

$> file test.odt
test.odt: OpenDocument Text


OpenDocument files are ZIP files that contain a bunch of XML and other files. The document info/properties (what the UI shows under File|Properties) are stored in meta.xml inside that archive.

So when you want to read it from the command line, you need to write a little utility in your language of choice that extracts the info from meta.xml and prints it out.

I ended up with a small script to find out the number of pages. It is based on Luuk’s answer and I published it here: https://github.com/migmruiz/opendocument-utils

To use it, just run:

wget https://raw.github.com/migmruiz/opendocument-utils/master/documentpages
chmod +x documentpages
./documentpages <filename.od?>

You can read it and adapt it if you want:

# require one argument that names an existing file
if [[ -z "$1" || ! -e "$1" ]]; then
  echo "Usage: $(basename "$0") <filename>"
  exit 1
fi

if [ -e "content.xml" ]; then
  echo "Sorry, this cannot be done because some 'content.xml' already exists"
else
  # extract 'content.xml' from the inputfile into the current directory
  unzip -qo "$1" content.xml

  # start at one page and add one for every page-break element found
  i=1

  for f in $(xmlstarlet el content.xml);
  do
    # strip everything up to the last colon to get the bare element name
    n=${f/*:}
    if [[ "$n" =~ "page-break" ]]; then
      i=$((i + 1))
    fi
  done
  rm content.xml
  echo "$i"
fi
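As an alternative sketch: meta.xml also carries a meta:page-count attribute on its meta:document-statistic element, so the page count can be read without unpacking content.xml. The function names page_count and pagecount_from_meta are made up for this example. It assumes a text document whose statistics were written by the application on the last save, so the number can be stale or missing.

```shell
# Read meta.xml on stdin and print the value of the meta:page-count
# attribute, if there is one.
page_count() {
    sed -n 's/.*meta:page-count="\([0-9][0-9]*\)".*/\1/p' | head -n 1
}

# Hypothetical wrapper: pagecount_from_meta <filename.odt>
pagecount_from_meta() {
    if [ $# -ne 1 ] || [ ! -f "$1" ]; then
        echo "Usage: pagecount_from_meta <filename>" >&2
        return 1
    fi
    # unzip -p streams meta.xml to stdout, so no temporary file is
    # created in the current directory.
    unzip -p "$1" meta.xml | page_count
}
```

Because nothing is written to disk, this variant does not need the check for an existing file in the current directory.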

You can use ExifTool to extract document information from many types of files, including OpenDocument Format files.

For example, after installing it, you can extract all information using the command

exiftool -a file.odt

and get, for example:

ExifTool Version Number         : 9.67
File Name                       : exiftest.odt
Directory                       : D:/Software
File Size                       : 9.1 kB
File Modification Date/Time     : 2014:07:06 18:39:05+02:00
File Access Date/Time           : 2014:07:06 18:39:05+02:00
File Creation Date/Time         : 2014:07:06 18:39:04+02:00
File Permissions                : rw-rw-rw-
File Type                       : ODT
MIME Type                       : application/vnd.oasis.opendocument.text
Initial-creator                 : Firstname Lastname
Creation-date                   : 2014:07:06 18:37:48.864000000
Editing-cycles                  : 1
Editing-duration                : P0D
Description                     : En een beetje commentaar.
Keyword                         : een
Keyword                         : sleutelwoord
Subject                         : Blaat
Title                           : Test
Date                            : 2014:07:06 18:39:04.738000000
Creator                         : Firstname Lastname
Document-statistic Table-count  : 0
Document-statistic Image-count  : 0
Document-statistic Object-count : 0
Document-statistic Page-count   : 1
Document-statistic Paragraph-count: 1
Document-statistic Word-count   : 3
Document-statistic Character-count: 17
Document-statistic Non-whitespace-character-count: 15
Generator                       : LibreOffice/4.2.4.2$Windows_x86 LibreOffice_project/63150712c6d317d27ce2db16eb94c2f3d7b699f8
User-defined Name               : Department
User-defined                    : My Department
User-defined Name               : Extra
User-defined                    : nog wat.
Preview PNG                     : (Binary data 1365 bytes, use -b option to extract)
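
If you only need one value from a listing like this in a script, you can filter it by label. The sketch below is a hypothetical helper (exif_field is not part of ExifTool). It assumes the "Label : value" layout shown above, where the first colon followed by a space separates the label from the value.

```shell
# Read an ExifTool listing on stdin and print the value of every line
# whose label equals the first argument.
exif_field() {
    awk -v want="$1" '
        {
            pos = index($0, ": ")        # first colon followed by a space
            if (pos == 0) next           # not a "Label : value" line
            label = substr($0, 1, pos - 1)
            sub(/[ \t]+$/, "", label)    # drop the padding before the colon
            if (label == want) print substr($0, pos + 2)
        }'
}
```

For example, `exiftool -a file.odt | exif_field "Title"` should print just the title, and a label that occurs more than once (such as Keyword above) prints one line per occurrence.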