I love the command-line capability because it lets me work with documents like never before. However, I was wondering whether there is a way to get document info/properties in the same manner.
Maybe something like:
oodocinfo
There are probably others who have more experience in writing scripts,
but I submitted a script to pastebin.com:
http://pastebin.com/mpMA1qxM
#!/bin/bash
# documentinfo, gives document info about a LibreOffice document.
# Created: 2012-02-19
#
# uses: xml (XMLStarlet; the binary is called 'xmlstarlet' on e.g. Debian/Ubuntu), from http://xmlstar.sourceforge.net/
#
if [[ -z "$1" || ! -e "$1" ]]; then
  echo "Usage: $(basename "$0") <filename>"
  exit 1
fi
if [ -e "meta.xml" ]; then
  echo "Sorry, this cannot be done because some 'meta.xml' already exists"
else
  # extract 'meta.xml' from the input file
  unzip -qo "$1" meta.xml
  # list every element path, e.g. office:document-meta/office:meta/dc:title
  for f in $(xml el meta.xml); do
    n=${f##*:}   # local name after the last ':'
    # skip the container elements office:document-meta and office:meta
    if [[ ! "$n" =~ meta ]]; then
      w=$(xml sel -t -v "$f" meta.xml)
      echo "$n: $w"
    fi
  done
  rm meta.xml
fi
It’s not perfect, but usable:
unzip -c [FILE NAME].odt meta.xml | tr -s " " "\n" | fmt
You could use the output of this command in a ~/.bashrc function that extracts the desired information, or maybe even extend the file
command. Not a one-liner, but possible.
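A rough sketch of such a ~/.bashrc function, assuming GNU sed (for "\n" in the replacement) and unzip. The names odtmeta_parse and odtinfo are hypothetical, and this is a plain text filter over meta.xml, not a real XML parser: it prints "name: value" for every element that directly contains text.

```shell
# odtmeta_parse (hypothetical name): read meta.xml on stdin and print
# "name: value" for each element that directly contains text,
# e.g. <dc:title>Test</dc:title> becomes "title: Test".
# Assumes GNU sed; this is a text filter, not an XML parser.
odtmeta_parse() {
  # 1) start a new line before every "<", so each tag begins a line
  # 2) keep lines like "<prefix:name attrs>text" and print "name: text";
  #    closing tags, empty elements and whitespace-only text are skipped
  sed -e 's/</\n</g' |
    sed -n -e 's/^<[a-z]*:\([a-z-]*\)[^>]*>\([^<[:space:]][^<]*\)$/\1: \2/p'
}

# odtinfo (hypothetical name): print the metadata of an ODF file.
# Needs unzip; "unzip -p" writes the member to stdout, so no temp file.
odtinfo() {
  unzip -p "$1" meta.xml | odtmeta_parse
}
```

Usage: odtinfo test.odt. Entities such as &amp; are printed as-is, and a user-defined property shows its value but not its meta:name.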
That’s nice! Also remember that ODF files are ZIP files, so you can get detailed information about them with any ZIP utility (for example, zipinfo on GNU/Linux).
$> file test.doc
test.doc: Composite Document File V2 Document, Little Endian, Os: Windows, Version 1.0, Code page: -535, Title: this is the title, Subject: this is the subject, Keywords: some keywords here, Comments: and some uninteresting comments here, Revision Number: 1, Total Editing Time: 01:05, Create Time/Date: Fri Feb 17 13:50:36 2012, Last Saved Time/Date: Fri Feb 17 13:51:39 2012
But this only works when the file is saved as .doc…
$> file test.odt
test.odt: OpenDocument Text
OpenDocument files are ZIP files that contain a bunch of XML and other files; the document info/properties (what the UI shows under File > Properties) are stored in meta.xml inside that archive.
So when you want to read them from the command line, you need to write a little utility in your language of choice that extracts the info from meta.xml and prints it out.
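As a minimal shell example of such a utility, here is a sketch for the statistics part of the properties. It assumes that, as LibreOffice writes it, meta.xml stores the counts as double-quoted meta:*-count attributes of <meta:document-statistic>; the function names odt_stats and odt_stats_file are hypothetical, and grep/sed stand in for a proper XML parser.

```shell
# odt_stats (hypothetical name): read meta.xml on stdin and print the
# meta:*-count attributes (page-count, word-count, ...) as "name: value".
# Assumes double-quoted attribute values; not a real XML parser.
odt_stats() {
  # grep -o prints each matching attribute on its own line
  grep -o 'meta:[a-z-]*-count="[^"]*"' |
    # turn meta:page-count="1" into page-count: 1
    sed -e 's/^meta:\([a-z-]*\)="\([^"]*\)"$/\1: \2/'
}

# odt_stats_file (hypothetical name): same, for an .odt/.ods/... file (needs unzip)
odt_stats_file() {
  unzip -p "$1" meta.xml | odt_stats
}
```

For the test.odt from the ExifTool output further down, odt_stats_file would be expected to print lines such as "page-count: 1" and "word-count: 3".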
I ended up with a small script to find out the number of pages. It is based on Luuk’s answer and I published it here: https://github.com/migmruiz/opendocument-utils
To use it, just run:
wget https://raw.github.com/migmruiz/opendocument-utils/master/documentpages
chmod +x documentpages
./documentpages <filename.od?>
You can read it and adapt it if you want:
if [[ -z "$1" || ! -e "$1" ]]; then
  echo "Usage: $(basename "$0") <filename>"
  exit 1
fi
if [ -e "content.xml" ]; then
  echo "Sorry, this cannot be done because some 'content.xml' already exists"
else
  # extract 'content.xml' from the input file
  unzip -qo "$1" content.xml
  i=1   # a document without page breaks still has one page
  for f in $(xmlstarlet el content.xml); do
    n=${f##*:}   # local name after the last ':'
    # count every element whose name contains "page-break"
    if [[ "$n" =~ page-break ]]; then
      (( i += 1 ))
    fi
  done
  rm content.xml
  echo "$i"
fi
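A variant sketch of the same count that streams content.xml with "unzip -p" instead of writing it to the current directory, so the "already exists" check is not needed. Like the script, it counts elements whose name contains "page-break" (as far as I know, in practice the <text:soft-page-break/> elements LibreOffice writes on save) and adds 1. The names count_page_breaks and documentpages_stream are hypothetical, and the grep pattern is a text match, not XML parsing.

```shell
# count_page_breaks (hypothetical name): read content.xml on stdin and
# print 1 + the number of "...page-break" element start tags.
count_page_breaks() {
  local breaks
  # grep exits 1 when nothing matches; ignore that so wc still prints 0
  breaks=$( { grep -oE '<[a-z]+:[a-z-]*page-break' || true; } | wc -l )
  echo $(( breaks + 1 ))
}

# documentpages_stream (hypothetical name): same result as documentpages,
# without a temporary content.xml (needs unzip)
documentpages_stream() {
  if [[ -z "$1" || ! -e "$1" ]]; then
    echo "Usage: documentpages_stream <filename.od?>" >&2
    return 1
  fi
  unzip -p "$1" content.xml | count_page_breaks
}
```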
You can use ExifTool to extract document information from many types of files, including OpenDocument Format files.
For example, after installing it, you can extract all information with the command
exiftool -a file.odt
which prints, for example:
ExifTool Version Number : 9.67
File Name : exiftest.odt
Directory : D:/Software
File Size : 9.1 kB
File Modification Date/Time : 2014:07:06 18:39:05+02:00
File Access Date/Time : 2014:07:06 18:39:05+02:00
File Creation Date/Time : 2014:07:06 18:39:04+02:00
File Permissions : rw-rw-rw-
File Type : ODT
MIME Type : application/vnd.oasis.opendocument.text
Initial-creator : Firstname Lastname
Creation-date : 2014:07:06 18:37:48.864000000
Editing-cycles : 1
Editing-duration : P0D
Description : En een beetje commentaar.
Keyword : een
Keyword : sleutelwoord
Subject : Blaat
Title : Test
Date : 2014:07:06 18:39:04.738000000
Creator : Firstname Lastname
Document-statistic Table-count : 0
Document-statistic Image-count : 0
Document-statistic Object-count : 0
Document-statistic Page-count : 1
Document-statistic Paragraph-count: 1
Document-statistic Word-count : 3
Document-statistic Character-count: 17
Document-statistic Non-whitespace-character-count: 15
Generator : LibreOffice/4.2.4.2$Windows_x86 LibreOffice_project/63150712c6d317d27ce2db16eb94c2f3d7b699f8
User-defined Name : Department
User-defined : My Department
User-defined Name : Extra
User-defined : nog wat.
Preview PNG : (Binary data 1365 bytes, use -b option to extract)