OpenOffice.org (or doc) -> LaTeX

Prochocky_Marian at sepsas.sk Prochocky_Marian at sepsas.sk
Fri Sep 24 08:06:06 CEST 2004


Looking through the antiword documentation (version 0.35; the older version
0.32 that I had been using did not have this yet), I see that the -x db
switch also allows conversion from Word to XML conforming to the DocBook
DTD. In that case the usage is:

antiword -m cp1250.txt -x db aaa.doc > aaa.xml

The result is XML conforming to the DocBook DTD, encoded in UTF-8 (I did
not notice any problems with diacritics). The quality of the conversion
naturally depends on how "clean" the source .doc file is (most authors
writing in Word do not use logical markup via styles), so no miracles can
be expected here either.
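
When several files need converting, the DocBook command above can be
wrapped in a small shell function. This is only a sketch: the function name
doc2docbook and the ANTIWORD variable are made up for illustration, and it
assumes antiword can find the cp1250.txt mapping file.

```shell
# Hypothetical helper around the antiword call shown above.
# ANTIWORD lets you point at a different binary; it defaults to "antiword".
ANTIWORD=${ANTIWORD:-antiword}

doc2docbook() {
    src=$1
    out="${src%.doc}.xml"          # aaa.doc -> aaa.xml
    # -m picks the character mapping file, -x db requests DocBook XML
    if "$ANTIWORD" -m cp1250.txt -x db "$src" > "$out" && [ -s "$out" ]; then
        printf '%s\n' "$out"       # report the file that was written
    else
        rm -f "$out"               # do not leave an empty or partial file behind
        return 1
    fi
}
```

Calling doc2docbook aaa.doc writes aaa.xml next to the source file and
prints its name; on failure nothing is left behind and the function returns
a non-zero status.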

So far the fastest route for me is conversion to plain text followed by
(manual) markup, as long as the document is mostly running text. To see
what the document's formatting looks like, you can convert it to
PostScript. The resulting PS file has problems with diacritics (it
hard-codes the character mapping to ISO8859-1)**, but it is fully adequate
as a preview of the formatting of running text (to find out where italics
and bold are used and what the font sizes are). Example of conversion to a
PS file with A4 page size:

antiword -p a4 aaa.doc > aaa.ps




** For some people it may not be a problem to replace the mapping
definition in the header of the PS file so that the diacritics come out
right. For illustration, here is a sample of the header:

%!PS-Adobe-2.0
%%Title: aaa.doc
%%Creator: ANTIWORD 0.35  (14 Nov 2003)
%%For: unknown
%%CreationDate: Fri Sep 24 07:15:04 2004
%%Orientation: Portrait
%%BoundingBox: 0 0 595 842
%%DocumentFonts: Times-Roman Times-Bold Helvetica Helvetica-Bold
%%+ Helvetica-Oblique Helvetica-BoldOblique Courier
%%Pages: (atend)
%%EndComments
%%BeginProlog
/newcodes   % ISO-8859-1 character encodings
[
140/ellipsis 141/trademark 142/perthousand 143/bullet
144/quoteleft 145/quoteright 146/guilsinglleft 147/guilsinglright
148/quotedblleft 149/quotedblright 150/quotedblbase 151/endash 152/emdash
153/minus 154/OE 155/oe 156/dagger 157/daggerdbl 158/fi 159/fl
160/space 161/exclamdown 162/cent 163/sterling 164/currency
165/yen 166/brokenbar 167/section 168/dieresis 169/copyright
170/ordfeminine 171/guillemotleft 172/logicalnot 173/hyphen 174/registered
175/macron 176/degree 177/plusminus 178/twosuperior 179/threesuperior
180/acute 181/mu 182/paragraph 183/periodcentered 184/cedilla
185/onesuperior 186/ordmasculine 187/guillemotright 188/onequarter
189/onehalf 190/threequarters 191/questiondown 192/Agrave 193/Aacute
194/Acircumflex 195/Atilde 196/Adieresis 197/Aring 198/AE 199/Ccedilla
200/Egrave 201/Eacute 202/Ecircumflex 203/Edieresis 204/Igrave 205/Iacute
206/Icircumflex 207/Idieresis 208/Eth 209/Ntilde 210/Ograve 211/Oacute
212/Ocircumflex 213/Otilde 214/Odieresis 215/multiply 216/Oslash
217/Ugrave 218/Uacute 219/Ucircumflex 220/Udieresis 221/Yacute 222/Thorn
223/germandbls 224/agrave 225/aacute 226/acircumflex 227/atilde
228/adieresis 229/aring 230/ae 231/ccedilla 232/egrave 233/eacute
234/ecircumflex 235/edieresis 236/igrave 237/iacute 238/icircumflex
239/idieresis 240/eth 241/ntilde 242/ograve 243/oacute 244/ocircumflex
245/otilde 246/odieresis 247/divide 248/oslash 249/ugrave 250/uacute
251/ucircumflex 252/udieresis 253/yacute 254/thorn 255/ydieresis
] bind def
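
Before editing that header by hand, it can help to pull out just the
encoding block. A minimal sketch, assuming the block always starts with a
line beginning /newcodes and ends with "] bind def" as in the excerpt
above; the function names ps_encoding_block and ps_encoding_name are made
up for illustration.

```shell
# Print the encoding array of an antiword-generated PostScript file:
# everything from the "/newcodes" line through the closing "] bind def".
ps_encoding_block() {
    sed -n '/^\/newcodes/,/^] bind def/p' "$1"
}

# Print only the comment on the "/newcodes" line, i.e. the encoding
# that the file declares for its character codes.
ps_encoding_name() {
    sed -n 's/^\/newcodes[[:space:]]*%[[:space:]]*//p' "$1"
}
```

For a file with the header shown above, ps_encoding_name aaa.ps prints
"ISO-8859-1 character encodings", and ps_encoding_block aaa.ps > codes.txt
saves the array so that it can be compared with an edited version later.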





