XGTagger
XGTagger is a generic interface for processing the text contained
in XML documents, designed and implemented by Xavier Tannier.
It does not work on its own: it wraps whatever text-analysis system
the user chooses.
It relies on the concept of "reading context" [2] and on the
classification of tags into soft (transparent) tags, jump tags,
and hard tags [1].
Click here to download XGTagger
The user manual (in English) is also available online, in PostScript, or in PDF.
Bibliography
[1] Luca Lini, Daniella Lombardini, Michele Paoli, Dario Colazzo, and Carlo Sartiani.
XTReSy: A Text Retrieval System for XML documents.
In Dino Buzzetti, Harold Short, and Giuliano Pancalddella, editors,
Augmenting Comprehension: Digital Tools for the History of Ideas.
Office for Humanities Communication Publications, King’s College, London, 2001.
[2] Xavier Tannier.
Dealing with XML structure through "reading contexts".
Research Report G2I-EMSE 2005-400-007, April 2005,
Ecole des Mines de Saint-Etienne, 16 pages.
Example

General operating diagram of XGTagger
Part-of-speech tagging
Multiword expressions
Syntactic analysis
Translation
Part-of-speech tagging
Short example
Initial XML document:
<sentence>The <bold>cat</bold> jumps on the table</sentence>
Text given to system S:
The cat jumps on the table
Example of text returned by S (Brill Tagger):
The/DT cat/NN jumps/VVZ on/IN the/DT table/NN
Example of final output document:
<sentence>
  <w id="1" pos="DT">The</w>
  <bold>
    <w id="2" pos="NN">cat</w>
  </bold>
  <w id="3" pos="VVZ">jumps</w>
  <w id="4" pos="IN">on</w>
  <w id="5" pos="DT">the</w>
  <w id="6" pos="NN">table</w>
</sentence>
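The round trip shown in this short example can be sketched in a few lines of Python. This is only an illustration of the idea, not XGTagger's actual implementation: it assumes the tagger returns whitespace-separated `word/TAG` items and that no word straddles a tag boundary. The names `parse_tagged` and `annotate` are hypothetical.

```python
import xml.etree.ElementTree as ET

SOURCE = "<sentence>The <bold>cat</bold> jumps on the table</sentence>"
TAGGED = "The/DT cat/NN jumps/VVZ on/IN the/DT table/NN"


def parse_tagged(tagged):
    """Split 'word/TAG' items into (word, tag) pairs (assumed output format)."""
    return [tuple(item.rsplit("/", 1)) for item in tagged.split()]


def annotate(elem, pairs, counter=None):
    """Return a copy of elem in which every word is wrapped in a <w> element."""
    counter = counter if counter is not None else [0]
    copy = ET.Element(elem.tag, dict(elem.attrib))

    def emit(text):
        # Wrap each whitespace-separated word of a text chunk, in reading order.
        for word in (text or "").split():
            expected, pos = pairs[counter[0]]
            if word != expected:
                raise ValueError(f"expected {expected!r}, found {word!r}")
            counter[0] += 1
            w = ET.SubElement(copy, "w", {"id": str(counter[0]), "pos": pos})
            w.text = word

    emit(elem.text)
    for child in elem:
        copy.append(annotate(child, pairs, counter))
        emit(child.tail)  # text that follows the child inside elem
    return copy


result = annotate(ET.fromstring(SOURCE), parse_tagged(TAGGED))
print(ET.tostring(result, encoding="unicode"))
```

The structure of the source document is copied as-is; only the text chunks are replaced, so `<bold>` still encloses the word it enclosed before.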
Long example
Initial XML document:
<article>
  <title>Visit I<sc>stanbul</sc> and M<sc>armara</sc> Region</title>
  <par>
    This former capital of three empires<footnote>Istanbul
    has successively been the capital of Roman, Byzantine
    and Ottoman empires</footnote> is now the
    capital of <bold>Turkey</bold>.
    ...
  </par>
</article>

Text given to system S:
Visit Istanbul and Marmara Region . This former capital
of three empires is now the capital of Turkey. . Istanbul
has successively been the capital of Roman, Byzantine and
Ottoman empires
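The way this text is assembled can be sketched as follows. The sketch is an assumption inferred from the example, not a description of XGTagger's code: `sc` and `bold` are treated as soft tags (their content stays inline), `footnote` as a jump tag (its content is set aside and read after the enclosing context), and every other tag as a hard tag (it closes the current reading context). The function name `reading_contexts`, the " . " separator, and the omission of the elided "..." from the sample are likewise choices made for the illustration.

```python
import xml.etree.ElementTree as ET

SOFT = {"sc", "bold"}   # assumed soft tags: content stays inline
JUMP = {"footnote"}     # assumed jump tags: content is read separately
# Any other tag is treated as hard: it starts and ends a reading context.

DOC = (
    "<article><title>Visit I<sc>stanbul</sc> and M<sc>armara</sc> Region</title>"
    "<par>This former capital of three empires<footnote>Istanbul has "
    "successively been the capital of Roman, Byzantine and Ottoman empires"
    "</footnote> is now the capital of <bold>Turkey</bold>.</par></article>"
)


def normalize(parts):
    """Join text parts and collapse runs of whitespace."""
    return " ".join("".join(parts).split())


def reading_contexts(xml_text):
    contexts, deferred = [], []

    def flush(buf):
        # Close the current context, then emit any contexts set aside by jump tags.
        text = normalize(buf)
        buf.clear()
        if text:
            contexts.append(text)
        contexts.extend(deferred)
        deferred.clear()

    def walk(elem, buf):
        buf.append(elem.text or "")
        for child in elem:
            if child.tag in SOFT:
                walk(child, buf)              # same context, no separator
            elif child.tag in JUMP:
                side = []
                walk(child, side)             # separate context, emitted later
                deferred.append(normalize(side))
            else:
                flush(buf)                    # hard tag: context boundary
                walk(child, buf)
                flush(buf)
            buf.append(child.tail or "")

    buf = []
    walk(ET.fromstring(xml_text), buf)
    flush(buf)
    return contexts


print(" . ".join(reading_contexts(DOC)))
```

With these assumptions the letters split by `<sc>` are rejoined into "Istanbul" and "Marmara", and the footnote is moved after the paragraph instead of interrupting its sentence.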

Example of text returned by S (TreeTagger):
| Visit | VV | visit |
| Istanbul | NP | Istanbul |
| and | CC | and |
| Marmara | NP | Marmara |
| Region | NN | region |
| . | SENT | . |
| This | DT | this |
| former | JJ | former |
| capital | NN | capital |
| of | IN | of |
| three | CD | three |
| empires | NNS | empire |
| is | VBZ | be |
| now | RB | now |
| the | DT | the |
| capital | NN | capital |
| of | IN | of |
| Turkey | NP | Turkey |
| . | SENT | . |
| Istanbul | NP | Istanbul |
| has | VHZ | have |
| successively | RB | successively |
| been | VBN | be |
| the | DT | the |
| capital | NN | capital |
| of | IN | of |
| Roman | NP | Roman |
| , | , | , |
| Byzantine | JJ | Byzantine |
| and | CC | and |
| Ottoman | NP | Ottoman |
| empires | NNS | empire |

Example of final output document:
<article>
  <title>
    <w id="1" pos="VV" lem="visit">Visit</w>
    <w id="2" pos="NP" lem="Istanbul">I</w>
    <sc>
      <w id="2" pos="NP" lem="Istanbul">stanbul</w>
    </sc>
    <w id="3" pos="CC" lem="and">and</w>
    <w id="4" pos="NP" lem="Marmara">M</w>
    <sc>
      <w id="4" pos="NP" lem="Marmara">armara</w>
    </sc>
    <w id="5" pos="NN" lem="region">Region</w>
  </title>
  <par>
    <w id="7" pos="DT" lem="this">This</w>
    <w id="8" pos="JJ" lem="former">former</w>
    <w id="9" pos="NN" lem="capital">capital</w>
    <w id="10" pos="IN" lem="of">of</w>
    <w id="11" pos="CD" lem="three">three</w>
    <w id="12" pos="NNS" lem="empire">empires</w>
    <footnote>
      <w id="20" pos="NP" lem="Istanbul">Istanbul</w>
      <w id="21" pos="VHZ" lem="have">has</w>
      <w id="22" pos="RB" lem="successively">successively</w>
      ...
      <w id="31" pos="NP" lem="Ottoman">Ottoman</w>
      <w id="32" pos="NNS" lem="empire">empires</w>
    </footnote>
    <w id="13" pos="VBZ" lem="be">is</w>
    <w id="14" pos="RB" lem="now">now</w>
    <w id="15" pos="DT" lem="the">the</w>
    <w id="16" pos="NN" lem="capital">capital</w>
    <w id="17" pos="IN" lem="of">of</w>
    <bold>
      <w id="18" pos="NP" lem="Turkey">Turkey</w>
    </bold>
  </par>
</article>
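In this output, a token that straddles a soft tag (such as "Istanbul", split as `I<sc>stanbul</sc>`) appears as several `<w>` elements sharing one id. A minimal sketch of that alignment step is given below, assuming the text chunks of one reading context are available in document order and that the tokens come back in the same order. The function name `align` and the tuple layout are hypothetical, not part of XGTagger.

```python
def align(chunks, tokens):
    """Map each token onto the text chunk(s) it covers.

    Returns (chunk_index, text_piece, token_id) triples; a token that spans
    several chunks yields several triples with the same token_id.
    """
    pieces = []
    ci, pos = 0, 0  # current chunk and character offset inside it
    for token_id, token in enumerate(tokens, start=1):
        remaining = token
        while remaining:
            if ci >= len(chunks):
                raise ValueError(f"token {token!r} not found in the chunks")
            chunk = chunks[ci]
            while pos < len(chunk) and chunk[pos].isspace():
                pos += 1                      # skip whitespace between words
            if pos == len(chunk):
                ci, pos = ci + 1, 0           # chunk exhausted, move on
                continue
            end = pos
            while end < len(chunk) and not chunk[end].isspace():
                end += 1
            # Take what this chunk offers, but never more than the token needs.
            piece = chunk[pos:min(end, pos + len(remaining))]
            if not remaining.startswith(piece):
                raise ValueError(f"cannot align {token!r} with {piece!r}")
            pieces.append((ci, piece, token_id))
            remaining = remaining[len(piece):]
            pos += len(piece)
    return pieces


# Text chunks of the <title> element, in document order.
CHUNKS = ["Visit I", "stanbul", " and M", "armara", " Region"]
TOKENS = ["Visit", "Istanbul", "and", "Marmara", "Region"]

for chunk_index, piece, token_id in align(CHUNKS, TOKENS):
    print(chunk_index, token_id, piece)
```

Each triple can then be turned into a `<w>` element placed where its chunk was, carrying the attributes returned for that token.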
Multiword expressions
Initial XML document:
<sentence>
  I did it in order to clarify matters
</sentence>

Text given to system S:
I did it in order to clarify matters

Example of text returned by S:
| I | PP |
| did | VVD |
| it | PP |
| in///order///to | LOC |
| clarify | VV |
| matters | NNS |

Example of final output document:
<sentence>
  <w id="1" pos="PP" word="I">I</w>
  <w id="2" pos="VVD" word="do">did</w>
  <w id="3" pos="PP" word="it">it</w>
  <w id="4" pos="LOC" word="in///order///to">in</w>
  <w id="4" pos="LOC" word="in///order///to">order</w>
  <w id="4" pos="LOC" word="in///order///to">to</w>
  <w id="5" pos="VV" word="clarify">clarify</w>
  <w id="6" pos="NNS" word="matter">matters</w>
</sentence>
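The handling of a multiword unit can be illustrated with a short sketch: the unit returned by S is split on its separator, and every part keeps the id and tag of the whole unit. This is an illustration under stated assumptions, not XGTagger's code. The function name `expand_units` is hypothetical, the `///` separator is taken from the example above, and the `word` attribute is left out because the example does not show where its values come from.

```python
def expand_units(rows, sep="///"):
    """Turn (form, tag) rows into (token_id, tag, part) triples.

    A form such as 'in///order///to' yields one triple per part,
    all sharing the id of the unit.
    """
    words = []
    for token_id, (form, pos) in enumerate(rows, start=1):
        for part in form.split(sep):
            words.append((token_id, pos, part))
    return words


ROWS = [
    ("I", "PP"), ("did", "VVD"), ("it", "PP"),
    ("in///order///to", "LOC"), ("clarify", "VV"), ("matters", "NNS"),
]

for token_id, pos, part in expand_units(ROWS):
    print(f'<w id="{token_id}" pos="{pos}">{part}</w>')
```

Sharing the id lets a later processing step recognise that "in", "order" and "to" were analysed as a single unit.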
Syntactic analysis
Initial XML document:
<english_sentence>
  He has a taste<gloss>Taste : preference, a strong
  liking</gloss>
  for danger
</english_sentence>

Text given to system S:
He has a taste for danger . Taste : preference, a strong
liking .

Example of text returned by S:
He has a taste_for_danger/NP . Taste : preference, a
strong liking .

Example of final output document:
<english_sentence>
  <w id="1">He</w>
  <w id="2">has</w>
  <w id="3">a</w>
  <w id="4" pos="NP">taste</w>
  <gloss>
    <w id="6">Taste :</w>
    <w id="7">preference,</w>
    ...
    <w id="10">liking</w>
  </gloss>
  <w id="4" pos="NP">for</w>
  <w id="4" pos="NP">danger</w>
</english_sentence>
Translation
Initial XML document:
<sentence>I had a conversation with my brother</sentence>
Text given to system S:
I had a conversation with my brother
Example of text returned by S:
| I |
| had |
| a |
| conversation/entretien, conversation/Gespräch |
| with |
| my |
| brother/frère/Bruder |

Example of final output document:
<sentence>
  <w>I</w>
  <w>had</w>
  <w>a</w>
  <w french="entretien, conversation" german="Gespräch">conversation</w>
  <w>with</w>
  <w>my</w>
  <w french="frère" german="Bruder">brother</w>
</sentence>
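The mapping from the slash-separated fields returned by S to the attributes of each `<w>` element can be sketched as follows. It is a simplified illustration, not XGTagger's own configuration mechanism: it assumes that the first field is the word, that the following fields map by position to the attribute names `french` and `german` seen in the output above, and that no field contains a slash. The function name `build_sentence` is hypothetical.

```python
import xml.etree.ElementTree as ET

ROWS = [
    "I", "had", "a",
    "conversation/entretien, conversation/Gespräch",
    "with", "my",
    "brother/frère/Bruder",
]
FIELDS = ("french", "german")  # attribute names, by field position (assumed)


def build_sentence(rows, fields=FIELDS):
    """Build a <sentence> element with one <w> child per returned row."""
    sentence = ET.Element("sentence")
    for row in rows:
        word, *values = row.split("/")
        # Rows without extra fields produce a <w> with no attributes.
        w = ET.SubElement(sentence, "w", dict(zip(fields, values)))
        w.text = word
    return sentence


print(ET.tostring(build_sentence(ROWS), encoding="unicode"))
```

Because a field may contain commas and spaces ("entretien, conversation"), the sketch splits on the slash only and keeps the rest of each field untouched.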