DiSeg
 A Discourse Segmenter for Spanish

DiSeg is the first discourse segmenter for Spanish. It is built on the framework of Rhetorical Structure Theory (Mann and Thompson, 1988) and is based on lexical and syntactic rules. The system can be tested here.

One of the best ways to evaluate a discourse segmenter is to compare its results with those of other similar available systems. However, because DiSeg is the first discourse segmenter for Spanish, no other system can be used for its evaluation. We have therefore built a gold standard, which we hope will also encourage other researchers to keep investigating in this field. You can consult the original texts and their discourse segmentations in the following table. The segmentations are XML files.

SPANISH GOLD STANDARD
Original MEDICAL texts  Segmented  MEDICAL texts
Text 1  Discourse segmentation Text 1
Text 2  Discourse segmentation Text 2
Text 3  Discourse segmentation Text 3
Text 4  Discourse segmentation Text 4
Text 5  Discourse segmentation Text 5
Text 6  Discourse segmentation Text 6
Text 7  Discourse segmentation Text 7
Text 8  Discourse segmentation Text 8
Text 9  Discourse segmentation Text 9
Text 10  Discourse segmentation Text 10
Text 11  Discourse segmentation Text 11
Text 12  Discourse segmentation Text 12
Text 13  Discourse segmentation Text 13
Text 14  Discourse segmentation Text 14
Text 15  Discourse segmentation Text 15
Text 16  Discourse segmentation Text 16
Text 17  Discourse segmentation Text 17
Text 18  Discourse segmentation Text 18
Text 19  Discourse segmentation Text 19
Text 20  Discourse segmentation Text 20
Original LINGUISTIC texts  Segmented  LINGUISTIC texts
Text 21  Discourse segmentation Text 21
Text 22  Discourse segmentation Text 22
Text 23  Discourse segmentation Text 23
Text 24  Discourse segmentation Text 24
Text 25  Discourse segmentation Text 25
Text 26  Discourse segmentation Text 26
Text 27  Discourse segmentation Text 27
Text 28  Discourse segmentation Text 28
Text 29  Discourse segmentation Text 29
Text 30  Discourse segmentation Text 30
              
Download the full DiSeg corpus (zipped)
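
As a rough illustration of how a segmenter's output could be scored against a gold standard such as this one, the sketch below compares segment boundaries and reports precision, recall and F1. It is not the evaluation procedure used for DiSeg. It assumes each segmentation has already been read into a list of segment strings (the XML schema of the corpus files is not described here, so no parsing is shown), and that boundaries are counted in whitespace-separated tokens. The names `boundaries` and `score` are hypothetical.

```python
from itertools import accumulate


def boundaries(segments):
    """Return the set of internal boundary positions, counted in tokens.

    Assumption: tokens are whitespace-separated words. The end of the
    text is dropped because it is trivially a boundary in any segmentation.
    """
    lengths = [len(segment.split()) for segment in segments]
    return set(list(accumulate(lengths))[:-1])


def score(gold_segments, system_segments):
    """Compare system boundaries with gold boundaries: (precision, recall, F1)."""
    gold = boundaries(gold_segments)
    system = boundaries(system_segments)
    correct = len(gold & system)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented example sentence, not taken from the corpus.
gold = ["El paciente mejoró", "porque recibió tratamiento", "aunque tardó semanas"]
system = ["El paciente mejoró", "porque recibió tratamiento aunque tardó semanas"]
precision, recall, f1 = score(gold, system)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

In this invented example the system finds one of the two gold boundaries and proposes no wrong ones, so precision is perfect while recall is halved. Counting boundaries in tokens only works if both segmentations cover exactly the same text.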

Do you like DiSeg? If you want to use this corpus, please cite us as follows:



©2010 IULA / LIA / UB