Hammer and nails

Tuesday, May 16, 2006

Weekly updates --- Linguistics or newspaper reader...

I have been reading extensively through the DUC 05 and DUC 06 data recently, trying to find good examples to support the topic clustering hypothesis. To my disappointment, I have found only one so far. For some topics, the clustering method failed to group the sentences into the right category.

Based on my analysis of the DUC 06 data, many of the topics with higher Structure and Coherence scores have only one question, so topic clustering doesn't help there at all.

The topics on which the system performs well do so for several reasons:
1. Most of the selected sentences are the first sentence of their original doc;
2. Neighbouring sentences from the same doc are grouped together;
3. Lexical cohesion between the sentences.
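The three factors above could be checked roughly per summary with something like the sketch below. Everything in it is my own assumption for illustration: the `SummarySentence` record, the function names, and the use of plain word overlap (Jaccard) as a stand-in for lexical cohesion. It is not what the system itself does.

```python
from dataclasses import dataclass


@dataclass
class SummarySentence:
    # Hypothetical record: where a summary sentence came from.
    doc_id: str    # identifier of the source document
    position: int  # 0-based index of the sentence in that document
    text: str


def lead_sentence_ratio(summary):
    """Fraction of summary sentences that open their source document."""
    if not summary:
        return 0.0
    return sum(1 for s in summary if s.position == 0) / len(summary)


def neighbour_pair_ratio(summary):
    """Fraction of adjacent summary pairs that were also adjacent in one doc."""
    pairs = list(zip(summary, summary[1:]))
    if not pairs:
        return 0.0
    kept = sum(
        1 for a, b in pairs
        if a.doc_id == b.doc_id and b.position == a.position + 1
    )
    return kept / len(pairs)


def word_overlap(a, b):
    """Jaccard overlap of word sets, a crude proxy for lexical cohesion."""
    wa = set(a.text.lower().split())
    wb = set(b.text.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)
```

Running these over the well-rated and badly-rated summaries would at least show whether the three factors really separate the two groups, instead of relying on reading alone.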

Another discovery is that the Structure and Coherence score is somewhat correlated with the Content Responsiveness score. This means that if a summary has good content selection, chances are high that it will be rated as coherent. If its content is poorly selected, it will most likely be rated as incoherent. So coherence here really depends on the content.
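To put a number on that correlation, a plain Pearson coefficient over the per-summary scores would do. This is only a sketch: the `pearson` helper is my own, and the two score lists at the bottom are made-up placeholders, not DUC results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Placeholder scores for illustration only (NOT real DUC numbers).
coherence = [2, 3, 4, 1, 5]
responsiveness = [2, 4, 4, 1, 5]
print(pearson(coherence, responsiveness))
```

Since the scores are ordinal ratings, a rank correlation such as Spearman's might be the more appropriate choice; Pearson is used here only because it is the simplest to write down.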

One more thing to try is to run the topic clustering on the human summaries and check whether it helps to group their sentences, since the human summaries answer the questions topic by topic.

Got pretty frustrated. :( Sentence ordering is too subjective, so is part of the research just being a newspaper reader?
