Hammer and nails

Thursday, May 11, 2006

Hypothesis and plans ahead

11:20-12:20: Meeting with Anoop.

Scenario: multiple-question-based summarization.

Finding the hypothesis:

There are two ways in which multiple questions might affect the ordering strategy and the quality of the final summary:

1. If shallowly semantically related sentences (i.e., sentences whose concept IDs overlap) answer different questions, they should be placed in different cluster groups in the summary.

2. Use the questions to help pick the sentences for ordering. The input to the Editor is twice the size of the final summary, so some sentences must be dropped. If we rely only on the importance scores from the Extractor, it is possible that all the sentences answering one specific question rank higher while the other questions are omitted. Question-directed clustering guarantees that every question receives a fair portion of the summary.
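A minimal sketch of how hypotheses 1 and 2 could fit together, under assumptions that are mine rather than the system's: each candidate sentence carries an Extractor importance score and a set of concept IDs, it is assigned to the single question whose concept set it overlaps most, and slots are then filled round-robin across questions so no question is crowded out. The function names `assign_question` and `question_directed_select` are hypothetical, not part of the Editor.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical candidate: (text, Extractor importance score, concept IDs).
Sentence = Tuple[str, float, Set[str]]


def assign_question(concepts: Set[str], questions: List[Set[str]]) -> int:
    """Index of the question whose concepts overlap most with the sentence.
    max() keeps the first maximum, so ties go to the lower question index."""
    overlaps = [len(concepts & q) for q in questions]
    return max(range(len(questions)), key=lambda i: overlaps[i])


def question_directed_select(candidates: List[Sentence],
                             questions: List[Set[str]],
                             k: int) -> List[List[str]]:
    """Pick k sentences round-robin across questions and return them
    grouped by question (one cluster group per question)."""
    groups: Dict[int, List[Sentence]] = defaultdict(list)
    for sent in candidates:
        groups[assign_question(sent[2], questions)].append(sent)
    for g in groups.values():
        g.sort(key=lambda s: s[1], reverse=True)  # best-scored first

    picked: Dict[int, List[str]] = defaultdict(list)
    total, depth = 0, 0
    # Each question contributes its next-best sentence before any question
    # contributes again, until k sentences are chosen or candidates run out.
    while total < k and any(depth < len(g) for g in groups.values()):
        for qi in sorted(groups):
            if total < k and depth < len(groups[qi]):
                picked[qi].append(groups[qi][depth][0])
                total += 1
        depth += 1
    return [picked[qi] for qi in sorted(picked)]


questions = [{"drug", "trial"}, {"cost", "insurance"}]
candidates = [
    ("Trial results were positive.", 0.9, {"drug", "trial"}),
    ("A second trial began.", 0.8, {"trial"}),
    ("The drug was approved.", 0.7, {"drug"}),
    ("Insurance covers part of the cost.", 0.4, {"cost", "insurance"}),
]
print(question_directed_select(candidates, questions, k=2))
```

With `k=2`, ranking by score alone would keep the two trial sentences and drop the cost question entirely; the round-robin pass keeps one sentence per question instead. Round-robin is only one simple way to enforce coverage; a score-proportional quota per question would be an equally plausible choice.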


Planned work:
Examine the DUC05 data, based on the PE results and the human summaries. Identify the relations between sentences that answer different questions.

Work plan:
1. Identify summaries/topics with high Responsiveness and ROUGE scores, to guarantee the relatedness of the sentences.
2. Manually create an ordering for each summary. Identify the lexical cohesion.
3. Discover patterns in the human summaries.

Data analysis plan: DUC05 data
1. Analyze the human summaries to see whether their writers structure them by question.
2. Collect all the SCU-marked sentences and divide them into question groups, following the human summaries. Then create new ordered summaries (this involves manual work).

Question for Gabor: Did the Extractor extract sentences question by question, or did it consider the questions as a whole?
Answer: The Extractor treated the questions as a single bag of WordNet concepts. The number of questions asked did not affect it.
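To make the consequence of Gabor's answer concrete, here is a small sketch contrasting a bag-of-concepts score with a per-question view. The concept IDs are plain hypothetical strings standing in for WordNet concepts, and plain set overlap is my simplification, not the Extractor's actual scoring function.

```python
from typing import List, Set


def bag_of_concepts_score(sentence: Set[str], questions: List[Set[str]]) -> int:
    """Overlap with the union of all question concepts (questions as one bag)."""
    bag: Set[str] = set().union(*questions)
    return len(sentence & bag)


def per_question_overlap(sentence: Set[str], questions: List[Set[str]]) -> List[int]:
    """Overlap with each question separately."""
    return [len(sentence & q) for q in questions]


questions = [{"drug", "trial"}, {"cost", "insurance"}]
merged = [{"drug", "trial", "cost", "insurance"}]  # same concepts, one question
sentence = {"drug", "trial", "result"}

print(bag_of_concepts_score(sentence, questions))  # 2
print(bag_of_concepts_score(sentence, merged))     # 2
print(per_question_overlap(sentence, questions))   # [2, 0]
```

The bag score is identical whether the concepts arrive as two questions or one, which is what "unmoved by the number of questions" means in practice; only the per-question view reveals that this sentence answers the first question and not the second.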

Get the DUC2006 topic analysis done:
Finding: Squash did equally well on content responsiveness and overall responsiveness. Some topics do improve in overall responsiveness relative to content. I will check this by reading the summaries tomorrow.
