%0 Conference Proceedings
%T Mining the Web for Discourse Markers
%A Hutchinson, Ben
%Y Lino, Maria Teresa
%Y Xavier, Maria Francisca
%Y Ferreira, Fátima
%Y Costa, Rute
%Y Silva, Raquel
%S Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
%D 2004
%8 May
%I European Language Resources Association (ELRA)
%C Lisbon, Portugal
%F hutchinson-2004-mining
%X This paper proposes a methodology for obtaining sentences containing discourse markers from the World Wide Web. The proposed methodology is particularly suitable for collecting large numbers of discourse marker tokens. It relies on the automatic identification of discourse markers, and we show that this can be done with an accuracy within 9% of that of human performance. We also show that the distribution of discourse markers on the web correlates highly with those in a conventional balanced corpus.
%U http://www.lrec-conf.org/proceedings/lrec2004/pdf/333.pdf