Masoud Rouhizadeh


2021

Supervised approaches usually achieve the best performance in the Word Sense Disambiguation problem. However, the unavailability of large sense annotated corpora for many low-resource languages make these approaches inapplicable for them in practice. In this paper, we mitigate this issue for the Persian language by proposing a fully automatic approach for obtaining Persian SemCor (PerSemCor), as a Persian Bag-of-Word (BoW) sense-annotated corpus. We evaluated PerSemCor both intrinsically and extrinsically and showed that it can be effectively used as training sets for Persian supervised WSD systems. To encourage future research on Persian Word Sense Disambiguation, we release the PerSemCor in http://nlp.sbu.ac.ir.

2019

In this paper, we presented a WSD system that uses LDA topics for semantic expansion of document words. Our system also uses sense frequency information from SemCor to give higher priority to the senses which are more probable to happen.

2018

Individuals express their locus of control, or “control”, in their language when they identify whether or not they are in control of their circumstances. Although control is a core concept underlying rhetorical style, it is not clear whether control is expressed by how or by what authors write. We explore the roles of syntax and semantics in expressing users’ sense of control –i.e. being “controlled by” or “in control of” their circumstances– in a corpus of annotated Facebook posts. We present rich insights into these linguistic aspects and find that while the language signaling control is easy to identify, it is more challenging to label it is internally or externally controlled, with lexical features outperforming syntactic features at the task. Our findings could have important implications for studying self-expression in social media.

2017

Recommendations are often rated for their subjective quality, but few researchers have studied comment quality in terms of objective utility. We explore recommendation quality assessment with respect to both subjective (i.e. users’ ratings) and objective (i.e., did it influence? did it improve decisions?) metrics in a massive online geopolitical forecasting system, ultimately comparing linguistic characteristics of each quality metric. Using a variety of features, we predict all types of quality with better accuracy than the simple yet strong baseline of comment length. Looking at the most predictive content illustrates rater biases; for example, forecasters are subjectively biased in favor of comments mentioning business transactions or dealings as well as material things, even though such comments do not indeed prove any more useful objectively. Additionally, more complex sentence constructions, as evidenced by subordinate conjunctions, are characteristic of comments leading to objective improvements in forecasting.
Social media sites (e.g., Twitter) have been used for surveillance of drug safety at the population level, but studies that focus on the effects of medications on specific sets of individuals have had to rely on other sources of data. Mining social media data for this in-formation would require the ability to distinguish indications of personal medication in-take in this media. Towards that end, this paper presents an annotated corpus that can be used to train machine learning systems to determine whether a tweet that mentions a medication indicates that the individual posting has taken that medication at a specific time. To demonstrate the utility of the corpus as a training set, we present baseline results of supervised classification.

2016

2015

2014

2013

2012

2011