When using all user tweets, they reached an accuracy of 88.0%. An interesting observation is that there is a clear class of misclassified users who have a majority of opposite-gender users in their social network. When adding more information sources, such as profile fields, they reached an accuracy of 92.0%. Their highest score when using just text features was 75.5%, testing on all the tweets by each author (with a training set of 3.3 million tweets and a test set of about 418,000 tweets). (2012) used SVMlight to classify gender on Nigerian Twitter accounts, with tweets in English, with a minimum of 50 tweets.
The paper does not describe the gender component, but the first author has informed us that the accuracy of the gender recognition on the basis of 200 tweets is about 87% (Nguyen, personal communication). (2014) did a crowdsourcing experiment, in which they asked human participants to guess the gender and age on the basis of 20 to 40 tweets. [...] on this, we will still take the biological gender as the gold standard in this paper, as our eventual goal is creating metadata for the TwiNL collection.

Experimental Data and Evaluation

In this section, we first describe the corpus that we used in our experiments (Section 3.1).

Then we describe our experimental data and the evaluation method (Section 3), after which we proceed to describe the various author profiling strategies that we investigated (Section 4). Then follow the results (Section 5), and Section 6 concludes the paper.

Gender Recognition

Gender recognition is a subtask in the general field of authorship recognition and profiling, which has reached maturity in the last decades (for an overview, see e.g. [...]). Even so, there are circumstances where outright recognition is not an option, but where one must be content with profiling, i.e. the identification of author traits like gender, age and geographical background.

[...] for whom we already know that they are an individual person rather than, say, a husband and wife couple or a board of editors for an official Twitter feed. In this case, the Twitter profiles of the authors are available, but these consist of free-form text rather than fixed information fields. And, obviously, it is unknown to what degree the information that is present is true.