Mitigating online gender harassment through
1) user feedback, 2) empathetic innovation, and 3) data-science products

Disclaimer: Given the nature of the problem we're trying to mitigate with our products,
please note that as you scroll through the website, you'll be exposed to some offensive and violent language.
Please continue at your own discretion.


Empathization refers to an effort that, thus far, has produced two data-science products aimed at fostering empathy and improving how people regard themselves and each other. The products are informed by user feedback and aim to mitigate an epidemic -- online gender harassment -- that is part of an even larger issue: global gender inequality that exists online and offline.

The current products revolve around two groups of Twitter users: people who repeatedly send online gender harassment tweets, and people who receive and are affected by such harassment. The products are built upon artificial intelligence (AI) algorithms that learn from humans' detection of gender harassment, and do so in an automated way.

Among tweets the algorithms predict as online gender harassment, the algorithms are accurate 76-80% of the time. One of the user-facing products has shown the scalable potential to not only detect but take action on over 1 million offensive tweets per week.

Empathization co-won the Hal R. Varian Award from UC Berkeley in mid-May 2017 and was presented at Google to the Conversation AI Team in late May 2017. During part of 2017, we also had the opportunity to discuss our blog post series with a Jigsaw leader; the series was published in January 2018 and re-published by Towards Data Science in March 2018.

Global Problem

Unequal treatment of women cuts across race/ethnicity, nationality, and region. However, the opportunity to elevate gender equality exists through daily interpersonal interactions.

According to the United Nations Human Rights Office of the High Commissioner, gender equality refers to “equal rights, responsibilities and opportunities. . . However, after 60 years, it is clear that it is the human rights of women that we see most widely ignored around the world”. In addition, McKinsey Global Institute estimates that achieving gender equality could add $12+ trillion per year to future global GDP. While the problem has both social and financial aspects, our efforts revolve around the social aspect.

Given that gender inequality comprises myriad issues, we target our efforts at a specific sub-issue that has reached epidemic levels: online gender harassment. Such harassment is prevalent (Norton, 2016; Women, Action, & the Media, 2015; Pew Research Center, 2014) across Twitter, Facebook, YouTube, etc. As a result, many women disengage from social media and share their perspectives less often. Yet women deserve equal opportunities to contribute online and offline.

Source: Norton, 2016 via Claire Reilly, c|net, 2016

Various women granted valuable interviews, helping us understand their personal experiences with Twitter online gender harassment and what they view as potential solutions. For instance, regarding potential solutions, women provided feedback that's been incorporated in our product work in the next section. As for the global problem, some women said they anonymize their usernames to reduce gender-based backlash. Various women said their engagement with Twitter and other social media has declined. And some believe they are treated worse online, where offenders (men and women) reveal their true character. Moreover, people's harassing behavior can bleed into their offline behavior.

Tara Moss (Canadian-Australian author, women's rights advocate, and UNICEF ambassador) explains, “it's feeding into the higher rates of sexual violence and sexual harassment that women are experiencing in the physical world.” Other research uncovers:

  • In part of the globe, nearly 1 of 2 women is harassed online, while 3 out of every 4 women under age 30 have experienced online harassment. In addition, "Women are twice as likely to receive death threats online, and women are also twice as likely to receive threats of sexual violence and rape. They're also more likely to be the target of revenge porn, sextortion and sexual harassment" (Source: Norton, 2016 via Claire Reilly, c|net, 2016).
  • In some cases, female usernames incur an average of 100 sexually explicit or threatening messages a day, whereas male usernames receive 3.7 (2014 article on University of Maryland, 2006).
  • WAM! (Women, Action, and the Media) study: "The vicious targeting of women, women of color, queer women, trans women, disabled women, and other oppressed groups who speak up on online has reached crisis levels. Hate speech and violent threats are being used to silence the voices of women and gender non-conforming people in the public discourse everyday. Examples of the impact these attacks are having on women’s lives are everywhere" (Women, Action, & the Media, 2015).
  • Twitter General Counsel, Vijaya Gadde, admits: "These users often hide behind the veil of anonymity on Twitter and create multiple accounts expressly for the purpose of intimidating and silencing people" (Washington Post, 2015).
  • “Online violence against women is an overt expression of the gender discrimination and inequality that exists offline. Online, it becomes amplified,” says Jac sm Kee of the Association for Progressive Communications (APC), a Global Fund for Women grantee partner, which provided the above examples of online violence and harassment. “The most important way to shift this is to enable women and girls to engage with the Internet at all levels – from use, creation, and development to the imagination of what it should and can be" (Global Fund for Women, 2015).

    The Automated Twitter Bot is the first of our products. This bot, disguised to appear as a young white male, is designed to detect gender harassment tweets and intervene by calling out the offensive language of the tweet in a reply to the offender. This product is designed with the intent to mitigate abusive online behavior at the source.

    The AI behind the bot detects gender harassment tweets using an ensemble of eight models: five Gradient Boosting Decision Trees (GBDTs), two Feed Forward Neural Networks (FNNs), and one Logistic Regression (LR). Tweets are classified based on the average predicted probability of harassment across these models. Our default probability threshold is set at 70%. This threshold was defined through the rigorous process of analyzing almost 20,000 tweets for language specifically indicative of gender harassment.

    The method and message of intervention was informed by two studies. One is a study using ReThink, a software product designed to prevent adolescents from sending or posting hurtful messages. The second is an NYU Field Experiment, which addressed racial harassment on Twitter. Both studies found that checking offensive language with a simple message was effective.


    (Names and parts of some messages have been blacked out for privacy reasons)


    Original Field Experiment

    Hypothesis: The number of offensive tweets (tweets with harassment probability of 70% or more) per offender in the treatment group will be lower than the number of offensive tweets per offender in the control group post intervention. The intervention is the response from the bot to the offender.

    Setup: Over 25 million tweets were run through our AI ensemble of models to identify about 4K offenders, excluding porn and bot accounts. Each bot is set up to track around 1.5K offender accounts for offensive tweets.

    Randomization: As soon as a tweet from a selected offender is flagged as gender harassment, the offender is randomly placed in treatment or control. If they are placed in treatment, the bot replies to their offensive tweet six minutes later. Offenders in the control group receive no reply.

    The experiment ran April 15-23 (with results posted and presented previously) after a brief pilot study. We ran a new, larger pilot study from June-July 2017 to refine our approach, and concluded with a more rigorous full study from July-September 2017, which is explained below.

    New Field Experiment

    To learn about our new experiment, feel free to check out our January 2018 blog post series, specifically Blog Post 2 of 3: First-Ever Social Experiment vs. Gender Harassment on Twitter, or feel free to read below.

    Hypothesis: Over time, users' percent of misogynistic tweets (tweets with harassment probability of 70% or more) in the treatment groups will be lower than that of the control group, after bots reply to users in treatment groups but not the control group.

    Setup: Another set of millions of tweets was run through our AI ensemble of models to identify about 8K offenders who hadn't been in the earlier field experiments. Each bot has a different profile and photo than in the earlier field experiments. Each bot is again set up to track around 1.5K offender accounts for offensive tweets.

    Randomization: We randomly assigned users at two stages. At the first stage, we randomly assigned users to 1 of the 4 bots, and then collected their real-time tweets for roughly 2 weeks. Then, at the second stage, as soon as a user sent a tweet that was automatically flagged as misogynistic, the user was automatically and randomly assigned to one of two types of groups. The first type (control group) wouldn’t receive an automated reply, reflecting the scenario as if the experiment had never existed. The second type (treatment group) would receive an automated reply in 30 seconds, aimed to reduce their harassing behavior. Then we collected users’ real-time tweets for roughly another 2 weeks.
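    The two-stage assignment described above can be sketched roughly as follows. This is a minimal illustration, not the study's actual code: the bot labels, the 50/50 treatment/control split, the user IDs, and the simulated tweet stream are all assumptions made for the example. Only the 70% flagging threshold and the two stages come from the text.

```python
import random

BOTS = ["bot1", "bot2", "bot3", "bot4"]  # hypothetical labels for the 4 bots
FLAG_THRESHOLD = 0.70  # harassment probability at which a tweet is flagged


def assign_bots(user_ids, rng):
    """Stage 1: randomly assign every tracked user to one of the bots."""
    return {user_id: rng.choice(BOTS) for user_id in user_ids}


def assign_group(rng, p_treatment=0.5):
    """Stage 2: on a user's first flagged tweet, pick treatment or control.

    The 50/50 split is an assumption; the write-up does not state the ratio.
    """
    return "treatment" if rng.random() < p_treatment else "control"


rng = random.Random(42)  # fixed seed so the sketch is reproducible
users = [f"user{i}" for i in range(8)]
bot_of = assign_bots(users, rng)
group_of = {}

# Simulated stream of (user, predicted harassment probability) pairs.
stream = [("user1", 0.91), ("user3", 0.40), ("user1", 0.88), ("user5", 0.75)]
for user_id, prob in stream:
    # A user is assigned once, at the moment of their first flagged tweet.
    if prob >= FLAG_THRESHOLD and user_id not in group_of:
        group_of[user_id] = assign_group(rng)
```

    Assigning at the moment of the first flagged tweet, rather than up front, means only users who actually send a flagged tweet during the study enter the treatment/control comparison.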

    Bots: Below is an illustration of the 4 treatment bots, their profile photos and descriptions, and their automated replies.

    Sample Sizes: The following table shows the number of Twitter harassers tracked during our 6-week study.

    Results: In the graphs below, the horizontal trend lines show the change in percent of tweets detected as misogynistic before and after 8/11/2017: the date when treatment bots started to reply to harassers. The horizontal trend lines of the Treatment Bots #1 and #2 weren’t statistically different from that of the control group. That is, any difference among them is likely to be due to random chance. [Note: The vertical bars reflect the likely potential variation in percent of misogynistic tweets that could have occurred if we had replicated the experiment.]

    Conclusion: The new experiment, with more rigorous design and measurement than our earlier experiments, shows no statistically significant impact against gender harassment on Twitter.

    Technical Language (Optional):

    Percents: We measured not the number but percent of users' misogynistic tweets before vs. after our bots intervened. Why? A trend based off a user’s number of misogynistic tweets can be misleading. For instance, a user’s number can decrease from 15 misogynistic tweets last month to 13 this month, yet their percent of misogynistic tweets can increase from 15% (15 out of 100) last month to 50% (13 out of 26) this month.
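    The arithmetic in this example can be checked in a few lines. The helper name `percent_misogynistic` is ours, introduced only for illustration; the numbers are the ones quoted above.

```python
def percent_misogynistic(flagged, total):
    """Share of a user's tweets flagged as misogynistic, in percent."""
    return 100.0 * flagged / total


last_month = percent_misogynistic(15, 100)  # 15 of 100 tweets
this_month = percent_misogynistic(13, 26)   # 13 of 26 tweets

count_change = 13 - 15                      # the raw count goes down
percent_change = this_month - last_month    # the percent goes up
```

    The count falls by 2 while the percent rises by 35 points, which is why a count-based trend alone could suggest improvement where there is none.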

    Weights: We weighted each user by their number of overall tweets sent during the study. Why? Percent of misogynistic tweets would be inflated, for example, if a person with 5 overall tweets (1 misogynistic out of 5 overall tweets) were weighted the same as a person with 50 overall tweets (10 misogynistic out of 50 overall tweets).
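    To make the effect of weighting concrete, here is a small sketch with two hypothetical users (the counts below are invented for illustration and are not study data). It compares an unweighted mean of per-user percents with a mean weighted by each user's overall tweet count.

```python
# (misogynistic tweets, overall tweets) per user; hypothetical values.
users = [(1, 2), (5, 98)]

rates = [flagged / total for flagged, total in users]
weights = [total for _, total in users]

# Unweighted: every user counts equally, however few tweets they sent.
unweighted_mean = sum(rates) / len(rates)

# Weighted by overall tweets: equivalent to pooling all tweets together.
weighted_mean = sum(r * w for r, w in zip(rates, weights)) / sum(weights)
```

    Here the low-activity user (1 of 2 tweets) pulls the unweighted mean far above the pooled rate of 6 in 100, while the weighted mean stays at that pooled rate.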

    Graphs: We show non-regression, weighted means. The vertical bars represent 95% confidence intervals. For each group’s weighted mean, the standard error was computed from a bootstrapped sampling distribution of 200 weighted means. And the “Post - Pre” value, [e.g., “-0.04 (0.02)”], is a weighted mean, followed by a standard error in parentheses.
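    A bootstrap of this kind might look like the sketch below. It assumes users are resampled with replacement and that the 95% interval is formed as the estimate plus or minus 1.96 standard errors; the text states neither detail, so both are assumptions, and the per-user values are made up for the example. Only the 200 resamples come from the text.

```python
import random
import statistics


def weighted_mean(users):
    """Mean of per-user rates, weighted by each user's overall tweet count."""
    total = sum(n for _, n in users)
    return sum(rate * n for rate, n in users) / total


def bootstrap_se(users, n_boot=200, seed=0):
    """Standard error from a bootstrapped distribution of weighted means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        # Resample users with replacement, keeping the sample size fixed.
        sample = [rng.choice(users) for _ in users]
        means.append(weighted_mean(sample))
    return statistics.stdev(means)


# (rate of misogynistic tweets, overall tweets) per user; hypothetical values.
group = [(0.10, 120), (0.25, 40), (0.05, 300), (0.40, 15), (0.15, 80), (0.00, 60)]

estimate = weighted_mean(group)
se = bootstrap_se(group)
ci_low, ci_high = estimate - 1.96 * se, estimate + 1.96 * se  # normal approx.
```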

    Model: We used weighted least squares rather than difference-in-differences regression to estimate the social experiment impact, as weighted least squares regression has a slightly more flexible functional form. It allows the coefficient on pre-treatment percent of misogynistic tweets to differ from 1.0. It also allows straightforward weighting (i.e., weighting each user by number of overall tweets for more reliability).

    Equation: post-treatment percent of misogynistic tweets = intercept + b1 * (pre-treatment percent of misogynistic tweets) + b2 * treatment_bot1 + b3 * treatment_bot2 + b4 * treatment_bot3 + b5 * treatment_bot4
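    For readers who want to see the mechanics, the following is a standard-library sketch of weighted least squares for a model of this shape: an intercept, the pre-treatment percent, and one 0/1 indicator per treatment bot, with each user weighted by overall tweet count. The data are synthetic and noise-free, the "true" coefficients are invented, and treating the bot terms as 0/1 indicators is our assumption. In practice a statistics library (for example the WLS model in statsmodels) would be the usual choice.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def weighted_least_squares(rows, y, weights):
    """Coefficients minimizing sum_i w_i * (y_i - x_i . beta)^2."""
    k = len(rows[0])
    xtwx = [[sum(w * x[i] * x[j] for x, w in zip(rows, weights))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * x[i] * t for x, t, w in zip(rows, y, weights))
            for i in range(k)]
    return solve(xtwx, xtwy)


# Invented coefficients: intercept, pre-treatment percent, bots 1 to 4.
true_beta = [0.02, 0.8, -0.01, 0.0, 0.01, -0.02]

rows, weights = [], []
for group in range(5):  # 0 = control, 1..4 = treatment bots
    for pre, n_tweets in [(0.10, 50), (0.30, 200), (0.20, 120)]:
        dummies = [1.0 if group == bot else 0.0 for bot in range(1, 5)]
        rows.append([1.0, pre] + dummies)
        weights.append(n_tweets)  # weight = user's overall tweet count

# Noise-free outcomes, so the fit should recover true_beta.
y = [sum(b * x for b, x in zip(true_beta, row)) for row in rows]
beta_hat = weighted_least_squares(rows, y, weights)
```

    Because the synthetic outcomes contain no noise, `beta_hat` matches `true_beta` up to floating-point error; with real data the bot coefficients would estimate each bot's effect relative to control.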

    R-squared: 0.530, Adjusted R-squared: 0.528

    Distribution: The dependent variable (post-treatment percent of misogynistic tweets) is skewed rather than normally distributed. However, by the Central Limit Theorem, as samples become large the sampling distribution of an estimate approaches a normal distribution, so the regression coefficient estimates will be approximately normally distributed even if the dependent variable isn’t.

    Outliers: Since some Twitter “users” are bots with high tweet activity, we researched several methods for outlier removal: standard deviations, interquartile range, log transformation, median absolute deviation, and top 5% trimming. However, because it’s best to keep all observations unless clear evidence for a specific observation shows otherwise, we proceeded without outlier removal. More generally, discretionary outlier removal can open the door to p-hacking, a controversial practice.


    The free Gender Harassment Tweets Blocker (beta release on the Chrome Web Store) is the second product. With positive user experience as our top priority, a layer of web security was added in December 2017 before the recent official release. Our product was first demonstrated at UC Berkeley and at Google in April and May 2017, respectively. Women can download this Chrome extension to automatically block tweets that the product predicts to be gender harassment. Based on feedback, women have 3 customizable features to start.

    Feel free to view this 5-minute video tutorial on YouTube and/or the descriptions below.

    1) As highlighted in the red boxes below, one can adjust "Threshold" to their preferred level. If a user selects 0.60, for instance, then the Chrome extension will block tweets that it predicts to have 60%+ likelihood of harassment. A user can click "Set" to save their threshold, and then refresh the webpage to apply the update. Some women interviewees expressed interest in blocking only the most extreme tweets (e.g., Threshold = 0.90), whereas other women interviewees expressed interest in blocking a greater share of gender harassment tweets (e.g., Threshold = 0.60).

    2) As highlighted in the red box below, one can click "Show" to unhide a tweet for any reason.

    3) As highlighted in the red box below, one can click the button to flag tweets as gender harassment that the browser extension didn't block (similar to clicking spam in email). Then the button turns red. Or, one can flag tweets as not gender harassment (similar to restoring email that went to spam incorrectly). Then the button turns gray. The Chrome extension will remember your preference after you refresh the webpage.
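    The threshold and "Show" behavior described above can be illustrated with a short sketch. It is written in Python for consistency with the other examples here and is not the extension's actual code; the function name, the tweet fields, and the sample probabilities are hypothetical.

```python
def visible_tweets(tweets, threshold=0.70, shown_ids=frozenset()):
    """Return the tweets that remain visible to the user.

    A tweet is hidden when its predicted harassment probability is at or
    above the threshold, unless the user has clicked "Show" on it.
    """
    return [t for t in tweets
            if t["prob"] < threshold or t["id"] in shown_ids]


# Hypothetical timeline with predicted harassment probabilities.
timeline = [
    {"id": 1, "prob": 0.55},
    {"id": 2, "prob": 0.65},
    {"id": 3, "prob": 0.80},
    {"id": 4, "prob": 0.95},
]

strict = visible_tweets(timeline, threshold=0.60)    # blocks more tweets
lenient = visible_tweets(timeline, threshold=0.90)   # blocks only extremes
unhidden = visible_tweets(timeline, threshold=0.90, shown_ids={4})
```

    A lower threshold hides a larger share of tweets at the cost of more false alarms; a higher one hides only the tweets the model is most confident about.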

    Artificial Intelligence

    We used active learning (machine learning) to collect enough tweets (dated 2017 and earlier) that humans, such as women workers on Mechanical Turk, regard as gender harassment. That allowed our AI models to better learn and predict what humans regard as gender harassment language and symbols.

    We found that roughly 0.09% of tweets (9 out of 10,000) were detected as harassment. Active learning (machine learning) helped us avoid reading through 10,000 tweets to find roughly 9 harassment ones. We first labeled 1K tweets gathered via various methods (e.g., Twitter live stream via API, Twitter keyword searches via API, harassment tweets via articles), then used our earliest baseline model (Logistic Regression) to output predicted probabilities on the tweets, before starting the cycle of active learning. Below are actual tweets we presented to an initial audience on 2/15/2017.

    [If you prefer not to read what many regard as highly offensive / misogynistic tweets, please bypass the table below, and jump to the next circular diagram.]

    With active learning -- iteratively moving back and forth between data collection and machine learning -- we retrained our earliest baseline model, improving its ability to predict probability of harassment on past labeled tweets and new unlabeled tweets. For instance, our earliest model predicted the tweet, “you deserve vagina cancer”, at only 50.6% probability of harassment. As the model learned further, it eventually predicted that tweet with 70%+ probability of harassment. Our active learning process. . .
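    As a rough illustration of such a cycle, the sketch below alternates between training a model and sending the highest-scoring unlabeled tweets to a human labeler. Everything in it is a stand-in: the "model" is a toy word list rather than Logistic Regression, the tweets use placeholder tokens, the `oracle` function plays the role of human labelers, and selecting the highest-scoring tweets first is an assumption, since the text does not say how tweets were chosen for labeling.

```python
def train(labeled):
    """Toy 'model': words seen only in tweets labeled as harassment."""
    pos = {w for text, label in labeled if label for w in text.split()}
    neg = {w for text, label in labeled if not label for w in text.split()}
    return pos - neg


def score(model, text):
    """Fraction of a tweet's words the toy model links to harassment."""
    words = text.split()
    return sum(w in model for w in words) / len(words)


def active_learning(labeled, unlabeled, oracle, rounds=2, batch=2):
    """Alternate between retraining and labeling the top-scoring tweets."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        # Rank unlabeled tweets so likely harassment is labeled first.
        unlabeled.sort(key=lambda t: score(model, t), reverse=True)
        picked, unlabeled = unlabeled[:batch], unlabeled[batch:]
        labeled += [(t, oracle(t)) for t in picked]
    return train(labeled), labeled


def oracle(text):
    """Stand-in for a human labeler; flags the placeholder tokens."""
    return any(w.startswith("slur") for w in text.split())


seed_labels = [("you are slur_a", True), ("nice weather today", False)]
pool = ["slur_a and slur_b again", "lovely day out",
        "total slur_b", "see you today"]

first_model = train(seed_labels)
final_model, all_labels = active_learning(seed_labels, pool, oracle)

before = score(first_model, "total slur_b")
after = score(final_model, "total slur_b")
```

    The toy model initially gives "total slur_b" a score of zero and a higher score after two rounds, loosely mirroring how the real model's probability for a given tweet rose as it saw more labeled examples.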

    Our data collection process. . .

    We chose to leverage and tune three categories of models: Gradient Boosting Decision Trees (GBDTs), Feed Forward Neural Networks (FNNs), and Logistic Regression (LR). And we sought not the best single model but the best combination of models. Our artificial intelligence process. . .

    Rather than take a tweet's predicted probability of harassment from one model, we took a tweet's average predicted probability of harassment across multiple models for better reliability. Different models can make different mistakes in predicting the likelihood that tweets are offensive. For instance, for specific tweets, two models might predict low probability of gender harassment incorrectly, whereas six models might predict high probability of gender harassment correctly. By taking their average, the models can compensate for each other.
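    A minimal sketch of this averaging, using invented per-model probabilities for a single tweet (two models that wrongly score it low, six that score it high) and the 70% default threshold mentioned elsewhere on this page:

```python
from statistics import fmean

THRESHOLD = 0.70  # default harassment probability threshold


def ensemble_probability(model_probs):
    """Average the predicted harassment probability across all models."""
    return fmean(model_probs)


# Hypothetical predictions from 8 models for one harassing tweet.
probs = [0.20, 0.30, 0.90, 0.85, 0.95, 0.80, 0.90, 0.85]

avg = ensemble_probability(probs)
flagged = avg >= THRESHOLD
```

    Despite the two low predictions, the average stays just above the threshold, so the tweet is still flagged.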

    We used an automated approach that viewed the results of thousands of ensembles (where one ensemble refers to one combination of models). The graph below shows results for three separate combinations of models. The combination in the leftmost column is tied to our user-facing products: the Twitter Bot and Gender Harassment Tweets Blocker. Note that our flexible approach allows replacement of one ensemble with another ensemble, if users and stakeholders prefer different performance. [Technical Language (Optional): As an interim step, we trained our final ensemble on the labeled train + validation data, then ran it on the labeled test data once (AUC: 91.3%, Precision: 78.9%, Recall: 32.5%). Then we proceeded to create a sampling distribution of results. Our final ensemble and specific alternative ensembles were eventually retrained on all labeled data (train + validation + test data), before linking our final ensemble to our user-facing products in the wild.]

    As we collect more labeled tweets via various channels, including via the Gender Harassment Tweets Blocker, our ensemble of models should improve even further.

    [Technical Language (Optional): Our final models analyze words, not characters, despite our preference for some models in our ensemble to analyze characters. For instance, we initially analyzed characters as well, leveraging vectorizer "analyzer='char'" with random search across "ngram_range=(1,4)". Some of our initial GBDT models achieved about 95%+ precision and 80% recall. However, GBDT feature importances revealed that some single characters such as " ' " took too much importance in the predicted probability of harassment, despite limited occurrences. So, we concluded that a dataset larger than 18.8K tweets seems necessary to analyze characters in the future, and that we should not use the character-level method until then. Thus, to be fair and reasonable, we discarded ensembles which use that method and yield better results, and instead selected ensembles that both perform well and should generalize to new tweets in the Twitter universe. As revealed in the graph above, we built a sampling distribution to show not only our averages, but our standard errors around those averages. The small standard errors indicate each ensemble's consistent performance on 15 cross-validation samples of 6K+ tweets. The 15 samples were derived from randomizing the seed for 5 iterations and, within each iteration, implementing 3-fold cross-validation.]
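    The way 5 seeds and 3 folds yield 15 evaluation samples of 6K+ tweets each can be sketched as below. The dataset size is taken as 18,800, a rounding of the "18.8K" figure above, and the shuffling scheme and function name are our own illustrative choices rather than the original pipeline.

```python
import random

N_TWEETS = 18_800  # rounded from the "18.8K" labeled tweets in the text


def three_fold_indices(n_items, seed, n_folds=3):
    """Shuffle item indices with the given seed and split them into folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    # Striding gives folds whose sizes differ by at most one item.
    return [idx[f::n_folds] for f in range(n_folds)]


samples = []
for seed in range(5):  # 5 iterations, each with a different seed
    for held_out in three_fold_indices(N_TWEETS, seed):
        # Each held-out fold is one evaluation sample; the model would be
        # trained on the remaining two folds.
        samples.append(held_out)

fold_sizes = sorted({len(s) for s in samples})
```

    Each held-out fold contains about a third of the labeled tweets, which is where the "6K+" sample size comes from.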

    Our combination of 8 models (5 GBDTs, 2 FNNs, and 1 LR) yielded gender harassment probabilities across a sample of 46.2 million tweets. . .

    If users collectively tweet an average of 500 million times a day (David Sayce, November 2016; Business Insider, June 2015), our products (if scaled) could have not only detected but responded to around 1.18 million tweets per week. In full transparency, that also means our products could have incorrectly flagged about 331,000 tweets per week. However, the AI underlying our Twitter Bot and Gender Harassment Tweets Blocker can already be tuned to make a larger share of its predictions correct, in exchange for a lower detection rate of harassment tweets. For instance, users of the Gender Harassment Tweets Blocker can change the default of 0.70 (hiding tweets with a 70%+ chance of harassment) to 0.85 (hiding tweets with an 85%+ chance of harassment) to have fewer tweets incorrectly flagged as gender harassment.

    Projected number of harassment tweets that our AI could have detected on Twitter's full dataset (3/6/2017 - 4/16/2017). The horizontal green line reflects the projected average of 168K harassment tweets a day across the 6-week timeframe.
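    The projection figures quoted in this section can be cross-checked with simple arithmetic. The sketch assumes that the roughly 1.18 million weekly tweets are correct detections and that the 331,000 incorrectly flagged tweets are in addition to them; the text does not spell out how the two figures relate, so the implied precision below is only an interpretation.

```python
TWEETS_PER_DAY = 500_000_000   # cited average daily tweet volume
DETECTED_PER_DAY = 168_000     # projected daily harassment tweets detected
INCORRECT_PER_WEEK = 331_000   # projected weekly incorrect flags

# 168K a day over 7 days gives the "around 1.18 million" weekly figure.
detected_per_week = DETECTED_PER_DAY * 7

# Share of all tweets that would be detected as harassment.
detected_share = DETECTED_PER_DAY / TWEETS_PER_DAY

# Assumption: incorrect flags are additional to the correct detections.
implied_precision = detected_per_week / (detected_per_week + INCORRECT_PER_WEEK)
```

    Under that assumption the implied precision is about 78%, which falls within the 76-80% accuracy range reported at the top of this page.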

    Future Possibilities

    Implement an existing list of user feedback for the Gender Harassment Tweets Blocker

    Continue to learn from users on product concepts aimed to mitigate online gender harassment

    Reach out to writers and organizations whose gender-harassment research and advocacy have inspired us, to consider partnerships

    Create a corresponding tweets blocker for phone and tablet given 82% of Twitter active users are on mobile

    Continue field experiments for cause-effect conclusions on product impact

    Collect more labeled tweets and/or try other AI methods to further detect and mitigate online gender harassment


    Derek S. Chan

    Sr. Product Manager - Cognitive at Automation Anywhere; Artist in live theater

    Shruti van Hemmen

    Data Scientist at Intelisent

    Apekshit Sharma

    Software Engineer, Cloudera

    Women Who Granted Anonymized Interviews


    Joyce Shen

    Investment Director at Tenfore Holdings; Lecturer at UC Berkeley

    Alberto Todeschini

    Lecturer at UC Berkeley

    D. Alex Hughes

    Lecturer at UC Berkeley