Ever wonder if we can quantify gender bias in society?
Using machine learning, we can generate word associations present in a given media source.
By looking at those associations we can tell how closely words are related to women or men.
The Gender Graph project allows users to plot where words lie on a scale of "he" to "she" based on a selected media source.
Enter your words and observe the differences that exist in the way we perceive gender.
Manifesto
This chart reveals that the media commonly associates toxic
words with women. We consume this media every day and therefore subliminally
inherit these biases. Much of our community believes that feminism isn’t
relevant anymore as women and men have “equal rights”. Hopefully this scientific
evidence will be concrete proof of the disparities that exist in the way we perceive
gender, and that we still have a long way to go.
Training sources
The Gender Graph Project currently has three trained models (a training sketch follows the list):
Wiki is a dump of English Wikipedia that includes 70,000 unique words. This dataset is also known as text8.
Reddit includes ~1.7 billion publicly available Reddit comments and contains 2 million unique words.
Google News includes 94,829 news articles posted on the Google News website.
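As an illustration, here is a minimal sketch of how a model such as the Wiki one could be trained on the text8 corpus. It assumes the gensim library (version 4.x) is used; the project's actual training pipeline, parameters, and file names are not specified here, so everything below is illustrative.

    # Hypothetical training run for a "Wiki"-style model on the text8 corpus (gensim 4.x).
    from gensim.models import Word2Vec
    from gensim.models.word2vec import Text8Corpus

    sentences = Text8Corpus("text8")  # path to the downloaded text8 file
    model = Word2Vec(sentences, vector_size=100, window=5, min_count=5, workers=4)
    model.wv.save("wiki.kv")          # keep only the word vectors for later scoring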
How does it work?
In order for computers to understand English words, they need to be
converted to numbers. In particular, each word can be represented as a
point in a multidimensional space, which can be roughly visualized in two dimensions.
We use the word2vec tool to generate these word vectors based on the semantic
relationships between words in a given text source. This collection of word
vectors is called a model. We wrote a custom tool that uses this model to
score user words in relation to a given pair of words (in our case, "he" and "she").
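As a hedged sketch of this step, the following assumes the model is stored as gensim KeyedVectors; the file name "wiki.kv" and the example word "doctor" are illustrative, not part of the project.

    # Load a trained model and look up the vectors needed for scoring.
    from gensim.models import KeyedVectors

    wv = KeyedVectors.load("wiki.kv")
    he, she = wv["he"], wv["she"]  # anchor vectors defining the gender axis
    word = wv["doctor"]            # any user-supplied word present in the vocabulary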
In order to quantify whether a word is more commonly associated with women
or men, we measure how far the word lies from "she" and from "he".
Mathematically, this is accomplished by finding the vector direction
between "she" and "he" and projecting user words onto this vector using
simple vector operations such as the dot product.
The length of the projection onto this axis gives us an association score,
where values closer to 0.0 are related to “he”, and values closer to 1.0 are related to “she”.
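One possible implementation of this projection is sketched below. The exact normalization used by the project (for example, whether scores are clipped to the [0, 1] range) is an assumption here, not something the text specifies.

    import numpy as np

    def association_score(word, he, she):
        # Project `word` onto the direction from "he" to "she" and
        # normalize so that "he" maps to 0.0 and "she" maps to 1.0.
        axis = she - he
        t = np.dot(word - he, axis) / np.dot(axis, axis)
        return float(np.clip(t, 0.0, 1.0))

    score = association_score(word, he, she)  # vectors loaded in the previous sketch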
This approach gives us a good picture of semantic biases in the media. However, it is important to understand that in reality
these models are not perfect. Factors such as data quantity, data quality, and algorithmic imperfections may introduce
noise into the model.