Representation

Min Country Economic Representation Count

This test checks the data regarding the sample counts of countries by economic levels.

alias_name: min_country_economic_representation_count

This data was curated using World Bank data. To apply this test appropriately in other contexts, please adapt the data dictionaries.

Config

min_country_economic_representation_count:
    min_count:
        high_income: 50
        low_income: 50

min_count (int): Minimum count to pass the test.
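
As a rough illustration of what a minimum-count check does, the sketch below counts samples per category and compares each count with its configured threshold. The helper name `check_min_counts` and the input format (one category label per sample) are assumptions made for this example; they are not part of LangTest's API.

```python
from collections import Counter


def check_min_counts(sample_categories, min_count):
    """Return {category: (observed_count, passed)} for each configured category.

    Hypothetical helper for illustration only, not part of LangTest.
    """
    observed = Counter(sample_categories)
    return {
        category: (observed[category], observed[category] >= threshold)
        for category, threshold in min_count.items()
    }


# Illustrative data: one economic-level label per sample.
samples = ["high_income"] * 60 + ["low_income"] * 20
results = check_min_counts(samples, {"high_income": 50, "low_income": 50})

for category, (count, passed) in results.items():
    print(f"{category}: count={count}, pass={passed}")
```

In this made-up dataset the high_income category meets its threshold of 50 while low_income, with only 20 samples, does not.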

Min Country Economic Representation Proportion

This test checks the data regarding the sample proportions of countries by economic levels.

alias_name: min_country_economic_representation_proportion

This data was curated using World Bank data. To apply this test appropriately in other contexts, please adapt the data dictionaries.

Config

min_country_economic_representation_proportion:
    min_proportion: 
        high_income: 0.6
        low_income: 0.1

min_proportion (float): Minimum proportion to pass the test.
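
The proportion variant can be pictured the same way: each category's share of the samples is compared with its threshold. This is a minimal sketch under stated assumptions. The helper name `check_min_proportions` is hypothetical, and dividing by the total number of labelled samples is an assumption about how the share is computed, not a statement about LangTest's internals.

```python
from collections import Counter


def check_min_proportions(sample_categories, min_proportion):
    """Return {category: (observed_proportion, passed)} per configured category.

    Hypothetical helper for illustration only, not part of LangTest.
    Assumption: the denominator is the total number of labelled samples.
    """
    observed = Counter(sample_categories)
    total = sum(observed.values())
    report = {}
    for category, threshold in min_proportion.items():
        proportion = observed[category] / total if total else 0.0
        report[category] = (proportion, proportion >= threshold)
    return report


# Illustrative data: 7 high-income samples, 3 from another level, none low-income.
samples = ["high_income"] * 7 + ["upper_middle_income"] * 3
report = check_min_proportions(samples, {"high_income": 0.6, "low_income": 0.1})

for category, (proportion, passed) in report.items():
    print(f"{category}: proportion={proportion:.2f}, pass={passed}")
```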

Min Ethnicity Representation Count

This test checks the data regarding the sample counts of ethnicities.

alias_name: min_ethnicity_name_representation_count

This data was curated using 2021 US census survey data. To apply this test appropriately in other contexts, please adapt the data dictionaries.

Config

min_ethnicity_name_representation_count:
    min_count: 
        white: 50
        black: 10
        asian: 40
        hispanic: 30           

min_count (int): Minimum count to pass the test.

Min Ethnicity Representation Proportion

This test checks the data regarding the sample proportions of ethnicities.

alias_name: min_ethnicity_name_representation_proportion

This data was curated using 2021 US census survey data. To apply this test appropriately in other contexts, please adapt the data dictionaries.

Config

min_ethnicity_name_representation_proportion:
    min_proportion: 
        white: 0.20
        black: 0.36                

min_proportion (float): Minimum proportion to pass the test.

Min Gender Representation Count

This test checks the data regarding the sample counts of genders.

alias_name: min_gender_representation_count

The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, or neutral.

Config

min_gender_representation_count:
    min_count: 
        male: 20
        female: 30

min_count (int): Minimum count to pass the test.

Min Gender Representation Proportion

This test checks the data regarding the sample proportions of genders.

alias_name: min_gender_representation_proportion

The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, or neutral.

Config

min_gender_representation_proportion:
    min_proportion:
        male: 0.2
        female: 0.3

min_proportion (float): Minimum proportion to pass the test.

Min Label Representation Count

This test checks the data regarding the sample counts of labels.

alias_name: min_label_representation_count

Config

min_label_representation_count:
    min_count: 
        positive: 10
        negative: 10

min_count (int): Minimum count to pass the test.

Min Label Representation Proportion

This test checks the data regarding the sample proportions of labels.

alias_name: min_label_representation_proportion

Config

min_label_representation_proportion:
    min_proportion: 
        O: 0.2
        LOC: 0.2
        PER: 0.2

min_proportion (float): Minimum proportion to pass the test.

Min Religion Name Representation Count

This test checks the data regarding the sample counts of religions.

alias_name: min_religion_name_representation_count

This data was curated using Kidpaw. Please adapt the data dictionaries to fit your use-case.

Config

min_religion_name_representation_count:
    min_count: 
        christian: 10
        muslim: 5
        hindu: 8
        parsi: 40
        sikh: 10

min_count (int): Minimum count to pass the test.

Min Religion Name Representation Proportion

This test checks the data regarding the sample proportions of religions.

alias_name: min_religion_name_representation_proportion

This data was curated using Kidpaw. Please adapt the data dictionaries to fit your use-case.

Config

min_religion_name_representation_proportion:
    min_proportion: 
        muslim: 0.2
        hindu: 0.2

min_proportion (float): Minimum proportion to pass the test.

Custom Representation

Supported custom representation data categories:

Country-Economic-Representation
Religion-Representation
Ethnicity-Representation
Label-Representation (NER only)

How to Add Custom Representation

To add custom representation data, follow these steps:

# Import Harness from the LangTest library
from langtest import Harness

# Create a Harness object
harness = Harness(
    task="ner",
    model='en_core_web_sm',
    hub="spacy"
)

# Load custom representation data for ethnicity representation
harness.pass_custom_data(
    file_path='ethnicity_representation_data.json',
    test_name="Ethnicity-Representation",
    task="representation"
)
     

When adding custom representation data, it’s important to note that each custom representation category may have a different data format for the JSON file. Ensure that the JSON file adheres to the specific format required for each category.

Additionally, it’s important to remember that when you add custom representation data, it will affect a particular set of representation tests based on the category and data provided.

To learn more about the data format and how to structure the JSON file for custom representation data, you can refer to the tutorial available here.
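
To make the link between custom data and the affected tests concrete, here is a minimal sketch of a configuration that enables the two ethnicity tests documented above. It assumes that representation tests are nested under `tests` and then `representation` in the harness configuration, and that the dictionary can be passed to `harness.configure`. The `min_pass_rate` value is purely illustrative. The harness calls are left commented out so that the snippet runs without LangTest installed.

```python
# Assumption: representation tests live under "tests" -> "representation".
# The thresholds are copied from the Config examples in this document.
config = {
    "tests": {
        "defaults": {"min_pass_rate": 1.0},  # illustrative value
        "representation": {
            "min_ethnicity_name_representation_count": {
                "min_count": {"white": 50, "black": 10, "asian": 40, "hispanic": 30}
            },
            "min_ethnicity_name_representation_proportion": {
                "min_proportion": {"white": 0.20, "black": 0.36}
            },
        },
    }
}

# List which representation tests this configuration would enable.
enabled_tests = sorted(config["tests"]["representation"])
print(enabled_tests)

# With the harness created in the example above (not executed here):
# harness.configure(config)
# harness.generate().run().report()
```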