Running Test Cases

The run() method is called after the generate() method and executes all of the specified tests, returning a pass/fail flag for each test. There are two ways to run the tests:

  • Running All Test Cases at Once

      h.run()
    
  • Checkpointing Mechanism for Large Test Sets

      h.run(checkpoint=True, batch_size=500, save_checkpoints_dir="checkpoints")
    

    To handle large test sets effectively, you can enable checkpointing. Here’s what each parameter signifies:

    • checkpoint: Enable this option to activate the checkpointing mechanism.
    • batch_size: This parameter specifies the number of test cases processed per batch.
    • save_checkpoints_dir: Use this option to define the directory where checkpoints and intermediate results will be saved.

    If the kernel restarts or an API failure occurs, you can resume execution from the last saved checkpoint, so already processed model responses are not lost:

      h = Harness.load_checkpoints(
          save_checkpoints_dir="checkpoints",
          task="text-classification",
          model={"model": "lvwerra/distilbert-imdb", "hub": "huggingface"},
      )
      h.run(checkpoint=True, batch_size=500, save_checkpoints_dir="checkpoints")         
    
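Putting the pieces together, here is a minimal end-to-end sketch that creates a harness, generates the test cases, and then runs them. It reuses the model from the checkpointing example above; depending on your task and setup, additional Harness arguments (such as a data configuration) may be required, so treat this as an illustration rather than a complete recipe.

    from langtest import Harness

    # Create a harness for the same text-classification model used above.
    # Additional arguments (e.g. a data config) may be needed for your setup.
    h = Harness(
        task="text-classification",
        model={"model": "lvwerra/distilbert-imdb", "hub": "huggingface"},
    )

    h.generate()  # build the test cases first
    h.run()       # then execute them, producing a pass/fail flag per test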

Once the tests have been run with the run() method, the results can be accessed with the generated_results() method.

h.generated_results()

This method returns the generated results as a pandas DataFrame, a convenient and easy-to-use format for working with the test results. You can use it to quickly identify the failing test cases and determine where fixes are needed.

A generated results DataFrame looks like this:

    category    test_type   original           test_case          expected_result   actual_result   pass
    robustness  lowercase   I live in Berlin   i live in berlin   Berlin: LOC                       False
    robustness  uppercase   I live in Berlin   I LIVE IN BERLIN   Berlin: LOC       BERLIN: LOC     True
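
Since the results come back as a pandas DataFrame, the usual pandas operations apply. As a small sketch (assuming only the column names shown above), you could filter for the failing test cases like this:

    df = h.generated_results()

    # Keep only the rows whose test did not pass.
    failed = df[df["pass"] == False]
    print(failed[["category", "test_type", "test_case", "actual_result"]])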