
MLFlow Tracking


The MLFlow tracking server is used to record metrics and statistics for analyzing how the generative AI system is performing.
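
As a minimal sketch of how such statistics might be recorded, the snippet below logs metrics to a tracking server with MLFlow's Python API. The server URI, experiment name, and run name are placeholders, not values from this project.

```python
import mlflow

# Point the client at the tracking server (placeholder URI --
# substitute the address of the actual MLFlow server).
mlflow.set_tracking_uri("http://localhost:5000")

# Group evaluation runs under an experiment (hypothetical name).
mlflow.set_experiment("genai-evaluation")

with mlflow.start_run(run_name="example-eval"):
    # Log the classification metrics defined below.
    mlflow.log_metric("precision", 0.70)
    mlflow.log_metric("recall", 0.75)
    mlflow.log_metric("f1_score", 0.724)
```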

Statistics currently measured:

Simple definitions for Precision, Recall, and F1 Score are given below; a short code sketch after the definitions reproduces the worked examples.

  1. Precision:

    • Definition: Precision is the ratio of correctly predicted positive observations to the total predicted positives. It tells us how many of the predicted positive cases were actually correct.

    • Formula: \( \text{Precision} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Positives (FP)}} \)

    • Example: If a model predicts 10 positive cases and 7 of them are actually positive, the precision is \( \frac{7}{10} = 0.7 \) or 70%.

  2. Recall (also known as Sensitivity or True Positive Rate):

    • Definition: Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class. It measures how well the model identifies all relevant cases.

    • Formula: \( \text{Recall} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Negatives (FN)}} \)

    • Example: If there are 20 actual positive cases and the model correctly identifies 15 of them, the recall is \( \frac{15}{20} = 0.75 \) or 75%.

  3. F1 Score:

    • Definition: The F1 Score is the harmonic mean of precision and recall. It provides a single metric that balances both the precision and recall of the model.

    • Formula: \( \text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \)

    • Example: If the precision is 0.7 and the recall is 0.75, the F1 Score is \( 2 \times \frac{0.7 \times 0.75}{0.7 + 0.75} = 2 \times \frac{0.525}{1.45} \approx 0.724 \) or 72.4%.

These metrics are often used to evaluate the performance of classification models, especially in cases where the classes are imbalanced.
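
To make the arithmetic concrete, here is a minimal Python sketch that reproduces the three worked examples above. The helper functions are illustrative only and are not part of MLFlow's API.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that the model identified."""
    return tp / (tp + fn)

def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Precision example: 10 predicted positives, 7 of them correct.
p = precision(tp=7, fp=3)    # 7 / 10 = 0.70
# Recall example: 20 actual positives, 15 correctly identified.
r = recall(tp=15, fn=5)      # 15 / 20 = 0.75
# F1 example: combine the two values above.
print(f"precision={p:.2f} recall={r:.2f} f1={f1_score(p, r):.3f}")
# -> precision=0.70 recall=0.75 f1=0.724
```

Note that, as in the text, the precision and recall examples describe two separate scenarios, so they are computed from independent counts here rather than from a single confusion matrix.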
