
MLFlow Tracking


The MLFlow tracking server is used to record metrics and statistics for analyzing how the generative AI system is performing.
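
As a minimal sketch of how such statistics might be recorded, the snippet below logs metrics to a tracking server with MLFlow's Python API. The server URI, experiment name, and run name are placeholders, not values from this project.

```python
import mlflow

# Point the client at the tracking server (placeholder URI --
# substitute the address of the actual MLFlow server).
mlflow.set_tracking_uri("http://localhost:5000")

# Group evaluation runs under an experiment (hypothetical name).
mlflow.set_experiment("genai-evaluation")

with mlflow.start_run(run_name="example-eval"):
    # Log the classification metrics defined below.
    mlflow.log_metric("precision", 0.70)
    mlflow.log_metric("recall", 0.75)
    mlflow.log_metric("f1_score", 0.724)
```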

Statistics currently measured:

Simple definitions for Precision, Recall, and F1 Score are given below; a short code sketch after the definitions reproduces the worked examples.

  1. Precision:

    • Definition: Precision is the ratio of correctly predicted positive observations to the total predicted positives. It tells us how many of the predicted positive cases were actually correct.

    • Formula: \( \text{Precision} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Positives (FP)}} \)

    • Example: If a model predicts 10 positive cases and 7 of them are actually positive, the precision is \( \frac{7}{10} = 0.7 \) or 70%.

  2. Recall (also known as Sensitivity or True Positive Rate):

    • Definition: Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class. It measures how well the model identifies all relevant cases.

    • Formula: \( \text{Recall} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Negatives (FN)}} \)

    • Example: If there are 20 actual positive cases and the model correctly identifies 15 of them, the recall is \( \frac{15}{20} = 0.75 \) or 75%.

  3. F1 Score:

    • Definition: The F1 Score is the harmonic mean of precision and recall. It provides a single metric that balances both the precision and recall of the model.

    • Formula: \( \text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \)

    • Example: If the precision is 0.7 and the recall is 0.75, the F1 Score is \( 2 \times \frac{0.7 \times 0.75}{0.7 + 0.75} = 2 \times \frac{0.525}{1.45} \approx 0.724 \) or 72.4%.

These metrics are often used to evaluate the performance of classification models, especially in cases where the classes are imbalanced.
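
To make the arithmetic concrete, here is a minimal Python sketch that reproduces the three worked examples above. The helper functions are illustrative only and are not part of MLFlow's API.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that the model identified."""
    return tp / (tp + fn)

def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Precision example: 10 predicted positives, 7 of them correct.
p = precision(tp=7, fp=3)    # 7 / 10 = 0.70
# Recall example: 20 actual positives, 15 correctly identified.
r = recall(tp=15, fn=5)      # 15 / 20 = 0.75
# F1 example: combine the two values above.
print(f"precision={p:.2f} recall={r:.2f} f1={f1_score(p, r):.3f}")
# -> precision=0.70 recall=0.75 f1=0.724
```

Note that, as in the text, the precision and recall examples describe two separate scenarios, so they are computed from independent counts here rather than from a single confusion matrix.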
