UI decisions #6

@ClimbsRocks

  1. If the user passes in a list of more than one model_id, or calls predict_all, we will add only one item to our DB to track features, to be somewhat more space efficient.
  2. If they pass in a single model at a time to .predict, we will save the data on every call.
  3. The user can pass in the same row of data multiple times, and we will add it to our DB each time. (They will be able to pass in drop_duplicates=True at analytics time.)
  4. Our analyze_discrepancies function will take in a "print=True" param, which will use tabulate to pretty-print a table of features and their results.
  5. analyze_discrepancies will return a list of dictionaries (sorted by feature importances, if they exist).
  6. Each dictionary will represent a row.
  7. The first dictionary will contain summary information (in aggregate, how much predictions differ between the two envs, how much the actuals differ, avg values for each, how many rows have 0 discrepancies, total counts of missing features, etc.).
  8. Each row will have feature_name, avg_val, num_missing, avg_discrepancy, median_discrepancy, avg_abs_discrepancy, and median_abs_discrepancy, plus each of those expressed as a percent of the "usable range" of that feature (95th percentile - 5th percentile at training time).
  9. analyze_discrepancies will take in an optional model_id. If it is None, we'll look at all of our data.
  10. Tracking avg features will not focus on discrepancies at all (tracking features becoming more or less out of whack over time is out of scope for MVP). It will only focus on showing serving-time values.
  11. The user can pass in a feature name to track_features, and we will only return results for that feature.
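
To make the return shape in items 4 through 8 concrete, here is a rough sketch, not the actual implementation. Several things are assumptions for illustration only: the signature (paired lists of training-time and serving-time rows plus a dict of usable ranges), the definition of a discrepancy as serving value minus training value, the `_pct_of_range` key suffix, the summary keys, and the `print_table` name (used in place of `print` to avoid shadowing the builtin). Only the discrepancy stats are scaled by usable range here, and model_id filtering and drop_duplicates are left out.

```python
from statistics import mean, median

STAT_KEYS = ("avg_discrepancy", "median_discrepancy",
             "avg_abs_discrepancy", "median_abs_discrepancy")


def _stat(fn, values):
    """Apply fn to values, or return None when there is nothing to aggregate."""
    return fn(values) if values else None


def analyze_discrepancies(train_rows, serve_rows, usable_ranges,
                          feature_importances=None, print_table=False):
    """Hypothetical sketch: compare paired training and serving rows per feature.

    train_rows / serve_rows: equal-length lists of dicts (feature -> value or None).
    usable_ranges: feature -> (95th percentile - 5th percentile) at training time.
    """
    importances = feature_importances or {}
    pairs = list(zip(train_rows, serve_rows))

    feature_rows = []
    for name, usable_range in usable_ranges.items():
        serving = [serve.get(name) for _, serve in pairs]
        present = [v for v in serving if v is not None]
        # Assumption: discrepancy = serving value - training value.
        deltas = [serve[name] - train[name] for train, serve in pairs
                  if train.get(name) is not None and serve.get(name) is not None]
        abs_deltas = [abs(d) for d in deltas]
        row = {
            "feature_name": name,
            "avg_val": _stat(mean, present),
            "num_missing": len(serving) - len(present),
            "avg_discrepancy": _stat(mean, deltas),
            "median_discrepancy": _stat(median, deltas),
            "avg_abs_discrepancy": _stat(mean, abs_deltas),
            "median_abs_discrepancy": _stat(median, abs_deltas),
        }
        # Express each discrepancy stat as a percent of the usable range.
        for key in STAT_KEYS:
            value = row[key]
            row[key + "_pct_of_range"] = (
                None if value is None or not usable_range
                else 100.0 * value / usable_range)
        feature_rows.append(row)

    # Most important features first; the sort is stable, so input order is
    # kept when no importances are supplied.
    feature_rows.sort(key=lambda r: importances.get(r["feature_name"], 0.0),
                      reverse=True)

    summary = {
        "num_rows": len(pairs),
        "rows_with_zero_discrepancies": sum(
            all(train.get(f) == serve.get(f) for f in usable_ranges)
            for train, serve in pairs),
        "total_missing": sum(r["num_missing"] for r in feature_rows),
    }

    if print_table:
        from tabulate import tabulate  # third-party; only needed for printing
        print(tabulate(feature_rows, headers="keys"))

    # First dictionary is the summary; every later one is a feature row.
    return [summary] + feature_rows
```

Returning plain dictionaries keeps the result easy to serialize or load into a DataFrame, and importing tabulate lazily means it is only required when the caller actually asks for the printed table.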
