if the user passes in a list of more than one model_id, or calls predict_all, we will add only one item to our DB to track features, to be somewhat more space efficient
if they pass a single model at a time to .predict, we will save data each time
the user can pass in the same row of data multiple times, and we will add it to our DB multiple times (they will be able to pass drop_duplicates=True at analytics time)
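The logging rules above can be sketched roughly as follows. This is a minimal illustration, not the real implementation: the `PredictionTracker` class, the `feature_log` table, and the SQLite backing store are all assumptions made for the example.

```python
import json
import sqlite3


class PredictionTracker:
    """Illustrative sketch of the logging rules above (names are assumptions)."""

    def __init__(self, models, db_path=":memory:"):
        self.models = models  # assumed shape: {model_id: predict_fn}
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS feature_log (model_ids TEXT, features TEXT)"
        )

    def _log(self, model_ids, row):
        # duplicate rows are stored as-is; dedup is deferred to analytics time
        # via drop_duplicates=True
        self.conn.execute(
            "INSERT INTO feature_log VALUES (?, ?)",
            (json.dumps(sorted(model_ids)), json.dumps(row)),
        )

    def predict(self, model_ids, row):
        preds = {m: self.models[m](row) for m in model_ids}
        # one call adds exactly one log row, whether it covers one model or
        # many; repeated single-model .predict calls each add their own row
        self._log(model_ids, row)
        return preds

    def predict_all(self, row):
        # predict_all runs every model but still logs only one row
        return self.predict(list(self.models), row)
```

Logging once per call (rather than once per model) is what keeps the multi-model and predict_all paths space efficient.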
our analyze_discrepancies function will take a print=True param, which will use tabulate to pretty-print a table of features and their results
analyze_discrepancies will return a list of dictionaries (sorted by feature importance, if available)
each dictionary will represent a row
the first dictionary will contain summary information (in aggregate how much predictions differ between the two envs, how much the actuals differ, avg values for each, how many rows have 0 discrepancies, total counts of missing features, etc.)
each row will have feature_name, avg_val, num_missing, avg_discrepancy, median_discrepancy, avg_abs_discrepancy, median_abs_discrepancy, plus all of the above as a percent of the "usable range" of that feature (95th percentile minus 5th percentile at training time)
analyze_discrepancies will take in an optional model_id. if none, we'll look at all of our data
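Putting the pieces above together, a hedged sketch of analyze_discrepancies might look like the following. The input shapes (paired `env_a`/`env_b` values per logged feature, a `usable_range` dict from training time) are assumptions, and the `print` param is renamed `print_table` here only to avoid shadowing the builtin; the summary's zero-discrepancy count is simplified to a per-feature count.

```python
import statistics


def analyze_discrepancies(rows, usable_range, importances=None,
                          model_id=None, print_table=False):
    """Sketch of the analytics described above; all signatures are assumptions.

    rows: list of {"model_id": ..., "feature": ..., "env_a": ..., "env_b": ...}
    usable_range: {feature: 95th percentile - 5th percentile at training time}
    """
    if model_id is not None:
        rows = [r for r in rows if r["model_id"] == model_id]

    by_feature = {}
    for r in rows:
        by_feature.setdefault(r["feature"], []).append(r)

    results = []
    for feat, frows in by_feature.items():
        pairs = [(r["env_a"], r["env_b"]) for r in frows
                 if r["env_a"] is not None and r["env_b"] is not None]
        num_missing = len(frows) - len(pairs)
        diffs = [a - b for a, b in pairs] or [0.0]
        rng = usable_range.get(feat) or 1.0
        row = {
            "feature_name": feat,
            "avg_val": statistics.mean(a for a, _ in pairs) if pairs else None,
            "num_missing": num_missing,
            "avg_discrepancy": statistics.mean(diffs),
            "median_discrepancy": statistics.median(diffs),
            "avg_abs_discrepancy": statistics.mean(abs(d) for d in diffs),
            "median_abs_discrepancy": statistics.median(abs(d) for d in diffs),
        }
        # each discrepancy stat again as a percent of the usable range
        for k in ("avg_discrepancy", "median_discrepancy",
                  "avg_abs_discrepancy", "median_abs_discrepancy"):
            row[k + "_pct_range"] = 100.0 * row[k] / rng
        results.append(row)

    importances = importances or {}
    results.sort(key=lambda r: -importances.get(r["feature_name"], 0.0))

    # first dict is aggregate summary info (simplified here)
    summary = {
        "feature_name": "__summary__",
        "num_missing": sum(r["num_missing"] for r in results),
        "avg_abs_discrepancy": statistics.mean(
            r["avg_abs_discrepancy"] for r in results) if results else 0.0,
        "features_with_zero_discrepancy": sum(
            1 for r in results if r["avg_abs_discrepancy"] == 0.0),
    }
    out = [summary] + results

    if print_table:
        from tabulate import tabulate  # third-party dependency
        print(tabulate(out, headers="keys"))
    return out
```

Normalizing by the usable range lets discrepancies on differently scaled features be compared on one table.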
tracking avg features will not focus on discrepancies at all (tracking features becoming more or less out of whack over time is out of scope for MVP); it will only focus on showing serving-time values
the user can pass in a feature name to track_features and we will only return results for that feature
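A minimal sketch of track_features under these constraints, assuming the serving log carries a date and value per feature observation (the log shape and field names are illustrative, not the real schema):

```python
import statistics
from collections import defaultdict


def track_features(log_rows, feature_name=None):
    """Sketch of serving-time feature tracking (field names are assumptions).

    log_rows: list of {"date": "YYYY-MM-DD", "feature": str, "value": float}
    Returns average serving-time values per feature per date; discrepancy
    tracking is deliberately out of scope for MVP.
    """
    if feature_name is not None:
        # optional filter: only return results for the named feature
        log_rows = [r for r in log_rows if r["feature"] == feature_name]

    buckets = defaultdict(list)
    for r in log_rows:
        buckets[(r["feature"], r["date"])].append(r["value"])

    return [
        {"feature_name": feat, "date": date, "avg_val": statistics.mean(vals)}
        for (feat, date), vals in sorted(buckets.items())
    ]
```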