Aloha as an on demand QA tool #119

@jmorra

Description

After speaking with a bunch of people, I think we should consider repositioning Aloha as a full-service model representation language. Aloha already solves a key problem in ML: the tight coupling between feature functions and the model. Another key problem I think we should address is determining the correctness of data at score time. To do this, I propose the following framework: at train time, we should record statistics on the features in the model, including at least

  1. The P(occurrence) of the feature
  2. The mean value of the feature
  3. The variance of the feature
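
All three statistics can be maintained in a single streaming pass. As a minimal sketch (this is not Aloha's actual API; the class and method names are hypothetical), using Welford's online algorithm so the same accumulator could later back the score-time window:

```java
// Hypothetical sketch of per-feature train-time statistics.
// Welford's algorithm keeps the mean and variance numerically stable
// without storing the observed values.
public class FeatureStats {
    private long total;    // examples seen
    private long present;  // examples where the feature occurred
    private double mean;   // running mean of observed values
    private double m2;     // running sum of squared deviations from the mean

    /** Record one example; pass hasValue = false when the feature is missing. */
    public void observe(boolean hasValue, double value) {
        total++;
        if (hasValue) {
            present++;
            double delta = value - mean;
            mean += delta / present;
            m2 += delta * (value - mean);
        }
    }

    public double pOccurrence() { return total == 0 ? 0.0 : (double) present / total; }
    public double mean()        { return mean; }
    public double variance()    { return present < 2 ? 0.0 : m2 / (present - 1); }
}
```

The same accumulator, reset and bounded to recent examples, is what the "QA window" below would run at score time.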

If we record these, then at score time we can have a field, called something like "QA window", that continues to compute the same statistics using streaming algorithms. We can then have another field called a policy. The missingValues field already does something like this, but I think we should extend it to at least three options:

  1. Nothing
  2. Notify
  3. Refuse to score
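
To make the proposal concrete, here is one way the policy could be wired up. The drift test (a z-score on the window mean against the train-time distribution) and the threshold are my assumptions for illustration, not part of the issue:

```java
// Illustrative only: gating scoring on drift between train-time and
// score-time feature statistics. Names and the drift test are hypothetical.
public class QaPolicy {
    public enum Action { NOTHING, NOTIFY, REFUSE_TO_SCORE }

    private final Action onAnomaly;   // what to do when drift is detected
    private final double zThreshold;  // how many standard errors count as drift

    public QaPolicy(Action onAnomaly, double zThreshold) {
        this.onAnomaly = onAnomaly;
        this.zThreshold = zThreshold;
    }

    /** Compare a score-time window mean (over n examples) to the train-time stats. */
    public Action check(double trainMean, double trainVar, double windowMean, long n) {
        if (trainVar <= 0 || n == 0) return Action.NOTHING;
        double z = Math.abs(windowMean - trainMean) / Math.sqrt(trainVar / n);
        return z > zThreshold ? onAnomaly : Action.NOTHING;
    }
}
```

A model configured with "Nothing" would skip the check entirely; "Notify" and "Refuse to score" differ only in the configured Action.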

This will also necessitate building a notification system into Aloha, which honestly I don't think is too hard. An email and message notification system would, I think, be sufficient.
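
The notification piece could be as small as a pluggable interface; the names here are hypothetical, not an existing Aloha API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical notification hook: email and chat-message senders would
// implement the same interface.
interface Notifier {
    void send(String featureName, String message);
}

// In-memory implementation, useful for tests.
class RecordingNotifier implements Notifier {
    final List<String> sent = new ArrayList<>();
    public void send(String featureName, String message) {
        sent.add(featureName + ": " + message);
    }
}
```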

If all of this existed, Aloha could be marketed as a very full-featured model representation language, far superior to PMML (and frankly to anything else I can think of), and it would get us much closer to a true self-service modeling framework.
