Research on artificial intelligence has highlighted the importance of quality data: a machine learning system is “only as smart as the data it’s allowed to cull through,” after all. So, how do you mitigate the risks that bad data poses to AI development? Forbes offers several strategies. First, a clear-cut focus helps avoid the problems caused by confusing distinct goals for any given AI project, whether that goal is reducing costs, increasing revenue, or maintaining a competitive edge. A second strategy is essentially Ockham’s Razor for AI: do not “overfit” an AI tool. That is, do not introduce more variables than are necessary. This connects to the next strategy: reduce noise. Machine learning needs to identify and exclude outliers that can skew proper learning. Given the complexity of AI systems and their accompanying data sets, programming the correct parameters in adherence with this strategy can be exceedingly difficult. Lastly, maintenance is key. An AI program will either improve or decline in performance over time, depending on how its data is updated. Remaining aware of the data and how it interacts with a machine’s algorithms over time will become an increasing priority as AI matures.
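To make the noise-reduction strategy concrete, here is a minimal sketch of one common approach to excluding outliers before training: dropping points that fall too many standard deviations from the mean (a z-score filter). This example is illustrative only and is not from the Forbes article; the function name, data, and threshold are all assumptions for the sake of the sketch.

```python
import statistics

def filter_outliers(values, z_threshold=3.0):
    """Drop points more than z_threshold standard deviations from the mean.

    A simple z-score filter: illustrative, not the article's method.
    """
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        # All values identical: nothing can be an outlier.
        return list(values)
    return [v for v in values if abs(v - mean) / stdev <= z_threshold]

# Hypothetical sensor readings; 500 is an obvious outlier.
data = [10, 11, 9, 10, 12, 10, 11, 500]
clean = filter_outliers(data, z_threshold=2.0)
```

Note the difficulty the passage alludes to: the outlier itself inflates the mean and standard deviation, so an overly strict or overly loose threshold can either discard valid data or let extreme points through. In practice, choosing such parameters for complex, high-dimensional data sets is far harder than this one-dimensional sketch suggests.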
Source: AI & Data: Avoiding The Gotchas