3 big problems with datasets in AI and machine learning
Datasets fuel AI models like gasoline (or electricity, as the case may be) fuels cars. Whether they’re tasked with generating text, recognizing objects, or predicting a company’s stock price, AI systems “learn” by sifting through countless examples to discern patterns in the data. For example, a computer vision system can be trained to recognize certain types of apparel, like coats and scarfs, by looking at different images of that clothing. Beyond developing models, datasets are used to test trained AI systems to ensure they remain stable — and measure overall progress in the field. Models that top the leaderboards on certain open source benchmarks are considered state of the art (SOTA) for that particular task.