Algorithms and AI — who owns the knowledge

A few months ago, I was doing some research for the team and came across an article about the top algorithms teams use. What interested me, beyond how the algorithms are used, was the history of some of them and how old they are. For example, the Naïve Bayes classifier is based on Bayes' theorem (https://en.wikipedia.org/wiki/Bayes%27_theorem), named after Thomas Bayes, who lived from about 1701 to 1761. Consider that a theorem that is over 200 years old is now being used in predictive analysis on data produced by technology that did not exist then. I think it’s pretty amazing. Of course, there are many examples of solutions and innovations from the past providing answers to today’s challenges. But it’s easy to get caught up in the pace of today’s innovation, with its new technologies and new ways of doing things, and forget that innovation didn’t start with Uber or Twitter.

The development and use of new algorithms continues, of course. Companies like Google and Facebook keep developing and refining theirs. But there is also a market for actually selling algorithms, which in some ways seems to defy the spirit of the whole idea. Then again, Google and Netflix have made huge amounts of money from their algorithms, so maybe that is the right way to think about them.

Another thing that took me by surprise, and I don’t know why, is the degree to which AI modeling depends on humans tagging and categorizing things. Sometimes AI or machine learning is portrayed as a sort of all-powerful autonomous aggregator, able to identify and categorize huge amounts of data from a single input. In actual fact, there have been hundreds or thousands of people tagging photographs, reviewing legal documents or medical results, and getting paid for their efforts. All of the work they’re doing feeds these vast AI models, with the goal of identifying what…I guess everything. Totally unnerving. I really don’t think I want my mammograms made anonymous, then reviewed, tagged, given a result and then dropped into a system to allow matching of clear scans.

But then maybe I do, if I’m compensated.

I think we, the public, need to start taking ownership of the data we create and disseminate if it’s to be reused and leveraged to create something that businesses in turn sell back to us. If we have benefited from the public availability of a 200-year-old algorithm, it seems we should also not be made to pay for the data we in fact created.