Cognificance

About the significance of machine cognition

Artificial General Intelligence - A closer look

While AlphaGo's success at beating the world's best human Go player has recently been surpassed by a new implementation of the AI that beat the "old" one 100 games to 0, that machine is still only able to play Go. And while the resulting gameplay is an amazing achievement, even drawing comments from seasoned Go players about clever and original moves, we shouldn't forget that to reach its original capability (beating Lee Sedol) the AI played literally millions upon millions of games of Go against itself to learn the game.

In comparison, a human can learn the basics of the game very quickly and sit down to play his or her first full games after just a few learning trials. Why? Because the human brain is "generally intelligent", meaning that it can learn new things quickly and apply them almost instantly. This is not to say that a human who has played 10 games of Go would be considered anything other than wet behind the ears.

On the other hand, as good as AlphaGo is at playing Go, it has absolutely zero knowledge of anything else, e.g. playing chess or driving a car.
A goal that many AI scientists are aiming for is an artificial general intelligence (AGI) - an AI that can be applied to many different tasks. Even creating an AI that is able to learn similar things, such as "playing a game", is currently pure science fiction.

As I've indicated before (link), the major advantage of AI lies in the way learning sets are currently handled: the learning set of any AI can be saved to a file and installed on another machine / computer / smartphone that has the same AI framework, instantly making that device "smarter".
This is why the inclusion of AI hardware in the iPhone X is such a fascinating development: all of a sudden, it is possible to expand the App Store with learning sets. Want a camera that can identify cars by model? Just load up the appropriate learning set that someone (or some company) has trained for a compatible AI and *voilà*: your iPhone can instantly start identifying car models.
That still doesn't bring us to an AGI, of course, but it does for portable (or industrial) AI what the App Store did for ... well, apps. It gives you the ability to expand the toolset you carry around in your pocket as you need it.
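To make the "learning set in a file" idea concrete, here is a minimal sketch of that export/import round trip, assuming PyTorch as the shared framework; the `CarClassifier` network and the file name are hypothetical stand-ins:

```python
import torch
import torch.nn as nn

# Hypothetical network; both devices must share this architecture definition.
class CarClassifier(nn.Module):
    def __init__(self, num_models: int = 200):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(16, num_models)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

# On the trainer's machine: save the learned weights to a file.
trained = CarClassifier()
torch.save(trained.state_dict(), "car_models.pt")

# On the phone / other device: instantiate the same architecture and load the file.
deployed = CarClassifier()
deployed.load_state_dict(torch.load("car_models.pt"))
deployed.eval()  # the device is now "smarter" without any local training
```

The point is that only the weight file travels; the receiving device never has to repeat the millions of training iterations.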

And while the concept of an AGI is enticing, for most tasks it really isn't necessary. Identifying tumor cells in an MRI likely won't be any more accurate with an AGI than it is with a narrow AI.
Once you get to very complex activities, however, AI starts to fail. While you can use an AI to crawl travel portals on the internet to find the shortest and least expensive flight to a vacation destination, it will fail miserably if you try to train it to plan out the entire vacation.

Why? Because the network (or the population, if the AI is an evolutionary one) would explode out of proportion due to the complexity of the task.
Vacation planning needs to take a large number of variables into consideration, some of which change over time and many of which change in response to changes in other variables. The cheap flight your AI picked out might turn out to be a disaster if it didn't consider the fact that the airline it chose had ongoing Chapter 11 proceedings.

So the best example of an AGI application might be an automated travel planner that you can take with you (probably not on your iPhone, but at least as an instance living in a cloud service). Such an AGI would plan the initial stages of your vacation (flights to and from, as well as your hotel) but would also be able to react automatically to unforeseen situations, such as flight cancellations, as they happen.
It might be possible to build such a "tool" using a number of different AIs, each trained to optimize a particular aspect of planning, with a simple workflow backbone coordinating the "if the flight is cancelled, then use the flight-finder AI to find a new flight" situations - see the sketch below.
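A minimal sketch of what such a workflow backbone could look like, with hypothetical `flight_finder` and `hotel_finder` functions standing in for separately trained, narrow AIs:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class TripState:
    destination: str
    flight: Optional[str] = None
    hotel: Optional[str] = None

# Each function stands in for a separately trained, narrow AI module.
def flight_finder(state: TripState) -> TripState:
    state.flight = f"cheapest flight to {state.destination}"  # placeholder result
    return state

def hotel_finder(state: TripState) -> TripState:
    state.hotel = f"well-rated hotel in {state.destination}"  # placeholder result
    return state

# The static workflow backbone: hand-written rules route events to modules.
HANDLERS: Dict[str, Callable[[TripState], TripState]] = {
    "plan_trip": lambda s: hotel_finder(flight_finder(s)),
    "flight_cancelled": flight_finder,   # re-run only the flight-finder AI
    "hotel_overbooked": hotel_finder,    # re-run only the hotel-finder AI
}

def handle_event(event: str, state: TripState) -> TripState:
    return HANDLERS[event](state)

trip = handle_event("plan_trip", TripState("Lisbon"))
trip = handle_event("flight_cancelled", trip)  # reacts, but only to anticipated events
```

The weakness sits right in the `HANDLERS` table: every possible disruption has to be spelled out by hand, which is exactly the "too static" problem described next.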

In practice, this type of setup would likely still be too static to do the job properly, as every possible issue with your vacation would need to be anticipated (and put into the workflow backbone) - a nearly impossible feat. Consequently, it will be necessary to find the path towards a functioning AGI in order to solve these complex problems. But how? One method currently being tried by various teams is to model the human brain. Since the brain has on the order of 86 billion neurons, however, that isn't an easy problem to solve.

Edge Computing and AI

AI isn't the sole turning point in IT right now. Another revolution is in the "Internet of Things" or IoT. The idea behind IoT is to give an IP address to tiny sensors and devices that aren't smartphones, tablets, workstations or cars. The smart home, if it wants to become mainstream, will require IoT instead of the current status quo: dozens of proprietary standards that don't interoperate.
 
But IoT's real benefit is in industry and services. The much-quoted Rolls-Royce jet engine that uses IoT sensors is just one example. Another is the use of such sensors to detect the number of plates remaining at a cafeteria line (and to trigger an event that causes the plates to be refilled when they run low).
 
Sensor data is irrelevant, however, if it isn't processed or at least stored for later processing. And depending on the device in question, the data stream coming from such a device can be immense, as in the case of the jet engine (multiple terabytes per hour of flight).
 
Currently, the companies offering cloud computing (Microsoft, Google, Amazon and a few outliers) are pushing for IoT data to be pumped directly into their respective clouds for analysis. This can be problematic, however. Sticking with the example of the jet engine, the amount of data generated is not only extreme, it is also very difficult to route that information quickly enough into - in this case - Microsoft's cloud. The only live connection available to a jetliner in flight is a satellite uplink - very expensive and potentially not fast enough.
 
Enter what IBM terms "Edge Computing", while Cisco prefers the term "Fog Computing" and Microsoft speaks of the "intelligent cloud". Which of these terms finally catches on is probably yet to be determined, but my preference is IBM's, as the term "Fog" implies "being in the cloud and not realizing it" to me.
 
It is actually quite surprising to me that this topic only seems to be sinking in at the various cloud firms now. Look at it this way: the human visual system doesn't pump raw sensor data (via the optic nerve) into the brain, as this would flood the brain with information, rendering it useless for other tasks (such as writing articles like this one). Nature realized long ago that raw sensor data needs to be pre-processed before being handed off to the brain for interpretation, resulting in the development of a visual cortex in all animals with complex vision systems.
 
Consequently, it only seems reasonable that there is no need to dump terabytes of raw data into the "brain" of a cloud without first reducing it to sensible batches of concentrated goodness. Bring on the AI!
 
Some time ago, I wrote about an exciting new sensor being developed that uses input from various sources (sound, light, movement) to determine whether Ann has left the water running at the kitchen sink (again) or the dog is crapping on the living room floor (again). This is "edge computing" at its finest - and most compact, as the sensors are not strewn about the house, feeding data into the AI that processes all that input; instead, they all sit on one compact logic board, which in turn feeds intelligent information to whatever backend system you have (like an app), such as "The water is running in the kitchen sink and no one is there".
 
Going back to the jet engine example, this is clearly a case where consolidating raw data into at least semi-intelligent output is an absolute imperative. My guess - to be honest - is that the story of a jet engine pumping terabytes of data per hour into Azure is a case of journalistic copy-catting. That's the same effect that caused half of all Formula 1-interested Germans to call the best slot at the start the "Pool Position" (instead of pole position): some well-known journalist fudged it while writing up a report on a race, that report was "stolen" multiple times by other journalists who didn't bother to write their own, and just a short time later you heard "Pool Position" not only from your friends but also from race commentators on TV!
 
It is unlikely that the engineers at Rolls-Royce put together a system that generates so much data it can't be analyzed as it happens (which is the main idea behind pumping it into the cloud). Going by this article from 2016, there are 25 sensors feeding data such as fuel flow, temperature and pressure from various parts of the engine into the data stream.
 
However, whether the data stream is terabytes or megabytes per hour, the idea of feeding the raw data into the cloud just doesn't make sense. A deep learning system is more than capable of analyzing even the data from the 25 sensors mentioned in the article and feeding more concentrated information into the cloud for final analysis. The reason for going to these lengths on a jet engine - and the same holds for a car, a high-speed passenger train or your house - is to save bandwidth and energy and to enable predictive maintenance.
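What this edge-side reduction could look like in its most rudimentary form: a sketch, assuming a hypothetical stream of engine readings and an invented `turbine_temp` sensor key, in which the node next to the sensors buffers raw samples, computes a few summary features plus a crude anomaly flag, and forwards only those to the cloud.

```python
from collections import deque
from statistics import mean, stdev
from typing import Optional

WINDOW = 60  # summarize every 60 raw samples (hypothetical sampling rate)

class EdgeAggregator:
    """Runs next to the sensors; forwards condensed features instead of raw data."""

    def __init__(self) -> None:
        self.window = deque(maxlen=WINDOW)

    def ingest(self, reading: dict) -> Optional[dict]:
        self.window.append(reading)
        if len(self.window) < WINDOW:
            return None  # keep buffering locally; nothing leaves the edge node yet
        temps = [r["turbine_temp"] for r in self.window]  # hypothetical sensor key
        summary = {
            "temp_mean": mean(temps),
            "temp_std": stdev(temps),
            # crude stand-in for a trained on-device anomaly model:
            "anomaly": max(temps) > mean(temps) + 3 * stdev(temps),
        }
        self.window.clear()
        return summary  # only this small dict is sent on to the cloud
```

In a real deployment the summary step would itself be a trained deep learning model, but the data-reduction principle is the same.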
 
The solution probably lies in multiple deep learning modules, each analyzing a subset of sensors for key indicators that can be relayed to the cloud for further analysis. Even more important, of course, is to use aggregated data from as many jet engines, cars, trains and houses as possible to feed an AI that can make decisions based on the pre-chewed data from, say, an entire airplane fleet. This is where a cloud-based system "shines", though more and more of this "fleet analysis" activity will likely be run in the small deep learning centers of specialized companies.

Supervised Learning via Crowdsourcing

It isn't a new concept that training deep learning systems requires massive amounts of data. In many cases, this data exists in the form of database content or even web crawling output. AI systems for medical applications can often be trained with gigabytes of readily available data.

How, then, do you train systems that need to be able to differentiate between dangerous and non-dangerous situations based on visual input? Crowdsourcing can be a solution. While many projects in autonomous driving research rely on live LIDAR analysis, these systems are still relatively expensive compared to standard video cameras. Also, interpreting LIDAR results can be challenging, especially in fog or rain, where the laser beams are largely scattered and attenuated.

Several recent efforts in autonomous driving research are focusing on the analysis of standard high-resolution video camera streams via deep learning systems. As with any deep learning AI, such a system needs to be trained to identify dangerous situations as well as situations that occur very rarely (so-called "edge cases") but still require a proper and safe reaction from the autonomous vehicle. A frequently cited edge case is the driver of an electric wheelchair on the street.

Researchers are using crowdsourcing to train these systems - with smartphone apps where users pick out and identify objects and/or situations and get paid for their work. Crowdsourcing is ideal for this type of training for several reasons. One is the relatively low cost of getting feedback. The apps that present the images use gamification to make the work more appealing, and this, combined with a relatively low payout, keeps the crowd workforce active. Getting feedback from many different people also reduces bias in the training set, though a challenge here is keeping a good mix of participants from different continents, as priorities in driving differ considerably between the US, Europe and Asia. Take stop signs as an example - they have a very different impact on a driver depending on where he or she is from.
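One practical question is how many (possibly disagreeing) crowd annotations become a single training label. A common, simple approach, sketched below with hypothetical labels, is per-image majority voting with a minimum-agreement threshold; images that don't reach the threshold go back into the queue.

```python
from collections import Counter
from typing import List, Optional

def aggregate_labels(votes: List[str], min_agreement: float = 0.7) -> Optional[str]:
    """Return the majority label if enough annotators agree, otherwise None (re-queue the image)."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None

# Hypothetical crowd annotations for two video frames:
print(aggregate_labels(["wheelchair", "wheelchair", "pedestrian"]))  # None: only 67% agreement
print(aggregate_labels(["stop_sign", "stop_sign", "stop_sign"]))     # "stop_sign"
```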

All in all, crowdsourced training of deep learning AI may be the most efficient way forward to make autonomous driving possible using just video camera input - and even if this technology is combined with other sensors (feeding into other AI systems), it will benefit the introduction of high-safety autonomous driving in the near future.


Poor Journalism doesn't help AI

I have a bone to pick with most tech journalists. As always, there are exceptions, but many tech journalists simply don't seem to have an inkling of knowledge about the subjects they write on.

Many years ago, I attended a course on rhetoric - and if anything sticks in my mind from that evening class, it is this: "if all your knowledge covers a regular sheet of paper, that which you present to the outside world should not be bigger than a box of matches." What's the point of that? Well, if you spill all of your knowledge on a particular subject (in other words, if it is so little that you can do so in a single article), then you'll be treading on thin ice when the questions start flying.

And while my expectations of many of the "journalistic institutions" on the Web have dwindled over the years, making such encounters less painful, I would not have expected journalism this poor from a magazine like Scientific American. Apparently, even this iconic institution for educating the average American about science news has gone cheap. Specifically, this article about AI used in a new camera system is what set me off.

Behold this statement of utter crap: "An artificial neural network is a group of interconnected computers configured to work like a system of flesh-and-blood neurons in the human brain." Wow. Really folks? The paragraph goes on to say that "the interconnections among the computers enable the network to find patterns in data fed into the system, and to filter out extraneous information via a process called machine learning."

The article goes into great detail on the use of memristors in the device and indicates that "getting all of the components of a memristor neural network onto a single microchip would be a big step." Quite unfortunately, it doesn't go into the direct advantages of using memristors as the hardware for running an AI. I can see the advantage of doing some of the processing directly on a vision device (much as our optic nerve pre-processes visual input to feed more abstract concepts into the brain's vision center in the cerebral cortex). This isn't that new a concept, by the way, as this 2014 paper from Cornell University demonstrates.

Video Dragnet via AI soon a reality?

A dragnet investigation is an attempt to find a person or thing (such as a car) by defining a certain area and the physical characteristics of the person or thing sought, and then systematically checking every matching person or thing one comes across.

And everyone remembers the 1987 film by the same name, right? Right?

If you've ever been to the UK, you'll know that its residents are - depending on the city you're in - probably the most videotaped people on the planet. Any public place, even in the smaller outskirts of London, for example, has cameras pointing every which way. It makes you wonder just how many people sit behind the monitors that all this video signal feeds into.

And while AI systems can do incredible things in identifying people by their faces (see this article from 2016), it is one thing to identify a stationary face that nearly fills the available camera resolution. It is quite another to identify a "perp" who walks past a video camera at an oblique angle, perhaps wearing a hat that shades part of the face.

Image classification tasks, when performed via neural nets, tend to require a huge amount of training data. This makes sense; after all, we're talking about recognizing patterns in a matrix that may be 4 megapixels in full color (at least 16, possibly 24 bits per pixel) - you do the math! And a huge learning set means hundreds or thousands of person-hours of manual tagging.
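To actually do that math, a quick back-of-the-envelope calculation (ignoring compression) for a single raw frame:

```python
pixels = 4_000_000             # a 4-megapixel frame
bits_per_pixel = 24            # full color
bytes_per_frame = pixels * bits_per_pixel // 8
print(bytes_per_frame)         # 12000000 bytes, i.e. roughly 12 MB of raw input per image
```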

A method invented by Ian Goodfellow during a discussion over beers in a bar makes much of this manual tagging go away. The method is called "generative adversarial networks" (GANs). In essence, the setup involves at least two AI systems that "play" against one another. One AI (or set of AIs) is the "generative" side and the opposing AI is the "discriminative" side. With a basic training set in place, the generative side is prompted to produce pictures of birds, for example. To get the game going, the discriminator is presented with images from the generator (bad) and real images that fit the training set (good) - in other words, a binary classification.

The discriminator feeds back to the generator whether it classified the picture as that of a bird. Points are awarded depending on which side "wins" each round: if the discriminator correctly identifies an image as generated, it gets the points; if the generator fools the discriminator, the generator gets them. The goal of the game is, of course, to collect the most points.
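A heavily condensed sketch of that adversarial game, using PyTorch purely as an illustration; the tiny fully connected networks and the `real_batch()` stand-in for real training images are hypothetical placeholders:

```python
import torch
import torch.nn as nn

latent_dim, img_dim = 64, 28 * 28

# The two "players": the generator produces images, the discriminator judges them.
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def real_batch(n=32):
    # Hypothetical stand-in for real training images (e.g. photos of birds).
    return torch.rand(n, img_dim)

for step in range(1000):
    real = real_batch()
    fake = G(torch.randn(real.size(0), latent_dim))

    # Discriminator round: score points for telling real images from generated ones.
    opt_d.zero_grad()
    d_loss = bce(D(real), torch.ones(real.size(0), 1)) + \
             bce(D(fake.detach()), torch.zeros(real.size(0), 1))
    d_loss.backward()
    opt_d.step()

    # Generator round: score points for fooling the discriminator.
    opt_g.zero_grad()
    g_loss = bce(D(fake), torch.ones(real.size(0), 1))
    g_loss.backward()
    opt_g.step()
```

The only human input here is the pool of real images; no one has to tag individual pictures for the game to proceed.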

The method was introduced in 2014 and has swept through the AI community like an Australian bushfire (you know - the kind fueled by bushes with oily bark). It is a simple and cheap way to introduce self-learning to AI systems with minimal human intervention.

A lot has been done with the concept in the last three years, with one of the more recent research papers by Han Zhang (et al.) introducing stacked GANs, where self-learned image generation is pushed to impressive new resolutions. Have a look at the paper for some really incredible 256x256-pixel full-color images of birds generated from nothing but a text description.

Where am I going with this? Well, one of the tools used by police in dragnet operations - in the case of a person search - may be a facial composite, based on the description of a witness or victim. Putting together one of these images requires experts with long years of experience, despite the availability of software that assists in the process.

What if one could throw the textual description of a perpetrator, ideally from multiple witnesses, into a stacked GAN and have it spit out a selection of composites to use in the dragnet operation? And with many cities - especially in the UK - blanketed with video surveillance, one could then use these images in another stacked GAN that analyzes them in comparison to still images from the video feeds. Surely this will require more TPU-like power to do properly, but give it another five years and, with Moore's law, we should be there.