About the significance of machine cognition


Edge Computing and AI

AI isn't the sole turning point in IT right now. Another revolution is in the "Internet of Things" or IoT. The idea behind IoT is to give an IP address to tiny sensors and devices that aren't smartphones, tablets, workstations or cars. The smart home, if it wants to become mainstream, will require IoT instead of the current status quo: dozens of proprietary standards that don't interoperate.
But IoT's real benefit is in industry and services. The much-quoted Rolls-Royce jet engine that uses IoT sensors is just one example. Another is the use of such sensors to detect the number of plates remaining at a cafeteria line, triggering an event that causes the plates to be refilled when they run low.
Sensor data is irrelevant, however, if it isn't processed or at least stored for later processing. And depending on the device in question, the data stream coming from it can be immense, as in the jet engine (multiple terabytes per hour of flight).
Currently, the companies offering cloud computing (Microsoft, Google, Amazon and some outliers) are pushing for IoT data to be pumped directly into their respective clouds for analysis. This can be problematic. Sticking with the example of the jet engine, not only is the amount of data generated extreme, it is also very difficult to route that information quickly enough into the cloud - in this case Microsoft's. The only live connection available to a jetliner in flight is a satellite uplink - very expensive and potentially not fast enough.
Bring in what IBM terms "Edge Computing", while Cisco prefers the term "Fog Computing" and Microsoft speaks of the "intelligent Cloud". Which term finally catches on is yet to be determined, but my preference is IBM's, as the term "Fog" implies "being in the cloud and not realizing it" to me.
It is actually quite surprising to me that this topic is seemingly sinking in at the various cloud firms only now. Look at it this way: the human visual system doesn't pump raw sensor data (via the optic nerve) into the brain, as this would flood the brain with information, rendering it useless for other tasks (such as writing articles like this). Nature realized long ago that raw sensor data needs to be pre-processed before being handed off to the brain for interpretation, resulting in the development of a visual cortex in all animals with complex vision systems.
Consequently, it only seems reasonable that there is no need to dump terabytes of raw data into the "brain" of a cloud without first reducing it to sensible batches of concentrated goodness. Bring on the AI!
Some time ago, I wrote about an exciting new sensor being developed that uses input from various sources (sound, light, movement) to determine whether Ann has left the water running at the kitchen sink (again) or the dog is crapping on the living room floor (again). This is "edge computing" at its finest - and most compact: the sensors are not strewn about the house, feeding data into a remote AI that processes all that input. Instead, they all sit on one compact logic board, which in turn feeds intelligent information to whatever backend system you have (like an app), such as "The water is running in the kitchen sink and no one is there".
Going back to the jet engine example, this is clearly a case where the consolidation of raw data into at least semi-intelligent output is an absolute imperative. My guess - to be honest - is that the story of a jet engine pumping terabytes of data per hour into Azure is a case of journalistic copy-catting. That's the same effect that caused half of all Germans interested in Formula 1 to call the best slot at the start "Pool Position" (instead of Pole Position): some well-known journalist slipped up while writing a race report, which was then "stolen" multiple times by other journalists who couldn't be bothered to write their own, and just a short time later you heard "Pool Position" not only from your friends but also from race commentators on TV!
It is unlikely that engineers at Rolls-Royce put together a system that generates so much data that it can't be analyzed as it happens (which is the main idea behind pumping it into the cloud). Going by this article from 2016, there are 25 sensors feeding data such as fuel flow, temperature, pressure, etc. from various parts of the engine into the data stream.
However, whether the data stream is terabytes or megabytes per hour, the idea of feeding the raw data into the cloud just doesn't make sense. AI is more than capable of analyzing the data from the 25 sensors mentioned in the article in a deep learning system and feeding more concentrated information into the cloud for final analysis. The reason for going to these lengths on a jet engine, though it will be the same for a car, a high-speed passenger train or your house, is to save energy and enable predictive maintenance.
The solution probably lies in multiple deep learning modules analyzing a subset of sensors for key indicators that can be relayed to the cloud for individual analysis. Even more important, of course, is to use aggregated data from as many jet engines, cars, trains and houses as possible to feed an AI that can make decisions based on the pre-chewed data from an entire airplane fleet, for example. This is where a cloud-based system "shines", though more and more of this "fleet analysis" activity will likely be run in small deep learning centers of specialized companies.
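To make the idea of "pre-chewed" data concrete, here is a minimal sketch in Python of an edge device reducing one window of raw readings to a small summary before anything is relayed to the cloud. It deliberately swaps the deep learning module for plain statistics and a range check. The function name, the temperature values and the limits are all assumptions made up for illustration, not anything taken from the Rolls-Royce system.

```python
from statistics import mean

def summarize_window(readings, low, high):
    """Reduce one window of raw readings to a small summary record.

    Stand-in for the on-device analysis: plain statistics plus a
    range check, not a deep learning model.
    """
    out_of_range = [r for r in readings if r < low or r > high]
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": mean(readings),
        "alerts": len(out_of_range),
    }

# Hypothetical window of temperature samples (made-up values).
window = [612.0, 615.5, 611.2, 690.3, 613.8]

# Only this summary would be sent on, not the raw samples.
summary = summarize_window(window, low=550.0, high=650.0)
print(summary)
```

However many samples the window holds, the record that leaves the device stays the same size, which is the whole point of doing the work at the edge.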

Supervised Learning via Crowdsourcing

It isn't a new concept that training deep learning systems requires massive amounts of data. In many cases, this data exists in the form of database content or even web crawling output. AI systems for medical applications can often be trained with gigabytes of readily available data.

How, then, do you train systems that need to be able to differentiate between dangerous and non-dangerous situations based on visual input? Crowdsourcing can be a solution. While many projects in autonomous driving research rely on live LIDAR analysis, these systems are still relatively expensive when compared to standard video cameras. Also, interpreting LIDAR results can be challenging, especially in fog or rain, where the laser beams are all but obliterated.

Several recent efforts in autonomous driving research are focusing on the analysis of standard high-resolution video camera streams via deep learning systems. As with any deep learning AI, such a system needs to be trained in order to identify dangerous situations, as well as situations that occur very rarely (so-called "edge cases") but still require a proper and safe reaction from the autonomous vehicle. A frequently cited edge case is the driver of an electric wheelchair on the street.

Researchers are using crowdsourcing to train these systems - with smartphone apps where users pick out and identify objects and/or situations and get paid for their work. Crowdsourcing is ideal for this type of training, for several reasons. One is the relatively low cost associated with getting feedback. The apps that present the images use gamification to make the work more appealing, and this, combined with a relatively low payout, serves to keep the crowd workforce active. Getting feedback from many different people serves to reduce bias in the training set, though a challenge here is keeping a good mix of participants from different continents, as priorities in driving are quite different between the US, Europe and Asia. Take stop signs as an example - these have a very different impact on a driver depending on where he or she is from.
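One common way to turn many individual answers into a single training label is a simple majority vote with an agreement threshold. The sketch below shows that idea in Python. It is an assumption about how such a pipeline might work, not a description of what the researchers' apps actually do, and the function name, the labels and the 0.6 threshold are invented for illustration.

```python
from collections import Counter

def aggregate_labels(votes, min_agreement=0.6):
    """Return (label, agreement) for one image.

    The label is None when the most popular answer does not reach
    min_agreement, so the image can be sent out for more votes.
    """
    counts = Counter(votes)
    label, top = counts.most_common(1)[0]
    agreement = top / len(votes)
    if agreement < min_agreement:
        return None, agreement
    return label, agreement

# Hypothetical answers from five and four crowd workers.
votes_clear = ["wheelchair", "wheelchair", "bicycle", "wheelchair", "wheelchair"]
votes_split = ["stop sign", "yield sign", "stop sign", "yield sign"]

print(aggregate_labels(votes_clear))  # clear majority, label accepted
print(aggregate_labels(votes_split))  # evenly split, no label assigned
```

A threshold like this does nothing about regional bias by itself. For that, the votes would have to be balanced by where the participants come from before counting.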

All in all, crowdsourced training of deep learning AI may be the most efficient way forward to make autonomous driving possible using just video camera input. Even if this technology is combined with other sensors (feeding into other AI systems), it will benefit the introduction of high-safety autonomous driving in the near future.


Poor Journalism doesn't help AI

I have a bone to pick with most tech journalists. As always, there are exceptions, but many tech journalists simply don't seem to have an inkling of knowledge about the subjects they write on.

Many years ago, I attended a course on rhetoric - and if anything sticks in my mind from that evening class, it is this: "if all your knowledge covers a regular sheet of paper, that which you present to the outside world should not be bigger than a box of matches." What's the point of that? Well, if you spill all of your knowledge on a particular subject (in other words, if it is so little that you can do so in an article), then you'll be treading on thin ice when the questions start flying.

And while my expectations of many of the "journalistic institutions" on the Web have dwindled over the years, making such encounters less painful, I would not have expected journalism this poor from a magazine like Scientific American. Apparently, even this iconic institution of educating the average American on science news has gone on the cheap. Specifically, it was this article about AI used in a new camera system that provoked me.

Behold this statement of utter crap: "An artificial neural network is a group of interconnected computers configured to work like a system of flesh-and-blood neurons in the human brain." Wow. Really folks? The paragraph goes on to say that "the interconnections among the computers enable the network to find patterns in data fed into the system, and to filter out extraneous information via a process called machine learning."

The article goes into great detail on the use of memristors in the device and indicates that "getting all of the components of a memristor neural network onto a single microchip would be a big step." Quite unfortunately, it doesn't go into the direct advantages of using memristors as the hardware for running an AI. I can see the advantage of doing some pre-processing on a vision device (much as our optic nerve pre-processes vision input to feed more abstract concepts into the brain's vision center in the cerebral cortex). This isn't that new a concept, by the way, as this 2014 paper from Cornell University demonstrates.

Video Dragnet via AI soon a reality?

A dragnet investigation is an attempt to find a person or thing (such as a car) by defining a certain area and the physical aspects of the person or thing sought and systematically checking every matching person/thing one comes across.

And everyone remembers the 1987 film by the same name, right? Right?

If you've ever been in the UK, you'll know that they are - depending on the city you're in - probably the most videotaped people on the planet. Any public place, even in the smaller outskirts of London, for example, has cameras pointing every which way. Makes you wonder just how many people sit behind the monitors that all this video signal feeds into.

And while AI systems can do incredible things in identifying people by their faces (see this article from 2016), it is one thing to identify a stationary face nearly filling the available camera resolution. It is another to identify a "perp" who walks tangentially to a video camera, maybe wearing a hat that shades part of the face.

Image classification tasks, when performed via neural nets, tend to require a huge amount of training data. This makes sense: after all, we're talking about recognition of a matrix that may be 4 megapixels and in color (at least 16, possibly 24 bits per pixel) - you do the math! And a huge learning set means hundreds or thousands of person-hours of manual tagging.
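For anyone who would rather not do the math by hand, here is the arithmetic as a short Python sketch. It assumes an uncompressed frame and takes "4 megapixel" to mean exactly 4,000,000 pixels. The function name is made up for illustration.

```python
def frame_bytes(megapixels, bits_per_pixel):
    """Size of one uncompressed frame in bytes."""
    pixels = megapixels * 1_000_000
    return pixels * bits_per_pixel // 8

# The two color depths mentioned above.
for bits in (16, 24):
    size = frame_bytes(4, bits)
    print(f"{bits} bits per pixel: {size / 1_000_000:.0f} MB per frame")
```

That works out to 8 MB per image at 16 bits and 12 MB at 24 bits, before a single label has been attached.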

A method invented by Ian Goodfellow during a discussion over beers in a bar makes this huge learning set go away. The method is called "generative adversarial networks". In essence, the setup involves at least two AI systems that "play" against one another. One AI (or set of AIs) is called the "generative" side and the opposing AI is the "discriminative" side. With a basic training set in place, the generative side is triggered to produce pictures of birds, for example. To get the game going, the discriminator is presented with random images from the generator (bad) and real images that fit the training set (good). In other words, this is a binary classification.

The discriminator feeds back to the generator whether it classifies the picture as that of a bird. Award points are given, depending on which side "wins" each round: if the discriminator correctly identifies an image as fake, it gets a certain number of points. If the generator fools the discriminator, it gets the points. The goal of the game is, of course, to have the most points.
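The "points" are a figure of speech. In the 2014 paper, the game is written as a single value function that the discriminator tries to maximize and the generator tries to minimize. The symbols below follow that paper rather than anything defined in this article: D(x) is the discriminator's estimate of the probability that image x is real, and G(z) is the image the generator produces from random noise z.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The first term rewards the discriminator for recognizing real images, the second for rejecting generated ones. The generator "scores" by pushing D(G(z)) towards 1, which drives the second term down.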

The method was introduced in 2014 and has swept through the AI community like an Australian bushfire (you know - the one with the bushes that have oily bark). It is a simple and cheap method to introduce self-learning to AI systems, with minimal human intervention.

A lot has been done with the concept in the last three years, with one of the more recent research papers, by Han Zhang et al., introducing stacked GANs, where self-learning of image generation is pushed to incredible new resolutions. Have a look at the paper for some really incredible 256x256 pixel full-color images of birds that were generated from a text description.

Where am I going with this? Well, one of the tools used by police in dragnet operations - in the case of a person search - may be a facial composite, based on the description of a witness or victim. Putting together one of these images requires experts with long years of experience, despite the availability of software that assists in the process.

What if one could throw the textual description of a perpetrator, ideally from multiple witnesses, into a stacked GAN and have it spit out a selection of composites to use in the dragnet operation? And with many cities - especially in the UK - wired with high coverage for video surveillance, one could then use these images in another stacked GAN that compares them to still images from the video feed. Certainly, this will require more TPU-like power to do properly, but give it another 5 years and with Moore's law, we should be there.

Big AI is watching you!

Last month, I wrote about a newly developed smart sensor that uses sound and AI to identify activities. Nest, a Google company, has now upped the ante.

Its newest home surveillance camera, the Nest Cam IQ, not only comes with a whopping 4K sensor, but also with new "brains" to analyze the video and audio stream. These "brains" are not built into the camera, of course. It feels like we are light-years away from having that sort of capability in a package this small. The AI doing the video analysis runs on the Nest Aware service, available as a subscription. You can bet your booty that Nest benefits from Google's new TPU 2.0 hardware, as analyzing video and audio content simultaneously and live takes a lot of processing power.

Anyone (like me) who runs a home surveillance system based on simple pixel change algorithms knows the benefit that intelligent analysis of video streams would bring. To give you an idea of the difference, allow me to explain how current systems (with some exceptions) work. The goal of surveillance recording is not to net the most hours of video material. The goal is to single out security issues, either so that it is easy to find the relevant footage after the fact or - ideally - to get a system alert while an issue occurs. Traditional surveillance systems analyze video pixel changes and flag situations where a certain number of pixels (corresponding to a certain size) changes within a certain timeframe. Think of a dog jumping over a fence or a door being opened.
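The pixel change method can be sketched in a few lines of Python. This is a bare-bones illustration under stated assumptions, not the algorithm of any particular product: frames are tiny grayscale grids held in plain lists, and the function names and both thresholds are invented for the example.

```python
def changed_pixels(prev, curr, pixel_threshold=25):
    """Count pixels whose grayscale value changed by more than pixel_threshold."""
    return sum(
        1
        for row_prev, row_curr in zip(prev, curr)
        for p, c in zip(row_prev, row_curr)
        if abs(p - c) > pixel_threshold
    )

def motion_detected(prev, curr, pixel_threshold=25, min_changed=4):
    """Flag an event when enough pixels changed between two frames."""
    return changed_pixels(prev, curr, pixel_threshold) >= min_changed

# Two tiny 4x4 grayscale frames (values 0-255): an empty scene, and the
# same scene with a bright 2x2 "object" in the middle.
background = [[10] * 4 for _ in range(4)]
with_object = [row[:] for row in background]
for y in (1, 2):
    for x in (1, 2):
        with_object[y][x] = 200

print(changed_pixels(background, with_object))   # 4
print(motion_detected(background, with_object))  # True
print(motion_detected(background, background))   # False
```

Note that nothing in this check knows what changed. A swaying branch covering four pixels triggers it just as readily as a person does.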

While it is pretty simple to trigger an event if a door is opened that pretty much fills the camera frame, the situation changes drastically if the camera is recording a yard. Wind will move bushes and trees, and this will trigger the recording mechanism, resulting in a lot of recorded material. If you want to find out if someone is trespassing in your yard during the day, you'll have to look at hours and hours of video of moving trees.

There have been systems available that use algorithms to try to discern the shape of a person, for example. One of these that I have used personally is from Sighthound, Inc., which continued to develop the product I used several years ago ("Vitamin D"). Sighthound claims to have an SDK available that permits the training of a neural network used for facial recognition that runs locally on an iPad, using only that iPad's hardware.

While the training and recognition of individual faces may even work in a simple AI on an iPad, the video analysis that Nest offers goes several levels higher on the complexity scale. Think of it this way: instead of recognizing someone's face when they stand in front of a camera, a complete AI solution should be able to detect a lot more information, such as

"John Doe, wearing blue shorts and a Metallica T-Shirt. John just entered the dining room carrying a tablet computer and put that tablet computer on the table, leaving the room in the direction of the kitchen. John looks tired".

If that doesn't sound like big brother watching, I don't know what does!

Since the new Nest camera not only records video in ultra high resolution but also sound in excellent quality, the data is sufficient for an AI to discern a lot of activities. Want to know how your aging mother is doing in her apartment three blocks down? In a scenario where the Nest service is connected to your Alexa account (which is planned), you could just say: "Alexa, what is mom doing right now?" and Alexa would answer "Your mother is sitting at the table doing a crossword puzzle." And while your mom might not appreciate being watched 24/7 by an online camera system, both of you will likely appreciate an urgent message being sent out automatically when your mom has been lying motionless on the floor for two minutes.

I would have an immediate use for a system like that: one that tells me when our cat is sitting in front of the terrace sliding door, waiting to be let in. I don't think I'll be putting these devices into my kids' bedrooms anytime soon.