Why Your Analytics of Things Aspirations Will Fail

Yes, everybody’s talking about analytics and the Internet of Things (IoT).  A recent Gartner hype cycle places the “Internet of Things” squarely at the top of the so-called “peak of inflated expectations.”  Nearby, just a little down the slope on either side, are “big data”, “prescriptive analytics” and “data science.”  So when you hear talk about the “analytics of things” you are right to be skeptical.

Your skepticism should be focused on the readiness of data to support the lofty aspirations put forth by vendor marketing campaigns and pundit bloggers.  First Analytics has been analyzing machine and sensor data for a long time.  Before we commit to developing a new application we conduct a data readiness assessment.  In almost every case we find that IoT data is not ready.  Here are the primary impediments encountered:

  • Poor data model design: engineers and architects did not have advanced analytics in mind when designing PLCs, event recorders, and their supporting data collection systems.
  • Missing swaths of data: the patterns of missingness are varied (time, location, device, intermittency, etc.) and more common than one would expect.
  • Anomalous data: unfortunately, we don’t mean anomaly detection in the predictive-maintenance sense, but numerical observations that are flat-out wrong or infeasible.
  • Machine “historians” with short-term memory:  why do they call them historians if they don’t keep history for very long?
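A data readiness assessment can start with very simple checks. The sketch below is a minimal illustration, assuming a single sensor series with a known nominal sampling interval and a known physically feasible range; the function names (`find_gaps`, `find_infeasible`, `readiness_report`) and the example values are hypothetical, not First Analytics’ actual assessment. It counts swaths of missing data and the share of flat-out infeasible readings.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, expected_interval):
    """Return (before, after) timestamp pairs where consecutive readings are
    further apart than the expected sampling interval: swaths of missing data."""
    return [(prev, curr) for prev, curr in zip(timestamps, timestamps[1:])
            if curr - prev > expected_interval]

def find_infeasible(readings, low, high):
    """Return indices of readings outside the assumed feasible range [low, high]."""
    return [i for i, value in enumerate(readings) if not low <= value <= high]

def readiness_report(timestamps, readings, expected_interval, low, high):
    """Summarize two basic readiness checks for one sensor series."""
    gaps = find_gaps(timestamps, expected_interval)
    bad = find_infeasible(readings, low, high)
    return {
        "n_readings": len(readings),
        "n_gaps": len(gaps),
        "pct_infeasible": 100.0 * len(bad) / len(readings) if readings else 0.0,
    }

# Hypothetical one-minute series with a 5-minute hole and two values
# outside the sensor's assumed 0 to 150 feasible range.
start = datetime(2024, 1, 1)
timestamps = [start + timedelta(minutes=m) for m in (0, 1, 2, 7, 8, 9)]
readings = [71.2, 71.5, 999.0, 70.9, 71.1, -40.0]
report = readiness_report(timestamps, readings, timedelta(minutes=1), low=0.0, high=150.0)
print(report)
```

A real assessment would also look at how far back the historian retains data and whether missingness clusters by device or location; this only shows the shape of a first pass.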

There are indeed very valuable applications to be found in the Internet of Things. But to avoid failure you must invest the time and effort at the very outset to qualify your data. Gaps can often be addressed through a well-thought-out plan, but it will usually take time to build the right data systems and history.

Too many organizations spend the early stages of their IoT initiatives planning applications that they later learn the data cannot support. A data-first, data-driven approach is a tangible way to get your IoT initiative started and to mitigate the risk of failure before you spend too much time and money on planning.

Posted in AoT

Visualization: The Weather of My Life

First Analytics team member Matt Yancey shows some nifty visualizations of the weather of every day of his life:

The Weather of My Life


Posted in Fun, Visualization

Eight Data And Analytics Capabilities You’ll Need For The IoT

By Tom Davenport.  Originally published in Teradata Perspectives at Forbes.com

There is widespread agreement that the Internet of Things will be a transformative factor in the business use of information. The prospect of billions of connected devices promises to transform home activities, transportation, industrial operations, and many other aspects of our lives.

The bad news about the IoT is that we have a lot of work to do before we are ready for it. We’ve got to up our game considerably with regard to data management and analytics if we’re going to capture, store, access and analyze all the IoT data that will be flowing around the Internet. The good news (in addition to its potential) is that most organizations have a few years to get better at these capabilities before the real onslaught hits. The sensor devices, IoT data standards, and data management platforms are still in their relatively early stages, and no customer, business partner, or CEO could reasonably expect that you could tame all that IoT data today.


But they will soon. So it’s time to think now about the data management and analytics capabilities you will need over, say, the next five years as the IoT matures and blooms. I’ll describe eight of them, though of course other capabilities underlie them (data security, for example). These eight are about running; you will need to have learned to walk first.

  1. Data quality tools on steroids—The IoT is going to generate a massive amount of data, and a good bit of it is going to be of problematic levels of quality. Sensors will send bad data, devices will go offline and create missing data, and integration platforms will fail to integrate. So companies need to improve their data quality capabilities massively and employ automated tools to a large degree. This includes identifying data quality problems, determining their seriousness, and fixing them both after data have been collected and at the source.
  2. Data curation on a grand scale—Similarly, companies will need to become much better and faster curators of multiple data sources. If you’re a car manufacturer, for example, your cars already have a couple of hundred sensors in place, and you’re probably planning a lot more. Data curation allows companies to keep track of their data sources, their formats, and their interrelationships. And the scale of the IoT will mean that companies have to make widespread use of tools like machine learning, which some companies and vendors are already applying to data curation.
  3. Qualify your alerts—Alerts are one of the key ways to analyze IoT data, in that organizations will need to know what readings are in and out of normal bounds. But the vast amount of IoT data is going to make alert fatigue a common occurrence unless you have done a good job of qualifying alerts to ensure that they are real and important. You’ll also need to qualify the many security alerts that your IoT system will probably generate. All of this is going to require some high-quality diagnostic models, and I’m guessing that you don’t have them today.
  4. Swim in a data lake—You’re not going to be able to undertake an extended ETL (extract, transform, and load) process to store your IoT data in a traditional data warehouse. Some data may eventually go there, but you need to store and refine it first. So you had better establish a data lake that lets you store the data in whatever format it comes in until you need to analyze it. By the time the IoT data arrives in force, you should be well-practiced in moving data into and out of your data lake.
  5. Predictive analytics—Most organizations thus far have only employed descriptive analytics with IoT data—bar charts, alerts, means and medians. These are useful but not nearly as useful as predictive analytics. We’ll want to know whether a machine is about to break down, whether your car is likely to arrive on time, and whether your good health will persist. That takes a solid competency at predictive analytics.
  6. Automated recommendations and actions—IoT data will flow into your organization at a fast and furious rate, and you’re not going to have enough humans to examine and decide upon it. That means you should be well-versed in building and using automated decision systems by the time the IoT is mature. This capability could take a variety of forms—simple rules, event-driven systems, or sophisticated cognitive capabilities (see the next two items). By the time the IoT is ready, you should be ready to employ the right automation technology for any situation.
  7. Machine learning to create analytical models—Automating IoT processes will require a large number of analytical models, and you won’t have the time or people to create them using traditional hypothesis-based methods. Each type of device and data is going to require its own set of models, and the analysis situations will change quickly. So machine learning is the ticket to developing models rapidly and with much greater analyst productivity. Start now to develop a facility with it, because machine learning is relevant to a wide variety of situations. Machine learning models can also be helpful in identifying unauthorized intruders into your systems, which is critical for IoT security.
  8. Deep learning models for image and sound data—Deep learning, which is based on neural network methods, is the best way to analyze large amounts of image and sound data. Want to know if the drone images you’re receiving detect an unauthorized intruder? Are your sonic sensors detecting squeaks and squeals from your car engine that indicate a lack of lubrication? Deep learning models are the way to make sense of this data. They can also be used to identify patterns in cybersecurity attacks.
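To make item 3 concrete, one of the simplest ways to qualify alerts is a persistence rule: alert only when a reading stays out of bounds for several consecutive samples, so that a single spike does not page anyone. This is a minimal sketch under that assumption; the bounds, the `min_consecutive` threshold, and the sample readings are hypothetical, and the diagnostic models described above would go well beyond it.

```python
def qualify_alerts(readings, low, high, min_consecutive=3):
    """Return the start index of each excursion outside [low, high] that lasts
    at least `min_consecutive` consecutive readings (one alert per excursion)."""
    alerts = []
    run_start, run_length = None, 0
    for i, value in enumerate(readings):
        if value < low or value > high:
            if run_length == 0:
                run_start = i             # a new excursion begins here
            run_length += 1
            if run_length == min_consecutive:
                alerts.append(run_start)  # fire once, when the excursion qualifies
        else:
            run_length = 0                # back in bounds: reset the excursion
    return alerts

# Hypothetical readings with normal bounds of 30 to 90.
readings = [50, 95, 50, 50, 96, 97, 98, 99, 50, 20]
raw = [i for i, v in enumerate(readings) if v < 30 or v > 90]
print(f"{len(raw)} out-of-bounds readings, qualified alerts at {qualify_alerts(readings, 30, 90)}")
```

Here six raw out-of-bounds readings collapse to a single qualified alert starting at index 4; the isolated spikes at indices 1 and 9 are suppressed.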

No doubt there will be other capabilities that IoT-centric organizations will need to develop, but this is a good start. And many of the ones I have mentioned have relevance to other types of data and analysis contexts. An IoT-capable data and analytics environment is basically one that is state of the art given the technologies and analytical methods that are available today. So it’s time to get busy and make sure you have an implementation trajectory that will ensure you are ready when the IoT data starts flowing in a big way.

Posted in AoT, Cognitive Computing, Use Cases

Pax Analytica

One of the leaders of First Analytics, Tom Davenport, has written about how to organize analytics professionals within companies. For example, in “Five Ways to Organize Your Data Scientists” he describes organizational variations ranging from a fully decentralized organization with no coordination to a centralized group reporting into a strategy function.

Now we are starting to see the emergence of the so-called Chief Analytics Officer. There is even a conference for such executives (Tom is also involved with the sponsoring organization, the International Institute for Analytics). In the early days of our profession, analytics was performed at the grass roots, for niche applications, in various disparate pockets of a company. There was no coordination of effort or resources, and there were often several “lone wolves” rocking the boat. As analytics took hold as a practice, the wasted resources and the political and cultural clashes of these uncoordinated efforts led to recommendations to take an enterprise approach to analytics.

In European history, the expansion of the Roman Empire subjected many fiefdoms to the rule of the emperor. It is said that this imposed Pax Romana established, for the first time in human history, a protracted period of peace, stability, and economic prosperity.

So as companies mature toward an enterprise approach to analytics, will those that establish the rule of their own emperors (CAOs) experience peace and prosperity with analytics? Or is CAO just another fancy “me-too” title to add to the now-prolific list of CxOs?

One of our clients is counting on the former now, as he moves to establish the post and associated organizational structure. But it is a difficult thing to design and implement. And of course, the political, cultural, and change management issues are daunting.

What advice would you give him?

Posted in Analytics, Human Aspects

When Will the Analytics of Things Grow Up?

By Tom Davenport.    Initially published on Data Informed.

A few years ago, while DJ Patil (now the Chief Data Scientist of the U.S. Office of Science and Technology Policy) and I were working on an article about data scientists, he related to me a general rule about big data that we had both observed in the field: “Big data equals small math.” My explanation for this phenomenon is that companies often spend so much time and effort getting big data into shape for analysis that they have little energy left for sophisticated analytics. The result is that, for many organizations, the most complex analysis they do with big data is the bar chart.

Unfortunately, the same situation holds for Internet of Things (IoT) analytics. This should not be surprising, since IoT data is a form of big data. The challenge with IoT data is often not its volume but its variety. If you want to know what’s going on with a car, for example, a couple of hundred sensors create data that require integration, much of it in manufacturers’ proprietary formats.

As a result, most of the “analytics of things” thus far have been descriptive analytics: bar (and, Heaven forbid, pie) charts, means and medians, and alerts for out-of-bounds data. These summaries, including “measures of central tendency” like means and medians, are useful for reducing the amount of data and getting some idea of what’s going on in it, but far more useful statistics could be generated from IoT data.

So for the rest of this column, I’ll describe the analytics of things – both current and potential – in terms of the typology of analytics that I and others have employed widely: descriptive, diagnostic, predictive, and prescriptive.

Descriptive Analytics for the IoT

As I mentioned above, these have been the most common form of IoT analytics thus far. But there is still progress to be made in the descriptive analytics domain. Integrated descriptive analytics about a large entity like a person’s overall health, a car, a locomotive, or a city’s traffic network are required to make sense of the performance of these entities. The city-state of Singapore, for example, has developed a dashboard of IoT traffic data to understand the overall state and patterns of traffic. It’s not the be-all and end-all of IoT analytics, but it at least gets all the important descriptive analytics in one place.

Another useful form of descriptive IoT analytics is comparative analytics, which allow a user to compare an individual’s or an organization’s performance to that of others. Activity tracker manufacturers like Fitbit and fitness data managers like RunKeeper and MyFitnessPal allow comparison with friends’ activities. The comparative descriptive analytics provide motivation and accountability for fitness activities. Similarly, the Nest thermostat offers energy reports on how users compare to their neighbors in energy usage.
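Comparative descriptive analytics often boil down to ranking one person against a peer group. A minimal sketch, assuming daily step counts for a user and their friends; the percentile-rank convention used here (ties counted as half) is one common choice, not necessarily what Fitbit, RunKeeper, or Nest use, and the numbers are made up.

```python
def percentile_rank(value, peers):
    """Percentage of peers below `value`, counting exact ties as half."""
    below = sum(1 for p in peers if p < value)
    ties = sum(1 for p in peers if p == value)
    return 100.0 * (below + 0.5 * ties) / len(peers)

# Hypothetical daily step counts for a user's friends (including the user).
friends_steps = [4200, 6100, 7500, 8800, 10400, 12000]
my_steps = 8800
print(f"Percentile rank among friends: {percentile_rank(my_steps, friends_steps):.0f}")
```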

Diagnostic Analytics for the IoT

I have not often used the “diagnostic analytics” classification favored by Gartner because the explanatory statistical models it involves are usually just a stepping stone to predictive or prescriptive analytics. But diagnostic analytics have some standalone value in the IoT context, particularly for qualifying alerts. One big problem for the IoT is going to be the massive number of alerts that it generates. Alerts are generally intended to get humans to pay attention, but “alert fatigue” is going to set in fast if there are too many of them, as there already are today in health care with medical devices. Diagnostic analytics can determine whether alerts really need attention and what is causing them. My friends at the analytics software company Via Science, where I am an adviser, tell me that Bayesian networks are really good at distinguishing important alerts, and I take their word for it. I would imagine that logistic regression models could sometimes do it as well.
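As a rough illustration of the logistic regression route (not Via Science’s Bayesian-network approach), the sketch below fits a two-feature logistic model by plain batch gradient descent to a hypothetical history of alerts labeled as real or noise. The features (minutes out of bounds, size of the exceedance), the learning rate, and all the data are assumptions chosen for illustration; a production model would use a proper statistics library, more features, and held-out validation.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.05, iterations=10_000):
    """Fit weights and intercept by batch gradient descent on mean log-loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(iterations):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(k):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def alert_probability(w, b, x):
    """Estimated probability that an alert with features x needs attention."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical labeled alert history: (minutes out of bounds, size of exceedance);
# label 1 = the alert turned out to need attention, 0 = it was noise.
X = [(0.5, 0.1), (1.0, 0.2), (0.2, 0.05), (1.5, 0.3),
     (10.0, 2.0), (8.0, 1.5), (12.0, 2.5), (9.0, 1.8)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit_logistic(X, y)
print(f"P(needs attention) for a new alert: {alert_probability(w, b, (11.0, 2.2)):.2f}")
```

The fitted probability can then be used as a filter, for example routing only alerts above some chosen cutoff to a human.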

Predictive Analytics for the IoT

While there aren’t many examples of predictive analytics with IoT data yet, there are some, and there need to be more. The most common example is probably predictive locational analysis, which happens every time I use my smartphone or car GPS to plan a route. Somewhat less common, but increasingly so, is predictive maintenance on industrial machines, which tells companies like GE or Schindler Elevator that their equipment is about to break down and had better be serviced.

Predictive health is another area with a lot of potential value but, so far, not much actual value. Applications could take your daily steps, weight, and calorie consumption (that’s the toughest data point at the moment, since it relies on self-reporting) and predict things like your likelihood of getting Type 2 diabetes, or even your lifespan. More prosaic predictions could involve your likelihood of losing weight in time for your class reunion, or of beating your best time in an upcoming marathon.

Prescriptive Analytics for the IoT

Prescriptive analytics are those that provide specific recommendations based on predictions, experiments, or optimizations. It’s not hard to see how these could be valuable with the IoT. An airline pilot could be told, “Shut down engine number four now, before it overheats.” At GE, maintenance people are already told when to wash a jet engine with water, which apparently lowers the failure rate and lengthens the lifespan of the engines.

Prescriptive medical applications of the IoT could be very valuable as well. Medical device data could tell clinicians when to intervene with particular patients. Instead of annual medical checkups, which are both expensive and not terribly effective, patients could be advised by home health devices when to see a doctor. Philips already has a service offering called CareSage that uses wearable device data to alert clinicians that an elderly patient needs an intervention.

In some IoT environments, such as “smart cities,” analytics will need to provide automated prescriptive action. It’s useful to look at a dashboard and know which streets are congested in Singapore, for example, but the real value comes when a system can change traffic light durations and block off freeway entrances based on IoT data. Similar automated actions will need to be put in place for industrial environments with IoT sensors and data. In such settings, the amount of data and the need for rapid decision making will swamp human abilities to make decisions on it.
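As a toy illustration of automated prescriptive action, the sketch below re-divides a traffic signal’s green time among approaches in proportion to sensor-measured demand, after guaranteeing each a minimum green. The rule, its parameters (`cycle_s`, `lost_time_s`, `min_green_s`), and the demand figures are assumptions for illustration; this is not a description of how Singapore or any real traffic system sets its signals.

```python
def split_green_time(cycle_s, lost_time_s, demand, min_green_s=7.0):
    """Share a signal cycle's usable green time among approaches: each gets
    `min_green_s`, and the rest is split in proportion to measured demand."""
    usable = cycle_s - lost_time_s
    spare = usable - min_green_s * len(demand)
    if spare < 0:
        raise ValueError("cycle too short to give every approach its minimum green")
    total = sum(demand.values())
    greens = {}
    for approach, d in demand.items():
        share = d / total if total > 0 else 1 / len(demand)  # equal split if no demand seen
        greens[approach] = min_green_s + spare * share
    return greens

# Hypothetical sensor counts (vehicles per minute) at one intersection.
greens = split_green_time(90.0, 10.0, {"north-south": 30.0, "east-west": 10.0})
print(greens)
```

Rerunning the split each cycle as sensor counts change is what turns a descriptive congestion dashboard into an automated action.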

Given the young age and high complexity of the IoT data environment, it’s not surprising that the “analytics of things” isn’t very mature yet. But in order for us to get value from the IoT, we need to move beyond bar charts as quickly as possible.

Posted in AoT

First Analytics celebrates Union Pacific’s Safety Achievement

Union Pacific’s announcement this week of an all-time record low reportable injury rate, making it the safest Class I railroad in the United States, is a validation of a data-analytic approach to safety. First Analytics has worked with Union Pacific since 2011 to help the railroad build its data assets regarding safety and employ statistical and predictive modeling to identify and mitigate risks, both to employees and to the public.

Analytical applications vary in their social value, and we take pride in knowing that our Prescriptive Safety Analytics solution has made an impact on many lives.

Posted in News

When data prep turns into p-hacking

One might assume that data generated by sensors and event recorders would be clean because, after all, these are precise instruments, right? Alas, the analytics of things suffers from the age-old challenge of data quality. This screenshot from SAS Visual Analytics shows that, over a span of several minutes, the tank level measurement mysteriously drops to zero quite frequently.

[Screenshot: AoT data quality in SAS Visual Analytics]

Recently we encountered quite a few of these anomalous measurements on a project. As part of our data quality and preparation routines, we set up a “bimodality detector” to isolate these patterns. But then what should we do with that data? Options range from throwing it out to employing various imputation and outlier remediation techniques, both simple and elaborate.
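The bimodality detector itself isn’t described here, so the sketch below substitutes a simpler, hypothetical rule: flag any run of zero tank-level readings that is bracketed on both sides by clearly nonzero readings, since a tank cannot plausibly empty and refill between samples. It then applies two of the remedies mentioned above, dropping the flagged readings or linearly interpolating across them. The thresholds and readings are invented for illustration.

```python
from statistics import mean

def flag_dropouts(levels, floor=0.0, min_neighbor=5.0):
    """Return indices of runs of readings at or below `floor` that are
    bracketed on both sides by readings above `min_neighbor`."""
    flagged, i, n = [], 0, len(levels)
    while i < n:
        if levels[i] <= floor:
            j = i
            while j < n and levels[j] <= floor:
                j += 1                      # run of low readings covers i..j-1
            if i > 0 and j < n and levels[i - 1] > min_neighbor and levels[j] > min_neighbor:
                flagged.extend(range(i, j))
            i = j
        else:
            i += 1
    return flagged

def remedy_drop(levels, flagged):
    """Remedy 1: discard flagged readings."""
    bad = set(flagged)
    return [v for k, v in enumerate(levels) if k not in bad]

def remedy_interpolate(levels, flagged):
    """Remedy 2: replace each flagged run with a straight line between its neighbors."""
    out, bad, k = list(levels), set(flagged), 0
    while k < len(out):
        if k in bad:
            start = k
            while k < len(out) and k in bad:
                k += 1
            left, right, steps = out[start - 1], out[k], k - start + 1
            for m in range(start, k):
                out[m] = left + (right - left) * (m - start + 1) / steps
        else:
            k += 1
    return out

levels = [62.0, 61.5, 0.0, 0.0, 60.5, 60.0, 0.0, 59.0]  # hypothetical tank levels
flagged = flag_dropouts(levels)
print("raw mean:", mean(levels))
print("after dropping:", mean(remedy_drop(levels, flagged)))
print("after interpolating:", mean(remedy_interpolate(levels, flagged)))
```

Even on eight readings, the raw mean (37.875) and the two remedied means (60.6 after dropping, 60.5625 after interpolating) differ, and on real data the choice of remedy can move downstream results far more.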

We found this to be a slippery slope. We can run our entire process end to end, from data intake to model calibration, and see the end results of our remedies. When the stakeholders have a particular answer in mind, it’s tempting to tune the remedies, running the process over and over again to approach that answer.

P-hacking is a type of data-dredging wherein one has an answer in mind, and, speaking in the traditional regression sense, one selects variables and a model specification to support the desired outcome. The ‘p’ refers to the p-value used in traditional statistical hypothesis testing.

But while we think of p-hacking as relating to the building of the model or algorithm, it turns out that outcomes can sometimes be influenced more by how the incoming data is transformed.

Machine learning techniques are sometimes perceived as objective. But there is obvious subjectivity in the selection of the algorithm and its parameters. Add the ability to intervene at the data preparation stage, and there is even more subjectivity end to end.

This is not to say it is wrong to have human domain knowledge inform the analytics process — in fact, the best analytics are steered in the right direction by a human modeler. But let’s dispense with the illusion that “data driven” decisions are free from human biases.

Posted in Analytics, AoT, Implementation

Let’s not overlook the old analytical mainstays for opportunities

The hot field of analytics is rife with hype and its associated buzzwords, such as “big data” and “data science.” Some have pointed out that, while the world fixates on big data, there are opportunities to be had with the small data that is sitting right in front of us.

The techniques getting all the attention in the analytics community these days have names such as “deep learning” and “convolutional neural networks.” Indeed, staffers at First Analytics have used these methodologies successfully on live projects. For the most part, though, these techniques are not all that different from their predecessors in their pedigree. The approaches per se are not new; what is new is the particular idea or evolution that builds upon the original concept.

We were happy to see the publication of a new book entitled Business Forecasting: Practical Problems and Solutions. Statistical forecasting as a practice, including its models and its processes, goes back many decades. Therefore, many may be tempted to think there is nothing new there of interest, especially when “all of the fun” is to be had in machine learning. This book, a compilation of writings from many thought leaders in the forecasting field, shows that innovation is still taking place there.

As an implementer of many forecasting systems, First Analytics has consistently seen improvements in forecast accuracy, processes, and business results using these techniques, which, while perhaps mistakenly perceived as dated, still pass muster and have themselves evolved and improved.

Posted in Analytics, Forecasting

Cognitive technologies all set to transform business processes

(republished with permission)

Some cognitive technology vendors and customers are moving past the “science project” phase and using artificial intelligence (AI) to transform business processes. Now is the time for businesses to consider the areas suitable for AI and how they can help transform key processes.

With most new technology, there is a period of excitement and media attention about the power and capabilities of the technology itself. But with effective technologies, the focus should shift rapidly to how the new tools can change business processes. We’re at the beginning of that shift for cognitive technologies: systems that can ingest and analyze information in various forms, and make intelligent decisions.

The early applications of cognitive technologies revolved around their capabilities in games, including Jeopardy! and chess. These were impressive displays that whetted appetites for the smart machines, but the application to business wasn’t entirely clear. Now, however, it’s time to think about the business processes that cognitive technologies can help transform, and how that reengineering can be accomplished.

There are still some vendors and customers who view cognitive technology as suitable only for science projects. But I believe some leading vendors of these technologies are shifting to a process focus. When the early hoopla wears off, they’ll be evaluated on how they help companies transform key processes.

How exactly should companies go about this reengineering process? I think there are several key steps to consider. The first is to think about just what processes are suitable for cognitive reengineering, or “cognitizing” as Cognitive Scale chairman Manoj Saxena refers to it. Perhaps obviously, knowledge-intensive processes are often most appropriate for this technology. That typically means that there is some knowledge bottleneck in the process—too much knowledge for humans to absorb and apply, or knowledge that exists somewhere but isn’t getting used to solve a problem.

That sort of process can be found all across an organization. It might involve, for example, assessing customers or suppliers to understand which ones are most desirable to do business with. It might involve assessing financial investments, or looking at M&A opportunities. In legal work, it might entail assessing documents for a case, or extracting contract provisions. It could involve somewhat less intellectual work in administrative contexts—pulling and manipulating data from a variety of information systems, and taking automated action on it. There are literally hundreds or even thousands of knowledge-intensive tasks to which these technologies could be applied.

Given the potential number of opportunities, they need to be prioritized. Unfortunately, it is possible to spend a lot of time implementing intelligent technologies that aren’t actually a fit for your business or don’t solve an important problem. So it’s a good idea to start with a simple question: If you could wave a magic wand and give some select knowledge workers in your organization superpowers, in what ways would you expand their capacity? In particular, what decisions would you help them make better?

Many managers like to think in terms of “leverage points” in the enterprise—that is, places where a small improvement in operational performance is able to yield large gains in market performance or fulfillment of the organization’s mission. Your goal here is to identify those valuable people who are making the decisions that really move the needle on enterprise success. Another question to surface these opportunities might be: Who are some people you are currently compensating highly, yet only wish you could hire more of? Augmenting their capabilities with technology can help you not only keep paying them well, but also attract more of them.

When you have identified these knowledge-intensive roles, talk to them. Do they wish they could do better for their customers, if only they had the computational power? Where are they wasting their time and the company’s money by doing work that doesn’t require or benefit from their talent? If they were able to offload routine tasks they have mastered, how would they propose to use the time instead? The best way to get people to use a new tool, of course, is to give them the tool they ask for.

Once your organization has identified and prioritized some of the key opportunities for cognitive technology, it makes sense to begin thinking about what types of cognitive technologies might address them. There is a range of such tools—from machine learning to natural language processing to robots to robotic process automation. Each does different things to different types of processes.

Sometimes, with a clear idea of what decision or activity to target, it’s easy to identify the cognitive technologies that would apply. At other times, however, breakthroughs in the technology itself suggest possibilities for augmentation you would not have imagined. So it’s useful to stay abreast of developments in the various domains of cognitive technology and to keep asking the question: How could we make use of that?

For example, one recent development is that computers have learned to read and make inferences from fast, vast digestion of textual content. If you weren’t part of the artificial intelligence community, you might have first learned of this when IBM’s Watson won Jeopardy! To come up with each response, Watson (specifically its “Discovery Advisor”) read a “wide range of encyclopedias, dictionaries, thesauri, newswire articles, literary works, and so on.”[1] How could you use that power? At the Baylor College of Medicine in Houston, researchers used it to read through more than 70,000 scientific articles, looking for accounts of any protein that could modify p53, a protein that regulates cancer growth.[2] Most scientists would struggle to identify one such protein in a year; Watson took only a few weeks to find six (of course, it took many years to develop Watson and adapt it to “reading” biomedical literature, and the project was only a proof of concept).[3] Other organizations are using similar technologies to glean insights from natural language content that exists in enormous volume.

Or think about the “Internet of Things”: the ability to place small sensors on objects in the physical world and have them communicate readings in real time. The rise of this technology has depended on the rise of computers with the processing power to deal with the immense amounts of data produced; unaided humans could not conceivably monitor and control the vast sensor networks used, for example, to detect whether a tsunami is brewing far offshore. It’s likely to take tools like machine learning to deal with that much rapidly flowing data. Until recently, therefore, it has probably not occurred to your organization to ask: How could we improve our business if we did have that ability? It should probably occur to you to ask that now.

With these factors as background (and perhaps some other discussions about regulatory issues if you are in an industry that is highly regulated), you may be ready to identify some specific processes, applications, and technologies—prioritized by their importance to your organization’s success. Most organizations today aren’t ready for a full portfolio of cognitive applications; so you may want to begin implementing just one or two.

This is not the place to get into detail about how to build and implement cognitive applications, but it is appropriate to emphasize the focus on the business process at all times. Just as with other process-changing technologies, it’s important to plan simultaneously for the new system and for new ways of doing business. Knowledge workers generally dislike being told what to do (even more so than other workers), so you may want to involve those who are affected in the design process. Begin planning early for new roles and tasks for humans. And if some workers will need major new skills to keep their jobs, or if you won’t need as many workers as you did previously, it’s good business to help them begin preparing early for the new environment.

Not since ERP and e-commerce has a new technology offered as much business-changing potential as cognitive tools. They hold the potential to augment the capabilities of the smartest humans, and to dramatically improve the productivity and effectiveness of complex knowledge work. There is and will continue to be a lot of hype about these technologies, but in my view it’s not undeserved. It’s time to jump on this bandwagon, but make sure you bring along some business change with it.

 

Endnotes

  1. David Ferrucci et al., “Building Watson: An overview of the DeepQA project,” AI Magazine, Fall 2010, http://www.aaai.org/Magazine/Watson/watson.php.
  2. Doug Henschen, “IBM Watson speeds drug research,” InformationWeek, August 28, 2014, http://www.informationweek.com/big-data/big-data-analytics/ibm-watson-speeds-drug-research/d/d-id/1306783.
  3. Scott Spangler et al., “Automated hypothesis generation based on mining scientific literature,” Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2014.

Posted in Cognitive Computing

Benford’s Law at Lowe’s

By Rob Stevens

I recently installed a replacement mailbox and post, so off I went to Lowe’s to shop for address-number decals to affix to the new mailbox. When I found the pegboard hooks holding the numbers, the quant in me immediately noticed something. What do you see in this picture?

[Photo: address-number digits on pegboard hooks at Lowe’s]

Why do the digits 1 and 2 get two slots, while the higher digits only get one?

Why, it’s Benford’s Law, of course!

From Wikipedia, Benford’s Law “states that in many naturally occurring collections of numbers the small digits occur disproportionately often as leading significant digits.” And, “it has been shown that this result applies to a wide variety of data sets, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants.”

Perhaps the best application of this law is in fraud detection. The next time you submit that expense report, consider that an algorithm could be checking your reports against this equation, which gives the probability of each leading digit d occurring:

$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \qquad d \in \{1, 2, \ldots, 9\}$$

Hopefully your expense entries don’t deviate too far from the expected probabilities! Otherwise, you could be flagged as suspicious.
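A Benford screen of this kind can be sketched in a few lines: compute the expected leading-digit probabilities from the formula, count the observed leading digits, and compute a Pearson chi-square statistic. The sketch assumes nonzero amounts and compares against about 15.51, the 5 percent critical value of a chi-square distribution with 8 degrees of freedom; the expense amounts are invented, and real audit tools use more refined tests and much more data.

```python
import math
from collections import Counter

CRITICAL_CHI2_8DF_5PCT = 15.51  # chi-square critical value, 8 df, alpha = 0.05

def benford_expected():
    """Benford probabilities P(d) = log10(1 + 1/d) for leading digits 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(amount):
    """First significant digit of a nonzero amount, read from scientific
    notation (e.g. 47.25 -> '4.725000e+01' -> 4). Values within rounding
    distance of a power of ten, like 9.9999999, round up to a leading 1."""
    return int(f"{abs(amount):e}"[0])

def benford_chi_square(amounts):
    """Pearson chi-square statistic comparing observed leading digits with
    the counts Benford's Law predicts for the same number of entries."""
    digits = [leading_digit(a) for a in amounts if a != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one nonzero amount")
    observed = Counter(digits)
    return sum((observed[d] - n * p) ** 2 / (n * p)
               for d, p in benford_expected().items())

# Hypothetical expense report whose amounts all start with 9 (90.00 to 99.00).
report = [90.0 + (i % 10) for i in range(100)]
stat = benford_chi_square(report)
print(f"chi-square = {stat:.1f}, flagged = {stat > CRITICAL_CHI2_8DF_5PCT}")
```

The chi-square approximation needs reasonably large samples (roughly five or more expected entries per digit), which is why a screen like this is usually run across many reports rather than one.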

 

Posted in Analytics, Miscellaneous
