17 analytics to transform your organisation, Part 2

5 December 2019
In Part 1 of this article, we looked at the different ways in which analytics has taken hold of data. Starting from calculations and comparisons of quantitative data, the field has moved towards data mining and information retrieval. As a corollary, analysis techniques now incorporate text, speech and voice – but that is only the beginning. Development continues apace, with the next stage sure to incorporate sight and image, followed in turn by the ability to use all these media forms in combination... by Pierre-Sylvain Roos

(Feel free to have a look at the first part of our series here.)

 

5 - Image Analytics

Images are data too – provided you know how to process and exploit them. Image analysis technologies have only very recently come into their own. The field is developing fast and is set to make huge progress in the years ahead.

For the general public, this type of analytics was popularised by Facebook through its huge-scale implementation of facial recognition algorithms. The original intention seemed harmless enough: automatically tagging “friends” in photos. It quickly demonstrated, however, that this type of analytics is not only powerful and effective but also presents a major risk of abuse in terms of liberty and privacy violations. For Facebook, technological success could well turn into economic disaster. An ongoing class action lawsuit asserts that seven million Illinois citizens suffered injury as a result of the technology being used without their consent. Facebook could face a fine of up to 35 billion US dollars.

In a somewhat less controversial arena, one of the first noteworthy applications of Image Analytics began with a rather wacky idea. In 2009, three Brown University economists (J. Vernon Henderson, Adam Storeygard, David N. Weil) proposed measuring a country’s GDP from the amount of night-time light visible from space. The images used are photos taken by a US Air Force satellite orbiting the Earth 14 times a day. Among the first astonishing results of this approach: South Korea was shown to have undergone 72% growth between 1992 and 2008.
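To give a feel for the underlying idea, here is a minimal sketch of a luminosity-based comparison, assuming two grayscale night-time composites saved as image files (the file names are hypothetical, and real studies use calibrated satellite composites rather than raw photos):

```python
# Minimal sketch: total nocturnal luminosity as a crude economic-activity proxy.
# File names are hypothetical; real studies use calibrated satellite composites.
import numpy as np
from PIL import Image

def total_luminosity(path: str) -> float:
    """Sum of pixel intensities of a grayscale night-time image."""
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    return float(img.sum())

lum_1992 = total_luminosity("south_korea_1992.png")
lum_2008 = total_luminosity("south_korea_2008.png")
print(f"Luminosity growth 1992-2008: {(lum_2008 - lum_1992) / lum_1992:.0%}")
```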

 


Korean Peninsula: long-term growth between 1992 and 2008.

 

In 2015, a World Bank initiative confirmed the relevance and robustness of image-based analyses. In this case, the project involved establishing trends in the shadow economy of a developing country where official indicators were either unreliable or inaccurate. To do so, the project team used the services of Premise, a company co-founded by Joseph Reisinger. The firm equips a network of local contributors with smartphones in the target country; their job is to take photos of scenes that could be of economic significance.

These thousands of deliberately taken images are then analysed, constituting a real mine of information. Initially, the task was to photograph cigarette packets in public areas and analyse whether or not they bore the revenue stamp indicating legal trade: the share of unstamped packets serves as a proxy for the size of the shadow economy. By repeating the same operation a year later, it is possible to measure how this market segment has changed. This process was used to observe a contraction in the underground market between 2015 and 2016.
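As a toy illustration of the arithmetic involved (the counts below are invented, not Premise’s actual data), the year-on-year comparison boils down to tracking a simple proportion:

```python
# Toy illustration of the stamped-packet survey: the share of unstamped
# packets is used as a proxy for the informal market, and the change
# between two survey waves indicates its expansion or contraction.
def unstamped_share(unstamped: int, total: int) -> float:
    return unstamped / total

share_2015 = unstamped_share(412, 1800)  # invented counts, for illustration only
share_2016 = unstamped_share(355, 1900)
print(f"2015: {share_2015:.1%}  2016: {share_2016:.1%}  "
      f"change: {share_2016 - share_2015:+.1%}")
```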

     

6 - Video Analytics

Automatic video analysis has only really become established over the past few years. It has developed primarily in two domains: security and retail. And here as well, the potential for improvement and progress is huge – as we will see.

In the majority of cases, the sources of analysis (input videos) come from CCTV systems. The overall performance of the Video Analytics system is largely dependent on the right number of cameras being installed in the right places.

The main objective is therefore to analyse automatically, and in real time where possible, the behaviour of people as they move around the area covered by the cameras, for example customers in a shopping centre or a crowd in the street.

An additional approach involves accumulating video recordings in order to perform mass a posteriori analyses and to draw lessons from them. For example, in a network of shops (physical points of sale): what colours are customers tending to wear most? How does this change from one season to the next? Or, when it rains, what do customers do with their umbrellas when they enter the shop?

The medium of video is particularly well suited to the direct visualisation of events in space and time. It is an ideal tool for movement detection, tracking and recognition. And the key benefit of a video analytics system is that it can be connected to a large – even a very large – number of cameras (sources) without ever suffering the lapses in vigilance that limit human operators.
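To give an idea of the kind of primitive such systems build on, here is a minimal movement-detection sketch using frame differencing with OpenCV (the video file name is hypothetical; production systems layer tracking and behavioural models on top of primitives like this):

```python
# Minimal movement-detection sketch: compare consecutive frames and flag
# significant pixel changes. A real video analytics system adds tracking
# and behaviour models on top of this kind of primitive.
import cv2

cap = cv2.VideoCapture("cctv_feed.mp4")  # hypothetical file; use 0 for a webcam
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pixels that changed between consecutive frames indicate movement.
    diff = cv2.absdiff(gray, prev_gray)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    motion = cv2.countNonZero(mask) / mask.size
    if motion > 0.01:  # more than 1% of pixels changed
        print("movement detected")
    prev_gray = gray

cap.release()
```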

Generally, users are looking to identify unusual situations or behaviours and, conversely, established behaviours or patterns that can be used as a basis for making projections.

Before being implemented at a given site, the system must first be put through a learning phase. It is fed enormous amounts of video footage of the site, first corresponding to regular activity, then to unusual patterns of activity, until it is able to identify all possible scenarios. The initial analyses can then be produced by comparison with the learned situations. And of course the system continues to learn from a continuous stream of input, which in turn makes it more and more efficient.

In more advanced systems, the video image is augmented with heat and flow maps, using thermal imaging to indicate people’s presence and footfall. Full implementation of a video analytics system therefore involves a combination of equipment, software components and specialist video-processing hardware.

 


Imaging of traffic areas in the aisles of a shop (CCTV source)

 

Video analytics has applications across a large number of domains: entertainment, healthcare, retail, automotive, transport, home automation, smoke and fire detection, safety and security.

In particular, video analytics is revolutionising the retail trade by focussing on how people do their shopping. Through the use of these new systems, it is now possible to model the behaviour of customers in a shop, see which pathways they take, analyse traffic flow, assess which aisles are visited more or less frequently, or even quantify how many customers stop to look at a particular special offer or shelf section.
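A minimal sketch of the footfall side of this, assuming customer positions have already been extracted from the video by a people detector (the detections below are invented): each position is accumulated into a grid laid over the floor plan, producing a traffic map like the one pictured above.

```python
# Sketch of a footfall heat map: accumulate detected customer positions
# into a 2D grid laid over the shop floor plan.
import numpy as np

GRID_H, GRID_W = 40, 60                 # floor plan discretised into cells
heatmap = np.zeros((GRID_H, GRID_W))

# Invented detections: (x, y) positions in floor-plan coordinates.
detections = [(12.3, 7.9), (12.5, 8.1), (30.0, 20.5), (30.2, 20.4)]
for x, y in detections:
    heatmap[int(y), int(x)] += 1        # one visit counted per detection

row, col = np.unravel_index(heatmap.argmax(), heatmap.shape)
print(f"Busiest cell: row {row}, column {col}")
```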

In the security domain, video analytics is causing quite a stir as an aid to policing, particularly in the English-speaking world. The company Kinesense is now a leader in this field, providing dedicated forensic investigation solutions. Its technology can produce evidence rapidly by processing thousands of videos, something that would not be possible using conventional methods.

And we can’t round off Video Analytics without mentioning the latest exploit of Facebook (yes, them again!). In the midst of the turmoil caused by its facial recognition algorithms, Facebook AI Research (FAIR) unveiled (in October 2019) a novel method for – wait for it – fully reconstructing video images so that faces can no longer be recognised automatically. This logical scrambling of identity completely misleads the machines but in no way prevents us humans from recognising the “protected” individuals. What’s more, the system can run in real time, i.e. at the very moment the images are shared, with no noticeable latency. Beyond the rather surreal aspect of this innovation, two observations follow: first, human cognition is still far superior to that of machines when it comes to recognising people; and second, image modification techniques have reached a level of sophistication that makes falsification imperceptible to the naked eye. In the future, only advanced analytics will (perhaps) be able to determine whether or not video footage corresponds to real images.

 

7 - Sentiment Analysis (SA)

Proposed and formalised in 2004 at the AAAI Spring Symposium (Association for the Advancement of Artificial Intelligence), Sentiment Analysis is a culmination of the analytics techniques covered earlier. It combines text, sound and image to produce complex, in-depth analyses. So yes, machines are now capable of understanding motivations and opinions, and of explaining or even anticipating human behaviours.   

The principle involves identifying and categorising attitudes, opinions, beliefs, views and emotions across a variety of complementary sources (text corpora, audio or video recordings, structured data).

In this type of sentiment-focussed analytics, there is a heavy reliance on data from social media platforms as these are especially indicative. We are also talking about sentiment-rich data: tweets, ratings and opinions, blog posts, comments, reviews, etc.

In particular, micro-blogging sites are a favoured source for analysing dynamic changes in opinions within a community or a wider public (opinion mining). Some 190 million tweets are posted on Twitter every day... But it’s not all about Twitter, and ideally analysts should draw from a panel of sites, analysed simultaneously, in order to compare and contrast results and produce more in-depth analyses (as individual sites often have their own styles and tendencies).

Sentiment Analysis is used primarily to understand and improve products and services, and for forecasting and monitoring. 

For forecasting, the level of emotion provoked by an upcoming event (hope, fear, joy) is measured. This can be applied to stock market fluctuations, election results, box-office prospects ahead of a forthcoming film, or the likely success of a product or service prior to its launch.

Monitoring involves gathering and objectifying feedback in order to monitor and improve, for example, a brand’s e-reputation or the impact of an online marketing campaign.

Sentiment Analysis techniques involve two complementary approaches:

  • Machine Learning and statistical methods are used to establish ratings automatically, with certain more specialised methods drawing on semantics, such as LSA (Latent Semantic Analysis).
  • Lexicon-based methods are what we call knowledge-based techniques. They rely either on dictionaries (synonyms, antonyms, etc.) or on domain-specific text corpora and lists of opinion-related terms (a minimal example follows this list).
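As a concrete taste of the lexicon-based approach, here is a minimal polarity scorer built on NLTK’s VADER lexicon, one real tool among the many dictionary-driven ones (the sample sentences are invented):

```python
# Lexicon-based polarity scoring with NLTK's VADER sentiment lexicon.
import nltk
nltk.download("vader_lexicon")  # one-off download of the lexicon
from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
for text in ["I love this product!",
             "Delivery was slow and the box was damaged."]:
    scores = sia.polarity_scores(text)  # keys: neg / neu / pos / compound
    print(text, "->", scores["compound"])
```

The compound score ranges from -1 (most negative) to +1 (most positive), which maps directly onto the positive/negative/neutral polarity analysis discussed below.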

We won’t need to elaborate further on the strength of the digital lexicon domain, as the plethora of tools speaks for itself: Werfamous, AFINN, General Inquirer, WordNet / SentiWordNet, SentiSense (SentiStrength), Subjectivity Lexicon, Micro WNOp, NTU Sentiment Dictionary (NTUSD), Opinion Finder, NRC Hashtag Emotion Lexicon, etc.

 


The Government of Canada (NRC) provides seven different lexicons for sentiment and emotion. The main entries are in English with translations available in 40 other languages.

 

When first implemented, Sentiment Analysis systems performed basic polarity analyses: positive, negative or neutral. Since then, more advanced analyses have extended to other sentiments (anger, fear, sadness, joy), with associated qualification tags (hostile, amicable, strong, weak, powerful, submissive, active, passive, etc.).

Multimodal analysis (text, audio, visual) is the very latest development. Data sources are hybridised to produce global interpretations: facial expressions, gestures and postures, words. This enables highly specialised uses, such as monitoring people suffering from depression.

On the other hand, despite significant progress being made, Sentiment Analysis technologies are still limited in their ability to correctly interpret irony, exaggeration, sarcasm, tongue-in-cheek and humour in general. 

By introducing a subjective, human dimension into its calculations, Sentiment Analysis can be used to produce more nuanced investigations, to better understand feelings and experiences (of customers, users, employees), and even to generate successful ideas based on proven models of behaviour and reaction.

The drawback is that all analyses of this type use data of a highly personal nature. Unless effective safeguards are established, alongside guarantees for the collection and use of personal data, it is really not easy to predict the future of Sentiment Analysis.

         

8 - DataViz & Advanced DataViz

What we call DataViz is, in crude terms, the use of abstract shapes – lines, dots, circles, squares – to build images that graphically represent encoded data. As surprising as it seems, this is a relatively recent innovation. The broad principles of modern DataViz were theorised in 1983 by Edward R. Tufte in his landmark publication “The Visual Display of Quantitative Information”. Since then, the field has continued to improve and expand, embracing digitisation in the process.

What we learn from DataViz is that by visualising data, we are much more likely to understand it properly and to grasp orders of magnitude which would otherwise remain abstract. Even better, by introducing conventions of scale and form, and colour codes, we can obtain layouts that reveal differences or peculiarities. Let's use the example of a line used to represent the flow of goods: the thickness of the line indicates the volume exchanged, an unbroken line indicates purchases, a dotted line indicates sales, each goods type has a distinct colour, etc. With conventions like these, if we represent the sales and purchases of a major distributor, we can immediately see which types of goods are purchased or sold in greater volumes.
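A minimal sketch of these conventions with matplotlib (the flows and volumes are invented for illustration):

```python
# Sketch of the visual conventions described above: line width encodes volume,
# solid vs dotted encodes purchases vs sales, colour encodes the goods type.
import matplotlib.pyplot as plt

months = list(range(1, 13))
flows = [
    # (label, monthly volumes, colour, linestyle: solid=purchases, dotted=sales)
    ("Food purchases",    [40 + m for m in months], "tab:green", "-"),
    ("Food sales",        [55 + m for m in months], "tab:green", ":"),
    ("Textile purchases", [20] * 12,                "tab:blue",  "-"),
    ("Textile sales",     [25] * 12,                "tab:blue",  ":"),
]
for label, volumes, colour, style in flows:
    # Average volume drives line thickness, so big flows stand out at a glance.
    width = sum(volumes) / len(volumes) / 15
    plt.plot(months, volumes, style, color=colour, linewidth=width, label=label)

plt.legend()
plt.title("Goods flows: thickness = volume, solid = purchases, dotted = sales")
plt.show()
```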

The other significant development, and the real playground of DataViz, is the creation and continual improvement of new graphical representation models, which are becoming increasingly sophisticated and specialised according to the type of data analysed and the type of information to be represented. Sunburst and TreeMap, for example, both emerged several years ago and are now in widespread use.

Data representation has become, out of necessity, a really important issue. Data that is not understood by business teams is dead data and cannot be harnessed to create value.

And this is why, at the very end of the chain, DataViz is the vital link needed to get the most out of the results of analyses. It is a key building block in the process of exploiting and interpreting data.

Since the early 2000s, DataViz has moved upmarket with the development of ever more powerful and sophisticated tools and techniques, giving rise to what we now call Advanced DataViz. This new generation brings more to the table in terms of ease of use, speed and fluidity, letting users literally dig into the data and manipulate it.

A major step forward with Advanced DataViz is the dynamic configuration of data visualisation. Users can modify the settings of the representation model and the analysis axes, and see the resulting changes directly: they navigate through the data. In this type of system, the indicators and graphs are recalculated on the fly with every interaction.

Another considerable development is the coupling of DataViz with mapping and geolocation functions, which makes it possible to represent data on a plan or a map. In this way, we can, for example, literally watch time-series changes in an area or region across months or years: urbanisation, land use, forests, fields, industrial facilities, infrastructure, etc. And it is much more explicit than plotting ten or twenty curves on a graph.
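As a hedged sketch of this map coupling, using Plotly Express and its bundled demo dataset (life expectancy rather than land use, but the principle of animating a map over time is the same):

```python
# Sketch of DataViz coupled with mapping: a choropleth animated over time.
# Uses Plotly Express's bundled demo data rather than real land-use imagery.
import plotly.express as px

df = px.data.gapminder()  # built-in demo dataset, one row per country-year
fig = px.choropleth(
    df,
    locations="iso_alpha",   # ISO country codes drive the map shapes
    color="lifeExp",         # the measure painted onto each country
    animation_frame="year",  # slider to watch the map change over the years
    title="A measure evolving on a map over time (demo data)",
)
fig.show()
```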

As standard, Advanced DataViz offers powerful data manipulation functions, often in the form of tactile interfaces: zoom in, zoom out, drill down, drag, select, etc. 

In the main, the most popular solutions on the market all emerged in the 2000s: Birst, Domo, Microsoft Power BI, MicroStrategy, Qlik Sense, Salesforce Einstein Analytics, SAS Visual Analytics, Sisense, Tableau, ThoughtSpot.

They allow users to read large data sets directly from Hadoop-type databases or data lakes. In addition, they provide libraries holding collections of off-the-shelf, customisable representation models. Tableau, for example, offers over sixty chart models as standard, with unusual names such as Choropleth, Non-ribbon Chord Diagram or Candlestick Chart.

Furthermore, there are also open-source libraries available in different languages and ecosystems: Uber React-vis, VX and Recharts on GitHub, Plotly for Python, and D3.js, Chart.js and Three.js for JavaScript.
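For instance, the Sunburst model mentioned earlier takes only a few lines with Plotly in Python (again using a bundled demo dataset):

```python
# A Sunburst diagram with Plotly Express: hierarchical data as nested rings.
import plotly.express as px

df = px.data.tips()  # built-in demo dataset of restaurant bills
fig = px.sunburst(df, path=["day", "time", "sex"], values="total_bill")
fig.show()
```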

 


Touch DataViz solution by AUSY: a Non-ribbon Chord Diagram used to visualise relationships between data.

 

This market has undergone rapid expansion to date, with a diverse range of stakeholders, and we are bound to see many other promising developments in the near future.   

For a long while, DataViz was considered a luxury, the cherry on the cake, but people have recently come to realise its true worth. Today, the issues associated with DataViz are clearly understood by the major digital players who, in a manner of speaking, have just woken up to them. Let’s say that hostilities have begun in earnest: in early June 2019, Google Cloud acquired Looker for 2.6 billion dollars, with Salesforce hot on its heels, announcing on 11 June 2019 that it had acquired Tableau Software for 15.3 billion dollars!

Watch this space...   

 

Conclusion:

With this second instalment, we have now covered all the main types of data and media that can be exploited by analytics techniques (Parts 1 and 2). We could well stop there, but doing so would overlook the wide range of issues that our engineers work tirelessly to overcome. The next part of our analytics journey will therefore lead us to much more specialist techniques, perhaps less well known to the general public but nonetheless just as widespread in the various realms of data analysis expertise.

(Feel free to have a look at the third part of our series here.)

You may also be interested in our Big Data offer.

Let’s have a chat about your projects.
