Data visualization

Data visualization is the graphic representation of data. It involves producing images that communicate relationships among the represented data to viewers of the images. This communication is achieved through the use of a systematic mapping between graphic marks and data values in the creation of the visualization. This mapping establishes how data values will be represented visually, determining how and to what extent a property of a graphic mark, such as size or color, will change to reflect changes in the value of a datum.
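The mapping between data values and a property of a graphic mark can be illustrated with a minimal sketch. It assumes the simplest possible case, a linear mapping from a data interval to a bar length. The function name `linear_scale`, the example values and the pixel range are hypothetical choices made for illustration and do not come from any particular library.

```python
def linear_scale(domain, visual_range):
    """Return a function mapping a data value to a visual property value.

    Assumes a linear mapping: `domain` is the (min, max) of the data and
    `visual_range` is the (min, max) of the mark property, for example
    a bar length in pixels.
    """
    d0, d1 = domain
    r0, r1 = visual_range
    if d1 == d0:
        raise ValueError("domain must span a non-zero interval")

    def scale(value):
        # Fraction of the way through the data domain...
        t = (value - d0) / (d1 - d0)
        # ...mapped to the same fraction of the visual range.
        return r0 + t * (r1 - r0)

    return scale


# Hypothetical example: values between 0 and 50 drawn as bars up to 200 px long.
bar_length = linear_scale((0, 50), (0, 200))
lengths = [bar_length(v) for v in (10, 25, 50)]
print(lengths)
```

Under this assumption, doubling a data value doubles the bar length. Other mappings (logarithmic, or from values to color) follow the same pattern with a different formula inside `scale`.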

To communicate information clearly and efficiently, data visualization uses statistical graphics, plots, information graphics and other tools. Numerical data may be encoded using dots, lines, or bars to visually communicate a quantitative message. Effective visualization helps users analyze and reason about data and evidence. It makes complex data more accessible, understandable and usable. Users may have particular analytical tasks, such as making comparisons or understanding causality, and the design principle of the graphic (i.e., showing comparisons or showing causality) follows the task. Tables are generally used where users will look up a specific measurement, while charts of various types are used to show patterns or relationships in the data for one or more variables.

Data visualization is both an art and a science. It is viewed as a branch of descriptive statistics by some, but also as a grounded theory development tool by others. Increased amounts of data created by Internet activity and an expanding number of sensors in the environment are referred to as "big data" or Internet of things. Processing, analyzing and communicating this data present ethical and analytical challenges for data visualization. The field of data science and practitioners called data scientists help address this challenge.

                                     

1. Overview

Data visualization refers to the techniques used to communicate data or information by encoding it as visual objects contained in graphics. The goal is to communicate information clearly and efficiently to users. It is one of the steps in data analysis or data science. According to Vitaly Friedman (2008), the "main goal of data visualization is to communicate information clearly and effectively through graphical means. It doesn't mean that data visualization needs to look boring to be functional or extremely sophisticated to look beautiful. To convey ideas effectively, both aesthetic form and functionality need to go hand in hand, providing insights into a rather sparse and complex data set by communicating its key-aspects in a more intuitive way. Yet designers often fail to achieve a balance between form and function, creating gorgeous data visualizations which fail to serve their main purpose - to communicate information".

Indeed, Fernanda Viegas and Martin M. Wattenberg suggested that an ideal visualization should not only communicate clearly, but stimulate viewer engagement and attention.

Data visualization is closely related to information graphics, information visualization, scientific visualization, exploratory data analysis and statistical graphics. In the new millennium, data visualization has become an active area of research, teaching and development. According to Post et al. (2002), it has united scientific and information visualization.

                                     

2. Characteristics of effective graphical displays

Professor Edward Tufte explained that users of information displays are executing particular analytical tasks such as making comparisons. The design principle of the information graphic should support the analytical task. As William Cleveland and Robert McGill show, different graphical elements accomplish this more or less effectively. For example, dot plots and bar charts outperform pie charts.

In his 1983 book The Visual Display of Quantitative Information, Edward Tufte defines graphical displays and principles for effective graphical display in the following passage: "Excellence in statistical graphics consists of complex ideas communicated with clarity, precision and efficiency. Graphical displays should:

  • reveal the data at several levels of detail, from a broad overview to the fine structure
  • avoid distorting what the data has to say
  • make large data sets coherent
  • encourage the eye to compare different pieces of data
  • show the data
  • serve a reasonably clear purpose: description, exploration, tabulation or decoration
  • be closely integrated with the statistical and verbal descriptions of a data set.
  • induce the viewer to think about the substance rather than about methodology, graphic design, the technology of graphic production or something else
  • present many numbers in a small space

Graphics reveal data. Indeed graphics can be more precise and revealing than conventional statistical computations."

For example, the Minard diagram shows the losses suffered by Napoleon's army in the 1812–1813 period. Six variables are plotted: the size of the army, its location on a two-dimensional surface (x and y), time, direction of movement, and temperature. The line width illustrates a comparison (the size of the army at points in time), while the temperature axis suggests a cause of the change in army size. This multivariate display on a two-dimensional surface tells a story that can be grasped immediately while identifying the source data to build credibility. Tufte wrote in 1983 that: "It may well be the best statistical graphic ever drawn."

Not applying these principles may result in misleading graphs, which distort the message or support an erroneous conclusion. According to Tufte, chartjunk refers to extraneous interior decoration of the graphic that does not enhance the message, or gratuitous three dimensional or perspective effects. Needlessly separating the explanatory key from the image itself, requiring the eye to travel back and forth from the image to the key, is a form of "administrative debris." The ratio of "data to ink" should be maximized, erasing non-data ink where feasible.

The Congressional Budget Office summarized several best practices for graphical displays in a June 2014 presentation. These included: a) Knowing your audience; b) Designing graphics that can stand alone outside the context of the report; and c) Designing graphics that communicate the key messages in the report.

                                     

3. Quantitative messages

Author Stephen Few described eight types of quantitative messages that users may attempt to understand or communicate from a set of data and the associated graphs used to help communicate the message:

  • Correlation: Comparison between observations represented by two variables (X, Y) to determine if they tend to move in the same or opposite directions. For example, plotting unemployment (X) and inflation (Y) for a sample of months. A scatter plot is typically used for this message.
  • Part-to-whole: Categorical subdivisions are measured as a ratio to the whole (i.e., a percentage out of 100%). A pie chart or bar chart can show the comparison of ratios, such as the market share represented by competitors in a market.
  • Time-series: A single variable is captured over a period of time, such as the unemployment rate over a 10-year period. A line chart may be used to demonstrate the trend.
  • Nominal comparison: Comparing categorical subdivisions in no particular order, such as the sales volume by product code. A bar chart may be used for this comparison.
  • Ranking: Categorical subdivisions are ranked in ascending or descending order, such as a ranking of sales performance (the measure) by sales persons (the category, with each sales person a categorical subdivision) during a single period. A bar chart may be used to show the comparison across the sales persons.
  • Frequency distribution: Shows the number of observations of a particular variable for a given interval, such as the number of years in which the stock market return falls within intervals such as 0-10%, 11-20%, etc. A histogram, a type of bar chart, may be used for this analysis. A boxplot helps visualize key statistics about the distribution, such as median, quartiles, outliers, etc.
  • Geographic or geospatial: Comparison of a variable across a map or layout, such as the unemployment rate by state or the number of persons on the various floors of a building. A cartogram is a typical graphic used.
  • Deviation: Categorical subdivisions are compared against a reference, such as a comparison of actual vs. budget expenses for several departments of a business for a given time period. A bar chart can show comparison of the actual versus the reference amount.

Analysts reviewing a set of data may consider whether some or all of the messages and graphic types above are applicable to their task and audience. The process of trial and error to identify meaningful relationships and messages in the data is part of exploratory data analysis.
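As an illustration of the frequency distribution message, the following sketch counts observations per interval and renders the result as a text histogram. The return figures are made up for illustration, the helper names `bin_counts` and `text_histogram` are hypothetical, and the intervals are assumed to be half-open (each value v is placed in the interval starting at the largest multiple of the bin width not exceeding v), which differs slightly from the "0-10%, 11-20%" labelling used above.

```python
from collections import Counter


def bin_counts(values, bin_width):
    """Count how many values fall in each interval [k*w, (k+1)*w)."""
    counts = Counter()
    for v in values:
        # Floor division picks the interval's lower edge, also for negatives.
        lower = (v // bin_width) * bin_width
        counts[lower] += 1
    return dict(sorted(counts.items()))


def text_histogram(counts, bin_width):
    """Render one row per interval; the bar length encodes the count."""
    lines = []
    for lower, n in counts.items():
        label = f"{lower:>4} to {lower + bin_width:<4}"
        lines.append(f"{label} | {'#' * n} {n}")
    return "\n".join(lines)


# Made-up annual returns in percent, for illustration only.
returns = [4, 12, -8, 7, 15, 22, 9, 3, -2, 18, 11, 6]
counts = bin_counts(returns, 10)
print(text_histogram(counts, 10))
```

Trying a different bin width on the same data is a small instance of the trial and error described above, since the apparent shape of a distribution can change with the intervals chosen.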



                                     

4. Visual perception and data visualization

A human can distinguish differences in line length, shape, orientation, distances, and color (hue) readily without significant processing effort; these are referred to as "pre-attentive attributes". For example, it may require significant time and effort ("attentive processing") to identify the number of times the digit "5" appears in a series of numbers; but if that digit is different in size, orientation, or color, instances of the digit can be noted quickly through pre-attentive processing.

Effective graphics take advantage of pre-attentive processing and attributes and the relative strength of these attributes. For example, since humans can more easily process differences in line length than surface area, it may be more effective to use a bar chart which takes advantage of line length to show comparison rather than pie charts which use surface area to show comparison.
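A small arithmetic sketch shows one geometric reason why length and area encodings behave differently. It uses circles as a stand-in for any area-encoded mark, and the function names and data values are hypothetical. It demonstrates only the geometry, not the perceptual claim itself.

```python
import math


def bar_length(value, units_per_value=1.0):
    """Length encoding: the bar's length grows in proportion to the value."""
    return value * units_per_value


def circle_radius(value, area_per_value=1.0):
    """Area encoding: the circle's area is proportional to the value,
    so its radius grows only with the square root of the value."""
    return math.sqrt(value * area_per_value / math.pi)


small, large = 10, 40  # hypothetical data values; large is 4x small

length_ratio = bar_length(large) / bar_length(small)
radius_ratio = circle_radius(large) / circle_radius(small)

print(f"bar length ratio:    {length_ratio:.1f}")
print(f"circle radius ratio: {radius_ratio:.1f}")
```

With a fourfold difference in the data, the bar becomes four times as long, while the circle becomes only twice as wide. The viewer therefore has to judge area rather than any single linear extent to read the value correctly.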

                                     

4.1. Human perception/cognition and data visualization

Almost all data visualizations are created for human consumption. Knowledge of human perception and cognition is necessary when designing intuitive visualizations. Cognition refers to processes in human beings like perception, attention, learning, memory, thought, concept formation, reading, and problem solving. Human visual processing is efficient in detecting changes and making comparisons between quantities, sizes, shapes and variations in lightness. When properties of symbolic data are mapped to visual properties, humans can browse through large amounts of data efficiently. It is estimated that 2/3 of the brain's neurons can be involved in visual processing. Proper visualization provides a different approach to show potential connections, relationships, etc. which are not as obvious in non-visualized quantitative data. Visualization can become a means of data exploration.

                                     

5. History of data visualization

There is no comprehensive history of data visualization. There are no accounts that span the entire development of visual thinking and the visual representation of data, and which collate the contributions of disparate disciplines. Michael Friendly and Daniel J Denis of York University are engaged in a project that attempts to provide a comprehensive history of visualization. Contrary to general belief, data visualization is not a modern development. Stellar data, or information such as the location of stars, have been visualized on the walls of caves (such as those found in Lascaux Cave in Southern France) since the Pleistocene era. Physical artefacts such as Mesopotamian clay tokens (5500 BC), Inca quipus (2600 BC) and Marshall Islands stick charts (n.d.) can also be considered as visualizing quantitative information.

The first documented data visualization can be traced back to 1160 BC with the Turin Papyrus Map, which accurately illustrates the distribution of geological resources and provides information about the quarrying of those resources. Such maps can be categorized as thematic cartography, a type of data visualization that presents and communicates specific data and information through a geographical illustration designed to show a particular theme connected with a specific geographic area. The earliest documented forms of data visualization were various thematic maps from different cultures, along with ideograms and hieroglyphs that provided and allowed interpretation of the information illustrated. For example, the Linear B tablets of Mycenae provided a visualization of information regarding Late Bronze Age trade in the Mediterranean. The idea of coordinates was used by ancient Egyptian surveyors in laying out towns; earthly and heavenly positions were located by something akin to latitude and longitude by at least 200 BC; and a map projection of a spherical earth into latitude and longitude was made by Claudius Ptolemy, covering an entire wall in his observatory. Particularly important were the development of triangulation and other methods to determine mapping locations accurately.

The French philosophers and mathematicians René Descartes and Pierre de Fermat developed analytic geometry and the two-dimensional coordinate system, which heavily influenced the practical methods of displaying and calculating values. Fermat and Blaise Pascal's work on statistics and probability theory laid the groundwork for what we now conceptualize as data. According to the Interaction Design Foundation, these developments allowed and helped William Playfair, who saw potential for graphical communication of quantitative data, to generate and develop graphical methods of statistics.

In the second half of the 20th century, Jacques Bertin used quantitative graphs to represent information "intuitively, clearly, accurately, and efficiently".

John Tukey and Edward Tufte pushed the bounds of data visualization; Tukey with his new statistical approach of exploratory data analysis and Tufte with his book "The Visual Display of Quantitative Information" paved the way for refining data visualization techniques for more than statisticians. With the progression of technology came the progression of data visualization; starting with hand drawn visualizations and evolving into more technical applications – including interactive designs leading to software visualization.

Programs like SAS, SOFA, R, Minitab, Cornerstone and more allow for data visualization in the field of statistics. Other, more focused data visualization tools tailored to individual needs, such as the D3 library and the programming languages Python and JavaScript, help make the visualization of quantitative data possible. Private schools have also developed programs to meet the demand for learning data visualization and associated programming libraries, including free programs like The Data Incubator or paid programs like General Assembly.

Beginning with the Symposium "Data to Discovery" in 2013, ArtCenter College of Design, Caltech and JPL in Pasadena have run an annual program on Interactive Data Visualization. The program asks: How can interactive data visualization help scientists and engineers explore their data more effectively? How can computing, design, and design thinking help maximize research results? What methodologies are most effective for leveraging knowledge from these fields? By encoding relational information with appropriate visual and interactive characteristics to help interrogate, and ultimately gain new insight into data, the program develops new interdisciplinary approaches to complex science problems, leveraging design thinking and the latest methods from computing, User-Centered Design, interaction design and 3D graphics.



                                     

6. Terminology

Data visualization involves specific terminology, some of which is derived from statistics. For example, author Stephen Few defines two types of data, which are used in combination to support a meaningful analysis or visualization:

  • Categorical: Text labels describing the nature of the data, such as "Name" or "Age". This term also covers qualitative non-numerical data.
  • Quantitative: Numerical measures, such as "25" to represent the age in years.

Two primary types of information displays are tables and graphs.

  • A table contains quantitative data organized into rows and columns with categorical labels. It is primarily used to look up specific values. In the example above, the table might have categorical column labels representing the name (a qualitative variable) and age (a quantitative variable), with each row of data representing one person (the sampled experimental unit or category subdivision).
  • A graph is primarily used to show relationships among data and portrays values encoded as visual objects. Numerical values are displayed within an area delineated by one or more axes. These axes provide scales (quantitative and categorical) used to label and assign values to the visual objects. Many graphs are also referred to as charts.
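The distinction between the two display types can be sketched with the name and age example. The sample rows, the helper names `as_table` and `as_bar_graph`, and the choice of one character per two years are all hypothetical and serve only as an illustration.

```python
# Hypothetical sample: each row is one person.
people = [
    {"name": "Ana", "age": 25},    # "name" is categorical, "age" is quantitative
    {"name": "Ben", "age": 41},
    {"name": "Chloe", "age": 33},
]


def as_table(rows):
    """Table: rows and columns, suited to looking up a specific value."""
    lines = [f"{'Name':<8}{'Age':>4}"]
    for row in rows:
        lines.append(f"{row['name']:<8}{row['age']:>4}")
    return "\n".join(lines)


def as_bar_graph(rows, years_per_char=2):
    """Graph: each age is encoded as a bar whose length reflects its value."""
    lines = []
    for row in rows:
        # Integer division keeps the bar length a whole number of characters.
        bar = "#" * (row["age"] // years_per_char)
        lines.append(f"{row['name']:<8}|{bar}")
    return "\n".join(lines)


print(as_table(people))
print()
print(as_bar_graph(people))
```

The table gives each exact age at a glance, whereas the bar graph gives up some precision in exchange for making the relative differences between people visible without reading any numbers.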

Eppler and Lengler have developed the "Periodic Table of Visualization Methods," an interactive chart displaying various data visualization methods. It includes six types of data visualization methods: data, information, concept, strategy, metaphor and compound.



                                     

7. Other perspectives

There are different approaches to the scope of data visualization. One common focus is on information presentation, such as Friedman (2008). Friendly (2008) presumes two main parts of data visualization: statistical graphics and thematic cartography. In this line, the "Data Visualization: Modern Approaches" (2007) article gives an overview of seven subjects of data visualization:

  • Displaying connections
  • Tools and services
  • Articles & resources
  • Mind maps
  • Displaying websites
  • Displaying news
  • Displaying data

All these subjects are closely related to graphic design and information representation.

On the other hand, from a computer science perspective, Frits H. Post in 2002 categorized the field into sub-fields:

  • Modelling techniques
  • Multiresolution methods
  • Information visualization
  • Volume visualization
  • Visualization algorithms and techniques
  • Interaction techniques and architectures
                                     

8. Data presentation architecture

Data presentation architecture (DPA) is a skill set that seeks to identify, locate, manipulate, format and present data in such a way as to optimally communicate meaning and proper knowledge.

Historically, the term data presentation architecture is attributed to Kelly Lautt: "Data Presentation Architecture (DPA) is a rarely applied skill set critical for the success and value of Business Intelligence. Data presentation architecture weds the science of numbers, data and statistics in discovering valuable information from data and making it usable, relevant and actionable with the arts of data visualization, communications, organizational psychology and change management in order to provide business intelligence solutions with the data scope, delivery timing, format and visualizations that will most effectively support and drive operational, tactical and strategic behaviour toward understood business or organizational goals. DPA is neither an IT nor a business skill set but exists as a separate field of expertise. Often confused with data visualization, data presentation architecture is a much broader skill set that includes determining what data on what schedule and in what exact format is to be presented, not just the best way to present data that has already been chosen. Data visualization skills are one element of DPA."

                                     

8.1. Objectives

DPA has two main objectives:

  • To use data to provide knowledge in the most effective manner possible
  • To use data to provide knowledge in the most efficient manner possible
                                     

8.2. Scope

With the above objectives in mind, the actual work of data presentation architecture consists of:

  • Finding the right data
  • Determining the right timing for data presentation (when and how often the user needs to see the data)
  • Creating effective delivery mechanisms for each audience member depending on their role, tasks, locations and access to technology
  • Determining the required periodicity of data updates (the currency of the data)
  • Defining important meaning (relevant knowledge) that is needed by each audience member in each context
  • Utilizing appropriate analysis, grouping, visualization, and other presentation formats
                                     

8.3. Related fields

DPA work shares commonalities with several other fields, including:

  • Data visualization in that it uses well-established theories of visualization to add or highlight meaning or importance in data presentation.
  • Business process improvement in that its goal is to improve and streamline actions and decisions in furtherance of business goals
  • Graphic design, conveying information through styling, typography, position, and other aesthetic concerns.
  • HCI and interaction design, since many of the principles of how to design interactive data visualisation have been developed cross-disciplinarily with HCI.
  • Business analysis in determining business goals, collecting requirements, mapping processes.
  • Information architecture, but information architecture's focus is on unstructured data and therefore excludes both analysis (in the statistical/data sense) and direct transformation of the actual content (data, for DPA) into new entities and combinations.
  • Visual journalism and data-driven journalism or data journalism: Visual journalism is concerned with all types of graphic facilitation of the telling of news stories, and data-driven and data journalism are not necessarily told with data visualisation. Nevertheless, the field of journalism is at the forefront in developing new data visualisations to communicate data.


                                     