8.7.04

The Next Big Thing and me (and you, maybe)

My experience in Visualization of Management and Research Data
Jeff Beddow
Minneapolis. June, 2004

1970-71 As a research assistant to Dr. John Modell at the University of Minnesota, I worked to extract interesting patterns out of 1880-1890 U.S. Census data from Rhode Island. I imagined a computer program that would make literal visual patterns out of the tens of thousands of numbers, to aid interpretation.

1978-80 I experimented with variations on quilt patterns as a visual design exercise, and thought that the visual syntax and grammar of textile patterns might support the visualization of complex data.

1980-81 Took courses in computer science and programming to acquire necessary skills to create the program which would implement the idea of visualizing complex data sets through visual patterns.

1982-1984 Developed and published a graphics program with special pattern manipulation features, and contacted ARPA and other potential sponsors for a visualization program.

1984-1987 Interviewed business owners, executives and managers in a broad range of fields including manufacturing, car rental, database management, retail store management, government economic forecasting, etc., to refine the specifications of a program. I also intensified my research into the physiology of perception, the psychology of learning and interpreting visual information, and other fields relating to this project.

As a matter of interest, this was a time when the emphasis in perceptual physiology, cognitive science, and the psychology of learning was shifting dramatically away from conscious, purposeful thinking skills, toward machine learning on the one hand and pre-attentive cognition on the other. Treisman's and Livingstone’s research represents leading work in the latter field.

Pre-attentive visual processing
Pre-attentive cognition is the processing of visual information such as color, contrast, shape, etc., in order to navigate, avoid danger, recognize friends, and perform other complex perceptual/judgment skills without involving fully conscious mechanisms. It occurs almost instantaneously, without comparison, recollection, or decoding of encoded value schemes. It is known to occur at a much higher speed than linear thinking and analysis, since it involves the parallel processing of thousands of bits of visual information in coherent spatial patterns. Pre-attentive processing supports human judgment in all natural environments, but it has been forced out of most modern decision contexts. The careful application of the principles of pre-attentive design can simplify the task of extracting meaningful information from a mass of data in any circumstance.

I was swimming against the current of interest in artificial intelligence which dominated many fields of computer research at the time. I was committed to the idea of not only keeping the human “in the loop” as it was described, but maximizing the essential human aspects of judgment that I felt were never going to be replaced by machines.

Developments in pre-attentive processing were important to me because they supported my intuition that labeling and organizing information according to conscious categories had to be subordinated to perceptual figure-ground relationships between operationally relevant data and the data set in general. I decided at this time that I would not try to gradually evolve conventions of data pictorialism, i.e. pie charts and bar charts. Instead I decided to go directly to the strength of computers and the human visual cortex by displaying large amounts of data carefully organized to depict decision-critical aspects of these sets without summary methods. This would require a thorough understanding of how figure-ground visual relations should be mapped from decision-making criteria.

Based on the results of this research and work, I designed a prototype visualization system on a microcomputer. It presented hundreds of data points in highly designed arrays that allowed a non-specialist to easily separate the foreground data points from the background of the entire data set while keeping everything in view, and hence in perspective. The Vice President of Operations of Valspar paint company grasped the idea in its entirety the first time he saw the demonstration. He said that one of the hardest aspects of his job was having to sign off on exception reports every day (hundreds of gallons of paint ruined for one reason or another), and that with a complete perspective on the whole operation those exceptions would not weigh so heavily on him.

The foreground data points were defined by the context of the decision to be made. They could be extreme points, average points, incidence of change, or change in certain directions. By moving the cursor over the display, the operator could read specific label information, and by clicking down, contextual information such as statistics relating to the item in question, or the entire data set, could be obtained easily.
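The idea of letting the decision context define the foreground can be illustrated with a minimal sketch. This is not the original program's code; the function name `foreground_mask`, the rule names, and the use of standard deviations as the cutoff are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def foreground_mask(values, rule="extreme", k=2.0):
    """Mark which values belong in the foreground (hypothetical helper).

    rule="extreme": values more than k standard deviations from the mean
    rule="average": values within k standard deviations of the mean
    rule="change":  values that differ from their predecessor by more than k
    rule="rising":  values that exceed their predecessor by more than k
    """
    if rule in ("extreme", "average"):
        mu = mean(values)
        sigma = pstdev(values)
        far = [abs(v - mu) > k * sigma for v in values]
        return far if rule == "extreme" else [not f for f in far]
    if rule in ("change", "rising"):
        # The first value has no predecessor, so its change is taken as zero.
        deltas = [0.0] + [b - a for a, b in zip(values, values[1:])]
        if rule == "change":
            return [abs(d) > k for d in deltas]
        return [d > k for d in deltas]
    raise ValueError(f"unknown rule: {rule}")


readings = [10, 11, 9, 10, 30, 10]
print(foreground_mask(readings, "extreme"))        # only the 30 stands out
print(foreground_mask(readings, "change", k=5.0))  # the jump up and the drop back
```

Every data point stays in the result; the mask only decides which points are drawn prominently, so the whole data set remains in view.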

As I refined this prototype, I worked with real businesses and researchers to use their data in the context of their management or research interests. The University Center for Urban Research, several start-up technology companies, Valspar Paint Corporation and National Rental Car were some of the groups that I worked with in this pursuit.

1987-88 I worked with the Minnesota State Department of Economic Development’s small business unit to develop a business plan and make contacts among potential partners in developing this system. I approached the director of Information Technology at the regional government that employed me full time, but he felt that the approach of visual pattern emphasis would be confusing to the managers at the county, who were more linear and verbal in their approach to decision making.

Through the State office I was introduced to the President of McGraw Hill, Inc. He saw my demonstration and flew me out to McGraw Hill corporate headquarters in New York to make a presentation to the Executive Board of the corporation. They gave unanimous consent to start a pilot project with one of the divisions of McGraw Hill. I was teamed up with the product development group at California Testing Bureau, which provided tests and test score services to half of the 16,000 school districts in the United States. After initial meetings and review, I was given the challenge of developing visualization products for four very difficult areas selected by the scientists and marketing people. These included ways to present student achievement against probable performance estimates, and to present individual students' performance on standard tests against their class, and their class against the school district's performance. I returned a few months later and demonstrated the solution to these problems to the team’s satisfaction. We could not agree to terms for further pursuit of the development at that time, but Gordon Wainwright, the Director of new product development, encouraged me with the assertion that I was only 10 years ahead of my time.

I also demonstrated the approach to Carl Adams, a professor at the Carlson School of Management at the University of Minnesota. He characterized my methods as “...being to bar charts and pie charts what the integrated circuit was to the vacuum tube.”

I contacted NASA’s Goddard Space Science Data Center as a result of a Wall Street Journal article I had read, and met Dr. Lloyd Treinish, the director. He was heavily involved in visualization methods, and as a result of our conversations he invited me to participate in NASA and the National Science Foundation’s first Scientific Visualization conference. It was held at the Jet Propulsion Laboratory in Pasadena in 1988. I attended this conference as an observer, but was invited to chair a workshop on multidimensional, multivariate data visualization methods at the ensuing conference to be held at Stanford University in 1990. In the course of developing the workshop, I contacted researchers and developers in major universities and corporate research centers around the world.

1990 In February of 1990 I chaired the Multidimensional/Multivariate Data Visualization workshop at the second NASA/NSF Scientific Visualization conference at Stanford University. We met over the course of three days, and had presentations from Bell Labs, the Santa Fe Institute, Columbia University, Los Alamos National Laboratory, the University of California at Davis, IBM, and several other groups at the leading edge of visualization of complex data. One of the attendees was the head of the Information Science department of the Naval Research Laboratory. He invited me to submit a paper on my work to a conference he was chairing for the IEEE in June, in San Francisco. He also invited me to propose a workshop at the conference that would follow on the Stanford work.

In June I attended and presented at the first annual IEEE Visualization conference in San Francisco, and my primary publication on this work was included in the proceedings of that conference. I was invited to join the steering committee of the conference and plan the following year’s conference, and to have another workshop in multivariate data visualization.

The data I used for my paper consisted of a merged data set drawing upon sensors in orbit around the earth, on the ground, and in solar orbit. It depicted 13 parameters of the earth’s magnetic field and solar activity for 20 days on an hourly basis in one screen.

1991-1992 I developed and chaired workshops and a tutorial on Multidimensional Visualization for the IEEE conferences, and worked part time for Lawrence Livermore Laboratory to develop visualizations of engineering design alternatives for the International Thermonuclear Experimental Reactor (ITER), a consortium project involving Japanese, Soviet, European and American science institutions trying to develop a workable hot fusion reactor. My work with Livermore resulted in several jointly authored progress papers presented to sessions of scientists on the ITER project, and ultimately in inclusion in a book on advanced visualization methods. I was invited to join the faculty at the University of Massachusetts at Lowell to help develop a visualization center in the Computer Science department, but my lack of academic credentials prevented any development in that area.

2003 The market was finally ready for what I had done. I purchased a Pentium IV computer and a copy of the Microsoft Visual Basic .NET development system and wrote the basic routines of a third major revision of the program. The current version supports the browsing and manipulation of 120,000 data points simultaneously on one high resolution screen, and has a refined set of essential features derived from the various generations of the program. The program is designed to be both an exploration and presentation tool for executive decision makers, managers, and operations supervisors who need to examine performance and resource use over a large organization. It can be used for research in any field that needs to establish the status of events or operational entities against the context of expectations, known performance parameters, or internal categories derived from arbitrary boundary and set inclusion definitions.

I have done the research and development on this program completely independently, in my spare time, on my own equipment and software. In 1988 a principal analyst in the Operations and Planning Department of Hennepin County saw the prototype and tried to create an accommodation for me at work, but the resulting contract was not acceptable. The Information Technology department passed on the project in 1988. In 2003 I again sought out managers in IT who might be interested in visualization, and they referred me to data warehousing and data mining groups, which were apparently not active at the moment.

I am exploring the feasibility of patenting aspects of the process. It has gained recognition in small but prestigious circles both as a pioneering effort and a signature method of approaching the problem of seeing both the forest and the trees, as it were, in complex system operations and management.

Purpose of the program (“DataProspect”)

The purpose of the program is to help make operational decisions about large, complex systems, or to find relations of interest in complex data sets. The method used is to objectify the operational components or research objects and paint these objects in a meaningful color and shape on the screen, where they can be browsed and interrogated by the operator.

The definitive method involves normalizing the display of the operations as an array of objects, and allowing the operator to use sliders to assign the most visible colors and shapes to the most critical components. This can be done before loading the display through a table of critical thresholds, or it can be adjusted on the fly, just as a lens can be focused during use.

The lens is a good metaphor for the program. It looks at the entire system, but brings the critical components into the foreground and into clear focus.

The program is different from other visualization methods in its ability to keep the critical features within a complex data set in a simple foreground/background relation to all the data, without simplifying to the point of misrepresentation, or requiring an inordinate amount of specialization from the operator.
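The threshold-and-slider idea described in this section can be sketched in a few lines. This is only an illustration under stated assumptions, not the DataProspect implementation: the names `normalize` and `paint`, the two colors, and the single 0-to-1 slider position are all hypothetical.

```python
FOREGROUND = (255, 40, 40)    # assumed high-visibility color for critical items
BACKGROUND = (200, 200, 200)  # assumed quiet color for everything else


def normalize(values):
    """Rescale values to the range 0..1 so one slider can address any variable."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def paint(values, threshold):
    """Return one color per item; threshold is the slider position in 0..1."""
    return [FOREGROUND if n >= threshold else BACKGROUND
            for n in normalize(values)]


scrap_gallons = [0, 5, 10]
print(paint(scrap_gallons, 0.5))  # the two larger values move to the foreground
print(paint(scrap_gallons, 0.9))  # tightening the slider leaves only the largest
```

Moving the threshold plays the role of focusing the lens: no item is removed from the display, only its visual weight changes.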

System issues

The current version is written in Microsoft’s .NET technology, which lends itself well to integration with enterprise databases (SQL), office automation standards such as Excel and Access, and Internet Information Server services. It could easily be rewritten in Java2 to reside in a Linux, Unix, Windows, or OS X environment through browser technology.

The core technology is a mapping, manipulation, and interrogation technology that can reside on top of conventional or customized data management technologies.


State of the Current Development phase
Currently the program can access, manipulate and interrogate 45 variables on 3000+ cases on a single 1600 x 1200 pixel display. The demo data is census data from each of the United States counties. Once the data is in and parsed, system response on a Pentium IV 2.4 GHz machine is virtually instantaneous…that is, there is so little lag that it doesn’t interfere with the operator’s concentration.

While the primary technology is resolved (defined as subject to patent), there are some enhancements that need finishing. The strength of the program is that it has one conduit to existing enterprise technology: it only needs to be fed an accurate table through any channel, and it does not rely on any proprietary technology or particular brand of enterprise support for its function. If it is to be integrated into a browser delivery, this would change.

Specifically, a proof of concept for any given enterprise could be arranged within a short time, ranging from a few days to a few weeks depending upon the number of defining displays that need to be constructed and the state of existing data.

The system parses and organizes itself according to the dimensions and size of the data it is given, and then presents the results in a consistent and intuitive manner. This means that the tool can be managed by one person in an enterprise, and will scale up to any level.
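One way a display could size itself to the data it is given is sketched below. It is a guess at the general idea, not a description of how the program actually lays out its arrays: the function name `grid_layout` and the choice of square cells, one per case, are assumptions.

```python
import math


def grid_layout(n_cases, width_px, height_px):
    """Pick (columns, rows, cell size in pixels) so every case gets a square cell."""
    if n_cases <= 0:
        raise ValueError("n_cases must be positive")
    # Start from the cell size that would fit if the cells tiled the screen exactly.
    cell = max(1, int(math.sqrt(width_px * height_px / n_cases)))
    # Shrink the cell until whole cells across and down can hold every case.
    while cell > 1 and (width_px // cell) * (height_px // cell) < n_cases:
        cell -= 1
    if (width_px // cell) * (height_px // cell) < n_cases:
        raise ValueError("more cases than pixels on this display")
    cols = width_px // cell
    rows = math.ceil(n_cases / cols)
    return cols, rows, cell


print(grid_layout(3000, 1600, 1200))
```

For 3000 cases on a 1600 x 1200 display this yields 64 columns and 47 rows of 25-pixel cells, which leaves room in each cell for a shape and color to remain legible.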

Make an offer.
Contact me at Jeff@zeitguide.org if you are able and willing to lift this promising preliminary work to the level of supporting results and a full time staff.