Handbook of Data Visualization Notes
What Is This
I am going to read Handbook of Data Visualization by Chun-houh Chen, Wolfgang Härdle, and Antony Unwin to:
- Practice working with Bokeh. I need to work with Bokeh when showing poll results.
- Get a lot of different examples of Bokeh code so that I can better fine-tune LLMs for showing text-to-SQL results for poll results.
- Learn more about data visualization.
What Is This Book?
This book contains a collection of chapters by experts in the field to present readers with an up-to-date and comprehensive overview of the state of the art [of data visualization].
It is the third volume of the Handbook of Computational Statistics. This book takes graphics for data visualization seriously. The differences between graphics for presentation and graphics for exploration lie in their form and practice - presentation graphics are generally static, and a single graphic is drawn to summarize the information presented. Presentation graphics are like proofs of mathematical theorems: they may give no hint as to how a result was reached, but they should offer comprehensive support for its conclusion.
Exploratory graphics are used for looking for results. They are not intended for presentation. The term "data visualization" expresses the idea that it involves more than just representing data in graphical form.
The information behind the display should be revealed in a good display; the graphic should aid readers in seeing the structure of the data.
In this section I am going to be keeping track of some of my notes on Bokeh to remind myself of important things when making charts for display on the web.
- Set sizing_mode="stretch_width" when creating a figure to make sure that the plot does not overflow x and y.
- You should also set max_width and max_height (or height) to at most 500px and 400px respectively (desktop could perhaps use different values).
- You should use the components function to produce a script and div for your plot. You should set the div to have a min-height of the min_height of the plot to prevent Cumulative Layout Shift.
- Use a histogram for multiple choice / checkbox questions. Use a barchart as well for checkbox questions.
- Use a piechart for answering the question of what percentage of people responded to a survey question.
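The notes above can be sketched roughly as follows. This is a minimal example, not a definitive setup: the function name poll_bar_chart, the wrapper div, and the sample answers are my own assumptions, and the 400px / 500px values are just the limits mentioned above.

```python
from bokeh.embed import components
from bokeh.plotting import figure


def poll_bar_chart(options, counts, height=400, max_width=500):
    """Build a responsive bar chart and return (script, html) for embedding.

    Hypothetical helper: the name and the wrapper markup are illustrative.
    """
    options = list(options)
    p = figure(
        x_range=options,               # categorical x axis, one slot per option
        height=height,                 # fixed height in pixels
        max_width=max_width,           # cap the width on wide screens
        sizing_mode="stretch_width",   # fill the container width, no overflow
        title="Responses per option",
        toolbar_location=None,
    )
    p.vbar(x=options, top=list(counts), width=0.8)
    p.y_range.start = 0                # bars should start at zero

    # components() returns a <script> tag and a placeholder <div> as strings.
    script, div = components(p)

    # Assumption: reserving the plot height on a wrapper element keeps the
    # page from jumping when the plot renders (Cumulative Layout Shift).
    html = f'<div style="min-height: {height}px">{div}</div>'
    return script, html


# Made-up poll answers, for illustration only.
script, html = poll_bar_chart(["Yes", "No", "Unsure"], [12, 7, 3])
```

The script and html strings can then be placed in a page template that also loads BokehJS.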
Principles
Graphic representation of quantitative information has deep roots.

The earliest seeds of visualization arose in geometric diagrams, in tables of the positions of stars and other celestial bodies, and in the making of maps to aid in navigation and exploration. Among the earliest graphical depictions of quantitative information is an anonymous 10th-century multiple time-series graph of the changing position of the seven most prominent bodies over space and time. In the 14th century, the idea of plotting a theoretical function and the logical relation between tabulating values and plotting them appeared. By the 16th century, techniques and instruments for precise observation and measurement of physical quantities and of geographic and celestial position were well developed. Among the most important problems of the 17th century were those concerned with physical measurement - of time, distance and space - for astronomy, surveying, map making, navigation and territorial expansion. This century also saw great new growth in theory and the dawn of practical application - the rise of analytic geometry and coordinate systems, theories of errors of measurement and estimation, the birth of probability theory, and the beginnings of demographic statistics and political arithmetic - the study of population, land, taxes, values of goods, etc. for the purpose of understanding the wealth of the state.
With some rudiments of statistical theory, data of interest and importance, and the idea of graphic representation at least somewhat established, the 18th century witnessed the expansion of these aspects to new domains and new graphical forms. William Playfair (1759-1823) is widely considered the inventor of most of the graphical forms used today - first the line graph and bar chart, later the piechart and circle graph.
With the fertilization provided by the previous innovations of design and technique, the first half of the 19th century witnessed an explosive growth in statistical graphics and thematic mapping, at a rate which would not be equalled until modern times. In statistical graphics, all of the modern forms of data display were invented: bar and piecharts, histograms, line graphs and time-series plots, contour plots, scatter plots, and so forth. The use of graphs began to become recognized in some official circles for economic and state planning in the mid 19th century. By the mid-1800s, all the conditions for the rapid growth of visualization had been established - a "perfect storm" for data graphics. Official state offices were established throughout Europe, in recognition of the growing importance of numerical information for social planning, industrialization, commerce and transportation.
If the late 1800s were the 'golden age' of statistical graphics and thematic cartography, the early 1900s can be called the 'modern dark ages' of visualization. Data visualization began to rise from dormancy in the mid-1960s.
This chapter discusses drawing good graphics to visualize the information in data. A good graphic will convey information, but a graphic is always part of a larger whole, the context, which provides its relevance. Histograms or boxplots are right for continuous variables, while barcharts or piecharts are appropriate for categorical variables. There are barcharts, piecharts, histograms, dotplots, boxplots, scatterplots, roseplots, mosaicplots, and many other kinds of data display. The choice depends on the type of data to be displayed and on what is to be shown.
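For the continuous case, a histogram in Bokeh can be sketched as below. Bokeh has no dedicated histogram glyph that I know of, so this bins the values with numpy.histogram and draws the bins with quad. The helper name histogram_figure, the bin count, and the sample ages are my own assumptions.

```python
import numpy as np
from bokeh.plotting import figure


def histogram_figure(values, bins=10, height=400):
    """Bin a continuous variable and draw one rectangle per bin.

    Hypothetical helper; returns the figure plus the counts and bin edges.
    """
    counts, edges = np.histogram(values, bins=bins)
    p = figure(height=height, sizing_mode="stretch_width", title="Distribution")
    # Each bin spans [left, right) on x and [0, count] on y.
    p.quad(top=counts, bottom=0, left=edges[:-1], right=edges[1:],
           line_color="white")
    p.y_range.start = 0
    return p, counts, edges


# Made-up respondent ages, for illustration only.
ages = [18, 22, 25, 31, 34, 35, 41, 47, 52, 60]
p, counts, edges = histogram_figure(ages, bins=5)
```

A categorical variable would instead go on a categorical x_range with vbar, one bar per category.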
Defining the scale for the axis for a categorical variable is a matter of choosing an informative ordering. This may depend on what the categories represent or on their relative sizes. For a continuous variable, it is more difficult. The endpoints, divisions, and tick marks have to be chosen. Unless the limits are set by the meaning of the data (e.g., grades from 0 to 100), it is good practice to extend the scales beyond the observed limits and to use readily understandable round values. There is no obligatory requirement to include zero in a scale, but there should always be a reason for not doing so.
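One way to get round axis limits that enclose the observed values is sketched below. The function name round_axis_limits and the rule for picking a default step (the largest power of ten not exceeding the data range) are my own assumptions, not something the book prescribes.

```python
import math


def round_axis_limits(lo, hi, step=None):
    """Return (lower, upper) on round multiples of `step` enclosing [lo, hi].

    Hypothetical helper. If `step` is omitted, it defaults to the largest
    power of ten that does not exceed the data range (an assumption).
    """
    if hi < lo:
        raise ValueError("hi must be >= lo")
    if step is None:
        span = hi - lo
        step = 10 ** math.floor(math.log10(span)) if span > 0 else 1
    lower = math.floor(lo / step) * step   # round down to a multiple of step
    upper = math.ceil(hi / step) * step    # round up to a multiple of step
    return lower, upper


# Observed values between 3.2 and 97.6 get a 0-100 axis.
limits = round_axis_limits(3.2, 97.6)
```

Note that a value sitting exactly on a multiple of the step stays on the limit rather than inside it, so an extra step of padding may be wanted in that case. The result can be passed to a Bokeh figure as y_range=(lower, upper).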
Guides may be drawn on a plot as a form of annotation and are useful for emphasizing particular issues, say which values are positive or negative. Sloping guides highlight deviations from linearity.
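In Bokeh, guides like these can be added as annotations. The sketch below uses the Span and Slope models from bokeh.models: a horizontal line at zero to separate positive from negative values, and a sloping y = x line to show deviations from linearity. The function name and the sample points are my own assumptions.

```python
from bokeh.models import Slope, Span
from bokeh.plotting import figure


def scatter_with_guides(x, y, height=400):
    """Scatter plot with a zero line and a y = x reference line.

    Hypothetical helper for illustration.
    """
    p = figure(height=height, sizing_mode="stretch_width",
               title="Observed vs. expected")
    p.scatter(x, y, size=8)

    # Horizontal guide at y = 0: separates positive from negative values.
    zero_line = Span(location=0, dimension="width",
                     line_color="gray", line_dash="dashed")
    p.add_layout(zero_line)

    # Sloping guide y = x: points off this line deviate from linearity.
    identity = Slope(gradient=1, y_intercept=0,
                     line_color="gray", line_dash="dotted")
    p.add_layout(identity)
    return p


# Made-up points, for illustration only.
p = scatter_with_guides([-2, -1, 0, 1, 2], [-1.8, -1.2, 0.1, 0.7, 2.4])
```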
Ideally, captions should fully explain the graphic they accompany, including giving the source for the data. Relying on explanations in the surrounding text rarely works. Captions should outline the information in the graphic, and a more detailed description should be found in the text. Annotations are used to highlight particular features of a graphic. For reasons of space there cannot be many of them, and they should be used sparingly. Keep graphics and text on the same page. Graphics should be large enough for the reader to see the information in them clearly and not much larger.
Parallel coordinate plots are valuable for displaying large numbers of continuous variables simultaneously.
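Bokeh has no built-in parallel coordinate chart that I know of, but a rough one can be sketched with multi_line: each variable gets its own vertical axis position, each row becomes one line, and every variable is min-max scaled to [0, 1] so the axes are comparable. The function name, the scaling choice, and the sample rows are my own assumptions.

```python
from bokeh.plotting import figure


def parallel_coordinates(rows, names, height=400):
    """Draw one line per row across evenly spaced variable axes.

    Hypothetical helper. Each column is min-max scaled to [0, 1].
    Returns the figure and the scaled rows.
    """
    scaled_cols = []
    for col in zip(*rows):                 # iterate over variables
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0            # avoid dividing by zero
        scaled_cols.append([(v - lo) / span for v in col])
    ys = [list(r) for r in zip(*scaled_cols)]        # back to one list per row
    xs = [list(range(len(names))) for _ in ys]       # axis positions 0..k-1

    p = figure(height=height, sizing_mode="stretch_width",
               title="Parallel coordinates (min-max scaled)")
    p.multi_line(xs=xs, ys=ys, line_alpha=0.5)
    # Label each axis position with its variable name.
    p.xaxis.ticker = list(range(len(names)))
    p.xaxis.major_label_overrides = {i: n for i, n in enumerate(names)}
    return p, ys


# Made-up rows (age, income, score), for illustration only.
rows = [(20, 30000, 1.0), (40, 50000, 3.0), (60, 40000, 2.0)]
p, scaled = parallel_coordinates(rows, ["age", "income", "score"])
```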

Time series are special because of the strict ordering of the data, and good displays respect temporal ordering. Time scales have to be carefully chosen. The choice of time origin is particularly important, as anyone who looks at the advertised performance of financial funds will know.
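A tiny worked example of why the time origin matters: the same series shows a loss or a gain depending on where the chart starts. The function name and the fund values are made up for illustration.

```python
def percent_change_since(values, origin):
    """Percent change from values[origin] to the last value in the series."""
    base = values[origin]
    return (values[-1] - base) / base * 100.0


# Made-up fund prices at three consecutive dates.
fund = [100.0, 50.0, 75.0]

from_start = percent_change_since(fund, 0)    # origin at the first date
from_trough = percent_change_since(fund, 1)   # origin at the low point
```

Starting at the first date the fund is down 25%, while starting at the low point it is up 50%, so a chart that begins at the trough tells a much rosier story.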
Static displays of information continue to be the primary graphical method for the display and analysis of data. This is true for presentation purposes, where the vast majority of data displays produced for articles and reports are still static in nature, and for data exploration, where many important statistical discoveries have been made based simply on static displays.

A good example of a graphics system that provides sensible defaults is the Trellis system. There is a common set of 'graphical parameters' that can be applied to almost any graphical output to affect the appearance of the output. This set includes such things as line color, fill color, line width, line style, and so on. At the lowest level, a plot is simply basic graphical shapes and text, so these must be available. In addition, there must be some way to define coordinate systems so that graphical elements can be conveniently positioned in sensible locations to make up a plot.
Graphs are useful entities since they can represent relationships between sets of objects. They are used to model complex systems (e.g., computer and transportation networks) and to visualize relationships (e.g., social networks). In statistics and data analysis, we usually encounter them as dendrograms in cluster analysis, trees in classification and regression, and as path diagrams in structural equation models and Bayesian belief diagrams.
This chapter goes deep into drawing graph representations of data. This may be something that I want to return to in the future, but not something that I need right now.
One of the biggest challenges in data visualization is to find general representations of data that can display the multivariate structure of more than two variables.