The Geometry of Data Analysis (Gowerfest II)

Abstracts for invited oral presentations

Casper Albers (University of Groningen, The Netherlands):
Missing values in general forms of Procrustes Analysis

For given configuration matrices X_k, the aim of the General Procrustes Problem is to find transformations T_k such that the Euclidean distance between the transformed configurations X_kT_k is minimised. The T_k are restricted to some class of matrices. When missing values occur in the X_k, these need to be estimated before standard Procrustes algorithms can be applied. Previous work in this field addressed special cases, e.g. when the T_k are required to be orthogonal or when complete rows of X_k are missing. In this talk, a more general approach is taken. An iterative algorithm is developed that alternates between an algorithm from constrained quadratic optimisation (Albers, Critchley, Gower, 2009) and a general Procrustes algorithm. This algorithm is shown to be applicable to a very broad class of missing-value Procrustes problems, including those where isotropic scaling, centring and/or standardisation is required.
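
As a rough illustration of the alternating idea only, the sketch below handles the simplest special case: two configurations, an orthogonal T, and missing cells in X marked as NaN. It is not the algorithm of the talk; the function names and the mean-based starting values are assumptions made for this example.

```python
import numpy as np

def orthogonal_procrustes(X, Y):
    """Orthogonal T minimising the Frobenius norm of X @ T - Y (SVD solution)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def procrustes_with_missing(X, Y, n_iter=50):
    """Alternate between fitting T and re-estimating the NaN cells of X.

    Illustrative only: assumes square orthogonal T and at least one
    observed value in every column of X.
    """
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    # Starting values: observed column means (an arbitrary but common choice).
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.where(missing)[1])
    for _ in range(n_iter):
        T = orthogonal_procrustes(X, Y)
        # For orthogonal T, ||X T - Y|| equals ||X - Y T'||, so the
        # least-squares fill-in is the matching cell of Y T'.
        X[missing] = (Y @ T.T)[missing]
    return X, orthogonal_procrustes(X, Y)
```

Each of the two steps cannot increase the residual sum of squares, so the loss is non-increasing over iterations; convergence to a global minimum is not guaranteed by this sketch.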

Giuseppe Bove (University Roma Tre, Italy):
Asymmetry in proximity data

The analysis of asymmetry and orthogonality presented by John Gower in his famous 1977 paper inspired many methods in asymmetric multidimensional scaling. This presentation reviews these methods, focusing on those for skew-symmetry that have graphical capabilities.
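
The starting point for many of these methods is the split of a square asymmetric matrix into a symmetric part and a skew-symmetric part. The following minimal sketch shows that split and the paired singular values of the skew-symmetric part, which are what make planar displays natural. The example matrix and the function name are made up for illustration.

```python
import numpy as np

def split_asymmetry(A):
    """Return the symmetric part M and the skew-symmetric part N of A."""
    A = np.asarray(A, dtype=float)
    M = (A + A.T) / 2.0
    N = (A - A.T) / 2.0
    return M, N

# Hypothetical 3 x 3 asymmetric proximity matrix.
A = np.array([[0.0, 3.0, 5.0],
              [1.0, 0.0, 4.0],
              [2.0, 6.0, 0.0]])
M, N = split_asymmetry(A)

# The sums of squares of the two parts add up to that of A.
total = np.sum(A ** 2)
parts = np.sum(M ** 2) + np.sum(N ** 2)

# Singular values of a real skew-symmetric matrix occur in equal pairs
# (plus a zero when the order is odd).
singular_values = np.linalg.svd(N, compute_uv=False)
```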

Frank Critchley (The Open University):
Conics, cones and ...

All will be revealed!

Patrick J. F. Groenen (Economic Institute, Erasmus School of Economics, Erasmus University, Rotterdam):
Biplots, Multidimensional Scaling, and Eigenvalues

The work of John Gower is very diverse and broad. However, main themes can be distinguished, such as the visualization of multivariate data, preferably through biplots, multidimensional scaling, and eigendecompositions. In all cases, this is combined with rigorous and deep mathematical insights. In this presentation, I will highlight several of his works, including the application of the modified Leverrier-Faddeev algorithm to multidimensional scaling, the area biplot, and a new set of icons that should help readers to interpret these visualization methods properly.
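
For readers unfamiliar with it, the classical (unmodified) Leverrier-Faddeev recursion computes the coefficients of the characteristic polynomial of a square matrix from traces of matrix products. The sketch below shows only that textbook recursion; the modification discussed in the talk and its use in multidimensional scaling are not reproduced here, and the function name is an assumption.

```python
import numpy as np

def leverrier_faddeev(A):
    """Coefficients of det(lambda * I - A), highest power first.

    Classical recursion: M_k = A M_{k-1} + c I, then c = -trace(A M_k) / k,
    starting from M_0 = 0 and leading coefficient c = 1.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    coeffs = [1.0]
    M = np.zeros_like(A)
    c = 1.0
    for k in range(1, n + 1):
        M = A @ M + c * np.eye(n)
        c = -np.trace(A @ M) / k
        coeffs.append(c)
    return np.array(coeffs)
```

The roots of the resulting polynomial are the eigenvalues of A, so the output can be compared with `np.poly(A)` as a check.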

Wojtek Krzanowski (University of Exeter):
Optimal Predictive Partitioning

In many situations, it may be desired to group objects into well-defined classes on the basis of one set of variables and then subsequently to predict the classes of new objects from another set of variables. For example, a bank may categorise customers into distinct classes reflecting their financial behaviour over a period of years, and then wish to assign new customers to future behaviour classes using information obtained from them when they open an account.

Such situations require a blend of cluster analysis and discriminant analysis, striking a compromise between the compactness and integrity of the clustering on one hand and the accuracy of the future assignment to clusters on the other. This talk will describe two algorithms for achieving such a compromise, discuss some of their features, and illustrate their performance for the above financial example.

Ludovic Lebart (Telecom-ParisTech, France):
Data compression, summary and knowledge

We present the viewpoint of a practitioner on the data analysis techniques related to data compression. The role of geometry, both in designing the methods and in interpreting the results, is discussed. In this context, we also deal with the problem of metadata, together with the problem of the articulation between exploration and inference. How do we use what we know to learn more from the data, and how do we use what has been discovered from the data to continue learning from the same data? Although less geometrically oriented, the assessment of the observed patterns remains an essential phase of the knowledge process.

Based on a series of examples, this review offers several opportunities to encounter the scientific trajectory of John Gower and, in so doing, to recall some of his significant contributions.

Mark de Rooij (Leiden University, The Netherlands):
The geometry and use of triadic distance models

Triadic distance models define Euclidean distances between triples of points. In the first part of this talk we study the geometry of triadic distances t defined as functions of the Euclidean (dyadic) distances a1, a2, a3 between three points. Special attention is paid to the contours of all points giving the same value of t when a3 is kept constant. These isocontours allow some general comments to be made about the suitability, or not, for practical investigations of certain definitions of triadic distance. We are especially interested in those definitions of triadic distance, designated as canonical, that have optimal properties.
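
To make the notation concrete, the sketch below computes the three dyadic distances for a triple of points and two possible triadic distances built from them: the perimeter and the square root of the sum of squares. These two definitions are chosen here purely as examples; the abstract does not say which definitions the talk examines or which are canonical, and all function names are hypothetical.

```python
import numpy as np

def dyadic_distances(p1, p2, p3):
    """Euclidean distances a1, a2, a3, each opposite the point of the same index."""
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p1, p2, p3))
    a1 = np.linalg.norm(p2 - p3)
    a2 = np.linalg.norm(p1 - p3)
    a3 = np.linalg.norm(p1 - p2)
    return a1, a2, a3

def triadic_perimeter(a1, a2, a3):
    """Example definition: t = a1 + a2 + a3."""
    return a1 + a2 + a3

def triadic_root_sum_squares(a1, a2, a3):
    """Example definition: t = sqrt(a1^2 + a2^2 + a3^2)."""
    return float(np.sqrt(a1 ** 2 + a2 ** 2 + a3 ** 2))

# A 3-4-5 right triangle.
a = dyadic_distances((0, 0), (3, 0), (0, 4))
t_perimeter = triadic_perimeter(*a)
t_rss = triadic_root_sum_squares(*a)
```

Under the perimeter definition, holding a3 and t fixed keeps a1 + a2 constant, so the isocontour for the third point is an ellipse with the two fixed points as foci.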

In the second part of the talk we examine the use of triadic distances. The multidimensional scaling (MDS) of triadic distances (MDS3) and a conventional MDS of dyadic distances (MDS2) both give Euclidean representations. When MDS3 is used as an analysis method for multivariate data, our analysis suggests that MDS2 and MDS3 can be expected to give very similar results, and this is strongly supported by numerical examples. A situation where MDS3 does provide quite different solutions from MDS2 is when both are applied to three-way contingency tables. In such a case MDS2 models marginal association, whereas MDS3 models conditional association.

Niel le Roux and Sugnet Lubbe (University of Stellenbosch & University of Cape Town, South Africa):
FROM BIPLOTS TO UNDERSTANDING BIPLOTS:
A decade of studying biplot methodology with John Gower

Biplots, authored by Gower and Hand in 1996, provides a unified theory underlying different types of biplot. It sparked many a research project and caught the attention of users of statistics in diverse fields of application. However, over time shortcomings in Biplots were identified. It is written in a rather concentrated style, making it difficult for research workers to appreciate fully the geometric subtleties of the biplot family. Subsequently Gower, Lubbe and Le Roux embarked on a project to:

make more readily accessible the geometric background essential for the understanding of biplots and monoplots
provide detailed measures of fit for various types of biplot
develop an extensive collection of R functions for constructing biplots and monoplots
develop tools to create more informative biplots
provide a wealth of illustrative examples drawn from a wide variety of fields of application, illustrating different representatives of the biplot family.

After nearly a decade, a milestone is about to be reached with the forthcoming appearance of Understanding Biplots. In this presentation, we preview some of the material in Understanding Biplots. In particular the following topics receive attention: the geometry of biplot and monoplot reference systems; sample and axis predictivity; the geometry of canonical variate analysis (CVA) as an application of principal component analysis (PCA) in a two-step procedure; creating analysis of distance (AoD) biplots as an application of nonlinear biplot methodology; using R to construct 1D, 2D and 3D biplots; usages of circle projection; parallel axis shift, lambda-scaling and other novelties for creating better biplots; the capabilities of the collection of R functions UBbipl.
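
For orientation, the basic PCA biplot can be sketched from a singular value decomposition of the centred data matrix: sample points come from the scaled left singular vectors and axis directions from the right singular vectors. The sketch below is written in Python and is not part of the UBbipl collection; the function name and the overall quality measure returned are assumptions for this example.

```python
import numpy as np

def pca_biplot_coordinates(X, r=2):
    """Sample points, axis directions and overall quality of an r-dimensional PCA biplot."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # column-centre the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    samples = U[:, :r] * s[:r]                   # one row of coordinates per sample
    axes = Vt[:r].T                              # one row of directions per variable
    quality = np.sum(s[:r] ** 2) / np.sum(s ** 2)
    return samples, axes, quality
```

The product `samples @ axes.T` is the best rank-r least-squares approximation of the centred data, which is why projecting a sample point onto a variable's axis gives a predicted value for that variable.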

Posters

Jose L. Vicente-Villardon (University of Salamanca, Spain):
Geometry of Logistic Biplots for Categorical Data

In many practical situations data are presented in a matrix containing binary or categorical variables. For such cases the classical linear biplots are not suitable, in the same way that linear regression is not suitable for binary or categorical responses. In this paper we present a generalization of linear biplots for categorical data and study its properties and geometry.