Analysis and Visualisation of Merlot Tasting Notes

Project Number
ComS 17-01

Project Title
Analysis and visualisation of Merlot tasting notes

Project Leader
Fischer, B

Stellenbosch University. Division of Computer Science

Team Members
Van Zyl, P

Project Description
Sensory data such as tasting notes and wine reviews capture the essence of “good” and “bad”
wines – but only in aggregation: a single review may focus on specifics of an individual wine
that are not characteristic of the varietal. It is thus necessary to analyse and aggregate these
textual descriptions and to correlate them with (subjective) quality metrics such as ratings.
However, in order to identify dominant characteristics that correlate with the different rating
categories, it is necessary to aggregate along different dimensions (e.g., origin, vintage, …).

We used the ConceptCloud data exploration tool to
analyze the reviews of all single-varietal Merlots in Platter’s By Diners Club South African Wine
Guides from 2003 to 2016. We extracted the relevant fields (primarily origin, vintage, star rating,
and the free-text review) for each wine from the SQL database, and used the Stanford CoreNLP
natural language processing system to extract meaningful phrases from the review texts. We
manually cleaned these phrases before we built a formal context table and used the
ConceptCloud system to explore this data set. In particular, we constructed and analyzed word
clouds that show the distribution of vintages, origins, and taste description phrases within the
different rating categories.
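
The step from extracted fields to a formal context table can be illustrated with a minimal sketch. It is not the ConceptCloud implementation: the records, field names, and the helper functions build_context, extent, and phrase_counts_by_rating are hypothetical and only show the idea that each wine is an object, each field value or phrase is an attribute, and word-cloud weights are counts within a rating category.

```python
from collections import Counter, defaultdict

# Hypothetical example records; names and values are invented for illustration.
reviews = [
    {"wine": "Wine A", "origin": "Stellenbosch", "vintage": 2010,
     "stars": 4.0, "phrases": ["ripe plum", "soft tannins"]},
    {"wine": "Wine B", "origin": "Paarl", "vintage": 2012,
     "stars": 4.0, "phrases": ["ripe plum"]},
    {"wine": "Wine C", "origin": "Paarl", "vintage": 2012,
     "stars": 2.0, "phrases": ["green notes"]},
]


def build_context(records):
    """Return a formal context: each object (wine) maps to its attribute set."""
    context = {}
    for r in records:
        attrs = {
            f"origin:{r['origin']}",
            f"vintage:{r['vintage']}",
            f"stars:{r['stars']}",
        }
        attrs.update(f"phrase:{p}" for p in r["phrases"])
        context[r["wine"]] = attrs
    return context


def extent(context, attributes):
    """Return all objects that carry every attribute in `attributes`."""
    wanted = set(attributes)
    return {obj for obj, attrs in context.items() if wanted <= attrs}


def phrase_counts_by_rating(records):
    """Count phrases per star rating; the counts could drive word-cloud sizes."""
    counts = defaultdict(Counter)
    for r in records:
        counts[r["stars"]].update(r["phrases"])
    return counts


context = build_context(reviews)
counts = phrase_counts_by_rating(reviews)
```

Selecting an attribute such as a rating or an origin then corresponds to computing an extent and recounting the attributes of the wines that remain.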

We confirmed that ConceptCloud is stable enough to handle data sets such as the one
analyzed here. We saw no obstacles to scaling this up to larger collections, e.g., the full set of
wine reviews for all varietals. We identified several characteristic traits of both “good” and
“bad” Merlots; these are detailed in Sections 5.3 and 5.4 of the attached technical report.

We identified several minor shortcomings in ConceptCloud’s original visualization and
have already implemented some improvements. We found, however, that the natural language
processing component needs substantial improvement to extract more information (in
particular, more consistent key phrases) from the reviews. We suggest integrating the phrase
extraction with an ontology, taxonomy, or controlled vocabulary to achieve this. We also suggest
integrating more data sources into the underlying data set that was used to build the formal
context table; in particular, we suggest re-integrating barrelling information (which we
purposefully excluded because it interferes with the taste descriptors), because the barrelling
process is under the direct control of the winemaker. We could also integrate chemical
analysis data, where available.
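
As a rough illustration of the controlled-vocabulary suggestion, the following sketch maps surface variants in a review to canonical terms. The vocabulary entries and the function normalise are assumptions made for this example; they are not taken from the project or from any existing wine ontology.

```python
import re

# Hypothetical controlled vocabulary: canonical term -> accepted surface variants.
VOCABULARY = {
    "plum": ["plum", "plums", "plummy"],
    "tannin": ["tannin", "tannins", "tannic"],
}

# Reverse lookup from each variant to its canonical term.
VARIANT_TO_TERM = {
    variant: term
    for term, variants in VOCABULARY.items()
    for variant in variants
}


def normalise(review_text):
    """Return the sorted canonical terms whose variants occur in the text."""
    tokens = re.findall(r"[a-z]+", review_text.lower())
    return sorted({VARIANT_TO_TERM[t] for t in tokens if t in VARIANT_TO_TERM})
```

With such a mapping, reviews that mention "plummy" and "plums" would contribute to the same key phrase, which is the kind of consistency the free-text extraction currently lacks.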

