Visualization tools: Voyant, Palladio, Tableau
Some of the other tools looked cool, but I haven’t quite figured out what I might use them for. I guess that speaks to the need to think carefully about what information you’re trying to convey with quantitative visualizations, and what visual language might be best for that purpose. In any event, it’s exciting to know how many possibilities there are for quantitative textual analysis, and I think I’m better equipped than I was a few years ago to figure out how this might all be useful for me!
From my blog post, “Re-learning the ropes of Voyant”
To follow up from my initial reintroduction to Voyant, I knew that I wanted to pursue something I hadn’t had time to do for the blog post: take the Latin text I presented a conference paper on, take out the interlinear English translation I’d put in because there isn’t a published translation in English, convert it to a .txt file, and run it through Voyant to see what came up.
For reference, the poem is “Ilias” by Simon Aurea Capra, otherwise known as Simon Chèvre d’Or. It’s a poem about the Trojan War that he’s thought to have written for Henry I, the Count of Champagne, in the mid- to late twelfth century. Here’s the argument I made in the conference paper:
By condensing the history of Troy into a story of familial continuity, this paper will argue, Simon Aurea Capra makes it an exemplary and pedagogical narrative for Henry, who was betrothed to the eight-year-old Marie of France in 1153 and, as he consolidated power in Champagne, concerned himself with the legacy of his own family.
From my abstract for “Loyalty to Lineage in Simon Aurea Capra’s Ylias”
Will Voyant reveal similar trends to those I noted in the conference paper? Let’s find out!
Well, this has already taught me something new: the value of the stopwords list. You see, it seems that the most common words in this poem are “but,” “it,” “in,” “while,” “not,” “who,” and “with.” I guess if I were going to compare Simon’s use of these words with other twelfth-century Latin writers, to identify stylistic differences or something, this would be super helpful, but to be honest it doesn’t get at what I’m really interested in, which is the big thematic key terms. So I went to work on the stopwords list to see what I got.
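For anyone curious what a stopwords list is doing under the hood, here’s a minimal sketch in Python. It is not Voyant’s actual code: the function name `top_terms`, the sample line, and the tiny stopword set are all made up for illustration, and the sample is not a line from the poem.

```python
import re
from collections import Counter

def top_terms(text, stopwords, n=10):
    """Return the n most frequent words in text, skipping stopwords."""
    # Lowercase, then pull out runs of letters (accented letters included).
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)

# Invented sample line and stopword list, for illustration only.
sample = "sed amor et fatum, sed non amor in castris"
latin_stopwords = {"sed", "et", "non", "in"}

# With the little function words filtered out, "amor" rises to the top.
print(top_terms(sample, latin_stopwords, n=3))
```

Without the stopword set, “sed” would tie with “amor” for first place, which is the same effect I saw in Voyant on a much larger scale.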
Ah, now we’re getting somewhere! I’m still getting words like “erat” and “fuit” (“was”), but I’m also getting the character names “Eneas,” “Paris,” and “Venus,” as well as words like “love,” “fate,” “camps,” “enemies,” “leader,” and “courage/virtue.” Based on my admittedly foggy memory of this poem, I’m actually surprised to see Venus and love as big on this word cluster as they are–I would have expected to see Paris as bigger, and Dido’s not showing up at all. (For reference, in the abstract, I argued “Given the extreme compression of its narrative, its repeated discussions of and apostrophes to Aeneas, Dido, and especially Paris are striking.”) The conclusion to draw? Maybe these characters weren’t as overemphasized in Aurea Capra’s poem as I had argued back in the conference paper, or at least, not to the extent that they dominated the poem’s messages for Henry.
On the other hand…
Looking at this list reminds me that Latin is an inflected language, and so many of the references to Paris the character won’t look like the word “Paris” at all–here, I see two references to him, the “Paridis” in the first line of the Contexts tool and the “Paridi” in the sixth.
I think my takeaway is to use these kinds of word cloud tools with care–I can see where it would be very easy to draw conclusions, or to tweak what the tools looked at until they said what one wanted them to say. At the same time, something like the Contexts tool would be a really helpful way of seeing where certain characters or words tend to come up, as long as I remember what language I’m working with and the possible variant spellings to look for.
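To make the inflection problem concrete, here’s a rough sketch in Python of counting every form of a name instead of just one spelling. The pattern is my own assumption (“Paris” itself, or the stem “Parid-” plus any ending); it is not a real Latin lemmatizer and will miss forms it doesn’t anticipate, such as a name with an enclitic attached. The sample text is invented, not quoted from the poem.

```python
import re
from collections import Counter

# Assumed pattern: "Paris", or the stem "Parid-" with any ending.
# A rough guess at the forms, not a full account of the declension.
PARIS_FORMS = re.compile(r"\bpari(?:s|d\w*)\b", re.IGNORECASE)

def count_forms(text, pattern):
    """Count each distinct surface form that the pattern matches."""
    return Counter(m.group(0).lower() for m in pattern.finditer(text))

# Made-up sample, for illustration only.
sample = "Paris amat; Paridis amor; Paridi fatum; Paris fugit"
forms = count_forms(sample, PARIS_FORMS)

print(forms)                # each spelling with its own count
print(sum(forms.values()))  # total mentions across all spellings
```

A search for the exact word “Paris” would find two mentions in this sample; counting the inflected forms as well finds four.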
I’m interested in tracing character arcs and interactions in medieval fiction, but the principle holds perfectly well for Les Miserables as well. Looking at the dataset, I could see that it mapped out how often characters appeared in the same chapter as other characters, which also gave me a clearer idea of what Palladio wanted when looking for a “source” and a “target.” This, I thought, was a dataset that could really help me see how Palladio could be useful to me.
From my blog post, “Learning the Ropes of Palladio”
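If I ever build this kind of dataset for a medieval text, the “source” and “target” table could be generated along these lines. This is only a sketch in Python under my own assumptions: the function name `cooccurrence_edges` and the chapter rosters are hypothetical, and the rosters are not taken from the real Les Miserables dataset.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(chapters):
    """Count how many chapters each pair of characters shares."""
    edges = Counter()
    for characters in chapters:
        # Sort so a pair is always stored the same way round (A-B, never B-A).
        for source, target in combinations(sorted(set(characters)), 2):
            edges[(source, target)] += 1
    return edges

# Hypothetical chapter rosters, for illustration only.
chapters = [
    ["Valjean", "Myriel"],
    ["Valjean", "Fantine", "Javert"],
    ["Valjean", "Javert"],
]

# Print a simple source/target/weight table.
print("source,target,weight")
for (source, target), weight in sorted(cooccurrence_edges(chapters).items()):
    print(f"{source},{target},{weight}")
```

Each row says that two characters share at least one chapter, and the weight says how many chapters they share.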
The challenge in trying to build on my previous work with Palladio was twofold: a) the dataset I used last time, of character interactions in Les Miserables, was pretty much exactly the kind of dataset that I’d like to make for medieval texts, and b) despite my best efforts, I couldn’t find a more medieval dataset that could better model what I’d be using Palladio for. (Though I did find this article about a character name description dataset that sounded interesting.)
As Sherlock Holmes would say, ‘Data, data, data!’
Ultimately, I decided, why reinvent the wheel? I went back to the Les Miserables dataset and thought about what else I could do with it. What questions could I ask and answer using Palladio?
One question that might be interesting for me to explore is whether these isolated characters out on the edge of the map like Gribier are as peripheral to the plot as they seem to be to the rest of the characters. Might there be a way to map cause-effect relationships between different plot events and connect them to characters, to see which characters are actually connected to important events, even if they don’t interact with many other characters?
Could I do the same thing with clusters of characters, like this one?
These characters are all linked to Fantine–they’re all working-class women, mostly seamstresses, and the wealthy students they’re romantically involved with. Just the idea of a cluster consisting entirely of poor women and rich men seems like it would be thematically significant, regardless of these characters’ impacts on the larger plot. Perhaps looking at other clusters of related characters in Palladio would reveal similarly thematically important groupings within large texts, like the corpus of Arthurian romances or chanson cycles.
As part of my transition to talking about Tableau tools, I also tried to import another dataset to take advantage of the mapping function in Palladio. This proved to be a failure–there were too many variables in the spreadsheet for me to be able to figure out how to isolate the latitude and longitude to create a map layer on them. I’m sure it can be done, but I couldn’t manage to do it. On the other hand, it was meant to be a Tableau dataset. So….
My dive into the world of Tableau taught me two things: 1) this is a powerful tool for people who use statistics in their work, and 2) I have no idea what I’m doing with it.
From my blog post, “Lies, statistics, and Tableau, or my very short data visualization journey”
To be frank, I struggled a little with Tableau, both in terms of being able to figure out its functionality and in terms of being able to figure out how I could use it. But it was a cool enough tool that I really did want to feel at least a bit more comfortable with it, so I decided to turn to the internet.
What I found was this tutorial by Kristen Mapes from 2015, which uses as its dataset information from the Roman de la Rose Digital Library on the manuscripts of Le Roman de la Rose. Now we’re speaking my language–in fact, we’re going right back to where I began, with Beinecke MS 418.
Step by step, I worked my way through the tutorial, and as I went, I could see how this actually could answer questions that were relevant to me.
This map, for example, shows collections in Europe that contain manuscript copies of Le Roman de la Rose. The bigger the circle, the more copies the collection has. The bluer dots are places that have manuscripts with fewer illustrations, and the oranger and redder dots are places that have manuscripts with more illustrations.
This chart maps out numbers of copies by start date. We can see that there are big spikes in 1300 and 1400–my guess would be that that’s just because they’re big round century dates, rather than because there was anything happening in 1300 or 1400 specifically that would make Le Roman de la Rose top the best-seller charts. That’s honestly just a guess, though; I’d have to look more into it to say for sure.
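One quick way to start checking that round-number hunch would be to count how many start dates land exactly on a century year. Here’s a small sketch in Python. The function names and the list of dates are invented for illustration; these are not the figures from the Roman de la Rose dataset.

```python
from collections import Counter

def copies_by_start_date(start_dates):
    """Tally manuscripts per start date, like the bars in the chart."""
    return Counter(start_dates)

def share_on_century_years(start_dates):
    """Fraction of start dates that fall exactly on a century year."""
    on_century = sum(1 for year in start_dates if year % 100 == 0)
    return on_century / len(start_dates)

# Invented start dates, for illustration only.
dates = [1300, 1300, 1325, 1350, 1400, 1400, 1400, 1425]

print(copies_by_start_date(dates))
print(share_on_century_years(dates))
```

A high share on its own wouldn’t prove the spikes come from how the manuscripts were dated, but it would tell me whether the question is worth chasing.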
This tree map shows collections sized by how many copies they hold–to the surprise of no medieval literature fan, the BNF (National Library of France) has the most copies.
I only made these by following the tutorial as closely as I could, but it was so exciting to see how I could create these charts using a dataset where I actually understood the different fields of the chart and what was being measured and related to each other. Tableau turned out to be easier than I thought it was, once I figured out how the dragging and dropping system worked. This would be fun to do for other works and authors where I’m interested in the geographic distribution, texts, and number of illustrations in the manuscript witnesses.
The long and short of it is, though I don’t know if I’ll ever be regularly turning to these visualization tools for research, I now feel like I have a much clearer grasp of how I could.