Workshop – Text Analysis Methods in Historical Research
Micki Kaufman
1. (30min) Overview of Kissinger DNSA Text Analysis and Visualizations
2. (1hr:30min) Hands-on Workshop – The Collected Works of Yeats and Cummings
1. Intro
a. the Packet
2. Basic Excel review
i. Examples: Blogger Behavior
ii. Packet Intro
3. Data Collection and Management
a. DownThemAll http://www.downthemall.net/
b. Microsoft Excel / OpenOffice
c. TextWrangler / SublimeText / NotePad http://www.barebones.com/products/textwrangler/
d. NameChanger http://www.mrrsoftware.com/MRRSoftware/NameChanger.html
4. Word Frequency and Correlation
a. AntConc by Laurence Anthony, PhD http://www.antlab.sci.waseda.ac.jp/software.html
5. Topic Modeling
a. MALLET by Andrew McCallum and David Mimno http://mallet.cs.umass.edu/index.php
b. Mimno scripts
6. Sentiment Analysis
a. LIWC2007 by James W. Pennebaker, PhD http://www.liwc.net
7. Visualization
a. Basic Excel Graphing
b. Network Graphing: Gephi http://www.gephi.org
c. Visualization and Statistics: R http://www.r-project.org
d. Interactive Visualizations with d3
3. Appendix
a. Basic Resources
b. Additional Visualization Resources (thanks to Lev Manovich)
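For orientation before the hands-on session, the word-frequency step (item 4) can be sketched in a few lines of code. This is only a minimal illustration, not how AntConc works internally: the function name `word_frequencies`, the tokenizing rule, and the sample sentence are assumptions made up for this sketch.

```python
import re
from collections import Counter

def word_frequencies(text, top_n=10):
    """Return the top_n most frequent words as (word, count) pairs.

    Tokenizing rule (an assumption for this sketch): lowercase the text
    and keep runs of letters, allowing one internal apostrophe.
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words).most_common(top_n)

# Hypothetical sample text, not taken from the workshop packet.
sample = "the cat and the hat and the bat"
print(word_frequencies(sample, top_n=2))
```

In practice the text would be read from the files collected in item 3, and common function words ("the", "and") would usually be filtered with a stop list before counting.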
Basic Resources
if you are new to data visualization:
How do visualization designers work?
How We Visualized 23 Years of Geo Bee Contests
Visualizing The World's Well-Being
How We Visualized America's Food and Drink Spending
Visualizing The Health Care Reform
Look at:
http://flowingdata.com/2010/01/07/11-ways-to-visualize-changes-over-time-a-guide/
Other tutorials:
http://flowingdata.com/category/tutorials/
Additional Resources:
Graphic Design Principles for Information Visualization
Standard techniques for data visualization:
Using basic visualization techniques to create effective infographics:
Nicholas Felton: Annual Reports
GE Powering the Kitchen (fathom.info - Ben Fry)
visualizing one-dimensional data (single variable):
pie chart, bar chart
Excel: van_gogh_summary.xlsx
Mondrian: van_gogh_data.txt
histogram
Mondrian: van_gogh_data.txt
Explanation of the differences between a bar chart and a histogram
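The difference between a bar chart and a histogram can be shown in the data each one plots. A bar chart counts items per category; a histogram counts numeric values per bin along a continuous axis. The sketch below is a minimal illustration under stated assumptions: the function names and the sample values are hypothetical and are not taken from van_gogh_data.txt.

```python
from collections import Counter

def bar_counts(labels):
    """Bar chart data: one count per distinct category label."""
    return dict(Counter(labels))

def histogram_counts(values, bin_width, start=0.0):
    """Histogram data: counts per equal-width bin [lo, lo + bin_width).

    Keys are the lower edge of each non-empty bin.
    """
    counts = Counter()
    for v in values:
        index = int((v - start) // bin_width)
        counts[start + index * bin_width] += 1
    return dict(sorted(counts.items()))

# Hypothetical sample data, for illustration only.
genres = ["portrait", "landscape", "portrait", "still life", "landscape", "portrait"]
brightness = [12.0, 47.5, 51.0, 88.2, 49.9, 15.3]

print(bar_counts(genres))                        # categories: order is arbitrary
print(histogram_counts(brightness, bin_width=25.0))  # bins: order is meaningful
```

Changing `bin_width` changes the shape of the histogram, whereas a bar chart has no such parameter; that is one practical way to tell the two apart.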
visualizing time series - line graph
Excel: van_gogh_data.txt
visualizing two-dimensional data (two variables):
scatter plot
explain differences between line graph and scatter plot
Excel: van_gogh_summary.xlsx, van_gogh_data.txt
data transformations (log, etc. - Excel graphs axis options; Mondrian: Calc > Transform)
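The log transformation mentioned above can also be applied to the data directly before plotting, instead of through Excel's axis options or Mondrian's Calc > Transform. A minimal sketch, assuming strictly positive values; the function name and the sample numbers are hypothetical.

```python
import math

def log10_transform(values):
    """Return the base-10 logarithm of each value.

    Logarithms are undefined for zero and negative numbers, so the
    function raises ValueError instead of silently dropping them.
    """
    if any(v <= 0 for v in values):
        raise ValueError("log transform needs strictly positive values")
    return [math.log10(v) for v in values]

# Hypothetical values spanning several orders of magnitude.
counts = [1, 10, 100, 1000, 10000]
print(log10_transform(counts))
```

After the transform, values that differ by a factor of ten sit one unit apart, which keeps a few very large values from flattening the rest of a scatter plot.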
visualizing multi-dimensional data (multiple variables):
radar plot
Excel: van_gogh_summary.xlsx
parallel plot
Mondrian: van_gogh_data.txt
plot matrix
Mondrian - scatterplot matrix: van_gogh_additional_measurements.txt
Visualizing time:
Early time visualizations:
Time lines and visual histories
Visualizations of singular temporal streams:
One of the most famous visualizations of the last 200 years - Charles Joseph Minard's 1869 representation of Napoleon's 1812 Russian Campaign - offers other innovative solutions to show time:
Charles Joseph Minard - visualization of Napoleon's Russian Campaign
Visualizations of parallel temporal processes:
Here are a few examples of well-known innovative visualization techniques/projects to represent multiple event streams (multiple events taking place at the same time):
The Preservation of Favored Traces
Hans Rosling TED 2006 lecture (video on TED)
Ben Fry: energy use in a kitchen
Visualizations of temporal links (links between events in single or multiple temporal streams):
Citeology: Visualizing the Relationships between Research Publications
Visualizations of temporal processes in cultural artifacts:
Novel Views: visualizations of the novel Les Misérables by Victor Hugo
Culture data visualizations by Santiago Ortiz
Time-based data:
3 million time-based open data sets: http://blog.revolutionanalytics.com/2013/02/quandl-a-wikipedia-for-time-series-data.html
R functions to use these data sets:
http://blog.revolutionanalytics.com/2013/03/quandl-package-released-to-cran.html
Help: http://www.quandl.com/help/r
http://www.simile-widgets.org/timeline/
Visualizing space:
view as many maps as you can:
http://pinterest.com/janwillemtulp/maps/
view particular examples of recent maps using social media data:
atNight
Twitter NYC
Global Twitter Heartbeat
How Obama Won Re-election
What are the common features of recent maps driven by big data and social media data? Which maps stand out from the rest and why? What is missing? Why do the techniques for visualizing temporal processes seem more limited than the rich range of techniques and interesting projects in spatial data mapping?
Recommended - references/articles about science of cities
http://www.nytimes.com/2010/12/19/magazine/19Urban_West-t.html
Recommended - historical timelines
http://www.datavis.ca/gallery/timelines.php
Additional Resources:
Resources for Data Visualization (courtesy of Lev Manovich)
Inspiration:
http://tulpinspiration.tumblr.com/
http://infosthetics.com/
Data and visualization blogs worth following
popular websites and blogs about visualization
Innovative visualizations of temporal flows
Use of visualization in museum web sites and online media collections
Visualization design patterns:
InfoVis wiki list of visualization design patterns
Spatial data:
http://en.wikipedia.org/wiki/OpenStreetMap
List of visualization and mapping software:
http://selection.datavisualization.ch/
Examples of visualization software which can create maps, timelines and all other basic vis techniques:
https://developers.google.com/chart/interactive/docs/gallery
http://d3js.org/ (currently most popular for web vis)
examples of software for creating interactive web maps:
http://mapbox.com/reinventgreen/
http://cartodb.github.com/torque/examples/uspo.html
lists of other visualization tools:
datavisualization.ch list of visualization and mapping tools
.net list of the top 20 data visualization tools
WikiVis list of visualization tools
Popular software tools and applications for creating visualizations
Software for analyzing and presenting online digital collections
Over 100 Incredible Infographic Tools and Resources (Categorized)