Analysis
Background/Methods: AntConc is a useful tool for me because it lets users examine the context of keywords and compare corpora. It is especially helpful for keyword searches in which stop words matter, and it provides accurate and useful statistics. In addition, Keyword in Context (KWIC) is a good way to start looking for patterns in a corpus.
I created a graph from my search results, using men|man|he|his|him|himself|male to represent male-related words and women|woman|she|her|hers|herself|female to represent female-related words.
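The search described above can be sketched in Python. This is only an illustration of the counting step, not the AntConc workflow itself: the two patterns are copied from the text, while the whole-word boundaries, the case-insensitive matching, and the function names are my assumptions.

```python
import re

# Patterns copied from the text. The \b word boundaries and the
# case-insensitive flag are assumptions about the search settings.
MALE = re.compile(r"\b(?:men|man|he|his|him|himself|male)\b", re.IGNORECASE)
FEMALE = re.compile(r"\b(?:women|woman|she|her|hers|herself|female)\b", re.IGNORECASE)


def gender_marker_counts(text):
    """Return (male_hits, female_hits) for one document."""
    return len(MALE.findall(text)), len(FEMALE.findall(text))


def male_ratio(text):
    """Share of male markers among all gender markers, or None if there are none."""
    male, female = gender_marker_counts(text)
    total = male + female
    return male / total if total else None


# Made-up sample sentence, not taken from the corpus.
sample = "He gave her his book. The men left, and she stayed with the woman."
print(gender_marker_counts(sample))
print(male_ratio(sample))
```

Running the same function over each file, or over the concatenated corpus, gives the per-file ratios and the overall share that a pie chart would display.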
Discussion: My initial guess was that the ratio of male markers in screenplays and novels might decline over time, but it turned out that the ratios fluctuate from year to year. However, I got an interesting result when I combined the total numbers of search results from AntConc across the whole corpus.
We can see from the pie chart that the share of male markers in both novels and screenplays is around 66%, which is a large proportion. What's more, it is interesting that the ratios I got by searching for gender markers in screenplays and in novels correspond with each other. This suggests that women still hold an inferior status in society.
Background/Methods: In order to combine close reading and distant reading, I decided to zoom in and focus on a small number of files. This time I focused on the dialogue from the 9 winners of the Oscar for Best Adapted Screenplay. I found some interesting results.
Figures: word cloud for movie dialogue; distinctive words in the movie corpus; distinctive words in the novel corpus.
Analysis and discussion:
The most frequent words in the corpus are you (3,176), the (2,892), to (2,245), i (2,144), and a (1,943). After reading parts of “The Secret Life of Pronouns”, I had a better understanding of how different words function in language use. The author, Pennebaker, concludes that “Men use articles (a, an, the) more than do women” and “Women use first-person singular pronouns, or I-words, more than men” [1]. Then I looked into the screenwriters of all these screenplays. Sadly, all the screenwriters are male. This seems to contradict the results in the book, since the Distinctive Words section clearly shows that “I” and “im” are distinctive words for the movies Precious and The Descendants. Their screenwriters are all male, so why did they use so many instances of “I” or “I’m”? Everything suddenly made sense when I looked into the authors of the original novels behind these two movies. THEY ARE BOTH FEMALE! It is very interesting. It means either that these two screenwriters write like women (the book gives some examples of male screenwriters who write like women) or that they tend to keep the pronouns used by the original authors.
Then I also used AntConc to check my results, and it showed results similar to Voyant's. So my guess is that these two screenwriters did tend to keep the pronouns of the original authors. I also used AntConc to check whether male writers tend to use more articles (a, an, the) than women, and I found that this is the case. These facts never occurred to me before I read the book.
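The article and I-word check can be approximated with a short Python sketch. It is not how AntConc or Voyant compute their statistics. The tokenizer, the per-thousand normalization, and the exact word lists (including "im" for text whose apostrophes were stripped) are assumptions made for illustration.

```python
import re

ARTICLES = {"a", "an", "the"}
# Assumed I-word list; "im" covers text whose apostrophes were stripped.
I_WORDS = {"i", "i'm", "im", "me", "my", "mine", "myself"}


def tokenize(text):
    """Lowercase the text and split it into word tokens, keeping contractions."""
    normalized = text.lower().replace("\u2019", "'")  # curly to straight apostrophe
    return re.findall(r"[a-z]+(?:'[a-z]+)?", normalized)


def rate_per_thousand(text, vocabulary):
    """Occurrences of the given words per 1,000 tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in vocabulary)
    return 1000 * hits / len(tokens)


# Made-up line of dialogue, not taken from the corpus.
line = "I think the dog is mine"
print(rate_per_thousand(line, ARTICLES))
print(rate_per_thousand(line, I_WORDS))
```

Normalizing by document length matters here because the screenplays and novels differ greatly in size, so raw counts alone would not be comparable.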
These findings gave me a new understanding of how to use Voyant and AntConc. Sometimes stop words are as significant as other words, since they can also reveal important results. Additionally, after focusing on just a few files in my corpus, I found much more useful results in Voyant. This shows that so-called “big data” analysis is not always useful.
Analysis and discussion: We can also discover similar patterns by using the multi-cloud function of Lexos. The color of "I" is relatively strong in the movie dialogues of The Descendants and Precious. What's more, words like "the", "you" and "to" are frequent as well, which corresponds to the conclusion made by Pennebaker. A special characteristic of screenplays is that screenwriters write the dialogue within a scene. A screenwriter turned novelist, D. A. Serra, says, "for screenwriters, attention to dialogue is considerably more critical to success than it is for narrative writing." This means that the demands on screenwriters when writing dialogue are much higher. Screenplays also contain no internal thoughts for the characters, which is completely different from novels, where most characters have their own internal thoughts. In screenplays, screenwriters have to reveal the characters' inner minds through dialogue or action, and that is the reason why there are so many instances of "I" in the screenplay dialogues.
Background: Lexos provides cluster analysis based on delta analysis (using the most frequent words) and zeta (using distinctive words). It can identify how one file differs from another and also show the relationships across the corpus. It is a useful tool for comparing the writing styles, diction and word choices of the files in a corpus.
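To make the idea of a delta distance over most frequent words more concrete, here is a minimal Python sketch in the spirit of Burrows's Delta. It is not the Lexos implementation, which may differ in tokenization, scaling and word selection. The function names, the use of the population standard deviation, and the choice of most frequent words by raw corpus count are all assumptions.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(docs):
    """Map each document name to {word: relative frequency}."""
    freqs = {}
    for name, text in docs.items():
        counts = Counter(tokenize(text))
        total = sum(counts.values())
        freqs[name] = {w: c / total for w, c in counts.items()} if total else {}
    return freqs


def delta_distances(docs, n_words=50):
    """Pairwise Delta-style distances over the n_words most frequent corpus words."""
    corpus_counts = Counter()
    for text in docs.values():
        corpus_counts.update(tokenize(text))
    mfw = [w for w, _ in corpus_counts.most_common(n_words)]
    freqs = relative_frequencies(docs)

    # z-score each word's relative frequency across the documents
    z = {name: {} for name in docs}
    for w in mfw:
        values = [freqs[name].get(w, 0.0) for name in docs]
        mu, sigma = mean(values), pstdev(values)
        for name in docs:
            z[name][w] = (freqs[name].get(w, 0.0) - mu) / sigma if sigma else 0.0

    # distance = mean absolute difference of z-scores over the chosen words
    names = list(docs)
    distances = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            distances[(a, b)] = mean(abs(z[a][w] - z[b][w]) for w in mfw)
    return distances
```

A dendrogram is then built by repeatedly merging the files (or groups of files) with the smallest distance, which is why changing the number of most frequent words can change which files end up next to each other.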
Analysis: For the dendrogram, I loaded 8 screenplays and their corresponding 8 novels. The first graph is the result of creating the dendrogram with the 50 most frequent words. The second graph is the result with the 100 most frequent words. Initially, I thought each screenplay would be connected to its corresponding novel since they focus on the same topics. Surprisingly, most screenplays were grouped together and most novels were classified as a group. This is strong evidence of the differences between writing screenplay dialogue and novel dialogue.
Additionally, in the graph on the left, the novel and movie dialogues of Slumdog Millionaire are close to each other, while in the graph on the right these two files are separate from each other. At the same time, the novel and movie dialogues of 12 Years a Slave are connected, which means that they are similar. There must be some unusual writing in the movie dialogues of 12 Years a Slave that separates it from the group of movie dialogues. Further investigation would require more close reading.
Another thing we noticed here is that the screenplay of The Big Short is grouped with that of The Social Network. Both works are based on real-life stories: The Big Short is about the financial crisis in 2008, and The Social Network is about the creation of Facebook. They are in the same genre, which might be the reason why they were grouped together.
Background: Alchemy is a well-known sentiment analysis tool that can compute document-level sentiment. Its algorithms look for words that carry a positive or negative connotation and then figure out which person, place or thing they refer to.
Methods: To continue my investigation of the similarities and differences between the two dialogue corpora, I uploaded several novel and movie dialogues to look for correlations between the two.
Analysis: One interesting thing we can see is that Alchemy rated all the dialogues as documents with negative sentiment. I expected the novels to skew toward more negative or extreme sentiment, since in movies actors and actresses can express sentiment through facial expressions, whereas authors of novels need more dialogue to express it. However, from the results that I have so far, that is not the case.
Background: "In a novel, a writer can take their time to build up chapters of details and characters leading to major plot points, obstacles, intricacies, climax and resolution. In a screenplay, each scene is a chapter per say, following the 3 Act structure and often incorporating the Hero’s Journey to get to your climax and resolution." -- from An Introduction To ScreenWriting [5]
Methods: Indico is an API similar to Alchemy, but it allows me to input and manipulate files easily. It is reported to give more accurate sentiment results than Alchemy, according to a comparison on this website. Indico returns sentiment on a scale from 1 to 5, similar to the movie review system on IMDb (5 is positive and 1 is negative). I used Indico to create graphs of how sentiment changes through the novels and screenplays.
Analysis: We can see in the diagram that screenwriters tend to create higher climaxes, and the climax mostly happens at the end of the screenplay. In the screenplay dialogue graph of The Social Network, there are two high climaxes at the end of the movie. Another difference between the two plots is that sentiment in the novels tends to fluctuate more. In the novel dialogue graph of The Social Network, we can see 6 local minima and 4 local maxima. However, there are only 3 local minima and 3 local maxima in the sentiment graph of the movie dialogues.
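Counting the local minima and maxima described above by eye can be checked with a small Python sketch. It assumes the sentiment scores are already available as a list of numbers, one per segment of the text. The function name and the sample values are hypothetical, and flat stretches and endpoints are deliberately not counted.

```python
def local_extrema(scores):
    """Return (indices of local minima, indices of local maxima).

    Endpoints are ignored, and only strict dips and peaks are counted,
    so plateaus of equal values are not reported.
    """
    minima, maxima = [], []
    for i in range(1, len(scores) - 1):
        prev, cur, nxt = scores[i - 1], scores[i], scores[i + 1]
        if cur < prev and cur < nxt:
            minima.append(i)
        elif cur > prev and cur > nxt:
            maxima.append(i)
    return minima, maxima


# Hypothetical scores on the 1-to-5 scale, not real Indico output.
scores = [3.0, 2.1, 3.4, 2.8, 4.2, 3.9, 4.6]
minima, maxima = local_extrema(scores)
print(len(minima), len(maxima))
```

The counts depend heavily on how the text is segmented and whether the curve is smoothed first, so they are best compared only between files processed in the same way.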
As filmmaker Kumar states in her blog, "a script is a masterwork of compression. It’s about the bones of the story and the characters and the dialogues, and finding visuals to render these in the most dynamic way; whereas in a novel, a writer has leisure to expand on the setting or mood, and delve into the thoughts of her characters in any number of ways. I enjoy this leisure, but I’d like to think that because of the discipline I’ve acquired as a screenwriter, I won’t get carried away with it." [2] Compared to screenplays, there is more flexibility in writing novels. Therefore, the plots of novels are usually more robust. In order to make screenplays as robust as novels, screenwriters have to try other methods to make their movies stand out. One method is to create more incidents and depict more details.
Background/Methods: In order to fully understand the results from Indico and Alchemy, I decided to look into the different types of sentiment in the dialogues. Although Alchemy provides similar sentiment analysis, it is difficult to create graphs from its API. Instead, I used the R package syuzhet, which allows me to graph different emotions in a document: trust, surprise, sadness, joy, fear, disgust, anticipation and anger.
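The analysis itself was done with syuzhet in R. The Python sketch below only illustrates the general idea of lexicon-based emotion scoring: each word is looked up in an emotion lexicon and the hits are turned into percentages. The lexicon here is a tiny toy dictionary with hypothetical entries, not the lexicon that syuzhet uses, and the function name is my own.

```python
import re
from collections import Counter

EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]

# Toy lexicon for illustration only. These entries are hypothetical and
# are NOT taken from the emotion lexicon that syuzhet relies on.
TOY_LEXICON = {
    "friend": {"trust", "joy"},
    "promise": {"trust", "anticipation"},
    "afraid": {"fear"},
    "suddenly": {"surprise"},
    "lost": {"sadness"},
    "hate": {"anger", "disgust"},
}


def emotion_percentages(text, lexicon=TOY_LEXICON):
    """Percentage of emotion hits that fall under each of the eight emotions."""
    counts = Counter()
    for token in re.findall(r"[a-z']+", text.lower()):
        counts.update(lexicon.get(token, ()))
    total = sum(counts.values())
    return {e: (100 * counts[e] / total if total else 0.0) for e in EMOTIONS}


# Made-up line of dialogue, not taken from the corpus.
result = emotion_percentages("My friend made a promise. Suddenly I was afraid.")
print(result["trust"], result["fear"])
```

Expressing each emotion as a share of all emotion hits makes files of different lengths comparable, which is useful when setting a short screenplay beside a long novel.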
Analysis: It's surprising to see that the sentiments of screenplays and novels are roughly the same. The sentiments of trust are high for all these files.
The percentage of surprise and fear in the screenplays is mostly higher than in the novels. It is possible that screenwriters add more elements to the conversation to excite the audience. The percentage of anticipation in the screenplays is mostly lower than in the novels. This might be because screenwriters tend to lower the characters' anticipation in order to create resolution and climax at the end.
There are many similarities between screenplays and novels shown by different kinds of analysis. From the discussion and analysis above, we could see that there is consistency of gender markers and document sentiments between the two.
However, the differences between the two outnumber the similarities. First, they have different lengths. Screenplays are usually around 100 pages, but they usually contain more dialogue than novels. Second, dialogue in screenplays usually shows fewer personal and internal thoughts than dialogue in novels. Third, the diction of dialogue in screenplays is very different from that of novels (as the dendrogram shows). Besides, screenplays have more restrictions than novels, such as story structure and plot arrangement. Novels are more robust and flexible, and their authors have the freedom to create and describe anything that appears in the novel. In order to differentiate their work from the novels, screenwriters tend to build higher climaxes according to the 3 Act Structure (from the Syd Field Paradigm [4]), and it is common for them to put a plot climax near the end of the movie.
Some lessons that I learned from this semester: sometimes stop words can show something very useful. Whether to remove stop words depends on the corpus that we analyze and the research question that we ask. For a complicated and mixed corpus with different kinds of files, stop words can be useful. For a research question that focuses on similarities and differences in diction, stop words can also provide many insights. Additionally, close reading is very important for analyzing screenplays.
Finally, I especially want to thank Professor Faull for the numerous suggestions and insightful ideas. They always pointed me in the right direction and gave me hope when I stared at the computer screen for hours late at night.
[1] Pennebaker, James W. The Secret Life of Pronouns: What Our Words Say about Us. New York: Bloomsbury, 2011.
[2] Kumar, Priyanka. "Should You Write a Novel or a Screenplay?" The Huffington Post. TheHuffingtonPost.com. Web. 07 May 2016.
[3] Walker, Marilyn A., Grace I. Lin, and Jennifer Sawyer. "An Annotated Corpus of Film Dialogue for Learning and Characterizing Character Style." LREC. 2012.
[4] Schützenhofer, Sara. "How to Analyze a Film Script." Web.
[5] "An Introduction To ScreenWriting." ScreenCraft. Tue. 9 May 2016.