Frequently Asked Questions

(or, Questions Anticipated to be Frequently Asked)

Although this page is called a "F.A.Q.," it is not really a F.A.Q. because it was written before many of these questions were asked. A better name would be a "Q.A.F.A." (Questions Anticipated to be Frequently Asked).

How did you get the ratings?

Simply put, the webmaster searched the internet for lists of the best and worst episodes of Buffy the Vampire Slayer. He used these lists to create average ratings for each episode of the series. More detail is on the Method Page.

What do you mean by points for episodes?

The points for each episode represent its average ranking on best and/or worst episode lists, with 144 points meaning that the episode was rated the best episode on every list and 1 point meaning that the episode was rated the worst episode on every list. Fans were not unanimous on any episode, so point totals for all episodes fall between these two extremes. Again, more detail is on the Method Page.

What do you mean when you say that something explains a certain percentage of episode quality? For example, what does it mean when you say that "only 1.7% of the quality of an episode can be attributed to the season when it aired"?

Suppose that someone wrote the titles of all 144 episodes on separate cards, drew one at random, and asked me to guess how many points that episode has. My best guess would be 72.5, the average score for all episodes. However, no episode has exactly 72.5 points. Some have more, and some have fewer. In this case, suppose that the person picked "The Wish" with 91.84 points. I would be off by 19.34 points. These 19.34 points would be error due to the fact that "The Wish" is a much better than average episode.

Suppose that the person drawing "The Wish" gave me one hint, that it was a third season episode. Using this information, I would guess 75.12, the average score for third season episodes. Now I am off by only 16.72 points.

Sometimes, this information could be misleading. If the person drew "Anne" (with 66.91 points) instead of "The Wish," knowing that the episode is a third season episode would make my guess worse, not better. That is, I would be off by 8.21 points instead of being off by 5.59 points. Across all 144 episodes, however, my guesses would improve a little (1.7%) if I knew the season of each episode.

[Note to statisticians: I know that this concept is more complicated than I described. However, I would like to see you explain this concept to lay people in only three short paragraphs without boring them.]
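For readers who like to check the arithmetic, here is a minimal sketch of the guessing game described above. The four scores and averages come straight from the text. The `share_explained` function is an assumption: it uses the usual "proportional reduction in squared error" definition, which may not be exactly the formula behind the 1.7% figure, and the toy data fed to it are made up for illustration.

```python
OVERALL_MEAN = 72.5    # average score across all 144 episodes (from the text)
SEASON_3_MEAN = 75.12  # average score for third season episodes (from the text)

# Scores quoted in the text for two third season episodes.
episodes = {"The Wish": 91.84, "Anne": 66.91}


def guess_error(actual, guess):
    """How far off a guess is, rounded to two decimals."""
    return round(abs(actual - guess), 2)


def share_explained(scores_by_group):
    """Fraction of squared guessing error removed by guessing each group's
    own mean instead of the overall mean (an assumed definition)."""
    all_scores = [s for scores in scores_by_group.values() for s in scores]
    grand_mean = sum(all_scores) / len(all_scores)
    total = sum((s - grand_mean) ** 2 for s in all_scores)
    within = 0.0
    for scores in scores_by_group.values():
        group_mean = sum(scores) / len(scores)
        within += sum((s - group_mean) ** 2 for s in scores)
    return 1 - within / total


for title, points in episodes.items():
    without_hint = guess_error(points, OVERALL_MEAN)
    with_hint = guess_error(points, SEASON_3_MEAN)
    print(f"{title}: off by {without_hint} without the hint, {with_hint} with it")

# Hypothetical toy data, NOT real episode scores: two "seasons" of two episodes.
toy = {"season A": [1, 3], "season B": [5, 7]}
print(f"Share explained in the toy data: {share_explained(toy):.0%}")
```

In the toy data the season hint removes most of the error because the two made-up seasons barely overlap. For the real episodes the seasons overlap heavily, which is why the hint helps so little.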

What does it mean when you say that you analyzed the effect of a director, actor, or character "after taking into account when the episode aired and whether the episode is part of a two-part episode"?

All this means is that the data on the page look at how much better we can predict the quality of an episode once we already know what season the episode aired in, whether the episode was the first, second-to-last, or the last episode of a season, and whether it was half of a two-part episode. This way, characters, writers, and directors are neither punished for not participating in good episodes nor rewarded for not participating in bad episodes if they were not part of the show for that season. This also takes into account the fact that the last two episodes of a season (especially the season finale) tend to be better than other episodes and that season openers tend to be worse than expected when you take into account the fact that Joss Whedon often writes and/or directs them.

How is what you did for the Buffy Phenomenon different from what you did on its sister site, the Phi-Phenomenon?

The Phi-Phenomenon is a study of films and film tastes. There have been at least a few hundred thousand feature films made throughout the world. Many films have been lost forever, and new ones are being made every day. Nobody has come close to seeing every film. It is not reasonable to include all feature films ever made in any analysis. Instead, only a sample of the best films can be studied. There have been only 144 episodes of Buffy ever made, and it is very unlikely that there will be any more (the webmaster would prefer not to acknowledge the existence of the comic books). This means that the entire population of Buffy episodes can be analyzed, and lists of worst episodes are as useful as lists of best episodes. In fact, they are more useful because they are rarer.

Also, data analysis is much simpler with a population of a manageable size. Rather than the complicated points formula that the Phi-Phenomenon uses, the Buffy Phenomenon can simply calculate the average ranking of an episode. Even if a list includes only the top-ten episodes, one can simply infer that the remaining 134 episodes are tied for eleventh.
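As a rough illustration of the averaging idea, here is a minimal sketch that turns "best episode" lists into points. It rests on assumptions that this page does not spell out (the Method Page is the authority): that points equal 145 minus rank, and that unlisted episodes share the average of the positions they would occupy. That tie rule is a common convention and it keeps every list's average at 72.5, but it is a guess at the actual procedure. The function names and the placeholder titles `ep1` to `ep144` are made up, and "worst episode" lists are not handled.

```python
TOTAL_EPISODES = 144


def points_from_best_list(ranked_titles, all_titles):
    """Convert one 'best episodes' list into points for every episode.

    Assumes ranked_titles has no duplicates and only contains known titles.
    """
    n = len(all_titles)
    points = {}
    # Listed episodes: rank 1 earns n points, rank 2 earns n - 1, and so on.
    for position, title in enumerate(ranked_titles, start=1):
        points[title] = n + 1 - position
    # Unlisted episodes are tied; assume they share the average of the
    # ranks they jointly occupy (an assumption, not confirmed by the site).
    unlisted = [t for t in all_titles if t not in points]
    if unlisted:
        first_tied_rank = len(ranked_titles) + 1
        tied_rank = (first_tied_rank + n) / 2
        for title in unlisted:
            points[title] = n + 1 - tied_rank
    return points


def average_points(lists, all_titles):
    """Average each episode's points across several lists."""
    totals = {title: 0.0 for title in all_titles}
    for ranked_titles in lists:
        for title, pts in points_from_best_list(ranked_titles, all_titles).items():
            totals[title] += pts
    return {title: total / len(lists) for title, total in totals.items()}


# Placeholder titles standing in for the 144 real episodes.
all_titles = [f"ep{i}" for i in range(1, TOTAL_EPISODES + 1)]
top_ten = all_titles[:10]
points = points_from_best_list(top_ten, all_titles)
print(points["ep1"], points["ep10"], points["ep11"])
```

Under these assumptions a top-ten list gives its first pick 144 points and its tenth pick 135, and the 134 episodes tied for eleventh each get the same middling score.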

Your statistical procedures are completely unwarranted. For example, you violate the assumptions of the … [insert statistical jargon that would confuse and bore the ordinary reader]. How can you justify your procedure?

First, look at the answer above and note that I am dealing with the entire population of episodes, not a sample. I can make more definite conclusions when I am studying the entire population. Second, I am simply presenting rankings of episodes of a television show; I am not trying to land a rover on Mars. Precision is not crucial.

How can you possibly say that X is a great episode? It was awful.
How can you possibly say that Y is a bad episode? It was great.

The webmaster is not saying that X is a great episode or that Y is a bad episode; other fans are. Some episodes seem to divide fans. These are listed on the Polarized Episodes Page. Even well-liked or hated episodes have their critics or fans. Some people hate "Once More with Feeling" and a few love "Doublemeat Palace." It is also possible that you missed something in episode X that others see or that you see something in episode Y that others do not.

If it is any consolation, the webmaster's #9 episode is barely in the top 50, and he is not so fond of some highly ranked episodes.

How confident are you in your rankings? Can you really be sure that episode #71 is better than episode #72?

Not very confident. A given rank means that we are at least 50% confident that the episode is better than any given lower ranked episode and weaker than any given higher ranked episode.

The rankings can be off for a few different reasons. Not every list on the internet is included in the analyses. Most lists do not rank all episodes. Fans who post lists on the internet may have different opinions than fans who do not post lists.

I have noticed numerous errors in your listings of writers and directors. Don't you know that Joss Whedon and Marti Noxon ghostwrote half of "Conversations with Dead People" and that Joss Whedon ghostdirected the England scenes in "Lessons" and "Beneath You"?

Only credited directors and writers are listed on this site. The Internet Movie Database is the final arbiter as to who is the credited writer and director for a given episode. This site ignores uncredited writers and directors.

The Phi-Phenomenon has found that there are three different tastes in film. Are there different tastes in Buffy episodes such as preference for funny episodes versus dramatic, heartwrenching episodes or for high school episodes versus later episodes?

Most of the lists that this site analyzes skip episodes or mention only a handful of episodes. These lists do not lend themselves to an analysis of tastes in Buffy episodes. An analysis was performed on a subsample of 135 lists from people who rated or ranked the most episodes. The results suggest that there are two tastes in Buffy episodes. Most fans, called "Loyalists," think that the show retained its high quality throughout all seven seasons. A minority, called "Jumpers," believe that the show "jumped the shark" during the fourth season, most likely due to the characters graduating from high school, the introduction of Riley and/or the Initiative arc, and/or Tara's appearance and Willow's change in sexual orientation. A more detailed analysis is introduced on the tastes page.

Does this mean that all fans are either a Loyalist or a Jumper and not both?

Not really. Labeling a fan as either a Loyalist or a Jumper is an (over)simplification. In fact, it is mathematically impossible for someone to be a pure Jumper or a pure Loyalist. All fans share each taste at least a little.

In effect, three general factors affect everyone’s ratings of Buffy episodes: (1) the extent to which the person holds the Loyalist taste, (2) the extent to which the person holds the Jumper taste, and (3) all other factors, including the extent to which the person holds less well-defined tastes, idiosyncratic factors, and random factors such as how well the person remembers a particular episode when making the list. The purpose of the formulas used to identify these two tastes is to maximize the number of people who are strongly influenced by one of the two tastes and weakly influenced by the other. However, each taste influences all fans.

In the subsample of lists used to identify the two tastes, about 44% (59/135) were strong Loyalists, and about 25% (34/135) were strong Jumpers. Of the remainder, about 10% (14/135) were a blend of the two tastes, and about 21% (28/135) had more idiosyncratic tastes. It is likely that fans as a whole fall in similar proportions.
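The percentages above follow directly from the counts. A minimal sketch of the arithmetic, using only the counts quoted in this answer (the group labels are shorthand, and the `shares` helper is made up for illustration):

```python
SUBSAMPLE_SIZE = 135  # lists in the subsample (from the text)

# Counts quoted in the text for each group of lists.
counts = {
    "strong Loyalists": 59,
    "strong Jumpers": 34,
    "blend of both tastes": 14,
    "idiosyncratic tastes": 28,
}


def shares(group_counts):
    """Each group's share of the total, as a whole-number percentage."""
    total = sum(group_counts.values())
    return {label: round(100 * n / total) for label, n in group_counts.items()}


# The four groups should account for every list in the subsample.
assert sum(counts.values()) == SUBSAMPLE_SIZE

for label, percent in shares(counts).items():
    print(f"{label}: about {percent}%")
```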

Are Jumpers nothing more than Tara haters?

Once the general tastes became clear but before analyzing the differences between the two tastes, the webmaster came up with four theories that could explain why the Jumpers may believe that the quality of the show declined in the later seasons:

(1) Moving the characters from high school to college,

(2) The Initiative arc in the fourth season,

(3) The introduction of Riley as a character and as a love interest for Buffy, and

(4) The introduction of Tara as a character and as a love interest for Willow.

The webmaster then looked at the data and found that, of the four theories, the data best supported the fourth, the introduction of Tara. If there is only one reason why Jumpers believe that the quality declined over the last few seasons, it is probably this. However, it is very unlikely that there is a single reason for this effect, and it is very unlikely that the same reason or reasons apply to all Jumpers. Although it is likely that most people who dislike Tara are Jumpers, it is very unlikely that all, or even most, Jumpers dislike Tara.

Will you do these analyses for other television shows like Star Trek, The Simpsons, or Babylon 5?

There may be analyses of more shows if there is enough interest. Any analyses that take place will be only on long-running shows, so there will be none on Firefly unless a network picks it up for a few more seasons. Ideally, no analyses will be done on a series that is still airing new episodes, like The Simpsons, but they may be done over a summer hiatus if there is sufficient interest.