History and Background
Cantor and Naive Set Theory
It is a common misconception that set theory was single-handedly developed and popularized by the German mathematician Georg Cantor. While Cantor was a leading figure in the early development and adoption of set theory, the notation and language of sets had been in use by various mathematicians for years before his research. Cantor entered German academic circles at a time when his peers and mentors were exploring the structure of the real numbers and the true nature of infinity. In hindsight, it is clear that much of the work done during this period involved sets, but at the time there was no standardized definition of a set and no common notation for sets and set operations.
Cantor's most famous contribution to set theory, and to mathematics in general, was his work on the size of infinite sets. His association with other academics made him comfortable with infinite collections of numbers such as \(\mathbb{R}\), the set of all real numbers, and \(\mathbb{N}\), the set of all natural numbers (1, 2, 3, ...). Cantor wondered whether, even though both sets are infinitely large, one was technically larger than the other. Intuitively, \(\mathbb{R}\) should be larger than \(\mathbb{N}\), since the real numbers contain all the natural numbers plus every other number that can be written as a decimal expansion, such as \(0.5\) or \(\pi = 3.14159\ldots\). Cantor proved this intuitive observation using the idea of a bijective function.
A bijective function, or simply a bijection, is a function that establishes a one-to-one correspondence between the elements of two sets: every element of the first set is paired with exactly one element of the second, and vice versa. Cantor reasoned that if such a function existed between the real numbers and the natural numbers, then \(\mathbb{R}\) and \(\mathbb{N}\) would have to be the same size. His proof that no such function exists is famously known as his diagonalization argument.
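For finite sets, whether a given function is a bijection can be checked directly: it must never send two inputs to the same output (one-to-one) and must reach every element of the target set (onto). The sketch below uses a hypothetical helper `is_bijection` and small example sets chosen only for illustration. The difficulty Cantor faced is that no such exhaustive check is possible for infinite sets, so the existence or non-existence of a bijection has to be proved instead.

```python
def is_bijection(f, domain, codomain):
    """Return True if f maps the finite set `domain` one-to-one onto `codomain`."""
    images = [f(x) for x in domain]
    injective = len(set(images)) == len(images)   # no two inputs share an output
    surjective = set(images) == set(codomain)     # every target element is reached, nothing else
    return injective and surjective

# Hypothetical example: pair {1, 2, 3} with {"a", "b", "c"}.
pairing = {1: "a", 2: "b", 3: "c"}
print(is_bijection(pairing.get, {1, 2, 3}, {"a", "b", "c"}))    # True
print(is_bijection(lambda x: "a", {1, 2, 3}, {"a", "b", "c"}))  # False: not one-to-one
```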
The Diagonalization Argument
The argument begins by assuming that a bijection does exist between the natural numbers, \(\mathbb{N}\), and the set of all real numbers between zero and one, \(\{x:0\leq x\leq 1,x\in \mathbb{R}\}\). Such a function would allow the creation of a table listing every natural number, starting at one and going on to infinity, alongside the unique real number the function maps to it, written as an infinite decimal. Cantor then built a new number digit by digit: take the first digit after the decimal point of the first real number and replace it with a different digit, then take the second digit of the second real number and replace it, and continue in the same way for every real number in the list. The resulting number lies between zero and one but differs from the \(n\)th number in the table at its \(n\)th digit, so it cannot appear anywhere in the table. This contradicts the assumption that the table lists every such real number, so no bijection can exist. This proved not only that the set of all real numbers is larger than the set of all natural numbers, but that even the set of real numbers between zero and one is larger than the set of all natural numbers. He described sets the size of the natural numbers as "countable," since they can be matched with the "counting" numbers, and larger sets such as the real numbers as "uncountable."
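The digit-changing step can be sketched in Python on a finite excerpt of a hypothetical table. The function name `diagonal_number`, the sample rows, and the rule for choosing the new digit (use 5 unless the diagonal digit is 5, in which case use 4) are assumptions for illustration, not Cantor's own formulation. The 4/5 rule is a common textbook variant that avoids the digits 0 and 9, so the new number cannot slip back into the table through a double decimal representation such as \(0.0999\ldots = 0.1\). A real table is infinite, so this only shows the mechanism, not the proof.

```python
def diagonal_number(listing):
    """Given decimal expansions (the digits after "0."), return the digits of a
    number that differs from the n-th entry in its n-th digit."""
    new_digits = []
    for n, expansion in enumerate(listing):
        d = expansion[n]                              # n-th digit of the n-th number
        new_digits.append("4" if d == "5" else "5")   # always a digit different from d
    return "".join(new_digits)

# Sample rows: fractional digits of pi, e and sqrt(2), then two made-up rows.
listing = [
    "1415926535",
    "7182818284",
    "4142135623",
    "2505050505",
    "3333333333",
]
x = diagonal_number(listing)
print("0." + x)   # 0.55545
```

Whatever rows the table contains, the constructed number disagrees with row \(n\) at position \(n\), which is why it can never be found in the list.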
This much agrees with a naive understanding of set theory. However, Cantor used the same idea of a bijection to prove some counter-intuitive results. Even though there is no bijection between the natural numbers and the real numbers, there is a bijection between the natural numbers and the rational numbers (the numbers that can be written as a fraction \(p/q\) of integers with \(q\neq 0\), commonly denoted \(\mathbb{Q}\)), and another between the natural numbers and the integers (the whole numbers, positive, negative, and zero, commonly denoted \(\mathbb{Z}\)).
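One way to see the bijection with the rationals is to list them explicitly. The sketch below walks through the fractions \(p/q\) in order of \(p + q\), skips unreduced duplicates such as \(2/4\), and interleaves each positive rational with its negative. This is a simplified variant of Cantor's zig-zag enumeration, and the generator names are hypothetical. Pairing the \(n\)th natural number with the \(n\)th item produced gives the correspondence.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd

def positive_rationals():
    """Yield every positive rational exactly once, walking the diagonals p + q = s."""
    for s in count(2):            # s = numerator + denominator
        for p in range(1, s):
            q = s - p
            if gcd(p, q) == 1:    # skip unreduced duplicates such as 2/4
                yield Fraction(p, q)

def all_rationals():
    """Yield every rational exactly once: 0 first, then each r followed by -r."""
    yield Fraction(0)
    for r in positive_rationals():
        yield r
        yield -r

first = [str(q) for q in islice(all_rationals(), 9)]
print(first)   # ['0', '1', '-1', '1/2', '-1/2', '2', '-2', '1/3', '-1/3']
```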
One might assume that there are obviously about twice as many integers, positive and negative, as there are natural numbers, yet listing the integers as \(0, 1, -1, 2, -2, \ldots\) pairs them off one-to-one with the natural numbers, showing that the size, or cardinality, of these two sets is exactly the same. By contrast, even though there are infinitely many natural numbers and infinitely many real numbers, the infinity that describes the cardinality of the real numbers must be larger than the infinity that describes the cardinality of the natural numbers.
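The pairing between natural numbers and integers can be written as two explicit functions that undo each other, which is exactly what makes it a bijection. The function names below are hypothetical, and the numbering of \(\mathbb{N}\) starts at 1 to match the text.

```python
def nat_to_int(n):
    """Map 1, 2, 3, 4, 5, ... to 0, 1, -1, 2, -2, ..."""
    return n // 2 if n % 2 == 0 else -(n // 2)

def int_to_nat(z):
    """Inverse map: 0 -> 1, positive z -> 2z, negative z -> -2z + 1."""
    return 2 * z if z > 0 else -2 * z + 1

print([nat_to_int(n) for n in range(1, 8)])   # [0, 1, -1, 2, -2, 3, -3]
```

Because each function reverses the other, no integer is skipped and no two natural numbers land on the same integer.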
Cantor's work received a mixed reception from the mathematical community. Some more traditional mathematicians regarded the idea of different sizes of infinity as complete nonsense, while others embraced it. Even before it was well defined and widely accepted, set theory began poking holes in the structure of mathematics that people had taken for granted for hundreds of years. The idea of a mathematical set alone was enough to show that infinity was far more complicated than anyone had imagined, and it invited others to continue Cantor's work. David Hilbert, another highly influential German mathematician and a contemporary of Cantor, believed that set theory could become the rigorous foundation mathematics needed as it moved into the 20th century, and that it could be the key to a deeper understanding of the true structure of mathematics.
Before set theory could lay that foundation, however, the properties and definitions of sets needed to be made precise. In the decades following Cantor's groundbreaking work, it became clear that both the common conception of a set and some of Cantor's definitions were not fully consistent. Perhaps the most famous of these inconsistencies was discovered by the British mathematician Bertrand Russell.
Russell's Paradox
Russell was deeply fascinated by Cantor's work, especially where it seemed to contradict his personal belief in a universal set, the set that includes everything. As Russell studied more of Cantor's theories and considered their implications, he became increasingly sure that there was a major inconsistency. No one is entirely sure when he discovered the paradox that would later be named for him, but his notes indicate that it was sometime in 1901.
His paradox arose from the naive rules governing how sets could be created. It was commonly accepted that a set could be constructed from all the objects that meet a certain criterion. For example, the set of all perfect squares could be constructed as \(S = \{x:x=n^2,n\in \mathbb{N}\}\), which says that \(S\) is composed of all the objects that are equal to the square of a natural number. Using the same process and notation, Russell constructed the set \(R=\{x:x\not\in x\}\), which says that \(R\) is composed of all the objects that are not elements of themselves. In other words, \(R\) is the set of all sets that do not contain themselves. Russell's question was then: is \(R\) a member of itself?
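Set-builder notation reads naturally as a generator plus a membership test. The sketch below, with hypothetical names `perfect_squares` and `in_S`, produces \(S\) lazily because it is infinite, and checks the defining criterion for a single value. Note that the generator draws its elements from an existing collection (the natural numbers); naive set theory placed no such restriction on the criterion, which is what allowed Russell to write down \(R\).

```python
from itertools import count, islice
from math import isqrt

def perfect_squares():
    """Generate S = {x : x = n^2, n in N} lazily, since S is infinite."""
    for n in count(1):    # natural numbers 1, 2, 3, ...
        yield n * n

def in_S(x):
    """Test the defining criterion for an integer x: x = n*n for some natural n."""
    return x >= 1 and isqrt(x) ** 2 == x

print(list(islice(perfect_squares(), 5)))   # [1, 4, 9, 16, 25]
print(in_S(16), in_S(15))                   # True False
```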
This paradox is closely analogous to the famous "Barber Paradox." In one of his essays, Russell described the Barber Paradox by saying that "You can define the barber as 'one who shaves all those, and those only, who do not shave themselves'. The question is, does the barber shave himself?" In this form the contradiction is easy to see. If the barber shaves himself, then he is one of those who shave themselves, so by definition the barber does not shave him. But if he does not shave himself, then he is one of those whom the barber shaves, so he does shave himself. The barber shaves himself if and only if he does not shave himself.
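The case analysis can be made mechanical by trying both possible answers to the question and checking each against the barber's rule. This is only an illustrative sketch, and `rule_satisfied` is a hypothetical name.

```python
def rule_satisfied(barber_shaves_himself: bool) -> bool:
    """The barber shaves exactly those who do not shave themselves.
    Applied to the barber himself, the rule demands:
    (barber shaves barber) == (not barber shaves barber)."""
    return barber_shaves_himself == (not barber_shaves_himself)

consistent_answers = [v for v in (True, False) if rule_satisfied(v)]
print(consistent_answers)   # []
```

Neither answer survives the check, which is the "if and only if" contradiction in executable form.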
Likewise, the set \(R\) defined above is an element of itself if and only if it is not an element of itself. The fact that such a set could be constructed from the rules of set theory then in use showed that those rules allowed the definition of a set that cannot consistently exist. Russell's own solution to this paradox came in the form of the "vicious circle principle," which says that a set can be formed from a common criterion only if one has already specified exactly the objects to which the criterion is applied. This invalidates \(R\): the criterion "is not a member of itself" cannot be applied to \(R\) itself, since \(R\) is not yet defined at the moment the criterion is applied.
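Russell's argument can also be checked on small models. In the sketch below, a hypothetical universe is a dictionary mapping each set's name to the names of its members, and the helper names are assumptions for illustration. For any such universe, the collection of sets that are not members of themselves never matches the members of any set in the universe, because it disagrees with each set \(y\) about whether \(y\) itself belongs. Building \(R\) only from an already-specified collection therefore yields no contradiction, just a collection that lies outside the universe it was built from.

```python
def russell_collection(universe):
    """Return the names of all sets in `universe` that are not members of themselves."""
    return {name for name, members in universe.items() if name not in members}

def russell_set_in_universe(universe):
    """Return the name of a set whose members are exactly the Russell collection,
    or None if no set in the universe has those members."""
    r = russell_collection(universe)
    for name, members in universe.items():
        if members == r:
            return name
    return None

# Hypothetical universe: "a" is empty, "b" contains itself, "c" contains "a" and "b".
universe = {
    "a": set(),
    "b": {"b"},
    "c": {"a", "b"},
}
print(sorted(russell_collection(universe)))   # ['a', 'c']
print(russell_set_in_universe(universe))      # None
```

Adding a new set whose members are `{"a", "c"}` does not help: the new set does not contain itself, so it joins the Russell collection and the collection changes again.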
Russell's paradox and its resolution show that the naive understanding of sets prevalent at the time was not rigorous enough to be consistent. Like Cantor, Russell showed that the conventional understanding of mathematics was lacking in ways no one had yet observed. For set theory to become truly consistent and rigorous, it needed to be axiomatized: it needed a series of axioms, the basic rules or definitions that form the building blocks of a mathematical discipline, to standardize it. The first such set of axioms came from Ernst Zermelo, who needed them to prove one of Cantor's original theorems. His axioms were adjusted slightly by other mathematicians over the years and are now known as the Zermelo-Fraenkel axioms (ZF). They continue to be the basis for most set-theoretic mathematics done today.