Generation One: the Digital Librarian

The Great Library of Alexandria

While historians debate the exact details, the fire that spread from the Egyptian harbor into Alexandria during Julius Caesar’s campaign in 48 BCE is widely believed to have damaged or destroyed parts of the Great Library, resulting in a significant loss of classical knowledge.

The battle for Alexandria in 48 BCE was a contest over the Pharaoh’s crown and, with it, control of the breadbasket of the classical Mediterranean world. Caesar considered Cleopatra, sister of the Pharaoh Ptolemy, the more loyal vassal to Roman rule and sided with her claim to the throne, which led to a siege of the city.

While many tales of Cleopatra speak of how her beauty and charm seduced the Roman general to her cause, Rome’s interest in controlling Egypt and bringing it into its orbit as a loyal vassal state was grounded in very pragmatic reasoning. Rome was an expansionist power that ran on its army’s campaigns to claim new territories. Egypt was a rich trading state with a fertile river valley that grew enough wheat to feed a large army. The later Roman axiom of “bread and circuses” as levers of government relied heavily on Egyptian wheat to produce said bread.

Egypt was a wealthy powerhouse of trade in the classical era. Grain grown along the Nile river valley was traded in the markets of Alexandria, situated near the mouth of the river where it flows into the Mediterranean Sea. Merchants from the four corners of the classical world converged on the port of Alexandria to trade for grain. Those merchants brought gold, spices, and, most importantly, books to Alexandria. Recognizing the value and power of knowledge, the Ptolemaic Pharaohs ordered that every merchant vessel docked at Alexandria be searched. Any books found were seized and copied, and the copies were returned to the merchant while the originals were stored in Alexandria’s library.

This ethically dubious method of collecting books, in an era before copyright, built the largest library and the largest single collection of human knowledge of the classical era in the city of Alexandria. At its height, the Great Library was said to hold upwards of 400,000 books. Scholars today still lament the fire of 48 BCE, which spread from the harbor into the city and reached the Great Library. Had the library survived, many scholars argue, the collective knowledge contained in its books and scrolls could have kicked off during the classical era the technological revolution we associate with the Renaissance.

The fire of 48 BCE may have robbed humanity of a 1,000-year head start on technological advancement, and it illustrates the historically important role of information accessibility. We had to wait until the invention of the printing press to distribute knowledge in the form of books at the scale necessary to kick off the Enlightenment era, when technologies known during the classical era were rediscovered and distributed to a growing literate audience of artists, philosophers, and scientists. The pattern is clear throughout history: creating a library and indexing information for distribution has historical precedent as a driver of human technology and culture. It is little surprise, then, that creating an index and library of information in the internet age changed everything and set off another period of rapid technological advancement paired with radical cultural change.

Building the Digital Great Library

While the internet had been around since the 1960s, it was primarily built and used by university and government computer scientists. Personal computer adoption in the 1980s brought the first pioneers online from outside largely elite scientific circles, with services such as AOL and CompuServe creating the earliest access points to a nascent, text-based online world. In 1994, the launch of Netscape Navigator, the first graphical web browser to win a mass audience, changed everything.

Netscape was the first web browser with a mass audience and broad consumer appeal. It won that appeal by creating an easy gateway for both navigating the internet and interacting with web content. As a graphical browser, Netscape rendered images within the browser window, removing the need to download an image and open it in another program in order to view it. The Netscape revolution, image- and text-based web access that people could use with minimal training, set off a Cambrian explosion of web content. Like life finding a niche in all corners of Earth’s early oceans, the number of websites multiplied exponentially.

With the gates of the internet now open and new users flooding in, the proliferation of web content created a problem for the growing internet. It was becoming impossible to keep up with and find all of the new content and websites constantly popping up. Directory sites, such as Yahoo’s early incarnation, relied on human curators to go out, find good content, and organize it into categories. Once upon a time, people went to Yahoo and clicked on category pages such as Sports or Fashion in order to find a list of websites curated to their interests.

Human curation played an important role in the early internet, but curators couldn’t keep up with the growing list of potential web properties coming into existence. Nor could they keep up with content changes on their growing lists of websites. The internet was distributed, growing at a wild pace, and was highly disorganized until 1998, when two Stanford Ph.D. students, Larry Page and Sergey Brin, launched a company with a mission to organize the world’s information.

Google’s founders saw the challenges inherent in trying to curate an exponentially growing set of web content and envisioned a method of making all of the web’s content easily accessible. Their solution was to develop a web crawler that scanned the internet, read web pages, and created relevance matches based on each website’s content. They then built a web portal where people could type in search queries in plain language and receive a page of links to the sites that best matched those queries. Their revolutionary PageRank algorithm estimated a site’s importance from how many other sites linked to it and how important those linking sites were themselves, creating a system that could intelligently rank results. Essentially, they created index cards for every website on the internet and a librarian that put whatever index card you asked for into your hands.
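To make the ranking idea concrete, here is a minimal sketch of the textbook form of PageRank, not Google’s production system. The four page names and their links are hypothetical, and the damping factor of 0.85 is simply the commonly cited default. Each page repeatedly passes a share of its score to the pages it links to, so a page linked to by important pages becomes important itself.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page importance from a dict of page -> list of outbound links."""
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal scores

    for _ in range(iterations):
        # Every page receives a small baseline score each round.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical miniature web: "home" is linked to by every other page.
links = {
    "home": ["news", "shop"],
    "news": ["home"],
    "shop": ["home"],
    "blog": ["home", "news"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

In this toy web, "home" ends up with the highest score because every other page links to it, while "blog" ends up lowest because nothing links to it.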

That digital librarian, translating human-written text and matching it against relevant website content, was the first generation of algorithms that shaped both how we used the internet and, ultimately, the very nature of the internet itself. For the first time in history, a set of mathematical equations around keyword matching and relevance decided what content got in front of people’s eyeballs, an important distinction that set the precedent for subsequent generations of web algorithms. This first-generation algorithm, rooted in deciding what web content gets seen and by whom, granted Google the web’s first big fortune and made it the most influential technology monopoly of the early 2000s.

The Librarian turned Kingmaker

While Google’s core search product started and continues with the mission of making the world’s information accessible for free, Google’s business managers have managed to generate billions in profit from a product that costs its users nothing. That product has made or broken the fortunes of many other website owners. To this day, Google’s core product is an ostensibly unbiased relevance ranking of the websites most likely to satisfy the user’s information request. Google made its fortune first by selling advertisements alongside its relevance rankings, then by serving relevant advertisements on content websites.

This technology was a great boon to people upon its launch. Google became the most visited website in the world and stayed at the top of that list for years. “Google” became a verb recognized by the Merriam-Webster dictionary in 2006. People could ask the world’s biggest internet property for whatever information they wanted and get back relevant results. They were one click away from a series of webpages that engaged their interest, and Google brought eyeballs to all corners of the internet.

Paying to get access to eyeballs online is big business. By 2024, digital advertising had grown into a $300+ billion industry globally. Aside from directly paying Google to appear at the top of Google searches for certain keywords, businesses invest time and money in search engine optimization (SEO) with the goal of ranking better within Google search results. Businesses are willing to fork over vast sums of money to Google for prominence within search because it brings visitors to their websites. In most cases, the businesses regularly paying Google for visitors are able to convert those visitors into paying customers and thus secure a profit themselves after advertising fees and their other business costs.

In the early twenty-first century, Google was the way for internet businesses to be found online. Ranking on Google made or broke businesses. Changes to Google’s algorithm, and the shakeups to ranking order that came with them, were the subject of many articles and news pieces. If Google delivered traffic to a site, the site would thrive. Likewise, a lack of traffic would cause sites to wither and die on the vine. The mathematical formulas of an algorithm hosted on one web page decided the fate and fortunes of people around the globe. Google’s algorithm leveraged the powers of a librarian to become the internet’s first gatekeeper.

In organizing the internet, Google brought order to a distributed system. Content producers, who often fully owned the website on which their content was distributed, had a significant incentive to adjust their content to be discovered by Google. Well-ordered site maps, keywords in metadata that matched words written on the webpage, and page descriptions with succinct calls to action became the norm. These changes helped cement a somewhat templated look and feel across the internet. Word order and structure affected how well Google could read a website, and so an implicit standardization dictated common elements such as navigation bars and headers that summarize and segment pages.

Website owners didn’t stop at design changes to make their content easier to read. They also adapted their content to take advantage of the commercial opportunities made available by a reliable source of web traffic. Business owners tailored content around the sorts of queries people were typing into Google. Search trends, the reporting Google generates to show what people are asking at any given time, have provided fascinating insight into how people react to current events and have given marketers insight into which products are attracting attention. Website owners used this data to gauge the audience’s appetite for content that, if produced on their own web properties, could convert visitors from Google into revenue-generating customers.

Google’s business model evolved beyond simple search results. Google made its first billion selling advertising space within its own library, but many of its subsequent billions came from establishing itself as a leader within the Advertising Technology (Ad Tech) sector, where platforms like Google auction off advertising space on publisher websites, typically on a per-view (known as per-impression) basis. This created a profitable feedback loop for Google. Google rewarded producers of relevant, well-ranked content with traffic. It then partnered with those same websites and sold advertising slots on them, thus monetizing its own free-to-use website ranking index.
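The per-impression auction can be illustrated with a small sketch. This is not a description of Google’s actual auction: it assumes a simple second-price rule (the winner pays the runner-up’s bid), which is one common auction design, and it prices bids in CPM, the industry shorthand for cost per thousand impressions. The advertiser names, bid amounts, and floor price are all hypothetical.

```python
def run_auction(bids, floor_cpm=0.0):
    """Pick a winner for one ad slot from a dict of advertiser -> CPM bid in dollars.

    Assumes a second-price rule: the highest bidder wins but pays the
    next-highest eligible bid, or the floor price if there is no other bidder.
    Returns (winner, clearing_cpm), or None if no bid meets the floor.
    """
    eligible = sorted(
        ((cpm, name) for name, cpm in bids.items() if cpm >= floor_cpm),
        reverse=True,
    )
    if not eligible:
        return None
    _, winner = eligible[0]
    clearing_cpm = eligible[1][0] if len(eligible) > 1 else floor_cpm
    return winner, clearing_cpm


def cost_per_impression(cpm):
    """Convert a CPM price (dollars per 1,000 views) into the price of one view."""
    return cpm / 1000.0


# Hypothetical bids for a single ad slot on a publisher's page.
bids = {"shoes": 4.00, "travel": 2.50, "games": 1.25}
winner, clearing_cpm = run_auction(bids, floor_cpm=1.00)
price = cost_per_impression(clearing_cpm)
print(f"{winner} wins and pays ${price:.4f} for this impression")
```

The price of a single view is tiny, which is why this model only becomes a large business at the scale of traffic that a search index can direct toward publisher websites.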

Many of these content producers were news sites, product recommendation pages, or, more generally, pages with content written by knowledge holders and distributed with the intention of earning advertising revenue. Thus, Google’s leadership within the Ad Tech industry helped subsidize quality content production and cloud-based tools with advertising revenue. This kept much of the era’s news content outside of paywalls and kept many widgets and quality-of-life web tools free, because the work of creating news articles or web tools could be sustained by advertising revenue. Indeed, the pervasive assumption among internet users of the early 2000s was that almost everything online would be free to use forever. That free-to-the-user DNA persists in many of today’s top internet properties, though the price of that freedom is now much higher than most people realize.

It is ironic that Google, which made its billions by creating a library and order within the open internet, is now in many ways facing challenges to its dominance in the information marketplace. Ad Tech, which evolved to solve the problem of monetizing the distribution of content across many different websites, incentivized a generation of content creators to make their information accessible to the world. Google still reaps immense profits from monetizing how people find content online when that content lives on vertically organized websites whose owners control both the content and the distribution medium (the website itself). Yet the sites where people spend their time online have changed significantly since Google launched the first generation of algorithms. The rise of social media platforms in the late 2000s and 2010s challenged Google’s primacy and created a duopoly in digital advertising. This next generation of algorithms, the social media recommendation engines, would shape not just how we find information, but what information finds us.

Perhaps the most profound irony in this digital transformation lies in how history repeats itself across millennia. The Great Library of Alexandria was built upon the forceful extraction of knowledge without regard to ownership – ships were searched, books seized, and copies returned while originals remained in the library’s collection. Today’s digital librarians, from search engines to artificial intelligence systems, operate with similar disregard for ownership, crawling and indexing content without explicit permission, training on vast datasets of human creativity without attribution or compensation. 

In both the classical and digital eras, the entities controlling information repositories justify their methods by claiming to serve the greater good of making knowledge universally accessible. In both cases, they accumulate unprecedented power and influence in the process. As we examine subsequent generations of algorithms, this tension between democratizing information and exploiting its creators remains unresolved, suggesting that our struggles with information equity today are not novel problems but ancient ones, merely accelerated and amplified by technology’s reach.
