The term SEO can also refer to "search engine optimizers," a term adopted by an industry of consultants who carry out search engine optimization on behalf of clients, and by employees of site owners who perform SEO services in-house. Search engine optimizers may offer SEO as a stand-alone service or as part of a larger marketing campaign. Because effective SEO can require changes to a site's source code, it is often most helpful when incorporated into the initial development and design of a site. This has led to the term "Search Engine Friendly" being used to describe designs, menus, content management systems and shopping carts that can be optimized easily and effectively.
Search engine optimization is considered by many to be a subset of search engine marketing, a term used to describe the process of improving the volume or quality of traffic to a website from search engines, usually through "natural" ("organic" or "algorithmic") search results.
Search engine optimization markets to people online ("visitors") by understanding how search algorithms work and what these visitors might search for, in order to match them with sites offering what they are interested in finding.
SEO efforts may also target narrower vertical search engines, such as those for local search.
The goal of site owners and consultants engaging in search engine optimization is to attract qualified visitors to their website. The quality of visitor traffic can be measured by how often a visitor using a specific keyword phrase completes a desired "conversion" action, such as making a purchase, viewing or downloading a certain page, requesting further information, signing up for a newsletter, or taking some other specific action.
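As a rough illustration of that measurement, the sketch below computes a conversion rate (conversions divided by visits) for each keyword phrase. The visit-log format, the sample phrases, and the `conversion_rates` helper are hypothetical; real analytics tools record and report this data in their own ways.

```python
from collections import defaultdict


def conversion_rates(visits):
    """Return conversions / visits for each keyword phrase.

    `visits` is a list of (keyword_phrase, converted) pairs, one per visit,
    where `converted` is True if the visit led to the desired action.
    """
    totals = defaultdict(int)
    conversions = defaultdict(int)
    for phrase, converted in visits:
        totals[phrase] += 1
        if converted:
            conversions[phrase] += 1
    return {phrase: conversions[phrase] / totals[phrase] for phrase in totals}


# Hypothetical visit log: (keyword phrase that brought the visitor, converted?)
log = [
    ("blue widgets", True),
    ("blue widgets", False),
    ("blue widgets", False),
    ("blue widgets", True),
    ("cheap widgets", False),
    ("cheap widgets", False),
]
rates = conversion_rates(log)
print(rates)  # {'blue widgets': 0.5, 'cheap widgets': 0.0}
```

Here "cheap widgets" brings visitors who never convert, so by this measure it delivers lower-quality traffic than "blue widgets", even though both phrases send visitors.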
Creating web pages with SEO in mind does not necessarily mean creating content that favors algorithms over human visitors. Some SEO efforts involve optimizing a site's coding, presentation, and structure without making changes that are very noticeable to human visitors, such as giving the site a clear hierarchical structure and avoiding or fixing problems that might keep search engine indexing programs from fully spidering it. Other, more noticeable efforts involve including unique content on pages that search engines can easily index and extract, while also appealing to human visitors.
Webmasters and content providers began optimizing sites for search engines in the mid-1990s, as the first search engines were cataloging the early Web.
Initially, all a webmaster needed to do was submit a page, or URI, to the various engines, which would send a spider to "crawl" that page, extract links to other pages from it, and return information found on the page to be indexed. In this process, a search engine spider downloads a page and stores it on the search engine's own server, where a second program, known as an indexer, extracts various information about the page: the words it contains and where they are located, any weight given to specific words, and all links the page contains. Those links are then placed into a scheduler for crawling at a later date.
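The indexing step can be sketched with Python's standard `html.parser` module: given the stored HTML of a page the spider has already downloaded, record the positions of each word and collect the page's outgoing links into a queue that stands in for the scheduler. `PageIndexer`, the sample page, and the example.com URLs are hypothetical. The sketch assigns no weight to words and is not how any particular search engine implements its indexer.

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageIndexer(HTMLParser):
    """Hypothetical indexer: word positions and outgoing links for one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.positions = defaultdict(list)  # word -> positions in the page text
        self.links = []                     # absolute URLs of outgoing links
        self._next_pos = 0
        self._skip = 0                      # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._skip:
            return
        for word in re.findall(r"[a-z0-9]+", data.lower()):
            self.positions[word].append(self._next_pos)
            self._next_pos += 1


# Hypothetical stored copy of a page the spider has already downloaded.
html = """<html><head><title>Blue widgets</title></head>
<body><h1>Blue widgets</h1><p>We sell blue widgets.
<a href="/prices">Prices</a> <a href="https://example.org/">Partner</a></p></body></html>"""

indexer = PageIndexer("https://example.com/widgets")
indexer.feed(html)
indexer.close()

scheduler = deque(indexer.links)  # URLs queued for crawling at a later date
print(indexer.positions["blue"])  # [0, 2, 6]
print(list(scheduler))            # ['https://example.com/prices', 'https://example.org/']
```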
Site owners started to recognize the value of having their sites highly ranked and visible in search engine results, creating an opportunity for both "white hat" and "black hat" SEO practitioners. Indeed, by 1996, email spam touting SEO services could be found on Usenet. The earliest known use of the phrase "search engine optimization" was a spam message posted on Usenet on July 26, 1997.
At first, search engines were supplied with information about pages by the webmasters themselves. Early versions of search algorithms relied on webmaster-provided information such as the keyword meta tag, or index files in engines like ALIWEB. Meta tags provided a guide to each page's content. But indexing pages based on metadata proved less than reliable, mostly because webmasters abused meta tags by including keywords that had nothing to do with the content of their pages, in order to artificially increase page impressions for their websites and with them their ad revenue. Cost per impression was at the time the common means of monetizing content websites. Inaccurate, incomplete, and inconsistent metadata in meta tags caused pages to rank for irrelevant searches and fail to rank for relevant ones. Search engines responded by developing more complex ranking algorithms, taking into account additional factors including:
- Text within the title element
- Domain name
- URL directories and file names
- HTML tags: headings, emphasized (<em>) and strongly emphasized (<strong>) text
- Term frequency, both in the document and globally (often misunderstood and mistakenly referred to as keyword density)
- On-page keyword proximity
- On-page keyword adjacency
- On-page keyword sequence
- Alt attributes for images
- Text within NOFRAMES tags
- Web content development
- Sitemaps
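Term frequency measured "both in the document and globally" is commonly combined into a TF-IDF score. The sketch below uses one textbook formulation (a term's count divided by the document's length, times the logarithm of the inverse document frequency). The source does not say which formula, if any, a given engine uses; the sample documents and the `tf_idf` helper are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score each term in each document by tf * idf.

    tf  = count of the term in the document / number of terms in the document
    idf = log(N / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for terms in tokenized:
        doc_freq.update(set(terms))  # count each term once per document
    scores = []
    for terms in tokenized:
        counts = Counter(terms)
        scores.append({
            term: (count / len(terms)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = [
    "blue widgets blue gadgets",
    "red widgets",
    "blue sky",
]
scores = tf_idf(docs)
print(round(scores[0]["blue"], 4))     # 0.2027
print(round(scores[0]["gadgets"], 4))  # 0.2747
```

The first factor alone, a term's share of the words on a page, is what is usually called keyword density. The global factor is what separates the two: "blue" appears twice in the first document yet scores lower than "gadgets", which appears once, because "blue" also occurs in another document.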
No major search engine these days considers meta keywords in its ranking algorithm the way AltaVista did in the late 1990s. The current value of meta keywords is, however, not readily known, because search engines keep their ranking methods secret. Using meta keywords in web pages may therefore offer little value, although some sites continue to use them. The meta "description" tag, by contrast, is claimed by most SEO experts to be more important, and Yahoo! recommends it in its search indexing help page.
Web content providers also manipulated a number of attributes within the HTML source of a page in an attempt to rank well in search engines.
By relying so heavily on factors exclusively within a webmaster's control, early search engines suffered from abuse and ranking manipulation. To provide better results to their users, search engines had to adapt so that their search engine results pages (SERPs) showed the most relevant results, rather than unrelated pages stuffed with keywords by unscrupulous webmasters. This led to the rise of a new kind of search engine.
Sophisticated Search Engine Ranking Algorithms
Google brought a new concept to evaluating web pages. This concept, called PageRank, has been important to the Google algorithm from the start. PageRank is an algorithm that weights a page's importance based upon the quantity and quality of incoming links. PageRank estimates the likelihood that a given page will be reached by a web user who randomly surfs the web, and follows links from one page to another. In effect, this means that some links are more valuable than others, as a higher PageRank page is more likely to be reached by the random surfer.
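The random-surfer model can be sketched as a power iteration over a small in-memory link graph. The damping factor of 0.85 is the value suggested in the original PageRank paper and is used here as an assumption; Google's production algorithm is not public and differs from this toy version. The graph and the `pagerank` helper are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over a dict mapping page -> pages it links to.

    With probability `damping` the surfer follows a random outgoing link;
    otherwise it jumps to a random page. Pages with no outgoing links
    ("dangling" pages) spread their rank evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Every page gets the random-jump share plus its share of dangling rank.
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        converged = sum(abs(new_rank[p] - rank[p]) for p in pages) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical four-page web.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['C', 'A', 'B', 'D']
```

No page links to D, so it keeps only the random-jump share, while C, which receives links from three pages, ends up with the highest rank. A, linked only from C, still outranks B because C's rank is high, which is the sense in which some links are worth more than others.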
The PageRank algorithm proved very effective, and Google came to be perceived as serving the most relevant search results. On the back of strong word of mouth from programmers, Google became a popular search engine. By considering off-page factors such as PageRank and hyperlink analysis alongside on-page factors, Google could avoid the kind of manipulation seen in search engines that relied primarily on on-page factors for their rankings.
Although PageRank was difficult to game, webmasters had already developed link building tools and schemes to influence Inktomi, an earlier search engine that also used off-page factors, and these methods proved similarly applicable to gaining PageRank. Many sites focused on exchanging, buying, and selling links, often on a massive scale, and an online industry arose to sell links designed to improve PageRank and link popularity. Because higher-PageRank pages are also more likely to be reached by visitors, links from them sell for more money.
A proxy for the PageRank metric is still displayed in the Google Toolbar, though the displayed value is rounded to the nearest integer, and the toolbar is believed to be updated less frequently than the value used internally by Google. In 2002 a Google spokesperson stated that PageRank is only one of more than 100 algorithms used in ranking pages, and that while the PageRank toolbar is interesting for users and webmasters, "the value to search engine optimization professionals is limited" because the value is only an approximation. Many experienced SEOs recommend ignoring the displayed PageRank.
Over the years, Google and other search engines have developed a wider range of off-site factors for use in their algorithms. The Internet was reaching a vast population of non-technical users who were often unable to use advanced querying techniques to reach the information they were seeking, and the sheer volume and complexity of the indexed data was vastly different from that of the early days. These changes, combined with increases in processing power, led search engines to develop predictive, semantic, linguistic and heuristic algorithms. Around the same time as the work that led to Google, IBM had begun work on the Clever Project, and Jon Kleinberg was developing the HITS algorithm.
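HITS scores each page twice: as an authority (pointed to by good hubs) and as a hub (pointing to good authorities). The sketch below is a minimal version of that mutual iteration; as a simplifying assumption it runs on a whole toy graph rather than on the query-specific subgraph HITS was designed for. The graph and the `hits` helper are hypothetical.

```python
import math


def hits(links, iterations=50):
    """Compute hub and authority scores for a dict mapping page -> pages it links to.

    authority(p) = sum of hub scores of pages linking to p
    hub(p)       = sum of authority scores of pages p links to
    Both score vectors are L2-normalized after each update.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 0.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking in.
        auth = {p: 0.0 for p in pages}
        for page, targets in links.items():
            for target in targets:
                auth[target] += hub[page]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub update: sum the authority scores of pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Hypothetical graph: two "directory" pages and a blog pointing at content sites.
links = {
    "dir1": ["siteA", "siteB"],
    "dir2": ["siteA", "siteB", "siteC"],
    "blog": ["siteA"],
}
hub, auth = hits(links)
print(max(auth, key=auth.get))  # siteA
print(max(hub, key=hub.get))    # dir2
```

Unlike PageRank's single score, the two scores reinforce each other: siteA is the strongest authority because every hub links to it, and dir2 is the strongest hub because it links to the most authorities.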
A search engine may use hundreds of factors in ranking the listings on its SERPs. The factors themselves and the weight each carries can change continually, and algorithms can differ widely: a web page that ranks #1 in one search engine may rank #200 in another, or even in the same search engine a few days later.
Google, Yahoo!, Microsoft and Ask.com do not disclose the algorithms they use to rank pages. Some SEOs have carried out controlled experiments to gauge the effects of different approaches to search optimization. Based on these experiments, often shared through online forums and blogs, professional SEOs attempt to form a consensus on which methods work best, although consensus is rarely, if ever, reached. SEO-focused communities are in some respects anti-collaborative, because SEO by its nature requires establishing a significant competitive advantage over other practitioners. For this reason, those who disclose the most tips and algorithmic nuances are rarely the most skilled. Because market pressure discourages full disclosure, the information available to the public should be taken to reflect only the most widely known and long-established practices.
SEOs widely agree that the signals that influence a page's rankings include:
- Keywords in the title tag.
- Keywords in links pointing to the page.
- Keywords appearing in visible text.
- Link popularity.
- PageRank of the page (for Google).
- Keywords in H1, H2 and H3 heading tags.
- Links from one page to the site's inner pages.
- Placing the main message (the "punch line") at the top of the page.
Many other signals, such as historical data, may affect a page's ranking; a number of them are indicated in patents held by various search engines.
More Than Just Search Engine Algorithms
Search engine optimization often involves more than just rankings. Improving the quality of a page's search listing can lead more users to select that page. Factors that may improve search listing quality include good copywriting, such as an attention-grabbing title and an interesting description, and a domain and URL that reinforce the legitimacy of the site. Some commentators have noted that domains with many hyphens look spammy and may discourage click-throughs.
Relationship Between SEO and Search Engines
The first mentions of search engine optimization do not appear on Usenet until 1997, a few years after the launch of the first Internet search engines. The operators of search engines quickly recognized that some people in the webmaster community were making efforts to rank well in their search engines, and even manipulating the page rankings in search results. In some early search engines, such as Infoseek, ranking first was as easy as grabbing the source code of the top-ranked page, placing it on your website, and submitting a URL to instantly index and rank that page.
Because search results are both highly valuable and highly targeted, there is potential for an adversarial relationship between search engines and SEOs. In 2005, an annual conference named AirWeb was created to discuss bridging the gap between the two and minimizing the sometimes damaging effects of aggressive web content providers.
Some more aggressive site owners and SEOs generate automated sites or employ techniques that eventually get domains banned from the search engines. Many search engine optimization companies, which sell services, employ long-term, low-risk strategies, and most SEO firms that do employ high-risk strategies do so on their own affiliate, lead-generation, or content sites, instead of risking client websites.
Some SEO companies employ aggressive techniques that get their client websites banned from the search results. The Wall Street Journal profiled a company, Traffic Power, that allegedly used high-risk techniques and failed to disclose those risks to its clients. Wired reported that the same company sued a blogger for mentioning that it had been banned. Google's Matt Cutts later confirmed that Google did in fact ban Traffic Power and some of its clients.
Some search engines have also reached out to the SEO industry and are frequent sponsors and guests at SEO conferences and seminars. With the advent of paid inclusion, some search engines now have a vested interest in the health of the optimization community. All of the main search engines provide information and guidelines to help with site optimization: Google's, Yahoo!'s, MSN's and Ask.com's. Google has a Sitemaps program that helps webmasters learn whether Google is having any problems indexing their website and also provides data on Google traffic to the site. Yahoo!'s Site Explorer provides a way to submit URLs for free (as MSN and Google also do), to determine how many pages are in the Yahoo! index, and to drill down on inlinks to deep pages. Yahoo! has an Ambassador Program, and Google has a program for qualifying Google Advertising Professionals.
Getting Into Search Engines' Databases
Today's major search engines, by and large, do not require any extra effort to submit to, as they are capable of finding pages via links on other sites.
However, Google and Yahoo! do offer submission programs, such as Google Sitemaps, for which an XML feed can be created and submitted. Generally, though, a simple link from a site that is already indexed will get the search engines to visit a new site and begin spidering its contents. It can take a few days or even weeks after such a link is acquired for all the main search engine spiders to begin indexing a new site, and there is usually not much that can be done to speed up this process.
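An XML feed of this kind can be generated with Python's standard `xml.etree.ElementTree`. The sketch assumes the sitemaps.org 0.9 schema (a `urlset` of `url` entries, each with a `loc` and an optional `lastmod`); the `build_sitemap` helper and the example.com URLs are hypothetical, and each engine's own documentation governs what it actually accepts.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org 0.9 protocol (assumed schema).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a sitemap XML string from (url, lastmod) pairs.

    `lastmod` is a W3C date string such as "2007-05-01", or None to omit it.
    ElementTree escapes characters such as "&" in URLs automatically.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in entries:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = url
        if lastmod:
            ET.SubElement(url_el, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


xml = build_sitemap([
    ("https://example.com/", "2007-05-01"),
    ("https://example.com/products?id=1&color=blue", None),
])
print(xml)
```

Escaping matters here: a raw `&` in a query string would make the file invalid XML, which is why the URLs are set as element text rather than concatenated into a string by hand.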
Once the search engine finds a new site, it uses a crawler program to retrieve and index the pages on the site. Pages can generally be found only when they are linked to with visible hyperlinks, although some search engines, such as Google, are starting to read links created within Flash.
Search engine crawlers may look at a number of different factors when crawling a site, and many pages from a site may not be indexed until they gain more PageRank, links or traffic. The distance of a page from the site's root directory may also affect whether it gets crawled, as may other importance metrics. Cho et al. described criteria for deciding which pages a crawler should visit and send for inclusion in a search engine's index.
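One way a crawler could use distance from the root directory is to order its frontier so that shallower URLs are fetched first. The sketch below counts path segments as the distance and uses a heap, the usual structure for a frontier that grows as links are discovered. The depth metric, the helper names, and the example.com URLs are illustrative assumptions; the source does not say how any engine weighs depth against other importance metrics.

```python
import heapq
from urllib.parse import urlparse


def directory_depth(url):
    """Number of path segments below the site root: "/" -> 0, "/a/b.html" -> 2."""
    path = urlparse(url).path
    return len([segment for segment in path.split("/") if segment])


def crawl_order(frontier):
    """Return URLs shallowest-first; ties keep their discovery order."""
    heap = [(directory_depth(url), i, url) for i, url in enumerate(frontier)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


# Hypothetical frontier, in the order the links were discovered.
frontier = [
    "https://example.com/archive/2006/05/post.html",
    "https://example.com/about.html",
    "https://example.com/",
    "https://example.com/products/widgets.html",
]
for url in crawl_order(frontier):
    print(directory_depth(url), url)
```

The discovery index in each heap entry breaks ties between equally deep URLs, so the heap never has to compare URL strings and the ordering stays predictable.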
A few search engines, such as Yahoo!, operate paid submission services that guarantee crawling for either a set fee or a cost per click (CPC). Such programs usually guarantee inclusion in the database, but do not guarantee a specific ranking within the search results.
Blocking Robots
Webmasters can instruct spiders not to crawl certain files or directories through the standard robots.txt file in the root directory of the domain. Additionally, a page can be explicitly excluded from a search engine's database by using a meta tag specific to robots.
When a search engine visits a site, the robots.txt located in the root directory is the first file crawled. The robots.txt file is then parsed, and will instruct the robot as to which pages are not to be crawled. As a search engine crawler may keep a cached copy of this file, it may on occasion crawl pages a webmaster does not wish crawled.
Pages typically prevented from being crawled include login-specific pages such as shopping carts, and user-specific content such as the results of internal site searches.
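Python's standard `urllib.robotparser.RobotFileParser` shows how a well-behaved crawler interprets such rules. The robots.txt below blocks a shopping-cart directory and internal search results; the paths, the example.com URLs, and the `ExampleBot` user-agent are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served from https://example.com/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules already in hand, no network fetch

for url in [
    "https://example.com/products/widgets.html",
    "https://example.com/cart/checkout",
    "https://example.com/search?q=widgets",
]:
    # can_fetch(user_agent, url) applies the rules for that agent (here the "*" group).
    print(url, parser.can_fetch("ExampleBot", url))
```

Each Disallow line is a path prefix, so `/search` blocks `/search?q=widgets` and any other URL whose path begins with `/search`. In practice a crawler would call `set_url()` and `read()` to fetch the live file, and, as noted above, may keep a cached copy between visits.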
Search Engine Optimization Techniques
Some industry commentators classify search engine optimization methods, and the practitioners who use them, as either "white hat SEO" or "black hat SEO". Many SEO consultants regard this dichotomy as a convenient but unfortunate and misleading oversimplification that makes the industry as a whole look bad. The labeling puts SEO techniques into two broad categories: techniques that search engines recommend as part of good design, and techniques that search engines do not approve of and attempt to minimize the effect of, also referred to as spamdexing. The contrast between white hat and black hat (spamdexing) methods is analogous to that between "positioning" and "guerrilla marketing", with the latter spoiling the reputation of marketing as a whole. Most reputable SEO consultants do not offer spamming and spamdexing techniques among the services they provide to clients.
Preferred "White Hat" Methods
A search engine optimization tactic, technique or method is considered "White Hat" if it conforms to the search engines' guidelines and/or involves no deception. This is an important distinction to note because the search engine guidelines are not written as a series of rules or commandments. Instead, they are written merely as "guidelines". However, White Hat SEO is not just about following guidelines, but is also about ensuring that the content a search engine indexes and subsequently ranks is the same content a visitor will see.
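Serving crawlers different content from what visitors see is commonly called cloaking. The sketch below checks for it in the simplest possible way: fetch the same URL as a crawler and as a browser, reduce each response to its visible text, and compare. The `fetch(url, user_agent)` function is a hypothetical caller-supplied parameter (no network code is included), the user-agent strings are placeholders, and the regex-based text extraction is deliberately crude.

```python
import re


def visible_text(html):
    """Crude visible-text extraction: drop scripts/styles and tags, collapse whitespace."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return " ".join(text.split()).lower()


def looks_cloaked(fetch, url, crawler_agent, browser_agent):
    """Return True if the text served to a crawler differs from the text served to a browser.

    `fetch(url, user_agent)` is a hypothetical caller-supplied function returning HTML.
    """
    as_crawler = visible_text(fetch(url, crawler_agent))
    as_browser = visible_text(fetch(url, browser_agent))
    return as_crawler != as_browser


# Hypothetical stand-ins for real HTTP fetches.
def honest_fetch(url, user_agent):
    return "<html><body><p>Blue widgets for sale.</p></body></html>"


def cloaking_fetch(url, user_agent):
    if "bot" in user_agent.lower():
        return "<html><body><p>widgets widgets cheap widgets</p></body></html>"
    return "<html><body><p>Something else entirely</p></body></html>"


print(looks_cloaked(honest_fetch, "https://example.com/", "ExampleBot/1.0", "Mozilla/5.0"))
print(looks_cloaked(cloaking_fetch, "https://example.com/", "ExampleBot/1.0", "Mozilla/5.0"))
```

An exact comparison like this produces false positives on pages with rotating ads, timestamps, or legitimate personalization, so it shows only the principle that indexed content and visitor content should match.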
White Hat advice is generally summed up as creating quality content for users rather than for search engines, and then making that content easily accessible to search engine spiders instead of trying to "game" the system. White Hat SEO is in many ways similar to website development that promotes accessibility, although the two are not identical.