What is the difference between indexing and crawling?

zinavo · 11-10-2018, 12:51 PM

What is the difference between indexing and crawling?

zeus · 11-12-2018, 04:32 AM

Crawling is when a search engine bot (crawler) visits your site and reads your content. Indexing is when those pages are stored in the search engine's database so that people can find them in search results.

wittycookie · 11-12-2018, 08:39 AM

Crawling and indexing are two different things, and this is commonly misunderstood in the SEO business. Crawling means that Googlebot looks at all the content/code on the page and analyzes it. Indexing means that the page is eligible to appear in Google's search results. One does not imply the other.

digitalprem · 11-14-2018, 06:35 AM

Crawling and indexing are two distinct things and this is commonly misunderstood in the SEO industry. Crawling means that Googlebot looks at all the content/code on the page and analyzes it. Indexing means that the page is eligible to show up in Google’s search results. They aren’t mutually inclusive.

We look at it as if Googlebot were a tour guide walking down a hallway that has many closed doors. If Googlebot is allowed to crawl a page (a room), he can open the door and actually look at what’s inside (crawling). Once inside the room, there might be a sign that says he’s allowed to show people the room (able to index; the page shows up in SERPs), or the sign might say that he’s not allowed to show people the room (“noindex” meta robots tag; the page was crawled since he was able to look inside, but will NOT show up in SERPs since he’s instructed not to show people the room).

If he’s blocked from crawling a page (let’s say there’s a sign on the outside of the door that says “Google, don’t come in here”), then he won’t go inside and look around, and so he doesn’t know whether or not he’s supposed to show people the room, because those instructions are inside the room. So he won’t look inside the room, but he’ll still point out the room (index) to people and tell them they can go inside if they want. Even if there’s an instruction inside the room telling him not to let people go to the room (“noindex” meta robots tag), he’ll never see it, since he was instructed not to go into the room in the first place.
So blocking a page via robots.txt means it IS eligible to be indexed, regardless of whether you have an “index” or “noindex” meta robots tag within the page itself (Google can’t see the tag because it’s blocked from crawling, so by default it treats the page as indexable). Of course, this means that the page’s ranking potential is lessened (since Google can’t actually analyze the content on the page, the ranking signals are all off-page signals plus domain authority). If you’ve ever seen a search result where the description says something like “This page’s description is not available because of robots.txt”, that’s why.
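
To make the hallway analogy concrete, here is a minimal Python sketch of the same decision. It only uses the standard library (`urllib.robotparser` and `html.parser`); the function name `classify`, the class name `RobotsMetaParser`, and the returned status strings are made up for illustration, and the logic is a simplified model of the behavior described above, not Google's actual implementation.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(","))


def classify(robots_txt, url, html, agent="Googlebot"):
    """Hypothetical helper: model the crawl/index outcome for one URL."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(agent, url):
        # The sign is on the outside of the door: the bot never opens the
        # page, so a noindex tag inside it is never seen.
        return "not crawled; URL may still be indexed"
    meta = RobotsMetaParser()
    meta.feed(html)
    if "noindex" in meta.directives:
        return "crawled; excluded from the index"
    return "crawled; eligible for indexing"
```

With a robots.txt of `User-agent: *` / `Disallow: /private/` and a page that carries `<meta name="robots" content="noindex">`, a URL under `/private/` comes back as "not crawled; URL may still be indexed", while the same page at an unblocked URL comes back as "crawled; excluded from the index".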

Crawling, Indexing, and Ranking

When SEOs ask whether Googlebot can crawl JavaScript, we tend to think the answer is ‘yes’, because Google does actually render JavaScript, extracts links from it, and ranks those pages. So does it really matter that it’s not the crawler that handles JavaScript, but the indexer? Do we really need to know that different processes handle different things if the outcome is that Google ranks JavaScript pages?

Yes, actually. We do need to know that.

Despite the incredible sophistication of Googlebot and Caffeine, what JavaScript content actually does is make the entire process of crawling and indexing enormously inefficient. By embedding content and links in JavaScript, we are asking – nay, demanding – that Google puts in the effort to render all our pages.

Which, to its credit, Google will actually do. But that takes time, and a lot of interplay between the crawler and indexer.

And, as we know, Google does not have infinite patience. The concept of ‘crawl budget’ – an amalgamation of different concepts around crawl prioritization and URL importance (Dawn Anderson is an expert on this) – tells us that Google will not try endlessly to crawl all your site’s pages. We have to help a bit and ensure that the pages we want to be crawled and indexed are easily found and properly canonicalized.

JavaScript = Inefficiency

What JavaScript frameworks do is inject a layer of complexity into this equation.

What should be a relatively simple process, where the crawler finds your site’s pages and the indexer then evaluates them, becomes a cumbersome endeavor. On JavaScript sites where most or all internal links are not part of the HTML source code, in the first instance the crawler finds only a limited set of URLs. It then has to wait for the indexer to render these pages and extract new URLs, which the crawler then looks at and sends to the indexer. And so on, and so forth.

With such JavaScript-based websites, crawling and indexing become slow and inefficient.
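
The back-and-forth described above can be sketched as a small simulation. This is a toy model under stated assumptions: the site is given as two hypothetical dictionaries (`html_links` for links present in the HTML source, `js_links` for links that only exist after rendering), and the function name `discovery_rounds` is invented for this example. Each extra round stands for one more hand-off between crawler and indexer.

```python
def discovery_rounds(html_links, js_links, start):
    """Return the URLs newly discovered in each crawl/render round.

    html_links: URL -> links present in the raw HTML source
    js_links:   URL -> links that only appear after JavaScript rendering
    """
    seen = set()
    pending = {start}
    rounds = []
    while pending:
        # Crawl pass: follow plain HTML links transitively, no rendering needed.
        crawled = set()
        stack = list(pending)
        while stack:
            url = stack.pop()
            if url in seen:
                continue
            seen.add(url)
            crawled.add(url)
            stack.extend(html_links.get(url, []))
        rounds.append(sorted(crawled))
        # Render pass: the indexer renders the crawled pages and hands any
        # JavaScript-only links back to the crawler for the next round.
        pending = {link for url in crawled for link in js_links.get(url, [])} - seen
    return rounds
```

If every link is in `html_links`, the whole site is found in a single round. Moving links into `js_links` adds one round per layer of JavaScript-only navigation, which is the slowdown the text describes.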

What this also means is that the evaluation of a site’s internal link graph has to happen again and again as new URLs are extracted from JavaScript. With every new set of pages the indexer manages to pry from the site’s JavaScript code, the internal site structure has to be re-evaluated and a page’s relative importance is changed.

This can lead to all kinds of inefficiencies where key pages are deemed unimportant due to a lack of internal link value, or relatively unimportant pages are seen as high value because there are plain HTML links pointing to them that don’t require JavaScript rendering to see.

And because pages are crawled and rendered according to their perceived importance, you could actually see Google spending a lot of time crawling and rendering the wrong pages and spending very little time on the pages you actually want to rank.

RH-Calvin · 11-15-2018, 12:29 PM

Crawling is the process of reading through your webpage source by search engine spiders. After a successful crawl, the search engine keeps a cached copy of the page. Indexing is adding those cached webpages to the search engine's database. Indexed webpages are then eligible for ranking in search results.

sanjaytech · 11-23-2018, 12:44 PM

-Crawling or spidering is a term used when Google, or another search engine, sends its bot to a web page or web post and "reads" the page. Crawling is the first part of having a search engine recognize your page and show it in search results.
-A page is indexed by Google after crawling is complete. Google analyzes the content and its meaning, then stores the page in its database.

yuva12 · 11-23-2018, 01:10 PM

Crawling means that Googlebot looks at all the content/code on the page and analyzes it.
Indexing means that the page is eligible to show up in Google's search results.

jonathan brown · 12-18-2018, 08:37 AM

Hi,

Crawling:
Crawling (or spidering) is when Google (or another search engine) sends a bot to a web page or web post to “read” the page. Don’t confuse this with having that page indexed. Crawling is the first part of having a search engine recognize your page and show it in search results. Having your page crawled, however, does not necessarily mean your page was indexed and will be found. Pages are crawled for a variety of reasons, and one of the most common is an XML sitemap that Google reads and that points to your new page.
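
As a rough illustration of the sitemap route, here is a short Python sketch that lists the page URLs a crawler would find in a sitemap. It assumes a standard `<urlset>` sitemap using the sitemaps.org namespace; the function name `sitemap_urls` is hypothetical and this is not how any particular search engine implements it.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the <loc> values of every <url> entry in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
```

Each URL returned this way is only a candidate for crawling; whether it ends up indexed is a separate decision.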

Indexing:
Having your page indexed by Google is the next step after it gets crawled. As stated, not every page that gets crawled gets indexed, but an indexed page has normally been crawled first. If Google deems your new page worthy to be used, then Google will index it. After your page is indexed, Google then works out how your page should be found in its search results.

SearchEngineLabs · 12-19-2018, 06:10 AM

Basically, a search engine works in 3 steps:
1) Crawling
2) Indexing
3) Retrieving

Crawling means that Google's crawlers (Googlebot) come to our website and read the pages and URLs on it.

Indexing means storing a copy of those URLs and pages in Google's database.

Retrieving means fetching the matching pages from that indexed data when someone searches on Google.
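
A toy Python sketch of the last two steps, assuming crawling has already produced a dictionary of URL to page text. The names `build_index` and `retrieve` are invented for this example, and a real search engine also ranks results, which this sketch leaves out.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Indexing: map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def retrieve(index, query):
    """Retrieving: return the URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)
```

For example, with pages `{"/a": "Crawling and indexing explained", "/b": "Indexing only"}`, the query "indexing" returns both URLs, while "crawling indexing" returns only `/a`. A page that was never added to the index can never be retrieved, whatever the query.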


