Let’s take a look at the different elements of a search engine.
The crawler is like a spider: it tries to connect the entire internet into one big web. The crawler visits a page and collects the information it contains. It then discovers links on the page and starts visiting and reading those pages too, continuing to collect information from every page it reads.
The crawler regularly returns to pages it has already visited to see if something has changed. Crawling the internet takes time, and how often the crawler returns to a given page can vary.
The crawler is also known as a bot, and it has to act in accordance with the terms set by the sites it visits. After reading a page, the crawler stores it in the search engine index.
Example: the crawler visits a page with a chocolate cookie recipe and reads all the ingredients and the method for making and baking the cookies. This page has three links to other chocolate cookie recipes, and once the crawler discovers them, it visits those three pages too, looking for ingredients and step-by-step instructions. The crawler now knows four chocolate cookie recipes.
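The crawl loop described above – read a page, collect its links, then visit the unseen ones – can be sketched as a breadth-first traversal. This is a minimal sketch over a tiny in-memory “web”; the page names and contents are made up for illustration.

```python
from collections import deque

# Hypothetical web: url -> (page text, outgoing links). Invented for this example.
WEB = {
    "cookies/classic": ("chocolate cookie recipe with sugar",
                        ["cookies/chunky", "cookies/vegan", "cookies/double"]),
    "cookies/chunky": ("chocolate cookie recipe with sugar and dark chocolate chunks", []),
    "cookies/vegan": ("vegan chocolate cookie recipe with sugar", []),
    "cookies/double": ("double chocolate cookie recipe with sugar", []),
}

def crawl(seed):
    """Visit pages breadth-first, collecting each page's text exactly once."""
    seen, queue, collected = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        collected[url] = text          # "read" the page and keep its content
        for link in links:             # discover links, queue the unseen ones
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return collected

pages = crawl("cookies/classic")
print(len(pages))  # the crawler now knows four cookie recipes
```

A real crawler would fetch pages over HTTP, parse HTML to find links, and respect each site’s robots rules, but the visit-and-discover loop is the same.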
All the pages visited by the crawler are stored in the search engine index. The index is like a very big book containing a lot of information. The information from the visited pages is stored and organized based on their content: things like how many times a word appears on a page, whether it appears in the title, when the page was published and how many links the page has are also recorded.
Example: back to the chocolate cookie recipes – the index now knows the relation between the recipes, that they all need added sugar, and that one of them is made with delicious dark chocolate chunks. It also knows when they came online and whether they have been changed.
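A minimal sketch of such an index, using the same four hypothetical recipe pages as before: for each word it records which pages contain it and how often. A real index also stores titles, publication dates, link counts and much more.

```python
from collections import defaultdict

# Hypothetical page texts, standing in for the four crawled recipes.
PAGES = {
    "cookies/classic": "chocolate cookie recipe with sugar",
    "cookies/chunky": "chocolate cookie recipe with sugar and dark chocolate chunks",
    "cookies/vegan": "vegan chocolate cookie recipe with sugar",
    "cookies/double": "double chocolate cookie recipe with sugar",
}

def build_index(pages):
    """Map each word to the pages containing it, with per-page counts."""
    index = defaultdict(dict)  # word -> {url: count}
    for url, text in pages.items():
        for word in text.split():
            index[word][url] = index[word].get(url, 0) + 1
    return index

index = build_index(PAGES)
print(sorted(index["chunks"]))  # only the chunky recipe mentions chunks
print(len(index["sugar"]))      # all four recipes need added sugar
```

This word-to-pages structure is usually called an inverted index: instead of asking “what words are on this page?”, it lets the engine ask “which pages contain this word?” instantly.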
Every real search engine has its own way of calculating which pages are most important to a user searching for something. Things the algorithm looks at include the page title, links on the page, links to the page and the creation date, but signals like the user’s preferred language and geographical location can also be used to rank the results.
These elements together are what most people think of when they say search engine. A few of the big ones would be Google, Bing, Yahoo, Yandex and Baidu. If you know of others please share them in the forum.
When a word is typed into a search engine, that word, also known as a query, is sent to the index, and the search engine starts looking for pages containing that specific word. The so-called ranking algorithm then determines which pages it finds most relevant and presents those on the search engine result page.
This means the ranking algorithm decides what a user sees as the results. Some search engines, like Google, record a number of elements, not only about the pages but also about the user’s previous searches, browsing history, what videos the user watched and what ads the user has seen and clicked. Adding this to the ranking means that not everybody gets the same results, even when they search for the same thing.
Example: you know the index has four pages with recipes. So what happens when you search for “chocolate cookie recipe”? Well, as expected, all four recipe pages are shown in the results.
Searching for “chocolate cookie recipe with chunks” will show all of the recipes, but the one with chunks will be at the top, because the algorithm guesses it is more relevant due to the word “chunks”.
If you search for “chocolate chunks” the recipe with delicious dark chocolate chunks will be shown, but not the other recipes since they don’t have chunks in them.
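The searches above can be imitated with a toy ranking over the same made-up index as before: score each page by how many query-word occurrences it contains and sort by score. A real ranking algorithm weighs far more signals than word counts.

```python
from collections import defaultdict

# The same hypothetical recipe pages as in the earlier examples.
PAGES = {
    "cookies/classic": "chocolate cookie recipe with sugar",
    "cookies/chunky": "chocolate cookie recipe with sugar and dark chocolate chunks",
    "cookies/vegan": "vegan chocolate cookie recipe with sugar",
    "cookies/double": "double chocolate cookie recipe with sugar",
}

def build_index(pages):
    index = defaultdict(dict)  # word -> {url: count}
    for url, text in pages.items():
        for word in text.split():
            index[word][url] = index[word].get(url, 0) + 1
    return index

def search(index, query):
    """Rank pages by how many query-word occurrences they contain."""
    scores = {}
    for word in query.split():
        for url, count in index.get(word, {}).items():
            scores[url] = scores.get(url, 0) + count
    return sorted(scores, key=scores.get, reverse=True)

index = build_index(PAGES)
print(len(search(index, "chocolate cookie recipe")))            # all four recipes match
print(search(index, "chocolate cookie recipe with chunks")[0])  # the chunky one ranks first
```

Note that this simple scorer would still list the other recipes for a query like “chocolate chunks” (they all contain “chocolate”), just ranked lower; real engines combine many more signals and filters when deciding what to show and what to drop.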
A search engine like Google saves your search history, location, device and many other things about you – this all influences your search results. This information is often sold to advertisers through an ad network.
Because the search engines know your profile and behaviour so well, they can give you the results they think are relevant for you, which makes the results look very relevant indeed. Since 2009 Google has been personalising the results you see based on your behaviour and search history (official Google announcement about personalised search). Critics often call this type of personalisation a “filter bubble” (Wikipedia).
Example: when you have been searching for “chocolate cookies”, the search engine knows you are interested in chocolate cookies, and it also knows your age, your location and your device information. Through its advertising platform, your local baker and the fitness gym can target advertisements at you based on your behaviour and location.
A private search engine lets you search for free – free as in not saving your search history, not using tracking cookies and not collecting your behaviour, so your searches cannot be correlated with your other activities online.
Metasearch engines can also offer private searches. One big difference from a “real” search engine is that they often buy and collect results from a third-party index – in many cases one of the big “real” search engines, combined with other sources.
findx is a free search engine from Europe; it has its own crawler and its own index, and it promises not to share your history or behaviour – the way findx is built, that is not even possible. When you send your search to the index, findx learns nothing about you, and it cannot tell anyone that you like those delicious big chunks of chocolate in your cookies.