Indexing only the valuable, non-spam pages of the internet keeps us on our toes! Millions of spammy pages are created ad hoc, and their shady owners try to game the system with all sorts of techniques to make these pages rise to the top of the search results.
We’re sure you’ve had a spam email or two in your inbox. Or, if you have published your email address on the web, perhaps you are inundated with tens or hundreds of spam emails each day. For emailed spam, automatic filters help reduce the amount that gets through to you. And of course, over time you have learned to recognise a spam or phishing email yourself and to delete it or report it to your local authorities.
As Findx is a new search engine and is building its search index from scratch with its crawler, the Findxbot, there was a small learning curve for us in making our crawler identify web spam and index just the ‘valuable’ pages. When the Findxbot looks at a page, it adds the links it finds on that page to its indexing queue. It is a program, so it can’t be subjective: it must use fixed criteria to decide whether a site is spammy or not. So, when building a crawler-based search engine, avoiding spam sites entirely is impossible, but we decided to take some action against the spammers.
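To give a rough idea of what ‘fixed criteria’ can look like in a crawler, here is a minimal sketch in Python. It is not the Findxbot’s code: the function names, the two rules, and the thresholds are all assumptions made up for this illustration.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical criteria, chosen only for illustration.
MAX_HYPHENS_IN_HOST = 3
SUSPICIOUS_WORDS = ("free-pills", "casino-bonus")


def looks_spammy(url: str) -> bool:
    """Apply fixed, non-subjective rules to a single URL."""
    host = (urlparse(url).hostname or "").lower()
    if not host:
        return True  # no usable host, so nothing to crawl
    if host.count("-") > MAX_HYPHENS_IN_HOST:
        return True  # hostnames stuffed with keywords
    return any(word in url.lower() for word in SUSPICIOUS_WORDS)


def enqueue_links(links, queue, seen):
    """Add links found on a page to the indexing queue,
    skipping duplicates and anything the rules flag as spam."""
    for url in links:
        if url in seen or looks_spammy(url):
            continue
        seen.add(url)
        queue.append(url)


queue, seen = deque(), set()
enqueue_links(
    [
        "https://example.com/a",
        "https://buy-cheap-free-pills-now.example/x",
        "https://example.com/a",
    ],
    queue,
    seen,
)
```

With rules this simple, some spam will always slip through and some honest pages will be flagged, which is why fixed criteria can reduce spam in the queue but not eliminate it.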
Now that we have been growing the Findx index for a while, we have been able to analyse how well the Findxbot is performing. Which types of sites add too many spam links to the indexing queue? In our case, the answer was adult sites.
We want to make it clear: we aren’t censoring porn sites for moral reasons. We know that many people prefer private search engines for … hrmm, “private” searches, including porn. That was at least the feedback we saw in our Reddit AMA!
We made the decision not to index adult sites to help the Findxbot prioritise its resources, so that it can build a quality index that is as free of adult-related web spam as possible and gives us a better foundation for relevant and useful search results.
So, we have deleted the overwhelming number of adult websites from the Findx index and blocked them, so they are no longer indexed. The block list is based on our own adult content filter, but we have, for now, decided not to share the specific details of the block list.
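Checking a URL against a block list can be sketched roughly as below. Since we are not sharing how our own list or filter works, treat this as illustration only: the `BLOCKED_DOMAINS` entries and the `is_blocked` helper are hypothetical, and matching a domain together with all of its subdomains is just one possible design.

```python
from urllib.parse import urlparse

# Hypothetical block list; these are placeholder domains, not real entries.
BLOCKED_DOMAINS = {"adult-example.test", "spam-example.test"}


def is_blocked(url: str, blocked=BLOCKED_DOMAINS) -> bool:
    """Return True if the URL's host, or any parent domain of it,
    appears in the block list."""
    host = (urlparse(url).hostname or "").lower().rstrip(".")
    parts = host.split(".")
    # For "a.b.c" this checks "a.b.c", then "b.c", then "c".
    return any(".".join(parts[i:]) in blocked for i in range(len(parts)))
```

A crawler could call a check like this before adding a link to the indexing queue, and again before serving a result, so that blocked sites are neither crawled nor shown.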
We have had many discussions about “When does a search engine become a censor?” Do you think we are doing that now, and is this change OK for you? Let us know if you have questions or concerns by leaving a note for us in our forum.
Our algorithm and the way we classify adult content now keep a lot of unwanted content out of the index, but it’s a computer, so it misses the boundary cases, and you are so much better than a computer!
If you do find a spammy website or one that is displaying ‘mature content’ in your Findx search results, please report the site using the quality rating feature.
Help us make Findx as useful for you as possible by giving us your feedback.