Today I learned about Yioop!, a fairly new open source search engine and crawler written in PHP.

My interest in PHP search engines came about when I discovered Sphider, an older PHP search engine that hasn't had a release since 2013 but was interesting to play around with.

PHP is not highly regarded as a language for search engines and crawlers, but it's accessible and it's everywhere, so it's fun to see search engine and crawler projects built with it.

Yioop! is written in PHP and licensed under the GPLv3, and its hardware requirements are low. It just needs an Apache web server with PHP greater than 5.4.

Features

Of course Yioop can search. The parent site, SeekQuarry, describes its search features in detail, along with its crawler, which can crawl not only regular web pages but also archive formats such as Wikipedia XML or Open Directory Project RDFs, and email or database dumps. Crawls can draw on different data sources.

There's a news update service that can index newsfeeds on an hourly basis so that Yioop can provide fresh results.

It also has some features that I don't completely understand yet, but which make me curious and which I'll learn more about once I install Yioop. "Yioop can be configured to allow users to create discussion groups, blogs, and wikis," the Yioop site says. Also, "Yioop's wiki mechanism can be used to build websites."

I'm really keen to install this and maybe fire up an old box to crawl 24/7. I used to crawl using another crawler (I forget which one), and it was always fun to check in on it in the morning to see what it had found and how many pages it had crawled overnight.

Trying a live instance of Yioop!

There's a live instance of Yioop! available for you to run searches on at Yioop.com.

What interests me is that Yioop.com has 1,024,370,486 pages indexed. The owner of Yioop.com blogs about his crawls and his goal of crawling even more pages.

The search itself was slow. My search for "php scripts" took over 17 seconds to return. I wonder what factors lead to such a long response time, and whether software optimizations could bring results back in under a second.

I did notice, though, that the same search performed a second time gave me results in 0.04256 seconds, so there does seem to be some caching going on.
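I don't know how Yioop actually caches queries, so the following is only a generic sketch of the idea, written in Python for brevity rather than PHP. The class name QueryCache, the one-hour expiry and the slow_search stand-in are all my own assumptions and are not taken from Yioop's code. It shows why a repeated search can come back roughly 400 times faster (17 seconds versus 0.04256 seconds): the second request skips the index lookup entirely.

```python
import time


class QueryCache:
    """Tiny in-memory cache of search results, keyed by the normalized query.

    Hypothetical illustration only; not how Yioop is known to work.
    """

    def __init__(self, ttl_seconds=3600.0):
        self.ttl_seconds = ttl_seconds
        self._entries = {}  # normalized query -> (stored_at, results)

    @staticmethod
    def normalize(query):
        # Treat "PHP  Scripts" and "php scripts" as the same query.
        return " ".join(query.lower().split())

    def get_or_compute(self, query, compute, now=None):
        """Return (results, was_cache_hit) for the given query."""
        now = time.monotonic() if now is None else now
        key = self.normalize(query)
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1], True  # fresh entry: skip the expensive search
        results = compute(key)  # miss or expired: run the real search
        self._entries[key] = (now, results)
        return results, False


def slow_search(query):
    # Stand-in for an expensive index lookup (assumed, for demonstration).
    time.sleep(0.05)
    return ["result for " + query]
```

Calling cache.get_or_compute("php scripts", slow_search) twice in a row on the same QueryCache would run slow_search only the first time. The expiry matters for a search engine that keeps crawling, since a cached answer should eventually give way to fresher results.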

I do like Yioop's presentation as a search engine though. The results give a lot of information next to each link. There are links to the cached page, to similar pages, and to inbound links. The IP address of the server hosting each site is also listed, and you can click on it to find other pages on the same IP.

Yioop also features image search. I don't think it's advanced enough to involve any image analysis, like the type of image search that Google provides, but it was interesting to see an image search feature on a lesser-known PHP search engine. There's also the ability to search for videos.

Final thoughts

PHP isn't normally thought of as the language of choice for a search engine, but Yioop looks pretty good. I'm interested in trying it out and doing some crawling just for fun. With its ability to index different types of data, it might be useful for a lot of purposes.