My earlier Solr and TYPO3 tutorial was really popular. This updated guide covers the most recent versions of TYPO3 and the EXT:solr extension available at the time of publication. All of the instructions are complete, but I will continue to update this guide with images and corrections if necessary. Please comment below if you have any questions or corrections.

Introduction

Last year I set up Solr on my web site with the EXT:solr extension for TYPO3. I was just learning about Solr at the time, and to help myself and others I wrote a tutorial that showed how to set up Solr and configure it to work with TYPO3 with the help of the wonderful EXT:solr extension. 

The Solr extension for TYPO3 (EXT:solr) is really amazing. It's more than just a simple Solr interface: it provides the schema needed to set up the Solr cores properly, it handles all of the indexing operations, it displays the search results, and a lot more. The extension is the magic ingredient, and TYPO3 is very fortunate to have it. I've noticed that other CMSs don't have such good options for Solr integration. Once you get it all set up, it works great: top-notch search for a top-notch CMS. You will see that the results you get from searching with Solr are much better than what you get from the indexed_search extension.

Since my last tutorial on this subject, newer versions of both TYPO3 and the EXT:solr extension have been released, so I offer here a new tutorial that shows you how to take a plain Ubuntu 16.04 server running TYPO3 and add an amazingly good search system based on Solr.

Prerequisites

Here's what kind of system I used to make this guide:

  • Ubuntu 16.04 
  • PHP 7.3
  • TYPO3 9.5.5 with The Official Introduction Package installed at the time of TYPO3 installation

For the best results, try this guide with the above configuration. This guide will be easy for you if you have experience with the Linux command line and a good idea of where different files and directories are on a Linux system. Of course, TYPO3 experience is required.

Install Java

The first thing to do in this tutorial is to install Java. I always use Oracle's Java. I'm sure you can use the OpenJDK, but in this tutorial we will install Oracle Java 11 from the Linux Uprising PPA.

The commands to install Oracle Java 11 are as follows:

sudo apt install software-properties-common

sudo add-apt-repository ppa:linuxuprising/java
sudo apt update
sudo apt install oracle-java11-installer

Log out of your terminal session and log back in again

At this point the $JAVA_HOME environment variable won't be set until you log out and log back in again, so log out of your terminal session and SSH back into your server. Then type echo $JAVA_HOME and you should see the correct path to Java, something like:

root@yourserver:~# echo $JAVA_HOME
/usr/lib/jvm/java-11-oracle

It wasn't until June 24, 2019 that I realized I had left this step out of the tutorial. This step is needed before you can install Solr in the next step, so I apologize to anyone who couldn't get the rest of this tutorial to work because of it. I think there's a command you can type to activate the $JAVA_HOME setting without logging out and logging back in again, but I forgot what that command is, so for now just log out and log back in again before proceeding to the following steps.
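If you would rather not log out, a minimal sketch like the following may work instead. It assumes the Java installer exported JAVA_HOME from a script under /etc/profile.d/, which Ubuntu's /etc/profile reads at login, so re-reading /etc/profile in your current shell should pick it up. The function names are made up for this example; if it doesn't work on your system, fall back to logging out and back in.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: reload the login environment and confirm JAVA_HOME.

# Re-read /etc/profile in the current shell. Assumption: the Java installer
# added a script under /etc/profile.d/ that exports JAVA_HOME, and Ubuntu's
# /etc/profile sources every /etc/profile.d/*.sh file.
reload_profile() {
  if [ -r /etc/profile ]; then
    . /etc/profile
  fi
}

# Succeeds (and prints the value) when JAVA_HOME points at an existing directory.
check_java_home() {
  if [ -n "${JAVA_HOME:-}" ] && [ -d "$JAVA_HOME" ]; then
    echo "JAVA_HOME=$JAVA_HOME"
  else
    echo "JAVA_HOME is not set; log out and back in, then check again" >&2
    return 1
  fi
}
```

Because reload_profile has to change the environment of the shell you are typing in, save the functions to a file (for example java-env.sh, a made-up name), load it with `. ./java-env.sh` and then run `reload_profile && check_java_home`.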

Install Solr

The most recent version of Solr supported by the EXT:solr extension is Solr 7.6.0. I found that version compatibility information in the documentation for the extension. I seem to recall that this EXT:solr version will also work with newer 7.x.x versions of Solr, but for this tutorial we will use Solr 7.6.0.

Run the following commands to install Solr as a service on your server:

wget https://archive.apache.org/dist/lucene/solr/7.6.0/solr-7.6.0.tgz
tar xzf solr-7.6.0.tgz solr-7.6.0/bin/install_solr_service.sh --strip-components=2
sudo bash ./install_solr_service.sh solr-7.6.0.tgz

Now Solr will be installed and running as a service on your system. Solr will start every time your system boots. 

If you'd like to see the Solr admin area, visit http://example.com:8983/solr/ in your browser, replacing "example.com" with your domain or IP address. The Solr admin area is not password-protected by default. I may write about securing the Solr admin page in another article.
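To check from the command line that Solr is answering, something like the sketch below may help. It uses wget, which you already used to download Solr, to send a core admin STATUS request. The function name solr_is_up and the default URL (Solr on localhost, port 8983) are assumptions for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if Solr answers at the given base URL.
# The default assumes Solr runs locally on its standard port 8983.
solr_is_up() {
  local base_url="${1:-http://localhost:8983/solr}"
  # --tries=1 and --timeout=5 make an unreachable server fail quickly.
  wget --quiet --tries=1 --timeout=5 --output-document=/dev/null \
    "${base_url}/admin/cores?action=STATUS"
}

if solr_is_up; then
  echo "Solr is running"
else
  echo "Solr did not respond; check the service with: sudo service solr status" >&2
fi
```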

Install EXT:solr

From the backend of your TYPO3 web site, install the solr extension from the Extensions module. The scheduler extension will be installed automatically as a dependency.

Prepare Solr for TYPO3

I don't have in-depth knowledge of Solr, but I know that Solr needs a special configuration to work with the TYPO3 EXT:solr extension. That's what this step is about.

Luckily the EXT:solr extension comes with that configuration, and all you have to do is copy it into your Solr installation.

Run the following two commands. They copy the configuration that comes with EXT:solr into your Solr installation:

sudo cp -r /home/example/public_html/typo3conf/ext/solr/Resources/Private/Solr/configsets /var/solr/data
sudo cp /home/example/public_html/typo3conf/ext/solr/Resources/Private/Solr/solr.xml /opt/solr/server/solr/solr.xml

You will need to modify the commands above so the path to your TYPO3 installation is correct.
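If you prefer not to edit both paths by hand, the two commands can be wrapped in a small function that takes your TYPO3 root directory as an argument and stops if EXT:solr isn't found there. This is a sketch: the function name copy_solr_config is made up, and its default destinations are simply the ones used in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the two cp commands above.
# Usage: copy_solr_config TYPO3_ROOT [SOLR_DATA_DIR] [SOLR_XML_DEST]
copy_solr_config() {
  local typo3_root="$1"
  local data_dir="${2:-/var/solr/data}"
  local xml_dest="${3:-/opt/solr/server/solr/solr.xml}"
  local src="${typo3_root}/typo3conf/ext/solr/Resources/Private/Solr"

  # Stop early if the path does not point at an installed EXT:solr.
  if [ ! -d "${src}/configsets" ]; then
    echo "EXT:solr configuration not found under ${src}" >&2
    return 1
  fi

  # Same two copy operations as above: configsets, then solr.xml.
  cp -r "${src}/configsets" "${data_dir}" &&
    cp "${src}/solr.xml" "${xml_dest}"
}
```

For example, open a root shell with sudo -i, load the file containing the function, and run copy_solr_config /home/example/public_html.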

Create a Solr core for the main language of your web site

In my previous Solr + TYPO3 tutorial I showed how to set up Solr cores for all three languages that come configured with The Official Introduction Package for TYPO3. With TYPO3 9.5.5 and the latest version of EXT:solr (9.0.2), the method of configuring TYPO3 and Solr for multiple languages no longer works. So right now in this tutorial we will only be configuring Solr and TYPO3 for the main language of the TYPO3 web site. If I learn how to get multiple languages working with these most recent versions of TYPO3 and the solr extension, I will update this tutorial.

The main language of my web site is English so I am going to create a Solr core for the English language.

You'll need to have curl installed for this step. Go ahead and install it like this:

sudo apt install curl

Now here's how to create an English language core by sending the request directly to Solr with curl:

curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=core_en&configSet=ext_solr_9_0_0&schema=english/schema.xml&dataDir=../../data/english"

That's for an English core. If the main language of your site is something other than English, modify the command above accordingly. For instance, if your site's main language is German (Deutsch), you could create a German core with the following command:

curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=core_de&configSet=ext_solr_9_0_0&schema=german/schema.xml&dataDir=../../data/german"

You could create additional cores as well, for as many languages as you have. But as I mentioned before, this tutorial only focuses on configuring one main core for use with TYPO3 and EXT:solr.
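Because the core name, schema directory and data directory all have to match the language, it can help to build the CREATE URL from just the language code and the schema directory name. The sketch below reproduces the English and German URLs used above; the function name is hypothetical, and it assumes the configset provides a schema directory with the name you pass (such as english or german).

```shell
#!/usr/bin/env bash
# Hypothetical helper that prints a core CREATE request like the ones above,
# so the core name, schema and data directory stay consistent.
# Usage: solr_create_core_url LANG_CODE SCHEMA_DIR [CONFIGSET]
#   LANG_CODE  suffix for the core name, e.g. en -> core_en
#   SCHEMA_DIR schema directory inside the configset, e.g. english
solr_create_core_url() {
  local lang="$1"
  local schema_dir="$2"
  local configset="${3:-ext_solr_9_0_0}"
  printf 'http://localhost:8983/solr/admin/cores?action=CREATE&name=core_%s&configSet=%s&schema=%s/schema.xml&dataDir=../../data/%s\n' \
    "$lang" "$configset" "$schema_dir" "$schema_dir"
}

# Example: create the German core.
# curl "$(solr_create_core_url de german)"
```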


Include the Static Templates for EXT:solr in Your Site Template

From the WEB->Template module of your TYPO3 site, edit the main template of your TYPO3 site to include the following two static templates that come with the EXT:solr extension.

  • Search - Base configuration (solr)
  • Search - Default stylesheets (solr)

Among other things, the first static template adds the following TypoScript configuration:

plugin.tx_solr {
    solr {
        scheme = http
        host = localhost
        port = 8983
        path = /solr/core_en/
        username =
        password =
    }
}

You don't have to add any of that because it's already loaded by the static template. But if your Solr server were on a different host, these are the values you would need to change in your constants. Notice that you can also add a username and password if your Solr server requires them. In this tutorial, though, you won't need to do any of that.

Also, if your main language is not English, you would add some TypoScript to the Constants section of your template. For instance, if your language is German and you created a German core, you would add the following to the Constants section of your template:

plugin.tx_solr.solr.path = /solr/core_de/

But if you've done everything exactly the way I've explained in this tutorial you don't need to add anything.

Add a Domain Record to the Root Page of Your Web Site

EXT:solr requires that you have a domain record for your site. From the List module, add a domain record to the root page of your site.

Initialize the Solr Connections

Now that everything is configured, you're almost ready. Next you need to initialize the Solr connections. Click the "lightning bolt" icon at the top right of the TYPO3 backend; in the menu that opens you will see an item called "Initialize Solr connections". Click it and TYPO3 will connect to Solr.

If you don't see the "Initialize Solr connections" menu item, try logging out of TYPO3 and then logging back in again.

Now check to make sure everything is working properly. Under the System section of the TYPO3 administration area, click on the "Reports" module. Then click on the "Status Report" link and scroll down to the "Solr" section of the reports. If the background color of all of the Solr-related information is green, everything is set up properly. A red background for anything related to Solr probably means there is a problem with your setup.

Add Page Records to the Index Queue

From the APACHE SOLR module of the backend, open the Index Queue module. This is where you tell the EXT:solr extension what to index.

Under the heading Index Queue Initialization select the checkbox next to where it says "pages".

Now click the button that says Queue Selected Content for Indexing. The page records will then be added to the index queue. The index queue tells the EXT:solr extension what needs to be indexed.

Create a scheduled task to process the index queue

Now go to the Scheduler module that can be found at SYSTEM->Scheduler from the TYPO3 backend.

Create a new scheduled task by clicking the "+" sign at the top of the page. For the "Class" field of the form, select Index Queue Worker. You will receive an error message when you save if you don't enter a Frequency, so enter "1" for the frequency. Set "Number of Documents to Index" to 100. Everything else can be left at the default values. Click "Save".

Now go back to the Scheduler module. Find the task you created and run it by clicking the play button icon on the right side of the entry. 

It will take a while for TYPO3 to index the pages in the queue. While you're waiting you could load your Solr Admin page in a browser and look at the Overview section for your Solr core. While refreshing the page you should see the number of indexed documents increase while the scheduler task is running. Loading the APACHE SOLR -> Index Queue module in another browser tab is another good place to check up on the progress of indexing. It will show a progress bar that tells you how much of the site has been indexed.

It is also possible to index the site from the APACHE SOLR -> Index Queue module by clicking the "Index now" button underneath the Index Queue Status bar. But it is important that you get indexing working through the Scheduler extension, because that is what lets you automate indexing.

If I've ever had problems setting up Solr with TYPO3, it has been with this step. Often the scheduler task would never complete, or it would end with an "Internal Server Error". What I learned was that this was mostly due to low PHP resource limits, or possibly to a server with too little RAM or processing power. On a decent system with PHP settings that meet TYPO3's requirements, I usually don't run into any problems. If you receive errors when indexing, or the script times out, you may need to raise your PHP max_execution_time or other limits accordingly. For this tutorial I had memory_limit set to 256M and max_execution_time set to 240, and everything worked well.
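To see what command-line PHP reports, a small sketch like the one below may help. It compares memory_limit with the 256M I used for this tutorial. Keep in mind that it only reads the CLI configuration: the web server's PHP, which runs the task when you click the play button in the Scheduler module, can use a different php.ini, so check that one as well (for example with phpinfo()). max_execution_time is left out because PHP on the command line always uses 0 (no limit). The function names are made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the PHP memory limit mentioned above.

# Convert a php.ini size such as 256M, 1G or 512K to bytes ("-1" = unlimited).
php_size_to_bytes() {
  local value="$1"
  local number="${value%[KkMmGg]}"
  case "$value" in
    *[Kk]) echo $(( number * 1024 )) ;;
    *[Mm]) echo $(( number * 1024 * 1024 )) ;;
    *[Gg]) echo $(( number * 1024 * 1024 * 1024 )) ;;
    *) echo "$value" ;;
  esac
}

# Warn when the PHP CLI memory_limit is below a minimum (default 256M).
# Note: this reads the CLI php.ini, not the one used by the web server.
check_php_memory_limit() {
  local minimum="${1:-256M}"
  local current
  current="$(php -r 'echo ini_get("memory_limit");')" || return 1
  echo "memory_limit=${current}"
  if [ "$current" != "-1" ] &&
     [ "$(php_size_to_bytes "$current")" -lt "$(php_size_to_bytes "$minimum")" ]; then
    echo "memory_limit is below ${minimum}" >&2
    return 1
  fi
}
```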

Normally a cron job runs the scheduler task, but for now we will run it manually. If the script times out because your max_execution_time setting is too low, you can just run the task again. I'll explain below how to set up a cron job that runs the scheduler task on a schedule so that your pages are indexed automatically.

When the progress bar for the scheduler task reaches 100% that means your site is indexed and you can move on to the next step.

If everything is working as it should, you will be able to visit your Solr administration URL at http://YourServerAddress:8983/solr/#/core_en (substitute core_en with a different core if your site isn't in English) and see that there are some documents indexed.
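You can also ask Solr for the document count from the command line. This sketch requests zero rows with the match-all query *:* and pulls numFound out of the JSON response. It assumes Solr on localhost:8983, the core name core_en by default, and that the core's /select handler accepts this query; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking how many documents a core holds.

# Read a Solr JSON response on stdin and print its numFound value.
extract_num_found() {
  grep -o '"numFound": *[0-9]*' | head -n 1 | grep -o '[0-9]*$'
}

# Ask a core for zero rows and report the total hit count for *:*.
# Assumes Solr on localhost:8983 and a core named core_en by default.
indexed_doc_count() {
  local core="${1:-core_en}"
  curl --silent "http://localhost:8983/solr/${core}/select?q=*:*&rows=0&wt=json" |
    extract_num_found
}
```

Running indexed_doc_count (or indexed_doc_count core_de for a German core) a few times while the scheduler task runs should show the number growing.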

Add a search form to your site and search your site

Now that you have your web site indexed, you can use the plugin included with EXT:solr to include a search form on your site. The same plugin will also display search results.

Do this by creating a new page under the root page of your site (the page named "Congratulations") and naming your new page "Search". This will be your search page, where you will insert the plugin that displays the search form and the search results. Make sure the Search page is unhidden so that it will be visible in the frontend of the web site.

On your new Search page, create a new content element. When the "New content element" wizard opens, select the "Plugins" tab and insert the element titled "General Plugin". Then go to the "Plugin" tab of the content element form and, in the dropdown menu titled "Selected Plugin", select the item labeled "Search: Form, Result, Additional Components".

Then just click the "Save" button and when you visit your Search page in the frontend you'll see a search form and a button to submit your query. Try searching for "content elements".

Set up TYPO3 to index pages automatically

You already created a scheduler task, but so far you have run it manually to index pages.

Instead of visiting the Scheduler module every time you want to index new pages on your TYPO3 web site, you can set up a cron job that will index new pages automatically as frequently as you desire. That means that any time you create a new page on your site, it will be added to the Index Queue and the scheduler task will automatically index the new pages at regular times.

There is a special command you need to have the cron job run. Based on my TYPO3 install location it looks like what you see below. You must modify it according to your TYPO3 install location.

/home/typo3andsolr/public_html/typo3/sysext/core/bin/typo3 scheduler:run --task=1

Notice the end part that says --task=1. This simply means that the scheduler script will run the task whose UID is 1. Since you started with a bare install of the Official Introduction Package, your task's UID should be 1, but if you set up the task on a TYPO3 site with existing scheduler tasks, replace "--task=1" in your cron command with the UID of the task that processes the index queue.
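To avoid typos in the crontab, a helper like the one below can print the line for you. A crontab line has five schedule fields (minute, hour, day of month, month, day of week) followed by the command. The 15-minute schedule, the default task UID of 1 and the function name are assumptions for this sketch; adjust them to your site.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a crontab line that runs one scheduler task.
# Usage: scheduler_cron_line TYPO3_ROOT [TASK_UID] [SCHEDULE]
scheduler_cron_line() {
  local typo3_root="$1"
  local task_uid="${2:-1}"
  local schedule="${3:-*/15 * * * *}"   # every 15 minutes by default
  printf '%s %s/typo3/sysext/core/bin/typo3 scheduler:run --task=%s\n' \
    "$schedule" "$typo3_root" "$task_uid"
}

# To install it, append the line to an existing crontab, for example:
# (crontab -l 2>/dev/null; scheduler_cron_line /home/typo3andsolr/public_html) | crontab -
```

It is usually a good idea to add the line to the crontab of the user that owns your TYPO3 files rather than root's, so that any files the indexer creates keep the right ownership.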

Once you have set up a cron job to run your Index Queue Worker task automatically, you can create some new pages and they will be automatically indexed when your cron runs.