Getting Started with Solr/Lucene on Windows Azure
In this tutorial we show how to configure and host
Solr/Lucene in Windows Azure. The setup uses multiple replicated instances for index serving
and a single instance for index generation, with a persistent index mounted in
Azure storage. The typical scenarios this sample addresses are commercial and publisher
sites that need to scale with increasing query volume, index up to 16 TB of data,
and run a couple of index updates per day.
Pre-requisites
Before you begin working with Solr/Lucene on Windows Azure
there are a few things you will need to have in place on your development system.
Please note that the Windows Azure platform runs on 64-bit hardware. As such,
all applications developed for Windows Azure should be created, packaged and
deployed from a 64-bit platform with the appropriate 64-bit versions of software.
Windows Azure SDK
To take advantage of the Windows Azure platform you will
need to install the Windows Azure SDK. This SDK gives you access to features
such as local development emulators and the packaging tools.
The Windows Azure SDK can be found at
http://www.microsoft.com/windowsazure/sdk/
Create a Windows Azure subscription
If you do not already have a Windows Azure subscription,
you will need to create one. A free 90-day trial can be obtained at
http://www.microsoft.com/windowsazure/free-trial/
Create a Windows Azure Storage Account
After you have a subscription, you will need to create a
Windows Azure Storage Account.
- Log in to the Windows Azure Portal at http://windows.azure.com
- After logging in, click New Storage Account on the toolbar.
- In the dialog that pops up, fill in your subscription
and the URL. We recommend creating and selecting an affinity group. An affinity group
keeps your services in one location, which helps optimize for speed.
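The URL you enter in that dialog is the storage account name. It becomes the first part of the account's endpoint, for example https://<name>.blob.core.windows.net. Windows Azure requires storage account names to be 3 to 24 characters long and to contain only lowercase letters and digits. Below is a minimal Python sketch for checking a candidate name before you submit the dialog. The function name is hypothetical.

```python
import re

# Storage account naming rule: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Return True if `name` can be used as a storage account name (URL prefix)."""
    return _ACCOUNT_NAME.fullmatch(name) is not None
```

For example, "solrindex01" passes. "Solr-Index" is rejected because of the uppercase letters and the hyphen.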

Create a Windows Azure Hosted Service Account
In order to run a site on Windows Azure, you will need to
create a new Hosted Service. This can be done with the following
steps:
- Log in to the Windows Azure Portal at http://windows.azure.com, or click Home
if you are already signed in.
- Click New Hosted Service on the toolbar.
- In the dialog that pops up, fill in some basic
details about your new hosted service. To keep the storage and hosted services
in the same data center, be sure to choose the affinity group you created
in the previous section. This ensures the fastest possible interconnection
speeds.

Create Windows Azure Service Management Certificates
Before your application can access the Windows Azure Service
Management API to create automated deployments, you need to create a certificate.
Windows Azure uses this certificate to validate that your application does
indeed have authorization to access and make changes to your service through
the Windows Azure Service Management API.
Because this topic is beyond the scope of this tutorial,
please see the MSDN article
How to: Manage Management Certificates in Windows Azure.
The steps are summarized below:
- Create a self-signed certificate using IIS Manager: click your
computer name on the left, choose "Server Certificates", and then choose "Create
Self-Signed Certificate" on the right.
- Run certmgr.msc (Start > Run). On the left, open Trusted Root Certification Authorities/Certificates
and find your certificate by the friendly name you gave it in the previous step.
- Right-click it, choose All Tasks > Export, export it as a .cer file, and upload it to Windows Azure as a
management certificate.
- Install it in the Personal certificate store: copy it from "Trusted
Root Certification Authorities/Certificates" and paste it into "Personal/Certificates".
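You will need the certificate's thumbprint later for the CertThumbprint setting of the deployment tool. certmgr.msc shows it on the certificate's Details tab. The thumbprint is the SHA-1 hash of the DER-encoded certificate, so you can also compute it from the exported .cer file.

Below is a minimal Python sketch that accepts either the DER (binary) or the Base-64 export format. The function name is illustrative.

```python
import hashlib
import ssl


def cert_thumbprint(cer_bytes: bytes) -> str:
    """Return the SHA-1 thumbprint of a certificate as uppercase hex."""
    # The export wizard can write DER (binary) or Base-64 (PEM text).
    if cer_bytes.lstrip().startswith(b"-----BEGIN CERTIFICATE-----"):
        # Convert PEM text back to the DER bytes that the thumbprint is computed over.
        der = ssl.PEM_cert_to_DER_cert(cer_bytes.decode("ascii").strip())
    else:
        der = cer_bytes
    return hashlib.sha1(der).hexdigest().upper()
```

Usage, with a placeholder file name: `print(cert_thumbprint(open("mycert.cer", "rb").read()))`. Compare the result with the Thumbprint field in certmgr.msc, ignoring the spaces it displays.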
Install the Java JRE
By default your Windows Azure deployment comes only with
the basics required to run a website. This allows you to take full control of
the customizations of the environment. Because Solr/Lucene runs on Java, you
will need to have the Java JRE on your machine to create the package for Windows
Azure.
Windows Azure is 64-bit, so be sure to
download the 64-bit version.
You can find the download at:
http://www.java.com/en/download/manual.jsp
Please note: you are not required to use the Java JRE listed
above. Compatible JRE versions are listed in the Solr/Lucene tutorial at
http://lucene.apache.org/solr/tutorial.html#Requirements
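To confirm that the JRE you plan to package is 64-bit, you can inspect the output of `java -version`. HotSpot-based JREs include "64-Bit" in their VM line, for example "Java HotSpot(TM) 64-Bit Server VM". Other JVM vendors may word this differently, so treat this as a heuristic.

The sketch below assumes `java_exe` points at the java.exe inside the JRE you will reference as JreLocation. Both function names are illustrative.

```python
import subprocess


def java_version_text(java_exe: str = "java") -> str:
    """Run `java -version` and return its output (the JVM prints it to stderr)."""
    result = subprocess.run([java_exe, "-version"], capture_output=True, text=True)
    return result.stderr


def looks_64bit(version_text: str) -> bool:
    """Heuristic: HotSpot JVMs report '64-Bit' in the VM description line."""
    return "64-Bit" in version_text
```

For example, call `looks_64bit(java_version_text(r"<JreLocation>\bin\java.exe"))`, substituting your actual JRE path.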
Install Solr/Lucene
Installing Solr/Lucene is straightforward: download the archive and extract it,
and you are ready to start using Solr/Lucene. The current solution has only been
tested against Solr/Lucene 3.4. You can try 3.5 if you like, but at the time of
writing the newly released 3.5 is still untested with this solution.
If you already have Solr/Lucene installed, you can use that installation; however,
this tutorial uses C:\apache-solr-3.4.0 as the Solr/Lucene location.
The Solr/Lucene manual is available at:
http://wiki.apache.org/solr/
Solr/Lucene 3.4 can be downloaded from:
http://www.apache.org/dyn/closer.cgi/lucene/solr/
Download the Windows Azure Solr/Lucene project
The project files for the Windows Azure Solr/Lucene project
are hosted on GitHub and are available at
https://github.com/Microsoft-Interop/Windows-Azure-Solr/tags
This tutorial assumes the project is located
at C:\temp\solr
Configure the package tool
There are a few machine- and project-specific configuration options
you will need to set before continuing. They are stored in PackSolzrConfig.xml.
Open C:\temp\solr\PackSolzr\PackSolzrConfig.xml for editing; it contains the
tags described below.

The following table describes each tag:

| Tag | Description |
|---|---|
| ReplSolzrLocation | Location of the ReplSolzr project files within the Windows Azure Solr/Lucene project. This tutorial uses C:\temp\solr\ReplSolzr |
| JreLocation | Location of the Java JRE to be packaged for Windows Azure |
| SolrLocation | Location of the Solr/Lucene installation to be packaged for Windows Azure. This tutorial uses C:\apache-solr-3.4.0 |
| CsdefLocation | Location of the Windows Azure service definition (.csdef) file to use for the Windows Azure package |
| CspackOutputLocation | Location on the file system where the created Windows Azure package is written. This tutorial uses C:\temp\solr\build |
| AzureSdkBinLocation | Location of the Windows Azure SDK bin directory (containing the SDK executables) on your machine, typically C:\Program Files\Windows Azure SDK\<version>\bin |
| ForEmulator | When "true", the Solr/Lucene project is packaged to run in the local Windows Azure development emulator. When "false", it is packaged for deployment to Windows Azure. Recommended value: "false" |
| AdminWebRoleVMSize | Size of the Windows Azure instance for the admin web role. See http://msdn.microsoft.com/en-us/library/windowsazure/ee814754.aspx. Recommended value for this sample: Small |
| SolrMasterHostWorkerRoleVMSize | Size of the Windows Azure instance for the Solr master worker role. See http://msdn.microsoft.com/en-us/library/windowsazure/ee814754.aspx. Recommended value for a Solr master dedicated to index generation: ExtraLarge |
| SolrSlaveHostWorkerRoleVMSize | Size of the Windows Azure instances for the Solr slave worker roles. See http://msdn.microsoft.com/en-us/library/windowsazure/ee814754.aspx. Recommended value for Solr slaves dedicated to index serving: Large |
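Before packaging, it can help to sanity-check the edited file. Below is a minimal Python sketch under a few assumptions:

- Each setting is a direct child of the root element. The root element's name does not matter to this code.
- The "Solr/Lucene" prefix in the tag names is spelled "Solr" in the real file, since XML element names cannot contain "/".
- The valid role sizes are ExtraSmall, Small, Medium, Large and ExtraLarge.

The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Tag names from the table above (assumption: plain "Solr" prefix, direct children of the root).
REQUIRED_TAGS = [
    "ReplSolzrLocation", "JreLocation", "SolrLocation", "CsdefLocation",
    "CspackOutputLocation", "AzureSdkBinLocation", "ForEmulator",
    "AdminWebRoleVMSize", "SolrMasterHostWorkerRoleVMSize",
    "SolrSlaveHostWorkerRoleVMSize",
]
# Windows Azure role sizes (see the MSDN link in the table).
VM_SIZES = {"ExtraSmall", "Small", "Medium", "Large", "ExtraLarge"}


def check_pack_config(xml_text: str) -> list:
    """Return a list of problems found in a PackSolzrConfig.xml document."""
    root = ET.fromstring(xml_text)
    values = {child.tag: (child.text or "").strip() for child in root}
    problems = [f"missing or empty <{tag}>" for tag in REQUIRED_TAGS if not values.get(tag)]
    emulator = values.get("ForEmulator")
    if emulator and emulator.lower() not in ("true", "false"):
        problems.append("<ForEmulator> must be true or false")
    for tag in REQUIRED_TAGS:
        size = values.get(tag)
        if tag.endswith("VMSize") and size and size not in VM_SIZES:
            problems.append(f"<{tag}> has unknown size {size!r}")
    return problems
```

Usage: `print(check_pack_config(open("PackSolzrConfig.xml").read()))`. An empty list means no problems were found.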
Create the package
Creating the package takes a single command in the terminal.
The command takes one parameter, configFilePath, which
is the path to the file you just edited.
- Open a Windows Azure command prompt and change to the PackSolzr
directory of the Windows Azure Solr/Lucene project:
cd C:\temp\solr\PackSolzr
- Run PackSolzr.exe with the config file to create your
Windows Azure Solr/Lucene package:
PackSolzr.exe /configFilePath=PackSolzrConfig.xml
(Note the "=" sign and the absence of spaces around it.)
Note: The config file you edited previously lives in this directory,
which is why you do not need to supply a full path to /configFilePath.
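If you script the build, keep "/configFilePath=..." as a single argument and run the tool from its own directory so the relative config path resolves. Below is a minimal Python sketch. The helper names are hypothetical, and it only works on Windows.

Run the script from a Windows Azure command prompt so the child process inherits the SDK environment. The same helper can run DeploySolzr.exe from C:\temp\solr\DeploySolzr.

```python
import subprocess
from pathlib import Path


def solzr_command(exe: str, config_name: str) -> list:
    """Build the argument list; "/configFilePath=<file>" is one argument with no space."""
    return [exe, f"/configFilePath={config_name}"]


def run_solzr_tool(tool_dir: str, exe_name: str, config_name: str) -> int:
    """Run a Solzr tool with its working directory set to tool_dir; return its exit code."""
    exe_path = str(Path(tool_dir) / exe_name)
    return subprocess.run(solzr_command(exe_path, config_name), cwd=tool_dir).returncode
```

Usage: `run_solzr_tool(r"C:\temp\solr\PackSolzr", "PackSolzr.exe", "PackSolzrConfig.xml")`.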
Configure the deployment tool
There are several ways to deploy packages
to Windows Azure, one of which is manually through the Portal. This project,
however, includes a simple command-line tool that deploys the package to
Windows Azure for you. For the deployment command to work, you will need to supply
some configuration information, which is stored in the DeploySolzrConfig.xml
file. Open C:\temp\solr\DeploySolzr\DeploySolzrConfig.xml for
editing; it contains the tags described below.

The following table describes each tag:

| Tag | Description |
|---|---|
| SolrStorageAccName | Name of the storage account you created in the Pre-requisites section (the first part of its endpoint URL) |
| SolrStorageAccKey | Primary or secondary access key for that storage account |
| HostedServiceName | Name (URL prefix) of the hosted service you created in the Pre-requisites section |
| SubscriptionId | The Windows Azure subscription ID for your service |
| CertThumbprint | Thumbprint of the Windows Azure Service Management certificate |
| DeploymentName | Human-friendly name to identify your deployment |
| SolrMasterHostWorkerRoleInstCount | Number of master instances to start the deployment with. Only one master instance is required, because the index is persisted on a Windows Azure Drive. Recommended value: 1 |
| SolrSlaveHostWorkerRoleInstCount | Number of slave instances to start the deployment with. Note: at least 2 instances are required to qualify for the Windows Azure SLA |
| AdminWebRoleInstCount | Number of admin web role instances to start the deployment with. Note: at least 2 instances are required to qualify for the Windows Azure SLA |
| DeploymentPckgLoc | Location of the package created by PackSolzr.exe. This tutorial uses C:\temp\solr\build\ReplSolzr.cspkg |
| BlobBaseUrl | URL of the Windows Azure Storage account blob endpoint: https://<SolrStorageAccName>.blob.core.windows.net, where <SolrStorageAccName> is the value of the tag above |
| CloudDriveSize | Size of the cloud drive in MB. Note: each drive is limited to 1 TB |
| SolrMasterHostWorkerRoleExternalEndpointPort | External port for contacting the Solr master. This tutorial uses 21000 |
| SolrSlaveHostWorkerRoleExternalEndpointPort | External port for contacting the Solr slaves. This tutorial uses 20000 |
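Several of these values are derived from one another:

- BlobBaseUrl is built from the storage account name.
- CloudDriveSize is given in MB and capped at 1 TB per drive.
- The slave and admin instance counts must be at least 2 for the SLA.

Below is a minimal Python sketch of that arithmetic. It assumes a binary terabyte (1 TB = 1,048,576 MB) and tag names spelled with a plain "Solr" prefix. The helper names are hypothetical.

```python
# Assumption: the 1 TB per-drive limit is a binary terabyte, i.e. 1,048,576 MB.
MAX_DRIVE_MB = 1024 * 1024


def blob_base_url(storage_account_name: str) -> str:
    """BlobBaseUrl is the blob endpoint of the storage account."""
    return f"https://{storage_account_name}.blob.core.windows.net"


def cloud_drive_size_mb(size_gb: int) -> int:
    """Convert a drive size in GB to the MB value CloudDriveSize expects."""
    size_mb = size_gb * 1024
    if size_mb > MAX_DRIVE_MB:
        raise ValueError(f"{size_gb} GB exceeds the 1 TB per-drive limit")
    return size_mb


def sla_warnings(slave_count: int, admin_count: int) -> list:
    """Flag instance counts below the two-instance minimum for the SLA."""
    warnings = []
    if slave_count < 2:
        warnings.append("SolrSlaveHostWorkerRoleInstCount should be at least 2")
    if admin_count < 2:
        warnings.append("AdminWebRoleInstCount should be at least 2")
    return warnings
```

For example, a 500 GB drive corresponds to a CloudDriveSize of 512000.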
Run the deployment
Deploying the package takes a single command in the
terminal. The command takes one parameter, configFilePath,
which is the path to the file you just edited.
- Open a Windows Azure command prompt and change to the DeploySolzr
directory of the Windows Azure Solr/Lucene project:
cd C:\temp\solr\DeploySolzr
- Run DeploySolzr.exe with the config file to deploy your
Windows Azure Solr/Lucene package:
DeploySolzr.exe /configFilePath=DeploySolzrConfig.xml
(Note the "=" sign and the absence of spaces around it.)
Note: The config file you edited previously lives in this directory,
which is why you do not need to supply a full path to /configFilePath.
Administering Solr/Lucene
After the deployment is running, you will want to administer
your Solr/Lucene installation. This is done through the administrative panel
of your web role, which should be available at a URL similar to:
http://<Deployment_Endpoint>.cloudapp.net
The admin application provides the following functionality:
- Crawl: used to fetch public web content
- Import data: used to index the data and replicate it across the Solr
slave instances
- Note: an import has finished successfully when the Solr slaves and the Solr master
report the same index generation and index size
- Search: available once index replication has finished on both Solr
slave instances
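To check the import condition from a script rather than the admin panel, you can query Solr's ReplicationHandler on each node. Below is a minimal Python sketch under two assumptions:

- The handler is mapped at /replication, as in Solr's example solrconfig.xml.
- Its `command=details` JSON response carries `generation` and `indexSize` under `details`.

The function names are hypothetical. The slave endpoint is load-balanced, so repeated requests to the slave port may reach different slave instances.

```python
import json
import urllib.request


def replication_details(solr_base_url: str, timeout: float = 10.0) -> dict:
    """Fetch ReplicationHandler details, e.g. from http://<host>:21000/solr."""
    url = solr_base_url.rstrip("/") + "/replication?command=details&wt=json"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)["details"]


def import_finished(master: dict, slaves: list) -> bool:
    """True when every slave reports the master's index generation and index size."""
    expected = (master.get("generation"), master.get("indexSize"))
    if expected[0] is None or not slaves:
        return False
    return all((s.get("generation"), s.get("indexSize")) == expected for s in slaves)
```

Usage: fetch the master's details from the master port and one or more samples from the slave port, then pass them to `import_finished`.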

Connecting to Solr/Lucene externally
In cases where you wish to connect directly to Solr/Lucene
from an external source, use URLs similar to the following:
Master node
http://<Deployment_Endpoint>.cloudapp.net:<Master_Port>/solr
Slave nodes
http://<Deployment_Endpoint>.cloudapp.net:<Slave_Port>/solr
where the port numbers are the ones you configured in the
Configure the deployment tool section.
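For example, a search request can be sent to Solr's standard /select handler on the slave endpoint. Below is a minimal Python sketch that assumes the port values used in this tutorial (20000 for slaves, 21000 for the master) and the JSON response writer (`wt=json`). The function name is hypothetical.

```python
from urllib.parse import urlencode


def solr_select_url(deployment: str, port: int, query: str, rows: int = 10) -> str:
    """Build a Solr /select URL for the given deployment endpoint and port."""
    params = urlencode({"q": query, "wt": "json", "rows": rows})
    return f"http://{deployment}.cloudapp.net:{port}/solr/select?{params}"
```

Searches should normally go to the slave port, since the slaves are dedicated to index serving. Fetching the URL with `urllib.request.urlopen` returns a JSON document whose `response` object contains `numFound` and `docs`.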