Getting Started with Solr/Lucene on Windows Azure

This tutorial shows how to configure and host Solr/Lucene on Windows Azure, using multiple replicated instances for index serving and a single instance for index generation, with a persistent index mounted in Azure storage. The sample targets commercial and publisher sites that need to scale with increasing query volume, index up to 16 TB of data, and update the index a couple of times per day.

Pre-requisites

Before you begin working with Solr/Lucene on Windows Azure, there are a few things you will need to have in place on your development system. Please note that the Windows Azure platform runs on 64-bit hardware. All applications developed for Windows Azure should therefore be created, packaged and deployed from a 64-bit platform with the appropriate 64-bit versions of software.

Windows Azure SDK

To take advantage of the Windows Azure platform you will need to install the Windows Azure SDK. This SDK gives you access to features such as local development emulators and the packaging tools.

The Windows Azure SDK can be found at http://www.microsoft.com/windowsazure/sdk/

Create a Windows Azure subscription

If you do not already have a Windows Azure subscription, you will need to create one. A free 90-day trial can be obtained at http://www.microsoft.com/windowsazure/free-trial/ .

Create a Windows Azure Storage Account

After you have a subscription you will need to create a Windows Azure Storage Account.

  •          Log in to your Windows Azure Portal:
    http://windows.azure.com

  •          Click New Storage Account on the toolbar after logging in.

  •          In the dialog that pops up you will need to fill in your subscription and the URL. We recommend selecting and creating an affinity group. This will help group your services into one location and help optimize for speed.

 

Create a Windows Azure Hosted Service Account

In order to run a site on Windows Azure you will need to create a new Hosted Service. This can be done quite simply with the following steps.

  •          Log in to your Windows Azure Portal at http://windows.azure.com, or click Home if you are already signed in.

  •          Click New Hosted Service on the toolbar.

  •          In the dialog that pops up you will need to fill in some basic details about your new hosted service. To keep the storage and hosted services in the same data center be sure you choose the affinity group you created in the previous section. This will ensure the fastest possible interconnection speeds.

Create Windows Azure Service Management Certificates

Before your application can access the Windows Azure Service Management API to create automated deployments, you need to create a certificate. Windows Azure uses this certificate to validate that your application is authorized to access and make changes to your service through the Windows Azure Service Management API.

Because this topic is much broader in scope than this tutorial, please see the MSDN article "How to: Manage Management Certificates in Windows Azure".

 

The steps are summarized below:

  •          Create a self-signed certificate using IIS Manager: click on your computer name on the left, choose "Server Certificates", and then "Create Self-Signed Certificate" on the right.

  •          Run certmgr.msc (Start/Run). Under Trusted Root Certification Authorities/Certificates on the left, find the certificate by the friendly name you gave it in the previous step.

  •          Right-click the certificate, choose All Tasks/Export, save it as a .cer file, and upload it to Windows Azure as a management certificate.

  •          Install it in the Personal certificate store: copy it from "Trusted Root Certification Authorities/Certificates" and paste it into "Personal/Certificates".
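
The thumbprint of this certificate is needed later for the CertThumbprint tag of the deployment tool. As a minimal sketch, assuming the thumbprint Windows shows for a certificate is the SHA-1 hash of its DER-encoded bytes, it can be computed from the exported .cer file with Python's standard library. The function name and the file path in the usage comment are illustrative and not part of the project.

```python
import hashlib
import ssl


def cert_thumbprint(cer_bytes: bytes) -> str:
    """Return the SHA-1 thumbprint (uppercase hex) of a certificate.

    Accepts the contents of a .cer file in either DER (binary) or
    PEM (Base64 text) form.
    """
    text = cer_bytes.lstrip()
    if text.startswith(b"-----BEGIN CERTIFICATE-----"):
        # Base64-encoded export: convert to the binary DER form first.
        der = ssl.PEM_cert_to_DER_cert(text.decode("ascii").strip())
    else:
        der = cer_bytes
    return hashlib.sha1(der).hexdigest().upper()


# Usage (hypothetical path to the exported certificate):
# with open(r"C:\temp\solr\management.cer", "rb") as f:
#     print(cert_thumbprint(f.read()))
```

The result can be compared with the Thumbprint field shown in the certificate's Details tab in certmgr.msc.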

Install the Java JRE

By default your Windows Azure deployment comes only with the basics required to run a website. This allows you to take full control of the customizations of the environment. Because Solr/Lucene runs on Java you will need to have the Java JRE on your machine to create the package for Windows Azure.

Windows Azure is 64-bit, so be sure to download the 64-bit version.

You can find the download at: http://www.java.com/en/download/manual.jsp

Please note: You are not required to use the Java JRE listed above. Compatible JRE versions are listed in the Solr/Lucene manual at http://lucene.apache.org/solr/tutorial.html#Requirements
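
Before packaging, it can be worth confirming that the installed JRE really is the 64-bit build. The sketch below assumes a HotSpot-style banner, where a 64-bit VM prints "64-Bit" in its `java -version` output (which goes to stderr); other JVMs may word this differently. The function names are illustrative.

```python
import subprocess


def is_64bit_jvm(version_output: str) -> bool:
    """Return True if `java -version` output reports a 64-bit VM."""
    return "64-bit" in version_output.lower()


def java_version_output(java_exe: str = "java") -> str:
    """Run `java -version` and return what it prints.

    The banner is written to stderr, so both streams are collected.
    """
    result = subprocess.run(
        [java_exe, "-version"], capture_output=True, text=True, check=True
    )
    return result.stderr + result.stdout


# Usage (requires Java on the PATH, or pass the full path to java.exe):
# print(is_64bit_jvm(java_version_output()))
```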

Install Solr/Lucene

Installing Solr/Lucene is not a difficult process: download the archive, extract the files, and you are ready to start using Solr/Lucene. The current solution has only been tested against Solr/Lucene 3.4. You can try 3.5 if you would like, but as of the writing of this document the newly released 3.5 is still untested.

If you already have Solr/Lucene installed, you can use your existing installation. For the purposes of this tutorial, C:\apache-solr-3.4.0 will be used as the Solr/Lucene location.

The Solr/Lucene manual is available at: http://wiki.apache.org/solr/

Solr/Lucene 3.4 can be downloaded from: http://www.apache.org/dyn/closer.cgi/lucene/solr/

Download the Windows Azure Solr/Lucene project

The project files for the Windows Azure Solr/Lucene project are hosted on GitHub and are available at https://github.com/Microsoft-Interop/Windows-Azure-Solr/tags

For this tutorial the project will be located at C:\temp\solr

Configure the package tool

There are a few machine-specific and project-specific configuration options you will need to set before continuing. PackSolzrConfig.xml is the configuration file. Open C:\Temp\solr\PackSolzr\PackSolzrConfig.xml for editing.

The following is a description of each tag

Tag: Description

ReplSolzrLocation: Location of the ReplSolzr project files within the Windows Azure Solr/Lucene project. This tutorial uses C:\Temp\solr\ReplSolzr

JreLocation: Location of the Java JRE to be packaged for Windows Azure

Solr/LuceneLocation: Location of the Solr/Lucene installation to be packaged for Windows Azure. This tutorial uses C:\apache-solr-3.4.0

CsdefLocation: Location of the Windows Azure service definition file to use for the Windows Azure package

CspackOutputLocation: Location on the file system to output the created Windows Azure package. This tutorial uses C:\temp\solr\build

AzureSdkBinLocation: Location of the Windows Azure SDK executables on your machine. This will be C:\Program Files\Windows Azure SDK\<version>\bin

ForEmulator: When "true", the Solr/Lucene project will run in the local Windows Azure development emulator. When "false", the Solr/Lucene project will be packaged for deployment to Windows Azure. Recommended value: "false"

AdminWebRoleVMSize: Size of the Windows Azure instance for the admin web role. Recommended value for this sample: Small. See http://msdn.microsoft.com/en-us/library/windowsazure/ee814754.aspx

Solr/LuceneMasterHostWorkerRoleVMSize: Size of the Windows Azure instances for the Solr/Lucene master role. Recommended value for the Solr master, which is dedicated to index generation: ExtraLarge. See http://msdn.microsoft.com/en-us/library/windowsazure/ee814754.aspx

Solr/LuceneSlaveHostWorkerRoleVMSize: Size of the Windows Azure instances for the Solr/Lucene slave roles. Recommended value for the Solr slaves, which are dedicated to index serving: Large. See http://msdn.microsoft.com/en-us/library/windowsazure/ee814754.aspx
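
A mistyped path in this file only shows up when packaging fails, so a quick check beforehand can save time. The sketch below assumes each setting is stored as an XML element whose text is the value; the element layout, the `Config` root used in the usage comment, and the function name are assumptions, not taken from the project. It reports every tag whose name ends in "Location" and whose path does not exist.

```python
import os
import xml.etree.ElementTree as ET


def missing_locations(config_xml: str) -> list:
    """Return (tag, path) pairs for *Location settings whose path is absent.

    Assumes each setting is an element whose text holds the value.
    """
    root = ET.fromstring(config_xml)
    missing = []
    for element in root.iter():
        path = (element.text or "").strip()
        if element.tag.endswith("Location") and path and not os.path.exists(path):
            missing.append((element.tag, path))
    return missing


# Usage:
# with open(r"C:\Temp\solr\PackSolzr\PackSolzrConfig.xml", encoding="utf-8") as f:
#     for tag, path in missing_locations(f.read()):
#         print("Not found:", tag, "->", path)
```

The output folder named in CspackOutputLocation may legitimately not exist yet, so a report for that tag is not necessarily an error.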

 

 

Create the package

Creating the package is done using one command in the terminal. This command takes one parameter called configFilePath, which is the path to the file you just edited.

  •          Open a Windows Azure command prompt and change to the Windows Azure Solr/Lucene project directory with the following command

    cd C:\temp\solr\PackSolzr

  •          Run PackSolzr.exe with the config file to create your Windows Azure Solr/Lucene package:

    PackSolzr.exe /configFilePath=PackSolzrConfig.xml

    Note the "=" sign above, with no spaces around it.


    Note: The config file you edited previously lives in this directory, which is why /configFilePath does not need a full path.
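
If the packaging step is scripted, the same command can be assembled programmatically. This is a minimal sketch using Python's standard library; the helper names are illustrative, and relying on a non-zero exit code to signal failure is an assumption about the tool's behavior.

```python
import os
import subprocess


def tool_command(exe: str, config_file: str) -> list:
    """Build the argument list; switch and path are joined by '=' with no space."""
    return [exe, "/configFilePath=" + config_file]


def run_tool(workdir: str, exe: str, config_file: str) -> None:
    """Run the tool from its own directory so a bare config file name resolves.

    check=True raises CalledProcessError if the tool exits with a non-zero
    code (assumed here to indicate failure).
    """
    exe_path = os.path.join(workdir, exe)
    subprocess.run(tool_command(exe_path, config_file), cwd=workdir, check=True)


# Usage:
# run_tool(r"C:\temp\solr\PackSolzr", "PackSolzr.exe", "PackSolzrConfig.xml")
```

Passing the switch as a single list item keeps the "=" form intact regardless of how the shell would split arguments.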

Configure the deployment tool

There are several ways to deploy packages to Windows Azure, one of which is manually through the Portal. This project, however, includes a simple command line tool that deploys the package to Windows Azure for you. The deployment command needs some configuration information, which is stored in the DeploySolzrConfig.xml file. Open C:\temp\solr\DeploySolzr\DeploySolzrConfig.xml for editing.

The following is a description of each tag

Tag: Description

Solr/LuceneStorageAccName: Endpoint of the storage account you created in the Pre-requisites section

Solr/LuceneStorageAccKey: Primary or secondary access key for the listed storage account

HostedServiceName: Endpoint of the hosted service you created in the Pre-requisites section

SubscriptionId: The Windows Azure subscription ID for your service

CertThumbprint: Thumbprint of the Windows Azure Service Management certificate

DeploymentName: Human-friendly name by which to recognize your deployment

Solr/LuceneMasterHostWorkerRoleInstCount: Number of master role instances to start the deployment with. Only one master role is required since the data is persisted via Azure Drive. Recommended value: 1

Solr/LuceneSlaveHostWorkerRoleInstCount: Number of slave role instances to start the deployment with. Note: 2 instances is the minimum required to maintain the SLA

AdminWebRoleInstCount: Number of admin web role instances to start the deployment with. Note: 2 instances is the minimum required to maintain the SLA

DeploymentPckgLoc: Location of the package created by the PackSolzr.exe command. This tutorial uses C:\temp\solr\build\ReplSolzr.cspkg

BlobBaseUrl: URL of the Windows Azure Storage account blob endpoint. The value is https://<Solr/LuceneStorageAccName>.blob.core.windows.net where <Solr/LuceneStorageAccName> is the value of the tag listed above

CloudDriveSize: Size of the cloud storage drive in MB. Note: Each drive is limited to 1 terabyte in size

Solr/LuceneMasterHostWorkerRoleExternalEndpointPort: Port on which to contact the Solr/Lucene master: 21000

Solr/LuceneSlaveHostWorkerRoleExternalEndpointPort: Port on which to contact the Solr/Lucene slave workers: 20000
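
Two of these values are derived rather than copied: BlobBaseUrl follows from the storage account name, and CloudDriveSize must be given in MB while staying within the 1 terabyte limit. The sketch below shows both calculations. It assumes 1 TB is counted as 1024 * 1024 MB; the function names and the example account name are illustrative.

```python
MB_PER_GB = 1024
MAX_DRIVE_MB = 1024 * MB_PER_GB  # 1 TB per drive, counted as 1024 * 1024 MB


def blob_base_url(storage_account_name: str) -> str:
    """Build the BlobBaseUrl value from the storage account name."""
    return "https://" + storage_account_name + ".blob.core.windows.net"


def cloud_drive_size_mb(size_gb: int) -> int:
    """Convert a drive size in GB to the MB value CloudDriveSize expects."""
    size_mb = size_gb * MB_PER_GB
    if not 0 < size_mb <= MAX_DRIVE_MB:
        raise ValueError("drive size must be greater than 0 and at most 1 TB")
    return size_mb


# Usage (hypothetical account name):
# print(blob_base_url("mysolrstore"))
# print(cloud_drive_size_mb(100))
```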

 

Run the deployment

Deploying the package is done using one command in the terminal. This command takes one parameter called configFilePath, which is the path to the file you just edited.

  •          Open a Windows Azure command prompt and change to the Windows Azure Solr/Lucene project directory with the following command

    cd C:\temp\solr\DeploySolzr

  •          Run DeploySolzr.exe with the config file to deploy your Windows Azure Solr/Lucene package:

    DeploySolzr.exe /configFilePath=DeploySolzrConfig.xml

    Note the "=" sign above, with no spaces around it.


    Note: The config file you edited previously lives in this directory, which is why /configFilePath does not need a full path.

Administering Solr/Lucene

After the deployment is running you will want to administer your Solr/Lucene installation. This is done through the administrative panel of your web role and should be available at a link similar to:

http://<Deployment_Endpoint>.cloudapp.net

This application will have the following functionality

  •          Crawl - used to get public web content

  •          Import data - used to index the data and replicate it across the Solr slave instances

  •          Note: An import has finished successfully when the Solr slaves and the Solr master report the same index generation and index size

  •          Search - available once the index has finished replicating to both Solr slave instances
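
The completion rule in the note above can be written as a simple comparison. This sketch assumes the generation and index size of each node have already been read (for example from the administrative panel); the dictionary keys and the function name are hypothetical.

```python
def import_finished(master: dict, slaves: list) -> bool:
    """Return True when every slave matches the master's index state.

    Each node is a dict with hypothetical keys "generation" and "index_size".
    """
    if not slaves:
        # With no slaves to compare against, replication cannot be confirmed.
        return False
    target = (master["generation"], master["index_size"])
    return all((s["generation"], s["index_size"]) == target for s in slaves)


# Usage:
# master = {"generation": 7, "index_size": "1.2 GB"}
# slaves = [{"generation": 7, "index_size": "1.2 GB"},
#           {"generation": 6, "index_size": "1.1 GB"}]
# print(import_finished(master, slaves))
```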

 

Connecting to Solr/Lucene externally

If you wish to connect directly to Solr/Lucene from an external source, you will need to use links similar to the following:

Master node

http://<Deployment_Endpoint>.cloudapp.net:<Master_Port>/solr

Slave nodes

http://<Deployment_Endpoint>.cloudapp.net:<Slave_Port>/solr

Where the port number is whatever you configured in the Configure the deployment tool section.
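
As a sketch of how an external client might build these links, the helpers below assemble the base URL and a query URL. The port constants are the values from the deployment configuration; the `/select` handler and the `wt=json` parameter are standard Solr conventions but are assumed here to be enabled in this deployment's Solr configuration. The function names and the endpoint in the usage comment are illustrative.

```python
from urllib.parse import urlencode

MASTER_PORT = 21000  # Solr/LuceneMasterHostWorkerRoleExternalEndpointPort
SLAVE_PORT = 20000   # Solr/LuceneSlaveHostWorkerRoleExternalEndpointPort


def solr_base_url(deployment_endpoint: str, port: int) -> str:
    """Return the external Solr URL for the given endpoint and port."""
    return "http://{}.cloudapp.net:{}/solr".format(deployment_endpoint, port)


def select_url(deployment_endpoint: str, query: str, port: int = SLAVE_PORT) -> str:
    """Return a search URL; queries go to the slave port by default."""
    params = urlencode({"q": query, "wt": "json"})
    return solr_base_url(deployment_endpoint, port) + "/select?" + params


# Usage (hypothetical endpoint name):
# from urllib.request import urlopen
# with urlopen(select_url("mysolr", "azure")) as response:
#     print(response.read().decode("utf-8"))
```

Defaulting to the slave port reflects the layout described in this tutorial, where the slaves serve queries and the master is dedicated to index generation.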

 