Automatic ID generation in Apache Solr
I have been working with Apache Solr for the last few months and have been receiving requirements to speed up query processing. As part of that investigation, I found that generating unique IDs for the retrieved documents contributes to query processing, so I decided to write this post.
# Data Structure
Our sample data structure (the fields section of schema.xml) looks like this:
<fields>
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="name" type="text_general" indexed="true" stored="true" />
<field name="_version_" type="long" indexed="true" stored="true" />
</fields>
In addition, I declared which field holds the unique identifier. This is also done in the schema.xml file:
<uniqueKey>id</uniqueKey>
# Solr Configuration
Besides the schema.xml changes, I also need to modify the solrconfig.xml file and add an UpdateRequestProcessorChain like the one below:
<updateRequestProcessorChain>
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">id</str>
</processor>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
This tells Solr to generate the contents of the id field automatically: UUIDUpdateProcessorFactory fills the field with a random UUID whenever a document arrives without one.
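The chain above carries no name, and in this setup Solr picked it up for every update request. If you prefer to make that explicit, or expect to define more than one chain, a sketch like the following names the chain and marks it as the default with `default="true"`. The name `uuid` is only an illustrative choice, not something Solr requires.

```xml
<!-- Sketch: the same chain, named and explicitly marked as the default.
     The name "uuid" is a hypothetical, illustrative choice. -->
<updateRequestProcessorChain name="uuid" default="true">
  <!-- Fill the id field with a generated UUID when a document arrives without one -->
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <!-- Log each update request -->
  <processor class="solr.LogUpdateProcessorFactory" />
  <!-- Apply the update to the index; keep this processor last -->
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

If you drop `default="true"`, a named chain can instead be selected per request with the `update.chain` parameter, for example `localhost:8993/solr/update?update.chain=uuid&commit=true`.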
# Simple Test
Enough configuration; time to test it. Run the command below from a terminal to index a document:
$> curl -XPOST 'localhost:8993/solr/update?commit=true' --data-binary '<add><doc><field name="name">Test</field></doc></add>' -H 'Content-type:application/xml'
If the command runs without errors, the document is indexed. You can then query the index with the following command:
$> curl -XGET 'localhost:8993/solr/select?q=*:*&indent=true'
It returns a response like the following:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="indent">true</str>
<str name="q">*:*</str>
</lst>
</lst>
<result name="response" numFound="1" start="0">
<doc>
<str name="name">Test</str>
<str name="id">1cdee8b4-c42d-4101-8301-4dc350a4d522</str>
<long name="_version_">1439726523307261952</long>
</doc>
</result>
</response>
Looking at the response, you can see that the unique identifier was generated automatically. If you run the same two commands again (adding the document, then querying), the result looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
<lst name="params">
<str name="indent">true</str>
<str name="q">*:*</str>
</lst>
</lst>
<result name="response" numFound="2" start="0">
<doc>
<str name="name">Test</str>
<str name="id">1cdee8b4-c42d-4101-8301-4dc350a4d522</str>
<long name="_version_">1439726523307261952</long>
</doc>
<doc>
<str name="name">Test</str>
<str name="id">9bedcb5f-1b71-4ab7-80a9-9882a6bf319e</str>
<long name="_version_">1439726693819351040</long>
</doc>
</result>
</response>
As you can see, the two documents received different unique identifiers, both generated by Solr.
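UUIDUpdateProcessorFactory only fills in id when the incoming document does not already have one; a document that supplies its own id keeps it. As a sketch, the update message below, posted as the `--data-binary` body of the same curl command used earlier, adds one document without an id and one with an explicit id. The values `my-own-id-1` and `Second test` are hypothetical sample data.

```xml
<!-- Sketch of an update message mixing generated and explicit ids.
     "my-own-id-1" and "Second test" are hypothetical sample values. -->
<add>
  <!-- No id field: the chain generates a UUID for this document -->
  <doc>
    <field name="name">Test</field>
  </doc>
  <!-- Explicit id: the chain leaves this value untouched -->
  <doc>
    <field name="id">my-own-id-1</field>
    <field name="name">Second test</field>
  </doc>
</add>
```

Because id is the uniqueKey, re-sending the second document later replaces the existing one instead of adding a duplicate, while every document sent without an id gets a fresh UUID.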