I have a project for a master's seminar: we take a list of 107605 article records and need to load them into a percolator-type index, so that we can later enter texts through an interface, percolate them, and highlight the related words.
To do this from the console, we follow these steps:
1. We create the index with a percolator field:
curl -XPUT 'localhost:9200/my-index?pretty' -H 'Content-Type: application/json' -d' { "mappings": { "_doc": { "properties": { "title": { "type": "text" }, "query": { "type": "percolator" } } } } } '
2. We index each record together with its query:
curl -XPUT 'localhost:9200/my-index/_doc/1?refresh&pretty' -H 'Content-Type: application/json' -d' { "CourseId":35, "UnitId":12390, "id":"16069", "CourseName":"ARK102U_ARKEOLOJİK ALAN YÖNETİMİ", "FieldId":8, "field":"TARİH", "query": { "span_near" : { "clauses" : [ { "span_term" : { "title" : "dünya" } }, { "span_term" : { "title" : "mirası" } }, { "span_term" : { "title" : "sözleşmesi" } } ], "slop" : 0, "in_order" : true } } } '
As we can see, all the words in the record's title field are entered as the query.
3. We run the query by entering the text:
curl -XGET 'localhost:9200/my-index/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query" : {
"percolate" : {
"field" : "query",
"document" : {
"title" : "Arkeoloji, arkeolojik yöntemlerle ortaya çıkarılmış kültürleri, dünya mirası sözleşmesi sosyoloji, coğrafya, tarih, etnoloji gibi birçok bilim dalından yararlanarak araştıran ve inceleyen bilim dalıdır. Türkçeye yanlış bir şekilde \"kazıbilim\" olarak çevrilmiş olsa da kazı, arkeolojik araştırma yöntemlerinden sadece bir tanesidir."
}
}
},
"highlight": {
"fields": {
"title": {}
}
}
}
'
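For reference, here is a minimal Python sketch of the two request bodies used above: the span_near query stored with each record and the percolate search with highlighting. It assumes that lowercasing the title and splitting it on whitespace gives roughly the same terms the analyzer of the title field would produce, which may not hold for punctuation or for some Turkish characters. The helper names build_span_near and build_percolate_body are hypothetical, not part of any library.

```python
def build_span_near(title, field="title"):
    """Build a span_near query with one span_term clause per word of `title`."""
    # Assumption: lowercasing and whitespace splitting approximates the
    # tokens stored for `field`; span_term values are not analyzed.
    terms = title.lower().split()
    if not terms:
        raise ValueError("title contains no terms")
    return {
        "span_near": {
            "clauses": [{"span_term": {field: term}} for term in terms],
            "slop": 0,          # the words must be adjacent
            "in_order": True,   # and appear in the original order
        }
    }


def build_percolate_body(text, field="title"):
    """Build the search body that percolates `text` and highlights `field`."""
    return {
        "query": {
            "percolate": {
                "field": "query",          # name of the percolator field
                "document": {field: text},
            }
        },
        "highlight": {"fields": {field: {}}},
    }
```

Both functions only build dictionaries, so they can be checked without a running cluster before the bodies are sent to Elasticsearch.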
The records come in a JSON file. So far I read them and load them into a dictionary, but from there I do not know how to continue. This is my approach:
import json
from elasticsearch_dsl import (
    DocType,
    Integer,
    Percolator,
    Text,
    connections,
)
# Read the JSON file (UTF-8, since the titles contain Turkish characters)
with open('titles.json', encoding='utf-8') as json_file:
    data = json.load(json_file)
docs = data['response']['docs']
# Create an Elasticsearch connection (host and port go together in `hosts`)
connections.create_connection(hosts=['localhost:9200'], timeout=20)
"""
curl -XPUT 'localhost:9200/my-index?pretty' -H 'Content-Type: application/json' -d'
{
"mappings": {
"_doc": {
"properties": {
"title": {
"type": "text"
},
"query": {
"type": "percolator"
}
}
}
}
}
'
"""
class Document(DocType):
    course_id = Integer()
    unit_id = Integer()
    title = Text()
    id = Integer()
    course_name = Text()
    field_id = Integer()
    field = Text()
    # Percolator field that stores the query built from the title;
    # fields are declared on the class, not inside Meta
    query = Percolator()

    class Meta:
        index = 'titles_index'
"""
"query": {
"span_near" : {
"clauses" : [
{ "span_term" : { "title" : "dünya" } },
{ "span_term" : { "title" : "mirası" } },
{ "span_term" : { "title" : "sözleşmesi" } }
],
"slop" : 0,
"in_order" : true
}
}
"""
for doc in docs:
    # Read from the current record (`doc`), not from the whole list (`docs`)
    terms = doc['title'].split()
    course_id = doc['CourseId']
    unit_id = doc['UnitId']
    id = doc['id']
    course_name = doc['CourseName']
    field_id = doc['FieldId']
    field = doc['field']
How should I continue the development?
Thank you very much.