I hope this helps those of you who are also working with ES, and I look forward to more discussion.
Version used: 6.4.3
1-Using the Java Client (Part 2)
Bulk insert
Aggregation queries
scroll-scan
Bulk insert
Bulk insertion is the usual way to load data quickly, for example when reindexing.
@Override
public void bulk(List<CometIndex> list) {
    BulkRequest request = new BulkRequest();
    try {
        for (CometIndex cometIndex : list) {
            request.add(new IndexRequest(CometIndexKey.INDEX_NAME, CometIndexKey.INDEX_NAME)
                    .source(objectMapper.writeValueAsString(cometIndex), XContentType.JSON));
        }
        BulkResponse bulkResponse = client.bulk(request, RequestOptions.DEFAULT);
        if (bulkResponse.hasFailures()) {
            log.error("bulk has failures: {}", bulkResponse.buildFailureMessage());
        } else {
            log.info("all success");
        }
        TimeValue took = bulkResponse.getTook();
        log.info("[bulk insert took]: {} ({} ms, {} s)", took, took.getMillis(), took.getSeconds());
    } catch (Exception e) {
        log.error("bulk error", e);
    }
}
@Test
public void bulkAdd() {
    List<CometIndex> list = new ArrayList<>();
    int count = 0;
    for (int i = 0; i < 1000; i++) {
        CometIndex cometIndex = new CometIndex();
        cometIndex.setCometId((long) i);
        cometIndex.setAuthor("心机boy");
        cometIndex.setCategory("movie");
        cometIndex.setContent("肖申克的救赎" + i);
        cometIndex.setDescription("肖申克的救赎满分");
        cometIndex.setEditor("cctv");
        cometIndex.setTitle("肖申克的救赎" + i);
        cometIndex.setCreateTime(new Date());
        list.add(cometIndex);
        count++;
        // flush every 100 documents; 1000 is an exact multiple of 100, so no remainder is left behind
        if (count % 100 == 0) {
            searchService.bulk(list);
            list.clear();
        }
    }
}
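Note that the `count % 100` flush above only works because 1000 is an exact multiple of 100; with any other total, the final partial batch would be silently dropped. A minimal, self-contained sketch of batch splitting that also handles the remainder (the `BatchSplit` class and `partition` method are my own names for illustration, not part of the original project):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplit {

    // Split a list into consecutive sublists of at most batchSize elements,
    // including the final partial batch if the size is not an exact multiple.
    static <T> List<List<T>> partition(List<T> list, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < list.size(); i += batchSize) {
            batches.add(new ArrayList<>(list.subList(i, Math.min(i + batchSize, list.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) data.add(i);
        List<List<Integer>> batches = partition(data, 100);
        System.out.println(batches.size());        // 10
        System.out.println(batches.get(0).size()); // 100
    }
}
```

Each sublist returned by `partition` can then be handed to the `bulk` method above, with no documents lost when the total is not divisible by the batch size.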
Aggregation queries
1-Metric aggregations
These aggregate over a set of documents, similar to MySQL's MIN(), MAX(), STDDEV(), SUM(), and so on.
Get the maximum value
GET _search
{
  "aggs": {
    "max_id": {
      "max": { "field": "cometId" }
    }
  }
}
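For reference, the response carries the result under the aggregation's name; it looks roughly like this (the value shown assumes the 1000 documents with cometId 0–999 inserted earlier; your numbers will differ):

```
{
  ...
  "aggregations": {
    "max_id": {
      "value": 999.0
    }
  }
}
```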
2-Bucketing aggregations
These build logical groups of documents out of the search results: documents that satisfy a given rule are placed into a bucket, and each bucket is associated with a key, much like MySQL's GROUP BY.
Aggregate by category
GET _search
{
  "aggs": {
    "category_agg": {
      "terms": {
        "field": "category",
        "order": { "_count": "desc" }
      }
    }
  }
}
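The response for a terms aggregation returns a `buckets` array under the aggregation's name; roughly (keys and counts depend on your data), this is the structure the Java code further down iterates over:

```
{
  ...
  "aggregations": {
    "category_agg": {
      "buckets": [
        { "key": "movie", "doc_count": 1000 },
        ...
      ]
    }
  }
}
```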
Group by category, then group each category bucket by editor
GET _search
{
  "aggs": {
    "category_agg": {
      "terms": {
        "field": "category",
        "order": { "_count": "desc" }
      },
      "aggs": {
        "author_agg": {
          "terms": { "field": "editor" }
        }
      }
    }
  }
}
@Override
public Map<Object, Long> aggregateCategory() {
    Map<Object, Long> result = new HashMap<>();
    try {
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        TermsAggregationBuilder aggregation = AggregationBuilders.terms(CometIndexKey.CATEGORY_AGG)
                .field(CometIndexKey.CATEGORY)
                .order(BucketOrder.aggregation("_count", false));
        // size(0): we only want the aggregation results, not the hits themselves
        searchSourceBuilder.aggregation(aggregation).size(0);
        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices(CometIndexKey.INDEX_NAME);
        searchRequest.source(searchSourceBuilder);
        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        Aggregations aggregations = searchResponse.getAggregations();
        Terms byCategoryAggregation = aggregations.get(CometIndexKey.CATEGORY_AGG);
        if (byCategoryAggregation.getBuckets() != null && !byCategoryAggregation.getBuckets().isEmpty()) {
            for (Terms.Bucket bucket : byCategoryAggregation.getBuckets()) {
                log.info("key:{},value:{}", bucket.getKey(), bucket.getDocCount());
                result.put(bucket.getKey(), bucket.getDocCount());
            }
        }
    } catch (Exception e) {
        log.error("agg error", e);
    }
    return result;
}
@Test
public void testAgg() {
    Map<Object, Long> result = searchService.aggregateCategory();
    for (Map.Entry<Object, Long> entry : result.entrySet()) {
        System.out.println("Key = " + entry.getKey() + ", Value = " + entry.getValue());
    }
}
There are many kinds of aggregations; this is only a simple one. Try out more of them in Dev Tools.
scroll-scan
1-The from-size limit: the more data there is (the deeper you page), the less efficient it becomes.
2-scroll:
A scrolling search takes a point-in-time snapshot: an initial search caches all results matching the query, and you then keep pulling data from that snapshot in batches until nothing is left. Changes made to the index afterwards do not affect the results.
3-scan:
The most resource-intensive part of deep pagination is the global sorting of results; if we turn sorting off, all documents can be returned at minimal cost.
This was the idea behind the scan search type: it told ES not to sort, and simply to return the next batch from every shard that still had results. In current versions the scan search type has been removed and is replaced by sorting the scroll on `_doc`.
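In 6.x the scan-style behavior is expressed as a scroll sorted on `_doc`, which skips scoring and sorting entirely. A sketch in the query DSL (the 1m scroll window and batch size of 1000 mirror the Java code; adjust them to your workload):

```
GET _search?scroll=1m
{
  "size": 1000,
  "sort": ["_doc"],
  "query": { "match_all": {} }
}
```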
@Override
public void scrollScan() {
    try {
        // keep the search context alive for one minute between batches
        final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L));
        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices(CometIndexKey.INDEX_NAME);
        searchRequest.scroll(scroll);
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        searchSourceBuilder.query(QueryBuilders.matchAllQuery());
        searchSourceBuilder.size(1000);
        searchRequest.source(searchSourceBuilder);
        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        String scrollId = searchResponse.getScrollId();
        SearchHits searchHits = searchResponse.getHits();
        log.info("scrollId:{},total:{}", scrollId, searchHits.getTotalHits());
        SearchHit[] hits = searchHits.getHits();
        while (hits != null && hits.length > 0) {
            for (SearchHit hit : hits) {
                Map<String, Object> sourceAsMap = hit.getSourceAsMap();
                log.info("title:{}", sourceAsMap.get(CometIndexKey.TITLE));
            }
            // fetch the next batch using the scroll id returned by the previous call
            SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
            scrollRequest.scroll(scroll);
            searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
            scrollId = searchResponse.getScrollId();
            log.info("scrollId:{}", scrollId);
            hits = searchResponse.getHits().getHits();
        }
        // release the server-side search context once we are done
        ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
        clearScrollRequest.addScrollId(scrollId);
        ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
        log.info("ClearScrollRequest result:{}", clearScrollResponse.isSucceeded());
    } catch (Exception e) {
        log.error("scroll scan error", e);
    }
}
@Test
public void scrollScan() {
    searchService.scrollScan();
}
Now that we have these APIs down, we can use the bulk-insert API to generate data and then run our tests against it.
A later article will show how we put this to use.