ElasticSearch Practical Study Notes

ElasticSearch

Author: CodingGorit

Date: October 22, 2020

Note: these notes were taken while following the Bilibili course by 狂神说 (Kuangshen): Learning ElasticSearch

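I. Study Outline

  1. Installation
  2. Ecosystem
  3. The IK analyzer
  4. Operating ES through RESTful requests
  5. CRUD
  6. Integrating ElasticSearch with Spring Boot (analyzed from the underlying principles)
  7. Crawling data from JD.com
  8. Hands-on project: simulating full-text search

Use ES for search-related features, especially when the data volume is large.

> Lucene is an information-retrieval toolkit (a JAR library; it does not include a search-engine system). Solr is likewise built on top of it.
>
> It contains: the index structure, tools for reading and writing indexes, sorting, search rules, and other utility classes.
>
> The relationship between Lucene and ElasticSearch:
>
> ElasticSearch wraps Lucene and adds enhancements on top of it.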

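II. ElasticSearch Overview

Abbreviated as ES.

  • An open-source, highly scalable, distributed full-text search engine
  • Stores and retrieves data in near real time
  • ES is written in Java and uses Lucene at its core for all indexing and search functionality
  • Its goal is to hide Lucene's complexity behind a simple RESTful API, making full-text search easy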

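III. Installing ElasticSearch

  • JDK 1.8
  1. Download and unzip

  2. Get familiar with the directory layout:

bin: startup scripts
config: configuration files
	log4j2.properties: logging configuration
	jvm.options: JVM-related settings
	elasticsearch.yml: the ElasticSearch configuration file
lib: required JAR files
logs: log files
modules: functional modules
plugins: plugins (e.g. ik)

  3. Start ES; it listens on port 9200
  4. Test access: localhost:9200

> Install the visualization plugin elasticsearch-head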

npm install
npm run start

Configure CORS in elasticsearch.yml

http.cors.enabled: true
http.cors.allow-origin: "*"

Installing Kibana

  1. Download and unzip
  2. Localization

Open kibana.yml under config and change the last line to i18n.locale: "zh-CN" (this switches the Kibana UI to Chinese).

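IV. Core ES Concepts

  1. Index
  2. Field types (mapping)
  3. Documents

What are clusters, nodes, indices, types, documents, shards, and mappings?

> ElasticSearch is document-oriented. Below is a side-by-side comparison with a relational database. Everything is JSON.
>
> {
>
> }

Terminology mapping

| ElasticSearch | Relational DB |
| --- | --- |
| indices | databases |
| types | tables |
| documents | rows |
| fields | columns |

An ElasticSearch cluster can contain multiple indices (databases); each index can contain multiple types (tables); each type contains multiple documents (rows); and each document contains multiple fields (columns).

Physical design

A single ElasticSearch instance is already a cluster on its own.

Documents

Individual records, one per entry:

user
	zs: 15
	ls: 22

Types

Field types are detected automatically (string, and so on). Note that types are deprecated in ES 7.x.

Indices

The counterpart of a database.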

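V. The IK Analyzer Plugin

Unzip the downloaded plugin into the plugins directory.

(Skipped here; covered in episode 8 of the course.)

  • The elasticsearch-plugin command (elasticsearch-plugin list) shows which plugins have been loaded

  • ik_smart (coarsest segmentation) and ik_max_word (finest-grained segmentation)

  • Test in Kibana

  • Custom dictionary entries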

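VI. REST Style

Basic REST commands

| Method | URL | Description |
| --- | --- | --- |
| PUT | localhost:9200/index_name/type_name/document_id | Create a document (with a specified ID) |
| POST | localhost:9200/index_name/type_name | Create a document (with a random ID) |
| POST | localhost:9200/index_name/type_name/document_id/_update | Update a document |
| DELETE | localhost:9200/index_name/type_name/document_id | Delete a document |
| GET | localhost:9200/index_name/type_name/document_id | Get a document by ID |
| POST | localhost:9200/index_name/type_name/_search | Query all data |

> Basic tests

6.1 Creating an index

  1. Create an index (by indexing a document into it)
PUT /index_name/~type_name~/document_id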
{
  "name":"Gorit",
  "age": 18,
  "gender": "male"
}

Response: the document was added successfully

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
  "_index" : "test",
  "_type" : "type1",
  "_id" : "1", 
  "_version" : 1, // number of modifications
  "result" : "created", // status
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
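
The deprecation warning in the response above names the typeless endpoints that replace the `index/type/id` URLs. As a rough sketch, the Java 11 `java.net.http` API can build such requests without any Elasticsearch client. The class name `TypelessEndpoints`, the `BASE` address and both helper methods are assumptions made for illustration, and the requests are only constructed here, never sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds requests for the typeless ES 7.x endpoints.
class TypelessEndpoints {
    // Assumed local node address, matching the localhost:9200 used in these notes.
    static final String BASE = "http://localhost:9200";

    // PUT /{index}/_doc/{id}: create or replace a document with a known id.
    static HttpRequest indexDocument(String index, String id, String json) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + index + "/_doc/" + id))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // POST /{index}/_update/{id}: partial update, fields wrapped in "doc".
    static HttpRequest updateDocument(String index, String id, String partialJson) {
        String body = "{\"doc\":" + partialJson + "}";
        return HttpRequest.newBuilder(URI.create(BASE + "/" + index + "/_update/" + id))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest put = indexDocument("test", "1", "{\"name\":\"Gorit\"}");
        HttpRequest post = updateDocument("test", "1", "{\"name\":\"coco\"}");
        System.out.println(put.method() + " " + put.uri());
        System.out.println(post.method() + " " + post.uri());
    }
}
```

To actually send one of these, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` against a running node.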

  2. Create an index with explicit mapping rules
PUT /test1/
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "long"
      },
      "birthday": {
        "type": "date"
      }
    }
  }
}

Response

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "test1"
}

If no mapping is defined, ES assigns default field types automatically.

6.2 Querying

GET test

# Result
{
  "test" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "gender" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1603203146037",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "q47lWt_4ToOBo1rxQ1pPNw",
        "version" : {
          "created" : "7060299"
        },
        "provided_name" : "test"
      }
    }
  }
}


GET test1

{
  "test1" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "birthday" : {
          "type" : "date"
        },
        "name" : {
          "type" : "text"
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1603203453667",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "a-upVXJwR7u7JZztTjyVGg",
        "version" : {
          "created" : "7060299"
        },
        "provided_name" : "test1"
      }
    }
  }
}

Extra: the _cat/ endpoints expose a lot of information about the current state of ES

GET _cat/health

GET _cat/indices?v

6.3 Modifying an index

> Submit a PUT again; it overwrites the existing document

Modify the data

PUT /test/type1/1
{
  "name":"Gorit111",
  "age": 18,
  "gender": "male"
}

Result of the modification

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
  "_index" : "test",
  "_type" : "type1",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}

The newer approach: update with POST

POST /test/_doc/1/_update
{
  "doc": {
      "name":"张三"
  }
}

// Result
{
  "_index" : "test",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3,
  "_seq_no" : 2,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "张三",
    "age" : 18,
    "gender" : "male"
  }
}

6.4 Deleting an index

> Delete the index

DELETE test

The DELETE command removes either an index or a document, depending on the request path.

VII. Document Operations

7.1 Basic operations (review)

  1. Add data (several records)
PUT /gorit/user/1
{
  "name": "CodingGorit",
  "age": 23,
  "desc": "一个独立的个人开发者",
  "tags": ["Python","Java","JavaScript"]
}

PUT /gorit/user/2
{
  "name": "龙",
  "age": 20,
  "desc": "全栈工程师",
  "tags": ["Python","JavaScript"]
}

Result:

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
  "_index" : "gorit",
  "_type" : "user",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
  2. Get data
GET /gorit/user/_search   # query all data

GET /gorit/user/1 # query a single document
  3. Update data with PUT
PUT /gorit/user/3
{
  "name": "李四222",
  "age": 20,
  "desc": "Java开发工程师",
  "tags": ["Python","Java"]
}

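# If a PUT update leaves out fields, those fields are lost, because PUT replaces the whole document
  4. POST _update (the recommended way)
# Without _update, POST behaves like PUT: the document is replaced and the omitted fields are lost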
POST /gorit/user/1
{
  "doc": {
    "name": "coco"
  }
}

# With _update only the listed fields change and the rest of the document is kept; it is also more efficient
POST /gorit/user/1/_update
{
  "doc": {
    "name": "coco"
  }
}
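
The difference between the two requests above can be modeled with plain Java maps. This is only an in-memory illustration, not Elasticsearch code: the class `UpdateSemantics` and its two methods are hypothetical names, and the merge here is shallow, whereas a real partial update also merges nested objects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of a single stored document.
class UpdateSemantics {

    // PUT-style indexing: the request body replaces the stored document wholesale.
    static Map<String, Object> replace(Map<String, Object> stored, Map<String, Object> body) {
        return new LinkedHashMap<>(body);
    }

    // _update-style partial update: the fields under "doc" are merged into the stored document.
    static Map<String, Object> merge(Map<String, Object> stored, Map<String, Object> doc) {
        Map<String, Object> result = new LinkedHashMap<>(stored);
        result.putAll(doc);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("name", "CodingGorit");
        stored.put("age", 23);

        Map<String, Object> change = Map.of("name", "coco");

        System.out.println(replace(stored, change)); // "age" is no longer present
        System.out.println(merge(stored, change));   // "age" is preserved
    }
}
```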

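Simple searches

# Get a single record
GET /gorit/user/1

# Query everything
GET /gorit/user/_search

# Conditional query. If no mapping was defined for the field, ES maps a string as text with a keyword sub-field (see the GET test output in 6.2). A keyword value only matches as a whole; a text field is analyzed, so matching on individual tokens works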
GET /gorit/user/_search?q=name:coco

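7.2 Complex searches: the counterpart of SELECT (sorting, pagination, highlighting, fuzzy queries, exact queries)

  1. Query with filtering of the returned fields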
GET /gorit/user/_search
{
  "query": {
    "match": {
      "name": "李四"
    }
  },
  "_source": ["name","desc"]
}

7.3 Sorting

GET /gorit/user/_search
{
  "query": {
    "match": {
      "name": "gorit"
    }
  },
  "sort": [
    {
     "age": {
       "order": "desc"
     }
    }
  ]
}

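7.4 Pagination

Pagination uses the from and size fields, much like LIMIT offset, size in SQL.

  1. from: the zero-based offset of the first hit to return (not a page number)
  2. size: how many hits to return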
GET /gorit/user/_search
{
  "query": {
    "match": {
      "name": "李四"
    }
  },
  "sort": [
    {
     "age": {
       "order": "desc"
     }
    }
  ],
  "from": 0,
  "size": 1
}
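
Because from is a hit offset rather than a page number, a 1-based page number has to be converted before it goes into the query. A minimal sketch of that arithmetic follows; `PageOffset` is a hypothetical helper and not part of any Elasticsearch client.

```java
// Hypothetical helper for turning a page number into the "from" offset.
class PageOffset {

    // Converts a 1-based page number into the zero-based offset of its first hit.
    static int from(int pageNo, int pageSize) {
        int page = Math.max(pageNo, 1); // treat 0 or negative page numbers as page 1
        return (page - 1) * pageSize;
    }

    public static void main(String[] args) {
        System.out.println(from(1, 10)); // first page starts at offset 0
        System.out.println(from(3, 10)); // third page starts at offset 20
    }
}
```

Passing the page number straight into from would skip only that many hits, so page 1 would silently drop the first result.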

7.5 Range queries with filter

# Query by age range
GET /gorit/user/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "gorit"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "age": {
              "gte": 1,
              "lte": 25
            }
          }
        }
      ]
    }
  }
}
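  • gt: greater than
  • gte: greater than or equal to
  • lt: less than
  • lte: less than or equal to

7.6 Boolean queries

must (AND): every condition has to match, like WHERE id = 1 AND name = xxx

# Boolean query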
GET /gorit/user/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "gorit"
          }
        },{
          "match": {
            "age": "16"
          }
        }
      ]
    }
  }
}

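7.7 Matching multiple terms

Put all the terms into one match query.

# Separate the terms with spaces; a document is returned if it matches any one of them, and the score shows how well it matched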
GET /gorit/user/_search
{
  "query": {
    "match": {
      "tags": "Java Python"
    }
  }
}

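7.8 Exact queries

A term query looks up the given term directly in the inverted index, without analyzing it.

About analysis

  • term: exact lookup of the term as given

  • match: the query text is first run through the analyzer, and the resulting tokens are then looked up

Two string types: text and keyword

Conclusion:

  • text is analyzed (split into tokens)
  • keyword is not split; the whole value is indexed as a single term

7.9 Highlighting

# Highlight query: matches in the results are wrapped in tags, and the tags can be customized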
GET /gorit/user/_search
{
  "query": {
    "match": {
      "name": "Gorit"
    }
  },
  "highlight": {
    "pre_tags": "<h3 class='key' style='color:#FF0000;'>", 
    "post_tags": "</h3>", 
    "fields": {
      "name": {}
    }
  }
}

# Response
#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.6375021,
    "hits" : [
      {
        "_index" : "gorit",
        "_type" : "user",
        "_id" : "6",
        "_score" : 1.6375021,
        "_source" : {
          "name" : "Gorit",
          "age" : 16,
          "desc" : "运维工程师",
          "tags" : [
            "Linux",
            "c++",
            "python"
          ]
        },
        "highlight" : {
          "name" : [
            "<h3 class='key' style='color:#FF0000;'>Gorit</h3>"
          ]
        }
      }
    ]
  }
}
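
What the highlight section of the response contains can be imitated with a few lines of plain Java: the matched text is wrapped in the configured pre and post tags. This is a simplified stand-in that does case-insensitive substring matching, whereas Elasticsearch highlights analyzed tokens; `SimpleHighlighter` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for highlighting; not how Elasticsearch implements it.
class SimpleHighlighter {

    // Wraps every case-insensitive occurrence of term in preTag and postTag.
    static String highlight(String text, String term, String preTag, String postTag) {
        if (term.isEmpty()) {
            return text; // nothing to highlight
        }
        Pattern pattern = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE);
        // quoteReplacement keeps any '$' or '\' in the tags or the match literal.
        return pattern.matcher(text)
                .replaceAll(m -> Matcher.quoteReplacement(preTag + m.group() + postTag));
    }

    public static void main(String[] args) {
        System.out.println(highlight("Gorit", "gorit", "<h3>", "</h3>"));
    }
}
```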

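MySQL can do most of this as well, only less efficiently:

  • Matching
  • Conditional matching
  • Exact matching
  • Range matching
  • Filtering the returned fields
  • Multi-condition queries
  • Highlighting
  • Inverted index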
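
The inverted index in this list, together with the term and match distinction from the exact-query section, can be sketched in a few dozen lines. Everything here is an assumption for illustration: `TinyInvertedIndex` is a toy, and its analyzer only lowercases and splits on whitespace, while real analyzers (the standard analyzer, IK) do far more.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each token to the ids of the documents containing it.
class TinyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stand-in analyzer: lowercase, then split on whitespace.
    static List<String> analyze(String text) {
        String normalized = text.trim().toLowerCase(Locale.ROOT);
        if (normalized.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(normalized.split("\\s+"));
    }

    // Indexing a text field: analyze it and record the doc id under every token.
    void add(int docId, String text) {
        for (String token : analyze(text)) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // term: look the value up exactly as given, with no analysis.
    Set<Integer> term(String value) {
        return postings.getOrDefault(value, Collections.emptySet());
    }

    // match: analyze the query text, then return docs containing any of its tokens.
    Set<Integer> match(String query) {
        Set<Integer> hits = new TreeSet<>();
        for (String token : analyze(query)) {
            hits.addAll(term(token));
        }
        return hits;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Java Python");
        index.add(2, "Python JavaScript");
        System.out.println(index.term("Java"));         // no hit: the index holds "java"
        System.out.println(index.term("java"));         // hits document 1
        System.out.println(index.match("Java Python")); // hits documents 1 and 2
    }
}
```

The first lookup shows a common surprise: a term query for a capitalized word misses on an analyzed text field, because the stored tokens were lowercased at index time.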

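VIII. Spring Boot Integration

> Start from the official documentation

> Concrete tests

  1. Create an index
  2. Check whether an index exists
  3. Delete an index
  4. Create a document
  5. Operate on documents
// Maven dependency
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

// Core code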
package cn.gorit;

import cn.gorit.pojo.User;
import com.alibaba.fastjson.JSON;
import javafx.scene.control.IndexRange;
import org.apache.lucene.util.QueryBuilder;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedRequest;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContent;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.MatchAllQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
import org.json.JSONObject;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.http.codec.cbor.Jackson2CborDecoder;

import java.io.IOException;
import java.util.ArrayList;
import java.util.concurrent.TimeUnit;

/**
 * API tests against ES 7.6.2
 */
@SpringBootTest
class DemoApplicationTests {

	// Injected by bean name
	@Autowired
	@Qualifier("restHighLevelClient")
	private RestHighLevelClient client;

	@Test
	void contextLoads() {

	}
	// Create an index
	@Test
	void testCreateIndex() throws IOException {
		// 1. Build the create-index request, equivalent to PUT /gorit_index
		CreateIndexRequest request = new CreateIndexRequest("gorit_index");
		// 2. Execute it through the IndicesClient and receive the response
		CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);
		System.out.println(response);
	}

	// Check whether an index exists
	@Test
	void testGetIndexExist() throws IOException {
		GetIndexRequest request = new GetIndexRequest("gorit_index");
		boolean exist = client.indices().exists(request,RequestOptions.DEFAULT);
		System.out.println(exist);
	}

	// Delete an index
	@Test
	void testDeleteIndex() throws IOException {
		DeleteIndexRequest request = new DeleteIndexRequest("gorit_index");
		// Execute the deletion
		AcknowledgedResponse delete	= client.indices().delete(request,RequestOptions.DEFAULT);
		System.out.println(delete.isAcknowledged());
	}

	// Add a document
	@Test
	void testAddDocument() throws IOException {
		// Create the object
		User u = new User("Gorit",3);
		// Create the request
		IndexRequest request = new IndexRequest("gorit_index");

		// Equivalent to PUT /gorit_index/_doc/1
		request.id("1");
		// Two ways to set the timeout; the later call overrides the earlier one
		request.timeout(TimeValue.timeValueSeconds(3));
		request.timeout("1s");

		// Put the data into the request as JSON
		request.source(JSON.toJSONString(u), XContentType.JSON);
		// Send the request through the client
		IndexResponse response = client.index(request, RequestOptions.DEFAULT);

		System.out.println(response.toString());
		System.out.println(response.status());// status of the operation: CREATED
	}

	// Check whether a document exists: GET /index/_doc/1
	@Test
	void testIsExists() throws IOException {
		GetRequest getRequest = new GetRequest("gorit_index", "1");

		// Do not fetch _source; only existence matters here
		getRequest.fetchSourceContext(new FetchSourceContext(false));
		getRequest.storedFields("_none_");

		boolean exists = client.exists(getRequest, RequestOptions.DEFAULT);
		System.out.println(exists);
	}

	// Get a document
	@Test
	void testGetDocument() throws IOException {
		GetRequest getRequest = new GetRequest("gorit_index", "1");
		GetResponse getResponse = client.get(getRequest, RequestOptions.DEFAULT);
		// Print the document content
		System.out.println(getResponse.getSourceAsString());
		System.out.println(getResponse); // the full response, same as the REST command returns
	}

	// Update a document
	@Test
	void testUpdateDocument() throws IOException {
		UpdateRequest updateRequest = new UpdateRequest("gorit_index", "1");
		updateRequest.timeout("1s");

		User user = new User("CodingGoirt", 18);
		updateRequest.doc(JSON.toJSONString(user),XContentType.JSON);

		UpdateResponse updateResponse = client.update(updateRequest, RequestOptions.DEFAULT);
		// Print the status of the update
		System.out.println(updateResponse.status());
		System.out.println(updateResponse); // the full response, same as the REST command returns
	}

	// Delete a document
	@Test
	void testDeleteDocument() throws IOException {
		DeleteRequest deleteRequest = new DeleteRequest("gorit_index", "1");
		deleteRequest.timeout("1s");

		DeleteResponse deleteResponse = client.delete(deleteRequest, RequestOptions.DEFAULT);
		// Print the status of the deletion
		System.out.println(deleteResponse.status());
		System.out.println(deleteResponse); // the full response, same as the REST command returns
	}

	// Special case, common in real projects: bulk insertion

	@Test
	void testBulkRequest() throws IOException {
		BulkRequest bulkRequest = new BulkRequest();
		bulkRequest.timeout("10s");

		ArrayList<User> userList = new ArrayList<>();
		userList.add(new User("张三1",1));
		userList.add(new User("张三2",2));
		userList.add(new User("张三3",3));
		userList.add(new User("张三4",4));
		userList.add(new User("张三5",5));
		userList.add(new User("张三6",6));
		userList.add(new User("张三7",7));

		// Add each user to the bulk request
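		for (int i = 0; i < userList.size(); i++) {
			// ... (the rest of this loop and of the test class is missing from the source)
		}
	}
}

pom.xml dependencies

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.10.2</version>
</dependency>
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.68</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-thymeleaf</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-devtools</artifactId>
    <scope>runtime</scope>
    <optional>true</optional>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-configuration-processor</artifactId>
    <optional>true</optional>
</dependency>
<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <optional>true</optional>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
    <exclusions>
        <exclusion>
            <groupId>org.junit.vintage</groupId>
            <artifactId>junit-vintage-engine</artifactId>
        </exclusion>
    </exclusions>
</dependency>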

Crawler

Configuration class

package cn.gorit.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * Spring workflow
 * 1. Find the object
 * 2. Register it in Spring so it can be used
 * 3. Study the source code
 *
 * @Classname ElasticSearchConfig
 * @Description TODO
 * @Date 2020/10/21 17:20
 * @Created by CodingGorit
 * @Version 1.0
 */
@Configuration // xml -bean
public class ElasticSearchConfig {

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("localhost", 9200, "http")
                )
        );
        return client;
    }

}

> Crawl the JD.com search results

Parsing utility class

package cn.gorit.util;

import cn.gorit.pojo.Content;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.stereotype.Component;

import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

/**
 * @Classname HtmlParseUtil
 * @Description TODO
 * @Date 2020/10/21 23:17
 * @Created by CodingGorit
 * @Version 1.0
 */
@Component
public class HtmlParseUtil {

//    public static void main(String[] args) throws Exception {
//        new HtmlParseUtil().parseJD("英语").forEach(System.out::println);
//    }

    public List<Content> parseJD(String keyword) throws Exception {
        // Request URL
        // Requires network access; data loaded via AJAX cannot be fetched this way
        String url = "https://search.jd.com/Search?keyword=wd&enc=utf-8";
        // Parse the page (returns a Document object)
        Document document = Jsoup.parse(new URL(url.replace("wd",keyword)),30000);
        // Get the element that holds the goods list
        Element element = document.getElementById("J_goodsList");
        // Get all li elements
        Elements elements = element.getElementsByTag("li");
        // Extract the content of each element
        List<Content> goodsList = new ArrayList<>();
        for (Element e: elements) {
            String img = e.getElementsByTag("img").eq(0).attr("data-lazy-img");
            String price = e.getElementsByClass("p-price").eq(0).text();
            String title = e.getElementsByClass("p-name").eq(0).text();

            goodsList.add(new Content(title,img,price));
//            System.out.println(img);
//            System.out.println(price);
//            System.out.println(title);
        }
        return goodsList;
    }
}

Service methods

package cn.gorit.service;

import cn.gorit.pojo.Content;
import cn.gorit.util.HtmlParseUtil;
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/**
 * @Classname ContentService
 * @Description TODO
 * @Date 2020/10/22 18:44
 * @Created by CodingGorit
 * @Version 1.0
 */
@Service
public class ContentService {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    // Cannot be run directly like this: the client is only injected inside the Spring container
    public static void main(String[] args) throws Exception {
        new ContentService().parseContent("java");
    }

    // 1. Parse the data and put it into the ES index
    public Boolean parseContent (String keywords) throws Exception {
        // Fetch the list of results from the search page
        List<Content> contents = new HtmlParseUtil().parseJD(keywords);
        // Put the fetched data into ES
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("2m");

        for (int i = 0; i < contents.size(); ++i) {
            bulkRequest.add(
                    new IndexRequest("jd_goods")
                    .source(JSON.toJSONString(contents.get(i)),XContentType.JSON));
        }
        BulkResponse bulkResponse = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        return !bulkResponse.hasFailures();
    }

    // 2. Fetch the data: basic search with highlighting
    public List<Map<String, Object>> searchPagehighLight(String keyword, int pageNo, int pageSize) throws IOException {
        if (pageNo <= 1)
            pageNo = 1;

        // Build the search request
        SearchRequest searchRequest = new SearchRequest("jd_goods");

        SearchSourceBuilder builder = new SearchSourceBuilder();

        // from is a zero-based hit offset, so convert the 1-based page number
        builder.from((pageNo - 1) * pageSize);
        builder.size(pageSize);
        // Exact (term) match
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title",keyword);
        builder.query(termQueryBuilder);
        builder.timeout(new TimeValue(60, TimeUnit.SECONDS));

        // Highlighting
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("title");
        highlightBuilder.requireFieldMatch(false);
        highlightBuilder.preTags("<span style='color:#FF0000;'>");
        highlightBuilder.postTags("</span>");
        builder.highlighter(highlightBuilder);

        // Execute the search
        searchRequest.source(builder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        // Parse the results
        ArrayList<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit hit: searchResponse.getHits().getHits()) {
            // Read the highlighted field
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            HighlightField title = highlightFields.get("title");
            Map<String, Object> sourceAsMap = hit.getSourceAsMap(); // the original document
            // Replace the original title with the highlighted version
            if (title != null) {
                Text[] fragments = title.fragments();
                StringBuilder nTitle = new StringBuilder();
                for (Text text:fragments) {
                    nTitle.append(text);
                }
                sourceAsMap.put("title", nTitle.toString());
            }
            list.add(sourceAsMap); // title holds the highlighted text when there was a match
        }
        return list;
    }

    // 3. Fetch the data: basic search without highlighting
    public List<Map<String, Object>> searchPage(String keyword, int pageNo, int pageSize) throws IOException {
        if (pageNo <= 1)
            pageNo = 1;

        // Build the search request
        SearchRequest searchRequest = new SearchRequest("jd_goods");

        SearchSourceBuilder builder = new SearchSourceBuilder();

        // from is a zero-based hit offset, so convert the 1-based page number
        builder.from((pageNo - 1) * pageSize);
        builder.size(pageSize);
        // Exact (term) match
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title",keyword);
        builder.query(termQueryBuilder);
        builder.timeout(new TimeValue(60, TimeUnit.SECONDS));


        // Execute the search
        searchRequest.source(builder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        // Parse the results
        ArrayList<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit hit: searchResponse.getHits().getHits()) {

            list.add(hit.getSourceAsMap()); // add the source document as-is
        }
        return list;
    }
}

Controller

package cn.gorit.controller;

import cn.gorit.pojo.Content;
import cn.gorit.service.ContentService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.bind.annotation.RestControllerAdvice;

import java.io.IOException;
import java.util.List;
import java.util.Map;

/**
 * @Classname ContentController
 * @Description TODO
 * @Date 2020/10/22 18:45
 * @Created by CodingGorit
 * @Version 1.0
 */
@RestController
public class ContentController {

    @Autowired
    private ContentService service;

    /**
     * Add data to ES
     * @param keyword
     * @return
     * @throws Exception
     */
    @GetMapping("/parse/{keyword}")
    public Boolean pares(@PathVariable("keyword")  String keyword) throws Exception {
        return service.parseContent(keyword);
    }

    /**
     * Query data from ES
     * @param keyword
     * @param pageNo
     * @param pageSize
     * @return
     * @throws IOException
     */
    @GetMapping("/search/{keyword}/{pageNo}/{pageSize}")
    public List<Map<String, Object>> search(@PathVariable("keyword") String keyword,@PathVariable("pageNo") int pageNo, @PathVariable("pageSize") int pageSize) throws IOException {
        if (pageNo == 0) {
            pageNo = 1;
        }
        return service.searchPage(keyword, pageNo, pageSize);
    }
}

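Front end and back end are separated

Testing with Postman

Search highlighting

> One project, usable from multiple clients

X. Summary

  1. Basic usage of ElasticSearch
  2. Integrating ES with Spring Boot
  3. A hands-on search project

> My personal open-source project (Coding-With-Java); stars are welcome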

[TOC]

ElasticSearch

Author: CodingGorit

Date: 2020年10月22日

Note:学习笔记记录自 B站狂神说:ElasticSearch 学习

一、学习大纲

  1. 安装
  2. 生态圈
  3. 分词器 lk
  4. RestFul 操作 ES
  5. CRUD
  6. SpringBoot 继承 ElasticSearch (从原理分析!!!)
  7. 爬虫爬取数据!!! 京东
  8. 实战,模拟全文检索

搜索相关使用 ES(大数据量下使用)

> Lucene 是一套信息检索工具包 (Jar 包,不包含 搜索引擎系统)! Solr
>
> 包含的:索引结构!读写索引的工具!排序,搜索规则… 工具类
>
> Lucene 和 EslasticSearch 关系:
>
> ElasticSearch 是基于 Lucene 做了一些封装 和 增强

二、ElasticSearch 概述

简称 es

  • 一个开源的高扩展的 分布式全文检索引擎
  • 近乎实时的存储,检索数据
  • es使用 java 开发并使用 Licene 作为其核心来实现所有索引 和 搜索功能
  • 它的目的是通过简单的 RESTFul API,来隐藏 Lucene 的复杂性,从而让全文搜索变得简单

三、ElasticSearch 安装

  • JDK 1.8
  1. 下载,解压

  2. 熟悉目录:

bin: 启动文件
	config: 配置文件
	log4j: 日志文件
	jvm.options: java 虚拟机先关的配置
	elasticsearch.xml:	elasticsearch 的配置文件!
lib: 相关 jar 包
logs: 日志
modules: 功能模块
plugins: 插件 ik	
	
  1. 启动,访问 9200
  2. 访问测试:localhost:9200

> 安装可视化插件 es head 插件

npm install
npm run start

在 elasticSearch.yml 配置跨域

http.cors.enabled: true
http.cors.allow-origin: "*"

安装 kibana

  1. 下载,解压
  2. 国际化

找到 config 下的 kibana.yml 文件,修改最后一行为 i18n.locale: “zh-CN”

四、ES 核心概念

  1. 索引
  2. 字段类型 (mapping)
  3. 文档(documents)

集群、节点、索引、类型、文档、分片、映射是什么?

> ElasticSearch 是面向文档,关系型数据库 和 elasticSearch 客观的对比! 一切都是 JSON
>
> {
>
> }

名词对应

ElasticSearch Relational DB
索引(indices) 数据库(database)
types 表(tables)
documents 行(rows)
fields 字段(columns)

elasticSearch (集群)中可以包含多个索引(数据库),每个索引中可以包含多个类型(表),每个类型下又包含多个文档(行),每个文档又包含多个字段(列)

物理设计

elasticSearch 一个就是一个集群

文档

一条条记录

user
	zs: 15
	ls: 22

类型

自动识别, string,

索引

数据库

五、IK 分词器插件

下载好的添加到 plugin 中

跳过,第 8 集

  • elasticsearch-plugin 可以通过这个命令来查看加载进来的插件

  • ik_smart(最少切分) 和 ik_max_word(最细粒度划分)

  • kibana 测试

  • 自定义分词

六、 Rest 风格说明

基础 Rest 命令

method url 地址 描述
PUT localhost:9200/索引名称/类型名称/文档 id 创建文档(指定文档 id)
POST localhost:9200/索引名称/类型名称 创建文档(随机文档 id)
POST localhost:9200/索引名称/类型名称/文档id/_update 修改文档
DELETE localhost:9200/索引名称/类型名称/文档id 删除文档
GET localhost:9200/索引名称/类型名称/文档id 查询文档通过文档 id
POST localhost:9200/索引名称/类型名称/_seaarch 查询所有数据

> 基本测试

6.1 创建索引

  1. 创建一个索引
PUT /索引名/~类型名~/文档id
{
  "name":"Gorit",
  "age": 18,
  "gender": "male"
}

返回值,数据成功添加

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
  "_index" : "test",
  "_type" : "type1",
  "_id" : "1", 
  "_version" : 1, // 修改次数
  "result" : "created", // 状态
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

  1. 创建索引规则
PUT /test1/
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "long"
      },
      "birthday": {
        "type": "date"
      }
    }
  }
}

返回值

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "test1"
}

es 默认配置字段类型!

6.2 查询

GET test

# 结果
{
  "test" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "gender" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1603203146037",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "q47lWt_4ToOBo1rxQ1pPNw",
        "version" : {
          "created" : "7060299"
        },
        "provided_name" : "test"
      }
    }
  }
}


GET test1

{
  "test1" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "birthday" : {
          "type" : "date"
        },
        "name" : {
          "type" : "text"
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1603203453667",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "a-upVXJwR7u7JZztTjyVGg",
        "version" : {
          "created" : "7060299"
        },
        "provided_name" : "test1"
      }
    }
  }
}

扩展:通过 _cat/ 可以获得 es 当前很多的信息

GET _cat/health

GET _cat/indices?v

6.3 修改索引

> 提交 PUT,覆盖即可

修改数据

PUT /test/type1/1
{
  "name":"Gorit111",
  "age": 18,
  "gender": "male"
}

修改结果

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
  "_index" : "test",
  "_type" : "type1",
  "_id" : "1",
  "_version" : 2,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}

新的方法 POST 命令更新

POST /test/_doc/1/_update
{
  "doc": {
      "name":"张三"
  }
}

// 结果
{
  "_index" : "test",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 3,
  "_seq_no" : 2,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "name" : "张三",
    "age" : 18,
    "gender" : "male"
  }
}

6.4 删除索引

> 删除索引!!!

DELETE test

通过 delete 命令实现删除,根据你的请求来判断删除的是索引 还是 文档

七、关于文档的操作

7.1 基本操作 (复习巩固)

  1. 添加数据(添加多条记录)
PUT /gorit/user/1
{
  "name": "CodingGorit",
  "age": 23,
  "desc": "一个独立的个人开发者",
  "tags": ["Python","Java","JavaScript"]
}

PUT /gorit/user/2
{
  "name": "龙",
  "age": 20,
  "desc": "全栈工程师",
  "tags": ["Python","JavaScript"]
}

结果:

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{
  "_index" : "gorit",
  "_type" : "user",
  "_id" : "1",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}
  1. 获取数据
GET /gorit/user/_search   # 查询所有数据

GET /gorit/user/1 # 查询单个数据
  1. 更新数据 PUT
PUT /gorit/user/3
{
  "name": "李四222",
  "age": 20,
  "desc": "Java开发工程师",
  "tags": ["Python","Java"]
}

# PUT 更新字段不完整,数据会被滞空
  1. post _update , 推荐使用这种方式!
# 修改方式和 PUT 一样会使数据滞空
POST /gorit/user/1
{
  "doc": {
    "name": "coco"
  }
}

# 修改数据不会滞空, 效率更加高效
POST /gorit/user/1/_update
{
  "doc": {
    "name": "coco"
  }
}

简单的搜索!

# 查询一条记录
GET /gorit/user/1

# 查询所有
GET /gorit/user/_search

# 条件查询 [精确匹配] ,如果我们没有个这个属性设置字段,它会背默认设置为 keyword,这个 keyword 字段就是使用全匹配来匹配的,如果是 text 类型,模糊查询就会起效果
GET /gorit/user/_search?q=name:coco

7.2 复杂的查询搜索:select(排序、分页、高亮、模糊查询、精确查询)!

  1. 过滤加指定字段查询
GET /gorit/user/_search
{
  "query": {
    "match": {
      "name": "李四"
    }
  },
  "_source": ["name","desc"]
}

7.3 Sorting

GET /gorit/user/_search
{
  "query": {
    "match": {
      "name": "gorit"
    }
  },
  "sort": [
    {
     "age": {
       "order": "desc"
     }
    }
  ]
}

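7.4 Paging

Use the from and size fields to page through results, in the same way as LIMIT offset, pageSize in SQL.

  1. from: offset of the first hit to return (starts at 0; it is a document offset, not a page number)
  2. size: how many hits to return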
GET /gorit/user/_search
{
  "query": {
    "match": {
      "name": "李四"
    }
  },
  "sort": [
    {
     "age": {
       "order": "desc"
     }
    }
  ],
  "from": 0,
  "size": 1
}
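
Because from is a document offset rather than a page number, a page number coming from a UI has to be converted first. A minimal sketch, assuming 1-based page numbers; the class PagingSketch and the method fromOffset are hypothetical names used only for illustration.

```java
// Hypothetical helper; not part of any Elasticsearch client.
class PagingSketch {

    // Converts a 1-based page number into the 0-based "from" offset.
    static int fromOffset(int pageNo, int pageSize) {
        int page = Math.max(pageNo, 1); // anything below 1 is treated as the first page
        return (page - 1) * pageSize;
    }

    public static void main(String[] args) {
        System.out.println(fromOffset(1, 10)); // first page starts at offset 0
        System.out.println(fromOffset(3, 10)); // third page starts at offset 20
    }
}
```

The query above, with from 0 and size 1, therefore corresponds to page 1 with a page size of 1.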

7.5 Range queries with filter

# Query by age range
GET /gorit/user/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "gorit"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "age": {
              "gte": 1,
              "lte": 25
            }
          }
        }
      ]
    }
  }
}
  • gt: greater than
  • gte: greater than or equal to
  • lt: less than
  • lte: less than or equal to
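
The four operators differ only in whether the bound itself is included. A small sketch over a list of ages illustrates this; RangeSketch, inclusiveRange and exclusiveRange are hypothetical names, and the code only mimics the comparison, not how Elasticsearch evaluates a range filter internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of inclusive and exclusive range bounds.
class RangeSketch {

    // Mirrors {"range": {"age": {"gte": gte, "lte": lte}}}: both bounds are included.
    static List<Integer> inclusiveRange(List<Integer> ages, int gte, int lte) {
        List<Integer> kept = new ArrayList<>();
        for (int age : ages) {
            if (age >= gte && age <= lte) {
                kept.add(age);
            }
        }
        return kept;
    }

    // Mirrors {"range": {"age": {"gt": gt, "lt": lt}}}: both bounds are excluded.
    static List<Integer> exclusiveRange(List<Integer> ages, int gt, int lt) {
        List<Integer> kept = new ArrayList<>();
        for (int age : ages) {
            if (age > gt && age < lt) {
                kept.add(age);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> ages = Arrays.asList(1, 16, 25, 30);
        System.out.println(inclusiveRange(ages, 1, 25)); // keeps 1, 16 and 25
        System.out.println(exclusiveRange(ages, 1, 25)); // keeps only 16
    }
}
```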

7.6 Bool queries

must (AND): every condition has to match, like WHERE id = 1 AND name = xxx

# Bool query
GET /gorit/user/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "gorit"
          }
        },{
          "match": {
            "age": "16"
          }
        }
      ]
    }
  }
}

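7.7 Matching several terms

Put all the terms into a single match query.

# Separate the terms with spaces; a document is returned if it matches at least one of them, and _score shows how well each hit matched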
GET /gorit/user/_search
{
  "query": {
    "match": {
      "tags": "Java Python"
    }
  }
}

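7.7 Exact (term) queries

A term query looks up the given term directly in the inverted index, without analyzing the search input.

About analysis (tokenization)

  • term: exact lookup, the input is not analyzed

  • match: the input is first run through the analyzer, and the resulting terms are then looked up

Two field types: text and keyword

Conclusion:

  • text: analyzed, so the value is split into terms
  • keyword: not analyzed, the whole value is indexed as a single term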
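
The interplay of term, match, text and keyword can be tried out on a toy inverted index. This is a sketch under loud assumptions: InvertedIndexSketch and all of its methods are hypothetical, and the analyzer simply lowercases and splits on whitespace, which is far simpler than the analyzers Elasticsearch (or the IK plugin) actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical toy index: term -> ids of the documents containing that term.
class InvertedIndexSketch {
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Toy analyzer (an assumption): lowercase, then split on whitespace.
    static List<String> analyze(String text) {
        String normalized = text.trim().toLowerCase(Locale.ROOT);
        if (normalized.isEmpty()) {
            return new ArrayList<>();
        }
        return Arrays.asList(normalized.split("\\s+"));
    }

    // text field: the value is analyzed and every resulting term is indexed.
    void indexText(int docId, String value) {
        for (String term : analyze(value)) {
            add(term, docId);
        }
    }

    // keyword field: the whole value is indexed as one term, unchanged.
    void indexKeyword(int docId, String value) {
        add(value, docId);
    }

    private void add(String term, int docId) {
        postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
    }

    // term query: the input is looked up exactly as given.
    TreeSet<Integer> term(String input) {
        return new TreeSet<>(postings.getOrDefault(input, new TreeSet<>()));
    }

    // match query: the input is analyzed first; a document matching any term is a hit.
    TreeSet<Integer> match(String input) {
        TreeSet<Integer> hits = new TreeSet<>();
        for (String t : analyze(input)) {
            hits.addAll(term(t));
        }
        return hits;
    }

    public static void main(String[] args) {
        InvertedIndexSketch text = new InvertedIndexSketch();
        text.indexText(1, "Java Developer");
        text.indexText(2, "Python Developer");
        System.out.println(text.term("Java"));         // no hit: the index holds "java"
        System.out.println(text.match("Java Python")); // both documents match

        InvertedIndexSketch keyword = new InvertedIndexSketch();
        keyword.indexKeyword(1, "Java Developer");
        System.out.println(keyword.term("Java"));           // no hit: only the full value is indexed
        System.out.println(keyword.term("Java Developer")); // exact value matches
    }
}
```

The first lookup shows a common surprise: a term query against a text field misses when the input differs from the analyzed form, while a match query on the same input succeeds.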

7.8 Highlighting

# Highlight query: matches in the search results can be highlighted, and the wrapping tags can be customized
GET /gorit/user/_search
{
  "query": {
    "match": {
      "name": "Gorit"
    }
  },
  "highlight": {
    "pre_tags": "<h3 style='color:#FF0000;'>", 
    "post_tags": "</h3>", 
    "fields": {
      "name": {}
    }
  }
}

# Response
#! Deprecation: [types removal] Specifying types in search requests is deprecated.
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.6375021,
    "hits" : [
      {
        "_index" : "gorit",
        "_type" : "user",
        "_id" : "6",
        "_score" : 1.6375021,
        "_source" : {
          "name" : "Gorit",
          "age" : 16,
          "desc" : "运维工程师",
          "tags" : [
            "Linux",
            "c++",
            "python"
          ]
        },
        "highlight" : {
          "name" : [
            "<h3 style='color:#FF0000;'>Gorit</h3>"
          ]
        }
      }
    ]
  }
}
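
What pre_tags and post_tags do to a matched value can be sketched in a few lines of Java. This is only an approximation under stated assumptions: HighlightSketch and highlight are hypothetical names, and the code wraps plain substring matches, whereas Elasticsearch highlights the analyzed terms that matched the query.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of wrapping matches in custom tags.
class HighlightSketch {

    // Wraps every case-insensitive occurrence of term in text with preTag and postTag.
    static String highlight(String text, String term, String preTag, String postTag) {
        if (term.isEmpty()) {
            return text;
        }
        Matcher m = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE).matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps characters such as $ in the tags literal
            m.appendReplacement(out, Matcher.quoteReplacement(preTag + m.group() + postTag));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("Gorit", "gorit", "<h3>", "</h3>"));
    }
}
```

The original casing of the matched text is preserved inside the tags, just as the response above returns Gorit rather than the lowercased search term.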

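MySQL can do all of the following as well, but for full-text search over large data sets it is less efficient:

  • Matching
  • Conditional matching
  • Exact matching
  • Range matching
  • Field (source) filtering
  • Multi-condition queries
  • Highlighting
  • Inverted index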

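8. Spring Boot integration

> Check the official documentation

> Hands-on tests

  1. Create an index
  2. Check whether an index exists
  3. Delete an index
  4. Create a document
  5. Work with documents
// Maven dependency
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
		</dependency>

// Core code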
package cn.gorit;

import cn.gorit.pojo.User;
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedRequest;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContent;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.MatchAllQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
import org.json.JSONObject;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.http.codec.cbor.Jackson2CborDecoder;

import java.io.IOException;
import java.util.ArrayList;
import java.util.concurrent.TimeUnit;

/**
 * ES 7.6.2 API tests
 */
@SpringBootTest
class DemoApplicationTests {

	// Inject the client by bean name
	@Autowired
	@Qualifier("restHighLevelClient")
	private RestHighLevelClient client;

	@Test
	void contextLoads() {

	}
	// Create an index
	@Test
	void testCreateIndex() throws IOException {
		// 1. Build the create-index request, equivalent to PUT /gorit_index
		CreateIndexRequest request = new CreateIndexRequest("gorit_index");
		// 2. Execute the request through the IndicesClient and read the response
		CreateIndexResponse response = client.indices().create(request, RequestOptions.DEFAULT);
		System.out.println(response);
	}

	// Check whether an index exists
	@Test
	void testGetIndexExist() throws IOException {
		GetIndexRequest request = new GetIndexRequest("gorit_index");
		boolean exist = client.indices().exists(request,RequestOptions.DEFAULT);
		System.out.println(exist);
	}

	// Delete an index
	@Test
	void testDeleteIndex() throws IOException {
		DeleteIndexRequest request = new DeleteIndexRequest("gorit_index");
		// Execute the deletion
		AcknowledgedResponse delete	= client.indices().delete(request,RequestOptions.DEFAULT);
		System.out.println(delete.isAcknowledged());
	}

	// Add a document
	@Test
	void testAddDocument() throws IOException {
		// Create the object to store
		User u = new User("Gorit",3);
		// Build the request
		IndexRequest request = new IndexRequest("gorit_index");

		// Equivalent to PUT /gorit_index/_doc/1
		request.id("1");
		// Timeout of one second; request.timeout("1s") is the equivalent string form
		request.timeout(TimeValue.timeValueSeconds(1));

		// Put the data into the request as JSON
		request.source(JSON.toJSONString(u), XContentType.JSON);
		// Send the request through the client
		IndexResponse response = client.index(request, RequestOptions.DEFAULT);

		System.out.println(response.toString());
		System.out.println(response.status());// status of the operation: CREATED for a new document
	}

	// Check whether a document exists, equivalent to GET /index/_doc/1
	@Test
	void testIsExists() throws IOException {
		GetRequest getRequest = new GetRequest("gorit_index", "1");

		// Do not fetch _source or stored fields; only existence matters here
		getRequest.fetchSourceContext(new FetchSourceContext(false));
		getRequest.storedFields("_none_");

		boolean exists = client.exists(getRequest, RequestOptions.DEFAULT);
		System.out.println(exists);
	}

	// Fetch a document
	@Test
	void testGetDocument() throws IOException {
		GetRequest getRequest = new GetRequest("gorit_index", "1");
		GetResponse getResponse = client.get(getRequest, RequestOptions.DEFAULT);
		// Print the document content (_source)
		System.out.println(getResponse.getSourceAsString());
		System.out.println(getResponse); // the full response, the same as what the console command returns
	}

	// Update a document
	@Test
	void testUpdateDocument() throws IOException {
		UpdateRequest updateRequest = new UpdateRequest("gorit_index", "1");
		updateRequest.timeout("1s");

		User user = new User("CodingGorit", 18);
		updateRequest.doc(JSON.toJSONString(user),XContentType.JSON);

		UpdateResponse updateResponse = client.update(updateRequest, RequestOptions.DEFAULT);
		// Print the response status
		System.out.println(updateResponse.status());
		System.out.println(updateResponse); // the full response, the same as what the console command returns
	}

	// Delete a document
	@Test
	void testDeleteDocument() throws IOException {
		DeleteRequest deleteRequest = new DeleteRequest("gorit_index", "1");
		deleteRequest.timeout("1s");

		DeleteResponse deleteResponse = client.delete(deleteRequest, RequestOptions.DEFAULT);
		// Print the response status
		System.out.println(deleteResponse.status());
		System.out.println(deleteResponse); // the full response, the same as what the console command returns
	}

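	// In real projects: bulk-insert data

	@Test
	void testBulkRequest() throws IOException {
		BulkRequest bulkRequest = new BulkRequest();
		bulkRequest.timeout("10s");

		ArrayList<User> userList = new ArrayList<>();
		userList.add(new User("张三1",1));
		userList.add(new User("张三2",2));
		userList.add(new User("张三3",3));
		userList.add(new User("张三4",4));
		userList.add(new User("张三5",5));
		userList.add(new User("张三6",6));
		userList.add(new User("张三7",7));

		// Add one index request per user to the bulk request
		for (int i = 0; i < userList.size(); i++) {
			bulkRequest.add(
					new IndexRequest("gorit_index")
					.source(JSON.toJSONString(userList.get(i)), XContentType.JSON));
		}
		BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
		System.out.println(bulkResponse.hasFailures()); // false means every item succeeded
	}
}

// Maven dependencies
		<dependency>
			<groupId>org.jsoup</groupId>
			<artifactId>jsoup</artifactId>
			<version>1.10.2</version>
		</dependency>
		<dependency>
			<groupId>com.alibaba</groupId>
			<artifactId>fastjson</artifactId>
			<version>1.2.68</version>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-thymeleaf</artifactId>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-web</artifactId>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-devtools</artifactId>
			<scope>runtime</scope>
			<optional>true</optional>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-configuration-processor</artifactId>
			<optional>true</optional>
		</dependency>
		<dependency>
			<groupId>org.projectlombok</groupId>
			<artifactId>lombok</artifactId>
			<optional>true</optional>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
			<exclusions>
				<exclusion>
					<groupId>org.junit.vintage</groupId>
					<artifactId>junit-vintage-engine</artifactId>
				</exclusion>
			</exclusions>
		</dependency>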

Crawler

Configuration class

package cn.gorit.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * Spring steps
 * 1. Find the object
 * 2. Register it in Spring so it can be used
 * 3. Read the source code
 *
 * @Classname ElasticSearchConfig
 * @Description TODO
 * @Date 2020/10/21 17:20
 * @Created by CodingGorit
 * @Version 1.0
 */
@Configuration // the equivalent of an XML <bean> definition
public class ElasticSearchConfig {

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("localhost", 9200, "http")
                )
        );
        return client;
    }

}

> Scrape JD.com search results

Utility class (HTML parsing)

package cn.gorit.util;

import cn.gorit.pojo.Content;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.stereotype.Component;

import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

/**
 * @Classname HtmlParseUtil
 * @Description TODO
 * @Date 2020/10/21 23:17
 * @Created by CodingGorit
 * @Version 1.0
 */
@Component
public class HtmlParseUtil {

//    public static void main(String[] args) throws Exception {
//        new HtmlParseUtil().parseJD("英语").forEach(System.out::println);
//    }

    public List<Content> parseJD(String keyword) throws Exception {
        // Request URL; "wd" is a placeholder that is replaced with the keyword
        // Requires network access; content loaded via AJAX is not available
        String url = "https://search.jd.com/Search?keyword=wd&enc=utf-8";
        // Fetch and parse the page (returns a Document object)
        Document document = Jsoup.parse(new URL(url.replace("wd",keyword)),30000);
        // Get the element that holds the product list
        Element element = document.getElementById("J_goodsList");
        List<Content> goodsList = new ArrayList<>();
        if (element == null) {
            // The list container is missing, for example because the page layout changed
            return goodsList;
        }
        // Get all li elements
        Elements elements = element.getElementsByTag("li");
        // Extract the content of each element
        for (Element e: elements) {
            String img = e.getElementsByTag("img").eq(0).attr("data-lazy-img");
            String price = e.getElementsByClass("p-price").eq(0).text();
            String title = e.getElementsByClass("p-name").eq(0).text();

            goodsList.add(new Content(title,img,price));
//            System.out.println(img);
//            System.out.println(price);
//            System.out.println(title);
        }
        return goodsList;
    }
}

Service class

package cn.gorit.service;

import cn.gorit.pojo.Content;
import cn.gorit.util.HtmlParseUtil;
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/**
 * @Classname ContentService
 * @Description TODO
 * @Date 2020/10/22 18:44
 * @Created by CodingGorit
 * @Version 1.0
 */
@Service
public class ContentService {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    // Cannot be run directly like this: restHighLevelClient is injected by the Spring container and is null on an instance created with new
    public static void main(String[] args) throws Exception {
        new ContentService().parseContent("java");
    }

    // 1. Parse the data and put it into an ES index
    public Boolean parseContent (String keywords) throws Exception {
        // Fetch the list of scraped items
        List<Content> contents = new HtmlParseUtil().parseJD(keywords);
        // Put the scraped data into ES
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("2m");

        for (int i = 0; i < contents.size(); ++i) {
            bulkRequest.add(
                    new IndexRequest("jd_goods")
                    .source(JSON.toJSONString(contents.get(i)),XContentType.JSON));
        }
        BulkResponse bulkResponse = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        return !bulkResponse.hasFailures();
    }

    // 2. Read the data back: basic search with highlighting
    public List<Map<String, Object>> searchPagehighLight (String keyword, int pageNo, int pageSize) throws IOException {
        if (pageNo <= 1)
            pageNo = 1;

        // Conditional search
        SearchRequest searchRequest = new SearchRequest("jd_goods");

        SearchSourceBuilder builder = new SearchSourceBuilder();

        // from is a 0-based document offset, so convert the 1-based page number
        builder.from((pageNo - 1) * pageSize);
        builder.size(pageSize);
        // Exact match (term query)
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title",keyword);
        builder.query(termQueryBuilder);
        builder.timeout(new TimeValue(60, TimeUnit.SECONDS));

        // Highlighting
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("title");
        highlightBuilder.requireFieldMatch(false);
        highlightBuilder.preTags("<span style='color:#FF0000;'>");
        highlightBuilder.postTags("</span>");
        builder.highlighter(highlightBuilder);

        // Execute the search
        searchRequest.source(builder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        // Parse the result
        ArrayList<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit hit: searchResponse.getHits().getHits()) {
            // Read the highlighted fields
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            HighlightField title = highlightFields.get("title");
            Map<String, Object> sourceAsMap = hit.getSourceAsMap();// the original source
            // Replace the original title with its highlighted version
            if (title != null) {
                Text[] fragments = title.fragments();
                StringBuilder nTitle = new StringBuilder();
                for (Text text:fragments) {
                    nTitle.append(text);
                }
                sourceAsMap.put("title",nTitle.toString());
            }
            list.add(sourceAsMap); // source map whose title now carries the highlight tags
        }
        return list;
    }

    // 3. Read the data back: basic search without highlighting
    public List<Map<String, Object>> searchPage (String keyword, int pageNo, int pageSize) throws IOException {
        if (pageNo <= 1)
            pageNo = 1;

        // Conditional search
        SearchRequest searchRequest = new SearchRequest("jd_goods");

        SearchSourceBuilder builder = new SearchSourceBuilder();

        // from is a 0-based document offset, so convert the 1-based page number
        builder.from((pageNo - 1) * pageSize);
        builder.size(pageSize);
        // Exact match (term query)
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title",keyword);
        builder.query(termQueryBuilder);
        builder.timeout(new TimeValue(60, TimeUnit.SECONDS));


        // Execute the search
        searchRequest.source(builder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        // Parse the result
        ArrayList<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit hit: searchResponse.getHits().getHits()) {

            list.add(hit.getSourceAsMap()); // the unmodified source of each hit
        }
        return list;
    }
}

Controller

package cn.gorit.controller;

import cn.gorit.pojo.Content;
import cn.gorit.service.ContentService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.bind.annotation.RestControllerAdvice;

import java.io.IOException;
import java.util.List;
import java.util.Map;

/**
 * @Classname ContentController
 * @Description TODO
 * @Date 2020/10/22 18:45
 * @Created by CodingGorit
 * @Version 1.0
 */
@RestController
public class ContentController {

    @Autowired
    private ContentService service;

    /**
     * Adds the scraped data to ES
     * @param keyword search keyword to scrape
     * @return true if the bulk request completed without failures
     * @throws Exception
     */
    @GetMapping("/parse/{keyword}")
    public Boolean parse(@PathVariable("keyword")  String keyword) throws Exception {
        return service.parseContent(keyword);
    }

    /**
     * Queries the data stored in ES
     * @param keyword search keyword
     * @param pageNo page number
     * @param pageSize number of hits per page
     * @return the matching documents
     * @throws IOException
     */
    @GetMapping("/search/{keyword}/{pageNo}/{pageSize}")
    public List<Map<String, Object>> search(@PathVariable("keyword") String keyword,@PathVariable("pageNo") int pageNo, @PathVariable("pageSize") int pageSize) throws IOException {
        if (pageNo == 0) {
            pageNo = 1;
        }
        return service.searchPage(keyword, pageNo, pageSize);
    }
}

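Front end and back end separated

Testing with Postman

Search highlighting

> One project, usable from multiple clients

10. Summary

  1. Basic use of ElasticSearch
  2. Integrating ES with Spring Boot
  3. A hands-on search project

> My open-source project (Coding-With-Java); stars are welcome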
