Elasticsearch Search Engine

Using the Elasticsearch search engine

Environment

  1. Java
  2. Elasticsearch
  3. Kibana
  4. MySQL

Start Using

You can download the latest version of Elasticsearch from the official site at elastic.co/downloads/elasticsearch.

```shell
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.6.tar.gz
tar -xvf elasticsearch-5.6.6.tar.gz
cd elasticsearch-5.6.6/bin
./elasticsearch
```

==It is worth reading through the configuration file elasticsearch.yml carefully==

Output on a successful start:

```
./elasticsearch
[2018-01-27T09:44:57,842][INFO ][o.e.n.Node ] [] initializing ...
[2018-01-27T09:44:57,960][INFO ][o.e.e.NodeEnvironment ] [En-tHLt] using [1] data paths, mounts [[/ (/dev/disk1)]], net usable_space [167gb], net total_space [232.5gb], types [hfs]
[2018-01-27T09:44:57,960][INFO ][o.e.e.NodeEnvironment ] [En-tHLt] heap size [990.7mb], compressed ordinary object pointers [true]
[2018-01-27T09:44:57,990][INFO ][o.e.n.Node ] node name [En-tHLt] derived from node ID [En-tHLt0TlSTPXD9baXjbQ]; set [node.name] to override
[2018-01-27T09:44:57,990][INFO ][o.e.n.Node ] version[6.1.2], pid[1368], build[5b1fea5/2018-01-10T02:35:59.208Z], OS[Mac OS X/10.12.6/x86_64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_121/25.121-b13]
[2018-01-27T09:44:57,990][INFO ][o.e.n.Node ] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/Users/bighandsome/Documents/elasticsearch, -Des.path.conf=/Users/bighandsome/Documents/elasticsearch/config]
[2018-01-27T09:44:58,925][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [aggs-matrix-stats]
[2018-01-27T09:44:58,925][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [analysis-common]
[2018-01-27T09:44:58,925][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [ingest-common]
[2018-01-27T09:44:58,925][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [lang-expression]
[2018-01-27T09:44:58,925][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [lang-mustache]
[2018-01-27T09:44:58,925][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [lang-painless]
[2018-01-27T09:44:58,925][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [mapper-extras]
[2018-01-27T09:44:58,925][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [parent-join]
[2018-01-27T09:44:58,925][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [percolator]
[2018-01-27T09:44:58,926][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [reindex]
[2018-01-27T09:44:58,926][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [repository-url]
[2018-01-27T09:44:58,926][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [transport-netty4]
[2018-01-27T09:44:58,926][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [tribe]
[2018-01-27T09:44:58,927][INFO ][o.e.p.PluginsService ] [En-tHLt] no plugins loaded
[2018-01-27T09:45:00,635][INFO ][o.e.d.DiscoveryModule ] [En-tHLt] using discovery type [zen]
[2018-01-27T09:45:01,182][INFO ][o.e.n.Node ] initialized
[2018-01-27T09:45:01,182][INFO ][o.e.n.Node ] [En-tHLt] starting ...
[2018-01-27T09:45:01,428][INFO ][o.e.t.TransportService ] [En-tHLt] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[2018-01-27T09:45:04,501][INFO ][o.e.c.s.MasterService ] [En-tHLt] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {En-tHLt}{En-tHLt0TlSTPXD9baXjbQ}{SE6_wSZkTLmGvTIQpMHXTQ}{127.0.0.1}{127.0.0.1:9300}
[2018-01-27T09:45:04,506][INFO ][o.e.c.s.ClusterApplierService] [En-tHLt] new_master {En-tHLt}{En-tHLt0TlSTPXD9baXjbQ}{SE6_wSZkTLmGvTIQpMHXTQ}{127.0.0.1}{127.0.0.1:9300}, reason: apply cluster state (from master [master {En-tHLt}{En-tHLt0TlSTPXD9baXjbQ}{SE6_wSZkTLmGvTIQpMHXTQ}{127.0.0.1}{127.0.0.1:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[2018-01-27T09:45:04,530][INFO ][o.e.h.n.Netty4HttpServerTransport] [En-tHLt] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[2018-01-27T09:45:04,530][INFO ][o.e.n.Node ] [En-tHLt] started
[2018-01-27T09:45:04,718][INFO ][o.e.g.GatewayService ] [En-tHLt] recovered [3] indices into cluster_state
[2018-01-27T09:45:05,149][INFO ][o.e.c.r.a.AllocationService] [En-tHLt] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[megacorp][2]] ...]).
```

Next, start Kibana:

```shell
cd Kibana/bin
./kibana
```

Output on a successful start:

```
./kibana
log [01:47:19.506] [info][status][plugin:kibana@6.1.2] Status changed from uninitialized to green - Ready
log [01:47:19.561] [info][status][plugin:elasticsearch@6.1.2] Status changed from uninitialized to yellow - Waiting for Elasticsearch
log [01:47:19.594] [info][status][plugin:console@6.1.2] Status changed from uninitialized to green - Ready
log [01:47:19.613] [info][status][plugin:metrics@6.1.2] Status changed from uninitialized to green - Ready
log [01:47:19.922] [info][status][plugin:timelion@6.1.2] Status changed from uninitialized to green - Ready
log [01:47:19.926] [info][listening] Server running at http://localhost:5601
log [01:47:19.949] [info][status][plugin:elasticsearch@6.1.2] Status changed from yellow to green - Ready
```

The port number is configurable, of course:

```shell
cat config/kibana.yml
```

At this point, a single-node Elasticsearch is ready to use. Strictly speaking, it was already usable the moment Elasticsearch started; Kibana is just a companion tool that makes it easier to manage.

Basic Operations

Elasticsearch has client support for all the mainstream languages; here we will use the language-agnostic RESTful interface.

Every other language can communicate with Elasticsearch over port 9200 using its RESTful API, so you can talk to it from your favorite web client. In fact, as you will see, you can even interact with Elasticsearch using the curl command.

Adding a Document

```
PUT /megacorp/employee/1
{
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}
```

Note that the path /megacorp/employee/1 contains three pieces of information:

  1. megacorp: the index name
  2. employee: the type name
  3. 1: the ID of this particular employee
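Those three components simply concatenate into the request path. A minimal sketch (the helper name `doc_path` is just for illustration):

```python
def doc_path(index: str, doc_type: str, doc_id: int) -> str:
    """Build the /{index}/{type}/{id} document path used by the REST API."""
    return f"/{index}/{doc_type}/{doc_id}"

print(doc_path("megacorp", "employee", 1))  # → /megacorp/employee/1
```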

Add two more:

```
PUT /megacorp/employee/2
{
    "first_name" : "Jane",
    "last_name" : "Smith",
    "age" : 32,
    "about" : "I like to collect rock albums",
    "interests": [ "music" ]
}

PUT /megacorp/employee/3
{
    "first_name" : "Douglas",
    "last_name" : "Fir",
    "age" : 35,
    "about": "I like to build cabinets",
    "interests": [ "forestry" ]
}
```
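Instead of one PUT per document, several documents can also be sent in a single request to the _bulk endpoint, whose body is newline-delimited JSON: an action line followed by the document source, one pair per document. A sketch of building such a body (the helper name is illustrative):

```python
import json

def bulk_index_body(index, doc_type, docs):
    """Serialize (id, source) pairs into the newline-delimited _bulk
    format: an action line, then the document source, per document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps(
            {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # A _bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"

body = bulk_index_body("megacorp", "employee", [
    (2, {"first_name": "Jane", "last_name": "Smith", "age": 32}),
    (3, {"first_name": "Douglas", "last_name": "Fir", "age": 35}),
])
```

The resulting string would be POSTed to /_bulk with a Content-Type of application/x-ndjson.
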

Retrieving a Document

```
GET /megacorp/employee/1
```

Lightweight Search

```
GET /megacorp/employee/_search
```

Result:

```
{
    "took": 6,
    "timed_out": false,
    "_shards": { ... },
    "hits": {
        "total": 3,
        "max_score": 1,
        "hits": [
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "3",
                "_score": 1,
                "_source": {
                    "first_name": "Douglas",
                    "last_name": "Fir",
                    "age": 35,
                    "about": "I like to build cabinets",
                    "interests": [ "forestry" ]
                }
            },
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "1",
                "_score": 1,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [ "sports", "music" ]
                }
            },
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "2",
                "_score": 1,
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [ "music" ]
                }
            }
        ]
    }
}
```
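A client usually only cares about hits.hits and each hit's _source. A sketch of pulling the names back out of a trimmed-down response like the one above:

```python
# A trimmed version of the _search response shown above.
response = {
    "took": 6,
    "timed_out": False,
    "hits": {
        "total": 3,
        "hits": [
            {"_id": "3", "_source": {"first_name": "Douglas", "last_name": "Fir"}},
            {"_id": "1", "_source": {"first_name": "John", "last_name": "Smith"}},
            {"_id": "2", "_source": {"first_name": "Jane", "last_name": "Smith"}},
        ],
    },
}

# Walk hits.hits and read each document back out of _source.
names = [hit["_source"]["first_name"] for hit in response["hits"]["hits"]]
print(names)  # → ['Douglas', 'John', 'Jane']
```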

Match Query

```
GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "last_name" : "Smith"
        }
    }
}
```

Full-Text Search

```
GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "about" : "rock climbing"
        }
    }
}
```

Phrase Search

Finding individual words in a field is all well and good, but sometimes you want to match an exact sequence of words, that is, a phrase. For example, we want a query that matches only those employee records containing both "rock" and "climbing", with the two words adjacent as the phrase "rock climbing".

```
GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    }
}
```
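To build intuition for the difference, here is a toy sketch (nothing like Elasticsearch's real inverted-index implementation) of term matching versus contiguous-phrase matching:

```python
def match_any(text: str, query: str) -> bool:
    """match-style: true if any query term appears in the text."""
    terms = text.lower().split()
    return any(t in terms for t in query.lower().split())

def match_phrase(text: str, query: str) -> bool:
    """match_phrase-style: true only if the query terms appear
    contiguously and in order."""
    terms = text.lower().split()
    q = query.lower().split()
    return any(terms[i:i + len(q)] == q
               for i in range(len(terms) - len(q) + 1))

about = "I like to collect rock albums"
print(match_any(about, "rock climbing"))     # → True ("rock" matches)
print(match_phrase(about, "rock climbing"))  # → False (not contiguous)
```

This is why the match query returns Jane's record for "rock climbing" while match_phrase returns only John's.
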

Other Features

For the rest of the syntax and features, see "Elasticsearch: The Definitive Guide".

Importing from a Database

What should we do if the data we want to search lives in a database?

  1. Configure MySQL
  2. Use a sync tool

Configure MySQL

My my.cnf:
```ini
# Example MySQL config file for medium systems.
#
# This is for a system with little memory (32M - 64M) where MySQL plays
# an important part, or systems up to 128M where MySQL is used together with
# other programs (such as a web server)
#
# MySQL programs look for option files in a set of
# locations which depend on the deployment platform.
# You can copy this option file to one of those
# locations. For information about these locations, see:
# http://dev.mysql.com/doc/mysql/en/option-files.html
#
# In this file, you can use all long options that a program supports.
# If you want to know which options a program supports, run the program
# with the "--help" option.

# The following options will be passed to all MySQL clients
[client]
default-character-set=utf8
#password = your_password
port = 3306
socket = /tmp/mysql.sock

# Here follows entries for some specific programs

# The MySQL server
[mysqld]
character-set-server=utf8
init_connect='SET NAMES utf8'
port = 3306
socket = /tmp/mysql.sock
skip-external-locking
key_buffer_size = 16M
max_allowed_packet = 1M
table_open_cache = 64
sort_buffer_size = 512K
net_buffer_length = 8K
read_buffer_size = 256K
read_rnd_buffer_size = 512K
myisam_sort_buffer_size = 8M

# Don't listen on a TCP/IP port at all. This can be a security enhancement,
# if all processes that need to connect to mysqld run on the same host.
# All interaction with mysqld must be made via Unix sockets or named pipes.
# Note that using this option without enabling named pipes on Windows
# (via the "enable-named-pipe" option) will render mysqld useless!
#
#skip-networking

# Replication Master Server (default)
# binary logging is required for replication
log-bin=mysql-bin

# binary logging format - mixed recommended
binlog_format=ROW

# required unique id between 1 and 2^32 - 1
# defaults to 1 if master-host is not set
# but will not function as a master if omitted
server-id = 1

# Replication Slave (comment out master section to use this)
#
# To configure this host as a replication slave, you can choose between
# two methods :
#
# 1) Use the CHANGE MASTER TO command (fully described in our manual) -
# the syntax is:
#
# CHANGE MASTER TO MASTER_HOST=<host>, MASTER_PORT=<port>,
# MASTER_USER=<user>, MASTER_PASSWORD=<password> ;
#
# where you replace <host>, <user>, <password> by quoted strings and
# <port> by the master's port number (3306 by default).
#
# Example:
#
# CHANGE MASTER TO MASTER_HOST='125.564.12.1', MASTER_PORT=3306,
# MASTER_USER='joe', MASTER_PASSWORD='secret';
#
# OR
#
# 2) Set the variables below. However, in case you choose this method, then
# start replication for the first time (even unsuccessfully, for example
# if you mistyped the password in master-password and the slave fails to
# connect), the slave will create a master.info file, and any later
# change in this file to the variables' values below will be ignored and
# overridden by the content of the master.info file, unless you shutdown
# the slave server, delete master.info and restart the slaver server.
# For that reason, you may want to leave the lines below untouched
# (commented) and instead use CHANGE MASTER TO (see above)
#
# required unique id between 2 and 2^32 - 1
# (and different from the master)
# defaults to 2 if master-host is set
# but will not function as a slave if omitted
#server-id = 2
#
# The replication master for this slave - required
#master-host = <hostname>
#
# The username the slave will use for authentication when connecting
# to the master - required
#master-user = <username>
#
# The password the slave will authenticate with when connecting to
# the master - required
#master-password = <password>
#
# The port the master is listening on.
# optional - defaults to 3306
#master-port = <port>
#
# binary logging - not required for slaves, but recommended
#log-bin=mysql-bin

# Uncomment the following if you are using InnoDB tables
#innodb_data_home_dir = /usr/local/mysql/data
#innodb_data_file_path = ibdata1:10M:autoextend
#innodb_log_group_home_dir = /usr/local/mysql/data
# You can set .._buffer_pool_size up to 50 - 80 %
# of RAM but beware of setting memory usage too high
#innodb_buffer_pool_size = 16M
#innodb_additional_mem_pool_size = 2M
# Set .._log_file_size to 25 % of buffer pool size
#innodb_log_file_size = 5M
#innodb_log_buffer_size = 8M
#innodb_flush_log_at_trx_commit = 1
#innodb_lock_wait_timeout = 50

[mysqldump]
quick
max_allowed_packet = 16M

[mysql]
no-auto-rehash
# Remove the next comment character if you are not familiar with SQL
#safe-updates
default-character-set=utf8

[myisamchk]
key_buffer_size = 20M
sort_buffer_size = 20M
read_buffer = 2M
write_buffer = 2M

[mysqlhotcopy]
interactive-timeout
```

Tools

  1. go get github.com/siddontang/go-mysql-elasticsearch
  2. cd $GOPATH/src/github.com/siddontang/go-mysql-elasticsearch
  3. make
  4. Start ./bin/go-mysql-elasticsearch -config=./etc/river.toml and enjoy it.

The Configuration File

```toml
# MySQL address, user and password
# user must have replication privilege in MySQL.
my_addr = "127.0.0.1:3306"
my_user = "root"
my_pass = "root"

# Elasticsearch address
es_addr = "127.0.0.1:9200"

# Path to store data, like master.info, and dump MySQL data
data_dir = "./var"

# Inner Http status address
stat_addr = "127.0.0.1:12800"

# pseudo server id like a slave
server_id = 1001

# mysql or mariadb
flavor = "mysql"

# mysqldump execution path
mysqldump = "mysqldump"

# MySQL data source
[[source]]
schema = "bysj"
# Only below tables will be synced into Elasticsearch.
# "test_river_[0-9]{4}" is a wildcard table format, you can use it if you have many sub tables, like table_0000 - table_1023
# I don't think it is necessary to sync all tables in a database.
tables = ["goods"]

# Below is for special rule mapping
[[rule]]
schema = "bysj"
table = "goods"
index = "goods_name"
```
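Conceptually, the tool reads MySQL rows (first via mysqldump, then from the binlog) and writes each one into the Elasticsearch index named by the matching [[rule]] block. A toy sketch of that routing step (not the tool's actual code; the function name is illustrative):

```python
def route_row(rules, schema, table, row):
    """Return (index, doc) for a row according to the first matching rule,
    mirroring the [[rule]] blocks in river.toml; None if no rule matches."""
    for rule in rules:
        if rule["schema"] == schema and rule["table"] == table:
            return rule["index"], dict(row)
    return None

# The single rule from the river.toml above.
rules = [{"schema": "bysj", "table": "goods", "index": "goods_name"}]

index, doc = route_row(rules, "bysj", "goods", {"id": 1, "name": "phone"})
print(index)  # → goods_name
```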

And that's about it.

See you next time~