Elasticsearch Search Engine

Using the Elasticsearch search engine

Environment

  1. Java
  2. Elasticsearch
  3. Kibana
  4. MySQL

Start Using

You can download the latest version of Elasticsearch from the official site at elastic.co/downloads/elasticsearch.

```shell
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.6.tar.gz
tar -xvf elasticsearch-5.6.6.tar.gz
cd elasticsearch-5.6.6/bin
./elasticsearch
```

==It is worth reading through the configuration file elasticsearch.yml carefully==

Output on a successful start:

```
./elasticsearch
[2018-01-27T09:44:57,842][INFO ][o.e.n.Node ] [] initializing ...
[2018-01-27T09:44:57,960][INFO ][o.e.e.NodeEnvironment ] [En-tHLt] using [1] data paths, mounts [[/ (/dev/disk1)]], net usable_space [167gb], net total_space [232.5gb], types [hfs]
[2018-01-27T09:44:57,960][INFO ][o.e.e.NodeEnvironment ] [En-tHLt] heap size [990.7mb], compressed ordinary object pointers [true]
[2018-01-27T09:44:57,990][INFO ][o.e.n.Node ] node name [En-tHLt] derived from node ID [En-tHLt0TlSTPXD9baXjbQ]; set [node.name] to override
[2018-01-27T09:44:57,990][INFO ][o.e.n.Node ] version[6.1.2], pid[1368], build[5b1fea5/2018-01-10T02:35:59.208Z], OS[Mac OS X/10.12.6/x86_64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_121/25.121-b13]
[2018-01-27T09:44:57,990][INFO ][o.e.n.Node ] JVM arguments [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/Users/bighandsome/Documents/elasticsearch, -Des.path.conf=/Users/bighandsome/Documents/elasticsearch/config]
[2018-01-27T09:44:58,925][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [aggs-matrix-stats]
[2018-01-27T09:44:58,925][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [analysis-common]
[2018-01-27T09:44:58,925][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [ingest-common]
[2018-01-27T09:44:58,925][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [lang-expression]
[2018-01-27T09:44:58,925][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [lang-mustache]
[2018-01-27T09:44:58,925][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [lang-painless]
[2018-01-27T09:44:58,925][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [mapper-extras]
[2018-01-27T09:44:58,925][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [parent-join]
[2018-01-27T09:44:58,925][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [percolator]
[2018-01-27T09:44:58,926][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [reindex]
[2018-01-27T09:44:58,926][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [repository-url]
[2018-01-27T09:44:58,926][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [transport-netty4]
[2018-01-27T09:44:58,926][INFO ][o.e.p.PluginsService ] [En-tHLt] loaded module [tribe]
[2018-01-27T09:44:58,927][INFO ][o.e.p.PluginsService ] [En-tHLt] no plugins loaded
[2018-01-27T09:45:00,635][INFO ][o.e.d.DiscoveryModule ] [En-tHLt] using discovery type [zen]
[2018-01-27T09:45:01,182][INFO ][o.e.n.Node ] initialized
[2018-01-27T09:45:01,182][INFO ][o.e.n.Node ] [En-tHLt] starting ...
[2018-01-27T09:45:01,428][INFO ][o.e.t.TransportService ] [En-tHLt] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[2018-01-27T09:45:04,501][INFO ][o.e.c.s.MasterService ] [En-tHLt] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {En-tHLt}{En-tHLt0TlSTPXD9baXjbQ}{SE6_wSZkTLmGvTIQpMHXTQ}{127.0.0.1}{127.0.0.1:9300}
[2018-01-27T09:45:04,506][INFO ][o.e.c.s.ClusterApplierService] [En-tHLt] new_master {En-tHLt}{En-tHLt0TlSTPXD9baXjbQ}{SE6_wSZkTLmGvTIQpMHXTQ}{127.0.0.1}{127.0.0.1:9300}, reason: apply cluster state (from master [master {En-tHLt}{En-tHLt0TlSTPXD9baXjbQ}{SE6_wSZkTLmGvTIQpMHXTQ}{127.0.0.1}{127.0.0.1:9300} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[2018-01-27T09:45:04,530][INFO ][o.e.h.n.Netty4HttpServerTransport] [En-tHLt] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[2018-01-27T09:45:04,530][INFO ][o.e.n.Node ] [En-tHLt] started
[2018-01-27T09:45:04,718][INFO ][o.e.g.GatewayService ] [En-tHLt] recovered [3] indices into cluster_state
[2018-01-27T09:45:05,149][INFO ][o.e.c.r.a.AllocationService] [En-tHLt] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[megacorp][2]] ...]).
```

Next, start Kibana:

```shell
cd Kibana/bin
./kibana
```

Output on a successful start:

```
./kibana
log [01:47:19.506] [info][status][plugin:kibana@6.1.2] Status changed from uninitialized to green - Ready
log [01:47:19.561] [info][status][plugin:elasticsearch@6.1.2] Status changed from uninitialized to yellow - Waiting for Elasticsearch
log [01:47:19.594] [info][status][plugin:console@6.1.2] Status changed from uninitialized to green - Ready
log [01:47:19.613] [info][status][plugin:metrics@6.1.2] Status changed from uninitialized to green - Ready
log [01:47:19.922] [info][status][plugin:timelion@6.1.2] Status changed from uninitialized to green - Ready
log [01:47:19.926] [info][listening] Server running at http://localhost:5601
log [01:47:19.949] [info][status][plugin:elasticsearch@6.1.2] Status changed from yellow to green - Ready
```

The port number is configurable, of course:

```shell
cat config/kibana.yml
```

At this point, a single-node Elasticsearch is ready to use. Strictly speaking, it was already usable the moment Elasticsearch started; Kibana is just a companion tool that makes it easier to manage.

Basic Operations

Elasticsearch has client support for all the mainstream languages; here we will use the language-agnostic RESTful interface.

Every other language can communicate with Elasticsearch over port 9200 using its RESTful API, so you can talk to it from your favorite web client. In fact, as you will see, you can even interact with Elasticsearch using the curl command.

Adding a Document

```
PUT /megacorp/employee/1
{
    "first_name" : "John",
    "last_name" : "Smith",
    "age" : 25,
    "about" : "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}
```

Note that the path /megacorp/employee/1 contains three pieces of information:

  1. megacorp: the index name
  2. employee: the type name
  3. 1: the ID of this particular employee
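Those three components simply concatenate into the request path. A minimal sketch (the helper name `doc_path` is just for illustration):

```python
def doc_path(index: str, doc_type: str, doc_id: int) -> str:
    """Build the /{index}/{type}/{id} document path used by the REST API."""
    return f"/{index}/{doc_type}/{doc_id}"

print(doc_path("megacorp", "employee", 1))  # → /megacorp/employee/1
```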

Add two more:

```
PUT /megacorp/employee/2
{
    "first_name" : "Jane",
    "last_name" : "Smith",
    "age" : 32,
    "about" : "I like to collect rock albums",
    "interests": [ "music" ]
}

PUT /megacorp/employee/3
{
    "first_name" : "Douglas",
    "last_name" : "Fir",
    "age" : 35,
    "about": "I like to build cabinets",
    "interests": [ "forestry" ]
}
```
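Instead of one PUT per document, several documents can also be sent in a single request to the _bulk endpoint, whose body is newline-delimited JSON: an action line followed by the document source, one pair per document. A sketch of building such a body (the helper name is illustrative):

```python
import json

def bulk_index_body(index, doc_type, docs):
    """Serialize (id, source) pairs into the newline-delimited _bulk
    format: an action line, then the document source, per document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps(
            {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # A _bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"

body = bulk_index_body("megacorp", "employee", [
    (2, {"first_name": "Jane", "last_name": "Smith", "age": 32}),
    (3, {"first_name": "Douglas", "last_name": "Fir", "age": 35}),
])
```

The resulting string would be POSTed to /_bulk with a Content-Type of application/x-ndjson.
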

Retrieving a Document

```
GET /megacorp/employee/1
```

Lightweight Search

```
GET /megacorp/employee/_search
```

Result:

```
{
    "took": 6,
    "timed_out": false,
    "_shards": { ... },
    "hits": {
        "total": 3,
        "max_score": 1,
        "hits": [
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "3",
                "_score": 1,
                "_source": {
                    "first_name": "Douglas",
                    "last_name": "Fir",
                    "age": 35,
                    "about": "I like to build cabinets",
                    "interests": [ "forestry" ]
                }
            },
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "1",
                "_score": 1,
                "_source": {
                    "first_name": "John",
                    "last_name": "Smith",
                    "age": 25,
                    "about": "I love to go rock climbing",
                    "interests": [ "sports", "music" ]
                }
            },
            {
                "_index": "megacorp",
                "_type": "employee",
                "_id": "2",
                "_score": 1,
                "_source": {
                    "first_name": "Jane",
                    "last_name": "Smith",
                    "age": 32,
                    "about": "I like to collect rock albums",
                    "interests": [ "music" ]
                }
            }
        ]
    }
}
```
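A client usually only cares about hits.hits and each hit's _source. A sketch of pulling the names back out of a trimmed-down response like the one above:

```python
# A trimmed version of the _search response shown above.
response = {
    "took": 6,
    "timed_out": False,
    "hits": {
        "total": 3,
        "hits": [
            {"_id": "3", "_source": {"first_name": "Douglas", "last_name": "Fir"}},
            {"_id": "1", "_source": {"first_name": "John", "last_name": "Smith"}},
            {"_id": "2", "_source": {"first_name": "Jane", "last_name": "Smith"}},
        ],
    },
}

# Walk hits.hits and read each document back out of _source.
names = [hit["_source"]["first_name"] for hit in response["hits"]["hits"]]
print(names)  # → ['Douglas', 'John', 'Jane']
```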

Match Query

```
GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "last_name" : "Smith"
        }
    }
}
```

Full-Text Search

```
GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "about" : "rock climbing"
        }
    }
}
```

Phrase Search

Finding individual words in a field is all well and good, but sometimes you want to match an exact sequence of words, that is, a phrase. For example, we want a query that matches only those employee records containing both "rock" and "climbing", with the two words adjacent as the phrase "rock climbing".

```
GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    }
}
```
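To build intuition for the difference, here is a toy sketch (nothing like Elasticsearch's real inverted-index implementation) of term matching versus contiguous-phrase matching:

```python
def match_any(text: str, query: str) -> bool:
    """match-style: true if any query term appears in the text."""
    terms = text.lower().split()
    return any(t in terms for t in query.lower().split())

def match_phrase(text: str, query: str) -> bool:
    """match_phrase-style: true only if the query terms appear
    contiguously and in order."""
    terms = text.lower().split()
    q = query.lower().split()
    return any(terms[i:i + len(q)] == q
               for i in range(len(terms) - len(q) + 1))

about = "I like to collect rock albums"
print(match_any(about, "rock climbing"))     # → True ("rock" matches)
print(match_phrase(about, "rock climbing"))  # → False (not contiguous)
```

This is why the match query returns Jane's record for "rock climbing" while match_phrase returns only John's.
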

Other Features

For the rest of the syntax and features, see "Elasticsearch: The Definitive Guide".

Importing from a Database

What should we do if the data we want to search lives in a database?

  1. Configure MySQL
  2. Use a sync tool

Configure MySQL

My my.cnf:
```ini
# Example MySQL config file for medium systems.
#
# This is for a system with little memory (32M - 64M) where MySQL plays
# an important part, or systems up to 128M where MySQL is used together with
# other programs (such as a web server)
#
# MySQL programs look for option files in a set of
# locations which depend on the deployment platform.
# You can copy this option file to one of those
# locations. For information about these locations, see:
# http://dev.mysql.com/doc/mysql/en/option-files.html
#
# In this file, you can use all long options that a program supports.
# If you want to know which options a program supports, run the program
# with the "--help" option.

# The following options will be passed to all MySQL clients
[client]
default-character-set=utf8
#password = your_password
port = 3306
socket = /tmp/mysql.sock

# Here follows entries for some specific programs

# The MySQL server
[mysqld]
character-set-server=utf8
init_connect='SET NAMES utf8'
port = 3306
socket = /tmp/mysql.sock
skip-external-locking
key_buffer_size = 16M
max_allowed_packet = 1M
table_open_cache = 64
sort_buffer_size = 512K
net_buffer_length = 8K
read_buffer_size = 256K
read_rnd_buffer_size = 512K
myisam_sort_buffer_size = 8M

# Don't listen on a TCP/IP port at all. This can be a security enhancement,
# if all processes that need to connect to mysqld run on the same host.
# All interaction with mysqld must be made via Unix sockets or named pipes.
# Note that using this option without enabling named pipes on Windows
# (via the "enable-named-pipe" option) will render mysqld useless!
#
#skip-networking

# Replication Master Server (default)
# binary logging is required for replication
log-bin=mysql-bin

# binary logging format - mixed recommended
binlog_format=ROW

# required unique id between 1 and 2^32 - 1
# defaults to 1 if master-host is not set
# but will not function as a master if omitted
server-id = 1

# Replication Slave (comment out master section to use this)
#
# To configure this host as a replication slave, you can choose between
# two methods :
#
# 1) Use the CHANGE MASTER TO command (fully described in our manual) -
# the syntax is:
#
# CHANGE MASTER TO MASTER_HOST=<host>, MASTER_PORT=<port>,
# MASTER_USER=<user>, MASTER_PASSWORD=<password> ;
#
# where you replace <host>, <user>, <password> by quoted strings and
# <port> by the master's port number (3306 by default).
#
# Example:
#
# CHANGE MASTER TO MASTER_HOST='125.564.12.1', MASTER_PORT=3306,
# MASTER_USER='joe', MASTER_PASSWORD='secret';
#
# OR
#
# 2) Set the variables below. However, in case you choose this method, then
# start replication for the first time (even unsuccessfully, for example
# if you mistyped the password in master-password and the slave fails to
# connect), the slave will create a master.info file, and any later
# change in this file to the variables' values below will be ignored and
# overridden by the content of the master.info file, unless you shutdown
# the slave server, delete master.info and restart the slaver server.
# For that reason, you may want to leave the lines below untouched
# (commented) and instead use CHANGE MASTER TO (see above)
#
# required unique id between 2 and 2^32 - 1
# (and different from the master)
# defaults to 2 if master-host is set
# but will not function as a slave if omitted
#server-id = 2
#
# The replication master for this slave - required
#master-host = <hostname>
#
# The username the slave will use for authentication when connecting
# to the master - required
#master-user = <username>
#
# The password the slave will authenticate with when connecting to
# the master - required
#master-password = <password>
#
# The port the master is listening on.
# optional - defaults to 3306
#master-port = <port>
#
# binary logging - not required for slaves, but recommended
#log-bin=mysql-bin

# Uncomment the following if you are using InnoDB tables
#innodb_data_home_dir = /usr/local/mysql/data
#innodb_data_file_path = ibdata1:10M:autoextend
#innodb_log_group_home_dir = /usr/local/mysql/data
# You can set .._buffer_pool_size up to 50 - 80 %
# of RAM but beware of setting memory usage too high
#innodb_buffer_pool_size = 16M
#innodb_additional_mem_pool_size = 2M
# Set .._log_file_size to 25 % of buffer pool size
#innodb_log_file_size = 5M
#innodb_log_buffer_size = 8M
#innodb_flush_log_at_trx_commit = 1
#innodb_lock_wait_timeout = 50

[mysqldump]
quick
max_allowed_packet = 16M

[mysql]
no-auto-rehash
# Remove the next comment character if you are not familiar with SQL
#safe-updates
default-character-set=utf8

[myisamchk]
key_buffer_size = 20M
sort_buffer_size = 20M
read_buffer = 2M
write_buffer = 2M

[mysqlhotcopy]
interactive-timeout
```

Tools

  1. go get github.com/siddontang/go-mysql-elasticsearch
  2. cd $GOPATH/src/github.com/siddontang/go-mysql-elasticsearch
  3. make
  4. Start ./bin/go-mysql-elasticsearch -config=./etc/river.toml and enjoy it.

The Configuration File

```toml
# MySQL address, user and password
# user must have replication privilege in MySQL.
my_addr = "127.0.0.1:3306"
my_user = "root"
my_pass = "root"

# Elasticsearch address
es_addr = "127.0.0.1:9200"

# Path to store data, like master.info, and dump MySQL data
data_dir = "./var"

# Inner Http status address
stat_addr = "127.0.0.1:12800"

# pseudo server id like a slave
server_id = 1001

# mysql or mariadb
flavor = "mysql"

# mysqldump execution path
mysqldump = "mysqldump"

# MySQL data source
[[source]]
schema = "bysj"
# Only below tables will be synced into Elasticsearch.
# "test_river_[0-9]{4}" is a wildcard table format, you can use it if you have many sub tables, like table_0000 - table_1023
# I don't think it is necessary to sync all tables in a database.
tables = ["goods"]

# Below is for special rule mapping
[[rule]]
schema = "bysj"
table = "goods"
index = "goods_name"
```
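Conceptually, the tool reads MySQL rows (first via mysqldump, then from the binlog) and writes each one into the Elasticsearch index named by the matching [[rule]] block. A toy sketch of that routing step (not the tool's actual code; the function name is illustrative):

```python
def route_row(rules, schema, table, row):
    """Return (index, doc) for a row according to the first matching rule,
    mirroring the [[rule]] blocks in river.toml; None if no rule matches."""
    for rule in rules:
        if rule["schema"] == schema and rule["table"] == table:
            return rule["index"], dict(row)
    return None

# The single rule from the river.toml above.
rules = [{"schema": "bysj", "table": "goods", "index": "goods_name"}]

index, doc = route_row(rules, "bysj", "goods", {"id": 1, "name": "phone"})
print(index)  # → goods_name
```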

And that's about it.

See you next time~