Elasticsearch Interview Questions
Q. What is Elasticsearch?
Elasticsearch is an open-source, cross-platform, scalable full-text search and analytics engine built on Apache Lucene. It supports near-real-time (NRT) analysis and full-text search over large volumes of data in a distributed, clustered environment.
- Elasticsearch is written in Java and developed by Elastic; the Apache Software Foundation maintains Lucene, the library it is built on.
- Elasticsearch stores records as JSON documents made up of keys and values.
- It is schema-free by default; if a schema is required, it can be added from the client application through a mapping.
- It can be accessed over HTTP from a browser, or from an application through the Elasticsearch REST client API or the Elasticsearch Transport Client (the Transport Client was deprecated in 7.0 and removed in 8.0).
- Elastic, the company behind Elasticsearch, provides applications and plug-ins that make Elasticsearch more useful, such as Kibana for searching and analyzing data through charts and dashboards.
Q. What are the advantages of Elasticsearch?
- Elasticsearch is implemented in Java, which makes it compatible with almost every platform.
- Elasticsearch is near real time (NRT): by default, a newly added document becomes searchable within about one second, which is the default index refresh interval.
- An Elasticsearch cluster is distributed, which makes it easy to scale and to integrate into large organizations.
- Creating full backups of data is easy: older versions used the gateway concept, and current versions provide snapshot and restore.
- Elasticsearch REST uses JSON objects as responses, which makes it possible to invoke the Elasticsearch server with a large number of different programming languages.
- Elasticsearch supports almost every document type except those that do not support text rendering.
- Handling multi-tenancy is very easy in Elasticsearch when compared to Apache Solr.
Q. What are the Disadvantages of Elasticsearch?
Elasticsearch handles request and response data primarily in JSON, whereas Apache Solr also supports formats such as CSV and XML.
Elasticsearch can suffer from split-brain situations, although only in rare cases.
Q. What is a Type in Elasticsearch?
In Elasticsearch, a type was a logical category within an index that grouped documents with a similar, though not necessarily identical, structure. Since 6.0 an index may contain only a single type; mapping types were deprecated in 7.0 and removed in 8.0.
Q. What is a cluster in Elasticsearch?
A cluster is a collection of one or more nodes (servers) that together hold all of the data and provide federated indexing and search capabilities across all nodes. A cluster is identified by a unique name, which defaults to elasticsearch.
Q. What is the ingest node in Elasticsearch?
In Elasticsearch, an ingest node is a type of node that pre-processes documents before they are indexed. It is part of the Elasticsearch cluster: it intercepts index and bulk requests, applies the configured transformations, and then passes the documents back to the index or bulk API.
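To make the pre-processing step concrete, here is a minimal sketch of an ingest pipeline definition using the set and lowercase processors, together with a toy local emulation of what those two processors do to a document. The pipeline content, field names, and the apply_pipeline helper are assumptions made for illustration; the emulation is not Elasticsearch code.

```python
import copy
import json

# Hypothetical pipeline: field names and values are illustrative only.
pipeline = {
    "description": "Normalize customer documents before indexing",
    "processors": [
        {"set": {"field": "source", "value": "web-signup"}},
        {"lowercase": {"field": "email"}},
    ],
}


def apply_pipeline(doc, pipeline):
    """Toy local emulation of the two processors above (not Elasticsearch code)."""
    result = copy.deepcopy(doc)
    for processor in pipeline["processors"]:
        # Each processor entry is a one-key dict: {processor_type: config}.
        (kind, config), = processor.items()
        if kind == "set":
            result[config["field"]] = config["value"]
        elif kind == "lowercase":
            result[config["field"]] = result[config["field"]].lower()
        else:
            raise ValueError(f"unsupported processor in this sketch: {kind}")
    return result


processed = apply_pipeline({"email": "Jane.Doe@Example.COM"}, pipeline)

# The JSON below is the kind of body sent to PUT /_ingest/pipeline/<pipeline-id>.
print(json.dumps(pipeline, indent=2))
print(processed)
```

On a real cluster the pipeline would be stored once and then referenced at index time with the pipeline query parameter; the transformation itself runs on the ingest node rather than in the client.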
Q. What is a quorum in Elasticsearch?
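In older versions of Elasticsearch (before 5.0), the write consistency level, set through action.write_consistency, defaults to quorum. If the quorum is not met, the index operation waits and then returns an error after the timeout. The Elasticsearch documentation defines the quorum as a majority of shard copies: int((primary + number_of_replicas) / 2) + 1. From 5.0 onward this setting was replaced by wait_for_active_shards.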
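The quorum rule is easier to follow with numbers. The sketch below assumes the legacy majority formula, int((primary + number_of_replicas) / 2) + 1, with one primary per shard; the function name write_quorum is hypothetical.

```python
def write_quorum(number_of_replicas: int, primaries: int = 1) -> int:
    """Shard copies that must be active under the legacy quorum rule."""
    total_copies = primaries + number_of_replicas
    # Majority of all copies: integer division, then add one.
    return total_copies // 2 + 1


for replicas in (1, 2, 3, 4):
    print(f"replicas={replicas} -> quorum={write_quorum(replicas)}")
```

With two replicas there are three copies of each shard, so two of them must be available before a write proceeds.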
Q. What is Index in Elasticsearch?
Index – An index is a collection of documents that have somewhat similar characteristics. For example, you can have an index for customer data, another index for a product catalog, and yet another index for order data.
An index is identified by a name (that must be all lowercase) and this name is used to refer to the index when performing indexing, search, update, and delete operations against the documents in it.
Q. What is Document in Elasticsearch?
Document – A document is a basic unit of information that can be indexed. For example, you can have a document for a single customer, another document for a single product, and yet another for a single order. This document is expressed in JSON (JavaScript Object Notation) which is a ubiquitous internet data interchange format.
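As a small illustration, the sketch below serializes a customer document to JSON, the form in which it would be sent to Elasticsearch. All field names and values are made up for the example.

```python
import json

# Hypothetical customer document; nested objects and arrays are allowed.
customer = {
    "name": "Jane Doe",
    "email": "jane.doe@example.com",
    "address": {"city": "Berlin", "country": "DE"},
    "tags": ["newsletter", "premium"],
}

# Serialize to the JSON text that would form the request body.
body = json.dumps(customer, indent=2)
print(body)

# Parsing the text back yields an equal Python structure.
round_tripped = json.loads(body)
```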
Q. What are shards in Elasticsearch? Explain the concept.
An index can potentially store a large amount of data that can exceed the hardware limits of a single node. For example, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node or may be too slow to serve search requests from a single node alone.
To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully-functional and independent “index” that can be hosted on any node in the cluster.
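One way to picture how documents are spread across shards is the simplified routing rule shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document ID. The sketch below assumes that rule and uses zlib.crc32 purely as a stand-in hash; Elasticsearch uses its own hash function, so the shard numbers here will not match a real cluster.

```python
import zlib
from collections import Counter


def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    """Map a routing value (by default the document ID) to a primary shard."""
    # crc32 is only a deterministic stand-in for the real hash function.
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % num_primary_shards


# Distribute 1000 hypothetical document IDs over 5 primary shards.
distribution = Counter(pick_shard(f"doc-{i}", 5) for i in range(1000))
print(sorted(distribution.items()))
```

Because the result depends on the shard count, changing the number of primary shards would send existing documents to different shards, which is one reason the count is chosen when the index is created.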
Q. What are the benefits of Sharding in Elasticsearch?
Sharding is important for two primary reasons:
It allows you to horizontally split/scale your content volume
It allows you to distribute and parallelize operations across shards (potentially on multiple nodes) thus increasing performance/throughput
Q. What are replicas, and why are they needed?
In a network/cloud environment where failures can be expected any time, it is very useful and highly recommended to have a failover mechanism in case a shard/node somehow goes offline or disappears for whatever reason.
To this end, Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for short.
Q. What are the benefits of Replicas in Elasticsearch?
Replication is important for two primary reasons:
It provides high availability in case a shard/node fails. For this reason, it is important to note that a replica shard is never allocated on the same node as the original/primary shard that it was copied from.
It allows you to scale out your search volume/throughput since searches can be executed on all replicas in parallel.
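The rule that a replica never shares a node with its primary can be shown with a toy placement function. This is only a round-robin sketch under assumed node names, not Elasticsearch's actual allocation algorithm; unlike this sketch, which raises an error, a real cluster leaves surplus replicas unassigned.

```python
def place_shards(nodes, num_primaries, num_replicas):
    """Toy round-robin placement keeping every copy of a shard on a distinct node."""
    if num_replicas >= len(nodes):
        raise ValueError("need more nodes than replicas to keep copies apart")
    layout = {}
    for shard in range(num_primaries):
        copies = []
        for copy_number in range(num_replicas + 1):
            # Offsetting by copy_number guarantees distinct nodes per shard.
            node = nodes[(shard + copy_number) % len(nodes)]
            role = "primary" if copy_number == 0 else "replica"
            copies.append((role, node))
        layout[shard] = copies
    return layout


layout = place_shards(["node-a", "node-b", "node-c"], num_primaries=3, num_replicas=1)
for shard, copies in layout.items():
    print(shard, copies)
```

If any single node in this layout fails, every shard still has one copy on a surviving node.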
Q. What is the minimum Java version required to install Elasticsearch?
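Older releases such as 5.x and 6.x require at least Java 8. Since version 7.0, the Elasticsearch distributions bundle their own OpenJDK, so a separate Java installation is no longer required.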
Q. What are the different types of aggregations in Elasticsearch?
There are many different types of aggregations, each with its own purpose and output.
Metric – Aggregations that keep track of and compute metrics over a set of documents.
Bucket – Aggregations that group documents into buckets, where each bucket is associated with a key and a document criterion.
Matrix – A family of aggregations that operate on multiple fields and produce a matrix result based on the values extracted from the requested document fields. Unlike metric and bucket aggregations, this aggregation family does not yet support scripting.
Pipeline – Aggregations that aggregate the output of other aggregations and their associated metrics.
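A single search request can combine several of these families. The sketch below builds a request body with a terms aggregation (grouping documents into buckets), a nested avg aggregation (a metric), and an avg_bucket aggregation (a pipeline aggregation over the bucket results). The field names category and price and the aggregation names are assumptions for illustration.

```python
import json

# Hypothetical fields: "category" (keyword) and "price" (numeric).
search_body = {
    "size": 0,  # return only aggregation results, no search hits
    "aggs": {
        "by_category": {
            "terms": {"field": "category"},
            "aggs": {
                "avg_price": {"avg": {"field": "price"}},
            },
        },
        # Pipeline aggregation: averages the per-bucket avg_price values.
        "avg_of_category_averages": {
            "avg_bucket": {"buckets_path": "by_category>avg_price"},
        },
    },
}

print(json.dumps(search_body, indent=2))
```

The buckets_path value points from the pipeline aggregation to the metric it consumes, using the aggregation names chosen in the same request.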
Q. What are Indices APIs?
The indices APIs are used to manage individual indices, index settings, aliases, mappings, and index templates.
Q. What is cat API in Elasticsearch?
The cat APIs return cluster information as compact, human-readable text rather than JSON. All cat commands accept the query string parameter help, which lists the headers and information they provide, and the /_cat command alone lists all the available commands.
Q. What are the different cat commands available in Elasticsearch cat API?
The different commands available in cat APIs are:
- cat aliases, cat allocation, cat count, cat fielddata
- cat health, cat indices, cat master, cat nodeattrs
- cat nodes, cat pending tasks, cat plugins, cat recovery
- cat repositories, cat thread pool, cat shards, cat segments
- cat snapshots, cat templates
Q. What is Query DSL in Elasticsearch?
Elasticsearch provides a full Query DSL (Domain Specific Language) based on JSON to define queries. Think of the Query DSL as an AST (Abstract Syntax Tree) of queries, consisting of two types of clauses:
Leaf query clauses – Leaf query clauses look for a particular value in a particular field, such as the match, term or range queries. These queries can be used by themselves.
Compound query clauses – Compound query clauses wrap other leaf or compound queries and are used to combine multiple queries in a logical fashion (such as the bool or dis_max query), or to alter their behavior (such as the constant_score query).
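The two clause types can be seen together in one query. The sketch below builds a bool query (compound) that wraps match, term, and range queries (leaf); the field names and values are hypothetical, and leaf_query_types is a helper written only for this example.

```python
import json

# Hypothetical fields: title, status, published_at.
query_body = {
    "query": {
        "bool": {  # compound clause
            "must": [
                {"match": {"title": "distributed search"}},  # leaf clause
            ],
            "filter": [
                {"term": {"status": "published"}},  # leaf clause
                {"range": {"published_at": {"gte": "2020-01-01"}}},  # leaf clause
            ],
        }
    }
}


def leaf_query_types(bool_query):
    """List the leaf query types directly wrapped by a bool query."""
    leaves = []
    for clauses in bool_query["bool"].values():
        for clause in clauses:
            leaves.extend(clause.keys())
    return sorted(leaves)


print(json.dumps(query_body, indent=2))
print(leaf_query_types(query_body["query"]))
```

The nesting of the Python dictionaries mirrors the tree structure of the query, which is what the AST comparison refers to.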
Q. What are SHARDS in Elasticsearch?
A shard is an instance of a Lucene index that is a self-contained search engine that indexes and handles queries for a subset of the data in an Elasticsearch cluster. Basically, a shard is a unit at which Elasticsearch distributes data around the cluster.
Q. What are REPLICAS in Elasticsearch?
In Elasticsearch, Replicas are copies of the shards.
Q. What is a Tokenizer in Elasticsearch?
In Elasticsearch, a tokenizer receives a stream of characters, breaks it up into individual tokens, and outputs a stream of tokens.
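The idea of turning a character stream into a token stream can be sketched in a few lines. The function below is a simplified stand-in that splits on non-word characters and records each token's position and character offsets; it is not one of Elasticsearch's built-in tokenizers, and its splitting rule is an assumption.

```python
import re


def tokenize(text):
    """Yield (token, position, start_offset, end_offset) for each word-like run."""
    for position, match in enumerate(re.finditer(r"\w+", text)):
        yield match.group(), position, match.start(), match.end()


tokens = list(tokenize("Quick brown-fox!"))
for token in tokens:
    print(token)
```

Keeping positions and offsets alongside the token text is what later makes features such as phrase matching and highlighting possible.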
Q. How to create an index in an Elasticsearch cluster?
To create an index, all you have to do is pass the index name without other parameters, which creates an index using default settings. The syntax for creating a new index in an Elasticsearch cluster is: PUT /<index>.
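As a sketch of what the PUT /<index> call looks like from a client, the function below builds (but does not send) the HTTP request using only the Python standard library. The base URL http://localhost:9200, the index name, and the function name are assumptions; the optional settings body sets the shard and replica counts instead of relying on defaults.

```python
import json
import urllib.request


def build_create_index_request(index, shards=1, replicas=1,
                               base_url="http://localhost:9200"):
    """Build a PUT /<index> request with explicit shard and replica settings."""
    if index != index.lower():
        raise ValueError("index names must be lowercase")
    body = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_index_request("customers", shards=3, replicas=1)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it to a running node; a secured cluster would additionally need credentials and HTTPS.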
Q. How to delete an index in Elasticsearch?
You can delete an index in Elasticsearch with the syntax: DELETE /<index>.
Q. How to list all indexes of a cluster in Elasticsearch?
The indexes of a cluster can be listed in the following ways:
using the _cat API: GET /_cat/indices
adding the v query parameter to include column headers: GET /_cat/indices?v
using the _aliases API: GET /_aliases
Q. How to add a mapping to an index in Elasticsearch?
The put mapping API (PutMapping in some client libraries) is used to add a mapping to an index: PUT /<index>/_mapping.
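A mapping is itself a JSON body listing fields and their types. The sketch below assembles such a body and the corresponding call in the typeless form used by recent versions; the field names and the put_mapping_call helper are hypothetical, while text, keyword, and date are standard field types.

```python
import json

# Hypothetical fields for a customer index.
mapping_body = {
    "properties": {
        "name": {"type": "text"},         # analyzed for full-text search
        "email": {"type": "keyword"},     # stored as a single exact term
        "signup_date": {"type": "date"},
    }
}


def put_mapping_call(index, mapping):
    """Return the HTTP method, path, and JSON body for a put mapping call."""
    return "PUT", f"/{index}/_mapping", json.dumps(mapping)


method, path, body = put_mapping_call("customers", mapping_body)
print(method, path)
print(body)
```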
Q. How does aggregation work in Elasticsearch?
In Elasticsearch, all analytics are built using aggregations i.e. Elasticsearch Aggregations provide the ability to group and perform calculations and statistics by using a simple search query.
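To show what "group and perform calculations" means, the sketch below reproduces locally what a terms aggregation with a nested average would compute. The documents and field names are invented for the example, and the code is an emulation in plain Python rather than Elasticsearch's implementation.

```python
from collections import defaultdict

# Invented sample documents.
docs = [
    {"category": "books", "price": 10.0},
    {"category": "books", "price": 20.0},
    {"category": "games", "price": 30.0},
]


def terms_with_avg(documents, group_field, value_field):
    """Group documents by one field and average another within each group."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc[group_field]].append(doc[value_field])
    return {
        key: {"doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in groups.items()
    }


buckets = terms_with_avg(docs, "category", "price")
print(buckets)
```

In Elasticsearch the same grouping runs on each shard and the partial results are merged, so the client only receives the summarized buckets.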
Q. How does Elasticsearch store data?
Elasticsearch accepts documents in JSON format instead of storing information as rows of columnar data. It stores that original representation as it came in.
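Q. How to check whether the Elasticsearch server is running?
You can verify that Elasticsearch is running by sending an HTTP GET request to the node's HTTP port (9200 by default), for example curl -X GET "localhost:9200/"; a running node responds with a JSON document containing its name, cluster name, and version. On installations where Elasticsearch is managed as a service by EMC Smarts, you can also check its status by typing $smarts/bin/sm_service show.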
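A common check is to request the root endpoint of the node over HTTP and see whether a JSON response comes back. The sketch below assumes an unsecured node at http://localhost:9200 (the default HTTP port) and the hypothetical helper name is_running; a cluster with security enabled would need credentials and HTTPS and would be reported as not running by this simple version.

```python
import http.client
import json
import urllib.request


def is_running(base_url="http://localhost:9200", timeout=2.0):
    """Return True if the root endpoint answers with Elasticsearch-style JSON."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            info = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return False
    # The root response of a node includes a "version" object.
    return isinstance(info, dict) and "version" in info


print("Elasticsearch reachable:", is_running())
```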
Q. What is an Analyzer in Elasticsearch?
In Elasticsearch, Analyzers are the special algorithms that determine how a string field in a document is transformed into terms in an inverted index.
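The path from a string field to terms in an inverted index can be sketched as follows. The analyzer here splits on non-word characters, lowercases, and drops a small illustrative stop-word list; these steps and the stop words are assumptions chosen for the example, not the exact behavior of any built-in analyzer.

```python
import re
from collections import defaultdict

STOP_WORDS = {"the", "a", "an", "and", "of"}  # illustrative list only


def analyze(text):
    """Tokenize, lowercase, and remove stop words."""
    tokens = re.findall(r"\w+", text)
    return [t.lower() for t in tokens if t.lower() not in STOP_WORDS]


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


docs = {1: "The Quick Brown Fox", 2: "A quick guide to the brown bear"}
inverted = build_inverted_index(docs)
print(analyze("The Quick Brown Fox"))
print(sorted(inverted["quick"]))
```

A search for a term then becomes a lookup in this structure, which is why the same analysis is normally applied to the query text as well.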
Q. What is Ingest Node in Elasticsearch?
In Elasticsearch, an ingest node is a type of Elasticsearch node which can be used to process documents prior to indexing.
Q. What is Tribe Node in Elasticsearch?
In Elasticsearch, the tribe node is a node that works by retrieving the cluster state from all connected clusters and merging them into a global cluster state. With this information at hand, it is able to perform read and write operations against the nodes in all clusters as if they were local. Tribe nodes were deprecated in Elasticsearch 5.4 and removed in 7.0 in favor of cross-cluster search.