Index Vector Fields

This guide walks you through the basic operations for creating and managing indexes on vector fields in a collection.

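Overview

Using the metadata stored in an index file, Milvus organizes your data into a specialized structure that allows fast retrieval of the requested information during searches or queries.

Milvus provides several index types and metrics to sort field values for efficient similarity searches. The following tables list the index types and metrics supported for each vector field type. Currently, Milvus supports several types of vector data, including floating-point embeddings (often called floating-point vectors or dense vectors), binary embeddings (also called binary vectors), and sparse embeddings (also called sparse vectors). For details, refer to In-memory Index and Similarity Metrics.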

Floating-point embeddings

Metric types	Index types
Euclidean distance (L2), Inner product (IP), Cosine similarity (COSINE)	FLAT, IVF_FLAT, IVF_SQ8, IVF_PQ, GPU_IVF_FLAT, GPU_IVF_PQ, HNSW, DISKANN

Binary embeddings

Metric types	Index types
Jaccard (JACCARD), Hamming (HAMMING)	BIN_FLAT, BIN_IVF_FLAT

Sparse embeddings

Metric types	Index types
IP	SPARSE_INVERTED_INDEX
BM25	SPARSE_INVERTED_INDEX
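
To make the metric names in the tables above concrete, here is a minimal sketch of their textbook definitions using only the Python standard library. The function names are illustrative and are not part of any Milvus API. Milvus computes these metrics internally, and the scores it returns may be scaled or ordered differently.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two float vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Inner product (IP) of two float vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity (COSINE): inner product of the normalized vectors."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


def _popcount(n):
    """Number of bits set to 1 in a non-negative integer."""
    return bin(n).count("1")


def hamming_distance(a, b):
    """Hamming distance (HAMMING): number of differing bits in two byte strings."""
    return sum(_popcount(x ^ y) for x, y in zip(a, b))


def jaccard_distance(a, b):
    """Jaccard distance (JACCARD): 1 - |A and B| / |A or B| over the set bits."""
    intersection = sum(_popcount(x & y) for x, y in zip(a, b))
    union = sum(_popcount(x | y) for x, y in zip(a, b))
    # Two all-zero vectors are treated as identical here (an assumption).
    return 1.0 - intersection / union if union else 0.0
```

The first three functions apply to floating-point embeddings, and the last two apply to binary embeddings packed into bytes.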

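Starting from Milvus 2.5.4, SPARSE_WAND is being deprecated. Use "inverted_index_algo": "DAAT_WAND" instead to get equivalent behavior while maintaining compatibility. For more information, refer to Sparse Vector.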
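
As a hedged illustration of the note above, index parameters for a sparse vector field might look like the plain Python dictionary below. The field name `sparse_vector` is hypothetical. The `index_type`, `metric_type`, and `inverted_index_algo` values come from the table and the note in this section. With pymilvus, the same keys would typically be passed as keyword arguments to `index_params.add_index()`.

```python
# Sketch of index parameters for a sparse vector field.
# "sparse_vector" is a hypothetical field name used only for illustration.
sparse_index_params = {
    "field_name": "sparse_vector",
    "index_type": "SPARSE_INVERTED_INDEX",
    "metric_type": "IP",
    # Replaces the deprecated SPARSE_WAND index type.
    "params": {"inverted_index_algo": "DAAT_WAND"},
}
```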

It is recommended to create indexes for both the vector field and frequently accessed scalar fields.

Preparations

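As explained in Manage Collections, Milvus automatically generates an index and loads it into memory when creating a collection if either of the following is specified in the collection creation request:

The dimensionality of the vector field and the metric type, or
The schema and the index parameters.

The code snippet below repurposes the existing code to connect to a Milvus instance and create a collection without specifying its index parameters. In this case, the collection has no index and remains unloaded.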

Python: To prepare for indexing, use MilvusClient to connect to the Milvus server and set up a collection with create_schema(), add_field(), and create_collection().

Java: To prepare for indexing, use MilvusClientV2 to connect to the Milvus server and set up a collection with createSchema(), addField(), and createCollection().

Node.js: To prepare for indexing, use MilvusClient to connect to the Milvus server and set up a collection with createCollection().

from pymilvus import MilvusClient, DataType

# 1. Set up a Milvus client
client = MilvusClient(
    uri="http://localhost:19530"
)

# 2. Define the collection schema
# 2.1. Create an empty schema
schema = MilvusClient.create_schema(
    auto_id=False,
    enable_dynamic_field=True,
)

# 2.2. Add fields to schema
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=5)

# 3. Create collection
client.create_collection(
    collection_name="customized_setup", 
    schema=schema, 
)

import io.milvus.v2.client.ConnectConfig;
import io.milvus.v2.client.MilvusClientV2;
import io.milvus.v2.common.DataType;
import io.milvus.v2.service.collection.request.AddFieldReq;
import io.milvus.v2.service.collection.request.CreateCollectionReq;

String CLUSTER_ENDPOINT = "http://localhost:19530";

// 1. Connect to Milvus server
ConnectConfig connectConfig = ConnectConfig.builder()
    .uri(CLUSTER_ENDPOINT)
    .build();

MilvusClientV2 client = new MilvusClientV2(connectConfig);

// 2. Create a collection

// 2.1 Create schema
CreateCollectionReq.CollectionSchema schema = client.createSchema();

// 2.2 Add fields to schema
schema.addField(AddFieldReq.builder().fieldName("id").dataType(DataType.Int64).isPrimaryKey(true).autoID(false).build());
schema.addField(AddFieldReq.builder().fieldName("vector").dataType(DataType.FloatVector).dimension(5).build());

// 3. Create a collection with the schema but without index parameters
CreateCollectionReq customizedSetupReq = CreateCollectionReq.builder()
    .collectionName("customized_setup")
    .collectionSchema(schema)
    .build();

client.createCollection(customizedSetupReq);

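import { MilvusClient, DataType } from "@zilliz/milvus2-sdk-node";

// 1. Set up a Milvus Client
// Replace the address and token with the values for your own deployment.
const address = "http://localhost:19530";
const token = "root:Milvus";
const client = new MilvusClient({address, token});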

// 2. Define fields for the collection
const fields = [
    {
        name: "id",
        data_type: DataType.Int64,
        is_primary_key: true,
        autoID: false
    },
    {
        name: "vector",
        data_type: DataType.FloatVector,
        dim: 5
    },
]

// 3. Create a collection
let res = await client.createCollection({
    collection_name: "customized_setup",
    fields: fields,
})

console.log(res.error_code)  

// Output
// 
// Success
// 

Index a Collection

Python: To create an index for a collection, use prepare_index_params() to prepare the index parameters and create_index() to create the index.

Java: To create an index for a collection, use IndexParam to prepare the index parameters and createIndex() to create the index.

Node.js: To create an index for a collection, use createIndex().

# 4.1. Set up the index parameters
index_params = MilvusClient.prepare_index_params()

# 4.2. Add an index on the vector field.
index_params.add_index(
    field_name="vector",
    metric_type="COSINE",
    index_type="IVF_FLAT",
    index_name="vector_index",
    params={ "nlist": 128 }
)

# 4.3. Create an index file
client.create_index(
    collection_name="customized_setup",
    index_params=index_params,
    sync=False # Whether to wait for index creation to complete before returning. Defaults to True.
)

import io.milvus.v2.common.IndexParam;
import io.milvus.v2.service.index.request.CreateIndexReq;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// 4 Prepare index parameters

// 4.2 Add an index for the vector field "vector"
IndexParam indexParamForVectorField = IndexParam.builder()
    .fieldName("vector")
    .indexName("vector_index")
    .indexType(IndexParam.IndexType.IVF_FLAT)
    .metricType(IndexParam.MetricType.COSINE)
    .extraParams(Map.of("nlist", 128))
    .build();

List<IndexParam> indexParams = new ArrayList<>();
indexParams.add(indexParamForVectorField);

// 4.3 Create an index file
CreateIndexReq createIndexReq = CreateIndexReq.builder()
    .collectionName("customized_setup")
    .indexParams(indexParams)
    .build();

client.createIndex(createIndexReq);

// 4. Set up index for the collection
// 4.1. Set up the index parameters
res = await client.createIndex({
    collection_name: "customized_setup",
    field_name: "vector",
    index_type: "AUTOINDEX",
    metric_type: "COSINE",   
    index_name: "vector_index",
    params: { "nlist": 128 }
})

console.log(res.error_code)

// Output
// 
// Success
// 

Python parameters

Parameter	Description
`field_name`	The name of the target field to which this index applies.
`metric_type`	The algorithm used to measure similarity between vectors. Possible values are IP, L2, COSINE, JACCARD, and HAMMING. This is available only when the specified field is a vector field. For more information, refer to Indexes supported in Milvus.
`index_type`	The name of the algorithm used to arrange data in the specified field. For applicable algorithms, refer to In-memory Index and On-disk Index.
`index_name`	The name of the index file generated after this object has been applied.
`params`	The fine-tuning parameters for the specified index type. For details on possible keys and value ranges, refer to In-memory Index.
`collection_name`	The name of an existing collection.
`index_params`	An IndexParams object containing a list of IndexParam objects.
`sync`	Controls how the index is built in relation to the client's request. Valid values: `True` (default): the client waits until the index is fully built before it returns, so you get no response until the process is complete. `False`: the client returns immediately after the request is received, and the index is built in the background. To find out whether index creation has completed, use the describe_index() method.
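
When `sync=False` is used, the client has to check on the build itself. The following is a minimal polling sketch and is not part of pymilvus. `wait_for_index` is a hypothetical helper, and it assumes that the dictionary returned by the `describe` callable contains a `"state"` key whose value becomes `"Finished"`, as in the Node.js describeIndex() output shown in this guide. Check which keys your SDK version actually returns before relying on it.

```python
import time


def wait_for_index(describe, timeout=60.0, interval=1.0):
    """Poll describe() until the index reports state "Finished".

    describe: a zero-argument callable returning a dict of index details.
    Raises TimeoutError if the index is not finished within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        info = describe()
        # Assumption: the details dict exposes a "state" key.
        if info.get("state") == "Finished":
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError("index build did not finish in time")
        time.sleep(interval)
```

With the Python client from this guide, the callable could be `lambda: client.describe_index(collection_name="customized_setup", index_name="vector_index")`.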

Java parameters

Parameter	Description
`fieldName`	The name of the target field to which this IndexParam object applies.
`indexName`	The name of the index file generated after this object has been applied.
`indexType`	The name of the algorithm used to arrange data in the specified field. For applicable algorithms, refer to In-memory Index and On-disk Index.
`metricType`	The distance metric to use for the index. Possible values are IP, L2, COSINE, JACCARD, and HAMMING.
`extraParams`	Extra index parameters. For details, refer to In-memory Index and On-disk Index.

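Node.js parameters

Parameter	Description
`collection_name`	The name of an existing collection.
`field_name`	The name of the field on which to create the index.
`index_type`	The type of the index to create.
`metric_type`	The metric type used to measure the distance between vectors.
`index_name`	The name of the index to create.
`params`	Other index-specific parameters.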

Note

Currently, you can create only one index file for each field in a collection.

Check Index Details

Once you have created an index, you can check its details.

Python: To check the index details, use list_indexes() to list the index names and describe_index() to get the index details.

Java and Node.js: To check the index details, use describeIndex() to get the index details.

# 5. Describe index
res = client.list_indexes(
    collection_name="customized_setup"
)

print(res)

# Output
#
# [
#     "vector_index",
# ]

res = client.describe_index(
    collection_name="customized_setup",
    index_name="vector_index"
)

print(res)

# Output
#
# {
#     "index_type": ,
#     "metric_type": "COSINE",
#     "field_name": "vector",
#     "index_name": "vector_index"
# }

import io.milvus.v2.service.index.request.DescribeIndexReq;
import io.milvus.v2.service.index.request.ListIndexesReq;
import io.milvus.v2.service.index.response.DescribeIndexResp;

import java.util.List;

// 5. Describe index
// 5.1 List the index names
ListIndexesReq listIndexesReq = ListIndexesReq.builder()
    .collectionName("customized_setup")
    .build();

List<String> indexNames = client.listIndexes(listIndexesReq);

System.out.println(indexNames);

// Output:
// [
//     "vector_index"
// ]

// 5.2 Describe an index
DescribeIndexReq describeIndexReq = DescribeIndexReq.builder()
    .collectionName("customized_setup")
    .indexName("vector_index")
    .build();

DescribeIndexResp describeIndexResp = client.describeIndex(describeIndexReq);

System.out.println(JSONObject.toJSON(describeIndexResp));

// Output:
// {
//     "metricType": "COSINE",
//     "indexType": "AUTOINDEX",
//     "fieldName": "vector",
//     "indexName": "vector_index"
// }

// 5. Describe the index
res = await client.describeIndex({
    collection_name: "customized_setup",
    index_name: "vector_index"
})

console.log(JSON.stringify(res.index_descriptions, null, 2))

// Output
// 
// [
//   {
//     "params": [
//       {
//         "key": "index_type",
//         "value": "AUTOINDEX"
//       },
//       {
//         "key": "metric_type",
//         "value": "COSINE"
//       }
//     ],
//     "index_name": "vector_index",
//     "indexID": "449007919953063141",
//     "field_name": "vector",
//     "indexed_rows": "0",
//     "total_rows": "0",
//     "state": "Finished",
//     "index_state_fail_reason": "",
//     "pending_index_rows": "0"
//   }
// ]
// 

You can check the index file created on a specific field and collect statistics on the number of rows indexed using this index file.
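
As a small sketch of how such row statistics could be used, the helper below turns an index description into a completion ratio. `index_progress` is a hypothetical name, and the sketch assumes the description exposes `indexed_rows` and `total_rows` as numeric strings, as in the Node.js output above. Other SDKs or versions may use different keys or types.

```python
def index_progress(description):
    """Return the fraction of rows already indexed, between 0.0 and 1.0.

    Assumes `description` has "indexed_rows" and "total_rows" entries
    that can be converted to integers.
    """
    total = int(description["total_rows"])
    indexed = int(description["indexed_rows"])
    if total == 0:
        # An empty collection has nothing left to index.
        return 1.0
    return indexed / total
```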

Drop an Index

You can simply drop an index if it is no longer needed.

Before dropping an index, make sure it has been released first.

Python: To drop an index, use drop_index().

Java and Node.js: To drop an index, use dropIndex().

# 6. Drop index
client.drop_index(
    collection_name="customized_setup",
    index_name="vector_index"
)

import io.milvus.v2.service.index.request.DropIndexReq;

// 6. Drop index

DropIndexReq dropIndexReq = DropIndexReq.builder()
    .collectionName("customized_setup")
    .indexName("vector_index")
    .build();

client.dropIndex(dropIndexReq);

// 6. Drop the index
res = await client.dropIndex({
    collection_name: "customized_setup",
    index_name: "vector_index"
})

console.log(res.error_code)

// Output
// 
// Success
// 
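
The release-then-drop order described in this section can be sketched in Python as follows. `release_and_drop_index` is a hypothetical helper, not a pymilvus function. It assumes that releasing the whole collection with `release_collection()` is an acceptable way to release the index, and it only calls client methods with the keyword arguments shown.

```python
def release_and_drop_index(client, collection_name, index_name):
    """Release a collection from memory, then drop one of its indexes.

    `client` is expected to be a pymilvus MilvusClient (or any object that
    offers release_collection() and drop_index() with these keyword arguments).
    """
    # Step 1: release first, so the index is no longer loaded.
    client.release_collection(collection_name=collection_name)
    # Step 2: drop the index by name.
    client.drop_index(collection_name=collection_name, index_name=index_name)
```

With the client from this guide, the call would be `release_and_drop_index(client, "customized_setup", "vector_index")`.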
