New method indexes and searches petabase-scale nucleotide resources

Published: 2024-11-11 19:04

Featured article: Nature Methods, published online

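Richa Agarwala's team at the U.S. National Institutes of Health has developed a way to index and search petabase-scale nucleotide resources. The work was published online in Nature Methods on May 16, 2024.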

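For most researchers, it is currently impractical to search the vast and rapidly growing nucleotide content held in resources such as the runs in the Sequence Read Archive and the assemblies from whole-genome shotgun sequencing projects in GenBank.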

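The researchers report Pebblescout, a tool that navigates such content by providing indexing and search capabilities. Indexing uses dense sampling of the sequences in the resource. Search finds subjects (runs or assemblies) that have short sequence matches to a user query, with well-defined guarantees, and ranks them by the informativeness of the matches. The researchers illustrate Pebblescout's functionality by creating eight databases that together index more than 3.7 petabases.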
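The index-then-rank idea described above can be illustrated with a toy sketch. This is not Pebblescout's implementation: the k-mer length, the choice to sample every position, the IDF-style weight used as a stand-in for "informativeness", and all function names are assumptions made here purely for illustration.

```python
import math
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer of seq, one per position (a simple form of dense sampling)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(subjects, k):
    """Map each k-mer to the set of subject IDs whose sequence contains it."""
    index = defaultdict(set)
    for subject_id, seq in subjects.items():
        for kmer in kmers(seq, k):
            index[kmer].add(subject_id)
    return index


def search(index, n_subjects, query, k):
    """Rank subjects by the summed weight of the query k-mers they share.

    Assumed weighting: a k-mer found in fewer subjects counts for more
    (inverse-document-frequency style), so rare matches are more informative.
    """
    scores = defaultdict(float)
    for kmer in set(kmers(query, k)):
        hits = index.get(kmer, ())
        if not hits:
            continue
        weight = math.log((1 + n_subjects) / len(hits))
        for subject_id in hits:
            scores[subject_id] += weight
    # Highest score first; ties broken by subject ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy data standing in for runs or assemblies.
subjects = {
    "run1": "ACGTACGTGG",
    "run2": "TTTTACGTAA",
    "run3": "GGGGGGGGGG",
}
index = build_index(subjects, k=4)
results = search(index, len(subjects), "ACGTGG", k=4)
for subject_id, score in results:
    print(subject_id, round(score, 3))
```

In this toy data, run1 shares three query k-mers, run2 shares only the common k-mer ACGT, and run3 shares none, so run1 ranks first and run3 is not returned. A real petabase-scale index would need far more compact data structures than an in-memory dictionary.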

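The Pebblescout web service is available at https://pebblescout.ncbi.nlm.nih.gov. The study shows that, across a wide range of query lengths, Pebblescout provides a data-driven way to find relevant subsets of large nucleotide resources, substantially reducing the effort needed for downstream analysis. The researchers also show that Pebblescout's results compare favorably with those of MetaGraph and Sourmash.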

Appendix: original English abstract

Title: Indexing and searching petabase-scale nucleotide resources

Author: Shiryev, Sergey A., Agarwala, Richa

Issue&Volume: 2024-05-16

Abstract: Searching vast and rapidly growing nucleotide content in resources, such as runs in the Sequence Read Archive and assemblies for whole-genome shotgun sequencing projects in GenBank, is currently impractical for most researchers. Here we present Pebblescout, a tool that navigates such content by providing indexing and search capabilities. Indexing uses dense sampling of the sequences in the resource. Search finds subjects (runs or assemblies) that have short sequence matches to a user query, with well-defined guarantees and ranks them using informativeness of the matches. We illustrate the functionality of Pebblescout by creating eight databases that index over 3.7 petabases. The web service of Pebblescout can be reached at https://pebblescout.ncbi.nlm.nih.gov. We show that for a wide range of query lengths, Pebblescout provides a data-driven way for finding relevant subsets of large nucleotide resources, reducing the effort for downstream analysis substantially. We also show that Pebblescout results compare favorably to MetaGraph and Sourmash.

DOI: 10.1038/s41592-024-02280-z

Source: https://www.nature.com/articles/s41592-024-02280-z

Journal information

Nature Methods: founded in 2004 and published by Springer Nature. Latest impact factor: 47.99
Official website: https://www.nature.com/nmeth/
Submission link: https://mts-nmeth.nature.com/cgi-bin/main.plex
