Simulating Information Retrieval Test Collections, Bodo Billerbeck, David Hawking, Nick Craswell, Paul Thomas


Purchase options
Price: 11365.00 RUB
Availability: Out of stock.
Can be ordered on request; the warehouse arrival date is confirmed after the order is placed.

Author: Bodo Billerbeck, David Hawking, Nick Craswell, Paul Thomas
Title: Simulating Information Retrieval Test Collections
ISBN: 9781681739571
Publisher: Mare Nostrum (Eurospan)
ISBN-10: 1681739577
Binding/Format: Paperback
Pages: 184
Weight: 0.33 kg
Publication date: 30.11.2020
Series: Synthesis Lectures on Information Concepts, Retrieval, and Services
Language: English
Dimensions: 23.50 x 19.10 x 1.02 cm
Audience: Professional and scholarly
Keywords: Asian history, History of science, Physics, HISTORY / Asia / China, SCIENCE / History, SCIENCE / Physics / General
Ships from: England
Description: Simulated test collections may find application in situations where real datasets cannot easily be accessed due to confidentiality concerns or practical inconvenience. They can potentially support Information Retrieval (IR) experimentation, tuning, validation, performance prediction, and hardware sizing. Naturally, the accuracy and usefulness of results obtained from a simulation depend upon the fidelity and generality of the models which underpin it. The fidelity of emulation of a real corpus is likely to be limited by the requirement that confidential information in the real corpus must not be extractable from the emulated version. We present a range of methods exploring trade-offs between emulation fidelity and degree of preservation of privacy.

We present three different simple types of text generator which work at a micro level: Markov models, neural net models, and substitution ciphers. We also describe macro level methods where we can engineer macro properties of a corpus, giving a range of models for each of the salient properties: document length distribution, word frequency distribution (for independent and non-independent cases), word length and textual representation, and corpus growth.
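A first-order, word-level Markov chain, one of the micro-level generator types named above, can be sketched in a few lines. This is a minimal illustration, not the book's implementation; the toy corpus and function names are invented for the example:

```python
import random
from collections import defaultdict

def train_markov(text):
    """Build a first-order word-level Markov model: token -> list of successors."""
    model = defaultdict(list)
    tokens = text.split()
    for cur, nxt in zip(tokens, tokens[1:]):
        model[cur].append(nxt)
    return model

def generate(model, start, n_words, seed=0):
    """Emit n_words by repeatedly sampling a successor of the current token."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n_words - 1):
        successors = model.get(out[-1])
        if not successors:              # dead end: restart from a random state
            successors = list(model.keys())
        out.append(rng.choice(successors))
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the rug"
model = train_markov(corpus)
print(generate(model, "the", 8))
```

Higher-order models condition on longer token histories, raising fidelity but also raising the risk of reproducing confidential n-grams verbatim, which is exactly the trade-off the book explores.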

We present results of emulating existing corpora and of scaling up corpora by two orders of magnitude. We show that simulated collections generated with relatively simple methods are suitable for some purposes and can be generated very quickly. Indeed, it may sometimes be feasible to embed a simple, lightweight corpus generator into an indexer for the purpose of efficiency studies.
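A lightweight macro-level generator of the kind that could be embedded in an indexer might draw words from a Zipfian rank-frequency distribution and document lengths from a simple length model. The sketch below assumes a pure 1/r Zipf law and uniform lengths; the book's actual distribution models are richer, and all names here are illustrative:

```python
import random

def synth_corpus(n_docs, mean_len, vocab_size, seed=0):
    """Generate n_docs synthetic documents. Words are drawn i.i.d. with
    Zipfian weights (rank r gets weight 1/r); document lengths are drawn
    uniformly from [mean_len // 2, 2 * mean_len] as a crude length model."""
    rng = random.Random(seed)
    vocab = [f"w{r}" for r in range(1, vocab_size + 1)]
    weights = [1.0 / r for r in range(1, vocab_size + 1)]
    docs = []
    for _ in range(n_docs):
        length = rng.randint(max(1, mean_len // 2), mean_len * 2)
        docs.append(" ".join(rng.choices(vocab, weights=weights, k=length)))
    return docs

docs = synth_corpus(n_docs=100, mean_len=20, vocab_size=500)
print(len(docs), docs[0][:60])
```

Because nothing here depends on real text, a generator like this leaks no confidential content at all; the price is that term co-occurrence is pure noise, which limits it to efficiency-style studies.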

Naturally, a corpus of artificial text cannot support IR experimentation in the absence of a set of compatible queries. We discuss and experiment with published methods for query generation and query log emulation.
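One simple family of published query generators samples query terms in proportion to their collection frequency, so that queries "look like" the corpus they target. A hedged sketch of that idea (function names and the toy documents are invented for illustration):

```python
import random
from collections import Counter

def build_term_stats(docs):
    """Collection term frequencies: the basis for popularity-biased sampling."""
    stats = Counter()
    for doc in docs:
        stats.update(doc.split())
    return stats

def generate_queries(stats, n_queries, max_len=3, seed=0):
    """Sample queries of 1..max_len terms, each term drawn with probability
    proportional to its collection frequency."""
    rng = random.Random(seed)
    terms, weights = zip(*stats.items())
    queries = []
    for _ in range(n_queries):
        k = rng.randint(1, max_len)
        queries.append(" ".join(rng.choices(terms, weights=weights, k=k)))
    return queries

docs = ["simulated test collections support ir experimentation",
        "query generation emulates real query logs",
        "test collections enable ir evaluation"]
stats = build_term_stats(docs)
print(generate_queries(stats, 3))
```

More faithful log emulation would also model query-length distributions and term co-occurrence within queries rather than drawing terms independently.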

We present a proof-of-the-pudding study in which we observe the predictive accuracy of efficiency and effectiveness results obtained on emulated versions of TREC corpora. The study includes three open-source retrieval systems and several TREC datasets. There is a trade-off between confidentiality and prediction accuracy and there are interesting interactions between retrieval systems and datasets. Our tentative conclusion is that there are emulation methods which achieve useful prediction accuracy while providing a level of confidentiality adequate for many applications.



Simulating Information Retrieval Test Collections

Author: Bodo Billerbeck, David Hawking, Nick Craswell, Paul Thomas
Title: Simulating Information Retrieval Test Collections
ISBN-10: 1681739593  ISBN-13 (EAN): 9781681739595
Publisher: Mare Nostrum (Eurospan)
Price: 14276.00 RUB
Availability: Out of stock.


Laboratory Experiments in Information Retrieval

Author: Tetsuya Sakai
Title: Laboratory Experiments in Information Retrieval
ISBN-10: 9811345813  ISBN-13 (EAN): 9789811345814
Publisher: Springer
Price: 6988.00 RUB
Availability: Available from the supplier; shipped to order.

Description: Covering aspects from principles and limitations of statistical significance tests to topic set size design and power analysis, this book guides readers to statistically well-designed experiments. Although classical statistical significance tests are to some extent useful in information retrieval (IR) evaluation, they can harm research unless they are used appropriately, with the right sample sizes and statistical power, and unless the test results are reported properly. The first half of the book is mainly targeted at undergraduate students, and the second half is suitable for graduate students and researchers who regularly conduct laboratory experiments in IR, natural language processing, recommendation, and related fields.

Chapters 1–5 review parametric significance tests for comparing system means, namely t-tests and ANOVAs, and show how easily they can be conducted using Microsoft Excel or R. These chapters also discuss a few multiple comparison procedures for researchers interested in comparing every system pair, including a randomised version of Tukey's Honestly Significant Difference test. The chapters then deal with known limitations of classical significance testing and provide practical guidelines for reporting research results regarding comparison of means.

Chapters 6 and 7 discuss statistical power. Chapter 6 introduces topic set size design, which enables test collection builders to determine an appropriate number of topics to create. Readers can easily use the author's Excel tools for topic set size design based on the paired and two-sample t-tests, one-way ANOVA, and confidence intervals. Chapter 7 describes power-analysis-based methods for determining an appropriate sample size for a new experiment based on a similar experiment done in the past, detailing how to utilise the author's R tools for power analysis and how to interpret the results. Case studies from IR, for both Excel-based topic set size design and R-based power analysis, are also provided.
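The paired t-test at the heart of the early chapters compares two systems on the same topics via the per-topic score differences. A self-contained sketch of the t statistic (the per-topic scores below are made up for illustration, not results from the book):

```python
import math
from statistics import mean, stdev

def paired_t(scores_a, scores_b):
    """Paired t statistic for two systems' per-topic scores.
    Returns (t, degrees of freedom); the p-value comes from a t table
    or a stats package."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical per-topic average-precision scores over 5 topics.
sys_a = [0.31, 0.42, 0.28, 0.50, 0.39]
sys_b = [0.25, 0.40, 0.22, 0.41, 0.35]
t, df = paired_t(sys_a, sys_b)
print(round(t, 3), df)
```

Pairing by topic removes between-topic variance from the comparison, which is why the paired test is usually far more powerful than a two-sample test on the same data.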


OOO "Logosfera"  Tel: +7 (495) 980-12-10  www.logobook.ru