Information Retrieval system (Java/Lucene)
• Using Lucene, parse and index the HTML documents contained in a given folder and its subfolders. List all parsed files.
• Consider the English language and use a stemmer for it (e.g. Porter Stemmer).
• Use the existing search index in the chosen directory, or create a new one if none is available there.
• Let the user choose the ranking model: Vector Space Model (VS) or Okapi BM25 (OK).
• Print a ranked list of relevant articles for a given search query. The output should contain the 10 most relevant documents, each with its rank, title, summary, relevance score, and path.
• Search multiple fields concurrently (multifield search): not only search the document’s text (body tag), but also its title.
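The first requirement (collecting the HTML files of a folder and its subfolders, and listing them) could be sketched as follows using only the Java 8 standard library. This is a minimal sketch, not a prescribed solution: the class name HtmlFileLister, its method names, and the decision to recognise files by a .html or .htm extension (case-insensitive) are assumptions made for illustration. Parsing and indexing with Lucene would then operate on the returned paths.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: collects the HTML files below a root folder.
final class HtmlFileLister {

    /** Returns all regular files under root (recursively) that look like HTML, sorted by path. */
    static List<Path> listHtmlFiles(Path root) throws IOException {
        // Files.walk keeps directory handles open, so the stream is closed explicitly.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(HtmlFileLister::hasHtmlExtension)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Assumption: a file counts as HTML if its name ends in .html or .htm. */
    static boolean hasHtmlExtension(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".html") || name.endsWith(".htm");
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // "List all parsed files": print one absolute path per line.
        for (Path file : listHtmlFiles(root)) {
            System.out.println(file.toAbsolutePath());
        }
    }
}
```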
The program must run without any changes to the runtime environment (Java 8) or to the source code. It must accept the following invocation: java -jar ir.jar [path to document folder] [path to index folder] [VS/OK] [query]
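One way to handle the four command-line arguments is sketched below, again with the Java 8 standard library only. The class IrArgs and its fields are hypothetical names. Two behaviours are assumptions rather than part of the specification: the ranking model is matched case-insensitively, and any arguments after the third are joined with spaces so that an unquoted multi-word query still works.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical holder for the parsed command line.
final class IrArgs {

    /** The two ranking models named in the requirements. */
    enum RankingModel { VS, OK }

    static final String USAGE =
            "Usage: java -jar ir.jar [path to document folder] [path to index folder] [VS/OK] [query]";

    final Path documentFolder;
    final Path indexFolder;
    final RankingModel model;
    final String query;

    private IrArgs(Path documentFolder, Path indexFolder, RankingModel model, String query) {
        this.documentFolder = documentFolder;
        this.indexFolder = indexFolder;
        this.model = model;
        this.query = query;
    }

    /** Parses the arguments; throws IllegalArgumentException if they are unusable. */
    static IrArgs parse(String[] args) {
        if (args.length < 4) {
            throw new IllegalArgumentException(USAGE);
        }
        RankingModel model;
        try {
            // Assumption: "vs" and "ok" are accepted as well as "VS" and "OK".
            model = RankingModel.valueOf(args[2].toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Ranking model must be VS or OK, got: " + args[2]);
        }
        // Assumption: everything from the fourth argument on belongs to the query.
        String query = String.join(" ", Arrays.copyOfRange(args, 3, args.length));
        return new IrArgs(Paths.get(args[0]), Paths.get(args[1]), model, query);
    }

    public static void main(String[] args) {
        try {
            IrArgs parsed = parse(args);
            System.out.println("Documents: " + parsed.documentFolder);
            System.out.println("Index:     " + parsed.indexFolder);
            System.out.println("Model:     " + parsed.model);
            System.out.println("Query:     " + parsed.query);
        } catch (IllegalArgumentException e) {
            System.err.println(e.getMessage());
            System.exit(1);
        }
    }
}
```

In a full program the parsed model would select the Lucene similarity used for both indexing and searching, for example BM25Similarity for OK; the name of the TF-IDF (vector space) similarity class depends on the Lucene version in use.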