This article gives instructions for loading Wikipedia articles into ElasticSearch. I did this on Windows, but all of these steps should work on any Java-friendly platform.
- Download ElasticSearch
- Download stream2es
- Download Wikipedia articles
- Start ElasticSearch
- Run stream2es
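Before running stream2es, it can help to confirm that ElasticSearch is actually answering. The following is a minimal Python sketch, not part of the original steps. It assumes the node listens on http://localhost:9200 (the address used in the stream2es command in this article) and that the root URL returns JSON containing a version.number field, as ElasticSearch 1.x does. The function names are made up for illustration.

```python
import http.client
import json
import urllib.request
from typing import Optional


def parse_version(body: bytes) -> Optional[str]:
    """Pull version.number out of the JSON served at the node's root URL."""
    try:
        info = json.loads(body.decode("utf-8"))
    except ValueError:
        # Covers both invalid UTF-8 and invalid JSON.
        return None
    if not isinstance(info, dict):
        return None
    version = info.get("version")
    if isinstance(version, dict):
        return version.get("number")
    return None


def elasticsearch_version(base_url: str = "http://localhost:9200",
                          timeout: float = 2.0) -> Optional[str]:
    """Return the version the node reports, or None if nothing usable answers."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return parse_version(response.read())
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or a non-HTTP reply.
        return None


if __name__ == "__main__":
    version = elasticsearch_version()
    if version is None:
        print("ElasticSearch is not answering on http://localhost:9200 yet")
    else:
        print("ElasticSearch", version, "is running")
```

If the script reports that nothing is answering, start ElasticSearch first and try again before moving on to stream2es.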
Download Wikipedia articles
- Move the stream2es file to your ElasticSearch bin folder. I put stream2es here: C:\elasticsearch-1.5.2\bin\
- Move the Wikipedia archive (enwiki-latest-pages-articles.xml.bz2) to your ElasticSearch bin folder too.
- Run the stream2es file with Java:
C:\elasticsearch-1.5.2\bin>java -jar stream2es wiki --target http://localhost:9200/mywiki --log debug --source /enwiki-latest-pages-articles.xml.bz2
- You can change “mywiki” to whatever you want your ElasticSearch index to be named.
- I had some trouble getting stream2es to find my Wikipedia archive path on Windows, but putting a / in front of the file name worked.
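If you expect to repeat the import with different index names, the command above can be assembled by a small script. This is only a sketch in Python under a few assumptions: it reuses exactly the arguments shown in the command line above, and the function names and the bin_dir parameter are hypothetical helpers, not part of stream2es.

```python
import subprocess
from typing import List


def build_stream2es_command(index: str,
                            archive: str = "/enwiki-latest-pages-articles.xml.bz2",
                            es_url: str = "http://localhost:9200",
                            jar: str = "stream2es") -> List[str]:
    """Assemble the same arguments used in the article's command line."""
    # The target is the ElasticSearch URL followed by the index name.
    target = es_url.rstrip("/") + "/" + index
    return ["java", "-jar", jar, "wiki",
            "--target", target,
            "--log", "debug",
            "--source", archive]


def run_import(index: str, bin_dir: str) -> None:
    """Run stream2es from the folder that holds it and the Wikipedia archive."""
    # check=True raises CalledProcessError if java exits with a non-zero status.
    subprocess.run(build_stream2es_command(index), cwd=bin_dir, check=True)


if __name__ == "__main__":
    # Only print the command here; call run_import() to actually start the load.
    print(" ".join(build_stream2es_command("mywiki")))
```

Calling run_import("mywiki", r"C:\elasticsearch-1.5.2\bin") would start the load from the bin folder used in this article. The default archive argument keeps the leading / that worked on Windows.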