A Blog Generated On Demand Just-In-Time... which are fancy words for a search engine that adapts to your needs, built with the power of community. https://wearebuildingthefuture.com
 
Roberto Treviño Cervantes ae957cd5c2 Removed ads [temp] 1 week ago
static Removed ads [temp] 1 week ago
templates Removed ad from results page 1 week ago
.gitignore TESTING: peer network; vectors are now turned into strings 3 months ago
COPYING.txt Initial commit 8 months ago
Monad.py Increased max items for hnsw index 1 week ago
README.md Fixed tokenization and some dependencies 1 month ago
README.txt Initial commit 8 months ago
bootstrap.sh Added new requeriments to bootstrap script 1 week ago
build_index.sh No logs for crawler script 2 months ago
count_annoy_index.py Updated crawlers for performance; beautified code 7 months ago
count_peers.py TESTING: peer registry 3 months ago
create_annoy_index.py Increased max items for hnsw index 1 week ago
forms.py Initial commit 8 months ago
future.py Fail silently when no results found in local index 1 week ago
linkreaper.py Crawler now saves origin for images 2 weeks ago
passenger_wsgi.py Initial commit 8 months ago
save_index.sh Fixed init scripts and configuration of crawler 2 months ago
stoplist.hdf5 Initial commit 8 months ago
sw.js Testing service worker 7 months ago
tranco_JKGY.csv Shortened tranco list 1 month ago

README.md

FUTURE

FUTURE is a search engine that improves on traditional keyword search by using machine learning to encode words as vectors that capture their meaning, which lets it return more precise matches. It also drops user tracking, since the query alone is enough to retrieve meaningful results. The backend is written in Python using TensorFlow and PyTorch, and the frontend uses standard web technologies.
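
To make "matching by meaning" concrete, here is a minimal sketch, not FUTURE's actual code: it averages word vectors into one vector per text and ranks documents by cosine similarity. The tiny `WORD_VECTORS` table and the `embed`/`search` helpers are invented for illustration; a real deployment would load pretrained embeddings such as GloVe (listed among the dependencies) and query an approximate nearest-neighbour index instead of scanning every document.

```python
import math

# Hypothetical toy 3-dimensional word vectors, made up for illustration.
# A real system would load pretrained embeddings (e.g. GloVe) instead.
WORD_VECTORS = {
    "python": [0.9, 0.1, 0.0],
    "snake": [0.7, 0.0, 0.3],
    "code": [0.8, 0.3, 0.0],
    "programming": [0.85, 0.25, 0.05],
    "jungle": [0.1, 0.0, 0.9],
}


def embed(text):
    """Average the vectors of known words; unknown words are ignored."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, documents, k=2):
    """Return the k documents whose averaged vectors are closest to the query."""
    q = embed(query)
    if q is None:
        return []
    scored = []
    for doc in documents:
        d = embed(doc)
        if d is not None:
            scored.append((cosine(q, d), doc))
    scored.sort(reverse=True)  # highest similarity first
    return [doc for _, doc in scored[:k]]


print(search("programming", ["python code", "snake jungle"], k=1))
# → ['python code']
```

The query "programming" shares no word with "python code", so a pure keyword match would find nothing, yet that document ranks first because its averaged vector points in almost the same direction as the query's.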

FUTURE IS DISTRIBUTED UNDER THE GNU GPL v3

INSTRUCTIONS

To get FUTURE working, first install the appropriate TensorFlow and PyTorch packages for your system. After that, you only need to run the following command, which has been tested on Arch Linux, openSUSE and Ubuntu:

./bootstrap.sh

This command will not finish in any reasonable amount of time, because it is building the index. However, it can be paused at any time with CTRL+C and resumed later. Shell scripts that automate these tasks are provided, and their names describe what they do.
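
FUTURE's crawler handles pausing on its own; the sketch below is not its code, only one common pattern for a long-running job that can be stopped with CTRL+C and resumed later. It keeps the pending queue in a JSON checkpoint and saves it whenever the loop exits. The checkpoint path and the `crawl`/`process` functions are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical checkpoint location; FUTURE's own scripts may store state differently.
CHECKPOINT = os.path.join(tempfile.gettempdir(), "crawl_checkpoint.json")


def load_state(seeds):
    """Resume from the checkpoint if one exists, otherwise start from the seeds."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"queue": list(seeds), "done": []}


def save_state(state):
    # Write to a temporary file first, then swap it in atomically, so an
    # interruption cannot leave a half-written checkpoint behind.
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CHECKPOINT)


def crawl(seeds, process, max_steps=None):
    """Process queued URLs until the queue empties, max_steps is hit, or CTRL+C."""
    state = load_state(seeds)
    steps = 0
    try:
        while state["queue"] and (max_steps is None or steps < max_steps):
            url = state["queue"][0]
            process(url)              # hypothetical work: fetch, parse, embed
            state["queue"].pop(0)     # dequeue only after processing succeeded
            state["done"].append(url)
            steps += 1
    except KeyboardInterrupt:
        print("Paused; progress saved. Run again to resume.")
    finally:
        save_state(state)
    return state
```

Because a URL is removed from the queue only after it has been processed, pressing CTRL+C mid-page simply means that page is retried on the next run.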

When you want to stop, pause the crawler with CTRL+C and run:

./save_index.sh

Finally, start the server with the command below and point your browser to 0.0.0.0:3000:

./future.py

HACKING

Out of the box, FUTURE is designed as a web search engine, so running the provided ./bootstrap.sh script only prepares it to search web pages. However, it is hackable down to the core: you can open indexer.py and adapt it to save other types of data into the LMDB database, or refer to the Monad class in Monad.py and write your own files to handle the creation of the database and the index.
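
The idea of keeping records in a database and vectors in a separate index can be sketched as follows. This is a hypothetical illustration, not the Monad class (whose API is not shown here): `TinyStore` keys each serialized record by the same integer id used in the vector index, the way one might key an LMDB environment. It swaps in `json` for BSON, a plain dict for LMDB, and a brute-force list for the HNSW index, so it runs with the standard library alone.

```python
import json
import math


class TinyStore:
    """Hypothetical stand-in for a database + vector index pair.

    - db: dict of id (bytes) -> JSON-encoded record (bytes), standing in for LMDB
    - vectors: list of (id, vector) pairs, standing in for an HNSW index
    """

    def __init__(self):
        self.db = {}
        self.vectors = []

    def add(self, record, vector):
        """Store any JSON-serializable record and link it to its vector by id."""
        item_id = len(self.vectors)
        self.db[str(item_id).encode()] = json.dumps(record).encode()
        self.vectors.append((item_id, vector))
        return item_id

    def search(self, query, k=1):
        """Rank stored vectors by cosine similarity and return the k records."""
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb) if na and nb else 0.0

        ranked = sorted(self.vectors, key=lambda p: cos(query, p[1]), reverse=True)
        return [json.loads(self.db[str(i).encode()]) for i, _ in ranked[:k]]


store = TinyStore()
store.add({"type": "image", "url": "https://example.com/cat.png"}, [1.0, 0.0])
store.add({"type": "paper", "title": "On Search"}, [0.0, 1.0])
print(store.search([0.9, 0.1]))
# → [{'type': 'image', 'url': 'https://example.com/cat.png'}]
```

Since the record is only looked up by id after the index has answered, the record schema can change (images, papers, anything serializable) without touching the search step.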

If you modify the data that is saved into the database, you may also need to change how it is rendered in the HTML templates. For that, refer to lines 240-327 of future.py, where you can adapt the code that manages the database to suit your needs.
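
When records gain new fields, the code that turns them into HTML has to learn about those fields too. FUTURE does this with its own templates and the code in future.py; the standard-library sketch below only illustrates the idea. The `render_result`/`render_results` functions and the `title`, `url` and `snippet` fields are hypothetical. Note that every stored value is escaped before it reaches the page.

```python
from html import escape


def render_result(record):
    """Render one hypothetical database record as an HTML list item."""
    title = escape(record.get("title", "Untitled"))
    url = escape(record.get("url", "#"), quote=True)
    snippet = escape(record.get("snippet", ""))
    return f'<li><a href="{url}">{title}</a><p>{snippet}</p></li>'


def render_results(records):
    """Wrap all rendered records in a single unordered list."""
    return "<ul>" + "".join(render_result(r) for r in records) + "</ul>"


print(render_results([{"title": "A & B", "url": "https://example.com/?a=1&b=2"}]))
# → <ul><li><a href="https://example.com/?a=1&amp;b=2">A &amp; B</a><p></p></li></ul>
```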

For further modifications, feel free to fork the project, but bear in mind the terms of the GPL v3 license.

DEPENDENCIES

Below are listed all the projects upon which FUTURE rests.

Name                    License
Flask                   BSD 3-Clause
Werkzeug                BSD 3-Clause
SymSpell                MIT
Polyglot                GPL v3
Beautiful Soup          BSD 2-Clause
BSON Python bindings    Apache 2.0
NumPy                   BSD 3-Clause
GeoPy                   MIT
scikit-learn            BSD 3-Clause
pandas                  BSD 3-Clause
Gensim                  LGPL 2.1
NLTK                    Apache 2.0
Scrapy                  BSD License
h5py                    BSD 3-Clause
LMDB                    OpenLDAP
LMDB Python bindings    OpenLDAP
tldextract              BSD 3-Clause
WTForms                 BSD 3-Clause
Flask-WTF               BSD 3-Clause
hnswlib                 Apache 2.0
jQuery                  MIT
jQuery UI               MIT
particles.js            MIT
Ionicons                MIT
Source Sans Pro         OFL 1.1
GloVe                   Apache 2.0
SPARQLWrapper           W3C License
TextScrambler           BSD-like

FUTURE on w3m

[asciicast: FUTURE running in the w3m text-mode browser]