Circular virus detection in sublinear time in Big Data
By Dr. Samiruzzaman Samir

PATTERN

TEXT

RESULT

  • Choose K Bound:
  • Choose Filters:
  • Remove False Positives:

About this tool

This online tool searches an input text taken from circular virus DNA for matches of an input pattern. It is a browser-based circular virus detection tool that runs in sublinear time: a filter-based approach to approximate circular pattern matching dramatically reduces the amount of data to be examined and finds the matches in human genome data. All of the filter-based algorithms were designed and developed by Dr. Samiruzzaman Samir. This is the first web-based circular pattern matching tool that operates on the client side. The idea behind the approach is simple and intuitive. Instead of a program running on a web server, the JavaScript code is delivered to the browser at run time and performs the computation there, so users do not need to install any software or upload a large file to a server. The tool handles big data on the client side with low memory use by processing the input chunk by chunk. It is an effective, lightweight filtering technique for reducing the search space of the Circular Pattern Matching problem, and it works for both exact and approximate circular pattern matching.
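The chunk-by-chunk idea can be illustrated with a minimal sketch. It is not the tool's actual code: the function names `overlappingChunks` and `windowStarts` are hypothetical, and the text is held in a string for simplicity, whereas in a browser the pieces would presumably be read from a user-selected file with `Blob.slice`. The sketch assumes that consecutive chunks overlap by one character less than the pattern length, so that no window is lost at a chunk boundary.

```javascript
// Hypothetical helper: yields { offset, chunk } pieces of `text`.
// Consecutive chunks overlap by (patternLength - 1) characters so that a
// window starting near the end of one chunk is seen whole in the next one.
function* overlappingChunks(text, chunkSize, patternLength) {
  if (chunkSize < patternLength) {
    throw new Error("chunkSize must be at least patternLength");
  }
  const step = chunkSize - (patternLength - 1);
  for (let offset = 0; offset < text.length; offset += step) {
    yield { offset, chunk: text.slice(offset, offset + chunkSize) };
    if (offset + chunkSize >= text.length) break; // last chunk reached
  }
}

// Collects the global start position of every full-length window,
// visiting the text one chunk at a time.
function windowStarts(text, chunkSize, patternLength) {
  const starts = [];
  for (const { offset, chunk } of overlappingChunks(text, chunkSize, patternLength)) {
    for (let i = 0; i + patternLength <= chunk.length; i++) {
      starts.push(offset + i);
    }
  }
  return starts;
}
```

With this overlap every window of the text is visited exactly once, so only one chunk needs to be held in memory at a time.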

K Bound: The maximum number of mismatches between pattern and text that is tolerated. K = 0 is equivalent to exact circular string matching; K = 1 means that all but at most one character must match, and so on.
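To make the K bound concrete, the following brute-force sketch compares one window of text against every rotation of the pattern and reports the smallest number of mismatches. The function names are hypothetical and this is only an illustration of the definition, not the tool's filter-based algorithm.

```javascript
// Number of positions at which two equal-length strings differ.
function hammingDistance(a, b) {
  let d = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) d++;
  }
  return d;
}

// Smallest number of mismatches between `window` and any rotation of `pattern`.
function minRotationMismatches(pattern, window) {
  if (pattern.length !== window.length) {
    throw new Error("pattern and window must have the same length");
  }
  let best = pattern.length;
  for (let r = 0; r < pattern.length; r++) {
    const rotation = pattern.slice(r) + pattern.slice(0, r);
    best = Math.min(best, hammingDistance(rotation, window));
  }
  return best;
}

// A window is a circular match under bound k if some rotation is within k mismatches.
function matchesWithinK(pattern, window, k) {
  return minRotationMismatches(pattern, window) <= k;
}
```

For the pattern `ACGT`, the window `GTAC` is an exact circular match (K = 0), while `GTAA` differs from the rotation `GTAC` in one character and is therefore accepted only when K is at least 1.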

Filters: One to three filters may be applied in the filtering stage. Applying additional filters may increase the search time but reduces the size of the final list of candidate locations.

  1. Value Sum
  2. Absolute Character Difference Sum
  3. Individual Character Count
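The three filters can be sketched as follows. Everything specific here is an assumption made for illustration: the numeric encoding A=1, C=2, G=3, T=4, the mismatch thresholds derived from it, the function names, and the restriction to upper-case A, C, G and T. The tool's own definitions may differ. Each quantity is unchanged by rotating the string, so an exact circular match must agree on all three; under this encoding a single substitution changes the value sum by at most 3, the circular difference sum by at most 6, and the total of the per-character count differences by at most 2.

```javascript
// Assumed numeric encoding of the DNA alphabet (not specified by the tool).
const VALUE = { A: 1, C: 2, G: 3, T: 4 };
const MAX_DIFF = 3; // largest possible |VALUE[x] - VALUE[y]|

// Computes the three rotation-invariant features of a string over A, C, G, T.
function features(s) {
  const counts = { A: 0, C: 0, G: 0, T: 0 };
  let valueSum = 0;
  let diffSum = 0;
  for (let i = 0; i < s.length; i++) {
    const v = VALUE[s[i]];
    const next = VALUE[s[(i + 1) % s.length]]; // wraps around: circular
    valueSum += v;                  // filter 1: value sum
    diffSum += Math.abs(v - next);  // filter 2: absolute character difference sum
    counts[s[i]]++;                 // filter 3: individual character count
  }
  return { valueSum, diffSum, counts };
}

// True if a window with features `w` could still match the pattern with
// features `p` within k mismatches (thresholds follow from the assumed encoding).
function passesFilters(p, w, k) {
  if (Math.abs(p.valueSum - w.valueSum) > MAX_DIFF * k) return false;
  if (Math.abs(p.diffSum - w.diffSum) > 2 * MAX_DIFF * k) return false;
  let countDiff = 0;
  for (const c of Object.keys(VALUE)) {
    countDiff += Math.abs(p.counts[c] - w.counts[c]);
  }
  return countDiff <= 2 * k;
}

// Returns the start positions of all windows that survive the filters.
// Features are recomputed per window for clarity; a practical version
// would update them incrementally as the window slides.
function candidateLocations(pattern, text, k) {
  const p = features(pattern);
  const out = [];
  for (let i = 0; i + pattern.length <= text.length; i++) {
    const w = features(text.slice(i, i + pattern.length));
    if (passesFilters(p, w, k)) out.push(i);
  }
  return out;
}
```

In this sketch the filters never discard a true match, but they can let through a window that is not one: for the pattern `ACGT` and K = 0, the window `ATGC` agrees on all three quantities although it is not a rotation of the pattern.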

Remove False Positives: If this option is not checked, the list of locations produced represents candidate match locations and may contain false positives. If it is checked, the list is the final, correct list of match locations.
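One straightforward way to turn a candidate list into a final list is to check each candidate directly against the rotations of the pattern. The sketch below does this with the doubled pattern, in which every rotation appears as a substring; the function names are hypothetical and the tool may verify candidates differently.

```javascript
// True if the window of `text` starting at `start` is within k mismatches
// of some rotation of `pattern`.
function verifyLocation(pattern, text, start, k) {
  const m = pattern.length;
  if (start < 0 || start + m > text.length) return false;
  const doubled = pattern + pattern; // rotation r is doubled.slice(r, r + m)
  for (let r = 0; r < m; r++) {
    let mismatches = 0;
    // Stop comparing this rotation as soon as the bound is exceeded.
    for (let i = 0; i < m && mismatches <= k; i++) {
      if (doubled[r + i] !== text[start + i]) mismatches++;
    }
    if (mismatches <= k) return true;
  }
  return false;
}

// Keeps only the candidate locations that are genuine matches.
function removeFalsePositives(candidates, pattern, text, k) {
  return candidates.filter((start) => verifyLocation(pattern, text, start, k));
}
```

For the pattern `ACGT`, the text `GTACATGC` and the candidates 0 and 4, only location 0 survives when K = 0, because the window `ATGC` at location 4 is two mismatches away from the nearest rotation.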