Super Search Category/Indexing Issues

enigmatl
Posts: 37
Joined: Sat May 07, 2005 1:10 pm

Post by enigmatl » Sun May 08, 2005 2:35 am

As for HOW newsrover searches: I'm told (and the program reveals) that they apparently subscribe to multiple servers and maintain headers for these servers: Supernews, Giganews, and a few others. You can select which of them your search results are based on, or you can do as I do and just select "all". I'm assuming the only way it can be done at this point is by getting your headers from some provider or group of providers and then maintaining a database. The author of newsrover implied once, if memory serves me correctly, that he's using some Microsoft database to maintain the search system (though I forget which one; it came up in a support question I had several months ago).

Does newsleecher do it better in some manner? If so, how? Now, I'm not asking for the "secret recipe"; that would be disrespectful. What I am asking is just whether there is any advantage despite the search queries being very rudimentary. Do results come from one server or many servers? Any advantage at all? "Custom" just doesn't tell me much.

Destroyer
Posts: 639
Joined: Wed Feb 16, 2005 3:15 pm

Post by Destroyer » Sun May 08, 2005 5:06 pm

The retention is good for supersearch, about 50 days. It gets its headers from giganews iirc.

More info on what it searches here

The best thing however is that spiral is always updating newsleecher and supersearch :D

enigmatl
Posts: 37
Joined: Sat May 07, 2005 1:10 pm

Post by enigmatl » Mon May 09, 2005 1:37 am

The retention for supersearch may be good but so is newsrover's. They use giganews too, and giganews is just one of the servers they use. And they let you choose one or all of the servers to get your results. At the same time, their search is not literal. A dot between 2 words won't screw up your search, nor will searching for words out of order. Plus, on newsrover, you can use the ^ operator to exclude a word from the search if you're getting too many irrelevant results. Plus, you can search on newsrover for filename.*. Newsrover also doesn't force you to select a category. It lets you, however, IF YOU WANT TO. But the default category is all files. So sorry, but a search engine that is more conscious of its own resources and load than of its users, and which COSTS MONEY, cannot touch a search engine which has more functionality and is free. Spiral definitely has a better program minus the search engine, but that search engine isn't worth a penny right now in the face of *free* newsrover searching, which gives you 10 times better control over how your searches come out at the same speed. And, though I love newsleecher's interface, in the end, I open a newsreader, search, tag, download, and minimize it to the system tray. I don't know many people who buy a newsreader to look at it. So, I hope Spiral fixes the search engine. At the very least, it's a requirement if he wants my 30 bucks. Because right now, it's laughable that anyone is gonna pay that on a YEARLY basis. lol

enigmatl
Posts: 37
Joined: Sat May 07, 2005 1:10 pm

Post by enigmatl » Mon May 09, 2005 11:22 am

hmf, I had no choice. I bought the program even with the search flaws, mainly because newsleecher returns 5,000 results and newsrover returns 500. At the very least, I can survive with the refine filter, and that very filter gives me a very good idea that I should pass on to the author.

I guess I'll use newsrover just when newsleecher searches don't suffice, which, when I think about it, won't happen all that much. I think the .6 minute turnaround time was also a selling point to me. When I started newsrover about 2 years ago, I believe I was told the search turnaround was near 12 hours. Now there's a difference maker that makes me at least want to have newsleecher on my computer for a year or so.

Destroyer
Posts: 639
Joined: Wed Feb 16, 2005 3:15 pm

Post by Destroyer » Mon May 09, 2005 3:59 pm

I have tried newsrover's search engine and I personally thought it was rubbish compared to supersearch. Everyone to their own though.

The turnaround of supersearch is less than a minute :)

enigmatl
Posts: 37
Joined: Sat May 07, 2005 1:10 pm

Post by enigmatl » Wed May 11, 2005 2:05 pm

You think Newsrover's search is rubbish and you don't tell us why? That's really not fair. Granted, Newsrover's search is still crude in comparison to google but he's got the basics down. The search is not literal and you can even add a ^ symbol before a word you want excluded from the search. Plus, newsrover doesn't force you to choose 1 of 3 groups to search in. (I'm finding that I forget to switch all the time in newsleecher.) Now, I subscribed to Newsleecher because I wanted to be using it while a lot of stuff was going on, as it seems like the Newsrover author ran out of steam about 2 years ago. However, before that happened, he did implement a search engine that is still superior to Newsleecher's and, what's more, there is now talk that Newsbin is getting a search. And then there's Grabit, and that's free. So, all I'm saying is the day is coming where "does the newsreader have a search? Good, sign me up." isn't gonna be the deal. It's gonna be more about which one has the best search. So, just saying newsrover's search is rubbish without really offering any explanation is not really a fair response. At least not until/unless the day comes when Spiril decides to squash them, and I do believe that a search engine enhancement is all that would take, especially with how cool the result page for the supersearch already looks.

Latest example that shows the need for a new search engine: That new obnoxious/funny video they showed on Smackdown last week - John Cena - Bad Man. (the video - not the mp3) Try and find it once. Comes right up in Newsrover. Oh sure. You can put Cena in and then filter bad. I just thought of that. But what if John Cena (or the subject of the next search) drew over 5,000 names? That video might not make it to the list. Calling Newsrover's search engine anything but superior at this point is just silly. And, until I no longer need to run newsrover (to find out what to put into newsleecher's search box to make files I want come up so I can tag them), superior it will remain.

But I hope that changes.

Destroyer
Posts: 639
Joined: Wed Feb 16, 2005 3:15 pm

Post by Destroyer » Wed May 11, 2005 5:05 pm

It finds more results for things I search for, and I also know which groups it is getting the files from and which groups it doesn't search through.

I also don't mind searching through 2 categories to get a file.

If you like newsrover's better then you should use it instead of supersearch.

Grabit is a good news program and I think it's great it's free, but the search isn't as fast and doesn't bring up many results at all. With Newsleecher you have more idea of what files you could be missing (i.e. which groups it searches) and how old the files are. Along with a great interface.

Red Dwarf
Forum Moderator
Posts: 3982
Joined: Fri Feb 18, 2005 9:25 am

Post by Red Dwarf » Wed May 11, 2005 11:39 pm

I really like supersearch, but enigmatl is right IMO that supersearch could use some enhancing in this area.
Grtz RD

If I had a really good tagline, I would put it here.

Lips
Forum Moderator
Posts: 3804
Joined: Thu Mar 18, 2004 6:57 pm

Post by Lips » Thu May 12, 2005 12:26 am

enigmatl wrote:Granted, Newsrover's search is still crude in comparison to google but he's got the basics down. The search is not literal and you can even add a ^ symbol before a word you want excluded from the search.
Red Dwarf wrote:I really like supersearch, but enigmatl is right IMO that supersearch could use some enhancing in this area.
Enhancing the search abilities so that SSearch would not need to be so "literal" has been requested before. One method would be to allow for regex to be used for the searches. I agree that enhancements would be welcome in this area.
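
As a rough illustration of the regex idea (a sketch only: the `regex_search` helper and the sample subjects are made up here and say nothing about how SuperSearch works internally), a pattern can be written to tolerate whatever separator a poster put between the words:

```python
import re

def regex_search(pattern, subjects):
    """Return the subjects that match a case-insensitive regular expression."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return [s for s in subjects if compiled.search(s)]

# Hypothetical subject lines, for illustration only.
subjects = [
    "My.Video-Episode.1--HDTV",
    "My video Episode 1 HDTV",
    "Other Show Episode 1",
]

# [\W_]+ matches one or more separators (spaces, dots, dashes, underscores).
hits = regex_search(r"my[\W_]+video[\W_]+episode", subjects)
print(hits)
```

Note that a regex like this still fixes the word order, so it only solves the separator half of the "literal" problem.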

--
Lips

enigmatl
Posts: 37
Joined: Sat May 07, 2005 1:10 pm

Post by enigmatl » Thu May 12, 2005 1:16 pm

I agree. Spiral? If you would, would you tell us what your intentions are as far as making the changes to supersearch talked about in this thread? I for one am a patient person (as would be indicated by the fact that I bought the program for a year despite my dislike for the current state of the search engine vs the other guys' offerings) but it would help to hear your take on this so that some of us (me for one) could know if this is a program we should get comfortable with or not. So please, let us know.

Destroyer
Posts: 639
Joined: Wed Feb 16, 2005 3:15 pm

Post by Destroyer » Fri May 13, 2005 4:40 pm

If you read the latest news, Spiral has raised the maximum number of results returned to 10000 :)

Red Dwarf
Forum Moderator
Posts: 3982
Joined: Fri Feb 18, 2005 9:25 am

Post by Red Dwarf » Fri May 13, 2005 11:31 pm

Nice, but it isn't a solution.
Grtz RD


enigmatl
Posts: 37
Joined: Sat May 07, 2005 1:10 pm

Post by enigmatl » Sat May 14, 2005 12:45 pm

Do not get me wrong. I appreciate the 10,000 results A LOT. This is a wonderful thing that will go miles and miles toward making the search engine the greatest usenet searcher in the world. But Spiril, please understand, from the perspective of this newly paying user who is cheering you and your program on: respectfully, it may as well be 1 result or it may be a million, the problem still remains, and please understand the magnitude of the problem. Nobody here knows HOW a header is going to read. Will they post "My video HDTV episode 1" or will they post "My video Episode 1 HDTV" or how about the ever popular "My.Video-Episode.1--HDTV"? And keep in mind during this that it should be imagined that there are other episodes of "My Video" and there are other formats of the same episode, like VCD, depending on how someone out there might encode it. A non-literal search is not an "added feature" that would make the search engine "nice." It's a "required core feature" that makes the search engine frankly "usable". Also, an "all" option needs to be added above general, audio, and pictures because while those are nice sometimes, it's an inconvenience to have to choose one.

So please understand that I shouldn't have to pull up newsrover and do a search just to find out how a header is worded so I can put it into Newsleecher to download. Whether it's for cost efficiency or what, cost efficiency is great but not at the expense of the quality of your program. Newsrover has been doing ungrouped non-literal searches for years, Newsbin is working on it, grabit does it for free and I think you can do it too. Could you not at the very least address your PAYING USERS on this matter? Do we not deserve a yes/no on this? Perhaps an answer as to why it wasn't non-literal from the start? While there's nobody MAKING you say anything, I can tell you it would go miles toward shaping our opinion of the program, and all of those users who haven't signed up yet but love the interface and want to badly might just do it. Even I sat on the sidelines for weeks. Only reason I bought the program was I had some extra cash. So please, if you'd be so kind, speak on this topic. Thank you.
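
To make "non-literal" concrete, here is a minimal sketch, assuming a header counts as a match when it contains every query word regardless of order or punctuation. The `tokens` and `matches` helpers are hypothetical names, not anything from newsleecher or newsrover:

```python
import re

def tokens(text):
    """Lower-case the text and split it on anything that is not a letter or digit."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches(query, subject):
    """True when every query word occurs in the subject, in any order."""
    return tokens(query) <= tokens(subject)

# The three header spellings from the post, plus one different episode.
headers = [
    "My video HDTV episode 1",
    "My video Episode 1 HDTV",
    "My.Video-Episode.1--HDTV",
    "My video Episode 2 VCD",
]

found = [h for h in headers if matches("my video episode 1 hdtv", h)]
print(found)
```

All three spellings of the same post come back from one query, while the other episode does not.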

Spiril
Site Admin
Posts: 4278
Joined: Fri Nov 07, 2003 3:11 am

Post by Spiril » Sat May 14, 2005 6:45 pm

To begin with, I actually built the entire SuperSearch engine the same way as grabit, usenetjunkie, nzb4u and other usenet searchers work. This was done by feeding all the usenet data into an SQL database, making a fulltext search index on the subjects database table, and allowing users to perform boolean (and, or, not) searching using the fulltext index.

Unfortunately this quickly proved to be an unusable solution due to the demand for high retention and due to the massive amount of searches performed per second. So I trashed the standard database approach, and made my own engine entirely from scratch. The homemade database is more than 10 times faster than the optimized SQL approach, because it is specifically designed to perform fast searches, and nothing else. But the (current) disadvantage is that it has to be very simple to be able to perform the fast searches, and that's why and, or, not searches aren't available at the moment.
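
For readers unfamiliar with the idea, boolean searching over a word index can be sketched in a few lines. This is not the SQL setup described above: it swaps in a plain in-memory inverted index, and `build_index`, `search` and the sample subjects are all invented for illustration:

```python
import re
from collections import defaultdict

def build_index(subjects):
    """Map each word to the set of subject ids that contain it."""
    index = defaultdict(set)
    for doc_id, subject in enumerate(subjects):
        for word in re.findall(r"[a-z0-9]+", subject.lower()):
            index[word].add(doc_id)
    return index

def search(index, all_of=(), any_of=(), none_of=()):
    """Boolean search: AND over all_of, OR over any_of, NOT over none_of."""
    # Start from every indexed subject, then narrow the set down.
    result = set().union(*index.values())
    for word in all_of:
        result &= index.get(word, set())
    if any_of:
        result &= set().union(*(index.get(w, set()) for w in any_of))
    for word in none_of:
        result -= index.get(word, set())
    return sorted(result)

# Hypothetical subjects, for illustration only.
subjects = [
    "John Cena - Bad Man (video).avi",
    "John Cena - Bad Man.mp3",
    "Discovery Channel - Sharks.avi",
]
index = build_index(subjects)
print(search(index, all_of=["cena", "bad"], none_of=["mp3"]))
print(search(index, any_of=["sharks", "mp3"]))
```

Each query is a handful of set operations, which is cheap on three subjects; the scaling problems described in this post come from doing the same thing across weeks of headers for many users at once.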

I find almost all I need by using the Result Filter to narrow down my search results, so the missing boolean search feature isn't a prob for me personally. But I understand that a lot of users want the bool search feature, so I'll look into optimizing the engine even more and make room to implement it sometime in the future. Can't promise an exact date yet, though. But it'll be put on the todo list.

Btw, one of the advantages of the current SuperSearch searcher is that it returns hits on sub-words. If you, for example, search for "Discovery", you will also get hits for "TheDiscovery" and "DiscoveryChannel".
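
The difference between sub-word hits and whole-word hits can be shown with a small sketch. How SuperSearch implements this internally is not stated in the thread; the two helpers and the sample subjects below are assumptions that only reproduce the behaviour described:

```python
def subword_hits(term, subjects):
    """Case-insensitive substring match, so a term also hits inside longer words."""
    needle = term.lower()
    return [s for s in subjects if needle in s.lower()]

def whole_word_hits(term, subjects):
    """Match only when the term equals a whole whitespace-separated word."""
    needle = term.lower()
    return [s for s in subjects if needle in s.lower().split()]

# Hypothetical subjects, for illustration only.
subjects = [
    "TheDiscovery special",
    "DiscoveryChannel sharks",
    "Discovery weekly",
    "Nature weekly",
]

print(subword_hits("Discovery", subjects))
print(whole_word_hits("Discovery", subjects))
```

A word-based index would only return the third subject, while the substring approach returns the first three.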

---
You say Newsbin is working on implementing a search feature?

Btw, enigmatl, grabit search doesn't seem to be free anymore...
bug fixed. no idea how. hate it when that happens. trying to break it again now. will. not. be. defeated.

enigmatl
Posts: 37
Joined: Sat May 07, 2005 1:10 pm

Post by enigmatl » Sun May 15, 2005 4:07 am

Hey Spiril, I can appreciate all this and by the way, if your search algorithm and custom database are 10x faster than SQL, it's just too bad you don't work where I work. We could sure use that kinda talent. :-) But functionality (boolean) is very important for the above mentioned reasons, like not knowing exactly what to search for and still having it come up when you need more than 2 words. So I have a suggestion/challenge for you of sorts if you think it's a good idea.

Without touching the basic result filter, because I know many of us do love how it narrows down your results as you type, make another option, perhaps an advanced result filter, that will let us search the 10,000 results that came up from the first search. This is good for your database, as the occasional failed search may not need to be repeated. It is also good for us because search code is kinda easy and yet, at the same time, you could create the search engine from hell and all the work would be done on our computer. It wouldn't matter if I made a mile long search query. It wouldn't even hit your search database, and you would have the best search index in the world because of the 10,000 results + the advanced searching. Plus the work to create this, to run a boolean search on the 10,000 results that you got back, wouldn't take too long.

I hope you're keen on this idea. Sounds exciting to me just typing it.

P.S. As far as work goes, I've always thought highly of SQL for searching payroll databases and the like. If it's really possible to produce a custom database that much faster, I really need to do some homework and maybe get a raise! :-)
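
A minimal sketch of what such a client-side filter could look like, assuming the results are already held locally as a list of subject strings and borrowing newsrover's ^ prefix for exclusion. The `parse_query` and `advanced_filter` names are hypothetical, not part of newsleecher:

```python
import re

def parse_query(query):
    """Split a query into required words and words excluded with a leading ^."""
    required, excluded = [], []
    for term in query.lower().split():
        if term.startswith("^") and len(term) > 1:
            excluded.append(term[1:])
        elif term != "^":
            required.append(term)
    return required, excluded

def advanced_filter(query, results):
    """Keep results that contain every required word and no excluded word."""
    required, excluded = parse_query(query)
    kept = []
    for subject in results:
        words = set(re.findall(r"[a-z0-9]+", subject.lower()))
        if all(w in words for w in required) and not any(w in words for w in excluded):
            kept.append(subject)
    return kept

# Hypothetical results already returned by a first search.
cached = [
    "John Cena - Bad Man (video).avi",
    "John.Cena-Bad.Man.mp3",
    "Bad Company - live.mp3",
]
print(advanced_filter("bad cena ^mp3", cached))
```

Everything here runs on the user's machine against the cached list, so however long the query gets, nothing goes back to the search server.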
