August 22nd, 2003, 02:18 AM
|
 |
Major General
|
|
Join Date: Oct 2002
Posts: 2,174
Thanks: 0
Thanked 0 Times in 0 Posts
|
|
Re: Scam Or Not?
Quote:
Originally posted by Imperator Fyron:
You do not have to load every single address into active memory at once. In fact, with that loop, each address is deleted from active memory (essentially) after it is checked against the one you are comparing it to.
|
Apparently you missed the numerous disclaimers:
rather than "is"; my note at the bottom
Quote:
Granted, there are several ways to shave time off of the above analysis, but that just gives a general idea of what it would take.
|
all meant to indicate it was a worst-case (for that algorithm, worst-case = no duplicates) analysis.
In the worst case, the last entry checked must be compared against every other entry, so all of them must be available (in memory, or accessed from disk). The point was to give a general idea of what was required, not the exact algorithm needed. Things would be further complicated by the likelihood that it isn't a matter of a single database of addresses being worked with. There are a zillion (exaggeration) assumptions in my analysis, and several valid shortcuts that could be built into the algorithm. It's an estimate to support what I said that DavidG had doubts about, not an exact analysis for that particular number set.
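To make the brute-force idea concrete, here is a rough sketch (Python used purely for illustration; the function name and sample addresses are made up, not anything MS actually runs):

# Brute-force duplicate removal, as described above.
# Worst case (no duplicates at all): every new address gets compared against
# every address already kept, which is where the ~n*(n-1)/2 comparison count comes from.
def remove_duplicates_naive(addresses):
    unique = []
    for addr in addresses:
        found = False
        for kept in unique:        # up to len(unique) comparisons per address
            if addr == kept:
                found = True
                break
        if not found:
            unique.append(addr)
    return unique

# Example run with made-up addresses:
print(remove_duplicates_naive(["a@x.com", "b@y.com", "a@x.com"]))
# prints ['a@x.com', 'b@y.com']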
__________________
Of course, by the time I finish this post, it will already be obsolete. C'est la vie.
|

August 22nd, 2003, 02:22 AM
|
 |
Lieutenant Colonel
|
|
Join Date: Jan 2002
Location: Dundas, Ontario, Canada
Posts: 1,498
Thanks: 0
Thanked 0 Times in 0 Posts
|
|
Re: Scam Or Not?
Quote:
Originally posted by Jack Simth:
Quote: Originally posted by DavidG:
The biggest software company in the world that has written some of the most complex programs can't remove duplicate addresses from a list??? What's wrong with this picture.
|
They could; it's just a matter of the computer time required. The standard algorithm for removing duplicates goes something like:
Well, I'm not going to dispute your math (frankly, I didn't take the time to really understand it). But I have MSN Messenger at work. I signed up for the service and provided my e-mail address to MS exactly ONE time, and yet I got that message 10 times. Face it, MS got something screwed up. Other corps with large databases seem to get things OK (like Symantec).
|

August 22nd, 2003, 02:25 AM
|
 |
Lieutenant Colonel
|
|
Join Date: Jan 2002
Location: Dundas, Ontario, Canada
Posts: 1,498
Thanks: 0
Thanked 0 Times in 0 Posts
|
|
Re: Scam Or Not?
Quote:
Originally posted by Jack Simth:
However, apparently, it's possible to get on their lists more than once
|
Perhaps this is the problem. Now surely this shouldn't be hard to avoid.
|

August 22nd, 2003, 02:26 AM
|
 |
Major General
|
|
Join Date: Oct 2002
Posts: 2,174
Thanks: 0
Thanked 0 Times in 0 Posts
|
|
Re: Scam Or Not?
Quote:
Originally posted by DavidG:
Well, I'm not going to dispute your math (frankly, I didn't take the time to really understand it). But I have MSN Messenger at work. I signed up for the service and provided my e-mail address to MS exactly ONE time, and yet I got that message 10 times. Face it, MS got something screwed up. Other corps with large databases seem to get things OK (like Symantec).
|
Granted; MS could have done much better. The analysis only applies when going back to fix it, not when compiling the list originally. It is easier to maintain a no-duplicates list than it is to change a list to the no-duplicates variety.
__________________
Of course, by the time I finish this post, it will already be obsolete. C'est la vie.
|

August 22nd, 2003, 02:30 AM
|
 |
Shrapnel Fanatic
|
|
Join Date: Feb 2001
Location: Waterloo, Ontario, Canada
Posts: 11,451
Thanks: 1
Thanked 4 Times in 4 Posts
|
|
Re: Scam Or Not?
Best would be an algorithm which checks new addresses against the list before even putting them in!
Keep N small in the first place, and there's less trouble later.
Each submitting machine could keep a filter list of the last couple of submissions so as to cut down on the work the main server has to do.
If you had a sorted list, then the duplicate checking would be really easy.
Decent insertion routines would help a lot too.
Bucket sort to servers holding a piece of the list, then insert using your favorite routine.
Get the n^2 work done as it drips in, so you have years to spend on the problem, instead of rushing it just before trying to send emails.
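Very roughly, the check-before-insert idea might look like this (Python just for illustration; the function name and sample addresses are invented):

import bisect

# Keep the master list sorted; binary-search each new address before
# inserting, so duplicates never make it onto the list in the first place.
def add_address(sorted_addresses, new_addr):
    i = bisect.bisect_left(sorted_addresses, new_addr)
    if i < len(sorted_addresses) and sorted_addresses[i] == new_addr:
        return False                    # already on the list; skip it
    sorted_addresses.insert(i, new_addr)
    return True

master = []
for addr in ["b@y.com", "a@x.com", "b@y.com"]:   # made-up submissions
    add_address(master, addr)
print(master)    # prints ['a@x.com', 'b@y.com']

Each lookup is only about lg(n) comparisons; the slow part becomes the insert itself, which is exactly the sort of work you could spread across those bucketed servers.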
__________________
Things you want:
|

August 22nd, 2003, 02:51 AM
|
 |
Major General
|
|
Join Date: Oct 2002
Posts: 2,174
Thanks: 0
Thanked 0 Times in 0 Posts
|
|
Re: Scam Or Not?
Actually, if you maintain it as a sorted, no-duplicates list from the start, the total computational effort is more along the lines of n*lg(n), which isn't unreasonable, even with a billion entries.
Come to think of it, one could always dump the current list into such an algorithm, and then only use ~ 30,000,000,000 comparisons - roughly eight hours, using the same numbers as the earlier analysis for a single machine. Never mind; MS has no real excuse.
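A rough sketch of that sort-then-scan version (again Python, purely illustrative; the function name and sample data are made up):

# Sort the existing list (~n*lg(n) comparisons), then strip duplicates in a
# single linear pass, since duplicates end up adjacent after sorting.
def remove_duplicates_sorted(addresses):
    addresses = sorted(addresses)
    unique = []
    for addr in addresses:
        if not unique or unique[-1] != addr:
            unique.append(addr)
    return unique

print(remove_duplicates_sorted(["a@x.com", "b@y.com", "a@x.com"]))
# prints ['a@x.com', 'b@y.com']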
__________________
Of course, by the time I finish this post, it will already be obsolete. C'est la vie.
|