DFSR disaster - replication malfunction
I think I have found a new, terrible bug in DFSR. Before the 'helpful people' jump on me for making such a sacrilegious statement, I do have some credentials in understanding how DFSR works, or at least how it is supposed to.
We started using DFSR on our W2K3 servers when it first became available, found many of the bugs it had, and applied many hotfixes through MS CSC calls relating to those problems. We learnt a lot then.
We have applied the post-SP2 hotfixes and continue to have 'impossible' 4412 events, where a file is reported as changed on multiple servers at the same time. This happens in folders used by a single individual who can only be at 1 site at a time, and the other 'changed' file is quite old, months or years old. So the old file has been 'changed' for months or longer without being replicated, and is dumped in the ConflictAndDeleted folder when a legitimate change is made elsewhere.
Microsoft blames the anti-virus product (hello Computer Associates, we're dumping your product because of this unresolved issue after 2 years of inaction), but as a customer I think DFSR should be more resilient, or MS should be more supportive of 3rd party issues like this.
Anyway, because of the unresolved issue with anti-virus, we don't run antivirus on our new W2K8 server yet. It'll be Forefront, when any problems it causes DFSR will be from the same vendor.
We have 8 replication groups, and they have either 1 or 2 folders in them. Every replication group contains the 1 hub server on the core LAN and, in 1 case, it would have been the primary when the replication groups were created. I should mention that DFSR has been working for several years in our environment (except for the impossible 4412 events mentioned).
So I added the new W2K8 server to the replication groups on Friday morning, expecting the weekend to be enough for the data to reach the new server. It is on the high-speed LAN, the largest group is less than 10GB, and only 1 staging area is smaller than the total folder size.
All replication groups with 1 folder worked fine as expected, and the data appeared on the new server within a short time. I was quite impressed by this.
For all replication groups with 2 folders, 1 folder appeared on the new server quickly, but the other folder stayed in 'waiting for initial replication' status for a long time. I restarted the DFSR services and clicked the replicate option on the W2K8 server. It didn't seem to do anything at the time.
I spent quite a bit of time researching the stuck status. We had a similar problem solved by an MS CSC hotfix in the early days, so I have a good understanding of what is supposed to happen.
There is an informative thread here http://social.technet.microsoft.com/forums/en-us/winserverfiles/thread/734f2d78-757a-4e7f-ba68-8525a678129c/ on the topic. It is not the same problem I had, but the symptoms are similar, and there is lots of helpful information on how it works.
I found what was happening: for the folders waiting for initial replication, the other members were having everything dumped in the ConflictAndDeleted folder by DFSR. Because that folder has a quota (660MB by default), we lost quite a bit of data. I recovered what I could via VB scripts found on these forums (thank you!) and from shadow copy, but VSS is finicky software and deletes shadows at the slightest provocation, and I was still trying to restore from a shadow copy this morning.
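The VB scripts themselves are not quoted in this thread. As a rough illustration of that kind of recovery, here is a minimal Python sketch. It assumes the ConflictAndDeleted folder comes with a manifest file (usually described as ConflictAndDeletedManifest.xml) in which each `Resource` element records the original `Path` and the `NewName` the file was given inside ConflictAndDeleted. Those element names, and the function names `read_manifest` and `restore`, are assumptions for illustration; check them against a real manifest before relying on this.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def read_manifest(manifest_path):
    """Return (original_path, stored_name) pairs from the manifest.

    Assumption: each <Resource> element carries <Path> (where the file
    used to live) and <NewName> (its name inside ConflictAndDeleted).
    """
    entries = []
    for resource in ET.parse(manifest_path).getroot().iter("Resource"):
        original = resource.findtext("Path")
        stored = resource.findtext("NewName")
        if original and stored:
            entries.append((original, stored))
    return entries


def restore(entries, conflict_dir, output_dir):
    """Copy stored files into output_dir under their original file names.

    Files are copied, not moved, so ConflictAndDeleted is left untouched.
    Entries whose stored file is gone (for example, purged when the quota
    was exceeded) are skipped, and existing targets are never overwritten.
    """
    conflict_dir, output_dir = Path(conflict_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    restored = []
    for original, stored in entries:
        source = conflict_dir / stored
        if not source.is_file():
            continue
        # Original paths are Windows-style; take the last component by hand
        # so the sketch behaves the same on any platform.
        name = original.replace("/", "\\").rsplit("\\", 1)[-1]
        target = output_dir / name
        if target.exists():
            continue
        shutil.copy2(source, target)
        restored.append(target)
    return restored
```

This version restores into a single flat folder, so two deleted files with the same name in different directories would collide and the second would be skipped. Rebuilding the original directory tree from `Path` would be the obvious next step.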
Here's my take on it, and why I think it is a terrible bug in DFSR.
All replication groups with 1 folder worked fine.
I had 4 groups with 2 folders. Each replicated 1 folder, and the other folder was stuck in the waiting for initial replication state, for several days in some cases.
Adding a new member to a replication group uses the primary of the replication group to initiate replication. The primary had to be the 1 member on the core LAN; there were never any others there at the time.
They replicated 1 folder correctly, so DFSR must have worked out the primary for that group in order to replicate that folder properly.
The other members were deleting everything in the other folder. It can't have been for 'conflict' reasons, since the new member's folder was empty. It must have been for a 'delete' reason that the files were being put in the ConflictAndDeleted folder. There was a 4412 event for every file deleted.
So to delete every file, it must have been replicating the empty folder to the other members. In that case DFSR must have thought the new server was the primary for the replication group at the time.
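The argument above can be put as a toy model. This is only an illustration of the reasoning, not DFSR's actual algorithm: it assumes a hypothetical `initial_sync` in which one side is treated as authoritative and the other side ends up matching it, with anything the authoritative side lacks being displaced.

```python
def initial_sync(authoritative, member):
    """Toy model of an initial sync between two folder listings.

    Returns (member_after, displaced). The member ends up matching the
    authoritative side, and any file it held that the authoritative side
    lacks is displaced (in this thread, into ConflictAndDeleted).
    Assumption: a deliberately simplified illustration, not how DFSR
    is implemented.
    """
    displaced = set(member) - set(authoritative)
    return set(authoritative), displaced


# Hypothetical file names, for illustration only.
existing_member = {"desktop/a.doc", "documents/b.xls"}
new_server = set()  # the freshly added, empty W2K8 folder

# Expected behaviour: the existing member is authoritative, so the new
# server is filled and nothing is displaced.
expected = initial_sync(existing_member, new_server)

# Hypothesised failure: the empty new server is treated as authoritative,
# so every file on the existing member is displaced.
failure = initial_sync(new_server, existing_member)
```

In the second call every file on the existing member ends up in `displaced`, which matches what was observed: an empty folder "winning" and everything else being removed.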
Until I saw it happen I would have thought it impossible, and I'm sure many MS folk will be saying the same.
I am officially 'scared' of DFSR technology now. It's still running and working today. I wonder when it is going to turn on me again, and what it will wipe out next time.
Some other information about the situation, to avoid the obvious questions:
Connections were not disabled and enabled on these repl groups. If that were the case, everything would have been dumped in PreExisting and nothing would have been lost, because it has no quota. Been there, done that.
The replication groups failed 1 at a time, with 1 group doing 'its thing' and deleting everything on the member servers while the remaining ones stayed in the waiting for initial replication state. They sat in that state for days before waking up and deleting everything. (I was frantically trying to work out what was going on. It was 3 days later when the third group started deleting stuff.)
I got to the fourth repl group before it woke up and destroyed everything. I deleted the group and created a new 1 with the same members and folders (and so the same DfsrPrivate), and it worked as expected. So I'm discounting a corrupt DfsrPrivate folder structure or database. 4 at once is not a coincidence.
Some anecdotal evidence about the problem. These replication groups were created a long time ago, before W2K3 SP2 came out in some cases.
I think the order in which I created the folders in the targets is important. I created the shared data folder first, and the folder that contains the users' redirected desktop and document folders second, in each repl group. That's how I work, so I'm pretty sure that's what I did. In each case where the replication groups deleted data, it was the users' desktops and documents folders that bit the dust. The important one, which I believe was the first folder created in each group, was the 1 that replicated properly.
One other thing about the folders that got deleted: they were used by Vista clients, which used offline files to sync the client copy (standard folder redirection policy). I do not think that's relevant, but stranger things have happened in software failures in the past.
If anyone can think of a mistake I made that caused the problem, I'd love to hear it.
Sadly, I can't collect logs or provide other hard evidence, since the folders have been moved and are replicating again or are not using DFSR anymore. Don't expect me to be attempting this again in the near future.
Cheers,
Mark.
Hello Mark,
Thanks for your feedback.
According to your description, you added a new Windows Server 2008 server to DFSR replication groups with 2 folders. The first folder replicated properly; however, the files in the second folder were deleted and moved to the ConflictAndDeleted folder. I do not see any obvious incorrect configuration. In addition, as you mentioned, since these files were moved to the “ConflictAndDeleted” folder, there must have been some kind of conflict during replication. However, we cannot pinpoint the root cause at this time without logs to perform deep research.
I am sorry for the inconvenience the problem may have brought you, and I understand your frustration and your wish to locate the root cause. I know you may not want to reproduce the issue now. However, whenever you troubleshoot the issue further, please collect logs when the issue is reproduced. They will be helpful in understanding the issue clearly. Meanwhile, I suggest you refer to the following blog from the Active Directory team to analyze the root cause. In addition, you may enable auditing on the file system to identify what deletes these files.
Where’s my file? Root cause analysis of FRS and DFSR data deletion
Thank you for your feedback. Meanwhile, Microsoft continues to collect product feedback on the Connect web site, and we appreciate your efforts in submitting feedback via the following channels to improve our products.
Windows Server 2008 Feedback Home
https://connect.microsoft.com/windowsserverfeedback
This posting is provided "AS IS" with no warranties, and confers no rights.