AndreaVedaldi
UniversityofCaliforniaatLosAngeles
Contents
1Introduction
2Userreference:thesiftfunction
2.1Scalespaceparameters.......2.2Detectorparameters........2.3Descriptorparameters.......2.4DirectaccesstoSIFTcomponents
........................................................................................................................
1123344456
AInternals
A.1Scalespaces...........................................A.2Thedetector..........................................A.3Thedescriptor.........................................
1Introduction
ThesenotesdescribeanimplementationoftheScale-InvariantTransformFeature(SIFT)detectoranddescriptor[1].TheimplementationisdesignedtoproduceresultscompatibletoLowe’sversion.1DesignedfortheMATLABenvironment,itisbrokendownintoseveralMandMEXfilesthatenablerunningonlyportionofthealgorithm.
TheSIFTdetectorextractsfromanimageacollectionofframesorkeypoints.Theseareorienteddisksattachedtoblob-alikestructuresoftheimage.Astheimagetranslates,rotatesandscales,theframestracktheseblobsandthusthedeformation.Bycanonization,i.e.bymappingtheframestoareference(acanonicaldisk),theeffectofsuchdeformationonthefeatureappearanceisremoved.
TheSIFTdescriptorisacoarsedescriptionoftheedgefoundintheframe.Duetocanonization,descriptorsareinvarianttotranslations,rotationsandscalingsandaredesignedtoberobusttoresidualsmalldistortions.
TheSIFTdetectoranddescriptorarediscussedindepthin[1].Hereweonlydescribetheinterfacetoourimplementationand,intheAppendix,sometechnicaldetails.
2Userreference:thesiftfunction
TheSIFTdetectorandtheSIFTdescriptorareinvokedbymeansofthefunctionsift,whichprovidesaunifiedinterfacetoboth.
Example1(Invocation).ThefollowinglinesruntheSIFTdetectoranddescriptorontheimagedata/test.jpg.
1See
http://www.cs.ubc.ca/~lowe/keypoints/
1
I=imread(’data/test.png’);I=double(rgb2gray(I)/256);
[frames,descriptors]=sift(I,’Verbosity’,1);
Thepairoption-value’Verbosity’,1causesthefunctiontoprintadetailedprogressreport.Thesiftfunctionreturnsa4×KmatrixframescontainingtheSIFTframesanda128×Kmatrixdescriptorscontainingtheirdescriptors.Eachframeischaracterizedbyfournumberswhichareinorder(x1,x2)forthecenteroftheframe,σforitsscaleandθforitsorientation.Thecoordinates(x1,x2)arerelativetotheupper-leftcorneroftheimage,whichisassignedcoordinates(0,0),andmaybefractionalnumbers(sub-pixelprecision).Thescaleσisthesmoothinglevelatwhichtheframehasbeendetected.Thisnumbercanalsobeinterpretedassizeoftheframe,whichisusuallyvisualizedasadiskofradius6σ.Eachdescriptorisavectordescribingcoarselytheappearanceoftheimagepatchcorrespondingtotheframe(furtherdetailsarediscussedinAppendixA.3).Typicallythisvectorhasdimension128,butthisnumbercanbechangedbytheuserasdescribedlater.
OnceframesanddescriptorsoftwoimagesI1andI2havebeencomputed,siftmatchcanbeusedtoestimatethepairsofmatchingfeatures.ThisfunctionusesLowe’smethodtodiscardambiguousmatches[1].Theresultisa2×Mmatrix,eachcolumnofwhichisapair(k1,k2)ofindicesofcorrespondingSIFTframes.
Example2(Matching).LetusassumethattheimagesI1andI2havebeenloadedandprocessedasinthepreviousexample.Thecode
matches=siftmatch(descriptors1,descriptors2);storesinmatchesthematchingpairs,onepercolumn.
Thepackageprovidessomeancillaryfunctions;youcan•useplotsiftframetoplotSIFTframes;
•useplotsiftdescriptortoplotSIFTdescriptors;•useplotmatchestoplotfeaturematches;
•usesiftreadtoreadfilesproducedbyLowe’simplementation.
Example3(Visualization).LetI1,I2andmatchesbeasinthepreviousexample.Tovisualizethematchesissue
plotsiftmatches(I1,I2,frames1,frames2,matches)
Thesiftfunctionhasmanyparameters.ThedefaultvalueshavebeenchosentoemulateLowe’soriginalimplementation.Althoughourcodedoesnotresultinframesanddescriptorsthatare100%equivalent,ingeneraltheyarequitesimilar.
2.1Scalespaceparameters
TheSIFTdetectoranddescriptorareconstructedfromtheGaussianscalespaceofthesourceimageI(x).TheGaussianscalespaceisthefunction
G(x;σ)=(gσ∗I)(x)
wheregσisanisotropicGaussiankernelofvarianceσ2I,xisthespatialcoordinateandσisthescalecoordinate.Thealgorithmmakeuseofanotherscalespacetoo,calleddifferenceofGaussian(DOG),whichis,coarselyspeaking,thescalederivativeoftheGaussianscalespace.
SincethescalespaceG(x;σ)representsthesameinformation(theimageI(x))atdifferentlevelsofscale,itissampledinaparticularwaytoreduceredundancy.Thedomainofthevariableσis
2
∆
discretizedinlogarithmicstepsarrangedinOoctaves.EachoctaveisfurthersubdividedinSsub-levels.Thedistinctionbetweenoctaveandsub-levelisimportantbecauseateachsuccessiveoctavethedataisspatiallydownsampledbyhalf.Octavesandsub-levelsareidentifiedbyadiscreteoctaveindexoandsub-levelindexsrespectively.Theoctaveindexoandthesub-levelindexsaremappedtothecorrespondingscaleσbytheformula
σ(o,s)=σ02o+s/S,
o∈omin+[0,...,O−1],
s∈[0,...,S−1]
(1)
whereσ0isthebasescalelevel.
ThesiftfunctionacceptsthefollowingparametersdescribingtheGaussianscalespacebeingused:
•NumOctaves.ThisisthenumberofoctavesOin(1).
•FirstOctave.Indexofthefirstoctaveomin:theoctaveindexovariesinomin,...,omin+O−1.Itisusuallyeither0or−1.Settingominto−1hastheeffectofdoublingtheimagebeforecomputingtheGaussianscalespace.•NumLevels.Thisisthenumberofsub-levelsSin(1).•Sigma0.Basesmoothing:Thisistheparameterσ0in(1).
•SigmaN.Nominalpre-smoothing:Thisisthenominalsmoothingleveloftheinputimage.Thealgorithmassumesthattheinputimageisactually(gσn∗I)(x)asopposedtoI(x)andadjuststhecomputationsaccording.Usuallyσnisassumedtobehalfpixel(0.5).
2.2Detectorparameters
TheSIFTframes(x,σ)arepointsoflocalextremumoftheDOGscalespace.Theselectionofsuchpointsiscontrolledbythefollowingparameters:
•Threshold.Localextremathreshold.Localextremawhosevalue|G(x,;σ)|isbelowthisnumberarerejected.•EdgeThreshold.Localextremalocalizationthreshold.Ifthelocalextremumisonavalley,thealgorithmdiscardsitasitistoounstable.Extremaareassociatedwithascoreproportionaltotheirsharpnessandrejectedifthescoreisbelowthisthreshold.•RemoveBoundaryPoints.Boundarypointsremoval.Ifthisparameterissetto1(true),frameswhicharetooclosetotheboundaryoftheimagearerejected.
2.3Descriptorparameters
TheSIFTdescriptorisaweightedandinterpolatedhistogramofthegradientorientationsandlocationsinapatchsurroundingthekeypoint.Thedescriptorhasthefollowingparameters:
•Magnif.Magnificationfactorm.Eachspatialbinofthehistogramhassupportofsizemσ,whereσisthescaleoftheframe.•NumSpatialBins.Numberofspatialbins.Togetherwiththenextparameter,thisnumberdefinestheextensionanddimensionofthedescriptor.Thedimensionofthedescriptor(thetotalnumberofbins)isequaltoNumSpatialBins2×NumOrientBinsanditsextension(thepatchwherethegradientstatisticiscollected)hasradiusNumSpatialBins×mσ/2.•NumOrientBins.Numberoforientationbins.
3
Symbolo∈[omin,omin+O−1]s∈[smin,smax]σ(o,s)=σ02o+s/Sσ0
M0,N0
M00No=N2o,Mo=2oxo∈[0,...,No]×[0,...,Mo]x=2oxoF(·,σ(o,·))G(x,σ)D(x,σ)
DescriptionOctaveindexandrangeScaleindexandrangeScalecoordinateformulaBasescaleoffset
Basespatialresolution(octaveo=0)OctavelatticesizeformulasSpatialindexesandragesSpatialcoordinateformulaOctavedataGaussianscalespaceDOGscalespace
Inthecodeo,O,omins,smin,smaxsigma0
octavegssdogss
Figure1:Scalespaceparameters.TheSIFTdescriptorsusestwoscalespaces:aGaussianscalespaceandaDifferenceofGaussianscalespace.Botharedescribedbytheseparameters.
2.4DirectaccesstoSIFTcomponents
TheSIFTcodeisdecomposedinseveralMandMEXfiles,eachimplementingaportionofthealgo-rithm.Theseprogramscanberunontheirownorreplaced.AppendixAcontainsinformationusefultodothis.
Example4(ComputingtheSIFTdescriptordirectly).Sometimesitisusefultorunthedescriptorcodealone.Thiscanbedonebycallingthefunctionsiftdescriptor(whichisactuallyaMEXfile.)Seethefunctionhelpforfurtherdetails.
References
[1]D.G.Lowe.Distinctiveimagefeaturesfromscale-invariantkeypoints.InternationalJournalof
ComputerVision,2(60):91–110,2004.
A
A.1
Internals
Scalespaces
HereascalespaceisafunctionF(x,σ)∈Rofaspatialcoordinatex∈R2andascalecoordinateσ∈R+.SinceascalespaceF(·,σ)typicallyrepresentsthesameinformationatvariousscalesσ∈R,itsdomainissampledinaparticularwayinordertoreducetheredundancy.
Thescalecoordinateσisdiscretizedinlogarithmicstepsaccordingto
σ(s,o)=σ02o+s/S,
o∈Z,
s=0,...,S−1
whereoistheoctaveindex,sisthescaleindex,S∈Nisthescaleresolutionandσ0∈R+isthebasescaleoffset.Notethatitispossibletohaveoctavesofnegativeindex.
Thespatialcoordinatexissampledonalatticewitharesolutionwhichisafunctionoftheoctave.Wedenotexothespatialindexforoctaveo;thisindexismappedtothecoordinatexby
x=2oxo,
o∈Z,
xo∈[0,...,No−1]×[0,...,Mo−1].
4
where(No,Mo)isthespatialresolutionofoctaveo.If(M0,N0)isthetheresolutionofthebaseoctaveo=0,theresolutionoftheotheroctavesisobtainedas
No=
N0
,2oMo=
M0
.2oItwillbeusefultostoresomescalelevelstwice,acrossdifferentoctaves.WedothisbyallowingtheparameterstobenegativeorgreaterthanS.Formally,wedenotetherangeofsas[smin,smax].Wealsodenotetherangeoftheoctaveindexoas[omin,omin+O−1],whereO∈Nisthetotalnumberofoctaves.SeeFigure1forasummaryofthesesymbols.
TheSIFTdetectormakesuseofthetwoscalespacesdescribednext.GaussianScaleSpace.TheGaussianscalespaceofanimageI(x)isthefunction
G(x,σ)=(gσ∗I)(x)
wherethescaleσ=σ02o+s/Sissampledasexplainedintheprevioussection.Thisscalespaceiscomputedbythefunctiongaussianss.Inpractice,itisassumedthattheimagepassedtothefunctiongaussianssisalreadypre-smoothedatanominallevelσn,sothatG(x,σ)=(g√σ2−σ2∗I)(x).Assuggestedin[1],thepyramidiscomputedincrementallyfromthebottombysuccessiveconvolutionswithsmallkernels.
DifferenceofGaussiansscalespace.ThedifferenceofGaussians(DOG)scalespaceisthescale“derivative”oftheGaussianscalespaceG(x,σ)alongthescalecoordinateσ.Itisgivenby
D(x,σ(s,o))=G(x,σ(s+1,o))−G(x,σ(s,o)).
ItisobtainedfromtheGaussianscalespacebythediffssfunction.
Remark1(Lowe’sparameters).Lowe’simplementationusesthefollowingparameters:
σn=0.5,
σ0=1.6·21/S,
omin=−1,
S=3
∆
n
∆
Inordertocomputetheoctaveo=−1,theimageisdoubledbybilinearinterpolation(fortheenlargedimageσn=1).Inordertodetectextremaatallscales,theDifferenceofGaussianscalespacehass∈[smin,smax]=[−1,S].SincetheDifferenceofGaussianscalespaceisobtainedbydifferentiatingtheGaussianscalespace,thelatterhass∈[smin,smax]=[−1,S+1].TheparameterOissettocoveralloctaves(i.e.asbigaspossible.)
A.2Thedetector
TheSIFTframes(or“keypoints”)areaselectionof(sub-pixelinterpolated)points(x,σ)oflocalextremumoftheDOGscale-spaceD(x,σ),togetherwithanorientationθderivedfromthespatialderivativeoftheGaussianscale-spaceG(x,σ).Forwhatconcernsthedetector(andbeingingeneraldifferentforthedescriptor),the“support”ofakeypoint(x,σ)isaGaussianwindowH(x)ofdeviationσw=1.5σ.Inpractice,thewindowistruncatedat|x|≤4σw.
TheGaussianandDOGscalespacesarederivedasinSectionA.1.InthisSection,theparametersS,O,smin,smax,omin,σ0refertotheDOGscalespace.TheGaussianscalespacehasexactlythesameparametersoftheDOGscalespaceexceptforsDOGmaxwhichisequaltosmax−1.
Theextractionofthekeypointsiscarriedoneoctavepertimeandarticulatedinthefollowingsteps:
•Detection.KeypointsaredetectedaspointsoflocalextremumofD(x,σ)(SectionA.1).Intheimplementationthefunctionsiftlocalmaxextractssuchextremabylookingat9×9×9neighborhoodsofsamples.
5
x2
Npx1
-2-1.5-1-0.50.511.52mσFigure2:SIFTdescriptorlayout.Theactualsizeofaspatialbinismσwhereσisthescaleofthekeypointandm=3.0isanominalfactor.
Astheoctaveisrepresentedbya3Darray,thefunctionsiftlocalmaxreturnsindexesk(inMatlabconvetion)thataretobemappedtoscalespaceindexes(x1,x2,s)by
k−1=x2+x1Mo+(s−smin)MoNo.
Alternatively,onecanuseind2subtomaptheindexktoasubscript(i,j,l)andthenuse
x1=j−1,
x2=i−1,
s=l−1+smin.
Becauseofthewaysuchmaximaaredetected,onehasalways1≤x2≤Mo−2,1≤x1≤No−2andsmin+1≤s≤smax−1.
Sinceweareinterestedbothinlocalmaximaandminima,theprocessisrepeatedfor−G(x,σ).(Ifonlypositivemaximaandnegativeminimaareofinterest,anotheroptionistotakethelocalmaximaof|G(x,σ)|directly,whichisquicker.)
•Sub-pixelrefinement.Afterbeingextractedbysiftlocalmax,theindex(x1,x2,s)isfittedtothelocalextremumbyquadraticinterpolation.Atthesametime,athresholdonthe“intensity”D(x,σ)andatestonthe“peakedness”oftheextremumisappliedinordertorejectweakpointsorpointsonedges.Theseoperationsareperformedbythesiftrefinemxfunction.Theedgerejectionstepisexplainedindetailinthepaper[1].Thesub-pixelrefinementisaninstanceofNewton’salgorithm.
•Orientation.Theorientationθofakeypoint(x,σ)isobtainedasthepredominantorien-tationofthegradientinawindowaroundthekeypoint.Thepredominantorientationisob-tainedasthe(quadraticallyinterpolated)maximumofthehistogramofthegradientorientations∠∇G(x1,x2,σ)withinawindowaroundthekeypoint.Thehistogramisweightedbothbythemagnitudeofthegradient|∇G(x1,x2,σ)|andaGaussianwindowcenteredonthekeypointandofdeviation1.5σ(theGaussianwindowdefinestheregionofinterestaswell).Aftercollectingthedatainthebinsandbeforecomputingthemaximum,thehistogramissmoothedbyamovingaveragefilter.Inadditiontotheglobalmaximum,eachlocalmaximumwithavalueabove0.8%ofthemaxiumisretainedaswell.ThusforeachlocationandscalemultipleSIFTframesmightbegenerated.Thesecomputationsarecarriedbythefunctionsiftormx.
A.3Thedescriptor
TheSIFTdescriptorofakeypoint(x,σ)isalocalstatisticoftheorientationsofthegradientoftheGaussianscalespaceG(·,σ).
6
Histogramlayout.TheSIFTdescriptor(Figure2)isanhistogramofthegradientorientationsaugmentedand2-DlocationsinthesupportoftheSIFTframe.Formally,thedomainofthehistogramarethetuples(x,θ)∈R2×R/Z.ThebinsformathreedimensionallatticewithNp=4binsfor
2
eachspatialdirectionandNo=8binsfortheorientationforatotalofNpNo=128components(thesenumberscanbechangedbysettingtheappropriateparameters).Eachspatialbinissquarewithunitaryedge.ThewindowH(x)isGaussianwithdeviationequaltohalftheextensionofthespatialbinrange,thatisNp/2.
Keypointnormalization.Inordertoachieveinvariance,thehistogramlayoutisprojectedontheimagedomainaccordingtotheframeofreferenceofthekeypoint.Thespatialdimensionsaremultipliedbymσwhereσisthescaleofthekeypointandmisanominalfactor(equalto3.0bydefault).Thelayoutisalsorotatedsothattheaxisx1isalignedtothedirectionθofthekeypoint.Weighting.ThehistogramisweightedbythegradientmodulusandaGaussianwindowedandtri-linearlyinterpolated.Moreindetail,eachsample(x1,x2,∠∇G(x,σ))is
•weightedby|∇G(x,σ)|;
•weightedbythegaussianwindowH(x);
•projectedonthecentersoftheeightsurroundingbins;
•summedtoeachofthisbinsproportionallytoitsdistancefromtherespectivecenter.
Remark2.(Lowe’simpelmentation)InordertoachievefullcompatibilitywithLowe’soriginalimple-mentation,oneneedstopayattentiontomanylittledetailsasthememorylayoutofthedescriptorandtheconventionforthegradientorientations.Thesearedetailedinthesourcecode.
7
因篇幅问题不能全部显示,请点此查看更多更全内容