21 January 2011

Fast recursive directory search with .NET

For a current project, I needed to traverse a document tree, find all the MP4 files, store each file's properties in a custom class, and then search on the file name through that class's property. Each match then needed an HTML anchor tag so we could change the Real Media file links to point to the MP4 versions; the new link includes some JavaScript in its onclick handler to pop up a video player for the MP4 file.

Here's the simple custom File class.
Public Class File
Private myFilePath As String
Private myFileName As String
Private myTopLevelFolder As String
Private myCurrentFolder As String

Public Property FilePath As String
Get
Return Me.myFilePath
End Get
Set(ByVal value As String)
Me.myFilePath = value
End Set
End Property

Public Property FileName As String
Get
Return Me.myFileName
End Get
Set(ByVal value As String)
Me.myFileName = value
End Set
End Property

Public Property TopLevelFolder As String
Get
Return Me.myTopLevelFolder
End Get
Set(ByVal value As String)
Me.myTopLevelFolder = value
End Set
End Property

Public Property CurrentFolder As String
Get
Return Me.myCurrentFolder
End Get
Set(ByVal value As String)
Me.myCurrentFolder = value
End Set
End Property
End Class
I started with the great code from the Java2S site, which searches a directory recursively. For this project, I first considered an ArrayList because of its built-in search methods. A closer look and some research turned up a better fit: the .NET Dictionary class, which looks items up by key in close to constant time instead of scanning the whole list.

The Dictionary object stores key-value pairs and works much like a Hashtable, with one major difference: it is generic. Keys and values are strongly typed, so value types are not boxed and unboxed and nothing has to be cast when it is read back, as is the case with a Hashtable. It took just a few changes to the code to store the values in a Dictionary keyed on the full file path (the same filename might be in multiple directories, so the path is needed to uniquely identify the file).
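To make the lookup idea concrete, here is a minimal sketch of a Dictionary keyed on the full path. The MediaFile class, the BuildIndex function, the sample paths, and the case-insensitive comparer are all assumptions made for illustration; they are not part of the project code.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical stand-in for the post's File class, trimmed to two properties.
Public Class MediaFile
    Public Property FileName As String
    Public Property FilePath As String
End Class

Public Module DictionaryLookupSketch

    ' Builds an index keyed on the full path, which stays unique even when
    ' the same file name appears in several folders.
    Public Function BuildIndex(ByVal paths As IEnumerable(Of String)) _
        As Dictionary(Of String, MediaFile)

        ' Assumption of this sketch: Windows paths, so keys compare case-insensitively.
        Dim index As New Dictionary(Of String, MediaFile)(StringComparer.OrdinalIgnoreCase)

        For Each fullPath As String In paths
            Dim entry As New MediaFile()
            entry.FilePath = fullPath
            entry.FileName = System.IO.Path.GetFileName(fullPath)

            ' The indexer overwrites a repeated key; Add would throw ArgumentException.
            index(fullPath) = entry
        Next

        Return index
    End Function

    Public Sub Main()
        Dim paths As String() = { _
            "c:\RM_Conversion\lectures\intro.mp4", _
            "c:\RM_Conversion\seminars\intro.mp4"}

        Dim index As Dictionary(Of String, MediaFile) = BuildIndex(paths)

        Dim match As MediaFile = Nothing

        ' TryGetValue checks for the key and fetches the typed value in one step.
        If index.TryGetValue("C:\RM_Conversion\Seminars\intro.mp4", match) Then
            Console.WriteLine(match.FileName & " found at " & match.FilePath)
        End If
    End Sub

End Module
```

Both sample files are named intro.mp4, yet each gets its own entry because the key is the path rather than the name.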

Here's the main form's code:
' VB.NET folder search code courtesy
' http://www.java2s.com/Code/VB/File-Directory/Findafilesearchdirectoryrecursively.htm

Imports System
Imports System.Windows.Forms
Imports System.IO
Imports System.Text.RegularExpressions
Imports System.Collections.Generic
Imports System.Collections.Specialized
Imports System.Text

Public Class FrmFileSearch
Inherits Form

' label that displays current directory
Friend WithEvents lblDirectory As Label

' label that displays directions to user
Friend WithEvents lblDirections As Label

' button that activates search
Friend WithEvents cmdSearch As Button

' text boxes for inputting and outputting data
Friend WithEvents txtInput As TextBox
Friend WithEvents txtOutput As TextBox

#Region " Windows Form Designer generated code "

Public Sub New()
MyBase.New()

'This call is required by the Windows Form Designer.
InitializeComponent()

'Add any initialization after the InitializeComponent() call

End Sub

'Form overrides dispose to clean up the component list.
Protected Overloads Overrides Sub Dispose(ByVal disposing As Boolean)
If disposing Then
If Not (components Is Nothing) Then
components.Dispose()
End If
End If
MyBase.Dispose(disposing)
End Sub

'Required by the Windows Form Designer
Private components As System.ComponentModel.Container

'NOTE: The following procedure is required by the Windows Form Designer
'It can be modified using the Windows Form Designer.
'Do not modify it using the code editor.
<System.Diagnostics.DebuggerStepThrough()> Private Sub InitializeComponent()
Me.txtOutput = New System.Windows.Forms.TextBox()
Me.lblDirections = New System.Windows.Forms.Label()
Me.lblDirectory = New System.Windows.Forms.Label()
Me.txtInput = New System.Windows.Forms.TextBox()
Me.cmdSearch = New System.Windows.Forms.Button()
Me.SuspendLayout()
'
'txtOutput
'
Me.txtOutput.BackColor = System.Drawing.SystemColors.Control
Me.txtOutput.Font = New System.Drawing.Font("Microsoft Sans Serif", 9.75!, System.Drawing.FontStyle.Bold, System.Drawing.GraphicsUnit.Point, CType(0, Byte))
Me.txtOutput.HideSelection = False
Me.txtOutput.Location = New System.Drawing.Point(17, 134)
Me.txtOutput.Multiline = True
Me.txtOutput.Name = "txtOutput"
Me.txtOutput.ReadOnly = True
Me.txtOutput.ScrollBars = System.Windows.Forms.ScrollBars.Vertical
Me.txtOutput.Size = New System.Drawing.Size(460, 61)
Me.txtOutput.TabIndex = 4
Me.txtOutput.UseWaitCursor = True
'
'lblDirections
'
Me.lblDirections.Font = New System.Drawing.Font("Microsoft Sans Serif", 10.0!, System.Drawing.FontStyle.Regular, System.Drawing.GraphicsUnit.Point, CType(0, Byte))
Me.lblDirections.Location = New System.Drawing.Point(17, 82)
Me.lblDirections.Name = "lblDirections"
Me.lblDirections.Size = New System.Drawing.Size(375, 17)
Me.lblDirections.TabIndex = 1
Me.lblDirections.Text = "Enter Path to Search:"
'
'lblDirectory
'
Me.lblDirectory.Font = New System.Drawing.Font("Microsoft Sans Serif", 10.0!, System.Drawing.FontStyle.Regular, System.Drawing.GraphicsUnit.Point, CType(0, Byte))
Me.lblDirectory.Location = New System.Drawing.Point(17, 17)
Me.lblDirectory.Name = "lblDirectory"
Me.lblDirectory.Size = New System.Drawing.Size(375, 85)
Me.lblDirectory.TabIndex = 0
Me.lblDirectory.Text = "Current Directory:"
'
'txtInput
'
Me.txtInput.Location = New System.Drawing.Point(17, 108)
Me.txtInput.Name = "txtInput"
Me.txtInput.Size = New System.Drawing.Size(334, 20)
Me.txtInput.TabIndex = 3
Me.txtInput.Text = "c:\RM_Conversion\"
'
'cmdSearch
'
Me.cmdSearch.Font = New System.Drawing.Font("Microsoft Sans Serif", 8.0!, System.Drawing.FontStyle.Regular, System.Drawing.GraphicsUnit.Point, CType(0, Byte))
Me.cmdSearch.Location = New System.Drawing.Point(355, 108)
Me.cmdSearch.Name = "cmdSearch"
Me.cmdSearch.Size = New System.Drawing.Size(122, 20)
Me.cmdSearch.TabIndex = 2
Me.cmdSearch.Text = "Search Directory"
'
'FrmFileSearch
'
Me.AutoScaleBaseSize = New System.Drawing.Size(5, 13)
Me.ClientSize = New System.Drawing.Size(501, 208)
Me.Controls.Add(Me.txtOutput)
Me.Controls.Add(Me.txtInput)
Me.Controls.Add(Me.cmdSearch)
Me.Controls.Add(Me.lblDirections)
Me.Controls.Add(Me.lblDirectory)
Me.Name = "FrmFileSearch"
Me.StartPosition = System.Windows.Forms.FormStartPosition.CenterScreen
Me.Text = "Using Regular Expressions"
Me.ResumeLayout(False)
Me.PerformLayout()

End Sub

#End Region

Dim currentDirectory As String = Directory.GetCurrentDirectory
Dim directoryList As String()
Dim fileArray As String()

'Dim files As New ArrayList()
Dim files As Dictionary(Of String, File) = New Dictionary(Of String, File)

Dim found As NameValueCollection = New NameValueCollection()


Private Sub txtInput_KeyDown(ByVal sender As System.Object, _
ByVal e As System.Windows.Forms.KeyEventArgs) _
Handles txtInput.KeyDown

If (e.KeyCode = Keys.Enter) Then
cmdSearch_Click(sender, e)
End If

End Sub

Private Sub cmdSearch_Click(ByVal sender As System.Object, _
ByVal e As System.EventArgs) Handles cmdSearch.Click

Dim onClick As String
Dim myFile As File
Dim builder As New StringBuilder
Dim count As Integer = 0
Dim startTime As DateTime = DateTime.Now
Dim image As String
Dim save As Boolean
Dim ver5FileName As String = String.Empty
Dim ver2FileName As String = String.Empty
Dim finalFileName As String = String.Empty

Dim foundFile As Boolean = False

If txtInput.Text <> "" Then

' verify that user input is a valid directory name
If Directory.Exists(txtInput.Text) Then
currentDirectory = txtInput.Text

' reset input text box and update display
lblDirectory.Text = "Current Directory:" & vbCrLf & _
currentDirectory

' show error if user does not specify valid directory
Else
MessageBox.Show("Invalid Directory", "Error", _
MessageBoxButtons.OK, MessageBoxIcon.Error)

Return
End If

End If

' clear text boxes
txtInput.Text = ""
txtOutput.Text = ""

onClick = " onclick=" & Chr(34) & "window.open(this.href,'popup','width=347,height=322,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false" & Chr(34) & ">Play Video</a>"

' search directory
SearchDirectory(currentDirectory)

count = files.Count

' Loop over each key-value pair object
For Each kvp As KeyValuePair(Of String, File) In files
foundFile = False
finalFileName = String.Empty

' kvp.Value is already typed as File, so no cast is needed.
myFile = kvp.Value

' Thumbnail image used by the Flash player.
image = "generic.jpg"

' We'll use a StringBuilder for its performance.
builder.Append("File: ")
builder.Append(myFile.CurrentFolder.Replace("\", "/")).AppendLine().AppendLine()

builder.Append("<a href=")
builder.Append(Chr(34))
builder.Append("/mp4-conv/player.htm?file=http://flash.mydomain.com/mp4-conv/")
builder.Append(myFile.CurrentFolder.Replace("\", "/"))
builder.Append("&image=/mp4-conv/posters/")
builder.Append(image)
builder.Append(Chr(34))
builder.Append(onClick).AppendLine()
builder.Append("----------------------------------------").AppendLine.AppendLine()
Next kvp

' Clear output for new search.
found.Clear()

' Calculate the execution time using the TimeSpan class.
Dim executionTime As TimeSpan = DateTime.Now - startTime

' Only put general processing info into the textbox for best performance.
txtOutput.Text &= count.ToString() & " files processed in " & _
executionTime.Minutes.ToString() & " minutes and " & _
executionTime.Seconds.ToString() & " seconds."

Dim path As String = "c:\RM_Conversion\links.txt"

' We'll write the HTML anchor tags to this file.
' StreamWriter creates the file if it is missing, so we only need to
' remove a stale copy. Calling File.Create here would leave an open
' handle on the file and block the write below.
If System.IO.File.Exists(path) Then
System.IO.File.Delete(path)
End If

' Write the StringBuilder object to the text file.
' Fetch whether the write was successful (Boolean).
save = SaveTextToFile(builder.ToString, path)

txtOutput.Text &= vbCrLf & "Wrote to " & _
path & "? " & save.ToString()
files.Clear()

End Sub ' cmdSearch_Click

' Write to a file. Returns True on success; on failure, ErrInfo receives the error message.
Public Function SaveTextToFile(ByVal strData As String, _
ByVal FullPath As String, _
Optional ByRef ErrInfo As String = "") As Boolean

Try
' Using closes the writer even if Write throws.
Using writer As New StreamWriter(FullPath)
writer.Write(strData)
End Using
Return True
Catch Ex As Exception
' ErrInfo is passed ByRef so the caller can read the message.
ErrInfo = Ex.Message
Return False
End Try
End Function

' Returns the name of the first folder below the RM_Conversion root.
' There's a way to do this in reg exp, but this worked :)
Private Function GetTopLevelFolder(ByVal filePath As String) As String
Const rootMarker As String = "RM_Conversion\"
Dim markerIndex As Integer = filePath.LastIndexOf(rootMarker)

If markerIndex < 0 Then
Return String.Empty
End If

' Skip past the whole marker, then cut at the next backslash.
Dim relativePath As String = filePath.Substring(markerIndex + rootMarker.Length)
Dim separatorIndex As Integer = relativePath.IndexOf("\")

If separatorIndex < 0 Then
Return relativePath
End If

Return relativePath.Substring(0, separatorIndex)
End Function


' search directory using regular expression
Private Sub SearchDirectory(ByVal currentDirectory As String)

' for file name without directory path
Try
Dim fileName As String = ""
Dim myFile As String
Dim myDirectory As String
Dim myFileObj As File
Dim currentFolder As String

' regular expression for extensions matching pattern
Dim regularExpression As Regex = _
New Regex("([a-zA-Z0-9]+\.(?<extension>\w+))")

' stores regular-expression-match result
Dim matchResult As Match

Dim fileExtension As String ' holds file extensions

' number of files with given extension in directory
Dim extensionCount As Integer

' get directories
directoryList = _
Directory.GetDirectories(currentDirectory)

' get list of files in current directory
fileArray = Directory.GetFiles(currentDirectory, "*.mp4")

' iterate through list of files
For Each myFile In fileArray
fileName = myFile.Substring( _
myFile.LastIndexOf("\") + 1)

matchResult = regularExpression.Match(fileName)


If (matchResult.Success) Then
fileExtension = matchResult.Result("${extension}")
Else
fileExtension = "[no extension]"
End If


If (found(fileExtension) = Nothing) Then
' Path relative to the RM_Conversion root, with forward slashes for the URL.
currentFolder = myFile.Substring(myFile.LastIndexOf("RM_Conversion\") + "RM_Conversion\".Length).Replace("\", "/")

myFileObj = New File()
myFileObj.FileName = fileName
myFileObj.FilePath = myFile.ToString()
myFileObj.TopLevelFolder = Me.GetTopLevelFolder(myFile.ToString())
myFileObj.CurrentFolder = currentFolder
files.Add(myFile.ToString(), myFileObj)
Else
extensionCount = _
Convert.ToInt32(found(fileExtension)) + 1

found(fileExtension) = extensionCount.ToString()
End If
Next

For Each myDirectory In directoryList
SearchDirectory(myDirectory)
Next

Catch unauthorizedAccess As UnauthorizedAccessException
MessageBox.Show("Some files may not be visible due to" _
& " permission settings", "Warning", _
MessageBoxButtons.OK, MessageBoxIcon.Information)

End Try
End Sub
End Class

Public Class MainClass

Shared Sub Main()
Dim myform As Form = New FrmFileSearch()
Application.Run(myform)
End Sub ' Main

End Class
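
For comparison, here is a rough sketch of the same traversal without recursion and without the form-level arrays. It is a swapped-in alternative, not the code this project used: it assumes .NET 4 or later, where Directory.EnumerateFiles and Directory.EnumerateDirectories are available, and the module and function names are made up for illustration. Directory.GetFiles with SearchOption.AllDirectories does the whole walk in one call, but a single unreadable folder aborts the entire search, so the sketch handles folders one at a time.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Public Module EnumerateSketch

    ' Collects every .mp4 file under root, skipping folders that cannot be read.
    Public Function FindMp4Files(ByVal root As String) As List(Of String)
        Dim results As New List(Of String)

        ' Folders still waiting to be scanned; this replaces the recursive call.
        Dim pending As New Stack(Of String)
        pending.Push(root)

        While pending.Count > 0
            Dim current As String = pending.Pop()

            Try
                results.AddRange(Directory.EnumerateFiles(current, "*.mp4"))

                For Each subDirectory As String In Directory.EnumerateDirectories(current)
                    pending.Push(subDirectory)
                Next
            Catch ex As UnauthorizedAccessException
                ' No permission for this folder: leave it out and carry on.
            End Try
        End While

        Return results
    End Function

    Public Sub Main()
        ' The folder below is the default path from the form; adjust as needed.
        For Each filePath As String In FindMp4Files("c:\RM_Conversion\")
            Console.WriteLine(filePath)
        Next
    End Sub

End Module
```

Each returned path could then be added to the Dictionary exactly as SearchDirectory does above.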
