Batch Script Text file parse

Author
24 Mar 2009 5:13 PM
tator.usenet
Newbie here.

I am trying to get a single line of text from each file within a
directory of text files.

Example

I have directory X that contains files:

file1.txt
file2.txt
file3.txt

Each text file is similar in structure, and contains lines like:

Some mumbo jumbo
Unique ID: xyzpdq
some other mumbo jumbo
etc, etc, etc.

In all cases, line 2 contains "Unique ID:" and then the unique
identifier text, which is what I need to extract.
In the above case, I need the "xyzpdq".  Note that this is what
changes in each test file.

I want to run a script that will parse either this unique ID, or the
entire 2nd line of text (in which case I can just trim it later) -
from all .txt files within the folder.

Any help?

Thanks.

Author
24 Mar 2009 5:24 PM
T Lavedas
How do you want the output stored/presented?  Is it necessary to know
what file it comes from?  Or are you just after a list of the unique
IDs?

The simplest way to do this is ...

  find "Unique ID:" d:\pathspec\*.txt > output.txt

This will create a file that contains, for each .txt file in the named
folder, a header line (a row of hyphens followed by the file's name)
and then the matching Unique ID line from that file.

If you want JUST the unique IDs, with none of the other stuff, try
something like this ...

  (for /f "tokens=3" %%a in ('find "Unique ID:" d:\pathspec\*.txt') do echo.%%a) > output.txt

If you need/want something else, you will need to be more specific
about your requirements/desires.
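
If an ID could itself contain spaces, tokens=3 would keep only its
first word.  A rough sketch of one way around that, assuming the same
placeholder folder d:\pathspec and that nothing but the ID follows
"Unique ID:" on the line (not tested against your real files):

```batch
@echo off
rem Sketch only: keep everything after "Unique ID:", spaces included.
rem "tokens=2*" puts the second word in %%a and the rest of the line in %%b.
rem The IF drops find's "---------- FILENAME" header lines, whose second
rem word is a path rather than "ID:".
(for /f "tokens=2*" %%a in ('find "Unique ID:" d:\pathspec\*.txt') do (
  if "%%a"=="ID:" echo.%%b
)) > output.txt
```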

Tom Lavedas
***********
http://there.is.no.more/tglbatch/
Author
24 Mar 2009 5:55 PM
tator.usenet
Perfect! ... almost.  Is there any way to get the info on the same line,
so I can easily open it as two columns in Excel (e.g. column 1 = file
name, column 2 = unique ID)?  I'm sure I can manage in Excel, but it
would be easier not to have to deal with it.

In any case, very much appreciate the help!
Author
24 Mar 2009 8:42 PM
T Lavedas
OK, now I know what you really wanted.  Maybe something like this
will serve ...

echo."Header 1","Header 2" > output.csv
(for %%a in (d:\pathspec\*.txt) do (
  for /f "tokens=3" %%B in ('find "Unique ID:" ^< "%%a"') do (
    echo."%%a",%%B
  )
)) >> output.csv
start "" output.csv
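
The first column above holds the full path.  If only the file name is
wanted there, a variation along these lines might do.  It is a sketch,
not a tested solution: the header texts and d:\pathspec are
placeholders, and it assumes each matching line starts with
"Unique ID:".  %%~nxa is the standard FOR modifier for "name and
extension".

```batch
@echo off
rem Sketch only: CSV with the bare file name in column 1.
rem %%~nxa = name + extension of the file in %%a; %%~fa = its full path.
rem "tokens=2*" leaves everything after "Unique ID:" in %%c.
(
  echo."File","Unique ID"
  for %%a in ("d:\pathspec\*.txt") do (
    for /f "tokens=2*" %%b in ('find "Unique ID:" ^< "%%~fa"') do (
      echo."%%~nxa","%%c"
    )
  )
) > output.csv
```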

Tom Lavedas
***********
http://there.is.no.more/tglbatch/
Author
24 Mar 2009 5:35 PM
Pegasus [MVP]
Here is a batch file solution:
@echo off
for %%a in ("d:\temp\*.txt") do call :Sub %%a
goto :eof

:Sub
for /F "skip=1 tokens=3" %%b in ('type "%*"') do (
  echo %* %%b & goto :eof
)

Its advantage is that it's simple. Its drawback is that it's slow and that
it will probably trip over so-called "poison characters". If you want
something robust and fast then a VB Script file would be a better solution.
How about having a go at it yourself, then requesting specific advice here
instead of asking for the whole thing to be delivered on a platter?
Author
24 Mar 2009 5:38 PM
Richard Mueller [MVP]
Someone else can supply a batch file solution. Here is a VBScript solution:
==========
Option Explicit

Dim strFolder, objFSO, objFolder, objItem, objFile, strLine, strSearch, strID

Const ForReading = 1

' Specify the directory.
strFolder = "c:\scripts"

' Specify the string to search for.
' Use all lower case: each line is lower-cased before the comparison.
strSearch = "unique id:"

' Bind to the folder object.
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFolder = objFSO.GetFolder(strFolder)

' Enumerate all files in the folder.
For Each objItem in objFolder.Files
    ' Open the file with a textstream object.
    Set objFile = objFSO.OpenTextFile(objItem.Path, ForReading)
    ' Read each line of the file.
    Do Until objFile.AtEndOfStream
        strLine = Trim(objFile.ReadLine)
        ' Check for ID search string.
        If (InStr(LCase(strLine), strSearch) = 1) Then
            ' The ID is assumed to follow the first ":" in the line.
            strID = Trim(Mid(strLine, InStr(strLine, ":") + 1))
            Wscript.Echo strID
            ' No need to read any more of the file.
            Exit Do
        End If
    Loop
    ' Close the file.
    objFile.Close
Next

--
Richard Mueller
MVP Directory Services
Hilltop Lab - http://www.rlmueller.net
--