Hi,

I'm writing a paper and I need to calculate tf-idf. With your help I managed to get the results I needed, but the problem is that I need to be able to explain how each number was obtained. So I tried to understand how idf is calculated, and the numbers I get don't correspond to those I should get.

I have 3 documents (each line a document)

a a b c m m

e a c d e e

d j k l m m c

When I calculate tf, I get this:

(1048576,[99,100,106,107,108,109],[1.0,1.0,1.0,1.0,1.0,2.0])

(1048576,[97,98,99,109],[2.0,1.0,1.0,2.0])

(1048576,[97,99,100,101],[1.0,1.0,1.0,3.0])
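To make it easier to match each tf vector to its document, here is a minimal sketch that decodes the sparse vectors back into terms. It assumes that each index is simply the character code of the single-letter term (97 = 'a', 109 = 'm'); that is an observation from the output above, not documented behaviour. The names `DecodeTf` and `decode` are made up for this illustration.

```scala
object DecodeTf {
  // Sparse vectors copied from the tf output above: (indices, values).
  val vectors: Seq[(Seq[Int], Seq[Double])] = Seq(
    (Seq(99, 100, 106, 107, 108, 109), Seq(1.0, 1.0, 1.0, 1.0, 1.0, 2.0)),
    (Seq(97, 98, 99, 109), Seq(2.0, 1.0, 1.0, 2.0)),
    (Seq(97, 99, 100, 101), Seq(1.0, 1.0, 1.0, 3.0))
  )

  // Assumption: an index is the character code of the term it stands for.
  def decode(indices: Seq[Int], values: Seq[Double]): Seq[(Char, Double)] =
    indices.map(_.toChar).zip(values)

  def main(args: Array[String]): Unit =
    vectors.foreach { case (indices, values) =>
      // Print one line per vector, e.g. "a=2.0 b=1.0 ..."
      println(decode(indices, values).map { case (t, c) => s"$t=$c" }.mkString(" "))
    }
}
```

Under that assumption the second vector decodes to a=2, b=1, c=1, m=2, which is the document "a a b c m m", so the vectors are not listed in the same order as the documents.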

idf is supposedly calculated as idf = log((m + 1) / (d(t) + 1))

m - number of documents (3 in my case).

d(t) - number of documents in which the term is present

a: log(4/3) = 0.1249387366

b: log(4/2) = 0.3010299957

c: log(4/4) = 0

d: log(4/3) = 0.1249387366

e: log(4/2) = 0.3010299957

l: log(4/2) = 0.3010299957

m: log(4/3) = 0.1249387366
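The hand calculation can be reproduced with a short sketch in plain Scala (no Spark involved). It applies the formula above to the three documents and leaves the log base as a parameter, since the figures listed above correspond to a base-10 logarithm. The names `IdfByHand`, `docFreq` and `idf` are made up for this illustration.

```scala
object IdfByHand {
  // The three documents from above, one per line, split on spaces.
  val docs: Seq[Seq[String]] = Seq(
    "a a b c m m",
    "e a c d e e",
    "d j k l m m c"
  ).map(_.split(" ").toSeq)

  // m: number of documents.
  val m: Int = docs.size

  // d(t): number of documents that contain term t at least once.
  val docFreq: Map[String, Int] =
    docs.flatMap(_.distinct).groupBy(identity).map { case (t, occ) => t -> occ.size }

  // idf = log((m + 1) / (d(t) + 1)); the log function is passed in by the caller.
  def idf(term: String, log: Double => Double): Double =
    log((m + 1.0) / (docFreq.getOrElse(term, 0) + 1.0))

  def main(args: Array[String]): Unit =
    docFreq.keys.toSeq.sorted.foreach { t =>
      println(f"$t: d(t)=${docFreq(t)} log10=${idf(t, math.log10)}%.10f ln=${idf(t, math.log)}%.10f")
    }
}
```

For b (d(t) = 1) the base-10 value is 0.3010299957, while the natural logarithm gives 0.6931471806. For a term that appears in no document (d(t) = 0) the natural logarithm gives ln(4) = 1.3862943611. This suggests the log base may be worth checking when comparing hand results with library output, though which base the library uses is an assumption here.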

When I output the idf vector with `idf.idf.toArray.filter(_.>(0)).distinct.foreach(println(_))`

I get:

1.3862943611198906

0.6931471805599453

Best regards,