Gavin Greig

Mind your language!

April 9th, 2005 (12:37 am)

I've spent most of this week at work thinking about language.

What we do is generate personality profiles, using a psychological model developed by the company founder. These profiles are then used in a number of scenarios - mostly business-oriented, but our services and materials are also used in education and in relationship counselling.

The job of the software development team is to write software that creates the profiles, and we're currently engaged in a major rewrite in order to integrate and extend a lot of independent utilities that have grown up over the years. This includes more universal support for the languages - over 20 of them at the time of writing - that some of our materials are translated into. Hence the contemplation of language.

English may be a difficult language to learn, but it's an easy one to work with. We are remarkably unfussy about structure and grammar compared to other languages. This means there are fewer helpful rules for novices to follow, but for our purposes it means you can write a piece of text once about a man, make a copy with all the "he"s changed to "she"s for a woman (and a few other easily identifiable common changes), and you're done. And by and large, this approach of storing only a male and a female version works for other languages too, though they may require more extensive modification to take account of the gender of the subject.

However, it doesn't work universally. There are languages that require modification depending on whether a man or a woman is discussing the person who's the subject of the text. There are languages where you don't just have to consider "singular or plural", but you may also have to consider "dual", which is when exactly two people or objects are being discussed. And you may have to modify your language depending on the relative social standing of the person it is addressed to. The classic example of this is Japanese, which takes a number of different forms depending on context, but you can find it even in languages as close to home as French and German, both of which contain a formal and a less formal version of "you". (Or should that be "thou"? :-)

What we're tentatively looking at as a solution is to allow text to apply to one or more "grammar contexts", each grammar context being a single combination of the possibilities sketched out above. We're going to have to be careful about how we manage it though - if we combine all the possibilities we might need, including all levels of Japanese formality and such considerations as whether we need to define neuter and mixed genders (for non-personalised text and text discussing mixed groups, respectively), then we could end up with about 4000 possible "grammar contexts" which might - or might not - apply to each individual text item. Er... let us know if you think we've missed anything, as we'd rather know now!
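To make the combinatorics concrete, here is a minimal Python sketch of what a "grammar context" could look like. The dimension names and their values (`SubjectGender`, `SpeakerGender`, `Number`, `Formality`) are assumptions chosen for illustration, not our actual model, and this toy version is far smaller than the real thing would be.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


# Hypothetical dimensions; the real set of possibilities would be larger.
class SubjectGender(Enum):
    MALE = "male"
    FEMALE = "female"
    NEUTER = "neuter"   # non-personalised text
    MIXED = "mixed"     # text discussing mixed groups


class SpeakerGender(Enum):
    MALE = "male"
    FEMALE = "female"


class Number(Enum):
    SINGULAR = "singular"
    DUAL = "dual"       # exactly two people or objects
    PLURAL = "plural"


class Formality(Enum):
    INFORMAL = "informal"
    FORMAL = "formal"


@dataclass(frozen=True)
class GrammarContext:
    """One combination of the grammatical possibilities."""
    subject_gender: SubjectGender
    speaker_gender: SpeakerGender
    number: Number
    formality: Formality


def all_contexts():
    """Enumerate every combination of the dimensions above."""
    dimensions = (SubjectGender, SpeakerGender, Number, Formality)
    return [GrammarContext(*combo) for combo in product(*dimensions)]


print(len(all_contexts()))  # 4 * 2 * 3 * 2 = 48
```

Every extra dimension multiplies the total, and so does every extra value within a dimension, which is how even a modest-looking list of possibilities runs into the thousands.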

At the moment, we think the solution is sound in its flexibility and can be kept within reasonable bounds of complexity by being careful about how much of it we actually use and expose - we don't have to define all 4000 possible grammar contexts, for example - but it's a scary thought how complex the whole business could become.

And I haven't even mentioned issues like defining the languages and scripts themselves, though thankfully there are international standards to help us out with those.
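As a rough illustration of what those standards buy you, the sketch below splits a tag such as "sr-Cyrl" into a language code and an optional script code. I'm assuming ISO 639 language codes and ISO 15924 four-letter script codes here, written in the style of BCP 47 language tags; the `LanguageTag` type and `parse_tag` function are hypothetical helpers, and the parsing is deliberately simplified.

```python
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str            # e.g. "ja", "fr", "sr" (ISO 639 style)
    script: Optional[str]    # e.g. "Latn", "Cyrl" (ISO 15924 style), if given


def parse_tag(tag: str) -> LanguageTag:
    """Split a simple 'language' or 'language-Script' tag.

    Simplified assumption: a four-letter second subtag is a script code;
    anything else (such as a region like 'FR') is ignored here.
    """
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    script = None
    if len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha():
        script = parts[1].title()
    return LanguageTag(language, script)


print(parse_tag("sr-Cyrl"))  # LanguageTag(language='sr', script='Cyrl')
print(parse_tag("ja"))       # LanguageTag(language='ja', script=None)
```

Keeping language and script separate matters for languages that are written in more than one script.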


Posted by: Nik Whitehead (sharikkamur)
Posted at: April 9th, 2005 09:32 am (UTC)

It doesn't help that so many of the constructions, particularly those dealing with gender and number, are completely alien to native English speakers, does it? If you have grown up with a language that allows you to just change 'he', 'him' and 'his' for 'she', 'her' and 'hers' or 'it', 'its' and 'its', the very prospect of having to suddenly change all of the verbs and everything else to fit is quite daunting.

Believe me, I am currently daunted and seriously doubt my ability to do it, particularly when even simple things like numbers change their form depending on gender - and not of the item they refer to either.

I'd never really considered the lack of rules in English as being what made it difficult to learn, but now you've brought the matter up I can see the point. It must also make it very difficult for speech recognition systems, as they must have a working grammar to support the recognition process.

I take it that your software generates sentences on the fly then, rather than just having a large database for each language?

Hmm... off at a bit of a tangent, I wonder if that's another reason why native English-speakers don't learn too many other languages - having absorbed a language with few rules, they are put off by the number of rules in other languages?

Posted by: Gavin Greig (ggreig)
Posted at: April 9th, 2005 11:11 am (UTC)

Profiles are built from a selection of appropriate "statements", so that no two profiles are identical, even given the same inputs. We don't actually "build" individual sentences - that would be way too complicated! - but as we do more in the way of multi-person profiles, where the participants may provide feedback on each other, we have to start considering more seriously the issue of which grammatical variations of each statement we need to have available pre-prepared in each language. As it is, even with single person profiles, the simplistic "he/she" differentiation can be limiting for translators (and could therefore reflect less favourably, if slightly unfairly, on the quality of our profiling).
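A minimal sketch of the "pre-prepared variations" idea, in Python: each statement stores only the variants a translator actually needed, and lookup falls back from the most specific match to more general ones. The `Statement` class, its `text_for` method and the two-part key (subject gender, formality) are all hypothetical, invented for illustration rather than taken from our code.

```python
from typing import Dict, Optional, Tuple

# A variant key is (subject_gender, formality); None means "any value".
VariantKey = Tuple[Optional[str], Optional[str]]


class Statement:
    """A profile statement with pre-prepared grammatical variants."""

    def __init__(self, variants: Dict[VariantKey, str]):
        self.variants = variants

    def text_for(self, gender: str, formality: str) -> str:
        # Try the most specific key first, then progressively more general ones.
        candidates = (
            (gender, formality),
            (gender, None),
            (None, formality),
            (None, None),
        )
        for key in candidates:
            if key in self.variants:
                return self.variants[key]
        raise KeyError(f"no variant for gender={gender!r}, formality={formality!r}")


# English only needs the simple he/she split, so formality is left as "any".
statement = Statement({
    ("male", None): "He prefers clear goals.",
    ("female", None): "She prefers clear goals.",
})

print(statement.text_for("female", "formal"))  # She prefers clear goals.
```

A language that distinguishes formality would simply add more specific keys to the same statement, without forcing English to define variants it doesn't need.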

I think you're right, the impedance barrier between English and other languages due to the presence or absence of strict grammatical construction rules works both ways, depending on which approach you're more used to. It's a bit sad that there's more motivation for others to learn English than for us to pick up their languages, as it panders to English-speakers' laziness in crossing that barrier. I would have to include myself in that condemnation, as my French is now rusty to the point of uselessness and my German was never much good.

When I was at school, there were some pupils learning German who went to an English teacher to ask for tuition in grammar outside hours, in order to give them a better understanding of what was being asked of them in learning German. The teacher actually got in trouble for agreeing to teach them, albeit after hours, as teaching formal grammar wasn't in favour at the time. I hope things have improved a bit since then.

Posted by: Nik Whitehead (sharikkamur)
Posted at: April 9th, 2005 02:35 pm (UTC)

I don't think it has improved at all, unfortunately. That's also one of my big problems learning Icelandic - I never did formal grammar beyond nouns, verbs and adjectives either, so trying to work out and remember cases is a real stinker.

Part of this, I suspect, is down to the loss of Latin from schools. I'd have loved to have done Latin, and I'm sure that the rigorous examination of grammar it required would have made it much easier for me now.

As for other languages... I can still manage what I think of as 'survival French'. I can survive in it as long as people speak slowly and don't expect anything more than pleasantries and shopping. My German has deteriorated to a lot of words and a few phrases, but it's surprising how often I find myself looking for an Icelandic word and realising that I know it in German.

Posted by: meepfrog (meepfrog)
Posted at: April 10th, 2005 11:31 am (UTC)

On learning Old Icelandic I found it a mixture of English, German and Russian.

As to Latin, I /am/ younger than you, and I learnt Latin at my local comprehensive, so obviously it's a fault of England rather than all of Britain. :P
