Subversion visually explained in 30 sec - svn tutorial

This video tutorial tries to explain the basics of how SVN is used. The buttons at the bottom are movie controls. Scroll down for conflict explanation.

svn.swf



This explains a conflict while working with Subversion:

svn_conflict.swf


Summary of SVN commands (in TortoiseSVN, run them by right-clicking a file or directory):

  • SVN checkout

    The first command you run when you start using Subversion. Downloads the repository contents to a folder on your PC and turns that folder into an SVN working copy, marked with an SVN icon.
  • Update

    Merges new changes from the repository into your working copy. Always update before you commit: changes to different parts of a file are merged automatically, but when you and a teammate have changed the same lines, SVN reports a conflict that it cannot resolve on its own. You have to resolve it by hand, usually after talking to your teammate.
  • Commit

    Sends the changes from your working copy to the SVN repository.
  • Resolved

    Tells SVN that the conflicted file is now OK.
  • Add

    Schedules a new file to be added to the repository with the next commit.
  • Delete

    Schedules a file to be deleted from the repository with the next commit. SVN keeps the full history, so accidental deletes are not a problem.
  • Revert

    Undoes all local changes in your working copy that have not been committed yet. Useful if you really mess up and want to throw away your changes and start over.
  • Log

    Shows a log of changes sorted by date, including some simple statistics.


Posted by Martin Konicek on 10:46 PM 18 comments

Why does XML have attributes?

Have you ever wondered why there are also attributes in XML and not just elements, causing many never-ending "element vs attribute" discussions?
And why is XML so redundant - repeating the tag name in the ending tag, thus making the document much larger?
I asked Charles Goldfarb, the inventor of SGML, the markup language from which XML and HTML were derived. Its roots go back to the 1960s, and it already had <tags> and attributes.
He was very kind to reply to me. Enjoy:


- In <elem>text</elem>, why repeat "elem" for the second time?

In fact, there is an option in SGML that allows you to omit the element-type
name from an end-tag, but it is rarely supported. (In XML the option does not
exist at all.) The name is there primarily for the convenience of humans
debugging the markup, as there are large document types, such as aircraft
maintenance manuals, where elements can nest very deeply.


- Why are there attributes in SGML? One could express the same just with elements.

In fact, one could just as easily express the same just with attributes!

The basic notion is that elements represent objects while attributes represent
their properties. However, the property "content" is given a convenient syntax
-- everything between the start-tag and matching end-tag -- which makes it
simpler to express content hierarchy by simply nesting elements.

In a nutshell, if information is part of the content of the document, represent
it as an element. If it is some other property, represent it as an attribute.

I wish you the best of luck with your studies.

Charles Goldfarb

--
©2008 Charles F. Goldfarb * www.xmlhandbook.com * www.xmlbooks.com
The XML Handbook™ * 5th Edition ISBN 0-13-049765-7 * 100,000 in print!
--


Posted by Martin Konicek on 1:26 PM 3 comments

Implement your own Parallel.For in C#

This article is intended for .NET 2 or 3.5. If you are on .NET 4, use the System.Threading.Tasks.Parallel class instead.

I was thinking about how Parallel.For could be implemented, so I wrote my own. I am using it in my own project and it scales very well.

Here it is:

using System;
using System.Threading;

public class Parallel
{
    /// <summary>
    /// Parallel for loop. Invokes the given action for every index in
    /// [fromInclusive, toExclusive) on multiple threads.
    /// Returns when the whole loop has finished.
    /// </summary>
    public static void For(int fromInclusive, int toExclusive, Action<int> action)
    {
        // chunkSize = 1 hands the items out to threads one by one, in index order.
        // A bigger chunk size should reduce lock waiting time and thus
        // increase parallelism.
        int chunkSize = 4;

        // number of process() threads
        int threadCount = Environment.ProcessorCount;
        int index = fromInclusive - chunkSize;
        // locker object shared by all the process() delegates
        var locker = new object();

        // processing function
        // takes the next chunk and processes it using action
        // (an anonymous method needs an explicit delegate type, "var" does not compile here)
        ThreadStart process = delegate()
        {
            while (true)
            {
                int chunkStart = 0;
                lock (locker)
                {
                    // take next chunk
                    index += chunkSize;
                    chunkStart = index;
                }
                // process the chunk
                // (other threads are processing other chunks at the same time,
                //  so the items complete out of order)
                for (int i = chunkStart; i < chunkStart + chunkSize; i++)
                {
                    if (i >= toExclusive) return;
                    action(i);
                }
            }
        };

        // launch process() threads (BeginInvoke runs the delegate on the thread pool)
        IAsyncResult[] asyncResults = new IAsyncResult[threadCount];
        for (int i = 0; i < threadCount; ++i)
        {
            asyncResults[i] = process.BeginInvoke(null, null);
        }
        // wait for all threads to complete
        for (int i = 0; i < threadCount; ++i)
        {
            process.EndInvoke(asyncResults[i]);
        }
    }
}


As noted in the code, setting chunkSize to 1 hands the items out to threads one at a time, in index order. Because the threads still run concurrently, the items are not guaranteed to finish in that order. With a bigger chunk size the processing order becomes even more mixed and non-deterministic, which is fine in applications where the items are independent, such as modifying every item of a collection or rendering the lines of a picture.
A bigger chunkSize should reduce lock waiting time and thus increase overall speed. But too big a chunkSize is bad too: if the work is split into only a few big parts, not enough parallelism is exposed, and we could find ourselves waiting for a single thread to finish the last large chunk.
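
If the lock around the chunk counter ever shows up as a bottleneck, one possible variation is to claim chunks with Interlocked.Add instead. The following is only a sketch under assumptions, not the code used in the project: the class name ChunkedLoop, the explicit chunkSize parameter and the use of dedicated Thread objects instead of BeginInvoke are all illustrative choices.

```csharp
using System;
using System.Threading;

public static class ChunkedLoop
{
    // Hypothetical variant of For: chunks are claimed with an atomic add, no lock.
    // Caveat: the counter can overflow if toExclusive is close to int.MaxValue.
    public static void For(int fromInclusive, int toExclusive, int chunkSize, Action<int> action)
    {
        if (chunkSize < 1) throw new ArgumentOutOfRangeException("chunkSize");

        int next = fromInclusive; // start of the next unclaimed chunk
        int threadCount = Environment.ProcessorCount;

        ThreadStart worker = delegate
        {
            while (true)
            {
                // Interlocked.Add returns the NEW value, so subtract chunkSize
                // to get the start of the chunk this thread has just claimed.
                int chunkStart = Interlocked.Add(ref next, chunkSize) - chunkSize;
                if (chunkStart >= toExclusive) return;

                int chunkEnd = Math.Min(chunkStart + chunkSize, toExclusive);
                for (int i = chunkStart; i < chunkEnd; i++)
                {
                    action(i);
                }
            }
        };

        // One dedicated thread per processor; Join waits for all of them.
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            threads[t] = new Thread(worker);
            threads[t].Start();
        }
        for (int t = 0; t < threadCount; t++)
        {
            threads[t].Join();
        }
    }
}
```

Passing chunkSize as a parameter makes it easy to measure how different chunk sizes behave for a given workload. Note that an exception thrown by action on one of these threads is not propagated to the caller, so real code would need to catch and forward it.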

Using Parallel.For is simple:

Parallel.For(0, 1000, delegate(int i)
{
    // your parallel code
    Thread.Sleep(100);
    Console.WriteLine(i);
});



Posted by Martin Konicek on 10:18 PM 8 comments

Serialize object graph to XML in .Net

How to serialize any data structure to XML? My first idea was XmlSerializer. But then I found out it had some serious drawbacks. Luckily, there is a better option - NetDataContractSerializer.

In .Net, there are a few classes for (de)serialization. This is an overview of their features:


XmlSerializer

  • Cannot serialize circular references.
  • If several objects point to the same object, a separate copy of it is written to the XML for each of these references.
  • Has to know in advance all types that could be encountered during serialization, and throws an exception on an unknown type. Known types are passed to the XmlSerializer constructor or declared with XmlIncludeAttribute.
  • Generates simple, readable XML.

DataContractSerializer

  • Has to know types in advance, like XmlSerializer.
  • Can serialize circular references: construct it with preserveObjectReferences = true.
  • Used by WCF.

NetDataContractSerializer - better!

  • Serializes the object graph properly, including circular references.
  • Doesn't have to know types in advance.
  • However, MSDN states that it can only be used for .Net <-> .Net communication, which is also fine for storing objects in a file.
  • Generates simple, readable XML.

BinaryFormatter

  • Works well, like NetDataContractSerializer.
  • The disadvantage is that it serializes to a binary format, which makes it unusable e.g. for saving to a file that the user could later edit.
  • The output is the smallest, thanks to the binary format.

SoapFormatter

  • Deprecated. Cannot serialize generic collections.

Note: BinaryFormatter and SoapFormatter require the serialized type to be marked with SerializableAttribute. The data contract serializers accept it as well. XmlSerializer does not need it: it works with public types that have a public parameterless constructor.


What does it mean that XmlSerializer has to know all types that could be encountered during serialization in advance?
Imagine that we have two classes: Base and Derived.

[Serializable]
public class Base
{
    public string name;

    public Base()
    {
        name = "base instance";
    }
}

[Serializable]
public class Derived : Base
{
    public Derived left;
    public Derived right;

    public Derived()
    {
    }
}


What if we have a reference to Base and we actually don't want to care about the actual type?

Base b = new Derived();
// we only know we are holding a reference to Base
// and we don't want to care about the actual type
XmlSerializer ser = new XmlSerializer(typeof(Base));

// serialize
using (FileStream fs = File.Create(AppDomain.CurrentDomain.BaseDirectory + "data.xml"))
{
    // XmlSerializer throws an InvalidOperationException here:
    // the runtime type Derived was not declared as a known type
    ser.Serialize(fs, b);
}

// deserialize (not reached because of the exception above)
using (FileStream fs = File.OpenRead(AppDomain.CurrentDomain.BaseDirectory + "data.xml"))
{
    Base baseDeserialized = (Base)ser.Deserialize(fs);
    Derived derivedDeserialized = baseDeserialized as Derived;
}


XmlSerializer throws an exception because it encounters an "unknown" type, Derived. We could solve this by passing all the possible derived types to the XmlSerializer constructor or by marking the base class with an XmlIncludeAttribute for each of them. This is of course inconvenient if you have a lot of classes. Worst of all, whenever you add a derived class, you have to change code elsewhere.
NetDataContractSerializer doesn't have this problem.
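
For comparison, the two XmlSerializer workarounds described above might look roughly like this. It is a minimal sketch rather than production code: KnownTypesDemo and RoundTrip are hypothetical names, the classes are trimmed-down copies of Base and Derived, and a MemoryStream stands in for the data.xml file.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Workaround 1: declare the derived type on the base class.
[XmlInclude(typeof(Derived))]
public class Base
{
    public string name = "base instance";
}

public class Derived : Base
{
    public Derived left;
    public Derived right;
}

public static class KnownTypesDemo
{
    // Serializes a value through a Base-typed serializer and reads it back.
    public static Base RoundTrip(Base value)
    {
        XmlSerializer ser = new XmlSerializer(typeof(Base));
        // Workaround 2 (instead of the attribute): pass the extra types explicitly.
        // XmlSerializer ser = new XmlSerializer(typeof(Base), new Type[] { typeof(Derived) });

        using (MemoryStream ms = new MemoryStream())
        {
            ser.Serialize(ms, value);
            ms.Position = 0; // rewind before reading
            return (Base)ser.Deserialize(ms);
        }
    }
}
```

Either way, the list of derived types lives next to the base class or the serializer, which is exactly the maintenance burden described above: every new derived class means another edit in one of those places.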

The second issue with XmlSerializer is that it cannot serialize a complex object graph. What does it mean "to serialize an object graph"?

Derived top = new Derived();
top.left = new Derived();
top.left.name = "left son";
top.right = new Derived();
top.right.name = "right son";
top.left.right = new Derived();

//       top
//      /   \
//   left   right
//      \   /
//      bottom
top.right.left = top.left.right;

XmlSerializer ser = new XmlSerializer(typeof(Derived));

using (FileStream fs = File.Create(AppDomain.CurrentDomain.BaseDirectory + "data.xml"))
{
    ser.Serialize(fs, top);
}

using (FileStream fs = File.OpenRead(AppDomain.CurrentDomain.BaseDirectory + "data.xml"))
{
    Derived deserialized = (Derived)ser.Deserialize(fs);

    // false - we want true
    bool ok = deserialized.left.right == deserialized.right.left;
}


After deserialization, deserialized.left.right == deserialized.right.left is false, which means the deserialized object graph differs from the original: the shared bottom node has been duplicated. Worse, XmlSerializer cannot serialize circular references at all.
Again, NetDataContractSerializer doesn't have any of these problems.
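
The same diamond-shaped graph could be round-tripped with NetDataContractSerializer along these lines. This is a sketch under assumptions: it targets the .NET Framework (3.0 or later) with a reference to System.Runtime.Serialization.dll, since the class is not available on .NET Core. Node, GraphDemo, RoundTrip and SharedNodeSurvives are hypothetical names, and a MemoryStream replaces the data.xml file.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

[Serializable]
public class Node
{
    public string name;
    public Node left;
    public Node right;
}

public static class GraphDemo
{
    // Serializes the graph to an in-memory stream and reads it back.
    public static Node RoundTrip(Node root)
    {
        NetDataContractSerializer ser = new NetDataContractSerializer();
        using (MemoryStream ms = new MemoryStream())
        {
            ser.Serialize(ms, root);
            ms.Position = 0; // rewind before reading
            return (Node)ser.Deserialize(ms);
        }
    }

    // Builds top -> (left, right) -> shared bottom node and checks whether
    // both paths still lead to one and the same object after the round trip.
    public static bool SharedNodeSurvives()
    {
        Node top = new Node();
        top.left = new Node();
        top.right = new Node();
        top.left.right = new Node();
        top.right.left = top.left.right;

        Node copy = RoundTrip(top);
        return object.ReferenceEquals(copy.left.right, copy.right.left);
    }
}
```

No known-type list is passed anywhere, because the serializer writes CLR type and assembly information into the XML. That is also the reason the output is only meant to be read back by .NET code.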


Posted by Martin Konicek on 10:59 PM 207 comments